For doing something before starting the pipeline, can you do it in the main program? The only disadvantage I can see is that it wouldn't be amenable to using templates (ValueProviders) - is that the blocker?
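To make the main-program suggestion concrete, here is a minimal sketch, assuming the per-table check can be evaluated by the driver before any transforms are applied. The class name, `selectTablesToTransfer`, and the `tmp_` rule are hypothetical placeholders; the Beam calls are left as comments because the actual Mongo read and BigQuery write setup is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical driver-side helper: decide which tables get a pipeline branch
// before the pipeline graph is built, so unvalidated tables never produce a PCollection.
class DriverSideValidation {

    /** Returns, in order, only the table names that pass the pre-flight check. */
    static List<String> selectTablesToTransfer(List<String> tables, Predicate<String> isValid) {
        List<String> selected = new ArrayList<>();
        for (String table : tables) {
            if (isValid.test(table)) {
                selected.add(table);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed input: table names the driver has already listed from the source database.
        List<String> tables = Arrays.asList("users", "orders", "tmp_scratch");

        // Assumed rule, for illustration only: skip scratch tables.
        Predicate<String> isValid = name -> !name.startsWith("tmp_");

        for (String table : selectTablesToTransfer(tables, isValid)) {
            // In a real driver, this is where the read and write transforms
            // for this table would be applied to the pipeline.
            System.out.println("adding branch for " + table);
        }
        // The pipeline would be run here, after all validated branches are added.
    }
}
```

Since the decision is made while the driver constructs the graph, it cannot depend on values that are only known at template execution time, which is the ValueProvider limitation mentioned above.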
For doing something after a transform finishes processing a window of a PCollection - we already have a thread about that, and it's a hard problem that we're already thinking about but don't have a general solution for yet; I'd suggest keeping the discussion on that thread.
Minor note on terminology: PCollections don't run, the same way filenames or database tables don't run: the things that run are PTransforms; PCollections might not even physically exist <https://cloud.google.com/dataflow/service/dataflow-service-desc#fusion-optimization>. However, you could say that a PCollection is being produced (while its producing transform runs).

On Thu, Sep 14, 2017 at 12:19 AM Chaim Turkel <ch...@behalf.com> wrote:

> My use case is that I have generic code to transfer for example tables
> from mongo to bigquery. I iterate over all tables in mongo and create
> a PCollection for each. But there are things that need to be checked
> before running, and to run only if validated.
> I tried the visitor but there is no way to stop a PCollection from running.
>
> It would be nice to have hooks that during run time (not graph time) I
> can decide on the PBegin not to start
>
> chaim
>
> On Thu, Sep 14, 2017 at 9:25 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > Hi,
> >
> > I don't think it makes sense on a transform (as it expects a PCollection).
> > However, why not introducing a specific hook for that.
> >
> > I think you can workaround using a Pipeline Visitor, but it would be runner
> > level.
> >
> > Regards
> > JB
> >
> >
> > On 09/14/2017 08:21 AM, Chaim Turkel wrote:
> >>
> >> Hi,
> >> I have a few scenarios where I would like to have code that is
> >> before the PBegin and after the PDone.
> >> This is usually for monitoring purposes.
> >> It would be nice to be able to transform from PBegin to PBegin, and
> >> PDone to PDone, so that code can be run before and after and not in
> >> the driver program
> >>
> >>
> >> chaim
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>