For doing something before starting the pipeline: can you do it in the main
program? The only disadvantage I can see is that it wouldn't be amenable to
using templates (ValueProviders). Is that the blocker?
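
A rough sketch of what I mean by doing it in the main program, with the Beam
calls left out on purpose. Everything named here (validate_table,
build_branches, the table dicts) is hypothetical and only illustrates the
shape: validate each table while constructing the graph, and add a branch only
for the tables that pass.

```python
# Sketch only: validate in the driver program (graph-construction time) and
# add a branch to the pipeline only for tables that pass. All names here
# (validate_table, build_branches, the table dicts) are hypothetical.

def validate_table(table):
    """Hypothetical pre-flight check, run in the main program."""
    return bool(table.get("name")) and table.get("row_count", 0) > 0


def build_branches(tables, add_branch):
    """Call add_branch(table) only for tables that pass validation.

    In a real driver program, add_branch would apply the read/write
    transforms for that table to the pipeline; here it is just a callback.
    Returns the names of the tables that were skipped.
    """
    skipped = []
    for table in tables:
        if validate_table(table):
            add_branch(table)
        else:
            skipped.append(table.get("name"))
    return skipped


if __name__ == "__main__":
    planned = []
    tables = [
        {"name": "users", "row_count": 10},
        {"name": "empty_table", "row_count": 0},
    ]
    skipped = build_branches(tables, planned.append)
    print("planned:", [t["name"] for t in planned])
    print("skipped:", skipped)
```

Since the check runs while the graph is being built, a table that fails it
never becomes part of the pipeline at all. That is also why this doesn't work
with templates: the main program runs when the template is created, not each
time the template is executed.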

For doing something after a transform finishes processing a window of a
PCollection: we already have a thread about that. It's a hard problem that
we're already thinking about, but we don't have a general solution for it yet.
I'd suggest keeping the discussion on that thread.

Minor note on terminology: PCollections don't run, the same way that
filenames or database tables don't run. The things that run are PTransforms;
PCollections might not even physically exist
<https://cloud.google.com/dataflow/service/dataflow-service-desc#fusion-optimization>.
However, you could say that a PCollection is being produced (while its
producing transform runs).

On Thu, Sep 14, 2017 at 12:19 AM Chaim Turkel <ch...@behalf.com> wrote:

> My use case is that I have generic code to transfer for example tables
> from mongo to bigquery. I iterate over all tables in mongo and create
> a PCollection for each. But there are things that need to be checked
> before running, and to run only if validated.
> I tried the visitor but there is no way to stop a PCollection from running.
>
> It would be nice to have hooks that during run time (not graph time) I
> can decide on the PBegin not to start
>
> chaim
>
> On Thu, Sep 14, 2017 at 9:25 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
> > Hi,
> >
> > I don't think it makes sense on a transform (as it expects a PCollection).
> > However, why not introducing a specific hook for that.
> >
> > I think you can workaround using a Pipeline Visitor, but it would be runner
> > level.
> >
> > Regards
> > JB
> >
> >
> > On 09/14/2017 08:21 AM, Chaim Turkel wrote:
> >>
> >> Hi,
> >>    I have a few scenarios where I would like to have code that is
> >> before the PBegin and after the PDone.
> >> This is usually for monitoring purposes.
> >> It would be nice to be able to transform from PBegin to PBegin, and
> >> PDone to PDone, so that code can be run before and after and not in
> >> the driver program
> >>
> >>
> >> chaim
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>
