Re: PBegin, PDone

2017-09-14 Thread Eugene Kirpichov
If you need to do something before the pipeline starts, can you do it in
the main program? The only disadvantage I can see is that it wouldn't work
with templates (ValueProviders) - is that the blocker?
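
A minimal sketch of what "do it in the main program" could look like for the
table-transfer use case, assuming the check can be made before the pipeline
graph is built. It uses plain Python and no Beam calls; select_tables, the
table names, and the "tmp_" rule are all hypothetical and only mark where
validation would sit relative to pipeline construction.

```python
from typing import Callable, Iterable, List, Tuple


def select_tables(
    tables: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split table names into those to transfer and those to skip."""
    accepted: List[str] = []
    skipped: List[str] = []
    for name in tables:
        (accepted if is_valid(name) else skipped).append(name)
    return accepted, skipped


def main() -> None:
    # Hypothetical table names; a real driver would list them from the source.
    tables = ["users", "orders", "tmp_scratch"]

    # Hypothetical check: skip scratch tables.
    accepted, skipped = select_tables(
        tables, lambda name: not name.startswith("tmp_")
    )

    for name in skipped:
        print(f"skipping {name}: failed validation")
    for name in accepted:
        # Pipeline construction for this table would go here, for example
        # one read/write branch per accepted table, before the pipeline runs.
        print(f"would add a transfer branch for {name}")


if __name__ == "__main__":
    main()
```

Because the decision is made while the driver runs, a table that fails the
check never gets a branch in the graph at all. The cost is the one mentioned
above: the check happens at construction time, so it cannot depend on values
that are only supplied when a template is launched.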

As for doing something after a transform finishes processing a window of a
PCollection - we already have a thread about that. It's a hard problem that
we're thinking about but don't have a general solution for yet; I'd suggest
keeping the discussion on that thread.

Minor note on terminology: PCollections don't run, in the same way that
filenames or database tables don't run. The things that run are PTransforms;
a PCollection might not even physically exist.
However, you could say that a PCollection is being produced (while its
producing transform runs).



Re: PBegin, PDone

2017-09-14 Thread Chaim Turkel
My use case is generic code that transfers, for example, tables from
MongoDB to BigQuery. I iterate over all the tables in MongoDB and create
a PCollection for each. But some things need to be checked before running,
and a table should run only if it passes validation.
I tried the visitor, but there is no way to stop a PCollection from running.

It would be nice to have hooks so that at run time (not graph construction
time) I can decide at the PBegin not to start.

chaim



Re: PBegin, PDone

2017-09-14 Thread Jean-Baptiste Onofré

Hi,

I don't think it makes sense on a transform (as it expects a PCollection).
However, why not introduce a specific hook for that?

I think you can work around this using a Pipeline Visitor, but that would be
at the runner level.
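
To make the visitor idea concrete, here is a toy sketch of walking a
pipeline-like graph and reporting the transforms that fail a check. It is not
Beam's visitor API: Node, visit, and find_unvalidated are hypothetical
stand-ins written in plain Python, and the "tmp_" rule is only an example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for one transform in a pipeline graph."""

    name: str
    children: List["Node"] = field(default_factory=list)


def visit(node: Node, on_transform: Callable[[Node], None]) -> None:
    """Depth-first walk that calls on_transform for every node."""
    on_transform(node)
    for child in node.children:
        visit(child, on_transform)


def find_unvalidated(root: Node, is_valid: Callable[[str], bool]) -> List[str]:
    """Collect the names of transforms that fail the check, in visit order."""
    failed: List[str] = []

    def check(node: Node) -> None:
        if not is_valid(node.name):
            failed.append(node.name)

    visit(root, check)
    return failed


# Hypothetical graph with one branch per table.
root = Node("root", [Node("copy_users"), Node("copy_tmp_scratch")])
print(find_unvalidated(root, lambda name: "tmp_" not in name))
```

A walk like this only inspects the graph after it has been built. It can
report a problem, but in this sketch it has no way to keep an
already-constructed branch from executing.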

Regards
JB



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


PBegin, PDone

2017-09-14 Thread Chaim Turkel
Hi,
I have a few scenarios where I would like to run code before the PBegin
and after the PDone, usually for monitoring purposes.
It would be nice to be able to transform from PBegin to PBegin, and from
PDone to PDone, so that this code runs before and after the pipeline
rather than in the driver program.


chaim