I think this question depends on how much both subgraphs overlap? But in
general, I agree that the first approach seems more desirable from the
runtime view (multiple consumers at the branch point).

On Mon, Jan 19, 2015 at 10:59 AM, Robert Metzger <rmetz...@apache.org>
wrote:

> I would also execute the sinks immediately. I think its a corner case
> because the sinks are usually the last thing in a plan and all print() or
> collect() statements are earlier in the plan.
>
> print() should go to the client command line, yes.
>
> On Mon, Jan 19, 2015 at 1:42 AM, Stephan Ewen <se...@apache.org> wrote:
>
> > Hi there!
> >
> > With the upcoming more interactive extensions to the API (operations that
> > go back to the client from a program and need to be eagerly evaluated) we
> > need to define how different actions should behave.
> >
> > Currently, nothing gets executed until the "env.execute()" call is made.
> > That allows to produce multiple data sources at the same time, which is a
> > good feature.
> >
> > For certain operations, like the "count()" and "collect()" functions
> added
> > in https://github.com/apache/flink/pull/210 , we need to trigger
> execution
> > immediately.
> >
> > The open question is, how should this behave in connection with already
> > defined data sinks:
> >
> > 1) Should all yet defined data sinks be executed as well?
> > 2) Should only that immediate operation be executed and the data sinks be
> > pending till a call to "env.execute()"
> >
> > I am somewhat leaning towards the first option right now, because I think
> > that executing them later may force re-execution of larger parts of the
> > plan.
> >
> > In addition: I think that the "print()" commands should go to the client
> > command line. In that sense, they would behave like
> > "collect().foreach(print)"
> >
> >
> > Greetings,
> > Stephan
> >
>

Reply via email to