I think this depends on how much the two subgraphs overlap. In general, though, I agree that the first approach seems more desirable from the runtime's point of view (multiple consumers at the branch point).
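
To make the trade-off concrete, here is a rough toy model of the two options from Stephan's mail (quoted below). It is plain Python, not Flink code: `ToyEnv`, `add_sink`, `source_runs`, and the flag name are all made up for illustration, and the "plan" is a single shared source feeding both a sink and a `count()`. It assumes the worst case, where the two subgraphs overlap completely.

```python
class ToyEnv:
    """Hypothetical stand-in for an execution environment (not the Flink API)."""

    def __init__(self, run_pending_sinks_on_eager_action):
        # True models option 1, False models option 2.
        self.run_pending_sinks = run_pending_sinks_on_eager_action
        self.pending_sinks = []  # sinks defined but not yet executed
        self.source_runs = 0     # how often the shared source was computed

    def _compute_source(self):
        self.source_runs += 1
        return [1, 2, 3, 4]

    def _run_job(self, sinks):
        # One job: compute the source once and feed every consumer in this job.
        data = self._compute_source()
        for out in sinks:
            out.extend(data)
        return data

    def add_sink(self, out):
        # Lazy: only remember the sink, nothing runs yet.
        self.pending_sinks.append(out)

    def count(self):
        # Eager action: must trigger execution immediately.
        if self.run_pending_sinks:
            sinks, self.pending_sinks = self.pending_sinks, []
        else:
            sinks = []
        return len(self._run_job(sinks))

    def execute(self):
        # Run whatever sinks are still pending.
        sinks, self.pending_sinks = self.pending_sinks, []
        if sinks:
            self._run_job(sinks)


def run(option_one):
    env = ToyEnv(run_pending_sinks_on_eager_action=option_one)
    sink = []
    env.add_sink(sink)
    n = env.count()
    env.execute()
    return n, sink, env.source_runs


print(run(True))   # option 1: sink runs together with count()
print(run(False))  # option 2: sink waits for execute()
```

In this toy, both options produce the same count and the same sink contents, but option 2 computes the shared source twice. With less overlap between the subgraphs, the extra work under option 2 would shrink accordingly.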

On Mon, Jan 19, 2015 at 10:59 AM, Robert Metzger <rmetz...@apache.org> wrote:

> I would also execute the sinks immediately. I think its a corner case
> because the sinks are usually the last thing in a plan and all print() or
> collect() statements are earlier in the plan.
>
> print() should go to the client command line, yes.
>
> On Mon, Jan 19, 2015 at 1:42 AM, Stephan Ewen <se...@apache.org> wrote:
>
> > Hi there!
> >
> > With the upcoming more interactive extensions to the API (operations that
> > go back to the client from a program and need to be eagerly evaluated) we
> > need to define how different actions should behave.
> >
> > Currently, nothing gets executed until the "env.execute()" call is made.
> > That allows to produce multiple data sources at the same time, which is a
> > good feature.
> >
> > For certain operations, like the "count()" and "collect()" functions added
> > in https://github.com/apache/flink/pull/210 , we need to trigger execution
> > immediately.
> >
> > The open question is, how should this behave in connection with already
> > defined data sinks:
> >
> > 1) Should all yet defined data sinks be executed as well?
> > 2) Should only that immediate operation be executed and the data sinks be
> >    pending till a call to "env.execute()"
> >
> > I am somewhat leaning towards the first option right now, because I think
> > that executing them later may force re-execution of larger parts of the
> > plan.
> >
> > In addition: I think that the "print()" commands should go to the client
> > command line. In that sense, they would behave like
> > "collect().foreach(print)"
> >
> > Greetings,
> > Stephan