I was curious how the `count` method on DataSet works, and was surprised
to see that it executes the entire program graph. Wouldn’t this cause
undesirable side effects, such as writing to any sinks already defined? It
also seems strange that the graph is mutated by the addition of a sink
(which isn’t subsequently removed).

Surveying the Flink code, I found few situations where the program graph
is implicitly executed (`collect` is another). Nonetheless, this has deepened
my appreciation for how dynamic the application might be.

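// DataSet.java
public long count() throws Exception {
    // Unique accumulator name for this particular count() call.
    final String id = new AbstractID().toString();

    // Adds a new sink to the program graph that counts records into that accumulator.
    output(new Utils.CountHelper<T>(id)).name("count()");

    // Executes the program graph, then reads the count back from the job result.
    JobExecutionResult res = getExecutionEnvironment().execute();
    return res.<Long> getAccumulatorResult(id);
}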
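
To make the concern concrete, here is a toy model in plain Java. It is not Flink's API: `ToyEnvironment`, `ToyDataSet`, `writeTo` and `sinkCount` are hypothetical names, and the model simply assumes that `execute()` runs every registered sink and that sinks are never removed. Whether Flink really behaves this way is the question being asked above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical classes, not part of Flink.
class ToyEnvironment {
    // Every sink registered on the plan; assumed never to be removed.
    private final List<Runnable> sinks = new ArrayList<>();

    void addSink(Runnable sink) {
        sinks.add(sink);
    }

    int sinkCount() {
        return sinks.size();
    }

    // Assumption: executing the plan fires every registered sink.
    void execute() {
        for (Runnable sink : sinks) {
            sink.run();
        }
    }
}

class ToyDataSet {
    private final ToyEnvironment env;
    private final List<String> records;

    ToyDataSet(ToyEnvironment env, List<String> records) {
        this.env = env;
        this.records = records;
    }

    // Registers a sink that appends all records to target when the plan runs.
    void writeTo(List<String> target) {
        env.addSink(() -> target.addAll(records));
    }

    // Same shape as the count() excerpt: add a counting sink, run the plan, read the result.
    long count() {
        long[] counter = new long[1];
        env.addSink(() -> counter[0] = records.size());
        env.execute();
        return counter[0];
    }
}

class CountSideEffectDemo {
    public static void main(String[] args) {
        ToyEnvironment env = new ToyEnvironment();
        ToyDataSet data = new ToyDataSet(env, Arrays.asList("a", "b", "c"));
        List<String> written = new ArrayList<>();
        data.writeTo(written);

        System.out.println(data.count());    // 3
        System.out.println(written.size());  // 3: the write sink ran as a side effect
        System.out.println(data.count());    // 3
        System.out.println(written.size());  // 6: the write sink ran again
        System.out.println(env.sinkCount()); // 3: one write sink plus two count sinks
    }
}
```

Under those assumptions, each call to `count()` both re-triggers the earlier write and leaves one more sink behind in the plan.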

Eron
