[ https://issues.apache.org/jira/browse/FLINK-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251023#comment-15251023 ]
Greg Hogan commented on FLINK-3789:
-----------------------------------

I was thinking about Clustering Coefficient, for which we return the local clustering coefficient for each vertex as a DataSet via a GraphAlgorithm. It would also be nice to compute the global clustering coefficient, which would need to access accumulators. Both the local and the global clustering coefficient count triangles, so there is certainly an advantage in computing the two simultaneously, but there is an extra cost for each, so we should also allow separate computation.

So there is a need to do things similar to collect and count but still allow the user to perform the execute (which of course allows direct configuration of the job name) so they can compose multiple algorithms and analytics. Perhaps instead of overloading these functions we can provide alternative, slightly more sophisticated options which would allow configuring a job name.

In many ways the current implementation of count, collect, print, and checksum is very limiting because you can only perform that single action per job. You can't print and count, or print and write. The current DataSet API works well because it's simple, but I think we could expand on this.

> Overload methods which trigger program execution to allow naming job
> --------------------------------------------------------------------
>
>                 Key: FLINK-3789
>                 URL: https://issues.apache.org/jira/browse/FLINK-3789
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Minor
>
> Overload the following functions to additionally accept a job name to pass to
> {{ExecutionEnvironment.execute(String)}}.
> * {{DataSet.collect()}}
> * {{DataSet.count()}}
> * {{DataSetUtils.checksumHashCode(DataSet)}}
> * {{GraphUtils.checksumHashCode(Graph)}}
> Once the deprecated {{DataSet.print(String)}} and
> {{DataSet.printToErr(String)}} are removed we can overload
> {{DataSet.print()}}.
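
To illustrate the comment's point that the local and global clustering coefficients share the same triangle count, here is a minimal sketch in plain Java. It does not use the Flink or Gelly API; the class name `ClusteringCoefficients`, the `Result` holder, and the `compute` method are hypothetical names chosen for this example. It assumes a simple undirected graph with vertices numbered from 0, and it adopts the convention that a vertex with fewer than two neighbors has a local coefficient of 0.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of Flink or Gelly.
public class ClusteringCoefficients {

    /** Per-vertex and whole-graph results derived from a single triangle count. */
    public static final class Result {
        public final double[] local;     // local coefficient per vertex
        public final double global;      // 3 * triangles / connected triplets
        public final long triangleCount; // number of distinct triangles

        Result(double[] local, double global, long triangleCount) {
            this.local = local;
            this.global = global;
            this.triangleCount = triangleCount;
        }
    }

    /** Edges are undirected pairs {u, v}; self-loops and duplicates are ignored. */
    public static Result compute(int vertexCount, int[][] edges) {
        List<Set<Integer>> neighbors = new ArrayList<>();
        for (int i = 0; i < vertexCount; i++) {
            neighbors.add(new HashSet<>());
        }
        for (int[] e : edges) {
            if (e[0] == e[1]) {
                continue; // skip self-loops
            }
            neighbors.get(e[0]).add(e[1]);
            neighbors.get(e[1]).add(e[0]);
        }

        // Enumerate each triangle exactly once by requiring u < v < w.
        long[] trianglesAt = new long[vertexCount];
        long triangles = 0;
        for (int u = 0; u < vertexCount; u++) {
            for (int v : neighbors.get(u)) {
                if (v <= u) {
                    continue;
                }
                for (int w : neighbors.get(v)) {
                    if (w > v && neighbors.get(u).contains(w)) {
                        trianglesAt[u]++;
                        trianglesAt[v]++;
                        trianglesAt[w]++;
                        triangles++;
                    }
                }
            }
        }

        // Both metrics reuse the triangle counts gathered above.
        double[] local = new double[vertexCount];
        long triplets = 0;
        for (int v = 0; v < vertexCount; v++) {
            long degree = neighbors.get(v).size();
            long pairs = degree * (degree - 1) / 2; // neighbor pairs centered on v
            triplets += pairs;
            local[v] = pairs == 0 ? 0.0 : (double) trianglesAt[v] / pairs;
        }
        double global = triplets == 0 ? 0.0 : 3.0 * triangles / triplets;
        return new Result(local, global, triangles);
    }

    public static void main(String[] args) {
        // A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {2, 3}};
        Result r = compute(4, edges);
        System.out.println("triangles = " + r.triangleCount);
        System.out.println("global    = " + r.global);
        for (int v = 0; v < r.local.length; v++) {
            System.out.println("local[" + v + "]  = " + r.local[v]);
        }
    }
}
```

The triangle enumeration is the expensive step and is done once. The per-vertex division and the global sum over triplets are each a small extra pass, which mirrors the trade-off described in the comment: computing both together shares the costly part, while each metric still adds some work of its own.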
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)