[ 
https://issues.apache.org/jira/browse/FLINK-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251023#comment-15251023
 ] 

Greg Hogan commented on FLINK-3789:
-----------------------------------

I was thinking about Clustering Coefficient, for which we return the local 
clustering coefficient of each vertex as a DataSet via a GraphAlgorithm. It 
would also be nice to compute the global clustering coefficient, which would 
need to access accumulators. Both the local and the global clustering 
coefficient count triangles, so there is certainly an advantage in computing 
the two simultaneously, but there is an extra cost for each, so we should also 
allow separate computation.
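
To make the shared work concrete, here is a minimal single-machine sketch in 
plain Java (not Flink or Gelly code; the class and method names are 
hypothetical). It assumes an undirected simple graph given as an edge list. 
One pass counts the triangles through each vertex, and both scores are derived 
from that same count: local(v) = triangles(v) / C(degree(v), 2), and global = 
sum of triangles(v) / sum of C(degree(v), 2).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only; names are hypothetical and this is not Gelly API. */
final class ClusteringSketch {

    /** Holds both scores derived from a single triangle-counting pass. */
    static final class Result {
        final Map<Integer, Double> local = new HashMap<>();
        double global;
    }

    static Result analyze(int[][] edges) {
        // Build an undirected adjacency map; self-loops are ignored.
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            if (e[0] == e[1]) {
                continue;
            }
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }

        Result result = new Result();
        long closedTriplets = 0; // sum over vertices of triangles through the vertex
        long allTriplets = 0;    // sum over vertices of C(degree, 2)

        for (Map.Entry<Integer, Set<Integer>> entry : adj.entrySet()) {
            List<Integer> nbrs = new ArrayList<>(entry.getValue());
            long degree = nbrs.size();
            long pairs = degree * (degree - 1) / 2;

            // Count pairs of neighbors that are themselves connected.
            long triangles = 0;
            for (int i = 0; i < nbrs.size(); i++) {
                for (int j = i + 1; j < nbrs.size(); j++) {
                    if (adj.get(nbrs.get(i)).contains(nbrs.get(j))) {
                        triangles++;
                    }
                }
            }

            // Local score reuses the per-vertex triangle count.
            result.local.put(entry.getKey(),
                    pairs == 0 ? 0.0 : (double) triangles / pairs);

            // Global score only needs the two running sums.
            closedTriplets += triangles;
            allTriplets += pairs;
        }

        result.global = allTriplets == 0
                ? 0.0 : (double) closedTriplets / allTriplets;
        return result;
    }
}
```

For a triangle 1-2-3 with a pendant edge 3-4, 
ClusteringSketch.analyze(new int[][]{{1, 2}, {2, 3}, {1, 3}, {3, 4}}) gives a 
local score of 1/3 for vertex 3 and a global score of 3/5. The two running sums 
are the kind of value that would come from accumulators in a distributed job.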

So there is a need to do things similar to collect and count while still 
allowing the user to call execute (which of course allows direct configuration 
of the job name), so that they can compose multiple algorithms and analytics. 
Perhaps instead of overloading these functions we can provide alternative, 
slightly more sophisticated options which would allow configuring a job name. 
In many ways the current implementation of count, collect, print, and checksum 
is very limiting because you can only perform that single action per job. You 
can't print and count, or print and write. The current DataSet API works well 
because it's simple, but I think we could expand on this.
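
As a rough illustration of the shape such an option might take, here is a toy 
model in plain Java. It is not Flink API and not a proposed signature; every 
name in it (DeferredJobSketch, count, collect, execute) is hypothetical. The 
only point is that actions are registered first and then all run together 
under one user-supplied job name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy stand-in for a lazily executed data set; hypothetical, not Flink API. */
final class DeferredJobSketch {

    /** Results of one run: the job name plus one value per registered action. */
    static final class Result {
        final String jobName;
        final Map<String, Object> values = new LinkedHashMap<>();

        Result(String jobName) {
            this.jobName = jobName;
        }

        Object get(String resultName) {
            return values.get(resultName);
        }
    }

    private final List<Integer> data;
    private final Map<String, Function<List<Integer>, Object>> actions =
            new LinkedHashMap<>();

    DeferredJobSketch(List<Integer> data) {
        this.data = data;
    }

    /** Registers a count under the given result name; nothing runs yet. */
    DeferredJobSketch count(String resultName) {
        actions.put(resultName, d -> Long.valueOf(d.size()));
        return this;
    }

    /** Registers a collect under the given result name; nothing runs yet. */
    DeferredJobSketch collect(String resultName) {
        actions.put(resultName, d -> new ArrayList<>(d));
        return this;
    }

    /** Runs every registered action in a single named "job". */
    Result execute(String jobName) {
        Result result = new Result(jobName);
        for (Map.Entry<String, Function<List<Integer>, Object>> a
                : actions.entrySet()) {
            result.values.put(a.getKey(), a.getValue().apply(data));
        }
        return result;
    }
}
```

With this model, new DeferredJobSketch(list).count("n").collect("all") 
.execute("my job") yields both the count and the collected elements from one 
run, which is the combination the current single-action methods cannot express.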

> Overload methods which trigger program execution to allow naming job
> --------------------------------------------------------------------
>
>                 Key: FLINK-3789
>                 URL: https://issues.apache.org/jira/browse/FLINK-3789
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Minor
>
> Overload the following functions to additionally accept a job name to pass to 
> {{ExecutionEnvironment.execute(String)}}.
> * {{DataSet.collect()}}
> * {{DataSet.count()}}
> * {{DataSetUtils.checksumHashCode(DataSet)}}
> * {{GraphUtils.checksumHashCode(Graph)}}
> Once the deprecated {{DataSet.print(String)}} and 
> {{DataSet.printToErr(String)}} are removed we can overload 
> {{DataSet.print()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
