[
https://issues.apache.org/jira/browse/FLINK-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301896#comment-15301896
]
ASF GitHub Bot commented on FLINK-3806:
---------------------------------------
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/2036#discussion_r64721168
--- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/gsa/GatherSumApplyIteration.java ---
@@ -289,6 +300,11 @@ private GatherUdf(GatherFunction<VV, EV, M> gatherFunction, TypeInformation<Tupl
     @Override
     public void open(Configuration parameters) throws Exception {
+        try {
+            Collection<LongValue> numberOfVertices = getRuntimeContext().getBroadcastVariable("number of vertices");
+            this.gatherFunction.setNumberOfVertices(numberOfVertices.iterator().next().getValue());
+        } catch (Exception e) {
+        }
--- End diff --
The batch API usually throws an exception when people build a seemingly
incorrect program, for example by using a broadcast variable that does not
exist. That seems to have helped people catch bugs faster.
The exception makes it obvious that something is wrong, whereas the empty set
would let various programs run but return a wrong result. That is much harder
to recognize and debug.
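To illustrate the difference, here is a minimal sketch in plain Java. It does not use Flink: `BroadcastLookupSketch`, its `register` method, and the default value passed to the swallowing variant are hypothetical stand-ins, and the map only imitates a context that throws when a broadcast variable is missing. The swallowing variant mirrors the empty catch block in the diff by leaving a default in place.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a runtime context; this is not Flink's API.
class BroadcastLookupSketch {
    private final Map<String, Collection<Long>> broadcastVariables = new HashMap<>();

    void register(String name, Collection<Long> values) {
        broadcastVariables.put(name, values);
    }

    // Fail-fast lookup: a missing variable is reported immediately.
    Collection<Long> getBroadcastVariable(String name) {
        Collection<Long> values = broadcastVariables.get(name);
        if (values == null) {
            throw new IllegalArgumentException(
                    "Broadcast variable '" + name + "' has not been set.");
        }
        return values;
    }

    // Swallowing variant: the error disappears and the caller keeps a default.
    long numberOfVerticesSwallowing(long defaultValue) {
        try {
            return getBroadcastVariable("number of vertices").iterator().next();
        } catch (Exception e) {
            return defaultValue; // the program keeps running with a wrong value
        }
    }

    // Propagating variant: the lookup failure reaches the caller.
    long numberOfVerticesFailFast() {
        return getBroadcastVariable("number of vertices").iterator().next();
    }

    public static void main(String[] args) {
        BroadcastLookupSketch ctx = new BroadcastLookupSketch();

        // Nothing registered yet: the swallowing variant hides the mistake.
        System.out.println("swallowing: " + ctx.numberOfVerticesSwallowing(-1L));

        // The propagating variant surfaces it right away.
        try {
            ctx.numberOfVerticesFailFast();
        } catch (IllegalArgumentException e) {
            System.out.println("fail fast: " + e.getMessage());
        }

        ctx.register("number of vertices", Collections.singletonList(42L));
        System.out.println("registered: " + ctx.numberOfVerticesFailFast());
    }
}
```

With the swallowing variant, a forgotten broadcast variable only shows up later as a wrong result. With the propagating variant, it shows up at the lookup itself.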
> Revert use of DataSet.count() in Gelly
> --------------------------------------
>
> Key: FLINK-3806
> URL: https://issues.apache.org/jira/browse/FLINK-3806
> Project: Flink
> Issue Type: Improvement
> Components: Gelly
> Affects Versions: 1.1.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
> Priority: Critical
> Fix For: 1.1.0
>
>
> FLINK-1632 replaced {{GraphUtils.count}} with {{DataSetUtils.count}}. The
> former returns a {{DataSet}} while the latter executes a job to return a Java
> value.
> {{DataSetUtils.count}} is called from {{Graph.numberOfVertices}} and
> {{Graph.numberOfEdges}} which are called from {{GatherSumApplyIteration}} and
> {{ScatterGatherIteration}} as well as the {{PageRank}} algorithms when the
> user does not pass the number of vertices as a parameter.
> As noted in FLINK-1632, this does make the code simpler but, if my
> understanding is correct, it will materialize the Graph twice. The Graph
> will need to be reread from input, regenerated, or recomputed by preceding
> algorithms.
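The concern in the description can be modeled with a toy example in plain Java. This is only a sketch under simplifying assumptions and does not use Flink: `MaterializationSketch` and its methods are hypothetical, and a `Supplier` stands in for a lazily evaluated graph source. It counts how often the source is materialized when the count runs as its own step, compared with deriving the count inside the same run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Toy model of a lazily evaluated data source; this is not Flink's API.
class MaterializationSketch {
    static int materializations = 0;

    // Stands in for rereading, regenerating, or recomputing the vertices.
    static List<Long> materializeVertices() {
        materializations++;
        List<Long> vertices = new ArrayList<>();
        for (long id = 0; id < 5; id++) {
            vertices.add(id);
        }
        return vertices;
    }

    // Eager style: the count is computed by a separate run over the source,
    // and the algorithm then evaluates the source again.
    static double[] eagerCountThenRun(Supplier<List<Long>> source) {
        long count = source.get().size();   // first materialization
        List<Long> vertices = source.get(); // second materialization
        return initialRanks(vertices, count);
    }

    // Deferred style: the count is derived within the same run.
    static double[] countInsideRun(Supplier<List<Long>> source) {
        List<Long> vertices = source.get(); // single materialization
        return initialRanks(vertices, vertices.size());
    }

    // Uniform initial value per vertex, as a placeholder for real work.
    static double[] initialRanks(List<Long> vertices, long count) {
        double[] ranks = new double[vertices.size()];
        Arrays.fill(ranks, 1.0 / count);
        return ranks;
    }

    public static void main(String[] args) {
        materializations = 0;
        eagerCountThenRun(MaterializationSketch::materializeVertices);
        System.out.println("eager count: " + materializations + " materializations");

        materializations = 0;
        countInsideRun(MaterializationSketch::materializeVertices);
        System.out.println("count inside run: " + materializations + " materializations");
    }
}
```

In this model the eager variant evaluates the source twice and the deferred variant once, which is the cost the description attributes to calling a job-executing count before the main algorithm.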