[jira] [Commented] (GIRAPH-1148) Connected components - make calculate sizes work with large number of components

2017-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035065#comment-16035065
 ] 

ASF GitHub Bot commented on GIRAPH-1148:


Github user asfgit closed the pull request at:

https://github.com/apache/giraph/pull/39


> Connected components - make calculate sizes work with large number of 
> components
> 
>
> Key: GIRAPH-1148
> URL: https://issues.apache.org/jira/browse/GIRAPH-1148
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>
> Currently if we have a graph with large number of connected components, 
> calculating connected components sizes fails because reducer becomes too 
> large. Use array of handles instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (GIRAPH-1148) Connected components - make calculate sizes work with large number of components

2017-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033823#comment-16033823
 ] 

ASF GitHub Bot commented on GIRAPH-1148:


Github user dlogothetis commented on a diff in the pull request:

https://github.com/apache/giraph/pull/39#discussion_r119745185
  
--- Diff: 
giraph-block-app-8/src/main/java/org/apache/giraph/block_app/library/prepare_graph/UndirectedConnectedComponents.java
 ---
@@ -352,10 +352,15 @@ Block calculateConnectedComponentSizes(
 Pair componentToReducePair = Pair.of(
 new LongWritable(), new LongWritable(1));
 LongWritable reusableLong = new LongWritable();
-return Pieces.reduceAndBroadcast(
-"CalcConnectedComponentSizes",
+// This reduce operation is stateless so we can use a single instance
+BasicMapReduce 
reduceOperation =
 new BasicMapReduce<>(
-LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG),
+LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG);
+return Pieces.reduceAndBroadcastWithArrayOfHandles(
+"CalcConnectedComponentSizes",
+3137, /* Just using some large prime number */
--- End diff --

Sounds good. Looks good then.


> Connected components - make calculate sizes work with large number of 
> components
> 
>
> Key: GIRAPH-1148
> URL: https://issues.apache.org/jira/browse/GIRAPH-1148
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>
> Currently if we have a graph with large number of connected components, 
> calculating connected components sizes fails because reducer becomes too 
> large. Use array of handles instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (GIRAPH-1148) Connected components - make calculate sizes work with large number of components

2017-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033817#comment-16033817
 ] 

ASF GitHub Bot commented on GIRAPH-1148:


Github user majakabiljo commented on a diff in the pull request:

https://github.com/apache/giraph/pull/39#discussion_r119744463
  
--- Diff: 
giraph-block-app-8/src/main/java/org/apache/giraph/block_app/library/prepare_graph/UndirectedConnectedComponents.java
 ---
@@ -352,10 +352,15 @@ Block calculateConnectedComponentSizes(
 Pair componentToReducePair = Pair.of(
 new LongWritable(), new LongWritable(1));
 LongWritable reusableLong = new LongWritable();
-return Pieces.reduceAndBroadcast(
-"CalcConnectedComponentSizes",
+// This reduce operation is stateless so we can use a single instance
+BasicMapReduce 
reduceOperation =
 new BasicMapReduce<>(
-LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG),
+LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG);
+return Pieces.reduceAndBroadcastWithArrayOfHandles(
+"CalcConnectedComponentSizes",
+3137, /* Just using some large prime number */
--- End diff --

I can't come up with a reason why someone would want to change it. This can 
start having problems only at trillion components which wouldn't work for many 
other reasons, for tiny ones this few reducers won't add any overhead, and for 
larger ones which were currently working this is still improvement since 
reducers are processed on many machines now.


> Connected components - make calculate sizes work with large number of 
> components
> 
>
> Key: GIRAPH-1148
> URL: https://issues.apache.org/jira/browse/GIRAPH-1148
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>
> Currently if we have a graph with large number of connected components, 
> calculating connected components sizes fails because reducer becomes too 
> large. Use array of handles instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (GIRAPH-1148) Connected components - make calculate sizes work with large number of components

2017-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033539#comment-16033539
 ] 

ASF GitHub Bot commented on GIRAPH-1148:


Github user dlogothetis commented on a diff in the pull request:

https://github.com/apache/giraph/pull/39#discussion_r119708329
  
--- Diff: 
giraph-block-app-8/src/main/java/org/apache/giraph/block_app/library/prepare_graph/UndirectedConnectedComponents.java
 ---
@@ -352,10 +352,15 @@ Block calculateConnectedComponentSizes(
 Pair componentToReducePair = Pair.of(
 new LongWritable(), new LongWritable(1));
 LongWritable reusableLong = new LongWritable();
-return Pieces.reduceAndBroadcast(
-"CalcConnectedComponentSizes",
+// This reduce operation is stateless so we can use a single instance
+BasicMapReduce 
reduceOperation =
 new BasicMapReduce<>(
-LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG),
+LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG);
+return Pieces.reduceAndBroadcastWithArrayOfHandles(
+"CalcConnectedComponentSizes",
+3137, /* Just using some large prime number */
--- End diff --

Should this be configurable?


> Connected components - make calculate sizes work with large number of 
> components
> 
>
> Key: GIRAPH-1148
> URL: https://issues.apache.org/jira/browse/GIRAPH-1148
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>
> Currently if we have a graph with large number of connected components, 
> calculating connected components sizes fails because reducer becomes too 
> large. Use array of handles instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (GIRAPH-1148) Connected components - make calculate sizes work with large number of components

2017-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033404#comment-16033404
 ] 

ASF GitHub Bot commented on GIRAPH-1148:


Github user majakabiljo commented on a diff in the pull request:

https://github.com/apache/giraph/pull/39#discussion_r119686989
  
--- Diff: 
giraph-block-app/src/main/java/org/apache/giraph/block_app/library/Pieces.java 
---
@@ -320,6 +325,91 @@ public String toString() {
   }
 
   /**
+   * Like reduceAndBroadcast, but uses array of handles for reducers and
+   * broadcasts, to make it feasible and performant when values are large.
+   * Each supplied value to reduce will be reduced in the handle defined by
+   * handleHashSupplier%numHandles
+   *
+   * @param  Single value type, objects passed on workers
+   * @param  Reduced value type
+   * @param  Vertex id type
+   * @param  Vertex value type
+   * @param  Edge value type
+   */
+  public static
+  
+  Piece reduceAndBroadcastWithArrayOfHandles(
+  final String name,
+  final int numHandles,
+  final ReduceOperation reduceOp,
+  final SupplierFromVertex handleHashSupplier,
+  final SupplierFromVertex valueSupplier,
+  final ConsumerWithVertex reducedValueConsumer) {
+return new Piece() {
+  protected ArrayOfHandles.ArrayOfReducers reducers;
+  protected BroadcastArrayHandle broadcasts;
+
+  private int getHandleIndex(Vertex vertex) {
+return (int) Math.abs(handleHashSupplier.get(vertex) % numHandles);
+  }
+
+  @Override
+  public void registerReducers(
+  final CreateReducersApi reduceApi, Object executionStage) {
+reducers = new ArrayOfHandles.ArrayOfReducers<>(
+numHandles,
+new Supplier>() {
+  @Override
+  public ReducerHandle get() {
+return reduceApi.createLocalReducer(reduceOp);
--- End diff --

Good catch, it didn't occur to me. I'll fix it not to reuse the same 
ReduceOperation object.


> Connected components - make calculate sizes work with large number of 
> components
> 
>
> Key: GIRAPH-1148
> URL: https://issues.apache.org/jira/browse/GIRAPH-1148
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>
> Currently if we have a graph with large number of connected components, 
> calculating connected components sizes fails because reducer becomes too 
> large. Use array of handles instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (GIRAPH-1148) Connected components - make calculate sizes work with large number of components

2017-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033358#comment-16033358
 ] 

ASF GitHub Bot commented on GIRAPH-1148:


Github user dlogothetis commented on a diff in the pull request:

https://github.com/apache/giraph/pull/39#discussion_r119681042
  
--- Diff: 
giraph-block-app/src/main/java/org/apache/giraph/block_app/library/Pieces.java 
---
@@ -320,6 +325,91 @@ public String toString() {
   }
 
   /**
+   * Like reduceAndBroadcast, but uses array of handles for reducers and
+   * broadcasts, to make it feasible and performant when values are large.
+   * Each supplied value to reduce will be reduced in the handle defined by
+   * handleHashSupplier%numHandles
+   *
+   * @param  Single value type, objects passed on workers
+   * @param  Reduced value type
+   * @param  Vertex id type
+   * @param  Vertex value type
+   * @param  Edge value type
+   */
+  public static
+  
+  Piece reduceAndBroadcastWithArrayOfHandles(
+  final String name,
+  final int numHandles,
+  final ReduceOperation reduceOp,
+  final SupplierFromVertex handleHashSupplier,
+  final SupplierFromVertex valueSupplier,
+  final ConsumerWithVertex reducedValueConsumer) {
+return new Piece() {
+  protected ArrayOfHandles.ArrayOfReducers reducers;
+  protected BroadcastArrayHandle broadcasts;
+
+  private int getHandleIndex(Vertex vertex) {
+return (int) Math.abs(handleHashSupplier.get(vertex) % numHandles);
+  }
+
+  @Override
+  public void registerReducers(
+  final CreateReducersApi reduceApi, Object executionStage) {
+reducers = new ArrayOfHandles.ArrayOfReducers<>(
+numHandles,
+new Supplier>() {
+  @Override
+  public ReducerHandle get() {
+return reduceApi.createLocalReducer(reduceOp);
--- End diff --

Actually, because you're using an ArrayOfHandles, this is going to be 
serialized-deserialized as a whole. This should be ok.

Though, this also assumes that reduce operations are stateless. In general, 
they are but there's nothing that prevents from writing a stateful one. Maybe 
add a comment about this.


> Connected components - make calculate sizes work with large number of 
> components
> 
>
> Key: GIRAPH-1148
> URL: https://issues.apache.org/jira/browse/GIRAPH-1148
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>
> Currently if we have a graph with large number of connected components, 
> calculating connected components sizes fails because reducer becomes too 
> large. Use array of handles instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (GIRAPH-1148) Connected components - make calculate sizes work with large number of components

2017-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033356#comment-16033356
 ] 

ASF GitHub Bot commented on GIRAPH-1148:


Github user dlogothetis commented on a diff in the pull request:

https://github.com/apache/giraph/pull/39#discussion_r119680240
  
--- Diff: 
giraph-block-app/src/main/java/org/apache/giraph/block_app/library/Pieces.java 
---
@@ -320,6 +325,91 @@ public String toString() {
   }
 
   /**
+   * Like reduceAndBroadcast, but uses array of handles for reducers and
+   * broadcasts, to make it feasible and performant when values are large.
+   * Each supplied value to reduce will be reduced in the handle defined by
+   * handleHashSupplier%numHandles
+   *
+   * @param  Single value type, objects passed on workers
+   * @param  Reduced value type
+   * @param  Vertex id type
+   * @param  Vertex value type
+   * @param  Edge value type
+   */
+  public static
+  
+  Piece reduceAndBroadcastWithArrayOfHandles(
+  final String name,
+  final int numHandles,
+  final ReduceOperation reduceOp,
+  final SupplierFromVertex handleHashSupplier,
+  final SupplierFromVertex valueSupplier,
+  final ConsumerWithVertex reducedValueConsumer) {
+return new Piece() {
+  protected ArrayOfHandles.ArrayOfReducers reducers;
+  protected BroadcastArrayHandle broadcasts;
+
+  private int getHandleIndex(Vertex vertex) {
+return (int) Math.abs(handleHashSupplier.get(vertex) % numHandles);
+  }
+
+  @Override
+  public void registerReducers(
+  final CreateReducersApi reduceApi, Object executionStage) {
+reducers = new ArrayOfHandles.ArrayOfReducers<>(
+numHandles,
+new Supplier>() {
+  @Override
+  public ReducerHandle get() {
+return reduceApi.createLocalReducer(reduceOp);
--- End diff --

This means that the same ReduceOperation instance is going to be shared 
across the different handles. Not entirely sure how the (de)serialization will 
work here. 


> Connected components - make calculate sizes work with large number of 
> components
> 
>
> Key: GIRAPH-1148
> URL: https://issues.apache.org/jira/browse/GIRAPH-1148
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>
> Currently if we have a graph with large number of connected components, 
> calculating connected components sizes fails because reducer becomes too 
> large. Use array of handles instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (GIRAPH-1148) Connected components - make calculate sizes work with large number of components

2017-05-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029759#comment-16029759
 ] 

ASF GitHub Bot commented on GIRAPH-1148:


GitHub user majakabiljo opened a pull request:

https://github.com/apache/giraph/pull/39

[GIRAPH-1148] Connected components - make calculate sizes work with l…

…arge number of components

Summary: Currently if we have a graph with large number of connected 
components, calculating connected components sizes fails because reducer 
becomes too large. Use array of handles instead.

Test Plan: Successfully ran the job which was failing without this change

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majakabiljo/giraph cc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/giraph/pull/39.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #39


commit 0b9ef9415a67737122e6fb6caeca5307df9632a1
Author: Maja Kabiljo 
Date:   2017-05-30T17:10:17Z

[GIRAPH-1148] Connected components - make calculate sizes work with large 
number of components

Summary: Currently if we have a graph with large number of connected 
components, calculating connected components sizes fails because reducer 
becomes too large. Use array of handles instead.

Test Plan: Successfully ran the job which was failing without this change




> Connected components - make calculate sizes work with large number of 
> components
> 
>
> Key: GIRAPH-1148
> URL: https://issues.apache.org/jira/browse/GIRAPH-1148
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>
> Currently if we have a graph with large number of connected components, 
> calculating connected components sizes fails because reducer becomes too 
> large. Use array of handles instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)