[ https://issues.apache.org/jira/browse/GIRAPH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033817#comment-16033817 ]

ASF GitHub Bot commented on GIRAPH-1148:
----------------------------------------

Github user majakabiljo commented on a diff in the pull request:

    https://github.com/apache/giraph/pull/39#discussion_r119744463
  
    --- Diff: giraph-block-app-8/src/main/java/org/apache/giraph/block_app/library/prepare_graph/UndirectedConnectedComponents.java ---
    @@ -352,10 +352,15 @@ Block calculateConnectedComponentSizes(
         Pair<LongWritable, LongWritable> componentToReducePair = Pair.of(
             new LongWritable(), new LongWritable(1));
         LongWritable reusableLong = new LongWritable();
    -    return Pieces.reduceAndBroadcast(
    -        "CalcConnectedComponentSizes",
    +    // This reduce operation is stateless so we can use a single instance
    +    BasicMapReduce<LongWritable, LongWritable, LongWritable> reduceOperation =
             new BasicMapReduce<>(
    -            LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG),
    +            LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG);
    +    return Pieces.reduceAndBroadcastWithArrayOfHandles(
    +        "CalcConnectedComponentSizes",
    +        3137, /* Just using some large prime number */
    --- End diff --
    
    I can't come up with a reason why someone would want to change it. This can 
start having problems only at around a trillion components, which wouldn't work 
for many other reasons anyway. For tiny graphs, this few reducers won't add any 
overhead, and for larger ones that currently work this is still an improvement, 
since the reducers are now processed on many machines.
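
    The idea of spreading one reduction over a fixed number of handles can be 
sketched in plain Java. This is only an illustration and does not use the 
Giraph API: the class and method names (ShardedComponentSizes, shardOf, 
countSizes, sizeOf) are hypothetical, and picking the shard as the component id 
modulo the shard count is an assumption about how keys could be assigned.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: plain Java, not the Giraph Pieces/BasicMapReduce API. */
final class ShardedComponentSizes {
  /** Picks the shard (one reducer slot) that owns a component id. */
  static int shardOf(long componentId, int numShards) {
    // floorMod keeps the index non-negative even for negative ids
    return (int) Math.floorMod(componentId, (long) numShards);
  }

  /** Counts vertices per component, keeping one small map per shard. */
  static Map<Long, Long>[] countSizes(long[] componentOfVertex, int numShards) {
    @SuppressWarnings("unchecked")
    Map<Long, Long>[] shards = new HashMap[numShards];
    for (int i = 0; i < numShards; i++) {
      shards[i] = new HashMap<>();
    }
    for (long componentId : componentOfVertex) {
      // Each vertex contributes (componentId, 1); values are summed per key
      shards[shardOf(componentId, numShards)].merge(componentId, 1L, Long::sum);
    }
    return shards;
  }

  /** Looks up a component's size in the shard that owns it. */
  static long sizeOf(Map<Long, Long>[] shards, long componentId) {
    return shards[shardOf(componentId, shards.length)]
        .getOrDefault(componentId, 0L);
  }

  public static void main(String[] args) {
    // Component id of each vertex; 7 and 3144 share a shard when numShards = 3137
    long[] componentOfVertex = {7L, 7L, 3137L, 7L, 3144L};
    Map<Long, Long>[] shards = countSizes(componentOfVertex, 3137);
    System.out.println(sizeOf(shards, 7L));
  }
}
```

    With a fixed shard count, each shard holds roughly 1/3137 of the keys, so no 
single reducer has to hold the whole map, while a tiny graph just leaves most 
shards empty.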


> Connected components - make calculate sizes work with large number of 
> components
> --------------------------------------------------------------------------------
>
>                 Key: GIRAPH-1148
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1148
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>
> Currently, if we have a graph with a large number of connected components, 
> calculating connected component sizes fails because the reducer becomes too 
> large. Use an array of handles instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
