[ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151587#comment-13151587
 ] 

jirapos...@reviews.apache.org commented on GIRAPH-91:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2868/
-----------------------------------------------------------

Review request for giraph.


Summary
-------

These general changes should support larger heap sizes (e.g. >20G).

- Added new EdgeListVertex that stores its edges in a compact pair of lists 
instead of Vertex's HashMap.

- Added unit tests in TestEdgeListVertex to test EdgeListVertex.

- Augmented PageRankBenchmark to choose between EdgeListVertex and Vertex 
(to try it out).

- Added failure cleanup so that a failed worker quickly alerts the master that 
it is dead by deleting its ephemeral health znode.  This allows us to set 
higher ZooKeeper timeouts to deal with GC pauses and the like.  In a quick test 
on 3 nodes, I saw the failure detected in 43 seconds instead of 1 min 52 sec.

- Added a context.progress() call during flushing so that long flushes 
(GC or lots of messages) do not cause jobs to be killed by a timeout.
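
To illustrate the "compact pair of lists" idea from the first item, here is a 
minimal sketch. It is not the EdgeListVertex code from the diff: the class name 
PairedEdgeLists and its methods are hypothetical, and it only assumes that 
target ids and edge values are kept in two parallel lists, so no per-edge 
map-entry objects are allocated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the actual EdgeListVertex code. */
class PairedEdgeLists<I, E> {
    // Element i of both lists together describes one outgoing edge.
    private final List<I> targetIds = new ArrayList<>();
    private final List<E> edgeValues = new ArrayList<>();

    /** Appends an edge; returns false if an edge to targetId already exists. */
    boolean addEdge(I targetId, E value) {
        if (targetIds.contains(targetId)) {
            return false;
        }
        targetIds.add(targetId);
        edgeValues.add(value);
        return true;
    }

    /** Linear scan; returns null when there is no edge to targetId. */
    E getEdgeValue(I targetId) {
        int pos = targetIds.indexOf(targetId);
        return pos < 0 ? null : edgeValues.get(pos);
    }

    /** Removes the edge and returns its value, or null if it was absent. */
    E removeEdge(I targetId) {
        int pos = targetIds.indexOf(targetId);
        if (pos < 0) {
            return null;
        }
        targetIds.remove(pos);
        return edgeValues.remove(pos);
    }

    /** Number of edges currently stored. */
    int size() {
        return targetIds.size();
    }
}
```

The trade-off in this sketch is that lookups and removals scan the list 
instead of hashing, in exchange for a smaller per-edge footprint.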


This addresses bug GIRAPH-91.
    https://issues.apache.org/jira/browse/GIRAPH-91


Diffs
-----

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/2868/diff


Testing
-------

Local unit tests; PageRankBenchmark on multiple machines with >20GB heaps.


Thanks,

Avery


                
> Large-memory improvements (Memory reduced vertex implementation, fast 
> failure, added settings) 
> -----------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-91
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-91
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>
> Current vertex implementation uses a HashMap for storing the edges, which is 
> quite memory heavy for large graphs.  The default settings in Giraph need to 
> be improved for large graphs and heaps of >20G.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
