-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6004/
-----------------------------------------------------------

Review request for giraph.


Description
-------

All the workers iterate through partitions identified by znodes the master 
creates. Each worker starts at index 0 and iterates over the full znode list 
it was able to acquire, once one becomes available. The logs reveal lots of 
contention because the workers all iterate from roughly the same part of the 
list they receive. By tricking them into starting their loop through the list 
at different indices, only workers (mappers) that happen to run on the same 
host start iterating at the same place in their respective input split lists. 
Although there can be momentary contention for splits among such workers, the 
speedup and clean resolution I have seen in my Hadoop logs during the 
INPUT_SUPERSTEP with this tiny patch have been dramatic.
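
The staggered start can be sketched roughly as below. This is an illustration 
only, not the code in the diff: the class and method names are hypothetical, 
and deriving the offset from a hash of the host name is an assumption that 
the description implies (same host, same start index) but does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not Giraph source: staggers where a worker starts scanning. */
final class StaggeredSplitScan {
    private StaggeredSplitScan() { }

    /** Start index derived from the host name (an assumption), always in [0, listSize). */
    static int startIndex(String hostname, int listSize) {
        if (listSize <= 0) {
            throw new IllegalArgumentException("listSize must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(hostname.hashCode(), listSize);
    }

    /** Returns every split znode exactly once, wrapping around from the start index. */
    static List<String> scanOrder(List<String> splitZnodes, String hostname) {
        List<String> order = new ArrayList<>(splitZnodes.size());
        if (splitZnodes.isEmpty()) {
            return order;
        }
        int size = splitZnodes.size();
        int start = startIndex(hostname, size);
        for (int i = 0; i < size; i++) {
            // Wrap around so splits before the start index are still visited.
            order.add(splitZnodes.get((start + i) % size));
        }
        return order;
    }
}
```

Because the loop wraps around, each worker still visits every split; only the 
order changes, so no split is skipped if the workers that start near it are 
slow to claim it.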


Diffs
-----


Diff: https://reviews.apache.org/r/6004/diff/


Testing
-------

Run on a cluster many times since the last patch upload (July 12, 2012); 
passes mvn verify, etc.


Thanks,

Eli Reisman
