-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6004/
-----------------------------------------------------------
Review request for giraph.


Description
-------

All the workers iterate through partitions identified by znodes the master creates. They all start at index 0 and walk the full length of the znode list they were able to acquire, claiming a split when one is available. The logs reveal a lot of contention, because every worker iterates from roughly the same part of the list it receives.

With this patch, workers start their loop through the list at different indices, so only workers (mappers) that happen to run on the same host start iterating at the same place in their respective input split lists. Although there can be momentary contention for splits among such workers, the speedup and clean resolution I have seen with this tiny patch during the INPUT_SUPERSTEP in my Hadoop logs have been dramatic.


Diffs
-----

Diff: https://reviews.apache.org/r/6004/diff/


Testing
-------

Tested on a cluster many times since the last patch upload (July 12, 2012); passes mvn verify, etc.


Thanks,

Eli Reisman
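

Illustration
------------

The staggered start described above can be sketched roughly as follows. This is an illustration only, not the code in the linked diff. It assumes the starting offset is derived from a hash of the worker's hostname, which is one reading of the description (workers on the same host end up with the same start position). The class and method names (StaggeredSplitOrder, startIndex, visitOrder) are hypothetical and do not come from Giraph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in Giraph.
public class StaggeredSplitOrder {

    /**
     * Maps a hostname to a starting index in [0, splitCount).
     * Assumption: the offset comes from the hostname hash, so workers
     * on the same host get the same start position.
     */
    public static int startIndex(String hostname, int splitCount) {
        // floorMod keeps the result non-negative even if hashCode() is negative.
        return Math.floorMod(hostname.hashCode(), splitCount);
    }

    /**
     * Returns every index in [0, splitCount) exactly once, beginning at
     * the host's offset and wrapping around to the front of the list.
     */
    public static List<Integer> visitOrder(String hostname, int splitCount) {
        List<Integer> order = new ArrayList<>();
        if (splitCount <= 0) {
            return order; // nothing to iterate over
        }
        int start = startIndex(hostname, splitCount);
        for (int i = 0; i < splitCount; i++) {
            order.add((start + i) % splitCount);
        }
        return order;
    }

    public static void main(String[] args) {
        // Made-up hostnames, used only to show that each host gets its own order.
        for (String host : new String[] {"node-01", "node-02", "node-03"}) {
            System.out.println(host + " -> " + visitOrder(host, 8));
        }
    }
}
```

Every worker still visits each split exactly once; only the position at which it begins differs, which is what spreads the initial attempts across the list instead of piling them up at index 0.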