[ 
https://issues.apache.org/jira/browse/GIRAPH-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-301:
-------------------------------

    Attachment: GIRAPH-301-7.patch

Just a quick rebase; no changes.
                
> InputSplit Reservations are clumping, leaving many workers asleep while others 
> process too many splits and get overloaded.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-301
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-301
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp, graph, zookeeper
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>              Labels: patch
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-301-1.patch, GIRAPH-301-2.patch, 
> GIRAPH-301-3.patch, GIRAPH-301-4.patch, GIRAPH-301-5.patch, 
> GIRAPH-301-6.patch, GIRAPH-301-7.patch
>
>
> With recent additions to the codebase, users here have noticed many workers 
> are able to load input splits extremely quickly, and this has altered the 
> behavior of Giraph during INPUT_SUPERSTEP when using the current algorithm 
> for split reservations. A few workers process multiple splits (often 
> overwhelming Netty and getting GC errors as they attempt to offload too much 
> data too quickly) while many (often most) of the others just sleep through 
> the superstep, never successfully participating at all.
> Essentially, the current algo is:
> 1. scan input split list, skipping nodes that are marked "Finished"
> 2. grab the first unfinished node in the list (reserved or not) and check its 
> reserved status.
> 3. if not reserved, attempt to reserve & return it if successful.
> 4. if the first one you check is already taken, sleep for way too long and 
> only wake up if another worker finishes a split, then contend with that 
> worker for another split, while the majority of the split list might sit 
> idle, not actually checked or claimed by anyone yet.
> This does not work. By making a few simple changes (and acknowledging that ZK 
> reads are cheap, only writes are not) this patch is able to get every worker 
> involved, and keep them in the game, ensuring that the INPUT_SUPERSTEP passes 
> quickly and painlessly. Netty is no longer overwhelmed, because the memory 
> load is spread more evenly across the split readers. If the giraph.splitmb 
> and -w options are set correctly, behavior is now exactly as one would 
> expect it to be.
> This also results in INPUT_SUPERSTEP passing more quickly, and lets a job 
> survive the INPUT_SUPERSTEP for a given data load on fewer Hadoop memory 
> slots.
>  
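
To make the clumping described above concrete, here is a rough in-memory
sketch. It is not the patch and does not touch ZooKeeper: the class name,
the State enum, and both method names are made up for illustration, and
workers are simulated one after another rather than racing. The second
strategy (keep reading past reserved splits and claim the first free one)
is only one guess at what "reads are cheap" could mean in practice; the
attached patch may well do something different.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the split list; a plain array, no ZooKeeper. */
final class SplitReservationSketch {
    enum State { FREE, RESERVED, FINISHED }

    /**
     * The current behavior as described in the issue: skip finished splits,
     * look only at the first unfinished one, reserve it if it is free, and
     * otherwise give up (the real worker would go to sleep here).
     * Returns the reserved index, or -1 if nothing was reserved.
     */
    static int reserveFirstUnfinished(State[] splits) {
        for (int i = 0; i < splits.length; i++) {
            if (splits[i] == State.FINISHED) {
                continue;
            }
            if (splits[i] == State.FREE) {
                splits[i] = State.RESERVED; // the one "write"
                return i;
            }
            return -1; // first unfinished split is already taken
        }
        return -1;
    }

    /**
     * Assumed alternative: keep scanning (cheap reads) past finished and
     * reserved splits, and claim the first free one (a single write).
     */
    static int reserveFirstFree(State[] splits) {
        for (int i = 0; i < splits.length; i++) {
            if (splits[i] == State.FREE) {
                splits[i] = State.RESERVED;
                return i;
            }
        }
        return -1;
    }

    /** Counts how many workers obtain a split on their first attempt. */
    static int busyWorkers(int numSplits, int numWorkers, boolean scanPastReserved) {
        State[] splits = new State[numSplits];
        Arrays.fill(splits, State.FREE);
        int busy = 0;
        for (int w = 0; w < numWorkers; w++) {
            int got = scanPastReserved
                ? reserveFirstFree(splits)
                : reserveFirstUnfinished(splits);
            if (got >= 0) {
                busy++;
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        System.out.println("first unfinished only: " + busyWorkers(8, 8, false));
        System.out.println("scan past reserved:    " + busyWorkers(8, 8, true));
    }
}
```

With 8 splits and 8 workers, the first strategy leaves only one worker
holding a split after the first pass, while the scanning strategy gives
every worker one. A real implementation would also have to handle two
workers racing for the same znode, which this sketch ignores.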

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
