[ 
https://issues.apache.org/jira/browse/GIRAPH-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443671#comment-13443671
 ] 

Eli Reisman commented on GIRAPH-301:
------------------------------------

That's it exactly. The best approach, after many versions, was 301-6 (now 
rebased as 301-7), which hashes the index where each worker will start 
iterating, to distribute the workers across the list better, but places a 
worker's local blocks at "adjusted index 0" if there are any. There is still a 
small chance that a worker who found no local blocks of its own will hash to a 
starting block that another worker finds to be local, but from the load-in 
speeds I have seen over many, many runs in the last week or so, combined with 
the instrumented runs I did, every worker seems to get a local split where 
possible (since usually at least 2-3 candidates exist for any one worker to 
try to claim), and every worker, regardless of locality, seems to claim a 
split within 1-3 tests on the split list, which gets us through the input 
stage much faster.
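To make the start-index selection above concrete, here is a minimal sketch (illustrative names only, not the actual GIRAPH-301 code): a worker starts at the first split that is local to it if one exists, and otherwise hashes its task id across the list so that workers begin iterating at different offsets instead of all contending at index 0.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of hashed start-index selection with locality preference.
public class SplitPick {
    static int startIndex(List<String> splits, List<String> localSplits, int workerId) {
        // Prefer a split whose blocks are local to this worker, if any
        // ("adjusted index 0" in the patch's terms).
        for (int i = 0; i < splits.size(); i++) {
            if (localSplits.contains(splits.get(i))) {
                return i;
            }
        }
        // Otherwise hash the worker id across the list so workers spread out.
        return Math.abs(Integer.hashCode(workerId)) % splits.size();
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("s0", "s1", "s2", "s3");
        // A worker with a local split starts there...
        System.out.println(startIndex(splits, Arrays.asList("s2"), 7)); // 2
        // ...a worker with no local data starts at a hashed offset.
        System.out.println(startIndex(splits, Arrays.<String>asList(), 7)); // 7 % 4 = 3
    }
}
```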

When you try to match splits to workers 1-to-1, every worker usually gets a 
split, though occasionally a few still end up walking the list and finding 
everything reserved. This approach ended up with the fewest wasted workers of 
anything I tried.

As for the RESERVED change, the comment from 21/Aug/12 17:32 is still exactly 
what happens, and while I can't confirm I'm right about why it works (although 
I think those comments sum it up), I can say I have tried a lot of unusual 
combinations of # of splits / # of workers against this patch and it has never 
had trouble. So I think we're good there. I did add comments (and can step 
them up if you like) so that anyone implementing a recovery plan for failed 
input-reader workers will know to revisit this in the future, but for now I 
think we're safe, and this really cuts down the "dead time" at the end, when 
all splits have been read but a bunch of workers take forever to figure out 
that the superstep is already effectively over.

The iterator is a great idea; I can do that right now... patch up in a minute...


                
> InputSplit Reservations are clumping, leaving many workers asleep while 
> others process too many splits and get overloaded.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-301
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-301
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp, graph, zookeeper
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>              Labels: patch
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-301-1.patch, GIRAPH-301-2.patch, 
> GIRAPH-301-3.patch, GIRAPH-301-4.patch, GIRAPH-301-5.patch, 
> GIRAPH-301-6.patch, GIRAPH-301-7.patch
>
>
> With recent additions to the codebase, users here have noticed that many 
> workers are able to load input splits extremely quickly, and this has altered 
> the behavior of Giraph during INPUT_SUPERSTEP under the current algorithm 
> for split reservations. A few workers process multiple splits (often 
> overwhelming Netty and hitting GC errors as they attempt to offload too much 
> data too quickly) while many (often most) of the others just sleep through 
> the superstep, never successfully participating at all.
> Essentially, the current algo is:
> 1. Scan the input split list, skipping nodes that are marked "Finished".
> 2. Grab the first unfinished node in the list (reserved or not) and check 
> its reserved status.
> 3. If it is not reserved, attempt to reserve it, and return it if successful.
> 4. If the first one you check is already taken, sleep for far too long, 
> waking only when another worker finishes a split; then contend with that 
> worker for another split, while the majority of the split list may sit 
> idle, not actually checked or claimed by anyone yet.
> This does not work. By making a few simple changes (and recognizing that ZK 
> reads are cheap; only writes are not), this patch gets every worker involved 
> and keeps them in the game, ensuring that INPUT_SUPERSTEP passes quickly and 
> painlessly, and without overwhelming Netty, by spreading the memory load the 
> split readers bear more evenly. If the giraph.splitmb and -w options are set 
> correctly, behavior is now exactly as one would expect.
> This also lets INPUT_SUPERSTEP pass more quickly, and lets a job survive 
> INPUT_SUPERSTEP for a given data load on fewer Hadoop memory slots.
>  
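The reservation loop described in the issue can be sketched as follows. This is a simplified, hypothetical model (names are illustrative, not the Giraph API): an in-memory map stands in for the ZooKeeper znodes tracking split state, and the worker scans the whole list from its start offset, reading each split's state cheaply and attempting the (expensive) reservation write only on unreserved entries, instead of sleeping after one collision.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a full-list reservation scan; a ConcurrentHashMap
// stands in for the ZooKeeper znodes that hold each split's state.
public class SplitScan {
    enum State { UNRESERVED, RESERVED, FINISHED }

    static Map<Integer, State> states = new ConcurrentHashMap<>();

    /** Returns the index of the split this worker claimed, or -1 if all are taken. */
    static int claimSplit(int numSplits, int startIndex) {
        for (int i = 0; i < numSplits; i++) {
            int idx = (startIndex + i) % numSplits; // wrap around the full list
            State s = states.getOrDefault(idx, State.UNRESERVED);
            if (s == State.UNRESERVED) {
                // In real code this would be a conditional ZK create that can
                // lose a race; a plain put stands in for the atomic reserve.
                states.put(idx, State.RESERVED);
                return idx;
            }
            // RESERVED or FINISHED: keep scanning instead of sleeping.
        }
        return -1; // every split reserved or finished; the worker can move on
    }

    public static void main(String[] args) {
        states.put(0, State.FINISHED);
        states.put(1, State.RESERVED);
        System.out.println(claimSplit(4, 0)); // skips 0 and 1, claims 2
    }
}
```

Because every read is cheap, a worker only gives up after seeing the entire list reserved or finished, which is what eliminates the "dead time" the comment above describes.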

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
