[
https://issues.apache.org/jira/browse/GIRAPH-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430440#comment-13430440
]
Eli Reisman commented on GIRAPH-275:
------------------------------------
This is a typical blow-up when I configure the job wrong. One unlucky worker
read 3 very large splits in 10 minutes, and Netty couldn't keep up. I'm working
on the right number of workers, transfer limits, and Netty configs.
Aug 7, 2012 4:53:26 PM org.jboss.netty.channel.socket.nio.NioWorker
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap.newKeyIterator(HashMap.java:840)
at java.util.HashMap$KeySet.iterator(HashMap.java:874)
at java.util.HashSet.iterator(HashSet.java:153)
at sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:127)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:69)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:33)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:157)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
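The trace above is the GC overhead limit error. One of the knobs involved is
simply the task JVM heap; a minimal sketch, assuming a Hadoop 1.x setup (the
property name is standard, the value is a placeholder, not a recommendation):

import org.apache.hadoop.conf.Configuration;

/** Illustrative only: raise the per-task heap before submitting the job. */
public class HeapConfigSketch {
  public static Configuration withBiggerHeap(Configuration conf) {
    // mapred.child.java.opts sets the JVM options for each map/reduce task.
    conf.set("mapred.child.java.opts", "-Xmx2048m");
    return conf;
  }
}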
> Restore data locality to workers reading InputSplits where possible without
> querying NameNode, ZooKeeper
> --------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-275
> URL: https://issues.apache.org/jira/browse/GIRAPH-275
> Project: Giraph
> Issue Type: Improvement
> Components: bsp, graph
> Affects Versions: 0.2.0
> Reporter: Eli Reisman
> Assignee: Eli Reisman
> Fix For: 0.2.0
>
> Attachments: GIRAPH-275-1.patch, GIRAPH-275-2.patch,
> GIRAPH-275-3.patch, GIRAPH-275-4.patch
>
>
> During INPUT_SUPERSTEP, workers wait on a barrier until the master has
> created a complete list of available input splits. Once the barrier is past,
> each worker iterates through this list of input splits, creating a znode to
> lay claim to the next unprocessed split the worker encounters.
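> As a rough illustration of that claim step (class and method names below are
> made up; the actual Giraph worker code differs), the ZooKeeper side of it
> looks something like this:
>
> import java.util.List;
> import org.apache.zookeeper.CreateMode;
> import org.apache.zookeeper.KeeperException;
> import org.apache.zookeeper.ZooDefs.Ids;
> import org.apache.zookeeper.ZooKeeper;
>
> /** Hypothetical sketch of a worker claiming the first unreserved split. */
> public class SplitClaimSketch {
>   public static String claimSplit(ZooKeeper zk, List<String> splitPaths)
>       throws Exception {
>     for (String splitPath : splitPaths) {
>       try {
>         // The first worker to create this child znode wins the split;
>         // everyone else gets NodeExistsException and moves on.
>         zk.create(splitPath + "/reserved", new byte[0],
>             Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
>         return splitPath;
>       } catch (KeeperException.NodeExistsException e) {
>         // Already claimed by another worker; keep scanning the list.
>       }
>     }
>     return null;  // no unclaimed splits remain
>   }
> }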
> While the master is creating the input split znodes that each worker later
> iterates through, it briefly has access to the InputSplit objects themselves,
> which include the list of hostnames hosting the blocks of the underlying
> file. By including that list of locations in each znode's pathname, we let
> each worker sort the list of available splits so that the splits it attempts
> to claim first are the ones containing a block local to that worker's host.
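> For illustration only (the class name and path layout are hypothetical,
> though InputSplit.getLocations() is the standard Hadoop API), the
> master-side path construction might look like:
>
> import org.apache.hadoop.mapreduce.InputSplit;
>
> /** Hypothetical sketch: fold a split's block locations into its znode name. */
> public class SplitPathSketch {
>   public static String buildSplitPath(String basePath, int splitIndex,
>       InputSplit split) throws Exception {
>     // getLocations() returns the hostnames holding the split's blocks; the
>     // master sees these for free while it is creating the split znodes.
>     StringBuilder path = new StringBuilder(basePath)
>         .append("/").append(splitIndex);
>     for (String host : split.getLocations()) {
>       path.append("-").append(host);
>     }
>     return path.toString();
>   }
> }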
> This makes it likely that many workers end up reading at least one split
> that is local to their own host. If the claimed split holds a local block,
> the RecordReader that Hadoop supplies will automatically read from that
> block anyway. By carrying the locality data in the znode name rather than in
> the znode's payload, we avoid reading data out of every znode while sorting;
> that read is currently done only when a split is claimed, and it is I/O
> intensive. Sorting the path strings is cheap and fast, and making the final
> split znode's name longer doesn't seem to matter much.
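> A minimal sketch of the worker-side ordering (again with made-up names; the
> real sort may break ties differently):
>
> import java.net.InetAddress;
> import java.util.Collections;
> import java.util.Comparator;
> import java.util.List;
>
> /** Hypothetical sketch: order split znode paths so this host's come first. */
> public class LocalitySortSketch {
>   public static void sortLocalFirst(List<String> splitPaths) throws Exception {
>     final String localHost = InetAddress.getLocalHost().getHostName();
>     Collections.sort(splitPaths, new Comparator<String>() {
>       @Override
>       public int compare(String a, String b) {
>         // Only the path strings are compared; no znode data is read here.
>         int aLocal = a.contains(localHost) ? 0 : 1;
>         int bLocal = b.contains(localHost) ? 0 : 1;
>         return aLocal - bLocal;
>       }
>     });
>   }
> }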
> By using the BspMaster's InputSplit data to embed locality information
> directly in the znode path, we also avoid having to query the
> FileSystem/BlockLocations from either the master or the workers, which
> could flood the NameNode with requests. This is the only place I've found
> where locality information is already available to Giraph at no additional
> cost.
> Finally, by sorting each worker's split list this way, we get the
> contention reduction of GIRAPH-250 for free: only workers on the same host
> are likely to contend for a given split, instead of the current situation
> in which all workers contend for the same splits because they iterate the
> same list from the same starting index. GIRAPH-250 has already been logged
> as reducing pages of contention on the first pass (with many hundreds of
> workers) down to 0-3 contentions before a worker claims a split to read.
> This passes 'mvn verify' etc. I will post results of cluster testing ASAP. If
> anyone else could try this on an HDFS cluster where locality info is supplied
> to InputSplit objects, I would be really interested to see other folks'
> results.