[jira] [Updated] (GIRAPH-275) Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper

Eli Reisman (JIRA) Wed, 01 Aug 2012 20:32:07 -0700

     [ 
https://issues.apache.org/jira/browse/GIRAPH-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Eli Reisman updated GIRAPH-275:
-------------------------------

    Attachment: GIRAPH-275-3.patch

This patch puts the location data in the input split ZNode's byte data. The 
pathname solution (patch -2) works fine, accomplishes the same goal, and does 
not add any ZK overhead to the job run. This solution (patch 3) is much 
cleaner. Take your pick.

This piggybacks on the existing writes the BspServiceMaster does to place the 
split data on ZK already. On the down side, each worker much read the data of 
every ZNode in the split list to get the location data and reorder the splits 
so it tries to claim local splits first. This is a lot of ZK reads, which are 
cheap too, but unneeded with patch-2. Again, take your pick. Every precaution 
was taken to make sure no extra data reads were done etc.

I have had trouble getting enough clear time on my cluster to do large-scale 
tests, I will get to it as soon as I see a window (hopefully some night this 
week; weekend in the worst case) and will report back what I find.

                
> Restore data locality to workers reading InputSplits where possible without 
> querying NameNode, ZooKeeper
> --------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-275
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-275
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp, graph
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-275-1.patch, GIRAPH-275-2.patch, 
> GIRAPH-275-3.patch
>
>
> During INPUT_SUPERSTEP, workers wait on a barrier until the master has 
> created a complete list of available input splits. Once the barrier is past, 
> each worker iterates through this list of input splits, creating a znode to 
> lay claim to the next unprocessed split the worker encounters.
> For a brief moment while the master is creating the input split znodes each 
> worker iterates through, it has access to InputSplit objects that also 
> contain a list of hostnames on which the blocks of the file are hosted. By 
> including that list of locations in each znode pathname we can allow each 
> worker reading the list of available splits to sort it so that splits the 
> worker attempts to claim first are the ones that contain a block that is 
> local to that worker's host.
> This allows the possibility for many workers to end up reading at least one 
> split that is local to its own host. If the input split selected holds a 
> local block, the RecordReader Hadoop supplies us with will automatically read 
> from that block anyway. By supplying this locality data as part of the znode 
> name rather than info inside the znode, we avoid reading the data from each 
> znode while sorting, which is only currently done when a split is claimed and 
> which is IO intensive. Sorting the string path data is cheap and faster, and 
> making the final split znode's name longer doesn't seem to matter too much.
> By using the BspMaster's InputSplit data to include locality information in 
> the znode path directly, we also avoid having to access the 
> FileSystem/BlockLocations directly from either master or workers, which could 
> also flood the name node with queries. This is the only place I've found 
> where some locality information is already available to Giraph free of 
> additional cost.
> Finally, by sorting each worker's split list this way, we get the 
> contention-reduction of GIRAPH-250 for free, since only workers on the same 
> host will be likely to contend for a split instead of the current situation 
> in which all workers contend for the same input splits from the same list, 
> iterating from the same index. GIRAPH-250 has already been logged as reducing 
> pages of contention on the first pass (when using many 100's of workers) down 
> to 0-3 contentions before claiming a split to read.
> This passes 'mvn verify' etc. I will post results of cluster testing ASAP. If 
> anyone else could try this on an HDFS cluster where locality info is supplied 
> to InputSplit objects, I would be really interested to see other folks' 
> results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (GIRAPH-275) Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper

Reply via email to