Eli Reisman created GIRAPH-275:
----------------------------------

             Summary: Restore data locality to workers reading InputSplits 
where possible without querying NameNode, ZooKeeper
                 Key: GIRAPH-275
                 URL: https://issues.apache.org/jira/browse/GIRAPH-275
             Project: Giraph
          Issue Type: Improvement
          Components: bsp, graph
    Affects Versions: 0.2.0
            Reporter: Eli Reisman
            Assignee: Eli Reisman
             Fix For: 0.2.0
         Attachments: GIRAPH-275-1.patch

During INPUT_SUPERSTEP, workers wait on a barrier until the master has created 
the complete list of available input splits. Once past the barrier, each 
worker iterates through this list of input splits, creating a znode to lay 
claim to the next unprocessed split it encounters.
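
For illustration, here is a minimal sketch of that claim step, assuming a 
plain ZooKeeper client and a hypothetical "reserved" child znode (Giraph's 
actual code goes through its own ZooKeeper wrapper): creating the claim znode 
succeeds for exactly one worker, and every other worker gets a 
NodeExistsException and moves on to the next split.

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.ZooKeeper;

    public class SplitClaimSketch {
      /** Try each split path in order; return the first split we claimed. */
      static String claimNextSplit(ZooKeeper zk, List<String> splitPaths)
          throws KeeperException, InterruptedException {
        for (String splitPath : splitPaths) {
          try {
            // Creating the child marks the split as claimed; only one
            // worker can succeed. EPHEMERAL releases the claim if this
            // worker dies before finishing.
            zk.create(splitPath + "/reserved", new byte[0],
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return splitPath;
          } catch (KeeperException.NodeExistsException e) {
            // Another worker claimed this split first; try the next one.
          }
        }
        return null; // no unclaimed splits remain
      }
    }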

While the master is creating the input split znodes that each worker later 
iterates through, it briefly has access to InputSplit objects that also carry 
a list of hostnames on which the blocks of the file are hosted. By including 
that list of locations in each znode's pathname, we allow each worker reading 
the list of available splits to sort it so that the splits the worker attempts 
to claim first are the ones containing a block local to that worker's host.
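
As a sketch of the idea (the path scheme below is illustrative, not the 
patch's actual naming), the master can build each split's znode path from the 
hostnames the InputSplit already holds in memory:

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.InputSplit;

    public class SplitPathSketch {
      /** Build a znode path that embeds the split's block locations. */
      static String splitZnodePath(String baseDir, int splitIndex,
          InputSplit split) throws IOException, InterruptedException {
        // getLocations() is already populated on the master's InputSplit
        // objects, so this costs no extra NameNode query.
        String hosts = String.join(",", split.getLocations());
        // e.g. /_inputSplitsDir/3-host1,host2,host3 (hypothetical scheme)
        return baseDir + "/" + splitIndex + "-" + hosts;
      }
    }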

This makes it likely that many workers end up reading at least one split that 
is local to their own host. If the selected input split holds a local block, 
the RecordReader that Hadoop supplies will automatically read from that block. 
By carrying this locality data in the znode name rather than inside the 
znode, we avoid reading each znode's data while sorting; that read is IO 
intensive and is currently done only when a split is claimed. Sorting the 
path strings is cheap and fast, and making the final split znode's name 
longer doesn't seem to matter much.
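
The worker-side sort then needs only string operations on the child paths. A 
minimal sketch, assuming the hostname-in-path scheme above:

    import java.util.Comparator;
    import java.util.List;

    public class LocalitySortSketch {
      /** Order paths so splits mentioning this worker's host come first. */
      static void sortLocalFirst(List<String> splitPaths, String myHostname) {
        // The sort is stable, so ties keep the master's original order.
        splitPaths.sort(Comparator.comparingInt(
            (String path) -> path.contains(myHostname) ? 0 : 1));
      }
    }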

By using the BspMaster's InputSplit data to embed locality information 
directly in the znode path, we also avoid accessing the 
FileSystem/BlockLocations from either the master or the workers, which could 
flood the NameNode with queries. This is the only place I've found where some 
locality information is already available to Giraph free of additional cost.
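
For contrast, the route we avoid would look roughly like the sketch below: 
each interested node asking the FileSystem for block locations, which turns 
into NameNode RPCs repeated per file and per caller.

    import java.io.IOException;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class NameNodeQuerySketch {
      /** The costly route: look up block locations via the FileSystem. */
      static BlockLocation[] queryLocations(FileSystem fs, Path file)
          throws IOException {
        FileStatus status = fs.getFileStatus(file);  // NameNode RPC
        // More NameNode work, repeated across files and callers.
        return fs.getFileBlockLocations(status, 0, status.getLen());
      }
    }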

Finally, sorting each worker's split list this way gives us the 
contention reduction of GIRAPH-250 for free: mostly workers on the same host 
will be likely to contend for a given split, instead of the current situation 
in which all workers contend for the same input splits from the same list, 
iterating from the same index. GIRAPH-250 has already been logged as reducing 
pages of contention on the first pass (when using many hundreds of workers) 
down to 0-3 contentions before a worker claims a split to read.

This passes 'mvn verify' etc. I will post results of cluster testing ASAP. If 
anyone else could try this on an HDFS cluster where locality info is supplied 
to InputSplit objects, I would be really interested to see other folks' results.

