There's this article on InfoQ that deals with this issue... ;-)

http://www.infoq.com/articles/HadoopInputFormat
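To make Harsh's point concrete: locality for a custom InputFormat comes down to the String[] each split returns from getLocations(). Here is a minimal sketch; the InputSplit stand-in below just mirrors the relevant methods of org.apache.hadoop.mapreduce.InputSplit so it compiles without the Hadoop jars (the real split must also handle serialization), and the host names are hypothetical.

```java
// Stand-in mirroring org.apache.hadoop.mapreduce.InputSplit so this
// sketch compiles on its own; a real split extends the Hadoop class
// and must also be serializable so it can be shipped to task trackers.
abstract class InputSplit {
    public abstract long getLength();
    public abstract String[] getLocations();
}

// A custom split that remembers which hosts hold its data. The
// scheduler consults getLocations() when assigning the map task,
// preferring a slot on one of these hosts.
class HostAwareSplit extends InputSplit {
    private final long length;
    private final String[] hosts;

    HostAwareSplit(long length, String[] hosts) {
        this.length = length;
        this.hosts = hosts;
    }

    @Override
    public long getLength() { return length; }

    // For a FileSplit these hosts come from the NameNode's block
    // locations; for any other data source you supply them yourself.
    @Override
    public String[] getLocations() { return hosts; }
}

public class Main {
    public static void main(String[] args) {
        // One split per map task; hypothetical datanode host names.
        InputSplit split = new HostAwareSplit(64L * 1024 * 1024,
                new String[] {"datanode1.example.com", "datanode2.example.com"});
        System.out.println(String.join(",", split.getLocations()));
    }
}
```

Your custom InputFormat's getSplits() would then return one such split per desired map task, each carrying the hosts where its data lives.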

Sent from a remote device. Please excuse any typos...

Mike Segel

On Nov 12, 2011, at 7:51 AM, Harsh J <ha...@cloudera.com> wrote:

> Tharindu,
> 
> InputSplit#getLocations()  i.e.,
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/InputSplit.html#getLocations()
> is used to decide locality of a task. You need your custom InputFormat
> to prepare the right array of these objects. The # of objects == # of
> map tasks, and the locations array gets used by the scheduler for
> local assignment.
> 
> For a FileSplit, preparing this is as easy as passing in the block
> locations obtained from the NameNode. For other types of splits, you
> need to fill them in yourself.
> 
> On Sat, Nov 12, 2011 at 7:12 PM, Tharindu Mathew <mcclou...@gmail.com> wrote:
>> Hi hadoop devs,
>> 
>> I'm implementing a custom input format and want to understand how to make
>> use of data locality.
>> 
>> AFAIU, only FileInputFormat makes use of data locality, since the job
>> tracker assigns tasks based on the block locations defined in the
>> FileSplit.
>> 
>> So, the job tracker code is partly responsible for this, and providing
>> data locality for a custom input format would mean either extending
>> FileInputFormat or modifying the job tracker code (if that even makes
>> sense).
>> 
>> Is my understanding correct?
>> 
>> --
>> Regards,
>> 
>> Tharindu
>> 
>> blog: http://mackiemathew.com/
>> 
> 
> 
> 
> -- 
> Harsh J
> 
