I may have expressed myself poorly. You don't need to run any tests to see
how locality works for multi-block files. If you access a file of more
than one block over WebHDFS, locality is assured only for the first block
of the file.
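
For reference, the kind of request RJ describes (asking for the second or
third block by offset) could be sketched roughly as below. This is only an
illustration: the host names, the port 50070, the 128 MB block size, and the
helper names open_url and redirect_host are assumptions, not anything from
this thread. The op=OPEN call and its offset and length parameters are part
of the WebHDFS REST API. The namenode answers OPEN with a redirect, and the
datanode named in the Location header streams all the requested bytes.

```python
from urllib.parse import urlencode, urlsplit

# Assumed block size (128 MB); a real client would read it from the file status.
BLOCK_SIZE = 128 * 1024 * 1024


def open_url(namenode, path, block_index=0, block_size=BLOCK_SIZE):
    """Build a WebHDFS OPEN URL covering exactly one block of `path`.

    `namenode` is "host:port" of the namenode HTTP endpoint (hypothetical here).
    """
    query = urlencode({
        "op": "OPEN",
        "offset": block_index * block_size,  # first byte of the block
        "length": block_size,                # read at most one block
    })
    return f"http://{namenode}/webhdfs/v1{path}?{query}"


def redirect_host(location):
    """Return the host named in the Location header of the namenode's redirect."""
    return urlsplit(location).hostname


if __name__ == "__main__":
    # Hypothetical namenode address and file path, for illustration only.
    print(open_url("namenode.example.com:50070", "/data/big.log", block_index=1))
```

Sending that URL to the namenode without following the redirect, and passing
the Location header to redirect_host, shows which datanode would serve the
read. The sketch only builds the request; it says nothing about whether that
datanode holds the requested block.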

Thanks.


On Sun, Mar 16, 2014 at 9:18 PM, RJ Nowling <rnowl...@gmail.com> wrote:

> Thank you, Mingjiang and Alejandro.
>
> This is interesting.  Since we will use the data locality information for
> scheduling, we could "hack" this to get the data locality information, at
> least for the first block.  As Alejandro says, we'd have to test what
> happens for other data blocks -- e.g., what if, knowing the block sizes, we
> request the second or third block?
>
> Interesting food for thought!  I see some experiments in my future!
>
> Thanks!
>
>
> On Sun, Mar 16, 2014 at 10:14 PM, Alejandro Abdelnur <t...@cloudera.com> wrote:
>
>> Well, this is for the first block of the file. The rest of the file is
>> streamed out by the same datanode, whether or not those blocks are local
>> to it. For small files (one block) you get locality; for large files you
>> get it only for the first block, plus any other blocks that happen to be
>> local to that datanode.
>>
>>
>> Alejandro
>> (phone typing)
>>
>> On Mar 16, 2014, at 18:53, Mingjiang Shi <m...@gopivotal.com> wrote:
>>
>> According to this page:
>> http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
>>
>>> *Data Locality*: The file read and file write calls are redirected to
>>> the corresponding datanodes. It uses the full bandwidth of the Hadoop
>>> cluster for streaming data.
>>>
>>> *A HDFS Built-in Component*: WebHDFS is a first class built-in
>>> component of HDFS. It runs inside Namenodes and Datanodes, therefore, it
>>> can use all HDFS functionalities. It is a part of HDFS - there are no
>>> additional servers to install
>>>
>>
>> So it looks like data locality is built into WebHDFS: the client is
>> redirected to the datanode automatically.
>>
>>
>>
>>
>> On Mon, Mar 17, 2014 at 6:07 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm writing up a Google Summer of Code proposal to add HDFS support to
>>> Disco, an Erlang MapReduce framework.
>>>
>>> We're interested in using WebHDFS.  I have two questions:
>>>
>>> 1) Does WebHDFS allow querying data locality information?
>>>
>>> 2) If the data locality information is known, can data on specific data
>>> nodes be accessed via Web HDFS?  Or do all Web HDFS requests have to go
>>> through a single server?
>>>
>>> Thanks,
>>> RJ
>>>
>>> --
>>> em rnowl...@gmail.com
>>> c 954.496.2314
>>>
>>
>>
>>
>> --
>> Cheers
>> -MJ
>>
>>
>
>
> --
> em rnowl...@gmail.com
> c 954.496.2314
>



-- 
Alejandro
