Re: Data Locality and WebHDFS

Tsz Wo Sze Mon, 17 Mar 2014 12:43:32 -0700

The file offset is considered in WebHDFS redirection.  It redirects to a 
datanode with the first block the client is going to read, not the first block 
of the file.
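
To make this concrete, here is a minimal Python sketch of how a client could build an OPEN request that starts at a given block, and how it could read the datanode host out of the redirect. The op=OPEN, offset and length query parameters are from the WebHDFS REST API; the hostnames, the port 50070, the 128 MB block size and the helper names open_url and redirect_host are assumptions made up for illustration.

```python
from urllib.parse import urlencode, urlsplit


def open_url(namenode, path, block_index, block_size, length=None):
    """Build a WebHDFS OPEN URL whose offset is the start of a given block.

    namenode is "host:port" of the namenode HTTP endpoint (assumed value),
    path is the absolute HDFS path, block_index is zero-based.
    """
    params = {"op": "OPEN", "offset": block_index * block_size}
    if length is not None:
        params["length"] = length
    return f"http://{namenode}/webhdfs/v1{path}?{urlencode(params)}"


def redirect_host(location):
    """Return the host named in the Location header of the redirect reply."""
    return urlsplit(location).hostname


# Hypothetical cluster: second block (index 1) of a file with 128 MB blocks.
BLOCK_SIZE = 128 * 1024 * 1024
url = open_url("nn.example.com:50070", "/data/file.txt", 1, BLOCK_SIZE)
```

Sending that URL to the namenode without following redirects (for example with curl -i and no -L) should return a redirect whose Location header names the datanode chosen for the block at that offset; redirect_host pulls the host out of that header so a scheduler can use it.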


Hope it helps.
Tsz-Wo



On Monday, March 17, 2014 10:09 AM, Alejandro Abdelnur <t...@cloudera.com> 
wrote:
 
actually, i am wrong, the webhdfs rest call has an offset. 
>
>Alejandro
>(phone typing)
>
>On Mar 17, 2014, at 10:07, Alejandro Abdelnur <t...@cloudera.com> wrote:
>
>
>don't recall how skips are handled in webhdfs, but i would assume that you'll 
>get to the first block as usual, and the skip is handled by the DN serving the 
>file (as webhdfs does not know at open that you'll skip)
>
>Alejandro
>(phone typing)
>
>On Mar 17, 2014, at 9:47, RJ Nowling <rnowl...@gmail.com> wrote:
>
>
>Hi Alejandro,
>>
>>
>>The WebHDFS API allows specifying an offset and length for the request.  If I 
>>specify an offset that starts in the second block of a file (thus skipping 
>>the first block altogether), will the namenode still direct me to a 
>>datanode with the first block or will it direct me to a datanode with the 
>>second block?  I.e., am I assured data locality only on the first block of 
>>the file (as you're saying) or on the first block I am accessing?
>>
>>
>>If it is as you say, then I may want to reach out to the WebHDFS developers 
>>and see if they would be interested in the additional functionality.
>>
>>
>>Thank you,
>>RJ
>>
>>
>>
>>On Mon, Mar 17, 2014 at 2:40 AM, Alejandro Abdelnur <t...@cloudera.com> wrote:
>>
>>I may have expressed myself wrong. You don't need to do any test to see how 
>>locality works with files of multiple blocks. If you are accessing a file of 
>>more than one block over webhdfs, you only have assured locality for the 
>>first block of the file.
>>>
>>>
>>>Thanks.
>>>
>>>
>>>
>>>On Sun, Mar 16, 2014 at 9:18 PM, RJ Nowling <rnowl...@gmail.com> wrote:
>>>
>>>Thank you, Mingjiang and Alejandro.
>>>>
>>>>
>>>>This is interesting.  Since we will use the data locality information for 
>>>>scheduling, we could "hack" this to get the data locality information, at 
>>>>least for the first block.  As Alejandro says, we'd have to test what 
>>>>happens for other data blocks -- e.g., what if, knowing the block sizes, we 
>>>>request the second or third block?
>>>>
>>>>
>>>>Interesting food for thought!  I see some experiments in my future!  
>>>>
>>>>
>>>>Thanks!
>>>>
>>>>
>>>>
>>>>On Sun, Mar 16, 2014 at 10:14 PM, Alejandro Abdelnur <t...@cloudera.com> 
>>>>wrote:
>>>>
>>>>well, this is for the first block of the file; the rest of the file (blocks 
>>>>being local or not) is streamed out by the same datanode. for small files 
>>>>(one block) you'll get locality, for large files only for the first block, 
>>>>and by chance if other blocks are local to that datanode. 
>>>>>
>>>>>
>>>>>
>>>>>Alejandro
>>>>>(phone typing)
>>>>>
>>>>>On Mar 16, 2014, at 18:53, Mingjiang Shi <m...@gopivotal.com> wrote:
>>>>>
>>>>>
>>>>>According to this page: 
>>>>>http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
>>>>>>
>>>>>>Data Locality: The file read and file write calls are redirected to the 
>>>>>>corresponding datanodes. It uses the full bandwidth of the Hadoop cluster 
>>>>>>for streaming data.
>>>>>>
>>>>>>A HDFS Built-in Component: WebHDFS is a first class built-in component of 
>>>>>>HDFS. It runs inside Namenodes and Datanodes, therefore, it can use all 
>>>>>>HDFS functionalities. It is a part of HDFS – there are no additional 
>>>>>>servers to install
>>>>>>
>>>>>>So it looks like data locality is built into webhdfs; the client will be 
>>>>>>redirected to the data node automatically. 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>On Mon, Mar 17, 2014 at 6:07 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>>>>>>
>>>>>>Hi all,
>>>>>>>
>>>>>>>
>>>>>>>I'm writing up a Google Summer of Code proposal to add HDFS support to 
>>>>>>>Disco, an Erlang MapReduce framework.  
>>>>>>>
>>>>>>>
>>>>>>>We're interested in using WebHDFS.  I have two questions:
>>>>>>>
>>>>>>>
>>>>>>>1) Does WebHDFS allow querying data locality information?
>>>>>>>
>>>>>>>
>>>>>>>2) If the data locality information is known, can data on specific data 
>>>>>>>nodes be accessed via Web HDFS?  Or do all Web HDFS requests have to go 
>>>>>>>through a single server?
>>>>>>>
>>>>>>>Thanks,
>>>>>>>RJ
>>>>>>>
>>>>>>>
>>>>>>>-- 
>>>>>>>em rnowl...@gmail.com
>>>>>>>c 954.496.2314 
>>>>>>
>>>>>>
>>>>>>-- 
>>>>>>
>>>>>>Cheers
>>>>>>-MJ
>>>>>>
>>>>
>>>>
>>>>
>>>>-- 
>>>>em rnowl...@gmail.com
>>>>c 954.496.2314 
>>>
>>>
>>>
>>>-- 
>>>Alejandro 
>>
>>
>>
>>-- 
>>em rnowl...@gmail.com
>>c 954.496.2314 
>
>
