WebHDFS redirection does take the file offset into account. The namenode redirects to a datanode holding the first block the client is going to read, not the first block of the file.
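In case it is useful for the experiments, here is a rough sketch of how one might check this from a client: build an OPEN URL whose offset is the start of a later block, send it to the namenode, and look at the Location header of the redirect instead of following it. The host name, the port 50070, the file path, and the 128 MB block size are placeholders I made up for illustration; the helper names (block_offset, open_url, redirect_target) are hypothetical too. Only op=OPEN with its offset and length parameters comes from the WebHDFS REST API.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def block_offset(block_index, block_size):
    """Byte offset at which the given zero-based block starts."""
    return block_index * block_size


def open_url(namenode, path, offset=0, length=None):
    """Build a WebHDFS OPEN URL.

    namenode is 'host:port' of the namenode HTTP endpoint and path is the
    absolute HDFS path; both are supplied by the caller.
    """
    params = {"op": "OPEN", "offset": offset}
    if length is not None:
        params["length"] = length
    return "http://{}/webhdfs/v1{}?{}".format(
        namenode, quote(path), urlencode(params)
    )


def redirect_target(url):
    """Send the request without following the redirect.

    Returns the 'host:port' found in the Location header, or None if the
    response carries no Location header. http.client never follows
    redirects on its own, so the datanode is not contacted here.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("GET", parts.path + "?" + parts.query)
        location = conn.getresponse().getheader("Location")
    finally:
        conn.close()
    return urlsplit(location).netloc if location else None


# Example (needs a reachable cluster; names below are placeholders):
#   block_size = 128 * 1024 * 1024          # assumed block size
#   url = open_url("namenode.example.com:50070", "/data/big.log",
#                  offset=block_offset(1, block_size), length=block_size)
#   print(redirect_target(url))             # datanode chosen for block 1
```

Comparing the host returned for different offsets against the known block placement would show whether the redirect follows the block being read.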
Hope it helps.
Tsz-Wo

On Monday, March 17, 2014 10:09 AM, Alejandro Abdelnur <t...@cloudera.com> wrote:

Actually, I am wrong: the WebHDFS REST call has an offset.
>
>Alejandro
>(phone typing)
>
>On Mar 17, 2014, at 10:07, Alejandro Abdelnur <t...@cloudera.com> wrote:
>
>I don't recall how skips are handled in WebHDFS, but I would assume that you'll
>get to the first block as usual, and the skip is handled by the DN serving the
>file (as WebHDFS does not know at open that you'll skip).
>
>Alejandro
>(phone typing)
>
>On Mar 17, 2014, at 9:47, RJ Nowling <rnowl...@gmail.com> wrote:
>
>Hi Alejandro,
>>
>>The WebHDFS API allows specifying an offset and length for the request. If I
>>specify an offset that starts in the second block of a file (thus skipping
>>the first block altogether), will the namenode still direct me to a
>>datanode with the first block or will it direct me to a datanode with the
>>second block? I.e., am I assured data locality only on the first block of
>>the file (as you're saying) or on the first block I am accessing?
>>
>>If it is as you say, then I may want to reach out to the WebHDFS developers and
>>see if they would be interested in the additional functionality.
>>
>>Thank you,
>>RJ
>>
>>On Mon, Mar 17, 2014 at 2:40 AM, Alejandro Abdelnur <t...@cloudera.com> wrote:
>>
>>I may have expressed myself wrong. You don't need to do any test to see how
>>locality works with files of multiple blocks. If you are accessing a file of
>>more than one block over WebHDFS, you only have assured locality for the
>>first block of the file.
>>>
>>>Thanks.
>>>
>>>On Sun, Mar 16, 2014 at 9:18 PM, RJ Nowling <rnowl...@gmail.com> wrote:
>>>
>>>Thank you, Mingjiang and Alejandro.
>>>>
>>>>This is interesting. Since we will use the data locality information for
>>>>scheduling, we could "hack" this to get the data locality information, at
>>>>least for the first block. As Alejandro says, we'd have to test what
>>>>happens for other data blocks -- e.g., what if, knowing the block sizes, we
>>>>request the second or third block?
>>>>
>>>>Interesting food for thought! I see some experiments in my future!
>>>>
>>>>Thanks!
>>>>
>>>>On Sun, Mar 16, 2014 at 10:14 PM, Alejandro Abdelnur <t...@cloudera.com>
>>>>wrote:
>>>>
>>>>Well, this is for the first block of the file; the rest of the file (blocks
>>>>being local or not) is streamed out by the same datanode. For small files
>>>>(one block) you'll get locality; for large files, only for the first block, and
>>>>by chance if other blocks are local to that datanode.
>>>>>
>>>>>Alejandro
>>>>>(phone typing)
>>>>>
>>>>>On Mar 16, 2014, at 18:53, Mingjiang Shi <m...@gopivotal.com> wrote:
>>>>>
>>>>>According to this page:
>>>>>http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
>>>>>>
>>>>>>Data Locality: The file read and file write calls are redirected to the
>>>>>>corresponding datanodes. It uses the full bandwidth of the Hadoop cluster
>>>>>>for streaming data.
>>>>>>
>>>>>>A HDFS Built-in Component: WebHDFS is a first class built-in component of
>>>>>>HDFS. It runs inside Namenodes and Datanodes, therefore, it can use all
>>>>>>HDFS functionalities. It is a part of HDFS – there are no additional
>>>>>>servers to install
>>>>>>
>>>>>>So it looks like data locality is built into WebHDFS; the client will be
>>>>>>redirected to the datanode automatically.
>>>>>>
>>>>>>On Mon, Mar 17, 2014 at 6:07 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>>>>>>
>>>>>>Hi all,
>>>>>>>
>>>>>>>I'm writing up a Google Summer of Code proposal to add HDFS support to
>>>>>>>Disco, an Erlang MapReduce framework.
>>>>>>>
>>>>>>>We're interested in using WebHDFS. I have two questions:
>>>>>>>
>>>>>>>1) Does WebHDFS allow querying data locality information?
>>>>>>>
>>>>>>>2) If the data locality information is known, can data on specific
>>>>>>>datanodes be accessed via WebHDFS? Or do all WebHDFS requests have to go
>>>>>>>through a single server?
>>>>>>>
>>>>>>>Thanks,
>>>>>>>RJ
>>>>>>>
>>>>>>>--
>>>>>>>em rnowl...@gmail.com
>>>>>>>c 954.496.2314
>>>>>>
>>>>>>--
>>>>>>Cheers
>>>>>>-MJ
>>>
>>>--
>>>Alejandro