Thank you, Tsz. That helps!
On Mon, Mar 17, 2014 at 2:30 PM, Tsz Wo Sze <szets...@yahoo.com> wrote:
> The file offset is considered in WebHDFS redirection. It redirects to a
> datanode with the first block the client is going to read, not the first block
> of the file.
>
> Hope it helps.
> Tsz-Wo
>
>
> On Monday, March 17, 2014 10:09 AM, Alejandro Abdelnur <
> t...@cloudera.com> wrote:
>
> Actually, I am wrong; the WebHDFS REST call has an offset.
>
> Alejandro
> (phone typing)
>
> On Mar 17, 2014, at 10:07, Alejandro Abdelnur <t...@cloudera.com> wrote:
>
> I don't recall how skips are handled in WebHDFS, but I would assume that
> you'll get to the first block as usual, and the skip is handled by the DN
> serving the file (as WebHDFS does not know at open that you'll skip).
>
> Alejandro
> (phone typing)
>
> On Mar 17, 2014, at 9:47, RJ Nowling <rnowl...@gmail.com> wrote:
>
> Hi Alejandro,
>
> The WebHDFS API allows specifying an offset and length for the request.
> If I specify an offset that starts in the second block of a file (thus
> skipping the first block altogether), will the namenode still direct me
> to a datanode with the first block, or will it direct me to a datanode with
> the second block? I.e., am I assured data locality only on the first block
> of the file (as you're saying) or on the first block I am accessing?
>
> If it is as you say, then I may want to reach out to the WebHDFS developers
> and see if they would be interested in the additional functionality.
>
> Thank you,
> RJ
>
>
> On Mon, Mar 17, 2014 at 2:40 AM, Alejandro Abdelnur <t...@cloudera.com> wrote:
>
> I may have expressed myself wrong. You don't need to do any test to see
> how locality works with files of multiple blocks. If you are accessing a
> file of more than one block over WebHDFS, you only have assured locality
> for the first block of the file.
>
> Thanks.
>
>
> On Sun, Mar 16, 2014 at 9:18 PM, RJ Nowling <rnowl...@gmail.com> wrote:
>
> Thank you, Mingjiang and Alejandro.
>
> This is interesting. Since we will use the data locality information for
> scheduling, we could "hack" this to get the data locality information, at
> least for the first block. As Alejandro says, we'd have to test what
> happens for other data blocks -- e.g., what if, knowing the block sizes, we
> request the second or third block?
>
> Interesting food for thought! I see some experiments in my future!
>
> Thanks!
>
>
> On Sun, Mar 16, 2014 at 10:14 PM, Alejandro Abdelnur <t...@cloudera.com> wrote:
>
> Well, this is for the first block of the file; the rest of the file
> (blocks being local or not) is streamed out by the same datanode. For
> small files (one block) you'll get locality; for large files, only for the
> first block, and for other blocks only if they happen to be local to that
> datanode.
>
>
> Alejandro
> (phone typing)
>
> On Mar 16, 2014, at 18:53, Mingjiang Shi <m...@gopivotal.com> wrote:
>
> According to this page:
> http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
>
> *Data Locality*: The file read and file write calls are redirected to the
> corresponding datanodes. It uses the full bandwidth of the Hadoop cluster
> for streaming data.
> *A HDFS Built-in Component*: WebHDFS is a first class built-in component
> of HDFS. It runs inside Namenodes and Datanodes, therefore, it can use all
> HDFS functionalities. It is a part of HDFS - there are no additional
> servers to install
>
>
> So it looks like data locality is built into WebHDFS: the client will be
> redirected to the datanode automatically.
>
>
> On Mon, Mar 17, 2014 at 6:07 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>
> Hi all,
>
> I'm writing up a Google Summer of Code proposal to add HDFS support to
> Disco, an Erlang MapReduce framework.
>
> We're interested in using WebHDFS. I have two questions:
>
> 1) Does WebHDFS allow querying data locality information?
>
> 2) If the data locality information is known, can data on specific data
> nodes be accessed via WebHDFS? Or do all WebHDFS requests have to go
> through a single server?
>
> Thanks,
> RJ
>
> --
> em rnowl...@gmail.com
> c 954.496.2314
>
> --
> Cheers
> -MJ
>
> --
> Alejandro
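Tsz-Wo's answer suggests one way to get a per-block locality hint: ask the namenode to OPEN the file at the byte offset where block k starts, and read the redirect instead of following it. Below is a minimal Python sketch under stated assumptions: every block except possibly the last has the same size (obtainable, e.g., from the `blockSize` field of a GETFILESTATUS response), simple authentication via the `user.name` query parameter, and a namenode that answers OPEN with an HTTP 307 whose `Location` header names the chosen datanode. The helper names (`block_start_offset`, `open_path`, `redirect_datanode`) are hypothetical, and the sketch has not been checked against a live cluster.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import quote, urlencode, urlsplit


def block_start_offset(block_index: int, block_size: int) -> int:
    """Byte offset at which block `block_index` (0-based) begins.

    Assumption: every block except possibly the last is exactly
    `block_size` bytes, which is common but not guaranteed for every file.
    """
    if block_index < 0 or block_size <= 0:
        raise ValueError("block_index must be >= 0 and block_size > 0")
    return block_index * block_size


def open_path(hdfs_path: str, offset: int, length: Optional[int] = None,
              user: Optional[str] = None) -> str:
    """Request path (with query string) for a WebHDFS OPEN at `offset`."""
    params = {"op": "OPEN", "offset": offset}
    if length is not None:
        params["length"] = length
    if user is not None:
        params["user.name"] = user  # simple (non-Kerberos) authentication
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + urlencode(params)


def redirect_datanode(nn_host: str, nn_port: int, hdfs_path: str,
                      offset: int, length: Optional[int] = None,
                      user: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Ask the namenode where it would send a read starting at `offset`.

    http.client never follows redirects, so the 307 Location header can be
    read without downloading any file data from the datanode.
    Returns (datanode_hostname, full_location_url).
    """
    conn = http.client.HTTPConnection(nn_host, nn_port, timeout=10)
    try:
        conn.request("GET", open_path(hdfs_path, offset, length, user))
        resp = conn.getresponse()
        resp.read()  # drain the (normally empty) redirect body
        location = resp.getheader("Location")
        if resp.status != 307 or location is None:
            raise RuntimeError(f"expected a 307 redirect, got {resp.status}")
        return urlsplit(location).hostname, location
    finally:
        conn.close()
```

For example, with a 128 MB (134217728-byte) block size, `redirect_datanode("namenode.example.com", 50070, "/data/big.log", block_start_offset(1, 134217728))` would report which datanode the namenode picks for the second block; the host, port, and path are placeholders (50070 was the usual namenode HTTP port in Hadoop 2.x). The redirect names a single datanode, so this yields a locality hint for that block rather than the full list of replica locations.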