Stanley and all, thanks. I will write a client application to explore this path.

A quick question again: using the fsck command, I can retrieve all the necessary info:

$ hadoop fsck /tmp/list2.txt -files -blocks -racks
.....
*BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025* len=8 repl=2 [/default/10.122.195.198:50010, /default/10.122.195.196:50010]
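If the client application ends up driving fsck, the block pool ID and block name can be scraped from output lines like the one above. A rough sketch of that parsing step (the `FsckBlockParser` class and `parseBlock` helper are hypothetical names for illustration, not a Hadoop API; the regex assumes the `BP-...:blk_...` token format shown above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FsckBlockParser {

    // Matches the "BP-<pool>:blk_<id>" token that fsck prints per block.
    private static final Pattern BLOCK = Pattern.compile("(BP-[^:\\s]+):(blk_\\d+)");

    // Returns {blockPoolId, blockName} for the first block token found on
    // the line, or null if the line does not contain one.
    static String[] parseBlock(String fsckLine) {
        Matcher m = BLOCK.matcher(fsckLine);
        if (m.find()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return null;
    }

    public static void main(String[] args) {
        String line = "0. BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025"
                + " len=8 repl=2 [/default/10.122.195.198:50010, /default/10.122.195.196:50010]";
        String[] parts = parseBlock(line);
        System.out.println(parts[0]); // block pool ID
        System.out.println(parts[1]); // block name, e.g. blk_1073742025
    }
}
```

The block name extracted this way is exactly what the `find` command from earlier in the thread needs on the datanode side.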
However, using getFileBlockLocations(), I can't get the block name/ID info such as *BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025*; it seems BlockLocation doesn't expose that info publicly:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html

Is there another entry point, something fsck is using?

Thanks

Demai

On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi <s...@pivotal.io> wrote:

> As far as I know, there's no combination of Hadoop APIs that can do that.
> You can easily get the location of the block (on which DN), but there's no
> way to get the local address of that block file.
>
>
>
> On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni <nid...@gmail.com> wrote:
>
>> Yehia,
>>
>> No problem at all. I really appreciate your willingness to help. Yes,
>> now I am able to get this information in two steps: the first step is
>> either hadoop fsck or getFileBlockLocations(), and then I search the
>> local filesystem; my cluster is using the default from CDH, which is
>> /dfs/dn.
>>
>> I would like to do it programmatically, so I am wondering whether someone
>> has already done it, or better yet whether a Hadoop API call is already
>> implemented for this exact purpose.
>>
>> Demai
>>
>>
>> On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y.z.elsha...@gmail.com>
>> wrote:
>>
>>> Hi Demai,
>>>
>>> Sorry, I missed that you had already tried this out. I think you can
>>> construct the block location on the local file system if you have the
>>> block pool ID and the block ID. If you are using the Cloudera
>>> distribution, the default location is under /dfs/dn (the value of the
>>> dfs.data.dir / dfs.datanode.data.dir configuration keys).
>>>
>>> Thanks
>>> Yehia
>>>
>>>
>>> On 27 August 2014 21:20, Yehia Elshater <y.z.elsha...@gmail.com> wrote:
>>>
>>>> Hi Demai,
>>>>
>>>> You can use the fsck utility like the following:
>>>>
>>>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>>>
>>>> This will display all the information you need about the blocks of
>>>> your file.
>>>>
>>>> Hope it helps.
>>>> Yehia
>>>>
>>>>
>>>> On 27 August 2014 20:18, Demai Ni <nid...@gmail.com> wrote:
>>>>
>>>>> Hi Stanley,
>>>>>
>>>>> Many thanks. Your method works. For now, I have a two-step approach:
>>>>> 1) getFileBlockLocations() to grab the HDFS BlockLocation[]
>>>>> 2) use a local file system call (like the find command) to match the
>>>>> block to files on the local file system.
>>>>>
>>>>> Maybe there is an existing Hadoop API that already returns such info?
>>>>>
>>>>> Demai on the run
>>>>>
>>>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <s...@pivotal.io> wrote:
>>>>>
>>>>> I am not sure this is what you want, but you can try this shell command:
>>>>>
>>>>> find [DATANODE_DIR] -name [blockname]
>>>>>
>>>>>
>>>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <nid...@gmail.com> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> I am new in this area and hoping to get a couple of pointers.
>>>>>>
>>>>>> I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3).
>>>>>>
>>>>>> I am wondering whether there is an interface to get each HDFS block's
>>>>>> information in terms of the local file system.
>>>>>>
>>>>>> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks
>>>>>> -racks" to get a block ID and its replicas on the nodes, such as:
>>>>>> repl=3 [/rack/hdfs01, /rack/hdfs02...]
>>>>>>
>>>>>> With such info, is there a way to
>>>>>> 1) log in to hdfs01, and read the block directly at the local file
>>>>>> system level?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Demai on the run
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> *Stanley Shi,*
>>>>>
>>>>
>>>
>>
>
>
> --
> Regards,
> *Stanley Shi,*
>
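[Addendum] The second step of the two-step approach discussed in this thread, Stanley's `find [DATANODE_DIR] -name [blockname]`, can also be done from Java by walking the datanode data directory (dfs.datanode.data.dir, /dfs/dn by default on CDH). A minimal sketch, assuming it runs locally on the datanode that holds the replica; the `LocalBlockFinder` class name is made up for illustration, and no Hadoop API is involved, only a plain directory walk:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class LocalBlockFinder {

    // Java equivalent of "find dataDir -name blockName": recursively walks
    // the datanode data directory and collects files whose name matches the
    // block name exactly (so the blk_<id>_<genstamp>.meta companion file is
    // not picked up).
    static List<Path> findBlockFiles(Path dataDir, String blockName) throws IOException {
        List<Path> hits = new ArrayList<>();
        Files.walkFileTree(dataDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (file.getFileName().toString().equals(blockName)) {
                    hits.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // On a real datanode this would be /dfs/dn; taking the values from
        // the command line keeps the sketch self-contained.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : ".");
        String blockName = args.length > 1 ? args[1] : "blk_1073742025";
        for (Path p : findBlockFiles(dataDir, blockName)) {
            System.out.println(p);
        }
    }
}
```

On a CDH cluster the call would look like `findBlockFiles(Paths.get("/dfs/dn"), "blk_1073742025")`, run on each datanode that fsck reported as holding a replica.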