Stanley and all,

Thanks. I will write a client application to explore this path. One quick
question again.
Using the fsck command, I can retrieve all the necessary info:
$ hadoop fsck /tmp/list2.txt -files -blocks -racks
.....
 *BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025* len=8 repl=2
[/default/10.122.195.198:50010, /default/10.122.195.196:50010]

However, using getFileBlockLocations(), I can't get the block name/id info,
such as *BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025*. It
seems BlockLocation doesn't expose this info publicly:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html

Is there another entry point, perhaps something fsck itself is using? Thanks
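
In case it helps, here is what I plan to try in the client application:
dropping down to the internal DFSClient interface, which looks like it
returns the extended block (pool id + block id). A rough sketch only --
DFSClient and LocatedBlock are private/unstable APIs, and the NameNode URI
below is a placeholder for my cluster's:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public class ListBlockIds {
  public static void main(String[] args) throws Exception {
    // hdfs://namenode:8020 is a placeholder for the actual NameNode URI
    DFSClient client = new DFSClient(new URI("hdfs://namenode:8020"),
        new Configuration());
    try {
      // same block info the NameNode hands back for fsck-style reports
      for (LocatedBlock lb :
          client.getLocatedBlocks("/tmp/list2.txt", 0, Long.MAX_VALUE)
                .getLocatedBlocks()) {
        ExtendedBlock b = lb.getBlock();
        // prints e.g. BP-...:blk_1073742025 len=8
        System.out.println(b.getBlockPoolId() + ":" + b.getBlockName()
            + " len=" + b.getNumBytes());
        for (DatanodeInfo dn : lb.getLocations()) {
          System.out.println("  " + dn.getHostName());
        }
      }
    } finally {
      client.close();
    }
  }
}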

Demai




On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi <s...@pivotal.io> wrote:

> As far as I know, there's no combination of Hadoop APIs that can do that.
> You can easily get the location of a block (which DNs it is on), but there's
> no way to get the local path of that block file.
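>
> For illustration, a minimal sketch of that public-API route (the file path
> here is just an example); it yields hosts and offsets, but nothing that
> identifies the block file on disk:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.BlockLocation;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class ShowBlockHosts {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     FileStatus st = fs.getFileStatus(new Path("/tmp/list2.txt"));
>     // BlockLocation exposes offsets, lengths, and datanode hosts --
>     // there is no getter for the block pool id or block id
>     for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
>       System.out.println("offset=" + loc.getOffset()
>           + " len=" + loc.getLength()
>           + " hosts=" + java.util.Arrays.toString(loc.getHosts()));
>     }
>   }
> }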
>
>
>
> On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni <nid...@gmail.com> wrote:
>
>> Yehia,
>>
>> No problem at all. I really appreciate your willingness to help. I am now
>> able to get such information in two steps: the first step is either hadoop
>> fsck or getFileBlockLocations(), and the second is searching the local
>> filesystem. My cluster uses the CDH default, which is /dfs/dn.
>>
>> I would like to do it programmatically, so I am wondering whether someone
>> has already done this, or better yet, whether a Hadoop API call is already
>> implemented for this exact purpose.
>>
>> Demai
>>
>>
>> On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y.z.elsha...@gmail.com>
>> wrote:
>>
>>> Hi Demai,
>>>
>>> Sorry, I missed that you had already tried this out. I think you can
>>> construct the block's location on the local file system if you have the
>>> block pool id and the block id. If you are using the Cloudera
>>> distribution, the default location is under /dfs/dn (the value of the
>>> dfs.data.dir / dfs.datanode.data.dir configuration keys).
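>>>
>>> A rough sketch of that search (assumptions: the CDH 5 on-disk layout
>>> below, and the ids taken from your fsck output; finalized blocks are
>>> fanned out across subdir* directories, so a recursive search is simpler
>>> than computing the exact path):
>>>
>>> import java.io.File;
>>>
>>> public class FindBlockFile {
>>>
>>>   // recursively search for the named block file under dir
>>>   static File find(File dir, String blockName) {
>>>     File[] children = dir.listFiles();
>>>     if (children == null) return null;
>>>     for (File f : children) {
>>>       if (f.isDirectory()) {
>>>         File hit = find(f, blockName);
>>>         if (hit != null) return hit;
>>>       } else if (f.getName().equals(blockName)) {
>>>         return f; // raw block data; checksums are in blk_<id>_<genstamp>.meta
>>>       }
>>>     }
>>>     return null;
>>>   }
>>>
>>>   public static void main(String[] args) {
>>>     // <dfs.datanode.data.dir>/current/<blockPoolId>/current/finalized
>>>     File finalized = new File("/dfs/dn/current/"
>>>         + "BP-13-7914115-10.122.195.197-14909166276345/current/finalized");
>>>     System.out.println(find(finalized, "blk_1073742025"));
>>>   }
>>> }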
>>>
>>> Thanks
>>> Yehia
>>>
>>>
>>> On 27 August 2014 21:20, Yehia Elshater <y.z.elsha...@gmail.com> wrote:
>>>
>>>> Hi Demai,
>>>>
>>>> You can use the fsck utility like the following:
>>>>
>>>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>>>
>>>> This will display all the information you need about the blocks of your
>>>> file.
>>>>
>>>> Hope it helps.
>>>> Yehia
>>>>
>>>>
>>>> On 27 August 2014 20:18, Demai Ni <nid...@gmail.com> wrote:
>>>>
>>>>> Hi, Stanley,
>>>>>
>>>>> Many thanks. Your method works. For now, I have a two-step approach:
>>>>> 1) call getFileBlockLocations() to grab the HDFS BlockLocation[]
>>>>> 2) use a local file system call (like the find command) to match the
>>>>> block to files on the local file system.
>>>>>
>>>>> Maybe there is an existing Hadoop API that already returns such info?
>>>>>
>>>>> Demai on the run
>>>>>
>>>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <s...@pivotal.io> wrote:
>>>>>
>>>>> I am not sure this is what you want, but you can try this shell command:
>>>>>
>>>>> find [DATANODE_DIR] -name [blockname]
>>>>>
>>>>>
>>>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <nid...@gmail.com> wrote:
>>>>>
>>>>>> Hi, folks,
>>>>>>
>>>>>> New in this area. Hoping to get a couple of pointers.
>>>>>>
>>>>>> I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3).
>>>>>>
>>>>>> I am wondering whether there is an interface to get each HDFS block's
>>>>>> information in terms of the local file system.
>>>>>>
>>>>>> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks
>>>>>> -racks" to get the blockID and its replicas on the nodes, such as:
>>>>>> repl=3 [/rack/hdfs01, /rack/hdfs02, ...]
>>>>>>
>>>>>> With such info, is there a way to
>>>>>> 1) log in to hdfs01, and read the block directly at the local file
>>>>>> system level?
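>>>>>>
>>>>>> (To be concrete: once the blk_<id> file is found, I am hoping it is
>>>>>> just the raw block data and can be read like any local file. A sketch
>>>>>> of what I mean, with the path supplied from a find over the datanode
>>>>>> dir; the checksums live separately in blk_<id>_<genstamp>.meta:)
>>>>>>
>>>>>> import java.io.BufferedReader;
>>>>>> import java.io.FileReader;
>>>>>>
>>>>>> public class ReadBlockLocally {
>>>>>>   public static void main(String[] args) throws Exception {
>>>>>>     // args[0]: local path of a blk_... file found on the datanode
>>>>>>     BufferedReader r = new BufferedReader(new FileReader(args[0]));
>>>>>>     try {
>>>>>>       String line;
>>>>>>       while ((line = r.readLine()) != null) {
>>>>>>         System.out.println(line);
>>>>>>       }
>>>>>>     } finally {
>>>>>>       r.close();
>>>>>>     }
>>>>>>   }
>>>>>> }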
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Demai on the run
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> *Stanley Shi,*
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Regards,
> *Stanley Shi,*
>
>
