The DNs do not expose the block mapping they maintain to clients. So the
request either has to be routed through the NN (for which I've provided a
few commands), or you can bypass the protocols and just walk the data
directory yourself.
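
As a rough illustration of the "walk the directory yourself" route, here is
a minimal sketch in plain Java (no Hadoop classes). It assumes the agent runs
on the DN host and is handed one of the dfs.datanode.data.dir paths as an
argument. The class name BlockFileLister is made up for this example, and
allowing a minus sign in the block ID is my assumption about how older
block IDs appear in file names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop.
public class BlockFileLister {
    // Raw block files are named blk_<ID number>; the .meta files next to
    // them (blk_<ID>_<genstamp>.meta) are deliberately not matched.
    // The optional leading '-' is an assumption about older block IDs.
    private static final Pattern BLOCK_FILE = Pattern.compile("^blk_-?\\d+$");

    // Recursively collects every regular file under dataDir whose name
    // looks like a raw block file, sorted by path.
    static List<Path> listBlockFiles(Path dataDir) throws IOException {
        try (Stream<Path> paths = Files.walk(dataDir)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> BLOCK_FILE.matcher(p.getFileName().toString()).matches())
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: one data directory taken from the DN's config.
        for (Path block : listBlockFiles(Paths.get(args[0]))) {
            // Size and mtime are the only "metadata" the filesystem itself offers.
            System.out.println(block + "\t" + Files.size(block)
                + "\t" + Files.getLastModifiedTime(block));
        }
    }
}
```

The modification time printed here could serve as a crude "new block"
filter, but it only reflects what the local filesystem knows, not what the
NN considers live.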

Note that block-level backups may not always make sense. For example,
some blocks may no longer belong to any active file on the NN, since
deletion progresses in small phases and it can be some time before a
raw block on a DN is invalidated (and then deleted).
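
If you still want a per-block signature to compare against a backup server,
with the caveat above in mind, something along these lines might do. This is
only a sketch: the class name BlockSignature is invented for the example, and
SHA-256 is my arbitrary choice of digest; nothing in HDFS requires it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Hadoop.
public class BlockSignature {
    // Streams the file through SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(Path blockFile) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(blockFile)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

A block that is still being written will hash differently on the next pass,
so a signature taken this way is only meaningful for finalized blocks.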

On Sat, Jul 7, 2012 at 12:28 AM, Yaron Gonen <yaron.go...@gmail.com> wrote:
> Thanks, I'll look at that tool.
> I still wish to iterate the blocks from the Java interface since I want to
> look at their metadata. I'll look at the source code of the command line
> tools you mentioned.
>
> Thanks again.
>
> On Jul 6, 2012 9:07 PM, "Harsh J" <ha...@cloudera.com> wrote:
>>
>> Does HDFS's replication feature not do this automatically and more
>> effectively for you?
>>
>> I think for backups you should look at the DistCp tool, which backs up
>> at the proper file level rather than making granular block-level copies.
>> It can do incremental copies too, AFAICT.
>>
>> In any case, if you wish to have a list of all blocks at each DN,
>> either parse out the info returned via "dfsadmin -metasave", "fsck
>> -files -blocks -locations", or ls -lR the DN's data dir.
>>
>> On Fri, Jul 6, 2012 at 11:23 PM, Yaron Gonen <yaron.go...@gmail.com>
>> wrote:
>> > Thanks for the fast reply.
>> > My top goal is to back up any new blocks on the DN.
>> > What I'd like to do is go over all the blocks on the DN and make a
>> > signature for each one of them. I'll compare that signature with a
>> > backup server.
>> > I guess another feature will be to check only new blocks, so I'll have
>> > to look at the metadata of each block.
>> >
>> > On Jul 6, 2012 5:59 PM, "Harsh J" <ha...@cloudera.com> wrote:
>> >>
>> >> When you say 'scan blocks on that datanode', what do you mean to do by
>> >> 'scan'? If you merely want a list of blocks per DN at a given time,
>> >> there are ways to get that. However, if you then want to perform
>> >> operations on each of these blocks remotely, that's not possible.
>> >>
>> >> In any case, you can run whatever program you wish to agnostically on
>> >> any DN by running it on the dfs.datanode.data.dir directories of the
>> >> DN (take it from its config), and visiting all files with the format
>> >> ^blk_<ID number>$.
>> >>
>> >> We can help you better if you tell us what exactly you are attempting
>> >> to do, for which you need a list of all the blocks per DN.
>> >>
>> >> On Fri, Jul 6, 2012 at 7:58 PM, Yaron Gonen <yaron.go...@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> > I'm trying to write an agent that will run on a datanode and will scan
>> >> > blocks on that datanode.
>> >> > The logical thing to do is to look in the DataBlockScanner code, which
>> >> > lists all the blocks on a node, which is what I did.
>> >> > The problem is that the DataBlockScanner object is instantiated during
>> >> > the start-up of a DataNode, so a lot of the objects it needs (like
>> >> > FSDataSet) are already instantiated.
>> >> > Then I tried DataNode.getDataNode(), but it returned null (needless
>> >> > to say, the node is up and running).
>> >> > I'd be grateful if you could refer me to the right object or to a
>> >> > guide.
>> >> >
>> >> > I'm new to HDFS, so I'm sorry if it's a trivial question.
>> >> >
>> >> > Thanks,
>> >> > Yaron
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>>
>>
>>
>> --
>> Harsh J



-- 
Harsh J
