Thanks, Uma! I'll try to work it out following your directions.
Best,
Yuduo
On 10/17/2011 10:51 PM, Uma Maheswara Rao G 72686 wrote:
----- Original Message -----
From: Yuduo Zhou <yuduoz...@gmail.com>
Date: Tuesday, October 18, 2011 6:30 am
Subject: About block name and location.
To: hdfs-user@hadoop.apache.org
Hi all,
I'm a rookie at HDFS, so here is just a quick question: suppose I have
a big file stored in HDFS, is there any way to generate a file
containing all the information about the blocks belonging to this file?
For example, a list of records in the format "block_id, length,
offset, hosts[], local/path/to/this/block"?
FileSystem#getFileStatus(Path f) will give some information. FileStatus
contains the following fields:
Path path;
long length;
boolean isdir;
short block_replication;
long blocksize;
long modification_time;
long access_time;
FsPermission permission;
String owner;
String group;
Path symlink;
And to get the block locations and offsets you can use
FileSystem#getFileBlockLocations.
If you want it exactly in your format, I would suggest you write a
small wrapper in your app and format the output using the above APIs.
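For example, a rough sketch of such a wrapper could look like this
(the class name is just a placeholder, and the file path is taken
from the command line):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfoDump {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // args[0] is the HDFS path of the file to inspect.
    FileStatus status = fs.getFileStatus(new Path(args[0]));

    // Locations of every block in the file.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation block : blocks) {
      // One record per block: offset, length and replica hosts.
      System.out.println("offset=" + block.getOffset()
          + ", length=" + block.getLength()
          + ", hosts=" + Arrays.toString(block.getHosts()));
    }
  }
}

Note that BlockLocation exposes only the offsets, lengths and host
names; the block ids and the datanode-local paths to the block files
are not available through this public API.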
The purpose is to enable programs to access only blocks on the same
node, to take advantage of block locality.
Hadoop already supports it.
I can retrieve most of the information using getFileBlockLocations(),
but I didn't find a way to get the local path information.
AFAIK, local files are written as just normal files, so Hadoop will
not split local files into blocks. It does that only in the DFS case.
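If it helps, a quick way to see this (again a rough sketch with a
placeholder class name; point it at any local file):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalBlockCheck {
  public static void main(String[] args) throws Exception {
    // On the local file system, getFileBlockLocations reports the
    // whole file as a single "block" located on localhost.
    FileSystem local = FileSystem.getLocal(new Configuration());
    FileStatus st = local.getFileStatus(new Path(args[0]));
    BlockLocation[] locs =
        local.getFileBlockLocations(st, 0, st.getLen());
    System.out.println(locs.length);            // prints 1
    System.out.println(locs[0].getHosts()[0]);  // prints "localhost"
  }
}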
Thanks,
Yuduo