----- Original Message -----
From: Yuduo Zhou <yuduoz...@gmail.com>
Date: Tuesday, October 18, 2011 6:30 am
Subject: About block name and location.
To: hdfs-user@hadoop.apache.org
> Hi all,
>
> I'm a rookie to HDFS. Here is just a quick question: suppose I have
> a big file stored in HDFS, is there any way to generate a file
> containing all information about the blocks belonging to this file?
> For example, a list of records with the format "block_id, length,
> offset, hosts[], local/path/to/this/block"?

FileSystem#getFileStatus(Path f) will give some of this information.
FileStatus exposes the following fields:

  Path path;
  long length;
  boolean isdir;
  short block_replication;
  long blocksize;
  long modification_time;
  long access_time;
  FsPermission permission;
  String owner;
  String group;
  Path symlink;

To get the block locations and offsets you can use
FileSystem#getFileBlockLocations. If you want exactly your format, I
would suggest writing a small wrapper in your app that formats the
output of the above APIs; see the sketch at the end of this mail.

> The purpose is to enable programs to only access blocks on the same
> node, to utilize block locality.

Hadoop already supports this: MapReduce uses the block locations to
schedule each task on a node that holds a replica of its input block.

> I can retrieve most information using getFileBlockLocations() but I
> didn't find how to gather information about the local path.

AFAIK, local files are written as just normal files, so Hadoop will
not split local files into blocks. It does that only in the DFS case.

> Thanks,
> Yuduo
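
For example, a minimal sketch of such a wrapper could look like the
following (the class name BlockInfoDumper and the output format are
just illustrative; note that the public BlockLocation API exposes the
offset, length, and hosts of each block, but not the block id or the
local path of the block file on the datanode, so those are omitted):

  import java.util.Arrays;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  // Illustrative wrapper: prints one record per block in the form
  // "offset, length, hosts[]". The block id and the datanode-local
  // path are not available through this public API.
  public class BlockInfoDumper {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path file = new Path(args[0]);

      FileStatus status = fs.getFileStatus(file);
      // One BlockLocation per block, covering the whole file.
      BlockLocation[] blocks =
          fs.getFileBlockLocations(status, 0, status.getLen());

      for (BlockLocation block : blocks) {
        System.out.println(block.getOffset() + ", "
            + block.getLength() + ", "
            + Arrays.toString(block.getHosts()));
      }
    }
  }

You could then run it with something like
"hadoop jar yourapp.jar BlockInfoDumper /path/to/file" and redirect
stdout to a file to get the listing you described.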