Yuduo,

Before you possibly end up duplicating work already done to improve co-located client reads from DNs, I'd suggest looking at JIRAs https://issues.apache.org/jira/browse/HDFS-2246 and https://issues.apache.org/jira/browse/HDFS-347
Regarding your last requirement, getting the path to the block files: there's no public API available for that yet. At the moment only the DataNode carries that info, and it does not expose it directly (a client instead opens a transceiver and the DN does the read work by itself).

On Tue, Oct 18, 2011 at 8:35 AM, Yuduo <yuduoz...@gmail.com> wrote:
> Thanks, Uma! I'll try to figure it out according to your directions.
>
> Best,
> Yuduo
>
> On 10/17/2011 10:51 PM, Uma Maheswara Rao G 72686 wrote:
>>
>> ----- Original Message -----
>> From: Yuduo Zhou <yuduoz...@gmail.com>
>> Date: Tuesday, October 18, 2011 6:30 am
>> Subject: About block name and location.
>> To: hdfs-user@hadoop.apache.org
>>
>>> Hi all,
>>>
>>> I'm a rookie to HDFS. Just a quick question: suppose I have a big
>>> file stored in HDFS, is there any way to generate a file containing
>>> all information about the blocks belonging to this file?
>>> For example, a list of records in the format "block_id, length,
>>> offset, hosts[], local/path/to/this/block"?
>>>
>> FileSystem#getFileStatus(Path f) will give some of that information.
>> FileStatus exposes the following fields:
>>
>> Path path;
>> long length;
>> boolean isdir;
>> short block_replication;
>> long blocksize;
>> long modification_time;
>> long access_time;
>> FsPermission permission;
>> String owner;
>> String group;
>> Path symlink;
>>
>> And to get the block locations and offsets you can use
>> FileSystem#getFileBlockLocations.
>>
>> If you want exactly your format, I would suggest writing a small
>> wrapper in your app that formats the output of the above APIs.
>>
>>> The purpose is to enable programs to access only blocks on the same
>>> node, to utilize block locality.
>>>
>> Hadoop already supports that.
>>
>>> I can retrieve most of the information using getFileBlockLocations(),
>>> but I didn't find out how to gather the local path.
>>>
>> AFAIK, local files will be written as just normal files. So, Hadoop
>> will not split local files into blocks. It does that only in the DFS
>> case.
>>>
>>> Thanks,
>>> Yuduo
>

--
Harsh J
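[Editorial note] As a footnote to the "small wrapper" Uma suggests above, here is a minimal sketch of the record formatting only. The values are hard-coded, hypothetical stand-ins: in a real HDFS client, each record would be built from a BlockLocation returned by FileSystem#getFileBlockLocations(status, 0, status.getLen()), using its getOffset(), getLength(), and getHosts() accessors, and real block IDs look like blk_<number> rather than a plain index.

```java
import java.util.Arrays;

// Sketch of the wrapper Uma describes: emit one record per block in the
// format "block_id, length, offset, hosts[]". The index, sizes, and host
// names below are hypothetical stand-ins for values a real client would
// read from BlockLocation objects.
public class BlockRecordFormatter {

    static String record(int index, long length, long offset, String[] hosts) {
        return index + ", " + length + ", " + offset + ", " + Arrays.toString(hosts);
    }

    public static void main(String[] args) {
        // Pretend a 256 MB file split into two 128 MB blocks, 2x replicated:
        System.out.println(record(0, 134217728L, 0L, new String[] {"dn1", "dn2"}));
        System.out.println(record(1, 134217728L, 134217728L, new String[] {"dn3", "dn1"}));
    }
}
```

Note that this deliberately leaves out the last field of Yuduo's desired format (local/path/to/this/block), since, as stated above, there is no public API that exposes the on-disk block file path.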