[ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482631 ]
Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

As I understand the problem, many clients open the same file and read its first block, e.g. in streaming, and then each reads a specific part of the file. So each client does not need to receive the block map for the whole file, only the block locations in a specified range.

I propose to modify ClientProtocol.open() to

    OpenFileInfo open( String src, int numBlocks )

where
    src - is the path;
    numBlocks - is the number of blocks whose locations the client wants open() to compute.
@returns
    OpenFileInfo : extends DFSFileInfo {
        LocatedBlock[ numBlocks ];
    }

DFSFileInfo contains file information including file length and replication.

ClientProtocol should also contain

    public LocatedBlock[] getBlockLocations(String src, int offset, int length) throws IOException;

    offset - is the starting offset in the file
    length - is the number of bytes the client is supposed to read

class LocatedBlock should include an additional field
+    long startFrom;
which gives the offset within the block at which the desired region of bytes begins.

Then we will need to reimplement seeks and reads for DFSInputStream using that API.

What would be a good default for the number of blocks that getBlockLocations() would fetch per call if the file is read from start to finish?

> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>
>                 Key: HADOOP-894
>                 URL: https://issues.apache.org/jira/browse/HADOOP-894
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>         Assigned To: Wendy Chien
>
> I think that the HDFS client protocol should change like:
> /** The meta-data about a file that was opened.
>  */
> class OpenFileInfo {
>   /** the info for the first block */
>   public LocatedBlockInfo getBlockInfo();
>   public long getBlockSize();
>   public long getLength();
> }
> interface ClientProtocol extends VersionedProtocol {
>   public OpenFileInfo open(String name) throws IOException;
>   /** get block info for any range of blocks */
>   public LocatedBlockInfo[] getBlockInfo(String name, int blockOffset, int blockLength) throws IOException;
> }
> so that the client can decide how much block info to request and when.
> Currently, when the file is opened or an error occurs, the entire block list is requested and sent.
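
To make the byte-range proposal in the comment concrete, here is a minimal Java sketch of how a request for (offset, length) could be mapped to the blocks it touches and to the proposed startFrom field. It is only an illustration, not Hadoop code: the class BlockRangeSketch, the nested LocatedBlockSketch, and the fields blockIndex and numBytes are hypothetical names. The sketch assumes fixed-size blocks and leaves out replica locations entirely. It also uses long where the proposal writes int, only so that it can address files larger than 2 GB.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not the Hadoop ClientProtocol. */
class BlockRangeSketch {

    /** Stand-in for the proposed LocatedBlock with its extra startFrom field. */
    static final class LocatedBlockSketch {
        final int blockIndex;  // position of the block within the file (assumed field)
        final long startFrom;  // offset within the block where the requested bytes begin
        final long numBytes;   // requested bytes that fall inside this block (assumed field)

        LocatedBlockSketch(int blockIndex, long startFrom, long numBytes) {
            this.blockIndex = blockIndex;
            this.startFrom = startFrom;
            this.numBytes = numBytes;
        }

        @Override
        public String toString() {
            return "block " + blockIndex + " startFrom=" + startFrom + " numBytes=" + numBytes;
        }
    }

    /**
     * Returns one entry per block overlapping [offset, offset + length),
     * assuming every block except possibly the last holds blockSize bytes.
     * The range is clamped to the end of the file.
     */
    static List<LocatedBlockSketch> getBlockLocations(
            long fileLength, long blockSize, long offset, long length) {
        if (blockSize <= 0 || offset < 0 || length < 0) {
            throw new IllegalArgumentException("invalid block size or range");
        }
        List<LocatedBlockSketch> result = new ArrayList<>();
        long end = Math.min(offset + length, fileLength);
        long pos = offset;
        while (pos < end) {
            int index = (int) (pos / blockSize);       // block containing pos
            long startFrom = pos % blockSize;          // nonzero only for the first block
            long blockEnd = (index + 1L) * blockSize;  // first byte after this block
            long n = Math.min(end, blockEnd) - pos;    // bytes wanted from this block
            result.add(new LocatedBlockSketch(index, startFrom, n));
            pos += n;
        }
        return result;
    }

    public static void main(String[] args) {
        // A 300-byte file with 128-byte blocks; read 100 bytes starting at offset 100.
        for (LocatedBlockSketch b : getBlockLocations(300, 128, 100, 100)) {
            System.out.println(b);
        }
    }
}
```

With these inputs the request spans two blocks: the last 28 bytes of block 0 (startFrom 100) and the first 72 bytes of block 1 (startFrom 0). A client-side seek would then need a new lookup only when the target offset falls outside the blocks it already holds.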