[ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482631 ]

Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

As I understand the problem, many clients open the same file and read its first block (e.g. in streaming), and then each client reads its own specific part of the file. So a client does not need to receive the block map for the whole file; it only needs the block locations for a specified range.

I propose changing ClientProtocol.open() to
OpenFileInfo open(String src, int numBlocks)
where
src - the path of the file;
numBlocks - the number of blocks, counted from the start of the file, whose locations open() should compute and return.
@returns
class OpenFileInfo extends DFSFileInfo {
    LocatedBlock[] blocks;  // locations of the first numBlocks blocks
}
DFSFileInfo contains file information, including the file length and replication.
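A minimal sketch of what the namenode side of the proposed open(src, numBlocks) might do, assuming the namenode already holds the file's full block list in memory. BlockLoc, OpenFileInfoSketch and NamenodeOpenSketch are hypothetical stand-ins, not Hadoop classes, and the lookup of src itself is left out:

```java
import java.util.List;

// Hypothetical stand-in for a located block; NOT Hadoop's LocatedBlock.
final class BlockLoc {
    final long blockId;
    final long length;         // bytes stored in this block
    final List<String> hosts;  // datanodes holding a replica

    BlockLoc(long blockId, long length, List<String> hosts) {
        this.blockId = blockId;
        this.length = length;
        this.hosts = hosts;
    }
}

// Hypothetical stand-in for the proposed OpenFileInfo.
final class OpenFileInfoSketch {
    final long fileLength;
    final short replication;
    final BlockLoc[] blocks;   // locations of at most numBlocks leading blocks

    OpenFileInfoSketch(long fileLength, short replication, BlockLoc[] blocks) {
        this.fileLength = fileLength;
        this.replication = replication;
        this.blocks = blocks;
    }
}

final class NamenodeOpenSketch {
    /**
     * Builds the reply to open(src, numBlocks) from the file's full block list.
     * The file length is computed from every block; only the returned
     * block list is truncated to the first numBlocks entries.
     */
    static OpenFileInfoSketch open(List<BlockLoc> allBlocks, short replication, int numBlocks) {
        if (numBlocks < 0) {
            throw new IllegalArgumentException("numBlocks must be non-negative");
        }
        long fileLength = 0;
        for (BlockLoc b : allBlocks) {
            fileLength += b.length;
        }
        int n = Math.min(numBlocks, allBlocks.size());
        BlockLoc[] first = allBlocks.subList(0, n).toArray(new BlockLoc[0]);
        return new OpenFileInfoSketch(fileLength, replication, first);
    }
}
```

For a three-block file opened with numBlocks = 2, the reply carries the full file length but only the first two block locations; the rest would be fetched later through a range call.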

ClientProtocol should also contain
public LocatedBlock[] getBlockLocations(String src, long offset, long length) throws IOException;
where
offset - the starting offset in the file (a long, since files can be larger than 2 GB);
length - the number of bytes the client intends to read.
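To answer such a range call, the namenode has to find which blocks overlap the requested bytes. A minimal sketch, assuming the namenode knows each block's length in file order; RangeLookupSketch and blocksInRange are hypothetical names, and the result is block indices rather than full LocatedBlock objects:

```java
import java.util.ArrayList;
import java.util.List;

final class RangeLookupSketch {
    /**
     * Returns the indices of the blocks that overlap the byte range
     * [offset, offset + length), given every block's length in file order.
     * A range past the end of the file yields an empty list.
     */
    static List<Integer> blocksInRange(long[] blockLengths, long offset, long length) {
        if (offset < 0 || length < 0) {
            throw new IllegalArgumentException("offset and length must be non-negative");
        }
        List<Integer> result = new ArrayList<>();
        if (length == 0) {
            return result;  // an empty range overlaps no block
        }
        long end = offset + length;  // exclusive end of the requested range
        long blockStart = 0;         // file offset where block i begins
        for (int i = 0; i < blockLengths.length && blockStart < end; i++) {
            long blockEnd = blockStart + blockLengths[i];
            if (blockEnd > offset) {  // block i overlaps [offset, end)
                result.add(i);
            }
            blockStart = blockEnd;
        }
        return result;
    }
}
```

For a file whose blocks are 64, 64 and 10 bytes long, a 10-byte read at offset 60 straddles a block boundary and needs blocks 0 and 1, while the same read at offset 70 needs only block 1.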

Class LocatedBlock should include an additional field
+ long startFrom;
which gives the offset within the block at which the requested range of bytes begins.
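The startFrom value is the distance from the start of the block that contains the requested offset to the offset itself. A minimal sketch, assuming block lengths are known in file order; StartFromSketch, Position and locate are hypothetical names:

```java
final class StartFromSketch {
    /** Hypothetical result: the block holding an offset and the offset within that block. */
    static final class Position {
        final int blockIndex;
        final long startFrom;

        Position(int blockIndex, long startFrom) {
            this.blockIndex = blockIndex;
            this.startFrom = startFrom;
        }
    }

    /** Finds the block containing the file offset and computes startFrom relative to it. */
    static Position locate(long[] blockLengths, long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        long blockStart = 0;  // file offset where block i begins
        for (int i = 0; i < blockLengths.length; i++) {
            long blockEnd = blockStart + blockLengths[i];
            if (offset < blockEnd) {
                return new Position(i, offset - blockStart);
            }
            blockStart = blockEnd;
        }
        throw new IllegalArgumentException(
            "offset " + offset + " is past end of file (" + blockStart + " bytes)");
    }
}
```

Under this reading, only the first block of a returned range would normally carry a non-zero startFrom; every later block in the range is read from its beginning.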

Then we will need to reimplement seeks and reads in DFSInputStream on top of that API.
What would be a good default for the number of blocks whose locations getBlockLocations() should fetch per call when the file is read from start to finish?
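One possible shape for the client side: the input stream remembers the block locations it has already fetched and issues the range RPC only when a read or seek lands outside them, asking for prefetchBlocks blocks' worth of bytes at a time. This is a sketch under assumptions: BlockLocationSource is a hypothetical stand-in for getBlockLocations(), locations are reduced to (start offset, length) pairs, and prefetchBlocks is a hypothetical knob whose default is exactly the open question above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the proposed getBlockLocations(src, offset, length):
// returns (block start offset -> block length) for blocks overlapping [offset, offset + length).
interface BlockLocationSource {
    Map<Long, Long> getBlockLocations(long offset, long length);
}

// Hypothetical client-side cache of block locations for one open file.
final class CachingBlockLocator {
    private final BlockLocationSource source;
    private final long blockSize;      // nominal block size, used to size each request
    private final int prefetchBlocks;  // blocks' worth of bytes requested per RPC
    private final TreeMap<Long, Long> cache = new TreeMap<>();  // start offset -> length
    int rpcCount = 0;                  // number of RPCs issued so far

    CachingBlockLocator(BlockLocationSource source, long blockSize, int prefetchBlocks) {
        if (blockSize <= 0 || prefetchBlocks <= 0) {
            throw new IllegalArgumentException("blockSize and prefetchBlocks must be positive");
        }
        this.source = source;
        this.blockSize = blockSize;
        this.prefetchBlocks = prefetchBlocks;
    }

    /** Returns the start offset of the block containing pos, issuing an RPC only on a cache miss. */
    long blockStartFor(long pos) {
        Map.Entry<Long, Long> e = cache.floorEntry(pos);
        if (!covers(e, pos)) {
            rpcCount++;
            cache.putAll(source.getBlockLocations(pos, prefetchBlocks * blockSize));
            e = cache.floorEntry(pos);
            if (!covers(e, pos)) {
                throw new IllegalArgumentException("position " + pos + " is past end of file");
            }
        }
        return e.getKey();
    }

    // True if the cached block starting at e.getKey() contains pos.
    private static boolean covers(Map.Entry<Long, Long> e, long pos) {
        return e != null && pos < e.getKey() + e.getValue();
    }
}
```

A seek would then only move the stream position; the locator is consulted on the next read. Larger prefetchBlocks values mean fewer RPCs on sequential reads but bigger replies for clients that read only a small region.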

> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>
>                 Key: HADOOP-894
>                 URL: https://issues.apache.org/jira/browse/HADOOP-894
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>         Assigned To: Wendy Chien
>
> I think that the HDFS client protocol should change as follows:
> /** The meta-data about a file that was opened. */
> class OpenFileInfo {
>   /** the info for the first block */
>   public LocatedBlockInfo getBlockInfo();
>   public long getBlockSize();
>   public long getLength();
> }
> interface ClientProtocol extends VersionedProtocol {
>   public OpenFileInfo open(String name) throws IOException;
>   /** get block info for any range of blocks */
>   public LocatedBlockInfo[] getBlockInfo(String name, int blockOffset, int blockLength) throws IOException;
> }
> so that the client can decide how much block info to request and when. 
> Currently, when the file is opened or an error occurs, the entire block list 
> is requested and sent.
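The original proposal indexes blocks rather than bytes, and its OpenFileInfo exposes getBlockSize(), so a client could turn a byte range into a block range itself. A minimal sketch, assuming every block except possibly the last holds exactly blockSize bytes; BlockIndexRange and toBlockRange are hypothetical names:

```java
final class BlockIndexRange {
    /**
     * Converts the byte range [offset, offset + length) into {blockOffset, blockLength},
     * i.e. the index of the first block and the number of blocks to request.
     * Assumes all blocks but the last are exactly blockSize bytes long.
     */
    static int[] toBlockRange(long offset, long length, long blockSize) {
        if (offset < 0 || length < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid offset, length or blockSize");
        }
        long first = offset / blockSize;                // block holding the first byte
        if (length == 0) {
            return new int[] { Math.toIntExact(first), 0 };
        }
        long last = (offset + length - 1) / blockSize;  // block holding the last byte
        // toIntExact throws if the result does not fit the int-typed parameters.
        return new int[] { Math.toIntExact(first), Math.toIntExact(last - first + 1) };
    }
}
```

With byte offsets, as in the comment above, the namenode would do this mapping using the actual block lengths, so the equal-size assumption would not be needed on the client.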

-- 
This message is automatically generated by JIRA.