[ 
https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569380#action_12569380
 ] 

Raghu Angadi commented on HADOOP-2758:
--------------------------------------

Regarding a couple of concerns in Konstantin's review:

- > 2. Do we still need the notion of a chunk? [...]
-- I think so. A CRC chunk is still central to many things that DataNode and 
DFSClients do. It is very useful in discussions, descriptions, and even in code 
to have a single word that consistently describes this essential unit of DFS data. 
If we see a member called {{sendChunk()}}, it's clear what it sends. For example, this 
patch renamed {{sendChunk()}} to {{sendChunks(int)}} because it now sends multiple 
CRC chunks.

- > 5. DATA_TRANSFER_VERSION : I generally do not understand what is the 
meaning of this constant, [...]
-- Data transfers do not use RPCs. As noted in the comment, the constant unfortunately 
does depend on Datanode serializations. It probably should not. This is 
analogous to RPC versions and a protocol version, which sit at two different 
levels of the stack.
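
To make the first point concrete, here is a minimal sketch of what "a CRC chunk" 
means as a unit: a fixed-size run of data bytes covered by one checksum. This is 
not the code from the patch. The class name {{ChunkSketch}}, the 512-byte chunk 
size, the use of CRC32, and the "checksums first, then data" layout are all 
assumptions made for illustration; only the name {{sendChunks(int)}} comes from 
the discussion above, and its signature here is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

public class ChunkSketch {
    // Assumed chunk size for this sketch: data bytes covered by one checksum.
    static final int BYTES_PER_CHECKSUM = 512;

    /** Number of CRC chunks needed to cover len bytes (last chunk may be partial). */
    static int chunkCount(int len, int bytesPerChecksum) {
        return (len + bytesPerChecksum - 1) / bytesPerChecksum;
    }

    /** CRC32 over a single chunk of data. */
    static long chunkCrc(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    /**
     * Hypothetical analogue of sendChunks(int): writes up to maxChunks chunks
     * starting at offset. All checksums are written first, then the data in
     * one write. Returns the number of data bytes sent.
     */
    static int sendChunks(byte[] data, int offset, int maxChunks,
                          DataOutputStream out) throws IOException {
        int remaining = data.length - offset;
        int len = Math.min(remaining, maxChunks * BYTES_PER_CHECKSUM);
        int n = chunkCount(len, BYTES_PER_CHECKSUM);
        for (int i = 0; i < n; i++) {
            int off = offset + i * BYTES_PER_CHECKSUM;
            int chunkLen = Math.min(BYTES_PER_CHECKSUM, offset + len - off);
            out.writeInt((int) chunkCrc(data, off, chunkLen)); // 4-byte checksum
        }
        out.write(data, offset, len); // data for all chunks in a single write
        return len;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1300];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        int sent = 0;
        while (sent < data.length) {
            sent += sendChunks(data, sent, 2, out);
        }
        System.out.println("data bytes sent: " + sent
                + ", bytes on the wire: " + sink.size());
    }
}
```

Because every size and offset is expressed in whole chunks plus at most one 
partial trailing chunk, having one word for that unit keeps both the code and 
the discussion unambiguous.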

> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
>                 Key: HADOOP-2758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2758
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-2758.patch
>
>
> Currently the datanode and client parts of DFS perform multiple copies of data on 
> the 'read path' (i.e. the path from storage on the datanode to the user buffer on 
> the client). This jira reduces these copies by enhancing the data read protocol and 
> the implementation of read on both the datanode and the client. I will describe the 
> changes in the next comment.
> The requirement is that this fix should reduce CPU usage and should not cause a 
> regression in any benchmark. It might not improve the benchmarks, since most 
> benchmarks are not CPU bound.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
