[
https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569161#action_12569161
]
Konstantin Shvachko commented on HADOOP-2758:
---------------------------------------------
# Could you please add comments about the format of the send/receive protocol
for blocks (see the sketch after this item). We do not have a formal
definition of the protocol, and it is very hard to understand the code
without knowing the general structure of a packet and the bytes and bits it
is composed of.
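For illustration, this is the kind of header comment I have in mind. It is a
hypothetical sketch only: the field names, types, and layout below are my
assumptions, not the actual wire format of this patch.
{code}
/**
 * Sketch of a packet as sent from the datanode to the client:
 *
 *   +---------------+----------------------+---------------+
 *   | packet header | checksums            | data          |
 *   +---------------+----------------------+---------------+
 *
 * where the header might contain:
 *   int  packetLen;          // total length of this packet in bytes
 *   long offsetInBlock;      // offset of the first data byte in the block
 *   long seqno;              // sequence number of this packet
 *   byte lastPacketInBlock;  // 1 if this is the last packet of the block
 *   int  dataLen;            // number of data bytes in this packet
 */
{code}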
# Do we still need the notion of a chunk? Previously it corresponded to 512
data bytes prefixed by a 4-byte checksum. Now it is just a sequence of
checksums followed by the data (compare the two layouts sketched below).
I think any mention of chunks should be either removed or replaced by
something else in order to avoid confusion with the previous version.
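To make the contrast concrete, here are the two layouts as I understand them;
the sizes and ordering are my reading of the patch, not a specification.
{code}
// Old layout: checksum and data interleaved per 512-byte "chunk"
//   [ crc(4) | data(512) ][ crc(4) | data(512) ] ...
//
// New layout: all checksums of a packet first, then all of the data
//   [ crc(4) crc(4) ... crc(4) ][ data ............. data ]
{code}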
# Too many (20 or so) members in BlockSender and BlockReceiver.
#- Some of them can be made local variables:
checksumSize
#- some are used merely to pass parameters between or into internal methods:
BlockSender.throttler
BlockSender.out
BlockSender.corruptChecksumOk
BlockSender.chunkOffsetOK
#- and some are derivable (computable) from other variables.
#- Please check BlockReceiver as well.
#- Most of them are not introduced by this patch, but it'd be great if you
could work on that to make the code more understandable. Right now it is in
an incomprehensible state. (A sketch of the kind of refactoring I mean
follows this item.)
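For example, something along these lines -- a hypothetical, simplified
sketch, not the actual BlockSender code:
{code}
class Sender {
  // Before: values were stashed in instance fields just so that helper
  // methods could see them, e.g.
  //   private OutputStream out;
  //   private boolean corruptChecksumOk;
  //   private int checksumSize;

  void sendBlock(java.io.OutputStream out, boolean corruptChecksumOk)
      throws java.io.IOException {
    // checksumSize is derivable, so compute it locally (CRC32 is 4 bytes)
    int checksumSize = 4;
    writePacket(out, corruptChecksumOk, checksumSize);
  }

  // Helpers take what they need as parameters instead of instance state.
  private void writePacket(java.io.OutputStream out,
                           boolean corruptChecksumOk,
                           int checksumSize) throws java.io.IOException {
    // ... write the header, checksums, and data ...
  }
}
{code}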
# Constants should be named in all capital letters.
{code}
static final int packetHeaderLen =
{code}
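i.e. rename it to match the convention:
{code}
static final int PACKET_HEADER_LEN =
{code}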
# DATA_TRANFER_VERSION
#- Spelling: _TRANSFER_
#- Please leave javadoc-style comments in place.
#- Please remove previous version descriptions.
#- I generally do not understand the meaning of this constant, introduced
btw in HADOOP-1134. If DatanodeInfo serialization changes, then both the
ClientProtocol and DatanodeProtocol versions should be incremented. Why do
we need an extra version?
Only the latest version change should be described; the old ones can be
retrieved from svn.
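Put together, something like the following -- the javadoc wording and the
version value are illustrative, not taken from the patch:
{code}
/**
 * Version of the data transfer protocol between clients and datanodes.
 * Increment this whenever the on-the-wire packet format changes.
 * (Descriptions of older versions can be retrieved from svn history.)
 */
public static final int DATA_TRANSFER_VERSION = 11;  // illustrative value
{code}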
# enumerateThreadGroup() is not used locally.
# Spelling: should be "stream"
{code}
private InputStream blockIn; // data strean
{code}
Sorry, I did not get to the essentials of your patch, but it is really hard
to understand them without knowing the details of the protocol.
> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
> Key: HADOOP-2758
> URL: https://issues.apache.org/jira/browse/HADOOP-2758
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.17.0
>
> Attachments: HADOOP-2758.patch
>
>
> Currently the datanode and client parts of DFS perform multiple copies of
> data on the 'read path' (i.e. the path from storage on the datanode to the
> user buffer on the client). This jira reduces these copies by enhancing the
> data read protocol and the implementation of read on both the datanode and
> the client. I will describe the changes in the next comment.
> The requirement is that this fix should reduce CPU usage and should not
> cause a regression in any benchmarks. It might not improve the benchmarks,
> since most benchmarks are not CPU bound.