[
https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569161#action_12569161
]
Konstantin Shvachko commented on HADOOP-2758:
---------------------------------------------
# Could you please add comments about the format of the send/receive protocol
for blocks (see the sketch after this item). We do not have a formal
definition of the protocol, and it is very hard to understand the code
without knowing the general structure of a packet and the bytes and bits it
is composed of.
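For illustration, this is the kind of header comment I have in mind. It is a
hypothetical sketch only: the field names, types, and layout below are my
assumptions, not the actual wire format of this patch.
{code}
/**
 * Sketch of a packet as sent from the datanode to the client:
 *
 *   +---------------+----------------------+---------------+
 *   | packet header | checksums            | data          |
 *   +---------------+----------------------+---------------+
 *
 * where the header might contain:
 *   int  packetLen;          // total length of this packet in bytes
 *   long offsetInBlock;      // offset of the first data byte in the block
 *   long seqno;              // sequence number of this packet
 *   byte lastPacketInBlock;  // 1 if this is the last packet of the block
 *   int  dataLen;            // number of data bytes in this packet
 */
{code}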
# Do we still need the notion of a chunk? Previously it corresponded to 512
data bytes prefixed by a 4-byte checksum. Now it is just a sequence of
checksums followed by the data (compare the two layouts sketched below).
I think any mention of chunks should be either removed or replaced by
something else in order to avoid confusion with the previous version.
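To make the contrast concrete, here are the two layouts as I understand them;
the sizes and ordering are my reading of the patch, not a specification.
{code}
// Old layout: checksum and data interleaved per 512-byte "chunk"
//   [ crc(4) | data(512) ][ crc(4) | data(512) ] ...
//
// New layout: all checksums of a packet first, then all of the data
//   [ crc(4) crc(4) ... crc(4) ][ data ............. data ]
{code}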
# Too many (20 or so) members in BlockSender and BlockReceiver.
#- Some of them can be made local variables:
checksumSize
#- some are used merely to pass parameters between or into internal methods:
BlockSender.throttler
BlockSender.out
BlockSender.corruptChecksumOk
BlockSender.chunkOffsetOK
#- and some are derivable (computable) from other variables.
#- Please check BlockReceiver as well.
#- Most of them are not introduced by this patch, but it'd be great if you
could work on that to make the code more understandable. Right now it is in
an incomprehensible state. (A sketch of the kind of refactoring I mean
follows this item.)
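For example, something along these lines -- a hypothetical, simplified
sketch, not the actual BlockSender code:
{code}
class Sender {
  // Before: values were stashed in instance fields just so that helper
  // methods could see them, e.g.
  //   private OutputStream out;
  //   private boolean corruptChecksumOk;
  //   private int checksumSize;

  void sendBlock(java.io.OutputStream out, boolean corruptChecksumOk)
      throws java.io.IOException {
    // checksumSize is derivable, so compute it locally (CRC32 is 4 bytes)
    int checksumSize = 4;
    writePacket(out, corruptChecksumOk, checksumSize);
  }

  // Helpers take what they need as parameters instead of instance state.
  private void writePacket(java.io.OutputStream out,
                           boolean corruptChecksumOk,
                           int checksumSize) throws java.io.IOException {
    // ... write the header, checksums, and data ...
  }
}
{code}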
# Constants should be named in all capital letters.
{code}
static final int packetHeaderLen =
{code}
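i.e. rename it to match the convention:
{code}
static final int PACKET_HEADER_LEN =
{code}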
# DATA_TRANFER_VERSION
#- Spelling: _TRANSFER_
#- Please leave javadoc-style comments in place.
#- Please remove previous version descriptions.
#- I generally do not understand the meaning of this constant, introduced
btw in HADOOP-1134. If DatanodeInfo serialization changes, then both the
ClientProtocol and DatanodeProtocol versions should be incremented. Why do
we need an extra version?
Only the latest version change should be described; the old ones can be
retrieved from svn.
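Put together, something like the following -- the javadoc wording and the
version value are illustrative, not taken from the patch:
{code}
/**
 * Version of the data transfer protocol between clients and datanodes.
 * Increment this whenever the on-the-wire packet format changes.
 * (Descriptions of older versions can be retrieved from svn history.)
 */
public static final int DATA_TRANSFER_VERSION = 11;  // illustrative value
{code}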
# enumerateThreadGroup() is not used locally.
# Spelling: should be "stream"
{code}
private InputStream blockIn; // data strean
{code}
Sorry, I did not get to the essentials of your patch, but it is really hard
to understand them without knowing the details of the protocol.
> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
> Key: HADOOP-2758
> URL: https://issues.apache.org/jira/browse/HADOOP-2758
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.17.0
>
> Attachments: HADOOP-2758.patch
>
>
> Currently the datanode and client parts of DFS perform multiple copies of
> data on the 'read path' (i.e. the path from storage on the datanode to the
> user buffer on the client). This jira reduces these copies by enhancing the
> data read protocol and the implementation of read on both the datanode and
> the client. I will describe the changes in the next comment.
> The requirement is that this fix should reduce CPU usage and should not
> cause a regression in any benchmarks. It might not improve the benchmarks,
> since most benchmarks are not CPU bound.