[
https://issues.apache.org/jira/browse/HDFS-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267796#comment-13267796
]
Eli Collins commented on HDFS-3357:
-----------------------------------
It's worth pointing out that we now have a timeout for the non-cached case as
well. This change fixes two bugs: #1 that the setSoTimeout for the keepalive
was a NOP, and #2 that there was no default timeout set on the socket given to
DataXceiver. Perhaps DataXceiverServer could use a comment:
{code}
s = ss.accept();
s.setTcpNoDelay(true);
+ // DataXceiver sets the socket timeout
{code}
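To spell out bug #2: an accepted socket's SO_TIMEOUT defaults to 0 (block
forever), so the first opcode read can hang indefinitely unless DataXceiver
applies a timeout itself. A minimal sketch of that idea with plain java.net
streams, not the committed patch; the class, method, and parameter names are
illustrative:
{code}
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

public class XceiverTimeoutSketch {
  // Called with the socket handed over by the accepting server, which
  // deliberately leaves the timeout alone (hence the suggested comment above).
  static DataInputStream wrapWithTimeout(Socket s, int readTimeoutMs)
      throws IOException {
    // SO_TIMEOUT is 0 (block forever) unless someone sets it, so apply a
    // bounded timeout before the first blocking read on the stream.
    s.setSoTimeout(readTimeoutMs);
    return new DataInputStream(new BufferedInputStream(s.getInputStream()));
  }
}
{code}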
Looks good, +1 pending Jenkins.
> DataXceiver reads from client socket with incorrect/no timeout
> --------------------------------------------------------------
>
> Key: HDFS-3357
> URL: https://issues.apache.org/jira/browse/HDFS-3357
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node
> Affects Versions: 1.0.2, 2.0.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Attachments: hdfs-3357.txt
>
>
> In DataXceiver, we currently use Socket.setSoTimeout to try to manage the
> read timeout when switching between reading the initial opCode, reading a
> keepalive opcode, and reading the status after a successfully sent block.
> However, since all of these reads use the same underlying DataInputStream,
> the change to the socket timeout isn't respected. Thus, they all occur with
> whatever timeout is set on the socket at the time of DataXceiver
> construction. In practice this turns out to be 0, which can cause infinitely
> hung xceivers.
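To make that failure mode concrete, a minimal sketch of the broken pattern,
assuming Hadoop's NIO-backed socket streams (NetUtils.getInputStream), where
the read timeout is captured when the stream is created; the method and
parameter names are illustrative, not the actual DataXceiver code:
{code}
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

import org.apache.hadoop.net.NetUtils;

public class SoTimeoutNopSketch {
  static int readNextOp(Socket s, long constructionTimeoutMs,
      int desiredTimeoutMs) throws IOException {
    // The read timeout is captured here, at stream-creation time.
    DataInputStream in = new DataInputStream(
        NetUtils.getInputStream(s, constructionTimeoutMs));

    // Broken expectation: this does not change the timeout used by 'in',
    // because 'in' wraps a stream that enforces its own captured timeout.
    s.setSoTimeout(desiredTimeoutMs);

    // Still governed by constructionTimeoutMs; if that was 0, this read can
    // block forever -- the "infinitely hung xceivers" described above.
    return in.readByte();
  }
}
{code}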