[ https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315315#comment-14315315 ]
Chris Nauroth commented on HDFS-7608:
-------------------------------------

[~cmccabe], thanks for double-checking me on this. I'm also now starting to wonder whether HDFS-7005 had unintended side effects. By setting the read timeout as a socket option in {{DFSClient#newConnectedPeer}}, the setting would also have applied to {{DFSOutputStream}}, thus circumventing the extension time that {{DFSClient#getDatanodeReadTimeout}} wants to apply.

bq. For example, in RemoteBlockReader2#newBlockReader, we are writing stuff to the socket, all before ever calling DFSClient#getDataNodeWriteTimeout.

Yes, you're right. I suppose a complete implementation that retains the "timeout extension" behavior is going to require either pushing the {{NetUtils}} socket-wrapping calls up to those earlier layers, or setting the socket option at a point where we know the number of nodes in the pipeline and can therefore calculate the extension. I expect either of those to be a much more invasive change than the posted patch.

bq. I'm not even sure most HDFS developers could answer which one(s) this key does, if quizzed.

I sure couldn't without reading the code fresh again today. :-) +1 for your proposal for new, clearer configuration properties.

> hdfs dfsclient newConnectedPeer has no write timeout
> -----------------------------------------------------
>
>                 Key: HDFS-7608
>                 URL: https://issues.apache.org/jira/browse/HDFS-7608
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: dfsclient, fuse-dfs
>    Affects Versions: 2.3.0, 2.6.0
>       Environment: hdfs 2.3.0 hbase 0.98.6
>            Reporter: zhangshilong
>            Assignee: Xiaoyu Yao
>              Labels: patch
>       Attachments: HDFS-7608.0.patch, HDFS-7608.1.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Problem:
> HBase's CompactSplitThread may block forever while reading DataNode blocks.
> Debugging found the epoll wait timeout was set to 0, so the epoll wait never times out.
> Cause: in HDFS 2.3.0, HBase uses DFSClient to read and write blocks.
> DFSClient creates a socket via newConnectedPeer(addr), but sets no read or write timeout.
> In 2.6.0, newConnectedPeer added a read timeout to deal with this problem, but did not add a write timeout. Why was a write timeout not added?
> I think NioInetPeer needs a default socket timeout, so applications will not need to set a timeout themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
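For reference, the "timeout extension" discussed in the comment above is a per-pipeline-node addition to the base socket timeout. A minimal sketch of that calculation follows; the 5-second extension constant and the shape of the method mirror {{DFSClient#getDatanodeReadTimeout}} / {{HdfsServerConstants.READ_TIMEOUT_EXTENSION}} as I understand them, so treat the exact values as assumptions rather than verified for every release:

```java
// Sketch of the per-node timeout extension that setting the read timeout
// as a socket option in newConnectedPeer would bypass (the side effect
// questioned above). Constant value is an assumption for illustration.
public class TimeoutExtensionSketch {
    // Assumed to mirror HdfsServerConstants.READ_TIMEOUT_EXTENSION (ms per node).
    static final int READ_TIMEOUT_EXTENSION = 5 * 1000;

    // Mirrors the shape of DFSClient#getDatanodeReadTimeout: a base socket
    // timeout of 0 means "no timeout", so no extension is applied either.
    static int datanodeReadTimeout(int socketTimeout, int numNodes) {
        return socketTimeout > 0
                ? READ_TIMEOUT_EXTENSION * numNodes + socketTimeout
                : 0;
    }

    public static void main(String[] args) {
        // 60 s base timeout with a 3-node pipeline -> 75 s effective timeout.
        System.out.println(datanodeReadTimeout(60_000, 3)); // prints 75000
        // A base timeout of 0 disables the timeout entirely.
        System.out.println(datanodeReadTimeout(0, 3));      // prints 0
    }
}
```

Because this extension depends on the pipeline length, it can only be computed once the number of nodes is known, which is why a fixed socket option set at connect time cannot reproduce it.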