[ 
https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315315#comment-14315315
 ] 

Chris Nauroth commented on HDFS-7608:
-------------------------------------

[~cmccabe], thanks for double-checking me on this.  I'm also now starting to 
wonder if HDFS-7005 had unintended side effects.  By setting read timeout as a 
socket option in {{DFSClient#newConnectedPeer}}, the setting also would have 
applied for {{DFSOutputStream}}, and thus circumvented the extension time that 
{{DFSClient#getDatanodeReadTimeout}} wants to apply.

bq. For example, in RemoteBlockReader2#newBlockReader, we are writing stuff to 
the socket, all before ever calling DFSClient#getDataNodeWriteTimeout.

Yes, you're right.  I suppose a complete implementation, with retention of the 
"timeout extension" behavior, is going to require pushing the {{NetUtils}} 
socket wrapping calls up to those earlier layers, or setting the socket option 
at a point where we know the number of nodes in the pipeline, and therefore can 
calculate the extension.  I expect either of those is going to be a much more 
invasive change than the posted patch.
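
As a rough sketch of the "timeout extension" idea discussed above: the effective timeout grows with the number of nodes in the pipeline, so it cannot be fixed once when the socket is first connected. This is not the actual {{DFSClient}} code. The class name, method names, and the 5-second per-node constants are assumptions for illustration only.

```java
public class TimeoutExtensionSketch {
    // Assumed per-node extensions in milliseconds (illustrative values,
    // not read from any Hadoop configuration or source file).
    static final int READ_TIMEOUT_EXTENSION_MS = 5_000;
    static final int WRITE_TIMEOUT_EXTENSION_MS = 5_000;

    /**
     * Hypothetical helper: extends a configured read timeout by a fixed
     * amount per pipeline node. A base of 0 or less means "no timeout"
     * and is passed through as 0.
     */
    static int extendedReadTimeout(int baseTimeoutMs, int numNodes) {
        return baseTimeoutMs > 0
            ? baseTimeoutMs + READ_TIMEOUT_EXTENSION_MS * numNodes
            : 0;
    }

    /** Hypothetical helper: same idea for the write timeout. */
    static int extendedWriteTimeout(int baseTimeoutMs, int numNodes) {
        return baseTimeoutMs > 0
            ? baseTimeoutMs + WRITE_TIMEOUT_EXTENSION_MS * numNodes
            : 0;
    }

    public static void main(String[] args) {
        int base = 60_000; // assumed configured socket timeout

        // A timeout applied at connection time knows nothing about the pipeline.
        System.out.println("fixed at connect:      " + base + " ms");

        // Once the pipeline length is known, the extension can be applied.
        for (int nodes = 1; nodes <= 3; nodes++) {
            System.out.println("read,  " + nodes + " node(s): "
                + extendedReadTimeout(base, nodes) + " ms");
            System.out.println("write, " + nodes + " node(s): "
                + extendedWriteTimeout(base, nodes) + " ms");
        }
    }
}
```

The point of the sketch is only the ordering problem: whichever layer sets the timeout has to know {{numNodes}}, which the code that first connects the socket does not.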

bq. I'm not even sure most HDFS developers could answer which one(s) this key 
does, if quizzed.

I sure couldn't without reading the code fresh again today.  :-)

+1 for your proposal for new, clearer configuration properties.

> hdfs dfsclient  newConnectedPeer has no write timeout
> -----------------------------------------------------
>
>                 Key: HDFS-7608
>                 URL: https://issues.apache.org/jira/browse/HDFS-7608
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: dfsclient, fuse-dfs
>    Affects Versions: 2.3.0, 2.6.0
>         Environment: hdfs 2.3.0  hbase 0.98.6
>            Reporter: zhangshilong
>            Assignee: Xiaoyu Yao
>              Labels: patch
>         Attachments: HDFS-7608.0.patch, HDFS-7608.1.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Problem:
> HBase's compactSplitThread may block forever while reading datanode blocks.
> Debugging showed that the epoll wait timeout is set to 0, so the epoll wait 
> never times out.
> Cause: in HDFS 2.3.0,
> HBase uses DFSClient to read and write blocks.
> DFSClient creates a socket using newConnectedPeer(addr), but sets no read 
> or write timeout. 
> In 2.6.0, newConnectedPeer added a read timeout to deal with the 
> problem, but did not add a write timeout. Why was no write timeout added?
> I think NioInetPeer needs a default socket timeout, so that applications do 
> not need to add timeouts themselves. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
