[ https://issues.apache.org/jira/browse/HDFS-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010718#comment-14010718 ]
Liang Xie commented on HDFS-6286:
---------------------------------

bq. There is a high overhead to adding communication between threads to every read, and I don't think we want this in short-circuit reads (which is an optimization, after all)

Indeed. I am fine with my prototype not going into the community codebase; I only wanted to give a heads-up about this corner case :) It does not help regular request performance; it only guards against long-tail requests.

bq. If we create an extra thread per DFSInputStream using SCR

I used a thread pool, so the overhead should be acceptable. When a timeout or execution exception is caught, the upper layer immediately treats that DN as a dead node, so as far as I understand the pool should not stall.

bq. I am going to create a JIRA to implement hedged reads for the non-pread case. I think that will be a better general solution that doesn't have the above-mentioned problems.

Cool. I share some of your concerns, and I fully agree that the community code needs a more general solution, such as hedged reads for regular reads. Let's work on HDFS-6450 now and close this one.

> adding a timeout setting for local read io
> ------------------------------------------
>
>                 Key: HDFS-6286
>                 URL: https://issues.apache.org/jira/browse/HDFS-6286
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>
> Currently, if a write or remote read hits a sick disk,
> DFSClient.hdfsTimeout bounds the time the caller waits before the call
> returns, but it does not apply to local reads. Take an HBase scan for example:
> DFSInputStream.read -> readWithStrategy -> readBuffer ->
> BlockReaderLocal.read -> dataIn.read -> FileChannelImpl.read
> If it hits a bad disk, the low-level read I/O can take tens of seconds, and
> worse, DFSInputStream.read holds a lock for the whole time.
> To my knowledge, there is no good mechanism to cancel a running read I/O
> (please correct me if I am wrong), so my proposal is to wrap the read request
> in a future and set a timeout on it. If the threshold is reached, we could
> then add the local node to the dead nodes.
> Any thoughts?
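
The proposal in the description (wrap the local read in a future, bound it with a timeout, and mark the node dead once the threshold is reached) could look roughly like the sketch below. It is only an illustration, not the prototype discussed in this issue: `TimedLocalRead`, the `deadNodes` set, and the `Callable` standing in for the `BlockReaderLocal.read` call are hypothetical names, and the code uses only `java.util.concurrent`, not any HDFS API.

```java
import java.util.OptionalInt;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of a timeout around a blocking local read. */
final class TimedLocalRead {
    // Shared, bounded pool: no extra thread is created per stream.
    private final ExecutorService pool;
    private final long timeoutMillis;
    // Stand-in for the client's dead-node bookkeeping (an assumption).
    private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();

    TimedLocalRead(int threads, long timeoutMillis) {
        this.pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "timed-local-read");
            t.setDaemon(true); // a stuck read must not block JVM exit
            return t;
        });
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Runs the blocking read on the pool and waits at most timeoutMillis.
     * Returns the byte count, or empty if the read timed out or failed,
     * in which case the node is recorded as dead so that the caller can
     * retry against another replica.
     */
    OptionalInt read(String node, Callable<Integer> localRead)
            throws InterruptedException {
        Future<Integer> future = pool.submit(localRead);
        try {
            return OptionalInt.of(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            deadNodes.add(node);
            return OptionalInt.empty();
        }
    }

    boolean isDead(String node) {
        return deadNodes.contains(node);
    }

    void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        TimedLocalRead reader = new TimedLocalRead(2, 100);
        // A healthy disk answers at once; a sick disk is simulated with sleep.
        OptionalInt fast = reader.read("dn-fast", () -> 4096);
        OptionalInt slow = reader.read("dn-slow", () -> {
            Thread.sleep(5_000);
            return 4096;
        });
        System.out.println(fast + " " + slow + " " + reader.isDead("dn-slow"));
        reader.shutdown();
    }
}
```

Note that the caller gets control back after the timeout, but `cancel(true)` is only a best-effort interrupt: a worker blocked on a truly sick disk may stay occupied, so the pool size bounds how many such reads can be outstanding. The sketch also adds a thread hand-off to every read, which is the overhead raised in the first quoted comment.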