[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511012#comment-13511012 ]
Jing Zhao commented on HDFS-4273:
---------------------------------

I do not think DFSInputStream#read will be called concurrently, so resetting the failure variable should be correct. Clearing deadNodes also looks reasonable, especially when the block has multiple replicas: at that point you have no candidate nodes left to try, and some earlier temporary "deaths" may already have recovered.

> Problem in DFSInputStream read retry logic may cause early failure
> ------------------------------------------------------------------
>
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: TestDFSInputStream.java
>
>
> Assume the following call logic:
> {noformat}
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>      -> reader.doRead()
>      -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
>         -> blockSeekTo()
>            -> chooseDataNode()
>               -> block missing, clears deadNodes and picks currentNode again
>         seekToNewSource() returns false
>      readBuffer() rethrows the exception and quits the loop
> readWithStrategy() gets the exception, and may fail the read call before MaxBlockAcquireFailures attempts have been made.
> {noformat}
> Some issues with this logic:
> 1. seekToNewSource() is broken because deadNodes may be cleared in the middle of the call.
> 2. The variable "int retries=2" in readWithStrategy seems to conflict with MaxBlockAcquireFailures; should it be removed?
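
To make the call sequence above concrete, here is a minimal, self-contained Java model of the retry behaviour. It is only a sketch under assumptions, not the actual DFSInputStream code: the class RetrySketch, the attempts() helper and its stopWhenSameNode flag are hypothetical, every read is assumed to fail, and chooseDataNode() is reduced to "pick the first replica not in deadNodes, otherwise count one failure, clear deadNodes and try again". Only the names deadNodes, chooseDataNode, seekToNewSource and maxBlockAcquireFailures are borrowed from the issue text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the retry flow described in the issue (hypothetical names).
class RetrySketch {
    final List<String> replicas;            // datanodes holding the block
    final Set<String> deadNodes = new HashSet<>();
    final int maxBlockAcquireFailures;      // failure budget
    int failures = 0;
    String currentNode;

    RetrySketch(List<String> replicas, int maxBlockAcquireFailures) {
        this.replicas = replicas;
        this.maxBlockAcquireFailures = maxBlockAcquireFailures;
    }

    // Picks the first replica not marked dead. If every replica is dead,
    // counts one failure, clears deadNodes and tries again (assumed behaviour).
    String chooseDataNode() {
        while (true) {
            for (String node : replicas) {
                if (!deadNodes.contains(node)) {
                    return node;
                }
            }
            failures++;
            if (failures >= maxBlockAcquireFailures) {
                throw new IllegalStateException("could not obtain block");
            }
            deadNodes.clear();
        }
    }

    // Marks the current node dead and reports whether a different node was chosen.
    boolean seekToNewSource() {
        String oldNode = currentNode;
        deadNodes.add(oldNode);
        currentNode = chooseDataNode();
        return !currentNode.equals(oldNode);
    }

    // Counts read attempts when every read fails. With stopWhenSameNode set,
    // the caller gives up as soon as seekToNewSource() returns false, which
    // models the early failure; otherwise it keeps retrying until the
    // failure budget is exhausted.
    static int attempts(List<String> replicas, int maxFailures, boolean stopWhenSameNode) {
        RetrySketch s = new RetrySketch(replicas, maxFailures);
        s.currentNode = s.chooseDataNode();
        int attempts = 0;
        while (true) {
            attempts++; // the read on currentNode fails
            try {
                boolean switched = s.seekToNewSource();
                if (!switched && stopWhenSameNode) {
                    return attempts;
                }
            } catch (IllegalStateException budgetExhausted) {
                return attempts;
            }
        }
    }

    public static void main(String[] args) {
        List<String> oneReplica = Arrays.asList("dn1");
        System.out.println(attempts(oneReplica, 3, true));  // prints 1
        System.out.println(attempts(oneReplica, 3, false)); // prints 3
    }
}
```

In this model, a block with a single replica and a budget of 3 fails after one attempt when the caller stops on a false return from seekToNewSource(), because clearing deadNodes hands back the node that was just marked dead. If the caller lets the failure counter decide instead, all 3 attempts are made.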