[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Masatake Iwasaki updated HDFS-4273:
-----------------------------------
          Resolution: Won't Fix
    Target Version/s: 2.0.3-alpha, 3.0.0  (was: 3.0.0, 2.0.3-alpha)
              Status: Resolved  (was: Patch Available)

I looked into the tests added by the .v8 patch.

{{TestDFSClientRetries#testDFSInputStreamReadRetryTime}} expects the client to always retry up to maxBlockAcquireFailures, but this is not the case. The client does not retry against the same node on ChecksumException. {{seekToNewSource}} returning false means there are no more candidate datanodes, and it is right to give up even if the retry count (failures) has not reached the maximum.

{{testSeekToNewSourcePastFileSize}} and {{testNegativeSeekToNewSource}}, added to {{TestSeekBug}}, call {{FSDataInputStream#seekToNewSource}} just after opening the file. This causes a NullPointerException because currentNode is not yet set in DFSInputStream. The tests passed after fixing this.

I am closing this issue as Won't Fix.

> Fix some issue in DFSInputstream
> --------------------------------
>
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch,
> HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch,
> HDFS-4273.v7.patch, HDFS-4273.v8.patch, TestDFSInputStream.java
>
>
> The following issues in DFSInputStream are addressed in this jira:
> 1. read may not retry enough in some cases, causing early failure.
> Assume the following call logic:
> {noformat}
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>      -> reader.doRead()
>      -> seekToNewSource() add currentNode to deadnode, wish to get a different datanode
>         -> blockSeekTo()
>            -> chooseDataNode()
>               -> block missing, clear deadNodes and pick the currentNode again
>         seekToNewSource() return false
>      readBuffer() re-throw the exception quit loop
> readWithStrategy() got the exception, and may fail the read call before tried MaxBlockAcquireFailures.
> {noformat}
> 2. In a multi-threaded scenario (like HBase), DFSInputStream.failures has a race
> condition: it is cleared to 0 while it is still in use by another thread. So it is
> possible that some read threads may never quit. Changing failures to a local
> variable solves this issue.
> 3. If the local datanode is added to deadNodes, it is not removed from
> deadNodes when the DN comes back alive. We need a way to remove the local datanode
> from deadNodes once it becomes live again.
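
Item 2 of the issue description (the race on DFSInputStream.failures) may be easier to follow with a small sketch. The code below is not taken from any of the attached patches. The class LocalFailureCounterSketch, the BlockSource interface, and the MAX_FAILURES constant are hypothetical stand-ins, and the read always fails on purpose. It only illustrates the proposed idea: when the failure counter is local to each read call, no other thread can reset it, so every reader gives up after its own bounded number of attempts.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

final class LocalFailureCounterSketch {
    // Hypothetical stand-in for the client's retry limit (maxBlockAcquireFailures in the issue).
    static final int MAX_FAILURES = 3;

    /** Hypothetical block source; a real client would contact a datanode here. */
    interface BlockSource {
        int read() throws IOException;
    }

    /** Retries a failing read, counting failures in a variable local to this call. */
    static int readWithLocalCounter(BlockSource source) throws IOException {
        int failures = 0; // local to this call, so no other thread can clear it
        while (true) {
            try {
                return source.read();
            } catch (IOException e) {
                failures++;
                if (failures >= MAX_FAILURES) {
                    throw new IOException("giving up after " + failures + " failures", e);
                }
            }
        }
    }

    /** Runs several reader threads against an always-failing source; returns total attempts. */
    static int totalAttempts(int threads) {
        AtomicInteger attempts = new AtomicInteger();
        BlockSource alwaysFails = () -> {
            attempts.incrementAndGet();
            throw new IOException("simulated read failure");
        };
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    readWithLocalCounter(alwaysFails);
                } catch (IOException expected) {
                    // Each thread stops after its own MAX_FAILURES attempts.
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for readers", e);
            }
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        // Four readers, each bounded by its own counter.
        System.out.println(totalAttempts(4));
    }
}
```

With a counter shared across threads, one reader resetting it to 0 could let another reader keep retrying indefinitely, which is the situation the description warns about. With the local counter, the total number of attempts here is always the number of threads times MAX_FAILURES.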