[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Binglin Chang updated HDFS-4273:
--------------------------------

    Description: 
The following issues in DFSInputStream are addressed in this JIRA:

1. A read may not retry enough times in some cases, causing an early failure.
Assume the following call logic:
{noformat}
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
     -> reader.doRead()
     -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
        -> blockSeekTo()
           -> chooseDataNode()
              -> block missing: clears deadNodes and picks currentNode again
        seekToNewSource() returns false
     readBuffer() rethrows the exception and quits the loop
readWithStrategy() gets the exception and may fail the read call before MaxBlockAcquireFailures attempts have been made.
{noformat}

2. In a multi-threaded scenario (like HBase), DFSInputStream.failures has a race condition: it is cleared to 0 while another thread is still using it, so some read threads may never quit.

3.

  was:
Assume the following call logic:
{noformat}
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
     -> reader.doRead()
     -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
        -> blockSeekTo()
           -> chooseDataNode()
              -> block missing: clears deadNodes and picks currentNode again
        seekToNewSource() returns false
     readBuffer() rethrows the exception and quits the loop
readWithStrategy() gets the exception and may fail the read call before MaxBlockAcquireFailures attempts have been made.
{noformat}
Some issues with this logic:
1. The seekToNewSource() logic is broken because deadNodes may be cleared in the middle of it.
2. The variable "int retries=2" in readWithStrategy seems to conflict with MaxBlockAcquireFailures; should it be removed?
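To make item 1 of the current description concrete, here is a minimal Java sketch of a read loop that only gives up after MaxBlockAcquireFailures rounds. It is not the HDFS-4273 patch and not the DFSInputStream source: NodeReader, RetrySketch, chooseNode() and the string datanode names are hypothetical stand-ins. Unlike the call logic quoted in the description, deadNodes is cleared only after every replica has failed in a round, and each such round counts as one acquire failure, so a single failed attempt to switch sources cannot end the read early.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for reading a block from one datanode; not an HDFS API. */
interface NodeReader {
    byte[] read(String node) throws IOException;
}

/**
 * Sketch of a retry loop (hypothetical names, not the DFSInputStream source).
 * A failed replica is marked dead so the next attempt tries a different one;
 * deadNodes is cleared only once every replica has failed, and each such
 * round counts as one block-acquire failure.
 */
class RetrySketch {
    private final List<String> replicas;
    private final int maxBlockAcquireFailures;
    private final NodeReader reader;

    RetrySketch(List<String> replicas, int maxBlockAcquireFailures, NodeReader reader) {
        this.replicas = new ArrayList<>(replicas);
        this.maxBlockAcquireFailures = maxBlockAcquireFailures;
        this.reader = reader;
    }

    /** Returns the first replica not in deadNodes, or null if all are dead. */
    private String chooseNode(Set<String> deadNodes) {
        for (String node : replicas) {
            if (!deadNodes.contains(node)) {
                return node;
            }
        }
        return null;
    }

    byte[] read() throws IOException {
        Set<String> deadNodes = new HashSet<>();
        int failures = 0;
        IOException lastError = null;
        while (true) {
            String node = chooseNode(deadNodes);
            if (node == null) {
                // Every replica failed in this round: count it, then start a fresh round.
                failures++;
                if (failures >= maxBlockAcquireFailures) {
                    throw new IOException(
                        "Could not read block after " + failures + " rounds", lastError);
                }
                deadNodes.clear();
                continue;
            }
            try {
                return reader.read(node);
            } catch (IOException e) {
                lastError = e;
                deadNodes.add(node); // prefer a different replica on the next attempt
            }
        }
    }
}
```

With two replicas that always fail and a limit of 3, this sketch makes six read attempts before throwing; if the second replica recovers on its second try, the read succeeds in the second round instead of failing after the first.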
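Item 2 is a shared-state problem: if the failure count is an instance field of a stream that several threads use, one thread resetting it to 0 can keep another thread's count from ever reaching the limit. One way to avoid that is to keep the count in a local variable of each read call, as in the minimal sketch below. BlockSource and SharedStreamSketch are hypothetical names, and this illustrates the idea only; it is not claimed to be how the attached patches change DFSInputStream.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical source of block bytes; not an HDFS API. */
interface BlockSource {
    byte[] fetch() throws IOException;
}

/**
 * Sketch of a stream object shared by several reader threads (hypothetical
 * names). The failure count is a local variable of read(), so no thread can
 * reset another thread's count while that thread is still retrying.
 */
class SharedStreamSketch {
    private final BlockSource source;
    private final int maxBlockAcquireFailures;
    /** Attempts across all threads; for observation only, not for control flow. */
    final AtomicInteger totalAttempts = new AtomicInteger();

    SharedStreamSketch(BlockSource source, int maxBlockAcquireFailures) {
        this.source = source;
        this.maxBlockAcquireFailures = maxBlockAcquireFailures;
    }

    byte[] read() throws IOException {
        int failures = 0; // per-call state, confined to the calling thread
        while (true) {
            totalAttempts.incrementAndGet();
            try {
                return source.fetch();
            } catch (IOException e) {
                failures++;
                if (failures >= maxBlockAcquireFailures) {
                    throw new IOException("Giving up after " + failures + " failures", e);
                }
            }
        }
    }
}
```

Because failures is local, each of N concurrent callers against a source that always fails gives up after exactly maxBlockAcquireFailures attempts, whatever the other threads do. The shared totalAttempts counter exists only so the behavior can be observed.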
> Fix some issue in DFSInputstream
> --------------------------------
>
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch,
> HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch,
> HDFS-4273.v7.patch, TestDFSInputStream.java
>

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)