[ 
https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-4273:
--------------------------------

    Description: 
Follow issues in DFSInputStream is address in this jira:
1. read may not retry enough in some cases cause early failure
Assume the following call logic
{noformat} 
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
     -> reader.doRead()
     -> seekToNewSource() add currentNode to deadnode, wish to get a different 
datanode
        -> blockSeekTo()
           -> chooseDataNode()
              -> block missing, clear deadNodes and pick the currentNode again
        seekToNewSource() return false
     readBuffer() re-throw the exception quit loop
readWithStrategy() got the exception,  and may fail the read call before tried 
MaxBlockAcquireFailures.
{noformat} 

2. In multi-threaded scenario(like hbase), DFSInputStream.failures has race 
condition, it cleared to 0 when it is still used by other thread. So it is 
possible that  some read thread may never quit.

3. 

  was:
Assume the following call logic
{noformat} 
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
     -> reader.doRead()
     -> seekToNewSource() add currentNode to deadnode, wish to get a different 
datanode
        -> blockSeekTo()
           -> chooseDataNode()
              -> block missing, clear deadNodes and pick the currentNode again
        seekToNewSource() return false
     readBuffer() re-throw the exception quit loop
readWithStrategy() got the exception,  and may fail the read call before tried 
MaxBlockAcquireFailures.
{noformat} 
some issues of the logic:
1. seekToNewSource() logic is broken because it may clear deadNodes in the 
middle.
2. the variable "int retries=2" in readWithStrategy seems have conflict with 
MaxBlockAcquireFailures, should it be removed?



> Fix some issue in DFSInputstream
> --------------------------------
>
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, 
> HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch, 
> HDFS-4273.v7.patch, TestDFSInputStream.java
>
>
> Follow issues in DFSInputStream is address in this jira:
> 1. read may not retry enough in some cases cause early failure
> Assume the following call logic
> {noformat} 
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>      -> reader.doRead()
>      -> seekToNewSource() add currentNode to deadnode, wish to get a 
> different datanode
>         -> blockSeekTo()
>            -> chooseDataNode()
>               -> block missing, clear deadNodes and pick the currentNode again
>         seekToNewSource() return false
>      readBuffer() re-throw the exception quit loop
> readWithStrategy() got the exception,  and may fail the read call before 
> tried MaxBlockAcquireFailures.
> {noformat} 
> 2. In multi-threaded scenario(like hbase), DFSInputStream.failures has race 
> condition, it cleared to 0 when it is still used by other thread. So it is 
> possible that  some read thread may never quit.
> 3. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to