[ 
https://issues.apache.org/jira/browse/HDFS-17801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987123#comment-17987123
 ] 

ASF GitHub Bot commented on HDFS-17801:
---------------------------------------

hfutatzhanghb opened a new pull request, #7774:
URL: https://github.com/apache/hadoop/pull/7774

   ### Description of PR
   Currently, there is no retry policy if we hit an IOException when creating 
a block reader to read an EC file. Consider the following case (using 
RS-6-3-1024k):
   
   If 4 to 6 of the datanodes holding the data blocks are very busy at the 
same time, createBlockReader will time out and the read will fail. EC reads 
should support a retry mechanism to mitigate such failures.
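
To make the scenario concrete, here is a minimal, self-contained sketch of the idea. It is not the code from this PR: `RetryCurrentNodeSketch`, `ReaderFactory`, `createWithRetry`, and `stripeReadable` are hypothetical names used only for illustration. It assumes that a retry means calling the same reader-creation step again on the same node, and it relies on the Reed-Solomon property that RS-6-3 can decode a stripe from any 6 of its 9 blocks.

```java
import java.io.IOException;

final class RetryCurrentNodeSketch {

    /** Hypothetical stand-in for the step that opens a block reader on one datanode. */
    @FunctionalInterface
    interface ReaderFactory<T> {
        T create() throws IOException;
    }

    /**
     * Calls the factory up to (1 + maxRetries) times against the same node and
     * rethrows the last IOException if every attempt fails.
     */
    static <T> T createWithRetry(ReaderFactory<T> factory, int maxRetries)
            throws IOException {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return factory.create();
            } catch (IOException e) {
                last = e; // possibly transient (e.g. a busy datanode); try the same node again
            }
        }
        throw last;
    }

    /** A stripe is decodable while at least dataUnits of the blocks are reachable. */
    static boolean stripeReadable(int dataUnits, int parityUnits, int unreachable) {
        return dataUnits + parityUnits - unreachable >= dataUnits;
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        // Simulated factory: the first attempt times out, the second succeeds.
        String reader = createWithRetry(() -> {
            if (calls[0]++ == 0) {
                throw new IOException("simulated timeout");
            }
            return "reader";
        }, 1);
        System.out.println(reader + " obtained after " + calls[0] + " attempts");

        // RS-6-3: losing 3 blocks is tolerable, losing 4 is not.
        System.out.println("3 unreachable, readable: " + stripeReadable(6, 3, 3));
        System.out.println("4 unreachable, readable: " + stripeReadable(6, 3, 4));
    }
}
```

With RS-6-3, four busy datanodes leave only five reachable blocks, one short of the six needed, which is why a single retry on a transiently busy node can turn a failed read into a successful one.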
   
   ### How was this patch tested?
   Add a unit test to reproduce the failure.




> EC: Reading support retryCurrentNode to avoid transient errors cause 
> application level failures
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17801
>                 URL: https://issues.apache.org/jira/browse/HDFS-17801
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: farmmamba
>            Assignee: farmmamba
>            Priority: Major
>              Labels: pull-request-available
>
> *Description of PR*
>   In the 3-replication read implementation, the retryCurrentNode mechanism 
> retries the current node when an IOException occurs.
> This is very useful for avoiding application-level failures caused by 
> transient errors (e.g. the Datanode could have closed the connection because 
> the client was idle for too long).  Please refer to the code below: 
> [https://github.com/apache/hadoop/blob/6eae1589aeea9bd9c6885e405bd9be5ef6199df7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L824-L828]
>   We should make EC reads support this mechanism as well. 
>   BTW, this issue is motivated by application failures in our cluster 
> after we changed the data from 3-rep to an EC policy.
> *How was this patch tested?*
> Add a unit test.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
