[ https://issues.apache.org/jira/browse/HDFS-17801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
farmmamba updated HDFS-17801:
-----------------------------
    Description: 
Currently, there is no retry policy if we hit an IOException when creating a block reader to read an EC file. Consider the following case (using RS-6-3-1024k):
The datanodes of the first 4 to 6 data blocks are very busy at the same time, so createBlockReader times out. This causes the read to fail. We should make EC reads support a retry mechanism to mitigate such read failures.

  was:
*Description of PR*

Under the 3-replication read implementation, there is the retryCurrentNode mechanism for when an IOException occurs. This is very useful for avoiding application-level failures due to transient errors (e.g. the Datanode could have closed the connection because the client was idle for too long).

Please refer to the code below:

[https://github.com/apache/hadoop/blob/6eae1589aeea9bd9c6885e405bd9be5ef6199df7/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L824-L828]

We should make EC reads support this mechanism as well.

BTW, this issue is motivated by application failures on our cluster after we changed data from 3-replication to an EC policy.

*How was this patch tested?*

Added a unit test.


> EC: Reading support retryCurrentNode to avoid transient errors cause application level failures
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17801
>                 URL: https://issues.apache.org/jira/browse/HDFS-17801
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: farmmamba
>            Assignee: farmmamba
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, there is no retry policy if we hit an IOException when creating a
> block reader to read an EC file. Consider the following case (using RS-6-3-1024k):
> The datanodes of the first 4 to 6 data blocks are very busy at the same time,
> so createBlockReader times out. This causes the read to fail. We should make
> EC reads support a retry mechanism to mitigate such read failures.
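
To illustrate the kind of retry the description asks for, here is a minimal sketch of retrying an operation that can fail with a transient IOException. It is not the actual patch and does not use Hadoop's API: RetrySketch, IOAction and withRetry are hypothetical names, and the action passed in merely stands in for something like createBlockReader. How many attempts to allow and whether to retry the same datanode are assumptions here.

```java
import java.io.IOException;

public class RetrySketch {

    /** Hypothetical stand-in for an operation such as creating a block reader. */
    @FunctionalInterface
    public interface IOAction<T> {
        T run() throws IOException;
    }

    /**
     * Runs the action up to maxAttempts times and returns the first successful
     * result. If every attempt throws an IOException, the last one is rethrown.
     */
    public static <T> T withRetry(IOAction<T> action, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (IOException e) {
                // Assumed transient (e.g. a busy datanode timing out): remember
                // the failure and try again while attempts remain.
                last = e;
            }
        }
        throw last;
    }
}
```

With maxAttempts set to 2, a single transient failure is absorbed by one retry, while a persistent failure still surfaces to the caller as an IOException.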