[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716690#comment-16716690 ]
Kitti Nanasi commented on HDFS-14134:
-------------------------------------

Thanks for the new patch [~lukmajercak]! It looks better regarding retrying on non-remote IOExceptions, but there is one thing I don't understand, which I think is wrong in the PDF as well. In the case of a remote IOException, we should retry if the operation is idempotent, not the opposite. So instead of this code:
{code:java}
else if (e instanceof IOException) {
  if (e instanceof RemoteException && isIdempotentOrAtMostOnce) {
    return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
        "Remote exception and the invoked method is idempotent "
            + "or at most once.");
  }
  return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
      getFailoverOrRetrySleepTime(failovers));
}
{code}
I think it should look like this:
{code:java}
else if (e instanceof IOException) {
  if (e instanceof RemoteException && !isIdempotentOrAtMostOnce) {
    return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
        "Remote exception and the invoked method is not idempotent "
            + "or at most once.");
  }
  return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
      getFailoverOrRetrySleepTime(failovers));
}
{code}
What do you think [~lukmajercak]?

> Idempotent operations throwing RemoteException should not be retried by the
> client
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-14134
>                 URL: https://issues.apache.org/jira/browse/HDFS-14134
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, hdfs-client, ipc
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>            Priority: Critical
>        Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch,
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch,
> HDFS-14134_retrypolicy_change_proposal.pdf
>
> Currently, some operations that throw IOException on the NameNode are
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail
> fast.
> For example, when calling getXAttr("user.some_attr", "file") where the file
> does not have the attribute, the NN throws an IOException with the message
> "could not find attr". The current client retry policy determines the action
> for that to be FAILOVER_AND_RETRY. The client then fails over and retries
> until it reaches the maximum number of retries. The client should be able to
> tell that this exception is normal and fail fast.
> Moreover, even if the action were FAIL, the RetryInvocationHandler looks at
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes
> precedence over the FAIL action.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
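The condition being debated can be sanity-checked in isolation. Below is a minimal, self-contained sketch of the decision logic from the suggested (negated) branch; the `Decision` enum and the local `RemoteException` stand-in are simplifications for illustration, not the actual org.apache.hadoop.io.retry or org.apache.hadoop.ipc classes:

```java
import java.io.IOException;

public class RetryDecisionSketch {

    // Simplified stand-in for Hadoop's RetryAction.RetryDecision values.
    enum Decision { FAIL, FAILOVER_AND_RETRY }

    // Simplified stand-in for org.apache.hadoop.ipc.RemoteException:
    // an IOException that originated on the server side.
    static class RemoteException extends IOException {
        RemoteException(String msg) { super(msg); }
    }

    /**
     * Models the suggested branch: a RemoteException on a NON-idempotent
     * call fails fast (retrying could re-execute a mutation on the server);
     * every other IOException, including a RemoteException on an idempotent
     * or at-most-once call, fails over and retries.
     */
    static Decision decide(IOException e, boolean isIdempotentOrAtMostOnce) {
        if (e instanceof RemoteException && !isIdempotentOrAtMostOnce) {
            return Decision.FAIL;
        }
        return Decision.FAILOVER_AND_RETRY;
    }

    public static void main(String[] args) {
        // Idempotent call hitting a server-side exception: retried.
        System.out.println(decide(
            new RemoteException("could not find attr"), true));
        // Non-idempotent call hitting a server-side exception: fails fast.
        System.out.println(decide(
            new RemoteException("lease already exists"), false));
        // Local IOException (e.g. connection reset): fails over and retries.
        System.out.println(decide(
            new IOException("connection reset"), true));
    }
}
```

Under this model the original patch's un-negated condition would do the reverse: fail fast exactly on the idempotent calls, which is the asymmetry the comment above points out.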