[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716690#comment-16716690
 ] 

Kitti Nanasi commented on HDFS-14134:
-------------------------------------

Thanks for the new patch [~lukmajercak]!

The retry behaviour for non-remote IOExceptions looks better now, but there is 
one thing I don't understand, which I think is wrong in the pdf as well. In the 
case of a remote IOException, we should retry if the operation is idempotent, 
not the opposite.
 So instead of this code:
{code:java}
else if (e instanceof IOException) {
        if (e instanceof RemoteException && isIdempotentOrAtMostOnce) {
          return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
              "Remote exception and the invoked method is idempotent " +
                  "or at most once.");
        }
        return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
            getFailoverOrRetrySleepTime(failovers));
}
{code}
I think it should look like this:
{code:java}
else if (e instanceof IOException) {
        if (e instanceof RemoteException && !isIdempotentOrAtMostOnce) {
          return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
              "Remote exception and the invoked method is not idempotent " +
                  "or at most once.");
        }
        return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
            getFailoverOrRetrySleepTime(failovers));
}
{code}
What do you think [~lukmajercak]?
  

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-14134
>                 URL: https://issues.apache.org/jira/browse/HDFS-14134
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, hdfs-client, ipc
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>            Priority: Critical
>         Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr(file, "user.some_attr") where the file 
> does not have the attribute, the NN throws an IOException with the message 
> "could not find attr". The current client retry policy determines the action 
> for that to be FAILOVER_AND_RETRY. The client then fails over and retries 
> until it reaches the maximum number of retries. Ideally, the client should be 
> able to tell that this exception is expected and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.
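The precedence problem described in the last paragraph of the issue can be sketched as follows. This is a simplified, hypothetical stand-in, not the actual RetryInvocationHandler code: the {{Decision}} enum and {{combine}} method are illustrative only, with enum ordinal mirroring the described precedence where FAILOVER_AND_RETRY outranks FAIL.
{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RetryPrecedenceSketch {

    // Ordered so that a higher ordinal means higher precedence,
    // mirroring the described behavior: FAILOVER_AND_RETRY wins over FAIL.
    enum Decision { FAIL, RETRY, FAILOVER_AND_RETRY }

    // Combine the per-request retry decisions by taking the highest-precedence
    // one, as the issue describes the handler effectively doing.
    static Decision combine(List<Decision> decisions) {
        return decisions.stream()
            .max(Comparator.naturalOrder())
            .orElse(Decision.FAIL);
    }

    public static void main(String[] args) {
        // One policy wants to fail fast, another wants to fail over:
        // the combined action is FAILOVER_AND_RETRY, so the client keeps
        // failing over even though one policy decided FAIL.
        Decision combined = combine(
            Arrays.asList(Decision.FAIL, Decision.FAILOVER_AND_RETRY));
        System.out.println(combined); // FAILOVER_AND_RETRY
    }
}
{code}
This is why, per the issue, returning FAIL from the policy is not by itself enough to make the client stop retrying.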



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
