[ 
https://issues.apache.org/jira/browse/HADOOP-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186697#comment-16186697
 ] 

Xiao Chen commented on HADOOP-14521:
------------------------------------

Thanks for the ping, Arun.

I'm hoping to get this in soon. Does any watcher have cycles to review? For 
convenience, below is the diff of LoadBalancingKMSClientProvider (LBKMSCP) 
between patch 11 and the previously committed patch 10.
{noformat}
> --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java
> +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java
109c109
< @@ -80,24 +87,79 @@ public LoadBalancingKMSClientProvider(KMSClientProvider[] providers,
---
> @@ -80,24 +87,82 @@ public LoadBalancingKMSClientProvider(KMSClientProvider[] providers,
171c171,174
< +        if (action.action == RetryAction.RetryDecision.FAIL) {
---
> +        // make sure each provider is tried at least once, to keep behavior
> +        // compatible with earlier versions of LBKMSCP
> +        if (action.action == RetryAction.RetryDecision.FAIL
> +            && numFailovers >= providers.length - 1) {
193c196
{noformat}
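
To make the effect of the added condition easier to see, here is a minimal, self-contained sketch of a failover loop that gives up only when the retry policy says FAIL and every provider has been tried at least once. It is not the actual LBKMSCP code: FailoverSketch, ProviderOp, Policy, saysFail and doOp are hypothetical names invented for illustration, providers are reduced to plain strings, and only the numFailovers check mirrors the patch.

```java
import java.io.IOException;
import java.util.List;

// Illustration only: every name here is hypothetical, not Hadoop API.
class FailoverSketch {

    /** Stand-in for one KMS operation executed against a single provider. */
    interface ProviderOp<T> {
        T call(String provider) throws IOException;
    }

    /** Stand-in for a retry policy: true means the policy's decision is FAIL. */
    interface Policy {
        boolean saysFail(IOException e, int numFailovers);
    }

    static <T> T doOp(List<String> providers, Policy policy, ProviderOp<T> op)
            throws IOException {
        int numFailovers = 0;
        while (true) {
            // Round-robin over the providers, advancing once per failure.
            String provider = providers.get(numFailovers % providers.size());
            try {
                return op.call(provider);
            } catch (IOException e) {
                // Mirrors the patch 11 condition: give up only when the policy
                // says FAIL *and* each provider has been tried at least once.
                if (policy.saysFail(e, numFailovers)
                        && numFailovers >= providers.size() - 1) {
                    throw e;
                }
                numFailovers++;
            }
        }
    }
}
```

With three providers and a policy that always answers FAIL, this sketch still attempts all three before rethrowing the last exception, which is the compatibility behavior the new comment in the patch describes. The sketch assumes the policy eventually says FAIL; otherwise the loop keeps cycling through the providers.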

> KMS client needs retry logic
> ----------------------------
>
>                 Key: HADOOP-14521
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14521
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>         Attachments: HADOOP-14521.09.patch, HADOOP-14521.11.patch, 
> HADOOP-14521-branch-2.8.002.patch, HADOOP-14521-branch-2.8.2.patch, 
> HADOOP-14521-trunk-10.patch, HDFS-11804-branch-2.8.patch, 
> HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, HDFS-11804-trunk-3.patch, 
> HDFS-11804-trunk-4.patch, HDFS-11804-trunk-5.patch, HDFS-11804-trunk-6.patch, 
> HDFS-11804-trunk-7.patch, HDFS-11804-trunk-8.patch, HDFS-11804-trunk.patch
>
>
> The KMS client appears to have no retry logic at all.  It is completely 
> decoupled from the IPC retry logic.  This has major impacts if the KMS is 
> unreachable for any reason, including but not limited to network connection 
> issues, timeouts, and the +restart during an upgrade+.
> This has some major ramifications:
> # Jobs may fail to submit, although Oozie resubmit logic should mask it
> # Non-Oozie launchers may experience higher failure rates if they do not 
> already have retry logic
> # Tasks reading EZ files will fail, though this will probably be masked by 
> framework reattempts
> # EZ file creation fails after creating a 0-length file: the client receives 
> the EDEK in the create response, then fails when decrypting the EDEK
> # Bulk hadoop fs copies, and maybe distcp, will prematurely fail



