[ 
https://issues.apache.org/jira/browse/HDDS-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-5317:
-------------------------------------
    Description: 
GetSCMCertificate can be sent to a non-leader SCM, but the rootCA runs only on 
the primary SCM.
So when an SCM is being bootstrapped and it connects first to another 
bootstrapped SCM, the call fails with an SCMSecurityResponse whose status is 
NOT_A_PRIMARY_SCM. Because the server returns a normal response rather than 
throwing an exception, failOver does not happen.

*SCMSecurityProtocolClientSideTranslatorPB*
{code:java}
  private SCMSecurityResponse handleError(SCMSecurityResponse resp)
      throws SCMSecurityException {
    if (resp.getStatus() != SCMSecurityProtocolProtos.Status.OK) {
      throw new SCMSecurityException(resp.getMessage(),
          SCMSecurityException.ErrorCode.values()[resp.getStatus().ordinal()]);
    }
    return resp;
  }
{code}

One possible fix is on the server side: when the server sees an 
SCMSecurityException with errorCode NOT_A_PRIMARY_SCM, it responds with a 
RetriableWithFailOverException instead. The FailOverProxyProvider then 
performs failOver and retries the request on the next SCM.
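
The idea can be illustrated with a small, self-contained sketch. It is not the Ozone implementation: every type below (ErrorCode, SCMSecurityException, RetriableWithFailOverException, Scm) is a simplified stand-in declared only for this example, and the helper names translate, serverSide and callWithFailover are hypothetical. It only shows the intended behaviour, assuming the client's failover logic reacts to a thrown retriable exception and not to an error status inside a response.

```java
import java.io.IOException;
import java.util.List;

public class FailoverSketch {
  // Simplified stand-in for SCMSecurityException.ErrorCode.
  public enum ErrorCode { OK, NOT_A_PRIMARY_SCM, DEFAULT }

  // Simplified stand-in for the real SCMSecurityException.
  public static class SCMSecurityException extends IOException {
    private final ErrorCode errorCode;
    public SCMSecurityException(String msg, ErrorCode errorCode) {
      super(msg);
      this.errorCode = errorCode;
    }
    public ErrorCode getErrorCode() { return errorCode; }
  }

  // Stand-in for the RetriableWithFailOverException named in the issue.
  public static class RetriableWithFailOverException extends IOException {
    public RetriableWithFailOverException(Exception cause) { super(cause); }
  }

  // Hypothetical minimal view of an SCM security endpoint.
  public interface Scm {
    String getScmCertificate(String csr) throws IOException;
  }

  // Server side: only NOT_A_PRIMARY_SCM becomes a failover signal.
  public static IOException translate(SCMSecurityException e) {
    if (e.getErrorCode() == ErrorCode.NOT_A_PRIMARY_SCM) {
      return new RetriableWithFailOverException(e);
    }
    return e;
  }

  // Toy server: a non-primary SCM cannot sign, so it raises the error,
  // which is translated before it leaves the server.
  public static Scm serverSide(boolean isPrimary) {
    return csr -> {
      try {
        if (!isPrimary) {
          throw new SCMSecurityException("not the primary SCM",
              ErrorCode.NOT_A_PRIMARY_SCM);
        }
        return "cert-for-" + csr;
      } catch (SCMSecurityException e) {
        throw translate(e);
      }
    };
  }

  // Toy client: moves to the next SCM only on the retriable exception;
  // any other IOException propagates to the caller unchanged.
  public static String callWithFailover(List<Scm> scms, String csr)
      throws IOException {
    IOException last = null;
    for (Scm scm : scms) {
      try {
        return scm.getScmCertificate(csr);
      } catch (RetriableWithFailOverException e) {
        last = e;
      }
    }
    throw last != null ? last : new IOException("no SCM configured");
  }

  public static void main(String[] args) throws IOException {
    // First SCM is a bootstrapped (non-primary) one, second is the primary.
    List<Scm> scms = List.of(serverSide(false), serverSide(true));
    System.out.println(callWithFailover(scms, "csr")); // prints "cert-for-csr"
  }
}
```

In this sketch the bootstrapping SCM reaches the non-primary SCM first, gets the retriable exception, and succeeds against the primary on the next attempt. With the behaviour described above (an error status inside a normal response), the loop would never see an exception and would stop at the first SCM.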

The exception message is available in comments.


> BootStrapped SCM fails to bootstrap if it connects to another bootstrapped 
> SCM first.
> -------------------------------------------------------------------------------------
>
>                 Key: HDDS-5317
>                 URL: https://issues.apache.org/jira/browse/HDDS-5317
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM HA, Security
>            Reporter: Bharat Viswanadham
>            Assignee: Bharat Viswanadham
>            Priority: Blocker
>
> GetSCMCertificate can be sent to a non-leader SCM, but the rootCA runs only on 
> the primary SCM.
> So when an SCM is being bootstrapped and it connects first to another 
> bootstrapped SCM, the call fails with an SCMSecurityResponse whose status is 
> NOT_A_PRIMARY_SCM. Because the server returns a normal response rather than 
> throwing an exception, failOver does not happen.
> *SCMSecurityProtocolClientSideTranslatorPB*
> {code:java}
>   private SCMSecurityResponse handleError(SCMSecurityResponse resp)
>       throws SCMSecurityException {
>     if (resp.getStatus() != SCMSecurityProtocolProtos.Status.OK) {
>       throw new SCMSecurityException(resp.getMessage(),
>           SCMSecurityException.ErrorCode.values()[resp.getStatus().ordinal()]);
>     }
>     return resp;
>   }
> {code}
> One possible fix is on the server side: when the server sees an 
> SCMSecurityException with errorCode NOT_A_PRIMARY_SCM, it responds with a 
> RetriableWithFailOverException instead. The FailOverProxyProvider then 
> performs failOver and retries the request on the next SCM.
> The exception message is available in comments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
