[ 
https://issues.apache.org/jira/browse/HDDS-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719184#comment-17719184
 ] 

Nandakumar commented on HDDS-7985:
----------------------------------

[~NeilJoshi], in case of a disk failure on an SCM, we should decommission that 
SCM and then bootstrap it again. Cleaning up the storage location without 
decommissioning the node will cause issues, because the old details of that 
SCM are still present in the cluster.

> [SCM HA] On SCM Disk failure recovery causes Datanode Failure on startup 
> -------------------------------------------------------------------------
>
>                 Key: HDDS-7985
>                 URL: https://issues.apache.org/jira/browse/HDDS-7985
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Neil Joshi
>            Assignee: Neil Joshi
>            Priority: Major
>              Labels: pull-request-available
>
> Recovery from an SCM disk failure when no backup is available requires:
>  * Cleaning the _ozone.scm.db.dirs_ and _ozone.metadata.dirs_ locations
> and bootstrapping the SCM.
> Whether or not the SCM is primordial, an error occurs when starting a 
> datanode after the SCM has been recovered from a failed disk with no backup. 
>  
> Datanodes brought up after SCM disk failure recovery are unable to start due 
> to a CA certificate error stating that the number of certificates received 
> from the SCM is greater than the number expected:
> {code:java}
> ozonesecure-ha-datanode1-1  | 2023-02-17 00:46:40 INFO  HAUtils:457 - 
> Expected CA list size 4, where as received CA List size 5.{code}
> In this case when listing the certificates stored by the SCM, it reports a 
> total of 5 scm certificates after SCM2 recovers from disk failure:
>  
> {code:java}
> [email protected]
> [email protected]
> [email protected]
> [email protected]
> [email protected]
>  
> {code}
> It appears to have 2 entries for SCM 2 (the scm disk failure recovery node)
>  
> bash-4.2$ ozone admin cert list
> {code:java}
> Total 12 valid certificates: 
> SerialNumber      Valid From                     Expiry                         Subject
> 1                 Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, [email protected]
> 10760186198072    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, [email protected]
> 10779888473070    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=recon@recon
> 10780166036417    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f99f1a81-7cce-44c9-a09b-9f7bbc48b6ac, [email protected]
> 10788394717480    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=598be6bc-7d86-4cab-84dc-668a162a7ec2, [email protected]
> 10800769855768    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=dn@bd3138308a3f
> 10801305457014    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=dn@e4795cc77124
> 10801871334038    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=dn@3eb28ff965a1
> 10803980992569    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=om2
> 10804543987939    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=om3
> 10806118720884    Fri Feb 17 00:00:00 UTC 2023   Sat Feb 17 00:00:00 UTC 2024   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=f02b032a-7da0-4132-8a31-61c3d078e6cb, CN=om1
> 10932809284268    Fri Feb 17 00:00:00 UTC 2023   Mon Mar 27 00:00:00 UTC 2028   O=CID-abb46225-77ba-4132-ac6e-96792b40450c, OU=b4a175f3-c6a4-47fd-bcc5-c081b03de8c7, [email protected]
> {code}
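
The size mismatch reported in the datanode log, and the duplicate entry for 
the recovered SCM, can be illustrated with a small sketch. This is not the 
Ozone implementation: the function name check_ca_list, the (scm_id, host) 
pair representation, and the sample IDs and host names are all hypothetical, 
and treating any size mismatch as a failure is an assumption. The idea is 
that a host which appears with more than one SCM ID probably still has a 
stale certificate from before its storage was wiped.

```python
from collections import defaultdict


def check_ca_list(expected_size, received):
    """Compare the received CA list against the expected size.

    `received` is a list of (scm_id, host) pairs, one per CA certificate.
    Returns (size_ok, stale_hosts), where stale_hosts are hosts that
    appear under more than one SCM ID.
    """
    size_ok = len(received) == expected_size

    # Group the SCM IDs seen for each host.
    ids_by_host = defaultdict(set)
    for scm_id, host in received:
        ids_by_host[host].add(scm_id)

    # A host with several IDs was likely re-bootstrapped without
    # its old identity being removed from the cluster first.
    stale_hosts = sorted(h for h, ids in ids_by_host.items() if len(ids) > 1)
    return size_ok, stale_hosts


# Hypothetical data shaped like the report: 4 expected, 5 received.
received = [
    ("id-root", "root-ca"),
    ("id-1", "scm1"),
    ("id-2-old", "scm2"),
    ("id-3", "scm3"),
    ("id-2-new", "scm2"),
]
size_ok, stale_hosts = check_ca_list(4, received)
print(size_ok, stale_hosts)  # False ['scm2']
```

Decommissioning the failed SCM before bootstrapping it again, as suggested 
in the comment above, would remove the old entry so that the received list 
matches the expected size.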



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

