[ https://issues.apache.org/jira/browse/HDDS-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Rose resolved HDDS-13491.
-------------------------------
    Fix Version/s: 2.1.0
       Resolution: Fixed

> QUASI-CLOSED Container State Causes Checksum Mismatch After Replica Re-Replication
> ----------------------------------------------------------------------------------
>
>                 Key: HDDS-13491
>                 URL: https://issues.apache.org/jira/browse/HDDS-13491
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Bablu Raul
>            Priority: Major
>             Fix For: 2.1.0
>
>
> To achieve the quasi-closed state, I stopped DataNodes 15 and 8:
> {code:java}
> data8      0
> data16     976cd849  ← Original checksum
> data15     0 {code}
> After I ran the reconcile command, the data was correctly replicated from the healthy replica (data16) to the newly added DataNodes data2 and data20:
> {code:java}
> data2      0
> data16     976cd849
> data20     976cd849 {code}
> However, the checksum on DataNode 2 did not update automatically. I had to run the reconcile command multiple times before it eventually reflected the correct checksum (976cd849), which is not the expected behavior.
> {code:java}
> data2      bc9276dc  ← Incorrect
> data8      976cd849
> data15     976cd849 {code}
> Here, the DataNode 2 checksum has been updated to an incorrect value. Relevant entries from dn-container.log:
> {code:java}
> grep 10136 /var/log/hadoop-ozone/dn-container.log
> 2025-07-16 09:02:08,183 | INFO  | ID=10136 | Index=0 | BCSID=9 | State=QUASI_CLOSED | DataChecksum=0 |
> 2025-07-16 09:04:58,251 | WARN  | ID=10136 | Index=0 | BCSID=9 | State=QUASI_CLOSED | DataChecksum=976cd849 | Container data checksum updated from 0 to 976cd849 |
> 2025-07-16 09:05:03,707 | INFO  | ID=10136 | Index=0 | BCSID=9 | State=QUASI_CLOSED | DataChecksum=976cd849 | Container reconciled with peer ac0284ad-ffa6-461a-8fe8-8a1e5ff95206(data20/10.140.55.137). No change in checksum. |
> 2025-07-16 09:05:03,727 | INFO  | ID=10136 | Index=0 | BCSID=9 | State=QUASI_CLOSED | DataChecksum=976cd849 | Container reconciled with peer e8ceede0-ed4d-40a8-845d-5eead2abc4b9(data16/10.140.176.198). No change in checksum. |
> 2025-07-16 09:05:14,748 | INFO  | ID=10136 | Index=0 | BCSID=9 | State=QUASI_CLOSED | DataChecksum=976cd849 | Container reconciled with peer ac0284ad-ffa6-461a-8fe8-8a1e5ff95206(data20/10.140.55.137). No change in checksum. |
> 2025-07-16 09:05:14,770 | INFO  | ID=10136 | Index=0 | BCSID=9 | State=QUASI_CLOSED | DataChecksum=976cd849 | Container reconciled with peer e8ceede0-ed4d-40a8-845d-5eead2abc4b9(data16/10.140.176.198). No change in checksum. |
> 2025-07-16 09:06:05,774 | INFO  | ID=10136 | Index=0 | BCSID=9 | State=CLOSED | DataChecksum=976cd849 |
> 2025-07-16 09:06:24,154 | INFO  | ID=10136 | Index=0 | BCSID=9 | State=CLOSED | DataChecksum=976cd849 | Container reconciled with peer ac0284ad-ffa6-461a-8fe8-8a1e5ff95206(data20/10.140.55.137). No change in checksum. |
> 2025-07-16 09:06:24,181 | INFO  | ID=10136 | Index=0 | BCSID=9 | State=CLOSED | DataChecksum=976cd849 | Container reconciled with peer dba2a33f-41a2-4d7b-8671-6a8ab46464f7(data8/10.140.237.196). No change in checksum. |
> 2025-07-16 09:06:24,207 | INFO  | ID=10136 | Index=0 | BCSID=9 | State=CLOSED | DataChecksum=976cd849 | Container reconciled with peer ec0a9775-f5c6-481c-8484-44a967a0bd94(data15/10.140.119.137). No change in checksum. |
> 2025-07-16 09:06:24,211 | WARN  | ID=10136 | Index=0 | BCSID=9 | State=CLOSED | DataChecksum=bc9276dc | Container data checksum updated from 976cd849 to bc9276dc |
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
