Priyesh K created HDDS-15610:
--------------------------------

             Summary: SCM: Pending deletion block size metrics go negative 
causing corrupted Recon capacity display
                 Key: HDDS-15610
                 URL: https://issues.apache.org/jira/browse/HDDS-15610
             Project: Apache Ozone
          Issue Type: Bug
          Components: SCM, SCM HA
            Reporter: Priyesh K
            Assignee: Priyesh K


Problem: During long-running key deletion, {{ozone admin scm deletedBlocksTxn 
summary}} reports negative {{totalBlockSize}} and {{totalBlockReplicatedSize}} 
(e.g., {{-1996386304}} bytes). Recon reads this metric for cluster capacity, 
resulting in corrupted UI values such as {{-631 MB}}.

Root Cause: A two-release deployment gap exists between when 
{{STORAGE_SPACE_DISTRIBUTION}} size fields were added to the 
{{DeletedBlocksTransaction}} proto (in {{constructNewTransaction}}) and 
when the summary accounting code was added to 
{{addTransactions}}/{{removeTransactions}}. A leader running the older 
release wrote TXs with size fields into the deletedBlocks CF but never wrote a 
summary to {{statefulConfigTable}}. When a leader running the newer release 
took over, {{initDataDistributionData()}} found no persisted summary and left 
all counters at 0. {{getTransactions()}} then populated {{txSizeMap}} for those 
size-carrying TXs. As datanodes committed them, {{descDeletedBlocksSummary()}} 
decremented from 0, driving {{totalBlocksSize}} negative. These negative values 
were Raft-replicated to all followers and reloaded on every restart, making the 
corruption self-perpetuating (confirmed in logs: {{2026-06-16 02:44 — 
totalBlocksSize -1996386304}} loaded at startup).
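
The sequence above can be illustrated with a small toy model. This is only a sketch of the accounting pattern, not the SCM code: the class {{UnclampedSummarySketch}} and its methods {{init}}, {{loadExistingTx}} and {{commit}} are hypothetical names invented for this example, and the byte sizes are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the accounting described above; hypothetical, not Ozone code. */
class UnclampedSummarySketch {
  // Running total of pending-deletion bytes (stand-in for the summary counter).
  private long totalBlockSize;
  // Sizes of known transactions, similar in spirit to txSizeMap.
  private final Map<Long, Long> txSizeMap = new HashMap<>();

  /** Startup: a null argument models "no summary was ever persisted". */
  void init(Long persistedTotal) {
    totalBlockSize = (persistedTotal == null) ? 0L : persistedTotal;
  }

  /** A TX written by the older leader: it has a size, but the total was never incremented. */
  void loadExistingTx(long txId, long sizeBytes) {
    txSizeMap.put(txId, sizeBytes);
  }

  /** Datanode commit: subtract the TX size with no lower bound. */
  void commit(long txId) {
    Long size = txSizeMap.remove(txId);
    if (size != null) {
      totalBlockSize -= size;
    }
  }

  long totalBlockSize() {
    return totalBlockSize;
  }
}
```

In this model, calling {{init(null)}}, loading two TXs of 100 and 250 bytes, and committing both leaves the total at -350, because nothing was ever added before the subtractions.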

Fix:
 * Clamp decrements at 0 in {{descDeletedBlocksSummary()}} so that negative 
values are never persisted.
 * In {{initDataDistributionData()}}, trust the persisted summary only when 
both size fields are {{> 0}}. Otherwise, fall back to a one-time deletedBlocks 
CF scan to recompute the correct totals.
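
A minimal sketch of the two changes, under stated assumptions: the class {{SummaryFixSketch}}, the {{TxSize}} holder, and the methods {{decrementClamped}} and {{initTotals}} are hypothetical names for illustration only, and the list of pending TXs stands in for the deletedBlocks CF scan. The actual patch may be structured differently.

```java
import java.util.List;

/** Hypothetical illustration of the proposed fix; not the actual SCM patch. */
class SummaryFixSketch {
  /** Pair of size totals: raw block bytes and replicated bytes. */
  static final class TxSize {
    final long blockSize;
    final long replicatedSize;

    TxSize(long blockSize, long replicatedSize) {
      this.blockSize = blockSize;
      this.replicatedSize = replicatedSize;
    }
  }

  /** Decrement a counter without ever letting it drop below 0. */
  static long decrementClamped(long current, long delta) {
    return Math.max(0L, current - delta);
  }

  /**
   * Trust the persisted summary only when both size fields are positive;
   * otherwise recompute the totals by summing every pending transaction.
   */
  static TxSize initTotals(TxSize persisted, List<TxSize> pendingTxs) {
    if (persisted != null && persisted.blockSize > 0 && persisted.replicatedSize > 0) {
      return persisted;
    }
    long block = 0L;
    long replicated = 0L;
    for (TxSize tx : pendingTxs) {
      block += tx.blockSize;
      replicated += tx.replicatedSize;
    }
    return new TxSize(block, replicated);
  }
}
```

With this shape, a missing, zero, or negative persisted summary is discarded and rebuilt from the pending TXs. The rescan is needed only at startup when the persisted values fail the check, so a healthy cluster does not pay the scan cost.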



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
