[ 
https://issues.apache.org/jira/browse/HDDS-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-15610:
----------------------------------
    Labels: pull-request-available  (was: )

> SCM: Pending deletion block size metrics go negative causing corrupted Recon 
> capacity display
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDDS-15610
>                 URL: https://issues.apache.org/jira/browse/HDDS-15610
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM, SCM HA
>            Reporter: Priyesh K
>            Assignee: Priyesh K
>            Priority: Major
>              Labels: pull-request-available
>
> Problem: During long-running key deletion, {{ozone admin scm deletedBlocksTxn 
> summary}} reports negative {{totalBlockSize}} and 
> {{totalBlockReplicatedSize}} (e.g., {{-1996386304}} bytes). Recon reads this 
> metric for cluster capacity, resulting in corrupted UI values like 
> {{-631 MB}}.
> Root Cause: A two-release deployment gap exists between when 
> {{STORAGE_SPACE_DISTRIBUTION}} size fields were added to the 
> {{DeletedBlocksTransaction}} proto (in {{constructNewTransaction}}) and 
> when the summary accounting code was added to 
> {{addTransactions}}/{{removeTransactions}}. A leader running the 
> older release wrote TXs with size fields into the deletedBlocks CF but never 
> wrote a summary to {{statefulConfigTable}}. When a leader running the 
> newer release took over, {{initDataDistributionData()}} found no persisted 
> summary and left all counters at 0. {{getTransactions()}} then populated 
> {{txSizeMap}} for those size-carrying TXs. As datanodes committed them, 
> {{descDeletedBlocksSummary()}} decremented from 0, driving 
> {{totalBlocksSize}} negative. These negative values were Raft-replicated to 
> all followers and reloaded on every restart, making the corruption 
> self-perpetuating (confirmed in logs: {{2026-06-16 02:44 — totalBlocksSize 
> -1996386304}} loaded at startup).
> Fix:
>  * Clamp decrements at 0 in {{descDeletedBlocksSummary()}} to prevent 
> negative values from being persisted.
>  * In {{initDataDistributionData()}}, trust the persisted summary only 
> when both size fields are {{> 0}}. Otherwise fall back to a one-time 
> deletedBlocks CF scan to recompute correct totals.
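
The root cause and the two fix items above can be sketched in plain Java. This is a minimal illustration, not the Ozone code: the class PendingDeletionSummarySketch, the Sizes holder, the method names, and the byte values are all hypothetical, and the in-memory list only stands in for the deletedBlocks CF scan.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the pending-deletion size summary; not Ozone code. */
public class PendingDeletionSummarySketch {

  /** Pair of size fields, used for one transaction and for the running summary. */
  static final class Sizes {
    final long blockSize;
    final long replicatedSize;

    Sizes(long blockSize, long replicatedSize) {
      this.blockSize = blockSize;
      this.replicatedSize = replicatedSize;
    }
  }

  /** Plain subtraction: goes negative when the summary was never populated. */
  static Sizes decrementUnclamped(Sizes summary, Sizes tx) {
    return new Sizes(summary.blockSize - tx.blockSize,
        summary.replicatedSize - tx.replicatedSize);
  }

  /** First fix item: clamp each counter at 0 so no negative value is stored. */
  static Sizes decrementClamped(Sizes summary, Sizes tx) {
    return new Sizes(Math.max(0L, summary.blockSize - tx.blockSize),
        Math.max(0L, summary.replicatedSize - tx.replicatedSize));
  }

  /**
   * Second fix item: trust the persisted summary only when both size fields
   * are > 0; otherwise recompute the totals from the pending transactions.
   */
  static Sizes init(Sizes persisted, List<Sizes> pendingTxs) {
    if (persisted != null && persisted.blockSize > 0
        && persisted.replicatedSize > 0) {
      return persisted;
    }
    long blockSize = 0L;
    long replicatedSize = 0L;
    for (Sizes tx : pendingTxs) {
      blockSize += tx.blockSize;
      replicatedSize += tx.replicatedSize;
    }
    return new Sizes(blockSize, replicatedSize);
  }

  public static void main(String[] args) {
    Sizes tx = new Sizes(1024L, 3072L);   // illustrative values only
    Sizes empty = new Sizes(0L, 0L);      // counters left at 0 by the new leader

    Sizes corrupted = decrementUnclamped(empty, tx);
    System.out.println(corrupted.blockSize + " " + corrupted.replicatedSize);

    Sizes clamped = decrementClamped(empty, tx);
    System.out.println(clamped.blockSize + " " + clamped.replicatedSize);

    // A corrupted (negative) persisted summary is rejected and recomputed.
    Sizes rebuilt = init(corrupted, Arrays.asList(tx, tx));
    System.out.println(rebuilt.blockSize + " " + rebuilt.replicatedSize);
  }
}
```

Clamping alone only stops new negative values from being written. The recompute path in init is what repairs a summary that is already missing or corrupted, which is why the fix lists both steps.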

