[
https://issues.apache.org/jira/browse/HDDS-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-15610:
----------------------------------
Labels: pull-request-available (was: )
> SCM: Pending deletion block size metrics go negative causing corrupted Recon
> capacity display
> ---------------------------------------------------------------------------------------------
>
> Key: HDDS-15610
> URL: https://issues.apache.org/jira/browse/HDDS-15610
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM, SCM HA
> Reporter: Priyesh K
> Assignee: Priyesh K
> Priority: Major
> Labels: pull-request-available
>
> Problem: During long-running key deletion, {{ozone admin scm deletedBlocksTxn
> summary}} reports negative {{totalBlockSize}} and
> {{totalBlockReplicatedSize}} (e.g., {{-1996386304}} bytes). Recon reads this
> metric for cluster capacity, resulting in corrupted UI values like
> {{-631 MB}}.
> Root Cause: A two-release deployment gap exists between when
> {{STORAGE_SPACE_DISTRIBUTION}} size fields were added to the
> {{DeletedBlocksTransaction}} proto (in {{constructNewTransaction}}) and
> when the summary accounting code was added to
> {{addTransactions}}/{{removeTransactions}}. A leader running the
> older release wrote TXs with size fields into the deletedBlocks CF but never
> wrote a summary to {{statefulConfigTable}}. When a leader running the
> newer release took over, {{initDataDistributionData()}} found no persisted
> summary and left all counters at 0. {{getTransactions()}} then populated
> {{txSizeMap}} for those size-carrying TXs. As datanodes committed them,
> {{descDeletedBlocksSummary()}} decremented from 0, driving
> {{totalBlocksSize}} negative. These negative values were Raft-replicated to
> all followers and reloaded on every restart, making the corruption
> self-perpetuating (confirmed in logs: {{2026-06-16 02:44 — totalBlocksSize
> -1996386304}} loaded at startup).
> Fix:
> * Clamp decrements at 0 in {{descDeletedBlocksSummary()}} so that negative
> values are never persisted.
> * In {{initDataDistributionData()}}, trust the persisted summary only when
> both size fields are {{> 0}}. Otherwise, fall back to a one-time
> deletedBlocks CF scan to recompute the correct totals.
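
The two fix items above can be illustrated with a minimal, self-contained sketch. This is not the actual Ozone code: the class {{DeletedBlocksSummarySketch}}, the holder {{TxSize}}, and the methods {{init}} and {{decrement}} are hypothetical names that only mirror the behavior described for {{initDataDistributionData()}} and {{descDeletedBlocksSummary()}}. The in-memory list stands in for a scan of the deletedBlocks CF, and a {{null}} argument stands in for a summary that was never persisted.

```java
import java.util.List;

/** Hypothetical stand-in for SCM's pending-deletion summary; not the Ozone class. */
public class DeletedBlocksSummarySketch {

  /** Hypothetical holder for the size fields carried by one transaction. */
  public static final class TxSize {
    final long blockSize;
    final long replicatedSize;

    public TxSize(long blockSize, long replicatedSize) {
      this.blockSize = blockSize;
      this.replicatedSize = replicatedSize;
    }
  }

  private long totalBlockSize;
  private long totalBlockReplicatedSize;

  /**
   * Trusts the persisted summary only when both size fields are > 0.
   * Otherwise (missing, zero, or negative) recomputes the totals once
   * from the pending transactions, which stand in for the CF scan.
   */
  public void init(Long persistedBlockSize, Long persistedReplicatedSize,
                   List<TxSize> pendingTxs) {
    if (persistedBlockSize != null && persistedReplicatedSize != null
        && persistedBlockSize > 0 && persistedReplicatedSize > 0) {
      totalBlockSize = persistedBlockSize;
      totalBlockReplicatedSize = persistedReplicatedSize;
      return;
    }
    long size = 0L;
    long replicated = 0L;
    for (TxSize tx : pendingTxs) {
      size += tx.blockSize;
      replicated += tx.replicatedSize;
    }
    totalBlockSize = size;
    totalBlockReplicatedSize = replicated;
  }

  /** Decrements both counters, clamping at 0 so they never go negative. */
  public void decrement(TxSize tx) {
    totalBlockSize = Math.max(0L, totalBlockSize - tx.blockSize);
    totalBlockReplicatedSize =
        Math.max(0L, totalBlockReplicatedSize - tx.replicatedSize);
  }

  public long getTotalBlockSize() {
    return totalBlockSize;
  }

  public long getTotalBlockReplicatedSize() {
    return totalBlockReplicatedSize;
  }

  public static void main(String[] args) {
    List<TxSize> pending =
        List.of(new TxSize(1000L, 3000L), new TxSize(1000L, 3000L));
    DeletedBlocksSummarySketch summary = new DeletedBlocksSummarySketch();

    // No summary was persisted by the older leader, so totals come from the scan.
    summary.init(null, null, pending);
    System.out.println("after init: " + summary.getTotalBlockSize());

    // One commit more than was counted would previously have gone negative.
    for (int i = 0; i < 3; i++) {
      summary.decrement(new TxSize(1000L, 3000L));
    }
    System.out.println("after commits: " + summary.getTotalBlockSize());
  }
}
```

Under these assumptions, a previously corrupted summary (for example a persisted {{-1996386304}}) fails the {{> 0}} check in the same way as a missing one, so it is replaced by the recomputed totals instead of being reloaded on every restart.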