[ 
https://issues.apache.org/jira/browse/HBASE-29905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058956#comment-18058956
 ] 

Jan Van Besien commented on HBASE-29905:
----------------------------------------

Attempted a fix in this PR: https://github.com/apache/hbase/pull/7761

> BackupLogCleaner retains old WAL files due to stale entries in system:backup 
> table
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-29905
>                 URL: https://issues.apache.org/jira/browse/HBASE-29905
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore
>            Reporter: Jan Van Besien
>            Priority: Major
>              Labels: pull-request-available
>
> The backup:system table stores trslm: (table-region-server-log-map) rows with 
> the row key format: {{trslm:<backupRoot>\0<table>}}
> Each row's value is a protobuf-serialized map {{RegionServer → WAL timestamp}},
> representing the WAL position up to which each RegionServer has
> been backed up for that table.
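The following minimal in-memory sketch makes the row layout concrete. It assumes, without confirmation from this comment, that a trslm: row key is the prefix `trslm:`, then the backup root, a `\0` separator, and the table name. It models backup:system as a sorted map of String keys. Under that assumption, a prefix scan on `trslm:<backupRoot>\0` returns one row per table that was ever backed up to that root. `TrslmRowKeySketch`, `rowKey` and `tablesForRoot` are hypothetical names, not HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch, not HBase code. ASSUMPTION: a trslm: row key is
// "trslm:" + backupRoot + "\0" + tableName, and backup:system is modeled as a
// sorted map from String row key to (RegionServer -> WAL timestamp).
final class TrslmRowKeySketch {
  static final String PREFIX = "trslm:";

  // Builds a row key under the assumed layout.
  static String rowKey(String backupRoot, String table) {
    return PREFIX + backupRoot + "\0" + table;
  }

  // Mimics a prefix scan: returns every table with a row under backupRoot,
  // regardless of which backup wrote it. The "\0" separator keeps root "a"
  // from matching rows that belong to root "a2".
  static List<String> tablesForRoot(NavigableMap<String, Map<String, Long>> rows,
      String backupRoot) {
    String scanPrefix = PREFIX + backupRoot + "\0";
    List<String> tables = new ArrayList<>();
    // Keys sharing a prefix are contiguous in sorted order, so stop at the first miss.
    for (String key : rows.tailMap(scanPrefix, true).keySet()) {
      if (!key.startsWith(scanPrefix)) {
        break;
      }
      tables.add(key.substring(scanPrefix.length()));
    }
    return tables;
  }

  public static void main(String[] args) {
    NavigableMap<String, Map<String, Long>> rows = new TreeMap<>();
    rows.put(rowKey("hdfs://backup", "t1"), Map.of("rs1", 100L));
    rows.put(rowKey("hdfs://backup", "old_table"), Map.of("rs1", 5L));
    rows.put(rowKey("hdfs://backup2", "t1"), Map.of("rs1", 300L));
    System.out.println(tablesForRoot(rows, "hdfs://backup")); // [old_table, t1]
  }
}
```

In this model, a table that no longer takes part in any backup still comes back from the scan for as long as its row exists.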
> BackupLogCleaner uses this information to decide which WAL files to clean up,
> as follows:
>  * During backup completion (FullTableBackupClient.java:192 / 
> IncrementalTableBackupClient.java:330), writeRegionServerLogTimestamp() 
> writes a trslm: row for each table in the backup, recording the latest WAL 
> timestamp per RS.
>  * Immediately after, readLogTimestampMap() (BackupSystemTable.java:802) 
> scans all trslm: rows for that backup root — every table that has ever been 
> backed up to that root, not just the tables in the current backup. This full 
> map is stored into the BackupInfo object 
> (backupInfo.setTableSetTimestampMap(...)) and persisted as part of the 
> session: row in backup:system.
>  * BackupLogCleaner (BackupLogCleaner.java:89-142) reads the most recent 
> BackupInfo per backup root and iterates over its tableSetTimestampMap. For 
> each RegionServer found across all tables, it computes the minimum timestamp 
> as the "preservation boundary" for that server. WALs older than or equal to 
> this boundary can be deleted; newer ones are retained. A single stale table 
> with a year-old timestamp for any RS will pin WAL retention for that RS all 
> the way back, preventing WAL cleanup.
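The boundary rule in the last bullet can be sketched as follows. This is a simplified model, not the BackupLogCleaner code. Tables and RegionServers are plain strings and the tableSetTimestampMap is a nested Map. `WalBoundarySketch`, `computeBoundaries` and `isDeletable` are hypothetical names. The example shows how one stale table pins WAL retention for a server.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the per-RegionServer "preservation boundary".
// Simplification (assumption): tables/servers are Strings, timestamps are longs.
final class WalBoundarySketch {

  // tableSetTimestampMap: table -> (RegionServer -> last backed-up WAL timestamp).
  // Returns RegionServer -> minimum timestamp across ALL tables in the map.
  static Map<String, Long> computeBoundaries(
      Map<String, Map<String, Long>> tableSetTimestampMap) {
    Map<String, Long> boundaries = new HashMap<>();
    for (Map<String, Long> rsToTimestamp : tableSetTimestampMap.values()) {
      for (Map.Entry<String, Long> e : rsToTimestamp.entrySet()) {
        // Keep the smallest timestamp seen for this RegionServer.
        boundaries.merge(e.getKey(), e.getValue(), Math::min);
      }
    }
    return boundaries;
  }

  // A WAL is deletable when its timestamp is <= its server's boundary.
  // Servers without a boundary are retained here; that choice is an
  // assumption of this sketch, not a statement about BackupLogCleaner.
  static boolean isDeletable(Map<String, Long> boundaries, String regionServer,
      long walTimestamp) {
    Long boundary = boundaries.get(regionServer);
    return boundary != null && walTimestamp <= boundary;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Long>> tableSetTimestampMap = new HashMap<>();
    tableSetTimestampMap.put("t1", Map.of("rs1", 1_000L, "rs2", 1_200L));
    tableSetTimestampMap.put("stale_table", Map.of("rs1", 10L)); // never refreshed
    Map<String, Long> boundaries = computeBoundaries(tableSetTimestampMap);
    System.out.println(boundaries.get("rs1"));                  // 10
    System.out.println(isDeletable(boundaries, "rs1", 500L));   // false: pinned by stale_table
    System.out.println(isDeletable(boundaries, "rs2", 500L));   // true
  }
}
```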
> The root cause is that there is no code anywhere that deletes trslm: rows. 
> They are only written (overwritten) when a backup runs for that specific 
> table. Two scenarios create stale rows:
>  * (a) Table removed from backup (because the table is no longer included in
> backups, or simply because the table was deleted).
>  * (b) RegionServer decommissioned
> Problem (a) was observed in production (the workaround was to remove the stale
> entries manually).
> To fix this, I think we need a cleanup mechanism. Perhaps we can
> filter the readLogTimestampMap() results to only include tables in the current
> backup info and delete everything else (or only filter without deleting, but
> then the stale entries would remain in the table).
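The proposed filtering step could look roughly like the sketch below. It uses the same simplified model of plain strings and nested maps. It is not the code from the linked PR, and `StaleTrslmFilter`, `Partition` and `partition` are hypothetical names. The sketch keeps the entries for tables in the current backup and returns the other tables as the candidates whose trslm: rows would be deleted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the proposed fix, not the code in the linked PR.
final class StaleTrslmFilter {

  // Result of splitting a readLogTimestampMap()-style map by table membership.
  static final class Partition {
    final Map<String, Map<String, Long>> kept; // would be stored in BackupInfo
    final Set<String> staleTables;             // candidates for trslm: row deletion

    Partition(Map<String, Map<String, Long>> kept, Set<String> staleTables) {
      this.kept = kept;
      this.staleTables = staleTables;
    }
  }

  // allTables: table -> (RegionServer -> WAL timestamp) for one backup root.
  static Partition partition(Map<String, Map<String, Long>> allTables,
      Set<String> currentBackupTables) {
    Map<String, Map<String, Long>> kept = new HashMap<>();
    Set<String> stale = new TreeSet<>();
    for (Map.Entry<String, Map<String, Long>> e : allTables.entrySet()) {
      if (currentBackupTables.contains(e.getKey())) {
        kept.put(e.getKey(), e.getValue());
      } else {
        stale.add(e.getKey());
      }
    }
    return new Partition(kept, stale);
  }

  public static void main(String[] args) {
    Map<String, Map<String, Long>> all = new HashMap<>();
    all.put("t1", Map.of("rs1", 1_000L));
    all.put("dropped_table", Map.of("rs1", 10L));
    Partition p = partition(all, Set.of("t1"));
    System.out.println(p.kept.keySet()); // [t1]
    System.out.println(p.staleTables);   // [dropped_table]
  }
}
```

Filtering alone only changes what is stored in BackupInfo. Deleting the `staleTables` rows also removes them from backup:system. The comment leaves that choice open.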



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
