keith-turner opened a new issue #1377: Add table ID sanity checks to garbage 
collector
URL: https://github.com/apache/accumulo/issues/1377
 
 
   Currently the Accumulo GC checks that each user table seen in the metadata 
table is properly formed (this check was recently improved by #1266). However, 
there is no check to ensure that all expected user tables are seen in the 
metadata table.  So if an error causes nothing to be seen for a user table in 
the metadata table, the Accumulo GC will not know there is a problem.
   
   The garbage collection algorithm reads a set of delete candidates into 
memory and then scans the metadata table to remove any candidates that are 
referenced.  Sanity checks could be added to cross-reference the table IDs seen 
in the metadata table with ZooKeeper.
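
   To make the failure mode concrete, here is a minimal sketch of the 
candidate-filtering step described above. It is not the actual GC code: the 
class name `CandidateFilter`, the method `unreferenced`, and the use of plain 
strings for file paths are assumptions made for illustration only.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, not Accumulo's implementation.
public class CandidateFilter {

    /**
     * Returns the delete candidates that were not referenced by any entry
     * seen during the metadata scan. Anything left in the result would be
     * treated as safe to delete.
     */
    static Set<String> unreferenced(Set<String> candidates,
                                    Iterable<String> referencedFiles) {
        // Copy so the caller's candidate set is not modified.
        Set<String> remaining = new TreeSet<>(candidates);
        for (String ref : referencedFiles) {
            remaining.remove(ref);
        }
        return remaining;
    }
}
```

   If the scan silently returns nothing for some table, none of that table's 
files are removed from the candidate set, so they all look unreferenced.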
   
   One possible way to do this is with the following three sets:
   
    * **BSTI** : Table IDs in ZooKeeper before the scan
    * **UMTI** : Table IDs seen while scanning the metadata table
    * **ASTI** : Table IDs in ZooKeeper after the scan
    
   If (BSTI ∩ ASTI) ⊆ UMTI is true, then all expected table IDs were 
seen.  If it is not true, then it is not safe to delete files.  Building these 
sets and checking them in the GC before deleting could make the Accumulo GC more 
robust against unknown errors when scanning the metadata table.
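
   The subset check could be sketched roughly as follows. This is only an 
illustration under stated assumptions: the class name `TableIdSanityCheck`, the 
method `allExpectedTablesSeen`, and the representation of table IDs as strings 
are hypothetical, and how the three sets would actually be gathered from 
ZooKeeper and the metadata scan is left out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not Accumulo's implementation.
public class TableIdSanityCheck {

    /**
     * Evaluates (BSTI ∩ ASTI) ⊆ UMTI.
     *
     * @param bsti table IDs in ZooKeeper before the scan
     * @param umti table IDs seen while scanning the metadata table
     * @param asti table IDs in ZooKeeper after the scan
     * @return true if every table present both before and after the scan
     *         was also seen in the metadata table
     */
    static boolean allExpectedTablesSeen(Set<String> bsti,
                                         Set<String> umti,
                                         Set<String> asti) {
        // Intersection: tables that existed for the whole duration of the scan.
        Set<String> expected = new HashSet<>(bsti);
        expected.retainAll(asti);
        // Subset test: each expected table must have been seen.
        return umti.containsAll(expected);
    }
}
```

   Taking the intersection presumably keeps tables that were created or 
deleted during the scan from causing false alarms, since only tables present 
in ZooKeeper both before and after the scan are required to appear in the 
metadata table. When the method returns false, the GC would skip deleting 
files for that cycle.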
   
   

