cshannon commented on issue #608: URL: https://github.com/apache/accumulo/issues/608#issuecomment-1399316343
I investigated this again yesterday, and there are still reports of the same behavior against 1.10.x. As before, the scan of the accumulo.metadata table during GC returns incomplete data, so file references are missed. In this case an entire tablet's worth of metadata is missing, and files that are still referenced get deleted because the GC never sees the references.

A simple example of the scenario:

1. accumulo.root has 1 tablet containing the metadata for 4 accumulo.metadata tablets.
2. The 3rd accumulo.metadata tablet contains the metadata for the 10th, 11th, and 12th tablets of table X.
3. GC runs, and all of the metadata in the 1st, 2nd, and 4th accumulo.metadata tablets is returned and used correctly by the garbage collector.
4. However, during the GC run all of the data for the 3rd accumulo.metadata tablet is missing from the scan, so the references to files in the 10th, 11th, and 12th tablets of table X are missed.
5. Because those file references are missing, the files for the 10th, 11th, and 12th tablets of table X are removed.

In some instances the issue could be traced to a hardware failure (a disk issue), but this has not always been the case. Even with a hardware failure, the missing data should be detected by the safety checks in the metadata scan for file references. If errors or inconsistencies are detected during the scan (due to splits, etc.), the iterator is supposed to reset itself to prevent exactly this problem. Since file references went missing without tripping those checks, and GC continued anyway, this suggests there may be a problem in the iterator's error handling or detection.
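To make the expected safety check concrete: one form it can take is verifying that each tablet's previous end row matches the end row of the tablet read just before it, so a dropped tablet shows up as a break in the chain. The sketch below illustrates only that idea. It is not the code from TabletIterator, and `TabletChainCheck`, `TabletRange`, and `firstGap` are hypothetical names invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Illustrative only: not Accumulo's TabletIterator or LinkingIterator. */
class TabletChainCheck {

  /** Hypothetical stand-in for one tablet's row range as read from metadata. */
  static final class TabletRange {
    final String prevEndRow; // null for the first tablet of a table
    final String endRow;     // null for the last tablet of a table

    TabletRange(String prevEndRow, String endRow) {
      this.prevEndRow = prevEndRow;
      this.endRow = endRow;
    }
  }

  /**
   * Returns the index of the first tablet that does not link to its
   * predecessor, tablets.size() if the scan ended before the last tablet,
   * or -1 if the chain is complete.
   */
  static int firstGap(List<TabletRange> tablets) {
    String expectedPrev = null; // the first tablet must have no predecessor
    for (int i = 0; i < tablets.size(); i++) {
      TabletRange t = tablets.get(i);
      if (!Objects.equals(t.prevEndRow, expectedPrev)) {
        return i; // a tablet is missing, or a split raced with the scan
      }
      expectedPrev = t.endRow;
    }
    // A complete scan must finish on the tablet with an open-ended range.
    if (tablets.isEmpty() || expectedPrev != null) {
      return tablets.size();
    }
    return -1;
  }

  public static void main(String[] args) {
    // The tablet covering (g, m] was dropped from the scan results.
    List<TabletRange> scanned = Arrays.asList(
        new TabletRange(null, "g"),
        new TabletRange("m", "s"),
        new TabletRange("s", null));
    int gap = firstGap(scanned);
    if (gap >= 0) {
      // Safest reaction in this sketch: delete nothing in this cycle.
      System.out.println("Gap at tablet index " + gap + "; skipping deletes this cycle");
    }
  }
}
```

With a check like this, the scenario above (tablets 10 through 12 of table X never returned) would show up as tablet 13's previous end row not matching tablet 9's end row, provided the check actually sees both neighbors.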
The iterator used to scan for file references in 1.10 is the [TabletIterator](https://github.com/apache/accumulo/blob/88b75633f7eef883f07446cfbff01cc193ecf6fa/server/base/src/main/java/org/apache/accumulo/server/util/TabletIterator.java#L57). It's not known whether the issue still exists in 2.1, since changes were made and a different iterator is now used, but if it does, it would likely be in the equivalent iterator used for the references scan in 2.1, the [LinkingIterator](https://github.com/apache/accumulo/blob/bd49cff1be7de00a43cd9f03e75c319e5ab3735d/core/src/main/java/org/apache/accumulo/core/metadata/schema/LinkingIterator.java#L50).

The root cause isn't known, which makes it hard to find a fix for now, but I talked to @dlmarion about this and thought of a few things to bring up and discuss:

1. Does it make sense to just abort the GC run instead of trying to reset the iterator when an error is detected? The GC process runs in a loop anyway and will simply re-run on the next iteration. It's possible there's an issue with resetting the iterator, so aborting might be safer.
2. Is it possible there is actually a server-side error that isn't bubbling back to the client, which would lead the client to continue the scan when it shouldn't?
3. Should we run an extra sanity check to verify the data is actually correct? For example, should we also scan with the Offline iterator and compare? Can we verify that we actually received all the tablets we should have by checking before and after the run? Do we run the scan multiple times and compare? If we ran it multiple times, we could take the superset of files seen (anything that truly should be GC'd would just get removed on the next run).
4. Because an entire tablet of data was missing in this scenario, is it possible that it's related to tablet splitting and the data was missed during a split?
5. Does this error actually happen in other cases (scans of other, normal tables) where no one usually notices the missing data, and it's only noticed here because it leads to errors later? Or is it specific to the metadata table and these particular iterators?

Obviously more investigation is needed, but this is where things currently stand after I started looking into it.
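The "scan multiple times and take a superset" idea from item 3 can be sketched as below. This is a minimal illustration under stated assumptions, not existing GC code: file references are modeled as plain strings, each scan pass as a set, and `ReferenceSuperset`, `safeToDelete`, `passesDisagree`, and the file names are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: models "scan more than once and take the superset". */
class ReferenceSuperset {

  /** Union of the file references seen across all scan passes. */
  static Set<String> unionOfReferences(List<Set<String>> passes) {
    Set<String> all = new HashSet<>();
    for (Set<String> pass : passes) {
      all.addAll(pass);
    }
    return all;
  }

  /** Candidates that no pass saw a reference to; only these get deleted. */
  static Set<String> safeToDelete(Set<String> candidates, List<Set<String>> passes) {
    Set<String> result = new TreeSet<>(candidates);
    result.removeAll(unionOfReferences(passes));
    return result;
  }

  /** True when the passes differ, which may mean at least one scan lost data. */
  static boolean passesDisagree(List<Set<String>> passes) {
    for (Set<String> pass : passes) {
      if (!pass.equals(passes.get(0))) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Set<String> candidates = new HashSet<>(Arrays.asList("F1.rf", "F2.rf", "F3.rf"));
    // The second pass lost the reference to F2.rf, as in the reported scenario.
    List<Set<String>> passes = Arrays.asList(
        new HashSet<>(Arrays.asList("F1.rf", "F2.rf")),
        new HashSet<>(Arrays.asList("F1.rf")));
    System.out.println("Delete: " + safeToDelete(candidates, passes));
    System.out.println("Passes disagree: " + passesDisagree(passes));
  }
}
```

A file that is referenced in any pass is kept, so a single incomplete scan can no longer cause a deletion on its own. The trade-off is that truly unreferenced files may survive one extra cycle. A disagreement between passes is not necessarily an error, because references can legitimately change between scans, but it could be logged as a signal worth investigating.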
