keith-turner opened a new pull request, #5833:
URL: https://github.com/apache/accumulo/pull/5833
Attempting to split a tablet that had files that did not have data for the
tablet would cause an error. There were two bugs. The first bug was that the
split code would fail if a file went to zero child tablets. The second bug was
that if a file had a fence range that was disjoint from the data in the file,
then the FencedRFile code would fail. This happened because the code would
compute a range where the start was after the end.
Both of these situations can occur over time with concurrent splits, merges,
and bulk imports. For example, the following could happen:
1. Bulk import calculates the tablets that files go to.
2. A split adds more tablets.
3. Bulk import adds files to the ranges it calculated before the split
happened. This could result in a tablet pointing to a file that has no data
for it.
4. Tablets are merged and fence ranges are added. If the file has no data
in the tablet range, then the fence range will be disjoint from the range of
data in the file.
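The disjoint case in step 4 can be illustrated with a minimal sketch. This is not Accumulo's FencedRFile code: the class `FenceClipSketch`, its `Rows` type, and both methods are hypothetical names, and rows are treated as plain strings with inclusive bounds for simplicity (real tablet ranges exclude the start row).

```java
import java.util.Optional;

// Hypothetical illustration only; not Accumulo's FencedRFile code.
final class FenceClipSketch {
  // Inclusive row range; rows are compared as plain strings for simplicity.
  static final class Rows {
    final String first;
    final String last;

    Rows(String first, String last) {
      this.first = first;
      this.last = last;
    }
  }

  // Naive clip: takes the later start and the earlier end without checking
  // whether the two ranges overlap at all.
  static Rows naiveClip(Rows data, Rows fence) {
    String start = data.first.compareTo(fence.first) >= 0 ? data.first : fence.first;
    String end = data.last.compareTo(fence.last) <= 0 ? data.last : fence.last;
    return new Rows(start, end);
  }

  // Safer clip: reports "no overlap" explicitly instead of returning a range
  // whose start is after its end.
  static Optional<Rows> clip(Rows data, Rows fence) {
    Rows r = naiveClip(data, fence);
    return r.first.compareTo(r.last) > 0 ? Optional.empty() : Optional.of(r);
  }
}
```

With file data in rows `a` through `f` and a fence of `m` through `t`, the naive clip yields start `m` and end `f`, which is the inverted range described above.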
To fix this, a new FileRange class was added that represents a tablet range
or an empty range. This code replaces two methods for getting a file's first
and last row that returned null when the file was empty. The null was really
confusing; explicitly representing empty in the class makes the code easier to
understand.
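As a rough idea of what "explicitly representing empty" can look like, here is a minimal sketch. It is not the FileRange class from this PR: the name `FileRangeSketch`, its methods, and the use of string rows with inclusive bounds are all assumptions made for illustration.

```java
// Hypothetical sketch; the actual FileRange class in the PR may differ.
final class FileRangeSketch {
  private static final FileRangeSketch EMPTY = new FileRangeSketch(null, null, true);

  private final String firstRow;
  private final String lastRow;
  private final boolean empty;

  private FileRangeSketch(String firstRow, String lastRow, boolean empty) {
    this.firstRow = firstRow;
    this.lastRow = lastRow;
    this.empty = empty;
  }

  // The "no data" case is a named value rather than a null return.
  static FileRangeSketch empty() {
    return EMPTY;
  }

  static FileRangeSketch of(String firstRow, String lastRow) {
    if (firstRow.compareTo(lastRow) > 0) {
      throw new IllegalArgumentException("start row is after end row");
    }
    return new FileRangeSketch(firstRow, lastRow, false);
  }

  boolean isEmpty() {
    return empty;
  }

  // Callers must check isEmpty() first; an empty range has no rows to return.
  String firstRow() {
    if (empty) {
      throw new IllegalStateException("empty range has no first row");
    }
    return firstRow;
  }

  String lastRow() {
    if (empty) {
      throw new IllegalStateException("empty range has no last row");
    }
    return lastRow;
  }
}
```

The point of the design is that a caller cannot silently receive a null row; it either asks `isEmpty()` or gets an immediate exception.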
Using this new FileRange class, the split code and fenced rfile code were
fixed.
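For the split side, the key point is that a file may legitimately map to zero child tablets. A minimal sketch of that assignment step follows; `SplitAssignSketch` and `childrenFor` are hypothetical names, and child tablets are modelled as inclusive `[first, last]` string rows rather than Accumulo's real extent types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the split code from Accumulo.
final class SplitAssignSketch {
  // Returns the indexes of the child tablets whose rows overlap the file's
  // rows. children[i][0] is the child's first row, children[i][1] its last.
  // The result may be empty, and callers need to treat that as valid.
  static List<Integer> childrenFor(String fileFirst, String fileLast, String[][] children) {
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < children.length; i++) {
      boolean overlaps =
          fileFirst.compareTo(children[i][1]) <= 0 && fileLast.compareTo(children[i][0]) >= 0;
      if (overlaps) {
        hits.add(i);
      }
    }
    return hits;
  }
}
```

For children covering `h` through `m` and `n` through `t`, a file with rows `a` through `f` maps to no child at all, while a file with rows `k` through `p` maps to both.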
These problems were found when running the bulk randomwalk test.