cshannon opened a new pull request, #5353: URL: https://github.com/apache/accumulo/pull/5353
This change adds support for periodically scanning tables to find ranges of tablets that can be automatically merged. The new thread uses the recently added `TabletMergeability` column, as well as total file counts and sizes, to compute whether a range can be merged. By default, tablets can be merged if their total size is less than 25% of the split threshold, so that we don't merge a range only to immediately split it again.

There is a new thread in the `Manager` class that performs this computation and submits fate jobs to merge the qualifying tablet ranges. There is also a new fate operation during merge that validates the range is still OK to merge if the merge was submitted as a system merge.

This closes #5014.

I am making this a draft because, while it's feature complete, I think there is still more work to do on the tests and some tweaks to make, but I wanted to get some feedback/suggestions.

#### Outstanding issues/notes/questions:

1. **For user-created tables, the default tablet was previously set to `never` merge, but I think we should change this to `always`, so I switched the default tablet to a mergeability of `always` in this PR.** The reason is that we can split the default tablet, so we should be able to merge other tablets back into it. It will never go away because its end row is null. Another reason I did this is that there is currently no way in the API to change the mergeability of the default tablet (its end row is null, which can't be used as a key in a map), so I figure we should just make it always mergeable, and any other splits would not be mergeable by default for user-created tables. This would be easy to set back to `never` by default if desired, but if we keep the default tablet as `never`, then we can never merge back to a single tablet again.
2. The default tables (metadata, fate, scanref) are configured with their initial tablet(s) having a mergeability setting of `never`. Should we change this?
   I am thinking I probably want to make the default tablet mergeable, just as I did for user tables, for consistency. However, if pre-splitting is ever done, we would need to make sure those tablets are marked as `never` merge so they don't get merged away because their size is 0.
3. What thread pool should we use for `FindMergeableRangeTask`? What interval should it use by default? For now I put in a TODO marker, used the default context scheduler, and set it to run every 24 hours.
4. As mentioned, the tests could definitely be improved with more edge cases and complexity. `TabletMergeabilityIT` in particular only has 4 tests for now as a demonstration, but there are many more tests we could write to verify that the checks (max files, sizes, etc.) work and that we get the expected merges.

-- This is an automated message from the Apache Git Service.
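To make the range computation described in this pull request concrete, here is a minimal Java sketch under stated assumptions. `TabletInfo`, `MergeRange`, `findMergeableRanges`, and the `maxFiles` limit are hypothetical names for illustration only; this is not the actual `FindMergeableRangeTask` code or an Accumulo API. Mergeability is reduced to a boolean (`always` vs. `never`), and tablets are assumed to be supplied in row order. The sketch greedily groups consecutive mergeable tablets while their combined size stays below 25% of the split threshold and their combined file count stays within the limit.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeableRangeSketch {

  // Hypothetical, simplified view of one tablet's metadata; endRow == null marks the default (last) tablet.
  record TabletInfo(String endRow, long sizeBytes, int fileCount, boolean mergeable) {}

  // Hypothetical result type: an inclusive range of tablet indexes that could be merged together.
  record MergeRange(int firstIndex, int lastIndex) {}

  // Greedily groups consecutive mergeable tablets while the combined size stays below 25% of the
  // split threshold and the combined file count stays within maxFiles (an assumed limit).
  // Only groups of two or more tablets are reported, since a lone tablet has nothing to merge with.
  static List<MergeRange> findMergeableRanges(List<TabletInfo> tablets, long splitThreshold, int maxFiles) {
    long maxTotalSize = splitThreshold / 4; // 25% of the split threshold
    List<MergeRange> ranges = new ArrayList<>();
    int start = -1; // index of the first tablet in the current candidate range, -1 if none
    long totalSize = 0;
    int totalFiles = 0;

    for (int i = 0; i < tablets.size(); i++) {
      TabletInfo t = tablets.get(i);
      if (start != -1 && t.mergeable()
          && totalSize + t.sizeBytes() < maxTotalSize
          && totalFiles + t.fileCount() <= maxFiles) {
        // The tablet fits: grow the current candidate range.
        totalSize += t.sizeBytes();
        totalFiles += t.fileCount();
        continue;
      }
      // The tablet does not extend the current range, so close that range out first.
      addIfMultiple(ranges, start, i - 1);
      // Then check whether this tablet can start a new candidate range by itself.
      if (t.mergeable() && t.sizeBytes() < maxTotalSize && t.fileCount() <= maxFiles) {
        start = i;
        totalSize = t.sizeBytes();
        totalFiles = t.fileCount();
      } else {
        start = -1; // a `never` tablet (or an oversized one) breaks the run
      }
    }
    addIfMultiple(ranges, start, tablets.size() - 1);
    return ranges;
  }

  // Records the range only if it spans at least two tablets.
  private static void addIfMultiple(List<MergeRange> ranges, int start, int end) {
    if (start != -1 && end > start) {
      ranges.add(new MergeRange(start, end));
    }
  }

  public static void main(String[] args) {
    long splitThreshold = 1_000; // tiny threshold in bytes, so 25% is 250
    List<TabletInfo> tablets = List.of(
        new TabletInfo("b", 100, 1, true),
        new TabletInfo("d", 100, 1, true),
        new TabletInfo("f", 100, 1, true), // would push the total to 300, which is not < 250
        new TabletInfo("h", 0, 0, false),  // marked never: breaks any range
        new TabletInfo("k", 0, 0, true),
        new TabletInfo(null, 0, 0, true)); // default tablet, mergeable
    System.out.println(findMergeableRanges(tablets, splitThreshold, 10));
    // prints [MergeRange[firstIndex=0, lastIndex=1], MergeRange[firstIndex=4, lastIndex=5]]
  }
}
```

In this example the empty, mergeable tablet ending at `k` is grouped with the default tablet, which illustrates why pre-split tablets with a size of 0 would need to be marked `never` to survive. A single greedy pass keeps the scan linear in the number of tablets; the real task may choose range boundaries differently.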
