cshannon opened a new pull request, #5353:
URL: https://github.com/apache/accumulo/pull/5353

   This change adds support for periodically scanning tables to find ranges of 
tablets that can be automatically merged. The new thread uses the recently 
added TabletMergeability column, along with total file counts/sizes, to determine 
whether a range can be merged. By default, tablets can be merged if their total size is 
less than 25% of the split threshold, so we don't immediately merge only to 
split again.
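As a rough illustration of the computation described above, the following Java sketch walks a table's tablets in row order and collects contiguous runs that could be merged. It is a simplified, hypothetical model and not the code in this PR: `MergeableRangeSketch`, `Tablet`, `Range`, the two-value `Mergeability` enum, and the `maxFiles` parameter are illustrative assumptions. The 25% cap is taken from the default described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and types are illustrative, not Accumulo's API.
public class MergeableRangeSketch {

  // Simplified mergeability; the real setting may support more than two values.
  enum Mergeability { NEVER, ALWAYS }

  // Minimal view of one tablet: end row (null = default tablet), setting, file stats.
  record Tablet(String endRow, Mergeability mergeability, long fileSize, int fileCount) {}

  // Inclusive indexes of the first and last tablet in a mergeable run.
  record Range(int first, int last) {}

  static List<Range> findMergeableRanges(List<Tablet> tablets, long splitThreshold,
      int maxFiles) {
    // Cap the merged size at 25% of the split threshold so the result
    // is not immediately split again.
    long maxSize = splitThreshold / 4;
    List<Range> ranges = new ArrayList<>();
    int start = -1; // index where the current run began, or -1 if no run is open
    long size = 0;
    int files = 0;

    for (int i = 0; i < tablets.size(); i++) {
      Tablet t = tablets.get(i);
      boolean eligible = t.mergeability() == Mergeability.ALWAYS
          && t.fileSize() <= maxSize && t.fileCount() <= maxFiles;
      boolean fitsInRun = start >= 0 && eligible
          && size + t.fileSize() <= maxSize && files + t.fileCount() <= maxFiles;
      if (fitsInRun) {
        size += t.fileSize();
        files += t.fileCount();
        continue;
      }
      // The open run cannot grow further; keep it if it covers 2+ tablets.
      if (start >= 0 && i - 1 > start) {
        ranges.add(new Range(start, i - 1));
      }
      // Begin a new run at this tablet if it is eligible on its own.
      if (eligible) {
        start = i;
        size = t.fileSize();
        files = t.fileCount();
      } else {
        start = -1;
        size = 0;
        files = 0;
      }
    }
    // Flush a run that reaches the last tablet (e.g. the default tablet).
    if (start >= 0 && tablets.size() - 1 > start) {
      ranges.add(new Range(start, tablets.size() - 1));
    }
    return ranges;
  }
}
```

Empty tablets (for example, freshly pre-split ones) always fit under the size cap, so in this model only a `NEVER` setting keeps them from being merged.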
   
   There is a new thread in the Manager class that performs this computation and 
submits FATE operations to merge the tablet ranges it finds. There is also a new FATE 
step in the merge operation that validates that the range is still OK to merge when the 
merge was submitted as a system merge.
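The system-merge validation matters because the range was computed earlier by the periodic task, and the tablets may have changed before the FATE operation runs. A minimal, hypothetical sketch of such a re-check follows; `SystemMergeCheckSketch`, `MergeSource`, and `shouldProceed` are illustrative names, not the actual FATE step, and the limits mirror the defaults described above.

```java
import java.util.List;

// Hypothetical sketch of re-validating a system merge before it runs.
public class SystemMergeCheckSketch {

  enum Mergeability { NEVER, ALWAYS }

  enum MergeSource { USER, SYSTEM }

  record Tablet(Mergeability mergeability, long fileSize, int fileCount) {}

  // Returns true if the merge should proceed given the tablets as they exist now.
  static boolean shouldProceed(MergeSource source, List<Tablet> currentTablets,
      long splitThreshold, int maxFiles) {
    // Explicit user merges are honored as requested; only system merges are re-checked.
    if (source == MergeSource.USER) {
      return true;
    }
    long size = 0;
    int files = 0;
    for (Tablet t : currentTablets) {
      // A tablet may have been marked NEVER after the range was found.
      if (t.mergeability() != Mergeability.ALWAYS) {
        return false;
      }
      size += t.fileSize();
      files += t.fileCount();
    }
    // Tablets may have grown since the scan; re-apply the same limits.
    return size <= splitThreshold / 4 && files <= maxFiles;
  }
}
```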
   
   This closes #5014
   
   I am opening this as a draft because, while it's feature complete, there is 
still more work to do on the tests and some tweaks to make. I wanted to get 
some feedback/suggestions first.
   
   #### Outstanding issues/notes/questions:
   
   1. **For user created tables, the default tablet was previously set to 
`never` merge, but I think we should change this to `always`, so I switched the 
default tablet to a mergeability of `always` in this PR.** The reason is that we 
can split the default tablet, so we should be able to merge other tablets back 
into it. It will never go away because its end row is null. Another reason I did 
this is that there is currently no way in the API to change the mergeability of the 
default tablet (its end row is null, which you can't put into a map as a key), so I 
figured we should just make it always mergeable, while any other splits would not 
be mergeable by default for user created tables. This would be easy to set 
back to `never` by default if desired, but if we keep the default tablet as `never`, 
then we can never merge back to a single tablet again.
   2. The default tables (metadata, fate, scanref) are configured with their 
initial tablet(s) set to a mergeability of `never`. Should we change 
this? I am thinking I probably want to switch the default tablet to be 
mergeable, just like I did for user tables, for consistency. However, if 
pre-splitting is ever done, those tablets would need to be marked 
as `never` merge so they don't get merged away due to having a size of 0.
   3. What thread pool should we use for `FindMergeableRangeTask`, and what 
interval should it run at by default? For now I added a TODO marker, used the default 
context scheduler, and set it to run every 24 hours.
   4. As mentioned, the tests could definitely be improved with more edge cases 
and complexity. `TabletMergeabilityIT` in particular only has 4 tests for now 
to demonstrate the feature, but there are many more tests we could write to verify 
that checks such as max files, sizes, etc. work and that we get the expected merges.
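Regarding the default tablet in item 1: assuming the API takes a sorted map keyed by end row (in Java, typically a `TreeMap` with natural ordering), a `null` end row cannot be stored as a key. The hypothetical sketch below uses `String` in place of the real row type and contrasts it with a `HashMap`, which accepts one `null` key but has no row ordering.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrates why a null end row (the default tablet) cannot be a key in a
// naturally ordered sorted map. String stands in for the real row type.
public class NullEndRowSketch {

  static boolean sortedMapAcceptsNullKey() {
    SortedMap<String, String> mergeabilityByEndRow = new TreeMap<>();
    try {
      // With natural ordering, TreeMap compares keys via compareTo, which fails on null.
      mergeabilityByEndRow.put(null, "always");
      return true;
    } catch (NullPointerException e) {
      return false;
    }
  }

  static boolean hashMapAcceptsNullKey() {
    // HashMap permits a single null key, but it has no row ordering.
    Map<String, String> unordered = new HashMap<>();
    unordered.put(null, "always");
    return unordered.containsKey(null);
  }
}
```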
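For item 3, here is one possible shape for the scheduling, sketched with plain `java.util.concurrent` APIs rather than the Manager's actual executor; `MergeScanScheduler` and its parameters are hypothetical. It guards against a subtle behavior of `ScheduledExecutorService`: if a periodic task throws, later executions are suppressed.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of scheduling the range-finding task; not the Manager's code.
public class MergeScanScheduler {

  static ScheduledFuture<?> schedule(ScheduledExecutorService scheduler, Runnable scan,
      long initialDelay, long interval, TimeUnit unit) {
    // A periodic task that throws would suppress all later runs, so catch and report.
    Runnable guarded = () -> {
      try {
        scan.run();
      } catch (RuntimeException e) {
        System.err.println("mergeable range scan failed: " + e);
      }
    };
    // Fixed delay measures the interval from the end of one run to the start of
    // the next, so a long scan is not followed by catch-up runs.
    return scheduler.scheduleWithFixedDelay(guarded, initialDelay, interval, unit);
  }
}
```

For example, `MergeScanScheduler.schedule(scheduler, task, 1, 24, TimeUnit.HOURS)` would run the first scan one hour after startup and then 24 hours after each scan completes.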
   
   

