Keith and I have been working on a solution for issue #1451 - being able to
run major compactions outside the tablet server. This would enable
compactions to run when tables are offline, tablet servers die, and tablets
are balancing. We have created two pull requests, one for the code[1]
changes and another for the documentation[2] changes.
This change introduces two new optional components in the architecture. The
CompactionCoordinator is much like the Manager in that it is a singleton in
the system and it manages the state of external compactions across the
system. The CompactionCoordinator is started with the command:
bin/accumulo compaction-coordinator
The Compactor is the other optional component. There can be many
Compactors running in the system and each Compactor runs one compaction at
a time. It communicates with the CompactionCoordinator to get information
about the next compaction that it needs to complete and to relay the status
of the compaction. The Compactor is started with the command:
bin/accumulo compactor -q <queueName>
The queueName parameter should match the name of the external queue set in
the compaction service options. This allows an administrator to define
different compaction services for tables, each with their own queue, and to
scale the number of Compactors differently. For example, we can define a
compaction service named cs1, then create a table and configure it to use
the compaction service:
config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"Q1"}]'
createtable test
config -t test -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
config -t test -s table.compaction.dispatcher.opts.service=cs1
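To repeat this setup for another service, queue, and table, the five
commands can be generated by a small helper. This is only a sketch: the
function name print_service_config and its arguments are made up for
illustration, and the property names are copied from the commands above.
It only prints Accumulo shell commands; it does not run them.

```shell
#!/bin/sh
# Print the Accumulo shell commands that define a compaction service backed
# by an external queue and point a table at that service.
# print_service_config and its arguments are illustrative, not part of Accumulo.
print_service_config() {
  service="$1"   # compaction service name, e.g. cs2
  queue="$2"     # external queue name, e.g. Q2
  table="$3"     # table to create and configure, e.g. test2
  prefix="tserver.compaction.major.service.$service.planner"

  printf 'config -s %s=%s\n' "$prefix" \
    "org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner"
  printf "config -s '%s.opts.executors=[{\"name\":\"all\",\"externalQueue\":\"%s\"}]'\n" \
    "$prefix" "$queue"
  printf 'createtable %s\n' "$table"
  printf 'config -t %s -s %s\n' "$table" \
    "table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher"
  printf 'config -t %s -s table.compaction.dispatcher.opts.service=%s\n' \
    "$table" "$service"
}

# Example: a second service cs2 that sends compactions for table test2 to Q2.
print_service_config cs2 Q2 test2
```

The printed lines can be pasted into the Accumulo shell. Compactors for the
second queue would then be started with -q Q2 and scaled independently of
those serving Q1.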
Once the CompactionCoordinator and at least one Compactor with queueName
"Q1" are running, compactions on table "test" will run externally.
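For a local, single-host trial, the two start commands can be wrapped in a
short script. This is a sketch under assumptions: ACCUMULO_BIN, QUEUE,
NUM_COMPACTORS, and DRY_RUN are names invented here, not Accumulo settings,
and on a real cluster the processes would normally be started on separate
hosts by whatever process manager is in use. By default the script only
prints the commands it would run.

```shell
#!/bin/sh
# Start one CompactionCoordinator and several Compactors for a single queue.
# All variable and function names here are illustrative, not Accumulo settings.
ACCUMULO_BIN="${ACCUMULO_BIN:-bin/accumulo}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = run them in the background

launch() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@" &
  fi
}

start_external_compaction() {
  queue="$1"   # must match the externalQueue set in the service options
  count="$2"   # number of Compactors to start for that queue

  launch "$ACCUMULO_BIN" compaction-coordinator
  i=1
  while [ "$i" -le "$count" ]; do
    launch "$ACCUMULO_BIN" compactor -q "$queue"
    i=$((i + 1))
  done
}

start_external_compaction "${QUEUE:-Q1}" "${NUM_COMPACTORS:-2}"
```

Because each Compactor runs one compaction at a time, NUM_COMPACTORS is
effectively the number of concurrent external compactions for that queue.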
With regard to testing, we have unit and integration tests.
ExternalCompactionIT has pretty decent coverage. We have also tested
locally with multiple Compactors using uno. We are hoping to perform a
cluster test soon, potentially deploying the Compactors using k8s and its
Horizontal Pod Autoscaler, for a follow-on blog post. Please let us know if you
are interested in helping out with testing.
[1] https://github.com/apache/accumulo/pull/2096
[2] https://github.com/apache/accumulo-website/pull/282