[PR] Adds support for multiple managers running distributed fate [accumulo]

via GitHub Tue, 03 Mar 2026 16:09:08 -0800


keith-turner opened a new pull request, #6168:
URL: https://github.com/apache/accumulo/pull/6168


   Lays the foundation for multiple managers with the following changes. The 
best place to start reviewing these changes is the Manager.run() method, which 
sets everything up and ties it all together.
   
    * Each manager process now acquires two ZooKeeper locks: a primary lock and 
an assistant lock. Only one manager process can obtain the primary lock, and the 
process that does assumes the role of primary manager. All manager processes 
acquire an assistant lock, which is similar to a tserver or compactor lock. 
The assistant lock advertises the manager process to other Accumulo processes 
as available to handle assistant manager operations.
    * Manager processes have a single thrift server, and the thrift services 
hosted on that server are categorized as either primary manager or assistant 
manager services. When an assistant manager receives an RPC for a primary 
manager thrift service, it does not execute the request; it either throws an 
error or ignores the request.
    * The primary manager process delegates manager responsibility via RPCs to 
assistant managers.
    * Any management responsibility not delegated runs on the primary manager.
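   The locking and RPC rules above can be sketched as a per-process role flag 
plus a category check on each incoming RPC. This is a minimal illustration 
under stated assumptions, not Accumulo's code: ManagerRoleSketch, 
ServiceCategory, NotPrimaryException, and checkAllowed are hypothetical names, 
ZooKeeper lock acquisition is not modeled (callbacks just flip a flag), and of 
the two behaviors described for misdirected RPCs, only throwing an error is 
shown.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ManagerRoleSketch {

  // Hypothetical categories for thrift services hosted on the single manager thrift server.
  enum ServiceCategory { PRIMARY, ASSISTANT }

  // Thrown when a primary-only service is invoked on a process that is not the primary.
  static class NotPrimaryException extends RuntimeException {
    NotPrimaryException(String service) {
      super(service + " can only be served by the primary manager");
    }
  }

  // Every manager holds an assistant lock; at most one also holds the primary lock.
  // Only the primary lock state is tracked here.
  private final AtomicBoolean holdsPrimaryLock = new AtomicBoolean(false);

  // Hypothetical callback: this process won the primary lock.
  void onPrimaryLockAcquired() {
    holdsPrimaryLock.set(true);
  }

  // Hypothetical callback: the primary lock was lost (e.g. the session expired).
  void onPrimaryLockLost() {
    holdsPrimaryLock.set(false);
  }

  boolean isPrimary() {
    return holdsPrimaryLock.get();
  }

  // Gate each RPC by category: assistant services run on every manager,
  // primary services only on the process holding the primary lock.
  void checkAllowed(String service, ServiceCategory category) {
    if (category == ServiceCategory.PRIMARY && !isPrimary()) {
      throw new NotPrimaryException(service);
    }
  }
}
```

Keeping one thrift server and checking the category per call means a process 
does not need to restart its server when it gains or loses the primary lock; 
only the flag changes.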
   
   With these changes, fate is now distributed across all manager processes. 
In the future, the same mechanism should make it easy to delegate other 
responsibilities to assistant managers. The following is an outline of the 
fate changes.
   
     * New FateWorker class. This runs in every manager and handles requests 
from the primary manager to adjust which range of the fate table it is 
currently responsible for. FateWorker implements a new thrift service used to 
assign it ranges.
     * New FateManager class, run by the primary manager, which is responsible 
for partitioning fate processing across all assistant managers. As manager 
processes come and go, it repartitions the fate table evenly across all 
available managers. The FateManager communicates with FateWorkers via thrift.
     * Some new RPCs for best-effort notifications. Before these changes, 
in-memory notification systems made the manager more responsive; for example, 
they allowed a fate operation to signal the Tablet Group Watcher to take action 
sooner. FateWorkerEnv now sends these notifications to the primary manager over 
a new RPC. It does not matter if they are lost; the work will still eventually 
happen.
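   Evenly partitioning the fate table across whichever managers are currently 
live can be sketched as splitting a key space into contiguous, near-equal 
ranges. This is an illustration only: it assumes fate ids are UUIDs mapped to 
the 128-bit space [0, 2^128) and that manager names are distinct, and 
FatePartitionSketch, KeyRange, partition, and ownerOf are hypothetical names 
rather than the FateManager or FateWorker API.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

public class FatePartitionSketch {

  // Assumption: fate ids are modeled as UUIDs, so the key space is [0, 2^128).
  static final BigInteger SPACE = BigInteger.ONE.shiftLeft(128);

  // Half-open range [start, end) of the key space owned by one manager.
  record KeyRange(BigInteger start, BigInteger end) {
    boolean contains(BigInteger key) {
      return key.compareTo(start) >= 0 && key.compareTo(end) < 0;
    }
  }

  // Maps a UUID to an unsigned 128-bit integer so it can be placed in a range.
  static BigInteger toKey(UUID id) {
    ByteBuffer buf = ByteBuffer.allocate(16);
    buf.putLong(id.getMostSignificantBits());
    buf.putLong(id.getLeastSignificantBits());
    return new BigInteger(1, buf.array());
  }

  // Splits the key space into one contiguous, near-equal range per live manager.
  // Sorting makes the result deterministic for a given set of managers; the
  // assignment is simply recomputed whenever a manager joins or leaves.
  static Map<String, KeyRange> partition(List<String> managers) {
    if (managers.isEmpty()) {
      throw new IllegalArgumentException("no managers available");
    }
    List<String> sorted = new ArrayList<>(managers);
    sorted.sort(Comparator.naturalOrder());
    BigInteger n = BigInteger.valueOf(sorted.size());
    Map<String, KeyRange> assignment = new TreeMap<>();
    for (int i = 0; i < sorted.size(); i++) {
      // Range i is [SPACE*i/n, SPACE*(i+1)/n); consecutive ranges share a boundary.
      BigInteger start = SPACE.multiply(BigInteger.valueOf(i)).divide(n);
      BigInteger end = SPACE.multiply(BigInteger.valueOf(i + 1)).divide(n);
      assignment.put(sorted.get(i), new KeyRange(start, end));
    }
    return assignment;
  }

  // Returns the manager whose range contains the given fate id.
  static String ownerOf(Map<String, KeyRange> assignment, UUID fateId) {
    BigInteger key = toKey(fateId);
    for (Map.Entry<String, KeyRange> e : assignment.entrySet()) {
      if (e.getValue().contains(key)) {
        return e.getKey();
      }
    }
    throw new IllegalStateException("key not covered: " + key);
  }
}
```

For example, partition(List.of("m3", "m1", "m2")) gives m1 the lowest third of 
the space and m3 the highest. This naive scheme can shift most range 
boundaries when membership changes; whether FateManager does anything to 
reduce that movement is not stated here.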
   
   Other than fate, the primary manager process does everything the current 
manager does.  This change pulls from #3262 and #6139.
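   The best-effort notification RPCs in the fate outline rest on one property: 
a notification only shortens the wait before work that a periodic pass would 
do anyway. A minimal sketch under that assumption follows; Watcher, 
awaitNextPass, and notifyBestEffort are hypothetical names, the RPC is stood 
in for by a Runnable, and nothing here reflects the actual FateWorkerEnv or 
Tablet Group Watcher code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BestEffortNotifySketch {

  // Hypothetical primary-manager side: a loop that runs on a fixed period but can be
  // woken early by a signal. A lost signal only delays work until the next periodic pass.
  static class Watcher {
    private final Object monitor = new Object();
    private boolean signaled = false;

    // Multiple signals before the next pass coalesce into one wakeup.
    void signal() {
      synchronized (monitor) {
        signaled = true;
        monitor.notifyAll();
      }
    }

    // Waits until signaled or until periodMillis elapses. Returns true if a signal
    // arrived (and consumes it), false on timeout or interrupt.
    boolean awaitNextPass(long periodMillis) {
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(periodMillis);
      synchronized (monitor) {
        try {
          while (!signaled) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
              return false;
            }
            TimeUnit.NANOSECONDS.timedWait(monitor, remaining);
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
        signaled = false;
        return true;
      }
    }
  }

  // Counts notifications that could not be delivered; they are tolerated, not retried.
  static final AtomicInteger dropped = new AtomicInteger();

  // Hypothetical assistant-manager side: attempt the RPC once and swallow failures,
  // since delivery is only an optimization.
  static void notifyBestEffort(Runnable rpc) {
    try {
      rpc.run();
    } catch (RuntimeException e) {
      dropped.incrementAndGet();
    }
  }
}
```

Because the watcher always runs again after the period elapses, a dropped 
notification costs at most one period of extra latency, which is why no retry 
or acknowledgement is needed.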

