dlmarion commented on PR #3262: URL: https://github.com/apache/accumulo/pull/3262#issuecomment-1829805334
> When the set of managers and tables are steady for a bit, all manager processes need to arrive at the same decisions for partitioning tables into buckets. With the algorithm in this method, different manager processes may see different counts for the same tables at different times and end up partitioning tables into different buckets. This could lead to overlap in the partitions or, in the worst case, a table that no manager processes. We could start with a deterministic hash partitioning of tables and open a follow-on issue to improve. One possible way to improve would be to have a single manager process run this algorithm and publish the partitioning information, with all other managers just using it.

> This would be a follow-on issue; thinking we could distribute the compaction coordinator by having it hash partition queue names among manager processes. The TGW (TabletGroupWatcher) could make an RPC to add a job to a remote queue. Compaction coordinators could hash the queue name to find the manager process to ask for work.

> We may need to make the EventCoordinator use the same partitioning as the TGW and send events to other manager processes via a new async RPC. Need to analyze the EventCoordinator; it may make sense to pull it into the TGW conceptually. Every manager would use its local TGW instance to signal events, and internally the TGW code would know how to route that in the cluster to other TGW instances.

I'm now concerned that this is going to be overly complex - lots of moving parts with the potential for multiple managers to claim ownership of the same object, or using some external process (ZK) to coordinate which Manager is responsible for a specific object. The Multiple Manager implementation in this PR is based off [this](https://cwiki.apache.org/confluence/display/ACCUMULO/Elasticity+Design+Notes+-+March+2023) design, which has multiple managers try to manage everything. I think there may be a simpler way, as we have already introduced a natural partitioning mechanism - resource groups.
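To make the "deterministic hash partitioning" idea concrete, here is a minimal sketch: given the same set of table ids and the same manager count, every process derives the identical bucketing, with no shared state needed. The class and method names (`TablePartitioner`, `bucketFor`, `partition`) are illustrative only, not Accumulo APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative sketch: deterministically assign table ids to manager "buckets"
// so that every manager, given the same inputs, computes the same partitioning.
public class TablePartitioner {

  // Map a table id to one of numManagers buckets using a stable hash.
  // Math.floorMod keeps the result non-negative even for negative hash codes.
  static int bucketFor(String tableId, int numManagers) {
    return Math.floorMod(tableId.hashCode(), numManagers);
  }

  // Group table ids into buckets. Any manager running this over the same
  // table set and manager count derives the identical grouping, so there is
  // no overlap between partitions and no table left unmanaged.
  static Map<Integer, List<String>> partition(List<String> tableIds, int numManagers) {
    return tableIds.stream()
        .collect(Collectors.groupingBy(id -> bucketFor(id, numManagers), TreeMap::new,
            Collectors.toList()));
  }
}
```

The same `bucketFor` style lookup could route a compaction queue name to the manager process responsible for it. The known weakness, as noted above, is that changing the manager count reshuffles most buckets, which is one reason to consider publishing the partitioning from a single process instead.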
I went back and looked in the wiki and you (@keith-turner) had a very similar idea at the bottom of [this](https://cwiki.apache.org/confluence/display/ACCUMULO/Implementing+multiple+managers+via+independant+distributed+services) page. So, instead of having a single set of Managers try to manage everything, you would have a single Manager manage tablets, compactions, and Fate for all of the tables that map to a specific resource group. We could continue to have the active/backup Manager feature that we have today, but per resource group. This also solves the Monitor problem. If we look at this using the `cluster.yaml` file, it would go from what we have today:

```yaml
manager:
  - localhost
monitor:
  - localhost
gc:
  - localhost
tserver:
  default:
    - localhost
  group1:
    - localhost
compactor:
  accumulo_meta:
    - localhost
  user_small:
    - localhost
  user_large:
    - localhost
sserver:
  default:
    - localhost
  group1:
    - localhost
```

to something like:

```yaml
default:
  manager:
    - localhost
  monitor:
    - localhost
  gc:
    - localhost
  tserver:
    - localhost
  compactor:
    accumulo_meta:
      - localhost
    user_small:
      - localhost
    user_large:
      - localhost
  sserver:
    default:
      - localhost
group1:
  manager:
    - localhost
  monitor:
    - localhost
  gc:
    - localhost
  tserver:
    - localhost
  compactor:
    accumulo_meta:
      - localhost
    user_small:
      - localhost
    user_large:
      - localhost
  sserver:
    default:
      - localhost
```
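Under the resource-group partitioning described above, routing becomes a simple two-step lookup: a table maps to a resource group, and each group has exactly one active Manager. A hypothetical sketch, where the two maps stand in for configuration/ZooKeeper state and all names (`GroupRouter`, `managerFor`, the addresses) are illustrative, not Accumulo APIs:

```java
import java.util.Map;

// Hypothetical sketch of resource-group routing: an object (table, compaction
// queue, Fate op) is handled by the active Manager of the resource group it
// maps to, so no coordination between Managers is needed to decide ownership.
public class GroupRouter {
  private final Map<String, String> tableToGroup;   // table id -> resource group
  private final Map<String, String> groupToManager; // resource group -> active manager address

  GroupRouter(Map<String, String> tableToGroup, Map<String, String> groupToManager) {
    this.tableToGroup = tableToGroup;
    this.groupToManager = groupToManager;
  }

  // The Manager responsible for a table is the active Manager of its group;
  // tables with no explicit mapping fall back to the "default" group.
  String managerFor(String tableId) {
    String group = tableToGroup.getOrDefault(tableId, "default");
    return groupToManager.get(group);
  }
}
```

Because each group has a single active Manager (with backups, as today), the "multiple managers claim the same object" problem goes away by construction rather than by a partitioning algorithm.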
