[ https://issues.apache.org/jira/browse/HBASE-25714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791761#comment-17791761 ]

Andrew Kyle Purtell edited comment on HBASE-25714 at 11/30/23 8:25 PM:
-----------------------------------------------------------------------

[~vjasani] [~niuyulin] A while back I spent a couple of days backporting these 
changes to a local tree based on branch-2. I got as far as subtask 4, decided 
it could be done when and if needed, and moved on to other work. The commits on 
the dev branch can serve as an implementation guide for redoing the work in the 
current master code base. If you are thinking about moving forward, the 
subtasks on this issue remain a good implementation plan. Many hunks in the 
patches apply even on branch-2, so they can serve as a guide as well, although 
they should not be applied directly. 

However, compatibility and rolling upgrade concerns are underspecified in the 
design doc and in the "implementation guide", as I will call this umbrella. 
Reading through the design doc both then and again now, I think some changes 
to the design should be considered:

- In the new design the regionserver, in order to initiate a compaction, makes 
a request to the master (_requestCompaction_), which then forwards the request 
to the compaction server. The compaction server, when complete, then sends the 
completion notice directly to the RS. This doesn't seem ideal. Why does the 
master need to be involved at all? The RS could contact one of the compaction 
servers directly and make the request. Or, ok, say we do decide the master 
should be involved in making region level decisions about compaction, which 
would be a new architecture, why does the completion notice not get sent back 
to the master? The new flow should be symmetric. The reason stated in the 
discussion on the design doc is that the master may not be available and does 
not need to be involved in the path of the completion notice. Exactly. My 
advice: Do not involve the master at all. Not even to track which compaction 
servers are available. We can use a znode in zookeeper to track the live set 
of compaction servers instead. Have the compaction servers register themselves 
as ephemeral znodes under {{/hbase/compactionServers}} or similar, and the 
regionservers can keep a watch on this and manage a local list of available 
compaction servers themselves (see the sketch just below this item).
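
A minimal sketch of that zookeeper-based tracking, for illustration only: it 
uses the raw ZooKeeper client rather than HBase's own ZK tracker classes, and 
the {{/hbase/compactionServers}} path and the class/method names are 
assumptions for the sketch, not anything in the patches.

{code:java}
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class CompactionServerTracker implements Watcher {

  // Persistent parent znode; assumed to be created once at cluster startup.
  private static final String CS_PARENT = "/hbase/compactionServers";

  private final ZooKeeper zk;
  private volatile List<String> liveCompactionServers;

  public CompactionServerTracker(ZooKeeper zk) {
    this.zk = zk;
  }

  /** Called by a compaction server at startup: register as an ephemeral znode. */
  public void registerSelf(String serverName) throws KeeperException, InterruptedException {
    // EPHEMERAL: the znode disappears automatically if this compaction server dies,
    // so the live set in zookeeper is self-maintaining.
    zk.create(CS_PARENT + "/" + serverName, new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
  }

  /** Called by a regionserver: read the current live set and re-arm the watch. */
  public List<String> refresh() throws KeeperException, InterruptedException {
    liveCompactionServers = zk.getChildren(CS_PARENT, this);
    return liveCompactionServers;
  }

  @Override
  public void process(WatchedEvent event) {
    // A compaction server joined or died; refresh the local list and re-set the watch.
    if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
      try {
        refresh();
      } catch (KeeperException | InterruptedException e) {
        // A real implementation would log and retry with backoff.
      }
    }
  }
}
{code}

The regionserver side then just picks a target from {{liveCompactionServers}} 
with whatever load-spreading policy we like, with no master round trip needed.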

- Coprocessor API changes should be backwards compatible. Introducing new 
abstract types in the middle of the hierarchy will cause compatibility breaks. 
Do we need these changes? The compaction server can provide 
_RegionCoprocessorHost_ and _RegionServerServices_ objects too. Many methods 
could simply throw _UnsupportedOperationException_ and this can be fine. 
Coprocessor implementors would need to adapt their logic based on whether or 
not an exception is thrown. They can interact with the region through a 
_Connection_ instead of _Region_/_HRegion_ directly. And we can add a 
convenience marker interface so coprocessor developers can check whether they 
are operating in a regionserver or in a compaction server at init time (see 
the sketch just below this item). I think this can be done in a compatible 
way. That said, downstream coprocessor users, like Apache Phoenix, will need 
to update their designs to account for potential compaction offload. They will 
not be able to rely on shared state with the regionserver process, because it 
will be a separate process. 
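
A rough sketch of what the compatible coprocessor side could look like. The 
{{CompactionServerEnvironment}} marker interface is hypothetical; the rest is 
the existing 2.x coprocessor API.

{code:java}
import java.io.IOException;
import java.util.Optional;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.RegionObserver;

public class OffloadAwareObserver implements RegionCoprocessor, RegionObserver {

  /** Hypothetical marker interface a compaction server's environment would implement. */
  public interface CompactionServerEnvironment { }

  private boolean runningInCompactionServer;
  private Connection connection;

  @Override
  public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this);
  }

  @Override
  public void start(CoprocessorEnvironment env) throws IOException {
    // Check at init time which kind of host we are running in.
    runningInCompactionServer = env instanceof CompactionServerEnvironment;
    if (env instanceof RegionCoprocessorEnvironment) {
      // Talk to the region through a Connection rather than Region/HRegion directly,
      // so the same logic works whether or not this process hosts the region.
      connection = ((RegionCoprocessorEnvironment) env).createConnection(env.getConfiguration());
    }
  }

  /** Example of adapting when the host does not support a regionserver-only call. */
  boolean hostSupportsOnlineRegions(RegionCoprocessorEnvironment env) {
    try {
      env.getOnlineRegions();
      return true;
    } catch (UnsupportedOperationException e) {
      // Offloaded host: fall back to Connection-based access.
      return false;
    }
  }
}
{code}

Downstreams like Phoenix would keep the {{Connection}} path for anything that 
has to work when the hook fires in the compaction server process.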

- Only when a cluster upgrade is complete should compaction offload be enabled. 
The design doc and implementation plan should be updated with relevant 
discussion. We can assume during an upgrade scenario that the new compaction 
servers will be deployed alongside the existing regionservers and masters, but 
compaction offload will not yet be enabled. After the cluster is fully 
upgraded, compaction offload can then be enabled. This can be done with an 
hbase-site configuration upgrade and rolling restart of the regionservers, or 
by a new procedure, or by a new admin API that updates some switch in 
zookeeper... lots of possibilities, but the upgrade procedure should be 
documented. The current design doc briefly mentions switching offload on and 
off by way of a switch in zookeeper, implying the addition of a new admin API 
for compaction offload enable/disable, and also briefly mentions supporting 
per-table opt in via a table attribute (see the sketch just below this item). 
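
For the per-table opt in and the site-level switch, here is a sketch of the 
operator/client side, assuming a hypothetical {{hbase.compaction.offload.enabled}} 
site property and a hypothetical {{COMPACTION_OFFLOAD_ENABLED}} table 
attribute; only the Admin and TableDescriptorBuilder calls are existing API.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CompactionOffloadOptIn {
  public static void main(String[] args) throws Exception {
    // Hypothetical cluster-wide switch; in practice set in hbase-site.xml and picked up
    // by the regionservers on rolling restart, only after the upgrade is complete.
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean("hbase.compaction.offload.enabled", true);

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("example_table");
      TableDescriptor current = admin.getDescriptor(table);
      // Hypothetical per-table attribute for opt in, as briefly mentioned in the design doc.
      TableDescriptor updated = TableDescriptorBuilder.newBuilder(current)
          .setValue("COMPACTION_OFFLOAD_ENABLED", "true")
          .build();
      admin.modifyTable(updated);
    }
  }
}
{code}

Whatever the final shape, the names above are placeholders; the point is that 
both the cluster switch and the table attribute are cheap to flip after the 
upgrade, and easy to document as part of the upgrade procedure.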



> Offload the compaction job to independent Compaction Server
> -----------------------------------------------------------
>
>                 Key: HBASE-25714
>                 URL: https://issues.apache.org/jira/browse/HBASE-25714
>             Project: HBase
>          Issue Type: Umbrella
>            Reporter: Yulin Niu
>            Assignee: Yulin Niu
>            Priority: Major
>         Attachments: CoprocessorSupport1.png, CoprocessorSupport2.png
>
>
> The basic idea is to add a role "CompactionServer" to take the compaction 
> job. HMaster is responsible for scheduling compaction jobs to different 
> CompactionServers.
> [design doc|https://docs.google.com/document/d/1exmhQpQArAgnryLaV78K3260rKm64BHBNzZE4VdTz0c/edit?usp=sharing]
> Suggestions are welcome. Thanks.


