[PR] PHOENIX-7370 Server to server system table RPC calls should use separate RPC handler pool [phoenix]

via GitHub Fri, 17 Oct 2025 11:55:38 -0700


virajjasani opened a new pull request, #2301:
URL: https://github.com/apache/phoenix/pull/2301

Jira: PHOENIX-7370

HBase uses an RPC (Remote Procedure Call) framework for all wire
communication among its components: client to server (client to master
daemon or client to regionservers) as well as server to server (master to
regionserver, regionserver to regionserver). HBase RPC uses Google's
Protocol Buffers (protobuf) to define the structure of messages sent
between clients and servers. Protocol Buffers allow efficient
serialization and deserialization of data, which is crucial for performance.
HBase also defines its service interfaces using Protocol Buffers. These
interfaces outline the operations, such as get, put and scan, that clients
can request from HBase servers.

HBase also provides Coprocessors, which extend the functionality of the
regionservers. They allow custom code to execute within the context of the
regionserver during specific phases of a workflow, such as data reads
(preScan, postScan etc.), writes (preBatchMutate, postBatchMutate etc.),
region splits, or the start or end of regionserver operations. In addition
to being a SQL query engine, Phoenix is also a Coprocessor component. The
protobuf-based RPC framework also defines how clients communicate with the
coprocessor endpoints running on the regionservers.

The Phoenix client creates a CQSI connection (ConnectionQueryServices), which
maintains a long-lived TCP connection with the HBase server, usually known as
HConnection or HBase Connection. Once the connection is created, it is cached
by the Phoenix client.

While PHOENIX-6066 is considered the correct fix to improve query
performance, releasing it has surfaced other issues related to the RPC
framework. One of them caused a deadlock on the regionserver serving
SYSTEM.CATALOG: it could not make any more progress because all handler
threads serving RPC calls for Phoenix system tables (thread pool:
RpcServer.Metadata.Fifo.handler) were exhausted while creating a server side
connection from that regionserver.

Several workflows in the MetaDataEndpointImpl coproc require a Phoenix
connection, which is usually a CQSI connection. Phoenix differentiates CQSI
connections initiated by clients from those initiated by servers by using a
property: IS_SERVER_CONNECTION. For CQSI connections created by servers,
IS_SERVER_CONNECTION is set to true.

Under heavy load, when several clients execute getTable() calls for the same
base table simultaneously, the MetaDataEndpointImpl coproc first attempts to
create a server side CQSI connection. Because CQSI initialization depends on
a Phoenix system table existence check as well as client to server version
compatibility checks, it also performs a MetaDataEndpointImpl#getVersion()
RPC call, which is meant to be served by the RpcServer.Metadata.Fifo.handler
thread pool. However, under heavy load, the thread pool can be completely
occupied if all getTable() calls try to initiate a CQSI connection, whereas
only a single thread can take the global CQSI lock to initiate the HBase
Connection before caching the CQSI connection for other threads to use. The
getVersion() call issued by that thread can then find no free handler to
serve it, which has the potential to create a deadlock.
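
The handler exhaustion described above can be reproduced in miniature with
plain java.util.concurrent executors. The sketch below is only an
illustration under simplifying assumptions, not Phoenix or HBase code: the
class name HandlerPoolSketch, the method run() and the pool variables are
hypothetical, the fixed thread pools stand in for the RPC handler pools, and
the global CQSI lock is left out. Every outer task plays the role of a
getTable() call that issues a nested getVersion()-style call and blocks on
the result. When the nested call is queued on the same pool, no handler is
left to serve it. When it is routed to a separate pool, it completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerPoolSketch {

    /**
     * Runs `handlers` outer requests on a pool of `handlers` threads. Each
     * outer request submits one nested request and blocks until it returns.
     * Returns true if all outer requests finished within the timeout.
     */
    static boolean run(int handlers, boolean separateServerPool) throws Exception {
        // Stand-in for the metadata handler pool (hypothetical name).
        ExecutorService metadataPool = Executors.newFixedThreadPool(handlers);
        // Nested calls share the metadata pool unless a separate pool is requested.
        ExecutorService nestedPool = separateServerPool
                ? Executors.newFixedThreadPool(handlers)
                : metadataPool;
        // Ensures every handler thread is occupied before any nested call is made.
        CountDownLatch allHandlersBusy = new CountDownLatch(handlers);

        List<Future<String>> outerCalls = new ArrayList<>();
        for (int i = 0; i < handlers; i++) {
            outerCalls.add(metadataPool.submit(() -> {
                allHandlersBusy.countDown();
                allHandlersBusy.await();
                // Nested getVersion()-style call issued from a handler thread.
                Future<String> version = nestedPool.submit(() -> "version-ok");
                // The handler thread stays blocked until the nested call is served.
                return version.get();
            }));
        }

        boolean completed = true;
        try {
            for (Future<String> call : outerCalls) {
                call.get(1, TimeUnit.SECONDS);
            }
        } catch (TimeoutException e) {
            // No handler was free to serve the nested calls.
            completed = false;
        } finally {
            // Interrupt any blocked handler threads so the JVM can exit.
            metadataPool.shutdownNow();
            nestedPool.shutdownNow();
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared pool completed:   " + run(4, false));
        System.out.println("separate pool completed: " + run(4, true));
    }
}
```

With the shared pool, the outer calls time out because every thread is
waiting on a nested task that is still sitting in the queue. With the
separate pool, the nested tasks are served independently and all outer calls
return, which is the idea behind routing server to server system table RPC
calls to their own handler pool.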

Doc on HBase/Phoenix RPC Scheduler Framework:
https://docs.google.com/document/d/12SzcAY3mJVsN0naMnq45qsHcUIk1CzHsAI0EOi6IIgg/edit?usp=sharing

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
