[jira] [Updated] (PHOENIX-7370) Server to server system table RPC calls should use separate RPC handler pool

Viraj Jasani (Jira) Sat, 27 Jul 2024 12:41:04 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Viraj Jasani updated PHOENIX-7370:
----------------------------------
    Priority: Critical  (was: Major)

> Server to server system table RPC calls should use separate RPC handler pool
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-7370
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7370
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.2.0
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Critical
>
> HBase uses an RPC (Remote Procedure Call) framework for all wire 
> communication among its components, e.g. client to server (client to master 
> daemon or client to regionservers) as well as server to server (master to 
> regionserver, regionserver to regionserver) communication. HBase RPC uses 
> Google's Protocol Buffers (protobuf) for defining the structure of messages 
> sent between clients and servers. Protocol Buffers allow efficient 
> serialization and deserialization of data, which is crucial for performance. 
> HBase defines service interfaces using Protocol Buffers, which outline the 
> operations that clients can request from HBase servers. These interfaces 
> define methods like get, put, scan, etc., that clients use to interact with 
> the database.
> HBase also provides Coprocessors, which extend regionserver functionality. 
> They allow custom code to execute within the context of the regionserver 
> during specific phases of a given workflow, such as data reads (preScan, 
> postScan etc), writes (preBatchMutate, postBatchMutate etc), region splits, 
> or the start or end of regionserver operations. In addition to being a SQL 
> query engine, Phoenix is also a Coprocessor component. The Protobuf-based RPC 
> framework also defines how coprocessor endpoints communicate between clients 
> and the coprocessors running on the regionservers.
> The Phoenix client creates a CQSI connection ({{ConnectionQueryServices}}), 
> which maintains a long-lived TCP connection with the HBase server, usually 
> known as {{HConnection}} or HBase Connection. Once the connection is created, 
> it is cached by the Phoenix client.
> While PHOENIX-6066 is considered the correct fix to improve query 
> performance, releasing it has surfaced other issues related to the RPC 
> framework. One of them caused a deadlock on the regionserver serving 
> SYSTEM.CATALOG: it could not make any more progress because all handler 
> threads serving RPC calls for Phoenix system tables (thread pool: 
> {{RpcServer.Metadata.Fifo.handler}}) were exhausted while creating a 
> server-side connection from that regionserver.
> Several workflows in the MetaDataEndpointImpl coproc require a Phoenix 
> connection, which is usually a CQSI connection. Phoenix differentiates CQSI 
> connections initiated by clients from those initiated by servers using the 
> property {{IS_SERVER_CONNECTION}}.
> For CQSI connections created by servers, IS_SERVER_CONNECTION is set to true.
> Under heavy load, when several clients execute getTable() calls for the same 
> base table simultaneously, the MetaDataEndpointImpl coproc first attempts to 
> create a server-side CQSI connection. Because CQSI initialization also depends 
> on the Phoenix system table existence check and the client-to-server version 
> compatibility check, it performs a MetaDataEndpointImpl#getVersion() RPC 
> call, which is meant to be served by the RpcServer.Metadata.Fifo.handler 
> thread-pool. However, under heavy load, the thread-pool can be completely 
> occupied if all getTable() calls try to initiate a CQSI connection, whereas 
> only a single thread can take the global CQSI lock to initiate the HBase 
> Connection before caching the CQSI connection for other threads to use. The 
> getVersion() call then has no free handler to run on, which can create a 
> deadlock.
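The starvation pattern described above can be reproduced outside HBase with a plain fixed-size thread pool. The following is a minimal sketch, not Phoenix or HBase code: the class name, method name, pool sizes and timeouts are all made up for illustration. Every "handler" issues a nested call and waits for the reply; when the nested call is queued on the same pool, no handler is free to serve it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: a plain executor stands in for an RPC handler pool.
class HandlerPoolStarvationSketch {

    /**
     * Occupies every handler with an outer call that makes a nested call and
     * waits up to timeoutMs for its reply. Returns how many nested calls
     * were answered in time.
     */
    static int completedNestedCalls(boolean separateNestedPool, int handlers, long timeoutMs) {
        ExecutorService metadataPool = Executors.newFixedThreadPool(handlers);
        ExecutorService nestedPool = separateNestedPool
                ? Executors.newFixedThreadPool(handlers)
                : metadataPool;
        CountDownLatch allStarted = new CountDownLatch(handlers);
        CountDownLatch allDecided = new CountDownLatch(handlers);
        try {
            List<Future<Boolean>> outerCalls = new ArrayList<>();
            for (int i = 0; i < handlers; i++) {
                outerCalls.add(metadataPool.submit(() -> {
                    // Wait until every handler thread is busy with an outer call.
                    allStarted.countDown();
                    allStarted.await();
                    // Nested call, analogous to the getVersion() RPC in the description.
                    Future<String> nested = nestedPool.submit(() -> "version");
                    boolean gotReply;
                    try {
                        nested.get(timeoutMs, TimeUnit.MILLISECONDS);
                        gotReply = true;
                    } catch (TimeoutException e) {
                        gotReply = false; // a real handler would block here indefinitely
                    }
                    // Hold the handler until all outer calls have their result,
                    // so no thread is released early and skews the outcome.
                    allDecided.countDown();
                    allDecided.await();
                    return gotReply;
                }));
            }
            int completed = 0;
            for (Future<Boolean> call : outerCalls) {
                if (call.get()) {
                    completed++;
                }
            }
            return completed;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            metadataPool.shutdownNow();
            nestedPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool:   "
                + completedNestedCalls(false, 4, 500) + " of 4 nested calls answered");
        System.out.println("separate pool: "
                + completedNestedCalls(true, 4, 500) + " of 4 nested calls answered");
    }
}
```

With a shared pool no nested call can be answered, because every thread is held by an outer call. With a separate pool the nested calls have their own threads and are expected to return promptly. The timeout exists only so the sketch terminates.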
> h3. Solutions:
>  * Phoenix server to server system table RPC calls are supposed to use 
> separate handler thread-pools (PHOENIX-6687). However, this does not work 
> correctly because, regardless of whether the HBase Connection is initiated by 
> a client or a server, Phoenix only provides ClientRpcControllerFactory by 
> default. We need to provide a separate RpcControllerFactory during HBase 
> Connection initialization done by Coprocessors that operate on regionservers.
>  * When a Phoenix server creates a CQSI connection, we do not need to check 
> for the existence of system tables or for client-server version compatibility. 
> This redundant RPC call can be avoided.
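The two proposals above can be sketched as a small configuration helper. This is a rough illustration using only java.util.Properties, not the actual patch. The key `hbase.rpc.controllerfactory.class` is, to my knowledge, the HBase setting for a custom RpcControllerFactory. The `example.is.server.connection` key, the `ExampleServerSideRpcControllerFactory` name and both helper methods are hypothetical placeholders; only `ClientRpcControllerFactory` is named in the issue, and it appears here by simple name rather than its fully qualified class name.

```java
import java.util.Properties;

// Illustrative only: shows the decision logic, not Phoenix's real configuration code.
class ServerConnectionConfigSketch {

    // HBase configuration key for a custom RpcControllerFactory (believed accurate).
    static final String CONTROLLER_FACTORY_KEY = "hbase.rpc.controllerfactory.class";

    // Placeholder key standing in for Phoenix's IS_SERVER_CONNECTION property.
    static final String IS_SERVER_CONNECTION = "example.is.server.connection";

    // Default factory named in the issue (simple name only).
    static final String CLIENT_FACTORY = "ClientRpcControllerFactory";

    // Hypothetical factory that would route calls to a separate handler pool.
    static final String SERVER_FACTORY = "ExampleServerSideRpcControllerFactory";

    /** Builds connection properties, picking the factory by connection origin. */
    static Properties connectionProps(boolean isServerConnection) {
        Properties props = new Properties();
        props.setProperty(IS_SERVER_CONNECTION, Boolean.toString(isServerConnection));
        props.setProperty(CONTROLLER_FACTORY_KEY,
                isServerConnection ? SERVER_FACTORY : CLIENT_FACTORY);
        return props;
    }

    /**
     * Client connections run the system table existence and version
     * compatibility checks; server connections skip them.
     */
    static boolean needsStartupChecks(Properties props) {
        return !Boolean.parseBoolean(props.getProperty(IS_SERVER_CONNECTION, "false"));
    }

    public static void main(String[] args) {
        Properties server = connectionProps(true);
        Properties client = connectionProps(false);
        System.out.println("server factory: " + server.getProperty(CONTROLLER_FACTORY_KEY)
                + ", startup checks: " + needsStartupChecks(server));
        System.out.println("client factory: " + client.getProperty(CONTROLLER_FACTORY_KEY)
                + ", startup checks: " + needsStartupChecks(client));
    }
}
```

Keying both decisions off the same flag means a coprocessor-initiated connection never issues the getVersion() call and never shares the client controller factory.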
>  
> Doc on HBase/Phoenix RPC Scheduler Framework: 
> https://docs.google.com/document/d/12SzcAY3mJVsN0naMnq45qsHcUIk1CzHsAI0EOi6IIgg/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
