Viraj Jasani created PHOENIX-7389:
-------------------------------------

             Summary: Phoenix metadata updates should fail-fast for noisy neighbor
                 Key: PHOENIX-7389
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7389
             Project: Phoenix
          Issue Type: Improvement
    Affects Versions: 5.1.3, 5.2.0
            Reporter: Viraj Jasani

Phoenix is a high-scale, low-latency, high-throughput multi-tenant database. Multi-tenancy comes with its own set of challenges, one of which is the noisy neighbor problem. A single client can initiate a very high number of tenant view updates (e.g. drop view, create view, create index) while all other clients are making RPC calls to SYSTEM.CATALOG to retrieve the updated PTable objects. As the number of metadata update calls grows, more RPC calls can get stuck waiting to acquire an HBase RowLock. We have also seen high memory pressure as the number of metadata update calls increases.

By default, HBase RowLock acquisition times out after 30 s, which is configurable via hbase.rowlock.wait.duration. This setting applies at the cluster level, but Phoenix metadata RPC calls are expected to use a much lower timeout for RowLock acquisition, because metadata updates and reads are expected to be extremely low-latency operations. Otherwise, we are essentially either starving some clients of the RPC handlers they need to execute the getTable RPC call or causing significant delays for ongoing getTable calls.

While HBASE-28797 proposes a new Region API for acquiring a RowLock, Phoenix already has its own RowLock implementation, and it is already used by getTable RPC calls to protect metadata server-side cache updates (PHOENIX-7363).

This Jira proposes to stop using HBase RowLock for all Phoenix metadata operations and to use Phoenix RowLock instead, with a default timeout of 3 sec.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
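
To make the fail-fast idea above concrete, here is a minimal Java sketch of a per-row lock with a bounded wait. It is an illustration only and is not Phoenix's or HBase's actual RowLock code: the class name TimedRowLocks and the methods tryLockRow and unlockRow are hypothetical, and the 3000 ms value simply mirrors the 3 sec default proposed in this issue. It relies only on java.util.concurrent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: per-row locks whose acquisition gives up after a timeout.
class TimedRowLocks {
    // One lock object per row key; entries are never evicted in this sketch.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final long timeoutMs;

    TimedRowLocks(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Returns true if the row lock was acquired within timeoutMs.
    // Returns false on timeout or interruption, so the caller can fail the
    // request quickly instead of holding an RPC handler for a long wait.
    boolean tryLockRow(String rowKey) {
        ReentrantLock lock = locks.computeIfAbsent(rowKey, k -> new ReentrantLock());
        try {
            return lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    // Must be called by the thread that acquired the lock via tryLockRow.
    void unlockRow(String rowKey) {
        locks.get(rowKey).unlock();
    }

    public static void main(String[] args) {
        TimedRowLocks rowLocks = new TimedRowLocks(3000); // assumed 3 s default
        String rowKey = "tenant1.view1"; // made-up row key for the demo
        if (!rowLocks.tryLockRow(rowKey)) {
            throw new IllegalStateException("Timed out waiting for row lock: " + rowKey);
        }
        try {
            // A metadata mutation would run here while the row is locked.
        } finally {
            rowLocks.unlockRow(rowKey);
        }
    }
}
```

With a short timeout, a burst of metadata updates from one client makes competing calls on the same row fail after a bounded wait, while calls on other rows are unaffected.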