[
https://issues.apache.org/jira/browse/PHOENIX-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani resolved PHOENIX-7233.
-----------------------------------
Resolution: Won't Fix
With HBASE-28428 resolved, we should no longer need PHOENIX-7233.
> CQSI openConnection should timeout to unblock other connection threads
> ----------------------------------------------------------------------
>
> Key: PHOENIX-7233
> URL: https://issues.apache.org/jira/browse/PHOENIX-7233
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.1.3
> Reporter: Viraj Jasani
> Priority: Major
>
> PhoenixDriver initializes ConnectionQueryServices objects and caches them in
> connectionQueryServicesCache. As part of CQSI initialization, a connection
> to the HBase server is opened through the ConnectionFactory provided by the
> HBase client, which returns a Connection object to the client. This
> Connection object lets clients share the Zookeeper connection, the meta
> cache, and the remote connections to the regionservers and master daemons.
> It is used to perform table CRUD operations as well as administrative
> actions on the cluster.
> Initializing the HBase Connection object requires the ClusterId, which is
> maintained in Zookeeper, in the Master daemons, or in both. The client
> retrieves it from one or the other depending on whether it is configured to
> use ZKConnectionRegistry or MasterRegistry/RpcConnectionRegistry.
> With ZKConnectionRegistry, we ran into an edge case in which the connection
> to the Zookeeper server was stuck for more than 12 hours. While the client
> was creating a connection to the Zookeeper quorum to retrieve the ClusterId,
> the Zookeeper leader switched from one server to another. Why the leader
> switch left the connection stuck still requires an RCA, but in any case the
> Phoenix/HBase client should not wait indefinitely for a response from
> Zookeeper without any connection timeout.
> On the Phoenix client, if one thread is stuck opening a connection during
> CQSI#init, every other thread trying to create a connection also gets stuck,
> because we take a class-level lock before opening the connection. This can
> degrade or even terminate the client JVM.
> The HBase client should also use a timeout, but the lack of a timeout on the
> Phoenix client side has far worse consequences. As part of this Jira, we
> should introduce a way for CQSI#openConnection to time out, either by using
> the CompletableFuture API or by using our preconfigured thread pool.
>
> Stacktrace for reference:
>
> {code:java}
> jdk.internal.misc.Unsafe.park
> java.util.concurrent.locks.LockSupport.park
> java.util.concurrent.CompletableFuture$Signaller.block
> java.util.concurrent.ForkJoinPool.managedBlock
> java.util.concurrent.CompletableFuture.waitingGet
> java.util.concurrent.CompletableFuture.get
> org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId
> org.apache.hadoop.hbase.client.ConnectionImplementation.<init>
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance?
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance
> jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance
> java.lang.reflect.Constructor.newInstance
> org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$?
> org.apache.hadoop.hbase.client.ConnectionFactory$$Lambda$?.run
> java.security.AccessController.doPrivileged
> javax.security.auth.Subject.doAs
> org.apache.hadoop.security.UserGroupInformation.doAs
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection
> org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection
> org.apache.phoenix.query.ConnectionQueryServicesImpl.access$?
> org.apache.phoenix.query.ConnectionQueryServicesImpl$?.call
> org.apache.phoenix.query.ConnectionQueryServicesImpl$?.call
> org.apache.phoenix.util.PhoenixContextExecutor.call
> org.apache.phoenix.query.ConnectionQueryServicesImpl.init
> org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices
> org.apache.phoenix.jdbc.HighAvailabilityGroup.connectToOneCluster
> org.apache.phoenix.jdbc.ParallelPhoenixConnection.getConnection
> org.apache.phoenix.jdbc.ParallelPhoenixConnection.lambda$new$?
> org.apache.phoenix.jdbc.ParallelPhoenixConnection$$Lambda$?.get
> org.apache.phoenix.jdbc.ParallelPhoenixContext.lambda$chainOnConnClusterContext$?
> org.apache.phoenix.jdbc.ParallelPhoenixContext$$Lambda$?.apply
> {code}
>
>
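For illustration only, the timeout proposed in the description could be sketched roughly as below. This is not Phoenix code: the class name OpenConnectionTimeoutSketch, the method openWithTimeout, and the thread name are hypothetical, and the Callable stands in for whatever actually opens the connection (in the stack trace above, the call to ConnectionFactory.createConnection). The issue itself was closed as Won't Fix once HBASE-28428 was resolved, so treat this as a sketch of the idea under those assumptions rather than as the fix that was applied.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names exist in Phoenix.
public class OpenConnectionTimeoutSketch {

    /**
     * Runs the (possibly blocking) opener on the given executor and waits
     * at most timeoutMs for it, so the caller is never blocked indefinitely.
     */
    public static <T> T openWithTimeout(Callable<T> opener, ExecutorService executor,
                                        long timeoutMs) throws Exception {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(() -> {
            try {
                return opener.call();
            } catch (Exception e) {
                // Suppliers cannot throw checked exceptions, so wrap them.
                throw new CompletionException(e);
            }
        }, executor);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Marks the future as cancelled. CompletableFuture.cancel does not
            // interrupt the worker, so a truly stuck call keeps its pool thread.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Surface the opener's own exception where possible.
            Throwable cause = e.getCause();
            throw (cause instanceof Exception) ? (Exception) cause : e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Daemon threads so a stuck opener cannot keep the JVM alive.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "open-connection-sketch");
            t.setDaemon(true);
            return t;
        });

        // An opener that returns promptly.
        System.out.println(openWithTimeout(() -> "connected", pool, 1000L));

        // An opener that hangs, standing in for the stuck ClusterId lookup.
        try {
            openWithTimeout(() -> {
                Thread.sleep(60_000L);
                return "never";
            }, pool, 200L);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
        pool.shutdownNow();
    }
}
```

A caller holding the class-level lock would get a TimeoutException after the configured wait and could release the lock, which unblocks the other connection threads. The stuck call itself still occupies a pool thread until the underlying client returns or is interrupted, which is why a timeout on the HBase side is the more complete remedy.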
--
This message was sent by Atlassian Jira
(v8.20.10#820010)