[
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823491#comment-17823491
]
Daniel Wong commented on PHOENIX-7253:
--------------------------------------
I found some of my notes:
The Phoenix code doing this is a hack given HBase limitations... in 1.x there
was no easy access to the HBase client's metadata cache, and this method was
used as a workaround to get only the cached values, as opposed to directly
calling {{HConnection.getAllRegionLocations}}.
This code path is called from {{getAllTableRegions}}. It is used when the
Phoenix client wants or needs to update the region info for an HBase table,
which can happen during scan generation or when a split/merge is detected. The
method calls {{HConnection.getRegionLocation}} iteratively, one region at a
time, to find the next region. This is done so that the lookups go through the
cache in the HConnection itself, rather than {{HTable.getAllRegionLocations}},
which in the 1.x codebase appears to cause a meta scan and a re-cache of all
the table's regions on the HBase client. Where this can become an issue is
when a region splits: many scans may simultaneously call this method,
overloading the client threads as well as possibly meta. Some protection so
that a client only calls this from a single thread per table would greatly
improve this flow (see the per-table guard sketch further down). Whether that
belongs inside HBase itself or in Phoenix is debatable, but Phoenix intends to
use the HBase client cache rather than store its own cache of region
boundaries.
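For reference, the iterative lookup described above looks roughly like the
following. This is a minimal sketch assuming the 1.x {{HConnection}} API; the
class and method names are illustrative, not the actual Phoenix code:
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.util.Bytes;

public class CachedRegionWalk {

    /**
     * Walks the table's regions one at a time via HConnection.getRegionLocation
     * with reload=false, so each lookup is served from the connection's own
     * region cache when possible instead of scanning meta for every region.
     */
    static List<HRegionLocation> getAllTableRegions(HConnection connection,
            byte[] tableName) throws IOException {
        List<HRegionLocation> locations = new ArrayList<>();
        byte[] currentKey = HConstants.EMPTY_START_ROW;
        do {
            // reload=false: prefer the cached location; meta is consulted only
            // when the cache has no entry covering currentKey.
            HRegionLocation location = connection.getRegionLocation(
                    TableName.valueOf(tableName), currentKey, false);
            locations.add(location);
            // Jump to the next region by using this region's end key.
            currentKey = location.getRegionInfo().getEndKey();
        } while (!Bytes.equals(currentKey, HConstants.EMPTY_END_ROW));
        return locations;
    }
}
{code}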
I haven't had time to explore the 2.x codebase, but much of this has changed
and been replaced by the async region locator flow here
[https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableRegionLocatorImpl.java#L59]
which again appears to call and then cache, rather than use the cache where
available... some trickiness here.
What the code originally wanted/intended was to get all the regions from the
HBase client cache and only renew the cache on a detected split/merge, IIRC.
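To illustrate the single-thread-per-table protection suggested above, here is a
hypothetical sketch; none of these class or method names exist in Phoenix or
HBase, and the loader would wrap whatever actually fetches the regions (e.g.
the cached walk sketched earlier):
{code:java}
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;

// Hypothetical per-table guard: at most one thread performs the expensive
// region-info refresh for a given table; concurrent callers block briefly and
// then reuse the result produced by the winning thread.
public class PerTableRegionRefresh {

    // One lock per table, created lazily.
    private final ConcurrentHashMap<TableName, ReentrantLock> locks =
            new ConcurrentHashMap<>();
    // Most recent refresh result per table.
    private final ConcurrentHashMap<TableName, List<HRegionLocation>> lastResult =
            new ConcurrentHashMap<>();

    // Abstraction over whatever actually fetches the region locations.
    interface RegionLoader {
        List<HRegionLocation> load(TableName table) throws Exception;
    }

    List<HRegionLocation> refresh(TableName table, RegionLoader loader)
            throws Exception {
        ReentrantLock lock = locks.computeIfAbsent(table, t -> new ReentrantLock());
        if (lock.tryLock()) {
            // This thread won the race: do the actual refresh.
            try {
                List<HRegionLocation> fresh = loader.load(table);
                lastResult.put(table, fresh);
                return fresh;
            } finally {
                lock.unlock();
            }
        }
        // Another thread is already refreshing this table: wait for it to
        // finish, then reuse its result instead of hitting meta again.
        lock.lock();
        try {
            return lastResult.get(table);
        } finally {
            lock.unlock();
        }
    }
}
{code}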
> Perf improvement for non-full scan queries on large table
> ---------------------------------------------------------
>
> Key: PHOENIX-7253
> URL: https://issues.apache.org/jira/browse/PHOENIX-7253
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.2.0, 5.1.3
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Critical
> Fix For: 5.2.0, 5.1.4
>
>
> Any considerably large table with more than 100k regions can show problematic
> performance if we access all region locations from meta for the given table
> before generating parallel or sequential scans for the given query. The perf
> impact can really hurt range scan queries.
> Consider a table with hundreds of thousands of tenant views. Unless the query
> is a strict point lookup, any query on any tenant view ends up retrieving the
> region locations of all regions of the base table. If an IOException is
> thrown by the HBase client during any region location lookup in meta, we only
> perform a single retry.
> Proposal:
> # All non-point-lookup queries should only retrieve the region locations that
> cover the scan boundaries, rather than fetching all region locations of the
> base table (see the sketch below).
> # Make retries configurable, with a higher default value.
>
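> A minimal sketch of item 1, assuming the HBase 2.x {{RegionLocator}} API; the
> class and method names are illustrative, not the actual patch:
> {code:java}
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.hadoop.hbase.HConstants;
> import org.apache.hadoop.hbase.HRegionLocation;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.RegionLocator;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class ScanBoundedRegionLookup {
>
>     /**
>      * Returns only the region locations that overlap [scanStartKey, scanStopKey),
>      * instead of every region of the table, by walking region end keys until
>      * the scan's stop key is covered.
>      */
>     static List<HRegionLocation> getRegionsCoveringScan(Connection connection,
>             TableName table, byte[] scanStartKey, byte[] scanStopKey)
>             throws IOException {
>         List<HRegionLocation> locations = new ArrayList<>();
>         try (RegionLocator locator = connection.getRegionLocator(table)) {
>             byte[] currentKey = scanStartKey;
>             while (true) {
>                 // reload=false: use the client's cached location when possible.
>                 HRegionLocation location = locator.getRegionLocation(currentKey, false);
>                 locations.add(location);
>                 byte[] endKey = location.getRegion().getEndKey();
>                 // Stop at the table's last region, or once this region's end
>                 // key reaches the scan's stop key (empty stop key = unbounded).
>                 if (Bytes.equals(endKey, HConstants.EMPTY_END_ROW)
>                         || (scanStopKey.length > 0
>                             && Bytes.compareTo(endKey, scanStopKey) >= 0)) {
>                     break;
>                 }
>                 currentKey = endKey;
>             }
>         }
>         return locations;
>     }
> }
> {code}
>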
> Sample stacktrace from the multiple failures observed:
> {code:java}
> java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.Stack
> trace: java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table
> regions.
> at
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:620)
> at
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:781)
> at
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
> at
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
> at
> org.apache.phoenix.iterate.DefaultParallelScanGrouper.getRegionBoundaries(DefaultParallelScanGrouper.java:74)
> at
> org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:587)
> at
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:936)
> at
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:669)
> at
> org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:555)
> at
> org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:69)
> at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278)
> at
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:374)
> at
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:222)
> at
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:217)
> at
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:370)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:328)
> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:328)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:320)
> at
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeQuery(PhoenixPreparedStatement.java:188)
> ...
> ...
> Caused by: java.io.InterruptedIOException: Origin: InterruptedException
> at
> org.apache.hadoop.hbase.util.ExceptionUtil.asInterrupt(ExceptionUtil.java:72)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.takeUserRegionLock(ConnectionImplementation.java:1129)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:994)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:895)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:881)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:851)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:730)
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:766)
> ... 254 more
> Caused by: java.lang.InterruptedException
> at
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:982)
> at
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1288)
> at
> java.base/java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:424)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.takeUserRegionLock(ConnectionImplementation.java:1117)
> ... 264 more {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)