[
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823491#comment-17823491
]
Daniel Wong commented on PHOENIX-7253:
--------------------------------------
I found some of my notes:
The Phoenix code doing this is a hack given HBase limitations... in 1.x there
was no easy access to the HBase client's metadata cache, and this method was
used as a workaround to get only the cached values, as opposed to directly
calling {{HConnection.getAllRegionLocations}}.
This code path is called from {{getAllTableRegions}}. It is used when the
Phoenix client wants or needs to update the region info for an HBase table,
which can happen during scan generation or when a split/merge is detected. The
method calls {{HConnection.getRegionLocation}} iteratively, one region at a
time, to find the next region. This is done so that the lookups go through the
cache in the HConnection itself, rather than {{HTable.getAllRegionLocations}},
which in the 1.x codebase appears to cause a meta scan and a re-cache of all
the table's regions on the HBase client. Where this can become an issue is
when a region splits: many scans may simultaneously call this method,
overloading the client threads as well as possibly meta. Some protection so
that a client only calls this from a single thread per table would greatly
improve this flow (see the per-table guard sketch further down). Whether that
belongs inside HBase itself or in Phoenix is debatable, but Phoenix intends to
use the HBase client cache rather than store its own cache of region
boundaries.
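For reference, the iterative lookup described above looks roughly like the
following. This is a minimal sketch assuming the 1.x {{HConnection}} API; the
class and method names are illustrative, not the actual Phoenix code:
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.util.Bytes;

public class CachedRegionWalk {

    /**
     * Walks the table's regions one at a time via HConnection.getRegionLocation
     * with reload=false, so each lookup is served from the connection's own
     * region cache when possible instead of scanning meta for every region.
     */
    static List<HRegionLocation> getAllTableRegions(HConnection connection,
            byte[] tableName) throws IOException {
        List<HRegionLocation> locations = new ArrayList<>();
        byte[] currentKey = HConstants.EMPTY_START_ROW;
        do {
            // reload=false: prefer the cached location; meta is consulted only
            // when the cache has no entry covering currentKey.
            HRegionLocation location = connection.getRegionLocation(
                    TableName.valueOf(tableName), currentKey, false);
            locations.add(location);
            // Jump to the next region by using this region's end key.
            currentKey = location.getRegionInfo().getEndKey();
        } while (!Bytes.equals(currentKey, HConstants.EMPTY_END_ROW));
        return locations;
    }
}
{code}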
I haven't had time to explore the 2.x codebase, but much of this has changed
and been replaced by the async region locator flow here
[https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableRegionLocatorImpl.java#L59]
which again appears to call and then cache, rather than use the cache where
available... some trickiness here.
What the code originally wanted/intended was to get all the regions from the
HBase client cache and only renew the cache on a detected split/merge, IIRC.
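To illustrate the single-thread-per-table protection suggested above, here is a
hypothetical sketch; none of these class or method names exist in Phoenix or
HBase, and the loader would wrap whatever actually fetches the regions (e.g.
the cached walk sketched earlier):
{code:java}
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;

// Hypothetical per-table guard: at most one thread performs the expensive
// region-info refresh for a given table; concurrent callers block briefly and
// then reuse the result produced by the winning thread.
public class PerTableRegionRefresh {

    // One lock per table, created lazily.
    private final ConcurrentHashMap<TableName, ReentrantLock> locks =
            new ConcurrentHashMap<>();
    // Most recent refresh result per table.
    private final ConcurrentHashMap<TableName, List<HRegionLocation>> lastResult =
            new ConcurrentHashMap<>();

    // Abstraction over whatever actually fetches the region locations.
    interface RegionLoader {
        List<HRegionLocation> load(TableName table) throws Exception;
    }

    List<HRegionLocation> refresh(TableName table, RegionLoader loader)
            throws Exception {
        ReentrantLock lock = locks.computeIfAbsent(table, t -> new ReentrantLock());
        if (lock.tryLock()) {
            // This thread won the race: do the actual refresh.
            try {
                List<HRegionLocation> fresh = loader.load(table);
                lastResult.put(table, fresh);
                return fresh;
            } finally {
                lock.unlock();
            }
        }
        // Another thread is already refreshing this table: wait for it to
        // finish, then reuse its result instead of hitting meta again.
        lock.lock();
        try {
            return lastResult.get(table);
        } finally {
            lock.unlock();
        }
    }
}
{code}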
> Perf improvement for non-full scan queries on large table
> ---------------------------------------------------------
>
> Key: PHOENIX-7253
> URL: https://issues.apache.org/jira/browse/PHOENIX-7253
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.2.0, 5.1.3
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Critical
> Fix For: 5.2.0, 5.1.4
>
>
> Any considerably large table with more than 100k regions can show problematic
> performance if we access all region locations from meta for the given table
> before generating parallel or sequential scans for the given query. The perf
> impact can really hurt range scan queries.
> Consider a table with hundreds of thousands of tenant views. Unless the query
> is a strict point lookup, any query on any tenant view ends up retrieving the
> region locations of all regions of the base table. If an IOException is
> thrown by the HBase client during any region location lookup in meta, we only
> perform a single retry.
> Proposal:
> # All non-point-lookup queries should only retrieve the region locations that
> cover the scan boundaries, rather than fetching all region locations of the
> base table (see the sketch below).
> # Make retries configurable, with a higher default value.
>
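> A minimal sketch of item 1, assuming the HBase 2.x {{RegionLocator}} API; the
> class and method names are illustrative, not the actual patch:
> {code:java}
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.hadoop.hbase.HConstants;
> import org.apache.hadoop.hbase.HRegionLocation;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.RegionLocator;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class ScanBoundedRegionLookup {
>
>     /**
>      * Returns only the region locations that overlap [scanStartKey, scanStopKey),
>      * instead of every region of the table, by walking region end keys until
>      * the scan's stop key is covered.
>      */
>     static List<HRegionLocation> getRegionsCoveringScan(Connection connection,
>             TableName table, byte[] scanStartKey, byte[] scanStopKey)
>             throws IOException {
>         List<HRegionLocation> locations = new ArrayList<>();
>         try (RegionLocator locator = connection.getRegionLocator(table)) {
>             byte[] currentKey = scanStartKey;
>             while (true) {
>                 // reload=false: use the client's cached location when possible.
>                 HRegionLocation location = locator.getRegionLocation(currentKey, false);
>                 locations.add(location);
>                 byte[] endKey = location.getRegion().getEndKey();
>                 // Stop at the table's last region, or once this region's end
>                 // key reaches the scan's stop key (empty stop key = unbounded).
>                 if (Bytes.equals(endKey, HConstants.EMPTY_END_ROW)
>                         || (scanStopKey.length > 0
>                             && Bytes.compareTo(endKey, scanStopKey) >= 0)) {
>                     break;
>                 }
>                 currentKey = endKey;
>             }
>         }
>         return locations;
>     }
> }
> {code}
>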
> Sample stacktrace from the multiple failures observed:
> {code:java}
> java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.Stack
> trace: java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table
> regions.
> at
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:620)
> at
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:781)
> at
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
> at
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
> at
> org.apache.phoenix.iterate.DefaultParallelScanGrouper.getRegionBoundaries(DefaultParallelScanGrouper.java:74)
> at
> org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:587)
> at
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:936)
> at
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:669)
> at
> org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:555)
> at
> org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:69)
> at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278)
> at
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:374)
> at
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:222)
> at
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:217)
> at
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:370)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:328)
> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:328)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:320)
> at
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeQuery(PhoenixPreparedStatement.java:188)
> ...
> ...
> Caused by: java.io.InterruptedIOException: Origin: InterruptedException
> at
> org.apache.hadoop.hbase.util.ExceptionUtil.asInterrupt(ExceptionUtil.java:72)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.takeUserRegionLock(ConnectionImplementation.java:1129)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:994)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:895)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:881)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:851)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:730)
> at
> org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:766)
> ... 254 more
> Caused by: java.lang.InterruptedException
> at
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:982)
> at
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1288)
> at
> java.base/java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:424)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.takeUserRegionLock(ConnectionImplementation.java:1117)
> ... 264 more {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)