[ https://issues.apache.org/jira/browse/PHOENIX-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353533#comment-15353533 ]
Samarth Jain commented on PHOENIX-2724:
---------------------------------------

Currently, we decide whether a query should be executed serially based on whether the amount of data we need to scan is below a threshold. With the approach you are suggesting, we would execute queries serially under these conditions:

{code}
if (perScanLimit == null || scan.getFilter() != null) {
    return false;
}
estRowSize = SchemaUtil.estimateRowSize(table);
return (perScanLimit * estRowSize < threshold);
{code}

This kind of check also means that we don't need to fetch guide posts info to determine whether to execute a query serially or in parallel. Which mode to follow will then be governed just by a static threshold config. We would likely need to set our threshold to something like 500 MB or so. Having to scan a 20GB region using a single scan will likely cause our queries to run slower.

It also looks like we use guide posts today for executing point lookup queries.

{code}
private boolean useStats() {
    boolean isPointLookup = context.getScanRanges().isPointLookup();
    /*
     * Don't use guide posts if:
     * 1) We're doing a point lookup, as HBase is fast enough at those
     *    to not need them to be further parallelized. TODO: perf test to verify
     * 2) We're collecting stats, as in this case we need to scan entire
     *    regions worth of data to track where to put the guide posts.
     */
    if (isPointLookup || ScanUtil.isAnalyzeTable(scan)) {
        return false;
    }
    return true;
}
{code}

So it seems like we shouldn't fetch guide posts for point queries either.
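To make the proposed check concrete, here is a minimal, self-contained sketch of the decision. It is an illustration only, not the actual patch: the class `SerialScanDecision`, the method `shouldRunSerially`, and its parameters are hypothetical names, the estimated row size is passed in directly instead of being computed from the table, and the 500 MB threshold is just the ballpark figure mentioned above. The ~1838 bytes/row figure is derived from the explain plan in the issue description (876060986727 bytes / 476569382 rows).

```java
public final class SerialScanDecision {
    private SerialScanDecision() {}

    /**
     * Hypothetical stand-in for the proposed check: run serially only when
     * there is a per-scan limit, no filter, and the estimated bytes to scan
     * (limit * estimated row size) fall below a static threshold.
     */
    public static boolean shouldRunSerially(Long perScanLimit, boolean hasFilter,
                                            long estRowSizeBytes, long thresholdBytes) {
        if (perScanLimit == null || hasFilter) {
            return false;
        }
        try {
            // multiplyExact guards against silent overflow for very large limits.
            return Math.multiplyExact(perScanLimit.longValue(), estRowSizeBytes) < thresholdBytes;
        } catch (ArithmeticException overflow) {
            // An overflowing estimate is certainly above any sane threshold.
            return false;
        }
    }

    public static void main(String[] args) {
        long threshold = 500L * 1024 * 1024; // assumed 500 MB threshold
        long estRowSize = 1838L;             // approx. bytes/row from the explain plan

        // LIMIT 10, no filter: about 18 KB to scan, so serial.
        System.out.println(shouldRunSerially(10L, false, estRowSize, threshold));        // true
        // LIMIT 1,000,000: about 1.8 GB to scan, so parallel.
        System.out.println(shouldRunSerially(1_000_000L, false, estRowSize, threshold)); // false
        // No limit, or a filter present: never serial under this check.
        System.out.println(shouldRunSerially(null, false, estRowSize, threshold));       // false
        System.out.println(shouldRunSerially(10L, true, estRowSize, threshold));         // false
    }
}
```

Note that nothing in this check consults guide posts, which is the point of the proposal: the serial versus parallel decision needs only the limit, the filter, and a row size estimate.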
> Query with large number of guideposts is slower compared to no stats
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-2724
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2724
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>         Environment: Phoenix 4.7.0-RC4, HBase-0.98.17 on a 8 node cluster
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-2724.patch, PHOENIX-2724_addendum.patch, PHOENIX-2724_v2.patch
>
>
> With 1MB guidepost width for ~900GB/500M rows table. Queries with short scan range gets significantly slower.
> Without stats:
> {code}
> select * from T limit 10; // query execution time <100 msec
> {code}
> With stats:
> {code}
> select * from T limit 10; // query execution time >20 seconds
> Explain plan: CLIENT 876085-CHUNK 476569382 ROWS 876060986727 BYTES SERIAL 1-WAY FULL SCAN OVER T
>     SERVER 10 ROW LIMIT
> CLIENT 10 ROW LIMIT
> {code}