[ https://issues.apache.org/jira/browse/PHOENIX-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355481#comment-15355481 ]
Samarth Jain commented on PHOENIX-2724: --------------------------------------- So I debugged this by adding a bunch of log lines. And it turns out the culprit here isn't the size of stats cache. The default cache size of is more than enough to fit in all the guide posts in memory. The below code in BaseResultIterators is taking up majority of the time when we have large number of guideposts. {code} try { while (guideIndex < gpsSize && (currentGuidePost.compareTo(endKey) <= 0 || endKey.length == 0)) { Scan newScan = scanRanges.intersectScan(scan, currentKeyBytes, currentGuidePostBytes, keyOffset, false); if(newScan != null) { ScanUtil.setLocalIndexAttributes(newScan, keyOffset, regionInfo.getStartKey(), regionInfo.getEndKey(), newScan.getStartRow(), newScan.getStopRow()); } if (newScan != null) { estimatedRows += gps.getRowCounts().get(guideIndex); estimatedSize += gps.getByteCounts().get(guideIndex); } scans = addNewScan(parallelScans, scans, newScan, currentGuidePostBytes, false, regionLocation); currentKeyBytes = currentGuidePost.copyBytes(); currentGuidePost = PrefixByteCodec.decode(decoder, input); currentGuidePostBytes = currentGuidePost.copyBytes(); guideIndex++; } } catch (EOFException e) {} {code} Another interesting thing to note is the first call to explain plan takes way longer than subsequent ones. Also of note is that the first explain plan also results in an EOFException which we end up ignoring. @James Taylor - should we be just swallowing the exception? Does it point to a bug in our stats collection or prefix encoding/decoding? I am using the same phoenix connection. EOF not getting triggered on subsequent executions tells me that there is possibly some stale state that isn't getting cleared out when we are closing phoenix statements and/or result sets. +----------------------------------------------------------------------------------------------------+ | PLAN | +----------------------------------------------------------------------------------------------------+ | CLIENT 75130-CHUNK 7672485 ROWS 2775600674 BYTES SERIAL 1-WAY ROUND ROBIN FULL SCAN OVER LINEITEM | | SERVER 1 ROW LIMIT | | CLIENT 1 ROW LIMIT | +----------------------------------------------------------------------------------------------------+ 3 rows selected (32.121 seconds) Subsequent executions of the explain plan/query compilation take lesser time but still in the order of seconds. +----------------------------------------------------------------------------------------------------+ | PLAN | +----------------------------------------------------------------------------------------------------+ | CLIENT 75130-CHUNK 7672485 ROWS 2775600674 BYTES SERIAL 1-WAY ROUND ROBIN FULL SCAN OVER LINEITEM | | SERVER 1 ROW LIMIT | | CLIENT 1 ROW LIMIT | +----------------------------------------------------------------------------------------------------+ 3 rows selected (4.729 seconds) I couldn't identify one particular method call in the loop mentioned above that is taking a long time. It is probably an accumulation of several millisecond calls. Will keep digging. > Query with large number of guideposts is slower compared to no stats > -------------------------------------------------------------------- > > Key: PHOENIX-2724 > URL: https://issues.apache.org/jira/browse/PHOENIX-2724 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.7.0 > Environment: Phoenix 4.7.0-RC4, HBase-0.98.17 on a 8 node cluster > Reporter: Mujtaba Chohan > Assignee: Samarth Jain > Fix For: 4.8.0 > > Attachments: PHOENIX-2724.patch, PHOENIX-2724_addendum.patch, > PHOENIX-2724_v2.patch > > > With 1MB guidepost width for ~900GB/500M rows table. Queries with short scan > range gets significantly slower. > Without stats: > {code} > select * from T limit 10; // query execution time <100 msec > {code} > With stats: > {code} > select * from T limit 10; // query execution time >20 seconds > Explain plan: CLIENT 876085-CHUNK 476569382 ROWS 876060986727 BYTES SERIAL > 1-WAY FULL SCAN OVER T SERVER 10 ROW LIMIT CLIENT 10 ROW LIMIT > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)