[
https://issues.apache.org/jira/browse/PHOENIX-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355481#comment-15355481
]
Samarth Jain edited comment on PHOENIX-2724 at 6/29/16 5:03 PM:
----------------------------------------------------------------
So I debugged this by adding a bunch of log lines, and it turns out the culprit
here isn't the size of the stats cache. The default cache size is more than
enough to fit all of the guideposts in memory. The code below in
BaseResultIterators takes up the majority of the time when we have a large
number of guideposts.
{code}
try {
    // Walk every guidepost that falls within the scan range, producing one
    // (possible) scan per guidepost and accumulating the row/byte estimates.
    while (guideIndex < gpsSize
            && (currentGuidePost.compareTo(endKey) <= 0 || endKey.length == 0)) {
        Scan newScan = scanRanges.intersectScan(scan, currentKeyBytes,
                currentGuidePostBytes, keyOffset, false);
        if (newScan != null) {
            ScanUtil.setLocalIndexAttributes(newScan, keyOffset,
                    regionInfo.getStartKey(), regionInfo.getEndKey(),
                    newScan.getStartRow(), newScan.getStopRow());
        }
        if (newScan != null) {
            estimatedRows += gps.getRowCounts().get(guideIndex);
            estimatedSize += gps.getByteCounts().get(guideIndex);
        }
        scans = addNewScan(parallelScans, scans, newScan,
                currentGuidePostBytes, false, regionLocation);
        currentKeyBytes = currentGuidePost.copyBytes();
        // Decode the next prefix-encoded guidepost; this throws EOFException
        // once the last guidepost in the stream has been consumed.
        currentGuidePost = PrefixByteCodec.decode(decoder, input);
        currentGuidePostBytes = currentGuidePost.copyBytes();
        guideIndex++;
    }
} catch (EOFException e) {}
{code}
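Just to put the iteration count in perspective, here is a rough, self-contained baseline sketch (not Phoenix code; the class name and the 32-byte key length are made up for illustration). It mimics only the raw per-guidepost bookkeeping of the loop above (two byte[] copies and one key comparison) for the ~75,130 guideposts shown in the plan further down; whatever such a baseline doesn't account for has to come from the other calls in the loop (intersectScan, setLocalIndexAttributes, addNewScan, the prefix decode, and the boxed List.get lookups).
{code}
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical baseline (not Phoenix code): repeat the raw per-guidepost
// bookkeeping of the loop above (two byte[] copies plus one key comparison)
// for roughly the 75,130 guideposts reported in the explain plan.
public class GuidepostLoopBaseline {
    public static void main(String[] args) {
        final int guidepostCount = 75_130;        // chunk count from the explain plan
        final byte[] guidePostKey = new byte[32]; // assumed key length, purely illustrative
        final byte[] endKey = new byte[32];
        long sink = 0;                            // keeps the JIT from eliding the work
        long start = System.nanoTime();
        for (int i = 0; i < guidepostCount; i++) {
            byte[] currentKeyBytes = Arrays.copyOf(guidePostKey, guidePostKey.length);         // ~copyBytes()
            byte[] currentGuidePostBytes = Arrays.copyOf(guidePostKey, guidePostKey.length);   // ~copyBytes()
            sink += ByteBuffer.wrap(currentGuidePostBytes).compareTo(ByteBuffer.wrap(endKey)); // ~compareTo(endKey)
            sink += currentKeyBytes.length;
        }
        System.out.printf("baseline: %d iterations in %.1f ms (sink=%d)%n",
                guidepostCount, (System.nanoTime() - start) / 1e6, sink);
    }
}
{code}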
Another interesting thing to note is that the first call to explain plan takes
way longer than subsequent ones, and almost all of that time is spent in the
above loop. That time does *not* include the time spent fetching stats from the
server or reading the table metadata. Also of note is that the first explain
plan results in an EOFException which we end up ignoring. [~jamestaylor] -
should we just be swallowing the exception? Does it point to a bug in our stats
collection or prefix encoding/decoding? I am using the same phoenix connection
for every run. The fact that the EOF isn't triggered on subsequent executions
tells me there is possibly some stale state that isn't getting cleared out when
we close phoenix statements and/or result sets.
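For reference, here is a minimal, self-contained sketch of the generic decode-until-EOF idiom (plain DataInputStream and a made-up DecodeUntilEof class, not Phoenix's PrefixByteCodec). When the number of entries in a stream isn't tracked separately, hitting EOF is the only way the reader learns it has consumed the last entry, so the exception is caught and treated as normal loop termination. Whether that is actually the intent here, or whether the stats stream should never run dry before guideIndex reaches gpsSize, is exactly the question above.
{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Phoenix's PrefixByteCodec): decode entries from a
// stream whose length is not known up front. EOFException is the only signal
// that the last entry has been read, so it is caught and swallowed on purpose.
public class DecodeUntilEof {
    public static void main(String[] args) throws IOException {
        byte[] encoded = {0, 1, 0, 2, 0, 3};   // three 2-byte "guideposts"
        DataInputStream input = new DataInputStream(new ByteArrayInputStream(encoded));
        List<Integer> guideposts = new ArrayList<>();
        try {
            while (true) {
                guideposts.add((int) input.readShort());  // throws EOFException after the last entry
            }
        } catch (EOFException endOfStream) {
            // end of the encoded stream: expected in this idiom, not an error
        }
        System.out.println("decoded " + guideposts);      // prints: decoded [1, 2, 3]
    }
}
{code}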
{noformat}
+----------------------------------------------------------------------------------------------------+
|                                                PLAN                                                 |
+----------------------------------------------------------------------------------------------------+
| CLIENT 75130-CHUNK 7672485 ROWS 2775600674 BYTES SERIAL 1-WAY ROUND ROBIN FULL SCAN OVER LINEITEM   |
| SERVER 1 ROW LIMIT                                                                                  |
| CLIENT 1 ROW LIMIT                                                                                  |
+----------------------------------------------------------------------------------------------------+
3 rows selected (32.121 seconds)
{noformat}
Subsequent executions of the explain plan/query compilation take less time,
but are still on the order of seconds.
{noformat}
+----------------------------------------------------------------------------------------------------+
|                                                PLAN                                                 |
+----------------------------------------------------------------------------------------------------+
| CLIENT 75130-CHUNK 7672485 ROWS 2775600674 BYTES SERIAL 1-WAY ROUND ROBIN FULL SCAN OVER LINEITEM   |
| SERVER 1 ROW LIMIT                                                                                  |
| CLIENT 1 ROW LIMIT                                                                                  |
+----------------------------------------------------------------------------------------------------+
3 rows selected (4.729 seconds)
{noformat}
I couldn't identify any single method call in the loop mentioned above that is
taking a long time. It is probably an accumulation of lots of small calls that
each take at most a few milliseconds. Will keep digging.
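For anyone who wants to reproduce the first-run vs. subsequent-run gap, here is a rough measurement sketch against a single connection. The ZK quorum in the JDBC URL is a placeholder, and the EXPLAIN statement is only a guess at the shape of the query behind the plans above (a LIMIT 1 scan over LINEITEM); substitute whatever you are actually running.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hedged reproduction sketch: run EXPLAIN several times on one Phoenix
// connection and print the wall clock time per run, so the first compilation
// can be compared against subsequent ones. URL and query are placeholders.
public class ExplainTiming {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            for (int run = 1; run <= 3; run++) {
                long start = System.currentTimeMillis();
                try (ResultSet rs = stmt.executeQuery("EXPLAIN SELECT * FROM LINEITEM LIMIT 1")) {
                    while (rs.next()) {
                        rs.getString(1); // consume the plan rows; plan text is in column 1
                    }
                }
                System.out.println("run " + run + ": " + (System.currentTimeMillis() - start) + " ms");
            }
        }
    }
}
{code}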
> Query with large number of guideposts is slower compared to no stats
> --------------------------------------------------------------------
>
> Key: PHOENIX-2724
> URL: https://issues.apache.org/jira/browse/PHOENIX-2724
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.7.0
> Environment: Phoenix 4.7.0-RC4, HBase-0.98.17 on a 8 node cluster
> Reporter: Mujtaba Chohan
> Assignee: Samarth Jain
> Fix For: 4.8.0
>
> Attachments: PHOENIX-2724.patch, PHOENIX-2724_addendum.patch, PHOENIX-2724_v2.patch
>
>
> With a 1MB guidepost width on a ~900GB/500M row table, queries with a short
> scan range get significantly slower.
> Without stats:
> {code}
> select * from T limit 10; // query execution time <100 msec
> {code}
> With stats:
> {code}
> select * from T limit 10; // query execution time >20 seconds
> Explain plan: CLIENT 876085-CHUNK 476569382 ROWS 876060986727 BYTES SERIAL 1-WAY FULL SCAN OVER T SERVER 10 ROW LIMIT CLIENT 10 ROW LIMIT
> {code}