[jira] [Comment Edited] (PHOENIX-2724) Query with large number of guideposts is slower compared to no stats

Samarth Jain (JIRA) Sat, 19 Mar 2016 00:16:21 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200775#comment-15200775 ]


Samarth Jain edited comment on PHOENIX-2724 at 3/18/16 1:32 AM:
----------------------------------------------------------------

I created a table with 300 million rows and 330K+ guideposts. I did some 
micro-benchmarking to see where we are spending time and this is what I have:

select * from testExplainPlanTime limit 10;

Time spent in BaseResultIterators.getParallelScans() computing 330045 scans: 122 ms
(this includes 82 ms spent in ScanRanges.intersectScan())
In SerialIterators.java, time spent by a single thread creating 330045 
iterators: 1589 ms
Total time spent in the above tasks = 1589 + 122 = 1711 ms
Overall query time = 1809 ms
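
To put the breakdown in proportion, here is a small sketch that turns the 
measurements into shares of the overall query time and an average cost per 
iterator. The class and method names are hypothetical; only the millisecond 
figures and the iterator count come from the measurements above.

```java
// Hypothetical helper for illustration only; the input figures are the
// measurements quoted above, everything else is an assumption.
class TimingBreakdown {

    /** Percentage of totalMs accounted for by partMs. */
    static double sharePercent(long partMs, long totalMs) {
        return 100.0 * partMs / totalMs;
    }

    /** Average cost per item, in microseconds. */
    static double microsPerItem(long elapsedMs, long items) {
        return 1000.0 * elapsedMs / items;
    }

    public static void main(String[] args) {
        long overallMs = 1809;          // overall query time
        long parallelScansMs = 122;     // BaseResultIterators.getParallelScans()
        long createIteratorsMs = 1589;  // iterator creation in SerialIterators
        long iterators = 330045;        // number of iterators created

        // Whatever is left after the two measured tasks.
        long otherMs = overallMs - parallelScansMs - createIteratorsMs;

        System.out.printf("getParallelScans:  %.1f%% of query time%n",
                sharePercent(parallelScansMs, overallMs));
        System.out.printf("iterator creation: %.1f%% of query time%n",
                sharePercent(createIteratorsMs, overallMs));
        System.out.printf("everything else:   %d ms (%.1f%%)%n",
                otherMs, sharePercent(otherMs, overallMs));
        System.out.printf("average cost per iterator: %.2f microseconds%n",
                microsPerItem(createIteratorsMs, iterators));
    }
}
```

With these numbers, iterator creation is roughly 88% of the query time while 
averaging under 5 microseconds per iterator, so the cost comes from the number 
of iterators rather than from any single construction.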

So it turns out the single biggest culprit is this piece of code in 
SerialIterators.java:

{code}

@Override
public PeekingResultIterator call() throws Exception {
    long startTime = System.currentTimeMillis();
    List<PeekingResultIterator> concatIterators =
            Lists.newArrayListWithExpectedSize(scans.size());
    for (final Scan scan : scans) {
        TableResultIterator scanner = new TableResultIterator(
                mutationState, tableRef, scan,
                context.getReadMetricsQueue().allotMetric(SCAN_BYTES, tableName),
                renewLeaseThreshold);
        conn.addIterator(scanner);
        concatIterators.add(
                iteratorFactory.newIterator(context, scanner, scan, tableName));
    }
    PeekingResultIterator concatIterator =
            ConcatResultIterator.newIterator(concatIterators);
    allIterators.add(concatIterator);
    System.out.println("Serial iterators - time taken to create "
            + scans.size() + " iterators : "
            + (System.currentTimeMillis() - startTime));
    return concatIterator;
}
{code}

Looping over 330K+ scans and creating an iterator for each of them accounts 
for 1589 ms of the 1809 ms query time, even though the query only needs 10 rows.
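
As a rough illustration of why the eager loop is wasteful for a LIMIT 10 query, 
the sketch below compares building every per-scan iterator up front with 
building each one only when the previous one is exhausted. This is not Phoenix 
code: the class, its methods, the integer "scans", and the assumption of 4 rows 
per scan are all hypothetical stand-ins, and it is not meant to describe how 
the issue should or will be fixed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration, not Phoenix code: scans are plain integers and
// each per-scan iterator generates a fixed number of placeholder rows.
class LazyConcatSketch {

    /** Number of per-scan iterators constructed so far. */
    static int created = 0;

    /** Stand-in for constructing one per-scan iterator. */
    static Iterator<String> openScan(final int scanId, final int rowsPerScan) {
        created++;
        return new Iterator<String>() {
            private int row = 0;
            public boolean hasNext() { return row < rowsPerScan; }
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return "scan" + scanId + "-row" + (row++);
            }
        };
    }

    /** Eager: construct every per-scan iterator before any row is read. */
    static Iterator<Iterator<String>> eagerParts(int scanCount, int rowsPerScan) {
        List<Iterator<String>> all = new ArrayList<>(scanCount);
        for (int s = 0; s < scanCount; s++) {
            all.add(openScan(s, rowsPerScan));
        }
        return all.iterator();
    }

    /** Lazy: construct a per-scan iterator only when the caller asks for it. */
    static Iterator<Iterator<String>> lazyParts(final int scanCount, final int rowsPerScan) {
        return new Iterator<Iterator<String>>() {
            private int nextScan = 0;
            public boolean hasNext() { return nextScan < scanCount; }
            public Iterator<String> next() {
                if (!hasNext()) throw new NoSuchElementException();
                return openScan(nextScan++, rowsPerScan);
            }
        };
    }

    /** Concatenates the parts, advancing to the next part only when needed. */
    static Iterator<String> concat(final Iterator<Iterator<String>> parts) {
        return new Iterator<String>() {
            private Iterator<String> current = Collections.emptyIterator();
            public boolean hasNext() {
                while (!current.hasNext() && parts.hasNext()) {
                    current = parts.next();
                }
                return current.hasNext();
            }
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    /** Reads at most limit rows and returns how many were read. */
    static int takeFirst(Iterator<String> rows, int limit) {
        int n = 0;
        while (n < limit && rows.hasNext()) {
            rows.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int scanCount = 330045;  // iterator count from the measurements above
        int rowsPerScan = 4;     // assumption, chosen only for the illustration

        created = 0;
        takeFirst(concat(eagerParts(scanCount, rowsPerScan)), 10);
        System.out.println("eager: iterators created = " + created);

        created = 0;
        takeFirst(concat(lazyParts(scanCount, rowsPerScan)), 10);
        System.out.println("lazy:  iterators created = " + created);
    }
}
```

In this toy model the eager variant constructs one iterator per scan regardless 
of the limit, while the lazy variant constructs only as many as are needed to 
produce 10 rows.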





was (Author: samarthjain):
I created a table with 300 million rows and 330K+ guideposts. I did some 
micro-benchmarking to see where we are spending time and this is what I have:

select * from testExplainPlanTime limit 10;

Time spent in computing 698818 BaseResultIterators.getParallelScans() 1858 ms

 
In SerialIterators.java, time spent by the thread in creating 698818 iterators 
: 3644
Total time taken: 

Time spent in BaseResultIterators.getParallelScans() computing 330045 scans: 122 ms
(this includes 82 ms spent in ScanRanges.intersectScan())
In SerialIterators.java, time spent by a single thread creating 330045 
iterators: 1589 ms
Total time spent in the above tasks = 1589 + 122 = 1711 ms
Overall query time = 1809 ms

So it turns out the single biggest culprit is this piece of code in 
SerialIterators.java:

{code}

@Override
public PeekingResultIterator call() throws Exception {
    long startTime = System.currentTimeMillis();
    List<PeekingResultIterator> concatIterators =
            Lists.newArrayListWithExpectedSize(scans.size());
    for (final Scan scan : scans) {
        TableResultIterator scanner = new TableResultIterator(
                mutationState, tableRef, scan,
                context.getReadMetricsQueue().allotMetric(SCAN_BYTES, tableName),
                renewLeaseThreshold);
        conn.addIterator(scanner);
        concatIterators.add(
                iteratorFactory.newIterator(context, scanner, scan, tableName));
    }
    PeekingResultIterator concatIterator =
            ConcatResultIterator.newIterator(concatIterators);
    allIterators.add(concatIterator);
    System.out.println("Serial iterators - time taken to create "
            + scans.size() + " iterators : "
            + (System.currentTimeMillis() - startTime));
    return concatIterator;
}
{code}

Looping over 330K+ scans and creating iterators out of them takes up much of 
the query time.




> Query with large number of guideposts is slower compared to no stats
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-2724
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2724
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>         Environment: Phoenix 4.7.0-RC4, HBase-0.98.17 on an 8-node cluster
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>             Fix For: 4.8.0
>
>
> With a 1MB guidepost width on a ~900GB/500M-row table, queries with a short 
> scan range get significantly slower.
> Without stats:
> {code}
> select * from T limit 10; // query execution time <100 msec
> {code}
> With stats:
> {code}
> select * from T limit 10; // query execution time >20 seconds
> Explain plan: CLIENT 876085-CHUNK 476569382 ROWS 876060986727 BYTES SERIAL 
> 1-WAY FULL SCAN OVER T SERVER 10 ROW LIMIT CLIENT 10 ROW LIMIT
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
