[ 
https://issues.apache.org/jira/browse/PHOENIX-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355811#comment-15355811
 ] 

James Taylor commented on PHOENIX-2724:
---------------------------------------

bq. should we be just swallowing the exception? Does it point to a bug in our 
stats collection or prefix encoding/decoding?
That doesn't sound right, so it would be good to follow up on that.

I suspect the culprit is that we don't binary search into the guideposts to 
find where to start, but instead walk through all of them linearly. We could 
either:
- implement a binary search directly on top of our encoding format (i.e. in 
place). This would be tricky, though, because our compression scheme gives us 
no easy way of finding the midway point.
- materialize the guideposts as a List<ImmutableBytesWritable>. There's no 
need to explode each key; we can keep them compressed. Each 
ImmutableBytesWritable would point to the byte[], offset, and length of one 
key. Then we can do a binary search *mostly* in place.
I think the latter would be a good solution.
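
To make the second option concrete, here is a rough sketch of the idea, not 
the actual Phoenix code. It assumes a simplified layout in which each key is 
stored as a 1-byte length followed by its raw bytes, so it ignores the prefix 
compression entirely (the "mostly" above). KeyRef is a hypothetical stand-in 
for ImmutableBytesWritable so the example compiles without HBase on the 
classpath, and the names index, compare, and firstAtOrAfter are made up for 
illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class GuidepostSearch {
    /** Hypothetical stand-in for ImmutableBytesWritable: a view onto one key. */
    static final class KeyRef {
        final byte[] buf;
        final int offset;
        final int length;

        KeyRef(byte[] buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * One linear pass that records where each key lives in the shared buffer.
     * Assumed layout: [1-byte length][key bytes] repeated. No key is copied.
     */
    static List<KeyRef> index(byte[] packed) {
        List<KeyRef> refs = new ArrayList<>();
        int pos = 0;
        while (pos < packed.length) {
            int len = packed[pos] & 0xFF;
            refs.add(new KeyRef(packed, pos + 1, len));
            pos += 1 + len;
        }
        return refs;
    }

    /** Unsigned lexicographic comparison of a key view against a plain key. */
    static int compare(KeyRef a, byte[] key) {
        int n = Math.min(a.length, key.length);
        for (int i = 0; i < n; i++) {
            int x = a.buf[a.offset + i] & 0xFF;
            int y = key[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - key.length;
    }

    /** Index of the first guidepost >= startKey, or size() if there is none. */
    static int firstAtOrAfter(List<KeyRef> guideposts, byte[] startKey) {
        int lo = 0;
        int hi = guideposts.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(guideposts.get(mid), startKey) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

The indexing pass is still linear, but it only has to happen once per set of 
guideposts; each lookup after that is O(log n) comparisons instead of a walk 
over every guidepost.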

I still think my suggestion here[1] works around the issue at SFDC and would be 
a pretty easy change.

[1] 
https://issues.apache.org/jira/browse/PHOENIX-2724?focusedCommentId=15353721&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15353721

> Query with large number of guideposts is slower compared to no stats
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-2724
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2724
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>         Environment: Phoenix 4.7.0-RC4, HBase-0.98.17 on a 8 node cluster
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-2724.patch, PHOENIX-2724_addendum.patch, 
> PHOENIX-2724_v2.patch
>
>
> With a 1MB guidepost width on a ~900GB/500M-row table, queries with a short 
> scan range get significantly slower.
> Without stats:
> {code}
> select * from T limit 10; // query execution time <100 msec
> {code}
> With stats:
> {code}
> select * from T limit 10; // query execution time >20 seconds
> Explain plan: CLIENT 876085-CHUNK 476569382 ROWS 876060986727 BYTES SERIAL 
> 1-WAY FULL SCAN OVER T SERVER 10 ROW LIMIT CLIENT 10 ROW LIMIT
> {code}



