[jira] [Commented] (PHOENIX-2724) Query with large number of guideposts is slower compared to no stats

Josh Elser (JIRA) Tue, 28 Jun 2016 14:04:15 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353718#comment-15353718
 ]


Josh Elser commented on PHOENIX-2724:
-------------------------------------

bq. Mujtaba Chohan - as a first test, can you try increasing the cache size, 
phoenix.stats.cache.maxSize? This will be an important config parameter. We 
might want to switch it to being a percentage of the heap instead of an 
absolute size.

Additionally, enabling {{TRACE}} on 
{{org.apache.phoenix.query.TableStatsCache}} will tell you when entries are 
added to or evicted from the client-side cache.
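
For example, assuming the client is configured through a log4j 1.x properties file (an assumption about the client setup, not something confirmed in this thread), a line like this should turn that logging on:

```properties
# Assumes log4j 1.x properties syntax on the client classpath.
log4j.logger.org.apache.phoenix.query.TableStatsCache=TRACE
```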

bq. Mujtaba Chohan did try updating the client side cache by adjusting 
phoenix.client.maxMetaDataCacheSize to 1GB but that didn't help either.

If the slowdown is caused by the stats not being cached, altering that property 
wouldn't change anything, since it sizes the metadata cache rather than the 
stats cache.

bq. Previously, the server-side cache was being used (which I think is bigger). 
If the cache is too small, we end up making an RPC each time to get the stats.

I'm also wondering if there's an optimization to be had in avoiding this case 
in TableStatsCache. We should be able to determine when this is happening (the 
cache not actually acting as a cache for some configuration reason) and just 
short-circuit the RPC, sending back {{EMPTY_STATS}}.
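
To make that idea concrete, here is a rough sketch of one way the short-circuit could look. It is not the actual {{TableStatsCache}} code: the class name, the {{byte[]}} stand-in for stats, the {{statsRpc}} function and the "entry larger than the whole cache" check are all hypothetical, and eviction is left out entirely to keep it short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch only; this is not Phoenix's TableStatsCache. */
class ShortCircuitingStatsCache {
    /** Stand-in for the EMPTY_STATS sentinel mentioned above. */
    static final byte[] EMPTY_STATS = new byte[0];

    private final long maxSizeBytes;
    /** Stand-in for the RPC that fetches stats from the server. */
    private final Function<String, byte[]> statsRpc;
    private final Map<String, byte[]> cache = new HashMap<>();
    /** Tables whose stats can never fit in the configured cache. */
    private final Set<String> uncacheable = new HashSet<>();
    /** Number of simulated RPCs issued, exposed for illustration. */
    int rpcCount = 0;

    ShortCircuitingStatsCache(long maxSizeBytes, Function<String, byte[]> statsRpc) {
        this.maxSizeBytes = maxSizeBytes;
        this.statsRpc = statsRpc;
    }

    byte[] get(String table) {
        byte[] cached = cache.get(table);
        if (cached != null) {
            return cached; // normal cache hit, no RPC
        }
        if (uncacheable.contains(table)) {
            // The cache cannot act as a cache for this table, so skip the RPC.
            return EMPTY_STATS;
        }
        rpcCount++;
        byte[] stats = statsRpc.apply(table);
        if (stats.length > maxSizeBytes) {
            // Larger than the whole cache: remember that and short-circuit next time.
            uncacheable.add(table);
        } else {
            cache.put(table, stats);
        }
        return stats;
    }
}
```

In this sketch the first lookup for an oversized table still pays for one RPC, and every later lookup returns {{EMPTY_STATS}} immediately. A real change would also need to decide when to retry, for example after the stats shrink or the cache size is reconfigured.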

[~samarthjain], [~mujtabachohan], sorry you both got sucked into debugging this 
one. I'm lamenting even more the lack of insight we have into this (ideally, it 
should have been very easy to tell after the fact). I've been rolling around 
the idea of some mechanism we can plug into on the client to better understand 
execution (nothing fancy). Maybe we need to think about this soon after 4.8.0.

I'm at a conference most of this week, but I'll try to keep an eye on my inbox 
and help out where possible.

> Query with large number of guideposts is slower compared to no stats
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-2724
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2724
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>         Environment: Phoenix 4.7.0-RC4, HBase-0.98.17 on a 8 node cluster
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-2724.patch, PHOENIX-2724_addendum.patch, 
> PHOENIX-2724_v2.patch
>
>
> With a 1MB guidepost width on a ~900GB/500M row table, queries with a short 
> scan range get significantly slower.
> Without stats:
> {code}
> select * from T limit 10; // query execution time <100 msec
> {code}
> With stats:
> {code}
> select * from T limit 10; // query execution time >20 seconds
> Explain plan: CLIENT 876085-CHUNK 476569382 ROWS 876060986727 BYTES SERIAL 
> 1-WAY FULL SCAN OVER T SERVER 10 ROW LIMIT CLIENT 10 ROW LIMIT
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
