[ 
https://issues.apache.org/jira/browse/PHOENIX-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370798#comment-14370798
 ] 

James Taylor edited comment on PHOENIX-1304 at 3/20/15 5:57 AM:
----------------------------------------------------------------

[~samarthjain] - I think the idea is good, but the implementation can be quite 
a bit simpler. I don't think you need to track region servers at all, and the 
logic can be completely isolated to BaseQueryPlan.iterator(final List<? extends 
SQLCloseable> dependencies):
- Add a member variable to BaseQueryPlan for the iterators (either 
ParallelIterators or SerialIterators).
- If we're running serially or doing a skip scan, don't bother checking 
stats or setting the no_cache hint.
- Otherwise, estimate the bytes traversed from iterators.getSplits(). You can 
estimate the guidepost width by getting the GuidePostsInfo for the 
empty column family and computing guidePostsInfo.getByteCount() / 
guidePostsInfo.getGuidePosts().size(). Multiplying this by 
iterators.getSplits().size() gives the approximate number of bytes traversed.
- Finally, if the total bytes traversed exceeds your config value (which we 
may want to default to the block cache size if not set), set the no_cache 
value right on the scan here in the iterator method.
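
The arithmetic in the steps above can be sketched as follows. This is only an 
illustration under stated assumptions, not the Phoenix implementation: the class 
NoCacheEstimator and all of its method and parameter names are hypothetical, and 
the inputs (byte count, guidepost count, split count) are passed in as plain 
numbers rather than read from GuidePostsInfo or the iterators. Treating missing 
stats as "leave caching on" is also an assumption.

```java
/**
 * Hypothetical sketch of the NO_CACHE heuristic described above.
 * None of these names are Phoenix API; inputs are plain numbers.
 */
public final class NoCacheEstimator {
    private NoCacheEstimator() {}

    /** Average bytes per guidepost: total byte count / number of guideposts. */
    public static long guidePostWidth(long byteCount, int guidePostCount) {
        if (guidePostCount <= 0) {
            return 0L; // assumption: no stats available, so no estimate
        }
        return byteCount / guidePostCount;
    }

    /** Approximate bytes traversed: guidepost width * number of splits. */
    public static long estimateBytesTraversed(long byteCount, int guidePostCount,
                                              int splitCount) {
        return guidePostWidth(byteCount, guidePostCount) * splitCount;
    }

    /**
     * Returns true when the scan should bypass the block cache.
     * Serial scans and skip scans are never flagged, per the second step.
     */
    public static boolean shouldSkipBlockCache(boolean serial, boolean skipScan,
                                               long byteCount, int guidePostCount,
                                               int splitCount, long thresholdBytes) {
        if (serial || skipScan) {
            return false;
        }
        return estimateBytesTraversed(byteCount, guidePostCount, splitCount)
                > thresholdBytes;
    }

    public static void main(String[] args) {
        long threshold = 256L * 1024 * 1024; // hypothetical config value
        boolean noCache = shouldSkipBlockCache(false, false,
                1_000_000_000L, 100, 40, threshold);
        System.out.println("skip block cache: " + noCache);
    }
}
```

If the result is true, the caller would then disable block caching on the scan; 
in the HBase client API that switch is Scan.setCacheBlocks(false).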



> Auto-detect if we should pass the NO_CACHE hint
> -----------------------------------------------
>
>                 Key: PHOENIX-1304
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1304
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Assignee: Samarth Jain
>            Priority: Minor
>         Attachments: wip.patch
>
>
> Most databases by default avoid filling the block cache during full scans.
> Typically either stats are consulted to decide whether a full scan should 
> fill the block cache, or a subset of the block cache is dedicated to full 
> scans and used like a ring buffer.
> We already have the "NO_CACHE" hint, but we can do better.
> In Phoenix we could detect scans that use neither any part of the key nor 
> any index and then optionally:
> # avoid using the block cache
> # throw a "slow query" exception (this is especially useful for large data 
> sets, where we'd rather fail than go into a nirvana for an hour)
> (both configurable - either globally or per table, connection, or query)
> Skip scans represent an interesting middle ground. If we skip many blocks 
> between rows we'd definitely benefit from the block cache; if not, we have a 
> case similar to a full scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
