[ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693319#comment-15693319
 ] 

Alex Petrov commented on CASSANDRA-12915:
-----------------------------------------

Thank you for the patch! 

I have not yet had enough time to take a deeper look at it (only a rather 
short look), but I have several thoughts. I also think that the problem with 
"primary" that you discovered should be resolved prior to this patch (with the 
caveat that I do not know what exactly the problem is).

Unfortunately, by the time the patch makes its decision, the index has already 
been hit and the tokens have already been fetched, so we are already losing 
part of the performance. I think we could make that decision one level above. 

Currently, {{SSTableIndex}} has only min and max keys. There's also "size", but 
it's not exactly what we need here. Having counts would be much more helpful. 
When composing key intervals in {{View}}, we could compose counts as well, 
which could serve as index cardinalities (although without relation to a 
particular expression, just like {{Index#getEstimatedResultRows}} for 2i). 
Using these cardinalities we could avoid adding an index without hitting the 
next tree level. Plus, things like {{NEQ}} and non-index column expressions are 
skipped at the same place.
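
To make the idea a bit more concrete, here is a rough sketch of what I have in 
mind. It is not based on the actual SASI classes: {{IndexEstimate}}, 
{{composeCount}}, {{selectIndexes}} and the {{maxRatio}} parameter are all 
hypothetical names, and the ratio rule is only one possible way to use the 
counts. It assumes each SSTable index can report a count, sums those counts per 
column, and keeps an index in the intersection only if its estimate is close 
enough to the most selective one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CardinalitySketch {
    /** Hypothetical per-column estimate; not an existing SASI class. */
    static final class IndexEstimate {
        final String column;
        final long estimatedRows;

        IndexEstimate(String column, long estimatedRows) {
            this.column = column;
            this.estimatedRows = estimatedRows;
        }
    }

    /** Sums per-SSTable counts for one column (assumes each SSTable index exposes a count). */
    static long composeCount(long[] perSSTableCounts) {
        long total = 0;
        for (long count : perSSTableCounts)
            total += count;
        return total;
    }

    /**
     * Always keeps the most selective index; keeps any other index only if its
     * estimate is at most maxRatio times the smallest estimate. Expressions on
     * dropped indexes would still have to be applied as a filter afterwards.
     */
    static List<String> selectIndexes(List<IndexEstimate> estimates, double maxRatio) {
        List<IndexEstimate> sorted = new ArrayList<>(estimates);
        sorted.sort(Comparator.comparingLong(e -> e.estimatedRows));

        List<String> selected = new ArrayList<>();
        if (sorted.isEmpty())
            return selected;

        // Guard against a zero estimate so the ratio check stays meaningful.
        long smallest = Math.max(1, sorted.get(0).estimatedRows);
        for (IndexEstimate e : sorted) {
            if (selected.isEmpty() || e.estimatedRows <= smallest * maxRatio)
                selected.add(e.column);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<IndexEstimate> estimates = new ArrayList<>();
        estimates.add(new IndexEstimate("index1", composeCount(new long[] {1, 1})));
        estimates.add(new IndexEstimate("index2", composeCount(new long[] {100_000, 200_000})));
        // With the figures from the ticket (2 vs ~300k), only index1 is kept.
        System.out.println(selectIndexes(estimates, 100.0));
    }
}
```

The point is only that the decision can be made from the composed counts, 
before any token tree is touched; the value of {{maxRatio}} would need to be 
justified by measurements.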

Another thing: the numbers {{100000}} and {{0.01d}} seem a bit arbitrary to me. 

What do you think about that approach?

Unrelated question: if you're only using EQ relations, why not use "native" 2i 
+ filtering for that query (for now, while RI + QueryPlan is not yet there)? 

> SASI: Index intersection can be very inefficient
> ------------------------------------------------
>
>                 Key: CASSANDRA-12915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: Corentin Chary
>             Fix For: 3.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in 
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only use 'index1') the query will run in a few 
> tenth of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection 
> when we know it will be slow. Probably when the range size factor is very 
> small and the range size is big.
> * CASSANDRA-10765



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
