[jira] [Commented] (CASSANDRA-6588) Add a 'NO EMPTY RESULTS' filter to SELECT

Gianluca Borello (JIRA) Mon, 15 Feb 2016 08:25:17 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147545#comment-15147545
 ]


Gianluca Borello commented on CASSANDRA-6588:
---------------------------------------------

Thanks for the comment. As you predicted, this makes a night and day difference.

Running my benchmark script (the one I explained in the mailing list thread) on 
3.3 gives:

Response time for querying a single column on a large table (column size 10 MB):
10 columns: 236 ms
20 columns: 684 ms
30 columns: 1096 ms
40 columns: 1219 ms
50 columns: 1809 ms
... (heap failure after this)

Running it on the latest trunk as of today:

Response time for querying a single column on a large table (column size 10 MB):
10 columns: 52 ms
20 columns: 59 ms
30 columns: 72 ms
40 columns: 100 ms
50 columns: 109 ms
60 columns: 134 ms
70 columns: 155 ms
80 columns: 165 ms
90 columns: 178 ms
100 columns: 199 ms

That's absolutely perfect, I just wish this was addressed in 2.1 or maybe even 
2.2, moving my production environment to 3.4 is way too scary.


> Add a 'NO EMPTY RESULTS' filter to SELECT
> -----------------------------------------
>
>                 Key: CASSANDRA-6588
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6588
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> It is the semantic of CQL that a (CQL) row exists as long as it has one 
> non-null column (including the PK columns, which, given that no PK columns 
> can be null, means that it's enough to have the PK set for a row to exist). 
> This does means that the result to
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v1 int, v2 int);
> INSERT INTO test(k, v1) VALUES (0, 4);
> SELECT v2 FROM test;
> {noformat}
> must be (and is)
> {noformat}
>  v2
> ------
>  null
> {noformat}
> That fact does mean however that when we only select a few columns of a row, 
> we still need to find out rows that exist but have no values for the selected 
> columns. Long story short, given how the storage engine works, this means we 
> need to query full (CQL) rows even when only some of the columns are selected 
> because that's the only way to distinguish between "the row exists but have 
> no value for the selected columns" and "the row doesn't exist". I'll note in 
> particular that, due to CASSANDRA-5762, we can't unfortunately rely on the 
> row marker to optimize that out.
> Now, when you selects only a subsets of the columns of a row, there is many 
> cases where you don't care about rows that exists but have no value for the 
> columns you requested and are happy to filter those out. So, for those cases, 
> we could provided a new SELECT filter. Outside the potential convenience (not 
> having to filter empty results client side), one interesting part is that 
> when this filter is provided, we could optimize a bit by only querying the 
> columns selected, since we wouldn't need to return rows that exists but have 
> no values for the selected columns.
> For the exact syntax, there is probably a bunch of options. For instance:
> * {{SELECT NON EMPTY(v2, v3) FROM test}}: the vague rational for putting it 
> in the SELECT part is that such filter is kind of in the spirit to DISTINCT.  
> Possibly a bit ugly outside of that.
> * {{SELECT v2, v3 FROM test NO EMPTY RESULTS}} or {{SELECT v2, v3 FROM test 
> NO EMPTY ROWS}} or {{SELECT v2, v3 FROM test NO EMPTY}}: the last one is 
> shorter but maybe a bit less explicit. As for {{RESULTS}} versus {{ROWS}}, 
> the only small object to {{NO EMPTY ROWS}} could be that it might suggest it 
> is filtering non existing rows (I mean, the fact we never ever return non 
> existing rows should hint that it's not what it does but well...) while we're 
> just filtering empty "resultSet rows".
> Of course, if there is a pre-existing SQL syntax for that, it's even better, 
> though a very quick search didn't turn anything. Other suggestions welcome 
> too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-6588) Add a 'NO EMPTY RESULTS' filter to SELECT

Reply via email to