[jira] [Comment Edited] (CASSANDRA-8180) Optimize disk seek using min/max column name meta data when the LIMIT clause is used

Stefania (JIRA) Fri, 24 Jul 2015 02:37:14 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640202#comment-14640202
 ]


Stefania edited comment on CASSANDRA-8180 at 7/24/15 9:35 AM:
--------------------------------------------------------------

We definitely need a place to archive these stress profiles; attaching them to 
the tickets like I've done so far is not very efficient. CASSANDRA-8503 aims to 
collect important stress profiles for regression testing, so perhaps we can 
continue the discussion there?

In terms of this specific performance optimization, so far things aren't going 
too well. The problem is that I don't know how to create a 'timeseries' profile 
with cassandra-stress. Is this possible, [~benedict]? We basically need to have 
many clustering rows per partition, and they must be ordered.

With the profile attached, at best I have been able to show that we are not 
worse:

http://cstar.datastax.com/tests/id/ac8c686c-31de-11e5-95c3-42010af0688f
http://cstar.datastax.com/tests/id/11dc0080-31dd-11e5-80f3-42010af0688f

However, when I changed the primary key population distribution from 1..100K to 
1B, I ended up with something worse:

http://cstar.datastax.com/tests/id/ac8de5ca-31de-11e5-a5b9-42010af0688f

So I think there is still some work to do. I have also seen some weirdness in 
the flight recorder profiles, which I have not attached due to their size, but 
I can do so if anyone is interested. I fixed two hot-spots today but there must 
be at least one more problem. The whole approach of a lazy wrapping iterator is 
a bit fragile IMO: all it takes is calling a method too soon and the entire 
optimization is lost, while the overhead of the wrapper iterator and the 
increased complexity in the merge iterator remain.
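
To make the fragility concrete, here is a minimal Python sketch of the general 
idea. It is not the code from the patch: `LazySource`, `merge_with_limit` and 
`make_sources` are hypothetical names, the sources are plain integer lists 
merged in ascending order, and the lower bound is simply assumed to be 
available without opening the source.

```python
import heapq
from typing import Callable, Iterator, List, Optional


class LazySource:
    """Hypothetical stand-in for an sstable iterator that is opened on demand."""

    def __init__(self, lower_bound: int, open_fn: Callable[[], Iterator[int]]):
        self.lower_bound = lower_bound  # cheap: assumed known from metadata
        self._open_fn = open_fn         # expensive: stands in for the disk seek
        self._it: Optional[Iterator[int]] = None
        self.opened = False

    def next_or_none(self) -> Optional[int]:
        # Any call that needs real data forces the source open.
        if self._it is None:
            self._it = self._open_fn()
            self.opened = True
        return next(self._it, None)


def merge_with_limit(sources: List[LazySource], limit: int) -> List[int]:
    """Ascending merge that stops after `limit` values."""
    # Heap entries are (key, is_real, source_index). A bound (is_real=0) sorts
    # before a real value with the same key, so a source is opened before any
    # value it could still tie with is emitted.
    heap = [(s.lower_bound, 0, i) for i, s in enumerate(sources)]
    heapq.heapify(heap)
    out: List[int] = []
    while heap and len(out) < limit:
        key, is_real, i = heapq.heappop(heap)
        if is_real:
            out.append(key)
        nxt = sources[i].next_or_none()
        if nxt is not None:
            heapq.heappush(heap, (nxt, 1, i))
    return out


def make_sources() -> List[LazySource]:
    # Illustrative data only; each list stands in for one sstable.
    data = [[1, 2, 3], [10, 11], [20]]
    return [LazySource(d[0], lambda d=d: iter(d)) for d in data]


lazy = make_sources()
first_two = merge_with_limit(lazy, 2)
opened_lazy = [s.opened for s in lazy]      # only the first source is opened

eager = make_sources()
for s in eager:
    s.next_or_none()                        # "a method called too soon"
opened_eager = [s.opened for s in eager]    # every source is opened
```

In the second half, touching each source before the merge opens all of them, 
so the saving disappears while the wrapper and the extra heap bookkeeping are 
still paid for.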


was (Author: stefania):
We definitely need a place to archive these stress profiles; attaching them to 
the tickets like I've done so far is not very efficient. CASSANDRA-8503 aims to 
collect important stress profiles for regression testing, so perhaps we can 
continue the discussion there?

In terms of this specific performance optimization, so far things aren't going 
too well. The problem is that I don't know how to create a 'timeseries' profile 
with cassandra-stress. Is this possible, [~benedict]? We basically need to have 
many clustering rows per partition, and they must be ordered.

With the profile attached, at best I have been able to show that we are not 
worse:

http://cstar.datastax.com/tests/id/ac8c686c-31de-11e5-95c3-42010af0688f

However, when I increased the data from 1M to 5M and changed the primary key 
population distribution from 1..100K to 1B, I ended up with something worse:

http://cstar.datastax.com/tests/id/11dc0080-31dd-11e5-80f3-42010af0688f

So I think there is still some work to do. I have also seen some weirdness in 
the flight recorder profiles, which I have not attached due to their size, but 
I can do so if anyone is interested. I fixed two hot-spots today but there must 
be at least one more problem. The whole approach of a lazy wrapping iterator is 
a bit fragile IMO: all it takes is calling a method too soon and the entire 
optimization is lost, while the overhead of the wrapper iterator and the 
increased complexity in the merge iterator remain.

> Optimize disk seek using min/max column name meta data when the LIMIT clause 
> is used
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8180
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8180
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 2.0.10
>            Reporter: DOAN DuyHai
>            Assignee: Stefania
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 8180.yaml
>
>
> I was working on an example of a sensor data table (timeseries) and faced a 
> use case where C* does not optimize reads on disk.
> {code}
> cqlsh:test> CREATE TABLE test(id int, col int, val text, PRIMARY KEY(id,col)) 
> WITH CLUSTERING ORDER BY (col DESC);
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 10, '10');
> ...
> >nodetool flush test test
> ...
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 20, '20');
> ...
> >nodetool flush test test
> ...
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 30, '30');
> ...
> >nodetool flush test test
> {code}
> After that, I activate request tracing:
> {code}
> cqlsh:test> SELECT * FROM test WHERE id=1 LIMIT 1;
>  activity                                                                  | timestamp    | source    | source_elapsed
> ---------------------------------------------------------------------------+--------------+-----------+----------------
>                                                         execute_cql3_query | 23:48:46,498 | 127.0.0.1 |              0
>                             Parsing SELECT * FROM test WHERE id=1 LIMIT 1; | 23:48:46,498 | 127.0.0.1 |             74
>                                                        Preparing statement | 23:48:46,499 | 127.0.0.1 |            253
>                                   Executing single-partition query on test | 23:48:46,499 | 127.0.0.1 |            930
>                                               Acquiring sstable references | 23:48:46,499 | 127.0.0.1 |            943
>                                                Merging memtable tombstones | 23:48:46,499 | 127.0.0.1 |           1032
>                                                Key cache hit for sstable 3 | 23:48:46,500 | 127.0.0.1 |           1160
>                                Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 |           1173
>                                                Key cache hit for sstable 2 | 23:48:46,500 | 127.0.0.1 |           1889
>                                Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 |           1901
>                                                Key cache hit for sstable 1 | 23:48:46,501 | 127.0.0.1 |           2373
>                                Seeking to partition beginning in data file | 23:48:46,501 | 127.0.0.1 |           2384
>  Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:48:46,501 | 127.0.0.1 |           2768
>                                 Merging data from memtables and 3 sstables | 23:48:46,501 | 127.0.0.1 |           2784
>                                         Read 2 live and 0 tombstoned cells | 23:48:46,501 | 127.0.0.1 |           2976
>                                                           Request complete | 23:48:46,501 | 127.0.0.1 |           3551
> {code}
> We can clearly see that C* hits 3 SSTables on disk instead of just one, 
> although it has the min/max column metadata to decide which SSTable contains 
> the most recent data.
> Funnily enough, if we add a clause on the clustering column to the SELECT, 
> C* does optimize the read path:
> {code}
> cqlsh:test> SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1;
>  activity                                                                  | timestamp    | source    | source_elapsed
> ---------------------------------------------------------------------------+--------------+-----------+----------------
>                                                         execute_cql3_query | 23:52:31,888 | 127.0.0.1 |              0
>                Parsing SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1; | 23:52:31,888 | 127.0.0.1 |             60
>                                                        Preparing statement | 23:52:31,888 | 127.0.0.1 |            277
>                                   Executing single-partition query on test | 23:52:31,889 | 127.0.0.1 |            961
>                                               Acquiring sstable references | 23:52:31,889 | 127.0.0.1 |            971
>                                                Merging memtable tombstones | 23:52:31,889 | 127.0.0.1 |           1020
>                                                Key cache hit for sstable 3 | 23:52:31,889 | 127.0.0.1 |           1108
>                                Seeking to partition beginning in data file | 23:52:31,889 | 127.0.0.1 |           1117
>  Skipped 2/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:52:31,889 | 127.0.0.1 |           1611
>                                 Merging data from memtables and 1 sstables | 23:52:31,890 | 127.0.0.1 |           1624
>                                         Read 1 live and 0 tombstoned cells | 23:52:31,890 | 127.0.0.1 |           1700
>                                                           Request complete | 23:52:31,890 | 127.0.0.1 |           2140
> {code}
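
The check requested in the quoted description can be sketched roughly as 
follows. This is an illustration only, not Cassandra's read path: `SSTable` and 
`read_top_desc` are hypothetical names, and the sketch assumes no tombstones, 
no memtable data and no timestamp reconciliation, all of which a real 
implementation would have to handle before skipping anything.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SSTable:
    """Hypothetical model: the clustering values one sstable holds for a partition."""
    name: str
    cols: List[int]

    @property
    def max_col(self) -> int:
        # Stands in for the max column name kept in the sstable metadata.
        return max(self.cols)


def read_top_desc(sstables: List[SSTable], limit: int) -> Tuple[List[int], List[str]]:
    """Return the `limit` largest clustering values and the sstables actually read."""
    rows: List[int] = []
    touched: List[str] = []
    # Visit sstables from the highest max bound to the lowest.
    for table in sorted(sstables, key=lambda t: t.max_col, reverse=True):
        # Once `limit` rows are collected and this sstable cannot contain
        # anything above the last of them, neither can any later sstable.
        if len(rows) >= limit and table.max_col < rows[limit - 1]:
            break
        touched.append(table.name)
        # set() merges equal clustering values; real reconciliation by
        # timestamp is deliberately left out of this sketch.
        rows = sorted(set(rows) | set(table.cols), reverse=True)
    return rows[:limit], touched


# The three flushes from the description: one row per sstable.
tables = [
    SSTable("sstable 1", [10]),
    SSTable("sstable 2", [20]),
    SSTable("sstable 3", [30]),
]
result, read_from = read_top_desc(tables, 1)
print(result, read_from)
```

With the data from the description, only the sstable holding col = 30 is read, 
which is the same set of sstables the `col > 25` trace ends up touching.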



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
