[ 
https://issues.apache.org/jira/browse/CASSANDRA-11814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Ma updated CASSANDRA-11814:
---------------------------------
    Description: 
we have a cassandra cluster, which has 16 nodes(m4.2xlarge) on aws. 
The total data size is 2.5TB now. There is a very large table named 
inbox_musical, the schema is like this: 
{code}
CREATE TABLE feeds.inbox_musical (
    owner_id bigint,
    activity_id bigint,
    insert_time timestamp,
    PRIMARY KEY (owner_id, activity_id)
) WITH CLUSTERING ORDER BY (activity_id DESC)
    AND bloom_filter_fp_chance = 0.1
    AND caching = '{"keys":"ALL", "rows_per_partition":"20"}'
    AND comment = ''
    AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 15552000
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
{code}

owner_id is the partition key, and activity_id is clustering key.

95% query can return within 10ms. but a few queries read timeout ( > 5s ).

the slow queries is like this:
{code:sql}
SELECT activity_id FROM inbox_musical WHERE owner_id=5582322 ORDER BY 
activity_id DESC LIMIT 100;
{code}

there are only 230 rows in this partition. but the query takes seconds to 
complete.

the query tracing:
{code}
 activity                                                                       
                                                 | timestamp                  | 
source       | source_elapsed
---------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------
                                                                                
                              Execute CQL3 query | 2016-05-17 11:21:26.986000 | 
172.31.8.188 |              0
                                              READ message received from 
/172.31.8.188 [MessagingService-Incoming-/172.31.8.188] | 2016-05-17 
11:21:26.985000 | 172.31.15.14 |             13
                                                         Executing 
single-partition query on inbox_musical [SharedPool-Worker-4] | 2016-05-17 
11:21:26.985000 | 172.31.15.14 |            115
                                              READ message received from 
/172.31.8.188 [MessagingService-Incoming-/172.31.8.188] | 2016-05-17 
11:21:26.986000 | 172.31.4.135 |             13
                                                                              
Acquiring sstable references [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 
| 172.31.15.14 |            137
                                                         Executing 
single-partition query on inbox_musical [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986000 | 172.31.4.135 |            113
                                                                               
Merging memtable tombstones [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 
| 172.31.15.14 |            161
 Parsing SELECT activity_id FROM inbox_musical WHERE owner_id=5582322 ORDER BY 
activity_id DESC LIMIT 100; [SharedPool-Worker-3] | 2016-05-17 11:21:26.986000 
| 172.31.8.188 |             66
                                                                              
Acquiring sstable references [SharedPool-Worker-1] | 2016-05-17 11:21:26.986000 
| 172.31.4.135 |            136
                                                                           Key 
cache hit for sstable 18948 [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 
| 172.31.15.14 |            199
                                                                                
       Preparing statement [SharedPool-Worker-3] | 2016-05-17 11:21:26.986000 | 
172.31.8.188 |             99
                                                                               
Merging memtable tombstones [SharedPool-Worker-1] | 2016-05-17 11:21:26.986000 
| 172.31.4.135 |            155
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 
11:21:26.986000 | 172.31.15.14 |            205
                                                                                
      Read-repair DC_LOCAL [SharedPool-Worker-3] | 2016-05-17 11:21:26.986000 | 
172.31.8.188 |            142
                                                                Bloom filter 
allows skipping sstable 20159 [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986000 | 172.31.4.135 |            179
                                                                           Key 
cache hit for sstable 18924 [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 
| 172.31.15.14 |            459
                                                                           
reading data from /172.31.4.135 [SharedPool-Worker-3] | 2016-05-17 
11:21:26.986000 | 172.31.8.188 |            160
                                                                           Key 
cache hit for sstable 20134 [SharedPool-Worker-1] | 2016-05-17 11:21:26.986000 
| 172.31.4.135 |            195
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 
11:21:26.986000 | 172.31.15.14 |            468
                                                                           
reading data from /172.31.15.14 [SharedPool-Worker-3] | 2016-05-17 
11:21:26.986000 | 172.31.8.188 |            212
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986001 | 172.31.4.135 |            202
                                                                           Key 
cache hit for sstable 18547 [SharedPool-Worker-4] | 2016-05-17 11:21:26.986001 
| 172.31.15.14 |            782
                                                 Sending READ message to 
/172.31.4.135 [MessagingService-Outgoing-/172.31.4.135] | 2016-05-17 
11:21:26.986001 | 172.31.8.188 |            249
                                                                           Key 
cache hit for sstable 19782 [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 
| 172.31.4.135 |            376
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 
11:21:26.986001 | 172.31.15.14 |            792
                                                 Sending READ message to 
/172.31.15.14 [MessagingService-Outgoing-/172.31.15.14] | 2016-05-17 
11:21:26.986001 | 172.31.8.188 |            260
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986001 | 172.31.4.135 |            388
                                 Skipped 0/3 non-slice-intersecting sstables, 
included 0 due to tombstones [SharedPool-Worker-4] | 2016-05-17 11:21:26.986001 
| 172.31.15.14 |            963
                                                                           Key 
cache hit for sstable 16500 [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 
| 172.31.4.135 |            477
                                                                Merging data 
from memtables and 3 sstables [SharedPool-Worker-4] | 2016-05-17 
11:21:26.986001 | 172.31.15.14 |            973
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986001 | 172.31.4.135 |            484
                                 Skipped 0/4 non-slice-intersecting sstables, 
included 0 due to tombstones [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 
| 172.31.4.135 |            573
                                                                Merging data 
from memtables and 3 sstables [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986001 | 172.31.4.135 |            582
                                                                       Read 101 
live and 0 tombstone cells [SharedPool-Worker-1] | 2016-05-17 11:21:31.805000 | 
172.31.4.135 |         819660
                                                                       
Enqueuing response to /172.31.8.188 [SharedPool-Worker-1] | 2016-05-17 
11:21:31.831000 | 172.31.4.135 |         844972
                                     Sending REQUEST_RESPONSE message to 
/172.31.8.188 [MessagingService-Outgoing-/172.31.8.188] | 2016-05-17 
11:21:31.868000 | 172.31.4.135 |         882380
                                  REQUEST_RESPONSE message received from 
/172.31.4.135 [MessagingService-Incoming-/172.31.4.135] | 2016-05-17 
11:21:31.962000 | 172.31.8.188 |         976119
                                                                    Processing 
response from /172.31.4.135 [SharedPool-Worker-2] | 2016-05-17 11:21:31.962000 
| 172.31.8.188 |         976166
                                                                                
                                Request c
                                                                                
                            omplete | 2016-05-17 11:21:32.014093 | 172.31.8.188 
|          28093
{code}

I used jstack to get some stacks when c* is execute query like above. and found 
it seems c* take a long time do the following methods:
{code}
"SharedPool-Worker-5" #221 daemon prio=5 os_prio=0 tid=0x00007f177517a000 
nid=0x8b6 runnable [0x00007f1471e5b000]
   java.lang.Thread.State: RUNNABLE
    at 
org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:689)
    at 
org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:649)
    at 
org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:170)
    at 
org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:142)
    at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:239)
    at 
org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:490)
    at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
    at 
org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:185)
    at 
org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:157)
    at 
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
    at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
    at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
    at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:264)
    at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:109)
    at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82)
    at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
    at 
org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:317)
    at 
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:61)
    at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2011)
    at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1815)
    at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:360)
    at 
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
{code}

I executed these query when server was on very low load.

from the tracing it seems that something is very slow between 'Merging data 
from memtables and 3 sstables' and 'Read 101 live and 0 tombstone cells'

  was:
we have a cassandra cluster, which has 16 nodes(m4.2xlarge) on aws. 
The total data size is 2.5TB now. There is a very large table named 
inbox_musical, the schema is like this: 
{code}
CREATE TABLE feeds.inbox_musical (
    owner_id bigint,
    activity_id bigint,
    insert_time timestamp,
    PRIMARY KEY (owner_id, activity_id)
) WITH CLUSTERING ORDER BY (activity_id DESC)
    AND bloom_filter_fp_chance = 0.1
    AND caching = '{"keys":"ALL", "rows_per_partition":"20"}'
    AND comment = ''
    AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 15552000
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
{code}

owner_id is the partition key, and activity_id is clustering key.

95% query can return within 10ms. but a few queries read timeout ( > 5s ).

the slow queries is like this:
{code:sql}
SELECT activity_id FROM inbox_musical WHERE owner_id=5582322 ORDER BY 
activity_id DESC LIMIT 100;
{code}

there are only 230 rows in this partition. but the query takes seconds to 
complete.

the query tracing:
{code}
 activity                                                                       
                                                 | timestamp                  | 
source       | source_elapsed
---------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------
                                                                                
                              Execute CQL3 query | 2016-05-17 11:21:26.986000 | 
172.31.8.188 |              0
                                              READ message received from 
/172.31.8.188 [MessagingService-Incoming-/172.31.8.188] | 2016-05-17 
11:21:26.985000 | 172.31.15.14 |             13
                                                         Executing 
single-partition query on inbox_musical [SharedPool-Worker-4] | 2016-05-17 
11:21:26.985000 | 172.31.15.14 |            115
                                              READ message received from 
/172.31.8.188 [MessagingService-Incoming-/172.31.8.188] | 2016-05-17 
11:21:26.986000 | 172.31.4.135 |             13
                                                                              
Acquiring sstable references [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 
| 172.31.15.14 |            137
                                                         Executing 
single-partition query on inbox_musical [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986000 | 172.31.4.135 |            113
                                                                               
Merging memtable tombstones [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 
| 172.31.15.14 |            161
 Parsing SELECT activity_id FROM inbox_musical WHERE owner_id=5582322 ORDER BY 
activity_id DESC LIMIT 100; [SharedPool-Worker-3] | 2016-05-17 11:21:26.986000 
| 172.31.8.188 |             66
                                                                              
Acquiring sstable references [SharedPool-Worker-1] | 2016-05-17 11:21:26.986000 
| 172.31.4.135 |            136
                                                                           Key 
cache hit for sstable 18948 [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 
| 172.31.15.14 |            199
                                                                                
       Preparing statement [SharedPool-Worker-3] | 2016-05-17 11:21:26.986000 | 
172.31.8.188 |             99
                                                                               
Merging memtable tombstones [SharedPool-Worker-1] | 2016-05-17 11:21:26.986000 
| 172.31.4.135 |            155
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 
11:21:26.986000 | 172.31.15.14 |            205
                                                                                
      Read-repair DC_LOCAL [SharedPool-Worker-3] | 2016-05-17 11:21:26.986000 | 
172.31.8.188 |            142
                                                                Bloom filter 
allows skipping sstable 20159 [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986000 | 172.31.4.135 |            179
                                                                           Key 
cache hit for sstable 18924 [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 
| 172.31.15.14 |            459
                                                                           
reading data from /172.31.4.135 [SharedPool-Worker-3] | 2016-05-17 
11:21:26.986000 | 172.31.8.188 |            160
                                                                           Key 
cache hit for sstable 20134 [SharedPool-Worker-1] | 2016-05-17 11:21:26.986000 
| 172.31.4.135 |            195
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 
11:21:26.986000 | 172.31.15.14 |            468
                                                                           
reading data from /172.31.15.14 [SharedPool-Worker-3] | 2016-05-17 
11:21:26.986000 | 172.31.8.188 |            212
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986001 | 172.31.4.135 |            202
                                                                           Key 
cache hit for sstable 18547 [SharedPool-Worker-4] | 2016-05-17 11:21:26.986001 
| 172.31.15.14 |            782
                                                 Sending READ message to 
/172.31.4.135 [MessagingService-Outgoing-/172.31.4.135] | 2016-05-17 
11:21:26.986001 | 172.31.8.188 |            249
                                                                           Key 
cache hit for sstable 19782 [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 
| 172.31.4.135 |            376
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 
11:21:26.986001 | 172.31.15.14 |            792
                                                 Sending READ message to 
/172.31.15.14 [MessagingService-Outgoing-/172.31.15.14] | 2016-05-17 
11:21:26.986001 | 172.31.8.188 |            260
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986001 | 172.31.4.135 |            388
                                 Skipped 0/3 non-slice-intersecting sstables, 
included 0 due to tombstones [SharedPool-Worker-4] | 2016-05-17 11:21:26.986001 
| 172.31.15.14 |            963
                                                                           Key 
cache hit for sstable 16500 [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 
| 172.31.4.135 |            477
                                                                Merging data 
from memtables and 3 sstables [SharedPool-Worker-4] | 2016-05-17 
11:21:26.986001 | 172.31.15.14 |            973
                                                               Seeking to 
partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986001 | 172.31.4.135 |            484
                                 Skipped 0/4 non-slice-intersecting sstables, 
included 0 due to tombstones [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 
| 172.31.4.135 |            573
                                                                Merging data 
from memtables and 3 sstables [SharedPool-Worker-1] | 2016-05-17 
11:21:26.986001 | 172.31.4.135 |            582
                                                                       Read 101 
live and 0 tombstone cells [SharedPool-Worker-1] | 2016-05-17 11:21:31.805000 | 
172.31.4.135 |         819660
                                                                       
Enqueuing response to /172.31.8.188 [SharedPool-Worker-1] | 2016-05-17 
11:21:31.831000 | 172.31.4.135 |         844972
                                     Sending REQUEST_RESPONSE message to 
/172.31.8.188 [MessagingService-Outgoing-/172.31.8.188] | 2016-05-17 
11:21:31.868000 | 172.31.4.135 |         882380
                                  REQUEST_RESPONSE message received from 
/172.31.4.135 [MessagingService-Incoming-/172.31.4.135] | 2016-05-17 
11:21:31.962000 | 172.31.8.188 |         976119
                                                                    Processing 
response from /172.31.4.135 [SharedPool-Worker-2] | 2016-05-17 11:21:31.962000 
| 172.31.8.188 |         976166
                                                                                
                                Request c
                                                                                
                            omplete | 2016-05-17 11:21:32.014093 | 172.31.8.188 
|          28093
{code}

I used jstack to get some stacks when c* is execute query like above. and found 
it seems c* take a long time do the following methods:
{code}
"SharedPool-Worker-5" #221 daemon prio=5 os_prio=0 tid=0x00007f177517a000 
nid=0x8b6 runnable [0x00007f1471e5b000]
   java.lang.Thread.State: RUNNABLE
    at 
org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:689)
    at 
org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:649)
    at 
org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:170)
    at 
org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:142)
    at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:239)
    at 
org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:490)
    at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
    at 
org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:185)
    at 
org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:157)
    at 
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
    at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
    at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
    at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:264)
    at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:109)
    at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82)
    at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
    at 
org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:317)
    at 
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:61)
    at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2011)
    at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1815)
    at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:360)
    at 
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
{code}

I executed these query when server was on very low load.


> some queries are very slow on a small partition
> -----------------------------------------------
>
>                 Key: CASSANDRA-11814
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11814
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: os: ubuntu server 14.04
> cassandra: 2.2.5 (apt-get installed from datastax repo)
> server: ec2 m4.2xlarge
>            Reporter: Terry Ma
>            Priority: Minor
>              Labels: performance
>
> we have a cassandra cluster, which has 16 nodes(m4.2xlarge) on aws. 
> The total data size is 2.5TB now. There is a very large table named 
> inbox_musical, the schema is like this: 
> {code}
> CREATE TABLE feeds.inbox_musical (
>     owner_id bigint,
>     activity_id bigint,
>     insert_time timestamp,
>     PRIMARY KEY (owner_id, activity_id)
> ) WITH CLUSTERING ORDER BY (activity_id DESC)
>     AND bloom_filter_fp_chance = 0.1
>     AND caching = '{"keys":"ALL", "rows_per_partition":"20"}'
>     AND comment = ''
>     AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>     AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 15552000
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
> {code}
> owner_id is the partition key, and activity_id is clustering key.
> 95% query can return within 10ms. but a few queries read timeout ( > 5s ).
> the slow queries is like this:
> {code:sql}
> SELECT activity_id FROM inbox_musical WHERE owner_id=5582322 ORDER BY 
> activity_id DESC LIMIT 100;
> {code}
> there are only 230 rows in this partition. but the query takes seconds to 
> complete.
> the query tracing:
> {code}
>  activity                                                                     
>                                                    | timestamp                
>   | source       | source_elapsed
> ---------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------
>                                                                               
>                                 Execute CQL3 query | 2016-05-17 
> 11:21:26.986000 | 172.31.8.188 |              0
>                                               READ message received from 
> /172.31.8.188 [MessagingService-Incoming-/172.31.8.188] | 2016-05-17 
> 11:21:26.985000 | 172.31.15.14 |             13
>                                                          Executing 
> single-partition query on inbox_musical [SharedPool-Worker-4] | 2016-05-17 
> 11:21:26.985000 | 172.31.15.14 |            115
>                                               READ message received from 
> /172.31.8.188 [MessagingService-Incoming-/172.31.8.188] | 2016-05-17 
> 11:21:26.986000 | 172.31.4.135 |             13
>                                                                               
> Acquiring sstable references [SharedPool-Worker-4] | 2016-05-17 
> 11:21:26.986000 | 172.31.15.14 |            137
>                                                          Executing 
> single-partition query on inbox_musical [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986000 | 172.31.4.135 |            113
>                                                                               
>  Merging memtable tombstones [SharedPool-Worker-4] | 2016-05-17 
> 11:21:26.986000 | 172.31.15.14 |            161
>  Parsing SELECT activity_id FROM inbox_musical WHERE owner_id=5582322 ORDER 
> BY activity_id DESC LIMIT 100; [SharedPool-Worker-3] | 2016-05-17 
> 11:21:26.986000 | 172.31.8.188 |             66
>                                                                               
> Acquiring sstable references [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986000 | 172.31.4.135 |            136
>                                                                            
> Key cache hit for sstable 18948 [SharedPool-Worker-4] | 2016-05-17 
> 11:21:26.986000 | 172.31.15.14 |            199
>                                                                               
>          Preparing statement [SharedPool-Worker-3] | 2016-05-17 
> 11:21:26.986000 | 172.31.8.188 |             99
>                                                                               
>  Merging memtable tombstones [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986000 | 172.31.4.135 |            155
>                                                                Seeking to 
> partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 
> 11:21:26.986000 | 172.31.15.14 |            205
>                                                                               
>         Read-repair DC_LOCAL [SharedPool-Worker-3] | 2016-05-17 
> 11:21:26.986000 | 172.31.8.188 |            142
>                                                                 Bloom filter 
> allows skipping sstable 20159 [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986000 | 172.31.4.135 |            179
>                                                                            
> Key cache hit for sstable 18924 [SharedPool-Worker-4] | 2016-05-17 
> 11:21:26.986000 | 172.31.15.14 |            459
>                                                                            
> reading data from /172.31.4.135 [SharedPool-Worker-3] | 2016-05-17 
> 11:21:26.986000 | 172.31.8.188 |            160
>                                                                            
> Key cache hit for sstable 20134 [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986000 | 172.31.4.135 |            195
>                                                                Seeking to 
> partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 
> 11:21:26.986000 | 172.31.15.14 |            468
>                                                                            
> reading data from /172.31.15.14 [SharedPool-Worker-3] | 2016-05-17 
> 11:21:26.986000 | 172.31.8.188 |            212
>                                                                Seeking to 
> partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986001 | 172.31.4.135 |            202
>                                                                            
> Key cache hit for sstable 18547 [SharedPool-Worker-4] | 2016-05-17 
> 11:21:26.986001 | 172.31.15.14 |            782
>                                                  Sending READ message to 
> /172.31.4.135 [MessagingService-Outgoing-/172.31.4.135] | 2016-05-17 
> 11:21:26.986001 | 172.31.8.188 |            249
>                                                                            
> Key cache hit for sstable 19782 [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986001 | 172.31.4.135 |            376
>                                                                Seeking to 
> partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 
> 11:21:26.986001 | 172.31.15.14 |            792
>                                                  Sending READ message to 
> /172.31.15.14 [MessagingService-Outgoing-/172.31.15.14] | 2016-05-17 
> 11:21:26.986001 | 172.31.8.188 |            260
>                                                                Seeking to 
> partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986001 | 172.31.4.135 |            388
>                                  Skipped 0/3 non-slice-intersecting sstables, 
> included 0 due to tombstones [SharedPool-Worker-4] | 2016-05-17 
> 11:21:26.986001 | 172.31.15.14 |            963
>                                                                            
> Key cache hit for sstable 16500 [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986001 | 172.31.4.135 |            477
>                                                                 Merging data 
> from memtables and 3 sstables [SharedPool-Worker-4] | 2016-05-17 
> 11:21:26.986001 | 172.31.15.14 |            973
>                                                                Seeking to 
> partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986001 | 172.31.4.135 |            484
>                                  Skipped 0/4 non-slice-intersecting sstables, 
> included 0 due to tombstones [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986001 | 172.31.4.135 |            573
>                                                                 Merging data 
> from memtables and 3 sstables [SharedPool-Worker-1] | 2016-05-17 
> 11:21:26.986001 | 172.31.4.135 |            582
>                                                                        Read 
> 101 live and 0 tombstone cells [SharedPool-Worker-1] | 2016-05-17 
> 11:21:31.805000 | 172.31.4.135 |         819660
>                                                                        
> Enqueuing response to /172.31.8.188 [SharedPool-Worker-1] | 2016-05-17 
> 11:21:31.831000 | 172.31.4.135 |         844972
>                                      Sending REQUEST_RESPONSE message to 
> /172.31.8.188 [MessagingService-Outgoing-/172.31.8.188] | 2016-05-17 
> 11:21:31.868000 | 172.31.4.135 |         882380
>                                   REQUEST_RESPONSE message received from 
> /172.31.4.135 [MessagingService-Incoming-/172.31.4.135] | 2016-05-17 
> 11:21:31.962000 | 172.31.8.188 |         976119
>                                                                     
> Processing response from /172.31.4.135 [SharedPool-Worker-2] | 2016-05-17 
> 11:21:31.962000 | 172.31.8.188 |         976166
>                                                                               
>                                   Request c
>                                                                               
>                               omplete | 2016-05-17 11:21:32.014093 | 
> 172.31.8.188 |          28093
> {code}
> I used jstack to get some stacks when c* is execute query like above. and 
> found it seems c* take a long time do the following methods:
> {code}
> "SharedPool-Worker-5" #221 daemon prio=5 os_prio=0 tid=0x00007f177517a000 
> nid=0x8b6 runnable [0x00007f1471e5b000]
>    java.lang.Thread.State: RUNNABLE
>     at 
> org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:689)
>     at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:649)
>     at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:170)
>     at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:142)
>     at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:239)
>     at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:490)
>     at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
>     at 
> org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:185)
>     at 
> org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:157)
>     at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
>     at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
>     at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
>     at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>     at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>     at 
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:264)
>     at 
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:109)
>     at 
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82)
>     at 
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
>     at 
> org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:317)
>     at 
> org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:61)
>     at 
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2011)
>     at 
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1815)
>     at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:360)
>     at 
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
> {code}
> I executed these query when server was on very low load.
> from the tracing it seems that something is very slow between 'Merging data 
> from memtables and 3 sstables' and 'Read 101 live and 0 tombstone cells'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to