[ https://issues.apache.org/jira/browse/CASSANDRA-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Boudreault updated CASSANDRA-10502:
----------------------------------------
    Reproduced In: 2.2.x

> Cassandra query degradation with high frequency updated tables
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-10502
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10502
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Dodong Juan
>
> Hi,
> So we are developing a system that computes profiles of the things it 
> observes. The observations come in the form of events. Each thing that it 
> observes has an id and a set of subthings, each of which has a 
> measurement of some kind. There are roughly 500 subthings within each 
> thing. We receive events containing measurements of these 500 subthings every 
> 10 seconds or so.
> As we receive events, we read the old profile value, calculate the new 
> profile based on the new value, and save it back. 
> One of the things we observe is the set of processes running on the server.
> We use the following schema to hold the profile. 
> {noformat}
> CREATE TABLE processinfometric_profile (
>     profilecontext text,
>     id text,
>     month text,
>     day text,
>     hour text,
>     minute text,
>     command text,
>     cpu map<text, double>,
>     majorfaults map<text, double>,
>     minorfaults map<text, double>,
>     nice map<text, double>,
>     pagefaults map<text, double>,
>     pid map<text, double>,
>     ppid map<text, double>,
>     priority map<text, double>,
>     resident map<text, double>,
>     rss map<text, double>,
>     sharesize map<text, double>,
>     size map<text, double>,
>     starttime map<text, double>,
>     state map<text, double>,
>     threads map<text, double>,
>     user map<text, double>,
>     vsize map<text, double>,
>     PRIMARY KEY ((profilecontext, id, month, day, hour, minute), command)
> ) WITH CLUSTERING ORDER BY (command ASC)
>     AND bloom_filter_fp_chance = 0.1
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
> {noformat}
> This profile is then used for certain analytics, either in the 
> context of the 'thing' or in the context of a specific thing and subthing. 
> A profile can be defined as monthly, daily, or hourly. For a monthly profile, 
> the month is set to the current month (e.g. 'Oct') and the day and hour are 
> set to the empty string ''.
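
The key layout described in the paragraph above can be sketched in Python. This is only an illustration: `ProfileKey` and `profile_key` are hypothetical names, and only the monthly case is stated in the report. The daily and hourly cases, and leaving `minute` empty, are assumptions extrapolated from it.

```python
from typing import NamedTuple


class ProfileKey(NamedTuple):
    """Partition key columns of processinfometric_profile, in schema order."""
    profilecontext: str
    id: str
    month: str
    day: str
    hour: str
    minute: str


def profile_key(profilecontext: str, thing_id: str, granularity: str,
                month: str, day: str = "", hour: str = "") -> ProfileKey:
    """Build the partition key for one profile granularity (hypothetical helper)."""
    if granularity == "monthly":
        # Stated in the report: month is set, day and hour are empty strings.
        return ProfileKey(profilecontext, thing_id, month, "", "", "")
    if granularity == "daily":
        # Assumption: a daily profile keeps month and day and leaves hour empty.
        return ProfileKey(profilecontext, thing_id, month, day, "", "")
    if granularity == "hourly":
        # Assumption: an hourly profile keeps month, day and hour.
        return ProfileKey(profilecontext, thing_id, month, day, hour, "")
    raise ValueError(f"unknown granularity: {granularity!r}")


# "ctx" is a placeholder profilecontext value, not taken from the report.
monthly = profile_key("ctx", "1", "monthly", "Oct")
```

Under this layout the monthly key for a given thing does not change for the whole month, so every update made during that month lands in the same partition.
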
> The problem we have observed is that over time (actually in just a 
> matter of hours) query response time for the monthly profile degrades 
> badly. At the start it responds in 10-100 ms, and after a 
> couple of hours it goes to 2000-3000 ms. If you leave it for a couple of 
> days you start experiencing read timeouts. The query is basically just:
> {noformat}
> select * from myprofile where id='1' and month='Oct' and day='' and hour='' and minute=''
> {noformat}
> This returns only about 500 rows.
> We were using Cassandra 2.2.1 and upgraded to 2.2.2 to see if that fixed the 
> issue, to no avail. Since this is a test, we are running on a single node.
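
To put the update frequency in the report into numbers, here is a rough back-of-the-envelope sketch in Python. It uses the reporter's approximate figures (about 500 subthings per thing, one event roughly every 10 seconds) and assumes that each event rewrites every one of the roughly 500 rows of the monthly partition once. That assumption is not confirmed by the report, and `row_updates` is a hypothetical helper.

```python
SUBTHINGS_PER_THING = 500     # "roughly 500 subthings" per thing in the report
EVENT_INTERVAL_SECONDS = 10   # events arrive "every 10 seconds or so"


def row_updates(elapsed_seconds: int) -> int:
    """Estimate row rewrites to one thing's monthly partition.

    Assumption: every event updates each subthing's row exactly once.
    """
    events = elapsed_seconds // EVENT_INTERVAL_SECONDS
    return events * SUBTHINGS_PER_THING


per_hour = row_updates(60 * 60)       # 360 events * 500 rows
per_day = row_updates(24 * 60 * 60)   # 8640 events * 500 rows
print(per_hour, per_day)
```

Under these assumptions a single thing's monthly partition takes about 180,000 row rewrites per hour and about 4.3 million per day, while the query still returns only about 500 live rows.
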



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
