Re: Read query taking a long time

2015-10-21 Thread Carlos Alonso
Well...

I think that pretty much shows the problem, and I'd say the problem is a
bad data model. Your read query is perfect: it hits a single partition, and
that's the best situation. But on the other hand, it turns out that there
are one or two huge partitions, and fully reading them is extremely
expensive. The tails of all your cfhistograms are very long, i.e. the
difference between the 99% and the max is very big.

As you can see, 99% of your partitions are within a reasonable 454KB,
but the maximum goes above 10MB. Reviewing your data
model I have two ideas:

1. Either you have very wide rows (i.e. lots of records for the same
partition key), or
2. Some of your blob data is huge. I suspect the problem is here: the
difference in cell count between the 99% and the max is very small while the
data size difference is huge, so partitions are not very wide (i.e. not many
rows per partition), but very heavy (lots of data per cell).

Does that make sense?
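
A quick way to tell the two cases apart (just a sketch; I'm reusing the
partition key values from your earlier mail, and assuming cqlsh access on
any node):

% cqlsh -e "SELECT count(*) FROM akka.messages WHERE persistence_id = 'session-SW' AND partition_nr = 0;"

(or just paste the SELECT into a cqlsh session). If the count stays small
(your trace suggested ~33 rows) while cfhistograms reports a 10MB max
partition, the weight is in the blobs, not the row count.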

Carlos Alonso | Software Engineer | @calonso 

On 21 October 2015 at 08:35, Brice Figureau <
brice+cassan...@daysofwonder.com> wrote:

> Hi,
>
> On 20/10/2015 19:48, Carlos Alonso wrote:
> > I think also having the output of cfhistograms could help. I'd like to
> > know how many sstables are being hit during reads.
>
> Oops, my bad, I thought you mentioned proxyhistograms.
> Here's the worst node cfhistograms:
>
> nodetool cfhistograms akka messages
> akka/messages histograms
> Percentile  SSTables   Write Latency   Read Latency   Partition Size   Cell Count
>                             (micros)       (micros)          (bytes)
> 50%             1.00           60.00         310.00              535            1
> 75%             2.00          103.00        9887.00            11864           17
> 95%             3.00        14237.00      126934.00            73457          215
> 98%             3.00        51012.00      219342.00           263210          446
> 99%             3.00        61214.00     1131752.00           454826          924
> Min             0.00            6.00          18.00               73            0
> Max             6.00       263210.00     5839588.00         10090808         1109
>
>
> > Also, which CL are you reading with?
>
> For the test request in cqlsh, I'm using ONE, but the application in which
> I first observed the issue was using QUORUM.
>
> > cfstats is a local command, so maybe the node you've printed is working
> > fine but another one is causing the latency. Can you run that
> > command on all nodes?
>
> I checked and on all nodes, the read latency and read local latency are
> within 15 to 40ms.
>
> I also noticed that C* was taking a fair bit of CPU on some of the nodes
> (ranging from 60% to 200%). Looking at ttop output, it was mostly taken by
> SharedPool-Worker threads, which I assume are the threads doing the
> real query work.
>
> Well, I'm puzzled, and I'll keep searching, thanks for your help!
> --
> Brice Figureau
>


Re: Read query taking a long time

2015-10-21 Thread Laing, Michael
Are the clocks synchronized across the cluster? Probably, but I thought I'd
ask :)
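
A quick way to eyeball skew (a sketch; node1..node3 are placeholder
hostnames and I'm assuming ssh access):

% for h in node1 node2 node3; do echo -n "$h: "; ssh $h date +%s.%N; done

Or check the NTP peer offsets on each node with ntpq -p.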

On Wed, Oct 21, 2015 at 3:35 AM, Brice Figureau <
brice+cassan...@daysofwonder.com> wrote:

> Hi,
>
> On 20/10/2015 19:48, Carlos Alonso wrote:
> > I think also having the output of cfhistograms could help. I'd like to
> > know how many sstables are being hit during reads.
>
> Oops, my bad, I thought you mentioned proxyhistograms.
> Here's the worst node cfhistograms:
>
> nodetool cfhistograms akka messages
> akka/messages histograms
> Percentile  SSTables   Write Latency   Read Latency   Partition Size   Cell Count
>                             (micros)       (micros)          (bytes)
> 50%             1.00           60.00         310.00              535            1
> 75%             2.00          103.00        9887.00            11864           17
> 95%             3.00        14237.00      126934.00            73457          215
> 98%             3.00        51012.00      219342.00           263210          446
> 99%             3.00        61214.00     1131752.00           454826          924
> Min             0.00            6.00          18.00               73            0
> Max             6.00       263210.00     5839588.00         10090808         1109
>
>
> > Also, which CL are you reading with?
>
> For the test request in cqlsh, I'm using ONE, but the application in which
> I first observed the issue was using QUORUM.
>
> > cfstats is a local command, so maybe the node you've printed is working
> > fine but another one is causing the latency. Can you run that
> > command on all nodes?
>
> I checked and on all nodes, the read latency and read local latency are
> within 15 to 40ms.
>
> I also noticed that C* was taking a fair bit of CPU on some of the nodes
> (ranging from 60% to 200%). Looking at ttop output, it was mostly taken by
> SharedPool-Worker threads, which I assume are the threads doing the
> real query work.
>
> Well, I'm puzzled, and I'll keep searching, thanks for your help!
> --
> Brice Figureau
>


Re: Read query taking a long time

2015-10-21 Thread Brice Figureau
Hi,

On 20/10/2015 19:48, Carlos Alonso wrote:
> I think also having the output of cfhistograms could help. I'd like to
> know how many sstables are being hit during reads.

Oops, my bad, I thought you mentioned proxyhistograms.
Here's the worst node cfhistograms:

nodetool cfhistograms akka messages
akka/messages histograms
Percentile  SSTables   Write Latency   Read Latency   Partition Size   Cell Count
                            (micros)       (micros)          (bytes)
50%             1.00           60.00         310.00              535            1
75%             2.00          103.00        9887.00            11864           17
95%             3.00        14237.00      126934.00            73457          215
98%             3.00        51012.00      219342.00           263210          446
99%             3.00        61214.00     1131752.00           454826          924
Min             0.00            6.00          18.00               73            0
Max             6.00       263210.00     5839588.00         10090808         1109


> Also, which CL are you reading with?

For the test request in cqlsh, I'm using ONE, but the application in which I 
first observed the issue was using QUORUM.

> cfstats is a local command, so maybe that node you've printed is working
> fine but there's another that is causing the latency. Can you check that
> command in all nodes?

I checked and on all nodes, the read latency and read local latency are within 
15 to 40ms.

I also noticed that C* was taking a fair bit of CPU on some of the nodes
(ranging from 60% to 200%). Looking at ttop output, it was mostly taken by
SharedPool-Worker threads, which I assume are the threads doing the
real query work.
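
For reference, I got the per-thread CPU numbers with something like the
following (ttop here is the one from the Swiss Java Knife; the exact jar
name and PID lookup may differ on your box):

% java -jar sjk.jar ttop -p $(pgrep -f CassandraDaemon) -o CPU -n 20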

Well, I'm puzzled, and I'll keep searching, thanks for your help!
-- 
Brice Figureau


Re: Read query taking a long time

2015-10-20 Thread Carlos Alonso
I think also having the output of cfhistograms could help. I'd like to know
how many sstables are being hit during reads.

Also, which CL are you reading with?

cfstats is a local command, so maybe the node you've printed is working
fine but another one is causing the latency. Can you run that
command on all nodes?
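
Something like this would do it in one go (a sketch; node1..node3 are
placeholder hostnames and I'm assuming ssh access):

% for h in node1 node2 node3; do echo "== $h =="; ssh $h nodetool cfstats akka.messages; done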

Regards

Carlos Alonso | Software Engineer | @calonso 

On 20 October 2015 at 13:59, Brice Figureau <
brice+cassan...@daysofwonder.com> wrote:

> Hi,
>
> Thanks for your answer. Unfortunately since I wrote my e-mail, things
> are a bit better.
>
> This might be because I moved from openjdk 7 to oracle jdk 8 after
> having seen a warning in the C* log about openjdk, and I also added a
> node (for other reasons).
>
> Now the query itself takes only 1.5s~2s instead of the 5s~6s it was
> taking before.
>
> On Mon, 2015-10-19 at 14:38 +0100, Carlos Alonso wrote:
> > Could you send cfhistograms and cfstats relevant to the read column
> > family?
>
> Here is the requested information:
> % nodetool proxyhistograms
> proxy histograms
> Percentile   Read Latency   Write Latency   Range Latency
>                  (micros)        (micros)        (micros)
> 50%               1109.00          372.00         1916.00
> 75%              14237.00          535.00         2759.00
> 95%             105778.00          642.00         4768.00
> 98%             545791.00          770.00        11864.00
> 99%             785939.00          924.00        14237.00
> Min                 73.00            0.00          373.00
> Max            5839588.00        88148.00        73457.00
>
> % nodetool cfstats akka.messages
> Keyspace: akka
> Read Count: 3334784
> Read Latency: 9.98472696792356 ms.
> Write Count: 7124
> Write Latency: 0.572256457046603 ms.
> Pending Flushes: 0
> Table: messages
> SSTable count: 1
> Space used (live): 4680841
> Space used (total): 4680841
> Space used by snapshots (total): 23615746
> Off heap memory used (total): 4051
> SSTable Compression Ratio: 0.17318784636027024
> Number of keys (estimate): 478
> Memtable cell count: 317
> Memtable data size: 42293
> Memtable off heap memory used: 0
> Memtable switch count: 10
> Local read count: 3334784
> Local read latency: 9.985 ms
> Local write count: 7124
> Local write latency: 0.573 ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 592
> Bloom filter off heap memory used: 584
> Index summary off heap memory used: 203
> Compression metadata off heap memory used: 3264
> Compacted partition minimum bytes: 73
> Compacted partition maximum bytes: 17436917
> Compacted partition mean bytes: 63810
> Average live cells per slice (last five minutes): 0.6693421039216356
> Maximum live cells per slice (last five minutes): 1033.0
> Average tombstones per slice (last five minutes): 0.0
> Maximum tombstones per slice (last five minutes): 0.0
>
> 
>
> If I read this correctly, there's a huge read latency while proxying, but
> the local read latency on this table is fine, on this node and all the
> others.
>
> Would that mean this is a network issue?
> --
> Brice Figureau 
>
>


Re: Read query taking a long time

2015-10-20 Thread Brice Figureau
Hi,

Thanks for your answer. Unfortunately since I wrote my e-mail, things
are a bit better.

This might be because I moved from openjdk 7 to oracle jdk 8 after
having seen a warning in the C* log about openjdk, and I also added a
node (for other reasons).

Now the query itself takes only 1.5s~2s instead of the 5s~6s it was
taking before.

On Mon, 2015-10-19 at 14:38 +0100, Carlos Alonso wrote:
> Could you send cfhistograms and cfstats relevant to the read column
> family?

Here is the requested information:
% nodetool proxyhistograms
proxy histograms
Percentile   Read Latency   Write Latency   Range Latency
                 (micros)        (micros)        (micros)
50%               1109.00          372.00         1916.00
75%              14237.00          535.00         2759.00
95%             105778.00          642.00         4768.00
98%             545791.00          770.00        11864.00
99%             785939.00          924.00        14237.00
Min                 73.00            0.00          373.00
Max            5839588.00        88148.00        73457.00

% nodetool cfstats akka.messages
Keyspace: akka
Read Count: 3334784
Read Latency: 9.98472696792356 ms.
Write Count: 7124
Write Latency: 0.572256457046603 ms.
Pending Flushes: 0
Table: messages
SSTable count: 1
Space used (live): 4680841
Space used (total): 4680841
Space used by snapshots (total): 23615746
Off heap memory used (total): 4051
SSTable Compression Ratio: 0.17318784636027024
Number of keys (estimate): 478
Memtable cell count: 317
Memtable data size: 42293
Memtable off heap memory used: 0
Memtable switch count: 10
Local read count: 3334784
Local read latency: 9.985 ms
Local write count: 7124
Local write latency: 0.573 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 592
Bloom filter off heap memory used: 584
Index summary off heap memory used: 203
Compression metadata off heap memory used: 3264
Compacted partition minimum bytes: 73
Compacted partition maximum bytes: 17436917
Compacted partition mean bytes: 63810
Average live cells per slice (last five minutes): 0.6693421039216356
Maximum live cells per slice (last five minutes): 1033.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0



If I read this correctly, there's a huge read latency while proxying, but
the local read latency on this table is fine, on this node and all the
others.

Would that mean this is a network issue?
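
One thing I plan to try, to rule the network in or out (a sketch; node2 and
node3 stand in for the other nodes' hostnames): compare the raw round-trip
times between nodes with the proxy latencies above:

% for h in node2 node3; do ping -c 5 -q $h; done

If RTTs are sub-millisecond while proxyhistograms shows hundreds of
milliseconds at the 95% and above, the time must be spent somewhere other
than the wire.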
-- 
Brice Figureau 



Re: Read query taking a long time

2015-10-19 Thread Carlos Alonso
Could you send cfhistograms and cfstats relevant to the read column family?

That could help

Carlos Alonso | Software Engineer | @calonso 

On 17 October 2015 at 16:15, Brice Figureau <
brice+cassan...@daysofwonder.com> wrote:

> Hi,
>
> I've read all I could find on how Cassandra works, but I'm still wondering
> why the following query takes more than 5s to return on a simple (and
> modest) 3-node Cassandra 2.1.9 cluster:
>
> SELECT sequence_nr, used
> FROM messages
> WHERE persistence_id = 'session-SW' AND partition_nr = 0;
>
> The schema of this table:
> 
> CREATE TABLE akka.messages (
> persistence_id text,
> partition_nr bigint,
> sequence_nr bigint,
> message blob,
> used boolean static,
> PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
> ) WITH CLUSTERING ORDER BY (sequence_nr ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
>
>
> This is a query from the Akka Cassandra journal
> (https://github.com/krasserm/akka-persistence-cassandra), which returns 33
> rows for the moment.
>
> The time doesn't vary whether I use QUORUM or ONE consistency.
> Note that the same query with a different `persistence_id` is much
> faster (it might depend on the number of rows returned).
>
> Here's the (redacted) trace taken with cqlsh:
>
> activity | timestamp | source | source_elapsed
> ------------------------------------------------------------------------------------------+----------------------------+----------------+---------------
> Execute CQL3 query | 2015-10-17 17:01:05.681000 | 192.168.168.26 | 0
> Parsing .. [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 84
> READ message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:05.683000 | 192.168.168.29 | 69
> Preparing statement [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 215
> reading data from /192.168.168.29 [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1189
> Sending READ message to /192.168.168.29 [MessagingService-Outgoing-/192.168.168.29] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1317
> Executing single-partition query on messages [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 175
> Acquiring sstable references [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 189
> Merging memtable tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 204
> Key cache hit for sstable 434 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 257
> Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 269
> Key cache hit for sstable 432 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 514
> Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 527
> Key cache hit for sstable 433 [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 779
> Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 789
> Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 929
> Merging data from memtables and 3 sstables [SharedPool-Worker-1] | 2015-10-17 17:01:05.687000 | 192.168.168.29 | 956
> speculating read

Re: Read query taking a long time

2015-10-19 Thread Jon Haddad
I wrote a blog post a while back that you may find helpful for diagnosing
problems in production. There are a lot of things that could potentially be
wrong with your cluster, and going back and forth on the ML to pin down the
right one will take forever.

http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/


Once you've figured out what's wrong (and fixed it), you should read Al's 
tuning guide:

https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html 


Jon


> On Oct 19, 2015, at 7:38 AM, Carlos Alonso  wrote:
> 
> Could you send cfhistograms and cfstats relevant to the read column family?
> 
> That could help
> 
> Carlos Alonso | Software Engineer | @calonso 
> 
> On 17 October 2015 at 16:15, Brice Figureau wrote:
> Hi,
> 
> I've read all I could find on how Cassandra works, but I'm still wondering
> why the following query takes more than 5s to return on a simple (and
> modest) 3-node Cassandra 2.1.9 cluster:
> 
> SELECT sequence_nr, used
> FROM messages
> WHERE persistence_id = 'session-SW' AND partition_nr = 0;
> 
> The schema of this table:
> 
> CREATE TABLE akka.messages (
> persistence_id text,
> partition_nr bigint,
> sequence_nr bigint,
> message blob,
> used boolean static,
> PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
> ) WITH CLUSTERING ORDER BY (sequence_nr ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
> 
> 
> This is a query from the Akka Cassandra journal
> (https://github.com/krasserm/akka-persistence-cassandra), which returns 33
> rows for the moment.
> 
> The time doesn't vary whether I use QUORUM or ONE consistency.
> Note that the same query with a different `persistence_id` is much
> faster (it might depend on the number of rows returned).
> 
> Here's the (redacted) trace taken with cqlsh:
> 
> activity | timestamp | source | source_elapsed
> ------------------------------------------------------------------------------------------+----------------------------+----------------+---------------
> Execute CQL3 query | 2015-10-17 17:01:05.681000 | 192.168.168.26 | 0
> Parsing .. [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 84
> READ message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:05.683000 | 192.168.168.29 | 69
> Preparing statement [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 215
> reading data from /192.168.168.29 [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1189
> Sending READ message to /192.168.168.29 [MessagingService-Outgoing-/192.168.168.29] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1317
> Executing single-partition query on messages [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 175
> Acquiring sstable references [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 189
> Merging memtable tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 204
> Key cache hit for sstable 434 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 257
> Seeking to