Paul Ayers created CASSANDRA-19049:
--------------------------------------

             Summary: Speculative read retries and 3 replica responses driving 
up latencies on CL ONE queries with RF 5 keyspace in C* 4.0.7
                 Key: CASSANDRA-19049
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19049
             Project: Cassandra
          Issue Type: Bug
            Reporter: Paul Ayers
         Attachments: iad8a-ra20-26a.log, pdx3a-ra1-15a.log, 
tracepdx3a-ra1-15a.log

A Cassandra 4.0.7 cluster is experiencing very high CPU utilization and 
extremely high latencies when certain partitions become hot.

This occurs on a keyspace with a Replication Factor (RF) of 5, queried at 
Consistency Level (CL) ONE.  All queries are single-partition queries.  Each 
node has ~10 data drives and the data is distributed round-robin among them, 
which is why some traces show multiple sstables being read.

We have not identified every partition affected, but for the couple we have 
found, reads hit at least 3 of the 5 replicas in many cases and trigger a lot 
of speculative retries, even though the CL is ONE.  We ran count queries 
against a couple of the partitions known to cause issues in order to capture 
trace output; the traces are attached to this Jira.  When any of these 
partitions becomes hot, it pegs the CPU, drives up latencies, and causes a lot 
of timeouts.
I assume this could be a bug related to the RF 5 keyspace, since we would 
probably have seen it already with RF 3 keyspaces, but I have yet to test 
whether changing the RF to 3 resolves the issue.
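
For reference, a trace like the attached ones can be captured from cqlsh 
roughly as sketched below.  The partition key values are hypothetical 
placeholders, not the real hot partitions, and <session_id> stands for the 
tracing session id that cqlsh prints after the query.  The last statement 
lists which nodes took part in the read, which is how the number of replicas 
contacted can be checked.

{code:sql}
-- Read at the same consistency level the application uses.
CONSISTENCY ONE;
-- Ask cqlsh to record and print a trace for each statement.
TRACING ON;

-- Count query against one suspect partition.
-- The key values below are placeholders (assumptions), not real data.
SELECT count(*)
  FROM v2metadata.tag_values_fresh
 WHERE metric_name = 'example.metric'
   AND tag_names = 'example_tag_names'
   AND shard_id = 0;

TRACING OFF;

-- List the trace events with the node that emitted each one.
-- Replace <session_id> with the id cqlsh reported for the traced query.
SELECT source, source_elapsed, activity
  FROM system_traces.events
 WHERE session_id = <session_id>;
{code}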

The schema for the table with the problematic partitions:

{code:java}
CREATE TABLE v2metadata.tag_values_fresh (
  metric_name ascii,
  tag_names ascii,
  shard_id tinyint,
  v2namespace ascii,
  tag_values ascii,
  metric_id blob,
  timestamp_mins_last varint,
  PRIMARY KEY ((metric_name, tag_names, shard_id), v2namespace, tag_values)
) WITH CLUSTERING ORDER BY (v2namespace ASC, tag_values ASC)
  AND additional_write_policy = '99p'
  AND bloom_filter_fp_chance = 0.01
  AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
  AND cdc = false
  AND comment = ''
  AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
    'max_threshold': '32', 'min_threshold': '4',
    'unchecked_tombstone_compaction': 'true'}
  AND compression = {'chunk_length_in_kb': '64',
    'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
  AND crc_check_chance = 1.0
  AND default_time_to_live = 864000
  AND extensions = {}
  AND gc_grace_seconds = 10800
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND read_repair = 'BLOCKING'
  AND speculative_retry = '99p';
{code}



