Paul Ayers created CASSANDRA-19049:
--------------------------------------

             Summary: Speculative read retries and 3 replica responses driving up latencies on CL ONE queries with RF 5 keyspace in C* 4.0.7
                 Key: CASSANDRA-19049
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19049
             Project: Cassandra
          Issue Type: Bug
            Reporter: Paul Ayers
         Attachments: iad8a-ra20-26a.log, pdx3a-ra1-15a.log, tracepdx3a-ra1-15a.log

A Cassandra 4.0.7 cluster is experiencing very high CPU utilization and extremely high latencies when certain partitions become hot. This is occurring on a keyspace with a Replication Factor of 5, queried at Consistency Level ONE. All queries are single-partition queries. Each node has ~10 data drives and the data is distributed round-robin among them, which is why some traces show multiple SSTables being read.

I'm sure we haven't identified every partition this occurs for, but for the couple that we found, it seems we're hitting at least 3 of the 5 replicas in many cases and doing a lot of speculative retry, even though the CL is ONE. We ran some count queries to capture trace output for a couple of the partitions known to cause issues; the traces are attached to this Jira. When any of these partitions becomes hot, it pegs the CPU, drives up latencies, and causes a lot of timeouts.

I suspect this is a bug related to the RF 5 keyspace, since we would probably have seen it already with RF 3 keyspaces, but I have not yet tested whether changing the RF to 3 resolves the issue.

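One way to check whether speculative retry is what fans a CL ONE read out to several replicas would be to disable it on the affected table and re-run a traced read of a known hot partition. The following is a minimal cqlsh sketch of that diagnostic, not a confirmed fix. It assumes the table named in this report; the partition key values are hypothetical placeholders, and 'NEVER' is assumed to be accepted on 4.0.x (earlier releases spell it 'NONE').

{code:sql}
-- Diagnostic sketch only: temporarily disable speculative retry on the table.
ALTER TABLE v2metadata.tag_values_fresh WITH speculative_retry = 'NEVER';

-- cqlsh session settings: read at CL ONE and capture a trace.
CONSISTENCY ONE;
TRACING ON;

-- Single-partition count query; the key values below are placeholders,
-- not real partitions from the affected cluster.
SELECT count(*)
  FROM v2metadata.tag_values_fresh
 WHERE metric_name = '<metric_name>'
   AND tag_names = '<tag_names>'
   AND shard_id = 0;

-- Restore the original setting once the trace has been captured.
ALTER TABLE v2metadata.tag_values_fresh WITH speculative_retry = '99p';
{code}

Comparing the number of replicas contacted in this trace against the attached traces would indicate whether the extra replica reads come from speculative retry or from something else.
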
The schema for the table with the problematic partitions:
{code:java}
CREATE TABLE v2metadata.tag_values_fresh (
    metric_name ascii,
    tag_names ascii,
    shard_id tinyint,
    v2namespace ascii,
    tag_values ascii,
    metric_id blob,
    timestamp_mins_last varint,
    PRIMARY KEY ((metric_name, tag_names, shard_id), v2namespace, tag_values)
) WITH CLUSTERING ORDER BY (v2namespace ASC, tag_values ASC)
    AND additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4', 'unchecked_tombstone_compaction': 'true'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 864000
    AND extensions = {}
    AND gc_grace_seconds = 10800
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';
{code}