[ https://issues.apache.org/jira/browse/CASSANDRA-19049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Ayers updated CASSANDRA-19049:
-----------------------------------
    Summary: Speculative read retries and multiple replica responses driving up latencies on CL ONE queries with RF 5 keyspace in C* 4.0.7  (was: Speculative read retries and 3 replica responses driving up latencies on CL ONE queries with RF 5 keyspace in C* 4.0.7)

> Speculative read retries and multiple replica responses driving up latencies
> on CL ONE queries with RF 5 keyspace in C* 4.0.7
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19049
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19049
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Paul Ayers
>            Priority: Urgent
>         Attachments: iad8a-ra20-26a.log, pdx3a-ra1-15a.log,
> tracepdx3a-ra1-15a.log
>
>
> A Cassandra 4.0.7 cluster is experiencing very high cpu utilization and
> extremely high latencies when certain partitions become hot.
> This is occurring on a keyspace with a Replication Factor of 5 and a
> Consistency Level of ONE. There are ~10 data drives per node, which is why
> you'll see multiple sstables read in some traces because the data is
> distributed round-robin among the drives.
> All queries are single-partition queries.
> I'm sure we haven't identified every partition that this occurs for, but at
> least for the couple that we found, it seems we're hitting at least 3 of the
> 5 replicas in many cases and doing a lot of speculative retry, even though
> the CL is ONE. We've kicked off some count queries just to capture a trace
> output for a couple of the partitions that are known to cause issues,
> attached to the Jira. When any of these partitions become hot, it pegs the
> cpu, drives up latencies, and causes a lot of timeouts.
> I assume this could be a bug related to the RF 5 keyspace as we'd probably
> have seen this already with RF 3 keyspaces, but I have yet to test changing
> the RF to 3 to see if that resolves the issue.
> The schema for the table with the problematic partitions:
> {code:java}
> CREATE TABLE v2metadata.tag_values_fresh (
>     metric_name ascii,
>     tag_names ascii,
>     shard_id tinyint,
>     v2namespace ascii,
>     tag_values ascii,
>     metric_id blob,
>     timestamp_mins_last varint,
>     PRIMARY KEY ((metric_name, tag_names, shard_id), v2namespace, tag_values)
> ) WITH CLUSTERING ORDER BY (v2namespace ASC, tag_values ASC)
>     AND additional_write_policy = '99p'
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND cdc = false
>     AND comment = ''
>     AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4',
> 'unchecked_tombstone_compaction': 'true'}
>     AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND default_time_to_live = 864000
>     AND extensions = {}
>     AND gc_grace_seconds = 10800
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair = 'BLOCKING'
>     AND speculative_retry = '99p';
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
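
For readers unfamiliar with the speculative_retry = '99p' option in the schema above, the following is a minimal sketch of the idea behind a percentile-based threshold: a read that has not been answered within the 99th percentile of previously observed latencies triggers an extra request to another replica. This is a simplified model, not Cassandra's implementation. The function names (percentile, count_speculative_reads) and all latency values are hypothetical and chosen only for illustration; they are not taken from the attached logs.

```python
def percentile(samples, p):
    """Nearest-rank percentile (integer p in 1..100) of a non-empty list."""
    ordered = sorted(samples)
    # Ceiling of p * n / 100, computed with integer arithmetic only.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


def count_speculative_reads(history_ms, incoming_ms, p=99):
    """Return (threshold, extra_requests) for a batch of reads.

    history_ms:  previously observed read latencies (assumed input)
    incoming_ms: first-replica latencies of new reads (assumed input)
    A read counts as speculative when its latency exceeds the threshold.
    """
    threshold = percentile(history_ms, p)
    extra = sum(1 for latency in incoming_ms if latency > threshold)
    return threshold, extra


# Illustrative, made-up numbers: history spread evenly over 1..100 ms.
history = list(range(1, 101))
normal_reads = [5, 10, 20]           # all below the threshold
hot_reads = [120, 150, 99, 300]      # a hot partition slows the replica

normal_result = count_speculative_reads(history, normal_reads)
hot_result = count_speculative_reads(history, hot_reads)
print(normal_result, hot_result)
```

In this toy model, reads against a slow (hot) partition mostly exceed the threshold, so each one fans out to an additional replica. That would be consistent with more than one replica answering a CL ONE read, but whether it explains the behaviour reported here is not established by this sketch.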