[ https://issues.apache.org/jira/browse/CASSANDRA-19049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Ayers updated CASSANDRA-19049:
-----------------------------------
Attachment: (was: pdx3a-ra1-15a.log)
> Speculative read retries and multiple replica responses driving up latencies
> on CL ONE queries with RF 5 keyspace in C* 4.0.7
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19049
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19049
> Project: Cassandra
> Issue Type: Bug
> Reporter: Paul Ayers
> Priority: Urgent
> Attachments: iad8a.log, pdx3a-ra.log
>
>
> A Cassandra 4.0.7 cluster is experiencing very high CPU utilization and
> extremely high latencies when certain partitions become hot.
> This is occurring on a keyspace with a replication factor (RF) of 5, queried
> at consistency level (CL) ONE. Each node has ~10 data drives and data is
> distributed round-robin among them, which is why some traces show multiple
> SSTables being read.
> All queries are single-partition queries.
> We almost certainly haven't identified every affected partition, but for the
> couple we did find, reads often seem to hit at least 3 of the 5 replicas and
> trigger many speculative retries, even though the CL is ONE. We ran count
> queries against a couple of the partitions known to cause issues, solely to
> capture trace output; the traces are attached to this issue. When any of
> these partitions becomes hot, it pegs the CPU, drives up latencies, and causes
> many timeouts.
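>
> A minimal cqlsh sketch of how such a trace can be captured for one partition.
> The key values below are hypothetical placeholders, since the actual affected
> keys are not given here. Because the partition key is (metric_name, tag_names,
> shard_id), all three columns must be restricted for the read to remain a
> single-partition query.
> {code:sql}
> -- Match the production read path: consistency level ONE.
> CONSISTENCY ONE;
> -- Record a trace session for each subsequent request.
> TRACING ON;
> -- Hypothetical key values; substitute a known hot partition.
> SELECT count(*) FROM v2metadata.tag_values_fresh
>  WHERE metric_name = 'example_metric'
>    AND tag_names = 'example_tags'
>    AND shard_id = 0;
> {code}
> In the printed trace, the relevant events are the ones showing which replicas
> the coordinator contacted and whether a speculative retry was issued.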
> I suspect this may be a bug specific to RF 5 keyspaces, since we would
> probably have seen it already with our RF 3 keyspaces, but I have not yet
> tested whether changing the RF to 3 resolves the issue.
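>
> A sketch of the RF 3 comparison, best tried on a test cluster first. It
> assumes NetworkTopologyStrategy and a hypothetical datacenter name 'dc1';
> the keyspace's real replication strategy and datacenter names are not shown
> in this issue.
> {code:sql}
> -- Hypothetical: 'dc1' stands in for the real datacenter name(s).
> ALTER KEYSPACE v2metadata
>   WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
> {code}
> After lowering RF, each node keeps the replicas it no longer owns on disk
> until nodetool cleanup is run on that node.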
> The schema for the table with the problematic partitions:
> {code:sql}
> CREATE TABLE v2metadata.tag_values_fresh (
>     metric_name ascii,
>     tag_names ascii,
>     shard_id tinyint,
>     v2namespace ascii,
>     tag_values ascii,
>     metric_id blob,
>     timestamp_mins_last varint,
>     PRIMARY KEY ((metric_name, tag_names, shard_id), v2namespace, tag_values)
> ) WITH CLUSTERING ORDER BY (v2namespace ASC, tag_values ASC)
>     AND additional_write_policy = '99p'
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND cdc = false
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4', 'unchecked_tombstone_compaction': 'true'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND default_time_to_live = 864000
>     AND extensions = {}
>     AND gc_grace_seconds = 10800
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair = 'BLOCKING'
>     AND speculative_retry = '99p';
> {code}
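>
> One possible isolation experiment, offered as a hypothetical rather than
> something already tried here: because speculative_retry is a per-table option,
> it could be disabled temporarily on this table to check whether the extra
> replica reads disappear, then restored to the original '99p'.
> {code:sql}
> -- Step 1: temporarily stop speculative read retries on this table only.
> ALTER TABLE v2metadata.tag_values_fresh WITH speculative_retry = 'NONE';
> -- Step 2 (after re-running the traced reads): restore the original setting.
> ALTER TABLE v2metadata.tag_values_fresh WITH speculative_retry = '99p';
> {code}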
--
This message was sent by Atlassian Jira
(v8.20.10#820010)