[ https://issues.apache.org/jira/browse/CASSANDRA-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joshua McKenzie updated CASSANDRA-12071:
----------------------------------------
    Assignee: Marcus Eriksson

> Regression in flushing throughput under load after CASSANDRA-6696
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-12071
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12071
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Ariel Weisberg
>            Assignee: Marcus Eriksson
>
> The way flushing used to work is that a ColumnFamilyStore could have multiple
> Memtables flushing at once and multiple ColumnFamilyStores could flush at the
> same time. The way it works now there can be only a single flush of any
> ColumnFamilyStore & Memtable running in the C* process, and the number of
> threads applied to that flush is bounded by the number of disks in JBOD.
> This works ok most of the time but occasionally flushing will be a little
> slower and ingest will outstrip it and then block on available memory. At
> this point you see several second stalls that cause timeouts.
> This is a problem for reasonable configurations that don't use JBOD but have
> access to a fast disk that can handle some IO queuing (RAID, SSD).
> You can reproduce on beefy hardware (12 cores 24 threads, 64 gigs of RAM,
> SSD) if you unthrottle compaction or set it to something like 64
> megabytes/second and run with 8 compaction threads and stress with the
> default write workload and a reasonable number of threads. I tested with 96.
> It started happening after about 60 gigabytes of data was loaded.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
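
The stall described in the report can be illustrated with a rough back-of-the-envelope model. This is only a sketch and is not Cassandra code: the class name, method names, and all numbers are hypothetical. It assumes flush parallelism is simply capped by the number of data directories and that ingest and flush rates are constant, which real workloads will not match exactly.

```java
/**
 * Hypothetical model of the back-pressure described in the report.
 * Not part of Cassandra; all names and rates are illustrative assumptions.
 */
public final class FlushBackpressureModel {
    private FlushBackpressureModel() {}

    /**
     * Threads that can work on a flush, assuming the bound is
     * min(configured flush threads, number of data directories).
     */
    public static int effectiveFlushThreads(int configuredThreads, int dataDirectories) {
        if (configuredThreads < 1 || dataDirectories < 1) {
            throw new IllegalArgumentException("thread and directory counts must be >= 1");
        }
        return Math.min(configuredThreads, dataDirectories);
    }

    /**
     * Seconds until a memtable memory pool of poolMb fills and writers block,
     * given constant ingest and flush rates in MB/s. Returns positive
     * infinity when flushing keeps up with ingest.
     */
    public static double secondsUntilStall(double poolMb, double ingestMbPerSec, double flushMbPerSec) {
        if (poolMb < 0 || ingestMbPerSec < 0 || flushMbPerSec < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        double netFillRate = ingestMbPerSec - flushMbPerSec;
        if (netFillRate <= 0) {
            return Double.POSITIVE_INFINITY; // flush keeps pace, no stall
        }
        return poolMb / netFillRate;
    }

    public static void main(String[] args) {
        // One data directory (non-JBOD) caps flush parallelism at 1 in this model,
        // even if more flush threads are configured.
        System.out.println("effective flush threads: " + effectiveFlushThreads(8, 1));
        // Made-up rates: ingest slightly outpaces a temporarily slow flush.
        System.out.println("seconds until writers block: " + secondsUntilStall(2048, 300, 236));
    }
}
```

In this model a non-JBOD node with one fast disk gets a single flush thread regardless of how much IO queuing the disk could absorb, so even a small gap between ingest and flush rate eventually exhausts the pool.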