[ https://issues.apache.org/jira/browse/CASSANDRA-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-18123:
-----------------------------------------
     Bug Category: Parent values: Degradation(12984)Level 1 values: Performance Bug/Regression(12997)
       Complexity: Normal
    Discovered By: User Report
    Fix Version/s: 3.0.x
                   3.11.x
                   4.0.x
                   4.1.x
                   4.x
         Severity: Normal
           Status: Open  (was: Triage Needed)

> Reuse of metadata collector can break key count calculation
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-18123
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18123
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction
>            Reporter: Branimir Lambov
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 4.x
>
>
> When flushing a memtable we currently pass a constructed
> {{MetadataCollector}} to the {{SSTableMultiWriter}} that is used for writing
> sstables. The latter may decide to split the data into multiple sstables
> (e.g. for separate disks or driven by compaction strategy). If it does so,
> the cardinality estimation component in the reused {{MetadataCollector}} for
> each individual sstable contains the data for all of them.
> As a result, when such sstables are compacted, the estimated number of keys
> in the resulting sstables, which is used to determine the size of the bloom
> filter for the compaction result, is far too high.
> This results in much bigger L1 bloom filters than they should be.
> One example (which came about during testing of the upcoming CEP-26, after
> insertion of 100GB of data with 10% reads):
> (current)
> {code}
>   Bloom filter false positives: 22627369
>   Bloom filter false ratio: 0.02257
>   Bloom filter space used: 1848247864
>   Bloom filter off heap memory used: 2338964088
> {code}
> (fixed)
> {code}
>   Bloom filter false positives: 24426545
>   Bloom filter false ratio: 0.02429
>   Bloom filter space used: 1118910096
>   Bloom filter off heap memory used: 1532357432
> {code}
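
The effect described in the ticket can be illustrated with a toy model. The sketch below is not Cassandra code: `ToyCollector` and `flush` are hypothetical names, an exact Python set stands in for the probabilistic cardinality estimator, and keys are split across sstables by a simple modulo rather than by disk boundaries or compaction strategy. It only shows why sharing one collector across several output sstables makes each sstable report the key count of all of them.

```python
class ToyCollector:
    """Hypothetical stand-in for a metadata collector.

    An exact set replaces the probabilistic cardinality estimator.
    """

    def __init__(self):
        self.keys = set()

    def add_key(self, key):
        self.keys.add(key)

    def estimated_keys(self):
        return len(self.keys)


def flush(keys, num_splits, reuse_collector):
    """Write keys into num_splits toy sstables.

    Returns the key estimate recorded for each sstable.
    """
    shared = ToyCollector()
    # Either every sstable shares one collector (the reported behaviour)
    # or each sstable gets its own (the assumed shape of a fix).
    collectors = [shared if reuse_collector else ToyCollector()
                  for _ in range(num_splits)]
    for key in keys:
        # Assumption: a modulo split stands in for the multi-writer's choice.
        collectors[key % num_splits].add_key(key)
    return [c.estimated_keys() for c in collectors]


keys = range(1200)
reused = flush(keys, num_splits=3, reuse_collector=True)
fresh = flush(keys, num_splits=3, reuse_collector=False)
print("reused collector:", reused)  # reused collector: [1200, 1200, 1200]
print("fresh collectors:", fresh)   # fresh collectors: [400, 400, 400]
```

In this toy model every sstable written with the shared collector reports three times the keys it actually holds, so anything sized from that estimate, such as a bloom filter, would be oversized accordingly. For comparison, the figures quoted in the ticket show bloom filter space used dropping from 1848247864 to 1118910096, roughly 39% less, with the false positive ratio moving from 0.02257 to 0.02429.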