[ https://issues.apache.org/jira/browse/CASSANDRA-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189993#comment-15189993 ]
Paulo Motta commented on CASSANDRA-9830:
----------------------------------------

The tests completed, but I don't quite understand the results.

The basic scenario is:
# stress write 10M entries
# wait for compactions OR major compaction
# stress read 10M entries
# stress write 25M entries
# wait for compactions OR major compaction
# stress read 25M entries
# check the total bloom filter size in bytes

The two variations are:
# Organic compactions (wait for compactions to settle)
# Major compactions (run a major compaction after each write)

The tested versions are:
# Trunk
# {{disable_top_level_bloom_filter}} option enabled (patched)

Expected result: the patched version has a smaller bloom filter size.

Below are the results (total bloom filter size per node, in bytes; savings = (trunk - patched) / trunk):

*4 runs with organic compactions:*

||[organic1|http://cstar.datastax.com/tests/id/1009bf9c-e1b7-11e5-bfef-0256e416528f]||trunk||patched||savings||
|node1|99486248|55634520|44.08%|
|node2|62664008|51228104|18.25%|
|node3|58143832|51227632|11.89%|

||[organic2|http://cstar.datastax.com/tests/id/6d5e679a-e0d8-11e5-b34c-0256e416528f]||trunk||patched||savings||
|node1|65969552|91197120|-38.24%|
|node2|57046664|50175360|12.05%|
|node3|57097360|58985440|-3.31%|

||[organic3|http://cstar.datastax.com/tests/id/f4696ce8-e0ac-11e5-b34c-0256e416528f]||trunk||patched||savings||
|node1|65807576|58977040|10.38%|
|node2|57115416|50175360|12.15%|
|node3|57081304|58985440|-3.34%|

||[organic4|http://cstar.datastax.com/tests/id/5cda1ba6-dc2a-11e5-bf87-0256e416528f]||trunk||patched||savings||
|node1|65958496|58749680|10.93%|
|node2|57081792|50192640|12.07%|
|node3|65958352|58977040|10.58%|

*4 runs with major compactions:*

||[major1|http://cstar.datastax.com/tests/id/fc1c6edc-e259-11e5-995b-0256e416528f]||trunk||patched||savings||
|node1|8026368|3818000|52.43%|
|node2|8017800|3822080|52.33%|
|node3|8017800|3818000|52.38%|

||[major2|http://cstar.datastax.com/tests/id/dd0ac73c-d9c1-11e5-98e3-0256e416528f]||trunk||patched||savings||
|node1|8026368|3818000|52.43%|
|node2|8026368|3822080|52.38%|
|node3|8026368|3822080|52.38%|

||[major3|http://cstar.datastax.com/tests/id/05b6c474-d698-11e5-98bc-0256e416528f]||trunk||patched||savings||
|node1|8017800|3818000|52.38%|
|node2|8026368|3818000|52.43%|
|node3|8017800|3822080|52.33%|

||[major4|http://cstar.datastax.com/tests/id/f3f85d6c-e257-11e5-995b-0256e416528f]*||trunk||patched||savings||
|node1|8017800|3822080|52.33%|
|node2|8026368|3822080|52.38%|
|node3|8017800|3822080|52.33%|

*\* Reduced bloom_filter_fp_chance from 0.1 to 0.01*

The major compaction results are very consistent. The organic compaction results look a bit strange because of their variability, and in some cases (organic2 and organic3) the bloom filter is even larger in the patched version. I investigated and verified that in all organic compaction cases the bloom filter is in fact skipped on the last level (L2). The variability comes from the L1 bloom filter size, which can range from 50MB to 100MB in both the unpatched and patched versions.

The main questions are:
#1 What causes the variability in L1 sizes in the organic compaction scenarios? Could it be the stress profile/distribution, or a bug in bloom filter sizing?
#2 In scenario [major4|http://cstar.datastax.com/tests/id/f3f85d6c-e257-11e5-995b-0256e416528f], I decreased the bloom filter FP chance from 0.1 to 0.01, but the bloom filter size remained the same. Is this expected?

Any ideas, [~carlyeks], [~krummas]?

> Option to disable bloom filter in highest level of LCS sstables
> ---------------------------------------------------------------
>
> Key: CASSANDRA-9830
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9830
> Project: Cassandra
> Issue Type: New Feature
> Components: Compaction
> Reporter: Jonathan Ellis
> Assignee: Paulo Motta
> Priority: Minor
> Labels: performance
> Fix For: 3.x
>
>
> We expect about 90% of data to be in the highest level of LCS in a fully
> populated series. (See also CASSANDRA-9829.)
> Thus if the user is primarily asking for data (partitions) that has actually
> been inserted, the bloom filter on the highest level only helps reject
> sstables about 10% of the time.
> We should add an option that suppresses bloom filter creation on top-level
> sstables. This will dramatically reduce memory usage for LCS and may even
> improve performance as we no longer check a low-value filter.
> (This is also an idea from RocksDB.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
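The savings column in the result tables can be recomputed from the byte counts as (trunk - patched) / trunk, which makes a quick consistency check. Below is a minimal Python sketch of that arithmetic. `savings_pct` is a hypothetical helper, and the inputs are the major4 byte counts from the table.

```python
def savings_pct(trunk_bytes: int, patched_bytes: int) -> float:
    """Relative bloom filter size reduction of the patched build vs trunk, in percent.

    A negative value means the patched build used more bloom filter memory.
    """
    if trunk_bytes <= 0:
        raise ValueError("trunk_bytes must be positive")
    return 100.0 * (trunk_bytes - patched_bytes) / trunk_bytes


# Per-node (trunk, patched) total bloom filter sizes in bytes, copied from the major4 table.
major4 = {
    "node1": (8017800, 3822080),
    "node2": (8026368, 3822080),
    "node3": (8017800, 3822080),
}

for node, (trunk, patched) in major4.items():
    print(f"{node}: {savings_pct(trunk, patched):.2f}%")
```

The same helper applies to any row. For example, `savings_pct(65969552, 91197120)` (organic2, node1) comes out negative, matching the increase reported there.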
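Question #2 can be framed with the textbook Bloom filter sizing formula. For a target false-positive probability p, the optimal size is -ln(p) / (ln 2)^2 bits per key. Under that model, lowering p from 0.1 to 0.01 should roughly double the filter size for the same number of keys. This is an assumption for illustration only: Cassandra's own sizing logic may not follow this closed form exactly. The helper `optimal_bits_per_key` is hypothetical.

```python
import math


def optimal_bits_per_key(fp_chance: float) -> float:
    """Textbook optimal Bloom filter size in bits per key for a target
    false-positive probability p: m/n = -ln(p) / (ln 2)^2."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


for p in (0.1, 0.01):
    print(f"bloom_filter_fp_chance={p}: {optimal_bits_per_key(p):.2f} bits per key")

# With the key count held fixed, total filter size scales with bits per key,
# so this ratio is the expected size change between the two settings.
ratio = optimal_bits_per_key(0.01) / optimal_bits_per_key(0.1)
print(f"expected size ratio (0.01 vs 0.1): {ratio:.2f}x")
```

Under this model, a filter whose size does not change between the two settings would be surprising. That supports checking whether the new fp chance actually took effect on the sstables in that run.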
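The "about 90% of data in the highest level" figure from the issue description follows from how LCS levels grow. Below is a minimal sketch that assumes the usual 10x size ratio between consecutive levels, fully populated levels L1..Lk, L0 ignored, and reads spread uniformly over existing data. The function name `top_level_fraction` is hypothetical.

```python
def top_level_fraction(num_levels: int, fanout: int = 10) -> float:
    """Fraction of all data in the highest level, assuming levels
    L1..L{num_levels} are fully populated and level i holds fanout**i units
    of data (L0 is ignored)."""
    if num_levels < 1:
        raise ValueError("num_levels must be at least 1")
    sizes = [fanout ** i for i in range(1, num_levels + 1)]
    return sizes[-1] / sum(sizes)


for levels in (2, 3, 4):
    top = top_level_fraction(levels)
    # If reads are uniform over existing data, a read finds its key in the top
    # level with probability `top`; only the remaining reads can have the
    # top-level sstable rejected by its bloom filter.
    print(f"L1..L{levels}: top level holds {top:.1%} of the data; "
          f"top-level filter can help on ~{1 - top:.1%} of reads")
```

The top-level share stays close to 90% regardless of depth. That is why the top-level filter only helps on roughly 10% of reads for keys that exist.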