[ https://issues.apache.org/jira/browse/CASSANDRA-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189993#comment-15189993 ]

Paulo Motta commented on CASSANDRA-9830:
----------------------------------------

The tests completed, but I don't fully understand the results. The basic scenario is:
# stress write 10M entries
# wait for compactions OR major compaction
# stress read 10M entries
# stress write 25M entries
# wait for compactions OR major compaction
# stress read 25M entries
# check the total bloom filter size in bytes
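For reference, the steps above can be written down as a dry-run command plan. This is a hypothetical sketch, not the harness behind the cstar runs: it assumes the default cassandra-stress schema ({{keyspace1.standard1}}), models "wait for compactions" as polling {{nodetool compactionstats}} until it reports no pending tasks, and assumes {{nodetool cfstats}} prints one "Bloom filter space used: <bytes>" line per table. The helper names are made up for illustration.

```python
import re
from typing import List

# Assumption: the default cassandra-stress schema (keyspace1.standard1).
KEYSPACE, TABLE = "keyspace1", "standard1"
# Assumption: `nodetool cfstats` prints one "Bloom filter space used: <bytes>"
# line per table (raw bytes, i.e. without the human-readable -H flag).
_BF_LINE = re.compile(r"Bloom filter space used:\s*(\d+)")


def settle_step(mode: str) -> List[str]:
    """Command that follows each write: a forced major compaction, or the
    status command to poll until compactions settle (organic mode)."""
    if mode == "major":
        return ["nodetool", "compact", KEYSPACE, TABLE]
    if mode == "organic":
        # Re-run until it reports "pending tasks: 0".
        return ["nodetool", "compactionstats"]
    raise ValueError(f"unknown mode: {mode!r}")


def build_run_plan(mode: str) -> List[List[str]]:
    """Dry-run command list for one run of the scenario (nothing is executed)."""
    plan: List[List[str]] = []
    for n in (10_000_000, 25_000_000):
        plan.append(["cassandra-stress", "write", f"n={n}"])
        plan.append(settle_step(mode))
        plan.append(["cassandra-stress", "read", f"n={n}"])
    plan.append(["nodetool", "cfstats", f"{KEYSPACE}.{TABLE}"])
    return plan


def bloom_filter_bytes(cfstats_output: str) -> int:
    """Sum every "Bloom filter space used" value found in a cfstats dump."""
    return sum(int(v) for v in _BF_LINE.findall(cfstats_output))


for cmd in build_run_plan("organic"):
    print(" ".join(cmd))
```

The final {{cfstats}} step is per node, so {{bloom_filter_bytes}} would be applied to each node's output separately to get one table cell per node.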

The two variations are:
# Organic compactions (wait for compactions to settle)
# Major compactions (run major compaction after write)

The tested versions are:
# Trunk
# {{disable_top_level_bloom_filter}} option enabled (patched)

Expected result: the patched version has a smaller total bloom filter size.

Below are the results (total bloom filter size per node in bytes; savings = 1 - patched/trunk):

*4 runs with organic compactions:*
||[organic1|http://cstar.datastax.com/tests/id/1009bf9c-e1b7-11e5-bfef-0256e416528f]||trunk||patched||savings||
|node1|99486248|55634520|44.08%|
|node2|62664008|51228104|18.25%|
|node3|58143832|51227632|11.89%|
||[organic2|http://cstar.datastax.com/tests/id/6d5e679a-e0d8-11e5-b34c-0256e416528f]||trunk||patched||savings||
|node1|65969552|91197120|-38.24%|
|node2|57046664|50175360|12.05%|
|node3|57097360|58985440|-3.31%|
||[organic3|http://cstar.datastax.com/tests/id/f4696ce8-e0ac-11e5-b34c-0256e416528f]||trunk||patched||savings||
|node1|65807576|58977040|10.38%|
|node2|57115416|50175360|12.15%|
|node3|57081304|58985440|-3.34%|
||[organic4|http://cstar.datastax.com/tests/id/5cda1ba6-dc2a-11e5-bf87-0256e416528f]||trunk||patched||savings||
|node1|65958496|58749680|10.93%|
|node2|57081792|50192640|12.07%|
|node3|65958352|58977040|10.58%|

*4 runs with major compactions:*
||[major1|http://cstar.datastax.com/tests/id/fc1c6edc-e259-11e5-995b-0256e416528f]||trunk||patched||savings||
|node1|8026368|3818000|52.43%|
|node2|8017800|3822080|52.33%|
|node3|8017800|3818000|52.38%|
||[major2|http://cstar.datastax.com/tests/id/dd0ac73c-d9c1-11e5-98e3-0256e416528f]||trunk||patched||savings||
|node1|8026368|3818000|52.43%|
|node2|8026368|3822080|52.38%|
|node3|8026368|3822080|52.38%|
||[major3|http://cstar.datastax.com/tests/id/05b6c474-d698-11e5-98bc-0256e416528f]||trunk||patched||savings||
|node1|8017800|3818000|52.38%|
|node2|8026368|3818000|52.43%|
|node3|8017800|3822080|52.33%|
||[major4|http://cstar.datastax.com/tests/id/f3f85d6c-e257-11e5-995b-0256e416528f]*||trunk||patched||savings||
|node1|8017800|3822080|52.33%|
|node2|8026368|3822080|52.38%|
|node3|8017800|3822080|52.33%|

*\* Reduced bloom_filter_fp_chance from 0.1 to 0.01*
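The savings figures in the tables match 1 - patched/trunk, expressed as a percentage. A small sketch to recompute them from the byte counts (the function name is illustrative only):

```python
def savings_pct(trunk_bytes: int, patched_bytes: int) -> float:
    """Percentage of bloom filter bytes saved by the patched build, rounded
    to two decimals (negative when the patched build uses more memory)."""
    if trunk_bytes <= 0:
        raise ValueError("trunk_bytes must be positive")
    return round((1 - patched_bytes / trunk_bytes) * 100, 2)


# (node, trunk bytes, patched bytes) taken from the major4 table.
MAJOR4 = [
    ("node1", 8017800, 3822080),
    ("node2", 8026368, 3822080),
    ("node3", 8017800, 3822080),
]

for node, trunk, patched in MAJOR4:
    print(f"{node}: {savings_pct(trunk, patched):.2f}%")
```

Recomputed from its trunk and patched columns, major4 comes out at 52.33%, 52.38% and 52.33%, in line with major1 to major3.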

While the major compaction results are very consistent, the organic compaction 
results look odd: they vary widely between runs, and on some nodes (organic2 and 
organic3) the patched version even has a larger bloom filter. I did some 
investigation and verified that in all organic compaction runs the bloom filter is 
in fact skipped on the last level (L2); what causes the variability is the L1 
bloom filter size, which ranges from 50MB to 100MB in both the unpatched and 
patched versions.

Main questions that arise:
#1 What causes the variability in L1 sizes in the organic compaction scenarios? 
Could it be the stress profile/distribution, or a bug in bloom filter sizing?
#2 In scenario [major4|http://cstar.datastax.com/tests/id/f3f85d6c-e257-11e5-995b-0256e416528f] 
I decreased the bloom filter fp chance from 0.1 to 0.01, but the bloom filter 
size remained the same. Is this expected?
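As a rough expectation for question #2: under the textbook optimal Bloom filter sizing formula m/n = -ln(p) / (ln 2)^2 (Cassandra's own filter code may round bits per key differently, so treat this only as a ballpark), lowering the fp chance from 0.1 to 0.01 doubles the bits needed per key. For the same number of partitions the filter would then be expected to roughly double in size rather than stay flat.

```python
import math


def bits_per_key(fp_chance: float) -> float:
    """Optimal Bloom filter bits per key for a target false-positive chance,
    from the textbook formula m/n = -ln(p) / (ln 2)^2."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be in (0, 1)")
    return -math.log(fp_chance) / math.log(2) ** 2


for p in (0.1, 0.01):
    print(f"fp_chance={p}: {bits_per_key(p):.2f} bits per key")

# Ratio of filter sizes for the same number of keys.
print(f"size ratio 0.01 vs 0.1: {bits_per_key(0.01) / bits_per_key(0.1):.2f}")
```

This gives about 4.79 bits per key at 0.1 and 9.59 at 0.01, a ratio of 2.00, since ln(0.01) = 2 ln(0.1).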

Any ideas, [~carlyeks], [~krummas]?

> Option to disable bloom filter in highest level of LCS sstables
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-9830
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9830
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Jonathan Ellis
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.x
>
>
> We expect about 90% of data to be in the highest level of LCS in a fully 
> populated series.  (See also CASSANDRA-9829.)
> Thus if the user is primarily asking for data (partitions) that has actually 
> been inserted, the bloom filter on the highest level only helps reject 
> sstables about 10% of the time.
> We should add an option that suppresses bloom filter creation on top-level 
> sstables.  This will dramatically reduce memory usage for LCS and may even 
> improve performance as we no longer check a low-value filter.
> (This is also an idea from RocksDB.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
