[ https://issues.apache.org/jira/browse/CASSANDRA-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614284#comment-14614284 ]
Robert Stupp commented on CASSANDRA-8413:
-----------------------------------------

Thanks. Committed as 23fd75f27c40462636f09920719b5dcbef5b8f36 with a few sentences in NEWS.txt regarding the new SSTable file version.

> Bloom filter false positive ratio is not honoured
> -------------------------------------------------
>
>                 Key: CASSANDRA-8413
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8413
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Robert Stupp
>             Fix For: 3.0 beta 1
>
>         Attachments: 8413-patch.txt, 8413.hack-3.0.txt, 8413.hack.txt
>
>
> Whilst thinking about CASSANDRA-7438 and hash bits, I realised we have a
> problem with sabotaging our bloom filters when using the murmur3 partitioner.
> I have performed a very quick test to confirm this risk is real.
> Since a typical cluster uses the same murmur3 hash for partitioning as we do
> for bloom filter lookups, and we own a contiguous range, we can guarantee
> that the top X bits collide for all keys on the node. This translates into
> poor bloom filter distribution. I quickly hacked LongBloomFilterTest to
> simulate the problem, and the result in these tests is _up to_ a doubling of
> the actual false positive ratio. The actual change will depend on the key
> distribution, the number of keys, the false positive ratio, the number of
> nodes, the token distribution, etc. But seems to be a real problem for
> non-vnode clusters of at least ~128 nodes in size.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
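
The claim in the issue description that owning a contiguous token range fixes the top X bits of every key's hash on a node can be illustrated with a small sketch. This is not Cassandra code: the helper names `shared_top_bits` and `even_ranges` are hypothetical, tokens are treated as unsigned 64-bit integers for simplicity, and the ring is assumed to be split evenly among a power-of-two number of nodes without vnodes. Real token assignments are rarely this tidy, so the numbers are only indicative.

```python
def shared_top_bits(lo: int, hi: int, width: int = 64) -> int:
    """Count the leading bits common to every value in the inclusive range [lo, hi].

    lo and hi are unsigned `width`-bit integers with lo <= hi. Every value
    between them shares the common bit prefix of lo and hi.
    """
    if not 0 <= lo <= hi < (1 << width):
        raise ValueError("expected 0 <= lo <= hi < 2**width")
    # The highest bit at which lo and hi differ ends the common prefix.
    return width - (lo ^ hi).bit_length()


def even_ranges(nodes: int, width: int = 64):
    """Split the token space into `nodes` equal contiguous ranges (no vnodes).

    Assumption: `nodes` is a power of two, so ranges align on bit boundaries.
    """
    size = (1 << width) // nodes
    return [(i * size, (i + 1) * size - 1) for i in range(nodes)]


if __name__ == "__main__":
    for nodes in (2, 16, 128, 1024):
        fixed = min(shared_top_bits(lo, hi) for lo, hi in even_ranges(nodes))
        print(f"{nodes:5d} nodes: top {fixed} hash bits identical for every key on a node")
```

Under these assumptions, a 128-node ring leaves 7 of the 64 hash bits constant on each node. If the bloom filter derives its bit positions from the same hash value, those constant bits contribute no entropy, which is consistent with the degraded distribution described above.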