[ https://issues.apache.org/jira/browse/CASSANDRA-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599498#comment-14599498 ]

Benedict commented on CASSANDRA-8413:
-------------------------------------

[~snazy]: sorry for the embarrassingly slow review. I think I conflated this 
with your other bf ticket, which requires a bit more thought (on my part).

I think I would prefer if we settle on the _old_ way being "inverted" - perhaps 
even just called "hasOldBfHashOrder" - and we just swap the {{indexes[0]}} and 
{{indexes[1]}} positions in {{getHashBuckets}}, then flip them iff we have 
the old layout. It's a small thing, but I think it is clearer if the legacy 
way of doing things is treated as the exceptional, extra-work case.
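To make the suggestion concrete, here is a minimal sketch of that shape. The class name and the two stand-in hash functions are hypothetical illustrations, not Cassandra's actual bloom filter code; only the swap-iff-old-layout structure is the point:

```java
// Hypothetical sketch of the proposed change; the class, the hash
// implementations, and the key type are illustrative stand-ins.
import java.nio.ByteBuffer;

public class HashOrderSketch {
    // Stand-in 64-bit mixers playing the role of the two hash outputs.
    static long hash0(ByteBuffer key) {
        long h = 0x9E3779B97F4A7C15L;
        for (int i = key.position(); i < key.limit(); i++)
            h = (h ^ key.get(i)) * 0xBF58476D1CE4E5B9L;
        return h;
    }

    static long hash1(ByteBuffer key) {
        return Long.rotateLeft(hash0(key), 32) * 0x94D049BB133111EBL;
    }

    // The new layout is the default path; only sstables written with the
    // old bloom-filter hash order pay for the extra swap.
    static long[] getHashBuckets(ByteBuffer key, boolean hasOldBfHashOrder) {
        long[] indexes = { hash0(key), hash1(key) };
        if (hasOldBfHashOrder) {          // exceptional, legacy case
            long tmp = indexes[0];
            indexes[0] = indexes[1];
            indexes[1] = tmp;
        }
        return indexes;
    }
}
```

With this shape, the common (new-format) read path does no extra work, and the legacy branch is visibly the odd one out.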

Otherwise, can we rebase and get cassci vetting? The versioning conditions may 
also need revisiting.

> Bloom filter false positive ratio is not honoured
> -------------------------------------------------
>
>                 Key: CASSANDRA-8413
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8413
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Robert Stupp
>             Fix For: 3.x
>
>         Attachments: 8413-patch.txt, 8413.hack-3.0.txt, 8413.hack.txt
>
>
> Whilst thinking about CASSANDRA-7438 and hash bits, I realised we have a 
> problem with sabotaging our bloom filters when using the murmur3 partitioner. 
> I have performed a very quick test to confirm this risk is real.
> Since a typical cluster uses the same murmur3 hash for partitioning as we do 
> for bloom filter lookups, and we own a contiguous range, we can guarantee 
> that the top X bits collide for all keys on the node. This translates into 
> poor bloom filter distribution. I quickly hacked LongBloomFilterTest to 
> simulate the problem, and the result in these tests is _up to_ a doubling of 
> the actual false positive ratio. The actual change will depend on the key 
> distribution, the number of keys, the false positive ratio, the number of 
> nodes, the token distribution, etc. But it seems to be a real problem for 
> non-vnode clusters of at least ~128 nodes in size.
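The collision mechanism quoted above can be sketched as follows. The class name, the node count, and the specific range prefix are assumptions for illustration; the point is that a node owning a contiguous token range fixes the top bits of every token it stores, so any bloom-filter bucket derived from those bits is the same for every key on the node:

```java
// Illustrative sketch: in a non-vnode cluster of 2^K evenly spaced nodes,
// one node owns a contiguous token range, so the top K bits of every
// owned token are fixed. K and PREFIX are made-up example values.
import java.util.Random;

public class TopBitCollisionSketch {
    static final int K = 7;           // ~128 nodes
    static final long PREFIX = 0x55L; // hypothetical prefix of this node's range

    // Generate a token inside this node's contiguous range: random low
    // bits, fixed top K bits.
    static long ownedToken(Random rnd) {
        long low = rnd.nextLong() >>> K;
        return (PREFIX << (64 - K)) | low;
    }

    public static void main(String[] args) {
        Random rnd = new Random(8413);
        // If the bloom filter reuses the partitioner hash, a bucket index
        // taken from the top bits collides for every key on the node.
        long firstTop = ownedToken(rnd) >>> (64 - K);
        for (int i = 0; i < 1000; i++)
            if ((ownedToken(rnd) >>> (64 - K)) != firstTop)
                throw new AssertionError("top bits differed");
        System.out.println("top " + K + " bits identical for all owned tokens");
    }
}
```

This is why reusing the partitioning hash unmodified effectively wastes the top bits of the bloom-filter hash, degrading the achieved false positive ratio.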



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
