[ https://issues.apache.org/jira/browse/CASSANDRA-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385767#comment-14385767 ]

Gustav Munkby commented on CASSANDRA-9060:
------------------------------------------

Regarding the size of the Bloom filters, I think the immediate problem is that 
they are sized to hold all keys across all the SSTables being anticompacted, 
even though anticompaction rewrites one table at a time. I've attached a patch 
which, I believe, sizes the filter for the single table being processed 
instead. Given that the change is fairly small, I targeted it at 2.1.
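
To make the intent concrete, here is a rough sketch of the difference; the 
SSTable/Filter/FilterFactory names below are illustrative stand-ins, not the 
actual classes touched by the patch:

    import java.util.List;

    class AnticompactionFilterSizing
    {
        interface SSTable { long estimatedKeys(); }
        interface Filter { }
        interface FilterFactory { Filter create(long expectedKeys, double fpChance); }

        // Current behaviour: one filter sized for the sum of keys across every
        // table in the anticompaction, even though tables are rewritten one at a time.
        static Filter sizedForAllTables(List<SSTable> tables, FilterFactory factory, double fpChance)
        {
            long total = 0;
            for (SSTable table : tables)
                total += table.estimatedKeys();
            return factory.create(total, fpChance);
        }

        // Intended behaviour: size the filter from the single table being processed.
        static Filter sizedPerTable(SSTable table, FilterFactory factory, double fpChance)
        {
            return factory.create(table.estimatedKeys(), fpChance);
        }
    }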

As the keys are going to be distributed over the two resulting tables, ideally 
we would want much smaller Bloom filters on either side than initially 
estimated. I'm guessing this is a general problem with compactions, but the 
HyperLogLog cardinality estimators should help in the normal case.

For the general case of ensuring that Bloom filters are not too large, I see 
basically two solutions: either introduce a scanning phase before the actual 
compaction, in which the size of the Bloom filter(s) is calculated, or shrink 
the Bloom filter once compaction has completed. The obvious implementation of 
the latter would be to scan through the compacted index and rebuild the filter, 
possibly gated by a comparison of the index size and the Bloom filter size.
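
A rough sketch of that post-compaction shrink, again with hypothetical 
IndexReader/Filter/FilterFactory stand-ins rather than the real writer 
interfaces:

    import java.nio.ByteBuffer;

    class ShrinkFilterSketch
    {
        interface IndexReader extends Iterable<ByteBuffer> { long keyCount(); long sizeOnDiskBytes(); }
        interface Filter { void add(ByteBuffer key); long serializedSizeBytes(); }
        interface FilterFactory { Filter create(long expectedKeys, double fpChance); }

        static Filter maybeShrink(Filter current, IndexReader index, FilterFactory factory, double fpChance)
        {
            // Only pay for the extra index scan when the existing filter is clearly
            // oversized (here: larger than the index itself).
            if (current.serializedSizeBytes() <= index.sizeOnDiskBytes())
                return current;

            // Rebuild a right-sized filter by re-inserting every key from the index.
            Filter rightSized = factory.create(index.keyCount(), fpChance);
            for (ByteBuffer key : index)
                rightSized.add(key);
            return rightSized;
        }
    }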

I guess scanning through the index could be avoided by having the IndexWriter 
keep track of multiple Bloom filters of exponentially growing sizes. That way, 
once the index is complete, the most appropriate Bloom filter could be picked 
and written to disk, and the others discarded.
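
Something along these lines, with the same kind of hypothetical stand-ins; 
since the candidate capacities form a geometric series, keeping all of them 
around costs at most roughly twice the memory of the single pessimistically 
sized filter:

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    class TieredFilterWriterSketch
    {
        interface Filter { void add(ByteBuffer key); void close(); }
        interface FilterFactory { Filter create(long expectedKeys, double fpChance); }

        private final List<Filter> candidates = new ArrayList<>();
        private final long[] capacities;
        private long observedKeys = 0;

        TieredFilterWriterSketch(FilterFactory factory, long maxKeys, double fpChance)
        {
            // Candidate capacities 1024, 2048, 4096, ... up to the pessimistic upper bound.
            List<Long> caps = new ArrayList<>();
            for (long cap = 1024; cap < maxKeys; cap *= 2)
                caps.add(cap);
            caps.add(maxKeys);
            capacities = caps.stream().mapToLong(Long::longValue).toArray();
            for (long cap : capacities)
                candidates.add(factory.create(cap, fpChance));
        }

        void append(ByteBuffer key)
        {
            observedKeys++;
            for (Filter f : candidates)   // every key goes into every candidate
                f.add(key);
        }

        Filter finish()
        {
            // Pick the smallest candidate whose capacity covers the keys actually
            // seen (falling back to the largest), and discard the rest.
            int best = capacities.length - 1;
            for (int i = 0; i < capacities.length; i++)
            {
                if (capacities[i] >= observedKeys)
                {
                    best = i;
                    break;
                }
            }
            for (int i = 0; i < candidates.size(); i++)
                if (i != best)
                    candidates.get(i).close();
            return candidates.get(best);
        }
    }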

> Anticompaction hangs on bloom filter bitset serialization 
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-9060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9060
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Gustav Munkby
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: 2.1-9060-simple.patch, trunk-9060.patch
>
>
> I tried running an incremental repair against a 15-node vnode-cluster with 
> roughly 500GB data running on 2.1.3-SNAPSHOT, without performing the 
> suggested migration steps. I manually chose a small range for the repair 
> (using --start/end-token). The actual repair part took almost no time at all, 
> but the anticompactions took a lot of time (not surprisingly).
> Obviously, this might not be the ideal way to run incremental repairs, but I 
> wanted to look into what made the whole process so slow. The results were 
> rather surprising. The majority of the time was spent serializing bloom 
> filters.
> The reason seemed to be two-fold. First, the bloom-filters generated were 
> huge (probably because the original SSTables were large). With a proper 
> migration to incremental repairs, I'm guessing this would not happen. 
> Secondly, however, the bloom filters were being written to the output one 
> byte at a time (with quite a few type-conversions on the way) to transform 
> the little-endian in-memory representation to the big-endian on-disk 
> representation.
> I have implemented a solution where big-endian is used in-memory as well as 
> on-disk, which obviously makes de-/serialization much, much faster. This 
> introduces some slight overhead when checking the bloom filter, but I can't 
> see how that would be problematic. An obvious alternative would be to still 
> perform the serialization/deserialization using a byte array, but perform the 
> byte-order swap there.
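
For reference, the byte-array alternative mentioned at the end of the 
description could look roughly like the sketch below for a long[] bitset. This 
is not the attached patch; the buffer size is arbitrary and the byte order is 
only illustrative (both methods produce the same output here):

    import java.io.DataOutput;
    import java.io.IOException;
    import java.nio.ByteBuffer;

    final class BitsetSerializationSketch
    {
        // Roughly the reported slow path: one write() call and one shift/cast per byte.
        static void writeByteWise(long[] words, DataOutput out) throws IOException
        {
            for (long word : words)
                for (int shift = 0; shift < 64; shift += 8)
                    out.write((int) (word >>> shift) & 0xFF);
        }

        // Bulk alternative: swap each word once with Long.reverseBytes and stage
        // whole words in a reusable buffer that is flushed in large chunks.
        static void writeBulk(long[] words, DataOutput out) throws IOException
        {
            ByteBuffer buf = ByteBuffer.allocate(64 * 1024);   // big-endian by default
            for (long word : words)
            {
                if (buf.remaining() < 8)
                {
                    out.write(buf.array(), 0, buf.position());
                    buf.clear();
                }
                buf.putLong(Long.reverseBytes(word));
            }
            out.write(buf.array(), 0, buf.position());
        }
    }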



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
