[ 
https://issues.apache.org/jira/browse/CASSANDRA-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385498#comment-14385498
 ] 

Marcus Eriksson commented on CASSANDRA-9060:
--------------------------------------------

Yeah, we should fix the bloom filter size estimation before starting the
anticompaction.

> Anticompaction hangs on bloom filter bitset serialization 
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-9060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9060
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Gustav Munkby
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: trunk-9060.patch
>
>
> I tried running an incremental repair against a 15-node vnode cluster with
> roughly 500 GB of data running on 2.1.3-SNAPSHOT, without performing the
> suggested migration steps. I manually chose a small range for the repair
> (using --start/end-token). The actual repair part took almost no time at all,
> but the anticompactions took a long time (not surprisingly).
>
> Obviously, this might not be the ideal way to run incremental repairs, but I
> wanted to look into what made the whole process so slow. The results were
> rather surprising: the majority of the time was spent serializing bloom
> filters.
>
> The reason seemed to be twofold. First, the bloom filters generated were
> huge (probably because the original SSTables were large). With a proper
> migration to incremental repairs, I'm guessing this would not happen.
> Second, however, the bloom filters were being written to the output one
> byte at a time (with quite a few type conversions on the way) to transform
> the little-endian in-memory representation into the big-endian on-disk
> representation.
>
> I have implemented a solution where big-endian is used in memory as well as
> on disk, which makes de-/serialization much faster. This introduces some
> slight overhead when checking the bloom filter, but I can't see how that
> would be problematic. An obvious alternative would be to still perform the
> serialization/deserialization using a byte array, but perform the
> byte-order swap there.
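
For illustration only, here is a minimal Java sketch of the difference between
writing a bitset one byte at a time and converting it in bulk through a byte
array. It is not the code in trunk-9060.patch and does not use Cassandra's
classes: the class name BitsetSerializationSketch, its method names, and the
use of a plain long[] as the in-memory bitset are all assumptions made for
the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch; names and the long[] bitset layout are assumptions.
class BitsetSerializationSketch {

    // Slow path: emit each 64-bit word as eight big-endian bytes,
    // one write call per byte.
    static byte[] writeByteByByte(long[] words) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(words.length * Long.BYTES);
        for (long word : words) {
            for (int shift = 56; shift >= 0; shift -= 8) {
                out.write((int) (word >>> shift) & 0xFF);
            }
        }
        return out.toByteArray();
    }

    // Bulk path: let a big-endian ByteBuffer convert the whole array at once.
    static byte[] writeBulk(long[] words) {
        ByteBuffer buffer = ByteBuffer.allocate(words.length * Long.BYTES)
                                      .order(ByteOrder.BIG_ENDIAN);
        buffer.asLongBuffer().put(words);
        return buffer.array();
    }

    // Bulk read: the inverse of writeBulk.
    static long[] readBulk(byte[] data) {
        long[] words = new long[data.length / Long.BYTES];
        ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN).asLongBuffer().get(words);
        return words;
    }

    public static void main(String[] args) {
        long[] words = {0x0102030405060708L, -1L, 0L};
        byte[] slow = writeByteByByte(words);
        byte[] fast = writeBulk(words);
        System.out.println("same bytes: " + Arrays.equals(slow, fast));
        System.out.println("round trip: " + Arrays.equals(words, readBulk(fast)));
    }
}
```

Both paths produce the same big-endian bytes; the bulk path simply avoids a
method call and several conversions per byte, which is the kind of overhead
the description above attributes the slowdown to.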



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
