[ https://issues.apache.org/jira/browse/CASSANDRA-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gustav Munkby updated CASSANDRA-9060:
-------------------------------------
    Attachment: trunk-9060.patch

> Anticompaction hangs on bloom filter bitset serialization
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-9060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9060
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Gustav Munkby
>            Priority: Minor
>         Attachments: trunk-9060.patch
>
>
> I tried running an incremental repair against a 15-node vnode-cluster with
> roughly 500GB data running on 2.1.3-SNAPSHOT, without performing the
> suggested migration steps. I manually chose a small range for the repair
> (using --start/end-token). The actual repair part took almost no time at all,
> but the anticompactions took a lot of time (not surprisingly).
> Obviously, this might not be the ideal way to run incremental repairs, but I
> wanted to look into what made the whole process so slow. The results were
> rather surprising. The majority of the time was spent serializing bloom
> filters.
> The reason seemed to be two-fold. First, the bloom-filters generated were
> huge (probably because the original SSTables were large). With a proper
> migration to incremental repairs, I'm guessing this would not happen.
> Secondly, however, the bloom filters were being written to the output one
> byte at a time (with quite a few type-conversions on the way) to transform
> the little-endian in-memory representation to the big-endian on-disk
> representation.
> I have implemented a solution where big-endian is used in-memory as well as
> on-disk, which obviously makes de-/serialization much, much faster. This
> introduces some slight overhead when checking the bloom filter, but I can't
> see how that would be problematic. An obvious alternative would be to still
> perform the serialization/deserialization using a byte array, but perform the
> byte-order swap there.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
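
For readers who want a concrete picture of the two serialization strategies contrasted in the description, here is a minimal sketch in Java. It is not the code from trunk-9060.patch or from Cassandra itself: the class name `BitsetSerializationSketch` and both method names are hypothetical, the bitset is assumed to be a plain `byte[]` holding 64-bit words in little-endian order, and its length is assumed to be a multiple of 8. The first method mimics the per-byte conversion pattern, the second performs the byte-order swap one word at a time through a `ByteBuffer`, roughly in the spirit of the "byte array with a swap" alternative mentioned at the end of the report.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class BitsetSerializationSketch {

    // Hypothetical slow path: rebuild each 64-bit word from eight
    // little-endian bytes, then emit it one byte at a time, most
    // significant byte first (big-endian).
    static byte[] writeByteAtATime(byte[] littleEndianWords) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        for (int i = 0; i < littleEndianWords.length; i += 8) {
            long word = 0;
            for (int b = 0; b < 8; b++) {
                word |= (littleEndianWords[i + b] & 0xffL) << (8 * b);
            }
            for (int shift = 56; shift >= 0; shift -= 8) {
                sink.write((int) (word >>> shift));
            }
        }
        return sink.toByteArray();
    }

    // Hypothetical faster path: read whole words as little-endian longs
    // and store them into a big-endian buffer, so the result can be
    // handed to the output in a single bulk write.
    static byte[] writeBulk(byte[] littleEndianWords) {
        ByteBuffer src = ByteBuffer.wrap(littleEndianWords).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer dst = ByteBuffer.allocate(littleEndianWords.length); // big-endian by default
        while (src.remaining() >= 8) {
            dst.putLong(src.getLong());
        }
        return dst.array();
    }

    public static void main(String[] args) {
        // Two words: 1 and 0x0807060504030201, both stored little-endian.
        byte[] bits = {1, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8};
        // Both paths should yield identical big-endian output.
        System.out.println(Arrays.equals(writeByteAtATime(bits), writeBulk(bits)));
    }
}
```

Both methods produce the same bytes; the difference is only in how many individual conversions and writes are needed to get there. The approach described in the report goes one step further and avoids the swap at serialization time entirely by keeping the in-memory layout big-endian.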