[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775700#comment-16775700 ] Joseph Lynch commented on CASSANDRA-14482: -- [~bdeggleston] nice thank you! I'm looking forward to testing this out! > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: New Feature > Components: Dependencies, Feature/Compression >Reporter: Sushma A Devendrappa >Assignee: Dinesh Joshi >Priority: Major > Labels: performance, pull-request-available > Fix For: 4.x > > Time Spent: 2h 40m > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/ > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775678#comment-16775678 ] Dinesh Joshi commented on CASSANDRA-14482: -- Thank you [~bdeggleston]!
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774646#comment-16774646 ] Dinesh Joshi commented on CASSANDRA-14482: -- Thanks [~benedict] for the insightful comment. I patched {{zstd-jni}} to add the ability to enable checksumming on the methods that [~bdeggleston] suggested. The patch was accepted upstream and is available starting with {{zstd-jni-1.3.8-5}}. I have pulled it in and enabled checksumming. I think that resolves Blake's GC concerns, and we get checksumming as well.
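For context on what "enabling checksumming" means at the format level: in the Zstandard frame format (RFC 8878), the checksum is an optional per-frame feature signalled in the frame header, and when present it is the low 4 bytes of XXH64 (seed 0) over the decompressed content. The sketch below is based only on that spec, not on zstd-jni or the Cassandra patch, and the class and method names are hypothetical. It checks the frame magic number and reads the Content_Checksum_flag, which is bit 2 of the Frame_Header_Descriptor byte.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, written against the RFC 8878 frame layout:
// [4-byte magic, little-endian][Frame_Header_Descriptor][...]
final class ZstdFrameInspector
{
    // Frame magic number; on disk it appears as the bytes 28 B5 2F FD.
    static final int ZSTD_MAGIC = 0xFD2FB528;

    private ZstdFrameInspector() {}

    static boolean hasContentChecksum(byte[] frame)
    {
        if (frame.length < 5)
            throw new IllegalArgumentException("too short to hold a frame header");
        int magic = ByteBuffer.wrap(frame, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        if (magic != ZSTD_MAGIC)
            throw new IllegalArgumentException("not a Zstandard frame: 0x" + Integer.toHexString(magic));
        // The Frame_Header_Descriptor directly follows the magic; bit 2 is Content_Checksum_flag.
        int descriptor = frame[4] & 0xFF;
        return (descriptor & 0x04) != 0;
    }
}
```

A helper like this could back a test asserting that frames written by the compressor really carry a checksum, rather than trusting that the flag was plumbed through the JNI layer.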
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773915#comment-16773915 ] Benedict commented on CASSANDRA-14482: -- Going over the data twice is unlikely to incur a much greater penalty than going over it once and doing both things. In fact, if the two behaviours are designed to behave optimally with the CPU pipeline (which compression and checksumming algorithms each certainly are), then mixing the two simultaneously would almost certainly be slower than running each independently. Looking at the ZStd code, it looks like it does the sensible thing and executes the checksum independently. It appears to checksum the input stream rather than the output, though, which is odd given that the latter should be smaller (and, modulo any bugs in the compressor, should be just as good). The only advantage ZStd could plausibly have over us would be to perform the checksum incrementally on, say, pages of data it is also compressing, so that the data is guaranteed to be in L1 and there are no TLB misses. However, it doesn't *seem* to do this; it seems to assume you provide the data in reasonable chunks. Anyway, there should be no TLB misses on the size of data we're operating over when visiting it twice, and the data should be in L3 at worst, and prefetched to L2. We could also probably do this ourselves by providing only page-sized frames to compress and performing the checksum incrementally, though this would mean tighter integration with the C API, and is unlikely to be worth the effort. I have, though, made some assumptions about the ZStd code on reading it, as I didn't make time to fully read the codebase.
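The page-at-a-time idea can be sketched with the JDK alone. This is an illustration under stated assumptions, not ZStd's internals or a proposed patch: {{java.util.zip.Deflater}} stands in for Zstd, {{CRC32}} stands in for XXH64, the 4 KiB page size is assumed, and the class name is hypothetical. Each page is checksummed and then handed to the compressor while it is still cache-resident; like the Zstd frame checksum, it covers the uncompressed input, and the incremental result equals a one-shot checksum over the whole buffer.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: Deflater stands in for Zstd, CRC32 for XXH64.
final class PagedCompressor
{
    static final int PAGE_SIZE = 4096; // assumed page size

    final byte[] compressed;
    final long inputChecksum;

    private PagedCompressor(byte[] compressed, long inputChecksum)
    {
        this.compressed = compressed;
        this.inputChecksum = inputChecksum;
    }

    static PagedCompressor compress(byte[] input)
    {
        Deflater deflater = new Deflater();
        CRC32 crc = new CRC32();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[PAGE_SIZE];
        try
        {
            for (int off = 0; off < input.length; off += PAGE_SIZE)
            {
                int len = Math.min(PAGE_SIZE, input.length - off);
                crc.update(input, off, len);        // checksum this page...
                deflater.setInput(input, off, len); // ...then compress the same page
                while (!deflater.needsInput())
                    out.write(buf, 0, deflater.deflate(buf));
            }
            deflater.finish();
            while (!deflater.finished())
                out.write(buf, 0, deflater.deflate(buf));
            return new PagedCompressor(out.toByteArray(), crc.getValue());
        }
        finally
        {
            deflater.end();
        }
    }

    // Decompresses, then verifies the checksum of the restored input.
    byte[] decompress(int originalLength)
    {
        Inflater inflater = new Inflater();
        try
        {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int written = 0;
            while (written < originalLength)
            {
                int n = inflater.inflate(out, written, originalLength - written);
                if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalStateException("compressed stream ended early");
                written += n;
            }
            CRC32 crc = new CRC32();
            crc.update(out, 0, out.length);
            if (crc.getValue() != inputChecksum)
                throw new IllegalStateException("checksum mismatch");
            return out;
        }
        catch (DataFormatException e)
        {
            throw new IllegalStateException("corrupt compressed data", e);
        }
        finally
        {
            inflater.end();
        }
    }
}
```

As the comment notes, doing this against the real Zstd C API would mean tighter integration than the JNI one-shot helpers offer; the sketch only shows that interleaving per page preserves the checksum.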
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773883#comment-16773883 ] Dinesh Joshi commented on CASSANDRA-14482: -- Adding our own checksumming incurs additional overhead, not just in the extra CPU we would use going over the data twice (once for compression inside Zstd and once in the compressor to compute the hash), but also in the additional code we would have to write and maintain. From digging around in the code, it seems [they're clobbering parameters|https://github.com/facebook/zstd/issues/1534], which should be an easy fix.
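To make the two-pass cost concrete, "our own checksumming" amounts to: compress, then checksum in a second pass, and verify before decompressing. The sketch assumes a chunk layout of compressed bytes followed by a 4-byte CRC32 of those compressed bytes; {{Deflater}} stands in for Zstd, CRC32 for whichever checksum a real patch would pick, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch. Chunk layout (assumed): [compressed bytes][4-byte big-endian CRC32 of them]
final class ChecksummedChunk
{
    private ChecksummedChunk() {}

    static byte[] write(byte[] input)
    {
        byte[] compressed = deflate(input);           // pass 1: compress
        CRC32 crc = new CRC32();
        crc.update(compressed, 0, compressed.length); // pass 2: checksum the (smaller) output
        return ByteBuffer.allocate(compressed.length + 4)
                         .put(compressed)
                         .putInt((int) crc.getValue())
                         .array();
    }

    static byte[] read(byte[] chunk, int originalLength)
    {
        if (chunk.length < 4)
            throw new IllegalStateException("chunk too short");
        int bodyLength = chunk.length - 4;
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, bodyLength);
        int stored = ByteBuffer.wrap(chunk, bodyLength, 4).getInt();
        // Verify first, so corrupt bytes never reach the decompressor.
        if ((int) crc.getValue() != stored)
            throw new IllegalStateException("checksum mismatch: chunk is corrupt");
        return inflate(Arrays.copyOf(chunk, bodyLength), originalLength);
    }

    private static byte[] deflate(byte[] input)
    {
        Deflater deflater = new Deflater();
        try
        {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished())
                out.write(buf, 0, deflater.deflate(buf));
            return out.toByteArray();
        }
        finally
        {
            deflater.end();
        }
    }

    private static byte[] inflate(byte[] compressed, int originalLength)
    {
        Inflater inflater = new Inflater();
        try
        {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int written = 0;
            while (written < originalLength)
            {
                int n = inflater.inflate(out, written, originalLength - written);
                if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalStateException("compressed chunk ended early");
                written += n;
            }
            return out;
        }
        catch (DataFormatException e)
        {
            throw new IllegalStateException("corrupt compressed chunk", e);
        }
        finally
        {
            inflater.end();
        }
    }
}
```

The extra cost is the second pass over the compressed bytes plus the code to maintain, which is the trade-off being debated in this thread.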
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773871#comment-16773871 ] Benedict commented on CASSANDRA-14482: -- I don't see any issue with doing our own checksumming? It's not like they'll be doing anything magical. In fact, according to [this|https://facebook.github.io/zstd/zstd_manual.html] they're just using the lower 32 bits of xxHash64, which I think we already use elsewhere? There's also a good case to be made for making the checksum configurable, so that users can decide on their preferred level of protection, both in algorithm and number of bits.
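A configurable checksum, as suggested above, could be as small as an enum that pairs an algorithm with a truncation width. This is a hypothetical sketch: the enum and method names are invented, and only JDK checksums ({{CRC32}}, {{Adler32}}) are used; an xxHash64 option would need a third-party library and is omitted here.

```java
import java.util.function.Supplier;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical user-selectable checksum: algorithm plus number of bits kept.
enum ChecksumAlgorithm
{
    CRC_32(CRC32::new),
    ADLER_32(Adler32::new);

    private final Supplier<Checksum> factory;

    ChecksumAlgorithm(Supplier<Checksum> factory)
    {
        this.factory = factory;
    }

    // Computes the checksum over data and keeps only its low `bits` bits (1..32).
    long compute(byte[] data, int bits)
    {
        if (bits < 1 || bits > 32)
            throw new IllegalArgumentException("bits must be in [1, 32]: " + bits);
        Checksum checksum = factory.get();
        checksum.update(data, 0, data.length);
        long mask = (1L << bits) - 1;
        return checksum.getValue() & mask;
    }
}
```

Truncation trades detection strength for space: a k-bit checksum lets a random corruption through with probability of roughly 2^-k, which is the knob a user-facing setting would expose.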
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773821#comment-16773821 ] Dinesh Joshi commented on CASSANDRA-14482: -- [~bdeggleston] thanks for the review. The reason I used the streams was that {{Zstd}} does not allow setting the checksumming flag via the {{Zstd::compress}} JNI static helper. I confirmed this with the JNI author, and it goes deeper than just the JNI bindings. However, using the compression stream generates garbage and therefore GC overhead. So here are the options we have right now:
# Move forward with checksumming & accept the GC overhead
# Move forward withOUT checksumming
# Allow users to turn checksumming on/off using a compression preference parameter (turning it on will incur GC, turning it off won't)
# Add our own checksumming (ugly, burns additional CPU but still generates some garbage)
# Work with Zstd & Zstd JNI to enable passing in flags such as the checksumming flag
I personally think that in the near term we should pick one of options 1-3, move forward, and open a follow-on ticket to address the GC issue. I am opposed to doing our own checksumming, especially because Zstd already supports it and it is just a matter of plumbing and adding the appropriate APIs to make it work performantly over JNI. If anybody has any other ideas, I am all ears. [~aweisberg] [~iamaleksey] [~benedict] [~jjirsa] please feel free to chime in. I am already discussing this issue in the Zstd community and have a working prototype of what we need, but I think it is incomplete. I have also reached out to [~dikanggu] to help surface it with the Zstd team.
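Option 3 would need a table-level compression parameter. A minimal sketch of parsing it from the compression options map follows; the key {{enable_checksum}} and the default of "on" are assumptions made for illustration and do not come from the patch.

```java
import java.util.Map;

// Hypothetical parsing for a per-table checksumming switch.
final class ChecksumOption
{
    static final String KEY = "enable_checksum"; // hypothetical option name
    static final boolean DEFAULT = true;         // assumed default: favour integrity over GC

    private ChecksumOption() {}

    static boolean parse(Map<String, String> compressionOptions)
    {
        String raw = compressionOptions.get(KEY);
        if (raw == null)
            return DEFAULT;
        // Be strict: Boolean.parseBoolean would silently turn a typo such as "ture" into false.
        if (raw.equalsIgnoreCase("true"))
            return true;
        if (raw.equalsIgnoreCase("false"))
            return false;
        throw new IllegalArgumentException(KEY + " must be 'true' or 'false', got '" + raw + "'");
    }
}
```

Rejecting unknown values matters here because a silently disabled checksum is exactly the kind of misconfiguration nobody notices until data is corrupt.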
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773524#comment-16773524 ] Blake Eggleston commented on CASSANDRA-14482: - [~djoshi3], I pushed up a commit [here|https://github.com/bdeggleston/cassandra/commit/716a6b631d68532773bb804e8ce33ab3dd23946c] that makes the following changes:
1. simplifies the {{ZstdCompressor#compress}} method to just call {{Zstd#compress}}. The compressor doesn't support on-heap buffers, so there's no need for the stream handling. This is basically how the Snappy compressor works.
2. updates test support method visibility/annotations in {{ZstdCompressor}}
3. adds the compression ratio to the {{CompressorPerformance}} output
If you're OK with these changes and I haven't broken the tests, I think this is ready to commit.
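The first change relies on the compressor only ever seeing off-heap buffers, which lets each chunk be compressed with one direct call instead of going through a stream. A JDK-only sketch of that shape, assuming Java 11+ (for the {{ByteBuffer}} overloads of {{Deflater}}), with {{Deflater}} standing in for Zstd and hypothetical names throughout. The ratio helper reports compressed size divided by uncompressed size, which is an assumed convention, not necessarily what {{CompressorPerformance}} prints.

```java
import java.nio.ByteBuffer;
import java.util.zip.Deflater;

// Hypothetical sketch: direct buffers only, one compression call per chunk, no stream wrapper.
final class DirectBufferCompressor
{
    private final int level;

    DirectBufferCompressor(int level)
    {
        this.level = level;
    }

    void compress(ByteBuffer input, ByteBuffer output)
    {
        if (!input.isDirect() || !output.isDirect())
            throw new IllegalArgumentException("only off-heap (direct) buffers are supported");
        // A production version would reuse the Deflater (e.g. per thread) rather than
        // allocating one per call.
        Deflater deflater = new Deflater(level);
        try
        {
            deflater.setInput(input); // consumes input.position()..input.limit()
            deflater.finish();
            while (!deflater.finished())
            {
                if (!output.hasRemaining())
                    throw new IllegalStateException("output buffer too small");
                deflater.deflate(output); // advances output.position()
            }
        }
        finally
        {
            deflater.end();
        }
    }

    // Compressed size as a fraction of uncompressed size (lower is better); assumed convention.
    static double ratio(long uncompressedBytes, long compressedBytes)
    {
        if (uncompressedBytes <= 0)
            throw new IllegalArgumentException("uncompressed size must be positive");
        return (double) compressedBytes / uncompressedBytes;
    }
}
```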
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768991#comment-16768991 ] Dinesh Joshi commented on CASSANDRA-14482: -- Here's the patch. This approach not only enables Zstd compression but also checksumming.
||trunk||
|[branch|https://github.com/dineshjoshi/cassandra/tree/14482-trunk]|
|[utests & dtests|https://circleci.com/gh/dineshjoshi/workflows/cassandra/tree/14482-trunk]|
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767573#comment-16767573 ] Dinesh Joshi commented on CASSANDRA-14482: -- Thanks, I'll have the patch up shortly!
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767564#comment-16767564 ] Sushma A Devendrappa commented on CASSANDRA-14482: -- [~djoshi3] please go ahead, thanks for following up on this one!
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767536#comment-16767536 ] Dinesh Joshi commented on CASSANDRA-14482: -- Sushma, only a few changes are required. I am planning to make those changes so you can review them.
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719599#comment-16719599 ] Dinesh Joshi commented on CASSANDRA-14482: -- [~sushm...@gmail.com], I left a few comments on the PR.
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719503#comment-16719503 ] Dinesh Joshi commented on CASSANDRA-14482: -- I think the general consensus is that this is a good feature to have in 4.0. It (hopefully) adds value without compromising the stability of the release, i.e. if you don't select this algorithm, the release should be no better or worse than one that doesn't include it. If testing is a big concern, I can help test this.
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714966#comment-16714966 ] Sushma A Devendrappa commented on CASSANDRA-14482: -- Thanks for all the feedback! [~aweisberg] we have seen results similar to [~jolynch]'s: several use cases show better results than with LZ4 and Deflate, and ZSTD has good speed for most use cases at the same compression ratio. The current code changes do not affect any testing and do not touch any other part of Cassandra itself, apart from adding a new compressor, so they should be safe both for experimentation and for real production use. Dictionary-based compression is a very good feature of ZSTD; the code changes to enable it can go in a separate patch in an upcoming release. The current patch is a good start for introducing ZSTD as a compression option, and more features can be added later.
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714590#comment-16714590 ] Benedict commented on CASSANDRA-14482: -- [~jolynch]: I know we've talked about this before, but once the 4.0 freeze is over I'd be happy to help you get this dictionary approach merged into mainline (and help you figure out any issues with your interactions with early-open; presumably the issue is with sharing ownership of the new dictionary). I'm not sure it would strictly be necessary to wait until 5.0 for this: while it certainly breaks _my_ interpretation of the freeze, it's probably not going to be a _dramatic_ change. It could be quite localised, and might be acceptable for a patch release, depending on how we approach our future release processes.
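If the early-open problem is indeed about shared ownership, the usual remedy is reference counting: every reader that uses a dictionary retains it, and the dictionary is freed only when the last holder releases it. A generic, hypothetical sketch of that pattern, not Cassandra's own reference-counting utilities, with a byte array standing in for an off-heap dictionary handle:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical: an early-opened reader and the final reader of the same sstable both hold
// a reference, so replacing one reader cannot free the dictionary under the other.
final class SharedDictionary
{
    private final byte[] dictionary;                           // stand-in for an off-heap handle
    private final AtomicInteger refs = new AtomicInteger(1);   // the creator holds the first reference
    private volatile boolean freed;

    SharedDictionary(byte[] dictionary)
    {
        this.dictionary = dictionary.clone();
    }

    // Returns false once the dictionary has been freed, so it can never be resurrected.
    boolean tryRetain()
    {
        while (true)
        {
            int current = refs.get();
            if (current == 0)
                return false;
            if (refs.compareAndSet(current, current + 1))
                return true;
        }
    }

    void release()
    {
        while (true)
        {
            int current = refs.get();
            if (current <= 0)
                throw new IllegalStateException("released more times than retained");
            if (refs.compareAndSet(current, current - 1))
            {
                if (current == 1)
                    freed = true; // a real implementation would free native memory here
                return;
            }
        }
    }

    byte[] bytes()
    {
        if (freed)
            throw new IllegalStateException("dictionary already freed");
        return dictionary;
    }

    boolean isFreed()
    {
        return freed;
    }
}
```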
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714182#comment-16714182 ] Joseph Lynch commented on CASSANDRA-14482: -- {quote}We're in a freeze, but I can't imagine this would break anyone's testing {quote} Just want to +1 merging this if we can, as zstd will be a significant performance improvement over deflate and, imo, should be the default "high compression" option. Even if we don't want to merge it in 4.0, it could be shipped first as a standalone jar (there's been [cassandra-zstd|https://github.com/MatejTymes/cassandra-zstd] for a while now). I do think that if we merge it for 4.0, having some "before and after" blog posts of zstd vs deflate for both performance and ratio would be really great (zstd should beat deflate in both ratio and performance). [~aweisberg] {quote}This doesn't seem to use the killer feature of ZSTD which is the possibility of using a dictionary? How good is zstd if you don't take advantage of that? {quote} Sushma can confirm, but in our experiments zstd without dictionaries strictly dominates deflate: it is significantly faster and still gets better compression ratios on most datasets. Zstd is actually on the same order of magnitude speed-wise as LZ4, which is very impressive given the additional ratio, but I guess it makes sense since they're written by the same person. Basically, even without dictionaries, zstd is a really good option for use cases that need a high ratio. That being said, I've been playing around with integrating a dictionary-per-SSTable approach for zstd and it works _really_ well. You can run with a 4kb block size and a fixed 10kb dictionary stored after the offsets in the {{CompressionInfo.db}} file, and for many types of data get ratios similar to what LZ4 achieved with a 64kb block size.
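Why a dictionary helps small blocks can be shown with the JDK's zlib binding, which supports preset dictionaries. This is deflate, not zstd, and the dictionary below is a hand-written sample record rather than a trained one, so it only illustrates the mechanism; the class name is hypothetical. A small block has little history of its own to match against, and the dictionary supplies that history up front. Writer and reader must use the same dictionary bytes, which is why the dictionary has to be stored alongside the data it compresses.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: deflate's preset dictionary as a stand-in for a Zstd dictionary.
final class PresetDictionary
{
    private PresetDictionary() {}

    static byte[] compress(byte[] block, byte[] dictionary)
    {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try
        {
            if (dictionary != null)
                deflater.setDictionary(dictionary); // must precede the first deflate() call
            deflater.setInput(block);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished())
                out.write(buf, 0, deflater.deflate(buf));
            return out.toByteArray();
        }
        finally
        {
            deflater.end();
        }
    }

    static byte[] decompress(byte[] compressed, byte[] dictionary, int originalLength)
    {
        Inflater inflater = new Inflater();
        try
        {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int written = 0;
            while (written < originalLength)
            {
                int n = inflater.inflate(out, written, originalLength - written);
                if (n == 0)
                {
                    if (inflater.needsDictionary())
                    {
                        if (dictionary == null)
                            throw new IllegalStateException("block was compressed with a dictionary");
                        inflater.setDictionary(dictionary); // same bytes the writer used
                    }
                    else if (inflater.finished() || inflater.needsInput())
                    {
                        throw new IllegalStateException("compressed block ended early");
                    }
                }
                written += n;
            }
            return out;
        }
        catch (DataFormatException e)
        {
            throw new IllegalStateException("corrupt compressed block", e);
        }
        finally
        {
            inflater.end();
        }
    }
}
```

With a record that largely repeats the dictionary's structure, the dictionary-compressed block comes out smaller than the same block compressed on its own, and both decompress back to the original bytes.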
Also, because the dictionaries are part of the {{CompressionInfo.db}} files, this approach is compatible with most backup/restore and offline processing systems. I haven't posted a patch yet because this strategy requires non-trivial changes to the {{ICompressor}} contract, which I think would violate the freeze, and, to be frank, I haven't been able to get it fully working in Cassandra itself, I believe due to early re-open (there seems to be a code path that can re-open the {{SSTableReader}}, and in my case it causes reads to use the wrong dictionary and everything blows up). All my prototypes have lived outside Cassandra and use dictionaries stored outside the database, but I think it's safe to say that the "hard" part of adding dictionary support is going to be how to generate and manage them. I think the best path forward will be to hook into training the dictionaries at SSTable write time, store them in {{CompressionInfo.db}}, and then have an instance of the compressor per sstable; the dictionaries can be held off heap by the zstd compressor, so there would be little overhead. With the dictionary-per-sstable approach you'd have a 10kb dictionary per sstable: with 2TB of data and 160MB sstables, that is 2TB / 160MB = 12500 sstables, and 12500 sstables * 10kb of off-heap memory per sstable = 125 megabytes of off-heap memory for the dictionaries. This seems well worth it to me, since you'll probably get another 20-30% ratio. Anyway, I believe dictionaries are going to be a game changer, but I also think they should be a follow-up item to this ticket. Adding _just_ a zstd compressor without dictionaries should be a big win, and then for 5.0 we can work on integrating dictionaries and your in-memory offset representation for a _huge_ win.
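The off-heap estimate in that comment can be checked with a few lines of arithmetic. The sketch uses decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes, 1 kB = 10^3 bytes), as the comment does, and the class name is hypothetical.

```java
// Hypothetical back-of-the-envelope helper for per-sstable dictionary memory.
final class DictionaryMemoryEstimate
{
    private DictionaryMemoryEstimate() {}

    // Number of sstables needed to hold dataBytes when each sstable is sstableBytes.
    static long sstableCount(long dataBytes, long sstableBytes)
    {
        return dataBytes / sstableBytes;
    }

    // Total off-heap bytes if every sstable holds one dictionary of the given size.
    static long dictionaryBytes(long dataBytes, long sstableBytes, long dictionaryBytesPerSstable)
    {
        return sstableCount(dataBytes, sstableBytes) * dictionaryBytesPerSstable;
    }
}
```

With the comment's figures, {{sstableCount(2_000_000_000_000L, 160_000_000L)}} is 12500 and {{dictionaryBytes(2_000_000_000_000L, 160_000_000L, 10_000L)}} is 125,000,000 bytes, i.e. the 125 MB quoted above.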
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713835#comment-16713835 ] Ariel Weisberg commented on CASSANDRA-14482: An exciting addition for compression options. This doesn't seem to use the killer feature of ZSTD which is the possibility of using a dictionary? How good is zstd if you don't take advantage of that?
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713792#comment-16713792 ] Jeff Jirsa commented on CASSANDRA-14482: I'm not [~djoshi3], but here are a few quick notes (he's still the reviewer; I'm just adding my unsolicited 2 cents):
- Please include the license for the library in {{lib/}}
- We're in a freeze, but I can't imagine this would break anyone's testing, and a major version is a great time for a change like this (and I think we need to be doing more of this; updating to modern algorithms is important for the project). It may be worth floating an email to the dev list to see if anyone objects to including it.
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712962#comment-16712962 ] Sushma A Devendrappa commented on CASSANDRA-14482: -- Thank you [~djoshi3] > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: New Feature > Components: Compression, Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Labels: performance, pull-request-available > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712363#comment-16712363 ] Dinesh Joshi commented on CASSANDRA-14482: -- [~sushm...@gmail.com] I can help review this. Please go ahead and create a GH PR. > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: New Feature > Components: Compression, Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Labels: performance, pull-request-available > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711824#comment-16711824 ] Sushma A Devendrappa commented on CASSANDRA-14482: -- [~vinaykumarcse] [~snazy] [~dikanggu] [~jjirsa] Can anyone help me review the PR? > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: New Feature > Components: Compression, Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Labels: performance, pull-request-available > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690790#comment-16690790 ] C. Scott Andreas commented on CASSANDRA-14482: -- [~sushm...@gmail.com] Thanks again for your work on this ticket and your presentation! I've updated the fix version to 4.x as 3.x releases are currently intended for bug fixes. > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: New Feature > Components: Compression, Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Labels: performance > Fix For: 4.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506282#comment-16506282 ] Vinay Chella commented on CASSANDRA-14482: -- (y)(y). Looking forward to your contributions [~sushm...@gmail.com] > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Compression, Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Labels: performance > Fix For: 3.11.x, 4.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506156#comment-16506156 ] Jeff Jirsa commented on CASSANDRA-14482: I think that's more than fair and reasonable. > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Fix For: 3.11.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506145#comment-16506145 ] Sushma A Devendrappa commented on CASSANDRA-14482: -- [~vinaykumarcse] [~jjirsa] [~zznate] do you guys mind if I work on this? I am already working on this internally and would love to take it forward; this will be my first chance to contribute to the community. Thanks, Sushma > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Libraries >Reporter: Sushma A Devendrappa >Assignee: Vinay Chella >Priority: Major > Fix For: 3.11.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505748#comment-16505748 ] Jeff Jirsa commented on CASSANDRA-14482: I'm not Nate, but I assigned it to you > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Libraries >Reporter: Sushma A Devendrappa >Assignee: Vinay Chella >Priority: Major > Fix For: 3.11.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505719#comment-16505719 ] Vinay Chella commented on CASSANDRA-14482: -- [~sushm...@gmail.com] We will also try zstd within C* internally and share our results. [~zznate] As we discussed at the meetup, can you assign this ticket to me? I am happy to work on this. > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Libraries >Reporter: Sushma A Devendrappa >Priority: Major > Fix For: 3.11.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505708#comment-16505708 ] Sushma A Devendrappa commented on CASSANDRA-14482: -- [~jolynch] Thanks for showing interest. I was planning to move this ticket from the wish list to an actionable item. Interesting to see that there is already an implementation for this. I will soon share some benchmark results comparing Deflate and ZSTD in Java using JNI. > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Libraries >Reporter: Sushma A Devendrappa >Priority: Major > Fix For: 3.11.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
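A Deflate-versus-ZSTD comparison like the one proposed here mostly comes down to measuring compression ratio and throughput for each codec on the same input. The sketch that follows is a hypothetical harness (the class name, helper methods, and synthetic sample rows are all assumptions) that times `java.util.zip` DEFLATE at three levels. A zstd codec, for example one backed by a JNI binding such as zstd-jni, is assumed to plug in as another `UnaryOperator<byte[]>` entry but is not shown, because its API lives outside the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.Deflater;

public class CodecBenchSketch {

    /** Ratio and throughput measured for one codec on one input. */
    static final class Result {
        final String name;
        final double ratio;       // uncompressed size / compressed size (higher is better)
        final double mbPerSecond; // compression throughput, 10^6 input bytes per second

        Result(String name, double ratio, double mbPerSecond) {
            this.name = name;
            this.ratio = ratio;
            this.mbPerSecond = mbPerSecond;
        }
    }

    /** java.util.zip DEFLATE at a given level (0-9). */
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[64 * 1024];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    /** Runs the codec once (warm-up, and to compute the ratio), then times `iterations` more runs. */
    static Result measure(String name, UnaryOperator<byte[]> codec, byte[] input, int iterations) {
        byte[] compressed = codec.apply(input);
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            codec.apply(input);
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        double ratio = (double) input.length / compressed.length;
        double mbPerSecond = (input.length * (double) iterations / 1e6) / (elapsedNanos / 1e9);
        return new Result(name, ratio, mbPerSecond);
    }

    /** Hypothetical CSV-like rows standing in for real table data. */
    static byte[] sampleData(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append("row-").append(i).append(",user").append(i % 97).append(",US,2018-06-07\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] input = sampleData(20_000);
        Map<String, UnaryOperator<byte[]>> codecs = new LinkedHashMap<>();
        codecs.put("deflate-1", b -> deflate(b, 1));
        codecs.put("deflate-6", b -> deflate(b, 6));
        codecs.put("deflate-9", b -> deflate(b, 9));
        // A zstd entry (e.g. via a JNI binding) would be registered here the same way.
        for (Map.Entry<String, UnaryOperator<byte[]>> e : codecs.entrySet()) {
            Result r = measure(e.getKey(), e.getValue(), input, 20);
            System.out.printf("%-10s ratio=%.2f  %.1f MB/s%n", r.name, r.ratio, r.mbPerSecond);
        }
    }
}
```

Hand-rolled `System.nanoTime()` loops are sensitive to JIT warm-up and GC pauses, so numbers worth publishing would be better gathered with JMH, on real table data, and with decompression timed as well.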
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505322#comment-16505322 ] Joseph Lynch commented on CASSANDRA-14482: -- [~sushm...@gmail.com] this is a really interesting idea, especially if the [benchmarks|https://github.com/facebook/zstd#benchmarks] are accurate, as it would mean that zstd should strictly dominate deflate (3x faster with a better compression ratio), much as LZ4 strictly dominates Snappy these days. It looks like someone has already created a zstd compressor [implementation for Cassandra|https://github.com/MatejTymes/cassandra-zstd#installation]; I'd be curious how that benchmarks against deflate. > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: Wish > Components: Libraries >Reporter: Sushma A Devendrappa >Priority: Major > Fix For: 3.11.x > > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/