There is a guardrail (1) that enforces at least some table compression. In 
other words, it will not let you create or alter a table without compression. 
We do not have any way to configure the default compression algorithm. That is 
being done under (2), which is abandoned for now. I would personally love to 
see that finished. We spent a lot of time on it, and the TCM guys were helpful 
in resolving some issues along the way.

(1) https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L2250-L2251
(2) https://issues.apache.org/jira/browse/CASSANDRA-12937

From: Yifan Cai <[email protected]>
Date: Friday, 5 September 2025 at 19:03
To: [email protected] <[email protected]>
Subject: Re: [DISCUSS] CEP-54: ZSTD Compression with Dictionary Support

Hi Jindal,

Thanks for the questions. As Stefan mentioned (thanks to Stefan too), ZSTD 
dictionary compression is toggled via DDL at the table level. It is exactly the 
same way we configure compression for tables as of today. It is captured in the 
"New or Changed Public Interfaces" section of the CEP.

There is a cassandra-zstd-discuss slack channel. Please join if you are 
interested.

Hi Dinesh,

A "way to allowlist compression strategies as admins" sounds useful. It could 
be a guardrail if it is not there yet and can be added separately.

"If the compression is left off, we should default to something sensible."
Maybe you can elaborate. I think the current behavior is no compression if 
compression is not configured. It might not be ideal in some cases. Maybe there 
could be an admin flag that admins can turn on to always compress with the 
default compressor (LZ4) when no compression is configured? If it sounds 
interesting, the toggle could be added separately too.
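
To make the idea concrete, here is a minimal sketch of how such a toggle might resolve the compressor for a table. The function name, the flag name and the constant are hypothetical and are not existing Cassandra settings; only the LZ4Compressor class name comes from Cassandra itself.

```python
from typing import Optional

# Hypothetical constant; the default compressor mentioned above is LZ4.
DEFAULT_COMPRESSOR = "LZ4Compressor"


def effective_compressor(table_compressor: Optional[str],
                         always_compress: bool) -> Optional[str]:
    """Return the compressor class to use for a table, or None for no compression.

    table_compressor: the class named in the table schema, or None if the
        schema does not configure compression.
    always_compress: the hypothetical admin flag discussed above.
    """
    if table_compressor is not None:
        # An explicit table-level choice always wins.
        return table_compressor
    if always_compress:
        # Nothing configured, but the operator asked for a default.
        return DEFAULT_COMPRESSOR
    # Nothing configured and the flag is off: leave the table uncompressed.
    return None
```

With the flag off, an unconfigured table stays uncompressed, so existing behavior would not change unless an admin opts in.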

- Yifan

On Fri, Sep 5, 2025 at 9:33 AM Dinesh Joshi <[email protected]> wrote:
On a related note, I don't recall if we have any way to allowlist compression 
strategies as admins. If not, it would be very helpful in cases where the DB 
operator wants to guard against users who forget or do not set compression in 
their schema. If the compression is left off, we should default to something 
sensible.

On Fri, Sep 5, 2025 at 8:54 AM Štefan Miklošovič <[email protected]> wrote:
Hi,

In the table schema, there would be the table compression configuration, such 
as new options for enabling compression, the sampling strategy, etc.

Then in cassandra.yaml: auto-training, auto-pruning of obsolete dictionaries, 
training frequency, acceptance percentage, dictionary size, memory limits, etc.

I took this from the cassandra-zstd-discuss channel, where this was discussed 
and answered when I asked the same question as you.

AFAIK it will be at the _table level_.

You would just alter your table and change compression to some other 
compression strategy or you might just go to Zstd without dictionaries.
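
As an illustration only, the table-level change could be scripted along these lines. The helper name and the keyspace and table names are hypothetical. The sketch uses only the existing `class` key with compressor classes that exist today, since the dictionary-related options are defined in the CEP and not assumed here. It does no quoting or escaping of identifiers.

```python
def alter_compression_cql(keyspace: str, table: str, options: dict) -> str:
    """Build an ALTER TABLE statement that sets the table's compression map."""
    # Render the options as a CQL map literal with single-quoted keys and values.
    pairs = ", ".join(f"'{k}': '{v}'" for k, v in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compression = {{{pairs}}};"


# Switch a (hypothetical) table to Zstd without dictionaries.
to_zstd = alter_compression_cql("ks", "events", {"class": "ZstdCompressor"})

# Switch the same table to another compression strategy.
to_lz4 = alter_compression_cql("ks", "events", {"class": "LZ4Compressor"})
```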

Regards



On Fri, Sep 5, 2025 at 5:23 PM Jindal, Himanshu <[email protected]> wrote:
Hi Yifan,
This looks very promising for customers aiming to improve Cassandra 
performance. I had a few questions on the user experience:

  *   How does a user enable this feature—via YAML config or through CQL DDL?

  *   If it’s CQL, is it applied at the keyspace or table level?

  *   Is the process for disabling the feature the same?
Thanks,
Himanshu


From: Yifan Cai <[email protected]>
Date: Thursday, September 4, 2025 at 7:00 PM
To: [email protected] <[email protected]>
Subject: RE: [EXTERNAL] [DISCUSS] CEP-54: ZSTD Compression with Dictionary Support

Noted with thanks.

I agree that it does not need to be zstd-specific. The additional dictionary 
information in CompressionInfo is the dictionary id, the dictionary bytes, and 
a checksum of the id and content. It should be common to other dictionary-based 
compression algorithms. I will keep this in mind for the implementation.
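
A rough sketch of what such an algorithm-agnostic record might look like is below. It is not the CEP's on-disk format: the class name, the 8-byte id width and the choice of CRC32 are all assumptions made for illustration.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryRecord:
    """Algorithm-agnostic dictionary metadata: id, raw bytes, checksum."""
    dict_id: int
    content: bytes
    checksum: int

    @staticmethod
    def create(dict_id: int, content: bytes) -> "DictionaryRecord":
        # Compute the checksum once, at creation time.
        return DictionaryRecord(dict_id, content, _checksum(dict_id, content))

    def is_valid(self) -> bool:
        # Recompute over id and content and compare with the stored value.
        return self.checksum == _checksum(self.dict_id, self.content)


def _checksum(dict_id: int, content: bytes) -> int:
    # CRC32 over the big-endian 8-byte id followed by the dictionary bytes.
    # CRC32 and the 8-byte id width are assumptions for this sketch.
    return zlib.crc32(struct.pack(">q", dict_id) + content)
```

Nothing in the record names a compressor, so the same fields could describe a dictionary for zstd or for any other dictionary-based algorithm.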

- Yifan

On Thu, Sep 4, 2025 at 5:49 PM David Capwell <[email protected]> wrote:
Thanks for bringing this out!

My first question, from a quick look at this, is whether we can make the 
CompressionInfo change agnostic to the algorithm, or have the format change 
based on the algorithm. LZ4 has a similar feature (though not as easy to use as 
zstd's), and new algorithms that we want to include later on might come out. It 
would be a shame to have the format tightly coupled to zstd only.


On Sep 4, 2025, at 1:50 PM, Yifan Cai <[email protected]> wrote:

Hi community,

We would like to propose CEP-54: ZSTD Compression with Dictionary Support for 
adoption by the community:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-54%3A+ZSTD+with+Dictionary+SSTable+Compression

This CEP proposes introducing ZSTD with dictionary compression for SSTables. 
This feature allows users who need it to achieve significant improvements in 
compression ratio and speed, leading to better performance and storage 
efficiency. This is an entirely opt-in feature.

The proposed ZSTD with dictionary support will enable organizations to achieve:

- Faster read/write performance.
- Reduced storage footprint.
- Increased storage device lifetime from fewer writes.

Key design principles:

- Zero impact on users who don't enable the feature.
- Initial emphasis on simplicity, supporting a single global dictionary per 
table and manual training, while maintaining extensibility for future 
automation.
- SSTable-attached dictionaries to ensure that operations like backup, restore, 
and streaming continue to work seamlessly.
- Graceful fallback to standard ZSTD compression when a dictionary isn't 
available.
- A critical design constraint to avoid a large number of unique dictionaries, 
which can hurt decompression speed.

This enhancement addresses the need for better storage efficiency and 
performance by leveraging ZSTD dictionaries, while maintaining complete 
backward compatibility and requiring no changes to existing deployments that do 
not enable the feature.

Thanks to Jon Haddad for bringing up the topic and providing feedback that 
shaped the design, and to Dinesh Joshi, Joey Lynch, Stefan Miklosovic, and 
Francisco Guerrero for providing design feedback.
Thanks in advance for your time and feedback. Please keep the discussion on 
this mailing list thread.

- Yifan
