[jira] [Created] (CASSANDRA-21194) Sampling data for dictionary training on more than Integer.MAX_VALUE bytes is pointless

Stefan Miklosovic (Jira) Fri, 27 Feb 2026 02:52:39 -0800

Stefan Miklosovic created CASSANDRA-21194:
---------------------------------------------


             Summary: Sampling data for dictionary training on more than 
Integer.MAX_VALUE bytes is pointless
                 Key: CASSANDRA-21194
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21194
             Project: Apache Cassandra
          Issue Type: Improvement
          Components: Feature/Compression
            Reporter: Stefan Miklosovic


The ZstdDictTrainer from the zstd-jni library we use allocates its buffer for 
training samples with ByteBuffer.allocateDirect(size), where {{size}} is an 
int. Integer.MAX_VALUE is 2^31 - 1 bytes, just under 2 GiB. So if a user wants 
to sample more than that, say 3 GiB, sampling simply stops at about 2 GiB and, 
judging by the training output, appears to be stuck. We should validate this 
value before training and reject anything larger than Integer.MAX_VALUE bytes.
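
A minimal sketch of what such a check could look like, assuming the requested 
sampling size arrives as a long number of bytes. The class, method, and 
constant names are hypothetical and are not taken from the Cassandra code base 
or from zstd-jni; only the int limit of ByteBuffer.allocateDirect comes from 
the description above.

```java
// Hypothetical helper, for illustration only; not part of Cassandra or zstd-jni.
final class SamplingSizeValidator {
    // ByteBuffer.allocateDirect(int) takes an int capacity, so this is the
    // largest sample buffer that can be requested in a single allocation.
    static final long MAX_SAMPLING_BYTES = Integer.MAX_VALUE;

    // Returns the size as an int if it is usable, otherwise rejects it up front
    // so that training never starts with a size it cannot honour.
    static int validateSamplingSize(long requestedBytes) {
        if (requestedBytes <= 0) {
            throw new IllegalArgumentException(
                "Sampling size must be positive, got " + requestedBytes + " bytes");
        }
        if (requestedBytes > MAX_SAMPLING_BYTES) {
            throw new IllegalArgumentException(
                "Sampling size of " + requestedBytes + " bytes exceeds the maximum of "
                + MAX_SAMPLING_BYTES + " bytes");
        }
        return (int) requestedBytes; // safe: range was checked above
    }

    public static void main(String[] args) {
        // 512 MiB fits into an int and is accepted.
        System.out.println(validateSamplingSize(512L * 1024 * 1024));

        // 3 GiB does not fit and is rejected before any sampling starts.
        try {
            validateSamplingSize(3L * 1024 * 1024 * 1024);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast this way would turn the apparent hang into an immediate error 
message that names the limit.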



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

