Re: [PR] CASSANDRA-21209 Rework ZSTD dictionary compression logic to create a trainer per training [cassandra]

via GitHub Thu, 12 Mar 2026 09:57:41 -0700


smiklosovic commented on code in PR #4667:
URL: https://github.com/apache/cassandra/pull/4667#discussion_r2926039596



##########
src/java/org/apache/cassandra/db/compression/CompressionDictionaryScheduler.java:
##########
@@ -83,31 +87,56 @@ public void scheduleRefreshTask()
     }
 
     @Override
-    public void scheduleSSTableBasedTraining(ICompressionDictionaryTrainer 
trainer,
-                                             ColumnFamilyStore.RefViewFragment 
refViewFragment,
+    public void scheduleSSTableBasedTraining(ColumnFamilyStore.RefViewFragment 
refViewFragment,
+                                             CompressionParams 
compressionParams,
                                              
CompressionDictionaryTrainingConfig config,
+                                             Consumer<CompressionDictionary> 
listener,
                                              boolean force)
     {
-        if (!manualTrainingInProgress.compareAndSet(false, true))
+        if (!trainingInProgress.compareAndSet(false, true))
         {
             refViewFragment.close();
             throw new IllegalStateException("Training already in progress for 
table " + keyspaceName + '.' + tableName);
         }
 
-        logger.info("Starting SSTable-based dictionary training for {}.{} from 
{} SSTables",
-                    keyspaceName, tableName, refViewFragment.sstables.size());
+        ICompressionDictionaryTrainer trainer;
 
-        // Run the SSTableSamplingTask asynchronously
-        SSTableSamplingTask task = new SSTableSamplingTask(refViewFragment, 
trainer, config, force);
-        ScheduledExecutors.nonPeriodicTasks.submit(task);
+        try
+        {
+            trainer = ICompressionDictionaryTrainer.create(keyspaceName, 
tableName, compressionParams);

Review Comment:
   whole execution chain (from manager.train) does everything to postpone 
trainer creation until it is absolutely necessary and all is OK, as the 
instantiation of a trainer might be memory-wise very demanding (when max sample 
size is not trivial) as it allocates a direct ByteBuffer. We do not want to 
create a trainer allocating a  big buffer just in case to throw it away if 
something else goes south. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] CASSANDRA-21209 Rework ZSTD dictionary compression logic to create a trainer per training [cassandra]

Reply via email to