yifan-c commented on code in PR #4667:
URL: https://github.com/apache/cassandra/pull/4667#discussion_r2969093742


##########
src/java/org/apache/cassandra/db/compression/CompressionDictionaryScheduler.java:
##########
@@ -83,31 +88,59 @@ public void scheduleRefreshTask()
     }
 
     @Override
-    public void scheduleSSTableBasedTraining(ICompressionDictionaryTrainer 
trainer,
-                                             ColumnFamilyStore.RefViewFragment 
refViewFragment,
+    public void scheduleSSTableBasedTraining(ColumnFamilyStore.RefViewFragment 
refViewFragment,
+                                             CompressionParams 
compressionParams,
                                              
CompressionDictionaryTrainingConfig config,
+                                             Consumer<CompressionDictionary> 
listener,
                                              boolean force)
     {
-        if (!manualTrainingInProgress.compareAndSet(false, true))
+        if (!trainingInProgress.compareAndSet(false, true))
         {
             refViewFragment.close();
             throw new IllegalStateException("Training already in progress for 
table " + keyspaceName + '.' + tableName);
         }
 
-        logger.info("Starting SSTable-based dictionary training for {}.{} from 
{} SSTables",
-                    keyspaceName, tableName, refViewFragment.sstables.size());
+        ICompressionDictionaryTrainer trainer;
 
-        // Run the SSTableSamplingTask asynchronously
-        SSTableSamplingTask task = new SSTableSamplingTask(refViewFragment, 
trainer, config, force);
-        ScheduledExecutors.nonPeriodicTasks.submit(task);
+        try
+        {
+            trainer = ICompressionDictionaryTrainer.create(keyspaceName, 
tableName, compressionParams);
+            trainer.setDictionaryTrainedListener(listener);
+        }
+        catch (Throwable t)
+        {
+            trainingInProgress.set(false);
+            refViewFragment.close();
+            throw t;
+        }
+
+        if (trainer.start(config))
+        {
+            activeTrainer = trainer;
+            lastTrainingState.set(trainer.getTrainingState());
+            logger.info("Starting SSTable-based dictionary training for {}.{} 
from {} SSTables",
+                        keyspaceName, tableName, 
refViewFragment.sstables.size());
+
+            SSTableSamplingTask task = new 
SSTableSamplingTask(refViewFragment, trainer, config, force);
+            // trainer is eventually closed here, as well as indicating
+            // in manualTrainingInProgress that it was finished
+            ScheduledExecutors.nonPeriodicTasks.submit(task);
+        }
+        else
+        {
+            finishTraining(trainer.getTrainingState());
+            cleanup(refViewFragment, trainer);
+        }
     }
 
     /**
      * Cancels the in-progress manual training task.
      */
-    private void cancelManualTraining()
+    private void finishTraining(TrainingState trainingState)
     {
-        manualTrainingInProgress.compareAndSet(true, false);
+        activeTrainer = null;
+        lastTrainingState.set(trainingState);

Review Comment:
   I think `lastTrainingState.set(trainingState)` should be called before 
`activeTrainer = null`, so that `getLastTrainingState()` method always return 
the latest state.
   Probably add a comment to note the expected sequence.



##########
src/java/org/apache/cassandra/db/compression/CompressionDictionaryTrainingConfig.java:
##########
@@ -67,12 +85,121 @@ public Builder chunkSize(int chunkSize)
             return this;
         }
 
+        public Builder minTrainingFrequency(int minTrainingFrequency)
+        {
+            this.minTrainingFrequency = minTrainingFrequency;
+            return this;
+        }
+
         public CompressionDictionaryTrainingConfig build()
         {
             Preconditions.checkArgument(maxDictionarySize > 0, 
"maxDictionarySize must be positive");
             Preconditions.checkArgument(maxTotalSampleSize > 0, 
"maxTotalSampleSize must be positive");
             Preconditions.checkArgument(chunkSize > 0, "chunkSize must be 
positive");
+            Preconditions.checkArgument(minTrainingFrequency >= 0, "min 
training frequency must be positive");

Review Comment:
   The message says "must be positive" but the condition is `>= 0`. Since 0 is 
a valid value, let's update the message to use "non-negative"



##########
src/java/org/apache/cassandra/db/compression/ZstdDictionaryTrainer.java:
##########
@@ -298,8 +295,14 @@ public boolean start(CompressionDictionaryTrainingConfig 
trainingConfig)
         }
         catch (Exception e)
         {
-            logger.warn("Failed to create ZstdDictTrainer for {}.{}", 
keyspaceName, tableName, e);
-            failureMessage = "Failed to create ZstdDictTrainer: " + 
e.getMessage();
+            String message = String.format("Failed to create %s for %s.%s, 
reason: %s",
+                                           ZstdDictTrainer.class,

Review Comment:
   how about just print the simple name `ZstdDictTrainer.class.getSimpleName()`?
   
   I think `ZstdDictTrainer.class` prints `"class 
com.github.luben.zstd.ZstdDictTrainer"`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to