rakeshadr commented on code in PR #10074:
URL: https://github.com/apache/ozone/pull/10074#discussion_r3180476391


##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconStorageContainerManagerFacade.java:
##########
@@ -432,34 +437,76 @@ public void start() {
     } else {
       initializePipelinesFromScm();
     }
-    LOG.debug("Started the SCM Container Info sync scheduler.");
-    long interval = ozoneConfiguration.getTimeDuration(
-        OZONE_RECON_SCM_SNAPSHOT_TASK_INTERVAL_DELAY,
-        OZONE_RECON_SCM_SNAPSHOT_TASK_INTERVAL_DEFAULT, TimeUnit.MILLISECONDS);
-    long initialDelay = ozoneConfiguration.getTimeDuration(
-        OZONE_RECON_SCM_SNAPSHOT_TASK_INITIAL_DELAY,
-        OZONE_RECON_SCM_SNAPSHOT_TASK_INITIAL_DELAY_DEFAULT,
+    // -----------------------------------------------------------------------
+    // Scheduler (incremental/targeted sync): runs every 1h (default).
+    //
+    // Each cycle calls decideSyncAction() — two lightweight count RPCs to SCM
+    // — and then:
+    //
+    //   |total drift| > threshold (default 100,000)
+    //       → full snapshot: replace Recon's entire SCM DB from SCM checkpoint
+    //
+    //   0 < |total drift| <= threshold
+    //       → targeted sync: 4-pass incremental repair
+    //
+    //   total drift = 0 but per-state drift (OPEN or QUASI_CLOSED) > threshold (default 5)
+    //       → targeted sync: corrects containers stuck in a stale lifecycle state
+    //
+    //   no drift detected
+    //       → no action this cycle
+    //
+    // Running this on a 1h cadence means container state discrepancies are
+    // detected and corrected without an unconditional periodic full snapshot.
+    // -----------------------------------------------------------------------
+    long syncInterval = ozoneConfiguration.getTimeDuration(
+        OZONE_RECON_SCM_CONTAINER_SYNC_TASK_INTERVAL_DELAY,
+        OZONE_RECON_SCM_CONTAINER_SYNC_TASK_INTERVAL_DEFAULT, TimeUnit.MILLISECONDS);
+    long syncInitialDelay = ozoneConfiguration.getTimeDuration(
+        OZONE_RECON_SCM_CONTAINER_SYNC_TASK_INITIAL_DELAY,
+        OZONE_RECON_SCM_CONTAINER_SYNC_TASK_INITIAL_DELAY_DEFAULT,
         TimeUnit.MILLISECONDS);
-    // This periodic sync with SCM container cache is needed because during
-    // the window when recon will be down and any container being added
-    // newly and went missing, that container will not be reported as missing by
-    // recon till there is a difference of container count equivalent to
-    // threshold value defined in "ozone.recon.scm.container.threshold"
-    // between SCM container cache and recon container cache.
+    LOG.debug("Started the SCM Container Info sync scheduler (interval={}ms, initialDelay={}ms).",
+        syncInterval, syncInitialDelay);
     scheduler.scheduleWithFixedDelay(() -> {
+      if (!isSyncDataFromSCMRunning.compareAndSet(false, true)) {
+        LOG.debug("SCM container info sync is already running; skipping this cycle.");
+        return;
+      }
       try {
-        boolean isSuccess = syncWithSCMContainerInfo();
-        if (!isSuccess) {
-          LOG.debug("SCM container info sync is already running.");
+        ReconStorageContainerSyncHelper.SyncAction action =
+            containerSyncHelper.decideSyncAction();
+        switch (action) {
+        case FULL_SNAPSHOT:
+          LOG.info("Tiered sync decision: FULL_SNAPSHOT. "
+              + "Replacing Recon SCM DB with fresh SCM checkpoint.");

Review Comment:
   With TARGETED_SYNC running every hour by default (the interval is
configurable) and keeping drift minimal, FULL_SNAPSHOT should rarely be needed
in steady state. The only realistic case that leads to FULL_SNAPSHOT is when
Recon has been completely down for many hours or days and comes back up with a
large drift.
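   
   To make the tiers concrete, here is a minimal sketch of the decision as the
code comment in the diff describes it. It is not the PR's actual
decideSyncAction(): the class name, the method signature, and the NO_ACTION
constant are assumptions for illustration, and the thresholds are passed in
rather than read from configuration.
   
   ```java
   /** Hypothetical sketch of the tiered sync decision; not the PR's code. */
   final class TieredSyncDecision {

     /** FULL_SNAPSHOT and TARGETED_SYNC appear in the PR; NO_ACTION is assumed. */
     enum SyncAction { FULL_SNAPSHOT, TARGETED_SYNC, NO_ACTION }

     private TieredSyncDecision() { }

     /**
      * @param scmTotal          container count reported by SCM
      * @param reconTotal        container count known to Recon
      * @param maxPerStateDrift  largest per-state drift (e.g. OPEN, QUASI_CLOSED)
      * @param totalThreshold    total-drift threshold (100,000 by default per the diff)
      * @param perStateThreshold per-state threshold (5 by default per the diff)
      */
     static SyncAction decide(long scmTotal, long reconTotal,
         long maxPerStateDrift, long totalThreshold, long perStateThreshold) {
       long totalDrift = Math.abs(scmTotal - reconTotal);
       if (totalDrift > totalThreshold) {
         return SyncAction.FULL_SNAPSHOT;   // too far apart: replace the whole DB
       }
       if (totalDrift > 0) {
         return SyncAction.TARGETED_SYNC;   // small gap: incremental repair
       }
       if (maxPerStateDrift > perStateThreshold) {
         return SyncAction.TARGETED_SYNC;   // counts match, lifecycle states are stale
       }
       return SyncAction.NO_ACTION;         // nothing to do this cycle
     }
   }
   ```
   
   With the defaults, a Recon that is 200,000 containers behind SCM would take
the FULL_SNAPSHOT branch, while a gap of 10 would only trigger a targeted sync.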
   
   Since FULL_SNAPSHOT is considerably resource-intensive, how about exposing
it to users as a command-line option?
   ```
   ozone admin recon trigger-scm-snapshot
   # prints:
   # WARNING: This downloads the full SCM checkpoint. On large clusters this
   # can be several GB and take minutes. SCM will be under I/O load.
   # Are you sure? (yes/no): _
   ```
   
   ```
   ozone admin recon scm-snapshot-status
   # Output:
   # Status: IN_PROGRESS
   # Started: 14:28:03
   # Phase: Downloading checkpoint from SCM (file transfer)
   # Duration so far: 4m 32s
   # Cancel: run 'ozone admin recon cancel-scm-snapshot' (safe until DB swap begins)
   ```
   
   ```
   ozone admin recon cancel-scm-snapshot
   ```
   
   SCM streams the RocksDB checkpoint files over RPC/HTTP to Recon, so the
download can be interrupted. Sample code showing the cancellation:
   
   ```
   // scmServiceProvider.getSCMDBSnapshot() is a blocking call
   DBCheckpoint dbSnapshot = scmServiceProvider.getSCMDBSnapshot();
   ```
   
   If you run this on a dedicated Future / Thread, you can call
future.cancel(true), which sends a thread interrupt. As long as the underlying
read is interruptible, it fails with an InterruptedIOException and the download
stops promptly (a plain blocking socket read may not react to an interrupt, in
which case the stream would also have to be closed). This is efficient and
clean: nothing has been written to the final DB location yet, so the temp
checkpoint dir can simply be deleted.
   
   ```
   Future<?> snapshotFuture = executor.submit(() -> {
       DBCheckpoint snap = scmServiceProvider.getSCMDBSnapshot();
       initializeNewRdbStore(snap.getCheckpointLocation().toFile());
   });
   // cancel command arrives:
   snapshotFuture.cancel(true);  // interrupts the download thread
   ```
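   
   Building on the Future idea above, a slightly fuller sketch of how the
"safe until DB swap begins" rule and the status phases could fit together. All
names here (CancellableSnapshotTask, Phase, the download/swap/cleanupTemp
callbacks) are hypothetical; in Recon the download callback would wrap
scmServiceProvider.getSCMDBSnapshot() and the swap callback would wrap
initializeNewRdbStore(...). It also assumes the download reacts to a thread
interrupt.
   
   ```java
   import java.util.concurrent.Callable;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;
   import java.util.concurrent.atomic.AtomicReference;

   /** Hypothetical sketch: a snapshot task that is cancellable only while downloading. */
   final class CancellableSnapshotTask {

     /** Phases a status command could report; names are illustrative. */
     enum Phase { IDLE, DOWNLOADING, SWAPPING, DONE, CANCELLED, FAILED }

     private final ExecutorService executor = Executors.newSingleThreadExecutor();
     private final AtomicReference<Phase> phase = new AtomicReference<>(Phase.IDLE);
     private Future<?> future;  // guarded by "this"

     /** Runs download, then swap, on a dedicated thread. */
     synchronized void start(Callable<Void> download, Runnable swap,
         Runnable cleanupTemp) {
       phase.set(Phase.DOWNLOADING);
       future = executor.submit(() -> {
         try {
           download.call();  // blocking; assumed to be interruptible
           // Point of no return: once this CAS succeeds, cancel() is refused.
           if (!phase.compareAndSet(Phase.DOWNLOADING, Phase.SWAPPING)) {
             cleanupTemp.run();  // cancelled just as the download finished
             return;
           }
           swap.run();
           phase.set(Phase.DONE);
         } catch (Exception e) {
           // Interrupted or failed: remove the temp checkpoint directory.
           cleanupTemp.run();
           phase.updateAndGet(p -> p == Phase.CANCELLED ? p : Phase.FAILED);
         }
       });
     }

     /** Returns true if the cancel was accepted, false once the swap has begun. */
     synchronized boolean cancel() {
       if (future == null
           || !phase.compareAndSet(Phase.DOWNLOADING, Phase.CANCELLED)) {
         return false;
       }
       future.cancel(true);  // interrupts the download thread
       return true;
     }

     Phase phase() {
       return phase.get();
     }

     void shutdown() {
       executor.shutdownNow();
     }
   }
   ```
   
   The single compareAndSet from DOWNLOADING decides the race between a cancel
request and the start of the DB swap, so a status command can report the phase
and the cancel command can give a clear yes/no answer.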



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

