xiangfu0 commented on code in PR #18860:
URL: https://github.com/apache/pinot/pull/18860#discussion_r3487853657


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -1108,6 +1111,47 @@ public void reloadSegment(String segmentName, 
IndexLoadingConfig indexLoadingCon
         _logger.warn("Skipping reload for segment: {} — concurrently offloaded 
after dispatch", segmentName);
         return;
       }
+      // For an upsert table with metadata TTL and snapshots enabled, a reload 
must not blindly re-scan the segment and
+      // re-add every primary key, as that would resurrect keys already 
expired (metadataTTL) or deleted
+      // (deletedKeysTTL). Instead we rebuild the upsert metadata from the 
persisted validDocIds snapshot (valid docs
+      // only). The snapshot is indexed by docId, so it is only meaningful 
when the segment content (CRC) is unchanged:
+      //   - normal reload: only proceed when the CRC matches and a snapshot 
exists, else fail the reload;
+      //   - forceDownload reload: always proceed, using the snapshot if 
present (operator accepts a possible CRC
+      //     mismatch), otherwise fall back to a regular full scan of the 
downloaded segment.
+      // Gated on snapshots being enabled so tables that never persist a 
snapshot (e.g. deletedKeysTTL-only without
+      // snapshot) keep the regular reload behavior instead of always failing.
+      byte[] validDocIdsSnapshotBytes = null;
+      UpsertContext upsertContext = isUpsertEnabled() ? 
_tableUpsertMetadataManager.getContext() : null;
+      if (upsertContext != null && upsertContext.isSnapshotEnabled()
+          && (upsertContext.getMetadataTTL() > 0 || 
upsertContext.getDeletedKeysTTL() > 0)) {
+        File snapshotFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(indexDir),
+            V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME);
+        boolean snapshotExists = snapshotFile.exists();
+        boolean crcMatch = hasSameCRC(zkMetadata, localMetadata);
+        if (!forceDownload && (!crcMatch || !snapshotExists)) {
+          throw new IllegalStateException(String.format(
+              "Failing reload for segment: %s of upsert table with metadata 
TTL: %s because %s. Use a forceDownload "
+                  + "reload if you are okay with a CRC mismatch on the segment 
and confident the local validDocIds "
+                  + "snapshot's valid docs map to the deep-store segment; 
otherwise the forceDownload reload will scan "
+                  + "the entire segment data.", segmentName, 
_tableNameWithType,
+              !crcMatch ? "the segment CRC has changed from: " + 
localMetadata.getCrc() + " to: " + zkMetadata.getCrc()
+                  : "no validDocIds snapshot is available"));
+        }
+        if (snapshotExists) {
+          if (forceDownload && !crcMatch) {

Review Comment:
   `validDocIds` are docId-position based, so once `hasSameCRC()` says the 
downloaded segment is a different copy there is no proof this local snapshot 
still points at the same rows. Reusing it here can silently rebuild upsert 
state against the wrong docIds and drop live keys or preserve expired/deleted 
ones. Please fail closed on CRC/dataCRC mismatch unless the snapshot was 
produced from the downloaded segment itself.



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -1108,6 +1111,47 @@ public void reloadSegment(String segmentName, 
IndexLoadingConfig indexLoadingCon
         _logger.warn("Skipping reload for segment: {} — concurrently offloaded 
after dispatch", segmentName);
         return;
       }
+      // For an upsert table with metadata TTL and snapshots enabled, a reload 
must not blindly re-scan the segment and
+      // re-add every primary key, as that would resurrect keys already 
expired (metadataTTL) or deleted
+      // (deletedKeysTTL). Instead we rebuild the upsert metadata from the 
persisted validDocIds snapshot (valid docs
+      // only). The snapshot is indexed by docId, so it is only meaningful 
when the segment content (CRC) is unchanged:
+      //   - normal reload: only proceed when the CRC matches and a snapshot 
exists, else fail the reload;
+      //   - forceDownload reload: always proceed, using the snapshot if 
present (operator accepts a possible CRC
+      //     mismatch), otherwise fall back to a regular full scan of the 
downloaded segment.
+      // Gated on snapshots being enabled so tables that never persist a 
snapshot (e.g. deletedKeysTTL-only without
+      // snapshot) keep the regular reload behavior instead of always failing.
+      byte[] validDocIdsSnapshotBytes = null;
+      UpsertContext upsertContext = isUpsertEnabled() ? 
_tableUpsertMetadataManager.getContext() : null;
+      if (upsertContext != null && upsertContext.isSnapshotEnabled()
+          && (upsertContext.getMetadataTTL() > 0 || 
upsertContext.getDeletedKeysTTL() > 0)) {
+        File snapshotFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(indexDir),
+            V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME);
+        boolean snapshotExists = snapshotFile.exists();
+        boolean crcMatch = hasSameCRC(zkMetadata, localMetadata);
+        if (!forceDownload && (!crcMatch || !snapshotExists)) {

Review Comment:
   This changes the existing reload contract for clusters that set 
`instance.check.crc.on.segment.load=false`. Today that knob means "skip CRC 
enforcement and keep reloading the local segment"; this branch now hard-fails 
TTL+snapshot upsert reloads even when CRC checking is explicitly disabled. That 
is an operability/backward-compat regression for existing deployments. Please 
gate the fail-fast path on `shouldCheckCRCOnSegmentLoad()` or add a separate 
explicit knob for the stricter behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to