swaminathanmanish commented on code in PR #15048:
URL: https://github.com/apache/pinot/pull/15048#discussion_r1954012191


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/SegmentDeletionManager.java:
##########
@@ -282,6 +284,23 @@ protected void removeSegmentFromStore(String tableNameWithType, String segmentId
     }
   }
 
+  /**
+   * Gets URI for segment deletion by:
+   * 1. Fetching download URL from ZK metadata if available
+   * 2. Otherwise, constructing URI from data dir, table name and segment ID
+   */
+  private URI getFileToDeleteURI(String tableNameWithType, String segmentId) {
+    String segmentDownloadUrl =

Review Comment:
   A failure can occur between the segment metadata cleanup and the deep store (DS) cleanup. At that point we will no longer have the download URL, but standardizing the URL format will help there.
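   The two-step lookup described in the Javadoc above (metadata download URL first, then a URI built from the data dir, raw table name and segment ID) could look roughly like the minimal sketch below. The class and method names are hypothetical, the metadata URL is passed in as a plain string instead of being read from ZK, and `URLEncoder` stands in for `URIUtils.encode`, whose exact encoding may differ.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SegmentUriResolver {

  /**
   * Hypothetical stand-in for getFileToDeleteURI: prefer the download URL recorded in
   * segment metadata, otherwise fall back to <dataDir>/<rawTableName>/<encoded segmentId>.
   */
  public static URI resolve(String downloadUrlFromMetadata, String dataDir, String rawTableName,
      String segmentId) {
    if (downloadUrlFromMetadata != null && !downloadUrlFromMetadata.isEmpty()) {
      // Metadata still exists: trust the recorded download URL.
      return URI.create(downloadUrlFromMetadata);
    }
    // Metadata is gone (e.g. failure between metadata cleanup and DS cleanup):
    // rebuild the path. URLEncoder is an assumed substitute for URIUtils.encode;
    // '+' is swapped for "%20" so spaces are encoded the way a URI path expects.
    String encodedSegmentId =
        URLEncoder.encode(segmentId, StandardCharsets.UTF_8).replace("+", "%20");
    String base = dataDir.endsWith("/") ? dataDir.substring(0, dataDir.length() - 1) : dataDir;
    return URI.create(base + "/" + rawTableName + "/" + encodedSegmentId);
  }
}
```

   The fallback only finds the right file if the segment was originally stored under that same `<dataDir>/<rawTableName>/<segmentId>` layout, which is why standardizing the URL matters once the metadata is already gone.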
   
   We need two things:
   1. Standardization of the download URL (in BaseMultipleSegmentsConversionExecutor and any other places). That will fix cleanup of new segments going forward. Validating the downloadUrl format may also catch non-standard URLs in the future.
   2. A full scan of the DS to clean up old segments with the .tar.gz extension. This will be a one-time job.
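   Both items could be sketched roughly as below. This assumes the standard layout is `<dataDir>/<rawTableName>/<segmentName>` with no `.tar.gz` suffix, and that a legacy file is safe to delete only when its segment is no longer in ZK. The DS listing is modeled as a plain `List<String>` of file names rather than a PinotFS call, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LegacySegmentScan {

  private static final String LEGACY_SUFFIX = ".tar.gz";

  /** Hypothetical validation: the URL must end with /<rawTableName>/<segmentName> exactly. */
  public static boolean isStandardDownloadUrl(String downloadUrl, String rawTableName,
      String segmentName) {
    // A legacy URL ending in ".tar.gz" fails this check because of the extra suffix.
    return downloadUrl.endsWith("/" + rawTableName + "/" + segmentName);
  }

  /**
   * One-time scan sketch: from the file names under one table directory, return the
   * legacy ".tar.gz" files whose segment is no longer present in ZK.
   */
  public static List<String> findOrphanedLegacyFiles(List<String> fileNames,
      Set<String> liveSegments) {
    List<String> orphans = new ArrayList<>();
    for (String fileName : fileNames) {
      if (!fileName.endsWith(LEGACY_SUFFIX)) {
        continue; // Standard-format files are left to the regular deletion path.
      }
      String segmentName = fileName.substring(0, fileName.length() - LEGACY_SUFFIX.length());
      if (!liveSegments.contains(segmentName)) {
        orphans.add(fileName); // Segment is gone from ZK, so this file is orphaned.
      }
    }
    return orphans;
  }
}
```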



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/SegmentDeletionManager.java:
##########
@@ -235,7 +237,7 @@ protected void removeSegmentFromStore(String tableNameWithType, String segmentId
       long retentionMs = deletedSegmentsRetentionMs == null
           ? _defaultDeletedSegmentsRetentionMs : deletedSegmentsRetentionMs;
       String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
-      URI fileToDeleteURI = URIUtils.getUri(_dataDir, rawTableName, URIUtils.encode(segmentId));
+      URI fileToDeleteURI = getFileToDeleteURI(tableNameWithType, segmentId);

Review Comment:
   Since we observed that PinotFS behaves differently when forceDelete is supplied, can we verify in this method that the file has actually been deleted? The result of the deletion should reflect whether the deletion actually happened.
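   One way to do that is to check existence before and after the call, so the result reflects what actually happened rather than only what `delete` returned. In the sketch below, `Fs` is a hypothetical minimal interface standing in for PinotFS, and `InMemoryFs` with `ignoreDeletes` only simulates a delete that reports success without removing anything; it is not a model of any specific PinotFS implementation.

```java
import java.io.IOException;
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class VerifiedDeletion {

  /** Hypothetical minimal file-system abstraction standing in for PinotFS. */
  public interface Fs {
    boolean delete(URI uri, boolean forceDelete) throws IOException;

    boolean exists(URI uri) throws IOException;
  }

  /** Returns true only if the file existed before the call and is gone afterwards. */
  public static boolean deleteAndVerify(Fs fs, URI uri, boolean forceDelete) throws IOException {
    if (!fs.exists(uri)) {
      return false; // Nothing to delete, so no deletion happened.
    }
    boolean reported = fs.delete(uri, forceDelete);
    boolean actuallyDeleted = !fs.exists(uri);
    if (reported != actuallyDeleted) {
      // Surface the mismatch between the return value and the observed state.
      System.err.println("delete() returned " + reported + " but file "
          + (actuallyDeleted ? "is gone" : "still exists") + ": " + uri);
    }
    return actuallyDeleted;
  }

  /** In-memory Fs for illustration; ignoreDeletes simulates a silent no-op delete. */
  public static class InMemoryFs implements Fs {
    private final Set<URI> files = new HashSet<>();
    private final boolean ignoreDeletes;

    public InMemoryFs(boolean ignoreDeletes) {
      this.ignoreDeletes = ignoreDeletes;
    }

    public void add(URI uri) {
      files.add(uri);
    }

    @Override
    public boolean delete(URI uri, boolean forceDelete) {
      if (ignoreDeletes) {
        return true; // Reports success without removing anything.
      }
      return files.remove(uri);
    }

    @Override
    public boolean exists(URI uri) {
      return files.contains(uri);
    }
  }
}
```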
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
