swaminathanmanish commented on code in PR #15142:
URL: https://github.com/apache/pinot/pull/15142#discussion_r1983669497
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/retention/RetentionManager.java:
##########
@@ -136,11 +164,86 @@ private void manageRetentionForOfflineTable(String
offlineTableName, RetentionSt
}
}
+ /**
+ * Identifies segments in deepstore that are ready for deletion based on the
retention strategy.
+ *
+ * This method finds segments that are beyond the retention period and are
ready to be purged.
+ * It only considers segments that do not have entries in ZooKeeper metadata.
+ * The lastModified time of the file in deepstore is used to determine
whether the segment
+ * should be retained or purged.
+ *
+ * @param tableNameWithType Name of the offline table
+ * @param retentionStrategy Strategy to determine if a segment should be
purged
+ * @param segmentsToExclude List of segment names that should be excluded
from deletion
+ * @return List of segment names that should be deleted from deepstore
+ * @throws IOException If there's an error accessing the filesystem
+ */
+ private List<String> getSegmentsToDeleteFromDeepstore(String
tableNameWithType, RetentionStrategy retentionStrategy,
+ List<String> segmentsToExclude)
+ throws IOException {
+
+ List<String> segmentsToDelete = new ArrayList<>();
+ String rawTableName =
TableNameBuilder.extractRawTableName(tableNameWithType);
+ URI tableDataUri =
URIUtils.getUri(_pinotHelixResourceManager.getDataDir(), rawTableName);
+ PinotFS pinotFS = PinotFSFactory.create(tableDataUri.getScheme());
+
+ List<FileMetadata> deepstoreFiles =
pinotFS.listFilesWithMetadata(tableDataUri, false);
Review Comment:
Can we add a metric to get count of files that are in deep store but not in
segmentZkMetadata ? This is to see how many dangling files are in the deep
store
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/retention/RetentionManager.java:
##########
@@ -136,11 +164,86 @@ private void manageRetentionForOfflineTable(String
offlineTableName, RetentionSt
}
}
+ /**
+ * Identifies segments in deepstore that are ready for deletion based on the
retention strategy.
+ *
+ * This method finds segments that are beyond the retention period and are
ready to be purged.
+ * It only considers segments that do not have entries in ZooKeeper metadata.
+ * The lastModified time of the file in deepstore is used to determine
whether the segment
+ * should be retained or purged.
+ *
+ * @param tableNameWithType Name of the offline table
+ * @param retentionStrategy Strategy to determine if a segment should be
purged
+ * @param segmentsToExclude List of segment names that should be excluded
from deletion
+ * @return List of segment names that should be deleted from deepstore
+ * @throws IOException If there's an error accessing the filesystem
+ */
+ private List<String> getSegmentsToDeleteFromDeepstore(String
tableNameWithType, RetentionStrategy retentionStrategy,
+ List<String> segmentsToExclude)
+ throws IOException {
+
+ List<String> segmentsToDelete = new ArrayList<>();
+ String rawTableName =
TableNameBuilder.extractRawTableName(tableNameWithType);
+ URI tableDataUri =
URIUtils.getUri(_pinotHelixResourceManager.getDataDir(), rawTableName);
+ PinotFS pinotFS = PinotFSFactory.create(tableDataUri.getScheme());
+
+ List<FileMetadata> deepstoreFiles =
pinotFS.listFilesWithMetadata(tableDataUri, false);
+
+ for (FileMetadata fileMetadata : deepstoreFiles) {
+ if (fileMetadata.isDirectory()) {
+ continue;
+ }
+
+ String segmentName = extractSegmentName(fileMetadata.getFilePath());
+ if (Strings.isEmpty(segmentName) ||
segmentsToExclude.contains(segmentName)) {
+ continue;
+ }
+
+ // determine whether the segment should be perged or not based on the
last modified time of the file
+ long lastModifiedTime = fileMetadata.getLastModifiedTime();
+
+ if (retentionStrategy.isPurgeable(segmentName, tableNameWithType,
lastModifiedTime)) {
Review Comment:
Sycned and learnt that default is 7 days of retention
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/retention/RetentionManager.java:
##########
@@ -136,11 +164,86 @@ private void manageRetentionForOfflineTable(String
offlineTableName, RetentionSt
}
}
+ /**
+ * Identifies segments in deepstore that are ready for deletion based on the
retention strategy.
+ *
+ * This method finds segments that are beyond the retention period and are
ready to be purged.
+ * It only considers segments that do not have entries in ZooKeeper metadata.
+ * The lastModified time of the file in deepstore is used to determine
whether the segment
+ * should be retained or purged.
+ *
+ * @param tableNameWithType Name of the offline table
+ * @param retentionStrategy Strategy to determine if a segment should be
purged
+ * @param segmentsToExclude List of segment names that should be excluded
from deletion
+ * @return List of segment names that should be deleted from deepstore
+ * @throws IOException If there's an error accessing the filesystem
+ */
+ private List<String> getSegmentsToDeleteFromDeepstore(String
tableNameWithType, RetentionStrategy retentionStrategy,
+ List<String> segmentsToExclude)
+ throws IOException {
+
+ List<String> segmentsToDelete = new ArrayList<>();
+ String rawTableName =
TableNameBuilder.extractRawTableName(tableNameWithType);
+ URI tableDataUri =
URIUtils.getUri(_pinotHelixResourceManager.getDataDir(), rawTableName);
+ PinotFS pinotFS = PinotFSFactory.create(tableDataUri.getScheme());
+
+ List<FileMetadata> deepstoreFiles =
pinotFS.listFilesWithMetadata(tableDataUri, false);
+
+ for (FileMetadata fileMetadata : deepstoreFiles) {
+ if (fileMetadata.isDirectory()) {
+ continue;
+ }
+
+ String segmentName = extractSegmentName(fileMetadata.getFilePath());
+ if (Strings.isEmpty(segmentName) ||
segmentsToExclude.contains(segmentName)) {
+ continue;
+ }
+
+ // determine whether the segment should be perged or not based on the
last modified time of the file
+ long lastModifiedTime = fileMetadata.getLastModifiedTime();
+
+ if (retentionStrategy.isPurgeable(segmentName, tableNameWithType,
lastModifiedTime)) {
+ segmentsToDelete.add(segmentName);
+ }
+ }
+
+ return segmentsToDelete;
+ }
+
+ @Nullable
+ private String extractSegmentName(@Nullable String filePath) {
+ if (Strings.isEmpty(filePath)) {
+ return null;
+ }
+ String segmentName = filePath.substring(filePath.lastIndexOf("/") + 1);
+ if (segmentName.endsWith(TarCompressionUtils.TAR_GZ_FILE_EXTENSION)) {
+ segmentName = segmentName.substring(0, segmentName.length() -
TarCompressionUtils.TAR_GZ_FILE_EXTENSION.length());
+ }
+ return segmentName;
+ }
+
private void manageRetentionForRealtimeTable(String realtimeTableName,
RetentionStrategy retentionStrategy) {
+ List<SegmentZKMetadata> segmentZKMetadataList =
_pinotHelixResourceManager.getSegmentsZKMetadata(realtimeTableName);
Review Comment:
Understood that we need segmentZkmetadata for endTime.
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/retention/RetentionManager.java:
##########
@@ -124,8 +138,19 @@ private void manageRetentionForTable(TableConfig
tableConfig) {
}
private void manageRetentionForOfflineTable(String offlineTableName,
RetentionStrategy retentionStrategy) {
+ List<SegmentZKMetadata> segmentZKMetadataList =
_pinotHelixResourceManager.getSegmentsZKMetadata(offlineTableName);
Review Comment:
I initially did not realize that we already loop through all of segmentZk
metadata to check if segment is purge-able (retentionStrategy.isPurgeable).
Yes its possible for a segment to be Zk metadata but not in IS @klsince, but
we want to handle that case as well right?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]