akashrn5 commented on a change in pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#discussion_r537230858



##########
File path: 
core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -482,176 +482,6 @@ public boolean accept(CarbonFile file) {
 
   }
 
-  /**
-   * Handling of the clean up of old carbondata files, index files , delete 
delta,
-   * update status files.
-   * @param table clean up will be handled on this table.
-   * @param forceDelete if true then max query execution timeout will not be 
considered.
-   */
-  public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) 
throws IOException {
-
-    SegmentStatusManager ssm = new 
SegmentStatusManager(table.getAbsoluteTableIdentifier());
-
-    LoadMetadataDetails[] details =
-        SegmentStatusManager.readLoadMetadata(table.getMetadataPath());
-
-    SegmentUpdateStatusManager updateStatusManager = new 
SegmentUpdateStatusManager(table);
-    SegmentUpdateDetails[] segmentUpdateDetails = 
updateStatusManager.getUpdateStatusDetails();
-    // hold all the segments updated so that wen can check the delta files in 
them, ne need to
-    // check the others.
-    Set<String> updatedSegments = new HashSet<>();
-    for (SegmentUpdateDetails updateDetails : segmentUpdateDetails) {
-      updatedSegments.add(updateDetails.getSegmentName());
-    }
-
-    String validUpdateStatusFile = "";
-
-    boolean isAbortedFile = true;
-
-    boolean isInvalidFile = false;
-
-    // take the update status file name from 0th segment.
-    validUpdateStatusFile = ssm.getUpdateStatusFileName(details);
-    // scan through each segment.
-    for (LoadMetadataDetails segment : details) {
-      // if this segment is valid then only we will go for delta file deletion.
-      // if the segment is mark for delete or compacted then any way it will 
get deleted.
-      if (segment.getSegmentStatus() == SegmentStatus.SUCCESS
-              || segment.getSegmentStatus() == 
SegmentStatus.LOAD_PARTIAL_SUCCESS) {
-        // when there is no update operations done on table, then no need to 
go ahead. So
-        // just check the update delta start timestamp and proceed if not empty
-        if (!segment.getUpdateDeltaStartTimestamp().isEmpty()
-                || updatedSegments.contains(segment.getLoadName())) {
-          // take the list of files from this segment.
-          String segmentPath = CarbonTablePath.getSegmentPath(
-              table.getAbsoluteTableIdentifier().getTablePath(), 
segment.getLoadName());
-          CarbonFile segDir =
-              FileFactory.getCarbonFile(segmentPath);
-          CarbonFile[] allSegmentFiles = segDir.listFiles();
-
-          // now handle all the delete delta files which needs to be deleted.
-          // there are 2 cases here .
-          // 1. if the block is marked as compacted then the corresponding 
delta files
-          //    can be deleted if query exec timeout is done.
-          // 2. if the block is in success state then also there can be delete
-          //    delta compaction happened and old files can be deleted.
-
-          SegmentUpdateDetails[] updateDetails = 
updateStatusManager.readLoadMetadata();
-          for (SegmentUpdateDetails block : updateDetails) {
-            CarbonFile[] completeListOfDeleteDeltaFiles;
-            CarbonFile[] invalidDeleteDeltaFiles;
-
-            if 
(!block.getSegmentName().equalsIgnoreCase(segment.getLoadName())) {
-              continue;
-            }
-
-            // aborted scenario.
-            invalidDeleteDeltaFiles = updateStatusManager
-                .getDeleteDeltaInvalidFilesList(block, false,
-                    allSegmentFiles, isAbortedFile);
-            for (CarbonFile invalidFile : invalidDeleteDeltaFiles) {
-              boolean doForceDelete = true;
-              compareTimestampsAndDelete(invalidFile, doForceDelete, false);
-            }
-
-            // case 1
-            if (CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
-              completeListOfDeleteDeltaFiles = updateStatusManager
-                  .getDeleteDeltaInvalidFilesList(block, true,
-                      allSegmentFiles, isInvalidFile);
-              for (CarbonFile invalidFile : completeListOfDeleteDeltaFiles) {
-                compareTimestampsAndDelete(invalidFile, forceDelete, false);
-              }
-
-            } else {
-              invalidDeleteDeltaFiles = updateStatusManager
-                  .getDeleteDeltaInvalidFilesList(block, false,
-                      allSegmentFiles, isInvalidFile);
-              for (CarbonFile invalidFile : invalidDeleteDeltaFiles) {
-                compareTimestampsAndDelete(invalidFile, forceDelete, false);
-              }
-            }
-          }
-        }
-        // handle cleanup of merge index files and data files after small 
files merge happened for
-        // SI table
-        cleanUpDataFilesAfterSmallFilesMergeForSI(table, segment);
-      }
-    }
-
-    // delete the update table status files which are old.
-    if (null != validUpdateStatusFile && !validUpdateStatusFile.isEmpty()) {
-
-      final String updateStatusTimestamp = validUpdateStatusFile
-          
.substring(validUpdateStatusFile.lastIndexOf(CarbonCommonConstants.HYPHEN) + 1);
-
-      String tablePath = table.getAbsoluteTableIdentifier().getTablePath();
-      CarbonFile metaFolder = FileFactory.getCarbonFile(
-          CarbonTablePath.getMetadataPath(tablePath));
-
-      CarbonFile[] invalidUpdateStatusFiles = metaFolder.listFiles(new 
CarbonFileFilter() {
-        @Override
-        public boolean accept(CarbonFile file) {
-          if 
(file.getName().startsWith(CarbonCommonConstants.TABLEUPDATESTATUS_FILENAME)) {
-            // CHECK if this is valid or not.
-            // we only send invalid ones to delete.
-            return !file.getName().endsWith(updateStatusTimestamp);
-          }
-          return false;
-        }
-      });
-
-      for (CarbonFile invalidFile : invalidUpdateStatusFiles) {
-        compareTimestampsAndDelete(invalidFile, forceDelete, true);
-      }
-    }
-  }
-
-  /**
-   * this is the clean up added specifically for SI table, because after we 
merge the data files
-   * inside the secondary index table, we need to delete the stale carbondata 
files.
-   * refer org.apache.spark.sql.secondaryindex.rdd.CarbonSIRebuildRDD
-   */
-  private static void cleanUpDataFilesAfterSmallFilesMergeForSI(CarbonTable 
table,

Review comment:
       can you please create a jira for the points discussed offline to track 
these parts?

##########
File path: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##########
@@ -2123,29 +2123,35 @@ public int getMaxSIRepairLimit(String dbName, String 
tableName) {
    * folder will take place
    */
   private void validateTrashFolderRetentionTime() {
-    String propertyValue = carbonProperties.getProperty(CarbonCommonConstants
-        .CARBON_TRASH_RETENTION_DAYS, Integer.toString(CarbonCommonConstants
-        .CARBON_TRASH_RETENTION_DAYS_DEFAULT));
+    String propertyValue = carbonProperties.getProperty(

Review comment:
       can you use `getTrashFolderRetentionTime` here also? so you avoid 
integer parsing again

##########
File path: 
core/src/main/java/org/apache/carbondata/core/util/CleanFilesUtil.java
##########
@@ -163,8 +164,13 @@ private static void getStaleSegmentFiles(CarbonTable 
carbonTable, List<String> s
     }
     Set<String> loadNameSet = 
Arrays.stream(details).map(LoadMetadataDetails::getLoadName)
         .collect(Collectors.toSet());
-    List<String> staleSegments = segmentFiles.stream().filter(segmentFile -> 
!loadNameSet.contains(
-        
DataFileUtil.getSegmentNoFromSegmentFile(segmentFile))).collect(Collectors.toList());
+    // get all stale segment files, not include compaction segments

Review comment:
       can you please add two more line in comment saying why we need to 
exclude, so that next time any developer shouldnt remove by mistake. you can 
add like `During compaction we dont make entry in table status, so if 
parallelly clean files is triggered, it consider as stale and move to trash`

##########
File path: 
integration/spark/src/main/scala/org/apache/carbondata/events/package.scala
##########
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata
+
+package object events {
+  def withEvents(preEvent: Event, postEvent: Event)(func: => Unit): Unit = {

Review comment:
       can you raise a jira and refactor later for all the events? as now its 
handled for clean files only

##########
File path: docs/clean-files.md
##########
@@ -38,6 +38,9 @@ The above clean files command will clean Marked For Delete 
and Compacted segment
    ``` 
   Once the timestamp subdirectory is expired as per the configured expiration 
day value, that subdirectory is deleted from the trash folder in the subsequent 
clean files command.
 
+**NOTE**:
+  * In trash folder, the retention time is "carbon.trash.retention.days"
+  * Outside trash folder, the retention time is max value of two 
properties("carbon.trash.retention.days", "max.query.execution.time")

Review comment:
       ```suggestion
     * Outside trash folder(Segment Directories in store path), the retention 
time is Max("carbon.trash.retention.days", "max.query.execution.time")
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to