[GitHub] [carbondata] akashrn5 commented on a change in pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-07 Thread GitBox


akashrn5 commented on a change in pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#discussion_r537484782



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##
@@ -48,30 +50,59 @@ class CleanFilesPostEventListener extends 
OperationEventListener with Logging {
 event match {
   case cleanFilesPostEvent: CleanFilesPostEvent =>
 LOGGER.info("Clean files post event listener called")
-val carbonTable = cleanFilesPostEvent.carbonTable
-val indexTables = CarbonIndexUtil
-  .getIndexCarbonTables(carbonTable, cleanFilesPostEvent.sparkSession)
-val isForceDelete = cleanFilesPostEvent.ifForceDelete
-val inProgressSegmentsClean = cleanFilesPostEvent.cleanStaleInProgress
-indexTables.foreach { indexTable =>
-  val partitions: Option[Seq[PartitionSpec]] = 
CarbonFilters.getPartitions(
-Seq.empty[Expression],
-cleanFilesPostEvent.sparkSession,
-indexTable)
-  SegmentStatusManager.deleteLoadsAndUpdateMetadata(
-  indexTable, isForceDelete, partitions.map(_.asJava).orNull, 
inProgressSegmentsClean,
-true)
-  CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
-  cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
-}
+cleanFilesForIndex(

Review comment:
   Please check whether this can be done together with [CARBONDATA-4074](https://issues.apache.org/jira/browse/CARBONDATA-4074).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


akashrn5 commented on a change in pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#discussion_r537288377



##
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -577,38 +556,46 @@ object CarbonDataRDDFactory {
   LOGGER.info("Data load is successful for " +
   s"${ carbonLoadModel.getDatabaseName }.${ 
carbonLoadModel.getTableName }")
 }
-
-// code to handle Pre-Priming cache for loading
-
-if (!StringUtils.isEmpty(carbonLoadModel.getSegmentId)) {
-  DistributedRDDUtils.triggerPrepriming(sqlContext.sparkSession, 
carbonTable, Seq(),
-operationContext, hadoopConf, List(carbonLoadModel.getSegmentId))
-}
-try {
-  // compaction handling
-  if (carbonTable.isHivePartitionTable) {
-carbonLoadModel.setFactTimeStamp(System.currentTimeMillis())
-  }
-  val compactedSegments = new util.ArrayList[String]()
-  handleSegmentMerging(sqlContext,
-carbonLoadModel
-  .getCopyWithPartition(carbonLoadModel.getCsvHeader, 
carbonLoadModel.getCsvDelimiter),
-carbonTable,
-compactedSegments,
-operationContext)
-  carbonLoadModel.setMergedSegmentIds(compactedSegments)
-  writtenSegment
-} catch {
-  case e: Exception =>
-LOGGER.error(
-  "Auto-Compaction has failed. Ignoring this exception because 
the" +
-  " load is passed.", e)
-writtenSegment
-}
+isLoadingCommitted = true
+writtenSegment
   }
 } finally {
   // Release the segment lock, once table status is finally updated
   segmentLock.unlock()
+  if (isLoadingCommitted) {
+triggerEventsAfterLoading(sqlContext, carbonLoadModel, hadoopConf, 
operationContext)
+  }
+}
+  }
+
+  private def triggerEventsAfterLoading(
+  sqlContext: SQLContext,
+  carbonLoadModel: CarbonLoadModel,
+  hadoopConf: Configuration,
+  operationContext: OperationContext): Unit = {
+val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+// code to handle Pre-Priming cache for loading
+if (!StringUtils.isEmpty(carbonLoadModel.getSegmentId)) {
+  DistributedRDDUtils.triggerPrepriming(sqlContext.sparkSession, 
carbonTable, Seq(),

Review comment:
   Calling pre-priming twice will increase the load time. It would be better to first determine whether auto compaction happened and, based on that, send the segments to pre-prime only once.
   
   Also, in `DistributedRDDUtils.scala` at line 376, a new `SegmentUpdateStatusManager` is created but never used; it only reads the table status file and the update status. Please check whether it can be removed. This is just another input for optimization.
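
The "pre-prime only once" idea in the comment above could be sketched roughly as follows. This is a minimal, standalone illustration and not the CarbonData implementation: the object `PrePrimeOnce`, the method `segmentsToPrePrime`, and its parameters are hypothetical names introduced here, and the real change would still pass the result to `DistributedRDDUtils.triggerPrepriming`.

```scala
object PrePrimeOnce {
  // Hypothetical helper: decide which segment ids to pre-prime, exactly once.
  // If auto compaction produced merged segments, pre-prime those;
  // otherwise fall back to the freshly loaded segment, if there is one.
  def segmentsToPrePrime(
      loadedSegmentId: String,
      compactedSegments: Seq[String]): Seq[String] = {
    if (compactedSegments.nonEmpty) {
      compactedSegments
    } else if (loadedSegmentId != null && loadedSegmentId.nonEmpty) {
      Seq(loadedSegmentId)
    } else {
      Seq.empty
    }
  }
}
```

With a helper of this shape, the caller would invoke pre-priming a single time with the returned list, whether or not compaction ran.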









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


akashrn5 commented on a change in pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#discussion_r537271584



##
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -263,15 +249,7 @@ object CarbonDataRDDFactory {
 throw new Exception("Exception in compaction " + 
exception.getMessage)
   }
 } finally {
-  executor.shutdownNow()
-  try {
-compactor.deletePartialLoadsInCompaction()

Review comment:
   It is better to add a proper description to the PR and handle this here, instead of handling it again in another PR; that makes review easier and avoids duplicate work. Should be fine, @ajantha-bhat?









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


akashrn5 commented on a change in pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#discussion_r537271258



##
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -577,38 +556,46 @@ object CarbonDataRDDFactory {
   LOGGER.info("Data load is successful for " +
   s"${ carbonLoadModel.getDatabaseName }.${ 
carbonLoadModel.getTableName }")
 }
-
-// code to handle Pre-Priming cache for loading
-
-if (!StringUtils.isEmpty(carbonLoadModel.getSegmentId)) {
-  DistributedRDDUtils.triggerPrepriming(sqlContext.sparkSession, 
carbonTable, Seq(),
-operationContext, hadoopConf, List(carbonLoadModel.getSegmentId))
-}
-try {
-  // compaction handling
-  if (carbonTable.isHivePartitionTable) {
-carbonLoadModel.setFactTimeStamp(System.currentTimeMillis())
-  }
-  val compactedSegments = new util.ArrayList[String]()
-  handleSegmentMerging(sqlContext,
-carbonLoadModel
-  .getCopyWithPartition(carbonLoadModel.getCsvHeader, 
carbonLoadModel.getCsvDelimiter),
-carbonTable,
-compactedSegments,
-operationContext)
-  carbonLoadModel.setMergedSegmentIds(compactedSegments)
-  writtenSegment
-} catch {
-  case e: Exception =>
-LOGGER.error(
-  "Auto-Compaction has failed. Ignoring this exception because 
the" +
-  " load is passed.", e)
-writtenSegment
-}
+isLoadingCommitted = true
+writtenSegment
   }
 } finally {
   // Release the segment lock, once table status is finally updated
   segmentLock.unlock()
+  if (isLoadingCommitted) {
+triggerEventsAfterLoading(sqlContext, carbonLoadModel, hadoopConf, 
operationContext)
+  }
+}
+  }
+
+  private def triggerEventsAfterLoading(
+  sqlContext: SQLContext,
+  carbonLoadModel: CarbonLoadModel,
+  hadoopConf: Configuration,
+  operationContext: OperationContext): Unit = {
+val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+// code to handle Pre-Priming cache for loading
+if (!StringUtils.isEmpty(carbonLoadModel.getSegmentId)) {
+  DistributedRDDUtils.triggerPrepriming(sqlContext.sparkSession, 
carbonTable, Seq(),

Review comment:
   Yes, I agree with @ajantha-bhat: if auto compaction succeeds, pre-prime the compacted segment; otherwise pre-prime just the loaded segment.









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


akashrn5 commented on a change in pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#discussion_r537270696



##
File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##
@@ -2123,29 +2123,35 @@ public int getMaxSIRepairLimit(String dbName, String 
tableName) {
* folder will take place
*/
   private void validateTrashFolderRetentionTime() {
-String propertyValue = carbonProperties.getProperty(CarbonCommonConstants
-.CARBON_TRASH_RETENTION_DAYS, Integer.toString(CarbonCommonConstants
-.CARBON_TRASH_RETENTION_DAYS_DEFAULT));
+String propertyValue = carbonProperties.getProperty(

Review comment:
   @ajantha-bhat the `getTrashFolderRetentionTime` method implementation is just `Integer.parseInt(carbonProperties.getProperty(...))`, so it is the same thing here too, right? What exactly do you mean by saying the value is already stored and this is not validation?
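
One possible reading of the "validate versus read" distinction being asked about is sketched below. This is only an illustration under stated assumptions, not the actual `CarbonProperties` code: the object `TrashRetentionSketch`, the key `trash.retention.days`, the default of 7, and the upper bound of 365 are all hypothetical values chosen for the example.

```scala
import java.util.Properties
import scala.util.Try

object TrashRetentionSketch {
  val Key = "trash.retention.days" // hypothetical property key
  val Default = 7                  // hypothetical default, in days
  val Max = 365                    // hypothetical upper bound, in days

  // Validation step: replace a missing, non-numeric, or out-of-range value
  // with the default and store the result back into the properties.
  def validate(props: Properties): Unit = {
    val raw = props.getProperty(Key, Default.toString)
    val valid = Try(raw.trim.toInt).toOption.filter(d => d >= 0 && d <= Max)
    props.setProperty(Key, valid.getOrElse(Default).toString)
  }

  // Read step: a plain parse, which is only safe after validate has run.
  def retentionDays(props: Properties): Int =
    props.getProperty(Key, Default.toString).toInt
}
```

In this sketch both methods read the same property, but only `validate` tolerates bad input; whether that matches the intent in the PR is for the thread to settle.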









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


akashrn5 commented on a change in pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#discussion_r537252858



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##
@@ -48,30 +50,61 @@ class CleanFilesPostEventListener extends 
OperationEventListener with Logging {
 event match {
   case cleanFilesPostEvent: CleanFilesPostEvent =>
 LOGGER.info("Clean files post event listener called")
-val carbonTable = cleanFilesPostEvent.carbonTable
-val indexTables = CarbonIndexUtil
-  .getIndexCarbonTables(carbonTable, cleanFilesPostEvent.sparkSession)
-val isForceDelete = cleanFilesPostEvent.ifForceDelete
-val inProgressSegmentsClean = cleanFilesPostEvent.cleanStaleInProgress
-indexTables.foreach { indexTable =>
-  val partitions: Option[Seq[PartitionSpec]] = 
CarbonFilters.getPartitions(
-Seq.empty[Expression],
-cleanFilesPostEvent.sparkSession,
-indexTable)
-  SegmentStatusManager.deleteLoadsAndUpdateMetadata(
-  indexTable, isForceDelete, partitions.map(_.asJava).orNull, 
inProgressSegmentsClean,
-true)
-  CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
-  cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
-}
+cleanFilesForIndex(
+  cleanFilesPostEvent.sparkSession,
+  cleanFilesPostEvent.carbonTable,
+  cleanFilesPostEvent.options.getOrElse("force", "false").toBoolean,
+  cleanFilesPostEvent.options.getOrElse("stale_inprogress", 
"false").toBoolean)
+
+cleanFilesForMv(
+  cleanFilesPostEvent.sparkSession,
+  cleanFilesPostEvent.carbonTable,
+  cleanFilesPostEvent.options)
+}
+  }
+
+  private def cleanFilesForIndex(
+  sparkSession: SparkSession,
+  carbonTable: CarbonTable,
+  isForceDelete: Boolean,
+  cleanStaleInProgress: Boolean
+  ): Unit = {
+val indexTables = CarbonIndexUtil
+  .getIndexCarbonTables(carbonTable, sparkSession)
+indexTables.foreach { indexTable =>
+  val partitions: Option[Seq[PartitionSpec]] = CarbonFilters.getPartitions(
+Seq.empty[Expression],
+sparkSession,
+indexTable)
+  SegmentStatusManager.deleteLoadsAndUpdateMetadata(
+indexTable, isForceDelete, partitions.map(_.asJava).orNull, 
cleanStaleInProgress,
+true)
+  cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
+}
+  }
+
+  private def cleanFilesForMv(
+  sparkSession: SparkSession,
+  carbonTable: CarbonTable,
+  options: Map[String, String]
+  ): Unit = {
+val viewSchemas = 
MVManagerInSpark.get(sparkSession).getSchemasOnTable(carbonTable)
+if (!viewSchemas.isEmpty) {
+  viewSchemas.asScala.map { schema =>

Review comment:
   Replace `schema` with a placeholder (`_`), as it is not used.
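
As a small standalone illustration of the placeholder suggestion, assuming nothing about the real body of `cleanFilesForMv`: the object `PlaceholderSketch` and the method `cleanCount` are hypothetical, and the counter merely stands in for whatever per-view work the loop performs.

```scala
object PlaceholderSketch {
  // Hypothetical stand-in for per-view cleanup: counts one unit of work
  // per schema without ever reading the schema itself.
  def cleanCount(viewSchemas: Seq[String]): Int = {
    var cleaned = 0
    // `_` marks the lambda parameter as intentionally unused.
    viewSchemas.foreach { _ => cleaned += 1 }
    cleaned
  }
}
```

The sketch uses `foreach` because the loop body is run only for its side effect; the hunk under review uses `map`, and whether to change that is a separate question.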









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


akashrn5 commented on a change in pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#discussion_r537252530



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##
@@ -48,30 +50,61 @@ class CleanFilesPostEventListener extends 
OperationEventListener with Logging {
 event match {
   case cleanFilesPostEvent: CleanFilesPostEvent =>
 LOGGER.info("Clean files post event listener called")
-val carbonTable = cleanFilesPostEvent.carbonTable
-val indexTables = CarbonIndexUtil
-  .getIndexCarbonTables(carbonTable, cleanFilesPostEvent.sparkSession)
-val isForceDelete = cleanFilesPostEvent.ifForceDelete
-val inProgressSegmentsClean = cleanFilesPostEvent.cleanStaleInProgress
-indexTables.foreach { indexTable =>
-  val partitions: Option[Seq[PartitionSpec]] = 
CarbonFilters.getPartitions(
-Seq.empty[Expression],
-cleanFilesPostEvent.sparkSession,
-indexTable)
-  SegmentStatusManager.deleteLoadsAndUpdateMetadata(
-  indexTable, isForceDelete, partitions.map(_.asJava).orNull, 
inProgressSegmentsClean,
-true)
-  CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
-  cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
-}
+cleanFilesForIndex(
+  cleanFilesPostEvent.sparkSession,
+  cleanFilesPostEvent.carbonTable,
+  cleanFilesPostEvent.options.getOrElse("force", "false").toBoolean,
+  cleanFilesPostEvent.options.getOrElse("stale_inprogress", 
"false").toBoolean)
+
+cleanFilesForMv(
+  cleanFilesPostEvent.sparkSession,
+  cleanFilesPostEvent.carbonTable,
+  cleanFilesPostEvent.options)
+}
+  }
+
+  private def cleanFilesForIndex(
+  sparkSession: SparkSession,
+  carbonTable: CarbonTable,
+  isForceDelete: Boolean,
+  cleanStaleInProgress: Boolean
+  ): Unit = {
+val indexTables = CarbonIndexUtil
+  .getIndexCarbonTables(carbonTable, sparkSession)
+indexTables.foreach { indexTable =>
+  val partitions: Option[Seq[PartitionSpec]] = CarbonFilters.getPartitions(
+Seq.empty[Expression],
+sparkSession,
+indexTable)
+  SegmentStatusManager.deleteLoadsAndUpdateMetadata(
+indexTable, isForceDelete, partitions.map(_.asJava).orNull, 
cleanStaleInProgress,
+true)
+  cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
+}
+  }
+
+  private def cleanFilesForMv(
+  sparkSession: SparkSession,
+  carbonTable: CarbonTable,
+  options: Map[String, String]
+  ): Unit = {

Review comment:
   same as above









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


akashrn5 commented on a change in pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#discussion_r537252406



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##
@@ -48,30 +50,61 @@ class CleanFilesPostEventListener extends 
OperationEventListener with Logging {
 event match {
   case cleanFilesPostEvent: CleanFilesPostEvent =>
 LOGGER.info("Clean files post event listener called")
-val carbonTable = cleanFilesPostEvent.carbonTable
-val indexTables = CarbonIndexUtil
-  .getIndexCarbonTables(carbonTable, cleanFilesPostEvent.sparkSession)
-val isForceDelete = cleanFilesPostEvent.ifForceDelete
-val inProgressSegmentsClean = cleanFilesPostEvent.cleanStaleInProgress
-indexTables.foreach { indexTable =>
-  val partitions: Option[Seq[PartitionSpec]] = 
CarbonFilters.getPartitions(
-Seq.empty[Expression],
-cleanFilesPostEvent.sparkSession,
-indexTable)
-  SegmentStatusManager.deleteLoadsAndUpdateMetadata(
-  indexTable, isForceDelete, partitions.map(_.asJava).orNull, 
inProgressSegmentsClean,
-true)
-  CarbonUpdateUtil.cleanUpDeltaFiles(indexTable, true)
-  cleanUpUnwantedSegmentsOfSIAndUpdateMetadata(indexTable, carbonTable)
-}
+cleanFilesForIndex(
+  cleanFilesPostEvent.sparkSession,
+  cleanFilesPostEvent.carbonTable,
+  cleanFilesPostEvent.options.getOrElse("force", "false").toBoolean,
+  cleanFilesPostEvent.options.getOrElse("stale_inprogress", 
"false").toBoolean)
+
+cleanFilesForMv(
+  cleanFilesPostEvent.sparkSession,
+  cleanFilesPostEvent.carbonTable,
+  cleanFilesPostEvent.options)
+}
+  }
+
+  private def cleanFilesForIndex(
+  sparkSession: SparkSession,
+  carbonTable: CarbonTable,
+  isForceDelete: Boolean,
+  cleanStaleInProgress: Boolean
+  ): Unit = {

Review comment:
   Move this line up onto the previous line.









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


akashrn5 commented on a change in pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#discussion_r537245017



##
File path: integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.trash
+
+import scala.collection.JavaConverters._
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.indexstore.PartitionSpec
+import org.apache.carbondata.core.locks.{CarbonLockUtil, ICarbonLock, LockUsage}
+import org.apache.carbondata.core.metadata.SegmentFileStore
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, CleanFilesUtil, TrashUtil}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+
+object DataTrashManager {
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+
+  /**
+   * clean garbage data
+   *  1. check and clean .Trash folder
+   *  2. move stale segments without metadata into .Trash
+   *  3. clean expired segments(MARKED_FOR_DELETE, Compacted, In Progress)
+   *
+   * @param isForceDelete        clean the MFD/Compacted segments immediately and empty trash folder
+   * @param cleanStaleInProgress clean the In Progress segments based on retention time,
+   *                             it will clean immediately when force is true
+   */
+  def cleanGarbageData(
+      carbonTable: CarbonTable,
+      isForceDelete: Boolean,
+      cleanStaleInProgress: Boolean,
+      partitionSpecs: Option[Seq[PartitionSpec]] = None): Unit = {
+    // if isForceDelete = true need to throw exception if CARBON_CLEAN_FILES_FORCE_ALLOWED is false
+    if (isForceDelete && !CarbonProperties.getInstance().isCleanFilesForceAllowed) {
+      LOGGER.error("Clean Files with Force option deletes the physical data and it cannot be" +
+        " recovered. It is disabled by default, to enable clean files with force option," +
+        " set " + CarbonCommonConstants.CARBON_CLEAN_FILES_FORCE_ALLOWED + " to true")
+      throw new RuntimeException("Clean files with force operation not permitted by default")
+    }
+    var carbonCleanFilesLock: ICarbonLock = null
+    try {
+      val errorMsg = "Clean files request is failed for " +
+        s"${ carbonTable.getQualifiedName }" +
+        ". Not able to acquire the clean files lock due to another clean files " +
+        "operation is running in the background."
+      carbonCleanFilesLock = CarbonLockUtil.getLockObject(carbonTable.getAbsoluteTableIdentifier,
+        LockUsage.CLEAN_FILES_LOCK, errorMsg)
+      // step 1: check and clean trash folder
+      checkAndCleanTrashFolder(carbonTable, isForceDelete)
+      // step 2: move stale segments which are not exists in metadata into .Trash
+      moveStaleSegmentsToTrash(carbonTable)
+      // step 3: clean expired segments(MARKED_FOR_DELETE, Compacted, In Progress)
+      cleanExpiredSegments(carbonTable, isForceDelete, cleanStaleInProgress, partitionSpecs)
+    } finally {
+      if (carbonCleanFilesLock != null) {
+        CarbonLockUtil.fileUnlock(carbonCleanFilesLock, LockUsage.CLEAN_FILES_LOCK)
+      }
+    }
+  }
+
+  private def checkAndCleanTrashFolder(carbonTable: CarbonTable, isForceDelete: Boolean): Unit = {
+    if (isForceDelete) {
+      // empty the trash folder
+      TrashUtil.emptyTrash(carbonTable.getTablePath)
+    } else {
+      // clear trash based on timestamp
+      TrashUtil.deleteExpiredDataFromTrash(carbonTable.getTablePath)
+    }
+  }
+
+  /**
+   * move stale segment to trash folder, but not include compaction segment
+   */
+  private def moveStaleSegmentsToTrash(carbonTable: CarbonTable): Unit = {
+    if (carbonTable.isHivePartitionTable) {
+  

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4044: [CARBONDATA-4062] Refactor clean files feature

2020-12-06 Thread GitBox


akashrn5 commented on a change in pull request #4044:
URL: https://github.com/apache/carbondata/pull/4044#discussion_r537230858



##
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##
@@ -482,176 +482,6 @@ public boolean accept(CarbonFile file) {
 
   }
 
-  /**
-   * Handling of the clean up of old carbondata files, index files , delete 
delta,
-   * update status files.
-   * @param table clean up will be handled on this table.
-   * @param forceDelete if true then max query execution timeout will not be 
considered.
-   */
-  public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) 
throws IOException {
-
-SegmentStatusManager ssm = new 
SegmentStatusManager(table.getAbsoluteTableIdentifier());
-
-LoadMetadataDetails[] details =
-SegmentStatusManager.readLoadMetadata(table.getMetadataPath());
-
-SegmentUpdateStatusManager updateStatusManager = new 
SegmentUpdateStatusManager(table);
-SegmentUpdateDetails[] segmentUpdateDetails = 
updateStatusManager.getUpdateStatusDetails();
-// hold all the segments updated so that wen can check the delta files in 
them, ne need to
-// check the others.
-Set<String> updatedSegments = new HashSet<>();
-for (SegmentUpdateDetails updateDetails : segmentUpdateDetails) {
-  updatedSegments.add(updateDetails.getSegmentName());
-}
-
-String validUpdateStatusFile = "";
-
-boolean isAbortedFile = true;
-
-boolean isInvalidFile = false;
-
-// take the update status file name from 0th segment.
-validUpdateStatusFile = ssm.getUpdateStatusFileName(details);
-// scan through each segment.
-for (LoadMetadataDetails segment : details) {
-  // if this segment is valid then only we will go for delta file deletion.
-  // if the segment is mark for delete or compacted then any way it will 
get deleted.
-  if (segment.getSegmentStatus() == SegmentStatus.SUCCESS
-  || segment.getSegmentStatus() == 
SegmentStatus.LOAD_PARTIAL_SUCCESS) {
-// when there is no update operations done on table, then no need to 
go ahead. So
-// just check the update delta start timestamp and proceed if not empty
-if (!segment.getUpdateDeltaStartTimestamp().isEmpty()
-|| updatedSegments.contains(segment.getLoadName())) {
-  // take the list of files from this segment.
-  String segmentPath = CarbonTablePath.getSegmentPath(
-  table.getAbsoluteTableIdentifier().getTablePath(), 
segment.getLoadName());
-  CarbonFile segDir =
-  FileFactory.getCarbonFile(segmentPath);
-  CarbonFile[] allSegmentFiles = segDir.listFiles();
-
-  // now handle all the delete delta files which needs to be deleted.
-  // there are 2 cases here .
-  // 1. if the block is marked as compacted then the corresponding 
delta files
-  //can be deleted if query exec timeout is done.
-  // 2. if the block is in success state then also there can be delete
-  //delta compaction happened and old files can be deleted.
-
-  SegmentUpdateDetails[] updateDetails = 
updateStatusManager.readLoadMetadata();
-  for (SegmentUpdateDetails block : updateDetails) {
-CarbonFile[] completeListOfDeleteDeltaFiles;
-CarbonFile[] invalidDeleteDeltaFiles;
-
-if 
(!block.getSegmentName().equalsIgnoreCase(segment.getLoadName())) {
-  continue;
-}
-
-// aborted scenario.
-invalidDeleteDeltaFiles = updateStatusManager
-.getDeleteDeltaInvalidFilesList(block, false,
-allSegmentFiles, isAbortedFile);
-for (CarbonFile invalidFile : invalidDeleteDeltaFiles) {
-  boolean doForceDelete = true;
-  compareTimestampsAndDelete(invalidFile, doForceDelete, false);
-}
-
-// case 1
-if (CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
-  completeListOfDeleteDeltaFiles = updateStatusManager
-  .getDeleteDeltaInvalidFilesList(block, true,
-  allSegmentFiles, isInvalidFile);
-  for (CarbonFile invalidFile : completeListOfDeleteDeltaFiles) {
-compareTimestampsAndDelete(invalidFile, forceDelete, false);
-  }
-
-} else {
-  invalidDeleteDeltaFiles = updateStatusManager
-  .getDeleteDeltaInvalidFilesList(block, false,
-  allSegmentFiles, isInvalidFile);
-  for (CarbonFile invalidFile : invalidDeleteDeltaFiles) {
-compareTimestampsAndDelete(invalidFile, forceDelete, false);
-  }
-}
-  }
-}
-// handle cleanup of merge index files and data files after small 
files merge happened for
-// SI table
-