[ 
https://issues.apache.org/jira/browse/HIVE-27325?focusedWorklogId=861106&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861106
 ]

ASF GitHub Bot logged work on HIVE-27325:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/May/23 02:41
            Start Date: 09/May/23 02:41
    Worklog Time Spent: 10m 
      Work Description: rbalamohan commented on code in PR #4302:
URL: https://github.com/apache/hive/pull/4302#discussion_r1188058912


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -676,6 +688,15 @@ public void executeOperation(org.apache.hadoop.hive.ql.metadata.Table hmsTable,
     }
   }
 
+  private static ExecutorService getDeleteExecutorService(String completeName, int numThreads) {
+    AtomicInteger deleteThreadsIndex = new AtomicInteger(0);
+    return Executors.newFixedThreadPool(numThreads, runnable -> {

Review Comment:
   The Iceberg API may not take care of the thread pool's lifecycle. Do you need to shut the pool down in a finally block after expiring snapshots? Otherwise this will end up creating too many thread pools, one per execution.
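To illustrate the lifecycle concern above, here is a minimal sketch of the pattern being suggested: a fixed pool built with a named thread factory (mirroring the factory in the diff), shut down in a finally block once the delete work is done. The Iceberg `ExpireSnapshots.executeDeleteWith(ExecutorService)` call shown in the comment is the intended consumer; the table name and pool size below are placeholders, and the submitted no-op task stands in for the actual delete work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExpireSnapshotsPoolSketch {

  // Mirrors the factory in the PR diff: indexed, named daemon threads.
  static ExecutorService newDeletePool(String tableName, int numThreads) {
    AtomicInteger idx = new AtomicInteger(0);
    return Executors.newFixedThreadPool(numThreads, runnable -> {
      Thread t = new Thread(runnable,
          "iceberg-delete-" + tableName + "-" + idx.getAndIncrement());
      t.setDaemon(true);
      return t;
    });
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = newDeletePool("store_sales_delete_9", 4);
    try {
      // In the real handler this would be something like:
      //   table.expireSnapshots().executeDeleteWith(pool).commit();
      // Stand-in task for the sketch:
      pool.submit(() -> { });
    } finally {
      // Shut the pool down after the operation so repeated
      // expire_snapshots executions do not leak thread pools.
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.MINUTES);
    }
    System.out.println("pool terminated: " + pool.isTerminated());
  }
}
```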





Issue Time Tracking
-------------------

    Worklog Id:     (was: 861106)
    Time Spent: 0.5h  (was: 20m)

> Expiring old snapshots deletes files with DirectExecutorService causing 
> runtime delays
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-27325
>                 URL: https://issues.apache.org/jira/browse/HIVE-27325
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: iceberg, pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Expiring old snapshots takes a lot of time, as fileCleanupStrategy internally 
> uses directExecutorService. Filing this as a placeholder ticket to fix it. If 
> it is fixed in Iceberg, the library needs to be upgraded here.
> {noformat}
> insert into store_sales_delete_9 select *, current_timestamp() as ts from 
> tpcds_1000_update.ssv;
> ALTER TABLE store_sales_delete_9 EXECUTE expire_snapshots('2023-05-09 
> 00:00:00');
> {noformat}
> {noformat}
>       at 
> org.apache.iceberg.relocated.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
>       at org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:300)
>       at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:194)
>       at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189)
>       at 
> org.apache.iceberg.FileCleanupStrategy.deleteFiles(FileCleanupStrategy.java:84)
>       at 
> org.apache.iceberg.IncrementalFileCleanup.cleanFiles(IncrementalFileCleanup.java:262)
>       at 
> org.apache.iceberg.RemoveSnapshots.cleanExpiredSnapshots(RemoveSnapshots.java:338)
>       at org.apache.iceberg.RemoveSnapshots.commit(RemoveSnapshots.java:312)
>       at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.executeOperation(HiveIcebergStorageHandler.java:560)
>       at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6844)
>       at 
> org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
>       at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
>       at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>       at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>       at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360)
>       at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333)
>       at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250)
>       at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111)
>       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541)
>       at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>       at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
>       at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>       at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>       at java.security.AccessController.doPrivileged(java.base@11.0.19/Native 
> Method)
>       at javax.security.auth.Subject.doAs(java.base@11.0.19/Subject.java:423)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>       at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
>       at 
> java.util.concurrent.FutureTask.run(java.base@11.0.19/FutureTask.java:264)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
