[jira] [Commented] (TEZ-4604) Hive compaction in Tez does not delete files under staging directory

Shohei Okumiya (Jira) Tue, 25 Feb 2025 07:28:10 -0800


    [ 
https://issues.apache.org/jira/browse/TEZ-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930375#comment-17930375
 ]


Shohei Okumiya commented on TEZ-4604:
-------------------------------------

[~gaya] Thanks for detailing your investigation and observations. We might need 
more information to determine what exactly happened.

What types of tables did you try to compact? Hive can compact both Hive ACID 
tables and Iceberg tables, so it would help to know which kind was involved. 
The DDL and DML needed to reproduce the issue would also be helpful.

What differs between "1. When using Mapreduce" and "2. When using Tez"? It 
would help to know which parameters changed between the two runs; ideally, 
please share all of your Hive, Tez, and MapReduce parameters.

> Hive compaction in Tez does not delete files under staging directory
> --------------------------------------------------------------------
>
>                 Key: TEZ-4604
>                 URL: https://issues.apache.org/jira/browse/TEZ-4604
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Hiroyuki Nagaya
>            Priority: Critical
>
> I am using a combination of Hadoop, Hive and Tez.
> When I run major compaction with Hive, files under the staging directory are 
> not deleted.
> With Mapreduce, files are deleted from the staging directory and files are 
> created in the history directory.
> Hadoop 3.3.6
> Hive 4.0.1
> Tez 0.10.4
> *1. When using Mapreduce*
> The following data will be deleted.
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.jar
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.split
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.splitmetainfo
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.xml
> Historical data will be created in the following directories
> /tmp/hadoop-yarn/staging/history/done
> *2. When using Tez*
> The following data will not be deleted
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/.tez
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.jar
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.split
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.splitmetainfo
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.xml
> No historical data will be created.
> *Is it a bug that the following directories are not deleted?*
> *Or is it a Tez configuration problem?*
> *I would like it to be deleted because the process has been completed 
> successfully and it is about 80MB in size.*
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
