[jira] [Updated] (TEZ-4604) Hive compaction in Tez does not delete files under staging directory

Hiroyuki Nagaya (Jira) Wed, 19 Feb 2025 23:26:08 -0800


     [ 
https://issues.apache.org/jira/browse/TEZ-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hiroyuki Nagaya updated TEZ-4604:
---------------------------------
    Description: 
I am using Hadoop, Hive, and Tez together.
When I run a major compaction in Hive with Tez, the files under the staging 
directory are not deleted.
With MapReduce, the files are deleted from the staging directory and history 
files are created in the history directory.

Hadoop 3.3.6
Hive 4.0.1
Tez 0.10.4

*1. When using MapReduce*

The following files are deleted:

/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.jar
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.split
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.splitmetainfo
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.xml

History data is created in the following directory:
/tmp/hadoop-yarn/staging/history/done

*2. When using Tez*

The following files are not deleted:

/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/.tez
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.jar
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.split
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.splitmetainfo
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.xml

No history data is created.

*Is it a bug that the following directory is not deleted, or is it a Tez 
configuration problem?*
*I would like it to be deleted because the job completed successfully and the 
directory is about 80 MB in size.*
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002
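
For anyone trying to reproduce this, the following is a minimal sketch of how the leftover staging directories could be identified from a list of paths. It is not part of Tez or Hive. The function name `group_staging_paths` and the regular expression are illustrative assumptions, and the input is assumed to be a paths-only recursive listing of the staging root (for example, collected with something like `hdfs dfs -ls -C -R`). The sample input is the Tez listing above.

```python
import re
from collections import defaultdict

# Assumed layout: <anything>/.staging/job_<cluster timestamp>_<sequence>[/<child>]
JOB_DIR = re.compile(r"^(?P<root>.*/\.staging/job_\d+_\d+)(?:/(?P<child>.+))?$")


def group_staging_paths(paths):
    """Group listed paths by the job staging directory they belong to."""
    jobs = defaultdict(list)
    for path in paths:
        match = JOB_DIR.match(path.strip())
        if match is None:
            continue  # not inside a job staging directory
        root = match.group("root")
        jobs[root]  # register the directory even if it has no children
        if match.group("child"):
            jobs[root].append(match.group("child"))
    return dict(jobs)


# Paths taken from the Tez listing in this report.
listing = [
    "/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002",
    "/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/.tez",
    "/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.jar",
    "/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.split",
    "/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.splitmetainfo",
    "/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.xml",
]

leftovers = group_staging_paths(listing)
for root, children in leftovers.items():
    print(root, sorted(children))
```

Any directory this reports after the compaction job has finished is a candidate leftover. Whether it is safe to delete depends on the job no longer running, which this sketch does not check.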


> Hive compaction in Tez does not delete files under staging directory
> --------------------------------------------------------------------
>
>                 Key: TEZ-4604
>                 URL: https://issues.apache.org/jira/browse/TEZ-4604
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Hiroyuki Nagaya
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
