[ https://issues.apache.org/jira/browse/TEZ-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hiroyuki Nagaya updated TEZ-4604:
---------------------------------
Description:
I am using a combination of Hadoop, Hive and Tez.
When I run a major compaction with Hive on Tez, files under the staging
directory are not deleted.
With MapReduce, the files are deleted from the staging directory and history
files are created in the history directory.
Hadoop 3.3.6
Hive 4.0.1
Tez 0.10.4
*1. When using MapReduce*
The following data is deleted:
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.jar
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.split
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.splitmetainfo
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.xml
History data is created in the following directory:
/tmp/hadoop-yarn/staging/history/done
*2. When using Tez*
The following data is not deleted:
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/.tez
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.jar
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.split
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.splitmetainfo
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.xml
No history data is created.
*Is it a bug that the following directory is not deleted?*
*Or is it a Tez configuration problem?*
*I would like it to be deleted because the job completed
successfully and the directory is about 80 MB in size.*
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002
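To make the leftover files easier to report, the staging paths listed above can be grouped by job ID. The following is only a minimal sketch: the function name `group_by_job` and the regular expression are hypothetical helpers written for this illustration, not part of Hadoop, Hive or Tez, and it assumes the input is a plain list of paths (for example, the last column of `hdfs dfs -ls -R` output on the staging directory).

```python
import re
from collections import defaultdict

# Hypothetical pattern for this sketch: a job directory directly under a
# ".staging" parent, optionally followed by a file or subdirectory name.
JOB_PATH = re.compile(r"^.*/\.staging/(?P<job_id>job_\d+_\d+)(?:/(?P<rest>.*))?$")


def group_by_job(paths):
    """Group staging paths by job ID.

    Returns a dict mapping each job ID to the list of entries found
    below that job's staging directory (empty if only the directory
    itself was listed). Paths that do not match the pattern are ignored.
    """
    jobs = defaultdict(list)
    for path in paths:
        match = JOB_PATH.match(path.strip())
        if match is None:
            continue
        entries = jobs[match.group("job_id")]  # registers the job even with no files
        if match.group("rest"):
            entries.append(match.group("rest"))
    return dict(jobs)


if __name__ == "__main__":
    leftover = [
        "/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002",
        "/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/.tez",
        "/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.jar",
    ]
    for job_id, entries in group_by_job(leftover).items():
        print(job_id, entries)
```

Any job ID that still appears in the result after the job has finished corresponds to a staging directory that was not cleaned up.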
> Hive compaction in Tez does not delete files under staging directory
> --------------------------------------------------------------------
>
> Key: TEZ-4604
> URL: https://issues.apache.org/jira/browse/TEZ-4604
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Hiroyuki Nagaya
> Priority: Critical