[https://issues.apache.org/jira/browse/TEZ-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931115#comment-17931115]
Shohei Okumiya edited comment on TEZ-4604 at 2/27/25 11:13 AM:
---------------------------------------------------------------
It looks like
[MRAppMaster#cleanupStagingDir|https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java#L616-L634]
removes a staging directory on HDFS.
{code:java}
2025-02-27 10:42:13,900 INFO [Thread-74] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://zookage /user/hdfs/.staging/job_1740651771156_0003
2025-02-27 10:42:13,903 INFO [Thread-74] org.apache.hadoop.ipc.Server: Stopping server on 43939
{code}
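For illustration only, here is a minimal local-filesystem sketch of what that log line implies: the whole job staging directory, including job.jar, job.split, job.splitmetainfo and job.xml, is removed recursively. This is not Hadoop code. The class and method names (StagingDirCleanupSketch, deleteRecursively, simulateMrCleanup) are hypothetical, and java.nio.file on a temp directory stands in for HDFS, which real code would reach through Hadoop's FileSystem API.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem sketch (NOT Hadoop's implementation): removes a
// whole job staging directory, i.e. the effect the MRAppMaster log line reports.
public class StagingDirCleanupSketch {

    /** Deletes dir and everything under it; returns how many entries were removed. */
    static int deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return 0;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order visits children before their parents,
            // so every directory is already empty when it is deleted.
            List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
            return paths.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a job staging dir shaped like the one in the issue, then deletes it as a whole. */
    static int simulateMrCleanup() {
        try {
            // The temp dir name gets a random suffix appended to this prefix.
            Path jobDir = Files.createTempDirectory("job_1740651771156_0003");
            for (String name : new String[] {"job.jar", "job.split", "job.splitmetainfo", "job.xml"}) {
                Files.createFile(jobDir.resolve(name));
            }
            int removed = deleteRecursively(jobDir);
            if (Files.exists(jobDir)) {
                throw new IllegalStateException("staging dir still exists: " + jobDir);
            }
            return removed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The job dir itself plus its four files are removed.
        System.out.println("removed entries: " + simulateMrCleanup());
    }
}
{code}

Running main prints "removed entries: 5": the directory itself plus its four files. Nothing under the job staging directory survives.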
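MR on Tez creates a subdirectory suffixed with `.tez/\{app id}` under the job staging directory, and the DAGAppMaster log reports deleting only that subdirectory: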
{code:java}
2025-02-27 10:58:46,155 [INFO] [main] |recovery.RecoveryService|: RecoveryService initialized with recoveryPath=hdfs://zookage/user/hdfs/.staging/job_1740653778782_0001/.tez/application_1740653778782_0001/recovery/1, bufferSize(bytes)=8192, flushInterval(s)=30, maxUnflushedEvents=100
2025-02-27 10:58:56,542 [INFO] [AMShutdownThread] |app.DAGAppMaster|: Completed deletion of tez scratch data dir, path=hdfs://zookage/user/hdfs/.staging/job_1740653778782_0001/.tez/application_1740653778782_0001
{code}
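A hypothetical local-filesystem sketch of the Tez case, assuming the layout shown in the log above: only <job dir>/.tez/<application id> is deleted, so the job's own files and the now-empty .tez directory stay behind. This matches the leftover listing in the issue description. It is not Tez code. java.nio.file on a temp directory stands in for HDFS, and TezScratchDirSketch and simulateTezCleanup are illustrative names, not Tez APIs.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem sketch (NOT Tez code): mirrors the layout in the
// DAGAppMaster log, where only <job dir>/.tez/<application id> is deleted.
public class TezScratchDirSketch {

    /** Deletes dir and everything under it, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
        }
    }

    /** Returns the sorted names left directly under the job staging dir after the scratch dir is removed. */
    static List<String> simulateTezCleanup() {
        try {
            Path jobDir = Files.createTempDirectory("job_1740653778782_0001");
            for (String name : new String[] {"job.jar", "job.split", "job.splitmetainfo", "job.xml"}) {
                Files.createFile(jobDir.resolve(name));
            }
            // Tez scratch dir: <job dir>/.tez/<application id>, with recovery data below it.
            Path scratch = jobDir.resolve(".tez").resolve("application_1740653778782_0001");
            Files.createDirectories(scratch.resolve("recovery").resolve("1"));

            // The scratch dir's grandparent is the job staging dir, which is not touched.
            if (!scratch.getParent().getParent().equals(jobDir)) {
                throw new IllegalStateException("unexpected layout");
            }
            deleteRecursively(scratch); // only the Tez scratch data dir is removed

            List<String> left;
            try (Stream<Path> s = Files.list(jobDir)) {
                left = s.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
            }
            deleteRecursively(jobDir); // tidy up the temp dir used by this sketch
            return left;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulateTezCleanup());
    }
}
{code}

Running main prints [.tez, job.jar, job.split, job.splitmetainfo, job.xml]: the same entries the reporter finds left behind.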
> Hive compaction in Tez does not delete files under staging directory
> --------------------------------------------------------------------
>
> Key: TEZ-4604
> URL: https://issues.apache.org/jira/browse/TEZ-4604
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Hiroyuki Nagaya
> Priority: Critical
> Attachments: createTable.sql.txt, hive-changed.xml,
> hive-default.xml.template, mapred-site.xml, tez-changed.xml,
> tez-default-template.xml
>
>
> I am using a combination of Hadoop, Hive and Tez.
> When I run major compaction with Hive, files under the staging directory are
> not deleted.
> With MapReduce, the files are deleted from the staging directory and history
> files are created in the history directory.
> Hadoop 3.3.6
> Hive 4.0.1
> Tez 0.10.4
> *1. When using MapReduce*
> The following files are deleted:
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.jar
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.split
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.splitmetainfo
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.xml
> History data is created in the following directory:
> /tmp/hadoop-yarn/staging/history/done
> *2. When using Tez*
> The following files are not deleted:
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/.tez
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.jar
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.split
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.splitmetainfo
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.xml
> No history data is created.
> *Is it a bug that the following directory is not deleted?*
> *Or is it a Tez configuration problem?*
> *I would like it to be deleted, because the job completed
> successfully and the directory is about 80 MB in size.*
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002
--
This message was sent by Atlassian Jira
(v8.20.10#820010)