[
https://issues.apache.org/jira/browse/TEZ-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991035#comment-13991035
]
Mohammad Kamrul Islam commented on TEZ-693:
-------------------------------------------
Sharing some findings and a proposed approach for handling them before
implementation.
Feedback is needed.
Current details of files written in staging or current_working dir:
1. Recovery files created in RecoveryService.java :
RECOVERY_BASE_PATH --> $STAGE_DIR/APP_ID/recovery/attempt_id/
A) App-specific summary file: $RECOVERY_BASE_PATH/APP_ID.summary
B) DAG-specific recovery file: $RECOVERY_BASE_PATH/DAG_ID.recovery
Proposal:
-------------
Remove these files in RecoveryService.serviceStop()
2. Configuration created in TezClientUtils.java
Configuration file : $STAGE_DIR/tez-conf.pb.APP_ID
Proposal:
-------------
Client has to remove this file at the end.
3. Session jars created in TezClientUtils.java
Session jars as local resource file :
$STAGE_DIR/tez.session.local-resources.pb.file-name.APP_ID
Proposal:
-------------
Client has to remove this file at the end.
4. DAG plan created in TezClientUtils.java
DAG plan binary file : $STAGE_DIR/tez-dag.pb.APP_ID
DAG plan text file : $STAGE_DIR/tez-dag.pb.txtAPP_ID
Proposal:
-------------
Client has to remove these files at the end.
5. Local file created in DAGAppMaster.java
DAG plan text file : $CWD/tez-dag.pb.txt
Proposal:
-------------
Remove it in AMShutdownRunnable or when a DAG completes.
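The cleanup proposed for items 1 to 4 could be sketched roughly as below. This is only an illustration under stated assumptions: it uses java.nio.file on a local directory so that it runs standalone, whereas the real code would presumably go through Hadoop's FileSystem.delete(Path, boolean). The class and method names (StagingCleanupSketch, clientFiles, removeClientFiles, removeRecoveryDir) are hypothetical and are not existing Tez APIs. File names follow the list above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Tez: illustrates which paths would be removed.
final class StagingCleanupSketch {

    // Items 2-4: files the client writes directly under $STAGE_DIR.
    static List<Path> clientFiles(Path stageDir, String appId) {
        List<Path> files = new ArrayList<>();
        files.add(stageDir.resolve("tez-conf.pb." + appId));
        files.add(stageDir.resolve("tez.session.local-resources.pb.file-name." + appId));
        files.add(stageDir.resolve("tez-dag.pb." + appId));
        files.add(stageDir.resolve("tez-dag.pb.txt" + appId));
        return files;
    }

    // Item 1: $STAGE_DIR/APP_ID/recovery/attempt_id/
    static Path recoveryBasePath(Path stageDir, String appId, int attemptId) {
        return stageDir.resolve(appId).resolve("recovery")
                .resolve(Integer.toString(attemptId));
    }

    // Client-side cleanup; returns how many files were actually removed.
    static int removeClientFiles(Path stageDir, String appId) throws IOException {
        int removed = 0;
        for (Path f : clientFiles(stageDir, appId)) {
            if (Files.deleteIfExists(f)) {
                removed++;
            }
        }
        return removed;
    }

    // AM-side cleanup (assumed to be called from something like serviceStop()):
    // removes the recovery directory of one attempt and everything below it.
    static void removeRecoveryDir(Path stageDir, String appId, int attemptId)
            throws IOException {
        Path root = recoveryBasePath(stageDir, appId, attemptId);
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so children are deleted before their parent dirs.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Splitting the client-side and AM-side paths into separate methods mirrors the proposals above, where items 2 to 4 are cleaned up by the client and item 1 by the AM.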
Sample listing of the staging dir:
{noformat}
$ hadoop fs -lsr /user/mislam/.staging/application_1396311427728_22000
lsr: DEPRECATED: Please use 'ls -R' instead.
drwx------ - mislam mislam 0 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/application_1396311427728_22000
drwx------ - mislam mislam 0 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/application_1396311427728_22000/recovery
drwx------ - mislam mislam 0 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/application_1396311427728_22000/recovery/1
-rw-r----- 3 mislam mislam 228 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/application_1396311427728_22000/recovery/1/application_1396311427728_22000.summary
-rw-r----- 3 mislam mislam 110088 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/application_1396311427728_22000/recovery/1/dag_1396311427728_22000_1.recovery
-rw-r--r-- 3 mislam mislam 249 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/tez-conf.pb.application_1396311427728_22000
-rw-r--r-- 3 mislam mislam 3132 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/tez.session.local-resources.pb.file-name.application_1396311427728_22000
{noformat}
> Deletion of DAG specific data after DAG completion
> --------------------------------------------------
>
> Key: TEZ-693
> URL: https://issues.apache.org/jira/browse/TEZ-693
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Mohammad Kamrul Islam
>
> Currently the client uploads some dag specific data to a remote directory
> specified by the user. The burden is on the client to clean this data after
> the dag completes. The post dag completion code in the AM should be able to
> clean this custom uploaded data.