[ 
https://issues.apache.org/jira/browse/TEZ-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991035#comment-13991035
 ] 

Mohammad Kamrul Islam commented on TEZ-693:
-------------------------------------------

Sharing some findings and proposing an approach for handling the cleanup 
before starting the implementation.
Feedback is welcome.

Files currently written to the staging dir or the current working dir:
1.  Recovery files created in RecoveryService.java :

RECOVERY_BASE_PATH --> $STAGE_DIR/APP_ID/recovery/attempt_id/

A) App-specific summary file: $RECOVERY_BASE_PATH/APP_ID.summary

B) DAG-specific recovery file: $RECOVERY_BASE_PATH/DAG_ID.recovery

Proposal:
-------------
Remove these files in RecoveryService.serviceStop()
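
A rough sketch of what this cleanup could look like. It uses java.nio on a local directory as a stand-in, while the real change would go through the Hadoop FileSystem API (for example FileSystem.delete(path, true)). The class and method names here are hypothetical, not existing Tez code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only; not part of Tez.
public class RecoveryCleanupSketch {

    // Builds $STAGE_DIR/APP_ID/recovery/attempt_id as described in item 1.
    static Path recoveryBasePath(Path stageDir, String appId, int attemptId) {
        return stageDir.resolve(appId)
                       .resolve("recovery")
                       .resolve(Integer.toString(attemptId));
    }

    // Deletes a directory tree and returns the number of entries removed.
    // Local-filesystem stand-in for a recursive delete on the staging dir.
    static int deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return 0;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            List<Path> entries = walk.sorted(Comparator.reverseOrder())
                                     .collect(Collectors.toList());
            for (Path p : entries) {
                Files.delete(p);
            }
            return entries.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Values taken from the sample staging listing in this comment.
        String appId = "application_1396311427728_22000";
        Path stageDir = Paths.get("/user/mislam/.staging", appId);
        System.out.println(recoveryBasePath(stageDir, appId, 1));
    }
}
```

Deleting the whole recovery directory rather than individual files would also cover any per-DAG recovery files created during the attempt.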

2. Configuration created at TezClientUtils.java

Configuration file : $STAGE_DIR/tez-conf.pb.APP_ID

Proposal: 
-------------
Client has to remove this file at the end. 

3. Session Jars created at TezClientUtils.java

Session jars as local resource file : 
$STAGE_DIR/tez.session.local-resources.pb.file-name.APP_ID


Proposal: 
-------------
Client has to remove this file at the end.

4. DAG plan created at TezClientUtils.java

DAG plan binary file : $STAGE_DIR/tez-dag.pb.APP_ID

DAG plan text file : $STAGE_DIR/tez-dag.pb.txtAPP_ID


Proposal: 
-------------
Client has to remove these files at the end.
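
To make items 2 to 4 concrete, here is a small sketch that lists the client-side staging files for a given STAGE_DIR and APP_ID. The file name patterns are copied verbatim from the items above, and the class and method names are hypothetical, not existing Tez code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Tez.
public class StagingFilesSketch {

    // Returns the staging files from items 2 to 4 that the client would
    // have to remove. Name patterns are taken as written in this comment.
    static List<String> clientStagingFiles(String stageDir, String appId) {
        return Arrays.asList(
            stageDir + "/tez-conf.pb." + appId,
            stageDir + "/tez.session.local-resources.pb.file-name." + appId,
            stageDir + "/tez-dag.pb." + appId,
            stageDir + "/tez-dag.pb.txt" + appId);
    }

    public static void main(String[] args) {
        // Values taken from the sample staging listing in this comment.
        String appId = "application_1396311427728_22000";
        String stageDir = "/user/mislam/.staging/" + appId;
        for (String file : clientStagingFiles(stageDir, appId)) {
            System.out.println(file);
        }
    }
}
```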

5. Local file at DAGAppMaster.java
       
 DAG plan text file : $CWD/tez-dag.pb.txt


Proposal: 
-------------
Remove it in AMShutdownRunnable or when a DAG completes


Sample listing of the staging dir:
{noformat}
$ hadoop fs -lsr /user/mislam/.staging/application_1396311427728_22000
lsr: DEPRECATED: Please use 'ls -R' instead.
drwx------   - mislam mislam          0 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/application_1396311427728_22000
drwx------   - mislam mislam          0 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/application_1396311427728_22000/recovery
drwx------   - mislam mislam          0 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/application_1396311427728_22000/recovery/1
-rw-r-----   3 mislam mislam        228 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/application_1396311427728_22000/recovery/1/application_1396311427728_22000.summary
-rw-r-----   3 mislam mislam     110088 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/application_1396311427728_22000/recovery/1/dag_1396311427728_22000_1.recovery
-rw-r--r--   3 mislam mislam        249 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/tez-conf.pb.application_1396311427728_22000
-rw-r--r--   3 mislam mislam       3132 2014-05-06 18:05 /user/mislam/.staging/application_1396311427728_22000/tez.session.local-resources.pb.file-name.application_1396311427728_22000
{noformat}



> Deletion of DAG specific data after DAG completion
> --------------------------------------------------
>
>                 Key: TEZ-693
>                 URL: https://issues.apache.org/jira/browse/TEZ-693
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Mohammad Kamrul Islam
>
> Currently the client uploads some dag specific data to a remote directory 
> specified by the user. The burden is on the client to clean this data after 
> the dag completes. The post dag completion code in the AM should be able to 
> clean this custom uploaded data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
