[ 
https://issues.apache.org/jira/browse/TEZ-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002187#comment-14002187
 ] 

Hitesh Shah commented on TEZ-1106:
----------------------------------

bq. They are mainly used in ensureStagingDirExists() code. Do you propose to 
move it into TezCommonUtils? Btw this method is called by almost all examples 
(12 references).

I was just looking to have all the code that handles dir creation and 
permission setting in a single place. Let's leave the wrapper function in 
TezClientUtils for now. 

bq. can any of these values ever be null?
Wanted to ensure that we do not hit an NPE in any scenario when someone enables 
debug logging. 
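
To illustrate the NPE concern, here is a minimal sketch of a null-safe debug 
message. The class and method names are hypothetical and are not taken from the 
patch; the point is only that any value dereferenced while building the log 
string gets a null check first.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;

// Hypothetical sketch, not code from the TEZ-1106 patch.
class NullSafeDebugSketch {

    // Builds a debug string without dereferencing a possibly-null value.
    static String describe(Path stagingDir, String appId) {
        // Calling stagingDir.toAbsolutePath() directly would throw an NPE on null.
        String dir = (stagingDir == null)
                ? "<unset>"
                : stagingDir.toAbsolutePath().toString();
        return "stagingDir=" + dir + ", appId=" + Objects.toString(appId, "<unset>");
    }

    public static void main(String[] args) {
        System.out.println(describe(null, "app_1"));
        System.out.println(describe(Paths.get("staging"), null));
    }
}
```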

bq. Currently looks like there is no expectation. User can (not must) define it 
in the conf or TEZ default is applied.
For this, yes, let's discuss in TEZ-792. 

bq. I'm not sure which one will be better. Do you have any preference? I 
believe in Hadoop 1.x, the framework deletes the job specific stuffs during 
cleanup not the root stage dir. (Related JIRA: TEZ-693).

Again, I guess this boils down to how a user will use the staging dir. Is a 
user handing off control of the staging dir to Tez, or just providing a 
location that Tez can use for writing temporary data? In any case, deleting 
the appId-specific subdir might be the safest approach for now. However, 
it becomes a tricky question of how to allow a user to delete all their scratch 
data when the Tez AM completes without requiring the client to be always 
running to remove this data on AM completion. 
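
As a rough sketch of the "delete only the appId-specific subdir" option, the 
snippet below builds the layout proposed in the issue description and removes 
one application's subtree while leaving the root staging dir alone. It uses 
java.nio.file on a local temp directory so it can run standalone; the actual 
Tez code would go through the Hadoop FileSystem API instead. The class name, 
helper names, and app ids are all made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch, not code from the TEZ-1106 patch.
class StagingDirSketch {

    // $STAGE_DIR/<APP_ID>/
    static Path appDir(Path stageDir, String appId) {
        return stageDir.resolve(appId);
    }

    // $STAGE_DIR/<APP_ID>/recovery/<attempt number>/
    static Path recoveryDir(Path stageDir, String appId, int attemptNum) {
        return appDir(stageDir, appId)
                .resolve("recovery")
                .resolve(Integer.toString(attemptNum));
    }

    // $STAGE_DIR/<APP_ID>/conf/
    static Path confDir(Path stageDir, String appId) {
        return appDir(stageDir, appId).resolve("conf");
    }

    // Removes only the appId subtree; the root staging dir and any
    // sibling application dirs are left untouched.
    static void deleteAppDir(Path stageDir, String appId) {
        Path target = appDir(stageDir, appId);
        if (!Files.exists(target)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(target)) {
            // Reverse order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path stageDir = Files.createTempDirectory("tez-staging-sketch");
        String appId = "app_1"; // made-up id for the demo
        Files.createDirectories(recoveryDir(stageDir, appId, 1));
        Files.createDirectories(confDir(stageDir, appId));

        deleteAppDir(stageDir, appId);

        System.out.println("root exists: " + Files.exists(stageDir));
        System.out.println("app dir exists: " + Files.exists(appDir(stageDir, appId)));
        Files.delete(stageDir); // clean up the demo's temp root
    }
}
```

This only covers cleanup by whoever calls deleteAppDir(); it does not address 
the open question of who removes the data when the client is no longer running 
at AM completion.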




> Tez framework should use a unique subdir when creating new files in staging  
> -----------------------------------------------------------------------------
>
>                 Key: TEZ-1106
>                 URL: https://issues.apache.org/jira/browse/TEZ-1106
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Mohammad Kamrul Islam
>         Attachments: TEZ-1106.1.patch, TEZ-1106.2.patch, TEZ-1106.3.patch
>
>
> Currently the files are created in different sub-directories, which makes 
> them hard to manage and clean up at the end.
> The proposal is to create a new subdir  : $STAGE_DIR/<APP_ID>/
> All recovery files will go under  : $STAGE_DIR/<APP_ID>/recovery/<attemp_num>/
> All confs will go under:  $STAGE_DIR/<APP_ID>/conf/
> All dagplans will go under:  $STAGE_DIR/<APP_ID>/dag_id/plan/



--
This message was sent by Atlassian JIRA
(v6.2#6252)
