[
https://issues.apache.org/jira/browse/TEZ-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000516#comment-14000516
]
Hitesh Shah commented on TEZ-1106:
----------------------------------
Comments:
- set permissions on staging dir related code in TezClientUtils.java could be
moved into the same class as other helper functions
- it is questionable whether we should be doing the set permissions for
the top level dir in the first place. May want to file a separate jira to
remove this code. Setting permissions is especially not needed if all tez data
generated by the framework is going into a special sub dir. +
{code}
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Stage directory information for AppId :" + appId + "
tezSysStagingPath :"
+ + tezSysStagingPath + " binaryConfPath :" + binaryConfPath + "
sessionJarsPath :"
+ + sessionJarsPath + " binaryPlanPath :" + binaryPath);
+ }
{code}
- can any of these values ever be null?
- TezCommonUtils:: getTezBaseStagingPath() vs getTezSystemStagingPath()
- What is the expectation for a user when configuring the staging dir?
- When the staging dir is deleted by the AM, what is deleted? the .tez
subdir or the base dir?
- for all the get*Path functions, it might be simpler to just do "return new
Path(..."
- In some get*Path functions, we have String dagId. In most, we have
ApplicationId as a param even though what is really needed is appIdStr. Maybe
accept appIdStr instead of converting to string multiple times?
- why do some functions do getFileSystem() and resolvePath() whereas others
just do simple string appends to create a path? What is the general behavior?
Also, please add javadocs for all newly added functions.
- all functions are public - do some need to change to private? Also are any
functions in the TezCommonUtils class meant to be used by user code or just by
internal framework code? Please use private/public annotations as needed.
- AppContext:: getTezSystemStagingPath() - is this being used anywhere?
- getTezSystemStagingPath() - always calls mkdirs without checking existence
of dir. Is this intentional?
> Tez framework should use a unique subdir when creating new files in staging
> -----------------------------------------------------------------------------
>
> Key: TEZ-1106
> URL: https://issues.apache.org/jira/browse/TEZ-1106
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Mohammad Kamrul Islam
> Assignee: Mohammad Kamrul Islam
> Attachments: TEZ-1106.1.patch, TEZ-1106.2.patch, TEZ-1106.3.patch
>
>
> Currently the files are created in different sub-directories. It is hard to
> manage and cleanup at the end.
> The proposal is to create a new subdir : $STAGE_DIR/<APP_ID>/
> All recovery files will go under : $STAGE_DIR/<APP_ID>/recovery/<attemp_num>/
> All confs will go under: $STAGE_DIR/<APP_ID>/conf/
> All dagplans will go: $STAGE_DIR/<APP_ID>/dag_id/plan/
--
This message was sent by Atlassian JIRA
(v6.2#6252)