[
https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797068#action_12797068
]
Vinod K V commented on MAPREDUCE-1302:
--------------------------------------
I've looked at your latest patch. Some comments I have on the patch:
TaskTracker.java
- {{#getMRAsyncDiskService()}} method is not used anywhere. Can we remove it?
- We should add javadoc of {{#cleanupStorage()}} method as to what one should
use instead of this deprecated method.
- The method calls {{this.distributedCacheManager.purgeCache()}} and
{{asyncDiskService.moveAndDeleteAllVolumes()}} (at +645-46) are not needed at
all because there is a {{asyncDiskService.moveAndDeleteFromEachVolume(SUBDIR)}}
(at +575) itself that completely clears the taskTracker subdir. They can be
removed.
MRAsyncDiskService.java
- There is an unused import of Configuration which can be removed.
- Annotate MRAsyncDiskService as @InterfaceAudience.Private as it is for
internal use only.
- Please change the java comment of the class to javadoc. Also we should make
it clear that clients of this service should NOT write files themselves in the
TOBEDELTED directory in any of the volumes or risk losing such files.
- Documentation (javadoc) is needed as to how {{MRAsyncDiskService(JobConf
conf)}} uses the 'conf' parameter.
- Can we rename {{moveAndDeleteAllVolumes()}} to something like
{{cleanupAllVolumes()}} or similar. This is because this method is only moving
and deleting the contents of the volumes and not the volumes themselves.
- To be more clear, can we also rename {{moveAndDelete(String
absolutePathName)}} to {{moveAndDeleteAbsolutePath(String absolutePathName)}}
and {{moveAndDelete(String volume, String pathName)}} to
{{moveAndDeleteRelativePath(String volume, String pathName)}}?
- In any case, we should make the absolute/relative difference explicit in the
javadoc of both the methods.
- {{getRelativePathName()}} is bit buggy in that it will return null if any
one or both of 'volume' string and 'absolutePathName' string have multiple
Path.SEPARATOR characters in them. We should construct normalized paths before
working on them.
- Similar to above, we should make sure that paths/path fragments are
normalized before their comparision in moveAndDeleteAllVolumes() at +289 and
before passing to DeleteTask at +246 {{moveAndDelete(String volume, String
pathName)}}
- Javadoc of {{moveAndDeleteAllVolumes()}} needs to be fixed, it looks
incomplete now
- Nit: Need to fix the comment at +288 to use TOBEDELETED instead of SUBDIR
- Not related to this JIRA: I think that uniqueId generation should be static
and be valid across different MRAsyncDiskService objects. We can file another
issue to fix this if you agree that it is a bug here.
JobConf
- When deprecating Jobconf.deleteLocalFiles(), we should also modify javadoc
to state that MRAsyncDiskService should be used instead.
- What about JobConf.deleteLocalFiles(String subDir) and its callers? Another
JIRA?
Testing:
- Even though the inline removal of files is tested by various other
test-cases, we haven't test the actual deletion anywhere. For this, we can
refactor the DeleteTask part of moveAndDelete(String, String) into a new method
and can then override it with inline deletion to assert the actual deletion of
files by verifying the contents of TOBEDELETED directory.
> TrackerDistributedCacheManager can delete file asynchronously
> -------------------------------------------------------------
>
> Key: MAPREDUCE-1302
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: tasktracker
> Affects Versions: 0.20.2, 0.21.0, 0.22.0
> Reporter: Zheng Shao
> Assignee: Zheng Shao
> Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch,
> MAPREDUCE-1302.2.patch, MAPREDUCE-1302.3.patch
>
>
> With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to
> delete files from distributed cache asynchronously.
> That will help make task initialization faster, because task initialization
> calls the code that localizes files into the cache and may delete some other
> files.
> The deletion can slow down the task initialization speed.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.