[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797068#action_12797068 ]
Vinod K V commented on MAPREDUCE-1302: -------------------------------------- I've looked at your latest patch. Some comments I have on the patch: TaskTracker.java - {{#getMRAsyncDiskService()}} method is not used anywhere. Can we remove it? - We should add javadoc of {{#cleanupStorage()}} method as to what one should use instead of this deprecated method. - The method calls {{this.distributedCacheManager.purgeCache()}} and {{asyncDiskService.moveAndDeleteAllVolumes()}} (at +645-46) are not needed at all because there is a {{asyncDiskService.moveAndDeleteFromEachVolume(SUBDIR)}} (at +575) itself that completely clears the taskTracker subdir. They can be removed. MRAsyncDiskService.java - There is an unused import of Configuration which can be removed. - Annotate MRAsyncDiskService as @InterfaceAudience.Private as it is for internal use only. - Please change the java comment of the class to javadoc. Also we should make it clear that clients of this service should NOT write files themselves in the TOBEDELTED directory in any of the volumes or risk losing such files. - Documentation (javadoc) is needed as to how {{MRAsyncDiskService(JobConf conf)}} uses the 'conf' parameter. - Can we rename {{moveAndDeleteAllVolumes()}} to something like {{cleanupAllVolumes()}} or similar. This is because this method is only moving and deleting the contents of the volumes and not the volumes themselves. - To be more clear, can we also rename {{moveAndDelete(String absolutePathName)}} to {{moveAndDeleteAbsolutePath(String absolutePathName)}} and {{moveAndDelete(String volume, String pathName)}} to {{moveAndDeleteRelativePath(String volume, String pathName)}}? - In any case, we should make the absolute/relative difference explicit in the javadoc of both the methods. - {{getRelativePathName()}} is bit buggy in that it will return null if any one or both of 'volume' string and 'absolutePathName' string have multiple Path.SEPARATOR characters in them. We should construct normalized paths before working on them. - Similar to above, we should make sure that paths/path fragments are normalized before their comparision in moveAndDeleteAllVolumes() at +289 and before passing to DeleteTask at +246 {{moveAndDelete(String volume, String pathName)}} - Javadoc of {{moveAndDeleteAllVolumes()}} needs to be fixed, it looks incomplete now - Nit: Need to fix the comment at +288 to use TOBEDELETED instead of SUBDIR - Not related to this JIRA: I think that uniqueId generation should be static and be valid across different MRAsyncDiskService objects. We can file another issue to fix this if you agree that it is a bug here. JobConf - When deprecating Jobconf.deleteLocalFiles(), we should also modify javadoc to state that MRAsyncDiskService should be used instead. - What about JobConf.deleteLocalFiles(String subDir) and its callers? Another JIRA? Testing: - Even though the inline removal of files is tested by various other test-cases, we haven't test the actual deletion anywhere. For this, we can refactor the DeleteTask part of moveAndDelete(String, String) into a new method and can then override it with inline deletion to assert the actual deletion of files by verifying the contents of TOBEDELETED directory. > TrackerDistributedCacheManager can delete file asynchronously > ------------------------------------------------------------- > > Key: MAPREDUCE-1302 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker > Affects Versions: 0.20.2, 0.21.0, 0.22.0 > Reporter: Zheng Shao > Assignee: Zheng Shao > Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch, > MAPREDUCE-1302.2.patch, MAPREDUCE-1302.3.patch > > > With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to > delete files from distributed cache asynchronously. > That will help make task initialization faster, because task initialization > calls the code that localizes files into the cache and may delete some other > files. > The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.