[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797068#action_12797068
 ] 

Vinod K V commented on MAPREDUCE-1302:
--------------------------------------

I've looked at your latest patch. Some comments I have on the patch:

TaskTracker.java
 - {{#getMRAsyncDiskService()}} method is not used anywhere. Can we remove it?
 - We should add javadoc of {{#cleanupStorage()}} method as to what one should 
use instead of this deprecated method.
 - The method calls {{this.distributedCacheManager.purgeCache()}} and 
{{asyncDiskService.moveAndDeleteAllVolumes()}} (at +645-46) are not needed at 
all because there is a {{asyncDiskService.moveAndDeleteFromEachVolume(SUBDIR)}} 
(at +575) itself that completely clears the taskTracker subdir. They can be 
removed.
 
MRAsyncDiskService.java
 - There is an unused import of Configuration which can be removed.
 - Annotate MRAsyncDiskService as @InterfaceAudience.Private as it is for 
internal use only.
 - Please change the java comment of the class to javadoc. Also we should make 
it clear that clients of this service should NOT write files themselves in the 
TOBEDELTED directory in any of the volumes or risk losing such files.
 - Documentation (javadoc) is needed as to how {{MRAsyncDiskService(JobConf 
conf)}} uses the 'conf' parameter.
 - Can we rename {{moveAndDeleteAllVolumes()}} to something like 
{{cleanupAllVolumes()}} or similar. This is because this method is only moving 
and deleting the contents of the volumes and not the volumes themselves.
 - To be more clear, can we also rename {{moveAndDelete(String 
absolutePathName)}} to {{moveAndDeleteAbsolutePath(String absolutePathName)}} 
and {{moveAndDelete(String volume, String pathName)}} to 
{{moveAndDeleteRelativePath(String volume, String pathName)}}?
 - In any case, we should make the absolute/relative difference explicit in the 
javadoc of both the methods.
 - {{getRelativePathName()}} is bit buggy in that it will return null if any 
one or both of 'volume' string and 'absolutePathName' string have multiple 
Path.SEPARATOR characters in them. We should construct normalized paths before 
working on them.
 - Similar to above, we should make sure that paths/path fragments are 
normalized before their comparision in moveAndDeleteAllVolumes() at +289 and 
before passing to  DeleteTask at +246 {{moveAndDelete(String volume, String 
pathName)}}
 - Javadoc of {{moveAndDeleteAllVolumes()}} needs to be fixed, it looks 
incomplete now
 - Nit: Need to fix the comment at +288 to use TOBEDELETED instead of SUBDIR
 - Not related to this JIRA: I think that uniqueId generation should be static 
and be valid across different MRAsyncDiskService objects. We can file another 
issue to fix this if you agree that it is a bug here.

JobConf
 - When deprecating Jobconf.deleteLocalFiles(), we should also modify javadoc 
to state that MRAsyncDiskService should be used instead.
 - What about JobConf.deleteLocalFiles(String subDir) and its callers? Another 
JIRA?

Testing:
 - Even though the inline removal of files is tested by various other 
test-cases, we haven't test the actual deletion anywhere. For this, we can 
refactor the DeleteTask part of moveAndDelete(String, String) into a new method 
and can then override it with inline deletion to assert the actual deletion of 
files by verifying the contents of TOBEDELETED directory.

> TrackerDistributedCacheManager can delete file asynchronously
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-1302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>         Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch, 
> MAPREDUCE-1302.2.patch, MAPREDUCE-1302.3.patch
>
>
> With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to 
> delete files from distributed cache asynchronously.
> That will help make task initialization faster, because task initialization 
> calls the code that localizes files into the cache and may delete some other 
> files.
> The deletion can slow down the task initialization speed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to