[ 
https://issues.apache.org/jira/browse/HADOOP-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655115#action_12655115
 ] 

Zheng Shao commented on HADOOP-4780:
------------------------------------

The size of the file/archive (including decompressed files) can be easily 
calculated. Each of the file/archive is downloaded into a different directory:
{code}
DistributedCache.java:
      Path parchive = new Path(cacheStatus.localLoadPath,
                               new Path(cacheStatus.localLoadPath.getName()));
{code}

We just need to remember the size of {code} cacheStatus.localLoadPath {code} in 
DistributedCache. Then we won't need to do getDU() again (instead we just need 
to calculate the sum of the size, and call {code} deleteCache() {code} if limit 
is exceeded .)


Another finding is that the only caller of DistributedCache.getLocalCache(...) 
is TaskRunner.run() (2 places, 1 for files, 1 for archives). By looking at the 
TaskRunner.run(), we can see that we are round-robin-ing through all configured 
dirs for the cachePath. As a result, the size protection inside 
DistributedCache is actually per configured dir, not globally. (this is not 
related to this issue/fix).


> Task Tracker  burns a lot of cpu in calling getLocalCache
> ---------------------------------------------------------
>
>                 Key: HADOOP-4780
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4780
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Runping Qi
>         Attachments: 4780.patch
>
>
> I noticed that many times, a task tracker max up to 6 cpus.
> During that time, iostat shows majority of that was  system cpu.
> That situation can last for quite long.
> During that time, I saw a number of threads were in the following state:
>   java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>         at 
> java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
>         at java.io.File.exists(File.java:733)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:399)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at 
> org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:176)
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:140)
> I suspect that getLocalCache is too expensive.
> And calling it for every task initialization seems too much waste.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to