[ https://issues.apache.org/jira/browse/HADOOP-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655115#action_12655115 ]
Zheng Shao commented on HADOOP-4780:
------------------------------------
The size of the file/archive (including decompressed files) can be easily
calculated. Each file/archive is downloaded into its own directory:
{code}
// DistributedCache.java
Path parchive = new Path(cacheStatus.localLoadPath,
    new Path(cacheStatus.localLoadPath.getName()));
{code}
We just need to remember the size of {{cacheStatus.localLoadPath}} in
DistributedCache. Then we won't need to call getDU() again; instead we just
sum the remembered sizes and call {{deleteCache()}} if the limit is exceeded.
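A minimal sketch of that idea (the class and field names here are illustrative, not the actual DistributedCache internals): the on-disk size is computed once at localization time and stored alongside the entry, so the limit check becomes a cheap sum over remembered sizes rather than a filesystem walk per task.
{code}
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: remember each cache entry's size once,
// at localization time, instead of re-running getDU() per task start.
public class CacheSizeSketch {
  static class CacheStatus {
    final File localLoadPath;
    final long size;                      // computed once, then reused
    CacheStatus(File path, long duBytes) {
      this.localLoadPath = path;
      this.size = duBytes;
    }
  }

  // key: cache URI string -> status, analogous to the cachedArchives map
  static final Map<String, CacheStatus> cachedArchives =
      new HashMap<String, CacheStatus>();

  // Cheap limit check: sum the remembered sizes; no filesystem walk.
  static boolean overLimit(long allowedBytes) {
    long total = 0;
    synchronized (cachedArchives) {
      for (CacheStatus s : cachedArchives.values()) {
        total += s.size;
      }
    }
    return total > allowedBytes;          // caller would then invoke deleteCache()
  }
}
{code}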
Another finding is that the only caller of DistributedCache.getLocalCache(...)
is TaskRunner.run() (2 places: 1 for files, 1 for archives). Looking at
TaskRunner.run(), we can see that we round-robin through all configured
dirs for the cachePath. As a result, the size protection inside
DistributedCache is actually per configured dir, not global. (This is not
related to this issue/fix.)
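To illustrate why that matters, here is a hypothetical sketch of the round-robin pattern (not the actual TaskRunner code): each call picks the next configured local dir, so a size check keyed on the chosen dir only bounds that one dir's usage, not the total across all dirs.
{code}
// Hypothetical sketch of round-robin dir selection; names are illustrative.
public class RoundRobinSketch {
  private final String[] localDirs;   // e.g. the configured mapred.local.dir list
  private int next = 0;

  public RoundRobinSketch(String[] localDirs) {
    this.localDirs = localDirs;
  }

  // Returns a different base dir on each call, cycling through the list.
  public synchronized String nextCacheDir() {
    String dir = localDirs[next];
    next = (next + 1) % localDirs.length;
    return dir;                       // any size limit is then enforced per 'dir'
  }
}
{code}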
> Task Tracker burns a lot of cpu in calling getLocalCache
> ---------------------------------------------------------
>
> Key: HADOOP-4780
> URL: https://issues.apache.org/jira/browse/HADOOP-4780
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Runping Qi
> Attachments: 4780.patch
>
>
> I noticed that, many times, a task tracker maxes out up to 6 CPUs.
> During that time, iostat showed the majority of that was system CPU.
> That situation can last for quite a long time.
> During that time, I saw a number of threads were in the following state:
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
> at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
> at java.io.File.exists(File.java:733)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:399)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:176)
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:140)
> I suspect that getLocalCache is too expensive.
> And calling it on every task initialization seems very wasteful.
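For context, the stack trace above points at a recursive directory walk. A sketch of what a getDU-style computation does (reconstructed from the trace, not the verbatim FileUtil source) shows why it is costly: at least one native getBooleanAttributes call per entry, repeated over the whole tree for every task launch.
{code}
import java.io.File;

// Reconstructed du-style walk (illustrative, not the verbatim FileUtil
// source). Every entry costs at least one File.exists()/isDirectory()
// native call (getBooleanAttributes0), so walking a large cache directory
// once per task launch burns system CPU, matching the trace above.
public class DuSketch {
  public static long getDU(File dir) {
    if (!dir.exists()) {
      return 0;
    }
    if (!dir.isDirectory()) {
      return dir.length();
    }
    long size = dir.length();
    File[] children = dir.listFiles();
    if (children != null) {             // listFiles() can return null on I/O error
      for (File child : children) {
        size += getDU(child);           // the recursion seen repeatedly in the trace
      }
    }
    return size;
  }
}
{code}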