[ https://issues.apache.org/jira/browse/HADOOP-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654352#action_12654352 ]
Zheng Shao commented on HADOOP-4780:
------------------------------------
@Yongqiang, we can calculate the size of the decompressed data once the
decompression is done.
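For example, a minimal sketch of "measure once, remember the result" (the
class and method names here are hypothetical, not code from the patch):

{code}
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachedSizes {
  // Hypothetical helper: remember each unpacked directory's size so that
  // getLocalCache never re-walks a tree it has already measured.
  private static final Map<String, Long> SIZES =
      new ConcurrentHashMap<String, Long>();

  /** Record the size once, right after the archive has been expanded. */
  static void recordSize(File unpackedDir, long measuredBytes) {
    SIZES.put(unpackedDir.getAbsolutePath(), Long.valueOf(measuredBytes));
  }

  /** Later lookups reuse the stored total; -1 means not measured yet. */
  static long knownSize(File unpackedDir) {
    Long size = SIZES.get(unpackedDir.getAbsolutePath());
    return size == null ? -1L : size.longValue();
  }
}
{code}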
The reason that Hadoop uses bash, gzip, and tar to decompress tgz files is
probably that these files usually exist only on *nix platforms (so at least
.zip and .jar can still be decompressed on Windows).
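The shell-out itself is roughly this shape (a sketch; the quoting and error
handling are illustrative, not the exact FileUtil code):

{code}
import java.io.File;
import java.io.IOException;

public class UnTarSketch {
  // Expand a .tgz by delegating to the platform's gzip and tar; this only
  // works where bash/gzip/tar exist, which is why .zip and .jar are
  // unpacked in pure Java instead.
  static void unTarGz(File archive, File targetDir)
      throws IOException, InterruptedException {
    String cmd = "cd '" + targetDir.getAbsolutePath() + "' && gzip -dc '"
        + archive.getAbsolutePath() + "' | tar -xf -";
    Process p = new ProcessBuilder("bash", "-c", cmd).start();
    if (p.waitFor() != 0) {
      throw new IOException("untar failed, exit code " + p.exitValue());
    }
  }
}
{code}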
The reason I prefer this is that it preserves the semantics of the old
code. If we want to remove the du() code, then we need a full story on how
we make sure the DistributedCache's local copies do not grow beyond the
configured limit.
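Concretely, the behavior we need to preserve is something like the
following sketch (the names are illustrative, and I am assuming the
0.19-era local.cache.size property):

{code}
import org.apache.hadoop.conf.Configuration;

public class CacheLimitSketch {
  // Running total of bytes under the local cache root, maintained
  // incrementally instead of being recomputed with du() on every call.
  private long currentCacheSize = 0;

  // Admit a new entry only while the configured cap still holds.
  synchronized boolean admit(Configuration conf, long newEntrySize) {
    long allowedSize =
        conf.getLong("local.cache.size", 10L * 1024 * 1024 * 1024); // 10 GB
    if (currentCacheSize + newEntrySize > allowedSize) {
      return false; // caller must purge old cache entries first
    }
    currentCacheSize += newEntrySize;
    return true;
  }
}
{code}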
Does that make sense?
> Task Tracker burns a lot of cpu in calling getLocalCache
> ---------------------------------------------------------
>
> Key: HADOOP-4780
> URL: https://issues.apache.org/jira/browse/HADOOP-4780
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Runping Qi
> Attachments: 4780.patch
>
>
> I noticed that, many times, a task tracker maxes out up to 6 CPUs.
> During that time, iostat shows that the majority of the load is system CPU.
> That situation can last for quite a long time.
> During that time, I saw a number of threads in the following state:
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
> at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
> at java.io.File.exists(File.java:733)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:399)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:176)
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:140)
> I suspect that getLocalCache is too expensive, and calling it for every
> task initialization seems wasteful.
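For reference, the walk in the trace above has roughly the following shape
(a simplified sketch, not the actual FileUtil source):

{code}
import java.io.File;

public class DuWalkSketch {
  // Each recursive frame is one directory level, and every file visited
  // costs a native stat (the getBooleanAttributes0 frames). With N files
  // under the cache root, each task launch pays O(N) such calls.
  static long du(File f) {
    if (!f.isDirectory()) {
      return f.length();
    }
    long size = f.length();
    File[] children = f.listFiles();
    if (children != null) {
      for (File child : children) {
        size += du(child); // deep trees produce the deep stacks shown above
      }
    }
    return size;
  }
}
{code}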