[ https://issues.apache.org/jira/browse/HADOOP-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654352#action_12654352 ]

Zheng Shao commented on HADOOP-4780:
------------------------------------

@Yongqiang, we can calculate the decompressed size after the decompression is 
done.
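
A minimal sketch of that idea, assuming a hypothetical helper and size map that 
are not part of the current DistributedCache code: measure the unpacked entry 
once, right after decompression, instead of re-walking it for every task.

    import java.io.File;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.hadoop.fs.FileUtil;

    // Hypothetical sketch: compute the on-disk size of an unpacked cache entry
    // exactly once, immediately after decompression, and remember the result.
    public class DecompressedSizeSketch {
      private static final Map<String, Long> localSizes =
          new ConcurrentHashMap<String, Long>();

      static void recordSize(String cacheKey, File unpackedDir) {
        // One directory walk per cache entry, not one per task launch.
        localSizes.put(cacheKey, FileUtil.getDU(unpackedDir));
      }

      static long recordedSize(String cacheKey) {
        Long size = localSizes.get(cacheKey);
        return size == null ? 0L : size;
      }
    }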

The reason that Hadoop uses bash, gzip, and tar to decompress .tgz files is 
probably that these files usually exist only on *nix platforms (so at least 
.zip and .jar can still be decompressed on Windows).
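
For illustration, a rough sketch of the two code paths (the class and method 
names here are made up, and the real FileUtil unTar/unZip code differs in 
detail): .tgz is handed off to external gzip/tar through bash, while .zip and 
.jar can be unpacked with java.util.zip, which also works on Windows.

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Enumeration;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    public class UnpackSketch {

      // .tgz: delegate to external tools, normally available only on *nix.
      static void unTarGz(File archive, File targetDir)
          throws IOException, InterruptedException {
        String[] cmd = { "bash", "-c",
            "gzip -dc " + archive.getAbsolutePath()
            + " | (cd " + targetDir.getAbsolutePath() + " && tar -xf -)" };
        Process p = Runtime.getRuntime().exec(cmd);
        if (p.waitFor() != 0) {
          throw new IOException("tar failed with exit code " + p.exitValue());
        }
      }

      // .zip/.jar: pure-Java unpacking, so this path also works on Windows.
      static void unZip(File archive, File targetDir) throws IOException {
        ZipFile zip = new ZipFile(archive);
        try {
          Enumeration<? extends ZipEntry> entries = zip.entries();
          while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            File out = new File(targetDir, entry.getName());
            if (entry.isDirectory()) {
              out.mkdirs();
              continue;
            }
            out.getParentFile().mkdirs();
            InputStream in = zip.getInputStream(entry);
            FileOutputStream os = new FileOutputStream(out);
            byte[] buf = new byte[4096];
            for (int n = in.read(buf); n > 0; n = in.read(buf)) {
              os.write(buf, 0, n);
            }
            os.close();
            in.close();
          }
        } finally {
          zip.close();
        }
      }
    }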

The reason that I prefer this is that it preserves the semantics of the old 
code. If we want to remove the du() code, then we need to give a full story on 
how we make sure the DistributedCache's local copy does not grow beyond the 
configured limit.
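
In other words (a hypothetical sketch of the invariant, with made-up names; the 
real limit comes from the job/cluster configuration): whatever replaces du() 
still has to let getLocalCache answer this question before adding another local 
copy.

    public class CacheLimitSketch {
      // Hypothetical: the sum of all local cache copies, plus the entry about
      // to be added, must stay within the configured size limit. The old code
      // enforces this by recomputing the size with getDU() on every call.
      static boolean fitsUnderLimit(long currentTotalSize,
                                    long newEntrySize,
                                    long configuredLimit) {
        return currentTotalSize + newEntrySize <= configuredLimit;
      }
    }

With per-entry sizes recorded at decompression time, currentTotalSize could be 
kept as a running sum instead of being recomputed with a full directory walk.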

Does that make sense?


> Task Tracker  burns a lot of cpu in calling getLocalCache
> ---------------------------------------------------------
>
>                 Key: HADOOP-4780
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4780
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Runping Qi
>         Attachments: 4780.patch
>
>
> I noticed that, many times, a task tracker maxes out up to 6 CPUs.
> During that time, iostat shows that the majority of it is system CPU.
> That situation can last for quite a long time.
> During that time, I saw a number of threads were in the following state:
>   java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>         at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
>         at java.io.File.exists(File.java:733)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:399)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:176)
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:140)
> I suspect that getLocalCache is too expensive.
> Calling it for every task initialization seems very wasteful.
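
(For context, the nested getDU frames in the trace correspond to a recursive 
directory walk along these lines; this is a simplified sketch, not the exact 
FileUtil code. Running such a walk over the whole cache directory for every 
task launch is what shows up as system CPU.)

    import java.io.File;

    public class GetDUSketch {
      // Simplified du-style walk: one stack frame per directory level and a
      // native stat (exists/isDirectory/length) for every file underneath,
      // which matches the repeated getDU frames and the
      // UnixFileSystem.getBooleanAttributes0 calls in the trace above.
      static long getDU(File dir) {
        if (!dir.exists()) {
          return 0;
        }
        if (!dir.isDirectory()) {
          return dir.length();
        }
        long size = 0;
        File[] children = dir.listFiles();
        if (children != null) {
          for (File child : children) {
            size += getDU(child);
          }
        }
        return size;
      }
    }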

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
