[ 
https://issues.apache.org/jira/browse/HADOOP-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

he yongqiang updated HADOOP-4780:
---------------------------------

    Attachment: 4780-2.patch

@Zheng, I checked the code again, and tested it on a cluster.Yes, you are right.
I think we have two options here, 
1)we can record each cache's size in its cacheStatus object, and when deleting, 
iterate over all cacheStatus objects and find the total size of a baseDir.  
this is easier, but much time-consuming since we have to iterate over all 
cacheStatus objects whenever we do the localizeCache.  
2)we recode the size for each baseDir, size of each cacheStatus Object, and 
also which baseDir this cache lies in. When a cache archive/cache file is 
stored in or removed from this baseDir, update the corresponding baseDir 
record. 

The first one is simple but time-comsuming, and it only needs to introduce one 
field in CacheStatus class. The second one needs to introduce a map for 
recording size for each baseDir, an int field in CacheStatus for recording its 
size, and a string field for recording which baseDir this cache lies in.

Submitted another patch based on option 2

> Task Tracker  burns a lot of cpu in calling getLocalCache
> ---------------------------------------------------------
>
>                 Key: HADOOP-4780
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4780
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Runping Qi
>         Attachments: 4780-2.patch, 4780.patch
>
>
> I noticed that many times, a task tracker max up to 6 cpus.
> During that time, iostat shows majority of that was  system cpu.
> That situation can last for quite long.
> During that time, I saw a number of threads were in the following state:
>   java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>         at 
> java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
>         at java.io.File.exists(File.java:733)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:399)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at 
> org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:176)
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:140)
> I suspect that getLocalCache is too expensive.
> And calling it for every task initialization seems too much waste.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to