[
https://issues.apache.org/jira/browse/HADOOP-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
he yongqiang updated HADOOP-4780:
---------------------------------
Attachment: 4780-2.patch
@Zheng, I checked the code again and tested it on a cluster. Yes, you are right.
I think we have two options here:
1) Record each cache's size in its cacheStatus object and, when deleting,
iterate over all cacheStatus objects to find the total size of a baseDir.
This is easier to implement but much more time-consuming, since we have to
iterate over all cacheStatus objects whenever we call localizeCache.
2) Record the size of each baseDir, the size of each cacheStatus object, and
which baseDir each cache lies in. When a cache archive or cache file is
stored in or removed from a baseDir, update the corresponding baseDir
record.
The first option is simple but time-consuming, and it only needs one new
field in the CacheStatus class. The second option needs a map recording the
size of each baseDir, an int field in CacheStatus recording its size, and a
string field recording which baseDir the cache lies in.
I have submitted another patch based on option 2.
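For readers following along, here is a minimal sketch of the bookkeeping that option 2 describes. It is not the code in 4780-2.patch. The class name BaseDirSizeTracker, its method names, and the simplified CacheStatus are hypothetical and for illustration only. It also assumes modern Java (Map.merge) and uses a long for sizes rather than the int mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of option 2: keep a running total per baseDir
// so that no directory walk or scan over all caches is needed on lookup.
public class BaseDirSizeTracker {

    // Simplified stand-in for the real CacheStatus: only the two extra
    // fields that option 2 proposes (size and owning baseDir).
    public static final class CacheStatus {
        final String baseDir;
        final long size;

        public CacheStatus(String baseDir, long size) {
            this.baseDir = baseDir;
            this.size = size;
        }
    }

    // baseDir -> total bytes of all caches currently stored under it.
    private final Map<String, Long> baseDirSize = new HashMap<>();

    // Call when a cache archive or file has been stored in its baseDir.
    public synchronized void addCache(CacheStatus status) {
        baseDirSize.merge(status.baseDir, status.size, Long::sum);
    }

    // Call when a cache archive or file has been removed from its baseDir.
    public synchronized void removeCache(CacheStatus status) {
        Long current = baseDirSize.get(status.baseDir);
        if (current == null) {
            return; // nothing recorded for this baseDir
        }
        long remaining = current - status.size;
        if (remaining <= 0) {
            baseDirSize.remove(status.baseDir);
        } else {
            baseDirSize.put(status.baseDir, remaining);
        }
    }

    // Constant-time lookup that replaces a recursive disk-usage walk.
    public synchronized long getSize(String baseDir) {
        return baseDirSize.getOrDefault(baseDir, 0L);
    }
}
```

With this layout, the size check costs one map lookup per call. Option 1 would instead sum over every cacheStatus object on each call.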
> Task Tracker burns a lot of cpu in calling getLocalCache
> ---------------------------------------------------------
>
> Key: HADOOP-4780
> URL: https://issues.apache.org/jira/browse/HADOOP-4780
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Runping Qi
> Attachments: 4780-2.patch, 4780.patch
>
>
> I noticed that, many times, a task tracker maxes out up to 6 CPUs.
> During that time, iostat shows that the majority of that was system CPU.
> That situation can last for quite a long time.
> During that time, I saw that a number of threads were in the following state:
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
> at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
> at java.io.File.exists(File.java:733)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:399)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
> at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:176)
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:140)
> I suspect that getLocalCache is too expensive.
> Calling it for every task initialization seems far too wasteful.