[ 
https://issues.apache.org/jira/browse/HADOOP-12107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HADOOP-12107:
---------------------------------
    Attachment: HADOOP-12107.002.patch

v.2 patch posted.

Changes:
- restored the weak reference in StatisticsData and deprecated it instead of removing it
- changed the cleaner thread (and the reference queue) to be static (global)
- fixed a bug in cleanUp() where an object of the wrong type was being removed
- changed the reference list from a LinkedList to a HashSet for faster removal
- added a new unit test for this
- removed an unnecessary null check in visitAll()

This should address most of the review comments.

As for the cleaner thread, I have convinced myself that a single global 
cleaner thread is adequate. I do not see any thread safety issues, and the 
thread only needs to keep up with a rate of (threads being discarded) * 
(filesystem instances). Even if it falls behind at times, it should be able to 
catch up in all but the most pathological situations. Let me know whether that 
reasoning sounds OK to you. Thanks!
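
To make the single-cleaner idea concrete, here is a minimal sketch of the 
general technique. It is not the code in the patch. All names 
(StatsCleanerSketch, Stats, Data, DataRef, drain, startCleaner) are 
hypothetical, and only one counter is modeled. Each per-thread entry is a weak 
reference to its thread, registered on one static reference queue; one daemon 
thread takes entries off that queue and removes them from the owning HashSet, 
folding their counts into a running total. The main method calls enqueue() by 
hand to stand in for the garbage collector noticing a dead thread.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashSet;
import java.util.Set;

class StatsCleanerSketch {
  /** One queue shared by every Stats instance (the "global" part). */
  static final ReferenceQueue<Thread> QUEUE = new ReferenceQueue<>();

  /** Hypothetical stand-in for the per-thread counters. */
  static final class Data {
    long bytesRead;
  }

  /** Weak reference to the owning thread; also remembers which Stats owns it. */
  static final class DataRef extends WeakReference<Thread> {
    final Stats owner;
    final Data data;

    DataRef(Stats owner, Data data, Thread thread) {
      super(thread, QUEUE);
      this.owner = owner;
      this.data = data;
    }
  }

  /** Hypothetical stand-in for one statistics object (one per filesystem). */
  static final class Stats {
    private final Set<DataRef> refs = new HashSet<>(); // identity-based removal
    private long rootBytesRead; // counts folded in from discarded threads

    synchronized DataRef register(Thread thread) {
      DataRef ref = new DataRef(this, new Data(), thread);
      refs.add(ref);
      return ref;
    }

    synchronized void cleanUp(DataRef ref) {
      if (refs.remove(ref)) {
        rootBytesRead += ref.data.bytesRead;
      }
    }

    synchronized int liveEntries() {
      return refs.size();
    }

    synchronized long rootBytesRead() {
      return rootBytesRead;
    }
  }

  /** Non-blocking variant: processes whatever is queued right now. */
  static int drain() {
    int cleaned = 0;
    Reference<? extends Thread> r;
    while ((r = QUEUE.poll()) != null) {
      DataRef ref = (DataRef) r;
      ref.owner.cleanUp(ref);
      cleaned++;
    }
    return cleaned;
  }

  /** Starts the single daemon thread that blocks on the shared queue. */
  static Thread startCleaner() {
    Thread cleaner = new Thread(() -> {
      try {
        while (true) {
          DataRef ref = (DataRef) QUEUE.remove(); // blocks until one is queued
          ref.owner.cleanUp(ref);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "stats-cleaner-sketch");
    cleaner.setDaemon(true);
    cleaner.start();
    return cleaner;
  }

  public static void main(String[] args) throws InterruptedException {
    startCleaner();
    Stats stats = new Stats();
    Thread worker = new Thread(() -> { });
    DataRef ref = stats.register(worker);
    ref.data.bytesRead = 42;
    ref.enqueue(); // simulate the GC discovering that the thread is gone
    while (stats.liveEntries() > 0) {
      Thread.sleep(10);
    }
    System.out.println(worker.getName() + " folded: " + stats.rootBytesRead());
  }
}
```

In this sketch the cleaner does a constant amount of work per discarded entry 
(one queue removal and one HashSet removal), which is why its load scales with 
(threads being discarded) * (filesystem instances) and not with the number of 
live entries.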

> long running apps may have a huge number of StatisticsData instances under 
> FileSystem
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12107
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Minor
>         Attachments: HADOOP-12107.001.patch, HADOOP-12107.002.patch
>
>
> We observed with some of our apps (non-mapreduce apps that use filesystems) 
> that they end up accumulating a huge memory footprint coming from 
> {{FileSystem$Statistics$StatisticsData}} (in the {{allData}} list of 
> {{Statistics}}).
> Although the thread reference from {{StatisticsData}} is a weak reference, 
> and thus can get cleared once a thread goes away, the actual 
> {{StatisticsData}} instances in the list won't get cleared until one of the 
> following methods is called on {{Statistics}}:
> - {{getBytesRead()}}
> - {{getBytesWritten()}}
> - {{getReadOps()}}
> - {{getLargeReadOps()}}
> - {{getWriteOps()}}
> - {{toString()}}
> It is quite possible to have an application that interacts with a filesystem 
> but does not call any of these methods on the {{Statistics}}. If such an 
> application runs for a long time and has a large amount of thread churn, the 
> memory footprint will grow significantly.
> The current workaround is either to limit the thread churn or to invoke these 
> operations occasionally to pare down the memory. However, this is still a 
> deficiency with {{FileSystem$Statistics}} itself in that the memory is 
> controlled only as a side effect of those operations.
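
The behavior described above can be modeled in a few lines of plain Java. 
This is a simplified illustration under stated assumptions, not the Hadoop 
source: LazyPruneModel and Entry are hypothetical names, only one counter is 
modeled, and clear() is called by hand to stand in for a thread going away and 
its weak reference being cleared.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class LazyPruneModel {
  /** Hypothetical per-thread entry: counters plus a weak reference to the thread. */
  static final class Entry {
    final WeakReference<Thread> owner;
    long bytesRead;

    Entry(Thread thread) {
      this.owner = new WeakReference<>(thread);
    }
  }

  private final List<Entry> allData = new ArrayList<>();
  private long rootBytesRead; // counts folded in from entries already pruned

  synchronized Entry register(Thread thread) {
    Entry e = new Entry(thread);
    allData.add(e);
    return e;
  }

  /** Aggregates all entries; pruning of dead entries happens only here. */
  synchronized long getBytesRead() {
    long total = rootBytesRead;
    for (Iterator<Entry> it = allData.iterator(); it.hasNext();) {
      Entry e = it.next();
      total += e.bytesRead;
      if (e.owner.get() == null) { // owning thread is gone
        rootBytesRead += e.bytesRead;
        it.remove();
      }
    }
    return total;
  }

  synchronized int size() {
    return allData.size();
  }

  public static void main(String[] args) {
    LazyPruneModel stats = new LazyPruneModel();
    for (int i = 0; i < 100_000; i++) {
      Entry e = stats.register(new Thread(() -> { }));
      e.bytesRead = 1;
      e.owner.clear(); // simulate the thread having gone away
    }
    System.out.println("entries before any getter call: " + stats.size());
    stats.getBytesRead();
    System.out.println("entries after a getter call: " + stats.size());
  }
}
```

In this model the list keeps one entry per discarded thread until a getter 
runs, which mirrors both the growth and the "call a getter occasionally" 
workaround described in the issue.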



