[ 
https://issues.apache.org/jira/browse/HDFS-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306125#comment-17306125
 ] 

Stephen O'Donnell commented on HDFS-15907:
------------------------------------------

Yes, Shiv was concerned about the memory overhead of ConcurrentHashMap, but I 
cannot see why it is a problem.

In Java 7 and earlier, the ConcurrentHashMap implementation is an object which 
contains a number of sub-maps (segments) under the covers. It simply stores the 
keys by hashing them across the sub-maps it is using internally, and it 
synchronises at the sub-map level, somewhat like a "striped lock". From Java 8 
onwards it locks individual hash buckets instead. Either way, put and get may 
be slightly slower than on a plain HashMap, but the overhead is tiny.
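
To make the "striped lock" idea concrete, here is a minimal sketch of a map 
that hashes keys across a fixed number of sub-maps and synchronises on each 
sub-map separately. The class name StripedMap and its methods are hypothetical 
and for illustration only; this is not the JDK's ConcurrentHashMap code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of lock striping; NOT the JDK implementation. */
final class StripedMap<K, V> {
    // Each stripe is an ordinary HashMap guarded by its own monitor.
    private final List<Map<K, V>> stripes;

    StripedMap(int stripeCount) {
        stripes = new ArrayList<>(stripeCount);
        for (int i = 0; i < stripeCount; i++) {
            stripes.add(new HashMap<>());
        }
    }

    // Choose the stripe for a key from its hash code.
    private Map<K, V> stripeFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16); // mix the high bits into the low bits
        return stripes.get(Math.floorMod(h, stripes.size()));
    }

    V put(K key, V value) {
        Map<K, V> stripe = stripeFor(key);
        synchronized (stripe) { // only this stripe is locked
            return stripe.put(key, value);
        }
    }

    V get(Object key) {
        Map<K, V> stripe = stripeFor(key);
        synchronized (stripe) {
            return stripe.get(key);
        }
    }

    // Sums the stripe sizes one at a time, so the result is only
    // approximate while other threads are writing.
    int size() {
        int total = 0;
        for (Map<K, V> stripe : stripes) {
            synchronized (stripe) {
                total += stripe.size();
            }
        }
        return total;
    }
}
```

Two threads touching keys in different stripes never contend, while the fixed 
cost is just the handful of extra HashMap objects created in the constructor.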

Does it have a higher memory overhead than a HashMap? Yes, but if you are 
storing thousands or millions of keys this will not matter. The extra overhead 
is not a "per entry" overhead; it is a fixed cost driven by the Java object 
overhead. If the memory overhead of a HashMap is 32 bytes (picking a number out 
of the air, I have not checked this) then the overhead of a ConcurrentHashMap 
is at most approx 16 * 32 bytes, as Java 7 uses 16 sub-maps by default (Java 8 
and later drop the sub-maps entirely).

I feel that this overhead is worth it, as it will provide better concurrency 
than synchronising the entire map for all keys.

> Reduce Memory Overhead of AclFeature by avoiding AtomicInteger
> --------------------------------------------------------------
>
>                 Key: HDFS-15907
>                 URL: https://issues.apache.org/jira/browse/HDFS-15907
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-15907.001.patch
>
>
> In HDFS-15792 we made some changes to the AclFeature and ReferenceCountedMap 
> classes to address a rare bug when loading the FSImage in parallel.
> One change we made was to replace an int inside AclFeature with an 
> AtomicInteger to avoid synchronising the methods in AclFeature.
> Discussing this change with [~weichiu], he pointed out that while the 
> AclFeature cache is intended to reduce the count of AclFeature objects, on a 
> large cluster, it is possible for there to be many millions of AclFeature 
> objects.
> Previously, the int took 4 bytes of heap.
> By moving to an AtomicInteger, we probably have an overhead of:
>  4 bytes (or 8 if the heap is over 32GB) for a reference to the 
> AtomicInteger object
>  12 byte overhead for the java object
>  4 bytes inside the AtomicInteger to store an int.
>  
> So the total heap overhead has gone from 4 bytes to 20 bytes just to use an 
> AtomicInteger.
> Therefore I think it makes sense to remove the AtomicInteger and just 
> synchronise the methods of AclFeature where the value is incremented / 
> decremented / retrieved.
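
The approach proposed in the description above can be sketched roughly as 
follows. The class name RefCountedFeature is a hypothetical stand-in and the 
method names are illustrative; this is not the actual AclFeature source or the 
attached patch.

```java
/** Hypothetical stand-in for a reference-counted feature object. */
final class RefCountedFeature {
    // A plain int is stored inline in the object (4 bytes), with no
    // separate AtomicInteger object or reference to one.
    private int refCount = 0;

    // All access goes through synchronized methods, so updates made
    // by one thread are visible to the others.
    synchronized int getRefCount() {
        return refCount;
    }

    synchronized int incrementAndGetRefCount() {
        return ++refCount;
    }

    // Never drops below zero in this sketch.
    synchronized int decrementAndGetRefCount() {
        return (refCount > 0) ? --refCount : 0;
    }
}
```

The trade-off is that every call now takes the object's monitor, in exchange 
for not allocating an extra object per feature instance.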



