[jira] [Comment Edited] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory
[ https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340176#comment-16340176 ]

Misha Dmitriev edited comment on HDFS-12051 at 1/25/18 10:29 PM:
-----------------------------------------------------------------

Thank you for the review, [~manojg]. See my responses inline below.

{{NameCache.java}}

* _line 97: {{cache = new byte[cacheSize][];}} Since this will take up contiguous space, we need to restrict the cache size to a much smaller size than your current MAX size of 1 << 30. Your thoughts?_

As you can see from line 78, the cache size is always set from the configuration, which provides a reasonable default that is much, much smaller than 1 << 30. It's up to the customer to increase this value if they need to. If they have a huge heap, like 120GB (I've already heard of users approaching this!), then 1 << 30 results in an 8GB contiguous array (2^30 slots times 8 bytes per reference). Against a heap that large, there is nothing wrong with that, if they really need it. In any case, if they decide to change this number, I think it's reasonable to expect them to have some understanding of what they are doing.

* _{{#cache}} is now following the {{open addressing}} model. Any reasons why you moved to this model compared to your initial design?_

My own design for this cache has always been open addressing. The reason is that this is the most economical model in terms of memory: it uses just one pointer per cache entry, which is 8 bytes at most. If you use a cache with collision chains, like java.util.HashMap, then each entry requires a pointer plus a separate object (HashMap$Entry). This separate object takes at least 32 bytes, so you end up with at least 40 bytes per entry - five times more! For a general-purpose HashMap, which may need to hold a very large number of objects and must keep all of them, the collision-chain design may be justified in some cases. But for our specialized fixed-size cache, which strives to minimize its own memory overhead, the open addressing design is more appropriate. (See the sketch after this comment.)

* _{{#put()}}_
** _line 118: the first time cache fill .. shouldn't it be a new byte array name constructed from the passed in name? Why use the same caller passed in name?_

The goal of this cache is to _avoid_ object duplication as much as possible. If the caller gave us a name X for which we don't have an existing copy, we just remember X and return it. If on the next invocation they give us Y and it turns out to be equal to X, we return X again, and Y is effectively lost and GCed soon.

** _With the {{open addressing}} model, when you overwrite the cache slot with new names, there could be INodes which are already referring to this name and are cut from the cache. Though their references are still valid, want to understand why the preference is given to new names compared to the old ones._

The preference is given to new names simply because it's the lesser evil. We already discussed this with [~yzhangal] in the past. First, obviously, when a cache entry is overwritten, the old INodes just continue to refer to their old names, i.e. no information is lost. Second, all our design details stem from the fact that we don't know in advance how many names we are going to have and how many of them will be duplicates. Thus we want a fixed-size cache that is guaranteed not to waste much memory if there is little duplication, but that provides a benefit and saves a lot of memory if there is considerable duplication. Now, suppose we have a cache of size 3, and names come as follows: 'A', 'B', 'C', 'D', 'D', 'D', ...
The cache would be full after the first 3 names. If after that we don't overwrite one of the entries to accommodate 'D', we will not realize any savings from deduplicating all the subsequent 'D's. To be fair, if this cache receives something like 'A', 'B', 'C', 'D', 'E', 'F', 'A', 'B', 'C', 'D', 'E', 'F' - then it just gets overwritten all the time and provides no benefit. But in practice (and I have already implemented a similar cache in several other projects), I've never observed such a pathology. With a reasonably sized cache and real-life data, it always works.

* _I don't see any cache invalidation even when the INodes are removed. This takes up memory. Though not huge, design-wise it's not clean to leave the cache with stale values and incur a cache lookup penalty in future put()s._

This cache by default takes just 16MB, which is 0.1% of a 16GB heap, and 16GB is on the smaller side of the NN heap size spectrum. So any losses due to stale cache entries are pretty negligible. Furthermore, the above-mentioned overwriting of cache entries when new data comes in also helps to keep the cache reasonably "fresh".

* _{{#getSize()}} since there is no cache invalidation, and since this is an open addressing model, the size returned is not right._

As the javadoc for this method explains, this method may return
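Since much of the discussion above concerns how the open-addressing cache behaves on {{put()}}, here is a minimal, hedged sketch of such a fixed-size byte[] interning cache. The class, field, and method names are illustrative and not necessarily those used in the patch; the sketch only shows the one-reference-per-slot layout and the "newest name wins" overwrite policy described above.

{code:java}
/**
 * Illustrative sketch only: a fixed-size, one-slot-per-bucket interning cache
 * for byte[] names, as discussed in the comment above. Not the actual patch code.
 */
class NameInterningCacheSketch {
  private final byte[][] cache;   // one array reference per entry, no Entry objects
  private final int mask;         // cache length is a power of two, so index = hash & mask

  NameInterningCacheSketch(int cacheSize) {
    // cacheSize is assumed to already be a power of two
    cache = new byte[cacheSize][];
    mask = cacheSize - 1;
  }

  /**
   * Returns the canonical copy of 'name': either a previously cached equal array,
   * or 'name' itself, which then overwrites whatever currently occupies the slot.
   */
  byte[] put(byte[] name) {
    int slot = hash(name) & mask;
    byte[] cached = cache[slot];
    if (cached != null && java.util.Arrays.equals(cached, name)) {
      return cached;            // duplicate: the caller's copy becomes garbage and is GCed
    }
    cache[slot] = name;         // empty slot or collision: the newer name wins (overwrite)
    return name;
  }

  private static int hash(byte[] name) {
    return java.util.Arrays.hashCode(name);
  }
}
{code}

With this layout each entry costs a single array reference, versus a reference plus a separate ~32-byte entry object in a chained table, which is the roughly five-fold difference mentioned above.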
[jira] [Comment Edited] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory
[ https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338473#comment-16338473 ]

Manoj Govindassamy edited comment on HDFS-12051 at 1/25/18 12:23 AM:
---------------------------------------------------------------------

Thanks for working on this [~mi...@cloudera.com]. A few comments on HDFS-12051.07.patch:

{{NameCache.java}}
* line 97: {{cache = new byte[cacheSize][];}} Since this will take up contiguous space, we need to restrict the cache size to a much smaller size than your current MAX size of 1 << 30. Your thoughts?
* {{#cache}} is now following the {{open addressing}} model. Any reasons why you moved to this model compared to your initial design?
* {{#put()}}
** line 118: the first time cache fill .. shouldn't it be a new byte array name constructed from the passed in name? Why use the same caller passed in name?
** With the {{open addressing}} model, when you overwrite the cache slot with new names, there could be INodes which are already referring to this name and are cut from the cache. Though their references are still valid, want to understand why the preference is given to new names compared to the old ones.
* I don't see any cache invalidation even when the INodes are removed. This takes up memory. Though not huge, design-wise it's not clean to leave the cache with stale values and incur a cache lookup penalty in future put()s.
* {{#getSize()}} since there is no cache invalidation, and since this is an open addressing model, the size returned is not right.
* line 149: {{cacheSizeFor}} is this roundUp or roundDown to the nearest power of 2? Please add a link to {{HashMap#tableSizeFor()}} in the comment to show where the code is inspired from.
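For the {{cacheSizeFor}} question above (round up vs. round down to a power of two), here is a hedged sketch of the {{HashMap#tableSizeFor()}}-style bit twiddling that rounds a requested size *up* to the next power of two. The class, method, and constant names are illustrative; whether the patch itself rounds up or down is exactly what the reviewer is asking to have documented.

{code:java}
// Illustrative sketch only: round a requested size UP to the next power of two,
// clamped to a maximum, in the style of java.util.HashMap#tableSizeFor (JDK 8).
// CacheSizing, MAX_CACHE_SIZE and cacheSizeFor are hypothetical names.
final class CacheSizing {
  static final int MAX_CACHE_SIZE = 1 << 30;

  static int cacheSizeFor(int requested) {
    int n = requested - 1;        // subtract 1 so exact powers of two are preserved
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;                // n is now all-ones below the highest set bit
    return (n < 0) ? 1 : (n >= MAX_CACHE_SIZE) ? MAX_CACHE_SIZE : n + 1;
  }
}
// Examples: cacheSizeFor(4 * 1024 * 1024) == 4194304 (already a power of two),
//           cacheSizeFor(5000000)        == 8388608 (rounded up).
{code}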
> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly
> those denoting file/directory names) to save memory
> --------------------------------------------------------------------------
>
>                 Key: HDFS-12051
>                 URL: https://issues.apache.org/jira/browse/HDFS-12051
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>            Priority: Major
>         Attachments: HDFS-12051.01.patch, HDFS-12051.02.patch,
>                      HDFS-12051.03.patch, HDFS-12051.04.patch, HDFS-12051.05.patch,
>                      HDFS-12051.06.patch, HDFS-12051.07.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays
> result in 6.5% memory overhead, and most of these arrays are referenced by
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
> and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>      Ovhd              Num objs    Num unique objs   Class name
> 3,220,272K (6.5%)      104749528   25760871          byte[]
>
> 1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114,
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49,
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...),
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48,
[jira] [Comment Edited] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory
[ https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336383#comment-16336383 ]

Yongjun Zhang edited comment on HDFS-12051 at 1/23/18 9:32 PM:
---------------------------------------------------------------

Thanks for your comments [~szetszwo]. I thought you had reviewed the patch.

Misha described his tests at:
https://issues.apache.org/jira/browse/HDFS-12051?focusedCommentId=16084471&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16084471
and
https://issues.apache.org/jira/browse/HDFS-12051?focusedCommentId=16329891&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16329891

I agree more testing is always a good thing.

I did another round of review and found one thing I did not point out earlier (I'm sorry about that). The patch obsoleted the old config
{code}
public static final String DFS_NAMENODE_NAME_CACHE_THRESHOLD_KEY =
    "dfs.namenode.name.cache.threshold";
public static final int DFS_NAMENODE_NAME_CACHE_THRESHOLD_DEFAULT = 10;
{code}
and introduced the new config
{code}
public static final String DFS_NAMENODE_NAME_CACHE_SIZE_KEY =
    "dfs.namenode.name.cache.size";
public static final int DFS_NAMENODE_NAME_CACHE_SIZE_DEFAULT = 4 * 1024 * 1024;
{code}
We should make this visible to users, stating that the former is deprecated because of the implementation change and that the new one can be used to configure the new implementation. Would you please address this, [~mi...@cloudera.com]?

Since I'm the only one who has reviewed the patch, I'm inviting [~manojg] for a review (thanks Manoj). While he is reviewing, other folks are welcome to review too. Thanks.
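One possible way to make the config change visible to users, as requested above, is to keep the old constant, mark it {{@Deprecated}}, and point its javadoc at the new key. This is only a hedged sketch (the class name and javadoc text are illustrative and not taken from the patch); the same information could equally be surfaced via hdfs-default.xml, the release note, or Hadoop's {{Configuration.addDeprecation()}} mechanism.

{code:java}
// Illustrative sketch only, not the actual DFSConfigKeys diff.
public final class DFSConfigKeysSketch {
  /**
   * @deprecated No longer used: the NameCache implementation changed. Configure the
   * new fixed-size cache with {@link #DFS_NAMENODE_NAME_CACHE_SIZE_KEY} instead.
   */
  @Deprecated
  public static final String DFS_NAMENODE_NAME_CACHE_THRESHOLD_KEY =
      "dfs.namenode.name.cache.threshold";

  /** Size of the new fixed-size name cache introduced by this change. */
  public static final String DFS_NAMENODE_NAME_CACHE_SIZE_KEY =
      "dfs.namenode.name.cache.size";
  public static final int DFS_NAMENODE_NAME_CACHE_SIZE_DEFAULT = 4 * 1024 * 1024;
}
{code}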