[jira] [Comment Edited] (HDFS-8900) Optimize XAttr memory footprint.

2015-08-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706763#comment-14706763
 ] 

Yi Liu edited comment on HDFS-8900 at 8/21/15 2:45 PM:
---

Thanks [~andrew.wang], I update the patch for your comments.

*About hard limit.*
{quote}
I talked with Colin Patrick McCabe about compatibility offline 
{quote}
Yes, I agree you and Colin. In fact, when I added the hard limit, the first 
thing I considered is the compatibility. I did it based on two reasons: 1). as 
you said too, it's unlikely anyone's ever changed the max size. 2). The max 
limit is to restrict user's (user/trusted) namespace xattrs, and it doesn't 
break the existing HDFS features. 
I felt it was OK to do this modification on trunk and branch-2 directly 
(Certainly we can still use 4 bytes, but I wanted to save 2 bytes from it :)). 
I see you agree with the hard limit generally, one thing is do we need to only 
have the hard limit in trunk? How about do it in the branch-2 if we think it's 
fine?

{quote}
Could we exit on setting an xattr size bigger than the hard limit, rather than 
doing a silent min? We should mention this new hard limit somewhere as well.
{quote}
Sure, updated in the new patch. User will see the hard limit from error msg. I 
updated the description of the xattr max size in hdfs-default.xml and mentioned 
the hard limit too.

{quote}
Any comment on the max size supported by other filesystems like ext4 or btrfs, 
for reference as to what's reasonable here?
{quote}
It's a long time since I read xattr of ext4 and other fs last time. I remember 
some fs have limit about the xattr length, and some fs don't have. It's detail, 
small difference should be OK.

{quote}
Typo in FSDirectory: should - should be. Also the line below it says 
unlimited but that'll never get triggered now.
{quote}
sure, I forgot to remove the unlimited...

{quote}
Regarding configuration, we could simplify by just using the hard limit. Admins 
would still have the option of disabling xattrs entirely; is there really any 
value in being able to set something smaller than 32KB? This would definitely 
make it a trunk change.
{quote}
Agree, How about doing it in a follow-on if we want later?

*More review comments:* 
{quote}
FSDirectory#addToInodeMap, do we need that new return?
{quote}
right, no need. I removed it. Maybe I added it suddenly :)

{quote}
A bit out of scope and so optional, but I think everywhere we say prefixName 
we really want to say prefixedName because prefixName sounds more like the 
name of the prefix rather than a name with a prefix.
{quote}
Good idea, {{prefixedName}} is much more better, I updated all those in the new 
patch.

{quote}
Some unrelated import changes in FSNamesystemLock and INodeAttributeProvider
{quote}
OK, I reverted this modification since they are unrelated (I saw them, I was 
thinking to remove them). 

{quote}
XAttrFeature has an extra import
{quote}
fixed

{quote}
What's the reason for switching from ImmutableList to List in some places? The 
switch is also not complete, since I still see some usages of ImmutableList. I 
remember we liked ImmutableList originally since it made the need to set very 
explicit.
{quote}
I think originally we use {{ImmutableList}} is mainly because it's immutable, 
and we keep it in {{XAttrFeature}}, we don't want outside modification affects 
it. Now it's packed to {{byte[]}} in {{XAttrFeature}}, so no need immutable 
list. 
I use ArrayList instead of ImmutableList is because when building 
ImmutableList, it needs additional list copy (from an internal ArrayList). Then 
the performance is a bit better.
I missed to switch one {{ImmutableList}} in XAttrStorage and fixed it now.

{quote}
Mind adding Javadoc for SerialNumberMap, and an interface audience private 
annotation?
XAttrsFormat's class javadoc goes over 80 chars, could use interface audience 
private also.
{quote}
Sure, updated them.  One thing is {{XAttrFormat}} is {{package-private}}, so no 
need to add audience private annotation for it.

{quote}
Can we add an IllegalStateException to SerialNumberMap#get(T) for Integer 
overflow? Also there's the case that the int from the map doesn't fit in the 29 
bits in XAttrFormat; check that in XAttrsFormattoBytes?
{quote}
Sure, I also add the check in XAttrFormat#toBytes.   Actually I want to create 
a follow-on to restrict the total number of xattr names for user's (user/trust) 
xattrs. For HDFS kernel, the number of xattr names are less than 10 currently, 
but if users create many various xattrs (maybe by mistake), then it will cause 
unexpected behavior. 

{quote}
Consider also renaming XAttrsFormat to XAttrFormat, so it's named like 
XAttrStorage
{quote}
Good idea.

{quote}
Is it worthwhile to do the same dictionary encoding for the FSImage as well? If 
the # xattrs is large enough to affect memory footprint, it'd also affect 
loading times 

[jira] [Comment Edited] (HDFS-8900) Optimize XAttr memory footprint.

2015-08-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706763#comment-14706763
 ] 

Yi Liu edited comment on HDFS-8900 at 8/21/15 2:08 PM:
---

Thanks [~andrew.wang], I update the patch for your comments.

*About hard limit.*
{quote}
I talked with Colin Patrick McCabe about compatibility offline 
{quote}
Yes, I agree you and Colin. In fact, when I added the hard limit, the first 
thing I considered is the compatibility. I did it based on two reasons: 1). as 
you said too, it's unlikely anyone's ever changed the max size. 2). The max 
limit is to restrict user's (user/trusted) namespace xattrs, and it doesn't 
break the existing HDFS features. 
I felt it was OK to do this modification on trunk and branch-2 directly 
(Certainly we can still use 4 bytes, but I wanted to save 2 bytes from it :)). 
I see you agree with the hard limit generally, one thing is do we need to only 
have the hard limit in trunk? How about do it in the branch-2 if we think it's 
fine?

{quote}
Could we exit on setting an xattr size bigger than the hard limit, rather than 
doing a silent min? We should mention this new hard limit somewhere as well.
{quote}
Sure, updated in the new patch. User will see the hard limit from error msg. I 
updated the description of the xattr max size in hdfs-default.xml and mentioned 
the hard limit too.

{quote}
Any comment on the max size supported by other filesystems like ext4 or btrfs, 
for reference as to what's reasonable here?
{quote}
It's a long time since I read xattr of ext4 and other fs last time. I remember 
some fs have limit about the xattr length, and some fs don't have. It's detail, 
small difference should be OK.

{quote}
Typo in FSDirectory: should - should be. Also the line below it says 
unlimited but that'll never get triggered now.
{quote}
sure, I forgot to remove the unlimited...

{quote}
Regarding configuration, we could simplify by just using the hard limit. Admins 
would still have the option of disabling xattrs entirely; is there really any 
value in being able to set something smaller than 32KB? This would definitely 
make it a trunk change.
{quote}
Agree, How about doing it in a follow-on if we want later?

*More review comments:* 
{quote}
FSDirectory#addToInodeMap, do we need that new return?
{quote}
right, no need. I removed it. Maybe I added it suddenly :)

{quote}
A bit out of scope and so optional, but I think everywhere we say prefixName 
we really want to say prefixedName because prefixName sounds more like the 
name of the prefix rather than a name with a prefix.
{quote}
Good idea, {{prefixedName}} is much more better, I updated all those in the new 
patch.

{quote}
Some unrelated import changes in FSNamesystemLock and INodeAttributeProvider
{quote}
OK, I reverted this modification since they are unrelated (I saw them, I was 
thinking to remove them). 

{quote}
XAttrFeature has an extra import
{quote}
fixed

{quote}
What's the reason for switching from ImmutableList to List in some places? The 
switch is also not complete, since I still see some usages of ImmutableList. I 
remember we liked ImmutableList originally since it made the need to set very 
explicit.
{quote}
I think originally we use {{ImmutableList}} is mainly because it's immutable, 
and we keep it in {{XAttrFeature}}, we don't want outside modification affects 
it. Now it's packed to {{byte[]}} in {{XAttrFeature}}, so no need immutable 
list. 
I use ArrayList instead of ImmutableList is because when building 
ImmutableList, it needs additional list copy (from an internal ArrayList). Then 
the performance is a bit better.
I missed one place in XAttrStorage and fixed it now.

{quote}
Mind adding Javadoc for SerialNumberMap, and an interface audience private 
annotation?
XAttrsFormat's class javadoc goes over 80 chars, could use interface audience 
private also.
{quote}
Sure, updated them.  One thing is {{XAttrFormat}} is {{package-private}}, so no 
need to add audience private annotation for it.

{quote}
Can we add an IllegalStateException to SerialNumberMap#get(T) for Integer 
overflow? Also there's the case that the int from the map doesn't fit in the 29 
bits in XAttrFormat; check that in XAttrsFormattoBytes?
{quote}
Sure, I also add the check in XAttrFormat#toBytes.   Actually I want to create 
a follow-on to restrict the total number of xattr names for user's (user/trust) 
xattrs. For HDFS kernel, the number of xattr names are less than 10 currently, 
but if users create many various xattrs (maybe by mistake), then it will cause 
unexpected behavior. 

{quote}
Consider also renaming XAttrsFormat to XAttrFormat, so it's named like 
XAttrStorage
{quote}
Good idea.

{quote}
Is it worthwhile to do the same dictionary encoding for the FSImage as well? If 
the # xattrs is large enough to affect memory footprint, it'd also affect 
loading times which can already be 

[jira] [Comment Edited] (HDFS-8900) Optimize XAttr memory footprint.

2015-08-20 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704529#comment-14704529
 ] 

Yi Liu edited comment on HDFS-8900 at 8/20/15 1:46 PM:
---

This optimization:
-   For each XAttr, can save: 12bytes (object overhead) + 2bytes (compact), 
memory alignment of value
-   For each XAttrFeature, can save: (ImmutableList object size) + num * (XAttr 
saved).

So for each XAttrFeature (associated with an INodeFile or INodeDirectory), we 
can *save* (bytes): (ImmutableList object size) + n * (14 + memory alignment of 
value).
-  *If n == 1*,  (ImmutableList object size) = 12 + 4 (bytes). Totally *30* 
bytes.
-  *if n  1*, ImmutableList  is RegularImmutableList, so (ImmutableList object 
size) = 12 + 4 + 4 + 12 + n * 4 + object alignment.  Totally *36 + n \* (18 + 
memory alignment of value) bytes*.

We can see the improvement is significant especially there is more than one 
XAttr in XAttrFeature.


The patch also makes code logic more simper, and has no impact on NN 
performance.

Besides, the xattrMaxSize is used to restrict user/trust namespace xattrs, I 
add a hard limit for max size. It's reasonable and make Xattr more 
controllable. 


was (Author: hitliuyi):
This optimization:
-   For each XAttr, can save: 12bytes (object overhead) + 2bytes (compact), 
memory alignment of value
-   For each XAttrFeature, can save: 12 bytes (object overhead) + num * XAttr 
saved.

So for each XAttrFeature (associated with an INodeFile or INodeDirectory), we 
can save (bytes): 12 + n * (14 + memory alignment of value in each XAttr) 


The patch also makes code logic more simper, and has no impact on NN 
performance.

Besides, the xattrMaxSize is used to restrict user/trust namespace xattrs, I 
add a hard limit for max size. It's reasonable and make Xattr more 
controllable. 

 Optimize XAttr memory footprint.
 

 Key: HDFS-8900
 URL: https://issues.apache.org/jira/browse/HDFS-8900
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-8900.001.patch


 {code}
 private final ImmutableListXAttr xAttrs;
 {code}
 Currently we use above in XAttrFeature, it's not efficient from memory point 
 of view, since {{ImmutableList}} and {{XAttr}} have object memory overhead, 
 and each object has memory alignment. 
 We can use a {{byte[]}} in XAttrFeature and do some compact in {{XAttr}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8900) Optimize XAttr memory footprint.

2015-08-20 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705870#comment-14705870
 ] 

Yi Liu edited comment on HDFS-8900 at 8/20/15 10:13 PM:


Thanks [~andrew.wang] for the quick review and detailed comments (Actually I 
was planing to ask you for help if you were not here :), since this is one of 
your specialistic areas) !
I will update the patch and reply you later today.


was (Author: hitliuyi):
Thanks [~andrew.wang] for the quick review and detailed response (Actually I 
was planing to ask you for help if you were not here :), since this is one of 
your specialistic areas) !
I will update the patch and reply you later today.

 Optimize XAttr memory footprint.
 

 Key: HDFS-8900
 URL: https://issues.apache.org/jira/browse/HDFS-8900
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-8900.001.patch


 {code}
 private final ImmutableListXAttr xAttrs;
 {code}
 Currently we use above in XAttrFeature, it's not efficient from memory point 
 of view, since {{ImmutableList}} and {{XAttr}} have object memory overhead, 
 and each object has memory alignment. 
 We can use a {{byte[]}} in XAttrFeature and do some compact in {{XAttr}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)