[jira] [Comment Edited] (HDFS-8900) Optimize XAttr memory footprint.
[ https://issues.apache.org/jira/browse/HDFS-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706763#comment-14706763 ] Yi Liu edited comment on HDFS-8900 at 8/21/15 2:45 PM: --- Thanks [~andrew.wang], I update the patch for your comments. *About hard limit.* {quote} I talked with Colin Patrick McCabe about compatibility offline {quote} Yes, I agree you and Colin. In fact, when I added the hard limit, the first thing I considered is the compatibility. I did it based on two reasons: 1). as you said too, it's unlikely anyone's ever changed the max size. 2). The max limit is to restrict user's (user/trusted) namespace xattrs, and it doesn't break the existing HDFS features. I felt it was OK to do this modification on trunk and branch-2 directly (Certainly we can still use 4 bytes, but I wanted to save 2 bytes from it :)). I see you agree with the hard limit generally, one thing is do we need to only have the hard limit in trunk? How about do it in the branch-2 if we think it's fine? {quote} Could we exit on setting an xattr size bigger than the hard limit, rather than doing a silent min? We should mention this new hard limit somewhere as well. {quote} Sure, updated in the new patch. User will see the hard limit from error msg. I updated the description of the xattr max size in hdfs-default.xml and mentioned the hard limit too. {quote} Any comment on the max size supported by other filesystems like ext4 or btrfs, for reference as to what's reasonable here? {quote} It's a long time since I read xattr of ext4 and other fs last time. I remember some fs have limit about the xattr length, and some fs don't have. It's detail, small difference should be OK. {quote} Typo in FSDirectory: should - should be. Also the line below it says unlimited but that'll never get triggered now. {quote} sure, I forgot to remove the unlimited... {quote} Regarding configuration, we could simplify by just using the hard limit. Admins would still have the option of disabling xattrs entirely; is there really any value in being able to set something smaller than 32KB? This would definitely make it a trunk change. {quote} Agree, How about doing it in a follow-on if we want later? *More review comments:* {quote} FSDirectory#addToInodeMap, do we need that new return? {quote} right, no need. I removed it. Maybe I added it suddenly :) {quote} A bit out of scope and so optional, but I think everywhere we say prefixName we really want to say prefixedName because prefixName sounds more like the name of the prefix rather than a name with a prefix. {quote} Good idea, {{prefixedName}} is much more better, I updated all those in the new patch. {quote} Some unrelated import changes in FSNamesystemLock and INodeAttributeProvider {quote} OK, I reverted this modification since they are unrelated (I saw them, I was thinking to remove them). {quote} XAttrFeature has an extra import {quote} fixed {quote} What's the reason for switching from ImmutableList to List in some places? The switch is also not complete, since I still see some usages of ImmutableList. I remember we liked ImmutableList originally since it made the need to set very explicit. {quote} I think originally we use {{ImmutableList}} is mainly because it's immutable, and we keep it in {{XAttrFeature}}, we don't want outside modification affects it. Now it's packed to {{byte[]}} in {{XAttrFeature}}, so no need immutable list. I use ArrayList instead of ImmutableList is because when building ImmutableList, it needs additional list copy (from an internal ArrayList). Then the performance is a bit better. I missed to switch one {{ImmutableList}} in XAttrStorage and fixed it now. {quote} Mind adding Javadoc for SerialNumberMap, and an interface audience private annotation? XAttrsFormat's class javadoc goes over 80 chars, could use interface audience private also. {quote} Sure, updated them. One thing is {{XAttrFormat}} is {{package-private}}, so no need to add audience private annotation for it. {quote} Can we add an IllegalStateException to SerialNumberMap#get(T) for Integer overflow? Also there's the case that the int from the map doesn't fit in the 29 bits in XAttrFormat; check that in XAttrsFormattoBytes? {quote} Sure, I also add the check in XAttrFormat#toBytes. Actually I want to create a follow-on to restrict the total number of xattr names for user's (user/trust) xattrs. For HDFS kernel, the number of xattr names are less than 10 currently, but if users create many various xattrs (maybe by mistake), then it will cause unexpected behavior. {quote} Consider also renaming XAttrsFormat to XAttrFormat, so it's named like XAttrStorage {quote} Good idea. {quote} Is it worthwhile to do the same dictionary encoding for the FSImage as well? If the # xattrs is large enough to affect memory footprint, it'd also affect loading times
[jira] [Comment Edited] (HDFS-8900) Optimize XAttr memory footprint.
[ https://issues.apache.org/jira/browse/HDFS-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706763#comment-14706763 ] Yi Liu edited comment on HDFS-8900 at 8/21/15 2:08 PM: --- Thanks [~andrew.wang], I update the patch for your comments. *About hard limit.* {quote} I talked with Colin Patrick McCabe about compatibility offline {quote} Yes, I agree you and Colin. In fact, when I added the hard limit, the first thing I considered is the compatibility. I did it based on two reasons: 1). as you said too, it's unlikely anyone's ever changed the max size. 2). The max limit is to restrict user's (user/trusted) namespace xattrs, and it doesn't break the existing HDFS features. I felt it was OK to do this modification on trunk and branch-2 directly (Certainly we can still use 4 bytes, but I wanted to save 2 bytes from it :)). I see you agree with the hard limit generally, one thing is do we need to only have the hard limit in trunk? How about do it in the branch-2 if we think it's fine? {quote} Could we exit on setting an xattr size bigger than the hard limit, rather than doing a silent min? We should mention this new hard limit somewhere as well. {quote} Sure, updated in the new patch. User will see the hard limit from error msg. I updated the description of the xattr max size in hdfs-default.xml and mentioned the hard limit too. {quote} Any comment on the max size supported by other filesystems like ext4 or btrfs, for reference as to what's reasonable here? {quote} It's a long time since I read xattr of ext4 and other fs last time. I remember some fs have limit about the xattr length, and some fs don't have. It's detail, small difference should be OK. {quote} Typo in FSDirectory: should - should be. Also the line below it says unlimited but that'll never get triggered now. {quote} sure, I forgot to remove the unlimited... {quote} Regarding configuration, we could simplify by just using the hard limit. Admins would still have the option of disabling xattrs entirely; is there really any value in being able to set something smaller than 32KB? This would definitely make it a trunk change. {quote} Agree, How about doing it in a follow-on if we want later? *More review comments:* {quote} FSDirectory#addToInodeMap, do we need that new return? {quote} right, no need. I removed it. Maybe I added it suddenly :) {quote} A bit out of scope and so optional, but I think everywhere we say prefixName we really want to say prefixedName because prefixName sounds more like the name of the prefix rather than a name with a prefix. {quote} Good idea, {{prefixedName}} is much more better, I updated all those in the new patch. {quote} Some unrelated import changes in FSNamesystemLock and INodeAttributeProvider {quote} OK, I reverted this modification since they are unrelated (I saw them, I was thinking to remove them). {quote} XAttrFeature has an extra import {quote} fixed {quote} What's the reason for switching from ImmutableList to List in some places? The switch is also not complete, since I still see some usages of ImmutableList. I remember we liked ImmutableList originally since it made the need to set very explicit. {quote} I think originally we use {{ImmutableList}} is mainly because it's immutable, and we keep it in {{XAttrFeature}}, we don't want outside modification affects it. Now it's packed to {{byte[]}} in {{XAttrFeature}}, so no need immutable list. I use ArrayList instead of ImmutableList is because when building ImmutableList, it needs additional list copy (from an internal ArrayList). Then the performance is a bit better. I missed one place in XAttrStorage and fixed it now. {quote} Mind adding Javadoc for SerialNumberMap, and an interface audience private annotation? XAttrsFormat's class javadoc goes over 80 chars, could use interface audience private also. {quote} Sure, updated them. One thing is {{XAttrFormat}} is {{package-private}}, so no need to add audience private annotation for it. {quote} Can we add an IllegalStateException to SerialNumberMap#get(T) for Integer overflow? Also there's the case that the int from the map doesn't fit in the 29 bits in XAttrFormat; check that in XAttrsFormattoBytes? {quote} Sure, I also add the check in XAttrFormat#toBytes. Actually I want to create a follow-on to restrict the total number of xattr names for user's (user/trust) xattrs. For HDFS kernel, the number of xattr names are less than 10 currently, but if users create many various xattrs (maybe by mistake), then it will cause unexpected behavior. {quote} Consider also renaming XAttrsFormat to XAttrFormat, so it's named like XAttrStorage {quote} Good idea. {quote} Is it worthwhile to do the same dictionary encoding for the FSImage as well? If the # xattrs is large enough to affect memory footprint, it'd also affect loading times which can already be
[jira] [Comment Edited] (HDFS-8900) Optimize XAttr memory footprint.
[ https://issues.apache.org/jira/browse/HDFS-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704529#comment-14704529 ] Yi Liu edited comment on HDFS-8900 at 8/20/15 1:46 PM: --- This optimization: - For each XAttr, can save: 12bytes (object overhead) + 2bytes (compact), memory alignment of value - For each XAttrFeature, can save: (ImmutableList object size) + num * (XAttr saved). So for each XAttrFeature (associated with an INodeFile or INodeDirectory), we can *save* (bytes): (ImmutableList object size) + n * (14 + memory alignment of value). - *If n == 1*, (ImmutableList object size) = 12 + 4 (bytes). Totally *30* bytes. - *if n 1*, ImmutableList is RegularImmutableList, so (ImmutableList object size) = 12 + 4 + 4 + 12 + n * 4 + object alignment. Totally *36 + n \* (18 + memory alignment of value) bytes*. We can see the improvement is significant especially there is more than one XAttr in XAttrFeature. The patch also makes code logic more simper, and has no impact on NN performance. Besides, the xattrMaxSize is used to restrict user/trust namespace xattrs, I add a hard limit for max size. It's reasonable and make Xattr more controllable. was (Author: hitliuyi): This optimization: - For each XAttr, can save: 12bytes (object overhead) + 2bytes (compact), memory alignment of value - For each XAttrFeature, can save: 12 bytes (object overhead) + num * XAttr saved. So for each XAttrFeature (associated with an INodeFile or INodeDirectory), we can save (bytes): 12 + n * (14 + memory alignment of value in each XAttr) The patch also makes code logic more simper, and has no impact on NN performance. Besides, the xattrMaxSize is used to restrict user/trust namespace xattrs, I add a hard limit for max size. It's reasonable and make Xattr more controllable. Optimize XAttr memory footprint. Key: HDFS-8900 URL: https://issues.apache.org/jira/browse/HDFS-8900 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8900.001.patch {code} private final ImmutableListXAttr xAttrs; {code} Currently we use above in XAttrFeature, it's not efficient from memory point of view, since {{ImmutableList}} and {{XAttr}} have object memory overhead, and each object has memory alignment. We can use a {{byte[]}} in XAttrFeature and do some compact in {{XAttr}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8900) Optimize XAttr memory footprint.
[ https://issues.apache.org/jira/browse/HDFS-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705870#comment-14705870 ] Yi Liu edited comment on HDFS-8900 at 8/20/15 10:13 PM: Thanks [~andrew.wang] for the quick review and detailed comments (Actually I was planing to ask you for help if you were not here :), since this is one of your specialistic areas) ! I will update the patch and reply you later today. was (Author: hitliuyi): Thanks [~andrew.wang] for the quick review and detailed response (Actually I was planing to ask you for help if you were not here :), since this is one of your specialistic areas) ! I will update the patch and reply you later today. Optimize XAttr memory footprint. Key: HDFS-8900 URL: https://issues.apache.org/jira/browse/HDFS-8900 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8900.001.patch {code} private final ImmutableListXAttr xAttrs; {code} Currently we use above in XAttrFeature, it's not efficient from memory point of view, since {{ImmutableList}} and {{XAttr}} have object memory overhead, and each object has memory alignment. We can use a {{byte[]}} in XAttrFeature and do some compact in {{XAttr}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)