[ https://issues.apache.org/jira/browse/HDFS-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706763#comment-14706763 ]
Yi Liu edited comment on HDFS-8900 at 8/21/15 2:45 PM: ------------------------------------------------------- Thanks [~andrew.wang], I update the patch for your comments. *About hard limit.* {quote} I talked with Colin Patrick McCabe about compatibility offline .... {quote} Yes, I agree you and Colin. In fact, when I added the hard limit, the first thing I considered is the compatibility. I did it based on two reasons: 1). as you said too, it's unlikely anyone's ever changed the max size. 2). The max limit is to restrict user's (user/trusted) namespace xattrs, and it doesn't break the existing HDFS features. I felt it was OK to do this modification on trunk and branch-2 directly (Certainly we can still use 4 bytes, but I wanted to save 2 bytes from it :)). I see you agree with the hard limit generally, one thing is do we need to only have the hard limit in trunk? How about do it in the branch-2 if we think it's fine? {quote} Could we exit on setting an xattr size bigger than the hard limit, rather than doing a silent min? We should mention this new hard limit somewhere as well. {quote} Sure, updated in the new patch. User will see the hard limit from error msg. I updated the description of the xattr max size in hdfs-default.xml and mentioned the hard limit too. {quote} Any comment on the max size supported by other filesystems like ext4 or btrfs, for reference as to what's reasonable here? {quote} It's a long time since I read xattr of ext4 and other fs last time. I remember some fs have limit about the xattr length, and some fs don't have. It's detail, small difference should be OK. {quote} Typo in FSDirectory: "should" -> "should be". Also the line below it says "unlimited" but that'll never get triggered now. {quote} sure, I forgot to remove the unlimited... {quote} Regarding configuration, we could simplify by just using the hard limit. Admins would still have the option of disabling xattrs entirely; is there really any value in being able to set something smaller than 32KB? This would definitely make it a trunk change. {quote} Agree, How about doing it in a follow-on if we want later? *More review comments:* {quote} FSDirectory#addToInodeMap, do we need that new return? {quote} right, no need. I removed it. Maybe I added it suddenly :) {quote} A bit out of scope and so optional, but I think everywhere we say "prefixName" we really want to say "prefixedName" because "prefixName" sounds more like the name of the prefix rather than a name with a prefix. {quote} Good idea, {{prefixedName}} is much more better, I updated all those in the new patch. {quote} Some unrelated import changes in FSNamesystemLock and INodeAttributeProvider {quote} OK, I reverted this modification since they are unrelated (I saw them, I was thinking to remove them). {quote} XAttrFeature has an extra import {quote} fixed {quote} What's the reason for switching from ImmutableList to List in some places? The switch is also not complete, since I still see some usages of ImmutableList. I remember we liked ImmutableList originally since it made the need to set very explicit. {quote} I think originally we use {{ImmutableList}} is mainly because it's immutable, and we keep it in {{XAttrFeature}}, we don't want outside modification affects it. Now it's packed to {{byte[]}} in {{XAttrFeature}}, so no need immutable list. I use ArrayList instead of ImmutableList is because when building ImmutableList, it needs additional list copy (from an internal ArrayList). Then the performance is a bit better. I missed to switch one {{ImmutableList}} in XAttrStorage and fixed it now. {quote} Mind adding Javadoc for SerialNumberMap, and an interface audience private annotation? XAttrsFormat's class javadoc goes over 80 chars, could use interface audience private also. {quote} Sure, updated them. One thing is {{XAttrFormat}} is {{package-private}}, so no need to add audience private annotation for it. {quote} Can we add an IllegalStateException to SerialNumberMap#get(T) for Integer overflow? Also there's the case that the int from the map doesn't fit in the 29 bits in XAttrFormat; check that in XAttrsFormattoBytes? {quote} Sure, I also add the check in XAttrFormat#toBytes. Actually I want to create a follow-on to restrict the total number of xattr names for user's (user/trust) xattrs. For HDFS kernel, the number of xattr names are less than 10 currently, but if users create many various xattrs (maybe by mistake), then it will cause unexpected behavior. {quote} Consider also renaming XAttrsFormat to XAttrFormat, so it's named like XAttrStorage {quote} Good idea. {quote} Is it worthwhile to do the same dictionary encoding for the FSImage as well? If the # xattrs is large enough to affect memory footprint, it'd also affect loading times which can already be minutes. Can be a follow-on JIRA for sure. {quote} Actually we already have this compaction for XAttr in FSImage, please see {{XAttrCompactProto}} in fsimage.proto. was (Author: hitliuyi): Thanks [~andrew.wang], I update the patch for your comments. *About hard limit.* {quote} I talked with Colin Patrick McCabe about compatibility offline .... {quote} Yes, I agree you and Colin. In fact, when I added the hard limit, the first thing I considered is the compatibility. I did it based on two reasons: 1). as you said too, it's unlikely anyone's ever changed the max size. 2). The max limit is to restrict user's (user/trusted) namespace xattrs, and it doesn't break the existing HDFS features. I felt it was OK to do this modification on trunk and branch-2 directly (Certainly we can still use 4 bytes, but I wanted to save 2 bytes from it :)). I see you agree with the hard limit generally, one thing is do we need to only have the hard limit in trunk? How about do it in the branch-2 if we think it's fine? {quote} Could we exit on setting an xattr size bigger than the hard limit, rather than doing a silent min? We should mention this new hard limit somewhere as well. {quote} Sure, updated in the new patch. User will see the hard limit from error msg. I updated the description of the xattr max size in hdfs-default.xml and mentioned the hard limit too. {quote} Any comment on the max size supported by other filesystems like ext4 or btrfs, for reference as to what's reasonable here? {quote} It's a long time since I read xattr of ext4 and other fs last time. I remember some fs have limit about the xattr length, and some fs don't have. It's detail, small difference should be OK. {quote} Typo in FSDirectory: "should" -> "should be". Also the line below it says "unlimited" but that'll never get triggered now. {quote} sure, I forgot to remove the unlimited... {quote} Regarding configuration, we could simplify by just using the hard limit. Admins would still have the option of disabling xattrs entirely; is there really any value in being able to set something smaller than 32KB? This would definitely make it a trunk change. {quote} Agree, How about doing it in a follow-on if we want later? *More review comments:* {quote} FSDirectory#addToInodeMap, do we need that new return? {quote} right, no need. I removed it. Maybe I added it suddenly :) {quote} A bit out of scope and so optional, but I think everywhere we say "prefixName" we really want to say "prefixedName" because "prefixName" sounds more like the name of the prefix rather than a name with a prefix. {quote} Good idea, {{prefixedName}} is much more better, I updated all those in the new patch. {quote} Some unrelated import changes in FSNamesystemLock and INodeAttributeProvider {quote} OK, I reverted this modification since they are unrelated (I saw them, I was thinking to remove them). {quote} XAttrFeature has an extra import {quote} fixed {quote} What's the reason for switching from ImmutableList to List in some places? The switch is also not complete, since I still see some usages of ImmutableList. I remember we liked ImmutableList originally since it made the need to set very explicit. {quote} I think originally we use {{ImmutableList}} is mainly because it's immutable, and we keep it in {{XAttrFeature}}, we don't want outside modification affects it. Now it's packed to {{byte[]}} in {{XAttrFeature}}, so no need immutable list. I use ArrayList instead of ImmutableList is because when building ImmutableList, it needs additional list copy (from an internal ArrayList). Then the performance is a bit better. I missed one place in XAttrStorage and fixed it now. {quote} Mind adding Javadoc for SerialNumberMap, and an interface audience private annotation? XAttrsFormat's class javadoc goes over 80 chars, could use interface audience private also. {quote} Sure, updated them. One thing is {{XAttrFormat}} is {{package-private}}, so no need to add audience private annotation for it. {quote} Can we add an IllegalStateException to SerialNumberMap#get(T) for Integer overflow? Also there's the case that the int from the map doesn't fit in the 29 bits in XAttrFormat; check that in XAttrsFormattoBytes? {quote} Sure, I also add the check in XAttrFormat#toBytes. Actually I want to create a follow-on to restrict the total number of xattr names for user's (user/trust) xattrs. For HDFS kernel, the number of xattr names are less than 10 currently, but if users create many various xattrs (maybe by mistake), then it will cause unexpected behavior. {quote} Consider also renaming XAttrsFormat to XAttrFormat, so it's named like XAttrStorage {quote} Good idea. {quote} Is it worthwhile to do the same dictionary encoding for the FSImage as well? If the # xattrs is large enough to affect memory footprint, it'd also affect loading times which can already be minutes. Can be a follow-on JIRA for sure. {quote} Actually we already have this compaction for XAttr in FSImage, please see {{XAttrCompactProto}} in fsimage.proto. > Optimize XAttr memory footprint. > -------------------------------- > > Key: HDFS-8900 > URL: https://issues.apache.org/jira/browse/HDFS-8900 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode > Reporter: Yi Liu > Assignee: Yi Liu > Attachments: HDFS-8900.001.patch, HDFS-8900.002.patch > > > {code} > private final ImmutableList<XAttr> xAttrs; > {code} > Currently we use above in XAttrFeature, it's not efficient from memory point > of view, since {{ImmutableList}} and {{XAttr}} have object memory overhead, > and each object has memory alignment. > We can use a {{byte[]}} in XAttrFeature and do some compact in {{XAttr}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)