[ https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975841#comment-13975841 ]
Chris Nauroth commented on HDFS-2006: ------------------------------------- Hello, [~hitliuyi]. Thank you for taking on this issue. I read through the design document and initial patch, and it looks like a great start. Here are a few questions. # Is there an intention to integrate with the ACLs work done in HDFS-4685? For example, on Linux, the getfattr command can present POSIX ACLs as xattrs in the system namespace. I haven't fully explored use of getfattr/setfattr in combination with ACLs, so I don't know the details of how this works on Linux yet. The most important piece of this would be to guarantee that setfattr cannot circumvent any important security logic enforced by setfacl, and write tests for it. # bq. Whether inode has a XAttr is through a bit of int which is shared in inode. Can you elaborate on this? This sounded like using a bit somewhere as a flag to indicate presence of xattrs, but I didn't spot anything like that in my initial scan of the patch. I just saw the new inode feature, which was basically implemented as I would have expected. # bq. In current version, large value will be stored in NN memory, unless client defines its own reference format, have external storage and can fetch content through reference. Can you please clarify this part? Is the intention that resolution of the xattr reference is somehow pluggable inside the NN, or is this just talking about a general strategy of using an xattr to persist a foreign key to some external system, with the client then driving lookup at that external system? # Could you provide more details on the memory calculation? For example, what numbers are assumed for current inode size, average number of xattrs per inode, and average size of name and value for the individual xattrs? # I was confused by the {{getXAttrs}} API that accepts {{List<Xattr>}}. Looking at the implementation in {{FSNamesystem#getXAttrs}}, it appears that the get operation also may optionally mutate the xattrs. Do I understand correctly? If so, can we remove this and make the get operation focus solely on returning the existing xattrs? # Right now, if a caller attempts to set multiple xattrs, then the authorized ones are accepted, and the unauthorized ones get filtered out and ignored silently. This could be problematic for error handling scenarios. For example, if we imagine a member of the super-group modifying xattrs in the system namespace successfully, and then that user gets removed from the super-group, then their calls will start failing to apply the xattrs silently, but no error will be propagated back to the caller to inform them that something is broken now. Perhaps this scenario should throw {{AccessControlException}} instead. # Do you intend to enforce any conservative upper limits to try to mitigate against someone maliciously or accidentally bloating the NN memory footprint by writing huge data into xattrs? Are there upper limits on number of xattrs on a single inode, number of bytes in an xattr name and number of bytes in an xattr value? # The current patch appears to include xattrs in the method signatures for the internal mkdir methods, but I didn't notice a need for this. Was this modeled after the code changes for ACLs? For POSIX ACLs, we needed to support the notion of a default ACL, which is an ACL copied from the parent directory to the child sub-directory or file at time of creation. In the implementation, this required passing along the list of ACL entries to copy in some of the internal methods. I'm not aware of general support for this kind of behavior across all xattrs, so if it doesn't exist and there is no use case for it, then I recommend removing this part of the code. # The design states a desire to preserve xattrs using distcp, but it also states that WebHDFS is not yet in scope. WebHDFS is very commonly used as a distcp target though, so I wonder if we should reconsider. # I assume that you will not support getting and setting xattrs via NFS mount. (i.e. We won't implement the case of someone calling Linux getattr/setattr on a path that is really under an NFS mount point to an HDFS cluster.) That's the choice we made for initial development of getfacl/setfacl. If this is the plan for xattrs too, then I just recommend stating that in the design doc. Thanks, again. It looks like good work! At this point, I recommend establishing a HDFS-2006 feature branch, so that you can start breaking up portions of the patch, and getting them through finer-grained code review and commit. We'll need to rally help from a PMC member. If a PMC member is listening, would you please create an HDFS-2006 branch? Thanks! > ability to support storing extended attributes per file > ------------------------------------------------------- > > Key: HDFS-2006 > URL: https://issues.apache.org/jira/browse/HDFS-2006 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode > Reporter: dhruba borthakur > Assignee: Yi Liu > Attachments: HDFS-XAttrs-Design-1.pdf, xattrs.1.patch, xattrs.patch > > > It would be nice if HDFS provides a feature to store extended attributes for > files, similar to the one described here: > http://en.wikipedia.org/wiki/Extended_file_attributes. > The challenge is that it has to be done in such a way that a site not using > this feature does not waste precious memory resources in the namenode. -- This message was sent by Atlassian JIRA (v6.2#6252)