[ 
https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975841#comment-13975841
 ] 

Chris Nauroth commented on HDFS-2006:
-------------------------------------

Hello, [~hitliuyi].  Thank you for taking on this issue.  I read through the 
design document and initial patch, and it looks like a great start.  Here are a 
few questions.

# Is there an intention to integrate with the ACLs work done in HDFS-4685?  For 
example, on Linux, the getfattr command can present POSIX ACLs as xattrs in the 
system namespace.  I haven't fully explored use of getfattr/setfattr in 
combination with ACLs, so I don't know the details of how this works on Linux 
yet.  The most important piece of this would be to guarantee that setfattr 
cannot circumvent any important security logic enforced by setfacl, and write 
tests for it.
# bq. Whether inode has a XAttr is through a bit of int which is shared in 
inode.
Can you elaborate on this?  This sounded like using a bit somewhere as a flag 
to indicate presence of xattrs, but I didn't spot anything like that in my 
initial scan of the patch.  I just saw the new inode feature, which was 
basically implemented as I would have expected.
# bq. In current version, large value will be stored in NN memory, unless 
client defines its own reference format, have external storage and can fetch 
content through reference.
Can you please clarify this part?  Is the intention that resolution of the 
xattr reference is somehow pluggable inside the NN, or is this just talking 
about a general strategy of using an xattr to persist a foreign key to some 
external system, with the client then driving lookup at that external system?
# Could you provide more details on the memory calculation?  For example, what 
numbers are assumed for current inode size, average number of xattrs per inode, 
and average size of name and value for the individual xattrs?
# I was confused by the {{getXAttrs}} API that accepts {{List<Xattr>}}.  
Looking at the implementation in {{FSNamesystem#getXAttrs}}, it appears that 
the get operation also may optionally mutate the xattrs.  Do I understand 
correctly?  If so, can we remove this and make the get operation focus solely 
on returning the existing xattrs?
# Right now, if a caller attempts to set multiple xattrs, then the authorized 
ones are accepted, and the unauthorized ones get filtered out and ignored 
silently.  This could be problematic for error handling scenarios.  For 
example, if we imagine a member of the super-group modifying xattrs in the 
system namespace successfully, and then that user gets removed from the 
super-group, then their calls will start failing to apply the xattrs silently, 
but no error will be propagated back to the caller to inform them that 
something is broken now.  Perhaps this scenario should throw 
{{AccessControlException}} instead.
# Do you intend to enforce any conservative upper limits to try to mitigate 
against someone maliciously or accidentally bloating the NN memory footprint by 
writing huge data into xattrs?  Are there upper limits on number of xattrs on a 
single inode, number of bytes in an xattr name and number of bytes in an xattr 
value?
# The current patch appears to include xattrs in the method signatures for the 
internal mkdir methods, but I didn't notice a need for this.  Was this modeled 
after the code changes for ACLs?  For POSIX ACLs, we needed to support the 
notion of a default ACL, which is an ACL copied from the parent directory to 
the child sub-directory or file at time of creation.  In the implementation, 
this required passing along the list of ACL entries to copy in some of the 
internal methods.  I'm not aware of general support for this kind of behavior 
across all xattrs, so if it doesn't exist and there is no use case for it, then 
I recommend removing this part of the code.
# The design states a desire to preserve xattrs using distcp, but it also 
states that WebHDFS is not yet in scope.  WebHDFS is very commonly used as a 
distcp target though, so I wonder if we should reconsider.
# I assume that you will not support getting and setting xattrs via NFS mount.  
(i.e. We won't implement the case of someone calling Linux getattr/setattr on a 
path that is really under an NFS mount point to an HDFS cluster.)  That's the 
choice we made for initial development of getfacl/setfacl.  If this is the plan 
for xattrs too, then I just recommend stating that in the design doc.

Thanks, again.  It looks like good work!

At this point, I recommend establishing a HDFS-2006 feature branch, so that you 
can start breaking up portions of the patch, and getting them through 
finer-grained code review and commit.  We'll need to rally help from a PMC 
member.  If a PMC member is listening, would you please create an HDFS-2006 
branch?  Thanks!

> ability to support storing extended attributes per file
> -------------------------------------------------------
>
>                 Key: HDFS-2006
>                 URL: https://issues.apache.org/jira/browse/HDFS-2006
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: dhruba borthakur
>            Assignee: Yi Liu
>         Attachments: HDFS-XAttrs-Design-1.pdf, xattrs.1.patch, xattrs.patch
>
>
> It would be nice if HDFS provides a feature to store extended attributes for 
> files, similar to the one described here: 
> http://en.wikipedia.org/wiki/Extended_file_attributes. 
> The challenge is that it has to be done in such a way that a site not using 
> this feature does not waste precious memory resources in the namenode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to