[ 
https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976941#comment-13976941
 ] 

Yi Liu commented on HDFS-2006:
------------------------------

[~cnauroth], thanks for so many valuable comments.
{quote}
Is there an intention to integrate with the ACLs work done in HDFS-4685? For 
example, on Linux, the getfattr command can present POSIX ACLs as xattrs in the 
system namespace. I haven't fully explored use of getfattr/setfattr in 
combination with ACLs, so I don't know the details of how this works on Linux 
yet. The most important piece of this would be to guarantee that setfattr 
cannot circumvent any important security logic enforced by setfacl, and write 
tests for it.
{quote}
This is good point. We will think to integrate later once this work is in?

{quote}
Can you elaborate on this? This sounded like using a bit somewhere as a flag to 
indicate presence of xattrs, but I didn't spot anything like that in my initial 
scan of the patch. I just saw the new inode feature, which was basically 
implemented as I would have expected.
{quote}
This is same like if #hasAcl()

{quote}
Can you please clarify this part? Is the intention that resolution of the xattr 
reference is somehow pluggable inside the NN, or is this just talking about a 
general strategy of using an xattr to persist a foreign key to some external 
system, with the client then driving lookup at that external system?
{quote}
Right now, all the Xattrs will be  stored in NN memory only. Later we may 
support to store the Xattrs large values in extrenal system to avoid NN memory, 
it’s a general strategy of using an xattr to persist a foreign key to some 
external system, But we are not focusing it in the current version 
implementation. We will revise the doc to give clear understanding on this.

{quote}
Could you provide more details on the memory calculation? For example, what 
numbers are assumed for current inode size, average number of xattrs per inode, 
and average size of name and value for the individual xattrs?
{quote}
Next revision will cover this.

{quote}
I was confused by the getXAttrs API that accepts List<Xattr>. Looking at the 
implementation in FSNamesystem#getXAttrs, it appears that the get operation 
also may optionally mutate the xattrs. Do I understand correctly? If so, can we 
remove this and make the get operation focus solely on returning the existing 
xattrs?
{quote}
Agree with you, APIs are getting refined in HADOOP-10520. 

{quote}
Right now, if a caller attempts to set multiple xattrs, then the authorized 
ones are accepted, and the unauthorized ones get filtered out and ignored 
silently. This could be problematic for error handling scenarios. For example, 
if we imagine a member of the super-group modifying xattrs in the system 
namespace successfully, and then that user gets removed from the super-group, 
then their calls will start failing to apply the xattrs silently, but no error 
will be propagated back to the caller to inform them that something is broken 
now. Perhaps this scenario should throw AccessControlException instead.
{quote}
Agree with you. We have refined the Interfaces from HADOOP-10520. Mostly we 
plan to support setting/removing single attr as first version. It’s more closer 
to POSIX APIs and can cover most of use cases, and the handling is simplified.

{quote}
Do you intend to enforce any conservative upper limits to try to mitigate 
against someone maliciously or accidentally bloating the NN memory footprint by 
writing huge data into xattrs? Are there upper limits on number of xattrs on a 
single inode, number of bytes in an xattr name and number of bytes in an xattr 
value?
{quote}
Actually we planned it in first revision of doc. After that again thought that 
the same malicious user can create more and more useless files and make NN 
memory filled. I think, better we will have this control and no hrm on having 
that.
Your point of accidental bloating is good one. Even through user is not a 
malicious user but accidentally wrote huge data as part of Xattr and forgot to 
remove that. This make NN to suffer on memory usage.
We will update in next version of doc.

{quote}
The current patch appears to include xattrs in the method signatures for the 
internal mkdir methods, but I didn't notice a need for this. Was this modeled 
after the code changes for ACLs? For POSIX ACLs, we needed to support the 
notion of a default ACL, which is an ACL copied from the parent directory to 
the child sub-directory or file at time of creation. In the implementation, 
this required passing along the list of ACL entries to copy in some of the 
internal methods. I'm not aware of general support for this kind of behavior 
across all xattrs, so if it doesn't exist and there is no use case for it, then 
I recommend removing this part of the code.
{quote}
Yes. We don't really need this part. We will remove while implementing in 
subtask JIRA.

{quote}
The design states a desire to preserve xattrs using distcp, but it also states 
that WebHDFS is not yet in scope. WebHDFS is very commonly used as a distcp 
target though, so I wonder if we should reconsider.
{quote}
WebHDFS is in scope, just not implemented. We will update this. It is actually 
on next revision.

{quote}
I assume that you will not support getting and setting xattrs via NFS mount. 
(i.e. We won't implement the case of someone calling Linux getattr/setattr on a 
path that is really under an NFS mount point to an HDFS cluster.) That's the 
choice we made for initial development of getfacl/setfacl. If this is the plan 
for xattrs too, then I just recommend stating that in the design doc.
{quote}
We will update in the doc revision about this explicitly.

We will cover all your design related comments in next revision of doc. Code 
related comments will handled in subtasks. Thanks again for the Review.


> ability to support storing extended attributes per file
> -------------------------------------------------------
>
>                 Key: HDFS-2006
>                 URL: https://issues.apache.org/jira/browse/HDFS-2006
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: HDFS XAttrs (HDFS-2006)
>            Reporter: dhruba borthakur
>            Assignee: Yi Liu
>         Attachments: HDFS-XAttrs-Design-1.pdf, xattrs.1.patch, xattrs.patch
>
>
> It would be nice if HDFS provides a feature to store extended attributes for 
> files, similar to the one described here: 
> http://en.wikipedia.org/wiki/Extended_file_attributes. 
> The challenge is that it has to be done in such a way that a site not using 
> this feature does not waste precious memory resources in the namenode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to