[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990844#comment-13990844
 ] 

Chris Nauroth commented on MAPREDUCE-5809:
------------------------------------------

bq. (Question, why listStatus(..) does not return ACL or does it make sense to 
add it in the future?) Now, we need an additional getAclStatus(..) call.

We considered adding the ACLs to {{FileStatus}}, but this would have been a 
backwards-incompatible change.  {{FileStatus}} implements {{Writable}} 
serialization, which is more brittle across versions than something like 
protobuf.  {{FileStatus#write}} doesn't embed any kind of version number, so 
there is no reliable way to tell at runtime if we are deserializing a pre-ACLs 
{{FileStatus}} or a post-ACLs {{FileStatus}}.  This would have had a high risk 
of breaking downstream code or mixed-version deployments that use the {{Writable}} 
serialization.  An alternative would have been to skip serializing ACLs in 
{{FileStatus#write}}, but then there would have been a risk of NPE for clients 
expecting a fully serialized object.  This is discussed further in the 
HDFS-4685 design doc on page 12:

https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf
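
To make the versioning problem concrete, here is a minimal sketch in plain Java. It does not use Hadoop or the real {{FileStatus}} wire format; the class name, the two fields, and the trailing ACL entry count are all hypothetical and only illustrate what happens when a field is appended to a stream that carries no version number.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

public class UnversionedFormatSketch {

    // Hypothetical "pre-ACLs" writer: a length and permission bits, no version tag.
    static byte[] writeOld(long length, short permission) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(length);
            out.writeShort(permission);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hypothetical "post-ACLs" reader: expects an extra int (ACL entry count)
    // after the old fields. Nothing in the bytes says whether it is there.
    static String readNew(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long length = in.readLong();
            short permission = in.readShort();
            int aclEntryCount = in.readInt(); // absent from old payloads
            return length + "/" + permission + "/" + aclEntryCount;
        }
    }

    // Returns true when the new reader runs off the end of bytes
    // produced by the old writer.
    static boolean newReaderFailsOnOldBytes() {
        try {
            readNew(writeOld(1024L, (short) 0644));
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("new reader fails on old bytes: " + newReaderFailsOnOldBytes());
    }
}
```

In this isolated buffer the mismatch surfaces as an {{EOFException}}.  In a longer stream, where more records follow, the same reader would instead consume bytes belonging to the next record and silently produce wrong values, which is the harder failure to detect.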

I agree that forcing an extra RPC has made this awkward.  Perhaps we'll want to 
reconsider putting ACLs into {{FileStatus}} in the future, but I think that can 
happen only on a major version boundary (trunk but not branch-2).

bq. Running the distcp command in the source cluster is probably better.

OK, it's a trade-off of speed vs. bandwidth consumption.  I'll make the change 
and upload a new patch.  The {{setAcl}} calls will still happen in parallel 
from the map tasks, but by that point, it will involve fewer total RPCs, 
because we can skip the files that don't have ACLs.
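
As a rough illustration of the RPC saving, and not the code in the patch, the sketch below filters a listing down to the paths that would actually need a {{setAcl}} call.  The map of path to ACL entry strings is a hypothetical stand-in for whatever the copy listing ends up carrying.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SkipEmptyAclsSketch {

    // Returns the paths whose source ACL entry list is non-empty, i.e. the
    // only paths for which a setAcl call at the destination would be needed.
    static List<String> pathsNeedingSetAcl(Map<String, List<String>> sourceAcls) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : sourceAcls.entrySet()) {
            if (!entry.getValue().isEmpty()) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical listing: two files without ACLs, one with a single entry.
        Map<String, List<String>> listing = new LinkedHashMap<>();
        listing.put("/data/a", Collections.<String>emptyList());
        listing.put("/data/b", Arrays.asList("user:alice:rw-"));
        listing.put("/data/c", Collections.<String>emptyList());

        List<String> needed = pathsNeedingSetAcl(listing);
        System.out.println(needed.size() + " of " + listing.size() + " paths need setAcl");
    }
}
```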

> Enhance distcp to support preserving HDFS ACLs.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-5809
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 2.4.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch, 
> MAPREDUCE-5809.3.patch
>
>
> This issue tracks enhancing distcp to add a new command-line argument for 
> preserving HDFS ACLs from the source at the copy destination.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
