[ https://issues.apache.org/jira/browse/MAPREDUCE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990844#comment-13990844 ]
Chris Nauroth commented on MAPREDUCE-5809:
------------------------------------------

bq. (Question, why listStatus(..) does not return ACL or does it make sense to add it in the future?) Now, we need an additional getAclStatus(..) call.

We considered adding the ACLs to {{FileStatus}}, but this would have been a backwards-incompatible change. {{FileStatus}} implements {{Writable}} serialization, which is harder to version safely than something like protobuf. {{FileStatus#write}} doesn't embed any kind of version number, so there is no reliable way to tell at runtime whether we are deserializing a pre-ACLs {{FileStatus}} or a post-ACLs {{FileStatus}}. This would have carried a high risk of breaking downstream code, or mixed-version deployments, that use the {{Writable}} serialization. An alternative would have been to skip serializing ACLs in {{FileStatus#write}}, but then clients expecting a fully serialized object would have risked an NPE. This is discussed further in the HDFS-4685 design doc on page 12:

https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf

I agree that forcing an extra RPC has made this awkward. Perhaps we'll want to reconsider putting ACLs into {{FileStatus}} in the future, but I think that can happen only on a major version boundary (trunk but not branch-2).

bq. Running the distcp command in the source cluster is probably better.

OK, it's a trade-off of speed vs. bandwidth consumption. I'll make the change and upload a new patch. The {{setAcl}} calls will still happen in parallel from the map tasks, but by that point there will be fewer total RPCs, because we can skip the files that don't have ACLs.

> Enhance distcp to support preserving HDFS ACLs.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-5809
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5809
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 2.4.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: MAPREDUCE-5809.1.patch, MAPREDUCE-5809.2.patch,
> MAPREDUCE-5809.3.patch
>
>
> This issue tracks enhancing distcp to add a new command-line argument for
> preserving HDFS ACLs from the source at the copy destination.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
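To illustrate the {{Writable}} versioning concern raised in the comment above, here is a minimal sketch using plain {{java.io}} streams rather than the real {{FileStatus}}. The classes {{OldStatus}}, {{NewStatus}}, {{WritableVersioningDemo}} and the {{aclEntryCount}} field are hypothetical stand-ins, not Hadoop APIs. The point it demonstrates: because neither layout writes a version number, a post-ACL reader handed pre-ACL bytes need not fail at all; when several records share one stream, it silently consumes the start of the next record as the "new" field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a pre-ACL record; NOT the real FileStatus.
final class OldStatus {
    final String path;
    final long length;

    OldStatus(String path, long length) {
        this.path = path;
        this.length = length;
    }

    // Like Writable#write: fields in a fixed order, no version number.
    void write(DataOutputStream out) throws IOException {
        out.writeUTF(path);
        out.writeLong(length);
    }
}

// Hypothetical stand-in for a post-ACL record: one extra field appended.
final class NewStatus {
    String path;
    long length;
    int aclEntryCount; // hypothetical new field

    // Like Writable#readFields: assumes every field it expects is present.
    void readFields(DataInputStream in) throws IOException {
        path = in.readUTF();
        length = in.readLong();
        aclEntryCount = in.readInt(); // nothing tells the reader this was never written
    }
}

final class WritableVersioningDemo {
    // Serializes old-format records back to back, as a downstream
    // application might when storing several records in one stream.
    static byte[] writeOld(OldStatus... statuses) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (OldStatus s : statuses) {
                s.write(out);
            }
        }
        return bytes.toByteArray();
    }

    // Reads only the first record, using the new-format reader.
    static NewStatus readFirstAsNew(byte[] data) throws IOException {
        NewStatus s = new NewStatus();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            s.readFields(in);
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeOld(new OldStatus("/a", 1L), new OldStatus("/b", 2L));
        NewStatus first = readFirstAsNew(data);
        // No exception is thrown: the "ACL" count is really the start of the next record.
        System.out.println(first.path + " " + first.length + " acl=" + first.aclEntryCount);
    }
}
```

Running {{main}} prints {{/a 1 acl=143202}}: the bogus count is the second record's {{writeUTF}} length prefix plus the bytes of {{/b}}. A leading version field, or a self-describing format such as protobuf, would let the reader detect the older layout instead of misreading it.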