[ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127418#comment-17127418
 ] 

Stephen O'Donnell commented on HDFS-15372:
------------------------------------------

This Jira is only relevant when an attribute provider like Sentry is in place. 
For normal ACLs added by "setfacl" etc., the snapshot ACLs are correct.

Take the path "/data/tab1", for example, and a snapshot of it, 
"/data/.snapshot/snap1/tab1".

Pre Hadoop 3.0 (I think, as I am comparing trunk with CDH 5), the attribute 
provider actually received an inode. Now on trunk the attribute provider 
receives a list of path components, which is basically a list of each directory 
in the path as a string.

Pre Hadoop 3.0, the attribute provider simply called "inode.getFullPath" on the 
inode it received. If you ask for the permissions of the components of the 
snapshot path above, these are the paths the attribute provider sees:

/data -> provider sees /data
/data/.snapshot -> provider does not see this and a dummy permission object is 
returned.
/data/.snapshot/snap1 -> provider sees /data/.snapshot/snap1
/data/.snapshot/snap1/tab1 -> provider sees /data/tab1, as calling 
getFullPath() on the inode returns this value.

Now on trunk, if you take this same set of paths, this is what the provider 
sees:

/data -> provider sees [/data]
/data/.snapshot -> provider does not see this and a dummy permission object is 
returned.
/data/.snapshot/snap1 -> provider sees [/data, .snapshot/snap1] - note that the 
.snapshot dir is not separated from the snapshot name here.
/data/.snapshot/snap1/tab1 -> provider sees [/data, .snapshot/snap1, tab1]
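
As a rough illustration of the component lists above, here is a toy model of 
the splitting. It is not HDFS's real INodesInPath logic; the class and method 
names are made up for this sketch, it assumes a non-root absolute path, and it 
omits the leading slash from the first component.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: mimics the component lists described in this comment.
// It is NOT the real HDFS path resolution code.
class TrunkComponentsSketch {
  static final String DOT_SNAPSHOT = ".snapshot";

  // Splits an absolute path into components, keeping ".snapshot/<name>"
  // together as a single component, as the trunk provider appears to see it.
  static List<String> components(String path) {
    String[] parts = path.substring(1).split("/");
    List<String> out = new ArrayList<>();
    for (int i = 0; i < parts.length; i++) {
      if (parts[i].equals(DOT_SNAPSHOT) && i + 1 < parts.length) {
        out.add(DOT_SNAPSHOT + "/" + parts[i + 1]);
        i++; // the snapshot name was consumed together with ".snapshot"
      } else {
        out.add(parts[i]);
      }
    }
    return out;
  }
}
```

With this model, "/data/.snapshot/snap1/tab1" becomes three components, with 
".snapshot/snap1" as the middle one.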

This means that if something like Sentry provides ACLs on /data/tab1, then when 
Sentry checks the snapshot path it sees /data/.snapshot/snap1/tab1 and it does 
not give the ACLs to it. In CDH 5, the snapshot path looks the same as the 
"live" path to the provider, and Sentry returns the same ACLs on the snapshot 
as for the live path.
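
To make the mismatch concrete, here is a minimal sketch of a provider that 
keys its ACLs on the live path. ToyAclProvider is a hypothetical stand-in, not 
the Sentry or INodeAttributeProvider API, and the ACL strings are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an external provider such as Sentry.
// It only knows about the paths it manages, keyed by the live path.
class ToyAclProvider {
  private final Map<String, String> aclsByPath = new HashMap<>();

  void grant(String path, String acl) {
    aclsByPath.put(path, acl);
  }

  // Returns the provider's ACL for the path, or the fallback (standing in
  // for HDFS's own attributes) when the provider has nothing registered.
  String aclFor(String path, String fallback) {
    return aclsByPath.getOrDefault(path, fallback);
  }
}
```

If "/data/tab1" is granted an ACL, a lookup with "/data/tab1" finds it, while a 
lookup with "/data/.snapshot/snap1/tab1" falls through to the fallback. That 
is the behaviour seen on trunk.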

I believe this change (what the provider sees) is a regression and not an 
intentional change. The idea of this patch is to ensure the provider sees the 
same as it did before.

With the patch I have posted, the provider would see the following for the 
examples above:

/data -> provider sees [/data]
/data/.snapshot -> provider does not see this and a dummy permission object is 
returned.
/data/.snapshot/snap1 -> provider sees [/data, snap1] - This is a problem with 
my patch; it should see [/data, .snapshot/snap1]. I need to figure out how to 
fix this.
/data/.snapshot/snap1/tab1 -> provider sees [/data, tab1]
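
One hypothetical way to express the intended mapping is sketched below. This 
is not the code in HDFS-15372.001.patch; the class and method names are 
invented, and it assumes the components arrive as strings with 
".snapshot/<name>" held in a single component, as listed earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT the code in HDFS-15372.001.patch.
class SnapshotComponentsSketch {
  static final String DOT_SNAPSHOT_PREFIX = ".snapshot/";

  // Drops the ".snapshot/<name>" component so the provider sees the live
  // path, except when that component is the last one (the snapshot root
  // itself), which is passed through unchanged.
  static List<String> forProvider(List<String> components) {
    int last = components.size() - 1;
    List<String> out = new ArrayList<>();
    for (int i = 0; i < components.size(); i++) {
      String c = components.get(i);
      boolean isSnapshotComponent = c.startsWith(DOT_SNAPSHOT_PREFIX);
      if (isSnapshotComponent && i != last) {
        continue; // hide the snapshot component from the provider
      }
      out.add(c);
    }
    return out;
  }
}
```

Under these assumptions, [data, .snapshot/snap1, tab1] maps to [data, tab1], 
while [data, .snapshot/snap1] is left as it is.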

> Files in snapshots no longer see attribute provider permissions
> ---------------------------------------------------------------
>
>                 Key: HDFS-15372
>                 URL: https://issues.apache.org/jira/browse/HDFS-15372
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-15372.001.patch
>
>
> Given a cluster with an authorization provider configured (e.g. Sentry) where 
> the paths covered by the provider are snapshottable, there was a change in 
> behaviour in how the provider permissions and ACLs are applied to files in 
> snapshots between the 2.x branch and Hadoop 3.0.
> For example, take the snapshottable path /data, which is Sentry managed. The 
> ACLs below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However pre-Hadoop 3.0 (when the attribute provider etc was extensively 
> refactored) snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>       throws IOException {
>     INode node = FSDirectory.resolveLastINode(iip);
>     int snapshot = iip.getPathSnapshotId();
>     INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
>     UserGroupInformation ugi = NameNode.getRemoteUser();
>     INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
>     if (ap != null) {
>       // permission checking sends the full components array including the
>       // first empty component for the root.  however file status
>       // related calls are expected to strip out the root component according
>       // to TestINodeAttributeProvider.
>       byte[][] components = iip.getPathComponents();
>       components = Arrays.copyOfRange(components, 1, components.length);
>       nodeAttrs = ap.getAttributes(components, nodeAttrs);
>     }
>     return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> This picks the last resolved inode. If you then call node.getPathComponents 
> for a path like '/data/.snapshot/snap1/tab1', it returns /data/tab1. It 
> resolves the snapshot path to its original location, but it's still the 
> snapshot inode.
> However, the logic passes 'iip.getPathComponents', which returns 
> "/data/.snapshot/snap1/tab1", to the provider.
> The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence 
> it only ever sees the path as "/data/tab1".
> It is debatable which path should be passed to the provider in the case of 
> snapshots: /data/.snapshot/snap1/tab1 or /data/tab1. However, as the 
> behaviour has changed, I feel we should ensure the old behaviour is 
> retained.
> It would also be fairly easy to provide a config switch so the provider gets 
> the full snapshot path or the resolved path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
