Artur Myseliuk created HDFS-17114:
-------------------------------------
Summary: HDFS Directory Level Access
Key: HDFS-17114
URL: https://issues.apache.org/jira/browse/HDFS-17114
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Reporter: Artur Myseliuk
Problem: Currently, checking and setting ACLs at the file level is time-consuming
and API-intensive for large HDFS clusters with billions of files, particularly
for use cases where permissions and ACLs should be uniform across all nested
files within a directory. For example, Hive table files and directories should
have the same permissions and ACLs.
Solutions like default ACLs don't work when:
# A user moves or renames a directory with nested files: the moved directory
and its files don't inherit the default ACLs of the new location.
# A user wants to change access to all files under some path prefix: the user
must update permissions and ACLs for every file under the directory, which can
take hours or even days when the directory contains millions of files.
Proposed solution:
Use an ancestor directory's POSIX permissions and ACLs to check access to files.
When a user tries to access file “/a/b/c.txt”, the new model will use the
closest ancestor directory “/a/b” ACLs and permissions to check access to file
“c.txt”. If the user doesn't have access to the directory, there are 2 options:
# Fall back to the standard HDFS POSIX permission and ACL check at the file
level. So the user has access to the file when: [the user has access to the
ancestor directory] OR [the user has access to the file].
# Throw AccessControlException.
The feature can be enabled only for some prefixes, or for all files in the HDFS
cluster, via configuration.
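The check described above can be sketched as follows. This is a minimal
illustration of the proposed semantics, not Hadoop code: the class name
{{AncestorAccessCheck}}, the in-memory permission map, and the
{{fallbackToFile}} switch (option 1 vs. option 2) are all hypothetical
stand-ins for the NameNode's inode permission and ACL lookups.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the proposed model: access to a file is granted by the closest
// ancestor directory's permissions, with an optional fallback to the file's
// own permissions (option 1) or a hard deny (option 2).
public class AncestorAccessCheck {

    // Stand-in for per-path permission/ACL state; in HDFS this would consult
    // the inode's POSIX bits and ACL entries.
    private final Map<String, Set<String>> readersByPath = new HashMap<>();
    private final boolean fallbackToFile;

    public AncestorAccessCheck(boolean fallbackToFile) {
        this.fallbackToFile = fallbackToFile;
    }

    public void grantRead(String path, String user) {
        readersByPath.computeIfAbsent(path, p -> new HashSet<>()).add(user);
    }

    private boolean allowed(String path, String user) {
        return readersByPath.getOrDefault(path, Set.of()).contains(user);
    }

    private static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    /** Returns true if the user may read the file under the proposed model. */
    public boolean canRead(String filePath, String user) {
        // Closest ancestor directory governs the check first.
        if (allowed(parent(filePath), user)) {
            return true;
        }
        // Option 1: fall back to the file's own permissions/ACLs.
        // Option 2 (fallback disabled): deny, i.e. AccessControlException.
        return fallbackToFile && allowed(filePath, user);
    }
}
```

With fallback enabled, granting read on “/a/b” admits “/a/b/c.txt” without
touching the file's own ACLs; with fallback disabled, only the directory
grant matters, which matches the AccessControlException variant.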
Alternative solutions:
# Use a federated authorization model for HDFS path prefixes. Implementation:
[Apache Ranger|https://ranger.apache.org/index.html] and [Apache
Sentry|https://sentry.apache.org/] provide an AuthZ plugin to check access to
files. The check is implemented by matching the file path against a managed
resource with a path prefix. All files under the prefix path use the resource
policy managed by the framework. The plugin defaults to HDFS permissions and
ACLs if there is no matching prefix.
Cons:
## Requires setting up an external service to manage policies.
## Adding an external dependency will impact HDFS NameNode availability.
# Similar to the Sentry and Ranger solution, but use native HDFS directory
permissions and ACLs instead of federated policies. The problem is finding
which directory's permissions/ACLs to check for the requested file. There are 2
solutions:
## Maintain a list of prefixes, as the Ranger plugin does, and use the
permissions and ACLs of the prefix directory to check access to all nested
files and directories.
## Use flags on directories [HDFS-15638]. For example, set a flag with HDFS
Extended Attributes. When a user tries to access a file, HDFS traverses its
ancestors and checks whether any directory carries the flag. If a flagged
directory exists, use that directory's permissions; otherwise default to the
file's permissions.
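The flag-based variant can be sketched as a bottom-up ancestor walk. This is a
hypothetical illustration, not the HDFS-15638 implementation: the class name
{{FlaggedAncestorResolver}} and the in-memory flag set stand in for an
extended-attribute lookup on directory inodes.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the flag-based variant: walk from the file toward the root and
// return the path whose permissions/ACLs should be evaluated — the closest
// flagged ancestor directory if one exists, otherwise the file itself.
public class FlaggedAncestorResolver {

    // Stand-in for directories carrying the flag (e.g. an HDFS xattr).
    private final Set<String> flaggedDirs = new HashSet<>();

    public void setFlag(String dirPath) {
        flaggedDirs.add(dirPath);
    }

    /** Returns the path whose permissions govern access to filePath. */
    public String resolvePermissionSource(String filePath) {
        String dir = parent(filePath);
        while (true) {
            if (flaggedDirs.contains(dir)) {
                return dir;      // closest flagged ancestor wins
            }
            if (dir.equals("/")) {
                return filePath; // no flag found: use the file's own permissions
            }
            dir = parent(dir);
        }
    }

    private static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }
}
```

Because the walk stops at the first flagged directory below any higher flagged
ancestor, a flag set on a Hive table directory would cover every partition and
file beneath it without per-file ACL updates.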
--
This message was sent by Atlassian Jira
(v8.20.10#820010)