[
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504055#comment-16504055
]
ASF GitHub Bot commented on DRILL-4990:
---------------------------------------
ppadma closed pull request #652: DRILL-4990:Use new HDFS API access instead of
listStatus to check if …
URL: https://github.com/apache/drill/pull/652
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
index b2798a1588..b7b0fb9ce8 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
@@ -70,6 +70,7 @@
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.security.AccessControlException;
@@ -151,17 +152,35 @@ public WorkspaceSchemaFactory(
*/
public boolean accessible(final String userName) throws IOException {
final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
+ boolean tryListStatus = false;
try {
- // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
- // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
- // has a limited private API (FileSystem.access) to check the permissions directly
- // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
- // FileClient. TODO: Update this when DRILL-3749 is fixed.
- fs.listStatus(wsPath);
+ // access API checks if a user has certain permissions on a file or directory.
+ // returns normally if requested permissions are granted and throws an exception
+ // if access is denied. This API was added in HDFS 2.6 (see HDFS-6570).
+ // It is less expensive (than listStatus which was being used before) and hides the
+ // complicated access control logic underneath.
+ fs.access(wsPath, FsAction.READ);
} catch (final UnsupportedOperationException e) {
- logger.trace("The filesystem for this workspace does not support this operation.", e);
+ logger.debug("The filesystem for this workspace does not support access operation.", e);
+ tryListStatus = true;
} catch (final FileNotFoundException | AccessControlException e) {
- return false;
+ logger.debug("not found or cannot be accessed exception while accessing file {} : ", wsPath.toString(), e);
+ tryListStatus = true;
+ } catch (final IOException e) {
+ logger.debug("IO Exception accessing file {}", wsPath.toString(), e);
+ tryListStatus = true;
+ }
+
+ // if fs.access fails for some reason, fall back to listStatus.
+ if (tryListStatus) {
+ try {
+ fs.listStatus(wsPath);
+ } catch (final UnsupportedOperationException e) {
+ logger.debug("The filesystem for this workspace does not support listStatus operation.", e);
+ } catch (final FileNotFoundException | AccessControlException e) {
+ logger.debug("not found or cannot be accessed exception with listStatus for file {} ", wsPath.toString(), e);
+ return false;
+ }
}
return true;
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Use new HDFS API access instead of listStatus to check if users have
> permissions to access workspace.
> -----------------------------------------------------------------------------------------------------
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.8.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Major
>
> For every query, we build the schema tree
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all
> storage plugins are checked and added to the schema tree if they are
> accessible by the user who initiated the query. For the file system plugin,
> the listStatus API is used to check whether the workspace is accessible by
> the user (WorkspaceSchemaFactory.accessible). The idea seems to be that if the
> user does not have access to the file(s) in the workspace, listStatus will
> throw an exception and we return false. However, listStatus (which lists all
> the entries of a directory) is an expensive operation when the directory
> contains a large number of files. Hadoop 2.6 added a new API called access
> (HDFS-6570), which checks whether the user has permissions on a file or
> directory. Use this new API instead of listStatus.
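
The check described above, together with the fallback that the patch keeps, can be sketched roughly as follows. This is only an illustration, not the Drill code itself: WorkspaceFs, AccessDeniedException and StubFs are hypothetical stand-ins for Hadoop's FileSystem, its AccessControlException and a real file system, so that the sketch compiles without Hadoop on the classpath.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

class AccessCheckSketch {

    /** Hypothetical stand-in for the two Hadoop FileSystem calls named in the issue. */
    interface WorkspaceFs {
        void access(String path) throws IOException;      // cheap permission probe
        void listStatus(String path) throws IOException;  // enumerates every directory entry
    }

    /** Hypothetical stand-in for Hadoop's AccessControlException. */
    static class AccessDeniedException extends IOException {
        AccessDeniedException(String message) { super(message); }
    }

    /** Tries the cheap access check first and falls back to listStatus if it fails. */
    static boolean accessible(WorkspaceFs fs, String wsPath) throws IOException {
        try {
            fs.access(wsPath);
            return true; // permission granted, no directory listing needed
        } catch (UnsupportedOperationException | IOException e) {
            // access is unsupported or failed for some reason: fall back below
        }
        try {
            fs.listStatus(wsPath);
        } catch (UnsupportedOperationException e) {
            // neither check is supported: treat the workspace as accessible
        } catch (FileNotFoundException | AccessDeniedException e) {
            return false; // missing or denied
        }
        return true;
    }

    /** Hypothetical in-memory file system that counts listStatus calls. */
    static class StubFs implements WorkspaceFs {
        final boolean accessSupported;
        final boolean permitted;
        int listCalls = 0;

        StubFs(boolean accessSupported, boolean permitted) {
            this.accessSupported = accessSupported;
            this.permitted = permitted;
        }

        @Override
        public void access(String path) throws IOException {
            if (!accessSupported) {
                throw new UnsupportedOperationException("access not supported");
            }
            if (!permitted) {
                throw new AccessDeniedException("denied: " + path);
            }
        }

        @Override
        public void listStatus(String path) throws IOException {
            listCalls++;
            if (!permitted) {
                throw new AccessDeniedException("denied: " + path);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StubFs fast = new StubFs(true, true);
        System.out.println(accessible(fast, "/ws") + " listCalls=" + fast.listCalls);

        StubFs denied = new StubFs(true, false);
        System.out.println(accessible(denied, "/ws") + " listCalls=" + denied.listCalls);

        StubFs legacy = new StubFs(false, true);
        System.out.println(accessible(legacy, "/ws") + " listCalls=" + legacy.listCalls);
    }
}
```

In the common case where the user has permission, only the cheap probe runs and the directory is never listed. The listing cost is paid only when access is unsupported or fails.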
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)