Padma Penumarthy created DRILL-4990:
---------------------------------------
Summary: Use new HDFS API access instead of listStatus to check if
users have permissions to access workspace.
Key: DRILL-4990
URL: https://issues.apache.org/jira/browse/DRILL-4990
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.8.0
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy
Fix For: 1.9.0
For every query, we build the schema tree
(runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all
storage plugins are checked and are added to the schema tree if they are
accessible by the user who initiated the query. For the file system plugin,
the listStatus API is used to check whether the workspace is accessible to
the user (WorkspaceSchemaFactory.accessible). The idea seems to be that if the
user does not have access to file(s) in the workspace, listStatus will generate
an exception and we return false. However, listStatus (which lists all the
entries of a directory) is an expensive operation when the directory contains
a large number of files. A new API called access (HDFS-6570) was added in Hadoop 2.6,
which provides the ability to check if the user has permissions on a
file/directory. Use this new API instead of listStatus. For a directory with
256k+ files, an improvement of up to 10 seconds in planning time was observed
when using the new API compared with the old listStatus approach.
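
To illustrate the difference between the two kinds of check, here is a minimal sketch. It is not Drill's code and does not use Hadoop: so that it runs with only the JDK, it stands in java.nio on a local directory for HDFS. The class and method names (WorkspaceAccessSketch, accessibleByListing, accessibleByCheck) are hypothetical. The Hadoop counterparts of the two calls are FileSystem.listStatus(Path) and, from Hadoop 2.6 on, FileSystem.access(Path, FsAction).

```java
import java.io.IOException;
import java.nio.file.AccessMode;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; local-filesystem stand-in for the HDFS calls.
final class WorkspaceAccessSketch {

    // Old style: enumerate the directory and treat any failure as "not accessible".
    // The cost grows with the number of entries
    // (Hadoop counterpart: FileSystem.listStatus(Path)).
    static boolean accessibleByListing(Path workspace) {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(workspace)) {
            for (Path ignored : entries) {
                // Walk every entry to mimic a full listing.
            }
            return true;
        } catch (IOException | DirectoryIteratorException e) {
            return false;
        }
    }

    // New style: a single permission check whose cost does not depend on
    // how many files the directory holds
    // (Hadoop 2.6+ counterpart: FileSystem.access(Path, FsAction)).
    static boolean accessibleByCheck(Path workspace) {
        try {
            workspace.getFileSystem().provider().checkAccess(workspace, AccessMode.READ);
            return true;
        } catch (IOException e) {
            // Covers a missing path or denied access.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path workspace = Files.createTempDirectory("workspace");
        System.out.println("listing check: " + accessibleByListing(workspace));
        System.out.println("access check:  " + accessibleByCheck(workspace));
    }
}
```

Both methods answer the same yes/no question, but only the first has to touch every directory entry to do so, which is where the planning-time cost described above would come from.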
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)