[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2018-06-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504055#comment-16504055
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

ppadma closed pull request #652: DRILL-4990:Use new HDFS API access instead of 
listStatus to check if …
URL: https://github.com/apache/drill/pull/652
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
index b2798a1588..b7b0fb9ce8 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
@@ -70,6 +70,7 @@
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.FsAction;
 import org.apache.hadoop.fs.permission.FsPermission;
 import org.apache.hadoop.security.AccessControlException;
 
@@ -151,17 +152,35 @@ public WorkspaceSchemaFactory(
*/
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
+boolean tryListStatus = false;
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  // access API checks if a user has certain permissions on a file or 
directory.
+  // returns normally if requested permissions are granted and throws an 
exception
+  // if access is denied. This API was added in HDFS 2.6 (see HDFS-6570).
+  // It is less expensive (than listStatus which was being used before) 
and hides the
+  // complicated access control logic underneath.
+  fs.access(wsPath, FsAction.READ);
 } catch (final UnsupportedOperationException e) {
-  logger.trace("The filesystem for this workspace does not support this 
operation.", e);
+  logger.debug("The filesystem for this workspace does not support access 
operation.", e);
+  tryListStatus = true;
 } catch (final FileNotFoundException | AccessControlException e) {
-  return false;
+  logger.debug("not found or cannot be accessed exception while accessing 
file {} : ", wsPath.toString(), e);
+  tryListStatus = true;
+} catch (final IOException e) {
+  logger.debug("IO Exception accessing file {}", wsPath.toString(), e);
+  tryListStatus = true;
+}
+
+// if fs.access fails for some reason, fall back to listStatus.
+if (tryListStatus) {
+  try {
+fs.listStatus(wsPath);
+  } catch (final UnsupportedOperationException e) {
+logger.debug("The filesystem for this workspace does not support 
listStatus operation.", e);
+  } catch (final FileNotFoundException | AccessControlException e) {
+logger.debug("not found or cannot be accessed exception with 
listStatus for file {} ", wsPath.toString(), e);
+return false;
+  }
 }
 
 return true;


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user w

[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2018-06-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504054#comment-16504054
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

ppadma commented on issue #652: DRILL-4990:Use new HDFS API access instead of 
listStatus to check if …
URL: https://github.com/apache/drill/pull/652#issuecomment-395247128
 
 
   this issue is addressed through another PR #1032. closing it. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2018-06-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504010#comment-16504010
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

kkhatua commented on issue #652: DRILL-4990:Use new HDFS API access instead of 
listStatus to check if …
URL: https://github.com/apache/drill/pull/652#issuecomment-395236785
 
 
   You can close this PR, @ppadma .
   This seems to have been resolved by @chunhui-shi 's commit (#1032) for 
DRILL-5089. 
   
https://github.com/ppadma/drill/commit/18a71a38f6bd1fd33d21d1c68fc23c5901b0080a#diff-b89fd230da4ac69b7a8b6caf7677f77a


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2018-01-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319168#comment-16319168
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/652
  
@ppadma can you rebase this with the current master, and include the 
workaround? It does not make sense to hold up this commit for so long if a 
workaround for the Windows platform is sufficient.


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.13.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252838#comment-16252838
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user sohami commented on the issue:

https://github.com/apache/drill/pull/652
  
Based on the last in-person discussion it was decided to further look into 
why the test was failing on Windows platform. Is it the test environment setup 
issue or an actual issue w.r.t platform implementation of access method ? 


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240724#comment-16240724
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r149173203
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,17 +152,32 @@ public WorkspaceSchemaFactory(
*/
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
+boolean tryListStatus = false;
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  // access API checks if a user has certain permissions on a file or 
directory.
+  // returns normally if requested permissions are granted and throws 
an exception
+  // if access is denied. This API was added in HDFS 2.6 (see 
HDFS-6570).
+  // It is less expensive (than listStatus which was being used 
before) and hides the
+  // complicated access control logic underneath.
+  fs.access(wsPath, FsAction.READ);
 } catch (final UnsupportedOperationException e) {
-  logger.trace("The filesystem for this workspace does not support 
this operation.", e);
+  logger.debug("The filesystem for this workspace does not support 
access operation.", e);
+  tryListStatus = true;
 } catch (final FileNotFoundException | AccessControlException e) {
-  return false;
+  logger.debug("file {} not found or cannot be accessed", 
wsPath.toString(), e);
+  tryListStatus = true;
--- End diff --

It can be trusted fine for regular file systems HDFS, MapR FS, Mac OS 
Extended File System etc.  I am not sure if it works for windows or not. We 
were never able to figure out whether it is a problem with the test setup or it 
is actually not supported. 

Updated the diffs with your review comments taken care of. Please take a 
look.



> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-11-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239893#comment-16239893
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r148991210
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,17 +152,32 @@ public WorkspaceSchemaFactory(
*/
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
+boolean tryListStatus = false;
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  // access API checks if a user has certain permissions on a file or 
directory.
+  // returns normally if requested permissions are granted and throws 
an exception
+  // if access is denied. This API was added in HDFS 2.6 (see 
HDFS-6570).
+  // It is less expensive (than listStatus which was being used 
before) and hides the
+  // complicated access control logic underneath.
+  fs.access(wsPath, FsAction.READ);
 } catch (final UnsupportedOperationException e) {
-  logger.trace("The filesystem for this workspace does not support 
this operation.", e);
+  logger.debug("The filesystem for this workspace does not support 
access operation.", e);
+  tryListStatus = true;
 } catch (final FileNotFoundException | AccessControlException e) {
-  return false;
+  logger.debug("file {} not found or cannot be accessed", 
wsPath.toString(), e);
+  tryListStatus = true;
+}
+
+// if fs.access fails for some reason, fall back to listStatus.
+if (tryListStatus) {
+  try {
+fs.listStatus(wsPath);
+  } catch (final UnsupportedOperationException e) {
+logger.debug("The filesystem for this workspace does not support 
listStatus operation.", e);
+  } catch (final FileNotFoundException | AccessControlException e) {
+logger.debug("file {} not found or cannot be accessed", 
wsPath.toString(), e);
--- End diff --

let's add `using listStatus` in the log statement to differentiate with 
previous case


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-11-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239892#comment-16239892
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r148991415
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,17 +152,32 @@ public WorkspaceSchemaFactory(
*/
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
+boolean tryListStatus = false;
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  // access API checks if a user has certain permissions on a file or 
directory.
+  // returns normally if requested permissions are granted and throws 
an exception
+  // if access is denied. This API was added in HDFS 2.6 (see 
HDFS-6570).
+  // It is less expensive (than listStatus which was being used 
before) and hides the
+  // complicated access control logic underneath.
+  fs.access(wsPath, FsAction.READ);
 } catch (final UnsupportedOperationException e) {
-  logger.trace("The filesystem for this workspace does not support 
this operation.", e);
+  logger.debug("The filesystem for this workspace does not support 
access operation.", e);
+  tryListStatus = true;
 } catch (final FileNotFoundException | AccessControlException e) {
-  return false;
+  logger.debug("file {} not found or cannot be accessed", 
wsPath.toString(), e);
+  tryListStatus = true;
--- End diff --

Is access function never trustworthy for negative cases ? Or is it windows 
platform specific issue ?


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236810#comment-16236810
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/652
  
@sohami Sorabh, since you reviewed the original pull request, can you 
please review the updated diffs ?


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223220#comment-16223220
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/652
  
This pull request was never merged because of a problem with windows test 
setup we have.  As a workaround, I added code to fall back to using old API if 
new API fails for some reason. All tests are passing fine with this change. 
This is nice to include in 1.12 as it provides performance improvement for 
all DFS based queries especially when there are large number of files. 
Can we review the new diffs please ?


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-07-27 Thread Padma Penumarthy (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103951#comment-16103951
 ] 

Padma Penumarthy commented on DRILL-4990:
-

[~kkhatua]no. It is not checked in.

> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-07-27 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103947#comment-16103947
 ] 

Kunal Khatua commented on DRILL-4990:
-

[~ppenumarthy] did we get this checked in?

> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-02-24 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883799#comment-15883799
 ] 

Kunal Khatua commented on DRILL-4990:
-

[~agirish] Can you check why the unit tests fail for WindowsVM ?

> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-02-21 Thread Padma Penumarthy (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877355#comment-15877355
 ] 

Padma Penumarthy commented on DRILL-4990:
-

on a windows VM, unit tests are failing with  AccessControlException.  Not able 
to establish the root cause i.e. whether it is a configuration problem or 
something else.

> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-02-21 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877280#comment-15877280
 ] 

Kunal Khatua commented on DRILL-4990:
-

[~ppenumarthy] What were the test issues that needed to be resolved?

> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2016-12-06 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725978#comment-15725978
 ] 

Zelaine Fong commented on DRILL-4990:
-

Changed status back to "In Progress".  I believe there are some test issues 
that still need to be resolved.

> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2016-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668631#comment-15668631
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user sohami commented on the issue:

https://github.com/apache/drill/pull/652
  
LGTM


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Sorabh Hamirwasia
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2016-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668074#comment-15668074
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r88097089
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,15 +152,11 @@ public WorkspaceSchemaFactory(
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  fs.access(wsPath, FsAction.READ);
 } catch (final UnsupportedOperationException e) {
-  logger.trace("The filesystem for this workspace does not support 
this operation.", e);
+  logger.debug("The filesystem for this workspace does not support 
this operation.", e);
--- End diff --

I am not returning false to be consistent with the existing behavior.  If 
the user does not have permission, we fail when we try to read the file during 
execution. 


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Sorabh Hamirwasia
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2016-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667810#comment-15667810
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r88072322
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,15 +152,11 @@ public WorkspaceSchemaFactory(
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
--- End diff --

Also can we please replace the comments with why we are using "fs.access" 
api instead. Will be good for future use.


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Sorabh Hamirwasia
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2016-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667811#comment-15667811
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/652#discussion_r88072130
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -151,15 +152,11 @@ public WorkspaceSchemaFactory(
   public boolean accessible(final String userName) throws IOException {
 final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  fs.access(wsPath, FsAction.READ);
 } catch (final UnsupportedOperationException e) {
-  logger.trace("The filesystem for this workspace does not support 
this operation.", e);
+  logger.debug("The filesystem for this workspace does not support 
this operation.", e);
--- End diff --

We should return false in this case as well. Better to have a local boolean 
variable (like boolean isAccessAllowed;) and set it accordingly in try and 
catch. Then have just 1 return statement in the end.


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Sorabh Hamirwasia
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2016-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667808#comment-15667808
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/652
  
I did not add new unit tests because existing tests already provide enough 
coverage and they run on local file system. 


> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Sorabh Hamirwasia
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665682#comment-15665682
 ] 

ASF GitHub Bot commented on DRILL-4990:
---

GitHub user ppadma opened a pull request:

https://github.com/apache/drill/pull/652

DRILL-4990:Use new HDFS API access instead of listStatus to check if …

…users have permissions to access workspace.

Manually tested the fix with impersonation enabled. All unit and regression 
tests pass.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ppadma/drill DRILL-4990

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #652






> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check if  the workspace is accessible or not 
> (WorkspaceSchemaFactory.accessible) by the user.  The idea seem to be if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But, listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are large 
> number of files in the directory. A new API is added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)