[ 
https://issues.apache.org/jira/browse/SPARK-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8427:
-----------------------------------
    Priority: Critical  (was: Blocker)

> Incorrect ACL checking for partitioned table in Spark SQL-1.4
> -------------------------------------------------------------
>
>                 Key: SPARK-8427
>                 URL: https://issues.apache.org/jira/browse/SPARK-8427
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>         Environment: CentOS 6 & OS X 10.9.5, Hive-0.13.1, Spark-1.4, Hadoop 
> 2.6.0
>            Reporter: Karthik Subramanian
>            Priority: Critical
>              Labels: security
>
> Problem Statement:
> When querying a partitioned table using Spark SQL (version 1.4.0), an 
> access denied exception is raised for a partition the user does not have 
> access to (user permissions are controlled using HDFS ACLs). The same query 
> works correctly in Hive.
> Use case: multitenancy
> Consider a table containing multiple customers, each with multiple 
> facilities. The table is partitioned by customer and facility. A user 
> belonging to one facility does not have access to other facilities. This is 
> enforced using HDFS ACLs on the corresponding directories. When querying the 
> table as ‘user1’, who belongs to ‘facility1’ and ‘customer1’, and restricting 
> the query to that partition (using a ‘where’ clause), only access to the 
> corresponding directory should be verified, not access to the entire table. 
> The above use case works as expected with the Hive client, versions 0.13.1 
> and 1.1.0. 
> The query used: select count(*) from customertable where customer='customer1' 
> and facility='facility1'
> Below is the exception received in spark-shell:
> org.apache.hadoop.security.AccessControlException: Permission denied: 
> user=user1, access=READ_EXECUTE, 
> inode="/data/customertable/customer=customer2/facility=facility2":root:supergroup:drwxrwx---:group::r-x,group:facility2:rwx
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkAccessAcl(FSPermissionChecker.java:351)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:253)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:185)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6494)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6419)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:4954)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4915)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:826)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:612)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>       at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>       at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>       at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1971)
>       at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
>       at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
>       at 
> org.apache.spark.sql.sources.HadoopFsRelation$FileStatusCache.org$apache$spark$sql$sources$HadoopFsRelation$FileStatusCache$$listLeafFilesAndDirs$1(interfaces.scala:390)
>       at 
> org.apache.spark.sql.sources.HadoopFsRelation$FileStatusCache$$anonfun$2$$anonfun$apply$2.apply(interfaces.scala:402)
>       at 
> org.apache.spark.sql.sources.HadoopFsRelation$FileStatusCache$$anonfun$2$$anonfun$apply$2.apply(interfaces.scala:402)
>       at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>       at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>       at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>       at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>       at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:108)
>       at 
> org.apache.spark.sql.sources.HadoopFsRelation$FileStatusCache$$anonfun$2.apply(interfaces.scala:402)
>       at 
> org.apache.spark.sql.sources.HadoopFsRelation$FileStatusCache$$anonfun$2.apply(interfaces.scala:398)
>       at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>       at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>       at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>       at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>       at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:108)
>       at 
> org.apache.spark.sql.sources.HadoopFsRelation$FileStatusCache.refresh(interfaces.scala:398)
>       at 
> org.apache.spark.sql.sources.HadoopFsRelation.org$apache$spark$sql$sources$HadoopFsRelation$$fileStatusCache$lzycompute(interfaces.scala:416)
>       at 
> org.apache.spark.sql.sources.HadoopFsRelation.org$apache$spark$sql$sources$HadoopFsRelation$$fileStatusCache(interfaces.scala:414)
>       at 
> org.apache.spark.sql.sources.HadoopFsRelation.cachedLeafStatuses(interfaces.scala:421)
>       at 
> org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:355)
>       at 
> org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache$lzycompute(newParquet.scala:154)
>       at 
> org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache(newParquet.scala:152)
>       at 
> org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$dataSchema$1.apply(newParquet.scala:193)
>       at 
> org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$dataSchema$1.apply(newParquet.scala:193)
>       at scala.Option.getOrElse(Option.scala:120)
>       at 
> org.apache.spark.sql.parquet.ParquetRelation2.dataSchema(newParquet.scala:193)
>       at 
> org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:505)
>       at 
> org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:504)
>       at 
> org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
>       at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$19.apply(HiveMetastoreCatalog.scala:314)
>       at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$19.apply(HiveMetastoreCatalog.scala:313)
>       at scala.Option.getOrElse(Option.scala:120)
>       at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$convertToParquetRelation(HiveMetastoreCatalog.scala:313)
>       at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions$$anonfun$1.applyOrElse(HiveMetastoreCatalog.scala:406)
>       at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions$$anonfun$1.applyOrElse(HiveMetastoreCatalog.scala:378)
>       at scala.PartialFunction$Lifted.apply(PartialFunction.scala:218)
>       at scala.PartialFunction$Lifted.apply(PartialFunction.scala:214)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$collect$1.apply(TreeNode.scala:129)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$collect$1.apply(TreeNode.scala:129)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:88)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:89)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:89)
>       at scala.collection.immutable.List.foreach(List.scala:318)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:89)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode.collect(TreeNode.scala:129)
>       at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions$.apply(HiveMetastoreCatalog.scala:378)
>       at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$ParquetConversions$.apply(HiveMetastoreCatalog.scala:371)
>       at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:61)
>       at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>       at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
>       at scala.collection.immutable.List.foldLeft(List.scala:84)
>       at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:59)
>       at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:51)
>       at scala.collection.immutable.List.foreach(List.scala:318)
>       at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:51)
>       at 
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:922)
>       at 
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:922)
>       at 
> org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:920)
>       at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
>       at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
>       at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:744)
>       at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
>       at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>       at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
>       at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
>       at $iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
>       at $iwC$$iwC$$iwC.<init>(<console>:36)
>       at $iwC$$iwC.<init>(<console>:38)
>       at $iwC.<init>(<console>:40)
>       at <init>(<console>:42)
>       at .<init>(<console>:46)
>       at .<clinit>(<console>)
>       at .<init>(<console>:7)
>       at .<clinit>(<console>)
>       at $print(<console>)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>       at 
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>       at 
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>       at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>       at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>       at 
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>       at 
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>       at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>       at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>       at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>       at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>       at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>       at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>       at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>       at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>       at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>       at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>       at org.apache.spark.repl.Main$.main(Main.scala:31)
>       at org.apache.spark.repl.Main.main(Main.scala)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>       at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>       at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
>  Permission denied: user=user1, access=READ_EXECUTE, 
> inode="/data/customertable/customer=customer2/facility=facility2":root:supergroup:drwxrwx---:group::r-x,group:facility2:rwx
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkAccessAcl(FSPermissionChecker.java:351)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:253)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:185)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6494)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6419)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:4954)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4915)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:826)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:612)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1468)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>       at com.sun.proxy.$Proxy17.getListing(Unknown Source)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:554)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy18.getListing(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
>       ... 116 more
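
For illustration only, the behavior the reporter expects can be sketched as pruning partition directories by the where-clause predicates before any directory is listed. The trace above shows the listing happening in HadoopFsRelation's FileStatusCache.refresh during analysis, which reaches the customer2/facility2 directory. The sketch below is not Spark code: the helper names, the assumption that the partition directory paths are already known (for example, from the metastore), and the restriction to equality predicates are all hypothetical.

```python
from typing import Dict, List


def parse_partition_path(path: str) -> Dict[str, str]:
    """Extract key=value partition segments from a Hive-style directory path."""
    spec = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            spec[key] = value
    return spec


def prune_partitions(paths: List[str], predicate: Dict[str, str]) -> List[str]:
    """Keep only directories whose partition values satisfy every equality predicate."""
    return [
        p for p in paths
        if all(parse_partition_path(p).get(k) == v for k, v in predicate.items())
    ]


# Hypothetical layout modeled on the paths that appear in the report.
partition_dirs = [
    "/data/customertable/customer=customer1/facility=facility1",
    "/data/customertable/customer=customer2/facility=facility2",
]

# Equality predicates taken from the WHERE clause of the reported query.
where = {"customer": "customer1", "facility": "facility1"}

# Only these directories would need to be listed (and permission-checked).
to_list = prune_partitions(partition_dirs, where)
print(to_list)
```

With pruning applied first, the customer2/facility2 directory is never listed, so user1 would not trigger the READ_EXECUTE check on it.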



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
