[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315951#comment-14315951 ]
Tao Wang edited comment on SPARK-5159 at 2/11/15 10:18 AM:
-----------------------------------------------------------
I have tested this on branch 1.2; the results are below.

1. With hive.server2.enable.doAs=false, I used the user `hdfs` to connect to the ThriftServer and ran some operations. The audit log on the NameNode shows:

bq. 2015-02-11 18:07:50,568 | INFO | IPC Server handler 62 on 25000 | allowed=true ugi=hdfs (auth:PROXY) via spark/had...@hadoop.com (auth:KERBEROS) ip=/9.91.11.204 cmd=getfileinfo src=/user/sparkhive/warehouse/yarn.db/child dst=null perm=null | org.apache.hadoop.hdfs.server.namenode.FSNamesystem$DefaultAuditLogger.logAuditMessage(FSNamesystem.java:7950)
bq. 2015-02-11 18:07:50,577 | INFO | IPC Server handler 16 on 25000 | allowed=true ugi=hdfs (auth:PROXY) via spark/had...@hadoop.com (auth:KERBEROS) ip=/9.91.11.204 cmd=mkdirs src=/user/sparkhive/warehouse/yarn.db/child dst=null perm=hdfs:hadoop:rwxr-xr-x | org.apache.hadoop.hdfs.server.namenode.FSNamesystem$DefaultAuditLogger.logAuditMessage(FSNamesystem.java:7950)

The ThriftServer's log shows:

bq. 2015-02-11 18:07:50,471 | INFO | [pool-9-thread-2] | ugi=hdfs ip=unknown-ip-addr cmd=create_table: Table(tableName:child, dbName:yarn, owner:hdfs, createTime:1423649270, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:age, type:int, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:305)

2. With hive.server2.enable.doAs=true, the NameNode's log shows:

bq. 2015-02-11 18:00:05,599 | INFO | IPC Server handler 32 on 25000 | allowed=true ugi=spark/had...@hadoop.com (auth:KERBEROS) ip=/9.91.11.204 cmd=getfileinfo src=/user/sparkhive/warehouse/yarn.db dst=null perm=null | org.apache.hadoop.hdfs.server.namenode.FSNamesystem$DefaultAuditLogger.logAuditMessage(FSNamesystem.java:7950)
bq. 2015-02-11 18:00:05,607 | INFO | IPC Server handler 24 on 25000 | allowed=true ugi=spark/had...@hadoop.com (auth:KERBEROS) ip=/9.91.11.204 cmd=mkdirs src=/user/sparkhive/warehouse/yarn.db dst=null perm=spark:hadoop:rwxr-xr-x | org.apache.hadoop.hdfs.server.namenode.FSNamesystem$DefaultAuditLogger.logAuditMessage(FSNamesystem.java:7950)

The ThriftServer's log shows:

bq. 2015-02-11 18:00:05,437 | INFO | [pool-9-thread-2] | ugi=spark/had...@hadoop.com ip=unknown-ip-addr cmd=create_database: Database(name:yarn, description:null, locationUri:null, parameters:null, ownerName:spark, ownerType:USER) | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:305)
bq. 2015-02-11 18:00:05,437 | INFO | [pool-9-thread-2] | 2: get_database: yarn | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:623)
bq. 2015-02-11 18:00:05,438 | INFO | [pool-9-thread-2] | ugi=spark/had...@hadoop.com ip=unknown-ip-addr cmd=get_database: yarn | org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:305)

I am not an expert on Hive or the `doAs` feature, but this matched my expectations.

P.S. spark/had...@hadoop.com is the principal HiveServer2 uses to access HDFS.
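For context, the proxy-user entries in the audit log above require impersonation to be configured on both sides: the HiveServer2/ThriftServer flag and HDFS proxy-user grants for the service user. A minimal sketch, assuming the service runs as the `spark` user; the `*` wildcards are illustrative, not a hardening recommendation:

```xml
<!-- hive-site.xml: run queries as the connected user, not the service user -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>

<!-- core-site.xml on the cluster: allow the spark service user to
     impersonate other users (hosts/groups here are wide open) -->
<property>
  <name>hadoop.proxyuser.spark.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.spark.groups</name>
  <value>*</value>
</property>
```

Without the `hadoop.proxyuser.*` grants, the NameNode rejects the proxied calls rather than logging them as `ugi=hdfs (auth:PROXY) via spark/...`.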
> Thrift server does not respect hive.server2.enable.doAs=true
> ------------------------------------------------------------
>
>                 Key: SPARK-5159
>                 URL: https://issues.apache.org/jira/browse/SPARK-5159
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Andrew Ray
>
> I'm currently testing the Spark SQL thrift server on a Kerberos-secured
> cluster in YARN mode. Currently any user can access any table regardless of
> HDFS permissions, as all data is read as the hive user. In HiveServer2 the
> property hive.server2.enable.doAs=true causes all access to be done as the
> submitting user. We should do the same.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
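The fix the reporter asks for boils down to the standard doAs pattern: wrap each operation in a security context for the connected end user, so authorization checks and audit logging attribute the call to that user rather than to the service principal. Hadoop implements this with UserGroupInformation.createProxyUser(...).doAs(...); the JDK's `javax.security.auth.Subject.doAs` below is a minimal stdlib sketch of the same shape (the principal name and the mock audit line are illustrative, not Hadoop output):

```java
import java.security.PrivilegedAction;
import javax.security.auth.Subject;
import javax.security.auth.kerberos.KerberosPrincipal;

public class DoAsSketch {
    public static void main(String[] args) {
        // Stand-in for the connected end user; in Hadoop this role is played
        // by UserGroupInformation.createProxyUser("hdfs", serviceUgi).
        Subject endUser = new Subject();
        endUser.getPrincipals().add(new KerberosPrincipal("hdfs@HADOOP.COM"));

        // The server wraps each operation in doAs, so permission checks and
        // the audit trail see the end user, not the service principal.
        String auditLine = Subject.doAs(endUser, (PrivilegedAction<String>) () -> {
            String ugi = endUser.getPrincipals().iterator().next().getName();
            return "allowed=true ugi=" + ugi + " (auth:PROXY) cmd=getfileinfo";
        });
        System.out.println(auditLine);
        // prints: allowed=true ugi=hdfs@HADOOP.COM (auth:PROXY) cmd=getfileinfo
    }
}
```

The key property is that the filesystem call happens *inside* the doAs closure; anything executed outside it still runs as the service principal, which is exactly the `ugi=spark/...` behavior seen in the doAs=true logs above.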