Min Zhou created SPARK-8672:
-------------------------------

             Summary: throws NPE when running spark sql thrift server with session state authenticator
                 Key: SPARK-8672
                 URL: https://issues.apache.org/jira/browse/SPARK-8672
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.4.0
            Reporter: Min Zhou
Here is the configuration:
{noformat}
<property>
  <name>hive.security.authorization.enabled</name>
  <value>false</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
{noformat}
I started the Spark Thrift server with -hiveconf hive.security.authorization.enabled=true. When I log in with beeline, connect over JDBC, and then create a table, an NPE is thrown. The stack trace is shown below:
{noformat}
FAILED: NullPointerException null
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:333)
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:310)
org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:139)
org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:310)
org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:300)
org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:472)
org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:128)
org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
org.apache.spark.sql.SQLContext.sql(SQLContext.scala:744)
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:178)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:415)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
com.sun.proxy.$Proxy39.executeStatementAsync(Unknown Source)
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
{noformat}
After some deep tracing, I noticed that the root cause is a null username inside the SessionState object that is passed to the SessionStateUserAuthenticator instance. The stack above shows that the statement does not get the right SessionState; instead it picks up a SessionState from the thread local. On the other hand, every time a user logs in, a new SessionState with the correct username is created:
{noformat}
org/apache/hadoop/hive/ql/session/SessionState.<init>(Lorg/apache/hadoop/hive/conf/HiveConf;Ljava/lang/String;)V called by thread [pool-20-thread-3]
org.apache.hive.service.cli.session.HiveSessionImpl.<init>(HiveSessionImpl.java:104)
org.apache.hive.service.cli.session.HiveSessionImplwithUGI.<init>(HiveSessionImplwithUGI.java:49)
org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:122)
org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.openSession(Shim13.scala:241)
org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:161)
org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:265)
org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:191)
org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1253)
org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1238)
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
{noformat}
However, the CREATE TABLE statement never retrieves this SessionState and never runs the query against this correct instance, so the NPE happens.
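To illustrate the mismatch described above, here is a minimal, hypothetical sketch (assuming hive-exec 0.13.x on the classpath; the user name "alice" and the object names are placeholders, not the actual Spark/Hive code path). The per-connection SessionState created at login time carries the user name, while a SessionState created without one, like the thread-local instance that ClientWrapper.withHiveState ends up using, makes SessionStateUserAuthenticator report a null user:
{noformat}
// Minimal sketch of the reported mismatch; not the real Spark code path.
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator
import org.apache.hadoop.hive.ql.session.SessionState

object SessionStateUserSketch {
  def main(args: Array[String]): Unit = {
    val conf = new HiveConf()

    // What HiveSessionImpl builds when a user connects: a SessionState that
    // carries the connected user's name ("alice" is a placeholder).
    val perConnectionState = new SessionState(conf, "alice")

    // What the statement ends up with: a SessionState created without a user
    // name (at runtime it is fetched from the thread local via SessionState.get()).
    val threadLocalState = new SessionState(conf)

    // SessionStateUserAuthenticator simply reflects the user name of the
    // SessionState it is constructed with, so it reports null here.
    val auth = new SessionStateUserAuthenticator(threadLocalState)

    println(perConnectionState.getUserName) // "alice"
    println(auth.getUserName)               // null -> the NPE later in runHive
  }
}
{noformat}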