[ https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305862#comment-16305862 ]
Feng Liu edited comment on SPARK-22891 at 12/29/17 12:56 AM:
-------------------------------------------------------------

This is definitely caused by the race described in https://issues.apache.org/jira/browse/HIVE-11935.

In Spark 2.1, Spark creates the `metadataHive` lazily, deferring it until `addJar` is called (https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40), so the race can only be triggered by concurrent `addJar` calls (I can't imagine this happening in practice).

In Spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` creation (see the stack trace), so the race is now triggered by creating new Spark sessions.

In https://github.com/apache/spark/pull/20029, I'm trying to make the argument that it is safe to remove the new Hive client creation. Besides this change, I think we should also make the Hive client creation thread safe: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251

> NullPointerException when use udf
> ---------------------------------
>
>                 Key: SPARK-22891
>                 URL: https://issues.apache.org/jira/browse/SPARK-22891
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.2.1
>         Environment: hadoop 2.7.2
>            Reporter: gaoyang
>            Priority: Minor
>
> In my application, I use multiple threads. Each thread has a SparkSession and uses
> SparkSession.sqlContext.udf.register to register my UDF. Sometimes an
> exception like this is thrown:
> {code:java}
> java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>     at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
>     at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
>     at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
>     at scala.Option.getOrElse(Option.scala:121)
>     at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
>     at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
>     at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207)
>     at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203)
>     at com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63)
>     at
>     ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79)
>     at org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45)
>     at org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44)
>     at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61)
>     at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
>     at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
>     at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
>     at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
>     ... 20 more
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException
>     at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:744)
>     at org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1391)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:210)
>     ... 34 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException
>     at org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:769)
>     at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:736)
>     ... 36 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.isCompatibleWith(HiveMetaStoreClient.java:287)
>     at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
>     at com.sun.proxy.$Proxy25.isCompatibleWith(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:206)
>     at org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:765)
>     ... 37 more
> {code}
> Also, I use Apache Hive 2.1.1 in my cluster.
> When I use Spark 2.1.x, the exception above no longer occurs.
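
The suggestion in the comment above, making Hive client creation thread safe, could look roughly like the following. This is only a minimal sketch and not Spark's or Hive's actual code: `FakeClient`, `CREATE_LOCK`, `createClient`, and `runConcurrently` are hypothetical names invented for illustration, and `FakeClient` merely stands in for a client whose constructor touches shared, unsynchronized state. The idea shown is to funnel every construction through one process-wide lock so that two constructors never run at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedClientCreation {

    /** Hypothetical stand-in for a client whose constructor is not safe to run concurrently. */
    static final class FakeClient {
        // Number of constructors currently executing.
        static final AtomicInteger inFlight = new AtomicInteger();
        // Number of times a constructor started while another was still running.
        static final AtomicInteger overlaps = new AtomicInteger();

        FakeClient() {
            if (inFlight.incrementAndGet() > 1) {
                overlaps.incrementAndGet();
            }
            try {
                Thread.sleep(5); // widen the window in which a second constructor could enter
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            inFlight.decrementAndGet();
        }
    }

    // One process-wide lock: only one thread may construct a client at a time.
    private static final Object CREATE_LOCK = new Object();

    static FakeClient createClient() {
        synchronized (CREATE_LOCK) {
            return new FakeClient();
        }
    }

    /** Creates one client per thread concurrently; returns how many constructions overlapped. */
    public static int runConcurrently(int threads) throws Exception {
        FakeClient.overlaps.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<FakeClient>> results = new ArrayList<>();
        try {
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> {
                    start.await(); // hold every thread until all are queued
                    return createClient();
                }));
            }
            start.countDown(); // release all threads at once
            for (Future<FakeClient> f : results) {
                f.get(); // propagate any failure from the worker threads
            }
        } finally {
            pool.shutdown();
        }
        return FakeClient.overlaps.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("overlapping constructions: " + runConcurrently(8));
    }
}
```

With the lock in place the overlap count is always 0. Replacing the body of `createClient` with a bare `new FakeClient()` lets constructors interleave, which is the kind of window the race needs. A single coarse lock trades some session start-up concurrency for safety, which seems acceptable if client creation is rare compared with query execution.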