[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155018#comment-17155018 ]
Shixiong Zhu commented on SPARK-32256:
--------------------------------------

Yep. This doesn't happen in Hadoop 3.1.0 and above.

Hive may fail to detect Hadoop version when using isolated classloader
-----------------------------------------------------------------------

                Key: SPARK-32256
                URL: https://issues.apache.org/jira/browse/SPARK-32256
            Project: Spark
         Issue Type: Bug
         Components: SQL
   Affects Versions: 3.0.0
           Reporter: Shixiong Zhu
           Assignee: Shixiong Zhu
           Priority: Blocker

Spark allows the user to set `spark.sql.hive.metastore.jars` to specify the jars used to access the Hive Metastore. These jars are loaded by the isolated classloader. Because we also share Hadoop classes with the isolated classloader, the user doesn't need to add Hadoop jars to `spark.sql.hive.metastore.jars`, which means that when the isolated classloader is in use, the hadoop-common jar is not available to it. If Hadoop's `VersionInfo` is not initialized before we switch to the isolated classloader, and we then try to initialize it using the isolated classloader (the current thread context classloader), the lookup fails and reports `Unknown`, which causes Hive to throw the following exception:

{code}
java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format)
	at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147)
	at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122)
	at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88)
	at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377)
	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:219)
	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:67)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108)
	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3349)
	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:217)
	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:204)
	at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:331)
	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:292)
	at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:262)
	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:247)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:511)
	at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:175)
	at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:128)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:301)
	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:431)
	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:324)
	at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:72)
	at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:71)
	at org.apache.spark.sql.hive.client.HadoopVersionInfoSuite.$anonfun$new$1(HadoopVersionInfoSuite.scala:63)
	at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
{code}
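The error message makes the failing check clear: Hive's `ShimLoader.getMajorVersion` expects `VersionInfo.getVersion()` to return a version in `A.B.*` form, and `Unknown` has no major/minor components. A rough Scala sketch of that check (a paraphrase based on the message above, not Hive's actual source):

{code:scala}
import org.apache.hadoop.util.VersionInfo

// Paraphrase of the check behind "Illegal Hadoop Version: Unknown".
// When VersionInfo was initialized by a classloader that cannot see
// hadoop-common's version properties, getVersion() returns "Unknown",
// which fails the A.B.* format check below.
object MajorVersionCheck {
  def getMajorVersion(): String = {
    val vers = VersionInfo.getVersion        // e.g. "2.7.4", or "Unknown"
    val parts = vers.split("\\.")
    if (parts.length < 2) {
      throw new RuntimeException(
        s"Illegal Hadoop Version: $vers (expected A.B.* format)")
    }
    parts(0) + "." + parts(1)                // the "A.B" prefix
  }
}
{code}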
Technically, this is indeed an issue in Hadoop's `VersionInfo`, and it has already been fixed upstream: https://issues.apache.org/jira/browse/HADOOP-14067. But since we still support old Hadoop versions, we should fix it on the Spark side as well.

Why does this issue start to happen in Spark 3.0.0?

In Spark 2.4.x, we use Hive 1.2.1 by default. It triggers `VersionInfo` initialization in the static code of the `Hive` class. This happens when we load the `HiveClientImpl` class, because the `HiveClientImpl.client` method refers to the `Hive` class. At that moment, the thread context classloader is not yet the isolated classloader, so it can find the hadoop-common jar on the classpath and initialize `VersionInfo` correctly.

In Spark 3.0.0, we use Hive 2.3.7, whose `Hive` class no longer touches `VersionInfo` in its static code because of the change in https://issues.apache.org/jira/browse/HIVE-11657. Instead, `VersionInfo` is accessed when a `Hive` object is created (see the stack trace above). This happens at https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L260. The toy sketch below makes the timing difference concrete.
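A toy Scala illustration (the holder classes are hypothetical, not Hive's source): the only relevant difference between the two Hive versions is where the `VersionInfo` lookup sits, and therefore which thread context classloader is active the first time it runs.

{code:scala}
import org.apache.hadoop.util.VersionInfo

// Toy illustration only; these holders are hypothetical.
object InitTimingDemo {
  // Hive 1.2.1 style: the lookup sits in static/object initialization,
  // which Spark happens to trigger before switching the thread context
  // classloader, so VersionInfo initializes while hadoop-common is visible.
  object EagerVersionHolder {
    val version: String = VersionInfo.getVersion
  }

  // Hive 2.3.7 style (after HIVE-11657): the lookup runs only when an
  // instance is constructed. Spark constructs it after switching to the
  // isolated classloader, so VersionInfo's one-time initialization happens
  // there and reports "Unknown".
  final class LazyVersionHolder {
    val version: String = VersionInfo.getVersion
  }
}
{code}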
But we switch to the isolated classloader before calling `HiveClientImpl.client` (see https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L283). This is exactly the situation described above: Hadoop's `VersionInfo` has not been initialized before we switch to the isolated classloader, so trying to initialize it through the isolated classloader (the current thread context classloader) fails.

I marked this as a blocker because it is a regression in 3.0.0 caused by upgrading the Hive execution version from 1.2.1 to 2.3.7.
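Because `VersionInfo` resolves its version string once and then caches it, one workaround consistent with the analysis above is to touch it eagerly, before swapping in the isolated classloader. A minimal sketch (the helper name is hypothetical, and this is not necessarily Spark's actual fix):

{code:scala}
import org.apache.hadoop.util.VersionInfo

object EagerVersionInfoInit {
  // Hypothetical helper: run `body` with `isolated` as the thread context
  // classloader, after making sure VersionInfo has already initialized.
  def withIsolatedClassLoader[T](isolated: ClassLoader)(body: => T): T = {
    // Force VersionInfo's one-time initialization now, while the current
    // context classloader can still resolve hadoop-common's version
    // properties; later reads return the cached value.
    VersionInfo.getVersion

    val original = Thread.currentThread().getContextClassLoader
    Thread.currentThread().setContextClassLoader(isolated)
    try body
    finally Thread.currentThread().setContextClassLoader(original)
  }
}
{code}

On Hadoop 3.1.0 and above this ordering no longer matters, per the comment at the top: with HADOOP-14067, `VersionInfo` loads its properties from its own classloader rather than the thread context classloader.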