[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155018#comment-17155018 ]

Shixiong Zhu commented on SPARK-32256:
--------------------------------------

Yep. This doesn't happen in Hadoop 3.1.0 and above.

> Hive may fail to detect Hadoop version when using isolated classloader
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32256
>                 URL: https://issues.apache.org/jira/browse/SPARK-32256
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>            Priority: Blocker
>
> Spark allows the user to set `spark.sql.hive.metastore.jars` to specify jars 
> used to access the Hive Metastore. These jars are loaded by an isolated 
> classloader. Because we also share Hadoop classes with the isolated 
> classloader, the user doesn't need to add Hadoop jars to 
> `spark.sql.hive.metastore.jars`; as a result, the hadoop-common jar is not 
> available to the isolated classloader itself. If Hadoop's `VersionInfo` is not 
> initialized before we switch to the isolated classloader, and we then try to 
> initialize it using the isolated classloader (the current thread context 
> classloader), the lookup fails and reports `Unknown`, which causes Hive to 
> throw the following exception:
> {code}
> java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format)
>       at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147)
>       at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122)
>       at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88)
>       at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377)
>       at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268)
>       at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>       at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>       at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
>       at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370)
>       at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
>       at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
>       at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:219)
>       at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:67)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548)
>       at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
>       at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
>       at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>       at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3349)
>       at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:217)
>       at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:204)
>       at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:331)
>       at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:292)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:262)
>       at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:247)
>       at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543)
>       at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:511)
>       at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:175)
>       at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:128)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:301)
>       at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:431)
>       at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:324)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:72)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:71)
>       at org.apache.spark.sql.hive.client.HadoopVersionInfoSuite.$anonfun$new$1(HadoopVersionInfoSuite.scala:63)
>       at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>       at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> {code}
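> For illustration, here is a minimal, self-contained sketch (in Scala; this is 
> not Hadoop's or Spark's actual code) of the failure mode: in affected Hadoop 
> versions the version properties file is resolved through the thread context 
> classloader, so an isolated loader that cannot see hadoop-common makes the 
> reported version degrade to `Unknown`:
> {code}
> import java.net.{URL, URLClassLoader}
> import java.util.Properties
> 
> object VersionLookupSketch {
>   // Mimics the old lookup: resolve the resource via the *thread context*
>   // classloader, as affected VersionInfo versions effectively did.
>   def readVersion(): String = {
>     val cl = Thread.currentThread().getContextClassLoader
>     val in = cl.getResourceAsStream("common-version-info.properties")
>     if (in == null) "Unknown" // resource not visible to this loader
>     else {
>       val props = new Properties()
>       try props.load(in) finally in.close()
>       props.getProperty("version", "Unknown")
>     }
>   }
> 
>   def main(args: Array[String]): Unit = {
>     // An isolated loader with a null parent cannot see hadoop-common's
>     // resources on the application classpath.
>     val isolated = new URLClassLoader(Array.empty[URL], null)
>     val previous = Thread.currentThread().getContextClassLoader
>     Thread.currentThread().setContextClassLoader(isolated)
>     try println(readVersion()) // prints "Unknown", matching the error above
>     finally Thread.currentThread().setContextClassLoader(previous)
>   }
> }
> {code}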
> Technically, this is an issue in Hadoop's `VersionInfo` that has already been 
> fixed upstream: https://issues.apache.org/jira/browse/HADOOP-14067. But since 
> we still support old Hadoop versions, we should fix it on the Spark side as 
> well.
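> Because the Hadoop classes are shared with the isolated classloader, one 
> possible Spark-side mitigation is sketched below (an illustration assuming 
> hadoop-common is on the application classpath, not necessarily the committed 
> fix): force `VersionInfo` to initialize while the regular classloader is 
> still the thread context classloader.
> {code}
> import org.apache.hadoop.util.VersionInfo
> 
> object EagerVersionInfoInit {
>   // Touch VersionInfo before any context-classloader switch so its static
>   // state is populated from hadoop-common on the application classpath.
>   def beforeSwitchingClassLoaders(): Unit = {
>     val version = VersionInfo.getVersion
>     require(version != "Unknown", s"Hadoop version not detected: $version")
>   }
> }
> {code}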
> Why does this issue start to happen in Spark 3.0.0?
> In Spark 2.4.x, we use Hive 1.2.1 by default, which triggers `VersionInfo` 
> initialization in the static code of the `Hive` class. This happens when we 
> load the `HiveClientImpl` class, because the `HiveClientImpl.client` method 
> refers to the `Hive` class. At that moment, the thread context classloader is 
> not yet the isolated classloader, so it can access the hadoop-common jar on 
> the classpath and initialize `VersionInfo` correctly.
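> As a hypothetical illustration of that ordering (the names below are made up, 
> not Hive's real code), the 2.4.x behavior amounts to a static-style 
> initializer that runs on first reference, before any classloader switch:
> {code}
> import org.apache.hadoop.util.VersionInfo
> 
> object HiveLikeStaticInit {
>   // Runs once, on first access, under the normal application classloader,
>   // which can see hadoop-common (analogous to Hive 1.2.1's static block).
>   lazy val hadoopMajorVersion: String = VersionInfo.getVersion.split("\\.")(0)
> }
> {code}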
> In Spark 3.0.0, we use Hive 2.3.7. The static code of the `Hive` class no 
> longer touches `VersionInfo` because of the change in 
> https://issues.apache.org/jira/browse/HIVE-11657. Instead, `VersionInfo` is 
> accessed when creating a `Hive` object (see the stack trace above). This 
> happens here: 
> https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L260.
>  But we switch to the isolated classloader before calling 
> `HiveClientImpl.client` (see 
> https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L283).
>  This is exactly the scenario described above: if Hadoop's `VersionInfo` is 
> not initialized before we switch to the isolated classloader, and we try to 
> initialize it using the isolated classloader (the current thread context 
> classloader), it will fail.
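> In other words, the 3.0.0 call sequence is equivalent to the following 
> minimal sketch (illustrative names, not Spark's exact code):
> {code}
> object IsolatedCallSketch {
>   // Swap the context classloader to the isolated loader, run the body, then
>   // restore. In 3.0.0 the Hive object is created inside the body, so
>   // VersionInfo's first lookup happens under the isolated loader.
>   def withIsolatedLoader[T](isolated: ClassLoader)(body: => T): T = {
>     val original = Thread.currentThread().getContextClassLoader
>     Thread.currentThread().setContextClassLoader(isolated)
>     try body
>     finally Thread.currentThread().setContextClassLoader(original)
>   }
> }
> {code}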
> I marked this as a blocker because it is a regression in 3.0.0 caused by 
> upgrading the Hive execution version from 1.2.1 to 2.3.7.


