[ https://issues.apache.org/jira/browse/SPARK-26422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725946#comment-16725946 ]
ASF GitHub Bot commented on SPARK-26422:
----------------------------------------

HyukjinKwon opened a new pull request #23356: [SPARK-26422][R] Support to disable Hive support in SparkR even for Hadoop versions unsupported by Hive fork
URL: https://github.com/apache/spark/pull/23356

## What changes were proposed in this pull request?

Currently, even if Hive support is explicitly disabled in a SparkR session as below:

```r
sparkSession <- sparkR.session("local[4]", "SparkR", Sys.getenv("SPARK_HOME"),
                               enableHiveSupport = FALSE)
```

session creation still fails when the Hadoop version is not supported by our Hive fork:

```
java.lang.reflect.InvocationTargetException
...
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.1.3.1.0.0-78
	at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
	at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
	at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
	at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
	... 43 more
Error in handleErrors(returnStatus, conn) :
  java.lang.ExceptionInInitializerError
	at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:193)
	at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1116)
	at org.apache.spark.sql.api.r.SQLUtils$.getOrCreateSparkSession(SQLUtils.scala:52)
	at org.apache.spark.sql.api.r.SQLUtils.getOrCreateSparkSession(SQLUtils.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
```

The root cause is that `SparkSession.hiveClassesArePresent` checks whether the Hive classes are on the classpath by loading them, but `org.apache.hadoop.hive.conf.HiveConf` performs a Hadoop version check in static initialization code, which runs as soon as the class is loaded. That check throws an `IllegalArgumentException`, which is not caught:

https://github.com/apache/spark/blob/36edbac1c8337a4719f90e4abd58d38738b2e1fb/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L1113-L1121

So, currently, users who have a Hive built-in Spark on a Hadoop version unsupported by our fork (namely 3+) have no way to use SparkR, even though it could otherwise work. This PR proposes to change the order of the boolean comparison so that `SparkSession.hiveClassesArePresent` is not executed when:

1. `enableHiveSupport` is explicitly disabled, or
2. `spark.sql.catalogImplementation` is `in-memory`,

so that we **only** check `SparkSession.hiveClassesArePresent` when Hive support is explicitly enabled, by short-circuiting (see the sketch after this comment).

## How was this patch tested?

It is difficult to write a test since we don't run tests against Hadoop 3 yet. See https://github.com/apache/spark/pull/21588. Manually tested.
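As a minimal, runnable sketch of the short-circuit reordering described above: `enableHiveSupport` and `catalogImplementation` below are stand-ins for what Spark reads from the R call and from `spark.sql.catalogImplementation`, and `hiveClassesArePresent` is simulated as a probe that throws, the way `HiveConf`'s static initializer does on Hadoop 3.x. This paraphrases the idea of the change, not the actual diff in #23356.

```scala
object ShortCircuitSketch {
  // Stand-ins for the real inputs (assumptions, not Spark's actual fields).
  val enableHiveSupport: Boolean = false
  val catalogImplementation: String = "in-memory"

  // Simulates SparkSession.hiveClassesArePresent: in Spark it loads
  // org.apache.hadoop.hive.conf.HiveConf, whose static Hadoop version
  // check throws on unrecognized versions.
  def hiveClassesArePresent: Boolean =
    throw new ExceptionInInitializerError(
      new IllegalArgumentException(
        "Unrecognized Hadoop major version number: 3.1.1.3.1.0.0-78"))

  def main(args: Array[String]): Unit = {
    // Before the fix (paraphrased), the probe came first, so the error
    // surfaced even with enableHiveSupport = FALSE:
    //   val useHive = hiveClassesArePresent && enableHiveSupport && ...

    // After the fix (paraphrased): `&&` evaluates left to right and
    // short-circuits, so the probe runs only when Hive support is
    // explicitly enabled and the catalog implementation is "hive".
    val useHive =
      enableHiveSupport &&
      catalogImplementation == "hive" &&
      hiveClassesArePresent

    println(s"useHive = $useHive") // prints "useHive = false"; the probe never ran
  }
}
```

With either of the first two conditions false, the right-most operand is never evaluated, so the throwing probe is unreachable; flipping `enableHiveSupport` to `true` with a `"hive"` catalog reproduces the failure.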
> Unable to disable Hive support in SparkR when Hadoop version is unsupported
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-26422
>                 URL: https://issues.apache.org/jira/browse/SPARK-26422
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 3.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> When we make a Spark session as below:
> {code}
> sparkSession <- sparkR.session("local[4]", "SparkR", Sys.getenv("SPARK_HOME"),
>                                list(spark.driver.extraClassPath = jarpaths,
>                                     spark.executor.extraClassPath = jarpaths),
>                                enableHiveSupport = FALSE)
> {code}
> I faced an issue where Hive support cannot be disabled explicitly; session creation fails with the error below:
> {code}
> java.lang.reflect.InvocationTargetException
> ...
> Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.1.3.1.0.0-78
> 	at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
> 	at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
> 	at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
> 	at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
> 	... 43 more
> Error in handleErrors(returnStatus, conn) :
>   java.lang.ExceptionInInitializerError
> 	at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
> 	at java.lang.Class.forName0(Native Method)
> 	at java.lang.Class.forName(Class.java:348)
> 	at org.apache.spark.util.Utils$.classForName(Utils.scala:193)
> 	at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1116)
> 	at org.apache.spark.sql.api.r.SQLUtils$.getOrCreateSparkSession(SQLUtils.scala:52)
> 	at org.apache.spark.sql.api.r.SQLUtils.getOrCreateSparkSession(SQLUtils.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:167)
> 	at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:108)
> 	...
> {code}
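The stack trace above shows why merely probing for the class fails: `Class.forName` initializes the class it loads, so an exception thrown by static initialization code surfaces as an `ExceptionInInitializerError` in a caller that only asked whether the class exists. Below is a minimal, self-contained sketch of that mechanism, using a made-up stand-in class rather than Hive code (a Scala `object` body compiles to a static initializer, which is enough to demonstrate the effect; the version string is copied from the report).

```scala
// Stand-in for org.apache.hadoop.hive.conf.HiveConf: the version check runs
// during static initialization, like HiveConf's ShimLoader-based check.
object FakeHiveConf {
  private val hadoopVersion = "3.1.1.3.1.0.0-78" // version string from the report
  require(hadoopVersion.startsWith("2."),
    s"Unrecognized Hadoop major version number: $hadoopVersion")
}

object ProbeDemo {
  def main(args: Array[String]): Unit = {
    try {
      // Mirrors the probe in SparkSession.hiveClassesArePresent: load the
      // class by name with initialize = true. Initialization runs the object
      // body, and the IllegalArgumentException it throws is wrapped by the
      // JVM in an ExceptionInInitializerError.
      Class.forName("FakeHiveConf$", true, getClass.getClassLoader)
      println("class present")
    } catch {
      case e: ExceptionInInitializerError =>
        // prints: probe failed: requirement failed: Unrecognized Hadoop major version number: ...
        println(s"probe failed: ${e.getCause.getMessage}")
    }
  }
}
```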