[ https://issues.apache.org/jira/browse/SPARK-26422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725946#comment-16725946 ]

ASF GitHub Bot commented on SPARK-26422:
----------------------------------------

HyukjinKwon opened a new pull request #23356: [SPARK-26422][R] Support to disable Hive support in SparkR even for Hadoop versions unsupported by Hive fork
URL: https://github.com/apache/spark/pull/23356
 
 
   ## What changes were proposed in this pull request?
   
   Currently, even if I explicitly disable Hive support in a SparkR session as below:
   
   ```r
   sparkSession <- sparkR.session("local[4]", "SparkR", Sys.getenv("SPARK_HOME"),
                                  enableHiveSupport = FALSE)
   ```
   
   it produces the following error when the Hadoop version is not supported by our Hive fork:
   
   ```
   java.lang.reflect.InvocationTargetException
   ...
   Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.1.3.1.0.0-78
        at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
        at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
        at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
        at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
        ... 43 more
   Error in handleErrors(returnStatus, conn) :
     java.lang.ExceptionInInitializerError
        at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:193)
        at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1116)
        at org.apache.spark.sql.api.r.SQLUtils$.getOrCreateSparkSession(SQLUtils.scala:52)
        at org.apache.spark.sql.api.r.SQLUtils.getOrCreateSparkSession(SQLUtils.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   ```
   
   The root cause is that `SparkSession.hiveClassesArePresent` checks whether the class is loadable in order to tell if it is on the classpath, but `org.apache.hadoop.hive.conf.HiveConf` performs its Hadoop version check in static initialization logic that runs as soon as the class is loaded. That check throws an `IllegalArgumentException`, which is not caught:
   
   https://github.com/apache/spark/blob/36edbac1c8337a4719f90e4abd58d38738b2e1fb/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L1113-L1121
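   
   To make the failure mode concrete, below is a minimal sketch of such a classpath probe. It is illustrative only, not the exact body of `SparkSession.hiveClassesArePresent`. `Class.forName` initializes the target class by default, so merely probing for `HiveConf` runs its static initializer, and an exception thrown there surfaces as `ExceptionInInitializerError` rather than `ClassNotFoundException`:
   
   ```scala
   // Hedged sketch of an "are the Hive classes present?" probe; the name
   // hiveClassesArePresentSketch is illustrative, not Spark's actual code.
   def hiveClassesArePresentSketch: Boolean =
     try {
       // Class.forName(name) initializes the class by default, so loading
       // org.apache.hadoop.hive.conf.HiveConf also runs its static logic,
       // including the Hadoop version check reached via ShimLoader.
       Class.forName("org.apache.hadoop.hive.conf.HiveConf")
       true
     } catch {
       // Only "class missing" cases are treated as "not present" ...
       case _: ClassNotFoundException | _: NoClassDefFoundError => false
       // ... so an IllegalArgumentException thrown during static
       // initialization escapes as an uncaught ExceptionInInitializerError.
     }
   ```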
   
   So, currently, if users have a Hive-enabled Spark build with a Hadoop version unsupported by our fork (namely 3+), there is no way to use SparkR even though it could otherwise work.
   
   This PR proposes to change the order of the boolean comparison so that we don't execute `SparkSession.hiveClassesArePresent` when:
   
     1. `enableHiveSupport` is explicitly disabled
     2. `spark.sql.catalogImplementation` is `in-memory`
   
   so that we **only** check `SparkSession.hiveClassesArePresent` when Hive support is explicitly enabled, via short-circuit evaluation (see the sketch below).
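   
   A minimal sketch of the reordered condition, with illustrative names (the actual change lives in the SparkR session-creation path), reusing the probe sketched above. Because `&&` evaluates left to right and stops at the first `false`, the risky class load is only reached when Hive support is actually requested:
   
   ```scala
   // Hedged sketch of the short-circuiting order; names are illustrative.
   def useHiveCatalog(
       enableHiveSupport: Boolean,    // the flag passed from sparkR.session(...)
       catalogImplementation: String  // value of spark.sql.catalogImplementation
   ): Boolean =
     enableHiveSupport &&                                // cheap flag check first
       catalogImplementation.equalsIgnoreCase("hive") && // cheap string check next
       hiveClassesArePresentSketch                       // risky class load last;
                                                         // never evaluated when
                                                         // either check above fails
   ```
   
   With this ordering, `enableHiveSupport = FALSE` or `spark.sql.catalogImplementation=in-memory` short-circuits to `false` before `HiveConf` is ever loaded, so its static Hadoop version check never runs.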
   
   ## How was this patch tested?
   
   It's difficult to write a test since we don't run tests against Hadoop 3 
yet. See https://github.com/apache/spark/pull/21588. Manually tested.
   



> Unable to disable Hive support in SparkR when Hadoop version is unsupported
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-26422
>                 URL: https://issues.apache.org/jira/browse/SPARK-26422
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 3.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> When we create a Spark session as below:
> {code}
> sparkSession <- sparkR.session("local[4]", "SparkR", Sys.getenv("SPARK_HOME"),
>                                list(spark.driver.extraClassPath = jarpaths,
>                                     spark.executor.extraClassPath = jarpaths),
>                                enableHiveSupport = FALSE)
> {code}
> I faced an issue where Hive support cannot be disabled explicitly; it fails with the 
> error below:
> {code}
> java.lang.reflect.InvocationTargetException
> ...
> Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.1.3.1.0.0-78
>       at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
>       at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
>       at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
>       at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
>       ... 43 more
> Error in handleErrors(returnStatus, conn) :
>   java.lang.ExceptionInInitializerError
>       at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
>       at java.lang.Class.forName0(Native Method)
>       at java.lang.Class.forName(Class.java:348)
>       at org.apache.spark.util.Utils$.classForName(Utils.scala:193)
>       at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1116)
>       at org.apache.spark.sql.api.r.SQLUtils$.getOrCreateSparkSession(SQLUtils.scala:52)
>       at org.apache.spark.sql.api.r.SQLUtils.getOrCreateSparkSession(SQLUtils.scala)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:167)
>       at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:108)
> ...
> {code}


