[ https://issues.apache.org/jira/browse/SPARK-21179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320100#comment-16320100 ]
Matthew Walton edited comment on SPARK-21179 at 1/10/18 11:27 AM:
------------------------------------------------------------------

I ended up getting a resolution from Simba:

"I apologize for the delay; after thorough investigation we have confirmed that this is indeed a shell issue. We will be opening a JIRA with them. The reason it took a bit longer is because we were trying to collect more information to prove that this is a shell issue. The shell should be using a JDBC API to determine the correct string identifier, which is not called, so the default " " is used..."

There is a workaround that can be used. Spark provides a JdbcDialect class that can be used to set the identifier quoting to the wanted one. For Hive you can register this before or after establishing a connection (but before calling show()):

    import org.apache.spark.sql.jdbc.{JdbcDialects, JdbcDialect}

    val HiveDialect = new JdbcDialect {
      override def canHandle(url: String): Boolean =
        url.startsWith("jdbc:hive2") || url.contains("hive2")
      // Return the column name as-is instead of wrapping it in the default
      // double quotes, which Hive would treat as a string literal.
      override def quoteIdentifier(colName: String): String = colName
    }

    JdbcDialects.registerDialect(HiveDialect)

> Unable to return Hive INT data type into Spark via Hive JDBC driver: Caused
> by: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to
> int.
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-21179
>                 URL: https://issues.apache.org/jira/browse/SPARK-21179
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 1.6.0, 2.0.0, 2.1.1
>        Environment: OS: Linux
>                     HDP version 2.5.0.1-60
>                     Hive version: 1.2.1
>                     Spark version 2.0.0.2.5.0.1-60
>                     JDBC: Download the latest Hortonworks JDBC driver
>            Reporter: Matthew Walton
>
> I'm trying to fetch back data in Spark SQL using a JDBC connection to Hive.
> Unfortunately, when I try to query data that resides in an INT column I get
> the following error:
>
> 17/06/22 12:14:37 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.
> Steps to reproduce:
>
> 1) On Hive, create a simple table with an INT column and insert some data
>    (I used SQuirreL Client with the Hortonworks JDBC driver):
>
>    create table wh2.hivespark (country_id int, country_name string)
>    insert into wh2.hivespark values (1, 'USA')
>
> 2) Copy the Hortonworks Hive JDBC driver to the machine where you will run
>    Spark Shell.
>
> 3) Start Spark shell, loading the Hortonworks Hive JDBC driver jar files:
>
>    ./spark-shell --jars /home/spark/jdbc/hortonworkshive/HiveJDBC41.jar,/home/spark/jdbc/hortonworkshive/TCLIServiceClient.jar,/home/spark/jdbc/hortonworkshive/commons-codec-1.3.jar,/home/spark/jdbc/hortonworkshive/commons-logging-1.1.1.jar,/home/spark/jdbc/hortonworkshive/hive_metastore.jar,/home/spark/jdbc/hortonworkshive/hive_service.jar,/home/spark/jdbc/hortonworkshive/httpclient-4.1.3.jar,/home/spark/jdbc/hortonworkshive/httpcore-4.1.3.jar,/home/spark/jdbc/hortonworkshive/libfb303-0.9.0.jar,/home/spark/jdbc/hortonworkshive/libthrift-0.9.0.jar,/home/spark/jdbc/hortonworkshive/log4j-1.2.14.jar,/home/spark/jdbc/hortonworkshive/ql.jar,/home/spark/jdbc/hortonworkshive/slf4j-api-1.5.11.jar,/home/spark/jdbc/hortonworkshive/slf4j-log4j12-1.5.11.jar,/home/spark/jdbc/hortonworkshive/zookeeper-3.4.6.jar
>
> 4) In Spark shell, load the data from Hive using the JDBC driver:
>
>    val hivespark = spark.read.format("jdbc")
>      .options(Map(
>        "url" -> "jdbc:hive2://localhost:10000/wh2;AuthMech=3;UseNativeQuery=1;user=hfs;password=hdfs",
>        "dbtable" -> "wh2.hivespark"))
>      .option("driver", "com.simba.hive.jdbc41.HS2Driver")
>      .option("user", "hdfs")
>      .option("password", "hdfs")
>      .load()
>
> 5) In Spark shell, try to display the data:
>
>    hivespark.show()
>
> At this point you should see the error:
>
> scala> hivespark.show()
> 17/06/22 12:14:37 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.
>     at com.simba.hiveserver2.exceptions.ExceptionConverter.toSQLException(Unknown Source)
>     at com.simba.hiveserver2.utilities.conversion.TypeConverter.toInt(Unknown Source)
>     at com.simba.hiveserver2.jdbc.common.SForwardResultSet.getInt(Unknown Source)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:437)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:535)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>     at org.apache.spark.scheduler.Task.run(Task.scala:85)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> Note: I also tested this issue using a JDBC driver from Progress DataDirect
> and I see a similar error message, so this does not seem to be driver specific.
> scala> hivespark.show()
> 17/06/22 12:07:59 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
> java.sql.SQLException: [DataDirect][Hive JDBC Driver]Value can not be converted to requested type.
>
> Also, if I query this table directly from the SQuirreL Client tool there is no error.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
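To see why the default quoting produces this error, it helps to look at the SQL that gets generated with and without the workaround dialect. The sketch below is plain Scala with no Spark dependency; defaultQuote, hiveQuote, and buildSelect are illustrative helpers written for this example (they are not Spark or Simba APIs), with defaultQuote mimicking the double-quote default Simba's reply describes and hiveQuote mimicking the HiveDialect workaround:

```scala
// With the generic dialect, identifiers are wrapped in double quotes. Hive
// reads "country_id" as a string literal, returns that string for every row,
// and the driver then fails to convert it to INT -- the (10140) error above.
def defaultQuote(colName: String): String = s""""$colName""""

// The HiveDialect workaround leaves identifiers unquoted.
def hiveQuote(colName: String): String = colName

// Build a SELECT with a pluggable identifier quoter (hypothetical helper).
def buildSelect(cols: Seq[String], table: String, quote: String => String): String =
  cols.map(quote).mkString("SELECT ", ",", s" FROM $table")

val cols = Seq("country_id", "country_name")

// Generated with the default quoting -- fails on Hive:
println(buildSelect(cols, "wh2.hivespark", defaultQuote))
// SELECT "country_id","country_name" FROM wh2.hivespark

// Generated once the workaround dialect is registered -- works:
println(buildSelect(cols, "wh2.hivespark", hiveQuote))
// SELECT country_id,country_name FROM wh2.hivespark
```

Registering the dialect only changes how identifiers are quoted in the generated query; the rest of the JDBC read path is unchanged, which is why the workaround is safe to apply before calling show().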