[ https://issues.apache.org/jira/browse/SPARK-21179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320100#comment-16320100 ]
Matthew Walton edited comment on SPARK-21179 at 1/10/18 11:27 AM:
------------------------------------------------------------------

I ended up getting a resolution from Simba:

"I apologize for the delay; after thorough investigation we have confirmed that this is indeed a shell issue. We will be opening a JIRA with them. The reason it took a bit longer is because we were trying to collect more information to prove that this is a shell issue. The shell should be using a JDBC API to determine the correct string identifier, which is not called, so the default " " is used..."

There is a workaround that can be used. Spark provides a JdbcDialect class that can be used to set the identifier quoting to the wanted one. For Hive you can register this before or after establishing a connection (but before calling show()):

    import org.apache.spark.sql.jdbc.{JdbcDialects, JdbcDialect}

    val HiveDialect = new JdbcDialect {
      override def canHandle(url: String): Boolean =
        url.startsWith("jdbc:hive2") || url.contains("hive2")
      // Return the column name as-is instead of wrapping it in the default
      // double quotes, which Hive would treat as a string literal.
      override def quoteIdentifier(colName: String): String = colName
    }

    JdbcDialects.registerDialect(HiveDialect)

> Unable to return Hive INT data type into Spark via Hive JDBC driver: Caused
> by: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to
> int.
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-21179
>                 URL: https://issues.apache.org/jira/browse/SPARK-21179
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 1.6.0, 2.0.0, 2.1.1
>        Environment: OS: Linux
>                     HDP version 2.5.0.1-60
>                     Hive version: 1.2.1
>                     Spark version 2.0.0.2.5.0.1-60
>                     JDBC: Download the latest Hortonworks JDBC driver
>            Reporter: Matthew Walton
>
> I'm trying to fetch back data in Spark SQL using a JDBC connection to Hive.
> Unfortunately, when I try to query data that resides in an INT column I get
> the following error:
>
> 17/06/22 12:14:37 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.
> Steps to reproduce:
>
> 1) On Hive, create a simple table with an INT column and insert some data
>    (I used SQuirreL Client with the Hortonworks JDBC driver):
>
>    create table wh2.hivespark (country_id int, country_name string)
>    insert into wh2.hivespark values (1, 'USA')
>
> 2) Copy the Hortonworks Hive JDBC driver to the machine where you will run
>    Spark Shell.
>
> 3) Start Spark shell, loading the Hortonworks Hive JDBC driver jar files:
>
>    ./spark-shell --jars /home/spark/jdbc/hortonworkshive/HiveJDBC41.jar,/home/spark/jdbc/hortonworkshive/TCLIServiceClient.jar,/home/spark/jdbc/hortonworkshive/commons-codec-1.3.jar,/home/spark/jdbc/hortonworkshive/commons-logging-1.1.1.jar,/home/spark/jdbc/hortonworkshive/hive_metastore.jar,/home/spark/jdbc/hortonworkshive/hive_service.jar,/home/spark/jdbc/hortonworkshive/httpclient-4.1.3.jar,/home/spark/jdbc/hortonworkshive/httpcore-4.1.3.jar,/home/spark/jdbc/hortonworkshive/libfb303-0.9.0.jar,/home/spark/jdbc/hortonworkshive/libthrift-0.9.0.jar,/home/spark/jdbc/hortonworkshive/log4j-1.2.14.jar,/home/spark/jdbc/hortonworkshive/ql.jar,/home/spark/jdbc/hortonworkshive/slf4j-api-1.5.11.jar,/home/spark/jdbc/hortonworkshive/slf4j-log4j12-1.5.11.jar,/home/spark/jdbc/hortonworkshive/zookeeper-3.4.6.jar
>
> 4) In Spark shell, load the data from Hive using the JDBC driver:
>
>    val hivespark = spark.read.format("jdbc")
>      .options(Map(
>        "url" -> "jdbc:hive2://localhost:10000/wh2;AuthMech=3;UseNativeQuery=1;user=hfs;password=hdfs",
>        "dbtable" -> "wh2.hivespark"))
>      .option("driver", "com.simba.hive.jdbc41.HS2Driver")
>      .option("user", "hdfs")
>      .option("password", "hdfs")
>      .load()
>
> 5) In Spark shell, try to display the data:
>
>    hivespark.show()
>
> At this point you should see the error:
>
> scala> hivespark.show()
> 17/06/22 12:14:37 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.
>     at com.simba.hiveserver2.exceptions.ExceptionConverter.toSQLException(Unknown Source)
>     at com.simba.hiveserver2.utilities.conversion.TypeConverter.toInt(Unknown Source)
>     at com.simba.hiveserver2.jdbc.common.SForwardResultSet.getInt(Unknown Source)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:437)
>     at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:535)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>     at org.apache.spark.scheduler.Task.run(Task.scala:85)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> Note: I also tested this issue using a JDBC driver from Progress DataDirect
> and I see a similar error message, so this does not seem to be driver specific.
> scala> hivespark.show()
> 17/06/22 12:07:59 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
> java.sql.SQLException: [DataDirect][Hive JDBC Driver]Value can not be converted to requested type.
>
> Also, if I query this table directly from the SQuirreL Client tool there is no error.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
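To see why the default quoting produces this error, it helps to look at the SQL that gets generated with and without the workaround dialect. The sketch below is plain Scala with no Spark dependency; defaultQuote, hiveQuote, and buildSelect are illustrative helpers written for this example (they are not Spark or Simba APIs), with defaultQuote mimicking the double-quote default Simba's reply describes and hiveQuote mimicking the HiveDialect workaround:

```scala
// With the generic dialect, identifiers are wrapped in double quotes. Hive
// reads "country_id" as a string literal, returns that string for every row,
// and the driver then fails to convert it to INT -- the (10140) error above.
def defaultQuote(colName: String): String = s""""$colName""""

// The HiveDialect workaround leaves identifiers unquoted.
def hiveQuote(colName: String): String = colName

// Build a SELECT with a pluggable identifier quoter (hypothetical helper).
def buildSelect(cols: Seq[String], table: String, quote: String => String): String =
  cols.map(quote).mkString("SELECT ", ",", s" FROM $table")

val cols = Seq("country_id", "country_name")

// Generated with the default quoting -- fails on Hive:
println(buildSelect(cols, "wh2.hivespark", defaultQuote))
// SELECT "country_id","country_name" FROM wh2.hivespark

// Generated once the workaround dialect is registered -- works:
println(buildSelect(cols, "wh2.hivespark", hiveQuote))
// SELECT country_id,country_name FROM wh2.hivespark
```

Registering the dialect only changes how identifiers are quoted in the generated query; the rest of the JDBC read path is unchanged, which is why the workaround is safe to apply before calling show().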