[ https://issues.apache.org/jira/browse/SPARK-21183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063421#comment-16063421 ]

Matthew Walton commented on SPARK-21183:
----------------------------------------

The SQuirreL SQL Client runs the same query successfully, which points to the 
Spark code as the culprit causing the Simba driver to report that error.  I 
have another Spark ticket open, involving data coming through Hive JDBC 
drivers, that is similar in that it concerns how Spark handles the INTEGER 
data type.  In that case I have several drivers to test with, which shows the 
issue does not originate in the driver.  Unfortunately, that one has not been 
assigned yet:  [https://issues.apache.org/jira/browse/SPARK-21179]  I mention 
it only because it suggests a pattern in how Spark handles INTEGER types with 
certain JDBC drivers.  In each case, running the same SQL outside of Spark 
works fine.
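
For what it's worth, one way to test that theory from the Spark side is to 
override the JDBC type mapping with a custom dialect.  The sketch below is 
diagnostic only, not a fix; it assumes the Simba driver reports the column as 
java.sql.Types.BIGINT (or with type name "INTEGER"), which I have not 
verified.  It maps the column to StringType so Spark's JdbcUtils calls 
ResultSet.getString instead of ResultSet.getLong; if the error disappears, 
the failure is tied to the getter Spark selects for the reported type.

import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, StringType}

// Diagnostic dialect; assumption: the driver reports BIGINT or "INTEGER".
object BigQueryDiagnosticDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:bigquery:")

  // Map the 64-bit BigQuery INTEGER to StringType so Spark reads it with
  // getString, sidestepping the driver's toLong conversion that is failing.
  override def getCatalystType(sqlType: Int, typeName: String, size: Int,
                               md: MetadataBuilder): Option[DataType] =
    if (sqlType == Types.BIGINT || typeName.equalsIgnoreCase("INTEGER"))
      Some(StringType)
    else
      None
}

JdbcDialects.registerDialect(BigQueryDiagnosticDialect)

Pasting this into spark-shell before the read should be enough; the dialect 
is consulted when the DataFrame schema is derived from the driver's metadata.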

> Unable to return Google BigQuery INTEGER data type into Spark via Google 
> BigQuery JDBC driver: java.sql.SQLDataException: [Simba][JDBC](10140) Error 
> converting value to long.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21183
>                 URL: https://issues.apache.org/jira/browse/SPARK-21183
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 1.6.0, 2.0.0, 2.1.1
>         Environment: OS:  Linux
> Spark  version 2.1.1
> JDBC: Download the latest Google BigQuery JDBC driver from Google
>            Reporter: Matthew Walton
>
> I'm trying to fetch data back into Spark over a JDBC connection to Google 
> BigQuery.  Unfortunately, when I query data that resides in an INTEGER 
> column I get the following error:
> java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to 
> long.
> Steps to reproduce:
> 1) On the Google BigQuery console, create a simple table with an INTEGER 
> column and insert some data 
> 2) Copy the Google BigQuery JDBC driver to the machine where you will run 
> Spark Shell
> 3) Start Spark shell, loading the Google BigQuery JDBC driver JAR files:
> ./spark-shell --jars 
> /home/ec2-user/jdbc/gbq/GoogleBigQueryJDBC42.jar,/home/ec2-user/jdbc/gbq/google-api-client-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-api-services-bigquery-v2-rev320-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-http-client-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-http-client-jackson2-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-oauth-client-1.22.0.jar,/home/ec2-user/jdbc/gbq/jackson-core-2.1.3.jar
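> (Optional sanity check: confirm the driver class is visible on the shell's 
> classpath before going further.)
> scala> Class.forName("com.simba.googlebigquery.jdbc42.Driver")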
> 4) In Spark shell load the data from Google BigQuery using the JDBC driver
> val gbq = spark.read.format("jdbc").options(Map(
>   "url" -> "jdbc:bigquery://https://www.googleapis.com/bigquery/v2;ProjectId=your-project-name-here;OAuthType=0;OAuthPvtKeyPath=/usr/lib/spark/YourProjectPrivateKey.json;OAuthServiceAcctEmail=YourEmail@gmail.com;AllowLargeResults=1;LargeResultDataset=_bqodbc_temp_tables;LargeResultTable=_matthew;Timeout=600",
>   "dbtable" -> "test.lu_test_integer"
> )).option("driver", "com.simba.googlebigquery.jdbc42.Driver").option("user", "").option("password", "").load()
> 5) In Spark shell try to display the data
> gbq.show()
> At this point you should see the error:
> scala> gbq.show()
> 17/06/22 19:34:57 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 6, ip-172-31-37-165.ec2.internal, executor 3): java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.
>         at com.simba.exceptions.ExceptionConverter.toSQLException(Unknown Source)
>         at com.simba.utilities.conversion.TypeConverter.toLong(Unknown Source)
>         at com.simba.jdbc.common.SForwardResultSet.getLong(Unknown Source)
>         at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:365)
>         at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:364)
>         at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:286)
>         at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:268)
>         at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>         at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>         at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
>         at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
>         at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>         at org.apache.spark.scheduler.Task.run(Task.scala:99)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
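>
> A possible workaround to try (sketch only, not verified with this driver): 
> if the Simba driver accepts a subquery as the dbtable value, the conversion 
> can be pushed into BigQuery itself so Spark never calls getLong on the 
> column.  The column name my_int_col is hypothetical; substitute the real 
> INTEGER column, and reuse the same connection string as in step 4.
> val gbqCast = spark.read.format("jdbc")
>   .option("url", "jdbc:bigquery://https://www.googleapis.com/bigquery/v2;ProjectId=your-project-name-here;...")  // same URL as step 4
>   .option("driver", "com.simba.googlebigquery.jdbc42.Driver")
>   .option("dbtable", "(SELECT CAST(my_int_col AS STRING) AS my_int_col FROM test.lu_test_integer) t")
>   .option("user", "").option("password", "").load()
> gbqCast.printSchema()  // the column should now surface as string
> gbqCast.show()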


