Matthew Walton created SPARK-21183:
--------------------------------------

             Summary: Unable to return Google BigQuery INTEGER data type into 
Spark via Google BigQuery JDBC driver: java.sql.SQLDataException: 
[Simba][JDBC](10140) Error converting value to long.
                 Key: SPARK-21183
                 URL: https://issues.apache.org/jira/browse/SPARK-21183
             Project: Spark
          Issue Type: Bug
          Components: Spark Shell, SQL
    Affects Versions: 2.1.1, 2.0.0, 1.6.0
         Environment: OS:  Linux
Spark  version 2.1.1
JDBC:  Download the latest google BigQuery JDBC Driver from Google
            Reporter: Matthew Walton


I'm trying to read data into Spark over a JDBC connection to Google 
BigQuery. When I query data that resides in an INTEGER column, I get the 
following error:

java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long. 
 

Steps to reproduce:

1) In the Google BigQuery console, create a simple table with an INTEGER 
column and insert some data.

2) Copy the Google BigQuery JDBC driver to the machine where you will run the 
Spark shell.

3) Start the Spark shell, loading the Google BigQuery JDBC driver JAR files:

./spark-shell --jars /home/ec2-user/jdbc/gbq/GoogleBigQueryJDBC42.jar,/home/ec2-user/jdbc/gbq/google-api-client-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-api-services-bigquery-v2-rev320-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-http-client-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-http-client-jackson2-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-oauth-client-1.22.0.jar,/home/ec2-user/jdbc/gbq/jackson-core-2.1.3.jar

4) In the Spark shell, load the data from Google BigQuery using the JDBC driver:

val gbq = spark.read.format("jdbc").options(Map(
  "url" -> "jdbc:bigquery://https://www.googleapis.com/bigquery/v2;ProjectId=your-project-name-here;OAuthType=0;OAuthPvtKeyPath=/usr/lib/spark/YourProjectPrivateKey.json;OAuthServiceAcctEmail=YourEmail@gmail.com;AllowLargeResults=1;LargeResultDataset=_bqodbc_temp_tables;LargeResultTable=_matthew;Timeout=600",
  "dbtable" -> "test.lu_test_integer"
)).option("driver", "com.simba.googlebigquery.jdbc42.Driver").option("user", "").option("password", "").load()
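
The connection URL in step 4 is one long semicolon-delimited string, so a dropped separator between two properties is easy to miss. A minimal sketch of assembling it from key/value pairs is shown here; the object name BigQueryJdbcUrl and the method build are hypothetical helpers for illustration, not part of Spark or the driver, and the property values are the placeholders from the report.

```scala
object BigQueryJdbcUrl {
  /** Joins the endpoint and each key=value pair with ';' (hypothetical helper). */
  def build(endpoint: String, props: Seq[(String, String)]): String =
    (endpoint +: props.map { case (k, v) => s"$k=$v" }).mkString(";")

  // Placeholder values copied from the report; replace with real settings.
  val example: String = build(
    "jdbc:bigquery://https://www.googleapis.com/bigquery/v2",
    Seq(
      "ProjectId" -> "your-project-name-here",
      "OAuthType" -> "0",
      "OAuthPvtKeyPath" -> "/usr/lib/spark/YourProjectPrivateKey.json",
      "OAuthServiceAcctEmail" -> "YourEmail@gmail.com",
      "AllowLargeResults" -> "1",
      "LargeResultDataset" -> "_bqodbc_temp_tables",
      "LargeResultTable" -> "_matthew",
      "Timeout" -> "600"
    )
  )
}
```

The resulting string can be passed as the "url" option, for example "url" -> BigQueryJdbcUrl.example.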

5) In the Spark shell, try to display the data:

gbq.show()

At this point you should see the error:

scala> gbq.show()
17/06/22 19:34:57 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 6, ip-172-31-37-165.ec2.internal, executor 3): java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.
        at com.simba.exceptions.ExceptionConverter.toSQLException(Unknown Source)
        at com.simba.utilities.conversion.TypeConverter.toLong(Unknown Source)
        at com.simba.jdbc.common.SForwardResultSet.getLong(Unknown Source)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:365)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:364)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:286)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:268)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
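
The trace shows the exception raised inside the driver's SForwardResultSet.getLong, called from Spark's JDBC getter in JdbcUtils. One way to check whether the same call fails outside Spark is to read the column with plain JDBC and compare getters. The following is only a rough sketch under assumptions: JdbcLongProbe, attempt, and probe are hypothetical names, and it assumes the INTEGER column is the first column returned by the query.

```scala
import java.sql.{Connection, DriverManager}
import scala.util.{Failure, Success, Try}

object JdbcLongProbe {
  /** Runs one getter and reports the outcome as text instead of throwing. */
  def attempt(label: String)(read: => Any): String =
    Try(read) match {
      case Success(v) => s"$label -> $v"
      case Failure(e) => s"$label FAILED: ${e.getMessage}"
    }

  /** Reports the JDBC type of column 1 and tries several getters on the first row. */
  def probe(conn: Connection, query: String): Seq[String] = {
    val stmt = conn.createStatement()
    try {
      val rs = stmt.executeQuery(query)
      val meta = rs.getMetaData
      val header =
        s"column 1: JDBC type ${meta.getColumnType(1)} (${meta.getColumnTypeName(1)})"
      if (!rs.next()) Seq(header, "no rows")
      else Seq(
        header,
        attempt("getString")(rs.getString(1)),
        attempt("getObject")(rs.getObject(1)),
        attempt("getLong")(rs.getLong(1)) // the call that fails in the trace above
      )
    } finally stmt.close()
  }

  /** Opens a connection with the given JDBC URL, prints the probe report, and closes it. */
  def run(url: String, query: String): Unit = {
    val conn = DriverManager.getConnection(url)
    try probe(conn, query).foreach(println) finally conn.close()
  }
}
```

With the driver JARs on the classpath, something like JdbcLongProbe.run(url, "SELECT * FROM test.lu_test_integer") would print one line per getter, where url is the same connection string used in step 4. If getLong also fails here while getString succeeds, that would point at the driver's conversion rather than at Spark.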




