[jira] [Commented] (SPARK-21183) Unable to return Google BigQuery INTEGER data type into Spark via google BigQuery JDBC driver: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.

2019-09-13 Thread Philippe ROSSIGNOL (Jira)


[ https://issues.apache.org/jira/browse/SPARK-21183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929075#comment-16929075 ]

Philippe ROSSIGNOL commented on SPARK-21183:


For me, UseNativeQuery=0 works well with the Simba JDBC Spark driver.

For your information, there is a misunderstanding in the Simba Spark documentation: it states that 0 is the default value, but in reality the default value is 1, not 0!
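
For reference, here is a minimal sketch of how that property can be appended to the semicolon-separated JDBC URL used elsewhere in this ticket (a hypothetical illustration only: the connection details are placeholders, and the effect of UseNativeQuery=0 is taken from the comment above rather than verified against the driver documentation):

// assumes spark-shell with the Simba JDBC driver jars on the classpath
val url = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2;" +
  "ProjectId=your-project-name-here;OAuthType=0;" +
  "OAuthPvtKeyPath=/usr/lib/spark/YourProjectPrivateKey.json;" +
  "OAuthServiceAcctEmail=YourEmail@gmail.com;" +
  "UseNativeQuery=0"  // 0 works per the comment above; the documented default is reportedly wrong

val df = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", "test.lu_test_integer")
  .option("driver", "com.simba.googlebigquery.jdbc42.Driver")
  .load()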

> Unable to return Google BigQuery INTEGER data type into Spark via google 
> BigQuery JDBC driver: java.sql.SQLDataException: [Simba][JDBC](10140) Error 
> converting value to long.
> --
>
> Key: SPARK-21183
> URL: https://issues.apache.org/jira/browse/SPARK-21183
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 1.6.0, 2.0.0, 2.1.1
> Environment: OS:  Linux
> Spark  version 2.1.1
> JDBC:  Download the latest google BigQuery JDBC Driver from Google
>Reporter: Matthew Walton
>Priority: Major
>  Labels: bulk-closed
>
> I'm trying to fetch back data in Spark using a JDBC connection to Google 
> BigQuery.  Unfortunately, when I try to query data that resides in an INTEGER 
> column I get the following error:  
> java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to 
> long.  
> Steps to reproduce:
> 1) On Google BigQuery console create a simple table with an INT column and 
> insert some data 
> 2) Copy the Google BigQuery JDBC driver to the machine where you will run 
> Spark Shell
> 3) Start Spark shell loading the GoogleBigQuery JDBC driver jar files
> ./spark-shell --jars 
> /home/ec2-user/jdbc/gbq/GoogleBigQueryJDBC42.jar,/home/ec2-user/jdbc/gbq/google-api-client-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-api-services-bigquery-v2-rev320-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-http-client-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-http-client-jackson2-1.22.0.jar,/home/ec2-user/jdbc/gbq/google-oauth-client-1.22.0.jar,/home/ec2-user/jdbc/gbq/jackson-core-2.1.3.jar
> 4) In Spark shell load the data from Google BigQuery using the JDBC driver
> val gbq = spark.read.format("jdbc").options(Map(
>   "url" -> "jdbc:bigquery://https://www.googleapis.com/bigquery/v2;ProjectId=your-project-name-here;OAuthType=0;OAuthPvtKeyPath=/usr/lib/spark/YourProjectPrivateKey.json;OAuthServiceAcctEmail=YourEmail@gmail.com;AllowLargeResults=1;LargeResultDataset=_bqodbc_temp_tables;LargeResultTable=_matthew;Timeout=600",
>   "dbtable" -> "test.lu_test_integer"))
>   .option("driver","com.simba.googlebigquery.jdbc42.Driver").option("user","").option("password","").load()
> 5) In Spark shell try to display the data
> gbq.show()
> At this point you should see the error:
> scala> gbq.show()
> 17/06/22 19:34:57 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 6, 
> ip-172-31-37-165.ec2.internal, executor 3): java.sql.SQLDataException: 
> [Simba][JDBC](10140) Error converting value to long.
> at com.simba.exceptions.ExceptionConverter.toSQLException(Unknown 
> Source)
> at com.simba.utilities.conversion.TypeConverter.toLong(Unknown Source)
> at com.simba.jdbc.common.SForwardResultSet.getLong(Unknown Source)
> at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:365)
> at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:364)
> at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:286)
> at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:268)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
>

[jira] [Commented] (SPARK-21183) Unable to return Google BigQuery INTEGER data type into Spark via google BigQuery JDBC driver: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.

2018-01-10 Thread Matthew Walton (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320106#comment-16320106 ]

Matthew Walton commented on SPARK-21183:


I ended up getting an answer and a workaround from Simba. Their comments are below:

I apologize for the delay; after thorough investigation we have confirmed that 
this is indeed a shell issue, and we will be opening a JIRA with them. The reason 
it took a bit longer is that we were trying to collect more information to 
prove that this is a shell issue. The shell should be using a JDBC API to 
determine the correct string quote identifier, but that API is not called, so 
the default double quote is used instead of the back-tick, which is the 
identifier quote character for BigQuery.

There is a workaround that can be used: Spark provides a JdbcDialect class that 
can be used to set the quote identifier to the wanted one.

For Hive and BigQuery you can use this before or after establishing a 
connection (but before calling show()): 

import org.apache.spark.sql.jdbc.{JdbcDialects, JdbcType, JdbcDialect}

val GoogleDialect = new JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:bigquery") || url.contains("bigquery")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

JdbcDialects.registerDialect(GoogleDialect)
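
For context: with the default double-quote identifier, BigQuery presumably parses a generated query like SELECT "col1" FROM test.lu_test_integer as selecting the string literal "col1" rather than the column, which would explain why the driver then fails to convert the returned text to long. Below is a minimal usage sketch of the ordering in spark-shell (the URL and table are the placeholders from the repro steps above; an illustration, not a verified session):

// register the dialect first, then read; Spark now quotes columns with back-ticks
JdbcDialects.registerDialect(GoogleDialect)
val gbq = spark.read.format("jdbc")
  .option("url", url)  // the jdbc:bigquery URL from step 4 of the repro
  .option("dbtable", "test.lu_test_integer")
  .option("driver", "com.simba.googlebigquery.jdbc42.Driver")
  .option("user", "").option("password", "")
  .load()
gbq.show()  // with back-tick quoting this should no longer raise error 10140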




[jira] [Commented] (SPARK-21183) Unable to return Google BigQuery INTEGER data type into Spark via google BigQuery JDBC driver: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.

2017-06-26 Thread Matthew Walton (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063448#comment-16063448 ]

Matthew Walton commented on SPARK-21183:


Hi Sean,

I think the issue is that Spark is not handling the integer data types properly, 
and this causes a conversion error.  Running a simple select against the same data 
with a tool like SQuirreL fetches the integer data back fine, but Spark seems 
to be using a different method to fetch it.  I'm not sure if you have access to 
Google BigQuery, but you can easily reproduce this issue.  If not, please take a 
look at the other ticket for Hive JDBC drivers, as I think you probably have 
access to Hive for testing.  If this is asking too much, I guess I'll log 
this and the other issue with Simba and let them pursue them further.  
Essentially, I would not expect an error on handling the integer type with 
Spark, especially with the driver that Google expects its customers to use.

Thanks,
Matt


[jira] [Commented] (SPARK-21183) Unable to return Google BigQuery INTEGER data type into Spark via google BigQuery JDBC driver: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.

2017-06-26 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063428#comment-16063428 ]

Sean Owen commented on SPARK-21183:
---

I'm not sure that follows, but I don't know either. I think you'd have to 
establish what you think the Spark issue is here.


[jira] [Commented] (SPARK-21183) Unable to return Google BigQuery INTEGER data type into Spark via google BigQuery JDBC driver: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.

2017-06-26 Thread Matthew Walton (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063421#comment-16063421 ]

Matthew Walton commented on SPARK-21183:


Well, the SQuirreL client tool does work on the same query, so it seems 
that the Spark code is the culprit and is causing the Simba driver to report 
that error.  I have another ticket open for Spark with data coming from Hive 
drivers, and it is similar in that it deals with how the integer data type is 
handled by Spark.  In that case I have several drivers I can test to show that 
the issue does not seem to originate in the driver.   Unfortunately, that one 
has not been assigned:  [https://issues.apache.org/jira/browse/SPARK-21179]  I 
only mention it since it seems to show a behavior in Spark when it 
comes to handling INTEGER types with certain JDBC drivers.  In each case, 
running the same SQL outside of Spark works fine.


[jira] [Commented] (SPARK-21183) Unable to return Google BigQuery INTEGER data type into Spark via google BigQuery JDBC driver: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.

2017-06-26 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063387#comment-16063387 ]

Sean Owen commented on SPARK-21183:
---

This looks like an error from Simba, not Spark?


[jira] [Commented] (SPARK-21183) Unable to return Google BigQuery INTEGER data type into Spark via google BigQuery JDBC driver: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.

2017-06-26 Thread Matthew Walton (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062969#comment-16062969 ]

Matthew Walton commented on SPARK-21183:


Hi Sean,
Thanks for the response.

It seems like any value will cause the issue as I can reproduce with some 
simple integer values.  For example, I create a table called repro_issue in GBQ 
console and have one column (col1) with data type integer.  I then run the 
following statement:

INSERT your_data_set_here.repro_issue (col1)
VALUES(10),
  (20),
  (30),
  (10),
  (-20),
  (-30),
  (7)

From Spark shell I get the above, but here is the full error from spark shell:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val gbq = spark.read.format("jdbc").options(Map(
  "url" -> "jdbc:bigquery://https://www.googleapis.com/bigquery/v2;ProjectId=steam-strategy-126421;OAuthType=0;OAuthPvtKeyPath=/usr/lib/spark/YourProjectPrivateKey.json;OAuthServiceAcctEmail=YourEmail@gmail;AllowLargeResults=1;LargeResultDataset=_bqodbc_temp_tables;LargeResultTable=_matthew;Timeout=600",
  "dbtable" -> "bmp1x.repro_issue"))
  .option("driver","com.simba.googlebigquery.jdbc42.Driver").option("user","").option("password","").load()
gbq: org.apache.spark.sql.DataFrame = [col1: bigint]

scala> gbq.show()
17/06/26 11:32:30 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
ip-172-31-37-165.ec2.internal, executor 1): java.sql.SQLDataException: 
[Simba][JDBC](10140) Error converting value to long.
at com.simba.exceptions.ExceptionConverter.toSQLException(Unknown 
Source)
at com.simba.utilities.conversion.TypeConverter.toLong(Unknown Source)
at com.simba.jdbc.common.SForwardResultSet.getLong(Unknown Source)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:365)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:364)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:286)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:268)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

17/06/26 11:32:36 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; 
aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 
3, ip-172-31-37-165.ec2.internal, executor 1): java.sql.SQLDataException: 
[Simba][JDBC](10140) Error converting value to long.
at com.simba.exceptions.ExceptionConverter.toSQLException(Unknown 
Source)
at com.simba.utilities.conversion.TypeConverter.toLong(Unknown Source)
at com.simba.jdbc.common.SForwardResultSet.getLong(Unknown Source)
at 

[jira] [Commented] (SPARK-21183) Unable to return Google BigQuery INTEGER data type into Spark via google BigQuery JDBC driver: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to long.

2017-06-23 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060557#comment-16060557 ]

Sean Owen commented on SPARK-21183:
---

I'm not clear this is a Spark problem. What's the underlying error? What value 
failed to convert?
