[ https://issues.apache.org/jira/browse/SPARK-18436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670939#comment-15670939 ]
Jiang Xingbo edited comment on SPARK-18436 at 11/16/16 4:44 PM:
----------------------------------------------------------------

I created a simpler example for this issue:
{code}
val filter = Array[Int]()
val df = Seq((1, (1, 1))).toDF("a", "b")
df.filter($"a".isin(filter: _*)).explain
{code}
The `queryExecution` of this DataFrame is:
{code}
== Parsed Logical Plan ==
'Filter 'a IN ()
+- Project [_1#2 AS a#5, _2#3 AS b#6]
   +- LocalRelation [_1#2, _2#3]

== Analyzed Logical Plan ==
a: int, b: struct<_1:int,_2:int>
Filter a#5 IN ()
+- Project [_1#2 AS a#5, _2#3 AS b#6]
   +- LocalRelation [_1#2, _2#3]

== Optimized Logical Plan ==
Project [_1#2 AS a#5, _2#3 AS b#6]
+- Filter _1#2 IN ()
   +- LocalRelation [_1#2, _2#3]

== Physical Plan ==
*Project [_1#2 AS a#5, _2#3 AS b#6]
+- *Filter _1#2 IN ()
   +- LocalTableScan [_1#2, _2#3]
{code}
In SQL we throw a `ParseException` for
{code}
SELECT * FROM t WHERE a IN ()
{code}
so the DataFrame API and SQL currently behave differently. Should we throw an `AnalysisException` from `checkAnalysis`? Or should we handle the empty `IN` list in `OptimizeIn`? Or should we change the SQL behavior as well? Once we have agreed on the right behavior, I would like to send a PR. Thank you!
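One possible shape for the `OptimizeIn` option is sketched below. The standalone rule name `RewriteEmptyIn` is illustrative only (the real change would presumably live inside `OptimizeIn`), and the null semantics chosen here (fold `v IN ()` to false for non-null `v` and to NULL for null `v`) are an assumption, not a settled design:
{code}
import org.apache.spark.sql.catalyst.expressions.{If, In, IsNotNull, Literal}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.BooleanType

// Hypothetical rule: fold `v IN ()` away in the optimizer, before the
// predicate can be pushed down to a data source, since an empty IN list
// can never match anything.
object RewriteEmptyIn extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case In(v, list) if list.isEmpty =>
      // false for non-null input, NULL for null input; when `v` is
      // non-nullable, constant folding can further reduce this to `false`.
      If(IsNotNull(v), Literal(false), Literal(null, BooleanType))
  }
}
{code}
With a rewrite like this, the JDBC case never sees an `IN ()` fragment at all. Until the behavior is settled, callers can guard the empty case themselves, e.g. `if (filter.isEmpty) lit(false) else $"cl_ult".isin(filter: _*)`.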
> isin causing SQL syntax error with JDBC
> ---------------------------------------
>
>                 Key: SPARK-18436
>                 URL: https://issues.apache.org/jira/browse/SPARK-18436
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>        Environment: Linux, SQL Server 2012
>            Reporter: Dan
>              Labels: jdbc, sql
>
> When using a JDBC data source, the "isin" function generates invalid SQL
> syntax when called with an empty array, which causes the JDBC driver to
> throw an exception. If the array is not empty, it works fine.
> In the example below you can assume that SOURCE_CONNECTION, SQL_DRIVER and
> TABLE are all correctly defined.
> {noformat}
> scala> val filter = Array[String]()
> filter: Array[String] = Array()
>
> scala> val sortDF = spark.read.format("jdbc").options(Map("url" -> SOURCE_CONNECTION, "driver" -> SQL_DRIVER, "dbtable" -> TABLE)).load().filter($"cl_ult".isin(filter:_*))
> sortDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [ibi_bulk_id: bigint, ibi_row_id: int ... 174 more fields]
>
> scala> sortDF.show()
> 16/11/14 15:35:46 ERROR Executor: Exception in task 0.0 in stage 6.0 (TID 205)
> com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near ')'.
>         at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
>         at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515)
>         at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:404)
>         at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:350)
>         at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
>         at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
>         at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
>         at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
>         at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:285)
>         at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:408)
>         at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:379)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>         at org.apache.spark.scheduler.Task.run(Task.scala:86)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org