[ https://issues.apache.org/jira/browse/SPARK-30220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jiaan.geng updated SPARK-30220:
-------------------------------
    Description:

Spark SQL cannot support a FILTER expression that uses an IN/EXISTS predicate sub-query, as below:
{code:java}
select sum(unique1) FILTER (WHERE
  unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;{code}
Spark throws the following exception:
{code:java}
org.apache.spark.sql.AnalysisException
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [sum(cast(unique1#x as bigint)) AS sum(unique1)#xL]
:  +- Project [unique1#x]
:     +- Filter (unique1#x < 100)
:        +- SubqueryAlias `onek`
:           +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, stringu1#x, stringu2#x, string4#x] csv file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/onek.data
+- SubqueryAlias `tenk1`
   +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, stringu1#x, stringu2#x, string4#x] csv file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/tenk.data{code}
PostgreSQL, however, supports this syntax:
{code:java}
select sum(unique1) FILTER (WHERE
  unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;
 sum
------
 4950
(1 row){code}

> Support Filter expression uses IN/EXISTS predicate sub-queries
> --------------------------------------------------------------
>
>                 Key: SPARK-30220
>                 URL: https://issues.apache.org/jira/browse/SPARK-30220
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: jiaan.geng
>            Priority: Major
>
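Until this sub-task is implemented, one possible workaround is sketched below. It is an assumption on top of the issue, not part of it: when the filtered aggregate is the only aggregate in the SELECT list, the FILTER predicate can be moved into the WHERE clause, where Spark already accepts IN/EXISTS predicate sub-queries. The rewrite is not equivalent if other aggregates in the same query need the unfiltered rows.
{code:java}
-- Workaround sketch (assumption, not from the JIRA issue): because
-- sum(unique1) is the only aggregate here, pre-filtering the rows with the
-- IN sub-query in the WHERE clause (which Spark does support) should give
-- the same result as PostgreSQL's 4950 on the same data.
SELECT sum(unique1)
FROM tenk1
WHERE unique1 IN (SELECT unique1 FROM onek WHERE unique1 < 100);
{code}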