[jira] [Comment Edited] (SPARK-26996) Scalar Subquery not handled properly in Spark 2.4

Dongjoon Hyun (JIRA) Tue, 26 Feb 2019 10:54:32 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-26996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778461#comment-16778461
 ]


Dongjoon Hyun edited comment on SPARK-26996 at 2/26/19 6:53 PM:
----------------------------------------------------------------

As a workaround, please set `spark.sql.optimizer.metadataOnly` to `false`; 
the query then works for you. Because of that bug, SPARK-26709 disables this 
configuration by default again in Spark 2.4.1. I'll close this issue as a 
duplicate of SPARK-26709.

{code}
scala> sql("set spark.sql.optimizer.metadataOnly=false")

scala> spark.read.load("/tmp/latest_dates").createOrReplaceTempView("latest_dates")

scala> spark.read.load("/tmp/mypartitioneddata").createOrReplaceTempView("source1")

scala> spark.sql("select max(date), 'source1' as category from source1 where date >= (select latest_date from latest_dates where source='source1') ").show
+----------+--------+
| max(date)|category|
+----------+--------+
|2018-08-30| source1|
+----------+--------+

scala> sc.version
res6: String = 2.4.0
{code}
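
The same workaround can also be applied through the runtime configuration API instead of a SQL `set` statement. This is a minimal sketch, assuming a spark-shell session where `spark` is the predefined SparkSession:

```scala
// Assumes spark-shell, where `spark` (a SparkSession) is already defined.
// Session-scoped equivalent of: sql("set spark.sql.optimizer.metadataOnly=false")
spark.conf.set("spark.sql.optimizer.metadataOnly", "false")

// Read the setting back to confirm the workaround is active in this session.
println(spark.conf.get("spark.sql.optimizer.metadataOnly"))
```

The setting should also be accepted at launch time via `--conf spark.sql.optimizer.metadataOnly=false`, which avoids touching application code.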

cc [~Gengliang.Wang] and [~maropu]


> Scalar Subquery not handled properly in Spark 2.4 
> --------------------------------------------------
>
>                 Key: SPARK-26996
>                 URL: https://issues.apache.org/jira/browse/SPARK-26996
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Ilya Peysakhov
>            Priority: Critical
>             Fix For: 2.4.1
>
>
> Spark 2.4 reports an error when querying a dataframe that has only 1 row and 
> 1 column (scalar subquery). 
>  
> Reproducer is below. No other data is needed to reproduce the error.
> This will write a table of dates and strings, write another "fact" table of 
> ints and dates partitioned by date, then read both tables as views and select 
> max(date) from the "fact" table, filtered by the latest_date that a scalar 
> subquery returns from the first table. This was run in spark-shell on vanilla 
> Spark 2.4 (also reproduced on AWS EMR 5.20.0).
> -------------------------
> spark.sql("""
>   select '2018-01-01' as latest_date, 'source1' as source
>   UNION ALL select '2018-01-02', 'source2'
>   UNION ALL select '2018-01-03', 'source3'
>   UNION ALL select '2018-01-04', 'source4'
> """).write.mode("overwrite").save("/latest_dates")
> val mydatetable = spark.read.load("/latest_dates")
> mydatetable.createOrReplaceTempView("latest_dates")
> spark.sql("""
>   select 50 as mysum, '2018-01-01' as date
>   UNION ALL select 100, '2018-01-02'
>   UNION ALL select 300, '2018-01-03'
>   UNION ALL select 3444, '2018-01-01'
>   UNION ALL select 600, '2018-08-30'
> """).write.mode("overwrite").partitionBy("date").save("/mypartitioneddata")
> val source1 = spark.read.load("/mypartitioneddata")
> source1.createOrReplaceTempView("source1")
> spark.sql("""
>   select max(date), 'source1' as category
>   from source1
>   where date >= (select latest_date from latest_dates where source='source1')
> """).show
>  ----------------------------
>  
> Error summary
> -------
> java.lang.UnsupportedOperationException: Cannot evaluate expression: scalar-subquery#35 []
>  at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258)
>  at org.apache.spark.sql.catalyst.expressions.ScalarSubquery.eval(subquery.scala:246)
> -------
> This reproducer works in previous versions (2.3.2, 2.3.1, etc).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
