[ https://issues.apache.org/jira/browse/SPARK-26996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779295#comment-16779295 ]
Gengliang Wang commented on SPARK-26996:
----------------------------------------

Thanks [~dongjoon]!

> Scalar Subquery not handled properly in Spark 2.4
> --------------------------------------------------
>
>                 Key: SPARK-26996
>                 URL: https://issues.apache.org/jira/browse/SPARK-26996
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Ilya Peysakhov
>            Priority: Critical
>
> Spark 2.4 reports an error when a filter compares a column against a scalar
> subquery, i.e. a subquery that returns only 1 row and 1 column.
>
> The reproducer is below; no other data is needed to reproduce the error.
> It writes a table of dates and strings, writes another "fact" table of ints
> and dates, reads both tables back as views, and then filters the "fact"
> table based on the date from the first table. This is done within
> spark-shell on vanilla Spark 2.4 (also reproduced on AWS EMR 5.20.0).
>
> -------------------------
> spark.sql("select '2018-01-01' as latest_date, 'source1' as source UNION ALL
>   select '2018-01-02', 'source2' UNION ALL
>   select '2018-01-03', 'source3' UNION ALL
>   select '2018-01-04', 'source4'
>   ").write.mode("overwrite").save("/latest_dates")
> val mydatetable = spark.read.load("/latest_dates")
> mydatetable.createOrReplaceTempView("latest_dates")
>
> spark.sql("select 50 as mysum, '2018-01-01' as date UNION ALL
>   select 100, '2018-01-02' UNION ALL
>   select 300, '2018-01-03' UNION ALL
>   select 3444, '2018-01-01' UNION ALL
>   select 600, '2018-08-30'
>   ").write.mode("overwrite").partitionBy("date").save("/mypartitioneddata")
> val source1 = spark.read.load("/mypartitioneddata")
> source1.createOrReplaceTempView("source1")
>
> spark.sql("select max(date), 'source1' as category from source1 where date >=
>   (select latest_date from latest_dates where source='source1') ").show
> ----------------------------
>
> Error summary:
> java.lang.UnsupportedOperationException: Cannot evaluate expression:
> scalar-subquery#35 []
>     at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258)
>     at org.apache.spark.sql.catalyst.expressions.ScalarSubquery.eval(subquery.scala:246)
> -------
> This reproducer works in previous versions (2.3.2, 2.3.1, etc.).
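A possible interim workaround, until the fix lands, is to evaluate the scalar subquery eagerly on the driver and inline the result as a literal, so the planner never has to eval the scalar-subquery expression during partition filtering. This is a sketch only, assuming the same spark-shell session and tables as the reproducer above (`spark`, `latest_dates`, `source1`); it is not the project's fix for SPARK-26996.

```scala
// Workaround sketch (assumes the reproducer's views already exist in spark-shell):
// 1. Run the scalar subquery on its own and collect the single value.
val latest: String = spark
  .sql("select latest_date from latest_dates where source = 'source1'")
  .collect()(0)
  .getString(0)

// 2. Inline the collected value as a plain string literal in the outer query,
//    so the filter no longer contains a scalar-subquery expression.
spark
  .sql(s"select max(date), 'source1' as category from source1 where date >= '$latest'")
  .show()
```

This trades one extra driver round-trip for a query plan that Spark 2.4 can evaluate; it is only safe because the subquery is guaranteed to return a single row.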