[ https://issues.apache.org/jira/browse/SPARK-26996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778461#comment-16778461 ]
Dongjoon Hyun edited comment on SPARK-26996 at 2/26/19 6:53 PM:
----------------------------------------------------------------

As a workaround, please turn off the `spark.sql.optimizer.metadataOnly` configuration. The query then works for you. Because of that bug, SPARK-26709 turns this configuration off again in Spark 2.4.1. I'll close this issue as a duplicate of SPARK-26709.

{code}
scala> sql("set spark.sql.optimizer.metadataOnly=false")

scala> spark.read.load("/tmp/latest_dates").createOrReplaceTempView("latest_dates")

scala> spark.read.load("/tmp/mypartitioneddata").createOrReplaceTempView("source1")

scala> spark.sql("select max(date), 'source1' as category from source1 where date >= (select latest_date from latest_dates where source='source1') ").show
+----------+--------+
| max(date)|category|
+----------+--------+
|2018-08-30| source1|
+----------+--------+

scala> sc.version
res6: String = 2.4.0
{code}

cc [~Gengliang.Wang] and [~maropu]
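To see why `2018-08-30` is the expected answer, the reproducer's data (from the quoted issue) can be modeled with ordinary Scala collections. The sketch below is plain Scala with no Spark dependency, and the case classes and helper names are hypothetical. It answers the query two ways: by scanning rows, and, in the spirit of `spark.sql.optimizer.metadataOnly`, from partition directory names alone, since `date` is the only column the query reads besides the subquery. It illustrates the intended semantics only and is not how Spark implements either path. In both paths the scalar subquery value has to be known before any filtering can happen.

```scala
// Plain Scala sketch (no Spark). All names here are hypothetical.
object ScalarSubqueryDemo {
  final case class LatestDate(latestDate: String, source: String)
  final case class Fact(mysum: Int, date: String)

  // Rows the reproducer writes to /latest_dates.
  val latestDates: Seq[LatestDate] = Seq(
    LatestDate("2018-01-01", "source1"),
    LatestDate("2018-01-02", "source2"),
    LatestDate("2018-01-03", "source3"),
    LatestDate("2018-01-04", "source4")
  )

  // Rows the reproducer writes to /mypartitioneddata, partitioned by `date`.
  val facts: Seq[Fact] = Seq(
    Fact(50, "2018-01-01"),
    Fact(100, "2018-01-02"),
    Fact(300, "2018-01-03"),
    Fact(3444, "2018-01-01"),
    Fact(600, "2018-08-30")
  )

  // Scalar-subquery semantics as modeled here: zero rows -> None (SQL NULL),
  // exactly one row -> that value, more than one row -> error.
  def scalarSubquery[A](rows: Seq[A]): Option[A] = rows match {
    case Seq()  => None
    case Seq(a) => Some(a)
    case _ =>
      throw new IllegalStateException(
        "more than one row returned by a subquery used as an expression")
  }

  // The subquery `select latest_date from latest_dates where source = ...`.
  def threshold(source: String): Option[String] =
    scalarSubquery(latestDates.filter(_.source == source).map(_.latestDate))

  // `date >= NULL` is never true, so a missing threshold keeps no rows.
  // ISO-8601 date strings order correctly under plain string comparison.
  private def maxAtOrAfter(dates: Seq[String], t: Option[String]): Option[String] = {
    val kept = dates.filter(d => t.exists(x => d.compareTo(x) >= 0))
    if (kept.isEmpty) None else Some(kept.max)
  }

  // Path 1: scan every row of the fact table.
  def maxDateScanning(source: String): Option[String] =
    maxAtOrAfter(facts.map(_.date), threshold(source))

  // Path 2 (metadata-only idea): read partition values from directory
  // names such as "date=2018-01-01" instead of reading data files.
  val partitionDirs: Seq[String] = facts.map(f => s"date=${f.date}").distinct

  def maxDateFromPartitions(source: String): Option[String] =
    maxAtOrAfter(partitionDirs.map(_.stripPrefix("date=")), threshold(source))
}
```

For `source1` the threshold is `2018-01-01`, every fact row passes the filter, and both `ScalarSubqueryDemo.maxDateScanning("source1")` and `ScalarSubqueryDemo.maxDateFromPartitions("source1")` return `Some("2018-08-30")`, matching the output above.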
> Scalar Subquery not handled properly in Spark 2.4
> --------------------------------------------------
>
>                 Key: SPARK-26996
>                 URL: https://issues.apache.org/jira/browse/SPARK-26996
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Ilya Peysakhov
>            Priority: Critical
>             Fix For: 2.4.1
>
>
> Spark 2.4 reports an error when a query filters on a scalar subquery (a subquery that returns exactly 1 row and 1 column).
>
> The reproducer is below. No other data is needed to reproduce the error.
> It writes a table of dates and strings, writes another "fact" table of ints and dates partitioned by date, reads both tables back as views, keeps the "fact" rows whose date is on or after the latest_date of source1 in the first table, and takes max(date) of the result. This is run in spark-shell on vanilla Spark 2.4 (also reproduced on AWS EMR 5.20.0).
> -------------------------
> spark.sql("select '2018-01-01' as latest_date, 'source1' as source UNION ALL select '2018-01-02', 'source2' UNION ALL select '2018-01-03' , 'source3' UNION ALL select '2018-01-04' ,'source4' ").write.mode("overwrite").save("/latest_dates")
> val mydatetable = spark.read.load("/latest_dates")
> mydatetable.createOrReplaceTempView("latest_dates")
> spark.sql("select 50 as mysum, '2018-01-01' as date UNION ALL select 100, '2018-01-02' UNION ALL select 300, '2018-01-03' UNION ALL select 3444, '2018-01-01' UNION ALL select 600, '2018-08-30' ").write.mode("overwrite").partitionBy("date").save("/mypartitioneddata")
> val source1 = spark.read.load("/mypartitioneddata")
> source1.createOrReplaceTempView("source1")
> spark.sql("select max(date), 'source1' as category from source1 where date >= (select latest_date from latest_dates where source='source1') ").show
> ----------------------------
>
> Error summary
> ---
> java.lang.UnsupportedOperationException: Cannot evaluate expression: scalar-subquery#35 []
>   at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:258)
>   at org.apache.spark.sql.catalyst.expressions.ScalarSubquery.eval(subquery.scala:246)
> -------
> This reproducer works in previous versions (2.3.2, 2.3.1, etc.).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)