Josh Rosen created SPARK-17120:
----------------------------------

             Summary: Analyzer incorrectly optimizes plan to empty LocalRelation
                 Key: SPARK-17120
                 URL: https://issues.apache.org/jira/browse/SPARK-17120
             Project: Spark
          Issue Type: Bug
    Affects Versions: 2.1.0
            Reporter: Josh Rosen
            Priority: Blocker


Consider the following query:

{code}
sc.parallelize(Seq(97)).toDF("int_col_6").createOrReplaceTempView("table_3")
sc.parallelize(Seq(0)).toDF("int_col_1").createOrReplaceTempView("table_4")

println(sql("""
  SELECT *
  FROM (
    SELECT
      COALESCE(t2.int_col_1, t1.int_col_6) AS int_col
    FROM table_3 t1
    LEFT JOIN table_4 t2 ON false
  ) t
  WHERE t.int_col IS NOT NULL
""").collect().toSeq)
{code}

In the innermost query, the LEFT JOIN's condition is {{false}}, but the join should 
still produce one output row per row of {{table_3}} (which is non-empty), with 
NULLs on the right side. {{COALESCE}} then falls back to {{t1.int_col_6}}, so no 
values are {{null}} and the outer {{where}} should retain all rows. The overall 
result of this query should therefore be a single row with the value {{97}}.
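The expected semantics can be sketched in plain Scala (no Spark; the object name is hypothetical), assuming a LEFT JOIN with an always-false condition emits each left row exactly once with a null right side:

```scala
object LeftJoinOnFalseSketch {
  def main(args: Array[String]): Unit = {
    val table3 = Seq(97) // int_col_6 values

    // LEFT JOIN table_4 t2 ON false: no right row ever matches, so each
    // left row is emitted once with a null (None) right side.
    val joined: Seq[(Int, Option[Int])] = table3.map(l => (l, None))

    // COALESCE(t2.int_col_1, t1.int_col_6): the right side is always null,
    // so COALESCE falls back to the left column.
    val coalesced: Seq[Int] = joined.map { case (l, r) => r.getOrElse(l) }

    // WHERE t.int_col IS NOT NULL: no nulls remain, so every row survives.
    println(coalesced) // List(97)
  }
}
```

Under these semantics the filter cannot eliminate any rows, so an empty result is incorrect.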

Instead, the current Spark master (as of 
12a89e55cbd630fa2986da984e066cd07d3bf1f7, at least) returns no rows. Looking at 
{{explain}}, the logical plan is being optimized to {{LocalRelation <empty>}}, so 
Spark never executes the query. My suspicion is that there's a bug in constraint 
propagation or filter pushdown.

This issue doesn't seem to affect Spark 2.0, so I think it's a regression in 
master. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
