Josh Rosen created SPARK-17120:
----------------------------------

             Summary: Analyzer incorrectly optimizes plan to empty LocalRelation
                 Key: SPARK-17120
                 URL: https://issues.apache.org/jira/browse/SPARK-17120
             Project: Spark
          Issue Type: Bug
    Affects Versions: 2.1.0
            Reporter: Josh Rosen
            Priority: Blocker
Consider the following query:

{code}
sc.parallelize(Seq(97)).toDF("int_col_6").createOrReplaceTempView("table_3")
sc.parallelize(Seq(0)).toDF("int_col_1").createOrReplaceTempView("table_4")
println(sql("""
  SELECT *
  FROM (
    SELECT COALESCE(t2.int_col_1, t1.int_col_6) AS int_col
    FROM table_3 t1
    LEFT JOIN table_4 t2 ON false
  ) t
  WHERE (t.int_col) IS NOT NULL
""").collect().toSeq)
{code}

In the innermost query, the LEFT JOIN's condition is {{false}}, but the join should nevertheless produce one output row per row of {{table_3}} (which is non-empty), with {{null}} for the right-side columns. {{COALESCE}} then falls back to {{int_col_6}}, so no values are {{null}} and the outer {{WHERE}} should retain all rows; the overall result of this query should therefore be a single row with the value {{97}}. Instead, the current Spark master (as of commit 12a89e55cbd630fa2986da984e066cd07d3bf1f7, at least) returns no rows.

Looking at {{explain}}, it appears that the logical plan is being optimized to {{LocalRelation <empty>}}, so Spark doesn't even run the query. My suspicion is that there's a bug in constraint propagation or filter pushdown. This issue doesn't seem to affect Spark 2.0, so I think it's a regression in master.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
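As a reference for the expected behavior, here is a minimal plain-Python sketch (an illustration only, not Spark code) of the semantics the report relies on: a LEFT JOIN whose condition is always false still emits every left-side row padded with NULLs, COALESCE picks the first non-null argument, and the outer IS NOT NULL filter then keeps everything.

```python
table_3 = [{"int_col_6": 97}]   # left side, non-empty
table_4 = [{"int_col_1": 0}]    # right side

def left_join_on_false(left, right):
    # The join condition is constantly false, so no right row ever matches;
    # LEFT JOIN semantics still keep each left row, padded with None
    # for the right-side column.
    return [{**row, "int_col_1": None} for row in left]

def coalesce(*vals):
    # First non-null argument, or None if all are null.
    return next((v for v in vals if v is not None), None)

joined = left_join_on_false(table_3, table_4)
projected = [coalesce(r["int_col_1"], r["int_col_6"]) for r in joined]
result = [v for v in projected if v is not None]   # WHERE ... IS NOT NULL
print(result)
```

Under these semantics the result is a single row containing 97, which is what the report argues Spark should return instead of an empty result.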