Xiaoju Wu created SPARK-30298:
---------------------------------

             Summary: bucket join cannot work for self-join with views
                 Key: SPARK-30298
                 URL: https://issues.apache.org/jira/browse/SPARK-30298
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Xiaoju Wu

This unit test may fail at the last assertion:

{code:java}
test("bucket join cannot work for self-join with views") {
  // Keep the broadcast threshold tiny so the join is not planned as a broadcast join.
  withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "1") {
    withTable("t1") {
      val df = (0 until 20).map(i => (i, i)).toDF("i", "j").as("df")
      df.write
        .format("parquet")
        .bucketBy(8, "i")
        .saveAsTable("t1")
      sql("create view v1 as select * from t1").collect()

      // Self-join on the bucketed table: no shuffle is expected.
      val plan1 = sql("SELECT * FROM t1 a JOIN t1 b ON a.i = b.i").queryExecution.executedPlan
      assert(plan1.collect { case exchange: ShuffleExchangeExec => exchange }.isEmpty)

      // Same join, but one side goes through the view: this assertion may fail.
      val plan2 = sql("SELECT * FROM t1 a JOIN v1 b ON a.i = b.i").queryExecution.executedPlan
      assert(plan2.collect { case exchange: ShuffleExchangeExec => exchange }.isEmpty)
    }
  }
}
{code}

This happens because the View adds a Project with an Alias on top of the table scan. The Join's required child distribution is then expressed in terms of the aliased attribute, but ProjectExec passes its child's outputPartitioning up unchanged, without applying the Alias. As a result, the satisfies check in EnsureRequirements fails and a shuffle is inserted on the view side even though the data is already bucketed on the join key.
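
The mismatch described above can be sketched with a small toy model. This is not Spark's code: `Attr`, `Alias`, `HashPartitioning`, `ClusteredDistribution`, `satisfies`, `projectPassThrough` and `projectAliasAware` below are simplified, hypothetical stand-ins written only for illustration, and the alias-aware variant is one possible direction rather than a proposed patch.

```scala
object BucketJoinSketch {
  // Toy attribute: the exprId, not the name, identifies a column.
  final case class Attr(name: String, exprId: Int)
  // Toy alias inside a projection: `child AS output`.
  final case class Alias(child: Attr, output: Attr)
  // Toy hash partitioning reported by a plan node.
  final case class HashPartitioning(keys: Seq[Attr], numPartitions: Int)
  // Toy distribution required by one side of a join.
  final case class ClusteredDistribution(keys: Seq[Attr])

  // Simplified check (an assumption of this sketch): every partitioning key
  // must appear among the required clustering keys.
  def satisfies(p: HashPartitioning, d: ClusteredDistribution): Boolean =
    p.keys.nonEmpty && p.keys.forall(d.keys.contains)

  // Behaviour described in the report: the projection forwards the child's
  // partitioning unchanged and ignores its aliases.
  def projectPassThrough(child: HashPartitioning, aliases: Seq[Alias]): HashPartitioning =
    child

  // Hypothetical alternative: rewrite partitioning keys through the aliases.
  def projectAliasAware(child: HashPartitioning, aliases: Seq[Alias]): HashPartitioning = {
    val rename = aliases.map(a => a.child -> a.output).toMap
    child.copy(keys = child.keys.map(k => rename.getOrElse(k, k)))
  }

  // Table column t1.i and the aliased column exposed by the view v1.
  val tableI: Attr = Attr("i", 1)
  val viewI: Attr = Attr("i", 2)
  val scanPartitioning: HashPartitioning = HashPartitioning(Seq(tableI), 8)
  val viewAliases: Seq[Alias] = Seq(Alias(tableI, viewI))
  // The join sees only the view's output, so it asks for clustering on viewI.
  val requiredByJoin: ClusteredDistribution = ClusteredDistribution(Seq(viewI))

  val passThroughSatisfied: Boolean =
    satisfies(projectPassThrough(scanPartitioning, viewAliases), requiredByJoin)
  val aliasAwareSatisfied: Boolean =
    satisfies(projectAliasAware(scanPartitioning, viewAliases), requiredByJoin)
}
```

In this toy model `passThroughSatisfied` is false, because the partitioning still refers to the table's attribute while the join requires the view's attribute; that corresponds to the case where a shuffle gets added. `aliasAwareSatisfied` is true once the keys are rewritten through the alias.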