[ https://issues.apache.org/jira/browse/SPARK-26057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-26057:
-----------------------------------

    Assignee: Marco Gaido

> Table joining is broken in Spark 2.4
> ------------------------------------
>
>                 Key: SPARK-26057
>                 URL: https://issues.apache.org/jira/browse/SPARK-26057
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Pavel Parkhomenko
>            Assignee: Marco Gaido
>            Priority: Major
>             Fix For: 2.4.1, 3.0.0
>
>
> This sample works in spark-shell 2.3.1 and throws an exception in 2.4.0:
> {code:java}
> import java.util.Arrays.asList
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
>
> // Three temporary views: cc (id, layout, n), p (layout, ts), c (id, layout, ts).
> spark.createDataFrame(
>   asList(
>     Row("1-1", "sp", 6),
>     Row("1-1", "pc", 5),
>     Row("1-2", "pc", 4),
>     Row("2-1", "sp", 3),
>     Row("2-2", "pc", 2),
>     Row("2-2", "sp", 1)
>   ),
>   StructType(List(StructField("id", StringType), StructField("layout", StringType), StructField("n", IntegerType)))
> ).createOrReplaceTempView("cc")
>
> spark.createDataFrame(
>   asList(
>     Row("sp", 1),
>     Row("sp", 1),
>     Row("sp", 2),
>     Row("sp", 3),
>     Row("sp", 3),
>     Row("sp", 4),
>     Row("sp", 5),
>     Row("sp", 5),
>     Row("pc", 1),
>     Row("pc", 2),
>     Row("pc", 2),
>     Row("pc", 3),
>     Row("pc", 4),
>     Row("pc", 4),
>     Row("pc", 5)
>   ),
>   StructType(List(StructField("layout", StringType), StructField("ts", IntegerType)))
> ).createOrReplaceTempView("p")
>
> spark.createDataFrame(
>   asList(
>     Row("1-1", "sp", 1),
>     Row("1-1", "sp", 2),
>     Row("1-1", "pc", 3),
>     Row("1-2", "pc", 3),
>     Row("1-2", "pc", 4),
>     Row("2-1", "sp", 4),
>     Row("2-1", "sp", 5),
>     Row("2-2", "pc", 6),
>     Row("2-2", "sp", 6)
>   ),
>   StructType(List(StructField("id", StringType), StructField("layout", StringType), StructField("ts", IntegerType)))
> ).createOrReplaceTempView("c")
>
> // Aggregate view pcc: per (id, layout), count the p rows for which c has a later ts.
> spark.sql("""
>   SELECT cc.id, cc.layout, count(*) AS m
>   FROM cc
>   JOIN p USING (layout)
>   WHERE EXISTS(SELECT 1 FROM c WHERE c.id = cc.id AND c.layout = cc.layout AND c.ts > p.ts)
>   GROUP BY cc.id, cc.layout
> """).createOrReplaceTempView("pcc")
>
> spark.sql("SELECT * FROM pcc ORDER BY id, layout").show
>
> // Outer-join cc back against the view derived from it.
> spark.sql("""
>   SELECT cc.id, cc.layout, n, m
>   FROM cc
>   LEFT OUTER JOIN pcc ON pcc.id = cc.id AND pcc.layout = cc.layout
> """).createOrReplaceTempView("k")
>
> spark.sql("SELECT * FROM k ORDER BY id, layout").show
> {code}
> Actually I was trying to catch another bug: similar calculations with joins and nested queries produce different results in Spark 2.3.1 and 2.4.0, but when I tried to create a minimal example I received this exception instead:
> {code:java}
> java.lang.RuntimeException: Couldn't find id#0 in [id#38,layout#39,ts#7,id#10,layout#11,ts#12]
> {code}
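For readers skimming the report, the failing shape boils down to: a correlated EXISTS predicate referencing columns from both sides of a JOIN ... USING, whose aggregated result is then outer-joined back against one of the original tables (cc). Below is a condensed sketch of that pattern; the reduced data set and the toDF helpers are illustrative only and have not been verified to reproduce the error on their own.

{code:java}
// Condensed sketch of the reported pattern (illustrative data; not verified to
// reproduce the 2.4.0 failure by itself). Assumes a spark-shell session, where
// spark.implicits._ is already in scope for toDF.
Seq(("1-1", "sp", 6), ("2-2", "pc", 2)).toDF("id", "layout", "n").createOrReplaceTempView("cc")
Seq(("sp", 1), ("pc", 2)).toDF("layout", "ts").createOrReplaceTempView("p")
Seq(("1-1", "sp", 2), ("2-2", "pc", 6)).toDF("id", "layout", "ts").createOrReplaceTempView("c")

// The EXISTS subquery is correlated to both cc and p, the two sides of the join.
spark.sql("""
  SELECT cc.id, cc.layout, count(*) AS m
  FROM cc JOIN p USING (layout)
  WHERE EXISTS(SELECT 1 FROM c
               WHERE c.id = cc.id AND c.layout = cc.layout AND c.ts > p.ts)
  GROUP BY cc.id, cc.layout
""").createOrReplaceTempView("pcc")

// cc is then outer-joined back against the view derived from it. Running the
// reporter's full script on 2.4.0 fails with
// "java.lang.RuntimeException: Couldn't find id#0 in [...]"; on 2.3.1 it runs.
spark.sql("""
  SELECT cc.id, cc.layout, n, m
  FROM cc LEFT OUTER JOIN pcc ON pcc.id = cc.id AND pcc.layout = cc.layout
""").show
{code}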