[jira] [Updated] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates
[ https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enrico Minack updated SPARK-41162:
    Affects Version/s: 3.0.3

> Anti-join must not be pushed below aggregation with ambiguous predicates
>
>                 Key: SPARK-41162
>                 URL: https://issues.apache.org/jira/browse/SPARK-41162
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.3, 3.1.3, 3.3.1, 3.2.3, 3.4.0
>            Reporter: Enrico Minack
>            Priority: Major
>              Labels: correctness
>
> The following query should return a single row as all values for {{id}}
> except for the largest will be eliminated by the anti-join:
> {code}
> val ids = Seq(1, 2, 3).toDF("id").distinct()
> val result = ids.withColumn("id", $"id" + 1).join(ids, "id", "left_anti").collect()
> assert(result.length == 1)
> {code}
> Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the
> assertion should still hold but is false.
> Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left
> {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never
> be true.
> {code}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin ===
> !Join LeftAnti, (id#752 = id#750)                  'Aggregate [id#750], [(id#750 + 1) AS id#752]
> !:- Aggregate [id#750], [(id#750 + 1) AS id#752]   +- 'Join LeftAnti, ((id#750 + 1) = id#750)
> !:  +- LocalRelation [id#750]                         :- LocalRelation [id#750]
> !+- Aggregate [id#750], [id#750]                      +- Aggregate [id#750], [id#750]
> !   +- LocalRelation [id#750]                            +- LocalRelation [id#750]
> {code}
> The optimizer then rightly removes the left-anti join altogether, returning
> the left child only.
> Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that
> reference left *and* right child.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
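The expectation stated in the report above can be checked without a Spark session. The following is a minimal sketch that uses plain Scala collections in place of the DataFrames; the `leftAnti` helper and the object name are hypothetical stand-ins, not Spark APIs. It contrasts the intended plan (project `id + 1`, then anti-join) with the plan produced by the pushdown, where the condition degenerates to `(id + 1) = id`.

```scala
// Plain Scala collections standing in for the DataFrames in the report; no Spark required.
object AntiJoinExpectation {
  // Left anti-join: keep the left rows for which no right row satisfies the condition.
  // (Hypothetical helper for illustration only.)
  def leftAnti[A, B](left: Seq[A], right: Seq[B])(cond: (A, B) => Boolean): Seq[A] =
    left.filter(l => !right.exists(r => cond(l, r)))

  val ids: Seq[Int] = Seq(1, 2, 3).distinct

  // Intended plan: project id + 1 first, then anti-join the result against ids.
  val shifted: Seq[Int] = ids.map(_ + 1)
  val expected: Seq[Int] = leftAnti(shifted, ids)(_ == _)

  // Plan after the pushdown: the join runs below the projection and both sides
  // of the condition resolve to the same attribute, i.e. (id + 1) = id.
  // That condition never holds, so the anti-join filters nothing.
  val pushed: Seq[Int] =
    leftAnti(ids, ids)((l, _) => l + 1 == l).map(_ + 1)

  def main(args: Array[String]): Unit = {
    println(s"expected: $expected")
    println(s"pushed:   $pushed")
  }
}
```

In this model `expected` holds one element and `pushed` holds three, which mirrors the failing `assert(result.length == 1)` in the report.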
[jira] [Updated] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates
[ https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enrico Minack updated SPARK-41162:
    Description:
The following query should return a single row as all values for {{id}} except for the largest will be eliminated by the anti-join:
{code}
val ids = Seq(1, 2, 3).toDF("id").distinct()
val result = ids.withColumn("id", $"id" + 1).join(ids, Seq("id"), "left_anti").collect()
assert(result.length == 1)
{code}
Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the assertion should still hold but is false.

Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never be true.
{code}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin ===
!Join LeftAnti, (id#752 = id#750)                  'Aggregate [id#750], [(id#750 + 1) AS id#752]
!:- Aggregate [id#750], [(id#750 + 1) AS id#752]   +- 'Join LeftAnti, ((id#750 + 1) = id#750)
!:  +- LocalRelation [id#750]                         :- LocalRelation [id#750]
!+- Aggregate [id#750], [id#750]                      +- Aggregate [id#750], [id#750]
!   +- LocalRelation [id#750]                            +- LocalRelation [id#750]
{code}
The optimizer then rightly removes the left-anti join altogether, returning the left child only.

Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that reference left *and* right child.

  was:
The following query should return a single row as all values for {{id}} except for the largest will be eliminated by the anti-join:
{code}
val ids = Seq(1, 2, 3).toDF("id").distinct()
val result = ids.withColumn("id", $"id" + 1).join(ids, "id", "left_anti").collect()
assert(result.length == 1)
{code}
Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the assertion should still hold but is false.

Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never be true.
{code}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin ===
!Join LeftAnti, (id#752 = id#750)                  'Aggregate [id#750], [(id#750 + 1) AS id#752]
!:- Aggregate [id#750], [(id#750 + 1) AS id#752]   +- 'Join LeftAnti, ((id#750 + 1) = id#750)
!:  +- LocalRelation [id#750]                         :- LocalRelation [id#750]
!+- Aggregate [id#750], [id#750]                      +- Aggregate [id#750], [id#750]
!   +- LocalRelation [id#750]                            +- LocalRelation [id#750]
{code}
The optimizer then rightly removes the left-anti join altogether, returning the left child only.

Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that reference left *and* right child.

> Anti-join must not be pushed below aggregation with ambiguous predicates
>
>                 Key: SPARK-41162
>                 URL: https://issues.apache.org/jira/browse/SPARK-41162
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.3, 3.1.3, 3.3.1, 3.2.3, 3.4.0
>            Reporter: Enrico Minack
>            Assignee: Enrico Minack
>            Priority: Major
>              Labels: correctness
>             Fix For: 3.2.4, 3.3.2, 3.4.0
>
> The following query should return a single row as all values for {{id}}
> except for the largest will be eliminated by the anti-join:
> {code}
> val ids = Seq(1, 2, 3).toDF("id").distinct()
> val result = ids.withColumn("id", $"id" + 1).join(ids, Seq("id"), "left_anti").collect()
> assert(result.length == 1)
> {code}
> Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the
> assertion should still hold but is false.
> Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left
> {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never
> be true.
> {code}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin ===
> !Join LeftAnti, (id#752 = id#750)                  'Aggregate [id#750], [(id#750 + 1) AS id#752]
> !:- Aggregate [id#750], [(id#750 + 1) AS id#752]   +- 'Join LeftAnti, ((id#750 + 1) = id#750)
> !:  +- LocalRelation [id#750]                         :- LocalRelation [id#750]
> !+- Aggregate [id#750], [id#750]                      +- Aggregate [id#750], [id#750]
> !   +- LocalRelation [id#750]                            +- LocalRelation [id#750]
> {code}
> The optimizer then rightly removes the left-anti join altogether, returning
> the left child only.
> Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that
> reference left *and* right child.
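The closing sentence of the description asks that predicates referencing both children not be pushed down. A minimal sketch of such a check follows, using a toy model of attributes and aliases. The types `Attr` and `EqualTo` and the function `canPushBelowAggregate` are hypothetical and are not Catalyst classes; this is not the patch that was applied to Spark, only an illustration of why `(id#750 + 1) = id#750` is ambiguous once the alias `id#752` is substituted.

```scala
// Toy model; these types are hypothetical and are not Catalyst classes.
object AmbiguousPredicateCheck {
  // An attribute is identified by its expression ID, e.g. id#750 -> Attr("id", 750).
  final case class Attr(name: String, exprId: Int)

  // An equality join condition, reduced to the attributes each side references.
  final case class EqualTo(leftRefs: Set[Attr], rightRefs: Set[Attr])

  // Rewrite the left side in terms of the aggregate's input by replacing each
  // alias with the attributes of the expression it stands for.
  def substituteAliases(cond: EqualTo, aliases: Map[Attr, Set[Attr]]): EqualTo =
    cond.copy(leftRefs = cond.leftRefs.flatMap(a => aliases.getOrElse(a, Set(a))))

  // In this model, pushing is treated as safe only if, after substitution,
  // no attribute is referenced by the left side and the right side at once.
  def canPushBelowAggregate(cond: EqualTo, aliases: Map[Attr, Set[Attr]]): Boolean = {
    val rewritten = substituteAliases(cond, aliases)
    (rewritten.leftRefs intersect rewritten.rightRefs).isEmpty
  }

  val id750: Attr = Attr("id", 750)
  val id752: Attr = Attr("id", 752)

  // Join LeftAnti, (id#752 = id#750), where id#752 is (id#750 + 1) AS id#752.
  val condition: EqualTo = EqualTo(leftRefs = Set(id752), rightRefs = Set(id750))
  val aliases: Map[Attr, Set[Attr]] = Map(id752 -> Set(id750))

  def main(args: Array[String]): Unit =
    println(s"can push below aggregate: ${canPushBelowAggregate(condition, aliases)}")
}
```

For the plan in the report the check rejects the pushdown, because after substitution both sides of the condition reference `id#750`. If the right child exposed a different expression ID, the same check would allow it.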
[jira] [Updated] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates
[ https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enrico Minack updated SPARK-41162:
    Description:
The following query should return a single row as all values for {{id}} except for the largest will be eliminated by the anti-join:
{code}
val ids = Seq(1, 2, 3).toDF("id").distinct()
val result = ids.withColumn("id", $"id" + 1).join(ids, "id", "left_anti").collect()
assert(result.length == 1)
{code}
Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the assertion should still hold but is false.

Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never be true.
{code}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin ===
!Join LeftAnti, (id#752 = id#750)                  'Aggregate [id#750], [(id#750 + 1) AS id#752]
!:- Aggregate [id#750], [(id#750 + 1) AS id#752]   +- 'Join LeftAnti, ((id#750 + 1) = id#750)
!:  +- LocalRelation [id#750]                         :- LocalRelation [id#750]
!+- Aggregate [id#750], [id#750]                      +- Aggregate [id#750], [id#750]
!   +- LocalRelation [id#750]                            +- LocalRelation [id#750]
{code}
The optimizer then rightly removes the left-anti join altogether, returning the left child only.

Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that reference left *and* right child.

  was:
The following query should return a single row as all values for `id` except for the largest will be eliminated by the anti-join:
{code}
val ids = Seq(1, 2, 3).toDF("id").distinct()
val result = ids.withColumn("id", $"id" + 1).join(ids, "id", "left_anti").collect()
assert(result.length == 1)
{code}
Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the assertion should still hold but is false.

Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never be true.
{code}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin ===
!Join LeftAnti, (id#752 = id#750)                  'Aggregate [id#750], [(id#750 + 1) AS id#752]
!:- Aggregate [id#750], [(id#750 + 1) AS id#752]   +- 'Join LeftAnti, ((id#750 + 1) = id#750)
!:  +- LocalRelation [id#750]                         :- LocalRelation [id#750]
!+- Aggregate [id#750], [id#750]                      +- Aggregate [id#750], [id#750]
!   +- LocalRelation [id#750]                            +- LocalRelation [id#750]
{code}
The optimizer then rightly removes the left-anti join altogether, returning the left child only.

Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that reference left *and* right child.

> Anti-join must not be pushed below aggregation with ambiguous predicates
>
>                 Key: SPARK-41162
>                 URL: https://issues.apache.org/jira/browse/SPARK-41162
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> The following query should return a single row as all values for {{id}}
> except for the largest will be eliminated by the anti-join:
> {code}
> val ids = Seq(1, 2, 3).toDF("id").distinct()
> val result = ids.withColumn("id", $"id" + 1).join(ids, "id", "left_anti").collect()
> assert(result.length == 1)
> {code}
> Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the
> assertion should still hold but is false.
> Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left
> {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never
> be true.
> {code}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin ===
> !Join LeftAnti, (id#752 = id#750)                  'Aggregate [id#750], [(id#750 + 1) AS id#752]
> !:- Aggregate [id#750], [(id#750 + 1) AS id#752]   +- 'Join LeftAnti, ((id#750 + 1) = id#750)
> !:  +- LocalRelation [id#750]                         :- LocalRelation [id#750]
> !+- Aggregate [id#750], [id#750]                      +- Aggregate [id#750], [id#750]
> !   +- LocalRelation [id#750]                            +- LocalRelation [id#750]
> {code}
> The optimizer then rightly removes the left-anti join altogether, returning
> the left child only.
> Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that
> reference left *and* right child.
[jira] [Updated] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates
[ https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shardul Mahadik updated SPARK-41162:
    Labels: correctness  (was: )

> Anti-join must not be pushed below aggregation with ambiguous predicates
>
>                 Key: SPARK-41162
>                 URL: https://issues.apache.org/jira/browse/SPARK-41162
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Enrico Minack
>            Priority: Major
>              Labels: correctness
>
> The following query should return a single row as all values for {{id}}
> except for the largest will be eliminated by the anti-join:
> {code}
> val ids = Seq(1, 2, 3).toDF("id").distinct()
> val result = ids.withColumn("id", $"id" + 1).join(ids, "id", "left_anti").collect()
> assert(result.length == 1)
> {code}
> Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the
> assertion should still hold but is false.
> Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left
> {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never
> be true.
> {code}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin ===
> !Join LeftAnti, (id#752 = id#750)                  'Aggregate [id#750], [(id#750 + 1) AS id#752]
> !:- Aggregate [id#750], [(id#750 + 1) AS id#752]   +- 'Join LeftAnti, ((id#750 + 1) = id#750)
> !:  +- LocalRelation [id#750]                         :- LocalRelation [id#750]
> !+- Aggregate [id#750], [id#750]                      +- Aggregate [id#750], [id#750]
> !   +- LocalRelation [id#750]                            +- LocalRelation [id#750]
> {code}
> The optimizer then rightly removes the left-anti join altogether, returning
> the left child only.
> Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that
> reference left *and* right child.
[jira] [Updated] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates
[ https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enrico Minack updated SPARK-41162:
    Affects Version/s: 3.3.1
                       3.1.3
                       3.2.3

> Anti-join must not be pushed below aggregation with ambiguous predicates
>
>                 Key: SPARK-41162
>                 URL: https://issues.apache.org/jira/browse/SPARK-41162
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.3, 3.3.1, 3.2.3, 3.4.0
>            Reporter: Enrico Minack
>            Priority: Major
>              Labels: correctness
>
> The following query should return a single row as all values for {{id}}
> except for the largest will be eliminated by the anti-join:
> {code}
> val ids = Seq(1, 2, 3).toDF("id").distinct()
> val result = ids.withColumn("id", $"id" + 1).join(ids, "id", "left_anti").collect()
> assert(result.length == 1)
> {code}
> Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the
> assertion should still hold but is false.
> Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left
> {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never
> be true.
> {code}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin ===
> !Join LeftAnti, (id#752 = id#750)                  'Aggregate [id#750], [(id#750 + 1) AS id#752]
> !:- Aggregate [id#750], [(id#750 + 1) AS id#752]   +- 'Join LeftAnti, ((id#750 + 1) = id#750)
> !:  +- LocalRelation [id#750]                         :- LocalRelation [id#750]
> !+- Aggregate [id#750], [id#750]                      +- Aggregate [id#750], [id#750]
> !   +- LocalRelation [id#750]                            +- LocalRelation [id#750]
> {code}
> The optimizer then rightly removes the left-anti join altogether, returning
> the left child only.
> Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that
> reference left *and* right child.