[ https://issues.apache.org/jira/browse/SPARK-38868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523176#comment-17523176 ]
Sean R. Owen commented on SPARK-38868:
--------------------------------------

That does seem like a legit bug; I don't know how to debug or address it myself. The results of the two queries are identical.

> `assert_true` fails unconditionally after `left_outer` joins
> -------------------------------------------------------------
>
>                 Key: SPARK-38868
>                 URL: https://issues.apache.org/jira/browse/SPARK-38868
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.1.1, 3.1.2, 3.2.0, 3.2.1
>            Reporter: Fabien Dubosson
>            Priority: Major
>
> When `assert_true` is used after a `left_outer` join, the assert exception is raised even though all the rows meet the condition. Using an `inner` join does not expose this issue.
>
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as sf
>
> session = SparkSession.builder.getOrCreate()
>
> entries = session.createDataFrame(
>     [
>         ("a", 1),
>         ("b", 2),
>         ("c", 3),
>     ],
>     ["id", "outcome_id"],
> )
>
> outcomes = session.createDataFrame(
>     [
>         (1, 12),
>         (2, 34),
>         (3, 32),
>     ],
>     ["outcome_id", "outcome_value"],
> )
>
> # Inner join works as expected
> (
>     entries.join(outcomes, on="outcome_id", how="inner")
>     .withColumn("valid", sf.assert_true(sf.col("outcome_value") > 10))
>     .filter(sf.col("valid").isNull())
>     .show()
> )
>
> # Left join fails with «'('outcome_value > 10)' is not true!»
> # even though the condition holds for every row
> (
>     entries.join(outcomes, on="outcome_id", how="left_outer")
>     .withColumn("valid", sf.assert_true(sf.col("outcome_value") > 10))
>     .filter(sf.col("valid").isNull())
>     .show()
> )
> {code}
> Reproduced on `pyspark` versions `3.2.1`, `3.2.0`, `3.1.2` and `3.1.1`. I am not sure whether "native" Spark exposes this issue as well; I don't have the knowledge/setup to test that.
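To make the "results are identical" observation above concrete, here is a minimal sketch (untested against the affected versions) that compares the two join results row by row. It assumes `entries` and `outcomes` are defined exactly as in the reproduction; on that data every key in `entries` matches, so the left outer join produces no extra or null rows:

{code:python}
# Sketch: verify that both joins produce the same rows on this data.
# Assumes `entries` and `outcomes` from the reproduction above.
inner_rows = sorted(
    entries.join(outcomes, on="outcome_id", how="inner").collect()
)
left_rows = sorted(
    entries.join(outcomes, on="outcome_id", how="left_outer").collect()
)
# Rows sort as tuples, so this compares the full result sets.
assert inner_rows == left_rows
{code}

If this assertion holds, both plans feed the same rows into `assert_true`, which supports the view that the failure comes from how the expression is evaluated under the outer join rather than from the data.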
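Until the root cause is identified, one possible workaround is to validate eagerly on the driver instead of embedding `assert_true` in the plan. This is a sketch only, not verified against the affected versions; note that after a left outer join `outcome_value` can be NULL, and `NULL > 10` evaluates to NULL, so the check guards for NULLs explicitly:

{code:python}
# Workaround sketch: count violating rows instead of using assert_true.
# Assumes `entries`, `outcomes`, and `sf` from the reproduction above.
joined = entries.join(outcomes, on="outcome_id", how="left_outer")
bad = joined.filter(
    ~(sf.col("outcome_value") > 10) | sf.col("outcome_value").isNull()
).count()
if bad:
    raise ValueError(f"{bad} row(s) violate 'outcome_value > 10'")
{code}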