[ https://issues.apache.org/jira/browse/SPARK-38868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527585 ]

Apache Spark commented on SPARK-38868:
--------------------------------------

User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/36341

> `assert_true` fails unconditionally after `left_outer` joins
> ------------------------------------------------------------
>
>                 Key: SPARK-38868
>                 URL: https://issues.apache.org/jira/browse/SPARK-38868
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.1.1, 3.1.2, 3.2.0, 3.2.1, 3.3.0, 3.4.0
>            Reporter: Fabien Dubosson
>            Priority: Major
>
> When `assert_true` is used after a `left_outer` join, the assert exception is 
> raised even though all of the rows meet the condition. Using an `inner` join 
> does not expose this issue. (`assert_true` returns NULL when its condition 
> holds, so the `valid` column should be NULL for every row and the filter 
> should keep all rows in both cases.)
>  
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as sf
> session = SparkSession.builder.getOrCreate()
> entries = session.createDataFrame(
>     [
>         ("a", 1),
>         ("b", 2),
>         ("c", 3),
>     ],
>     ["id", "outcome_id"],
> )
> outcomes = session.createDataFrame(
>     [
>         (1, 12),
>         (2, 34),
>         (3, 32),
>     ],
>     ["outcome_id", "outcome_value"],
> )
> # Inner join works as expected
> (
>     entries.join(outcomes, on="outcome_id", how="inner")
>     .withColumn("valid", sf.assert_true(sf.col("outcome_value") > 10))
>     .filter(sf.col("valid").isNull())
>     .show()
> )
> # Left join fails with «'('outcome_value > 10)' is not true!» even though
> # the condition holds for every row
> (
>     entries.join(outcomes, on="outcome_id", how="left_outer")
>     .withColumn("valid", sf.assert_true(sf.col("outcome_value") > 10))
>     .filter(sf.col("valid").isNull())
>     .show()
> ){code}
> Reproduced on `pyspark` versions `3.2.1`, `3.2.0`, `3.1.2`, and `3.1.1`. I am 
> not sure whether "native" Spark exposes this issue as well; I don't have the 
> knowledge/setup to test that.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
