[ https://issues.apache.org/jira/browse/SPARK-46671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805434#comment-17805434 ]

Asif commented on SPARK-46671:
------------------------------

On further thought, I am wrong: there should be two separate isNotNull
constraints.
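
To make the reasoning concrete, here is a simplified, standalone sketch (not Catalyst's actual implementation; the real logic around constructIsNotNullConstraints is more general) of why a null-intolerant equality such as xa = x.a yields one IsNotNull constraint per referenced attribute, i.e. two separate constraints:

{code}
// Standalone Scala sketch only; Catalyst's constraint propagation is more general.
object NotNullInferenceSketch {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class EqualTo(left: Expr, right: Expr) extends Expr        // null-intolerant
  case class EqualNullSafe(left: Expr, right: Expr) extends Expr  // null-tolerant (<=>)
  case class IsNotNull(child: Expr) extends Expr

  // Attributes referenced by null-intolerant predicates only.
  private def nullIntolerantAttrs(e: Expr): Set[Attr] = e match {
    case a: Attr          => Set(a)
    case EqualTo(l, r)    => nullIntolerantAttrs(l) ++ nullIntolerantAttrs(r)
    case _: EqualNullSafe => Set.empty  // <=> tolerates nulls, so nothing is inferred
    case IsNotNull(_)     => Set.empty
  }

  // One IsNotNull per referenced attribute: xa = x.a gives IsNotNull(xa) AND IsNotNull(x.a).
  def inferIsNotNull(constraints: Set[Expr]): Set[Expr] =
    constraints.flatMap(nullIntolerantAttrs).map(a => IsNotNull(a): Expr)

  def main(args: Array[String]): Unit = {
    val constraints: Set[Expr] =
      Set(EqualNullSafe(Attr("xa"), Attr("x.a")), EqualTo(Attr("xa"), Attr("x.a")))
    // Prints both IsNotNull(Attr(xa)) and IsNotNull(Attr(x.a))
    println(inferIsNotNull(constraints))
  }
}
{code}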

> InferFiltersFromConstraint rule is creating a redundant filter
> --------------------------------------------------------------
>
>                 Key: SPARK-46671
>                 URL: https://issues.apache.org/jira/browse/SPARK-46671
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Asif
>            Priority: Minor
>              Labels: SQL, catalyst
>
> While bringing my old PR, which uses a different approach to the
> ConstraintPropagation algorithm
> ([SPARK-33152|https://issues.apache.org/jira/browse/SPARK-33152]), in sync
> with current master, I noticed a test failure in my branch for SPARK-33152.
> The failing test is in InferFiltersFromConstraintsSuite:
> {code}
>   test("SPARK-43095: Avoid Once strategy's idempotence is broken for batch: Infer Filters") {
>     val x = testRelation.as("x")
>     val y = testRelation.as("y")
>     val z = testRelation.as("z")
>     // Removes EqualNullSafe when constructing candidate constraints
>     comparePlans(
>       InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
>         .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
>       x.select($"x.a", $"x.a".as("xa"))
>         .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)
>     // Once strategy's idempotence is not broken
>     val originalQuery =
>       x.join(y, condition = Some($"x.a" === $"y.a"))
>         .select($"x.a", $"x.a".as("xa")).as("xy")
>         .join(z, condition = Some($"xy.a" === $"z.a")).analyze
>     val correctAnswer =
>       x.where($"a".isNotNull).join(y.where($"a".isNotNull), condition = Some($"x.a" === $"y.a"))
>         .select($"x.a", $"x.a".as("xa")).as("xy")
>         .join(z.where($"a".isNotNull), condition = Some($"xy.a" === $"z.a")).analyze
>     val optimizedQuery = InferFiltersFromConstraints(originalQuery)
>     comparePlans(optimizedQuery, correctAnswer)
>     comparePlans(InferFiltersFromConstraints(optimizedQuery), correctAnswer)
>   }
> {code}
> In the above test, I believe the assertion below is not correct: a redundant
> filter is being created. Of the two isNotNull constraints,
> $"xa".isNotNull && $"x.a".isNotNull,
> only one should be created, because the presence of (xa#0 = a#0) automatically
> implies that if one attribute is not null, the other also has to be not null.
> {code}
>     // Removes EqualNullSafe when constructing candidate constraints
>     comparePlans(
>       InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
>         .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
>       x.select($"x.a", $"x.a".as("xa"))
>         .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)
> {code}
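> For illustration, if only one of the two constraints were inferred, the expected
> plan in this assertion would look roughly like the following (a sketch of the
> expectation described above, arbitrarily keeping the constraint on x.a; not a
> proposed test change):
> {code}
>     comparePlans(
>       InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
>         .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
>       x.select($"x.a", $"x.a".as("xa"))
>         .where($"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)
> {code}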
> This is not a big issue, but it highlights the need to take another look at
> the ConstraintPropagation code and related constraint-handling code.
> I am filing this JIRA so that the constraint code can be tightened and made
> more robust.


