Asif created SPARK-46671:
----------------------------

             Summary: InferFiltersFromConstraint rule is creating a redundant filter
                 Key: SPARK-46671
                 URL: https://issues.apache.org/jira/browse/SPARK-46671
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Asif


While bringing my old PR, which uses a different approach to the 
ConstraintPropagation algorithm 
([SPARK-33152|https://issues.apache.org/jira/browse/SPARK-33152]), in sync with 
current master, I noticed a test failure in my branch for SPARK-33152.
The failing test is in InferFiltersFromConstraintsSuite:
{code}
  test("SPARK-43095: Avoid Once strategy's idempotence is broken for batch: Infer Filters") {
    val x = testRelation.as("x")
    val y = testRelation.as("y")
    val z = testRelation.as("z")

    // Removes EqualNullSafe when constructing candidate constraints
    comparePlans(
      InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
        .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
      x.select($"x.a", $"x.a".as("xa"))
        .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)

    // Once strategy's idempotence is not broken
    val originalQuery =
      x.join(y, condition = Some($"x.a" === $"y.a"))
        .select($"x.a", $"x.a".as("xa")).as("xy")
        .join(z, condition = Some($"xy.a" === $"z.a")).analyze

    val correctAnswer =
      x.where($"a".isNotNull).join(y.where($"a".isNotNull), condition = Some($"x.a" === $"y.a"))
        .select($"x.a", $"x.a".as("xa")).as("xy")
        .join(z.where($"a".isNotNull), condition = Some($"xy.a" === $"z.a")).analyze

    val optimizedQuery = InferFiltersFromConstraints(originalQuery)
    comparePlans(optimizedQuery, correctAnswer)
    comparePlans(InferFiltersFromConstraints(optimizedQuery), correctAnswer)
  }
{code}

I believe the assertion shown below, taken from the above test, is not correct: 
it expects a redundant filter to be created.
Of these two isNotNull constraints, only one should be created:

$"xa".isNotNull && $"x.a".isNotNull

The presence of (xa#0 = a#0) already implies that if one attribute is not null, 
the other must also be not null.

    // Removes EqualNullSafe when constructing candidate constraints
    comparePlans(
      InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
        .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
      x.select($"x.a", $"x.a".as("xa"))
        .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)
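
The implication claimed here can be checked with a small model of SQL's 
three-valued equality. This is only a sketch in plain Scala, not Spark or 
Catalyst code: Option[Int] stands in for a nullable column, and every name in 
it (NullSemanticsSketch, sqlEq, passes, keep) is hypothetical.

```scala
// Illustrative model only: plain Scala, not Spark or Catalyst code.
// All names here (NullSemanticsSketch, sqlEq, passes, keep) are hypothetical.
object NullSemanticsSketch {
  type Col = Option[Int] // None stands in for SQL NULL

  // SQL `=` under three-valued logic: the result is NULL if either operand is NULL.
  def sqlEq(l: Col, r: Col): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // A filter keeps a row only when its condition is true (not false, not NULL).
  def passes(cond: Option[Boolean]): Boolean = cond.contains(true)

  val samples: Seq[Col] = Seq(None, Some(1), Some(2))

  // Rows (xa, a) that survive `xa = a` combined with an extra null check.
  def keep(nullCheck: (Col, Col) => Boolean): Seq[(Col, Col)] =
    for {
      xa <- samples
      a <- samples
      if nullCheck(xa, a) && passes(sqlEq(xa, a))
    } yield (xa, a)

  val noNullCheck: Seq[(Col, Col)] = keep((_, _) => true)
  val oneNullCheck: Seq[(Col, Col)] = keep((xa, _) => xa.isDefined)
  val bothNullChecks: Seq[(Col, Col)] = keep((xa, a) => xa.isDefined && a.isDefined)
}
```

In this model, all three variants keep the same rows, so once the equality 
predicate is present the second isNotNull check filters out nothing further.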

This is not a big issue, but it highlights the need to revisit the 
ConstraintPropagation code and related code.

I am filing this Jira so that the constraint code can be tightened and made more robust.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

