[ 
https://issues.apache.org/jira/browse/SPARK-40362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Asif updated SPARK-40362:
-------------------------
    Description: 
In the canonicalization code, which now runs in two stages, canonicalization involving commutative operators is broken when they are subexpressions of certain types of expressions that override precanonicalize.

Consider the following expression:

a + b > 10

When this GreaterThan expression is canonicalized as a whole for the first time, the call to Canonicalize.reorderCommutativeOperators is skipped for the Add expression, because GreaterThan's canonicalization invokes precanonicalize on its children (here, the Add expression).

 

So if we create a new expression

b + a > 10

and invoke canonicalize on it, the canonicalized versions of these two expressions will not match.
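The mismatch can be reproduced on a toy expression tree that mimics the two-stage scheme. This is a simplified sketch with hypothetical names, not Catalyst's actual Expression API; sorting operands by toString stands in for Canonicalize.reorderCommutativeOperators:

```scala
// Simplified model of two-stage canonicalization (hypothetical classes,
// not Spark's): precanonicalize only recurses into children, and only a
// top-level canonicalize call reorders commutative operands.
sealed trait Expr {
  // Stage 1: recursively precanonicalize children; no reordering here.
  def precanonicalize: Expr = this match {
    case GreaterThan(l, r) => GreaterThan(l.precanonicalize, r.precanonicalize)
    case Add(l, r)         => Add(l.precanonicalize, r.precanonicalize)
    case leaf              => leaf
  }
  // Stage 2: reorder commutative operands, but only at the top node.
  def canonicalize: Expr = this.precanonicalize match {
    case Add(l, r) =>
      val Seq(a, b) = Seq(l, r).sortBy(_.toString) // stand-in for reorderCommutativeOperators
      Add(a, b)
    // A GreaterThan root only precanonicalizes its Add child, so the
    // child's operand order is never normalized: the reported bug.
    case other => other
  }
}
case class Attr(name: String) extends Expr
case class Lit(v: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr

// a + b > 10 vs b + a > 10 canonicalize to different trees.
val c1 = GreaterThan(Add(Attr("a"), Attr("b")), Lit(10)).canonicalize
val c2 = GreaterThan(Add(Attr("b"), Attr("a")), Lit(10)).canonicalize
assert(c1 != c2)
```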

The following test demonstrates the problem:
{quote}test("bug X") {
  val tr1 = LocalRelation('c.int, 'b.string, 'a.int)

  val y = tr1.where('a.attr + 'c.attr > 10).analyze
  val fullCond = y.asInstanceOf[Filter].condition.clone()
  val addExpr = (fullCond match {
    case GreaterThan(x: Add, _) => x
    case LessThan(_, x: Add) => x
  }).clone().asInstanceOf[Add]
  val canonicalizedFullCond = fullCond.canonicalized

  // swap the operands of Add
  val newAddExpr = Add(addExpr.right, addExpr.left)

  // build a new condition identical to the previous one, but with the operands of Add reversed
  val builtCondnCanonicalized = GreaterThan(newAddExpr, Literal(10)).canonicalized

  assertEquals(canonicalizedFullCond, builtCondnCanonicalized)
}
{quote}
This test fails.

The fix I propose is that, for commutative expressions, precanonicalize should be overridden and Canonicalize.reorderCommutativeOperators invoked there, instead of in canonicalize. Effectively, for commutative operators (Add, Or, Multiply, And, etc.), canonicalize and precanonicalize should be the same.
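On the same toy model, the proposed change amounts to doing the reordering already in precanonicalize for commutative nodes, so that for them canonicalize and precanonicalize coincide. Again a sketch with hypothetical names, not the actual Catalyst code:

```scala
// Sketch of the proposed fix: a commutative node normalizes its operand
// order in precanonicalize itself, so any parent that precanonicalizes
// its children already sees the reordered form.
sealed trait Expr {
  def precanonicalize: Expr = this match {
    case Add(l, r) =>
      // Fix: reorder commutative operands here, not only in canonicalize.
      val Seq(a, b) = Seq(l.precanonicalize, r.precanonicalize).sortBy(_.toString)
      Add(a, b)
    case GreaterThan(l, r) => GreaterThan(l.precanonicalize, r.precanonicalize)
    case leaf              => leaf
  }
  // For commutative operators, canonicalize adds nothing on top.
  def canonicalize: Expr = precanonicalize
}
case class Attr(name: String) extends Expr
case class Lit(v: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr

// a + b > 10 and b + a > 10 now canonicalize to the same tree.
val c1 = GreaterThan(Add(Attr("a"), Attr("b")), Lit(10)).canonicalize
val c2 = GreaterThan(Add(Attr("b"), Attr("a")), Lit(10)).canonicalize
assert(c1 == c2)
```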



PR:

https://github.com/apache/spark/pull/37824

  was:
In the canonicalization code, which now runs in two stages, canonicalization involving commutative operators is broken when they are subexpressions of certain types of expressions that override precanonicalize.

Consider the following expression:

a + b > 10

When this GreaterThan expression is canonicalized as a whole for the first time, the call to Canonicalize.reorderCommutativeOperators is skipped for the Add expression, because GreaterThan's canonicalization invokes precanonicalize on its children (here, the Add expression).

 

So if we create a new expression

b + a > 10

and invoke canonicalize on it, the canonicalized versions of these two expressions will not match.

The following test demonstrates the problem:
{quote}
test("bug X") {
  val tr1 = LocalRelation('c.int, 'b.string, 'a.int)

  val y = tr1.where('a.attr + 'c.attr > 10).analyze
  val fullCond = y.asInstanceOf[Filter].condition.clone()
  val addExpr = (fullCond match {
    case GreaterThan(x: Add, _) => x
    case LessThan(_, x: Add) => x
  }).clone().asInstanceOf[Add]
  val canonicalizedFullCond = fullCond.canonicalized

  // swap the operands of Add
  val newAddExpr = Add(addExpr.right, addExpr.left)

  // build a new condition identical to the previous one, but with the operands of Add reversed
  val builtCondnCanonicalized = GreaterThan(newAddExpr, Literal(10)).canonicalized

  assertEquals(canonicalizedFullCond, builtCondnCanonicalized)
}
{quote}

This test fails.

The fix I propose is that, for commutative expressions, precanonicalize should be overridden and Canonicalize.reorderCommutativeOperators invoked there, instead of in canonicalize. Effectively, for commutative operators (Add, Or, Multiply, And, etc.), canonicalize and precanonicalize should be the same.
I will file a PR for it.


> Bug in Canonicalization of expressions like Add & Multiply i.e Commutative 
> Operators
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-40362
>                 URL: https://issues.apache.org/jira/browse/SPARK-40362
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Asif
>            Priority: Major
>              Labels: spark-sql
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
