ahshahid opened a new pull request, #48094:
URL: https://github.com/apache/spark/pull/48094

   ### What changes were proposed in this pull request?
   A trait, UnionEquality, is introduced and implemented by the Union and 
UnionExec nodes. It contains code to check equality of the Union node's legs 
in an order-agnostic manner, and to compute a hashCode that is independent of 
the order of the legs. The equality check also verifies that the output 
attributes of the head nodes match in name, data type, metadata, nullability, 
etc. (but not exprIDs).
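
   The idea can be illustrated with a minimal, self-contained sketch. It does 
not use Spark classes and is not the code from this PR: `Attr`, `Leg` and 
`UnionSketch` are hypothetical stand-ins, metadata is omitted for brevity, and 
comparing legs by occurrence count is an assumption made for the example.

```scala
// Hypothetical stand-ins for Spark's plan classes; none of these names come from the PR.
final case class Attr(name: String, dataType: String, nullable: Boolean, exprId: Long) {
  // Everything except exprId takes part in the comparison.
  def signature: (String, String, Boolean) = (name, dataType, nullable)
}

final case class Leg(table: String)

final class UnionSketch(val output: Seq[Attr], val legs: Seq[Leg]) {
  // Count occurrences of each leg so that duplicate legs are respected.
  private def legCounts: Map[Leg, Int] =
    legs.groupBy(identity).map { case (leg, same) => leg -> same.size }

  override def equals(other: Any): Boolean = other match {
    case that: UnionSketch =>
      output.map(_.signature) == that.output.map(_.signature) &&
        legCounts == that.legCounts
    case _ => false
  }

  // Agrees with equals: independent of leg order and of exprId.
  override def hashCode: Int = (output.map(_.signature), legCounts).hashCode
}

object UnionEqualityDemo {
  def main(args: Array[String]): Unit = {
    val a = new UnionSketch(Seq(Attr("id", "int", false, 1L)), Seq(Leg("t1"), Leg("t2")))
    val b = new UnionSketch(Seq(Attr("id", "int", false, 7L)), Seq(Leg("t2"), Leg("t1")))
    println(a == b)                   // true
    println(a.hashCode == b.hashCode) // true
  }
}
```

   Equal occurrence counts imply equal lengths, so this sketch needs no 
separate length check.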
   Converting the sequence of legs into a Set to get an order-agnostic 
hashCode does mean that, for example, Seq(leg1, leg2) and 
Seq(leg1, leg2, leg2) have the same hashCode. That should not cause a logical 
problem, because equality also checks the length.
   If we want to avoid the hash collision in that situation, the code can be 
changed to
   java.util.Objects.hash(Seq(leg1, leg2).map(_.hashCode).sorted.map(Int.box): _*)
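
   The trade-off between the two hashing strategies can be seen in a small 
sketch. Legs are modelled as plain strings here, and `setHash` and 
`sortedHash` are hypothetical helper names, not part of the PR. The sorted 
variant uses the standard Seq hashCode rather than java.util.Objects, which is 
a substitution made to keep the example short.

```scala
object LegHashDemo {
  // Set-based: order-agnostic, but duplicate legs collapse into one element.
  def setHash(legs: Seq[String]): Int = legs.toSet.hashCode

  // Sorted-hash-based: order-agnostic, and duplicates still contribute.
  def sortedHash(legs: Seq[String]): Int = legs.map(_.hashCode).sorted.hashCode

  def main(args: Array[String]): Unit = {
    val two = Seq("leg1", "leg2")
    val swapped = Seq("leg2", "leg1")
    val three = Seq("leg1", "leg2", "leg2")

    println(setHash(two) == setHash(swapped))       // true: same set
    println(setHash(two) == setHash(three))         // true: the duplicate is lost
    println(sortedHash(two) == sortedHash(swapped)) // true: same sorted hashes
    // Expected to differ, although no hash function rules out collisions.
    println(sortedHash(two) == sortedHash(three))
  }
}
```

   Either strategy keeps hashCode consistent with an order-agnostic equals; 
the sorted variant only reduces collisions between inputs that differ in how 
often a leg is repeated.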
   
   
   ### Why are the changes needed?
   With the current equality behavior of Union nodes, changing the order of 
the legs causes cache misses and prevents exchange reuse, because the 
canonicalized plans do not match.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added tests to check the equality of Union and UnionExec nodes whose legs 
are in a different order.
   Added a test to verify cache lookup of InMemoryRelation and reuse of exchange.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   

