ahshahid opened a new pull request, #48094: URL: https://github.com/apache/spark/pull/48094
### What changes were proposed in this pull request?

A trait, `UnionEquality`, is introduced and implemented by the `Union` and `UnionExec` nodes. It contains code to check equality of Union legs in an order-agnostic manner and to compute a `hashCode` that is independent of the order of the legs. The equality check also requires that the output attributes of the head nodes match in name, data type, metadata, nullability, etc. (but not exprIDs).

Converting the sequence of legs into a set to get an order-agnostic `hashCode` means that, for example, `Seq(leg1, leg2)` and `Seq(leg1, leg2, leg2)` have the same `hashCode`. That should not cause a logical problem, because equality also checks the length. If we want to avoid the hash collision in that situation, the code can be changed to `Objects.hashCode(Seq(leg1, leg2).map(_.hashCode).sorted: _*)`.

### Why are the changes needed?

With the way equality of Union nodes currently behaves, changing the order of the legs causes a cache miss and prevents reuse of exchange, because the canonicalized plans do not match.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added tests to check the equality of Union and UnionExec nodes whose legs are in a different order. Added a test to verify cache lookup of InMemoryRelation and reuse of exchange.

### Was this patch authored or co-authored using generative AI tooling?

No
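
To make the order-agnostic comparison concrete, here is a minimal, self-contained Scala sketch. It is not the code from this PR and does not use Spark's classes: `Attr`, `Leg`, `UnionSketch`, `setHash`, and `sortedHash` are hypothetical names invented for illustration, and a leg's canonicalized plan is assumed to be representable as a plain string. The sketch compares legs as a multiset, which is one possible choice; the description above only states that equality checks the length.

```scala
// Hypothetical stand-in for an output attribute; exprId is deliberately
// left out of the signature used for comparison.
final case class Attr(name: String, dataType: String, nullable: Boolean, exprId: Long) {
  def signature: (String, String, Boolean) = (name, dataType, nullable)
}

// Hypothetical stand-in for one leg of a union. The canonicalized plan is
// assumed to be a string so the example stays self-contained.
final case class Leg(canonicalPlan: String, output: Seq[Attr])

final class UnionSketch(val legs: Seq[Leg]) {
  // Output signature of the head leg, ignoring exprId.
  private def headSignature: Option[Seq[(String, String, Boolean)]] =
    legs.headOption.map(_.output.map(_.signature))

  // How many times each canonical leg occurs (a multiset of legs).
  private def legCounts: Map[String, Int] =
    legs.groupBy(_.canonicalPlan).map { case (plan, group) => plan -> group.size }

  override def equals(other: Any): Boolean = other match {
    case that: UnionSketch =>
      legs.length == that.legs.length &&
        headSignature == that.headSignature &&
        legCounts == that.legCounts
    case _ => false
  }

  // Equal unions have the same multiset of legs, so the sorted per-leg
  // hashes are identical and this stays consistent with equals.
  override def hashCode(): Int = UnionSketch.sortedHash(legs)
}

object UnionSketch {
  // Set-based hash: order-agnostic, but duplicates of a leg collapse.
  def setHash(legs: Seq[Leg]): Int = legs.map(_.canonicalPlan).toSet.hashCode()

  // Sorted-hash variant: order-agnostic and sensitive to repeated legs.
  def sortedHash(legs: Seq[Leg]): Int = legs.map(_.canonicalPlan.hashCode).sorted.hashCode()
}
```

With two legs `l1` and `l2` whose head attributes differ only in `exprId`, `new UnionSketch(Seq(l1, l2))` and `new UnionSketch(Seq(l2, l1))` compare equal and share a hash code, while `setHash` returns the same value for `Seq(l1, l2)` and `Seq(l1, l2, l2)`, which is the collision mentioned in the description.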
