wForget opened a new pull request, #10733: URL: https://github.com/apache/incubator-gluten/pull/10733
Backport #10541 to branch-1.5

## What changes are proposed in this pull request?

This pull request introduces a safer and more robust approach for handling Spark's `BroadcastMode` during serialization. The main improvement is a new `SafeBroadcastMode` abstraction and related utilities, which avoid the serialization issues that caused a `StackOverflowError` during broadcast exchanges.

The `BroadcastMode` usage that caused the issue we observed was introduced in this PR. `HashedRelationBroadcastMode` embeds Catalyst expression trees, which are not safe to Kryo-serialize when running with `spark.kryo.referenceTracking=false` (the default internally).

With this change, the broadcast payload contains only primitives and byte arrays (no Catalyst trees):

- For bound keys, we serialize just the column ordinals (plus a null-aware flag).
- For computed keys (e.g., `upper(col)`), we serialize the key expressions once as Java bytes and deserialize them only where needed to build projections.

(cherry picked from commit 91c52e15f16593747e918145258ebe1408cb8ea2)
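
To illustrate the payload shape described above, here is a minimal, self-contained sketch in plain Java. It is not the code from this PR: `SafePayloadSketch`, `Payload`, `UpperOfColumn`, and all method names are hypothetical, and `UpperOfColumn` is only a toy stand-in for a Catalyst key expression. The sketch assumes the idea is roughly "keep the broadcast object flat, and hide any expression tree behind an opaque byte array that is decoded lazily".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; none of these names come from the Gluten code base.
final class SafePayloadSketch {

    // Toy stand-in for a computed key expression such as upper(col).
    static final class UpperOfColumn implements Serializable {
        private static final long serialVersionUID = 1L;
        final int ordinal;

        UpperOfColumn(int ordinal) {
            this.ordinal = ordinal;
        }

        String eval(String[] row) {
            String value = row[ordinal];
            return value == null ? null : value.toUpperCase(Locale.ROOT);
        }
    }

    // Flat payload: a boolean, an int array, and an opaque byte array.
    // A field-by-field serializer never has to walk an expression tree here.
    static final class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        final boolean nullAware;
        final int[] boundOrdinals; // set when keys are plain column references
        final byte[] exprBytes;    // set when keys are computed; null otherwise

        Payload(boolean nullAware, int[] boundOrdinals, byte[] exprBytes) {
            this.nullAware = nullAware;
            this.boundOrdinals = boundOrdinals;
            this.exprBytes = exprBytes;
        }
    }

    // Bound keys: only the column ordinals and the null-aware flag are kept.
    static Payload forBoundKeys(int[] ordinals, boolean nullAware) {
        return new Payload(nullAware, ordinals.clone(), null);
    }

    // Computed keys: the expressions are Java-serialized once into bytes.
    static Payload forComputedKeys(List<UpperOfColumn> exprs, boolean nullAware) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new ArrayList<>(exprs));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Payload(nullAware, new int[0], buffer.toByteArray());
    }

    // Decode the expressions only at the point where projections are built.
    @SuppressWarnings("unchecked")
    static List<UpperOfColumn> decodeExprs(Payload payload) {
        if (payload.exprBytes == null) {
            return new ArrayList<>();
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(payload.exprBytes))) {
            return (List<UpperOfColumn>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch a caller would build the payload with `forBoundKeys(new int[] {0, 2}, true)` or `forComputedKeys(exprs, false)`, ship it, and call `decodeExprs` only on the side that needs to evaluate the keys. The design point is that the outer serializer only ever sees the flat `Payload` fields, regardless of how deep the original expression tree was.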
