felixloesing opened a new pull request, #10541: URL: https://github.com/apache/incubator-gluten/pull/10541
This pull request introduces a safer and more robust approach for handling Spark's `BroadcastMode` during serialization. The main improvement is the introduction of a new `SafeBroadcastMode` abstraction and related utilities, which help avoid serialization issues that caused a Stackoverflow exception during broadcast exchanges. BroadcastMode was introduced in this [PR](https://github.com/apache/incubator-gluten/pull/8116) that caused the issue we observed. HashedRelationBroadcastMode embeds Catalyst expression trees, which are not safe to Kryo-serialize when running with `spark.kryo.referenceTracking=false` (default internally). With this change, the broadcast payload now contains only primitives and byte arrays (no Catalyst trees). For bound keys, we serialize just column ordinals (+ null-aware flag) and for computed keys (e.g., upper(col)), we serialize the key expressions once as Java bytes and deserialize only where needed to build projections. #### Test Plan Ran internal test set (50 queries) and ran other query specifically checking if `spark.gluten.velox.offHeapBroadcastBuildRelation.enabled=true;` works. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
