felixloesing opened a new pull request, #10541:
URL: https://github.com/apache/incubator-gluten/pull/10541

   This pull request introduces a safer and more robust approach to handling 
Spark's `BroadcastMode` during serialization. The main improvement is a new 
`SafeBroadcastMode` abstraction and related utilities, which avoid the 
serialization issue that caused a stack overflow exception during broadcast 
exchanges. `BroadcastMode` was introduced in this 
[PR](https://github.com/apache/incubator-gluten/pull/8116), which caused the 
issue we observed: `HashedRelationBroadcastMode` embeds Catalyst expression 
trees, which are not safe to Kryo-serialize when running with 
`spark.kryo.referenceTracking=false` (our internal default).
   
   With this change, the broadcast payload contains only primitives and 
byte arrays (no Catalyst trees). For bound keys, we serialize just the column 
ordinals plus a null-aware flag. For computed keys (e.g., `upper(col)`), we 
serialize the key expressions once as Java bytes and deserialize them only 
where they are needed to build projections.
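
A minimal sketch of the payload shape described above, written as plain Java so it runs without Spark on the classpath. It is not the code in this PR: `SafeBroadcastSketch`, `SafePayload`, and the helper methods are hypothetical names, and plain strings stand in for Catalyst key expressions. It only illustrates the two cases, ordinals plus a flag for bound keys and an opaque Java-serialized byte array for computed keys.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SafeBroadcastSketch {

    /** Hypothetical flat payload: primitives and byte arrays only, no expression trees. */
    static final class SafePayload implements Serializable {
        private static final long serialVersionUID = 1L;
        final boolean nullAware;
        final int[] boundOrdinals; // non-null only for bound keys
        final byte[] keyExprBytes; // non-null only for computed keys

        SafePayload(boolean nullAware, int[] boundOrdinals, byte[] keyExprBytes) {
            this.nullAware = nullAware;
            this.boundOrdinals = boundOrdinals;
            this.keyExprBytes = keyExprBytes;
        }
    }

    /** Bound keys: keep only the column ordinals and the null-aware flag. */
    static SafePayload forBoundKeys(int[] ordinals, boolean nullAware) {
        return new SafePayload(nullAware, ordinals.clone(), null);
    }

    /** Computed keys: Java-serialize the key expressions once into an opaque byte array. */
    static SafePayload forComputedKeys(List<String> keyExprs, boolean nullAware) {
        // Strings are an assumption standing in for real expression objects.
        return new SafePayload(nullAware, null, toJavaBytes(new ArrayList<String>(keyExprs)));
    }

    /** Decode the key expressions only at the point where projections are built. */
    @SuppressWarnings("unchecked")
    static List<String> keyExprs(SafePayload payload) {
        if (payload.keyExprBytes == null) {
            throw new IllegalStateException("payload holds bound keys, not computed keys");
        }
        return (List<String>) fromJavaBytes(payload.keyExprBytes);
    }

    /** Serialize a value with standard Java serialization. */
    static byte[] toJavaBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("Java serialization failed", e);
        }
        return buffer.toByteArray();
    }

    /** Inverse of toJavaBytes. */
    static Object fromJavaBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Java deserialization failed", e);
        }
    }
}
```

Because the outer object exposes only a boolean, an `int[]`, and a `byte[]`, a serializer walking it never descends into a nested expression tree. The expressions stay opaque until `keyExprs` is called.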
   
   #### Test Plan
   Ran our internal test set (50 queries) and ran an additional query 
specifically to check that 
   `spark.gluten.velox.offHeapBroadcastBuildRelation.enabled=true` works.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

