Re: [PR] [CORE] Defer Protobuf serialization of SplitInfos in GlutenPartitions [incubator-gluten]

via GitHub Fri, 12 Sep 2025 10:02:35 -0700


kevinwilfong commented on PR #10662:
URL: 
https://github.com/apache/incubator-gluten/pull/10662#issuecomment-3286098478


   > Does this mean driver memory can be decreased with this patch because 
Java serialization serializes the same object only once?
   
   I suspect the reason is that Spark Java-serializes the GlutenPartitions only 
as needed and does not hold the serialized values in memory for long. In 
Gluten, we currently Protobuf-serialize the SplitInfos when we create the 
GlutenPartitions, and I see a large number of these GlutenPartitions held in 
the Driver's memory while the query is running, so all of the serialized 
SplitInfos exist at the same time. If Spark Java-serializes a GlutenPartition 
only when its Task is ready to execute, and evicts the serialized value from 
memory as soon as it has been sent to the Executor, then with this change only 
a relatively small number of serialized values will be present in the Driver's 
memory at any one time (proportional to the number of Executors).
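
To make the memory argument concrete, here is a minimal sketch of the two strategies as a toy model. It is not Gluten or Spark code: `encodeSplit`, `eagerResidentBytes`, `deferredPeakBytes`, the fixed payload size, and the `concurrentTasks` parameter are all hypothetical names and numbers chosen for illustration. It also takes the behaviour suspected above (serialize at task launch, release after sending) as an assumption rather than something verified against Spark.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: class, method, and parameter names are hypothetical.
final class DeferredSerializationSketch {

    // Stand-in for Protobuf-encoding one SplitInfo into a byte array.
    static byte[] encodeSplit(int payloadBytes) {
        return new byte[payloadBytes];
    }

    // Eager: every partition encodes its split at creation time and keeps
    // the bytes, so all encoded payloads are resident on the driver at once.
    static long eagerResidentBytes(int partitions, int payloadBytes) {
        List<byte[]> heldByPartitions = new ArrayList<>();
        long resident = 0;
        for (int i = 0; i < partitions; i++) {
            byte[] encoded = encodeSplit(payloadBytes);
            heldByPartitions.add(encoded);
            resident += encoded.length;
        }
        return resident;
    }

    // Deferred: a split is encoded only when its task is dispatched, and the
    // bytes are released once sent. Assumption: at most `concurrentTasks`
    // encoded payloads are in flight at any moment.
    static long deferredPeakBytes(int partitions, int payloadBytes, int concurrentTasks) {
        if (concurrentTasks < 1) {
            throw new IllegalArgumentException("concurrentTasks must be >= 1");
        }
        Deque<byte[]> inFlight = new ArrayDeque<>();
        long resident = 0;
        long peak = 0;
        for (int i = 0; i < partitions; i++) {
            if (inFlight.size() == concurrentTasks) {
                // Oldest payload has been sent to an executor; drop it.
                resident -= inFlight.removeFirst().length;
            }
            byte[] encoded = encodeSplit(payloadBytes);
            inFlight.addLast(encoded);
            resident += encoded.length;
            peak = Math.max(peak, resident);
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("eager resident bytes: " + eagerResidentBytes(1000, 1024));
        System.out.println("deferred peak bytes:  " + deferredPeakBytes(1000, 1024, 8));
    }
}
```

In this model, 1000 partitions with 1 KiB payloads keep 1,024,000 bytes resident when encoded eagerly, while the deferred variant peaks at 8,192 bytes with 8 payloads in flight. The real ratio depends on actual SplitInfo sizes and on how Spark schedules and releases serialized tasks.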


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

