kevinwilfong commented on PR #10662: URL: https://github.com/apache/incubator-gluten/pull/10662#issuecomment-3286098478
> Does it mean the driver memory can be decreased with this patch because Java serialisation serialises the same object only once?

I suspect the reason is that Spark Java-serializes the GlutenPartitions on demand and does not keep the serialized values in memory for long. In Gluten, we currently Protobuf-serialize the SplitInfos when we create the GlutenPartitions. I see a large number of these GlutenPartitions held in the Driver's memory while the query is running, so all of the serialized SplitInfos are resident at the same time.

If Spark Java-serializes each GlutenPartition only when its Task is ready to execute, and evicts the serialized value from memory as soon as it has been sent to the Executor, then with this change only a relatively small number of serialized values will be present in the Driver's memory at any one time (proportional to the number of Executors).
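To make the memory argument concrete, here is a minimal Java sketch contrasting eager and on-demand serialization. `SplitInfo`, `EagerPartition`, and `LazyPartition` are hypothetical stand-ins, not Gluten's actual classes, and `encode()` only mimics Protobuf serialization with UTF-8 bytes. It also assumes, as the comment itself suspects, that the scheduler Java-serializes a partition only when launching its task.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PartitionSerializationSketch {

    // Hypothetical stand-in for a SplitInfo; not Gluten's real class.
    static final class SplitInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String path;

        SplitInfo(String path) { this.path = path; }

        // Mimics an eager Protobuf encoding with plain UTF-8 bytes (assumption).
        byte[] encode() { return path.getBytes(StandardCharsets.UTF_8); }
    }

    // Eager variant: encoded bytes are created at construction and live as long as the partition.
    static final class EagerPartition implements Serializable {
        private static final long serialVersionUID = 1L;
        final byte[] encodedSplit;

        EagerPartition(SplitInfo split) { this.encodedSplit = split.encode(); }
    }

    // Lazy variant: only the object is held; bytes are produced when the partition is Java-serialized.
    static final class LazyPartition implements Serializable {
        private static final long serialVersionUID = 1L;
        final SplitInfo split;

        LazyPartition(SplitInfo split) { this.split = split; }
    }

    // Total encoded bytes kept alive while the driver references every eager partition.
    static long retainedEncodedBytes(List<EagerPartition> partitions) {
        long total = 0;
        for (EagerPartition p : partitions) {
            total += p.encodedSplit.length;
        }
        return total;
    }

    // Java-serializes one partition at (assumed) task-launch time; the buffer can be dropped once sent.
    static byte[] serializeForLaunch(Serializable partition) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(partition);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Executor-side counterpart: rebuilds the partition from the received bytes.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<EagerPartition> eager = new ArrayList<>();
        List<LazyPartition> lazy = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            SplitInfo split = new SplitInfo("/data/part-" + i + ".parquet");
            eager.add(new EagerPartition(split));
            lazy.add(new LazyPartition(split));
        }
        // Eager: every encoded split is resident at the same time.
        System.out.println("eager, bytes resident together: " + retainedEncodedBytes(eager));

        // Lazy: each buffer is reachable only during its own launch, so launching one task
        // at a time keeps at most one buffer reachable (GC timing aside).
        long largestSingleBuffer = 0;
        for (LazyPartition p : lazy) {
            byte[] wire = serializeForLaunch(p);
            largestSingleBuffer = Math.max(largestSingleBuffer, wire.length);
        }
        System.out.println("lazy, largest single buffer: " + largestSingleBuffer);
    }
}
```

The eager total grows with the number of partitions the driver keeps referenced. The lazy buffers exist only while tasks are being launched, so their combined footprint tracks the number of tasks in flight rather than the total partition count. Whether Spark actually releases them that promptly is the assumption the comment itself flags.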
