pan3793 commented on PR #54477:
URL: https://github.com/apache/spark/pull/54477#issuecomment-4404208498

   This is now the last known blocker for Spark to support JDK 25. I have tried 
my best to push the Apache DataSketches community to release 
datasketches-memory 3.0.3, which includes 
https://github.com/apache/datasketches-memory/pull/272, but unfortunately that 
has not happened.
   
   For me, JDK 25 is a great improvement for Spark workloads, especially [JEP 
423: Region Pinning for G1](https://openjdk.org/jeps/423), which eliminates the 
[G1 OOM issue](https://github.com/apache/spark/pull/51796) that happens on JDK 
17 and 21. We have run Spark with JDK 25 internally for a few months with the 
patched `datasketches-memory`.
   
   There are three approaches I can imagine to help move this forward:
   
   1. Fork `datasketches-memory` 3.0.3 and deploy it to Maven Central under a 
different group name. Spark used to do a similar thing for Hive 1.2.1. This is 
the fastest way; I can complete it in one day.
   2. Shade the datasketches* libs in Spark and override the class 
`org.apache.datasketches.memory.internal.ResourceImpl`.
   3. A variant of 2: copy 
`org.apache.datasketches.memory.internal.ResourceImpl` into the Spark codebase 
and, in `dev/make-distribution.sh`, use the `zip` command to delete 
`ResourceImpl.class` from `datasketches-memory-3.0.2.jar`.
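
To make option 3 concrete, here is a rough Python sketch of the jar surgery it describes: copy every entry of the jar except `ResourceImpl.class` into a new archive. This is only an illustration under assumptions; the actual change would use the `zip` command inside `dev/make-distribution.sh`, and the function name `strip_entry` and the jar paths are hypothetical.

```python
import zipfile

# Entry path derived from the class name mentioned in option 3.
RESOURCE_IMPL = "org/apache/datasketches/memory/internal/ResourceImpl.class"


def strip_entry(src_jar: str, dst_jar: str, entry: str = RESOURCE_IMPL) -> bool:
    """Copy src_jar to dst_jar, omitting `entry`.

    Returns True if the entry was present and dropped, False otherwise.
    (Hypothetical helper, not part of Spark's build scripts.)
    """
    removed = False
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == entry:
                removed = True  # skip the class Spark would supply itself
                continue
            # Reuse the original ZipInfo so names and timestamps are preserved.
            dst.writestr(info, src.read(info.filename))
    return removed
```

With the class removed from the jar, the copy of `ResourceImpl` compiled from the Spark codebase would be the only one on the classpath, which avoids depending on classpath ordering to pick the patched version.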
   
   @LuciferYang @dongjoon-hyun @huaxingao @wangyum @dtenedor @cboumalh @medb 
I would love to know your thoughts about this. Thank you in advance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

