pan3793 commented on PR #54477: URL: https://github.com/apache/spark/pull/54477#issuecomment-4404208498
This is now the last known blocker for Spark to support JDK 25. I have tried my best to push the Apache DataSketches community to release datasketches-memory 3.0.3, which includes https://github.com/apache/datasketches-memory/pull/272, but unfortunately that has not happened.

For me, JDK 25 is a great improvement for Spark workloads, especially [JEP 423: Region Pinning for G1](https://openjdk.org/jeps/423), which eliminates the [G1 OOM issue](https://github.com/apache/spark/pull/51796) that happens on JDK 17 and 21. We have run Spark with JDK 25 internally for a few months with the patched `datasketches-memory`.

There are a few approaches I can imagine to help move this forward:

1. Publish a forked `datasketches-memory` 3.0.3 to Maven Central under a different group ID. Spark did something similar for Hive 1.2.1. This is the fastest option; I can complete it in one day.
2. Shade the `datasketches-*` libraries in Spark and override the class `org.apache.datasketches.memory.internal.ResourceImpl`.
3. A variant of 2: copy `org.apache.datasketches.memory.internal.ResourceImpl` into the Spark codebase, and in `dev/make-distribution.sh` use the `zip` command to delete `ResourceImpl.class` from `datasketches-memory-3.0.2.jar`.

@LuciferYang @dongjoon-hyun @huaxingao @wangyum @dtenedor @cboumalh @medb I would love to hear your thoughts on this. Thank you in advance.
