Hello Apache DataSketches PMC and Community,

This is a call for vote to release Apache DataSketches-java candidate version: 9.0.0-RC1
- This is the core Java component of the DataSketches library that includes all the sketch algorithms in production-ready packages. These sketches can be called directly from this component or used in conjunction with adaptor components such as Hadoop Pig, Hadoop Hive, or the aggregator adaptors built into Apache Druid.

Major changes with this release:

This is a major release in which we took the opportunity to do some significant refactoring that introduces incompatible changes from previous releases. Any incompatibility with prior releases is always an inconvenience to users who wish to just upgrade to the latest release and run. However, some of the code in this library was written in 2013, and the Java language has evolved enormously since then. We chose to use this major release as the opportunity to modernize some of the code to achieve the following goals:

*Remove the dependency on the DataSketches-Memory component and use FFM instead.*

- The DataSketches-Memory component was originally developed in 2014 to address the need for fast access to off-heap memory data structures. It used Unsafe and other JVM internals because there were no satisfactory Java language features to do this at the time.
- The FFM (Foreign Function & Memory API) capabilities introduced into the language in Java 22 are now part of the Java 25 LTS release, which we support. Since the capabilities of FFM are a superset of the original DataSketches-Memory component, it made sense to rewrite the code to eliminate the dependency on DataSketches-Memory and use FFM instead. This impacted code across the entire library.
- This provided several advantages to the code base. By removing the dependency on DataSketches-Memory, there are now no runtime dependencies! This should make integrating this library into other Java systems much simpler. Since FFM is tightly integrated into the Java language, it has improved performance, especially with bulk operations.
- As an added note: There are numerous other improvements to the Java language that we could perhaps take advantage of in a rewrite, e.g., records, text blocks, switch expressions, sealed classes, var, modules, pattern matching, etc. However, faced with the risk of accidentally creating bugs due to too many changes at one time, we focused on FFM, which actually improved performance as opposed to just creating syntactic sugar.

*Align public sketch class names so that the sketch family name is part of the class name.*

- For example, the Theta sketch family was the first family written for the library and its base class was called *Sketch*. The Tuple sketch family evolved soon after and its base class was also called *Sketch*. If a user wanted to use both the Theta and Tuple families in the same class, one of them had to be fully qualified every time it was referenced.
- Unfortunately, this style propagated to some of the other early sketch families, where we ended up with two different sketch families that each had an *ItemsSketch*, etc. For the more recent additions to the library we started including the sketch family name in all the relevant sketch-like public classes of a sketch family.
- In this release we have refactored these older sketches with new names that now include the sketch family name. This is an incompatible change for user code moving from earlier releases, but it can be readily fixed with search-and-replace tools. This release is not perfect, but it is hopefully more consistent across all the different sketch families.

Known Issues:

*SpotBugs*

- Make sure you configure SpotBugs with the /tools/FindBugsExcludeFilter.xml file. Otherwise, you may get a lot of false-positive or low-risk issues that we have examined and eliminated with this exclusion file.

*Checkstyle*

- At the time of this writing, Checkstyle had not been upgraded to handle Java 25 features.
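
For reviewers less familiar with FFM, here is a minimal, self-contained sketch of the kind of off-heap access and bulk copy that the FFM API (java.lang.foreign) offers on Java 22 or later. It is not code from this release and does not use any DataSketches classes; the names FfmOffHeapDemo and sumViaOffHeap are hypothetical and chosen only for illustration.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical demo class, not part of DataSketches.
final class FfmOffHeapDemo {

  // Writes the values 1..n into off-heap memory, bulk-copies them
  // into a heap array, and returns their sum.
  static long sumViaOffHeap(final int n) {
    // A confined arena frees its off-heap memory when the try block exits.
    try (Arena arena = Arena.ofConfined()) {
      // Allocate n longs off-heap, aligned for 8-byte access.
      final MemorySegment seg = arena.allocate((long) n * Long.BYTES, Long.BYTES);

      // Element-wise writes, bounds-checked by the JVM (no Unsafe needed).
      for (int i = 0; i < n; i++) {
        seg.setAtIndex(ValueLayout.JAVA_LONG, i, i + 1L);
      }

      // Bulk copy from the off-heap segment into a heap array.
      final long[] heap = new long[n];
      MemorySegment.copy(seg, ValueLayout.JAVA_LONG, 0L, heap, 0, n);

      long sum = 0;
      for (final long v : heap) {
        sum += v;
      }
      return sum;
    }
  }

  public static void main(final String[] args) {
    System.out.println(sumViaOffHeap(4));
  }
}
```

Calling sumViaOffHeap(4) writes 1, 2, 3, 4 off-heap and returns 10. Only JDK classes are involved, which is the sense in which moving to FFM can leave a library with no runtime dependencies.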
References for this release:

*Source repository:*
https://github.com/apache/datasketches-java

*Git Tag for this release:*
https://github.com/apache/datasketches-java/releases/tag/9.0.0-RC1 on branch 9.0.X

*Git HashId for this release starts with:*
f3b334b on branch 9.0.X

*The Release Candidate / Zip Repository:*
https://dist.apache.org/repos/dist/dev/datasketches/java/9.0.0-RC1

*The public signing key can be found in the KEYS file:*
https://dist.apache.org/repos/dist/dev/datasketches/KEYS

*The artifacts have been signed with --keyid-format SHORT:*
8CD4A902

*Repository: Maven Central [Nexus](http://repository.apache.org) (Jar Artifacts):*
https://repository.apache.org/content/groups/staging/org/apache/datasketches/datasketches-java/9.0.0/

*Build & Test Guide:*
https://github.com/apache/datasketches-java/blob/9.0.0-RC1/README.md

The vote will be performed as follows:

This letter will be published on dev@ and remain open for at least 72 hours (excluding weekends and holidays), AND until at least 3 (+1) PMC votes or a majority of (+1) PMC votes are acquired.

Anyone in the community can vote.

This vote will close no earlier than Monday Dec 1, 2025, 6:00 PM PST.

Please vote accordingly:

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove with the reason

Thanks,
Lee Rhodes
