This is super exciting, thank you Naohiro.

I also think ALP[1] (built on FastLanes[2]) is a great encoding to explore.
Getting a Java-based implementation of ALP would be a great validation that
the approach works well across platforms. There are open source
implementations in both C/C++[3] and Rust (via vortex)[4] that we could use
to benchmark / build prototypes.

Andrew

[1]: https://ir.cwi.nl/pub/33334/33334.pdf
[2]: https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf
[3]: https://github.com/cwida/FastLanes/tree/4014a3a51083a06b6d446fb78e446494721aa12b/src/alp
[4]: https://github.com/vortex-data/vortex/blob/153040140e72d9038f5c092e6c6348c28a462211/encodings/alp/src/lib.rs#L4

On Fri, Oct 3, 2025 at 12:22 AM [email protected] <[email protected]> wrote:

> Hi Andrew,
>
> I'm Naohiro, and I'm the person Julien has been in touch with. I was
> planning to attend the sync yesterday but unfortunately missed it due to
> the timezone difference. (I’m in Japan)
>
> Thanks for kicking off this discussion, I'm definitely interested in
> contributing.
>
> To start with, I'm currently working on a POC in parquet-java to evaluate
> ALP. While ALP and floating-point compression are my main focus at the
> moment, I'm also interested in exploring other encoding strategies that
> could benefit Parquet. I'm also drafting a proposal in Google Docs, and
> once it's ready, I'll share the link.
>
> I'd love to hear if others are working on similar efforts, especially
> around floating-point compression, to avoid duplication and potentially
> collaborate.
>
> On 2025/10/01 18:11:51 Andrew Lamb wrote:
> > I would like to start a discussion to help organize and rally anyone
> > interested in adding new encodings to Parquet.
> >
> > I am pretty sure there are many people interested in adding new
> > encodings, but there are only a few mentions on the mailing list, such
> > as pcode [1] and FSST/ALP/FastLanes [2]. Prateek mentioned on the sync
> > call today that he is working on evaluating some potential encodings and
> > hopes to have some information to share soon, and Julien mentioned he
> > had spoken to someone else who might be doing something similar.
> >
> > Now that Julien has defined a process to extend the spec[3] I think the
> > steps are much clearer.
> >
> > So, I would like to invite anyone interested in adding new encodings to
> > respond and let us know if you are willing to help evaluate new
> > encodings and prototype integrations into Parquet implementations?
> >
> > Andrew
> >
> > [1]: https://lists.apache.org/thread/bdmfcj4g6y1ccd3mfgrp7d43d73s6zf6
> > [2]: https://lists.apache.org/thread/s3o9jk0hr942pv6ono4ymnvvj6pfdsdw
> > [3]: https://github.com/apache/parquet-format/blob/master/proposals/README.md
