This is super exciting, thank you Naohiro!

I also think ALP[1] (built on FastLanes[2]) is a great encoding to explore.

Getting a Java-based implementation of ALP would be a great validation
that the approach works well across platforms. There are open-source
implementations in both C/C++[3] and Rust (via vortex)[4] that we could
use to benchmark / build prototypes.

Andrew

[1]: https://ir.cwi.nl/pub/33334/33334.pdf
[2]: https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf
[3]: https://github.com/cwida/FastLanes/tree/4014a3a51083a06b6d446fb78e446494721aa12b/src/alp
[4]: https://github.com/vortex-data/vortex/blob/153040140e72d9038f5c092e6c6348c28a462211/encodings/alp/src/lib.rs#L4

On Fri, Oct 3, 2025 at 12:22 AM [email protected]
<[email protected]> wrote:

> Hi Andrew,
>
> I'm Naohiro, and I'm the person Julien has been in touch with. I was
> planning to attend the sync yesterday but unfortunately missed it due to
> the timezone difference. (I’m in Japan)
>
> Thanks for kicking off this discussion, I'm definitely interested in
> contributing.
>
> To start with, I'm currently working on a POC in parquet-java to evaluate
> ALP. While ALP and floating-point compression are my main focus at the
> moment, I'm also interested in exploring other encoding strategies that
> could benefit Parquet. I'm also drafting a proposal in Google Docs, and
> once it's ready, I'll share the link.
>
> I'd love to hear if others are working on similar efforts, especially
> around floating-point compression, to avoid duplication and potentially
> collaborate.
>
> On 2025/10/01 18:11:51 Andrew Lamb wrote:
> > I would like to start a discussion to help organize and rally anyone
> > interested in adding new encodings to Parquet.
> >
> > I am pretty sure there are many people interested in adding new
> > encodings, but there are only a few mentions on the mailing list, such
> > as pcode [1] and FSST/ALP/FastLanes [2]. Prateek mentioned on the sync
> > call today that he is working on evaluating some potential encodings
> > and hopes to have some information to share soon, and Julien mentioned
> > he had spoken to someone else who might be doing something similar.
> >
> > Now that Julien has defined a process to extend the spec[3] I think the
> > steps are much clearer.
> >
> > So, I would like to invite anyone interested in adding new encodings to
> > respond and let us know if you are willing to help evaluate new encodings
> > and prototype integrations into Parquet implementations.
> >
> > Andrew
> >
> >
> > [1]: https://lists.apache.org/thread/bdmfcj4g6y1ccd3mfgrp7d43d73s6zf6
> > [2]: https://lists.apache.org/thread/s3o9jk0hr942pv6ono4ymnvvj6pfdsdw
> > [3]: https://github.com/apache/parquet-format/blob/master/proposals/README.md
> >
>
>
