Thanks Prateek. From this content, it looks to me like we are ready to start a vote to explicitly accept ALP into Parquet.
Does anyone know of a reason we should postpone it for longer? Perhaps someone needs some more time to review?

Andrew

On Wed, Apr 22, 2026 at 1:00 PM PRATEEK GAUR <[email protected]> wrote:

> Hi team,
>
> Hope everyone is doing well. I got a chance to work through all the
> remaining feedback and update the spec doc. Here are the new artifacts:
>
> 1) Spec document:
> https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit
>
> 2) Spec document in parquet format repo:
> https://github.com/apache/parquet-format/pull/557
>
> 3) ALP implementation in arrow c++ repo:
> https://github.com/apache/arrow/pull/48345/changes
>
> 4) ALP implementation in parquet-java repo: Work for Vinoo and Julien
> https://github.com/apache/parquet-java/pull/3397
>
> 5) PR with test and benchmarking artifacts in parquet-testing repo:
> https://github.com/apache/parquet-testing/pull/100
>
> And
>
> - Go: Arnav just submitted an in-progress implementation in Go.
>   https://github.com/apache/arrow-go/pull/704 (I haven't started looking
>   at it yet)
> - Rust: I remember Andrew mentioned that this work is also in
>   progress (so 4 languages!)
>
> *Arrow C++ implementation*
>
> The PR is out and was also used by Antoine to report the numbers as
> reported here. Micah and Konstantin have given 1 round of feedback and
> I'm addressing it today. Please note that the default optimization flag
> for compiling is O2 and not O3. I got around 70% performance improvement in
> the decoding speed when using the O3 flag.
>
> *Parquet-MR Java implementation (working with Vinoo and Julien) and
> Cross Language testing*
>
> Let me know if you have any questions or feedback.
>
> Now pasting some performance numbers
>
> Table 1: C++ ALP Double Decode - Spotify Columns (Graviton 3, ARM
> Neoverse V1)
>
> ┌──────────────────┬──────────────┬──────────────┬─────────┐
> │ Column           │ -O2 (MB/s)   │ -O3 (MB/s)   │ Speedup │
> ├──────────────────┼──────────────┼──────────────┼─────────┤
> │ valence          │ 3,155        │ 5,523        │ 1.75x   │
> │ danceability     │ 3,233        │ 5,685        │ 1.76x   │
> │ energy           │ 3,197        │ 5,652        │ 1.77x   │
> │ loudness         │ 3,186        │ 5,473        │ 1.72x   │
> └──────────────────┴──────────────┴──────────────┴─────────┘
>
> On Wed, Feb 25, 2026 at 9:49 AM PRATEEK GAUR <[email protected]> wrote:
>
>> @Micah Kornfield <[email protected]> : Got it.
>>
>> @Andrew Lamb <[email protected]>
>>
>>> Do you think it would be good to start moving the spec development into
>>> markdown format, in preparation for finalizing it?
>>
>> Yes, I'll update the numbers for some of the examples I have in the spec
>> based on the updated header size. Then we should be good to go for the
>> markdown format.
>>
>> Thanks everyone!
>>
>>> Andrew
>>>
>>> On Tue, Feb 17, 2026 at 7:28 PM PRATEEK GAUR <[email protected]> wrote:
>>>
>>> > Hi team,
>>> >
>>> > 1) Andrew
>>> >
>>> >    - Thanks for working on test files. My PR did add all the test files I
>>> >      used to benchmark on datasets. Maybe we can club them together. Will
>>> >      also aid cross language testing.
>>> >    - Kosta Tarasov is working on the Rust implementation. This is great.
>>> >      Thanks
>>> >
>>> > 2) Antoine
>>> >
>>> >    - Thanks a lot for reporting the numbers on AMD. Looks like you are
>>> >      getting 8X the decoding performance of BSS. This is amazing!!
>>> >    - Thanks for acknowledging the sampling design.
>>> >    - I agree with you on FastLanes. In some crude experiments I didn't get
>>> >      a good perf benefit from it on Graviton3 (but maybe there was
>>> >      something wrong with my implementation).
>>> >    - Locking the 16bit exception encoding for the spec in this case.
>>> >    - Awesome, I think we have solved for all open questions minus the
>>> >      version byte :). (Will get back on this soon.)
>>> >
>>> > 3) Micah
>>> >
>>> >    - FastLanes: The current spec does allow for using FastLanes with the
>>> >      configurable enum value for layout. We should be able to inject any
>>> >      layout in the current design.
>>> >
>>> > Working on resolving all remaining open comments on the spec this week.
>>> >
>>> > Best
>>> > Prateek
>>> >
>>> > On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <[email protected]>
>>> > wrote:
>>> >
>>> > > On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <[email protected]>
>>> > > wrote:
>>> > >
>>> > > > It looks like the actual issue described for ORC in the paper is
>>> > > > that it has multiple sub-encodings in a batch. This is different
>>> > > > than the design proposed here where there is still fixed encoding
>>> > > > per page in parquet. Given reasonably sized pages I don't think
>>> > > > branch misprediction should be a big issue for new encodings. I
>>> > > > agree that we should be conservative in general for adding new
>>> > > > encodings.
>>> > > >
>>> > > +1
