Hi Andrew,

I think there is a fair amount of feedback on at least the implementations. Typically, I think we've waited until they are close to mergeable before a final vote. Otherwise, I agree we are very close.
-Micah

On Saturday, April 25, 2026, Andrew Lamb <[email protected]> wrote:

> Thanks Prateek,
>
> From this content, it looks to me like we are ready to start a vote
> to explicitly accept ALP into Parquet.
>
> Does anyone know of a reason we should postpone it for longer?
> Perhaps someone needs some more time to review?
>
> Andrew
>
> On Wed, Apr 22, 2026 at 1:00 PM PRATEEK GAUR <[email protected]> wrote:
>
>> Hi team,
>>
>> Hope everyone is doing well. I got a chance to work through all the
>> remaining feedback and update the spec doc. Here are the new artifacts:
>>
>> 1) Spec document:
>> https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit
>>
>> 2) Spec document in parquet-format repo:
>> https://github.com/apache/parquet-format/pull/557
>>
>> 3) ALP implementation in arrow C++ repo:
>> https://github.com/apache/arrow/pull/48345/changes
>>
>> 4) ALP implementation in parquet-java repo: Work for Vinoo and Julien
>> https://github.com/apache/parquet-java/pull/3397
>>
>> 5) PR with test and benchmarking artifacts in parquet-testing repo:
>> https://github.com/apache/parquet-testing/pull/100
>>
>> And
>>
>> - Go: Arnav just submitted an in-progress implementation in Go.
>>   https://github.com/apache/arrow-go/pull/704 (I haven't started
>>   looking at it yet)
>> - Rust: I remember Andrew mentioned that this work is also in
>>   progress (so 4 languages!)
>>
>> *Arrow C++ implementation*
>>
>> The PR is out and was also used by Antoine to report the numbers as
>> reported here. Micah and Konstantin have given one round of feedback and
>> I'm addressing it today. Please note that the default optimization
>> flag for compiling is O2 and not O3. I got around 70% performance
>> improvement in the decoding speed when using the O3 flag.
>>
>> *Parquet-MR Java implementation (working with Vinoo and Julien) and
>> Cross-Language testing*
>>
>> Let me know if you have any questions or feedback.
>>
>> Now pasting some performance numbers
>>
>> Table 1: C++ ALP Double Decode - Spotify Columns (Graviton 3, ARM
>> Neoverse V1)
>>
>> ┌──────────────────┬──────────────┬──────────────┬─────────┐
>> │ Column           │ -O2 (MB/s)   │ -O3 (MB/s)   │ Speedup │
>> ├──────────────────┼──────────────┼──────────────┼─────────┤
>> │ valence          │ 3,155        │ 5,523        │ 1.75x   │
>> │ danceability     │ 3,233        │ 5,685        │ 1.76x   │
>> │ energy           │ 3,197        │ 5,652        │ 1.77x   │
>> │ loudness         │ 3,186        │ 5,473        │ 1.72x   │
>> └──────────────────┴──────────────┴──────────────┴─────────┘
>>
>> On Wed, Feb 25, 2026 at 9:49 AM PRATEEK GAUR <[email protected]> wrote:
>>
>>> @Micah Kornfield <[email protected]> : Got it.
>>>
>>> @Andrew Lamb <[email protected]>
>>>
>>>> Do you think it would be good to start moving the spec development into
>>>> markdown format, in preparation for finalizing it?
>>>
>>> Yes, I'll update the numbers for some of the examples I have in the spec
>>> based on the updated header size. Then we should be good to go for the
>>> markdown format.
>>>
>>> Thanks everyone!
>>>
>>>> Andrew
>>>>
>>>> On Tue, Feb 17, 2026 at 7:28 PM PRATEEK GAUR <[email protected]>
>>>> wrote:
>>>>
>>>> > Hi team,
>>>> >
>>>> > 1) Andrew
>>>> >
>>>> > - Thanks for working on test files. My PR did add all the test files I
>>>> >   used to benchmark on datasets. Maybe we can club it together. Will also
>>>> >   aid cross-language testing.
>>>> > - Kosta Tarasov working on Rust implementation. This is great. Thanks
>>>> >
>>>> > 2) Antoine
>>>> >
>>>> > - Thanks a lot for reporting the numbers on AMD. Looks like you are
>>>> >   getting 8X the decoding performance of BSS. This is amazing!!
>>>> > - Thanks for acknowledging the sampling design.
>>>> > - I agree with you on FastLanes. In some crude experiments I didn't get
>>>> >   a good perf benefit from it on Graviton3 (but maybe there was something
>>>> >   wrong with my implementation).
>>>> > - Locking the 16-bit exception encoding for the spec in this case.
>>>> > - Awesome, I think we have solved for all open questions minus the
>>>> >   version byte :). (will get back on this soon)
>>>> >
>>>> > 3) Micah
>>>> >
>>>> > - FastLanes: The current spec does allow for using FastLanes with the
>>>> >   configurable enum value for layout. We should be able to inject any
>>>> >   layout in the current design.
>>>> >
>>>> > Working on resolving all remaining open comments on the spec this week.
>>>> >
>>>> > Best
>>>> > Prateek
>>>> >
>>>> > On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <[email protected]>
>>>> > wrote:
>>>> >
>>>> > > On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <[email protected]>
>>>> > > wrote:
>>>> > >
>>>> > > > It looks like the actual issue described for ORC in the paper is that it
>>>> > > > has multiple sub-encodings in a batch. This is different than the design
>>>> > > > proposed here, where there is still a fixed encoding per page in Parquet.
>>>> > > > Given reasonably sized pages I don't think branch misprediction should be a
>>>> > > > big issue for new encodings. I agree that we should be conservative in
>>>> > > > general for adding new encodings.
>>>> > > >
>>>> > > +1
