Got it. Thank you for the clarification -- I will try to look into the spec and the Rust implementation[1] this coming week.

[1]: https://github.com/apache/arrow-rs/pull/9372

On Sat, Apr 25, 2026 at 12:01 PM Micah Kornfield <[email protected]> wrote:

> Hi Andrew,
> I think there is a fair amount of feedback on at least the
> implementations, typically I think we've waited till they are close to
> mergeable before a final vote. Otherwise I agree we are very close.
>
> -Micah
>
> On Saturday, April 25, 2026, Andrew Lamb <[email protected]> wrote:
>
>> Thanks Prateek,
>>
>> I think from this content it looks to me like we are ready to start a
>> vote to explicitly accept ALP into Parquet
>>
>> Does anyone know of a reason we should postpone it for longer?
>> Perhaps someone needs some more time to review?
>>
>> Andrew
>>
>> On Wed, Apr 22, 2026 at 1:00 PM PRATEEK GAUR <[email protected]> wrote:
>>
>>> Hi team,
>>>
>>> Hope everyone is doing well. I got a chance to work through all the
>>> remaining feedback and update the spec doc. Here are the new artifacts
>>>
>>> 1) Spec document :
>>> https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit
>>>
>>> 2) Spec document in parquet format repo :
>>> https://github.com/apache/parquet-format/pull/557
>>>
>>> 3) Alp implementation in arrow c++ repo :
>>> https://github.com/apache/arrow/pull/48345/changes
>>>
>>> 4) Alp implementation in parquet-java repo : Work for Vinoo and Julien
>>> https://github.com/apache/parquet-java/pull/3397
>>>
>>> 5) PR with test and benchmarking artifacts in parquet-testing repo :
>>> https://github.com/apache/parquet-testing/pull/100
>>>
>>> And
>>>
>>> - Go : Arnav just submitted an in progress implementation in Go.
>>> https://github.com/apache/arrow-go/pull/704 (I haven't started
>>> looking at it yet)
>>> - Rust : I remember Andrew mentioned that this work is also in
>>> progress (So 4 languages!)
>>>
>>> *Arrow C++ implementation *
>>>
>>> The PR is out and was also used by Antoine to report the numbers as
>>> reported here.
>>> Micah and Konstantin have given one round of feedback and
>>> I'm addressing it today. Please note that the default optimization
>>> flag for compiling is O2 and not O3. I got around 70% performance
>>> improvement in the decoding speed when using the O3 flag.
>>>
>>> *Parquet-MR Java implementation (working with Vinoo and Julien) and
>>> Cross Language testing*
>>>
>>> Let me know if you have any questions or feedback.
>>>
>>> Now pasting some performance numbers
>>>
>>> Table 1: C++ ALP Double Decode - Spotify Columns (Graviton 3, ARM
>>> Neoverse V1)
>>>
>>> ┌──────────────────┬──────────────┬──────────────┬─────────┐
>>> │ Column           │ -O2 (MB/s)   │ -O3 (MB/s)   │ Speedup │
>>> ├──────────────────┼──────────────┼──────────────┼─────────┤
>>> │ valence          │ 3,155        │ 5,523        │ 1.75x   │
>>> │ danceability     │ 3,233        │ 5,685        │ 1.76x   │
>>> │ energy           │ 3,197        │ 5,652        │ 1.77x   │
>>> │ loudness         │ 3,186        │ 5,473        │ 1.72x   │
>>> └──────────────────┴──────────────┴──────────────┴─────────┘
>>>
>>> On Wed, Feb 25, 2026 at 9:49 AM PRATEEK GAUR <[email protected]> wrote:
>>>
>>>> @Micah Kornfield <[email protected]> : Got it.
>>>>
>>>> @Andrew Lamb <[email protected]>
>>>>
>>>>> Do you think it would be good to start moving the spec development into
>>>>> markdown format, in preparation for finalizing it?
>>>>
>>>> Yes I'll update the numbers for some of the examples I have in the spec
>>>> based on the updated header size. Then we should be good to go for the
>>>> markdown format.
>>>>
>>>> Thanks everyone!
>>>>
>>>>> Andrew
>>>>>
>>>>> On Tue, Feb 17, 2026 at 7:28 PM PRATEEK GAUR <[email protected]> wrote:
>>>>>
>>>>> > Hi team,
>>>>> >
>>>>> > 1) Andrew
>>>>> >
>>>>> >    - Thanks for working on test files. My PR did add all the test files I
>>>>> >    used to benchmark on datasets. Maybe we can club it together.
>>>>> >    Will also aid cross language testing
>>>>> >    - Kosta Tarasov working on Rust implementation. This is great. Thanks
>>>>> >
>>>>> > 2) Antoine
>>>>> >
>>>>> >    - Thanks a lot for reporting the numbers on AMD. Looks like you are
>>>>> >    getting 8X the decoding performance of BSS. This is amazing!!
>>>>> >    - Thanks for acknowledging the sampling design.
>>>>> >    - I agree with you on FastLanes. In some crude experiments I didn't get
>>>>> >    a good perf benefit from it on Graviton3 (but maybe there was something
>>>>> >    wrong with my implementation).
>>>>> >    - Locking the 16bit exception encoding for the spec in this case.
>>>>> >    - Awesome I think we have solved for all open questions minus the
>>>>> >    version byte :). (will get back on this soon)
>>>>> >
>>>>> > 3) Micah
>>>>> >
>>>>> >    - FastLanes : The current spec does allow for using FastLanes with the
>>>>> >    configurable enum value for layout. We should be able to inject any
>>>>> >    layout in the current design.
>>>>> >
>>>>> > Working on resolving all remaining open comments on the spec this week.
>>>>> >
>>>>> > Best
>>>>> > Prateek
>>>>> >
>>>>> > On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <[email protected]> wrote:
>>>>> >
>>>>> > > On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <[email protected]> wrote:
>>>>> > >
>>>>> > > > It looks like the actual issue described for ORC in the paper is that it
>>>>> > > > has multiple sub-encodings in a batch. This is different than the design
>>>>> > > > proposed here where there is still fixed encoding per page in parquet.
>>>>> > > > Given reasonably sized pages I don't think branch misprediction should be a
>>>>> > > > big issue for new encodings. I agree that we should be conservative in
>>>>> > > > general for adding new encodings.
>>>>> > > >
>>>>> > > +1
