Thanks Andrew and Micah. `fair amount of feedback on at least the implementations` For the C++ implementation, I have already started addressing the feedback and should be done with that by Monday/Tuesday. I think Vinoo has also been making good progress on the Java implementation.
Best
Prateek

On Sat, Apr 25, 2026 at 12:55 PM Andrew Lamb <[email protected]> wrote:

> Got it. Thank you for the clarification -- I will try and look into the
> spec and the Rust implementation[1] in this next week
>
> [1]: https://github.com/apache/arrow-rs/pull/9372
>
> On Sat, Apr 25, 2026 at 12:01 PM Micah Kornfield <[email protected]>
> wrote:
>
>> Hi Andrew,
>> I think there is a fair amount of feedback on at least the
>> implementations, typically I think we've waited till they are close to
>> mergeable before a final vote. Otherwise I agree we are very close.
>>
>> -Micah
>>
>> On Saturday, April 25, 2026, Andrew Lamb <[email protected]> wrote:
>>
>>> Thanks Prateek,
>>>
>>> I think from this content it looks to me like we are ready to start a
>>> vote to explicitly accept ALP into Parquet
>>>
>>> Does anyone know of a reason we should postpone it for longer?
>>> Perhaps someone needs some more time to review?
>>>
>>> Andrew
>>>
>>> On Wed, Apr 22, 2026 at 1:00 PM PRATEEK GAUR <[email protected]> wrote:
>>>
>>>> Hi team,
>>>>
>>>> Hope everyone is doing well. I got a chance to work through all the
>>>> remaining feedback and update the spec doc. Here are the new artifacts
>>>>
>>>> 1) Spec document :
>>>> https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit
>>>>
>>>> 2) Spec document in parquet format repo :
>>>> https://github.com/apache/parquet-format/pull/557
>>>>
>>>> 3) Alp implementation in arrow c++ repo :
>>>> https://github.com/apache/arrow/pull/48345/changes
>>>>
>>>> 4) Alp implementation in parquet-java repo : Work for Vinoo and Julien
>>>> https://github.com/apache/parquet-java/pull/3397
>>>>
>>>> 5) PR with test and benchmarking artifacts in parquet-testing repo :
>>>> https://github.com/apache/parquet-testing/pull/100
>>>>
>>>> And
>>>>
>>>> - Go : Arnav just submitted an in progress implementation in Go.
>>>> https://github.com/apache/arrow-go/pull/704 (I haven't started
>>>> looking at it yet)
>>>> - Rust : I remember Andrew mentioned that this work is also in
>>>> progress (So 4 languages!)
>>>>
>>>> *Arrow C++ implementation*
>>>>
>>>> The PR is out and was also used by Antoine to report the numbers as
>>>> reported here. Micah and Konstantin have given 1 round of feedback and
>>>> I'm addressing them today. Please note that the default optimization
>>>> flag for compiling is O2 and not O3. I got around 70% performance
>>>> improvement in the decoding speed when using the O3 flag.
>>>>
>>>> *Parquet-MR Java implementation (working with Vinoo and Julien) and
>>>> Cross Language testing*
>>>>
>>>> Let me know if you have any questions or feedback.
>>>>
>>>> Now pasting some performance numbers
>>>>
>>>> Table 1: C++ ALP Double Decode - Spotify Columns (Graviton 3, ARM
>>>> Neoverse V1)
>>>>
>>>> ┌──────────────────┬──────────────┬──────────────┬─────────┐
>>>> │ Column           │ -O2 (MB/s)   │ -O3 (MB/s)   │ Speedup │
>>>> ├──────────────────┼──────────────┼──────────────┼─────────┤
>>>> │ valence          │ 3,155        │ 5,523        │ 1.75x   │
>>>> │ danceability     │ 3,233        │ 5,685        │ 1.76x   │
>>>> │ energy           │ 3,197        │ 5,652        │ 1.77x   │
>>>> │ loudness         │ 3,186        │ 5,473        │ 1.72x   │
>>>> └──────────────────┴──────────────┴──────────────┴─────────┘
>>>>
>>>> On Wed, Feb 25, 2026 at 9:49 AM PRATEEK GAUR <[email protected]>
>>>> wrote:
>>>>
>>>>> @Micah Kornfield <[email protected]> : Got it.
>>>>>
>>>>> @Andrew Lamb <[email protected]>
>>>>>
>>>>>> Do you think it would be good to start moving the spec development
>>>>>> into markdown format, in preparation for finalizing it?
>>>>>>
>>>>>
>>>>> Yes I'll update the numbers for some of the examples I have in the
>>>>> spec based on the updated header size. Then we should be good to go
>>>>> for the markdown format.
>>>>>
>>>>> Thanks everyone!
>>>>>
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Tue, Feb 17, 2026 at 7:28 PM PRATEEK GAUR <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> > Hi team,
>>>>>> >
>>>>>> > 1) Andrew
>>>>>> >
>>>>>> > - Thanks for working on test files. My PR did add all the test
>>>>>> >   files I used to benchmark on datasets. Maybe we can club it
>>>>>> >   together. Will also aid cross language testing
>>>>>> > - Kosta Tarasov working on Rust implementation. This is great.
>>>>>> >   Thanks
>>>>>> >
>>>>>> > 2) Antoine
>>>>>> >
>>>>>> > - Thanks a lot for reporting the numbers on AMD. Looks like you
>>>>>> >   are getting 8X the decoding performance of BSS. This is
>>>>>> >   amazing!!
>>>>>> > - Thanks for acknowledging the sampling design.
>>>>>> > - I agree with you on FastLanes. In some crude experiments I
>>>>>> >   didn't get a good perf benefit from it on Graviton3 (but maybe
>>>>>> >   there was something wrong with my implementation).
>>>>>> > - Locking the 16-bit exception encoding for the spec in this case.
>>>>>> > - Awesome I think we have solved for all open questions minus the
>>>>>> >   version byte :). (will get back on this soon)
>>>>>> >
>>>>>> > 3) Micah
>>>>>> >
>>>>>> > - FastLanes : The current spec does allow for using FastLanes
>>>>>> >   with the configurable enum value for layout. We should be able
>>>>>> >   to inject any layout in the current design.
>>>>>> >
>>>>>> > Working on resolving all remaining open comments on the spec this
>>>>>> > week.
>>>>>> >
>>>>>> > Best
>>>>>> > Prateek
>>>>>> >
>>>>>> > On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <[email protected]>
>>>>>> > wrote:
>>>>>> >
>>>>>> > > On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <[email protected]>
>>>>>> > > wrote:
>>>>>> > >
>>>>>> > > > It looks like the actual issue described for ORC in the paper
>>>>>> > > > is that it has multiple sub-encodings in a batch. This is
>>>>>> > > > different than the design proposed here where there is still
>>>>>> > > > a fixed encoding per page in parquet. Given reasonably sized
>>>>>> > > > pages I don't think branch misprediction should be a big
>>>>>> > > > issue for new encodings. I agree that we should be
>>>>>> > > > conservative in general for adding new encodings.
>>>>>> > > >
>>>>>> > > +1
>>>>>> > >
>>>>>> >
>>>>>>
>>>>>
