Thanks Andrew and Micah for the review feedback on the two PRs:
1) (C++ Arrow repo) https://github.com/apache/arrow/pull/48345/changes
2) (parquet-format repo) https://github.com/apache/parquet-format/pull/557
I have addressed all comments on the two PRs (unless I missed something).

Best
Prateek

On Sat, Apr 25, 2026 at 1:08 PM PRATEEK GAUR <[email protected]> wrote:

> Thanks Andrew and Micah.
>
> `fair amount of feedback on at least the implementations`
> For the C++ implementation I have already started addressing the feedback; I should be done with that Monday/Tuesday.
> I think Vinoo too has been making good progress on the Java implementation.
>
> Best
> Prateek
>
> On Sat, Apr 25, 2026 at 12:55 PM Andrew Lamb <[email protected]> wrote:
>
>> Got it. Thank you for the clarification -- I will try and look into the spec and the Rust implementation[1] in this next week
>>
>> [1]: https://github.com/apache/arrow-rs/pull/9372
>>
>> On Sat, Apr 25, 2026 at 12:01 PM Micah Kornfield <[email protected]> wrote:
>>
>>> Hi Andrew,
>>> I think there is a fair amount of feedback on at least the implementations, typically I think we've waited till they are close to mergeable before a final vote. Otherwise I agree we are very close.
>>>
>>> -Micah
>>>
>>> On Saturday, April 25, 2026, Andrew Lamb <[email protected]> wrote:
>>>
>>>> Thanks Prateek,
>>>>
>>>> I think from this content it looks to me like we are ready to start a vote to explicitly accept ALP into Parquet
>>>>
>>>> Does anyone know of a reason we should postpone it for longer? Perhaps someone needs some more time to review?
>>>>
>>>> Andrew
>>>>
>>>> On Wed, Apr 22, 2026 at 1:00 PM PRATEEK GAUR <[email protected]> wrote:
>>>>
>>>>> Hi team,
>>>>>
>>>>> Hope everyone is doing well. I got a chance to work through all the remaining feedback and update the spec doc.
>>>>> Here are the new artifacts
>>>>>
>>>>> 1) Spec document:
>>>>> https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit
>>>>>
>>>>> 2) Spec document in parquet-format repo:
>>>>> https://github.com/apache/parquet-format/pull/557
>>>>>
>>>>> 3) ALP implementation in Arrow C++ repo:
>>>>> https://github.com/apache/arrow/pull/48345/changes
>>>>>
>>>>> 4) ALP implementation in parquet-java repo: Work for Vinoo and Julien
>>>>> https://github.com/apache/parquet-java/pull/3397
>>>>>
>>>>> 5) PR with test and benchmarking artifacts in parquet-testing repo:
>>>>> https://github.com/apache/parquet-testing/pull/100
>>>>>
>>>>> And
>>>>>
>>>>> - Go: Arnav just submitted an in-progress implementation in Go. https://github.com/apache/arrow-go/pull/704 (I haven't started looking at it yet)
>>>>> - Rust: I remember Andrew mentioned that this work is also in progress (So 4 languages!)
>>>>>
>>>>> *Arrow C++ implementation*
>>>>>
>>>>> The PR is out and was also used by Antoine to report the numbers as reported here. Micah and Konstantin have given one round of feedback and I'm addressing it today. Please note that the default optimization flag for compiling is O2 and not O3. I got around 70% performance improvement in the decoding speed when using the O3 flag.
>>>>>
>>>>> *Parquet-MR Java implementation (working with Vinoo and Julien) and Cross Language testing*
>>>>>
>>>>> Let me know if you have any questions or feedback.
>>>>>
>>>>> Now pasting some performance numbers
>>>>>
>>>>> Table 1: C++ ALP Double Decode, Spotify Columns (Graviton 3, ARM Neoverse V1)
>>>>>
>>>>> ┌──────────────────┬──────────────┬──────────────┬─────────┐
>>>>> │ Column           │ -O2 (MB/s)   │ -O3 (MB/s)   │ Speedup │
>>>>> ├──────────────────┼──────────────┼──────────────┼─────────┤
>>>>> │ valence          │ 3,155        │ 5,523        │ 1.75x   │
>>>>> │ danceability     │ 3,233        │ 5,685        │ 1.76x   │
>>>>> │ energy           │ 3,197        │ 5,652        │ 1.77x   │
>>>>> │ loudness         │ 3,186        │ 5,473        │ 1.72x   │
>>>>> └──────────────────┴──────────────┴──────────────┴─────────┘
>>>>>
>>>>> On Wed, Feb 25, 2026 at 9:49 AM PRATEEK GAUR <[email protected]> wrote:
>>>>>
>>>>>> @Micah Kornfield <[email protected]> : Got it.
>>>>>>
>>>>>> @Andrew Lamb <[email protected]>
>>>>>>
>>>>>>> Do you think it would be good to start moving the spec development into markdown format, in preparation for finalizing it?
>>>>>>
>>>>>> Yes I'll update the numbers for some of the examples I have in the spec based on the updated header size. Then we should be good to go for the markdown format.
>>>>>>
>>>>>> Thanks everyone!
>>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>> On Tue, Feb 17, 2026 at 7:28 PM PRATEEK GAUR <[email protected]> wrote:
>>>>>>>
>>>>>>> > Hi team,
>>>>>>> >
>>>>>>> > 1) Andrew
>>>>>>> >
>>>>>>> > - Thanks for working on test files. My PR did add all the test files I used to benchmark on datasets. Maybe we can club it together. Will also aid cross language testing
>>>>>>> > - Kosta Tarasov working on Rust implementation. This is great. Thanks
>>>>>>> >
>>>>>>> > 2) Antoine
>>>>>>> >
>>>>>>> > - Thanks a lot for reporting the numbers on AMD. Looks like you are getting 8X the decoding performance of BSS. This is amazing!!
>>>>>>> > - Thanks for acknowledging the sampling design.
>>>>>>> > - I agree with you on FastLanes. In some crude experiments I didn't get a good perf benefit from it on Graviton3 (but maybe there was something wrong with my implementation).
>>>>>>> > - Locking the 16bit exception encoding for the spec in this case.
>>>>>>> > - Awesome I think we have solved for all open questions minus the version byte :). (will get back on this soon)
>>>>>>> >
>>>>>>> > 3) Micah
>>>>>>> >
>>>>>>> > - FastLanes : The current spec does allow for using FastLanes with the configurable enum value for layout. We should be able to inject any layout in the current design.
>>>>>>> >
>>>>>>> > Working on resolving all remaining open comments on the spec this week.
>>>>>>> >
>>>>>>> > Best
>>>>>>> > Prateek
>>>>>>> >
>>>>>>> > On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <[email protected]> wrote:
>>>>>>> >
>>>>>>> > > On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <[email protected]> wrote:
>>>>>>> > >
>>>>>>> > > > It looks like the actual issue described for ORC in the paper is that it has multiple sub-encodings in a batch. This is different than the design proposed here where there is still fixed encoding per page in parquet. Given reasonably sized pages I don't think branch misprediction should be a big issue for new encodings. I agree that we should be conservative in general for adding new encodings.
>>>>>>> > > >
>>>>>>> > > +1
