Thanks Prateek,

From this content, it looks to me like we are ready to start a vote
to explicitly accept ALP into Parquet.

Does anyone know of a reason we should postpone it for longer?
Perhaps someone needs some more time to review?

Andrew



On Wed, Apr 22, 2026 at 1:00 PM PRATEEK GAUR <[email protected]> wrote:

> Hi team,
>
>
>
> Hope everyone is doing well. I got a chance to work through all the
> remaining feedback and update the spec doc. Here are the new artifacts:
>
> 1) Spec document :
> https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit
>
> 2) Spec document in parquet format repo :
> https://github.com/apache/parquet-format/pull/557
>
> 3) ALP implementation in Arrow C++ repo :
> https://github.com/apache/arrow/pull/48345/changes
>
> 4) ALP implementation in parquet-java repo : Work for Vinoo and Julien
> https://github.com/apache/parquet-java/pull/3397
>
> 5) PR with test and benchmarking artifacts in parquet-testing repo :
> https://github.com/apache/parquet-testing/pull/100
>
>
> And
>
>
>    - Go : Arnav just submitted an in progress implementation in Go.
>    https://github.com/apache/arrow-go/pull/704 (I haven't started looking
>    at it yet)
>    - Rust : I remember Andrew mentioned that this work is also in
>    progress (So 4 languages!)
>
>
> *Arrow C++ implementation*
>
>
>
> The PR is out and was also used by Antoine to produce the numbers
> reported here. Micah and Konstantin have given one round of feedback and
> I'm addressing it today. Please note that the default optimization flag
> for compiling is O2 and not O3. I got around 70% performance improvement in
> the decoding speed when using the O3 flag.
>
>
>
> *Parquet-MR Java implementation (working with Vinoo and Julien) and Cross
> Language testing*
>
>
>    Let me know if you have any questions or feedback.
>
>
>
> Now pasting some performance numbers
>
>
>   Table 1: C++ ALP Double Decode - Spotify Columns (Graviton 3, ARM Neoverse V1)
>
>   ┌──────────────────┬──────────────┬──────────────┬─────────┐
>   │ Column           │  -O2 (MB/s)  │  -O3 (MB/s)  │ Speedup │
>   ├──────────────────┼──────────────┼──────────────┼─────────┤
>   │ valence          │     3,155    │     5,523    │  1.75x  │
>   │ danceability     │     3,233    │     5,685    │  1.76x  │
>   │ energy           │     3,197    │     5,652    │  1.77x  │
>   │ loudness         │     3,186    │     5,473    │  1.72x  │
>   └──────────────────┴──────────────┴──────────────┴─────────┘
>
>
>
>
> On Wed, Feb 25, 2026 at 9:49 AM PRATEEK GAUR <[email protected]> wrote:
>
>> @Micah Kornfield <[email protected]> : Got it.
>>
>> @Andrew Lamb <[email protected]>
>>
>>
>>> Do you think it would be good to start moving the spec development into
>>> markdown format, in preparation for finalizing it?
>>>
>>
>> Yes I'll update the numbers for some of the examples I have in the spec
>> based on the updated header size. Then we should be good to go for the
>> markdown format.
>>
>> Thanks everyone!
>>
>>
>>>
>>> Andrew
>>>
>>> On Tue, Feb 17, 2026 at 7:28 PM PRATEEK GAUR <[email protected]> wrote:
>>>
>>> > Hi team,
>>> >
>>> > 1) Andrew
>>> >
>>> >    - Thanks for working on test files. My PR did add all the test files I
>>> >    used to benchmark on datasets. Maybe we can combine them. It will also
>>> >    aid cross-language testing.
>>> >    - Kosta Tarasov is working on the Rust implementation. This is great.
>>> >    Thanks!
>>> >
>>> >
>>> > 2) Antoine
>>> >
>>> >    - Thanks a lot for reporting the numbers on AMD. Looks like you are
>>> >    getting 8X the decoding performance of BSS. This is amazing!
>>> >    - Thanks for acknowledging the sampling design.
>>> >    - I agree with you on FastLanes. In some crude experiments I didn't get
>>> >    a good perf benefit from it on Graviton3 (but maybe there was something
>>> >    wrong with my implementation).
>>> >    - Locking the 16-bit exception encoding for the spec in this case.
>>> >    - Awesome, I think we have resolved all open questions except the
>>> >    version byte :). (Will get back on this soon.)
>>> >
>>> >
>>> > 3) Micah
>>> >
>>> >    - FastLanes : The current spec does allow for using FastLanes with the
>>> >    configurable enum value for layout. We should be able to inject any
>>> >    layout in the current design.
>>> >
>>> >
>>> > Working on resolving all remaining open comments on the spec this week.
>>> >
>>> > Best
>>> > Prateek
>>> >
>>> >
>>> > On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <[email protected]>
>>> > wrote:
>>> >
>>> > > On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <[email protected]>
>>> > > wrote:
>>> > >
>>> > > >
>>> > > >
>>> > > > It looks like the actual issue described for ORC in the paper is that it
>>> > > > has multiple sub-encodings in a batch.  This is different than the design
>>> > > > proposed here where there is still a fixed encoding per page in Parquet.
>>> > > > Given reasonably sized pages I don't think branch misprediction should be a
>>> > > > big issue for new encodings.  I agree that we should be conservative in
>>> > > > general for adding new encodings.
>>> > > >
>>> > > >
>>> > > +1
>>> > >
>>> >
>>>
>>
