Re: [Parquet] ALP Encoding for Floating point data

Micah Kornfield Sat, 25 Apr 2026 09:03:00 -0700

Hi Andrew,
I think there is still a fair amount of feedback outstanding on at least the
implementations; typically we've waited until they are close to mergeable
before a final vote.  Otherwise, I agree we are very close.


-Micah

On Saturday, April 25, 2026, Andrew Lamb <[email protected]> wrote:

> Thanks Prateek,
>
> From this content, it looks to me like we are ready to start a vote
> to explicitly accept ALP into Parquet.
>
> Does anyone know of a reason we should postpone it for longer?
> Perhaps someone needs some more time to review?
>
> Andrew
>
>
>
> On Wed, Apr 22, 2026 at 1:00 PM PRATEEK GAUR <[email protected]> wrote:
>
>> Hi team,
>>
>>
>>
>> Hope everyone is doing well. I got a chance to work through all the
>> remaining feedback and update the spec doc. Here are the new artifacts
>>
>> 1) Spec document : https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit
>>
>> 2) Spec document in parquet-format repo : https://github.com/apache/parquet-format/pull/557
>>
>> 3) ALP implementation in arrow c++ repo : https://github.com/apache/arrow/pull/48345/changes
>>
>> 4) ALP implementation in parquet-java repo (work from Vinoo and Julien) :
>> https://github.com/apache/parquet-java/pull/3397
>>
>> 5) PR with test and benchmarking artifacts in parquet-testing repo :
>> https://github.com/apache/parquet-testing/pull/100
>>
>>
>> And
>>
>>
>>    - Go : Arnav just submitted an in-progress implementation in Go:
>>    https://github.com/apache/arrow-go/pull/704 (I haven't started
>>    looking at it yet)
>>    - Rust : I remember Andrew mentioned that this work is also in
>>    progress (So 4 languages!)
>>
>>
>> *Arrow C++ implementation*
>>
>>
>>
>> The PR is out and was also used by Antoine for the numbers he
>> reported. Micah and Konstantin have given one round of feedback, which
>> I'm addressing today. Please note that the default optimization
>> flag for compiling is -O2 and not -O3. I got around a 70% improvement
>> in decoding speed when using the -O3 flag.
>>
>>
>>
>> *Parquet-MR Java implementation (working with Vinoo and Julien) and
>> Cross-Language testing*
>>
>>
>>    Let me know if you have any questions or feedback.
>>
>>
>>
>> Now pasting some performance numbers
>>
>>
>>   Table 1: C++ ALP Double Decode, Spotify Columns (Graviton 3, ARM Neoverse V1)
>>
>>   ┌──────────────────┬──────────────┬──────────────┬─────────┐
>>   │ Column           │  -O2 (MB/s)  │  -O3 (MB/s)  │ Speedup │
>>   ├──────────────────┼──────────────┼──────────────┼─────────┤
>>   │ valence          │     3,155    │     5,523    │  1.75x  │
>>   │ danceability     │     3,233    │     5,685    │  1.76x  │
>>   │ energy           │     3,197    │     5,652    │  1.77x  │
>>   │ loudness         │     3,186    │     5,473    │  1.72x  │
>>   └──────────────────┴──────────────┴──────────────┴─────────┘
>>
>>
>>
>>
>> On Wed, Feb 25, 2026 at 9:49 AM PRATEEK GAUR <[email protected]> wrote:
>>
>>> @Micah Kornfield <[email protected]> : Got it.
>>>
>>> @Andrew Lamb <[email protected]>
>>>
>>>
>>>> Do you think it would be good to start moving the spec development into
>>>> markdown format, in preparation for finalizing it?
>>>>
>>>
>>> Yes, I'll update the numbers for some of the examples I have in the spec
>>> based on the updated header size. Then we should be good to go for the
>>> markdown format.
>>>
>>> Thanks everyone!
>>>
>>>
>>>>
>>>> Andrew
>>>>
>>>> On Tue, Feb 17, 2026 at 7:28 PM PRATEEK GAUR <[email protected]>
>>>> wrote:
>>>>
>>>> > Hi team,
>>>> >
>>>> > 1) Andrew
>>>> >
>>>> >    - Thanks for working on test files. My PR did add all the test
>>>> >    files I used to benchmark on datasets. Maybe we can club them
>>>> >    together. This will also aid cross-language testing.
>>>> >    - Kosta Tarasov is working on the Rust implementation. This is
>>>> >    great. Thanks!
>>>> >
>>>> >
>>>> > 2) Antoine
>>>> >
>>>> >    - Thanks a lot for reporting the numbers on AMD. Looks like you are
>>>> >    getting 8X the decoding performance of BSS. This is amazing!!
>>>> >    - Thanks for acknowledging the sampling design.
>>>> >    - I agree with you on FastLanes. In some crude experiments I
>>>> >    didn't get a good perf benefit from it on Graviton3 (but maybe
>>>> >    there was something wrong with my implementation).
>>>> >    - Locking the 16-bit exception encoding for the spec in this case.
>>>> >    - Awesome, I think we have solved all open questions minus the
>>>> >    version byte :). (Will get back on this soon.)
>>>> >
>>>> >
>>>> > 3) Micah
>>>> >
>>>> >    - FastLanes : The current spec does allow for using FastLanes
>>>> >    with the configurable enum value for layout. We should be able to
>>>> >    inject any layout in the current design.
>>>> >
>>>> >
>>>> > Working on resolving all remaining open comments on the spec this
>>>> > week.
>>>> >
>>>> > Best
>>>> > Prateek
>>>> >
>>>> >
>>>> > On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <[email protected]>
>>>> > wrote:
>>>> >
>>>> > > On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <[email protected]>
>>>> > > wrote:
>>>> > >
>>>> > > >
>>>> > > >
>>>> > > > It looks like the actual issue described for ORC in the paper is
>>>> > > > that it has multiple sub-encodings in a batch.  This is different
>>>> > > > than the design proposed here, where there is still a fixed
>>>> > > > encoding per page in Parquet.  Given reasonably sized pages, I
>>>> > > > don't think branch misprediction should be a big issue for new
>>>> > > > encodings.  I agree that we should be conservative in general
>>>> > > > about adding new encodings.
>>>> > > >
>>>> > > >
>>>> > > +1
>>>> > >
>>>> >
>>>>
>>>
