I also just filed a ticket[1] to track adding the example files and linked
it around to try and give it a bit more visibility.

[1]: https://github.com/apache/parquet-testing/issues/105

On Wed, Apr 29, 2026 at 7:25 PM Curt Hagenlocher <[email protected]>
wrote:

> Ah, thanks! I missed that.
>
> On Wed, Apr 29, 2026 at 4:24 PM Micah Kornfield <[email protected]>
> wrote:
>
>> Hi Curt,
>>
>>> As part of the process of amending the Parquet format, perhaps it would
>>> be a good idea for early implementations to generate sample files and
>>> commit them to apache/parquet-testing: Apache Parquet Testing
>>> <https://github.com/apache/parquet-testing> for other implementations
>>> to leverage?
>>
>>
>> It got dropped in the thread but does
>> https://github.com/apache/parquet-testing/pull/100 address your concerns?
>>
>> Thanks,
>> Micah
>>
>> On Wed, Apr 29, 2026 at 4:20 PM Curt Hagenlocher <[email protected]>
>> wrote:
>>
>>> As part of the process of amending the Parquet format, perhaps it would
>>> be a good idea for early implementations to generate sample files and
>>> commit them to apache/parquet-testing: Apache Parquet Testing
>>> <https://github.com/apache/parquet-testing> for other implementations
>>> to leverage?
>>>
>>> -Curt
>>>
>>> On Wed, Apr 29, 2026 at 4:11 PM PRATEEK GAUR <[email protected]> wrote:
>>>
>>>> Thanks Andrew and Micah for the review feedback on the two PRs:
>>>> 1) (c++ arrow repo) https://github.com/apache/arrow/pull/48345/changes
>>>> 2) (parquet-format repo)
>>>> https://github.com/apache/parquet-format/pull/557
>>>>
>>>> I have addressed all comments (unless I missed something) on the two
>>>> PRs.
>>>>
>>>> Best
>>>> Prateek
>>>>
>>>> On Sat, Apr 25, 2026 at 1:08 PM PRATEEK GAUR <[email protected]>
>>>> wrote:
>>>>
>>>> > Thanks Andrew and Micah.
>>>> >
>>>> > `fair amount of feedback on at least the implementations`
>>>> > For the C++ I have already started addressing the feedback; I should
>>>> > be done with that Monday/Tuesday.
>>>> > I think Vinoo too has been making good progress on the Java
>>>> > implementation.
>>>> >
>>>> > Best
>>>> > Prateek
>>>> >
>>>> > On Sat, Apr 25, 2026 at 12:55 PM Andrew Lamb <[email protected]>
>>>> > wrote:
>>>> >
>>>> >> Got it. Thank you for the clarification -- I will try and look into
>>>> the
>>>> >> spec and the Rust implementation[1] in this next week
>>>> >>
>>>> >> [1]: https://github.com/apache/arrow-rs/pull/9372
>>>> >>
>>>> >> On Sat, Apr 25, 2026 at 12:01 PM Micah Kornfield <
>>>> [email protected]>
>>>> >> wrote:
>>>> >>
>>>> >>> Hi Andrew,
>>>> >>> I think there is a fair amount of feedback on at least the
>>>> >>> implementations. Typically I think we've waited till they are close
>>>> >>> to mergeable before a final vote.  Otherwise I agree we are very close.
>>>> >>>
>>>> >>> -Micah
>>>> >>>
>>>> >>> On Saturday, April 25, 2026, Andrew Lamb <[email protected]>
>>>> wrote:
>>>> >>>
>>>> >>>> Thanks Prateek,
>>>> >>>>
>>>> >>>> I think from this content it looks to me like we are ready to
>>>> start a
>>>> >>>> vote to explicitly accept ALP into Parquet
>>>> >>>>
>>>> >>>> Does anyone know of a reason we should postpone it for longer?
>>>> >>>> Perhaps someone needs some more time to review?
>>>> >>>>
>>>> >>>> Andrew
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> On Wed, Apr 22, 2026 at 1:00 PM PRATEEK GAUR <[email protected]>
>>>> >>>> wrote:
>>>> >>>>
>>>> >>>>> Hi team,
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> Hope everyone is doing well. I got a chance to work through all
>>>> the
>>>> >>>>> remaining feedback and update the spec doc. Here are the new
>>>> artifacts
>>>> >>>>>
>>>> >>>>> 1) Spec document :
>>>> >>>>>
>>>> https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit
>>>> >>>>>
>>>> >>>>> 2) Spec document in parquet format repo :
>>>> >>>>> https://github.com/apache/parquet-format/pull/557
>>>> >>>>>
>>>> >>>>> 3) Alp implementation in arrow c++ repo :
>>>> >>>>> https://github.com/apache/arrow/pull/48345/changes
>>>> >>>>>
>>>> >>>>> 4) Alp implementation in parquet-java repo : Work for Vinoo and
>>>> Julien
>>>> >>>>>  https://github.com/apache/parquet-java/pull/3397
>>>> >>>>>
>>>> >>>>> 5) PR with test and benchmarking artifacts in parquet-testing
>>>> repo :
>>>> >>>>> https://github.com/apache/parquet-testing/pull/100
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> And
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>    - Go : Arnav just submitted an in progress implementation in
>>>> Go.
>>>> >>>>>    https://github.com/apache/arrow-go/pull/704 (I haven't started
>>>> >>>>>    looking at it yet)
>>>> >>>>>    - Rust : I remember Andrew mentioned that this work is also in
>>>> >>>>>    progress (So 4 languages!)
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> *Arrow C++ implementation *
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> The PR is out and was also used by Antoine to report the numbers
>>>> as
>>>> >>>>> reported here. Micah and Konstantin have given 1 round of feedback
>>>> >>>>> and I'm addressing them today. Please note that the default
>>>> >>>>> optimization flag for compiling is O2 and not O3. I got around 70%
>>>> >>>>> performance improvement in the decoding speed when using the O3 flag.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> *Parquet-MR Java implementation (working with Vinoo and Julien) and
>>>> >>>>> Cross Language testing*
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>    Let me know if you have any questions or feedback.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> Now pasting some performance numbers
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>   Table 1: C++ ALP Double Decode — Spotify Columns (Graviton 3, ARM
>>>> >>>>>   Neoverse V1)
>>>> >>>>>
>>>> >>>>>   ┌──────────────────┬──────────────┬──────────────┬─────────┐
>>>> >>>>>   │ Column           │  -O2 (MB/s)  │  -O3 (MB/s)  │ Speedup │
>>>> >>>>>   ├──────────────────┼──────────────┼──────────────┼─────────┤
>>>> >>>>>   │ valence          │     3,155    │     5,523    │  1.75x  │
>>>> >>>>>   │ danceability     │     3,233    │     5,685    │  1.76x  │
>>>> >>>>>   │ energy           │     3,197    │     5,652    │  1.77x  │
>>>> >>>>>   │ loudness         │     3,186    │     5,473    │  1.72x  │
>>>> >>>>>   └──────────────────┴──────────────┴──────────────┴─────────┘
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Wed, Feb 25, 2026 at 9:49 AM PRATEEK GAUR <[email protected]>
>>>> >>>>> wrote:
>>>> >>>>>
>>>> >>>>>> @Micah Kornfield <[email protected]> : Got it.
>>>> >>>>>>
>>>> >>>>>> @Andrew Lamb <[email protected]>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>> Do you think it would be good to start moving the spec
>>>> development
>>>> >>>>>>> into
>>>> >>>>>>> markdown format, in preparation for finalizing it?
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>>> Yes I'll update the numbers for some of the examples I have in
>>>> the
>>>> >>>>>> spec based
>>>> >>>>>> on the updated header size. Then we should be good to go for the
>>>> >>>>>> markdown format.
>>>> >>>>>>
>>>> >>>>>> Thanks everyone!
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>> Andrew
>>>> >>>>>>>
>>>> >>>>>>> On Tue, Feb 17, 2026 at 7:28 PM PRATEEK GAUR <
>>>> [email protected]>
>>>> >>>>>>> wrote:
>>>> >>>>>>>
>>>> >>>>>>> > Hi team,
>>>> >>>>>>> >
>>>> >>>>>>> > 1) Andrew
>>>> >>>>>>> >
>>>> >>>>>>> >    - Thanks for working on test files. My PR did add all the test
>>>> >>>>>>> >    files I used to benchmark on datasets. Maybe we can club it
>>>> >>>>>>> >    together. Will also aid cross-language testing.
>>>> >>>>>>> >    - Kosta Tarasov working on Rust implementation. This is great.
>>>> >>>>>>> >    Thanks
>>>> >>>>>>> >
>>>> >>>>>>> >
>>>> >>>>>>> > 2) Antoine
>>>> >>>>>>> >
>>>> >>>>>>> >    - Thanks a lot for reporting the numbers on AMD. Looks like you
>>>> >>>>>>> >    are getting 8X the decoding performance of BSS. This is amazing!!
>>>> >>>>>>> >    - Thanks for acknowledging the sampling design.
>>>> >>>>>>> >    - I agree with you on FastLanes. In some crude experiments I
>>>> >>>>>>> >    didn't get a good perf benefit from it on Graviton3 (but maybe
>>>> >>>>>>> >    there was something wrong with my implementation).
>>>> >>>>>>> >    - Locking the 16-bit exception encoding for the spec in this case.
>>>> >>>>>>> >    - Awesome, I think we have solved for all open questions minus
>>>> >>>>>>> >    the version byte :). (Will get back on this soon.)
>>>> >>>>>>> >
>>>> >>>>>>> >
>>>> >>>>>>> > 3) Micah
>>>> >>>>>>> >
>>>> >>>>>>> >    - FastLanes : The current spec does allow for using FastLanes
>>>> >>>>>>> >    with the configurable enum value for layout. We should be able to
>>>> >>>>>>> >    inject any layout in the current design.
>>>> >>>>>>> >
>>>> >>>>>>> >
>>>> >>>>>>> > Working on resolving all remaining open comments on the spec
>>>> this
>>>> >>>>>>> week.
>>>> >>>>>>> >
>>>> >>>>>>> > Best
>>>> >>>>>>> > Prateek
>>>> >>>>>>> >
>>>> >>>>>>> >
>>>> >>>>>>> > On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <
>>>> >>>>>>> [email protected]>
>>>> >>>>>>> > wrote:
>>>> >>>>>>> >
>>>> >>>>>>> > > On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <
>>>> >>>>>>> [email protected]>
>>>> >>>>>>> > > wrote:
>>>> >>>>>>> > >
>>>> >>>>>>> > > >
>>>> >>>>>>> > > >
>>>> >>>>>>> > > > It looks like the actual issue described for ORC in the
>>>> paper
>>>> >>>>>>> is that
>>>> >>>>>>> > it
>>>> >>>>>>> > > > has multiple sub-encodings in a batch.  This is different than
>>>> >>>>>>> > > > the design
>>>> >>>>>>> > > > proposed here where there is still fixed encoding per
>>>> page in
>>>> >>>>>>> parquet.
>>>> >>>>>>> > > > Given reasonably sized pages I don't think branch
>>>> >>>>>>> misprediction should
>>>> >>>>>>> > > be a
>>>> >>>>>>> > > > big issue for new encodings.  I agree that we should be
>>>> >>>>>>> conservative in
>>>> >>>>>>> > > > general for adding new encodings.
>>>> >>>>>>> > > >
>>>> >>>>>>> > > >
>>>> >>>>>>> > > +1
>>>> >>>>>>> > >
>>>> >>>>>>> >
>>>> >>>>>>>
>>>> >>>>>>
>>>>
>>>
