Ah, thanks! I missed that.

On Wed, Apr 29, 2026 at 4:24 PM Micah Kornfield <[email protected]>
wrote:

> Hi Curt,
>
>> As part of the process of amending the Parquet format, perhaps it would
>> be a good idea for early implementations to generate sample files and
>> commit them to apache/parquet-testing: Apache Parquet Testing
>> <https://github.com/apache/parquet-testing> for other implementations to
>> leverage?
>
>
> It got dropped in the thread but does
> https://github.com/apache/parquet-testing/pull/100 address your concerns?
>
> Thanks,
> Micah
>
> On Wed, Apr 29, 2026 at 4:20 PM Curt Hagenlocher <[email protected]>
> wrote:
>
>> As part of the process of amending the Parquet format, perhaps it would
>> be a good idea for early implementations to generate sample files and
>> commit them to apache/parquet-testing: Apache Parquet Testing
>> <https://github.com/apache/parquet-testing> for other implementations to
>> leverage?
>>
>> -Curt
>>
>> On Wed, Apr 29, 2026 at 4:11 PM PRATEEK GAUR <[email protected]> wrote:
>>
>>> Thanks Andrew and Micah for the review feedback on the two PRs:
>>> 1) (c++ arrow repo) https://github.com/apache/arrow/pull/48345/changes
>>> 2) (parquet-format repo)
>>> https://github.com/apache/parquet-format/pull/557
>>>
>>> I have addressed all comments on the two PRs (unless I missed
>>> something).
>>>
>>> Best
>>> Prateek
>>>
>>> On Sat, Apr 25, 2026 at 1:08 PM PRATEEK GAUR <[email protected]> wrote:
>>>
>>> > Thanks Andrew and Micah.
>>> >
>>> > `fair amount of feedback on at least the implementations`
>>> > For the C++ implementation I have already started addressing the
>>> > feedback; I should be done with that Monday/Tuesday.
>>> > I think Vinoo too has been making good progress on the Java
>>> > implementation.
>>> >
>>> > Best
>>> > Prateek
>>> >
>>> > On Sat, Apr 25, 2026 at 12:55 PM Andrew Lamb <[email protected]>
>>> > wrote:
>>> >
>>> >> Got it. Thank you for the clarification -- I will try and look into the
>>> >> spec and the Rust implementation[1] in this next week
>>> >>
>>> >> [1]: https://github.com/apache/arrow-rs/pull/9372
>>> >>
>>> >> On Sat, Apr 25, 2026 at 12:01 PM Micah Kornfield <
>>> [email protected]>
>>> >> wrote:
>>> >>
>>> >>> Hi Andrew,
>>> >>> I think there is a fair amount of feedback on at least the
>>> >>> implementations; typically I think we've waited until they are close to
>>> >>> mergeable before a final vote.  Otherwise I agree we are very close.
>>> >>>
>>> >>> -Micah
>>> >>>
>>> >>> On Saturday, April 25, 2026, Andrew Lamb <[email protected]>
>>> wrote:
>>> >>>
>>> >>>> Thanks Prateek,
>>> >>>>
>>> >>>> From this content it looks to me like we are ready to start a
>>> >>>> vote to explicitly accept ALP into Parquet.
>>> >>>>
>>> >>>> Does anyone know of a reason we should postpone it for longer?
>>> >>>> Perhaps someone needs some more time to review?
>>> >>>>
>>> >>>> Andrew
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Wed, Apr 22, 2026 at 1:00 PM PRATEEK GAUR <[email protected]>
>>> >>>> wrote:
>>> >>>>
>>> >>>>> Hi team,
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> Hope everyone is doing well. I got a chance to work through all the
>>> >>>>> remaining feedback and update the spec doc. Here are the new artifacts:
>>> >>>>>
>>> >>>>> 1) Spec document :
>>> >>>>>
>>> https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit
>>> >>>>>
>>> >>>>> 2) Spec document in parquet format repo :
>>> >>>>> https://github.com/apache/parquet-format/pull/557
>>> >>>>>
>>> >>>>> 3) ALP implementation in arrow c++ repo :
>>> >>>>> https://github.com/apache/arrow/pull/48345/changes
>>> >>>>>
>>> >>>>> 4) ALP implementation in parquet-java repo (work by Vinoo and Julien) :
>>> >>>>> https://github.com/apache/parquet-java/pull/3397
>>> >>>>>
>>> >>>>> 5) PR with test and benchmarking artifacts in parquet-testing repo :
>>> >>>>> https://github.com/apache/parquet-testing/pull/100
>>> >>>>>
>>> >>>>>
>>> >>>>> And
>>> >>>>>
>>> >>>>>
>>> >>>>>    - Go : Arnav just submitted an in progress implementation in Go.
>>> >>>>>    https://github.com/apache/arrow-go/pull/704 (I haven't started
>>> >>>>>    looking at it yet)
>>> >>>>>    - Rust : I remember Andrew mentioned that this work is also in
>>> >>>>>    progress (So 4 languages!)
>>> >>>>>
>>> >>>>>
>>> >>>>> *Arrow C++ implementation*
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> The PR is out and was also used by Antoine to report the numbers as
>>> >>>>> reported here. Micah and Konstantin have given one round of feedback
>>> >>>>> and I'm addressing it today. Please note that the default
>>> >>>>> optimization flag for compiling is O2 and not O3. I got around 70%
>>> >>>>> performance improvement in the decoding speed when using the O3 flag.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> *Parquet-MR Java implementation (working with Vinoo and Julien) and
>>> >>>>> Cross-Language testing*
>>> >>>>>
>>> >>>>>
>>> >>>>>    Let me know if you have any questions or feedback.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> Now pasting some performance numbers
>>> >>>>>
>>> >>>>>
>>> >>>>>   Table 1: C++ ALP double decode throughput, Spotify columns
>>> >>>>>   (Graviton 3, ARM Neoverse V1)
>>> >>>>>
>>> >>>>>   ┌──────────────────┬──────────────┬──────────────┬─────────┐
>>> >>>>>   │ Column           │  -O2 (MB/s)  │  -O3 (MB/s)  │ Speedup │
>>> >>>>>   ├──────────────────┼──────────────┼──────────────┼─────────┤
>>> >>>>>   │ valence          │     3,155    │     5,523    │  1.75x  │
>>> >>>>>   │ danceability     │     3,233    │     5,685    │  1.76x  │
>>> >>>>>   │ energy           │     3,197    │     5,652    │  1.77x  │
>>> >>>>>   │ loudness         │     3,186    │     5,473    │  1.72x  │
>>> >>>>>   └──────────────────┴──────────────┴──────────────┴─────────┘
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Wed, Feb 25, 2026 at 9:49 AM PRATEEK GAUR <[email protected]>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>>> @Micah Kornfield <[email protected]> : Got it.
>>> >>>>>>
>>> >>>>>> @Andrew Lamb <[email protected]>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>> Do you think it would be good to start moving the spec development
>>> >>>>>>> into markdown format, in preparation for finalizing it?
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>> Yes I'll update the numbers for some of the examples I have in the
>>> >>>>>> spec based
>>> >>>>>> on the updated header size. Then we should be good to go for the
>>> >>>>>> markdown format.
>>> >>>>>>
>>> >>>>>> Thanks everyone!
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>>
>>> >>>>>>> Andrew
>>> >>>>>>>
>>> >>>>>>> On Tue, Feb 17, 2026 at 7:28 PM PRATEEK GAUR <[email protected]
>>> >
>>> >>>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>> > Hi team,
>>> >>>>>>> >
>>> >>>>>>> > 1) Andrew
>>> >>>>>>> >
>>> >>>>>>> >    - Thanks for working on test files. My PR did add all the test
>>> >>>>>>> >    files I used to benchmark on datasets. Maybe we can combine them.
>>> >>>>>>> >    That will also aid cross-language testing.
>>> >>>>>>> >    - Kosta Tarasov is working on the Rust implementation. This is
>>> >>>>>>> >    great. Thanks.
>>> >>>>>>> >
>>> >>>>>>> >
>>> >>>>>>> > 2) Antoine
>>> >>>>>>> >
>>> >>>>>>> >    - Thanks a lot for reporting the numbers on AMD. Looks like you are
>>> >>>>>>> >    getting 8X the decoding performance of BSS. This is amazing!
>>> >>>>>>> >    - Thanks for acknowledging the sampling design.
>>> >>>>>>> >    - I agree with you on FastLanes. In some crude experiments I didn't
>>> >>>>>>> >    get a good perf benefit from it on Graviton3 (but maybe there was
>>> >>>>>>> >    something wrong with my implementation).
>>> >>>>>>> >    - Locking the 16-bit exception encoding for the spec in this case.
>>> >>>>>>> >    - Awesome, I think we have solved all open questions except the
>>> >>>>>>> >    version byte :). (I will get back on this soon.)
>>> >>>>>>> >
>>> >>>>>>> >
>>> >>>>>>> > 3) Micah
>>> >>>>>>> >
>>> >>>>>>> >    - FastLanes: The current spec does allow for using FastLanes with
>>> >>>>>>> >    the configurable enum value for layout. We should be able to
>>> >>>>>>> >    inject any layout in the current design.
>>> >>>>>>> >
>>> >>>>>>> >
>>> >>>>>>> > Working on resolving all remaining open comments on the spec this
>>> >>>>>>> > week.
>>> >>>>>>> >
>>> >>>>>>> > Best
>>> >>>>>>> > Prateek
>>> >>>>>>> >
>>> >>>>>>> >
>>> >>>>>>> > On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <
>>> >>>>>>> [email protected]>
>>> >>>>>>> > wrote:
>>> >>>>>>> >
>>> >>>>>>> > > On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <
>>> >>>>>>> [email protected]>
>>> >>>>>>> > > wrote:
>>> >>>>>>> > >
>>> >>>>>>> > > >
>>> >>>>>>> > > >
>>> >>>>>>> > > > It looks like the actual issue described for ORC in the paper is
>>> >>>>>>> > > > that it has multiple sub-encodings in a batch.  This is different
>>> >>>>>>> > > > from the design proposed here, where there is still a fixed
>>> >>>>>>> > > > encoding per page in Parquet.  Given reasonably sized pages I
>>> >>>>>>> > > > don't think branch misprediction should be a big issue for new
>>> >>>>>>> > > > encodings.  I agree that we should be conservative in general
>>> >>>>>>> > > > about adding new encodings.
>>> >>>>>>> > > >
>>> >>>>>>> > > >
>>> >>>>>>> > > +1
>>> >>>>>>> > >
>>> >>>>>>> >
>>> >>>>>>>
>>> >>>>>>
>>>
>>
