Notes from the meeting:
https://docs.google.com/document/d/e/2PACX-1vSDHW7gvG8eO6aIxaIVPrZSqYYhtRDb5W1imnbpM4QRYNPsTwEO1fU5z7SEhVIFa4YqWJeSRJ9tcXYS/pub
Attendees:

   -

   Micah Kornfield - Databricks - Listening in
   -

   Neelesh Salian - Apple - Variant related items
   -

   Robert Kruszewski - Spiral - Listening in
   -

   Martin Prammer - Spiral - Listening in
   -

   Gunnar Morling - Confluent - Listening in
   -

   Kenny Daniel - Hyperparam - Listening
   -

   Divjot Arora - Databricks - Flatbuf footer
   -

   Jiayi Wang - backward-compatible VS incompatible changes (part of
   flatbuf discussion)
   -

   Ismaël Mejía - Microsoft - Java Encoding/Decoding perf
   -

   Anurag Mantripragada - Apple - Listening in - Variant stuff


   -

   Rok Mihevc: G-Research/Arctos Alliance <https://arctosalliance.org/>,
   Flatbuffers, FIXED_SIZE_LIST/VECTOR proposal
   -

   Prateek - Snowflake - Listening in
   -

   Benjamin Owad - Snowflake - Listening in


   -

   Dusan Paripovic - RTE - Listening in
   -

   Will Edwards - Spotify - Listening in
   -

   Raúl Cumplido - QuantStack - Listening in
   -

   Steve Loughran: Variant performance update (good!)
   -

   Mengmeng Chen - Snowflake - listening in
   -

   Rahil Chertara - Onehouse - listening in


Agenda:

   -

   [Neelesh Salian + Steve Loughran] Variant related items
   -

      Iceberg - Variant Community Update
      <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?tab=t.froqj7pg3868#heading=h.r977qio1wsv2>
      (Parquet items as well)
      -

      See doc for Iceberg, Spark and Parquet related items
      -

      PRs open for lazy caching… (
      https://github.com/apache/parquet-java/pull/3481 )
      -

      If you want to help, please reach out! Help welcome. Tracker and
      benchmark in the doc.
      -

   [Ismael] Java Encoding/Decoding ask for review
   -

      Experimenting with improving open source libraries with AI.
      -

      Based on existing benchmarks.
      -

      Performance tests and PRs.
      -

      Average 40% improvement on encodings (write path).
      -

      10% on read path.
      -

      PRs have been reviewed by Ismaël; they are not just AI-generated.
      -

      Need help with reviews from maintainers.
      -

         https://github.com/apache/parquet-java/pull/3512
         -

      Gunnar: I've been working on a new Parquet parser (presented it to
      the group a few weeks back, https://github.com/hardwood-hq/hardwood);
      solely focused on parsing (i.e. decoding) at the moment. Would love to
      learn about any improvements in that area, will check out your PRs.
      -

   [Divjot + Jiayi + Rok] Flatbuffer footer
   -

      Reference to the mailing list thread on building backward-compatible
      indices on the thrift footer.
      -

      Goal to give faster random access in metadata.
      -

      2 options:
      -

         Incremental updates: Index on footer + reducing bloat by removing
         less useful metadata.
         -

            PR <https://github.com/apache/parquet-format/pull/564> to make
            path_in_schema optional
            -

         Bigger rewrite with roll out plan: New Flatbuffer based footer.
         -

      Open items:
      -

         Handling thrift schema evolution, making fields optional to
         deprecate.
         -

         Discuss increased complexity of thrift jump tables.
         -

         Finalizing plan for the flatbuffer footer.
         -

            Flatbuffer at prototype state?
            -

            Proposal:
            -

               1) replace everything as in the current proposal
               -

               2) make it minimal and more modular with extensions.
               -

         We have some internal benchmarks that show that most footers are
         actually smaller when using FlatBuffers after removing bloat (less
         useful fields). If there are public e2e benchmarks, let me know.
         But of course, only readers that adopt the flatbuf footer can
         benefit from it.
         -

         Kenny: That assumes making the breaking change of dropping thrift.
         If we stay in a backward compat world then we need both flat and
         thrift. That makes files (and parsers) much larger and more
         complicated. I personally hate the idea of dropping thrift as it
         will break a lot of systems. Making a big breaking change is an
         existential risk to parquet... if it's going to be a hard break
         why wouldn't users consider alternatives at that point? I like the
         idea of optimizing thrift much more than flatbuffer, personally.
         -

         Gunnar Morling: Yeah, similar sentiment here
         -

         Robert: How about embedding Vortex?
         -

            Stated goal not to embed opaque encodings, schemes.
            -

            Embed vortex flatbuffer footer
            -

               Readers who can parse the footer can treat the opaque
               encoding as transparent
               -

            Input from other projects is welcome.
            -

      TODO:
      -

         Shared doc to articulate
         -

            Jiayi, Divjot, Will, Gunnar, Alkis, Robert, Rok
            -

            Content:
            -

               Describe the problem: large footer, wide schema
               -

                  Can have big footer with many row groups as well.
                  -

                  Describe what’s pathological
                  -

               Describe the options at a high level, point to detailed docs
               of POC/proposals.
               -

            Useful to share files with the problem.
            -

               Difficult
               -

         Regular meeting. Jiayi: facilitator
         -

   [Rok] FIXED_SIZE_LIST/VECTOR proposal
   -

      This is still ongoing.
      -

      3 options, will write a doc and report to the mailing list.
      -

      Use case: efficiently store Vectors
      -

      Micah: how about adding a 4th option: a new logical type vector that
      annotates the existing FLBA (FIXED_LEN_BYTE_ARRAY) type (?), so that
      readers know they don't have to read repetition levels.
      -

         Rahil: similar to what is being done in Hudi.
         -

         Need to discuss dense vectors vs sparse vectors.
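
To make the 4th option above more concrete, here is a rough sketch of why
a vector stored as one fixed-width value needs no repetition levels: every
value has the same byte width, so element i sits at a computable offset.
This is not Parquet library code and not part of any agreed proposal. The
dimension, the little-endian float32 layout, and all function names are
assumptions made for illustration only.

```python
import struct

# Hypothetical parameters: a 4-dimensional dense vector of float32 values.
DIM = 4
WIDTH = DIM * 4  # bytes per vector, i.e. the fixed length of each value


def pack_vector(vec):
    """Pack one dense vector into a fixed-width byte string (assumed layout)."""
    if len(vec) != DIM:
        raise ValueError(f"expected {DIM} elements, got {len(vec)}")
    return struct.pack(f"<{DIM}f", *vec)


def read_vector(column_bytes, i):
    """Read the i-th vector directly by offset; no per-element levels needed."""
    start = i * WIDTH
    chunk = column_bytes[start:start + WIDTH]
    if len(chunk) != WIDTH:
        raise IndexError(f"vector {i} is out of range")
    return list(struct.unpack(f"<{DIM}f", chunk))


# Fixed-width values laid back to back, with no length prefixes.
vectors = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.25, 0.125, 0.0]]
column = b"".join(pack_vector(v) for v in vectors)
second = read_vector(column, 1)
```

A sparse vector does not fit this layout directly, which is one reason the
dense vs sparse question above still needs discussion.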


On Tue, Apr 21, 2026 at 2:53 PM Julien Le Dem <[email protected]> wrote:

> The next Parquet sync is tomorrow Wednesday Apr 22nd at 10am PT - 1pm ET
> - 7pm CET
>
> To join the invite, join the group:
> https://groups.google.com/g/apache-parquet-community-sync
>
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>
