Notes from the meeting:
https://docs.google.com/document/d/e/2PACX-1vSDHW7gvG8eO6aIxaIVPrZSqYYhtRDb5W1imnbpM4QRYNPsTwEO1fU5z7SEhVIFa4YqWJeSRJ9tcXYS/pub
Gemini notes:
https://docs.google.com/document/d/e/2PACX-1vS89gT-RggoNMMuL2Gz4yTUkqYkJvX4_NeFo1fD5zdWuJJD7W6-LTcEf2cVNxR78w7E6hMmWZrX3psi/pub


Attendees:

   - Micah Kornfield - Databricks
   - Ahmar Suhail - AWS S3 - Parquet input stream optimisations
   - Rok Mihevc - G-Research - listening in, FlatBuffers footer
   - Andrew Lamb - InfluxData - listening in
   - Jiayi Wang - Databricks - footer v3
   - Fokko Driesprong - Databricks - getting people enthusiastic to vote!
   - Steve Loughran - Cloudera - listening in
   - Anurag Mantripragada - Apple - listening in; working on a proposal
     for column updates in Iceberg
   - Martin Prammer - CMU - listening in
   - Julien Le Dem - Datadog - interested in an update on new encodings
     and the footer
   - Alkis will join later
   - Russell Spitzer - Snowflake - listening in
   - Kenny Daniel - Hyparquet
   - Arnav Balyan - Uber


Notes/Agenda

   - 1.17.0 Parquet Java Release Vote (devlist
     <https://lists.apache.org/thread/st0w1c6txzmrwwbdsqdh7xm2br5b2js3>)
      - Please take a look and vote on the thread!
      - This is an important release.

   - File Path Status (https://github.com/apache/parquet-format/pull/542)
      - Micah: heads up that we are close to finalizing this. If you have
        comments, please chime in now.
      - TODO: follow up with Dan Weeks.
      - The field is used only in the _SUMMARY file.
      - We want to clarify its current use without precluding future use.
      - Using it in the future would require a formal proposal.
         - There is no consensus yet on whether this functionality belongs
           in Parquet or in Iceberg.
      - Interested in making more progress on this: Anurag, Kenny, Dan
         - (updating only a subset of columns, adding columns without
           rewriting everything, …)
         - Martin: the Iceberg File API might bring more clarity.
         - The LanceDB approach might be interesting.

   - Implementation Status
     (https://parquet.apache.org/docs/file-format/implementationstatus/#read-support-by-year)
      - TODO:
         - Add vendor support.
         - Update whether hyparquet supports Variant.

   - Parquet Java input stream optimizations:
      - Ahmar: see the email thread
        <https://lists.apache.org/thread/nbksq32cs8h1ldj8762y6wh9zzp8gqx6> on
        the list.
      - Goal: improve query performance when reading Parquet from S3.
         - Pre-fetching, …
         - Google has a similar project.
      - See document: Analytics Accelerator Library for Amazon S3 and Iceberg
        <https://docs.google.com/document/d/13shy0RWotwfWC_qQksb95PXdi-vSUCKQyDzjoExQEN0/edit?tab=t.0#heading=h.3lc3p7s26rnw>
      - Process:
         - Use GitHub issues and PRs to contribute this to the parquet-java
           project.
         - This does not change the format, so it doesn’t need to go through
           the proposal process.

   - New Encodings update:
      - Arnav: FSST.
         - Thank you, Micah, for the reviews.
         - Discussed how to represent the symbol table in the format.
         - In short: represent it the same way the existing library does.
         - Micah: we need the PR at the spec level before we merge
           implementations in Arrow.
         - Andrew wants to review the Rust PRs. We need a volunteer whose
           code Andrew can review.
      - Micah: progress is happening on ALP; we’ll check with Prateek next
        time.
         - ALP PR sent out (Antoine to review).
         - Spec is ready and attached to the PR itself:
            - https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit?usp=drivesdk&ouid=115352218487952916223&rtpof=true&sd=true
            - Quite complete, with strong value.
            - The spec is detailed and includes pseudocode to help replicate
              it in other languages.

   - CLI support for encrypted files is now available in parquet-java.

   - FlatBuffers Footer
      - FlatBuffers format PR:
        https://github.com/apache/parquet-format/pull/544
         - Remaining items:
            - Encryption is missing.
            - A TODO needs cleanup.
            - Comments and documentation need to be updated.
      - Jiayi: PR in arrow-cpp:
        https://github.com/apache/arrow/pull/48431
      - Rok is working on the Rust implementation:
        https://github.com/apache/arrow-rs/pull/9042
      - We need PR reviews to move forward.
      - We need:
         - A rollout plan:
            - Initially off by default.
            - A flag to turn it on (a compatible change, but it increases
              the file size).
            - Eventually, it becomes the only footer (an incompatible
              change).
         - Consistency validation between the old and new footers.
         - Parquet files for compatibility testing (TBD):
            - https://github.com/apache/parquet-testing


On Tue, Jan 6, 2026 at 5:11 PM Julien Le Dem wrote:

> The next Parquet sync is tomorrow Wednesday Jan 7th at 10am PT - 1pm ET -
> 7pm CET
>
> To join the invite, join the group:
> https://groups.google.com/g/apache-parquet-community-sync
>
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>
