Notes from the meeting: https://docs.google.com/document/d/e/2PACX-1vSDHW7gvG8eO6aIxaIVPrZSqYYhtRDb5W1imnbpM4QRYNPsTwEO1fU5z7SEhVIFa4YqWJeSRJ9tcXYS/pub
Gemini notes: https://docs.google.com/document/d/e/2PACX-1vS89gT-RggoNMMuL2Gz4yTUkqYkJvX4_NeFo1fD5zdWuJJD7W6-LTcEf2cVNxR78w7E6hMmWZrX3psi/pub
Attendees:
- Micah Kornfield - Databricks
- Ahmar Suhail - AWS S3 - Parquet input stream optimisations
- Rok Mihevc - GResearch - listening in, flatbuffers footer
- Andrew Lamb - InfluxData - listening in
- Jiayi Wang - Databricks, footer v3
- Fokko Driesprong - Databricks: Getting people enthusiastic to vote!
- Steve Loughran - Cloudera: listening in
- Anurag Mantripragada - Apple - listening in, working on a proposal for column updates in Iceberg
- Martin Prammer - CMU - listening in
- Julien Le Dem - Datadog. Interested in an update on new encodings and footer.
- Alkis will join later
- Russell Spitzer - Snowflake - listening in
- Kenny Daniel - Hyparquet
- Arnav Balyan - Uber

Notes/Agenda
- 1.17.0 Parquet Java Release Vote (devlist <https://lists.apache.org/thread/st0w1c6txzmrwwbdsqdh7xm2br5b2js3>)
  - Please take a look and vote on the thread!
  - Important release.
- File Path Status (https://github.com/apache/parquet-format/pull/542)
  - Micah: heads up that we are close to finalizing this. If you have comments, please chime in now.
  - TODO: follow up with Dan Weeks
  - The field is used only in the _SUMMARY file.
  - We want to clarify current use without precluding future use.
  - To use it in the future we would need a formal proposal.
  - There is no consensus yet on whether this functionality should be in Parquet or in Iceberg.
  - Interested in making more progress on this: Anurag, Kenny, Dan
    - (updating only a subset of columns, adding columns without rewriting everything, …)
  - Martin: The Iceberg file API might bring more clarity.
  - The LanceDB approach might be interesting.
- Implementation Status (https://parquet.apache.org/docs/file-format/implementationstatus/#read-support-by-year)
  - TODO:
    - Add vendor support.
    - Update whether hyparquet supports variant.
- Parquet Java input stream optimizations:
  - Ahmar: see email thread <https://lists.apache.org/thread/nbksq32cs8h1ldj8762y6wh9zzp8gqx6> on the list.
  - Improve query performance reading Parquet on S3.
  - Pre-fetching, …
  - Google has a similar project.
  - See document: Analytics Accelerator Library for Amazon S3 and Iceberg <https://docs.google.com/document/d/13shy0RWotwfWC_qQksb95PXdi-vSUCKQyDzjoExQEN0/edit?tab=t.0#heading=h.3lc3p7s26rnw>
  - Process:
    - Use GitHub issues and PRs to contribute this in the parquet-java project.
    - This does not change the format, so it doesn't need to go through the proposal process.
- New Encodings update:
  - Arnav: FSST.
    - Thank you Micah for the reviews.
    - How to represent the symbol table in the format:
      - In short: do it the same way the existing library does.
    - Micah: we need the PR at the spec level before we merge implementations in Arrow.
    - Andrew wants to review the Rust PRs; we need a volunteer to have their code reviewed by Andrew.
  - Micah: progress is happening on ALP; we'll check with Prateek next time.
    - ALP PR sent out (Antoine to review).
    - Spec ready, attached in the PR itself:
      - https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit?usp=drivesdk&ouid=115352218487952916223&rtpof=true&sd=true
    - Quite complete and strong value.
    - The spec is detailed and has pseudocode to help replicate it in other languages.
- Encrypted CLI now supported in parquet-java.
- Flatbuffers Footer
  - Flatbuffer format PR: https://github.com/apache/parquet-format/pull/544
    - Remaining items:
      - Encryption is missing.
      - A TODO needs to be cleaned up.
      - Comments and documentation need to be updated.
  - Jiayi: PR in arrow-cpp:
    - https://github.com/apache/arrow/pull/48431
  - Rok is working on the Rust implementation: https://github.com/apache/arrow-rs/pull/9042
  - We need a PR review to move forward.
  - We need:
    - Roll-out plan:
      - Initially off by default.
      - Flag to turn it on (compatible change, but increases the file size).
      - Eventually, it becomes the only footer (incompatible change).
    - Consistency validation between the old and new footer.
    - Parquet files for compatibility testing (TBD):
      - https://github.com/apache/parquet-testing

On Tue, Jan 6, 2026 at 5:11 PM Julien Le Dem wrote:
> The next Parquet sync is tomorrow Wednesday Jan 7th at 10am PT - 1pm ET -
> 7pm CET
>
> To join the invite, join the group:
> https://groups.google.com/g/apache-parquet-community-sync
>
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>
