Hi Julien,

Thanks for the meeting notes. I wasn't able to attend. Did you discuss a new parquet-java release?

Regards,
Manu

On Thu, Apr 23, 2026 at 7:02 AM Julien Le Dem <[email protected]> wrote:

> Notes from the meeting:
> https://docs.google.com/document/d/e/2PACX-1vSDHW7gvG8eO6aIxaIVPrZSqYYhtRDb5W1imnbpM4QRYNPsTwEO1fU5z7SEhVIFa4YqWJeSRJ9tcXYS/pub
>
> Attendees:
>
> - Micah Kornfield - Databricks - Listening in
> - Neelesh Salian - Apple - Variant related items
> - Robert Kruszewski - Spiral - Listening in
> - Martin Prammer - Spiral - Listening in
> - Gunnar Morling - Confluent - Listening in
> - Kenny Daniel - Hyperparam - Listening
> - Divjot Arora - Databricks - Flatbuf footer
> - Jiayi Wang - backward-compatible VS incompatible changes (part of flatbuf discussion)
> - Ismaël Mejía - Microsoft - Java Encoding/Decoding perf
> - Anurag Mantripragada - Apple - Listening in - Variant stuff
> - Rok Mihevc: G-Research/Arctos Alliance <https://arctosalliance.org/>, Flatbuffers, FIXED_SIZE_LIST/VECTOR proposal
> - Prateek - Snowflake - Listening in
> - Benjamin Owad - Snowflake - Listening in
> - Dusan Paripovic - RTE, listening in
> - Will Edwards - Spotify - Listening in
> - Raúl Cumplido - QuantStack - Listening in
> - Steve Loughran: Variant performance update (good!)
> - Mengmeng Chen - Snowflake - listening in
> - Rahil Chertara - Onehouse - listening in
>
> Agenda:
>
> - [Neelesh Salian + Steve Loughran] Variant related items
>   - Iceberg - Variant Community Update
>     <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?tab=t.froqj7pg3868#heading=h.r977qio1wsv2>
>     (Parquet items as well)
>   - See doc for Iceberg, Spark and Parquet related items
>   - PRs open for lazy caching… (https://github.com/apache/parquet-java/pull/3481)
>   - If you want to help, please reach out! Help welcome. Tracker and benchmark in the doc.
> - [Ismael] Java Encoding/Decoding ask for review
>   - Experimenting with improving open source libraries with AI.
>   - Based on existing benchmarks.
>   - Performance tests and PRs.
>   - Avg 40% improvement on encodings. (write path)
>   - 10% on read path.
>   - PRs have been reviewed by ismael: not just ai generated.
>   - Need help with reviews from maintainers.
>   - https://github.com/apache/parquet-java/pull/3512
>   - Gunnar: I've been working on a new Parquet Parser (presented it to the group a few weeks back, https://github.com/hardwood-hq/hardwood); solely focused on parsing atm., i.e. decoding. Would love to learn about any improvements in that area, will check out your PRs.
> - [Divjot + Jiayi + Rok] Flatbuffer footer
>   - Ref to mailing list thread regarding building bw compatible indices on thrift footer.
>   - Goal to give faster random access in metadata.
>   - 2 options:
>     - Incremental updates: Index on footer + reducing bloat by removing less useful metadata.
>       - PR <https://github.com/apache/parquet-format/pull/564> to make path_in_schema optional
>     - Bigger rewrite with roll out plan: New Flatbuffer based footer.
>   - Open items:
>     - Handling thrift schema evolution, making fields optional to deprecate.
>     - Discuss increased complexity of thrift jump tables.
>     - Finalizing plan for the flatbuffer footer.
>   - Flatbuffer at prototype state?
>   - Proposal:
>     - 1) replace everything as in the current proposal
>     - 2) make it minimal and more modular with extensions.
>   - We have some internal benchmarks that show that most footers are actually smaller when using FlatBuffers after removing bloat unuseful fields. If there's some public e2e benchmarks, let me know. But of course, only readers that adopt flatbuf footer can benefit from it.
>     - Kenny: That assumes making the breaking change of dropping thrift.
>       If we stay in a backward compat world then we need both flat and thrift. That makes files (and parsers) much larger more complicated. I personally hate the idea of dropping thrift as it will break a lot of systems. Making a big breaking change is an existential risk to parquet... if its going to be a hard break why wouldnt users consider alternatives at that point? I like the idea of optimizing thrift much more than flatbuffer, personally.
>       - Gunnar Morling: Yeah, similar sentiment here
>   - Robert: How about embedding Vortex?
>     - Stated goal not to embed opaque encodings, schemes.
>     - Embed vortex flatbuffer footer
>       - Readers who can parse the footer can treat the opaque encoding as transparent
>   - Input from other projects is welcome.
>   - TODO:
>     - Shared doc to articulate
>       - Jiayi, Divjot, Will, Gunnar, Alkis, Robert, Rok
>       - Content:
>         - Describe the problem: large footer, wide schema
>           - Can have big footer with many row groups as well.
>         - Describe what’s pathological
>         - Describe the options at a high level, point to detailed docs of POC/proposals.
>         - Useful to share files with the problem.
>           - Difficult
>     - Regular meeting. Jiayi: facilitator
> - [Rok] FIXED_SIZE_LIST/VECTOR proposal
>   - This is still ongoing.
>   - 3 options, will write a doc and report to the mailing list.
>   - Use case: efficiently store Vectors
>   - Micah: how about adding a 4th option: new logical type vector that annotates the existing FLBA type (?) => know you don’t have to read Repetition Levels.
>   - Rahil: similar to what is being done in Hudi.
>   - Need to discuss dense vectors vs sparse vectors.
>
> On Tue, Apr 21, 2026 at 2:53 PM Julien Le Dem <[email protected]> wrote:
>
> > The next Parquet sync is tomorrow Wednesday Apr 22nd at 10am PT - 1pm ET - 7pm CET
> >
> > To join the invite, join the group:
> > https://groups.google.com/g/apache-parquet-community-sync
> >
> > Everybody is welcome, bring your topic or just listen in.
> >
> > (Some more details on how the meeting is run:
> > https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
