No, this was not discussed.

On Wed, Apr 22, 2026 at 8:11 PM Manu Zhang <[email protected]> wrote:
> Hi Julien,
>
> Thanks for the meeting notes. I wasn't able to attend. Did you discuss a
> new parquet-java release?
>
> Regards,
> Manu
>
> On Thu, Apr 23, 2026 at 7:02 AM Julien Le Dem <[email protected]> wrote:
>
> > Notes from the meeting:
> > https://docs.google.com/document/d/e/2PACX-1vSDHW7gvG8eO6aIxaIVPrZSqYYhtRDb5W1imnbpM4QRYNPsTwEO1fU5z7SEhVIFa4YqWJeSRJ9tcXYS/pub
> >
> > Attendees:
> >
> > - Micah Kornfield - Databricks - Listening in
> > - Neelesh Salian - Apple - Variant related items
> > - Robert Kruszewski - Spiral - Listening in
> > - Martin Prammer - Spiral - Listening in
> > - Gunnar Morling - Confluent - Listening in
> > - Kenny Daniel - Hyperparam - Listening
> > - Divjot Arora - Databricks - Flatbuf footer
> > - Jiayi Wang - backward-compatible VS incompatible changes (part of
> >   flatbuf discussion)
> > - Ismaël Mejía - Microsoft - Java Encoding/Decoding perf
> > - Anurag Mantripragada - Apple - Listening in - Variant stuff
> > - Rok Mihevc: G-Research/Arctos Alliance <https://arctosalliance.org/>,
> >   Flatbuffers, FIXED_SIZE_LIST/VECTOR proposal
> > - Prateek - Snowflake - Listening in
> > - Benjamin Owad - Snowflake - Listening in
> > - Dusan Paripovic - RTE, listening in
> > - Will Edwards - Spotify - Listening in
> > - Raúl Cumplido - QuantStack - Listening in
> > - Steve Loughran: Variant performance update (good!)
> > - Mengmeng Chen - Snowflake - listening in
> > - Rahil Chertara - Onehouse - listening in
> >
> > Agenda:
> >
> > - [Neelesh Salian + Steve Loughran] Variant related items
> >   - Iceberg - Variant Community Update
> >     <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?tab=t.froqj7pg3868#heading=h.r977qio1wsv2>
> >     (Parquet items as well)
> >   - See doc for Iceberg, Spark and Parquet related items
> >   - PRs open for lazy caching… (https://github.com/apache/parquet-java/pull/3481)
> >   - If you want to help, please reach out! Help welcome. Tracker and
> >     benchmark in the doc.
> > - [Ismael] Java Encoding/Decoding ask for review
> >   - Experimenting with improving open source libraries with AI.
> >   - Based on existing benchmarks.
> >   - Performance tests and PRs.
> >   - Avg 40% improvement on encodings (write path).
> >   - 10% on read path.
> >   - PRs have been reviewed by Ismael: not just AI generated.
> >   - Need help with reviews from maintainers.
> >   - https://github.com/apache/parquet-java/pull/3512
> >   - Gunnar: I've been working on a new Parquet Parser (presented it to
> >     the group a few weeks back, https://github.com/hardwood-hq/hardwood);
> >     solely focused on parsing atm., i.e. decoding. Would love to learn
> >     about any improvements in that area, will check out your PRs.
> > - [Divjot + Jiayi + Rok] Flatbuffer footer
> >   - Ref to mailing list thread regarding building bw compatible indices
> >     on thrift footer.
> >   - Goal to give faster random access in metadata.
> >   - 2 options:
> >     - Incremental updates: Index on footer + reducing bloat by removing
> >       less useful metadata.
> >       - PR <https://github.com/apache/parquet-format/pull/564> to make
> >         path_in_schema optional
> >     - Bigger rewrite with roll out plan: New Flatbuffer based footer.
> >   - Open items:
> >     - Handling thrift schema evolution, making fields optional to
> >       deprecate.
> >     - Discuss increased complexity of thrift jump tables.
> >     - Finalizing plan for the flatbuffer footer.
> >     - Flatbuffer at prototype state?
> >   - Proposal:
> >     - 1) replace everything as in the current proposal
> >     - 2) make it minimal and more modular with extensions.
> >   - We have some internal benchmarks that show that most footers are
> >     actually smaller when using FlatBuffers after removing bloat
> >     (unuseful fields). If there are some public e2e benchmarks, let me
> >     know. But of course, only readers that adopt flatbuf footer can
> >     benefit from it.
> >   - Kenny: That assumes making the breaking change of dropping thrift.
> >     If we stay in a backward compat world then we need both flat and
> >     thrift. That makes files (and parsers) much larger and more
> >     complicated. I personally hate the idea of dropping thrift as it
> >     will break a lot of systems. Making a big breaking change is an
> >     existential risk to parquet... if it's going to be a hard break, why
> >     wouldn't users consider alternatives at that point? I like the idea
> >     of optimizing thrift much more than flatbuffer, personally.
> >   - Gunnar Morling: Yeah, similar sentiment here
> >   - Robert: How about embedding Vortex?
> >     - Stated goal not to embed opaque encodings, schemes.
> >     - Embed vortex flatbuffer footer
> >     - Readers who can parse the footer can treat the opaque encoding as
> >       transparent
> >   - Input from other projects is welcome.
> >   - TODO:
> >     - Shared doc to articulate
> >       - Jiayi, Divjot, Will, Gunnar, Alkis, Robert, Rok
> >       - Content:
> >         - Describe the problem: large footer, wide schema
> >           - Can have big footer with many row groups as well.
> >         - Describe what’s pathological
> >         - Describe the options at a high level, point to detailed docs
> >           of POC/proposals.
> >         - Useful to share files with the problem.
> >           - Difficult
> >     - Regular meeting. Jiayi: facilitator
> > - [Rok] FIXED_SIZE_LIST/VECTOR proposal
> >   - This is still ongoing.
> >   - 3 options, will write a doc and report to the mailing list.
> >   - Use case: efficiently store Vectors
> >   - Micah: how about adding a 4th option: new logical type vector that
> >     annotates the existing FLBA type (?) => know you don’t have to read
> >     Repetition Levels.
> >   - Rahil: similar to what is being done in Hudi.
> >   - Need to discuss dense vectors vs sparse vectors.
> >
> > On Tue, Apr 21, 2026 at 2:53 PM Julien Le Dem <[email protected]> wrote:
> >
> > > The next Parquet sync is tomorrow Wednesday Apr 22nd at 10am PT - 1pm ET
> > > - 7pm CET
> > >
> > > To join the invite, join the group:
> > > https://groups.google.com/g/apache-parquet-community-sync
> > >
> > > Everybody is welcome, bring your topic or just listen in.
> > >
> > > (Some more details on how the meeting is run:
> > > https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
