Attendees and topics:
- Micah (Databricks): follow-up on guidelines for new encodings
- Andrew Lamb (InfluxData): Rust Parquet maintainer; discuss Variant binary
- Kenny: HyParquet (JS)
- Dewey (Wherobots): C++ geometry, Feng Java impl. Update on Geo
- Adam (G-Research OSS): ParquetSharp. Work with Rok, key management tools API for arrow-rs
- Talat (Google BigQuery)
- Raul (QuantStack): Arrow Parquet C++
- Rok: KMS question for encryption, Variant
- Fokko (DB)
- Gabor (Dremio): Variant, parquet-java/avro module CVE
- Dan (Databricks): Variant, geo types (Iceberg dependency)
- Martin (CMU): Variant in Rust
- Jiaying (CMU): Rust Arrow, Parquet
- Aihua (Snowflake): Variant
- Gene (Databricks): Variant
- Steve Loughran

Notes
- parquet-java/avro
  - We need a proper way to limit the risk from reflection
  - Need Avro expertise
  - Options: remove the functionality, or force opt-in via a system property.
  - dev-list: https://lists.apache.org/thread/c91s61tqkbbrc7xj180xh2rx89yx8pfk
  - Avro GitHub issues:
    - https://github.com/apache/parquet-java/issues/3194
    - https://github.com/apache/parquet-java/issues/3195
- Encryption KMS
  - KMS metadata format is not officially standardised but is used in parquet-java and C++/PyArrow
  - Current spec: https://parquet.apache.org/docs/file-format/data-pages/encryption/#43-key-metadata
  - PR to add KMS API to arrow-rs: https://github.com/apache/arrow-rs/pull/7387
  - Possibly maintained externally to start with rather than in arrow-rs.
- Update on geometry types
  - 2 PRs being reviewed
  - All inconsistencies between Java and C++ are now resolved
    - Ex: Null
    - Canonical way to represent totally empty things
  - Need Arrow in and out of that.
- Variant
  - Rust: https://github.com/apache/arrow-rs/pull/7404
    - See epic https://github.com/apache/arrow-rs/issues/6736
  - Discussion on how to fail early when encountering an unknown version of the Variant spec.
  - Testing binary compatibility [Andrew]
    - Made a PR with example binary Variants: https://github.com/apache/parquet-testing/issues/75
  - Existing implementations:
    - Spark can read/write Variant
    - Iceberg implementation to read Variant binary (Java)
    - Go: https://github.com/apache/arrow-go/pull/344/files
    - Logical type: https://github.com/apache/parquet-java/pull/3072
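
As an illustration of the "fail early" point in the Variant notes, here is a minimal Python sketch. It assumes the metadata header layout from the Variant encoding spec, where the low 4 bits of the first metadata byte hold the version and the only defined version is 1. The function and exception names are hypothetical and are not taken from any of the implementations discussed in the meeting.

```python
# Sketch only: reject Variant metadata whose version we do not understand
# before attempting to decode anything else.

SUPPORTED_VARIANT_VERSION = 1  # assumption: only version 1 is defined
VERSION_MASK = 0x0F            # version occupies bits 0-3 of the header byte


class UnsupportedVariantVersion(ValueError):
    """Hypothetical error type raised for unknown Variant spec versions."""


def check_variant_metadata_version(metadata: bytes) -> int:
    """Return the version from the metadata header, or raise if unsupported."""
    if len(metadata) == 0:
        raise ValueError("Variant metadata is empty; no header byte to inspect")
    version = metadata[0] & VERSION_MASK
    if version != SUPPORTED_VARIANT_VERSION:
        raise UnsupportedVariantVersion(
            f"Variant metadata version {version} is not supported "
            f"(expected {SUPPORTED_VARIANT_VERSION})"
        )
    return version
```

A reader could call this once per metadata buffer before decoding the dictionary or any values, so that data written with a newer spec version produces a clear error instead of being misread.
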
Action items
- [Gabor and Steve] to follow up on the list on restricting more Avro deserialization.
- Follow up on adding files generated by different implementations of the Variant spec.

On Tue, Apr 15, 2025 at 4:07 PM Julien Le Dem <[email protected]> wrote:
> The next Parquet sync is tomorrow Apr 16th at 10am PT - 1pm ET - 7pm CET
> To join the invite:
> https://calendar.app.google/rCLANWLz1xg69mTL7
> Please contact me to be added to the recurring invite. (every two weeks)
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>
> Best
> Julien
>