Please find notes attached:
Attendees:
- Micah Kornfield: Databricks
- Adam Reeve: GResearch - Encryption in Rust
- Adrian Garcia Badaracco: Pydantic
- Aditya Bhatnagar: CMU
- Aihua Xu: Snowflake
- Alex Stephen: Google
- Brian Hulette: Google
- Daniel Weeks: Databricks
- Gijs Burghoorn: Polars
- Jeff Plaisance: Snowflake
- Martin Prammer: CMU
- Prateek Gaur: Snowflake
- Raul Cumplido: Quantstack
- Rok Mihevic: GResearch
- Ryan Blue: Databricks
- Sandeep Gottimukkala: Snowflake
- Yun Zou: Snowflake
Agenda:
- Int96 sort order specified
  - Explicit (a new sort order should be added) vs. implicit (rely on
    version string parsing)
  - Need to check whether int96 was ever produced. Implicit might be
    OK in that case; otherwise an explicit sort order would be preferred.
- Variant update
  - Writing for Avro/Variant (and shredding), with test cases in Java,
    is merged
  - Go progress:
    - Write binary values
    - Ability to shred, but no control over shredding
    - Touch base to see if this is finalized
  - Testing
    - Existing issue for test data
      <https://github.com/apache/parquet-testing/issues/75>
    - Ryan to follow up with the Databricks team on the test issue
    - Shredded and read back
    - Non-shredded and read back
    - Shred everything to shredded schemas for other types and read it
      back
  - Release
    - Would be nice for Iceberg.
    - Need a volunteer. (Aihua volunteered; need to work through
      logistics.)
- Interval type
  - Current proposal:
    - YearMonth - annotates int32
    - DurationNanos - annotates int64
- New footer (wide schema)
  - I/O is still important (need to make sure size is equivalent)
  - Stats - make it possible to parse only subsections; should the
    footer be subdivided earlier?
  - Forwards/backwards compatibility should also be considered in the
    design