Hey everyone,
Here are the notes from the sync earlier today.
Attendees:
- Micah: Google, listening in
- Julien: Datadog, interested in updates
- Ashish: listening in
- Fokko: Databricks, listening in
- Rok: Datatart
- Joe: GoodData, listening in
- Claire: Spotify
- Ryan: Databricks, Variant shredding
Agenda:
- Variant shredding spec
- Geotype (skipped due to limited Geo audience)
Notes:
- Variant:
  - Micah:
    - The issues raised on the mailing list
      <https://lists.apache.org/thread/07jpgltw3gpm9lcy72zos717mj54yzwq>
      are not blocking.
    - TODO: get back to comment on the PR.
  - How do we handle invalid variants?
    - The reader has to throw an error, so whoever produced the data has
      to fix the data itself, instead of patching the reader.
  - With shredding, the original field is removed from the unshredded data.
  - Duplicate fields
    - Never trust the fields that are unshredded.
  - The discussion will continue on the PR
    <https://github.com/apache/parquet-format/pull/461>
Thanks everyone for attending, and I wish you all great holidays. The next
sync will take place on the 8th of January 2025. Hope to see y'all then!
Kind regards,
Fokko