Attendees:
- Julien: Datadog
- Alex: Google, listening in
- Aditya: CMU, Variant
- Alkis: Databricks, new footer update
- Andrew Lamb: InfluxData, listening in
- Brian: Google
- Dewey: geometry
- Talat: Google
- Jeff: Snowflake. Footer, encoding
- Martin: CMU, Variant
- Mengmeng: Snowflake
- Prashant: Snowflake
- Prateek: Snowflake scan team. Encodings, exp
- Russel: Snowflake. Duck
- Sai: Snowflake
- Sandieep: Snowflake
- Selcuk: Snowflake
- Selim: detection partition field, Java API. Modification date?
- Thomas: Snowflake, floating type proposal
- Vinoo: startup
- Micah: Databricks
Agenda/Notes:
- New footer update (Alkis):
  - Prototype running in Databricks:
    - https://github.com/apache/arrow/pull/43793
  - Deserialization results:
    - For compatibility: generate the Thrift data structure from the FlatBuffer.
    - Deserialization speed improved by 30% on average.
    - Pathological cases: 5 to 10x speedup.
    - Some cases regress (20-30%).
      - Need to figure out why and fix it.
  - Expect better fetch optimization because the footer is smaller.
  - This is all without taking advantage of:
    - Partial deserialization
    - Not having to produce the Thrift structure
  - Do we need to evaluate in the context of caching files in SSDs, memory, etc.?
- Follow ups:
  - Process to adopt new encodings (sorry, Micah will share the doc tonight for community input)
  - Selcuk, Jeff
  - Talat
  - Micah, Alkis
- [Thomas] DECFLOAT Parquet Proposal
  <https://docs.google.com/document/d/1j_Q6vnn6Nhy60K4o0tdC91kE5vKGNJaoDOAm71KLzNw/edit?tab=t.0#heading=h.a3yn4bu050pz>
  - Decimal floating point type: a third type beyond fixed-point decimal and binary floating point
  - Spark supports DECFLOAT (using BigDecimal)
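
To make the "third type" point in the DECFLOAT item concrete, here is a minimal Python sketch. It uses the standard-library decimal module as a stand-in for a decimal floating point type. That is an assumption for illustration only: the encoding and semantics in the proposal are defined in the linked doc, not here, and the scale of 2 used for the fixed-point comparison is hypothetical.

```python
from decimal import Decimal, localcontext

# Binary floating point: 0.1 and 0.2 have no exact base-2 representation,
# so the sum is not exactly 0.3.
binary_sum = 0.1 + 0.2
binary_exact = binary_sum == 0.3

# Decimal floating point: base-10 digits, so the same sum is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_exact = decimal_sum == Decimal("0.3")

# The exponent is stored per value ("floating"), so one type can hold
# very small and very large numbers without a column-wide scale.
small = Decimal("1.5E-20")
large = Decimal("1.5E+20")
exponents = (small.as_tuple().exponent, large.as_tuple().exponent)

# Fixed-point decimal, by contrast, forces one scale on every value.
# With a hypothetical scale of 2, the small value rounds to zero.
fixed_small = small.quantize(Decimal("0.01"))

# Precision (number of significant digits) is bounded, as in any
# floating point format; here it is limited to 5 digits.
with localcontext() as ctx:
    ctx.prec = 5
    rounded = Decimal(1) / Decimal(3)
```

The comparison with quantize shows why a fixed scale is a poor fit for columns whose values span many orders of magnitude, which is the gap a decimal floating point type would fill.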
On Tue, May 27, 2025 at 8:35 PM Julien Le Dem wrote:
> The next Parquet sync is tomorrow May 28th at 10am PT - 1pm ET - 7pm CET
> To join the invite, join the group:
> https://groups.google.com/g/apache-parquet-community-sync
>
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>