Attendees:
Andrew: InfluxData, saying Hi! Not causing trouble (he says)
Micah: Google, saying Hi! Causing trouble (he says)
Raul: Arrow C++/Py release manager
Julien: Datadog
Notes:
- Updates
  - Variant: iterating in parquet-format repo
- Footer metadata rewrite
  - Need to review individual optimizations
  - Relative indices:
    - Trade-offs:
      - Pro: Smaller metadata
      - Con: More complex computation
  - Do we need 2 layers of metadata?
  - Modular vs one footer?
  - How much does the footer size matter?
- Encodings
  - How to add future-proof encodings
    - Andrew suggested Wasm-based plugins:
      - Expressivity?
      - Security?
      - Could be great for experimenting with new encodings.
      - Not great for having a standard, fully defined format.
      - Query engines do integrate decoding with evaluation.
      - Would need a clear contract:
        - skip(n)
        - decode_n_values_to_arrow
        - …
      - Storing plugin in file? => security issues
  - Independent discussion of adding a few encodings
    - Good integer encodings for timestamps
    - Good encodings for floats
    - …
  - Compelling new encodings papers:
    - FastLanes paper: 10x improvement?
    - BtrBlocks: a way to cascade encodings in a better way.
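
To make the "clear contract" item above more concrete, here is a minimal sketch of what such a decoder interface might look like. Only the names skip(n) and decode_n_values_to_arrow come from the notes; the class names PluginDecoder and DeltaDecoder, the return types, and the toy delta encoding are assumptions for illustration, and a plain Python list stands in for an Arrow array. Nothing here was agreed on in the sync.

```python
from abc import ABC, abstractmethod
from typing import List


class PluginDecoder(ABC):
    """Hypothetical reader/plugin contract (not a Parquet or Arrow API)."""

    @abstractmethod
    def skip(self, n: int) -> int:
        """Advance past up to n values; return how many were skipped."""

    @abstractmethod
    def decode_n_values_to_arrow(self, n: int) -> List[int]:
        """Decode up to n values. A list stands in for an Arrow array."""


class DeltaDecoder(PluginDecoder):
    """Toy delta decoder: first value verbatim, then successive differences."""

    def __init__(self, first: int, deltas: List[int]) -> None:
        self._next = first              # next value to hand out
        self._deltas = deltas
        self._pos = 0                   # values consumed so far
        self._total = len(deltas) + 1   # total values in the page

    def _advance(self) -> None:
        # Apply the next delta, if any, and move the cursor forward.
        if self._pos < len(self._deltas):
            self._next += self._deltas[self._pos]
        self._pos += 1

    def skip(self, n: int) -> int:
        count = min(n, self._total - self._pos)
        for _ in range(count):
            self._advance()
        return count

    def decode_n_values_to_arrow(self, n: int) -> List[int]:
        count = min(n, self._total - self._pos)
        out = []
        for _ in range(count):
            out.append(self._next)
            self._advance()
        return out


# Timestamps 1000, 1010, 1020, 1040, 1045 stored as a first value plus deltas.
decoder = DeltaDecoder(first=1000, deltas=[10, 10, 20, 5])
skipped = decoder.skip(2)
batch = decoder.decode_n_values_to_arrow(2)
rest = decoder.decode_n_values_to_arrow(10)
```

Note that skip(n) in this toy decoder still has to walk the deltas, so the cost of skipping depends on the encoding. That is one reason a contract like this would have to be explicit about what an engine can expect from each call.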
On Wed, Oct 23, 2024 at 8:33 AM Julien Le Dem <[email protected]> wrote:
> The next Parquet sync is today Oct 23rd at 9:30am PT - 12:30pm ET - 6:30pm
> CET
> (in ~ 1h)
> To join the invite:
> https://calendar.app.google/GjNGkjfMYyoBUpaGA
> Please contact me to be added to the recurring invite.
> Everybody is welcome, bring your topic or just listen in.
> Best
> Julien
>