Notes from the meeting:
Notes
- Variant
  - Finalize the variant spec
    - Aihua: spend time on validation and finalize the spec.
    - Java <-> Go for shredding
    - Rust: shredding is in implementation?
    - Commit example to the parquet testing.
      - https://github.com/apache/parquet-testing/issues/75
- Parquet testing for Haskell <https://github.com/mchav/dataframe>
  - Pure Haskell implementation
  - Use apache/parquet-testing for testing
- Time interval
  - Yun: agreement on y-m interval
  - Duration: nano? Parameter for time unit.
  - Will follow up on list and java implementation.
- Encodings
  - Jeff/Prateek:
    - Doc for process in progress: Parquet new features
      <https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0>
    - Starting a few proposals:
      - FSST: strings (see paper FSST: Fast Random Access String
        Compression <https://www.vldb.org/pvldb/vol13/p2649-boncz.pdf>)
      - ALP: floating points (see paper ALP: Adaptive Lossless
        floating-Point Compression <https://ir.cwi.nl/pub/33334/33334.pdf>)
    - This paper also has a bunch of good example datasets to test for
      string compression:
      https://www.cs.cit.tum.de/fileadmin/w00cfj/dis/papers/btrblocks.pdf
- Footer
  - Micah: to follow up with Alkis
  - Rok interested
Action items
- [ ] Aihua, Michael, Martin, David: to collaborate on test files and
  cross-compatibility tests for finalizing Variant. Can use
  https://github.com/apache/parquet-testing/issues/75 for coordinating.
- [ ] Yun: follow up on the mailing list on time intervals.
- [ ] Jeff: start email thread on ALP (or a new encoding).
On Wed, Jul 23, 2025 at 9:28 AM Julien Le Dem <[email protected]> wrote:
> The next Parquet sync is today July 23rd at 10am PT - 1pm ET - 7pm CET
> (in 30 mins)
> I'll be there!
>
> To join the invite, join the group:
> https://groups.google.com/g/apache-parquet-community-sync
>
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>