Notes from the meeting:
Notes doc: <https://docs.google.com/document/d/e/2PACX-1vSDHW7gvG8eO6aIxaIVPrZSqYYhtRDb5W1imnbpM4QRYNPsTwEO1fU5z7SEhVIFa4YqWJeSRJ9tcXYS/pub>
AI transcript: <https://docs.google.com/document/d/1B0_1GnhMramqTO6dOqkxGklhATlnzk_JO6RQL4rNiHA/edit?resourcekey=0-LGgBRawGKHrzW8pep_t-3g&tab=t.n836tp24rbb8>
(join the group <https://groups.google.com/g/apache-parquet-community-sync> for access)
- New encodings
  - ALP:
    - Prateek update:
      - Will make a proposal.
      - Migrating to the code conventions of the Arrow code base.
      - Evaluating how to get the best numbers out of the POC: 3X better than current encodings.
      - Follow up with the ALPRD fallback.
      - PR in Arrow to come relatively soon.
  - Updates on FSST encoding:
    - Arnav: updated the doc "FSST Support in Parquet - Fast Static Symbol Table" <https://docs.google.com/document/d/1g7zgopxeHc5nofJXfc8EEp_HGMaI8g-jFVvNCs2GVA0/edit?tab=t.0#heading=h.2eyxl5kkyzy7> based on feedback from the community.
      - Test uncompressed.
      - Test with new datasets, focusing on text-oriented datasets.
    - TODO:
      - [Arnav] Articulate in the doc the goal of the performance evaluation:
        - For datasets that do not compress well with dictionary encoding (high cardinality), how does FSST compare to ZSTD on PLAIN?
        - Could we use FSST without general-purpose compression?
      - [Arnav-Micah] Need a discussion on how to use the dictionary and other column metadata for the benefit of FSST.
      - [Someone] Compare against ZSTD with a shared dictionary (not a blocker).
  - Picking encoding:
    - Optimize for random access
    - Optimize for compression
    - Left as an exercise to the writer
- New footer update?
  - => next time
On Tue, Nov 11, 2025 at 11:11 PM Julien Le Dem <[email protected]> wrote:
> The next Parquet sync is tomorrow, Wednesday Nov 12th, at 10am PT / 1pm ET / 7pm CET.
>
> To get the invite, join the group:
> https://groups.google.com/g/apache-parquet-community-sync
>
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>