Notes from the meeting:
Notes doc:
<https://docs.google.com/document/d/e/2PACX-1vSDHW7gvG8eO6aIxaIVPrZSqYYhtRDb5W1imnbpM4QRYNPsTwEO1fU5z7SEhVIFa4YqWJeSRJ9tcXYS/pub>
AI transcript:
<https://docs.google.com/document/d/1B0_1GnhMramqTO6dOqkxGklhATlnzk_JO6RQL4rNiHA/edit?resourcekey=0-LGgBRawGKHrzW8pep_t-3g&tab=t.n836tp24rbb8>
(join the group <https://groups.google.com/g/apache-parquet-community-sync>
for access)


- New encodings
  - ALP:
    - Prateek update:
      - Will make a proposal.
      - Migrating to the code conventions in the Arrow code base.
      - Evaluating how to get the best numbers out of the POC: 3x
        better than current encodings.
      - Follow up with the ALPRD fallback.
      - PR in Arrow to come relatively soon.
  - Updates on FSST encoding:
    - Arnav: updated the doc "FSST Support in Parquet - Fast Static
      Symbol Table"
      <https://docs.google.com/document/d/1g7zgopxeHc5nofJXfc8EEp_HGMaI8g-jFVvNCs2GVA0/edit?tab=t.0#heading=h.2eyxl5kkyzy7>
      based on feedback from the community.
    - Test uncompressed.
    - Test with new datasets, focusing on text-oriented datasets.
    - TODO:
      - [Arnav] Articulate in the doc the goal of performance evaluation:
        - For datasets that do not compress well with dictionary (high
          cardinality), how does FSST compare to ZSTD on Plain?
        - Could we use FSST without general-purpose compression?
      - [Arnav-Micah] Need a discussion on how to use dictionary and
        other column metadata for the benefit of FSST.
      - [Someone] Compare against ZSTD with a shared dictionary (not a
        blocker).
  - Picking encoding:
    - Optimize for random access
    - Optimize for compression
    - Left as an exercise to the writer
- New footer update?
  - => next time


On Tue, Nov 11, 2025 at 11:11 PM Julien Le Dem <[email protected]> wrote:

> The next Parquet sync is tomorrow Wednesday Nov 12th at 10am PT - 1pm ET
> - 7pm CET
>
> To join the invite, join the group:
> https://groups.google.com/g/apache-parquet-community-sync
>
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>
