Touché

One thing that may actually be relevant from the DuckDB fork is what they
have changed:
https://github.com/duckdb/duckdb/commits/main/third_party/fsst

Specifically it looks like they have fixed bugs such as [1], and then
contributed the fixes back upstream[2].

Andrew



[1]: https://github.com/duckdb/duckdb/commit/078e96ac75c77f03302dadd7b0488cdb9f04fc9a
[2]: https://github.com/cwida/fsst/pull/31


On Wed, Dec 10, 2025 at 10:22 AM Antoine Pitrou <[email protected]> wrote:

>
> Ok, but DuckDB vendors everything (*), so it's not very relevant as a
> reference :-)
>
> My opinion on this is that vendoring should be limited to libraries that
> are either very small or very stable. I haven't taken a look at FSST
> yet, so I don't have a specific opinion about it, but since it is a very
> recent compression/encoding algorithm, my intuition is that it may
> receive quite a bit of maintenance (bug fixes, improvements) in the
> coming years.
>
> (*) Really: https://github.com/duckdb/duckdb/tree/main/third_party
>
> Regards
>
> Antoine.
>
>
>
> On 10/12/2025 at 14:52, Andrew Lamb wrote:
> > FWIW vendoring is the approach that DuckDB seems to have taken with the
> > fsst code[1]
> >
> > [1]: https://github.com/duckdb/duckdb/tree/main/third_party/fsst
> >
> > On Wed, Dec 10, 2025 at 8:47 AM Raúl Cumplido <[email protected]> wrote:
> >
> >> Thanks Arnav for working on this!
> >>
> >> I've taken a look at the current draft PR on the Arrow repository [1].
> >>
> >> Given the small amount of code required to vendor it, I am +1 on
> >> vendoring it. In general, the vendored third-party dependencies in
> >> Arrow C++ require less maintenance. In this case, where there are no
> >> official releases of FSST and it's not distributed via other channels,
> >> it is probably a similar effort.
> >>
> >> Regards,
> >> Raúl
> >>
> >> [1] https://github.com/apache/arrow/pull/48232
> >>
> >> On Wed, Dec 10, 2025 at 14:14, Fokko Driesprong (<[email protected]>)
> >> wrote:
> >>>
> >>> Hey Arnav,
> >>>
> >>> Thanks for raising this. Could you add a link to the brief initial
> >>> discussion?
> >>>
> >>> If we don't need to make any modifications to the external source, I
> >>> would prefer to pull it in, as that would likely be the easiest
> >>> maintenance-wise.
> >>> Upon reviewing the repository, it appears to be MIT-licensed
> >>> <https://github.com/cwida/fsst/blob/master/LICENSE>. This is compatible
> >>> with the ASF license
> >>> <https://www.apache.org/legal/resolved.html#category-a>,
> >>> so we can ship it as part of the Parquet project. We must ensure that we
> >>> correctly mention the dependency in the license.
> >>>
> >>> Kind regards,
> >>> Fokko
> >>>
> >>>
> >>> On Wed, Dec 10, 2025 at 09:26, Arnav Balyan <[email protected]> wrote:
> >>>
> >>>> Hi team,
> >>>> We recently proposed support for FSST in Parquet. There are two main
> >>>> options for taking on the FSST dependency:
> >>>>
> >>>> 1. CMake dependency on fsst GitHub:
> >>>>
> >>>>     - Pull FSST as an external dependency via CMake
> >>>>     - Adds an external dependency to the build
> >>>>
> >>>> 2. Vendor the code:
> >>>>
> >>>>     - Need to copy 3-4 required source files directly into the repo
> >>>>     - No external dependency
> >>>>
> >>>>
> >>>> There was a brief initial discussion on the PR, and I just wanted to
> >>>> start a thread to discuss further.
> >>>> Overall this is a lightweight dependency, with a couple of commits
> >>>> upstream every few months, so vendoring looks like a safe option. It
> >>>> may avoid an external dependency while keeping maintenance overhead
> >>>> low. However, we may have to pull in any major changes in the future.
> >>>>
> >>>> Would love to know what folks think. Are there any concerns with
> >>>> either approach, or a preference on how we have handled similar
> >>>> situations in the past?
> >>>>
> >>>>
> >>>> Thanks and Regards,
> >>>> Arnav
> >>>>
> >>
> >
>
>
>
