FWIW, vendoring is the approach that DuckDB seems to have taken with the
fsst code [1].

[1]: https://github.com/duckdb/duckdb/tree/main/third_party/fsst

On Wed, Dec 10, 2025 at 8:47 AM Raúl Cumplido <[email protected]> wrote:

> Thanks Arnav for working on this!
>
> I've taken a look at the current draft PR on the Arrow repository [1].
>
> Given the small amount of code required to vendor it, I am +1 on
> vendoring it. In general, the vendored third-party dependencies in
> Arrow C++ require less maintenance. In this case, where there are no
> official releases of FSST and it's not distributed via other channels,
> it is probably a similar effort.
>
> Regards,
> Raúl
>
> [1] https://github.com/apache/arrow/pull/48232
>
> On Wed, Dec 10, 2025 at 14:14, Fokko Driesprong (<[email protected]>)
> wrote:
> >
> > Hey Arnav,
> >
> > Thanks for raising this. Could you add a link to the brief initial
> > discussion?
> >
> > If we don't need to make any modifications to the external source, I
> > would prefer to pull it in, as that would likely be the easiest
> > maintenance-wise.
> > Upon reviewing the repository, it appears to be MIT-licensed
> > <https://github.com/cwida/fsst/blob/master/LICENSE>. This is compatible
> > with the ASF license
> > <https://www.apache.org/legal/resolved.html#category-a>,
> > so we can ship it as part of the Parquet project. We must ensure that we
> > correctly mention the dependency in the license.
> >
> > Kind regards,
> > Fokko
> >
> >
> > On Wed, Dec 10, 2025 at 09:26, Arnav Balyan <[email protected]> wrote:
> >
> > > Hi team,
> > > We recently proposed FSST support for Parquet. There are two main
> > > options for taking on the FSST dependency:
> > >
> > > 1. CMake dependency on fsst GitHub:
> > >
> > >    - Pull FSST as an external dependency via CMake
> > >    - Adds an external dependency to the build
> > >
> > > 2. Vendor the code:
> > >
> > >    - Need to copy 3-4 required source files directly into the repo
> > >    - No external dependency
> > >
> > >
> > > There was a brief initial discussion on the PR, and I just wanted to
> > > start a thread to discuss further.
> > > Overall, this is a lightweight dependency with a couple of commits
> > > upstream every few months, so vendoring looks like a safe option. It
> > > would avoid an external dependency while keeping maintenance overhead
> > > low. However, we may have to pull in any major changes in the future.
> > >
> > > Would love to know what folks think. Are there any concerns with either
> > > approach, or a preference based on how we have handled similar
> > > situations in the past?
> > >
> > >
> > > Thanks and Regards,
> > > Arnav
> > >
>
