FWIW, vendoring is the approach that DuckDB seems to have taken with the FSST code [1].
[1]: https://github.com/duckdb/duckdb/tree/main/third_party/fsst

On Wed, Dec 10, 2025 at 8:47 AM Raúl Cumplido <[email protected]> wrote:
> Thanks Arnav for working on this!
>
> I've taken a look at the current draft PR on the Arrow repository [1].
>
> Given the small amount of code required to vendor it, I am +1 on
> vendoring it. In general, vendored third-party dependencies in
> Arrow C++ require less maintenance. In this case, where there are no
> official releases of FSST and it is not distributed via other channels,
> it is probably a similar effort.
>
> Regards,
> Raúl
>
> [1] https://github.com/apache/arrow/pull/48232
>
> On Wed, Dec 10, 2025 at 14:14, Fokko Driesprong (<[email protected]>)
> wrote:
> >
> > Hey Arnav,
> >
> > Thanks for raising this. Could you add a link to the brief initial
> > discussion?
> >
> > If we don't need to make any modifications to the external source, I
> > would prefer to pull it in, as that would likely be the easiest
> > maintenance-wise. From reviewing the repository, it appears to be
> > MIT-licensed <https://github.com/cwida/fsst/blob/master/LICENSE>. This
> > is compatible with the ASF license
> > <https://www.apache.org/legal/resolved.html#category-a>, so we can ship
> > it as part of the Parquet project. We must ensure that we correctly
> > mention the dependency in the license.
> >
> > Kind regards,
> > Fokko
> >
> > On Wed, Dec 10, 2025 at 09:26, Arnav Balyan <[email protected]> wrote:
> > >
> > > Hi team,
> > > We recently proposed support for FSST in Parquet. There are two main
> > > options for taking the FSST dependency:
> > >
> > > 1. CMake dependency on fsst GitHub:
> > >    - Pull FSST as an external dependency via CMake
> > >    - Adds an external dependency to the build
> > >
> > > 2. Vendor the code:
> > >    - Need to copy 3-4 required source files directly into the repo
> > >    - No external dependency
> > >
> > > There was a brief initial discussion on the PR, and I just wanted to
> > > start a thread to discuss further.
> > > Overall this is a lightweight dependency, with a couple of commits
> > > upstream every few months, so vendoring looks like a safe option. It
> > > may avoid an external dependency while keeping maintenance overhead
> > > low. However, we may have to pull in any major upstream changes in
> > > the future.
> > >
> > > Would love to know what folks think. Are there any concerns with
> > > either approach, or a preference based on how we have handled similar
> > > situations in the past?
> > >
> > > Thanks and Regards,
> > > Arnav
