Ok, but DuckDB vendors everything (*), so it's not very relevant as a reference :-)

My opinion on this is that vendoring should be limited to libraries that are either very small or very stable. I haven't taken a look at FSST yet, so I don't have a specific opinion about it, but since it is a very recent compression/encoding algorithm, my intuition is that it may receive quite a bit of maintenance (bug fixes, improvements) in the coming years.

(*) Really: https://github.com/duckdb/duckdb/tree/main/third_party

Regards

Antoine.



On 10 Dec 2025 at 14:52, Andrew Lamb wrote:
FWIW vendoring is the approach that DuckDB seems to have taken with the
fsst code[1]

[1]: https://github.com/duckdb/duckdb/tree/main/third_party/fsst

On Wed, Dec 10, 2025 at 8:47 AM Raúl Cumplido <[email protected]> wrote:

Thanks Arnav for working on this!

I've taken a look at the current draft PR on the Arrow repository [1].

Given the small amount of code required to vendor it, I am +1 on
vendoring it. In general, the vendored third-party dependencies in
Arrow C++ require less maintenance. In this case, where there are no
official releases of FSST and it's not distributed via other channels,
it is probably a similar effort.

Regards,
Raúl

[1] https://github.com/apache/arrow/pull/48232

On Wed, Dec 10, 2025 at 14:14, Fokko Driesprong (<[email protected]>) wrote:

Hey Arnav,

Thanks for raising this. Could you add a link to the brief initial
discussion?

If we don't need to make any modifications to the external source, I
would prefer to pull it in, as that would likely be the easiest
maintenance-wise.
Upon reviewing the repository, it appears to be MIT-licensed
<https://github.com/cwida/fsst/blob/master/LICENSE>. This is compatible
with the ASF license
<https://www.apache.org/legal/resolved.html#category-a>,
so we can ship it as part of the Parquet project. We must ensure that we
correctly mention the dependency in the license.

Kind regards,
Fokko


On Wed, Dec 10, 2025 at 09:26, Arnav Balyan <[email protected]> wrote:

Hi team,
We recently proposed FSST support for Parquet. There are two main
options for taking the FSST dependency:

1. CMake dependency on fsst GitHub:

    - Pull FSST as an external dependency via CMake
    - Adds an external dependency to the build

2. Vendor the code:

    - Need to copy 3-4 required source files directly into the repo
    - No external dependency


There was a brief initial discussion on the PR, and I just wanted to
start a thread to discuss further.
Overall this is a lightweight dependency, with a couple of commits
upstream every few months, so vendoring looks like a safe option. It may
avoid an external dependency while keeping maintenance overhead low.
However, we may have to pull in any major upstream changes in the
future.

Would love to know what folks think. Are there any concerns with either
approach, or a preference on how we have handled similar situations in
the past?


Thanks and Regards,
Arnav




