Ok, but DuckDB vendors everything (*), so it's not very relevant as a
reference :-)
My opinion on this is that vendoring should be limited to libraries that
are either very small or very stable. I haven't taken a look at FSST
yet, so I don't have a specific opinion about it, but since it is a very
recent compression/encoding algorithm, my intuition is that it may
receive quite a bit of maintenance (bug fixes, improvements) in the
coming years.
(*) Really: https://github.com/duckdb/duckdb/tree/main/third_party
Regards
Antoine.
On 10 Dec 2025 at 14:52, Andrew Lamb wrote:
FWIW vendoring is the approach that DuckDB seems to have taken with the
fsst code[1]
[1]: https://github.com/duckdb/duckdb/tree/main/third_party/fsst
On Wed, Dec 10, 2025 at 8:47 AM Raúl Cumplido <[email protected]> wrote:
Thanks Arnav for working on this!
I've taken a look at the current draft PR on the Arrow repository [1].
Given the small amount of code required to vendor it, I am +1 on
vendoring it. In general, the vendored third-party dependencies in
Arrow C++ require less maintenance. In this case, where there are no
official releases of FSST and it's not distributed via other channels,
it is probably a similar effort.
Regards,
Raúl
[1] https://github.com/apache/arrow/pull/48232
On Wed, 10 Dec 2025 at 14:14, Fokko Driesprong (<[email protected]>)
wrote:
Hey Arnav,
Thanks for raising this. Could you add a link to the brief initial
discussion?
If we don't need to make any modifications to the external source, I
would prefer to pull it in, as that would likely be the easiest
maintenance-wise.
Upon reviewing the repository, it appears to be MIT-licensed
<https://github.com/cwida/fsst/blob/master/LICENSE>. This is compatible
with the ASF license
<https://www.apache.org/legal/resolved.html#category-a>,
so we can ship it as part of the Parquet project. We must ensure that we
correctly mention the dependency in the license.
Kind regards,
Fokko
On Wed, 10 Dec 2025 at 09:26, Arnav Balyan <[email protected]> wrote:
Hi team,
We recently proposed FSST support for Parquet. There are two main
options for taking on the FSST dependency:
1. CMake dependency on fsst GitHub:
   - Pull FSST as an external dependency via CMake
   - Adds an external dependency to the build
2. Vendor the code:
   - Need to copy 3-4 required source files directly into the repo
   - No external dependency
There was a brief initial discussion on the PR, and I just wanted to
start a thread to discuss further.
Overall this is a lightweight dependency, with a couple of commits
upstream every few months, so vendoring looks like a safe option. It may
avoid an external dependency while keeping maintenance overhead low.
However, we may have to pull in any major changes in the future.
Would love to know what folks think. Are there any concerns with either
approach, or a preference on how we have handled similar situations in
the past?
Thanks and Regards,
Arnav