Hey Arnav,

Thanks for raising this. Could you add a link to the brief initial discussion?
If we don't need to make any modifications to the external source, I would
prefer to pull it in, as that would likely be the easiest option
maintenance-wise. Having reviewed the repository, it appears to be MIT-licensed
<https://github.com/cwida/fsst/blob/master/LICENSE>. MIT is compatible with the
ASF license <https://www.apache.org/legal/resolved.html#category-a>, so we can
ship it as part of the Parquet project. We must ensure that we correctly
mention the dependency in the LICENSE file.

Kind regards,
Fokko

On Wed, 10 Dec 2025 at 09:26, Arnav Balyan <[email protected]> wrote:

> Hi team,
> We recently proposed support for FSST in Parquet. There are two main
> options for taking the FSST dependency:
>
> 1. CMake dependency on fsst GitHub:
>
>    - Pull FSST as an external dependency via CMake
>    - Adds an external dependency to the build
>
> 2. Vendor the code:
>
>    - Need to copy 3-4 required source files directly into the repo
>    - No external dependency
>
>
> There was a brief initial discussion on the PR, and I just wanted to start
> a thread to discuss further.
> Overall this is a lightweight dependency, with a couple of commits upstream
> every few months, so vendoring looks like a safe option. It may avoid an
> external dependency while keeping maintenance overhead low. However, we may
> have to pull in any major changes in the future.
>
> Would love to know what folks think. Are there any concerns with either
> approach, or a preference based on how we have handled similar situations
> in the past?
>
>
> Thanks and Regards,
> Arnav
>
