ppkarwasz commented on PR #673:
URL: https://github.com/apache/commons-compress/pull/673#issuecomment-2926766400

> What I care about is the ability to build independent 'archiver' libraries, for instance, for ZIP, TAR, and other formats, and similarly for 'compressor' libraries. This is currently impossible because, for instance, the file `ZipSplitReadOnlySeekableByteChannel` references `ArchiveStreamFactory`, and obviously `ArchiveStreamFactory` depends on `ZipArchiveInputStream` etc. So compiling the zip archiver inevitably brings in the whole library.

Have you tried using shading? The Maven Shade Plugin supports a [`minimizeJar`](https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#minimizeJar) option that removes all classes not reachable from a given set of entry points. When I specify `ZipSplitReadOnlySeekableByteChannel` as the entry point, I get a much smaller JAR (around 140 KiB) that **does not** include `ArchiveStreamFactory`.

This works because the feature uses [`jdependency`](https://github.com/tcurdt/jdependency), which traces class references at the **bytecode** level. In contrast, `javac` processes all source-level references, even if the resulting class file doesn’t depend on them directly.

> The reason I care about that is that I maintain the source code for apache commons-compress at google, and it has been a common occurrence that there is a vulnerability discovered in one of the archiver or compressor implementations, and we have to rush to update the commons-compress library. And when we do this, we often find that commons-compress has added more strict file validation or has otherwise changed its behavior such that passing tests start to fail.
>
> So perhaps you can understand that we're very interested in reducing the impact of vulnerabilities by reducing the surface area of the library that we use. On that basis, I urge you to reconsider closing this PR.

Thank you. Your motivation is absolutely valid, and I fully agree with the goal of reducing the surface area of Commons Compress that can be affected by vulnerabilities.
In fact, this is one of the primary goals for Commons Compress 2.x, as discussed in [this thread on the `dev@commons` mailing list](https://lists.apache.org/thread/qd64z9vl67y8cm33optp62w10y30y2z3).

Where we differ is in how to approach that goal:

* From what you've described, it sounds like you're maintaining a private fork of Commons Compress at Google, removing archivers and compressors that aren't needed internally.
* Our long-term plan is to modularize the library into a small shared core (defining common APIs and abstractions) and a set of separate modules, one per archiver/compressor, especially when they introduce external dependencies.

That said, we currently lack the volunteer bandwidth to execute this plan in the short term. Modularizing Log4j Core was possible thanks to STA funding; without similar support, a modular Commons Compress could take years.

Given that you're already maintaining a fork, would you be interested in helping us move toward that modular structure? I could prepare a `2.x` branch with the initial scaffolding for a multi-module Maven project, and you could help by extracting and maintaining the modules for the formats you rely on. This would result in much cleaner separation than managing internal dependencies manually.

Feel free to start a discussion on the [`dev@commons` mailing list](https://commons.apache.org/mail-lists.html) or reach out to @garydgregory and me directly.

**Side note**: In parallel, we are also considering publishing intermediate VEX statements, as [some of your colleagues at Google have suggested](https://osv.dev/blog/posts/automating-and-scaling-vex-generation/). These could help your security response team triage false positives when vulnerabilities are reported in components you don’t actually use.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
