Re: [PR] Separate ArchiveStreamConstants from ArchiveStreamFactory [commons-compress]

via GitHub Sun, 01 Jun 2025 00:58:29 -0700


ppkarwasz commented on PR #673:
URL: https://github.com/apache/commons-compress/pull/673#issuecomment-2926766400

> What I care about is the ability to build independent 'archiver'
libraries, for instance, for ZIP, TAR, and other formats, and similarly for
'compressor' libraries. This is currently impossible because, for instance, the
file `ZipSplitReadOnlySeekableByteChannel` references `ArchiveStreamFactory`,
and obviously `ArchiveStreamFactory` depends on `ZipArchiveInputStream` etc. So
compiling the zip archiver inevitably brings in the whole library.

Have you tried using shading? The Maven Shade Plugin supports a
[`minimizeJar`](https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#minimizeJar)
option that removes all classes not reachable from a given set of entry
points. When I specify `ZipSplitReadOnlySeekableByteChannel` as the entry
point, I get a much smaller JAR—around 140 KiB—that **does not** include
`ArchiveStreamFactory`.
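The idea behind `minimizeJar` can be sketched as a graph search. Starting from the entry points, it keeps every class reachable through bytecode-level references and drops the rest. The sketch below is a toy model, not jdependency's actual implementation. The class names and edges are hypothetical and do not reflect the real Commons Compress dependency graph.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of a minimizeJar-style shrinker: keep only the classes reachable
 * from the entry points in a "class -> classes it references in bytecode" graph.
 * All class names and edges here are hypothetical.
 */
public class ReachabilityDemo {

    /** Breadth-first search from the entry points; each class is visited at most once. */
    static Set<String> reachable(Map<String, List<String>> references, List<String> entryPoints) {
        Set<String> kept = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(entryPoints);
        while (!queue.isEmpty()) {
            String cls = queue.poll();
            if (kept.add(cls)) {
                // First visit: enqueue everything this class references.
                // Classes missing from the map are treated as having no references.
                queue.addAll(references.getOrDefault(cls, List.of()));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical graph: the factory references every format,
        // but nothing reachable from SplitChannel references the factory.
        Map<String, List<String>> refs = Map.of(
                "SplitChannel", List.of("ZipUtil"),
                "ZipUtil", List.of(),
                "Factory", List.of("ZipInputStream", "TarInputStream"),
                "ZipInputStream", List.of("ZipUtil"),
                "TarInputStream", List.of());
        System.out.println(reachable(refs, List.of("SplitChannel"))); // [SplitChannel, ZipUtil]
    }
}
```

Anything not printed, including the hypothetical `Factory` and the TAR class, would be left out of the shaded JAR.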

This works because the feature uses
[`jdependency`](https://github.com/tcurdt/jdependency), which traces class
references at the **bytecode** level. In contrast, `javac` must resolve every
source-level reference, even when the resulting class file ends up not
depending on it directly.
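One common way a source reference disappears from the bytecode is constant inlining. Suppose, as the PR title hints, that the reference to `ArchiveStreamFactory` is to a `static final String` constant. The comment does not say which member is used, so this is an assumption. `javac` then copies the constant's value into the using class (constant variables, JLS §4.12.4), so that class file no longer mentions the declaring class. `Factory` and `SplitChannel` below are hypothetical stand-ins:

```java
/**
 * Demonstrates that using a compile-time constant does not create a runtime
 * (bytecode) dependency on the class that declares it.
 * Factory and SplitChannel are hypothetical stand-ins.
 */
public class ConstantInliningDemo {

    static boolean factoryInitialized = false;

    /** Stand-in for a large class whose constants other classes use. */
    static class Factory {
        /** A constant variable: static final String with a constant initializer. */
        static final String ZIP = "zip";

        static {
            // Runs only if Factory is actually initialized at runtime.
            factoryInitialized = true;
        }
    }

    /** Stand-in for a class that names Factory in its source code. */
    static class SplitChannel {
        static String format() {
            // javac compiles this to a load of the literal "zip";
            // no reference to Factory remains in SplitChannel's bytecode.
            return Factory.ZIP;
        }
    }

    public static void main(String[] args) {
        System.out.println(SplitChannel.format()); // zip
        System.out.println(factoryInitialized);    // false: Factory was never initialized
    }
}
```

`javac` still needs `Factory` at compile time to resolve the constant. A bytecode-level tool such as jdependency, however, sees no edge from `SplitChannel` to `Factory`.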

> The reason I care about that is that I maintain the source code for apache
commons-compress at google, and it has been a common occurrence that there is a
vulnerability discovered in one of the archiver or compressor implementations,
and we have to rush to update the commons-compress library. And when we do
this, we often find that commons-compress has added more strict file validation
or has otherwise changed its behavior such that passing tests start to fail.
>
> So perhaps you can understand that we're very interested in reducing the
impact of vulnerabilities by reducing the surface area of the library that we
use. On that basis, I urge you to reconsider closing this PR.

Thank you—your motivation is absolutely valid, and I fully agree with the
goal of reducing the surface area of Commons Compress that can be affected by
vulnerabilities. In fact, this is one of the primary goals for Commons Compress
2.x, as discussed in [this thread on the `dev@commons` mailing
list](https://lists.apache.org/thread/qd64z9vl67y8cm33optp62w10y30y2z3).

Where we differ is in how to approach that goal:

* From what you've described, it sounds like you're maintaining a private
fork of Commons Compress at Google, removing archivers and compressors that
aren't needed internally.
* Our long-term plan is to modularize the library into a small shared core
(defining common APIs and abstractions) and a set of separate modules, one per
archiver/compressor, especially for formats that introduce external dependencies.
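As a rough illustration of the second bullet, a core module can use format modules without referencing them at compile time through a service-provider interface discovered by `java.util.ServiceLoader`. This is only a sketch of the general technique, not the planned 2.x design. `ArchiveFormat`, `ZipFormat`, and `find` are hypothetical names:

```java
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;

/**
 * Hypothetical sketch of a "small core + one module per format" layout.
 * None of these names come from Commons Compress.
 */
public class ModularLayoutDemo {

    // --- core module: APIs only, no format implementations ---

    /** Hypothetical SPI that each format module would implement. */
    public interface ArchiveFormat {
        String name();
    }

    /** Looks up a format by name among the given providers (case-insensitive). */
    static Optional<ArchiveFormat> find(Iterable<ArchiveFormat> formats, String name) {
        for (ArchiveFormat f : formats) {
            if (f.name().equalsIgnoreCase(name)) {
                return Optional.of(f);
            }
        }
        return Optional.empty();
    }

    /** Production lookup: discovers whichever format modules are on the class/module path. */
    static Optional<ArchiveFormat> find(String name) {
        return find(ServiceLoader.load(ArchiveFormat.class), name);
    }

    // --- a separate "zip" module would ship only this (plus its own code) ---

    static final class ZipFormat implements ArchiveFormat {
        @Override
        public String name() {
            return "zip";
        }
    }

    public static void main(String[] args) {
        // In a real build the ZIP module would register ZipFormat in
        // META-INF/services (or module-info.java); here it is passed explicitly.
        System.out.println(find(List.of(new ZipFormat()), "ZIP").isPresent()); // true
        System.out.println(find(List.of(new ZipFormat()), "tar").isPresent()); // false
    }
}
```

With a layout like this, a consumer that only needs ZIP would depend on the core and the ZIP module, so code for other formats would never be on its class path.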

That said, we currently lack the volunteer bandwidth to execute this plan in
the short term. Modularizing Log4j Core was possible thanks to STA funding;
without similar support, a modular Commons Compress could take years.

Given that you're already maintaining a fork, would you be interested in
helping us move toward that modular structure? I could prepare a `2.x` branch
with the initial scaffolding for a multi-module Maven project, and you could
help by extracting and maintaining the modules for the formats you rely on.
This would result in much cleaner separation than managing internal
dependencies manually. Feel free to start a discussion on the [`dev@commons`
mailing list](https://commons.apache.org/mail-lists.html) or reach out to
@garydgregory and me directly.

**Side note**: In parallel, we are also considering publishing intermediate
VEX statements, as [some of your colleagues at Google have
suggested](https://osv.dev/blog/posts/automating-and-scaling-vex-generation/).
These could help your security response team triage false positives when
vulnerabilities are reported in components you don’t actually use.

