ppkarwasz commented on PR #673:
URL: https://github.com/apache/commons-compress/pull/673#issuecomment-2926766400

   > What I care about is the ability to build independent 'archiver' 
libraries, for instance, for ZIP, TAR, and other formats, and similarly for 
'compressor' libraries. This is currently impossible because, for instance, the 
file `ZipSplitReadOnlySeekableByteChannel` references `ArchiveStreamFactory`, 
and obviously `ArchiveStreamFactory` depends on `ZipArchiveInputStream` etc. So 
compiling the zip archiver inevitably brings in the whole library.
   
   Have you tried using shading? The Maven Shade Plugin supports a 
[`minimizeJar`](https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#minimizeJar)
 option that removes all classes not reachable from a given set of entry 
points. When I specify `ZipSplitReadOnlySeekableByteChannel` as the entry 
point, I get a much smaller JAR—around 140 KiB—that **does not** include 
`ArchiveStreamFactory`.
   
   This works because the feature uses 
[`jdependency`](https://github.com/tcurdt/jdependency), which traces class 
references at the **bytecode** level. In contrast, `javac` processes all 
source-level references, even if the resulting class file doesn’t depend on 
them directly.
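   
   For illustration, here is a minimal sketch of what such a Shade 
   configuration might look like. It is not necessarily the exact setup used 
   for the measurement above: the plugin version and the use of the 
   `entryPoints` parameter alongside `minimizeJar` are assumptions and should 
   be checked against the plugin documentation for the release you use.
   
   ```xml
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-shade-plugin</artifactId>
     <!-- Assumption: a release recent enough to support entryPoints -->
     <version>3.6.0</version>
     <executions>
       <execution>
         <phase>package</phase>
         <goals>
           <goal>shade</goal>
         </goals>
         <configuration>
           <!-- Drop classes that are not reachable at the bytecode level -->
           <minimizeJar>true</minimizeJar>
           <!-- Start the reachability analysis from this class only -->
           <entryPoints>
             <entryPoint>org.apache.commons.compress.archivers.zip.ZipSplitReadOnlySeekableByteChannel</entryPoint>
           </entryPoints>
         </configuration>
       </execution>
     </executions>
   </plugin>
   ```
   
   This fragment would sit under `<build><plugins>` in a small project that 
   declares `commons-compress` as a dependency; the size and contents of the 
   resulting JAR will vary with the library version.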
    
   > The reason I care about that is that I maintain the source code for apache 
commons-compress at google, and it has been a common occurrence that there is a 
vulnerability discovered in one of the archiver or compressor implementations, 
and we have to rush to update the commons-compress library. And when we do 
this, we often find that commons-compress has added more strict file validation 
or has otherwise changed its behavior such that passing tests start to fail.
   > 
   > So perhaps you can understand that we're very interested in reducing the 
impact of vulnerabilities by reducing the surface area of the library that we 
use. On that basis, I urge you to reconsider closing this PR.
   
   Thank you—your motivation is absolutely valid, and I fully agree with the 
goal of reducing the surface area of Commons Compress that can be affected by 
vulnerabilities. In fact, this is one of the primary goals for Commons Compress 
2.x, as discussed in [this thread on the `dev@commons` mailing 
list](https://lists.apache.org/thread/qd64z9vl67y8cm33optp62w10y30y2z3).
   
   Where we differ is in how to approach that goal:
   
   * From what you've described, it sounds like you're maintaining a private 
fork of Commons Compress at Google, removing archivers and compressors that 
aren't needed internally.
   * Our long-term plan is to modularize the library into a small shared core 
(defining common APIs and abstractions) and a set of separate modules—one per 
archiver/compressor, especially when they introduce external dependencies.
   
   That said, we currently lack the volunteer bandwidth to execute this plan in 
the short term. Modularizing Log4j Core was possible thanks to STA funding; 
without similar support, a modular Commons Compress could take years.
   
   Given that you're already maintaining a fork, would you be interested in 
helping us move toward that modular structure? I could prepare a `2.x` branch 
with the initial scaffolding for a multi-module Maven project, and you could 
help by extracting and maintaining the modules for the formats you rely on. 
This would result in much cleaner separation than managing internal 
dependencies manually. Feel free to start a discussion on the [`dev@commons` 
mailing list](https://commons.apache.org/mail-lists.html) or reach out to 
@garydgregory and me directly.
   
   **Side note**: In parallel, we are also considering publishing intermediate 
VEX statements, as [some of your colleagues at Google have 
suggested](https://osv.dev/blog/posts/automating-and-scaling-vex-generation/). 
These could help your security response team triage false positives when 
vulnerabilities are reported in components you don’t actually use.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
