David Cockbill created ANY23-447:
------------------------------------
Summary: Reduce Any23 dependency bloat
Key: ANY23-447
URL: https://issues.apache.org/jira/browse/ANY23-447
Project: Apache Any23
Issue Type: Improvement
Components: core
Affects Versions: 2.3
Reporter: David Cockbill
Prompted by an email conversation with Hans Brende:
{code:java}
David, unfortunately this move won't reduce the number of core dependencies
we have: the plugins and service modules are not dependencies of the core
module. However, it might be useful if you posted an issue about the
dependency bloat, including the various exclusions you are using: we might
be able to mitigate the problem.
{code}
This came out of having to exclude dependencies in the pom.xml for a
product. (Note that not much thought went into the exclusions; I was
trying to get the code size down before a release.) The relevant section of
the pom.xml:
{code:java}
<dependency>
<groupId>org.apache.any23</groupId>
<artifactId>apache-any23-core</artifactId>
<exclusions>
<!-- Any23 brings in a lot of dependencies, which bloats the shaded
jar.
This is an attempt to reduce the bloat by excluding packages
that we may not be using as part of Any23.
NOTE: If an excluded dependency is required at runtime, a
java.lang.NoClassDefFoundError is thrown. -->
<exclusion>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parsers</artifactId>
</exclusion>
<exclusion>
<groupId>org.bouncycastle</groupId>
<artifactId>bcmail-jdk15on</artifactId>
</exclusion>
<exclusion>
<groupId>org.bouncycastle</groupId>
<artifactId>bcprov-jdk15on</artifactId>
</exclusion>
<exclusion>
<groupId>edu.ucar</groupId>
<artifactId>cdm</artifactId>
</exclusion>
<exclusion>
<groupId>net.sf.trove4j</groupId>
<artifactId>trove4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.cxf</groupId>
<artifactId>cxf-rt-rs-client</artifactId>
</exclusion>
<exclusion>
<groupId>com.github.ben-manes.caffeine</groupId>
<artifactId>caffeine</artifactId>
</exclusion>
<exclusion>
<groupId>org.opengis</groupId>
<artifactId>geoapi</artifactId>
</exclusion>
<exclusion>
<groupId>com.drewnoakes</groupId>
<artifactId>metadata-extractor</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.rdf4j</groupId>
<artifactId>rdf4j-repository-sail</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.rdf4j</groupId>
<artifactId>rdf4j-sail-memory</artifactId>
</exclusion>
<exclusion>
<groupId>org.tukaani</groupId>
<artifactId>xz</artifactId>
</exclusion>
<exclusion>
<groupId>org.codelibs</groupId>
<artifactId>jhighlight</artifactId>
</exclusion>
<exclusion>
<groupId>org.gagravarr</groupId>
<artifactId>vorbis-java-core</artifactId>
</exclusion>
<exclusion>
<groupId>org.gagravarr</groupId>
<artifactId>vorbis-java-tika</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.opennlp</groupId>
<artifactId>opennlp-tools</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox-tools</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.poi</groupId>
<artifactId>poi-scratchpad</artifactId>
</exclusion>
<exclusion>
<groupId>edu.ucar</groupId>
<artifactId>grib</artifactId>
</exclusion>
<exclusion>
<groupId>com.googlecode.mp4parser</groupId>
<artifactId>isoparser</artifactId>
</exclusion>
<exclusion>
<groupId>com.healthmarketscience.jackcess</groupId>
<artifactId>jackcess</artifactId>
</exclusion>
<exclusion>
<groupId>com.healthmarketscience.jackcess</groupId>
<artifactId>jackcess-encrypt</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.sis.core</groupId>
<artifactId>sis-utility</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.sis.storage</groupId>
<artifactId>sis-netcdf</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.sis.core</groupId>
<artifactId>sis-metadata</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.rdf4j</groupId>
<artifactId>rdf4j-rio-trix</artifactId>
</exclusion>
<exclusion>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.rdf4j</groupId>
<artifactId>rdf4j-rio-turtle</artifactId>
</exclusion>
</exclusions>
</dependency>
{code}
Some background that may be useful from my notes:
{code:java}
Whilst adding Any23 to the product, the Any23 Core package was causing Lintian
to fail.
Lintian is a Debian package checker written in Perl. It uses Archive::Zip to
unpack any .jar file in the Debian package. Archive::Zip does not handle the
Zip64 format, which caused the failure. The original zip format has various
restrictions, one of which is the number of files in the archive. Therefore,
if the number of files in the product's jar exceeds this limit (65535), a
Zip64 format file is produced instead of a standard zip file.
The Any23 Core Library does seem quite excessive in what it pulls in. Running
the following, the entry count for the product goes from 40490 to 78513 once
Any23 is added.
zipinfo -1 product.jar | wc -l
{code}
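As a rough illustration of the check described in the notes above, the sketch below counts the entries in a jar (the same number that zipinfo -1 product.jar | wc -l reports) and compares it against the 65535 limit. The function names are hypothetical and only for illustration, and the "greater than" comparison follows the wording of the notes; some tools may switch to Zip64 at exactly 65535 entries.

```python
import zipfile

# Largest entry count the classic (non-Zip64) zip format can record,
# as quoted in the notes above.
ZIP_MAX_ENTRIES = 65535


def count_entries(jar_path):
    """Return the number of entries in the archive.

    Equivalent in spirit to `zipinfo -1 <jar> | wc -l`.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return len(jar.infolist())


def needs_zip64(entry_count, limit=ZIP_MAX_ENTRIES):
    """Return True when the entry count exceeds the classic zip limit."""
    return entry_count > limit
```

Under these assumptions, needs_zip64(count_entries("product.jar")) would be expected to be false for the 40490-entry jar and true for the 78513-entry jar.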
This Lintian failure on a Linux build was the original push for the exclusions;
however, the product .jar also increased in a similar fashion.