[
https://issues.apache.org/jira/browse/TIKA-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079023#comment-18079023
]
Nicholas DiPiazza commented on TIKA-4723:
-----------------------------------------
Branch TIKA-4723 has been pushed with the following fixes:
h2. Changes Made
h3. 1. tika-grpc: Assembly ZIP no longer attached to Maven artifact
The {{maven-assembly-plugin}} and {{copy-dependencies}} executions have been
moved to a new {{docker}} Maven profile with {{<attach>false</attach>}}.
* Default build ({{mvn package}}): produces only the thin JAR (~238KB). Nothing
large is uploaded to Nexus.
* Docker build ({{mvn package -Pdocker}}): produces the full distribution ZIP
in {{target/}}, but it is *not* attached/deployed.
* The CI workflow in {{docker-snapshot.yml}} already runs
{{dependency:copy-dependencies}} separately and does not use the assembly ZIP,
so no CI changes are needed.
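As a rough sketch of the layout described above (not the actual POM from the
branch), the {{docker}} profile could look something like the following. The
execution ids, descriptor path, and output directory are assumptions made for
illustration, and plugin versions are assumed to be managed in a parent POM.

{code:xml}
<profiles>
  <profile>
    <id>docker</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-assembly-plugin</artifactId>
          <executions>
            <execution>
              <id>make-distribution</id> <!-- hypothetical id -->
              <phase>package</phase>
              <goals>
                <goal>single</goal>
              </goals>
              <configuration>
                <!-- Build the ZIP in target/ but do not attach it to the
                     project, so install/deploy will not publish it -->
                <attach>false</attach>
                <descriptors>
                  <!-- assumed descriptor path -->
                  <descriptor>src/main/assembly/assembly.xml</descriptor>
                </descriptors>
              </configuration>
            </execution>
          </executions>
        </plugin>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-dependency-plugin</artifactId>
          <executions>
            <execution>
              <id>copy-dependencies</id>
              <phase>package</phase>
              <goals>
                <goal>copy-dependencies</goal>
              </goals>
              <configuration>
                <!-- assumed output directory -->
                <outputDirectory>${project.build.directory}/lib</outputDirectory>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
{code}

Because both executions live inside the profile, a plain {{mvn package}} never
runs them; they only run when {{-Pdocker}} is passed.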
h3. 2. All 15 tika-pipes-plugins: ZIPs no longer attached to Maven artifact
Added {{<attach>false</attach>}} to the {{maven-assembly-plugin}} in all 15
plugin POMs (s3, gcs, az-blob, kafka, solr, etc.).
The plugin ZIPs are still built during {{mvn package}} (so the Docker build
script can copy them), but they will no longer be deployed to Nexus/Maven
Central.
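For illustration only, the change in each plugin POM may amount to one added
line in the existing assembly execution. The execution id and descriptor path
below are assumed, not copied from the branch.

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <executions>
    <execution>
      <id>make-plugin-zip</id> <!-- hypothetical id -->
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <!-- The ZIP is still written to target/, but it is not attached,
             so it is skipped by install and deploy -->
        <attach>false</attach>
        <descriptors>
          <!-- assumed descriptor path -->
          <descriptor>src/main/assembly/assembly.xml</descriptor>
        </descriptors>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}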
h3. 3. tika-serialization: lombok scope fixed
Changed lombok from {{<scope>compile</scope>}} to {{<scope>provided</scope>}}.
Lombok is an annotation processor that is needed only at compile time, so it
should never be a transitive runtime dependency.
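A minimal sketch of what the resulting dependency declaration might look like
(the version is assumed to be managed in a parent {{dependencyManagement}}
section, which is why it is omitted here):

{code:xml}
<dependency>
  <groupId>org.projectlombok</groupId>
  <artifactId>lombok</artifactId>
  <!-- provided: on the compile classpath, but not packaged and not
       passed on to consumers as a transitive dependency -->
  <scope>provided</scope>
</dependency>
{code}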
h3. 4. tika-grpc README updated
Added a "Distribution and Maven Artifact" section documenting that tika-grpc is
Docker-first and that the distribution ZIP is only built with {{-Pdocker}}.
h2. Result
After these changes, a {{mvn deploy}} of tika-grpc will upload only one thin
JAR (~238KB) instead of a 400MB ZIP.
Branch: https://github.com/apache/tika/tree/TIKA-4723
> Slim down grpc?
> ---------------
>
> Key: TIKA-4723
> URL: https://issues.apache.org/jira/browse/TIKA-4723
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> For 4.0.0-beta, we should figure out if we can slim down tika-grpc mostly
> just for environmental reasons. It currently weighs in at 648MB.
> If we said we only support it in Docker, we could strip out some native libs.
> Other options? Claude, copilot and/or gemini, please help us save the
> environment!