[ 
https://issues.apache.org/jira/browse/TIKA-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082345#comment-18082345
 ] 

ASF GitHub Bot commented on TIKA-4733:
--------------------------------------

Copilot commented on code in PR #2825:
URL: https://github.com/apache/tika/pull/2825#discussion_r3274642095


##########
tika-server/tika-server-standard/pom.xml:
##########
@@ -191,6 +198,36 @@
           </execution>
         </executions>
       </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-install-plugin</artifactId>
+        <executions>
+          <!--
+            With <attach>false</attach> on the assembly above (TIKA-4733) the
+            -bin.zip is not part of the project artifact set and so is neither
+            deployed to Central nor installed locally. Sibling reactor modules
+            declare tika-server-standard:bin:zip as a Maven dep, so install it
+            into the local repo at its canonical coordinates to satisfy reactor
+            resolution without publishing it to Central.
+          -->
+          <execution>
+            <id>install-server-bin-zip-locally</id>
+            <phase>install</phase>

Review Comment:
   The `install-file` execution is bound to the `install` phase, but CI runs 
the E2E workflow with `mvn -pl tika-e2e-tests -am clean verify -Pe2e`, and the 
`verify` phase comes before `install`, so this execution never runs there. With 
`<attach>false</attach>`, the `tika-server-standard:bin:zip` artifact will 
therefore be unavailable for `maven-dependency-plugin:unpack` in 
`tika-e2e-tests/tika-server` during `process-test-resources`, breaking the E2E 
build. Bind the `install-file` execution to a phase that `verify` includes 
(e.g., `package`), or otherwise ensure the classifier artifact is resolvable 
from the reactor during `verify` (e.g., conditional attach for reactor builds).
   



##########
tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java:
##########
@@ -298,17 +298,43 @@ private static void async(String[] args) throws Exception {
         }
 
         if (runpack || ! StringUtils.isBlank(tikaConfigPath)) {
-            TikaAsyncCLI.main(args);
+            invokeAsyncCLI(args);
             return;
         }
         if (args.length == 1 &&  args[0].endsWith(".json")) {
-            TikaAsyncCLI.main(args);
+            invokeAsyncCLI(args);
             return;
         }
         // For batch mode (two directories), pass directly to TikaAsyncCLI.
         // It will create its own config with PluginsWriter that includes
         // plugin-roots, fetcher, emitter, and pipes-iterator configuration.
-        TikaAsyncCLI.main(args);
+        invokeAsyncCLI(args);
+    }
+
+    /**
+     * Invokes the batch/async processor ({@code tika-async-cli}). The async
+     * processor and the parsers it forks live in the {@code lib/} directory of
+     * the tika-app distribution rather than inside the bare {@code tika-app.jar}.
+     * If tika-app is run as a standalone jar (without the surrounding unzipped
+     * distribution), the supporting classes are missing from the classpath and
+     * the JVM throws {@link NoClassDefFoundError}. Translate that into an
+     * actionable message rather than letting the raw error escape.
+     *
+     * @see <a href="https://issues.apache.org/jira/browse/TIKA-4733">TIKA-4733</a>
+     */
+    private static void invokeAsyncCLI(String[] args) throws Exception {
+        try {
+            TikaAsyncCLI.main(args);
+        } catch (NoClassDefFoundError e) {
+            System.err.println("Error: could not load the Tika batch/async processor (" +
+                    e.getMessage() + ").");
+            System.err.println("Batch mode requires the full tika-app distribution, not the "
+                    + "standalone jar.");
+            System.err.println("Download tika-app-<version>.zip, unzip it, and run "
+                    + "tika-app-<version>.jar from inside the unzipped directory so that the "
+                    + "adjacent 'lib/' and 'plugins/' directories are on the classpath.");

Review Comment:
   The message says the adjacent `lib/` and `plugins/` directories are "on the 
classpath". In the distribution, the `lib/` jars are referenced via the jar 
manifest's `Class-Path` attribute, but `plugins/` is not on the JVM classpath 
(it is a filesystem directory used for plugin discovery). Consider rewording to 
avoid the classpath implication (e.g., say the directories must be present 
alongside the jar, in the distribution directory).
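
   One possible rewording, shown as a minimal standalone sketch rather than the 
actual change made in the PR. The class `AsyncCliErrorMessage` and the method 
`missingDistributionMessage` are hypothetical names introduced only for this 
illustration; the first two sentences of the message are copied from the diff 
above, and only the last sentence is changed so that it no longer claims 
`plugins/` is on the classpath.

```java
// Hypothetical helper, not part of TikaCLI: it only illustrates wording that
// avoids saying the 'plugins/' directory is on the JVM classpath.
class AsyncCliErrorMessage {

    // detail is whatever NoClassDefFoundError.getMessage() returned,
    // e.g. "org/apache/tika/async/cli/TikaAsyncCLI".
    static String missingDistributionMessage(String detail) {
        return "Error: could not load the Tika batch/async processor (" + detail + ").\n"
                + "Batch mode requires the full tika-app distribution, not the "
                + "standalone jar.\n"
                + "Download tika-app-<version>.zip, unzip it, and run "
                + "tika-app-<version>.jar from inside the unzipped directory so that the "
                + "'lib/' and 'plugins/' directories sit alongside the jar.";
    }

    public static void main(String[] args) {
        // Print the message the way the catch block in the diff would.
        System.err.println(
                missingDistributionMessage("org/apache/tika/async/cli/TikaAsyncCLI"));
    }
}
```

   Building the message in one place keeps the wording testable without 
triggering a real `NoClassDefFoundError`; whether the PR should adopt a helper 
like this or simply edit the string literals in place is a separate choice.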
   





> tika-4.0.0-alpha1 - cannot find tika-async-cli.jar
> --------------------------------------------------
>
>                 Key: TIKA-4733
>                 URL: https://issues.apache.org/jira/browse/TIKA-4733
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>         Environment: Windows 11 using Java 17.
>            Reporter: Adrian Bird
>            Priority: Blocker
>
> I've downloaded the tika-app-4.0.0-alpha-1.jar and the 
> tika-4.0.0-alpha-1-src.zip and unzipped the tika-4.0.0-alpha-1-src.zip into 
> the same folder as tika-app-4.0.0-alpha-1.jar.
> I can successfully run a simple test but when I tried to extract images from 
> a document (--extract) I got this error:
> {code:java}
> Apache Tika 4.0.0-alpha-1
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tika/async/cli/TikaAsyncCLI
>         at org.apache.tika.cli.TikaCLI.async(TikaCLI.java:311)
>         at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:261)
> Caused by: java.lang.ClassNotFoundException: org.apache.tika.async.cli.TikaAsyncCLI
>         at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
>         at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>         at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>         ... 2 more{code}
>  
> I also noticed the documentation for [Basic Batch Usage|https://tika.apache.org/docs/4.0.0-SNAPSHOT/using-tika/cli/index.html#_basic_batch_usage] has this example:
> {code:java}
> java -jar tika-async-cli.jar -i /path/to/input -o /path/to/output{code}
> Where do I get tika-async-cli.jar from?
>  
> PS: I added that it affects version 4.0.0 as there was no 'alpha' version 
> visible.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
