[
https://issues.apache.org/jira/browse/TIKA-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082345#comment-18082345
]
ASF GitHub Bot commented on TIKA-4733:
--------------------------------------
Copilot commented on code in PR #2825:
URL: https://github.com/apache/tika/pull/2825#discussion_r3274642095
##########
tika-server/tika-server-standard/pom.xml:
##########
@@ -191,6 +198,36 @@
</execution>
</executions>
</plugin>
+ <plugin>
+ <groupId>org.apache.maven.plugins</groupId>
+ <artifactId>maven-install-plugin</artifactId>
+ <executions>
+ <!--
+ With <attach>false</attach> on the assembly above (TIKA-4733) the
+ -bin.zip is not part of the project artifact set and so is neither
+ deployed to Central nor installed locally. Sibling reactor modules
+ declare tika-server-standard:bin:zip as a Maven dep, so install it
+ into the local repo at its canonical coordinates to satisfy reactor
+ resolution without publishing it to Central.
+ -->
+ <execution>
+ <id>install-server-bin-zip-locally</id>
+ <phase>install</phase>
Review Comment:
The `install-file` execution is bound to the `install` phase, but CI runs
the E2E workflow with `mvn -pl tika-e2e-tests -am clean verify -Pe2e` (verify
does not reach install). With `<attach>false</attach>`, the
`tika-server-standard:bin:zip` artifact will therefore be unavailable for
`maven-dependency-plugin:unpack` in `tika-e2e-tests/tika-server` during
`process-test-resources`, breaking the E2E build. Bind the `install-file`
execution to a phase included by `verify` (e.g., `package`), or otherwise
ensure the classifier artifact is resolvable from the reactor during `verify`
(e.g., conditional attach for reactor builds).
##########
tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java:
##########
@@ -298,17 +298,43 @@ private static void async(String[] args) throws Exception {
}
if (runpack || ! StringUtils.isBlank(tikaConfigPath)) {
- TikaAsyncCLI.main(args);
+ invokeAsyncCLI(args);
return;
}
if (args.length == 1 && args[0].endsWith(".json")) {
- TikaAsyncCLI.main(args);
+ invokeAsyncCLI(args);
return;
}
// For batch mode (two directories), pass directly to TikaAsyncCLI.
// It will create its own config with PluginsWriter that includes
// plugin-roots, fetcher, emitter, and pipes-iterator configuration.
- TikaAsyncCLI.main(args);
+ invokeAsyncCLI(args);
+ }
+
+ /**
+ * Invokes the batch/async processor ({@code tika-async-cli}). The async
+ * processor and the parsers it forks live in the {@code lib/} directory of
+ * the tika-app distribution rather than inside the bare {@code tika-app.jar}.
+ * If tika-app is run as a standalone jar (without the surrounding unzipped
+ * distribution), the supporting classes are missing from the classpath and
+ * the JVM throws {@link NoClassDefFoundError}. Translate that into an
+ * actionable message rather than letting the raw error escape.
+ *
+ * @see <a href="https://issues.apache.org/jira/browse/TIKA-4733">TIKA-4733</a>
+ */
+ private static void invokeAsyncCLI(String[] args) throws Exception {
+ try {
+ TikaAsyncCLI.main(args);
+ } catch (NoClassDefFoundError e) {
+ System.err.println("Error: could not load the Tika batch/async processor (" +
+ e.getMessage() + ").");
+ System.err.println("Batch mode requires the full tika-app distribution, not the "
+ + "standalone jar.");
+ System.err.println("Download tika-app-<version>.zip, unzip it, and run "
+ + "tika-app-<version>.jar from inside the unzipped directory so that the "
+ + "adjacent 'lib/' and 'plugins/' directories are on the classpath.");
Review Comment:
The message says the adjacent `lib/` and `plugins/` directories are "on the
classpath". In the distribution, `lib/` jars are referenced via the jar
manifest Class-Path, but `plugins/` is not on the JVM classpath (it’s a
filesystem directory used for plugin discovery). Consider rewording to avoid
the classpath implication (e.g., say they must be present alongside the jar /
in the distribution directory).
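A minimal sketch of the suggested rewording, written as a standalone class so it runs without Tika on the classpath. It keeps the three-line message from the diff but says the directories must sit alongside the jar. The class name `AsyncCliMessageSketch`, the helper `missingAsyncCliMessage`, and the `Runnable` stand-in for `TikaAsyncCLI.main` are hypothetical and not part of the PR; the wording is only one possible phrasing.

```java
import java.util.List;

public class AsyncCliMessageSketch {

    // Hypothetical helper: builds the user-facing lines without implying
    // that 'plugins/' is on the JVM classpath.
    static List<String> missingAsyncCliMessage(NoClassDefFoundError e) {
        return List.of(
                "Error: could not load the Tika batch/async processor ("
                        + e.getMessage() + ").",
                "Batch mode requires the full tika-app distribution, not the "
                        + "standalone jar.",
                "Download tika-app-<version>.zip, unzip it, and run "
                        + "tika-app-<version>.jar from inside the unzipped directory "
                        + "so that the 'lib/' and 'plugins/' directories sit "
                        + "alongside the jar.");
    }

    // Stand-in for invokeAsyncCLI: the Runnable plays the role of
    // TikaAsyncCLI.main(args), which is not available in this sketch.
    static void invoke(Runnable asyncMain) {
        try {
            asyncMain.run();
        } catch (NoClassDefFoundError e) {
            missingAsyncCliMessage(e).forEach(System.err::println);
        }
    }

    public static void main(String[] args) {
        // Simulate running the bare jar, where the async class is missing.
        invoke(() -> {
            throw new NoClassDefFoundError("org/apache/tika/async/cli/TikaAsyncCLI");
        });
    }
}
```

Building the lines in a separate method keeps the wording easy to check in a unit test without having to provoke a real class-loading failure.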
> tika-4.0.0-alpha1 - cannot find tika-async-cli.jar
> --------------------------------------------------
>
> Key: TIKA-4733
> URL: https://issues.apache.org/jira/browse/TIKA-4733
> Project: Tika
> Issue Type: Bug
> Affects Versions: 4.0.0
> Environment: Windows 11 using Java 17.
> Reporter: Adrian Bird
> Priority: Blocker
>
> I've downloaded the tika-app-4.0.0-alpha-1.jar and the
> tika-4.0.0-alpha-1-src.zip and unzipped the tika-4.0.0-alpha-1-src.zip into
> the same folder as tika-app-4.0.0-alpha-1.jar.
> I can successfully run a simple test but when I tried to extract images from
> a document (--extract) I got this error:
> {code:java}
> Apache Tika 4.0.0-alpha-1
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tika/async/cli/TikaAsyncCLI
> at org.apache.tika.cli.TikaCLI.async(TikaCLI.java:311)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:261)
> Caused by: java.lang.ClassNotFoundException: org.apache.tika.async.cli.TikaAsyncCLI
> at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
> at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
> ... 2 more{code}
>
> I also noticed the documentation for [Basic Batch Usage
> |https://tika.apache.org/docs/4.0.0-SNAPSHOT/using-tika/cli/index.html#_basic_batch_usage]
> has this example:
> {code:java}
> java -jar tika-async-cli.jar -i /path/to/input -o /path/to/output{code}
> Where do I get tika-async-cli.jar from?
>
> PS: I added that it affects version 4.0.0 as there was no 'alpha' version
> visible.
>