[ 
https://issues.apache.org/jira/browse/TIKA-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18009682#comment-18009682
 ] 

Tilman Hausherr edited comment on TIKA-4458 at 7/25/25 2:42 AM:
----------------------------------------------------------------

What is your command line for tika-app?

Copying the jar files into the current directory isn't enough. You also need to 
change the command line to use "java -cp" with an explicit classpath, taking 
inspiration from PDFBox; see the bottom of
https://pdfbox.apache.org/3.0/dependencies.html

So if you copy your jar files into the lib subdirectory, your command line will 
be:

java -cp "tika-app-3.2.1.jar:./lib/*" org.apache.tika.cli.TikaCLI

(use ";" instead of ":" on Windows)
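
As a rough sketch of the same idea, the classpath can be assembled by a small 
shell helper that picks the separator for the platform. The function names, 
the ./lib default and the uname patterns below are illustrative assumptions, 
not part of Tika or PDFBox:

```shell
#!/bin/sh
# Sketch only: the function names and the default lib/ location are illustrative.

# Print the classpath separator for a given `uname -s` value:
# ";" on Windows-style shells (Git Bash, MSYS, Cygwin), ":" elsewhere.
classpath_sep() {
    case "$1" in
        MINGW*|MSYS*|CYGWIN*) printf ';' ;;
        *)                    printf ':' ;;
    esac
}

# Build the classpath: the tika-app jar plus every jar in the lib directory.
# The "*" stays literal; the JVM, not the shell, expands it to the jars.
tika_classpath() {
    app_jar=$1
    lib_dir=${2:-./lib}
    sep=$(classpath_sep "$(uname -s)")
    printf '%s%s%s/*' "$app_jar" "$sep" "$lib_dir"
}

# Run TikaCLI with that classpath; remaining arguments are passed through.
run_tika() {
    app_jar=$1
    shift
    java -cp "$(tika_classpath "$app_jar")" org.apache.tika.cli.TikaCLI "$@"
}
```

With these functions sourced, something like "run_tika tika-app-3.2.1.jar 
document.pdf" (document.pdf being a placeholder file name) should start 
TikaCLI with the extra jars from ./lib on the classpath. The classpath is 
quoted so that the shell passes the "*" through unchanged.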


was (Author: tilman):
What is your command line for tika-app?

Copying the jar files into the current directory isn't enough. You also need to 
change the command line to use "java -cp" with an explicit classpath, taking 
inspiration from PDFBox; see the bottom of
https://pdfbox.apache.org/3.0/dependencies.html

So your main class for Tika will likely be "org.apache.tika.cli.TikaCLI".

> PDFParser with Tesseract: Improve documentation about embedded JP2 and JB2 
> files
> --------------------------------------------------------------------------------
>
>                 Key: TIKA-4458
>                 URL: https://issues.apache.org/jira/browse/TIKA-4458
>             Project: Tika
>          Issue Type: Wish
>          Components: parser
>    Affects Versions: 3.2.1
>            Reporter: Peter Hoogendijk
>            Priority: Minor
>
> When using Tika-app 3.2.1 with Tesseract 5.3.0 to parse PDF files with 
> embedded JP2 and JB2 data, the following errors are reported:
> {code:java}
> ERROR [main] 20:26:27,356 org.apache.pdfbox.contentstream.PDFStreamEngine 
> Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are 
> not installed {code}
> Installing jai 1.1.3 and jai-imageio 1.1 in the OpenJDK 17 lib directory does 
> not change the error messages. Is it enough to put the *.jar and *.so files 
> in that directory, or is more required?
> Please provide instructions (or a link to existing instructions) on how to 
> configure Apache Tika to solve this error. After a lot of searching I only 
> found instructions on how to configure PDFBox (in pom.xml), but this does not 
> solve the issue for Apache Tika. How do I translate the required PDFBox 
> configuration sections to the Apache Tika configuration file?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
