[ https://issues.apache.org/jira/browse/TIKA-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Hoogendijk updated TIKA-4458:
-----------------------------------
Description:
When using Tika-app 3.2.1 with Tesseract 5.3.0 to parse PDF files with embedded
JP2 and JB2 data, the following errors are reported:
{code:java}
ERROR [main] 20:26:27,356 org.apache.pdfbox.contentstream.PDFStreamEngine
Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not
installed {code}
Installing jai 1.1.3 and jai-imageio 1.1 in the OpenJDK 17 lib directory does
not change the error messages. Is it enough to put the *.jar and *.so files in
that directory, or is more required?
Please provide instructions (or a link to existing instructions) on how to
configure Apache Tika to solve this error. After a lot of searching I only
found instructions on how to configure PDFBox (in pom.xml), but these do not
solve the issue for Apache Tika. How do I translate the required PDFBox
configuration sections to the Apache Tika configuration file?
was:
When using Tika-app 3.2.1 with Tesseract 5.3.0 to parse PDF files with embedded
JP2 and JB2 data, the following errors are reported:
{code:java}
ERROR [main] 20:26:27,356 org.apache.pdfbox.contentstream.PDFStreamEngine
Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not
installed {code}
Installing jai 1.1.3 and jai-imageio 1.1 in the OpenJDK 17 lib directory does
not change the error messages.
Please provide instructions (or a link to existing instructions) on how to
configure Apache Tika to solve this error. After a lot of searching I only
found instructions on how to configure PDFBox (in pom.xml), but these do not
solve the issue for Apache Tika. How do I translate the required PDFBox
configuration sections to the Apache Tika configuration file?
> PDFParser with Tesseract: Improve documentation about embedded JP2 and JB2
> files
> --------------------------------------------------------------------------------
>
> Key: TIKA-4458
> URL: https://issues.apache.org/jira/browse/TIKA-4458
> Project: Tika
> Issue Type: Wish
> Components: parser
> Affects Versions: 3.2.1
> Reporter: Peter Hoogendijk
> Priority: Minor
>
> When using Tika-app 3.2.1 with Tesseract 5.3.0 to parse PDF files with
> embedded JP2 and JB2 data, the following errors are reported:
> {code:java}
> ERROR [main] 20:26:27,356 org.apache.pdfbox.contentstream.PDFStreamEngine
> Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are
> not installed {code}
> Installing jai 1.1.3 and jai-imageio 1.1 in the OpenJDK 17 lib directory does
> not change the error messages. Is it enough to put the *.jar and *.so files
> in that directory, or is more required?
> Please provide instructions (or a link to existing instructions) on how to
> configure Apache Tika to solve this error. After a lot of searching I only
> found instructions on how to configure PDFBox (in pom.xml), but these do not
> solve the issue for Apache Tika. How do I translate the required PDFBox
> configuration sections to the Apache Tika configuration file?
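
One way to narrow this down: as far as I can tell, PDFBox raises this message when Java's ImageIO has no reader registered for the JPEG2000 format, which would make this a classpath question rather than a setting in the Tika configuration file. The sketch below only lists which readers ImageIO can see. The class name ImageReaderCheck is hypothetical, and the format names "JPEG2000" and "JBIG2" are my assumption about what PDFBox requests. Treat it as a diagnostic aid, not an official Tika or PDFBox procedure.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical diagnostic class: reports whether ImageIO can find a reader
// for the image formats that appear in the error message.
public class ImageReaderCheck {

    // Returns true if at least one ImageIO reader is registered for formatName.
    public static boolean hasReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Re-scan the classpath so that plugin jars are registered with ImageIO.
        ImageIO.scanForPlugins();

        // "JPEG2000" and "JBIG2" are assumed format names; "png" ships with
        // the JDK and serves as a sanity check that the lookup itself works.
        String[] formats = {"JPEG2000", "JBIG2", "png"};
        for (String format : formats) {
            System.out.println(format + " reader available: " + hasReader(format));
        }
    }
}
```

Compile and run it with the same JDK and the same classpath you use for Tika-app. If the JPEG2000 line reports false, the reader jars are not visible to the JVM. Java 9 and later removed the extension mechanism, so jars copied into the JDK directory are probably not picked up automatically and would need to be added to the classpath explicitly.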