Flatten the form fields before searching the file if you want
PDFTextStripper to find the text in them.
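A minimal sketch of that advice for the question quoted below (finding words and their positions). It assumes PDFBox 2.0.x on the classpath, because it uses the 2.0 PDDocument.load call; PDFBox 3.0 replaced that with Loader.loadPDF. The class FindWordPositions and its getHits method are hypothetical names for illustration only. The class flattens the AcroForm, if there is one, and then subclasses PDFTextStripper. Its writeString override receives each chunk of extracted text together with its TextPosition list. The sketch assumes one TextPosition per character. When that doesn't hold (for example with ligatures), it reports the start of the chunk instead.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

// Hypothetical helper (not part of PDFBox): reports where a search term occurs.
// Assumes PDFBox 2.0.x on the classpath.
public class FindWordPositions extends PDFTextStripper {
    private final String searchTerm;
    private final List<String> hits = new ArrayList<>();

    public FindWordPositions(String searchTerm) throws IOException {
        if (searchTerm == null || searchTerm.isEmpty()) {
            throw new IllegalArgumentException("searchTerm must not be empty");
        }
        this.searchTerm = searchTerm;
        setSortByPosition(true); // emit text in reading order, not content-stream order
    }

    public List<String> getHits() {
        return hits;
    }

    // PDFTextStripper calls this for each chunk of extracted text with its glyph positions.
    @Override
    protected void writeString(String text, List<TextPosition> textPositions) throws IOException {
        int idx = text.indexOf(searchTerm);
        while (idx >= 0 && !textPositions.isEmpty()) {
            // Assumption: one TextPosition per char; otherwise fall back to the chunk start.
            TextPosition pos = text.length() == textPositions.size()
                    ? textPositions.get(idx)
                    : textPositions.get(0);
            // XDirAdj/YDirAdj: coordinates with the origin at the top-left of the page.
            hits.add(String.format("page %d: x=%.1f y=%.1f",
                    getCurrentPageNo(), pos.getXDirAdj(), pos.getYDirAdj()));
            idx = text.indexOf(searchTerm, idx + 1);
        }
        super.writeString(text, textPositions);
    }

    // Usage: java FindWordPositions file.pdf word
    public static void main(String[] args) throws IOException {
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
            if (acroForm != null) {
                // Draw the fields' appearances into the page content and remove the fields,
                // so the stripper sees their text as ordinary page text.
                acroForm.flatten();
            }
            FindWordPositions finder = new FindWordPositions(args[1]);
            finder.getText(document); // drives writeString() for every page
            finder.getHits().forEach(System.out::println);
        }
    }
}
```

Without the flatten step, text that exists only in form-field appearances is never passed to writeString. That is why the search misses it.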
On Thu, Mar 21, 2024 at 12:10 PM Paul Grütter wrote:
> Hello list,
>
> I want to search for words in a PDF document and get their positions. It
> seems that PDFBox ignores text which has
The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 2.0.31. The release is available for download at:
https://pdfbox.apache.org/download.html
See the full release notes below for details about this release.
Release Notes -- Apache PDFBox -- Version 2.0.31
Here they are; remove the "XXX" from each link:
https://corpora.tika.apache.org/XXXbase/docs/govdocs1/433/433525.pdf
https://corpora.tika.apache.org/XXXbase/docs/commoncrawl3/O2/O226ORR4SMIKRGPWC6PXUYAYMSBB6FVP
https://corpora.tika.apache.org/XXXbase/docs/commoncrawl3/R4/R4EXG25W532JHDQLJAM4HF6O532TLR7D
On 15.03.24 at 05:35, Tilman Hausherr wrote:
You are correct that the "fb" parts are missing (some of the other
tools you tried also mention this).
Just adding "true" causes text extraction of several files to no longer
be correct: 433525-p1.pdf