Re: [PLUG] PDF-1.5 docs not searchable

Rich Shepard Sun, 25 Jul 2021 05:53:33 -0700

On Sun, 25 Jul 2021, Jason Barnett wrote:

> I believe you mentioned Master PDF editor. I believe it has OCR built-in,
> or allows it as a plugin. If needed, a good OCR tool is Tesseract and is
> likely in your distro's repository.
> https://en.wikipedia.org/wiki/Tesseract_(software)


Jason,

Thank you. The most recent doc I viewed (the one that prompted my post) has
multiple images per page; it's not all text. While I don't remember the few
others that I could not search, it's likely that they, too, had many images
embedded within the text. So I assume they were all scanned (or produced by
an equivalent process).

I used to get scanned documents (such as permit copies) from clients and had
no reason to run them through OCR, but I'll keep that in mind for the
future.

Germane to MasterPDFEditor, I expect that its OCR capabilities are in the
paid version, not the free one. And, yes, Tesseract is in the SBo repo.
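
For anyone wanting to try the Tesseract route on a scanned PDF, here is a
minimal sketch of one way it might go. It assumes pdftoppm (from
poppler-utils) and tesseract are both installed and on the PATH; the helper
names, the 300 dpi default, and the "eng" language are illustrative choices
of mine, not anything from this thread. It renders each page to a PNG, then
asks Tesseract for a per-page PDF that carries a searchable text layer.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def pdftoppm_cmd(pdf, prefix, dpi=300):
    """Command that renders each PDF page to <prefix>-<n>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def tesseract_cmd(image, out_base, lang="eng"):
    """Command that OCRs one image into <out_base>.pdf with a text layer."""
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]


def ocr_pdf(pdf, workdir, lang="eng"):
    """Hypothetical helper: OCR every page of `pdf`, return per-page PDFs."""
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    # Step 1: rasterize the pages.
    subprocess.run(pdftoppm_cmd(pdf, workdir / "page"), check=True)
    # Step 2: OCR each page image into its own searchable PDF.
    results = []
    for image in sorted(workdir.glob("page-*.png")):
        out_base = image.with_suffix("")
        subprocess.run(tesseract_cmd(image, out_base, lang), check=True)
        results.append(out_base.with_suffix(".pdf"))
    return results


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python ocr_pdf.py scanned.pdf /tmp/ocr-work
    for page_pdf in ocr_pdf(sys.argv[1], sys.argv[2]):
        print(page_pdf)
```

The per-page PDFs would still need to be joined into one file with whatever
PDF tool is at hand; that step is left out here.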

Stay well,

Rich
