Rich, I believe you mentioned Master PDF Editor. I believe it has OCR built in, or supports it as a plugin. If needed, a good OCR tool is Tesseract, which is likely in your distro's repository: https://en.wikipedia.org/wiki/Tesseract_(software)
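Here is a minimal sketch of the Tesseract route Tomas describes below. It assumes each page has already been exported as an image (for example with poppler's `pdftoppm`) and that the `tesseract` binary is on PATH. Tesseract's `pdf` output config writes a PDF containing the page image plus an invisible, searchable text layer. The helper names `tesseract_pdf_command` and `ocr_to_searchable_pdf` are hypothetical, for illustration only.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def tesseract_pdf_command(image: Path, lang: str = "eng") -> list[str]:
    """Build a Tesseract command that writes <image stem>.pdf next to the image."""
    # Tesseract takes an output *base* name and appends ".pdf" itself.
    outbase = image.with_suffix("")
    # The trailing "pdf" selects Tesseract's searchable-PDF output config.
    return ["tesseract", str(image), str(outbase), "-l", lang, "pdf"]


def ocr_to_searchable_pdf(image: Path, lang: str = "eng") -> Path:
    """Run Tesseract on one page image and return the path of the PDF it wrote."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits non-zero.
    subprocess.run(tesseract_pdf_command(image, lang), check=True)
    return image.with_suffix(".pdf")


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(ocr_to_searchable_pdf(Path(name)))
```

As Tomas notes, OCR accuracy and text placement are imperfect, so tables in particular may not come out cleanly.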
Jason

On Sat, Jul 24, 2021 at 11:08 AM Tomas Kuchta <tomas.kuchta.li...@gmail.com> wrote:
> They are not searchable because they do not contain text to search.
> Typically, they contain only an image.
>
> The way I deal with it: I OCR the image, generate a text document, and place
> that text into a layer under the image in the output PDF.
>
> Having the text under the image layer preserves the original look of the
> PDF while allowing for search and select.
>
> I have seen PDFs with text over the image, obscuring it, as well as various
> attempts at making the text over the image invisible.
>
> Of course, OCR is not perfect, and neither is placing the text in the exact
> position under the image. It mostly works for text, not so much for data
> extraction from tables, etc.
>
> I do not believe that there is an OK-ish free software solution to this. I use
> commercial software to do that. It works, but I cannot publicly recommend it due
> to the vendor's nasty commercial behavior: no respect for privacy, no sale, just
> licensing with built-in obsolescence.
>
> Tomas
>
> On Fri, Jul 23, 2021, 16:18 Rich Shepard <rshep...@appl-ecosys.com> wrote:
>
> > I've encountered a few PDF-1.5 docs that are not searchable using xpdf,
> > mupdf, okular, or MasterPDFEditor. Perhaps they're scanned and I don't know
> > how to determine if they are.
> >
> > My web searches found nothing relevant; my search terms might be
> > inefficient.
> >
> > Has anyone else experienced this?
> >
> > Rich
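On Rich's question of how to tell whether a PDF is scanned: one common heuristic is that an image-only PDF usually embeds no fonts. The `pdffonts` tool (shipped with both xpdf and poppler) prints a table with a dashed separator line followed by one row per font. The sketch below assumes that output layout and treats "no font rows" as "probably image-only". It is only a heuristic: a PDF can list a font for a stamped header while the body text is still a scanned image. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys


def pdffonts_has_fonts(output: str) -> bool:
    """Return True if `pdffonts` output lists at least one font row."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("---"):
            # Every non-blank line after the dashed separator is one font entry.
            return any(row.strip() for row in lines[i + 1:])
    # No table at all: treat it as "no fonts found".
    return False


def looks_scanned(pdf_path: str) -> bool:
    """Heuristic: a PDF with no fonts is probably image-only (scanned)."""
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return not pdffonts_has_fonts(result.stdout)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        verdict = "probably image-only" if looks_scanned(path) else "has fonts/text"
        print(f"{path}: {verdict}")
```

Files flagged as image-only are the ones that would need an OCR pass before xpdf, mupdf, or okular can search them.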