Thanks, Zdenko. PyMuPDF is an intriguing option. I'll check it out further.
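
For the hOCR route mentioned in the quoted thread, here is a minimal sketch of how the word text, bounding boxes, and confidences could be pulled out of an hOCR file so that low-confidence words can be reviewed. It uses only the Python standard library and assumes the usual hOCR layout (`ocrx_word` spans whose `title` attribute holds `bbox` and `x_wconf`). The names `HocrWordParser`, `parse_title`, `extract_words`, and the sample markup are made up for illustration; this is not something from the Tesseract docs.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Parse an hOCR title such as 'bbox 10 20 90 50; x_wconf 96'."""
    props = {"bbox": None, "conf": None}
    for field in title.split(";"):
        parts = field.split()
        if not parts:
            continue
        if parts[0] == "bbox" and len(parts) == 5:
            props["bbox"] = tuple(int(v) for v in parts[1:])
        elif parts[0] == "x_wconf" and len(parts) == 2:
            props["conf"] = float(parts[1])
    return props


class HocrWordParser(HTMLParser):
    """Collect text, bbox, and confidence for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None outside a word
        self._depth = 0       # open tags inside the current word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is not None:
            self._depth += 1  # nested markup inside a word, e.g. <strong>
            return
        if "ocrx_word" in (attrs.get("class") or "").split():
            self._current = {"text": "", **parse_title(attrs.get("title") or "")}
            self._depth = 1

    def handle_endtag(self, tag):
        if self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data


def extract_words(hocr_text):
    """Return a list of dicts with keys 'text', 'bbox', and 'conf'."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Hypothetical hOCR fragment, shaped like typical hOCR output.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 800 600'>
<span class='ocr_line' title='bbox 10 20 300 50'>
<span class='ocrx_word' id='word_1_1' title='bbox 10 20 90 50; x_wconf 96'>Hello</span>
<span class='ocrx_word' id='word_1_2' title='bbox 100 20 200 50; x_wconf 41'><strong>wor1d</strong></span>
</span></div>"""

if __name__ == "__main__":
    # Print the words that most likely need manual correction.
    for word in extract_words(SAMPLE):
        if word["conf"] is not None and word["conf"] < 60:
            print(word["text"], word["bbox"], word["conf"])
```

The hOCR itself can be produced with something like `tesseract input.png out hocr`. The corrected words and their boxes would then still need to be written back as a text layer over the page image by a separate tool, which is the part PyMuPDF might cover.
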

On Fri, Mar 8, 2024 at 6:14 AM Zdenko Podobny <zde...@gmail.com> wrote:

> Hello,
>
> I am not sure if OCRmyPDF (https://ocrmypdf.readthedocs.io/en/latest/)
> allows redaction.
>
> If you would like to implement the text layer yourself with a custom font,
> have a look at PyMuPDF:
>
> - https://github.com/pymupdf/PyMuPDF/discussions/775 (Adding text
>   layer to a scanned PDF)
> - https://github.com/pymupdf/PyMuPDF/discussions/2464 (invisible text
>   layer)
>
> Zdenko
>
> On Thu, Mar 7, 2024 at 20:53 Mark Pellegrino <mar...@gmail.com> wrote:
>
>> I found more info here:
>>
>> https://github.com/tesseract-ocr/tesseract/issues/1769#issuecomment-509490277
>>
>> Glyphless appears to be an 'invisible font' and all that Tesseract
>> supports. It seems like the solution is to use Tesseract to generate hOCR,
>> then use another tool to combine the source image with the hOCR?
>>
>> Does anyone have a simple workflow for editing/correcting Tesseract OCR
>> documents that they can share?
>>
>> Thanks again,
>>
>> On Thursday 7 March 2024 at 14:17:28 UTC-5 Mark Pellegrino wrote:
>>
>>> Hello,
>>> I'm trying to check PDFs made with Tesseract 5.2 for correctness using
>>> an OCR editor but am unable to open them in either Abbyy or Acrobat.
>>>
>>> If I try to open a Tesseract PDF with Abbyy FineReader/OCR Editor, the
>>> software just hangs and crashes. I can open Tesseract PDFs with Acrobat
>>> Pro, but when I enable the 'Make OCR text visible' option in Preflight,
>>> all of the text layer turns into unreadable black boxes. The font used
>>> shows as 'GlyphLessFont' and appears to be embedded in the file.
>>>
>>> It doesn't matter what training data I use, or what the source image
>>> was, I always get these results. Any other non-Tesseract made PDF works
>>> just fine. I'm guessing that the issue is a missing font? I don't have much
>>> of an understanding about how embedded PDF fonts work and I haven't found
>>> anything about this in the Tesseract docs. Can someone please point me in
>>> the right direction? Thanks.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/b43c0ea6-fd81-49af-b74f-e93b0a682574n%40googlegroups.com.