[tesseract-ocr] Re: Post OCR Verification and Editing

Mark Pellegrino Thu, 07 Mar 2024 11:53:32 -0800

I found more info here:
https://github.com/tesseract-ocr/tesseract/issues/1769#issuecomment-509490277


Glyphless appears to be an 'invisible font', and it is the only font 
Tesseract supports for PDF output. It seems like the solution is to use 
Tesseract to generate hOCR, then use another tool to combine the source 
image with the hOCR? 
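
For anyone following along, here is a rough sketch of the first half of that 
idea: one Tesseract run that writes an hOCR file next to the PDF. It assumes 
the `tesseract` executable is on PATH and that the stock `hocr` and `pdf` 
configs are installed; the function names `build_tesseract_command` and 
`run_ocr` are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argv for one Tesseract run that writes both hOCR and PDF."""
    # "hocr" and "pdf" are config names that ship with Tesseract; listing
    # both should produce <output_base>.hocr and <output_base>.pdf.
    return ["tesseract", str(image_path), str(output_base), "-l", lang, "hocr", "pdf"]


def run_ocr(image_path, output_base, lang="eng"):
    """Run Tesseract if it is installed and return the expected hOCR path."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.hocr")
```

Calling `run_ocr("page.png", "page")` would leave `page.hocr` and `page.pdf` 
side by side, so the hOCR can be corrected in a separate tool and recombined 
with the image afterwards.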

Does anyone have a simple workflow for editing/correcting Tesseract OCR 
documents that they can share?
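
In the meantime, one thing that might help with verification is pulling the 
low-confidence words out of the hOCR so they can be checked by hand. The 
sketch below uses only the Python standard library and assumes the hOCR marks 
words as `ocrx_word` spans with `bbox` and `x_wconf` in the `title` 
attribute, which is what I see in Tesseract's output. The names 
`HocrWordParser` and `low_confidence_words` and the threshold of 80 are my own 
arbitrary choices.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect text, bounding box and confidence for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None
        self._depth = 0  # span nesting depth inside the current word

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._current is not None:
            self._depth += 1  # nested span (e.g. per-character info)
            return
        attr = dict(attrs)
        if "ocrx_word" not in (attr.get("class") or "").split():
            return
        # The title looks like "bbox x0 y0 x1 y1; x_wconf NN".
        props = {}
        for field in (attr.get("title") or "").split(";"):
            parts = field.split()
            if parts:
                props[parts[0]] = parts[1:]
        self._current = {
            "text": "",
            "bbox": tuple(int(v) for v in props.get("bbox", [])),
            "conf": float(props["x_wconf"][0]) if props.get("x_wconf") else None,
        }
        self._depth = 1

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag != "span" or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def low_confidence_words(hocr_text, threshold=80.0):
    """Return the words whose x_wconf is below the (arbitrary) threshold."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return [w for w in parser.words if w["conf"] is not None and w["conf"] < threshold]
```

Feeding it the contents of a `.hocr` file gives a list of dicts with `text`, 
`bbox` and `conf`, which is enough to jump to the suspicious spots on the 
page image.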

Thanks again,

On Thursday 7 March 2024 at 14:17:28 UTC-5 Mark Pellegrino wrote:

> Hello,
> I'm trying to check PDFs made with Tesseract 5.2 for correctness using an 
> OCR editor but am unable to open them in either Abbyy or Acrobat.
>
> If I try to open a Tesseract PDF with Abbyy FineReader/OCR Editor, the 
> software just hangs and crashes. I can open Tesseract PDFs with Acrobat 
> Pro, but when I enable the 'Make OCR text visible' option in Preflight, 
> all of the text layer turns into unreadable black boxes. The font used 
> shows as 'GlyphLessFont' and appears to be embedded in the file.
>
> It doesn't matter what training data I use, or what the source image was, 
> I always get these results. Any PDF not made with Tesseract works just 
> fine. I'm guessing that the issue is a missing font? I don't have much of 
> an understanding of how embedded PDF fonts work and I haven't found 
> anything about this in the Tesseract docs. Can someone please point me in 
> the right direction? Thanks.
>
>
>
