[tesseract-ocr] OCR Output contains "xlz"

'Danny Wilson' via tesseract-ocr Sun, 15 Oct 2023 06:44:05 -0700

Running tesseract on an image of a single Chinese character, "對", outputs
the character, but also the stray text "xlz".


Command line:
tesseract sub0089w.png debugOut -l ARYuanB5-MD --dpi 72 --psm 6 -c preserve_interword_spaces=1

The output is two lines:
xlz
對
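
For anyone trying to reproduce this, here is a minimal Python sketch that wraps the same command and flags output lines made up only of ASCII characters. The helper names (build_command, stray_ascii_lines, run_and_check) are hypothetical, and it assumes tesseract is on the PATH and writes its text to debugOut.txt.

```python
import subprocess
from pathlib import Path


def build_command(image, out_base, lang, dpi=72, psm=6):
    """Return the tesseract argument list used in this report."""
    return [
        "tesseract", image, out_base,
        "-l", lang,
        "--dpi", str(dpi),
        "--psm", str(psm),
        "-c", "preserve_interword_spaces=1",
    ]


def stray_ascii_lines(text):
    """Return non-empty lines that consist solely of ASCII characters."""
    return [ln for ln in text.splitlines() if ln.strip() and ln.isascii()]


def run_and_check(image="sub0089w.png", out_base="debugOut",
                  lang="ARYuanB5-MD"):
    """Run tesseract, read <out_base>.txt, and report suspicious lines."""
    # Assumption: tesseract is installed and the traineddata is available.
    subprocess.run(build_command(image, out_base, lang), check=True)
    text = Path(out_base + ".txt").read_text(encoding="utf-8")
    return stray_ascii_lines(text)
```

With the two-line output shown above, stray_ascii_lines("xlz\n對\n") returns ["xlz"], which makes it easy to check whether a retrained model still emits the extra line.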

It used to output "sMz", but after retraining several times with the
specific font in use, it now outputs "xlz".

Why?

I've attached the image file in question...

[image: sub0089w.png]

(Searching the source code, I found that the file universalambigs.h has a
line " xlZ le 1", which is similar, but not identical, to the errant text
I'm seeing.)

Thank you.
