For purposes of training, I'm wondering if the box for a character should include the surrounding space.
In particular, for the CJK "FULLWIDTH COMMA", should the box be the red or the green rectangle?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/D482C8A0-6DD7-4D1B-8A3E-C9B66CA9179C%40mac.com.
The red box is easier to generate because the font metrics already provide the information it needs, but after training, OCR often mishandles the fullwidth comma: it is either not recognized at all or recognized as a Latin comma.

Thanks,
Danny
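
To make the two candidate boxes concrete, here is a minimal sketch. It assumes the red rectangle is the full glyph cell taken from the font metrics and the green rectangle is the tight box around the inked pixels; the post does not state this outright. The names cell_box, ink_box, box_line and the 8x8 GLYPH bitmap are hypothetical, made up for illustration. The only Tesseract-specific part is the box-file line layout, "symbol left bottom right top page", with the origin at the bottom-left of the image.

```python
# Hypothetical 8x8 rendering of a fullwidth comma: the ink sits in the
# lower-left corner of the cell and the rest of the cell is empty.
GLYPH = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
]


def cell_box(x, y, width, height):
    """Full cell (assumed 'red' box) as (left, bottom, right, top)."""
    return (x, y, x + width, y + height)


def ink_box(bitmap, x, y):
    """Tight box around non-zero pixels (assumed 'green' box).

    bitmap rows run top to bottom; the cell's bottom-left corner is placed
    at (x, y) in bottom-left-origin image coordinates.
    Returns None when the bitmap has no ink.
    """
    rows = [r for r, row in enumerate(bitmap) if any(row)]
    if not rows:
        return None
    cols = [c for row in bitmap for c, v in enumerate(row) if v]
    height = len(bitmap)
    left = x + min(cols)
    right = x + max(cols) + 1
    top = y + height - min(rows)            # flip the row index to bottom-left origin
    bottom = y + height - (max(rows) + 1)
    return (left, bottom, right, top)


def box_line(symbol, box, page=0):
    """Format one box-file line: symbol left bottom right top page."""
    left, bottom, right, top = box
    return f"{symbol} {left} {bottom} {right} {top} {page}"


red = cell_box(100, 20, 8, 8)       # (100, 20, 108, 28)
green = ink_box(GLYPH, 100, 20)     # (101, 20, 103, 23)
print(box_line("\uff0c", red))
print(box_line("\uff0c", green))
```

In this toy case the tight box covers 2x3 pixels of an 8x8 cell, which shows how much empty space the cell-sized box carries for this character. Which of the two actually trains better is the open question here; the sketch only shows how each would be written to a box file.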