For purposes of training, I'm wondering if the box for a character should 
include the surrounding space.

In particular, for the CJK "FULLWIDTH COMMA" (U+FF0C), should the box be the
red or green rectangle?

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/D482C8A0-6DD7-4D1B-8A3E-C9B66CA9179C%40mac.com.

The red box is easier to render because the information is available in the
font metrics, but after training, OCR has many problems recognizing the
fullwidth comma. It is often either dropped entirely (no recognition) or
recognized as a Latin comma.
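
To make the two candidates concrete, here is a minimal Python sketch of how each
would end up in a box file. It assumes the usual box-file line layout
(character, left, bottom, right, top, page, with y measured from the bottom of
the image). The helper names `ink_box` and `box_line` and the tiny 8x8 bitmap
are made up for illustration; they are not part of Tesseract. Which of the two
rectangles is the red one and which is the green one depends on the attached
image, so the sketch only labels them "full cell" and "tight ink box".

```python
FULLWIDTH_COMMA = "\uff0c"  # U+FF0C FULLWIDTH COMMA


def ink_box(bitmap):
    """Return the tight box around the non-zero pixels of a glyph bitmap.

    The result is (left, top, right, bottom) in top-left-origin image
    coordinates, with right and bottom exclusive.
    """
    rows = [y for y, row in enumerate(bitmap) if any(row)]
    cols = [x for row in bitmap for x, value in enumerate(row) if value]
    if not rows:
        raise ValueError("bitmap has no ink")
    return (min(cols), rows[0], max(cols) + 1, rows[-1] + 1)


def box_line(ch, box, image_height, page=0):
    """Format one box-file line; box files measure y from the bottom edge."""
    left, top, right, bottom = box
    return f"{ch} {left} {image_height - bottom} {right} {image_height - top} {page}"


# Hypothetical 8x8 rendering: the comma's ink sits in the lower-left corner
# of its full-width cell, as it typically does in CJK fonts.
HEIGHT = 8
glyph = [[0] * 8 for _ in range(HEIGHT)]
for y in (5, 6):
    for x in (1, 2):
        glyph[y][x] = 1

full_cell = (0, 0, 8, HEIGHT)  # whole advance-width cell, from font metrics
tight = ink_box(glyph)         # only the pixels that carry ink

full_line = box_line(FULLWIDTH_COMMA, full_cell, HEIGHT)
tight_line = box_line(FULLWIDTH_COMMA, tight, HEIGHT)
print(full_line)
print(tight_line)
```

The full cell is mostly whitespace, while the tight box covers only the mark
itself, which is the difference the question above is asking about.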

Thanks
Danny
