Re: [tesseract-ocr] Should box include surrounding space?

2023-10-19 Thread 'Danny Wilson' via tesseract-ocr
Sorry, I had the coordinate system flipped in my last post. Here is a correct image, produced by text2image, that includes both FULLWIDTH COMMA and COMMA. For both types of comma, the boxes produced by text2image include only the boundaries of the glyph itself and do not consider the vertical
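
For anyone writing their own box generator, one common cause of a flipped result is that Tesseract box files measure y upward from the bottom-left corner of the image, while most drawing APIs measure downward from the top-left. Below is a minimal sketch of the conversion in Python, assuming a tight glyph rectangle in top-left image coordinates. The function name `box_line` and the sample numbers are made up for illustration and are not taken from text2image or from the poster's Swift generator.

```python
def box_line(symbol, left, top, width, height, image_height, page=0):
    """Format one box-file line from a glyph rectangle given in top-left image coordinates."""
    # Box files measure y upward from the bottom edge of the image,
    # so both vertical edges are flipped against the image height.
    bottom = image_height - (top + height)
    upper = image_height - top
    right = left + width
    # Classic box-file layout: symbol left bottom right top page
    return f"{symbol} {left} {bottom} {right} {upper} {page}"


# Hypothetical tight bounds for a FULLWIDTH COMMA in a 48-pixel-tall line image.
print(box_line("\uFF0C", left=100, top=30, width=8, height=12, image_height=48))
```

Whether the rectangle passed in should be the tight glyph bounds or the full advance cell is exactly the question in this thread; the sketch only covers the coordinate flip.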

Re: [tesseract-ocr] Should box include surrounding space?

2023-10-18 Thread 'Danny Wilson' via tesseract-ocr
Because of some issues with licensed fonts not working with text2image, we wrote our own image and box file generator in Swift on the Mac. We use that to generate a data set of 100,000 text lines and feed it into the regular training on Linux. Using a non-licensed font, I checked what box

[tesseract-ocr] Should box include surrounding space?

2023-10-17 Thread 'Danny Wilson' via tesseract-ocr
For purposes of training, I'm wondering if the box for a character should include the surrounding space. In particular for the CJK "FULLWIDTH COMMA", should the box be the red or green rectangle?