[tesseract-ocr] Re: Missing Characters

2020-02-04 Thread Quan Nguyen
It looks like the Times New Roman font does not have the glyphs for the characters you are interested in. You'll need to select a compatible font. By the way, that application is jTessBoxEditor, not VietOCR. On Tuesday, February 4, 2020 at 11:02:47 AM UTC-6, Peyi Oyelo wrote: > > Hello, > > > I am currently usi
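A quick way to confirm that a font lacks glyphs for particular characters is to look each code point up in the font's character map (cmap). The sketch below is only an illustration: `missing_characters` and `load_cmap` are hypothetical helper names, and loading a real font assumes the third-party fontTools package, whose `TTFont(...).getBestCmap()` returns a dict from code point to glyph name.

```python
def missing_characters(text, cmap):
    """Return the unique characters of `text` that have no entry in `cmap`.

    `cmap` maps Unicode code points (ints) to glyph names, such as the dict
    returned by fontTools' TTFont(path).getBestCmap().
    """
    missing = []
    for ch in dict.fromkeys(text):  # unique characters, in first-seen order
        if ord(ch) not in cmap:
            missing.append(ch)
    return missing


def load_cmap(font_path):
    """Load the best Unicode cmap of a TrueType/OpenType font (needs fontTools)."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools
    return TTFont(font_path).getBestCmap()
```

For example, `missing_characters("ẹ ọ ṣ", load_cmap("times.ttf"))` would list the characters the candidate font cannot render (the file name here is just a placeholder). An empty list means the font covers the text, so it is a reasonable candidate for generating training images in jTessBoxEditor.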

[tesseract-ocr] tesseract Table output using tesseract.js

2020-02-04 Thread Alok Kumar
Hi, can anyone help me extract table-formatted output using tesseract.js in ASP.NET? Currently I am extracting a data object. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an emai

Re: [tesseract-ocr] approaches used for language detection on images ...

2020-02-04 Thread Albretch Mueller
On 2/1/20, Zdenko Podobny wrote: > You did not provide any example Image OK, this one would do. On this pdf file there are images of varying quality and with text embedded in various ways. This would be the typical text I would be dealing with: https://www.nysedregents.org/USHistoryGov/Archive

Re: [tesseract-ocr] Re: Announcement: Python package pytesstrain (Tesseract training helpers)

2020-02-04 Thread Shree Devi Kumar
> > By the way, I added a create_ground_truth utility, which creates .gt.txt > files as well as the associated .tif files for every specified font, to > the package. I think it could be useful for anyone who does not have a > ground truth collection yet. > > Thanks, I tried it with the latest tesseract
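The pairing described above (one .gt.txt per text line and font, plus a .tif rendered from it) can be sketched as follows. This is not pytesstrain's implementation. The function names and the file-naming scheme are hypothetical, and the commands only assume Tesseract's `text2image` options `--text`, `--outputbase`, `--font` and `--fonts_dir`. The commands are returned rather than executed.

```python
import re
from pathlib import Path


def safe_name(font):
    """Turn a font description like 'Noto Sans Bold' into a file-name fragment."""
    return re.sub(r"[^A-Za-z0-9]+", "_", font).strip("_")


def write_ground_truth(lines, fonts, out_dir, fonts_dir):
    """Write one <base>.gt.txt per (line, font) pair; return text2image commands.

    Hypothetical sketch: each command is meant to render the same line to
    <base>.tif next to its <base>.gt.txt. Nothing is executed here.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    commands = []
    for i, line in enumerate(lines):
        for font in fonts:
            stem = f"line{i:05d}_{safe_name(font)}"
            gt = out / f"{stem}.gt.txt"
            gt.write_text(line + "\n", encoding="utf-8")
            commands.append([
                "text2image",
                f"--text={gt}",
                f"--outputbase={out / stem}",
                f"--font={font}",
                f"--fonts_dir={fonts_dir}",
            ])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`. Keeping the stem identical for the .gt.txt and the rendered image is what lets training tools match each image to its ground truth.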

Re: [tesseract-ocr] Re: Announcement: Python package pytesstrain (Tesseract training helpers)

2020-02-04 Thread Shree Devi Kumar
Thanks, Wincent. I will try out the tools you added. I found a Unicode version of the ISRI evaluation tools at https://github.com/eddieantonio/ocreval which also handles high-range Unicode code points. See https://github.com/Shreeshrii/tesstrain-modi/blob/master/reports/modi-eval-modiLayer_1.017_
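For readers unfamiliar with the ISRI-style metric, my understanding is that character accuracy is (n - errors) / n, where n is the number of ground-truth characters and errors is the minimum number of insertions, deletions and substitutions (Levenshtein distance). The sketch below is an independent illustration, not ocreval's code. Python strings are indexed by code point, so a character above U+FFFF, such as one from the Modi block, counts as a single character rather than a UTF-16 surrogate pair.

```python
def edit_distance(a, b):
    """Levenshtein distance (insert/delete/substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def character_accuracy(ground_truth, ocr_output):
    """(n - errors) / n, with n = number of code points in the ground truth."""
    n = len(ground_truth)
    if n == 0:
        raise ValueError("ground truth is empty")
    return (n - edit_distance(ground_truth, ocr_output)) / n
```

With this definition, accuracy can drop below zero when the OCR output contains many spurious characters. That is one reason to read it together with the per-character error counts that ocreval reports.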