Re: [tesseract-ocr] pdf -> searchable PDF

2017-01-20 Thread Jeff Breidenbach
There is a lengthy side discussion that is appropriate to move back here. I've been asked to elaborate on what I mean by image extraction. https://github.com/tesseract-ocr/tesseract/issues/660 There are two ways to turn a PDF file into images. One is to render it, for example using a tool like

Re: [tesseract-ocr] Macron’s recognition in Tesseract (āĀēĒīĪōŌūŪ)

2017-01-20 Thread ShreeDevi Kumar
In addition to macrons, I would also like to request the addition of other accented letters for Indic text transliterations. Ray, will using -l eng+ be the best way to handle these? I tried add-layer training, but the recognition is worse, since I did not use many fonts for the test training. I am