[tesseract-ocr] Re: Tesseract for Tibetan

2016-01-15 Thread Zach
I am the developer of the Namsel OCR project (https://www.namsel.com/) and can speak to a few different Tibetan OCR implementations. First, you may want to look at tbrc.org and particularly their e-text section. We've OCR'd the entire Tibetan Tengyur and Kangyur as well as hundreds of thousands of

[tesseract-ocr] Re: PDF/A versions

2016-01-15 Thread Jeff Breidenbach
My understanding is PDF/A requires a bit more metadata, for example some color profile information (ICC) and a description about where the data came from (XMP). Tesseract doesn't supply that, sorry. I have no reason to believe implementation is hard, it's just not something I'm currently

[tesseract-ocr] Re: append output file?

2016-01-15 Thread Jeff Breidenbach
There's the normal Linux way for appending things:

    tesseract image-1.png - >> results.txt
    tesseract image-2.png - >> results.txt
    tesseract image-3.png - >> results.txt
    ...

Or perhaps you are thinking about support for streaming:
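
For the append approach (not the streaming case), the repeated commands can be wrapped in a small shell function. This is only a sketch: `append_ocr` is a hypothetical helper name, the file names are placeholders, and it assumes a tesseract build that accepts `-` as the output base, as in the commands above.

```shell
#!/bin/sh
# Sketch: append the OCR text of several images to one results file.
# append_ocr is a hypothetical helper, not part of tesseract.
# Assumes "tesseract IMAGE -" writes the recognized text to stdout.
append_ocr() {
    out=$1      # first argument: file that receives the appended text
    shift       # remaining arguments: the images to process, in order
    for img in "$@"; do
        # ">>" appends, so earlier results in $out are preserved
        tesseract "$img" - >> "$out"
    done
}

# Example call (placeholder file names):
# append_ocr results.txt image-1.png image-2.png image-3.png
```

Because `>>` appends rather than truncates, running the function again adds to the existing results.txt; remove the file first if a fresh run is wanted.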

Re: [tesseract-ocr] how to use tesstrain .sh etc in ubuntu 15.10

2016-01-15 Thread Jeff Breidenbach
Hi all, I just want to mention that the copy of tesstrain.sh that ships with Ubuntu is slightly modified to make life a little easier. The very terse documentation is in the standard location.

    /usr/share/doc/tesseract/README.debian

The modification saves some typing. This is an example of