I am the developer of the Namsel OCR project (https://www.namsel.com/) and
can speak to a few different Tibetan OCR implementations. First, you may
want to look at tbrc.org, particularly their e-text section. We've OCR'd
the entire Tibetan Tengyur and Kangyur as well as hundreds of thousands of

My understanding is that PDF/A requires a bit more metadata, for example
color profile information (ICC) and a description of where the data came
from (XMP). Tesseract doesn't supply that, sorry. I have no reason to
believe it would be hard to implement; it's just not something I'm currently

There's the normal Linux way of appending the output to one file:
tesseract image-1.png - >> results.txt
tesseract image-2.png - >> results.txt
tesseract image-3.png - >> results.txt
...
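For more than a handful of pages, the same append pattern can be wrapped in a loop. This is a minimal sketch, assuming a POSIX shell with tesseract on the PATH; the helper name ocr_append is hypothetical and not part of Tesseract. As in the commands above, `-` as the output base makes tesseract write the recognized text to standard output, and `>>` appends it to the file.

```shell
#!/bin/sh
# ocr_append OUTFILE IMAGE...
# Hypothetical helper (not part of Tesseract): OCR each image in the order
# given and append the recognized text to OUTFILE.
# Note: POSIX sh has no "local", so "out" and "img" are global variables.
ocr_append() {
    out=$1
    shift
    for img in "$@"; do
        # Skip anything that is not a regular file, e.g. an unmatched glob.
        if [ ! -f "$img" ]; then
            echo "ocr_append: skipping $img (not a file)" >&2
            continue
        fi
        # "-" as the output base sends the text to stdout; ">>" appends it.
        # Stop at the first page tesseract fails on.
        tesseract "$img" - >> "$out" || return 1
    done
}

# Example (equivalent to the three one-liners above):
# ocr_append results.txt image-1.png image-2.png image-3.png
```

You can also pass a glob such as image-*.png, but the shell sorts the matches as strings, so image-10.png comes before image-2.png unless the page numbers are zero-padded.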
Or perhaps you are thinking about support for streaming:

Hi all, I just want to mention that the copy of tesstrain.sh that ships
with Ubuntu is slightly modified to make life a little easier. Its
very terse documentation is in the standard location:
/usr/share/doc/tesseract/README.debian
The modification saves some typing. This is an example of