On Fri, Jan 20, 2017 at 07:19:13PM +0100, Ingo Feinerer wrote:
> Hi,
>
> please find attached a port for pdfsandwich,
> a tool to make "sandwich" OCR pdf files.
>
> $ cat pkg/DESCR
> pdfsandwich generates "sandwich" OCR pdf files, i.e. pdf files which contain
> only images (no text) will be processed by optical character recognition (OCR)
> and the text will be added to each page invisibly "behind" the images.
>
> pdfsandwich is a command line tool which is supposed to be useful to OCR
> scanned books or journals. It is able to recognize the page layout even for
> multicolumn text.
>
> OK to import?
>
> Best regards,
> Ingo
Hi,
I haven't tested pdfsandwich, but I have a WIP port for ocrmypdf; at least
Python is more readable for me than OCaml :)
https://github.com/jbarlow83/OCRmyPDF
j.