On 10/15/2013 12:25 PM, Eric Lease Morgan wrote:
> On Oct 14, 2013, at 4:49 PM, Robert Haschart <[email protected]> wrote:
>
>>> For a limited period of time I am making publicly available a Web-based program
>>> called PDF2TXT -- http://bit.ly/1bJRyh8
>>
>> Although based on some subsequent messages where you mention tesseract
>> maybe I misunderstood and your tool only handles pdfs that have already
>> been OCR'ed which would explain why the second document (which only
>> contains page images) fails.
>
> Robert, that's correct. As of right now the document needs to have been
> previously OCRed. --Eric

The abstract extraction routine I have been working on does use tesseract internally to do OCR when it encounters a document that doesn't have usable full text. I agree that tesseract is not that easy to install, especially if (as in my case) you do not have root/sudo access to the machine. Since I went through installing tesseract quite recently, perhaps my experience can be helpful to you.

-Bob Haschart
