Re: [CODE4LIB] pdf2txt

Eric Lease Morgan Tue, 15 Oct 2013 08:46:22 -0700

On Oct 14, 2013, at 7:56 AM, Nicolas Franck <[email protected]> wrote:


> Could this also be done by Apache Tika? Or am I missing a crucial point?
> 
> http://tika.apache.org/1.4/gettingstarted.html


Nicolas, this looks VERY promising! It can seemingly extract the OCR text from a
PDF document as well as the text from a Word document. More experimenting is
needed, but thank you. code4lib++  --Eric Morgan
