In message <[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote:

> On Sep 25, 3:02 pm, Paul Hankin <[EMAIL PROTECTED]> wrote:
>
>> Googling for 'pdf to text python' and following the first link
>> gives http://pybrary.net/pyPdf/
>
> Doesn't work that well...

This is inherent in the nature of PDF: it's a page-description language, not a document-interchange language. Each text-drawing command can put a block of text anywhere on the page, so you have no idea, just from parsing the PDF content, how to join these blocks up into lines, paragraphs, columns etc.
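
To make that concrete, here is a rough sketch of the sort of guesswork an extractor ends up doing. It assumes the text-drawing commands have already been parsed into (x, y, text) tuples, which is a made-up representation for illustration and not what pyPdf or any other library actually hands you. The function name group_into_lines and the y_tolerance parameter are likewise my own inventions.

```python
def group_into_lines(fragments, y_tolerance=2.0):
    """Join hypothetical (x, y, text) fragments into lines of text.

    Fragments whose y coordinate is within y_tolerance of the first
    fragment of a line are treated as part of that line. That threshold
    is a guess; nothing in the PDF content says where lines begin or end.
    """
    lines = []  # each entry is [reference_y, [(x, text), ...]]
    # PDF's default y axis points up the page, so sort by descending y
    # to visit fragments from top to bottom.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each line, order the fragments left to right and join them.
    return [" ".join(t for _, t in sorted(parts)) for _, parts in lines]


# Fragments listed in an arbitrary drawing order, as a PDF is free to do.
fragments = [
    (200.0, 700.0, "world"),
    (72.0, 650.0, "Second line"),
    (72.0, 700.5, "Hello"),
]
print(group_into_lines(fragments))
```

This copes with a single column of text, but give it a two-column page and it will happily splice the left and right columns together line by line, because both sit at the same height. Working out columns and paragraphs needs yet more heuristics layered on top, and none of them can be guaranteed to match what the author intended.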