In message <[EMAIL PROTECTED]>, 
[EMAIL PROTECTED] wrote:

> On Sep 25, 3:02 pm, Paul Hankin <[EMAIL PROTECTED]> wrote:
>
>> Googling for 'pdf to text python' and following the first link
>> gives http://pybrary.net/pyPdf/
> 
> Doesn't work that well...

This is inherent in the nature of PDF: it's a page-description language, not
a document-interchange language. Each text-drawing command can put a block
of text anywhere on the page, so parsing the PDF content alone doesn't tell
you how to join these blocks up into lines, paragraphs, columns, etc.
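
To make that concrete, here is a rough sketch of the kind of guesswork an
extractor has to do. It assumes you have already pulled out each drawn
string together with its position; the Fragment tuple, group_into_lines()
and the y_tolerance value are all made up for illustration and are not part
of pyPdf or any other library.

```python
from collections import namedtuple

# Hypothetical stand-in for what a PDF parser might hand back: each
# text-drawing command yields a string plus the position it was drawn at.
Fragment = namedtuple("Fragment", "x y text")

def group_into_lines(fragments, y_tolerance=2.0):
    """Guess at lines: a fragment whose baseline is within y_tolerance of
    the current line's first fragment joins that line; each line is then
    ordered left to right."""
    lines = []  # each entry: [baseline_y, [fragments on that line]]
    # PDF y coordinates grow upwards, so sort top of page first.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= y_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag.y, [frag]])
    return [" ".join(f.text for f in sorted(frags, key=lambda f: f.x))
            for _, frags in lines]

# Drawing order need not match reading order.
fragments = [
    Fragment(300, 700, "world"),
    Fragment(72, 686, "Second line"),
    Fragment(72, 700, "Hello"),
]
print(group_into_lines(fragments))
```

Even this simple heuristic shows the problem: on a two-column page,
fragments at the same height in different columns get merged into one
"line", and nothing in the coordinates alone says that is wrong.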
-- 
http://mail.python.org/mailman/listinfo/python-list
