On Sep 25, 6:41 pm, brad <[EMAIL PROTECTED]> wrote: > I have a very crude Python script that extracts text from some (and I > emphasize some) PDF documents. On many PDF docs, I cannot extract text, > but this is because I'm doing something wrong. The PDF spec is large and > complex and there are various ways in which to store and encode text. I > wanted to post here and ask if anyone is interested in helping make the > script better which means it should accurately extract text from most > any pdf file... not just some. > > I know the topic of reading/extracting the text from a PDF document > natively in Python comes up every now and then on comp.lang.python... > I've posted about it in the past myself. After searching for other > solutions, I've resorted to attempting this on my own in my spare time. > Using apps external to Python (pdftotext, etc.) is not really an option > for me. If someone knows of a free native Python app that does this now, > let me know and I'll use that instead!
Googling for 'pdf to text python' and following the first link gives http://pybrary.net/pyPdf/ -- Paul Hankin -- http://mail.python.org/mailman/listinfo/python-list