Andreas Lobinger wrote:

rbt wrote:

Not really a Python question... but here goes: Is there a way to read the content of a PDF file and decode it with Python? I'd like to read PDFs, decode them, and then search the data for certain strings.

First of all,

still applies here.

If you can deal with a very basic implementation of a pdf-lib you
might be interested in

In the CVS (or the current snapshot) you can find in
ppg/Doc/text_extract.txt an example for text extraction.

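 >>> import pdffile
 >>> import pages
 >>> import pdftool
 >>> import zlib
 >>> pf = pdffile.pdffile('../pdf-testset1/a.pdf')
 >>> pp = pages.pages(pf)
 >>> # decompress the content stream of the first page
 >>> c = zlib.decompress(pf[pp.pagelist[0]['/Contents']].stream)
 >>> # parse the stream into (operator, operands) entries
 >>> op = pdftool.parse_content(c)
 >>> # keep the operands of the text-showing operators ' and Tj
 >>> sop = [x[1] for x in op if x[0] in ["'", "Tj"]]
 >>> for a in sop:
 ...     print a[0]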
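
For readers without the ppg modules at hand, the same idea can be sketched with the standard library alone. This is not the ppg code, only a rough Python 3 illustration: the name extract_shown_strings, the regular expression, and the hand-written sample stream are all made up for this example. It assumes the content stream is Flate-compressed and only looks at literal strings shown with Tj or '. Hex strings, TJ arrays, nested parentheses, and font encodings are ignored, so real PDFs will often need more than this.

```python
import re
import zlib

# Matches a PDF literal string "(...)" followed by the Tj or ' operator.
# Backslash escapes inside the string are allowed; nested unescaped
# parentheses, hex strings <...> and TJ arrays are NOT handled.
SHOW_TEXT = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*(?:Tj|')", re.DOTALL)


def extract_shown_strings(compressed_stream):
    """Return the raw byte strings shown by Tj and ' operators.

    Hypothetical helper for illustration; assumes a Flate (zlib) stream.
    """
    content = zlib.decompress(compressed_stream)
    return [m.group(1) for m in SHOW_TEXT.finditer(content)]


# A hand-written content stream standing in for a real page.
sample = b"BT /F1 12 Tf 72 720 Td (Hello) Tj 0 -14 Td (PDF world) ' ET"
stream = zlib.compress(sample)

for text in extract_shown_strings(stream):
    print(text.decode("latin-1"))
```

Once the strings are out, searching them is an ordinary substring or regex test on each item of the returned list.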

Wishing a happy day

Thanks guys... what if I convert it to PS via printing it to a file or something? Would that make it easier to work with?
