On Wed, Mar 17, 2010 at 9:53 AM, Peng Yu <pengyu...@gmail.com> wrote:
> Thank you for your long reply! But I'm not sure whether you got my question.
>
> Acrobat can highlight certain words in PDFs. I can add notes to the
> highlighted words as well. However, I find that I frequently end up
> highlighting words that could be matched by a regular
> expression.
>
> To improve my productivity, I don't want to do this manually in Acrobat
> but rather do it automatically, if such a tool is
> available. People on the reportlab mailing list said this is not possible
> with reportlab. And I don't see that PyPDF can do this. If you know of
> an API for this purpose, please let me know. Thank you!

I do not know of any API specific to this purpose, no. But I mentioned three libraries (pagecatcher, pdfminer, and pdfrw) that are capable, to a greater or lesser extent, of reading in PDFs and giving you the data from them, which you can then do your replacement on and write back out.

I would imagine this would be a piece of cake with pagecatcher. (I noticed you just posted on the reportlab mailing list, but you did not specifically mention pagecatcher.) It will probably take more work with either of the other two.

Probably none of them does exactly what you want, but any of them is likely a better starting point than coding what you want from scratch.

Regards,
Pat
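
P.S. Here is a minimal sketch of the middle step only: the regex matching you would run once one of those libraries has given you the page text. It uses nothing but the standard library. The `Hit` type, the `find_hits` function, and the idea of passing the text in as a dict keyed by page number are all my own assumptions for illustration, not part of pagecatcher, pdfminer, or pdfrw. Getting the text out of the PDF, and turning the offsets back into highlights, depends on the library and is not shown.

```python
import re
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    """One regex match in the extracted text (hypothetical helper type)."""
    page: int   # page number the match was found on
    start: int  # character offset where the match begins in that page's text
    end: int    # character offset just past the end of the match
    text: str   # the matched text itself


def find_hits(pages: Dict[int, str], pattern: str) -> List[Hit]:
    """Return every match of `pattern`, in page order.

    `pages` maps a page number to the plain text already extracted
    from that page by whatever PDF library you end up using.
    """
    regex = re.compile(pattern)
    hits = []
    for page_no in sorted(pages):
        for match in regex.finditer(pages[page_no]):
            hits.append(Hit(page_no, match.start(), match.end(), match.group(0)))
    return hits


if __name__ == "__main__":
    # Stand-in text; in practice this comes from the PDF library.
    sample = {1: "Theorem 1 follows from Lemma 2.", 2: "See Theorem 3."}
    for hit in find_hits(sample, r"(?:Theorem|Lemma) \d+"):
        print(hit)
```

Keeping the matching separate from the PDF handling means you can test your regular expressions on plain strings first, and swap the extraction library later without touching this part.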