In article <[EMAIL PROTECTED]>, rbt <[EMAIL PROTECTED]> wrote: . . .
Read and search them for strings. If I could do that on Windows, Linux and Mac with the *same* bit of Python code, I'd be very happy ;)
Textual content, right? Without regard to font funniness, or whether the string is in or out of a table, and so on?
That's right. More specifically, I've written a script that uses a RE to search through documents for social security numbers. You can see it here:
http://filebox.vt.edu/users/rtilley/public/find_ssns/find_ssns.html
This works on Word, Excel, HTML, RTF or any ANSI-based text. I need the ability to read and make sense of PDF files as well so I can apply the RE to their content. It's been frustrating, to say the least. Nothing at all against Python... mostly just sick of hearing about the 'Portable' document format that isn't string or RE searchable... at least not easily, anyway.
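To make the goal concrete, here is a rough sketch of the kind of search I mean once a document has been reduced to plain text. The pattern and the names SSN_RE and find_ssns are made up for this example and are not the ones in the script linked above; it also assumes the text has already been extracted by some other means, which is exactly the part I'm missing for PDF.

```python
import re

# Hypothetical pattern, for illustration only: three digits, two digits,
# four digits, with an optional hyphen or space between the groups.
# The lookarounds keep it from matching inside a longer run of digits.
SSN_RE = re.compile(r"(?<!\d)\d{3}[- ]?\d{2}[- ]?\d{4}(?!\d)")


def find_ssns(text):
    """Return every substring of text that looks like an SSN."""
    return SSN_RE.findall(text)


if __name__ == "__main__":
    sample = "Employee record: 123-45-6789, phone 5551234567"
    print(find_ssns(sample))
```

Since this only depends on the re module, the same code runs unchanged on Windows, Linux and Mac; the platform-specific pain is all in getting the text out of the file in the first place.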
'Might be a few days before I answer; I'm crashing into end-of-the-month deadlines.
No problem. Thanks for the help.