On Sat, Feb 14, 2009 at 11:01 AM, Terry Reedy <tjre...@udel.edu> wrote:
>
> bryan.fodn...@gmail.com wrote:
>>
>> I have a large amount of RTF files where the only thing in them is an
>> image. I would like to extract them and save them as a png.
>> Eventually, I would like to also grab some text that is on the image.
>> I think PIL has something for this.
>>
>> Does anyone have any suggestion on how to start this?
>
> Wikipedia Rich Text Format has several links, which lead to
> http://pyrtf.sourceforge.net/
> http://code.google.com/p/pyrtf-ng/
> The former says rtf generation, including images.
> The latter says rtf generation and parsing, but only claims to be a rewrite
> of the former.
>
> --
> http://mail.python.org/mailman/listinfo/python-list
I've written an RTF parser in Python before, though its purpose was to filter and discard content rather than extract it. Take a look at the specification here:

http://www.microsoft.com/downloads/details.aspx?familyid=dd422b8d-ff06-4207-b476-6b5396a18a2b&displaylang=en

You will find that an image is specified by one or more RTF control words followed by a long string of hex data. For this special purpose, you do not need to write a parser for the entire specification. Just search the file for the correct sequence of control words, extract the hex data that follows, decode it, and save the resulting bytes to a file.

It helps to open one of the RTF documents in a text editor and locate the specific control group that contains the image, because the format and order of the control words vary depending on the application that created the file. If all of your documents were created with the same application, this will be much easier.
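
Here is a minimal sketch of the search-and-extract approach, in case it helps as a starting point. It assumes the images sit in {\pict ...} groups as hex text and that the picture type is given by a control word such as \pngblip or \jpegblip. Check that against your own files in a text editor first, since other applications may write something different. The names read_group, extract_pictures, save_pictures and BLIP_EXTENSIONS are made up for this example, and pictures stored as raw binary (\bin) are simply skipped.

```python
import binascii
import re
from pathlib import Path

# Picture-type control words mapped to file extensions. This mapping is an
# illustrative assumption; extend it for whatever your files contain.
BLIP_EXTENSIONS = {"pngblip": "png", "jpegblip": "jpg", "emfblip": "emf"}

# A control word: backslash, letters, optional numeric parameter, optional space.
CONTROL_WORD = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?")


def read_group(text, start):
    """Return (top_level_content, end_index) for the group opening at text[start].

    Text inside nested groups is left out of the returned content.
    """
    depth = 0
    content = []
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\" and text[i + 1:i + 2] in ("{", "}", "\\"):
            i += 2  # escaped brace or backslash, not group structure
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(content), i
        elif depth == 1:
            content.append(ch)
        i += 1
    raise ValueError("unbalanced braces in RTF data")


def extract_pictures(rtf_text):
    """Yield (extension, image_bytes) for each hex-encoded {\\pict ...} group."""
    for match in re.finditer(r"\{\\pict(?![a-zA-Z])", rtf_text):
        content, _ = read_group(rtf_text, match.start())
        words = [m.group(1) for m in CONTROL_WORD.finditer(content)]
        if "bin" in words:
            continue  # raw binary picture data is not handled in this sketch
        extension = next(
            (BLIP_EXTENSIONS[w] for w in words if w in BLIP_EXTENSIONS), "bin"
        )
        # Drop the control words, then the whitespace; what remains is hex.
        hex_data = re.sub(r"\s+", "", CONTROL_WORD.sub("", content))
        yield extension, binascii.unhexlify(hex_data)


def save_pictures(rtf_path):
    """Write every picture found in rtf_path next to it, e.g. name_0.png."""
    rtf_path = Path(rtf_path)
    # latin-1 decodes any byte value, so reading never fails on stray bytes.
    text = rtf_path.read_text(encoding="latin-1")
    for index, (extension, data) in enumerate(extract_pictures(text)):
        out_path = rtf_path.with_name(f"{rtf_path.stem}_{index}.{extension}")
        out_path.write_bytes(data)
```

Calling save_pictures("some_file.rtf") (a placeholder path) writes one output file per picture group. binascii.unhexlify raises an error if anything other than hex digits is left over, which is a useful signal that a file does not match the assumptions above. If the pictures turn out not to be PNG already, they would still need converting in a separate step.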