On Tue, 04 Jul 2006 06:32:13 -0700, Gaurav Agarwal wrote:

> Hi,
>
> I wanted a script that can convert any file format (RTF/DOC/HTML/PDF/PS
> etc) to text format.

RTF, HTML and PS are already text formats, although you will still have to strip out the markup to get at the readable content.

DOC is a secret, closed proprietary format. It will be a lot of work reverse-engineering it. Perhaps you should consider using existing tools that already do it -- see, for example, the word processors Abiword and OpenOffice. They are open-source, so you can read and learn from their code.

Alternatively, you could try some of the suggestions here:

http://www.linux.com/article.pl?sid=06/02/22/201247

Or you could just run through the .doc file, filtering out the binary characters and displaying just the text characters. That's a quick-and-dirty strategy that might help.

PDF is (I believe) a compressed, binary form of PS. Perhaps you should look at the program pdf2ps -- maybe it will help.

If you explain your needs in a little more detail, perhaps people can give you more helpful answers.

--
Steven.
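
P.S. For what it's worth, the quick-and-dirty strategy above (run through the .doc file and keep only the text characters) might look something like the sketch below. The function names and the minimum run length of 4 are my own choices for illustration, not anything standard, and it only looks for printable ASCII, so it will miss any text the file stores in a 16-bit encoding and will probably let some junk through as well.

```python
import re


def extract_text_runs(data, min_len=4):
    """Return the runs of printable ASCII found in a bytes object.

    A "run" is min_len or more consecutive bytes that are tab, newline,
    carriage return, or in the range space..tilde. Shorter runs are
    assumed to be binary noise and are dropped.
    """
    pattern = re.compile(rb"[\t\n\r\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def doc_to_text(path, min_len=4):
    """Read a file as raw bytes and join its text runs with newlines."""
    with open(path, "rb") as f:
        return "\n".join(extract_text_runs(f.read(), min_len))
```

You would call it as doc_to_text("letter.doc"), where letter.doc is just a placeholder file name. Raising min_len cuts down on the junk at the cost of losing short words that sit alone between binary bytes.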