-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,
There are some: Catdoc and Antiword, for example. They are simple shell
commands that extract text from files.

http://www.45.free.net/~vitus/software/catdoc/
http://www.winfield.demon.nl/

Antiword has better Windows support, but as far as I know it doesn't
handle .ppt files, unlike catdoc. I'm no expert, though; I've only used
them once or twice at university. If you use them, I would be interested
in feedback on how well they work.

Thanks in advance and good luck,
Florian

P.S.: There is an article about even more of them at
http://www.linux.com/article.pl?sid=06/02/22/201247 .

John Joseph Bachir wrote:
> On Apr 1, 2007, at 5:37 AM, Andreas Korth wrote:
>> Are you serious? You're adding raw, unprocessed PPT files to your
>> index?
>>
>> Now this is just wrong. PPT files may contain all sorts of binary
>> data, such as images and videos. I just had a look at the sample
>> presentation that came with my Office installation. This file is
>> 3.5MB in size with a (plain text) payload of less than 1KB.
>
> As I stated in my previous email, I am conjecturing that indexing
> these documents will not affect search performance. Do you disagree?
>
>
>
>> I'm sure there's some tool available which converts PPT to plain text
>> and I strongly recommend you go out and find it.
>
>
> I've searched far and wide and have found none.
>
> john
> _______________________________________________
> Ferret-talk mailing list
> [email protected]
> http://rubyforge.org/mailman/listinfo/ferret-talk
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGD+ID8RlGMqQ8m7oRAoFjAKCfgIzDsFnl+gKgnHQKI11yAkhTYQCfQpx3
fa5wJ2SaE2JlLzQABqxJe7Q=
=AX5Q
-----END PGP SIGNATURE-----
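The extraction step suggested above could be wired into an indexing script by shelling out to the converter and indexing only its text output. Below is a minimal Ruby sketch. It assumes the catdoc package's command-line tools (`catdoc`, `catppt`, `xls2csv`) are installed and on the PATH. The `EXTRACTORS` table and the helper names `extractor_command` and `extract_text` are hypothetical illustrations, not part of Ferret or catdoc.

```ruby
require "open3"

# Hypothetical mapping from file extension to an external text extractor.
# Assumes catdoc's companion tools are installed and on PATH; antiword
# could be substituted for ".doc" if preferred.
EXTRACTORS = {
  ".doc" => "catdoc",
  ".ppt" => "catppt",
  ".xls" => "xls2csv"
}.freeze

# Returns the argv array used to extract plain text from +path+,
# or nil if no extractor is known for the file's extension.
def extractor_command(path)
  tool = EXTRACTORS[File.extname(path).downcase]
  tool && [tool, path]
end

# Runs the extractor (without a shell) and returns its stdout as a String.
# Raises ArgumentError for unknown file types and RuntimeError if the
# tool exits with a non-zero status.
def extract_text(path)
  cmd = extractor_command(path)
  raise ArgumentError, "no extractor for #{path}" unless cmd
  out, status = Open3.capture2(*cmd)
  raise "#{cmd.first} failed on #{path}" unless status.success?
  out
end
```

The string returned by `extract_text` is what would be added to the index in place of the raw file. That sidesteps the concern about a multi-megabyte PPT file carrying less than 1KB of actual text. Switching `.doc` handling from `catdoc` to `antiword` is a one-line change to the table.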

