-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

there are some, Catdoc and Antiword for example. Both are simple
command-line tools that extract plain text from MS Office files.

http://www.45.free.net/~vitus/software/catdoc/
http://www.winfield.demon.nl/

Antiword has better Windows support, but as far as I know it doesn't
handle .ppt files, while catdoc does (its package includes catppt). I'm
no expert though, I've just used them once or twice at the university.
If you use them, I would be interested in feedback on how well they work.
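
For what it's worth, here is a minimal sketch of how the extraction step
could be wired in before indexing. It assumes the catppt command from the
catdoc package is installed and on the PATH; the function name
extract_text and its parameters are made up for illustration, not part of
any of these tools.

```python
import subprocess


def extract_text(path, command=("catppt",)):
    """Run an external converter on `path` and return its stdout as text.

    `command` defaults to catppt (part of the catdoc package). That is an
    assumption about what is installed; pass another command if needed.
    """
    result = subprocess.run(
        [*command, path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    # Decode leniently: the converter's output encoding depends on the setup.
    return result.stdout.decode("utf-8", errors="replace")
```

Something like extract_text("slides.ppt") (a hypothetical file name) would
then return the plain text to hand to the indexer instead of the raw file.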

Thanks in advance and good luck
Florian

P.S.: There is an article about even more of them at
http://www.linux.com/article.pl?sid=06/02/22/201247 .

John Joseph Bachir wrote:
> On Apr 1, 2007, at 5:37 AM, Andreas Korth wrote:
>> Are you serious? You're adding raw, unprocessed PPT files to your  
>> index?
>>
>> Now this is just wrong. PPT files may contain all sorts of binary
>> data, such as images and videos. I just had a look at the sample
>> presentation that came with my Office installation. This file is
>> 3.5MB in size with a (plain text) payload of less than 1KB.
> 
> As I stated in my previous email, I am conjecturing that indexing  
> these documents will not affect search performance. Do you disagree?
> 
> 
> 
>> I'm sure there's some tool available which converts PPT to plain text
>> and I strongly recommend you go out and find it.
> 
> 
> I've searched far and wide and have found none.
> 
> john
> _______________________________________________
> Ferret-talk mailing list
> [email protected]
> http://rubyforge.org/mailman/listinfo/ferret-talk
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGD+ID8RlGMqQ8m7oRAoFjAKCfgIzDsFnl+gKgnHQKI11yAkhTYQCfQpx3
fa5wJ2SaE2JlLzQABqxJe7Q=
=AX5Q
-----END PGP SIGNATURE-----
