On Apr 1, 2007, at 3:09 AM, John Bachir wrote:

> Here's an interesting problem: In my app, we are indexing various
> types of documents, including microsoft powerpoint. Powerpoint
> documents are mostly binary, but have a bunch of text (all of the
> text in the document?) as well.
Are you serious? You're adding raw, unprocessed PPT files to your index? That's just wrong.

PPT files may contain all sorts of binary data, such as images and videos. I just had a look at the sample presentation that came with my Office installation: the file is 3.5MB in size, with a plain-text payload of less than 1KB.

I'm sure there's a tool available that converts PPT to plain text, and I strongly recommend you go out and find it.

Cheers,
Andy

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
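If you want to repeat the file-size vs. text-payload comparison on your own files, a `strings`-style scan is a quick, crude way to do it. The sketch below is an illustration under stated assumptions, not a PPT converter: it treats "text" as runs of at least four printable ASCII bytes (the `MIN_RUN` threshold and the method names are hypothetical choices), it does not parse the PowerPoint format at all, and it will miss any slide text stored as UTF-16 (two bytes per character) while picking up junk runs from images and other binary streams.

```ruby
# Crude estimate of how much of a binary file is readable text.
# Assumption (hypothetical): "readable text" = runs of at least MIN_RUN
# printable ASCII bytes, similar in spirit to the Unix `strings` utility.
# This does NOT parse the PPT format and is no substitute for a converter.

MIN_RUN = 4

# Returns the runs of printable ASCII (space through tilde) found in +data+.
def printable_runs(data, min_run = MIN_RUN)
  data.b.scan(/[\x20-\x7E]{#{min_run},}/) # .b: treat input as raw bytes
end

# Returns the fraction (0.0..1.0) of bytes in +data+ that fall inside a
# printable run. Returns 0.0 for empty input to avoid dividing by zero.
def text_fraction(data, min_run = MIN_RUN)
  bytes = data.b
  return 0.0 if bytes.empty?
  printable_runs(bytes, min_run).sum(&:bytesize).fdiv(bytes.bytesize)
end

# Command-line use: only runs when a file name is given.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  data = File.binread(ARGV[0])
  runs = printable_runs(data)
  printf("%d bytes total, %d bytes in %d text runs (%.4f%%)\n",
         data.bytesize, runs.sum(&:bytesize), runs.size,
         100 * text_fraction(data))
end
```

Saved as, say, `text_fraction.rb`, it would be run as `ruby text_fraction.rb presentation.ppt` (both names hypothetical). A tiny percentage is a strong hint that indexing the raw file mostly feeds binary noise to the analyzer; the proper fix is still to extract the text with a real converter before adding it to the index.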

