Hello Dick, and all (first post),

Here are some more that I use:

HTML to text:
  Vilistextum  http://bhaak.dyndns.org/vilistextum/
  also lynx    http://lynx.browser.org/

PDF to text:
  pdftotext, from Xpdf  http://www.foolabs.com/xpdf/

WordPerfect to text:
  wpd2text, from libwpd  http://libwpd.sourceforge.net/

Converting other text encodings:
  iconv  http://www.gnu.org/software/libiconv/

-Stuart Sierra

John Leach wrote:
> you may need to turn to using some external tools.
>
> something similar to this was discussed before and some tools suggested.
>
> See: http://www.ruby-forum.com/topic/103374
>
> On Wed, 2007-04-25 at 19:14 +0200, Dick Monahan wrote:
>> The documents we want to index come in many formats; e.g., HTML, PDF,
>> RTF, Word, Excel, etc., etc., etc. I've been searching to find parsers
>> that will translate each of these formats to indexable text, but have
>> had little success. Any help will be appreciated.

_______________________________________________
Ferret-talk mailing list
http://rubyforge.org/mailman/listinfo/ferret-talk
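
For anyone wiring the converters named in this post into an indexing script, here is a rough Ruby sketch of one way to do it: pick the external command by file extension and read the text it prints on standard output. The method names converter_command and extract_text are made up for illustration, and the sketch assumes lynx, pdftotext and wpd2text are installed and on the PATH. Formats not listed (RTF, Word, Excel) are left out because no tool for them is named here.

```ruby
# Hypothetical helper (not part of Ferret): returns the argv of an external
# converter that prints the document's text on standard output, or nil when
# no converter is known for the file's extension.
def converter_command(path)
  case File.extname(path).downcase
  when ".html", ".htm" then ["lynx", "-dump", "-nolist", path]  # render HTML as plain text
  when ".pdf"          then ["pdftotext", path, "-"]            # "-" sends the text to stdout
  when ".wpd"          then ["wpd2text", path]                  # libwpd tool, prints to stdout
  end
end

# Hypothetical helper: runs the converter and returns its output as a String.
# Assumes the tool is installed; the array form of popen avoids shell quoting.
def extract_text(path)
  command = converter_command(path)
  raise ArgumentError, "no converter for #{path}" unless command
  IO.popen(command, &:read)
end
```

The string returned by extract_text("manual.pdf") could then be stored in whatever field the index uses for document content. If a converter emits text in another encoding, the output can be passed through iconv (for example iconv -f ISO-8859-1 -t UTF-8) before indexing.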

