Hi Dick,

You may need to turn to some external tools.

Something similar to this was discussed before, and some tools were suggested.

See: http://www.ruby-forum.com/topic/103374


Assuming the text is stored as single-byte ASCII, you could fall back on
the "strings" command as a last resort. It should already be installed
on modern GNU/Linux distros; on Windows, try Cygwin. It reads in any
data and outputs all "printable character sequences".
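
For illustration, here is a rough Ruby sketch of the idea behind
"strings". It is not its actual implementation. It scans the raw bytes
and keeps runs of printable ASCII (plus tab) that are at least four
characters long. Four is GNU strings' default minimum, which its -n
option changes. Like "strings" in its default mode, it only finds
single-byte text. The method name printable_runs is made up for this
example.

```ruby
# Hypothetical helper (not part of Ferret or any library): a rough
# approximation of `strings`, returning every run of printable ASCII
# characters (space..tilde, plus tab) that is at least min_len long.
# Assumption: min_len = 4 mirrors GNU strings' default minimum length.
def printable_runs(data, min_len = 4)
  # Work on a binary copy so arbitrary bytes (e.g. from a .doc or .pdf)
  # never raise encoding errors during matching.
  bytes = data.b
  bytes.scan(/[\x20-\x7E\t]{#{min_len},}/)
end

# Command-line use: ruby printable_runs.rb some_file.doc
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  printable_runs(File.binread(ARGV[0])).each { |run| puts run }
end
```

The resulting runs could then be joined and handed to the indexer as
plain text. Expect some noise, because any four printable bytes in a
row count as a "string".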

John.

On Wed, 2007-04-25 at 19:14 +0200, Dick Monahan wrote:
> The documents we want to index come in many formats;  e.g., HTML, PDF,
> RTF, Word, Excel, etc., etc., etc.  I've been searching to find parsers
> that will translate each of these formats to indexable text, but have
> had little success.  Any help will be appreciated.
> 
-- 
http://johnleach.co.uk

_______________________________________________
Ferret-talk mailing list
http://rubyforge.org/mailman/listinfo/ferret-talk
