Hi,

I've tried switching to the latest version of Ferret (0.11.06), but I am still getting the following errors.

,----
| Error: exception 2 not handled: Error decoding input string. Check that you have the locale set correctly
|         from spanish_indexer.rb:45
|         from spanish_indexer.rb:38:in `each'
|         from spanish_indexer.rb:38
`----

The articles are recognized as valid UTF-8 by iconv, and I believe my locale is set properly:

,----
| LANG=en_US.UTF-8
| LC_CTYPE="en_US.UTF-8"
| LC_NUMERIC="en_US.UTF-8"
| LC_TIME="en_US.UTF-8"
| LC_COLLATE="en_US.UTF-8"
| LC_MONETARY="en_US.UTF-8"
| LC_MESSAGES="en_US.UTF-8"
| LC_PAPER="en_US.UTF-8"
| LC_NAME="en_US.UTF-8"
| LC_ADDRESS="en_US.UTF-8"
| LC_TELEPHONE="en_US.UTF-8"
| LC_MEASUREMENT="en_US.UTF-8"
| LC_IDENTIFICATION="en_US.UTF-8"
| LC_ALL=
`----

What's weird is that the errors don't always happen on the same articles. If I run the indexing three times, printing out the articles that throw this error, I get a different list of articles each time. In fact, I just changed my indexing script so that it keeps retrying the articles that failed:

,----
| # ind is my index
| #
| # add_arts is a method which takes a list of articles, tries to
| # index them, and returns a list of the articles that
| # threw errors during indexing
| #
| puts art_paths.size.to_s + " articles"
| missed = add_arts(art_paths, ind)
| while missed.size > 0
|   missed = add_arts(missed, ind)
|   puts missed.size
| end
`----

With this loop I was able to index all of the articles, with the following output:

,----
| 5843 articles
| 34
| 16
| 10
| 9
| 7
| 7
| 6
| 1
| 0
`----

Any ideas what could be causing this non-deterministic behavior?

Thanks,
Eric
--
schulte
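The body of add_arts is not shown in the post, so the following is only a minimal sketch of the retry approach described above, not the actual script. It assumes the index accepts documents through `<<` (as Ferret's Index class does), that indexing failures surface as exceptions rescuable as StandardError, and it introduces two hypothetical names: a `reader` hook for loading article text and an `index_with_retries` wrapper with a `max_passes` cap, so that an article which fails every time cannot keep the loop running forever.

```ruby
# Hypothetical reconstruction: the real add_arts is not shown in the post.
# `index` is anything that responds to <<.
# `reader` is an assumed hook for loading article text (defaults to File.read).
def add_arts(paths, index, reader = File.method(:read))
  missed = []
  paths.each do |path|
    begin
      index << { :file => path, :content => reader.call(path) }
    rescue StandardError
      # Remember the failure so the caller can retry this article later.
      missed << path
    end
  end
  missed
end

# Retry the missed articles until none remain, but give up after
# max_passes passes in total (max_passes is an assumed safety limit).
# Returns the list of articles that still failed on the last pass.
def index_with_retries(paths, index, max_passes = 20, reader = File.method(:read))
  missed = add_arts(paths, index, reader)
  passes = 1
  while !missed.empty? && passes < max_passes
    missed = add_arts(missed, index, reader)
    passes += 1
  end
  missed
end
```

Called as `index_with_retries(art_paths, ind)`, this returns an empty array when every article was eventually indexed, and otherwise the paths that still failed after the last pass, which would be the ones worth inspecting by hand.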

