Hi,

I've tried switching to the latest version of Ferret (0.11.06), but
I am still getting the following error:

,----
| Error: exception 2 not handled: Error decoding input string. Check that you
| have the locale set correctly
|       from spanish_indexer.rb:45
|       from spanish_indexer.rb:38:in `each'
|       from spanish_indexer.rb:38
`----

The articles are recognized as valid UTF-8 by iconv, and I believe
my locale is set properly:

,----
| LANG=en_US.UTF-8
| LC_CTYPE="en_US.UTF-8"
| LC_NUMERIC="en_US.UTF-8"
| LC_TIME="en_US.UTF-8"
| LC_COLLATE="en_US.UTF-8"
| LC_MONETARY="en_US.UTF-8"
| LC_MESSAGES="en_US.UTF-8"
| LC_PAPER="en_US.UTF-8"
| LC_NAME="en_US.UTF-8"
| LC_ADDRESS="en_US.UTF-8"
| LC_TELEPHONE="en_US.UTF-8"
| LC_MEASUREMENT="en_US.UTF-8"
| LC_IDENTIFICATION="en_US.UTF-8"
| LC_ALL=
`----
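
The UTF-8 check can also be repeated from inside Ruby, in the same
process that does the indexing. A minimal sketch, assuming a Ruby with
String#valid_encoding? (1.9 or later); the helper name
invalid_utf8_paths is hypothetical and not part of Ferret:

```ruby
# Hypothetical helper: returns the paths whose raw bytes are not valid UTF-8.
# Assumes Ruby 1.9+ (File.binread and String#valid_encoding?).
def invalid_utf8_paths(paths)
  paths.reject do |path|
    # Read the bytes untouched, then ask Ruby whether they form valid UTF-8.
    File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
  end
end
```

An empty result for the full article list would suggest the files
themselves are fine and the problem lies elsewhere.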

What's weird is that the errors don't always happen on the same
articles. If I run the indexing three times, printing out the
articles that throw this error, I get a different list of articles
each time.

In fact, I just changed my indexing script so that it keeps retrying
the articles that failed:

,----
| # ind      is my index
| # 
| # add_arts is a method which takes a list of articles, tries to
| #          index them, and returns a list of the articles that
| #          threw errors during indexing
| # 
| puts "#{art_paths.size} articles"
| missed = add_arts(art_paths, ind)
| while missed.size > 0
|   missed = add_arts(missed, ind)
|   puts missed.size
| end
`----
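
For anyone who wants to reproduce this, a helper with the contract
described in the comments above could look roughly like the sketch
below. It is not the actual code from spanish_indexer.rb: the field
names, the rescued exception class, and the max_passes cap are
assumptions, and ind only needs to respond to << (as a Ferret index
does).

```ruby
# Hypothetical stand-in for add_arts; the real method is not shown here.
# Tries to index each article and returns the paths that raised an error.
def add_arts(paths, ind)
  missed = []
  paths.each do |path|
    begin
      # Field names :path and :content are illustrative only.
      ind << { :path => path, :content => File.read(path) }
    rescue StandardError
      # Assumption: the decoding error surfaces as a StandardError subclass.
      missed << path
    end
  end
  missed
end

# Same retry loop as above, but bounded so that an article which fails
# every time cannot keep the loop running forever.
def index_with_retries(paths, ind, max_passes = 10)
  missed = add_arts(paths, ind)
  passes = 1
  while !missed.empty? && passes < max_passes
    missed = add_arts(missed, ind)
    passes += 1
  end
  missed # whatever is still failing after max_passes attempts
end
```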

With this retry loop I was able to index all of the articles. The output was:

,----
| 5843 articles
| 34
| 16
| 10
| 9
| 7
| 7
| 6
| 1
| 0
`----

Any ideas what could be causing this non-deterministic behavior?

Thanks,
Eric

-- 
schulte