Hi!

Are you *sure* this is all valid UTF-8? I don't know how the file command determines this, or whether it is always right. You could try using iconv to make sure that whatever you send to Ferret really is UTF-8.
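As a rough sketch of such a check in Ruby (assuming Ruby 2.1 or later, where String#valid_encoding? and String#scrub are available; the helper names below are made up for illustration), each file can be read as raw bytes and tested for UTF-8 validity before it is handed to Ferret:

```ruby
# Hypothetical helper: returns the paths whose raw bytes are NOT valid UTF-8.
# Files are read in binary mode so no locale-dependent transcoding happens first.
def invalid_utf8_files(paths)
  paths.reject do |path|
    File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
  end
end

# Hypothetical helper: returns the file's text tagged as UTF-8, with any
# invalid byte sequences replaced by U+FFFD instead of raising an error.
def read_utf8_lenient(path)
  File.binread(path).force_encoding(Encoding::UTF_8).scrub("\uFFFD")
end
```

For example, invalid_utf8_files(Dir.glob("raw/**/*.es")) would list the offending files (the glob pattern is only a placeholder). Those files could then be converted with iconv from whatever encoding they really use; for Spanish text ISO-8859-1 is a plausible guess, but only a guess.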

Cheers,
Jens

On 19.05.2008, at 18:00, Eric Schulte wrote:

Hi,

I am trying to index a number of Spanish-language text files, but a
large fraction of them generate errors like the following:

Error: exception 2 not handled: Error decoding input string. Check that you have the locale set correctly

However, as far as I can tell, my locale matches the files' encoding.
Running the file command on them returns, for example:

$ file /media/.../raw/abc/20Jan2007_abc_001041_67.es
/media/.../raw/abc/20Jan2007_abc_001041_67.es: UTF-8 Unicode text

and my locale is:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

After enough of these errors, I start getting "too many open files"
errors, and the indexing fails:

Error: exception 2 not handled: Too many open files
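The thread does not establish why the open files pile up. One possibility, offered only as an assumption since the indexing code is not shown, is that a file handle stays open whenever a document fails. The sketch below opens each file with the block form of File.open, which closes the file even if an exception is raised, and skips documents that are not valid UTF-8 instead of passing them on. Here index stands in for any object that accepts documents via <<, for example a Ferret index; the method name and the :file and :content field names are hypothetical.

```ruby
# Hypothetical indexing loop. `index` is any object that accepts documents
# via `<<`; the :file and :content field names are placeholders.
# Returns the list of paths that were skipped.
def index_files(index, paths)
  skipped = []
  paths.each do |path|
    begin
      # Block form of File.open: the handle is closed when the block exits,
      # whether it returns normally or raises.
      text = File.open(path, "rb") { |f| f.read }
      text.force_encoding(Encoding::UTF_8)
      unless text.valid_encoding?
        skipped << path # don't hand invalid UTF-8 to the indexer
        next
      end
      index << { file: path, content: text }
    rescue StandardError => e
      # One bad file (missing, unreadable, rejected by the indexer) is
      # reported and skipped rather than aborting the whole run.
      warn "skipping #{path}: #{e.message}"
      skipped << path
    end
  end
  skipped
end
```

With a plain Array as the index, index_files([], paths) is an easy way to dry-run the loop and see which files would be skipped.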

Any suggestions would be greatly appreciated.

Thanks,
Eric
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk


--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/     - The new free film database