Hi!
Are you *sure* this is all valid UTF-8? I don't know how the file
command determines this, or whether it is always right.
Maybe try playing around with iconv to ensure that whatever you send to
Ferret really is UTF-8.
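
Something along these lines might help narrow it down. It is only a rough sketch: the function names is_valid_utf8 and find_invalid_utf8 are made up for illustration, and it assumes an iconv that exits with a non-zero status when it hits an invalid input sequence (glibc's does). The idea is to round-trip each file from UTF-8 to UTF-8 and list the ones that fail:

```shell
# Hypothetical helper: succeeds only if iconv can decode the whole file as UTF-8.
# Assumes iconv returns a non-zero exit status on an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: print every regular file under a directory
# (default: the current one) that does not pass the check above.
find_invalid_utf8() {
    find "${1:-.}" -type f | while IFS= read -r f; do
        is_valid_utf8 "$f" || printf '%s\n' "$f"
    done
}
```

Running find_invalid_utf8 on the directory holding the .es files should print only the ones iconv rejects. If any turn up, they may be in some other encoding and would need converting with iconv before they go to Ferret.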
Cheers,
Jens
On 19.05.2008, at 18:00, Eric Schulte wrote:
Hi,
I am trying to index a number of Spanish language text files, but a
large fraction of the files are generating errors like the
following...
Error: exception 2 not handled: Error decoding input string. Check
that you have the locale set correctly
However, it looks to me like my locale matches the file type. Running
the file command on the files returns
$ file /media/.../raw/abc/20Jan2007_abc_001041_67.es
/media/.../raw/abc/20Jan2007_abc_001041_67.es: UTF-8 Unicode text
and my locale is
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
After enough of these errors are generated, I begin to get errors for
having too many open files, and the indexing fails.
Error: exception 2 not handled: Too many open files
Any suggestions would be greatly appreciated.
Thanks,
Eric
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database