[I am using a mail program that does not support utf-8 properly so please
excuse the mangling of your name]
On Mon, 19 Jan 2009, Przemys?aw 'Przemoc' Pawe?czyk wrote:
> So the problem is with the lossy conversion from e.g. UTF8 to some
> 8-bit character set.
> In my opinion aspell should detect words with
> unsupported characters (in current language) and store/print them
> without any conversion.
> Otherwise some languages might look privileged in some dictionaries
> (vide German in English dictionaries). The control over "language
> privileges" it is not in user hand and that is not a good solution.
I do not fully understand what you are saying. However, to me it makes
little sense to store foreign words in a dictionary, for example German
words in an English dictionary. The only exception might be foreign names,
but I don't want to get into that.
Now, the problem you are having is that Aspell does not recognize foreign
characters as part of a word. This is because it assumes that any character
it does not know about in the current language (i.e. not in the 8-bit
character set for the language) is not part of a word. To fix this, it
will be necessary to recreate the dictionaries from source, replacing the
current character set with a special expanded one which includes all
characters in the Latin script.
For English, do the following. Download and unpack the English dictionary
from:
ftp://ftp.gnu.org/gnu/aspell/dict/en/aspell6-en-6.0-0.tar.bz2
and get aspell-lang from CVS using:
cvs -z3 -d:pserver:anonym...@cvs.savannah.gnu.org:/sources/aspell co aspell-lang
Go into the "aspell-lang" directory and create the expanded character set
using:
./mkchardata maps/iso-8859-1-u.txt
Now go back up to the directory that contains both "aspell-lang" and
"aspell6-en-6.0-0" and copy some files from aspell-lang to aspell6-en-6.0-0:
cp aspell-lang/maps/iso-8859-1-u.cset aspell6-en-6.0-0
cp aspell-lang/maps/iso-8859-1-u.cmap aspell6-en-6.0-0
cp -p aspell-lang/proc aspell6-en-6.0-0
Now go into "aspell6-en-6.0-0".
Edit the file "en.dat" and change "iso8859-1" to "iso8859-1-u". Also edit
en_affix.dat and change "ISO8859-1" to "ISO8859-1-U".
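If you would rather script these two edits than make them by hand,
something like the following might do. It is only a sketch: the function
name use_expanded_charset is made up for illustration, and it assumes the
character set name appears in each file exactly as quoted above. It writes
to a temporary file and renames it, since in-place editing with sed is not
portable.

```shell
# Hypothetical helper (not part of Aspell): switch the English dictionary
# sources in directory DIR over to the expanded character set.
# Usage: use_expanded_charset DIR
use_expanded_charset() {
    dir=$1
    # en.dat uses the lower-case name. The optional "-u" in the pattern
    # keeps a second run from producing "iso8859-1-u-u".
    sed 's/iso8859-1\(-u\)\{0,1\}/iso8859-1-u/' "$dir/en.dat" > "$dir/en.dat.tmp" &&
        mv "$dir/en.dat.tmp" "$dir/en.dat" &&
    # en_affix.dat uses the upper-case name.
    sed 's/ISO8859-1\(-U\)\{0,1\}/ISO8859-1-U/' "$dir/en_affix.dat" > "$dir/en_affix.dat.tmp" &&
        mv "$dir/en_affix.dat.tmp" "$dir/en_affix.dat"
}
```

From inside "aspell6-en-6.0-0" you would call it as
"use_expanded_charset ." and then check the result with
"grep -i iso8859 en.dat en_affix.dat".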
In "info" add the lines:
data-file iso-8859-1-u.cset
data-file iso-8859-1-u.cmap
(It doesn't really matter where in the file they go.)
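Since the position does not matter, appending the lines to the end of the
file is enough. A small sketch, again with a made-up function name
(add_charset_data_files), which skips a line that is already there so it
is safe to run more than once:

```shell
# Hypothetical helper (not part of Aspell): append the two data-file
# entries to the given "info" file unless they are already present.
# Usage: add_charset_data_files INFO_FILE
add_charset_data_files() {
    info=$1
    for f in iso-8859-1-u.cset iso-8859-1-u.cmap; do
        # -x matches whole lines only, -F treats the pattern as a fixed string.
        grep -qxF "data-file $f" "$info" ||
            printf 'data-file %s\n' "$f" >> "$info"
    done
}
```

From inside "aspell6-en-6.0-0" that would be "add_charset_data_files info".
This assumes the existing "info" file ends with a newline.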
Now regenerate the other files:
./proc
And finally build the dictionary:
./configure
make
And maybe install it:
make install
For other languages, do a similar thing.
For more info on the expanded character set, see "B.1.1 Notes on Latin
Languages" in the manual (http://aspell.net/man-html/Supported.html) and
the README in aspell-lang.
_______________________________________________
Aspell-user mailing list
Aspell-user@gnu.org
http://lists.gnu.org/mailman/listinfo/aspell-user