[I am using a mail program that does not support utf-8 properly so please excuse the mangling of your name]

On Mon, 19 Jan 2009, Przemys?aw 'Przemoc' Pawe?czyk wrote:

> So the problem is with the lossy conversion from e.g. UTF8 to some
> 8-bit character set.
>
> In my opinion aspell should detect words with
> unsupported characters (in current language) and store/print them
> without any conversion.
> Otherwise some languages might look privileged in some dictionaries
> (vide German in English dictionaries). The control over "language
> privileges" it is not in user hand and that is not a good solution.

I do not fully understand what you are saying. However, to me it makes little sense to store foreign words in a dictionary, for example German words in an English dictionary. The only exception might be foreign names, but I don't want to get into that.

Now the problem you are having is that Aspell is not recognizing foreign characters as part of the word. This is because it assumes that any character it does not know about in the current language (i.e. any character not in the 8-bit character set for the language) is not part of a word. To fix this it will be necessary to recreate the dictionaries from source, replacing the current character set with a special expanded one which includes all characters in the Latin script.
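
To get a rough idea of which words are affected, one could list the words that contain characters outside ISO-8859-1, the 8-bit character set the English dictionary uses. The sketch below is only an approximation: "outside_latin1" is a hypothetical helper (not part of Aspell), it assumes the input is UTF-8 with one word per line, and it assumes an iconv that exits with a non-zero status on unconvertible input (as glibc's does). Aspell itself decides what is a word character from its own character set data, not from iconv.

```shell
# Hypothetical helper: print every input word (one per line, UTF-8)
# that cannot be represented in ISO-8859-1.
outside_latin1() {
    while IFS= read -r word || [ -n "$word" ]; do
        # iconv fails when a character has no ISO-8859-1 equivalent
        printf '%s' "$word" | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1 ||
            printf '%s\n' "$word"
    done
}
```

For example, feeding it a word list with "outside_latin1 < words.txt" (words.txt being any file of your own) prints only the words that the unmodified English dictionary would split at the unknown character.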

For the English language do this. Download and unpack the English dictionary from:
  ftp://ftp.gnu.org/gnu/aspell/dict/en/aspell6-en-6.0-0.tar.bz2
and get aspell-lang from cvs using:
  cvs -z3 -d:pserver:anonym...@cvs.savannah.gnu.org:/sources/aspell co aspell-lang

Go into the "aspell-lang" directory and create the expanded character set using:
  ./mkchardata maps/iso-8859-1-u.txt

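Now go back up to the parent directory (the one containing both "aspell-lang" and "aspell6-en-6.0-0") and copy some files from aspell-lang to aspell6-en-6.0-0: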
  cp aspell-lang/maps/iso-8859-1-u.cset aspell6-en-6.0-0
  cp aspell-lang/maps/iso-8859-1-u.cmap aspell6-en-6.0-0
  cp -p aspell-lang/proc aspell6-en-6.0-0

Now go into "aspell6-en-6.0-0".

Edit the file "en.dat" and change "iso8859-1" to "iso8859-1-u". Also edit en_affix.dat and change "ISO8859-1" to "ISO8859-1-U".

In "info" add the lines:
  data-file iso-8859-1-u.cset
  data-file iso-8859-1-u.cmap
(doesn't really matter where)
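
If you would rather script these edits than make them by hand, something along the following lines should work. This is only a sketch: "switch_charset" is a hypothetical helper, not something shipped with Aspell, and it assumes the character set name is the last thing on its line in both en.dat and en_affix.dat (no trailing spaces or carriage returns). Check the files afterwards.

```shell
# Hypothetical helper: apply the three edits inside dictionary directory $1.
switch_charset() {
    dir=$1
    # en.dat: iso8859-1 -> iso8859-1-u (only where it ends the line)
    sed 's/iso8859-1$/iso8859-1-u/' "$dir/en.dat" > "$dir/en.dat.new" &&
        mv "$dir/en.dat.new" "$dir/en.dat" || return 1
    # en_affix.dat: ISO8859-1 -> ISO8859-1-U (only where it ends the line)
    sed 's/ISO8859-1$/ISO8859-1-U/' "$dir/en_affix.dat" > "$dir/en_affix.dat.new" &&
        mv "$dir/en_affix.dat.new" "$dir/en_affix.dat" || return 1
    # info: append the two data-file lines unless they are already there
    grep -qF 'data-file iso-8859-1-u.cset' "$dir/info" ||
        printf 'data-file iso-8859-1-u.cset\ndata-file iso-8859-1-u.cmap\n' >> "$dir/info"
}
```

From the parent directory it would be called as:
  switch_charset aspell6-en-6.0-0
Running it a second time changes nothing, since the patterns no longer match and the data-file lines are already present.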

Now regenerate the other files:
  ./proc

And finally build the dictionary:
  ./configure
  make

And maybe install it:
  make install


For other languages do a similar thing.

For more info on the expanded character set see "B.1.1 Notes on Latin Languages" in the manual (http://aspell.net/man-html/Supported.html) and the README in aspell-lang.


