How can I get the proper 8-bit encoded morphological dictionaries?
The ones I downloaded from
ftp://ftp.mokk.bme.hu/Tool/Hunmorph/Resources/Morphdb.hu/morphdb-hu-20060525.tgz
(morphdb_hu.aff and morphdb_hu.dic) are obviously not in an 8-bit encoded format.

Can I convert them to the proper form? If yes, how?

I tried:
e...@anonymous:~/program/humorph$ cat morphdb_hu.aff | iconv -f latin2 -t utf-8 > morphdb_hu.aff.u8
e...@anonymous:~/program/humorph$ cat morphdb_hu.dic | iconv -f latin2 -t utf-8 > morphdb_hu.dic.u8


In the *.aff.u8 file I replaced
SET ISO8859-2 with SET UTF-8.
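
For reference, the conversion steps above can be collected into one small shell function. This is only a sketch: convert_to_utf8 is a name made up for illustration, it assumes the source files really are ISO-8859-2, and it says nothing about whether the hunmorph tools accept UTF-8 dictionaries at all.

```shell
# Hypothetical helper (not part of hunmorph): convert one dictionary file
# from ISO-8859-2 to UTF-8. Usage: convert_to_utf8 SRC DST
convert_to_utf8() {
    src="$1"
    dst="$2"

    # Re-encode the whole file; stop if iconv reports a conversion error.
    iconv -f ISO-8859-2 -t UTF-8 "$src" > "$dst" || return 1

    case "$src" in
        *.aff)
            # Only affix files carry a SET line. Rewrite it so the declared
            # encoding matches the new contents of the file.
            sed 's/^SET[[:space:]][[:space:]]*ISO8859-2[[:space:]]*$/SET UTF-8/' \
                "$dst" > "$dst.tmp" && mv "$dst.tmp" "$dst"
            ;;
    esac
}
```

It would be called as: convert_to_utf8 morphdb_hu.aff morphdb_hu.aff.u8 and then again for morphdb_hu.dic. Redirecting through a temporary file avoids depending on in-place editing options that differ between sed versions.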

The result is still no good:

e...@anonymous:~/program/humorph$ echo program | chmorph *hu.aff.u8 *hu.dic.u8 /dev/stdin NOM ACC
program

e...@anonymous:~/program/humorph$ echo program | chmorph *hu.aff.u8 *hu.dic.u8 /dev/stdin NOM POSS
program

e...@anonymous:~/program/humorph$ echo program asztalt |./analyze *hu.aff.u8 *hu.dic.u8 /dev/stdin
generate(program, asztalt) = NO DATA

-eleonora


