Thanks,
I found the problem. There was a UTF-8 byte order mark (BOM) at the beginning of the
.dic and .aff files, and this broke everything. I removed it and
now it works.
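
For anyone hitting the same symptom, here is a minimal sketch of how the BOM could be detected and stripped. It is not the exact procedure used above; the function name `strip_utf8_bom` and the file names in the usage comment are hypothetical.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM (bytes EF BB BF) from the file at `path`.

    Returns True if a BOM was found and removed, False otherwise.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        # Rewrite the file without the first three bytes.
        p.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False


# Hypothetical usage; the file names are assumptions, not from this thread:
# for name in ("km_KH.aff", "km_KH.dic"):
#     print(name, "BOM removed" if strip_utf8_bom(name) else "no BOM")
```

The check works on raw bytes so that it does not depend on how an editor decodes the file.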
Presently we separate words with a Zero Width Space (ZWSP), but in the
future we hope to be able to integrate a break iterator in the
applications, so that inserting the ZWSP will no longer be necessary.
We will probably base it on the new one for Thai in ICU. The problem is
that it needs to go into all the major applications.
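
To illustrate what ZWSP separation means in practice, here is a small sketch that splits a string on U+200B. The helper name `split_on_zwsp` and the sample text are assumptions for illustration only; a real break iterator would find these boundaries without any marker in the text.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, invisible when rendered


def split_on_zwsp(text):
    """Split `text` into words at each ZWSP, dropping empty pieces."""
    return [word for word in text.split(ZWSP) if word]


# Sample text: two Khmer words joined by an invisible ZWSP.
sample = "ភាសា" + ZWSP + "ខ្មែរ"
words = split_on_zwsp(sample)
print(len(words))  # prints 2
```

Only ZWSP is treated as a separator here; ordinary spaces and punctuation are left inside the pieces.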
Javier
[EMAIL PROTECTED] wrote
Hi Javier,
This seems to be a break iterator problem (I think your dictionary works with
the Hunspell executable). You need to add a Khmer dictionary-based break iterator
class to the OOo (or ICU, if you need a general solution) source. In
OpenOffice.org, the Thai locale uses ICU's Thai dictionary-based break iterator; the
Japanese and Chinese locales have dictionaries in the
OOo/i18npool/source/breakiterator/data/ directory, and new break iterator
classes implemented in OOo/i18npool/source/breakiterator_cjk.cxx and
OOo/i18npool/inc/breakiterator_cjk.hxx.
Could you file an issue for this problem? It seems OOo will have improvements
in this direction:
http://specs.openoffice.org/g11n/word_breaking/42660_Easy_override_of_incorrect_word_breaking.odt
Regards,
Laci
Quoting Javier SOLA <[EMAIL PROTECTED]>:
Hi,
I just finished a term list for the Khmer language in UTF-8 (56,000 terms),
and an affix file. Khmer is a complex text layout (CTL) language.
I have been trying to test them by placing both files in the
share\dict\ooo\ directory of my installation of OOo 2.1 (on Windows),
and I have added an entry to the dictionary.lst file to include the
language.
As a result, OOo now detects that I have a Khmer (km) dictionary, but it
cannot find any words in it (unless they consist of a single letter).
I have tried to simplify the .aff file to one single line (SET UTF-8),
and the dictionary to 20 terms, but it still cannot find any of the
words that are in the dictionary.
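
A reduced test pair like the one described can also be generated by a script rather than saved from an editor. The sketch below assumes a hypothetical helper `write_minimal_dictionary` and a hypothetical base name; it writes a one-line .aff file and a .dic file whose first line is the word count. Python's plain "utf-8" codec does not emit a byte order mark (the "utf-8-sig" codec would).

```python
from pathlib import Path


def write_minimal_dictionary(directory, basename, words):
    """Write <basename>.aff and <basename>.dic as UTF-8 without a BOM."""
    directory = Path(directory)
    aff_path = directory / (basename + ".aff")
    dic_path = directory / (basename + ".dic")

    # Affix file reduced to the encoding declaration only.
    with open(aff_path, "w", encoding="utf-8", newline="\n") as f:
        f.write("SET UTF-8\n")

    # Dictionary file: word count on the first line, then one word per line.
    with open(dic_path, "w", encoding="utf-8", newline="\n") as f:
        f.write(str(len(words)) + "\n")
        for word in words:
            f.write(word + "\n")

    return aff_path, dic_path


# Hypothetical usage; the base name and word list are assumptions:
# write_minimal_dictionary(".", "km_KH", ["ភាសា", "ខ្មែរ"])
```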
I have looked at the Gujarati dictionary (because it is in UTF-8), and I
believe that I am doing something similar, but it still cannot find
my words.
Any idea of what I might be doing wrong?
Thanks in advance,
Javier
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]