C. Norbu wrote:
On Fri, Jun 12, 2009 at 5:52 PM, ge <eleonor...@gmx.net> wrote:

Hello,

Why can't you store the Dzongkha words in the dictionary as whole words,
including the Tsheg marks:

Wordcount
Syl1
syl1TshegSyl2/flags
Syl3TshegSyl4TshegSyl5/flags
Syl1TshegSyl4/flags
..

?
Thanks. This is how I did it, and it doesn't seem to work. I even tried
including

WORDCHARS [Tsheg] and BREAK [Tsheg] in the affix file.


This is how all languages that use the Latin character set
store their words (except that they do not store Tshegs,
but checking would work perfectly well with Tshegs too).

In our case we use Tsheg instead of a space. It is typed with the
[SPACE BAR] key in our keyboard system. So the moment we strike the SPACE BAR,
the first syllable is spell checked (even after storing the words as shown
above).

I had some problems like this and spent a lot of time trying to find the cause. In the end, it was the encoding that was tripping me up. Be sure the file has the same encoding as stated in the header of the .aff file.
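
The encoding check suggested above can be sketched in Python. This is only a rough sketch under assumptions: it takes "the file" to be the .dic file, it assumes the .aff file declares its encoding with a SET line, and the function names and the ISO8859-1 fallback are illustrative choices, not something stated in this thread.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

def declared_encoding(aff_bytes: bytes, default: str = "ISO8859-1") -> str:
    """Return the encoding named on the SET line of an .aff file.

    The fallback value is an assumption made for this sketch.
    """
    if aff_bytes.startswith(UTF8_BOM):
        aff_bytes = aff_bytes[len(UTF8_BOM):]
    # The SET line itself is plain ASCII, so latin-1 is enough to find it.
    for line in aff_bytes.decode("latin-1").splitlines():
        match = re.match(r"\s*SET\s+(\S+)", line)
        if match:
            return match.group(1)
    return default

def dic_matches_aff(aff_bytes: bytes, dic_bytes: bytes) -> bool:
    """True if the .dic bytes decode cleanly under the .aff's declared encoding.

    An encoding name that Python does not know raises LookupError.
    """
    try:
        dic_bytes.decode(declared_encoding(aff_bytes))
    except UnicodeDecodeError:
        return False
    return True

# Hypothetical file contents, built in memory for the example.
aff = "SET UTF-8\nWORDCHARS \u0f0b\n".encode("utf-8")
dic = "1\n\u0f62\u0fab\u0f7c\u0f44\u0f0b\u0f41\n".encode("utf-8")
print(declared_encoding(aff), dic_matches_aff(aff, dic))
```

Note that this only catches bytes that are invalid in the declared encoding. A single-byte declaration such as ISO8859-1 accepts any byte sequence, so a UTF-8 dictionary paired with such a header would pass this test and still be read wrongly.
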
Is Tsheg also used between words in Dzongkha, or is there a space or a
different symbol?
Yes. Tsheg [visually, a small dot] is between Dzongkha characters,
syllables, and words.


Does it have something to do with word boundaries in Dzongkha, or maybe with
incorrect .aff and .dic files?
How do you see it?


Thanks.
Regards,

C.Norbu

-eleonora


Hi,

Dzongkha text flows in a continuum. Dzongkha words consist of one or more
syllables.
In a multisyllable word, the syllables are separated by the Tibetan
inter-syllabic mark called Tsheg [Unicode: U+0F0B].
This Tsheg is a small dot, typed on the Dzongkha keyboard with the [Space
Bar].
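
To make the syllable structure concrete, here is a small Python sketch that splits a word on the Tsheg character. The helper name is hypothetical, and the sample string is the word "Dzongkha" written in Dzongkha script, given as escapes so that the code points are unambiguous.

```python
from typing import List

TSHEG = "\u0f0b"  # the Tibetan inter-syllabic mark

def syllables(text: str) -> List[str]:
    """Split text on Tsheg, dropping empty pieces such as the one a trailing Tsheg leaves."""
    return [piece for piece in text.split(TSHEG) if piece]

# "Dzongkha" in Dzongkha script: two syllables joined by one Tsheg.
word = "\u0f62\u0fab\u0f7c\u0f44" + TSHEG + "\u0f41"
print(len(syllables(word)))
```

A checker that treats Tsheg as a word separator ends up looking at these pieces one at a time instead of the whole word.
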

So the basic problem with the Dzongkha spell checker is that this Tsheg
causes
hunspell to spell check Dzongkha words syllable by syllable.
And if we store syllables instead of words in the .dic file,
then a multitude of invalid words would be accepted.

A comparable example would be English phrases borrowed from Latin, such as
"ad hoc", "alma mater", etc.
If we list "ad", "hoc", "alma", "mater" separately in the .dic file, then
we can have words such as "ad alma", "ad mater",
"alma hoc", and so on.

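The "ad hoc" analogy can be turned into a small Python sketch. It is not how hunspell works internally; it only contrasts a check on separate parts with a check on whole entries, using a space where Dzongkha would have a Tsheg. All names here are hypothetical.

```python
from itertools import product

def check_by_parts(phrase, parts):
    """Accept a phrase if every space-separated part is listed on its own."""
    return all(part in parts for part in phrase.split(" "))

def check_whole(phrase, entries):
    """Accept a phrase only if it is listed as a complete entry."""
    return phrase in entries

entries = {"ad hoc", "alma mater"}
parts = {part for entry in entries for part in entry.split(" ")}

# Every two-part combination of the listed parts.
candidates = [" ".join(pair) for pair in product(sorted(parts), repeat=2)]

# Combinations the part-wise check accepts although they are not real entries.
false_accepts = [c for c in candidates
                 if check_by_parts(c, parts) and not check_whole(c, entries)]
print(len(candidates), len(false_accepts))
```

With only two real entries, the part-wise check already accepts every combination of the four parts, which is the same effect as storing syllables instead of words in the .dic file.
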
I see mentions of the ICU BreakIterator, ZWSP, etc. How do these
work? Are there any links about them?
How should I go about it? Any ideas and suggestions are greatly appreciated.

Thanks in advance
C. Norbu.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lingucomponent.openoffice.org
For additional commands, e-mail: dev-h...@lingucomponent.openoffice.org




