Hello,

Why can't you store the Dzongkha words in the dictionary as whole words,
with the Tsheg marks included:

Wordcount
Syl1
Syl1TshegSyl2/flags
Syl3TshegSyl4TshegSyl5/flags
Syl1TshegSyl4/flags
..

?

This is how all languages that use the Latin character set
store their words. (The only difference is that they do not store
Tshegs, but checking would work just as well with Tshegs.)
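
A minimal sketch of the layout above, assuming a hunspell-style .dic file
whose first line is the entry count and whose entries are whole words with
the Tsheg (U+0F0B) kept inside them. The syllable names are placeholders,
the helper names build_dic and load_dic are hypothetical, and flags are
left out:

```python
TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG


def build_dic(words):
    """Return .dic text: entry count first, then one whole word per line."""
    entries = sorted(set(words))
    return "\n".join([str(len(entries))] + entries) + "\n"


def load_dic(text):
    """Parse .dic text back into a set of words, dropping any /flags suffix."""
    lines = text.splitlines()[1:]  # skip the count line
    return {line.split("/", 1)[0] for line in lines if line}


# Placeholder entries mirroring the list above (not real Dzongkha words).
words = [
    "Syl1",
    TSHEG.join(["Syl1", "Syl2"]),
    TSHEG.join(["Syl3", "Syl4", "Syl5"]),
    TSHEG.join(["Syl1", "Syl4"]),
]

dic_text = build_dic(words)
known = load_dic(dic_text)

# A stored word is found; an unlisted syllable combination is not.
print(TSHEG.join(["Syl1", "Syl2"]) in known)
print(TSHEG.join(["Syl2", "Syl3"]) in known)
```

This only shows the lookup idea. Whether hunspell treats the Tsheg as a
word character depends on how the text is tokenized before lookup.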

Is the Tsheg also used between words in Dzongkha, or is there a space or a
different symbol?

-eleonora


Hi,

Dzongkha text flows continuously. A Dzongkha word consists of one or more
syllables.
In a multisyllable word, the syllables are separated by the Tibetan
inter-syllabic mark called Tsheg [Unicode: U+0F0B].
The Tsheg is a small dot, typed on the Dzongkha keyboard with the [Space
Bar].
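
As a small illustration of the description above, here is a sketch that
splits a word into syllables at each Tsheg. The syllable strings are
placeholders, and the function name syllables is hypothetical:

```python
import unicodedata

TSHEG = "\u0f0b"  # the inter-syllabic mark described above


def syllables(word):
    """Split a word at each Tsheg, ignoring empty pieces (e.g. a trailing Tsheg)."""
    return [s for s in word.split(TSHEG) if s]


# Placeholder three-syllable word: Syl1 + Tsheg + Syl2 + Tsheg + Syl3.
word = TSHEG.join(["Syl1", "Syl2", "Syl3"])
parts = syllables(word)

print(unicodedata.name(TSHEG))
print(parts)
```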

So the basic problem with the Dzongkha spell checker is that the Tsheg
causes hunspell to check Dzongkha words syllable by syllable.
And if we fill the .dic file with syllables instead of words,
then a multitude of invalid words would be accepted.

A comparable example is English phrases borrowed from Latin, such as
"ad hoc", "alma mater", etc.
If we list "ad", "hoc", "alma" and "mater" separately in the .dic file,
then combinations such as "ad alma", "ad mater",
"alma hoc", and so on would also pass.
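
The over-acceptance can be shown with a short sketch. It uses plain Python
sets rather than hunspell itself, and the function names are hypothetical.
It compares a check against whole entries with a check against the
separate parts:

```python
from itertools import product

# Whole-entry dictionary versus a dictionary of the separate parts.
WORD_DIC = {"ad hoc", "alma mater"}
PART_DIC = {part for word in WORD_DIC for part in word.split(" ")}


def ok_by_parts(text):
    """Accept the text if every space-separated part is listed on its own."""
    return all(part in PART_DIC for part in text.split(" "))


def ok_by_word(text):
    """Accept the text only if it is listed as a whole entry."""
    return text in WORD_DIC


# Every two-part combination of the listed parts.
pairs = [" ".join(p) for p in product(sorted(PART_DIC), repeat=2)]

# Combinations the part-wise check lets through but the whole-entry check rejects.
false_accepts = [t for t in pairs if ok_by_parts(t) and not ok_by_word(t)]
print(len(pairs), len(false_accepts))
print(false_accepts)
```

With four parts there are 16 two-part combinations, and only 2 of them are
real entries, so the part-wise check wrongly accepts the other 14.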

I see mentions of the ICU BreakIterator, ZWSP, etc. How do these all
work? Are there any links about them?
How should we go about it? Any ideas and suggestions are greatly
appreciated.

Thanks in advance
C. Norbu.

