Hello László, :-)

> Hi,
>
> See extended ALetter definitions of the Hungarian word breaking rules:
>
> http://svn.services.openoffice.org/ooo/branches/OOO310/i18npool/source/breakiterator/data/dict_word_hu.txt
> http://svn.services.openoffice.org/ooo/branches/OOO310/i18npool/source/breakiterator/data/edit_word_hu.txt
>
> By the way, it also contains numbers and other special signs, because
> Hungarian uses their affixed forms. (For example, "with 25%" is
> "25%-kal" in Hungarian, and not the frequent bad form "25%-al"):
>
> $ALetter = [\u0002 [:Alphabetic:] [:name= COMMERCIAL AT:] [:name= HEBREW PUNCTUATION GERESH:]
>                [:name = PERCENT SIGN:] [:name = PER MILLE SIGN:]
>                [:name = PER TEN THOUSAND SIGN:]
>                [:name = SECTION SIGN:] [:name = DEGREE SIGN:] [:name = EURO SIGN:]
>                [:name = HYPHEN-MINUS:] [:name = EN DASH:] [:name = EM DASH:]
>                [:name = DIGIT ZERO:]
>                [:name = DIGIT ONE:]
>                [:name = DIGIT TWO:]
>                [:name = DIGIT THREE:]
>                [:name = DIGIT FOUR:]
>                [:name = DIGIT FIVE:]
>                [:name = DIGIT SIX:]
>                [:name = DIGIT SEVEN:]
>                [:name = DIGIT EIGHT:]
>                [:name = DIGIT NINE:]
>                - $Ideographic
>                - $Katakana
>                - $Hangul
>                - [:Script = Thai:]
>                - [:Script = Lao:]
>                - [:Script = Hiragana:]];
>
I tried something similar. I made the following changes:

$ALetter = [\u0002 [:name = HYPHEN-MINUS:] [:name = EN DASH:] [:Alphabetic:] [:name= COMMERCIAL AT:] [:name= HEBREW PUNCTUATION GERESH:]
               - $Ideographic
               - $Katakana
               - $Hangul
               - [:Script = Thai:]
               - [:Script = Lao:]
               - [:Script = Hiragana:]];
...
$SufixLetter = [:name= FULL STOP:] [:name = HYPHEN-MINUS:] [:name = EN DASH:];

Basically it worked, but an unwanted side effect was that multiple dashes
were accepted at the start or end of a word. That is, "---water" and
"river---" were each regarded as one word. By contrast, with text like
"...water" and "river...", only one of the dots was ever included with
the word. So I am wondering whether the dashes could be handled the same
way.

Also, since I'm completely new to ICU, I don't know whether my attempt
above has any other unwanted side effects. Do you have any clues for me?

Regards,
Thomas
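
P.S. To make the behaviour I am after concrete, here is a minimal Python
sketch. It is not ICU rule syntax and says nothing about how the break
iterator would implement this. The WORD pattern and the words() helper
are made up purely for illustration, on the assumption that a word should
carry at most one dash on each side, as happens with "...water" and
"river...".

```python
import re

# HYPHEN-MINUS and EN DASH, the two dashes added to the rules above.
DASHES = "-\u2013"

# Hypothetical pattern (an assumption, not taken from the ICU rules):
# at most one leading dash, runs of word characters joined by single
# inner dashes, and at most one trailing dash.
WORD = re.compile(rf"[{DASHES}]?\w+(?:[{DASHES}]\w+)*[{DASHES}]?")


def words(text):
    """Return the word-like chunks of text under the assumption above."""
    return WORD.findall(text)


if __name__ == "__main__":
    for sample in ("---water", "river---", "well-known"):
        print(repr(sample), "->", words(sample))
```

With this pattern, only the dash directly next to the letters stays
attached to the word. Any further dashes are left outside it.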