On Wed, Mar 13, 2013 at 09:04:43AM +0100, Hans J. Albertsson wrote:
> not such an easy thing then...
>
> however. I suppose given a complete dictionary, a system could try
> splitting a word between c and h and see if the resulting subwords
> are in the dictionary.

Hmm.
Sometimes they might not be valid words, so there would be no match in the
dictionary. But I think we shouldn't waste our time with this :-). Who cares
about a language actively used by only ~5 million people? Especially when the
problem is really a corner case in our language.

> On 2013-03-13 08:57, Marcel Telka wrote:
> >On Wed, Mar 13, 2013 at 08:01:16AM +0100, Hans J. Albertsson wrote:
> >>Can that confusion happen at the start of a word or only inside
> >>words?? And how many rulebreaking words are there, can they be
> >>enumerated??
> >It can happen only in the middle. Only in a case when the word is
> >constructed from two initially (semi-)separate words. In my example the
> >word "viachlasný" is "viac" + "hlasný" (it is something like multi+voice).
> >It will happen always when you combine a word ending by "c" with another
> >word starting with "h". Such combinations are not very often used in the
> >language, but because they are constructed as a combination of two words,
> >you can construct a lot of such words. The situation is even worse because
> >those two (semi-)separate words used for the construction might not exist
> >as separate words in such form as it was used for the construction.
> >
> >BTW, the problem is caused by a fact that our "ch" (the single letter) is
> >equivalent to Russian "X", with the exception that we do not have a single
> >character for it (we probably should have). AFAIK, Czech language is
> >similar with "ch" (it is a single letter too), but I am not sure whether
> >there is an example of a real word where "ch" are two letters.
> >
> >>On 2013-03-12 22:50, Sašo Kiselkov wrote:
> >>>On 03/12/2013 10:10 PM, Marcel Telka wrote:
> >>>>On Tue, Mar 12, 2013 at 10:02:27PM +0100, Sašo Kiselkov wrote:
> >>>>>I'm pretty sure nobody in bash development actually considers
> >>>>>locale-specific letter ordering rules. Language-specific idiosyncrasies
> >>>>>are a never ending stream of hurt and implementation problems (e.g. in
> >>>>>my language "ch" is supposed to be treated as a single letter for
> >>>>>sorting purposes).
> >>>>Interestingly, "ch" is not always a single letter. It depends on a word:
> >>>>"viachlasný" is an example of a word where "ch" are two letters...
> >>>>
> >>>>Yes, our language is Slovak.
> >>>Correct, that's another twisty-twist I forgot to mention. Slovak
> >>>sucks... (for computing)
> >>>
> >>>--
> >>>Saso
> >
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss@openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss

-- 
+-------------------------------------------+
| Marcel Telka   e-mail:   mar...@telka.sk  |
|                homepage: http://telka.sk/ |
|                jabber:   mar...@jabber.sk |
+-------------------------------------------+
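
For reference, the dictionary check suggested earlier in the thread (split a
word between "c" and "h" and look both halves up) could be sketched roughly as
below. This is only an illustration: the function names and the tiny WORDS set
are hypothetical, and a real system would need a full Slovak word list. As
noted in the reply above, the two halves are not always valid standalone
words, so a check like this can miss real compounds.

```python
# Hypothetical stand-in for a "complete dictionary"; an assumption for
# illustration only, not a real Slovak word list.
WORDS = {"viac", "hlasný", "chlieb"}


def split_candidates(word):
    """Yield (left, right) for every place where 'c' is directly followed by 'h'."""
    lower = word.lower()
    start = 0
    while True:
        i = lower.find("ch", start)
        if i == -1:
            return
        # Split between the 'c' and the 'h'.
        yield word[: i + 1], word[i + 1 :]
        start = i + 1


def ch_is_two_letters(word, dictionary):
    """Return True if some c|h split yields two words found in the dictionary.

    A False result does not prove that "ch" is a single letter: the halves of
    a compound may not exist as separate dictionary entries.
    """
    return any(
        left.lower() in dictionary and right.lower() in dictionary
        for left, right in split_candidates(word)
    )
```

With the sample set, ch_is_two_letters("viachlasný", WORDS) finds the split
"viac" + "hlasný", while "chlieb" is left alone because "c" and "hlieb" are not
entries. Removing "viac" from the set makes the first lookup fail, which is
exactly the weakness described in the reply.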