Hello Scott

> In this case, the pattern has a bit more information:
> a=á<b<c<cs<d<e=é<f<g<gy<h<i=í...<z<zs
> where "a" can be told to sort the same as "a with acute", both of these sort
> before "b" ... and "zs" sorts after "z"
Actually a<>á and e<>é; more precisely, a<á and e<é. In some relaxed
situations the equivalence could be stated, but Hungarian grammar
is much more complex than I could judge "ex cathedra".
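
To illustrate the point about a and á, here is a minimal Python sketch,
not a real Hungarian collation. It assumes the accent is only a
tie-breaker: the accented vowel is compared as its base letter first, and
it sorts after the base letter only when the words are otherwise equal.
The function name sort_key and the three-vowel table are my own
assumptions, taken from the pattern quoted above; multi-letter graphemes
are ignored here.

```python
# Hypothetical sketch: only the vowels from the quoted pattern are covered.
BASE = {"á": "a", "é": "e", "í": "i"}

def sort_key(word):
    """Return a (primary, secondary) key for a lowercase comparison."""
    word = word.lower()
    # Primary level: accented vowels are compared as their base letter.
    primary = [BASE.get(ch, ch) for ch in word]
    # Secondary level: the accent breaks ties, so a < á and e < é.
    secondary = [1 if ch in BASE else 0 for ch in word]
    return (primary, secondary)

words = ["ár", "ad", "ara", "eb", "ér", "el", "kár", "kar"]
print(sorted(words, key=sort_key))
```

With a plain code-point sort, "ár" and "ér" would land after every
unaccented word; with this key they stay next to their a/e neighbours,
and "kar" comes before "kár".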

> Geza, how important are these "multi-letter graphemes" (cs, dz, dzs, gy, ly,
> ny, sz, ty and zs) in a sort algorithm?  At the same site, Péter Szigetvári
dz and dzs are used for transliteration, i.e. for transcribing
foreign words. E.g. dzs stands for the English j sound (the Hungarian
language is more phonetically oriented than any of the Indo-European or
Latin-legacy languages). cs, gy, ly, ny, sz, ty and zs are "inborn"
Hungarian specialities; many words have them as components. How
important are they? That's a very hard question, because in a
mixed-language text (e.g. a Hungarian medical report interspersed with
medical Latin terminology) one has to understand the word itself to
determine its sorting order: e.g. in a Hungarian word "ly" is a single
letter (its sound roughly corresponds to the English "y"; in Hungarian
"j" is phonetically equivalent to "ly", but orthographically some
words use one and some the other). If you don't know the word you
cannot even decide its hyphenation, as you wrote:
> "Unfortunately, the task is not trivial: some sequences that look like
> multi-letter graphemes are in fact not, e.g., bércsík may be ranked before
> or after bérczerge depending on its morphology: bér+csík (after bérczerge)
> or bérc+sík (before bérczerge). This can be decided only with a
bér-csík or bérc-sík: different sorting order and even different
hyphenation. (Just to satisfy your presumed curiosity about what these
words mean: the first could be translated as "payment stripe" [not a
logical word combination], the second as a geographical plain
[a correct Hungarian word].) Without a dictionary, no program can get
through this, not even a semantic parser.
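
The ambiguity can be reproduced with a small Python sketch. It is only an
illustration under stated assumptions: ALPHABET is the 44-letter Hungarian
alphabet treated as one strict order (so a<á), graphemes are matched
greedily (longest first), and a "+" in the input is a hypothetical marker
for a morpheme boundary that a dictionary would have to supply. The names
tokenize and sort_key are mine.

```python
# Hypothetical sketch, not a complete Hungarian collation.
ALPHABET = [
    "a", "á", "b", "c", "cs", "d", "dz", "dzs", "e", "é", "f", "g", "gy",
    "h", "i", "í", "j", "k", "l", "ly", "m", "n", "ny", "o", "ó", "ö", "ő",
    "p", "q", "r", "s", "sz", "t", "ty", "u", "ú", "ü", "ű", "v", "w", "x",
    "y", "z", "zs",
]
RANK = {grapheme: i for i, grapheme in enumerate(ALPHABET)}

def tokenize(word):
    """Split a word into graphemes; '+' marks a known morpheme boundary."""
    tokens = []
    for part in word.lower().split("+"):
        i = 0
        while i < len(part):
            # Greedy match: try dzs-sized, then digraphs, then single letters.
            for size in (3, 2, 1):
                chunk = part[i:i + size]
                if len(chunk) == size and chunk in RANK:
                    tokens.append(chunk)
                    i += size
                    break
            else:
                raise ValueError(f"unknown character in {word!r}: {part[i]!r}")
    return tokens

def sort_key(word):
    """Map each grapheme to its position in the alphabet."""
    return [RANK[token] for token in tokenize(word)]

# Without a boundary marker the greedy tokenizer always picks "cs".
print(tokenize("bércsík"))
# With the boundary supplied, the two readings sort on opposite sides.
print(sorted(["bér+csík", "bérczerge", "bérc+sík"], key=sort_key))
```

The unmarked word is always read as bér+csík, so a program gets the
bérc+sík reading right only if something else (a dictionary) tells it
where the boundary is.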

Back to these di-graphemes: they are important, fundamental parts of
our language but personally I can live without sorting them correctly
in a computer program. :-)



-- 
Best regards,
 Geza                            mailto:[EMAIL PROTECTED]
