RE: Hungarian collation

imre Mon, 30 Oct 2006 22:28:03 -0800

Hi, 

> From: Peter Gulutzan [mailto:[EMAIL PROTECTED] 
> > > MySQL is looking for an authoritative, official statement which 
> > > states all the current Hungarian collation rules.
> > 
> > According to the Reference Level Description of the Hungarian
> > language (ISBN 9634206441, or the Hungarian version online:
> > http://bme-tk.bme.hu/other/kuszob/hangok.htm ) the rules are the
> > following:
> > 
> 
> Apparently http://bme-tk.bme.hu/other/kuszob/hangok.htm is an
> educational site (something to do with the Council of Europe)
> as opposed to an official standards site, if I'm understanding
> correctly.


Yes.

There is a standard about the collation to use in libraries and
bibliographies.  You can find some data about it here:
http://www.mszt.hu/standardsearch/detail.asp?id=007042 

The definitive guide to the Hungarian language is "A magyar helyesírás
szabályai" (ISBN 9630577356), issued by the Hungarian Academy of Sciences.
An older edition (from 1985) is available for download here (in
Hungarian): http://mek.oszk.hu/01500/01547/index.phtml

It describes practically the same collation rules as the Reference Level
Description, with an additional rule about (Latin-like) letters that don't
appear in the Hungarian alphabet:
These letters sort with their unadorned version, except when all else is
equal.  In that case they come after the native variants.
E.g.: galamb < Gärtner < gáz and mosna < Mošna
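
To make the foreign-letter rule concrete, here is a minimal Python sketch.
The names (HUNGARIAN_ACCENTED, strip_foreign_marks, foreign_key) are made up
for illustration, and the remaining letters are compared by plain code point,
which is NOT the full Hungarian order; only the "unadorned first, foreign
variant after on a tie" part is shown.

```python
import unicodedata

# Accented letters that belong to the Hungarian alphabet and must be kept.
HUNGARIAN_ACCENTED = set("áéíóöőúüű")

def strip_foreign_marks(word):
    """Replace accented letters that are not Hungarian with their base letter."""
    out = []
    for ch in word:
        if ch.lower() in HUNGARIAN_ACCENTED:
            out.append(ch)
        else:
            # NFD splits e.g. 'ä' into 'a' + combining mark; keep the base only.
            out.append(unicodedata.normalize("NFD", ch)[0])
    return "".join(out)

def foreign_key(word):
    """Sort key: unadorned spelling first, foreign variant last on a tie."""
    plain = strip_foreign_marks(word)
    # The second element is False for native spellings, True for foreign ones.
    return (plain.lower(), plain != word)

print(sorted(["gáz", "Gärtner", "galamb"], key=foreign_key))
# ['galamb', 'Gärtner', 'gáz']
print(sorted(["Mošna", "mosna"], key=foreign_key))
# ['mosna', 'Mošna']
```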

> > - The basic order of the alphabet is a á b c cs d dz dzs e é f g gy h
> >   i í j k l ly m n ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs
> > - For the short-long vowel pairs (a á, e é, i í, o ó, ö ő, u ú, ü ű)
> >   long = short usually, but long > short if all else is equal.
> >   E.g., kád < kar < kár < kard
> 
> So far, this seems to be the opinion of a majority, although
> not everyone describes the rule the same way. If MySQL adopts
> this rule, SELECT * FROM t WHERE column1 = 'kár'; will not
> return rows where column1 = 'kar'. But perhaps
> SELECT * FROM t WHERE column1 LIKE 'ká%';
> will return rows where column1 = 'kar'.

This sounds pretty good to me, especially since in Hungarian the accent
marks tend to appear and disappear from words depending on the suffix.
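
The long/short vowel behaviour discussed above could be modelled with a
two-level key, roughly like the Python sketch below.  This is only my reading
of the rule, not what MySQL does or will do: vowel_key, equals and like_prefix
are hypothetical names, the folded letters are compared by code point rather
than by the real alphabet order, and the LIKE behaviour is just the "perhaps"
from the quoted mail.

```python
# Map each long vowel to its short partner (primary level: long = short).
LONG_VOWELS = "áéíóőúű"
LONG_TO_SHORT = str.maketrans(LONG_VOWELS, "aeioöuü")

def vowel_key(word):
    """(primary, secondary) key: length of vowels matters only on a tie."""
    w = word.lower()
    primary = w.translate(LONG_TO_SHORT)
    # Secondary level: 0 for anything short, 1 for a long vowel.
    secondary = tuple(1 if ch in LONG_VOWELS else 0 for ch in w)
    return (primary, secondary)

def equals(a, b):
    """'=' compares the full key, so 'kar' and 'kár' differ."""
    return vowel_key(a) == vowel_key(b)

def like_prefix(value, prefix):
    """LIKE 'prefix%' compared on the primary level only (an assumption)."""
    return vowel_key(value)[0].startswith(vowel_key(prefix)[0])

print(sorted(["kard", "kár", "kar", "kád"], key=vowel_key))
# ['kád', 'kar', 'kár', 'kard']
print(equals("kar", "kár"))       # False
print(like_prefix("kar", "ká"))   # True
```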

> > - Long (doubled) consonants sort as if they had been expanded,
> >   e.g., ggy as gygy, nny as nyny
> 
> So 'ccs sorts with cscs' is true, i.e. ccs > cds
> 
> I expect that there is no rule which could apply for all LIKE 
> searches.

I think it would be nice (again, because of certain suffix rules) if, e.g.,
LIKE 'cs%' also matched 'ccs'.

> > - Compound words are sorted according to their word parts, e.g.,
> >   meggyújt < meglát < megy < meggy
> > 
> 
> I don't see a way to determine what is a composite word. So 
> MySQL would return meglát < megy < meggy < meggyújt

I was sort of expecting this :-)
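
For what it's worth, the digraph expansion and the compound-word problem
discussed above can be sketched together in a few lines of Python.  All names
here (letters, sort_key, letter_prefix, compound_key) are hypothetical, the
long/short vowel tie-breaking is left out, and the word parts for
compound_key have to be supplied by hand, since nothing in the string itself
says where "meg" ends and "gyújt" begins.

```python
import re

# Alphabet order as quoted in this thread, digraphs and trigraph included.
ALPHABET = ("a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny o ó ö ő "
            "p q r s sz t ty u ú ü ű v w x y z zs").split()
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

# Doubled forms: 'ccs' stands for 'cs' + 'cs', 'ggy' for 'gy' + 'gy', etc.
DOUBLED = {l[0] + l: l for l in ALPHABET if len(l) > 1}

# Try the longest spellings first, then fall back to a single character.
MULTI = sorted(list(DOUBLED) + [l for l in ALPHABET if len(l) > 1],
               key=len, reverse=True)
TOKEN = re.compile("|".join(MULTI) + "|.")

def letters(word):
    """Split a word into Hungarian letters, expanding doubled digraphs."""
    out = []
    for tok in TOKEN.findall(word.lower()):
        if tok in DOUBLED:
            out += [DOUBLED[tok]] * 2
        else:
            out.append(tok)
    return out

def sort_key(word):
    """Rank each letter; anything outside the alphabet sorts last."""
    return [RANK.get(l, len(ALPHABET)) for l in letters(word)]

def letter_prefix(value, prefix):
    """LIKE 'prefix%' compared letter by letter after expansion."""
    p = letters(prefix)
    return letters(value)[:len(p)] == p

def compound_key(parts):
    """Key for a compound whose parts are given explicitly by the caller."""
    return [RANK.get(l, len(ALPHABET)) for part in parts for l in letters(part)]

print(sort_key("ccs") > sort_key("cds"))            # True
print(letter_prefix("ccs", "cs"))                   # True
print(sorted(["meggyújt", "meggy", "megy", "meglát"], key=sort_key))
# ['meglát', 'megy', 'meggy', 'meggyújt']
print(compound_key(["meg", "gyújt"]) < sort_key("meglát"))   # True
```

Without the hand-supplied split, "meggyújt" is read as me + gy + gy + újt and
lands after "meggy"; with the split it moves in front of "meglát", which is
the order the rule asks for.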

ImRe



