UTF8 Latvian collation is wrong for 4 letters: A E I U. 
--------------------------------------------------------

                 Key: CORE-4548
                 URL: http://tracker.firebirdsql.org/browse/CORE-4548
             Project: Firebird Core
          Issue Type: Bug
          Components: Charsets/Collation
    Affects Versions: 2.5.3
         Environment: any
            Reporter: Edgars Solafs


This issue was already reported and fixed a couple of years ago for the 
WIN1257_LV collation.
Here is the link to the old issue: 
http://tracker.firebirdsql.org/si/jira.issueviews:issue-html/CORE-3131/CORE-3131.html
As the description in the previous report was very clear and detailed, I will 
copy most of its text here and add a new .sql file for testing purposes.

Recently we moved from WIN1257 (Latvian) to UTF8, and we need the UTF8 
database to keep the same Latvian-alphabet sort order that the WIN1257_LV 
collation provided.

So I tried creating a UTF8 collation with the lv_LV locale, but got an error 
when using the ICU libraries that come with the Firebird installation. I then 
downloaded a newer ICU version (4.8), put the libraries in the Firebird library 
directory and registered them in the fbintl.conf file. After that I was able to 
create a collation with the LV locale in the UTF8 database, but quickly found 
that it has the same sorting problems that the WIN1257_LV collation used to 
have.


Here is part of the description from the old issue, which explains the Latvian 
alphabet rules.

In the Latvian alphabet the letters A, E, I and U (and others) can be accented. 
According to the rules of the alphabet, each accented letter should follow its 
unaccented counterpart, but currently it does not. For now, Firebird does not 
distinguish them when sorting, and our clients are unhappy with that. 

At present it behaves as follows: 
A and Ā, a and ā - no difference in sorting 
E and Ē, e and ē - no difference in sorting 
I and Ī, i and ī - no difference in sorting 
U and Ū, u and ū - no difference in sorting 

Currently it works as described here: 
http://www.collation-charts.org/firebird20/fb203.WIN1257.WIN1257_LV.html 

The order should be: 
AĀ, aā 
EĒ, eē 
IĪ, iī 
UŪ, uū 
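
To make the expected order concrete, here is a minimal sketch in plain Python. 
It is only an illustration of the alphabet rule, not Firebird or ICU code. The 
names LATVIAN_ORDER and latvian_key and the sample words are made up for this 
example, and breaking ties between upper and lower case by code point is an 
arbitrary assumption.

```python
# Illustration only: a hand-written sort key, not how Firebird or ICU collate.
# Standard 33-letter Latvian alphabet; each accented vowel is a separate
# letter that comes right after its unaccented counterpart.
LATVIAN_ORDER = "aābcčdeēfgģhiījkķlļmnņoprsštuūvzž"
RANK = {ch: i for i, ch in enumerate(LATVIAN_ORDER)}


def latvian_key(word):
    """Return a sort key that ranks letters by their alphabet position."""
    lowered = word.lower()
    # Characters outside the alphabet are placed after all Latvian letters.
    ranks = [RANK.get(ch, len(LATVIAN_ORDER) + ord(ch)) for ch in lowered]
    # The original string is an assumed tie-breaker for case variants.
    return (ranks, word)


words = ["ēna", "egle", "āda", "ezis", "ūdens", "aka", "īve", "uguns",
         "ila", "azote"]
ordered = sorted(words, key=latvian_key)
print(ordered)
```

With this key "azote" sorts before "āda", because every word starting with A 
precedes every word starting with Ā. A comparison that treats A and Ā as the 
same letter would instead place "āda" before "aka".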

Link to the Latvian alphabet on Wikipedia: 
http://lv.wikipedia.org/wiki/Latvie%C5%A1u_alfab%C4%93ts 


SQL statement for creating a collation with the lv_LV locale in a UTF8 database:
CREATE COLLATION test_lv_utf8
FOR UTF8
FROM UNICODE
CASE SENSITIVE
ACCENT SENSITIVE
'LOCALE=lv_LV;ICU-VERSION=4.8';

A script to reproduce the problem is attached. The script creates a table with 
two fields: "TEXT" (Latvian text) and "SORTIROVKA" (a text field holding the 
correct sort index). The script is saved in UTF-8 encoding. 

To reproduce the problem, use this query: 
select *
from TEST_LV_SORT tls
order by tls.text COLLATE test_lv_utf8;
