UTF8 Latvian collation is wrong for 4 letters: A E I U.
--------------------------------------------------------
Key: CORE-4548
URL: http://tracker.firebirdsql.org/browse/CORE-4548
Project: Firebird Core
Issue Type: Bug
Components: Charsets/Collation
Affects Versions: 2.5.3
Environment: any
Reporter: Edgars Solafs
This issue was already reported and fixed a couple of years ago for the
WIN1257_LV collation.
Here is the link to the old issue:
http://tracker.firebirdsql.org/si/jira.issueviews:issue-html/CORE-3131/CORE-3131.html
As the description in the previous report was very clear and detailed, I will
copy most of the text here and add a new .sql file for testing purposes.
We recently moved from WIN1257 (Latvian) to UTF8, and we need the UTF8
database to sort data by the Latvian alphabet the same way the WIN1257_LV
collation did.
So I tried creating a UTF8 collation with the lv_LV locale, but got an error
when using the ICU libraries that come with the Firebird installation. So I
downloaded a newer version (4.8), put the libraries in the Firebird library
directory and registered them in the fbintl.conf file. After that I was able to
create a collation with the LV locale in a UTF8 database, but quickly found out
that it has the same sorting problems that the WIN1257_LV collation used to
have.
Here is part of the description from the old issue, which explains the rules
of the Latvian alphabet.
In the Latvian alphabet, the letters A, E, I and U (and others) can carry
accents. According to the rules of the alphabet, each accented letter should
sort after its plain counterpart, but it does not. At present Firebird does
not distinguish them when sorting, and our clients are unhappy with that.
Currently it works this way:
A and Ā, a and ā - no difference in sorting
E and Ē, e and ē - no difference in sorting
I and Ī, i and ī - no difference in sorting
U and Ū, u and ū - no difference in sorting
Currently it works as described here:
http://www.collation-charts.org/firebird20/fb203.WIN1257.WIN1257_LV.html
Should be:
AĀ, aā
EĒ, eē
IĪ, iī
UŪ, uū
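
The requested order can be illustrated outside the database. The Python
sketch below is only an illustration of one reading of the rule above, not
Firebird or ICU code: it treats each accented vowel as a separate letter that
follows its plain counterpart. The names LATVIAN_ALPHABET and latvian_key and
the sample words are assumptions made for this example, and the input is
assumed to use precomposed (NFC) characters.

```python
# Latvian alphabet (33 letters) in dictionary order; each accented vowel
# directly follows its plain counterpart. Hypothetical helper, not an API.
LATVIAN_ALPHABET = "aābcčdeēfgģhiījkķlļmnņoprsštuūvzž"

# Map every letter to its position in the alphabet.
RANK = {ch: i for i, ch in enumerate(LATVIAN_ALPHABET)}


def latvian_key(word):
    """Build a sort key from the alphabet position of each letter.

    Case is ignored. Characters outside the alphabet are placed after all
    Latvian letters, ordered by code point (an assumption of this sketch).
    """
    return [RANK.get(ch, len(LATVIAN_ALPHABET) + ord(ch)) for ch in word.lower()]


# Sample words chosen only for illustration.
words = ["ēna", "eja", "zeme", "āda", "ūdens", "upe", "ala", "īss", "iela"]
expected = sorted(words, key=latvian_key)
print(expected)
```

With this key, "ala" sorts before "āda" and "eja" before "ēna", which is the
behaviour the report asks the lv_LV collation to provide.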
Link to the Latvian alphabet in Wikipedia:
http://lv.wikipedia.org/wiki/Latvie%C5%A1u_alfab%C4%93ts
SQL statement for creating a collation with the lv_LV locale in a UTF8 database:
CREATE COLLATION test_lv_utf8
FOR UTF8
FROM UNICODE
CASE SENSITIVE
ACCENT SENSITIVE
'LOCALE=lv_LV;ICU-VERSION=4.8';
A script to reproduce the problem is attached. The script creates a table with
2 fields: "TEXT" - Latvian text, "SORTIROVKA" - a text field with the correct
sort indexes. The script is saved in UTF-8 encoding.
To reproduce the problem, use this query:
select *
from TEST_LV_SORT tls
order by tls.text COLLATE test_lv_utf8;
--
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel