[Dspace-tech] Browse, UTF-8 and sorting in 1.5

Urban Andersson Thu, 26 Jun 2008 01:09:33 -0700

Hello all,

One obstacle that I see in 1.5 is that terms containing diacritics are 
sorted by base character in the browse indexes, i.e. the letter "Ö" 
(O umlaut) is treated like "O", etc. As a result, records are not 
sorted according to (national) standards (or the Postgres locale).


I notice that these characters are decomposed (diacritic stored after 
the base character) before the sort terms are written to the database, 
and this results in the behaviour described above.
I know that this has been discussed in the past, at the DSUG meeting in 
Rome for instance, but I cannot remember the exact reason why it was 
done this way (I do remember that there was an explanation, and I would 
be happy if someone could please repeat it).
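To illustrate the decomposition, here is a small Python sketch using only the standard `unicodedata` module. It is not DSpace's code and does not reproduce PostgreSQL's locale-aware collation; it only compares plain code-point sorting of decomposed (NFD) and composed (NFC) keys. The helper names `nfd` and `nfc` and the sample names are hypothetical.

```python
import unicodedata

def nfd(s: str) -> str:
    """Canonical decomposition: U+00D6 'Ö' becomes 'O' followed by U+0308 COMBINING DIAERESIS."""
    return unicodedata.normalize("NFD", s)

def nfc(s: str) -> str:
    """Canonical composition: base letter plus combining mark recombined into one code point."""
    return unicodedata.normalize("NFC", s)

# Escapes keep the example independent of how this file is saved.
O_UMLAUT = "\u00d6"  # Ö
words = ["Zander", O_UMLAUT + "sten", "Olsson", "Pettersson"]

# The decomposed form stores the diacritic after the base character.
print([hex(ord(c)) for c in nfd(O_UMLAUT)])  # ['0x4f', '0x308']

# A plain code-point sort of decomposed keys keeps 'Östen' among the O words.
by_nfd = sorted(words, key=nfd)  # ['Olsson', 'Östen', 'Pettersson', 'Zander']

# Composed keys put 'Östen' after 'Zander' here, but raw code-point order is
# still not Swedish order: Ä (U+00C4) sorts before Å (U+00C5), not after it.
by_nfc = sorted(words, key=nfc)  # ['Olsson', 'Pettersson', 'Zander', 'Östen']
```

So storing composed text alone would not be enough for Swedish ordering; the database collation (or an explicit sort key) still has to know that Å, Ä, Ö are separate letters after Z.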

Modifying the UTF-8 characters manually in bi_2_dis.sort_value et al. 
results in records being sorted correctly, but then the links from 
the indexes don't seem to work.

Has anyone else looked into this and is there perhaps a known, quick 
(and possibly dirty) solution to get the most rudimentary sorting to 
work properly?
Our 1.5 installation runs on Debian Linux, Postgres and Tomcat 6.
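One possible quick and dirty direction, sketched here under stated assumptions and not something DSpace 1.5 provides, is to build the sort value from composed text with an explicit Swedish letter order instead of from decomposed text. The Python sketch below shows only the key function. `swedish_sort_key`, `SWEDISH_ALPHABET` and `RANK` are hypothetical names, the alphabet table is deliberately simplified, and wiring anything like this into DSpace's Java browse code (while keeping the index links working) is not covered.

```python
import unicodedata

# Hypothetical helper, not part of DSpace: a simplified Swedish letter order
# with å, ä, ö as separate letters after z. Finer rules (e.g. v/w) are ignored.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"  # ... å ä ö
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_sort_key(term: str) -> tuple:
    """Return a case-insensitive sort key that keeps å/ä/ö distinct and after z."""
    # Compose first so 'O' + U+0308 is treated as the single letter 'Ö'.
    composed = unicodedata.normalize("NFC", term).casefold()
    key = []
    for ch in composed:
        if ch in RANK:
            key.append((1, RANK[ch]))
        else:
            # Anything not in the table (spaces, digits, punctuation, and other
            # letters such as é) sorts before the letters a-ö, by code point.
            key.append((0, ord(ch)))
    return tuple(key)

titles = ["\u00d6sten", "Olsson", "\u00c5sa", "Zander", "\u00c4rla", "Pettersson"]
ordered = sorted(titles, key=swedish_sort_key)
# ordered -> ['Olsson', 'Pettersson', 'Zander', 'Åsa', 'Ärla', 'Östen']
```

A hand-written table like this only covers the cases it lists; proper locale-aware collation (for example Postgres sorting under a Swedish locale, or Java's `java.text.Collator` for `sv_SE`) handles many more.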


/ Urban Andersson

-- 
Urban Andersson
Digitala biblioteket / Digital library
Göteborgs universitetsbibliotek / Göteborg University Library
Box 222, SE 405 30 Göteborg, SWEDEN
Tel: +46 (0)31 7866185
[EMAIL PROTECTED] 



_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
