Re: [GENERAL] collation & UTF-8
Tomi NA <[EMAIL PROTECTED]> writes:
> You were right about this:
> LC_ALL=hr_HR.UTF-8 sort < test.txt
> (seemingly) collates the same way that pgsql does. Accented letters at the
> end of the alphabet. I've tried hr_HR.UTF8 as well, without results.

If you're not sure what locales are available on your system, run "locale -a". I don't think "sort" will complain about an unknown locale setting, it'll probably just fall back to "C" locale.

regards, tom lane
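Since "sort" may silently fall back when a locale is unknown, it can help to confirm that the locale is installed before reading anything into the sort order. The following is a minimal sketch, not a definitive check: locale_available is a hypothetical helper name, and the case-insensitive, hyphen-insensitive matching is an assumption made because "locale -a" often prints the codeset as "utf8" rather than "UTF-8".

```shell
# locale_available NAME: succeed if NAME appears in `locale -a`.
# Hypothetical helper; names are compared in lower case with hyphens
# removed, so "hr_HR.UTF-8" and "hr_HR.utf8" are treated as the same.
locale_available() {
    want=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/-//g' \
        | grep -qxF "$want"
}

if locale_available hr_HR.UTF-8; then
    echo "hr_HR.UTF-8 is installed"
else
    echo "hr_HR.UTF-8 is not installed; sort would not use Croatian collation"
fi
```

If the second message is printed, the locale has to be generated or installed through the operating system first; how that is done depends on the distribution.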
Re: [GENERAL] collation & UTF-8
On 2/24/06, Martijn van Oosterhout wrote:
> On Fri, Feb 24, 2006 at 06:23:07PM +0100, Tomi NA wrote:
> > I'm using PostgreSQL 8.1.2 on Linux and want to load UTF-8 encoded varchars.
> > While I can store and get at stored text correctly, the ORDER BY places all
> > accented characters (Croatian, in this case - probably marked hr_HR) after
> > non-accented characters.
> > This is no showstopper, but it does affect the general perception of
> > application quality.
>
> Collation is a function of the OS. Basically, is the locale of your
> database set up for UTF-8 collation? It would probably be called
> hr_HR.UTF-8.

You were right about this:
LC_ALL=hr_HR.UTF-8 sort < test.txt
(seemingly) collates the same way that pgsql does. Accented letters at the end of the alphabet. I've tried hr_HR.UTF8 as well, without results. Btw, my database is created with
CREATE DATABASE mydb WITH OWNER = postgres ENCODING = 'UTF8' TABLESPACE = pg_default;

> Yes, set up the locale correctly. In general, postgresql should give the
> same results as sort(1) on the command-line. Use that to experiment.
>
> LC_ALL=hr_HR.UTF-8 sort < input > output

I'm very sorry to report it does not work. :( Btw,
set | grep LC_
returns nothing... is this a possible source of the problem?

Tomislav
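On the "set | grep LC_" question: an empty result only shows that no LC_* variable is set in that shell. LANG also selects the locale and would not match that pattern. A minimal sketch of the usual POSIX precedence (LC_ALL, then LC_COLLATE, then LANG) follows; effective_collate is a hypothetical helper name, and the final fallback to "C" is an assumption about the system default, which POSIX leaves to the implementation.

```shell
# effective_collate: print the locale name that governs collation in the
# current environment. Hypothetical helper, not a standard command.
# Precedence: LC_ALL overrides LC_COLLATE, which overrides LANG.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Nothing is set: assume the system default is the "C" locale.
        printf 'C\n'
    fi
}

effective_collate
```

This describes the shell environment only. In PostgreSQL 8.1 the collation locale of the server is chosen when the database cluster is created with initdb, so the variables in an interactive shell need not match what the server uses.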
Re: [GENERAL] collation & UTF-8
On Fri, Feb 24, 2006 at 06:23:07PM +0100, Tomi NA wrote:
> I'm using PostgreSQL 8.1.2 on Linux and want to load UTF-8 encoded varchars.
> While I can store and get at stored text correctly, the ORDER BY places all
> accented characters (Croatian, in this case - probably marked hr_HR) after
> non-accented characters.
> This is no showstopper, but it does affect the general perception of
> application quality.

Collation is a function of the OS. Basically, is the locale of your database set up for UTF-8 collation? It would probably be called hr_HR.UTF-8.

> is there an official way to set up UTF8 collation so that "SELECT first_name
> FROM persons ORDER BY first_name" works as expected?

Yes, set up the locale correctly. In general, postgresql should give the same results as sort(1) on the command-line. Use that to experiment.

LC_ALL=hr_HR.UTF-8 sort < input > output

Hope this helps,
--
Martijn van Oosterhout http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
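To make the sort(1) experiment concrete, here is a small sketch that sorts the same input under the "C" locale and under hr_HR.UTF-8. The three names are made-up sample data and names is a hypothetical helper. The second command is only meaningful if hr_HR.UTF-8 is actually installed on the system.

```shell
# Made-up sample data: "Čedo" starts with a non-ASCII letter.
names() {
    printf '%s\n' Zoran Čedo Ana
}

# "C" locale: plain byte order, so the multi-byte UTF-8 letter sorts
# after every ASCII letter. Prints Ana, Zoran, Čedo.
names | LC_ALL=C sort

# Croatian locale: if it is installed, "Č" should sort near "C", before "Z".
# If it is not installed, expect the same byte order as above.
names | LC_ALL=hr_HR.UTF-8 sort
```

If both commands print the same order, with the accented name last, that matches the symptom described in this thread and suggests the locale is not being applied.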
[GENERAL] collation & UTF-8
I'm using PostgreSQL 8.1.2 on Linux and want to load UTF-8 encoded varchars. While I can store and get at stored text correctly, the ORDER BY places all accented characters (Croatian, in this case - probably marked hr_HR) after non-accented characters. This is no showstopper, but it does affect the general perception of application quality.

Now, I've seen the issue mentioned in a number of places, but often with fairly old versions of pgsql (<8.0), in different circumstances etc., so my question is: is there an official way to set up UTF8 collation so that "SELECT first_name FROM persons ORDER BY first_name" works as expected?

TIA,
Tomislav