Re: [GENERAL] Locale, encoding, sort order confusion

2006-08-21 Thread John Gunther




Alvaro Herrera wrote:
> > For example, with all LC_* parameters set to "en_US.UTF8", I get the
> > following incorrect "order:by":
> >
> > Béarn
> > Bécancour
> > Beaupré
>
> Did you initdb with locale en_US.UTF8, and also createdb with encoding
> UTF8?  While you can certainly choose mismatching values in createdb and
> initdb, you shouldn't because it doesn't work.

That's an interesting question. Are the LC_* variables set by initdb or
createdb? In other words, does their value indicate what initdb
settings I used? If I do a default createdb, will the new database
automatically be consistent with the cluster's initdb?




Re: [GENERAL] Locale, encoding, sort order confusion

2006-08-20 Thread Alvaro Herrera
John Gunther wrote:
> >Pretty much any locale (say, en_US for you) with a matching character 
> >set should work.  Unless you go out of your way, this should be the 
> >default setting.
> 
> >>I've tried a half dozen time-consuming configs without 
> >>success.
> 
> >Like what?
> 
> For example, with all LC_* parameters set to "en_US.UTF8", I get the 
> following incorrect "order:by":
> 
> Béarn
> Bécancour
> Beaupré

Did you initdb with locale en_US.UTF8, and also createdb with encoding
UTF8?  While you can certainly choose mismatching values in createdb and
initdb, you shouldn't because it doesn't work.  See the docs here:

http://www.postgresql.org/docs/8.1/static/multibyte.html

Important:  Although you can specify any encoding you want for a
database, it is unwise to choose an encoding that is not what is
expected by the locale you have selected. The LC_COLLATE and
LC_CTYPE settings imply a particular encoding, and locale-dependent
operations (such as sorting) are likely to misinterpret data that is
in an incompatible encoding.

Since these locale settings are frozen by initdb, the apparent
flexibility to use different encodings in different databases of a
cluster is more theoretical than real. It is likely that these
mechanisms will be revisited in future versions of PostgreSQL. 
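
To make the "misinterpret data" warning concrete, here is a minimal sketch in Python. It uses only Python's built-in codecs, not PostgreSQL or the C library locale routines, so it shows the general effect of reading bytes under the wrong encoding rather than what the server does internally. The sample string is taken from the thread; the variable names are made up for illustration.

```python
text = "Béarn"

utf8_bytes = text.encode("utf-8")      # 6 bytes: the accented letter takes two
latin1_bytes = text.encode("latin-1")  # 5 bytes: one byte per character

# UTF-8 bytes read as if they were Latin-1: one character turns into two,
# so any comparison or length calculation sees different text.
misread = utf8_bytes.decode("latin-1")
print(misread, len(text), len(misread))

# Latin-1 bytes read as if they were UTF-8: not a valid byte sequence at all.
try:
    latin1_bytes.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False
print(valid_as_utf8)
```

Either way, code that assumes one encoding while the stored bytes are in another is no longer comparing the characters that were typed in.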


> Incidentally, in psql via a putty.exe session, the only character set 
> translation I can find that displays the accented characters is CP437. Does 
> this seem right?

You should probably set client_encoding in the psql session (using
\encoding) if you want to change the charset in putty.
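
One way to reason about the terminal question: the same character is a different byte sequence in each character set, so the translation setting that displays it correctly is a hint about which encoding the bytes reaching the terminal are in. A small sketch, again using only Python's codecs (it does not talk to psql or PuTTY, and `displays_correctly` is a hypothetical helper written for this example):

```python
ch = "é"
encodings = ["utf-8", "latin-1", "cp437"]

# The byte sequence for the same character under each encoding.
encoded = {enc: ch.encode(enc) for enc in encodings}
for enc, raw in encoded.items():
    print(enc, raw.hex())

def displays_correctly(raw, terminal_encoding):
    """Hypothetical helper: would a terminal assuming terminal_encoding
    show these bytes as the intended character?"""
    try:
        return raw.decode(terminal_encoding) == ch
    except UnicodeDecodeError:
        return False

# Only the matching pair renders the character as intended.
for sent in encodings:
    for shown in encodings:
        print(sent, shown, displays_correctly(encoded[sent], shown))
```

For this character, the bytes render correctly only when the terminal's assumption matches the encoding they were produced in.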

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Locale, encoding, sort order confusion

2006-08-20 Thread John Gunther
> Pretty much any locale (say, en_US for you) with a matching character
> set should work.  Unless you go out of your way, this should be the
> default setting.

> > I've tried a half dozen time-consuming configs without
> > success.

> Like what?


For example, with all LC_* parameters set to "en_US.UTF8", I get the
following incorrect "order:by":

Béarn
Bécancour
Beaupré

Bécancour should be last.
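
For reference, a minimal Python sketch of the order expected here. It does not call PostgreSQL or the system locale. `primary_key` is a hypothetical helper that only approximates the first comparison level of a locale such as en_US by stripping accents, so it is an illustration under that assumption, not the server's actual collation.

```python
import unicodedata

names = ["Béarn", "Bécancour", "Beaupré"]

def primary_key(s):
    """Hypothetical helper: approximate an accent-insensitive comparison
    by decomposing characters and dropping the combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # The original string is kept as a tie-breaker so the sort is deterministic.
    return (base.lower(), s)

# Accent-insensitive order: compares "bearn", "beaupre", "becancour".
accent_insensitive_order = sorted(names, key=primary_key)
print(accent_insensitive_order)   # ['Béarn', 'Beaupré', 'Bécancour']

# Plain code point order, for comparison: unaccented "e" sorts before "é".
codepoint_order = sorted(names)
print(codepoint_order)            # ['Beaupré', 'Béarn', 'Bécancour']
```

The order listed above matches neither of these two results.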

Incidentally, in psql via a putty.exe session, the only character set 
translation I can find that displays the accented characters is CP437. Does 
this seem right?




Re: [GENERAL] Locale, encoding, sort order confusion

2006-08-20 Thread Peter Eisentraut
John Gunther wrote:
> Can someone tell me what combination of PostgreSQL and Linux settings
> I need for this? Or point me somewhere that it's well explained. It
> seems like a very basic question, but I'm just dense, I guess.

Pretty much any locale (say, en_US for you) with a matching character 
set should work.  Unless you go out of your way, this should be the 
default setting.

> I've tried a half dozen time-consuming configs without 
> success.

Like what?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
