Re: [HACKERS] Locale + encoding combinations

Trevor Talbot Fri, 12 Oct 2007 06:08:03 -0700

On 10/12/07, Dave Page <[EMAIL PROTECTED]> wrote:
> Tom Lane wrote
> > That still leaves us with the problem of how to tell whether a locale
> > spec is bad on Windows.  Judging by your example, Windows checks whether
> > the code page is present but not whether it is sane for the base locale.
> > What happens when there's a mismatch --- eg, what encoding do system
> > messages come out in?
>
> I'm not sure how to test that specifically, but it seems that accented
> characters simply fall back to their undecorated equivalents if the
> encoding is not appropriate, eg:
>
> [EMAIL PROTECTED]:~$ ./setlc French_France.1252
> Locale: French_France.1252
> The date is: sam. 01 of août  2007
> [EMAIL PROTECTED]:~$ ./setlc French_France.28597
> Locale: French_France.28597
> The date is: sam. 01 of aout  2007
>
> (the encodings used there are WIN1252 and ISO8859-7 (Greek)).
>
> I'm happy to test further if you can suggest how I can figure out the
> encoding actually output.


The output encoding is the one you specified.  Keep in mind that
Windows mostly works with Unicode underneath, so every character
exists there and the locale rules define its behavior.  The encoding
is just the byte stream Windows has to force the result into
afterwards.  As you've seen, it uses some sort of best-fit mapping
whose details I don't know.  (By default it drops accent marks and
chooses characters with a similar shape where possible.)
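
To make the fallback concrete, here is a rough sketch in Python of what a best-fit style conversion might look like.  It is only an approximation built on Unicode decomposition, not the table Windows actually uses, and the function name best_fit_encode is made up for illustration.  The codec names cp1252 and iso8859_7 are Python's names for the two code pages in Dave's test.

```python
import unicodedata


def best_fit_encode(text: str, codec: str) -> bytes:
    """Encode text, approximating characters the codec cannot represent.

    Assumption: this only mimics the visible effect (accents dropped);
    it is not the real Windows best-fit mapping.
    """
    out = bytearray()
    for ch in text:
        try:
            # The character exists in the target code page: use it as-is.
            out += ch.encode(codec)
            continue
        except UnicodeEncodeError:
            pass
        # Decompose (e.g. u-circumflex -> u + combining circumflex)
        # and drop the combining marks, keeping the base letter.
        base = "".join(
            c
            for c in unicodedata.normalize("NFD", ch)
            if not unicodedata.combining(c)
        )
        try:
            out += base.encode(codec)
        except UnicodeEncodeError:
            # Nothing similar is available: substitute a placeholder.
            out += b"?"
    return bytes(out)


if __name__ == "__main__":
    month = "ao\u00fbt"  # the French month name from the example above
    for codec in ("cp1252", "iso8859_7"):
        print(codec, best_fit_encode(month, codec))
```

With cp1252 the accented letter survives as the single byte 0xFB; with iso8859_7 it has no slot, so the sketch falls back to a plain "u", matching the "aout" in the second run.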

I think it's a bit more complex for input/transform cases where you
operate on the byte stream directly, without an intermediate conversion
to Unicode, which is why UTF-8 doesn't work as a codepage, but again I
don't have the details at hand.  I can do more digging if needed.

