On 10/12/07, Dave Page <[EMAIL PROTECTED]> wrote:
> Tom Lane wrote
> > That still leaves us with the problem of how to tell whether a locale
> > spec is bad on Windows. Judging by your example, Windows checks whether
> > the code page is present but not whether it is sane for the base locale.
> > What happens when there's a mismatch --- eg, what encoding do system
> > messages come out in?
>
> I'm not sure how to test that specifically, but it seems that accented
> characters simply fall back to their undecorated equivalents if the
> encoding is not appropriate, eg:
>
> [EMAIL PROTECTED]:~$ ./setlc French_France.1252
> Locale: French_France.1252
> The date is: sam. 01 of août 2007
> [EMAIL PROTECTED]:~$ ./setlc French_France.28597
> Locale: French_France.28597
> The date is: sam. 01 of aout 2007
>
> (the encodings used there are WIN1252 and ISO8859-7 (Greek)).
>
> I'm happy to test further is you can suggest how I can figure out the
> encoding actually output.

The encoding output is the one you specified. Keep in mind that underneath, Windows works mostly in Unicode, so all characters exist and the locale rules specify their behavior there. The encoding is just the byte stream it has to force them into after doing whatever it does to them. As you've seen, it uses some sort of best-fit mapping whose details I don't know. (By default it drops accent marks and chooses characters with a similar shape where possible.)

I think it's a bit more complex for input/transform cases, where you operate on the byte stream directly without an intermediate conversion to Unicode, which is why UTF-8 doesn't work as a codepage, but again I don't have the details nearby. I can do more digging if needed.
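
To illustrate the kind of fallback described above, here is a rough Python sketch. It is only an approximation, not Windows' actual best-fit tables: the helper name approx_best_fit is made up for this example, and it assumes that "dropping accent marks" can be imitated by decomposing a character and discarding its combining marks.

```python
import unicodedata


def approx_best_fit(text: str, encoding: str) -> bytes:
    """Encode text, degrading characters the target encoding cannot hold.

    Hypothetical helper: imitates the accent-dropping fallback only,
    not the similar-shape substitutions Windows also performs.
    """
    out = bytearray()
    for ch in text:
        try:
            # Character exists in the target code page: emit it unchanged.
            out += ch.encode(encoding)
            continue
        except UnicodeEncodeError:
            pass
        # Decompose (e.g. û becomes u + combining circumflex) and
        # keep only the non-combining base characters.
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        try:
            out += base.encode(encoding)
        except UnicodeEncodeError:
            # No usable fallback: substitute a placeholder.
            out += b"?"
    return bytes(out)


if __name__ == "__main__":
    # cp1252 and iso8859_7 are Python's names for WIN1252 and ISO8859-7.
    print(approx_best_fit("août", "cp1252"))
    print(approx_best_fit("août", "iso8859_7"))
```

With the two encodings from Dave's test, the first call keeps the accented character and the second falls back to a plain "u", which matches the "août" versus "aout" output quoted above.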