Re: [HACKERS] could not adopt C locale failure at startup on Windows
Noah Misch n...@leadboat.com writes: On Wed, Jun 17, 2015 at 01:43:55PM -0400, Tom Lane wrote: I'm not exactly sure that they wouldn't be garbled anyway during the interval where we're setting these things up. Until DatabaseEncoding, ClientEncoding, and gettext's internal notions are all in sync, we are at risk of that type of issue no matter what. True; the window exists and is small enough both ways. This is merely one more reason to fix the bug without fixing what ain't broke. [ shrug... ] I'm not planning to waste a whole lot more breath on this topic, but to my mind the current definition of pg_perm_setlocale() *is* broke. Previously, that function only interacted with the standard libc locale facilities. Now it's also entangled with gettext(), as well as SetMessageEncoding and GetDatabaseEncoding. IMO that extra cross-module entanglement is the exact reason we have a bug here. It's a layering violation and I think we should get rid of it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] could not adopt C locale failure at startup on Windows
On Wed, Jun 17, 2015 at 01:43:55PM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Mon, Jun 15, 2015 at 12:37:43PM -0400, Tom Lane wrote: It's mere chance that the order of calls to pg_perm_setlocale() is such that the code works now; and it's not hard to imagine future changes in gettext, or reordering of our own code, that would break it. pg_bind_textdomain_codeset() looks at one piece of the locale environment, namely setlocale(LC_CTYPE, NULL), so the order of pg_perm_setlocale() calls does not affect it. Well, my point is that that is a larger assumption about the behavior of pg_bind_textdomain_codeset() than I think we ought to make, even if it happens to be true today. Perhaps it's just me, but I can envision changes of similar plausibility that break under each approach and not the other. Without a way to distinguish on that basis, I'm left shrugging about a proposal to switch. For that matter, if pg_bind_textdomain_codeset() starts to act on other facets of the locale, that's liable to be a bug independent of our choice here. However locale facets conflict, we expect LC_CTYPE to control the message codeset. There's nothing acutely bad about the alternative you identify here, but it's no better equipped to forestall mistakes. Moving the call later would slightly expand the window during which translated messages are garbled. I'm not exactly sure that they wouldn't be garbled anyway during the interval where we're setting these things up. Until DatabaseEncoding, ClientEncoding, and gettext's internal notions are all in sync, we are at risk of that type of issue no matter what. True; the window exists and is small enough both ways. This is merely one more reason to fix the bug without fixing what ain't broke. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] could not adopt C locale failure at startup on Windows
Noah Misch n...@leadboat.com writes: On Mon, Jun 15, 2015 at 12:37:43PM -0400, Tom Lane wrote: It's mere chance that the order of calls to pg_perm_setlocale() is such that the code works now; and it's not hard to imagine future changes in gettext, or reordering of our own code, that would break it. pg_bind_textdomain_codeset() looks at one piece of the locale environment, namely setlocale(LC_CTYPE, NULL), so the order of pg_perm_setlocale() calls does not affect it. Well, my point is that that is a larger assumption about the behavior of pg_bind_textdomain_codeset() than I think we ought to make, even if it happens to be true today. There's nothing acutely bad about the alternative you identify here, but it's no better equipped to forestall mistakes. Moving the call later would slightly expand the window during which translated messages are garbled. I'm not exactly sure that they wouldn't be garbled anyway during the interval where we're setting these things up. Until DatabaseEncoding, ClientEncoding, and gettext's internal notions are all in sync, we are at risk of that type of issue no matter what. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] could not adopt C locale failure at startup on Windows
On Mon, Jun 15, 2015 at 12:37:43PM -0400, Tom Lane wrote: After further thought, ISTM that this bug is evidence that 5f538ad was badly designed, and the proposed fix has blinkers on. If pg_bind_textdomain_codeset() is looking at the locale environment, we should not be calling it from inside pg_perm_setlocale() at all, but from higher level code *after* we're done setting up the whole libc locale environment --- thus, after the unsetenv(LC_ALL) call in main.c, and somewhere near the bottom of CheckMyDatabase() in postinit.c. It's mere chance that the order of calls to pg_perm_setlocale() is such that the code works now; and it's not hard to imagine future changes in gettext, or reordering of our own code, that would break it. pg_bind_textdomain_codeset() looks at one piece of the locale environment, namely setlocale(LC_CTYPE, NULL), so the order of pg_perm_setlocale() calls does not affect it. There's nothing acutely bad about the alternative you identify here, but it's no better equipped to forestall mistakes. Moving the call later would slightly expand the window during which translated messages are garbled. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] could not adopt C locale failure at startup on Windows
On Mon, Jun 15, 2015 at 08:47:12AM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: While Windows was the bellwether, harm potential is greater on non-Windows systems. pg_perm_setlocale() sets the LC_CTYPE environment variable to help PL/Perl avoid clobbering the process locale; see plperl_init_interp() comments. However, that function has bespoke code for Windows, on which setting the environment variable doesn't help. I don't know which other platforms invalidate previous setlocale() return values on setlocale(LC_CTYPE, NULL). Therefore, I propose committing the attached diagnostic patch and reverting it after about one buildfarm cycle. It will make affected configurations fail hard, and then I'll have a notion about the prevalence of damage to expect in the field. I doubt this will teach us anything; if any buildfarm systems were exhibiting the issue, they'd have been failing all along, no? No; most systems let environment variables carry arbitrary strings of non-nul bytes, so they don't see $SUBJECT. I want to probe for all systems that are currently issuing putenv(LC_CTYPE=garbage), not just the ones where a picky putenv() illuminates it. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] could not adopt C locale failure at startup on Windows
Noah Misch n...@leadboat.com writes: On Wed, Jun 10, 2015 at 10:09:38AM -0400, Tom Lane wrote: Hm. I could understand getting encoding difficulties in that environment, but it's hard to see why they'd manifest like this. Can you trace through pg_perm_setlocale and figure out why it's reporting failure? A faster test is to set LC_CTYPE=C in the environment and run postgres --version. The root cause is a bug my commit 5f538ad introduced at the start of the 9.4 cycle. pg_perm_setlocale() now calls pg_bind_textdomain_codeset(), which calls setlocale(LC_CTYPE, NULL). POSIX permits that to clobber all previous setlocale() return values, which it did here[1]. Ah-hah. While Windows was the bellwether, harm potential is greater on non-Windows systems. pg_perm_setlocale() sets the LC_CTYPE environment variable to help PL/Perl avoid clobbering the process locale; see plperl_init_interp() comments. However, that function has bespoke code for Windows, on which setting the environment variable doesn't help. I don't know which other platforms invalidate previous setlocale() return values on setlocale(LC_CTYPE, NULL). Therefore, I propose committing the attached diagnostic patch and reverting it after about one buildfarm cycle. It will make affected configurations fail hard, and then I'll have a notion about the prevalence of damage to expect in the field. I doubt this will teach us anything; if any buildfarm systems were exhibiting the issue, they'd have been failing all along, no? This should break at least the bootstrap/initdb case on any affected system. The actual fix is trivial, attached second. This is for back-patch to 9.4. Looks sane to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] could not adopt C locale failure at startup on Windows
Noah Misch n...@leadboat.com writes: A faster test is to set LC_CTYPE=C in the environment and run postgres --version. The root cause is a bug my commit 5f538ad introduced at the start of the 9.4 cycle. pg_perm_setlocale() now calls pg_bind_textdomain_codeset(), which calls setlocale(LC_CTYPE, NULL). POSIX permits that to clobber all previous setlocale() return values, which it did here[1]. After further thought, ISTM that this bug is evidence that 5f538ad was badly designed, and the proposed fix has blinkers on. If pg_bind_textdomain_codeset() is looking at the locale environment, we should not be calling it from inside pg_perm_setlocale() at all, but from higher level code *after* we're done setting up the whole libc locale environment --- thus, after the unsetenv(LC_ALL) call in main.c, and somewhere near the bottom of CheckMyDatabase() in postinit.c. It's mere chance that the order of calls to pg_perm_setlocale() is such that the code works now; and it's not hard to imagine future changes in gettext, or reordering of our own code, that would break it. So I think having to duplicate that call is a reasonable price to pay for not having surprising entanglements in what should be a very thin wrapper around setlocale(3). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] could not adopt C locale failure at startup on Windows
On Wed, Jun 10, 2015 at 10:09:38AM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: I can reproduce this with initdb --locale=C, postgresql-9.4.3-1-windows-binaries.zip (32-bit), Windows 7 x64, and the Windows ANSI code page set to CP936. (Choose Chinese (Simplified, PRC) in Control Panel - Region and Language - Administrative - Language for non-Unicode programs.) It is neither necessary nor sufficient to change Control Panel - Region and Language - Formats - Format. Binaries from postgresql-9.4.3-1-windows-x64-binaries.zip do not exhibit the problem. Note that CP936 is a PG_ENCODING_IS_CLIENT_ONLY() encoding. Hm. I could understand getting encoding difficulties in that environment, but it's hard to see why they'd manifest like this. Can you trace through pg_perm_setlocale and figure out why it's reporting failure? A faster test is to set LC_CTYPE=C in the environment and run postgres --version. The root cause is a bug my commit 5f538ad introduced at the start of the 9.4 cycle. pg_perm_setlocale() now calls pg_bind_textdomain_codeset(), which calls setlocale(LC_CTYPE, NULL). POSIX permits that to clobber all previous setlocale() return values, which it did here[1]. The ensuing putenv(LC_CTYPE=garbage bytes) at the end of pg_perm_setlocale() fails under Windows ANSI code page 936, because the garbage bytes often aren't a valid CP936 string. I would expect the same symptom on other multibyte Windows locales. While Windows was the bellwether, harm potential is greater on non-Windows systems. pg_perm_setlocale() sets the LC_CTYPE environment variable to help PL/Perl avoid clobbering the process locale; see plperl_init_interp() comments. However, that function has bespoke code for Windows, on which setting the environment variable doesn't help. I don't know which other platforms invalidate previous setlocale() return values on setlocale(LC_CTYPE, NULL). Therefore, I propose committing the attached diagnostic patch and reverting it after about one buildfarm cycle. It will make affected configurations fail hard, and then I'll have a notion about the prevalence of damage to expect in the field. The actual fix is trivial, attached second. This is for back-patch to 9.4. [1] It does so in 32-bit release (non-debug), NLS builds done under Visual Studio 2012 and Visual Studio 2013. The official binaries used VS2013. The symptoms are slightly different under VS2012. I did not test earlier versions. Debug builds and 64-bit builds were unaffected. diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 4be735e..d33081b 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -58,6 +58,7 @@ #include catalog/pg_collation.h #include catalog/pg_control.h #include mb/pg_wchar.h +#include utils/builtins.h #include utils/hsearch.h #include utils/memutils.h #include utils/pg_locale.h @@ -148,6 +149,7 @@ pg_perm_setlocale(int category, const char *locale) char *result; const char *envvar; char *envbuf; + charorig_result[LC_ENV_BUFSIZE]; #ifndef WIN32 result = setlocale(category, locale); @@ -173,6 +175,7 @@ pg_perm_setlocale(int category, const char *locale) if (result == NULL) return result; /* fall out immediately on failure */ + strlcpy(orig_result, result, sizeof(orig_result)); /* * Use the right encoding in translated messages. Under ENABLE_NLS, let @@ -231,6 +234,15 @@ pg_perm_setlocale(int category, const char *locale) } snprintf(envbuf, LC_ENV_BUFSIZE - 1, %s=%s, envvar, result); + if (strcmp(orig_result, result) != 0) + { + charhex[2 * LC_ENV_BUFSIZE + 1]; + + hex_encode(result, Min(1 + strlen(result), LC_ENV_BUFSIZE), hex); + hex[sizeof(hex) - 1] = '\0'; + elog(FATAL, setlocale() result %s clobbered to 0x%s, +orig_result, hex); + } if (putenv(envbuf)) return NULL; diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 4be735e..84215e0 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -183,6 +183,12 @@ pg_perm_setlocale(int category, const char *locale) */ if (category == LC_CTYPE) { + static char save_lc_ctype[LC_ENV_BUFSIZE]; + + /* copy setlocale() return value before callee invokes it again */ + strlcpy(save_lc_ctype, result, sizeof(save_lc_ctype)); + result = save_lc_ctype; + #ifdef ENABLE_NLS SetMessageEncoding(pg_bind_textdomain_codeset(textdomain(NULL))); #else -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] could not adopt C locale failure at startup on Windows
Noah Misch n...@leadboat.com writes: On Tue, Jun 09, 2015 at 12:24:02PM -0400, Tom Lane wrote: Yeah, my first instinct was to blame ca325941 as well, but I don't think any of that code executes during init_locale(). Also, http://www.postgresql.org/message-id/20150326040321.2492.24...@wrigleys.postgresql.org says it's been seen in 9.4.1. The return value check and error message were new in 9.4.1. I suspect the underlying problem has been present far longer, undetected. Oooh ... I'd forgotten that 6fdba8ceb was so recent. Agreed, what we are seeing is probably a situation that's been there for a long time, but we were ignoring the failure up to now (and, evidently, it wasn't really creating any problems). I can reproduce this with initdb --locale=C, postgresql-9.4.3-1-windows-binaries.zip (32-bit), Windows 7 x64, and the Windows ANSI code page set to CP936. (Choose Chinese (Simplified, PRC) in Control Panel - Region and Language - Administrative - Language for non-Unicode programs.) It is neither necessary nor sufficient to change Control Panel - Region and Language - Formats - Format. Binaries from postgresql-9.4.3-1-windows-x64-binaries.zip do not exhibit the problem. Note that CP936 is a PG_ENCODING_IS_CLIENT_ONLY() encoding. Hm. I could understand getting encoding difficulties in that environment, but it's hard to see why they'd manifest like this. Can you trace through pg_perm_setlocale and figure out why it's reporting failure? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] could not adopt C locale failure at startup on Windows
On 2015-06-09 11:20:06 -0400, Tom Lane wrote: We've seen several reports of $SUBJECT lately. I have no idea what's going on, but it occurred to me that it could be informative to tweak init_locale() so that it reports which category failed to be set. Any objections to squeezing that into today's releases? No objection, +1. Seems fairly low risk. The error seems odd. The only even remotely related change between 9.4.1 and .2 seems to be ca325941. Could also be a build environment change. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] could not adopt C locale failure at startup on Windows
Andres Freund and...@anarazel.de writes: On 2015-06-09 11:20:06 -0400, Tom Lane wrote: We've seen several reports of $SUBJECT lately. I have no idea what's going on, but it occurred to me that it could be informative to tweak init_locale() so that it reports which category failed to be set. Any objections to squeezing that into today's releases? No objection, +1. Seems fairly low risk. The error seems odd. The only even remotely related change between 9.4.1 and .2 seems to be ca325941. Could also be a build environment change. Yeah, my first instinct was to blame ca325941 as well, but I don't think any of that code executes during init_locale(). Also, http://www.postgresql.org/message-id/20150326040321.2492.24...@wrigleys.postgresql.org says it's been seen in 9.4.1. Of course, it might be premature to assume that report had an identical cause to the later ones. What I plan to do is this: static void init_locale(const char *categoryname, int category, const char *locale) { if (pg_perm_setlocale(category, locale) == NULL pg_perm_setlocale(category, C) == NULL) elog(FATAL, could not adopt either \%s\ locale or C locale for %s, locale, categoryname); } with the obvious changes to the call sites to pass the string names of the categories not just their numeric codes. (We could just log the numbers, but it'd be much harder to interpret.) This might be overkill, but when you don't know what you're looking for, every little bit helps ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] could not adopt C locale failure at startup on Windows
On Tue, Jun 09, 2015 at 12:24:02PM -0400, Tom Lane wrote: Andres Freund and...@anarazel.de writes: The error seems odd. The only even remotely related change between 9.4.1 and .2 seems to be ca325941. Could also be a build environment change. Yeah, my first instinct was to blame ca325941 as well, but I don't think any of that code executes during init_locale(). Also, http://www.postgresql.org/message-id/20150326040321.2492.24...@wrigleys.postgresql.org says it's been seen in 9.4.1. The return value check and error message were new in 9.4.1. I suspect the underlying problem has been present far longer, undetected. I can reproduce this with initdb --locale=C, postgresql-9.4.3-1-windows-binaries.zip (32-bit), Windows 7 x64, and the Windows ANSI code page set to CP936. (Choose Chinese (Simplified, PRC) in Control Panel - Region and Language - Administrative - Language for non-Unicode programs.) It is neither necessary nor sufficient to change Control Panel - Region and Language - Formats - Format. Binaries from postgresql-9.4.3-1-windows-x64-binaries.zip do not exhibit the problem. Note that CP936 is a PG_ENCODING_IS_CLIENT_ONLY() encoding. What I plan to do is this: static void init_locale(const char *categoryname, int category, const char *locale) { if (pg_perm_setlocale(category, locale) == NULL pg_perm_setlocale(category, C) == NULL) elog(FATAL, could not adopt either \%s\ locale or C locale for %s, locale, categoryname); } Good to have. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers