Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-18 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Wed, Jun 17, 2015 at 01:43:55PM -0400, Tom Lane wrote:
 I'm not exactly sure that they wouldn't be garbled anyway during the
 interval where we're setting these things up.  Until DatabaseEncoding,
 ClientEncoding, and gettext's internal notions are all in sync, we
 are at risk of that type of issue no matter what.

 True; the window exists and is small enough both ways.  This is merely one
 more reason to fix the bug without fixing what ain't broke.

[ shrug... ]  I'm not planning to waste a whole lot more breath on this
topic, but to my mind the current definition of pg_perm_setlocale() *is*
broke.  Previously, that function only interacted with the standard libc
locale facilities.  Now it's also entangled with gettext(), as well as
SetMessageEncoding and GetDatabaseEncoding.  IMO that extra cross-module
entanglement is the exact reason we have a bug here.  It's a layering
violation and I think we should get rid of it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-17 Thread Noah Misch
On Wed, Jun 17, 2015 at 01:43:55PM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  On Mon, Jun 15, 2015 at 12:37:43PM -0400, Tom Lane wrote:
  It's mere chance that the order of calls to pg_perm_setlocale() is
  such that the code works now; and it's not hard to imagine future
  changes in gettext, or reordering of our own code, that would break it.
 
  pg_bind_textdomain_codeset() looks at one piece of the locale environment,
  namely setlocale(LC_CTYPE, NULL), so the order of pg_perm_setlocale() calls
  does not affect it.
 
 Well, my point is that that is a larger assumption about the behavior of
 pg_bind_textdomain_codeset() than I think we ought to make, even if it
 happens to be true today.

Perhaps it's just me, but I can envision changes of similar plausibility that
break under each approach and not the other.  Without a way to distinguish on
that basis, I'm left shrugging about a proposal to switch.  For that matter,
if pg_bind_textdomain_codeset() starts to act on other facets of the locale,
that's liable to be a bug independent of our choice here.  However locale
facets conflict, we expect LC_CTYPE to control the message codeset.

  There's nothing acutely bad about the alternative you
  identify here, but it's no better equipped to forestall mistakes.  Moving 
  the
  call later would slightly expand the window during which translated messages
  are garbled.
 
 I'm not exactly sure that they wouldn't be garbled anyway during the
 interval where we're setting these things up.  Until DatabaseEncoding,
 ClientEncoding, and gettext's internal notions are all in sync, we
 are at risk of that type of issue no matter what.

True; the window exists and is small enough both ways.  This is merely one
more reason to fix the bug without fixing what ain't broke.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-17 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Mon, Jun 15, 2015 at 12:37:43PM -0400, Tom Lane wrote:
 It's mere chance that the order of calls to pg_perm_setlocale() is
 such that the code works now; and it's not hard to imagine future
 changes in gettext, or reordering of our own code, that would break it.

 pg_bind_textdomain_codeset() looks at one piece of the locale environment,
 namely setlocale(LC_CTYPE, NULL), so the order of pg_perm_setlocale() calls
 does not affect it.

Well, my point is that that is a larger assumption about the behavior of
pg_bind_textdomain_codeset() than I think we ought to make, even if it
happens to be true today.

 There's nothing acutely bad about the alternative you
 identify here, but it's no better equipped to forestall mistakes.  Moving the
 call later would slightly expand the window during which translated messages
 are garbled.

I'm not exactly sure that they wouldn't be garbled anyway during the
interval where we're setting these things up.  Until DatabaseEncoding,
ClientEncoding, and gettext's internal notions are all in sync, we
are at risk of that type of issue no matter what.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-17 Thread Noah Misch
On Mon, Jun 15, 2015 at 12:37:43PM -0400, Tom Lane wrote:
 After further thought, ISTM that this bug is evidence that 5f538ad
 was badly designed, and the proposed fix has blinkers on.  If
 pg_bind_textdomain_codeset() is looking at the locale environment,
 we should not be calling it from inside pg_perm_setlocale() at all,
 but from higher level code *after* we're done setting up the whole libc
 locale environment --- thus, after the unsetenv(LC_ALL) call in main.c,
 and somewhere near the bottom of CheckMyDatabase() in postinit.c.
 It's mere chance that the order of calls to pg_perm_setlocale() is
 such that the code works now; and it's not hard to imagine future
 changes in gettext, or reordering of our own code, that would break it.

pg_bind_textdomain_codeset() looks at one piece of the locale environment,
namely setlocale(LC_CTYPE, NULL), so the order of pg_perm_setlocale() calls
does not affect it.  There's nothing acutely bad about the alternative you
identify here, but it's no better equipped to forestall mistakes.  Moving the
call later would slightly expand the window during which translated messages
are garbled.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-16 Thread Noah Misch
On Mon, Jun 15, 2015 at 08:47:12AM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  While Windows was the bellwether, harm potential is greater on non-Windows
  systems.  pg_perm_setlocale() sets the LC_CTYPE environment variable to help
  PL/Perl avoid clobbering the process locale; see plperl_init_interp()
  comments.  However, that function has bespoke code for Windows, on which
  setting the environment variable doesn't help.  I don't know which other
  platforms invalidate previous setlocale() return values on 
  setlocale(LC_CTYPE,
  NULL).  Therefore, I propose committing the attached diagnostic patch and
  reverting it after about one buildfarm cycle.  It will make affected
  configurations fail hard, and then I'll have a notion about the prevalence 
  of
  damage to expect in the field.
 
 I doubt this will teach us anything; if any buildfarm systems were
 exhibiting the issue, they'd have been failing all along, no?

No; most systems let environment variables carry arbitrary strings of non-nul
bytes, so they don't see $SUBJECT.  I want to probe for all systems that are
currently issuing putenv(LC_CTYPE=garbage), not just the ones where a
picky putenv() illuminates it.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-15 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Wed, Jun 10, 2015 at 10:09:38AM -0400, Tom Lane wrote:
 Hm.  I could understand getting encoding difficulties in that environment,
 but it's hard to see why they'd manifest like this.  Can you trace through
 pg_perm_setlocale and figure out why it's reporting failure?

 A faster test is to set LC_CTYPE=C in the environment and run postgres
 --version.  The root cause is a bug my commit 5f538ad introduced at the start
 of the 9.4 cycle.  pg_perm_setlocale() now calls pg_bind_textdomain_codeset(),
 which calls setlocale(LC_CTYPE, NULL).  POSIX permits that to clobber all
 previous setlocale() return values, which it did here[1].

Ah-hah.

 While Windows was the bellwether, harm potential is greater on non-Windows
 systems.  pg_perm_setlocale() sets the LC_CTYPE environment variable to help
 PL/Perl avoid clobbering the process locale; see plperl_init_interp()
 comments.  However, that function has bespoke code for Windows, on which
 setting the environment variable doesn't help.  I don't know which other
 platforms invalidate previous setlocale() return values on setlocale(LC_CTYPE,
 NULL).  Therefore, I propose committing the attached diagnostic patch and
 reverting it after about one buildfarm cycle.  It will make affected
 configurations fail hard, and then I'll have a notion about the prevalence of
 damage to expect in the field.

I doubt this will teach us anything; if any buildfarm systems were
exhibiting the issue, they'd have been failing all along, no?  This should
break at least the bootstrap/initdb case on any affected system.

 The actual fix is trivial, attached second.  This is for back-patch to 9.4.

Looks sane to me.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-15 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 A faster test is to set LC_CTYPE=C in the environment and run postgres
 --version.  The root cause is a bug my commit 5f538ad introduced at the start
 of the 9.4 cycle.  pg_perm_setlocale() now calls pg_bind_textdomain_codeset(),
 which calls setlocale(LC_CTYPE, NULL).  POSIX permits that to clobber all
 previous setlocale() return values, which it did here[1].

After further thought, ISTM that this bug is evidence that 5f538ad
was badly designed, and the proposed fix has blinkers on.  If
pg_bind_textdomain_codeset() is looking at the locale environment,
we should not be calling it from inside pg_perm_setlocale() at all,
but from higher level code *after* we're done setting up the whole libc
locale environment --- thus, after the unsetenv(LC_ALL) call in main.c,
and somewhere near the bottom of CheckMyDatabase() in postinit.c.
It's mere chance that the order of calls to pg_perm_setlocale() is
such that the code works now; and it's not hard to imagine future
changes in gettext, or reordering of our own code, that would break it.
So I think having to duplicate that call is a reasonable price to pay
for not having surprising entanglements in what should be a very thin
wrapper around setlocale(3).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-15 Thread Noah Misch
On Wed, Jun 10, 2015 at 10:09:38AM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  I can reproduce this with initdb --locale=C,
  postgresql-9.4.3-1-windows-binaries.zip (32-bit), Windows 7 x64, and the
  Windows ANSI code page set to CP936.  (Choose Chinese (Simplified, PRC) in
  Control Panel - Region and Language - Administrative - Language for
  non-Unicode programs.)  It is neither necessary nor sufficient to change
  Control Panel - Region and Language - Formats - Format.  Binaries from
  postgresql-9.4.3-1-windows-x64-binaries.zip do not exhibit the problem.  
  Note
  that CP936 is a PG_ENCODING_IS_CLIENT_ONLY() encoding.
 
 Hm.  I could understand getting encoding difficulties in that environment,
 but it's hard to see why they'd manifest like this.  Can you trace through
 pg_perm_setlocale and figure out why it's reporting failure?

A faster test is to set LC_CTYPE=C in the environment and run postgres
--version.  The root cause is a bug my commit 5f538ad introduced at the start
of the 9.4 cycle.  pg_perm_setlocale() now calls pg_bind_textdomain_codeset(),
which calls setlocale(LC_CTYPE, NULL).  POSIX permits that to clobber all
previous setlocale() return values, which it did here[1].  The ensuing
putenv(LC_CTYPE=garbage bytes) at the end of pg_perm_setlocale() fails
under Windows ANSI code page 936, because the garbage bytes often aren't a
valid CP936 string.  I would expect the same symptom on other multibyte
Windows locales.

While Windows was the bellwether, harm potential is greater on non-Windows
systems.  pg_perm_setlocale() sets the LC_CTYPE environment variable to help
PL/Perl avoid clobbering the process locale; see plperl_init_interp()
comments.  However, that function has bespoke code for Windows, on which
setting the environment variable doesn't help.  I don't know which other
platforms invalidate previous setlocale() return values on setlocale(LC_CTYPE,
NULL).  Therefore, I propose committing the attached diagnostic patch and
reverting it after about one buildfarm cycle.  It will make affected
configurations fail hard, and then I'll have a notion about the prevalence of
damage to expect in the field.

The actual fix is trivial, attached second.  This is for back-patch to 9.4.


[1] It does so in 32-bit release (non-debug), NLS builds done under Visual
Studio 2012 and Visual Studio 2013.  The official binaries used VS2013.  The
symptoms are slightly different under VS2012.  I did not test earlier
versions.  Debug builds and 64-bit builds were unaffected.
diff --git a/src/backend/utils/adt/pg_locale.c 
b/src/backend/utils/adt/pg_locale.c
index 4be735e..d33081b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -58,6 +58,7 @@
 #include catalog/pg_collation.h
 #include catalog/pg_control.h
 #include mb/pg_wchar.h
+#include utils/builtins.h
 #include utils/hsearch.h
 #include utils/memutils.h
 #include utils/pg_locale.h
@@ -148,6 +149,7 @@ pg_perm_setlocale(int category, const char *locale)
char   *result;
const char *envvar;
char   *envbuf;
+   charorig_result[LC_ENV_BUFSIZE];
 
 #ifndef WIN32
result = setlocale(category, locale);
@@ -173,6 +175,7 @@ pg_perm_setlocale(int category, const char *locale)
 
if (result == NULL)
return result;  /* fall out immediately on 
failure */
+   strlcpy(orig_result, result, sizeof(orig_result));
 
/*
 * Use the right encoding in translated messages.  Under ENABLE_NLS, let
@@ -231,6 +234,15 @@ pg_perm_setlocale(int category, const char *locale)
}
 
snprintf(envbuf, LC_ENV_BUFSIZE - 1, %s=%s, envvar, result);
+   if (strcmp(orig_result, result) != 0)
+   {
+   charhex[2 * LC_ENV_BUFSIZE + 1];
+
+   hex_encode(result, Min(1 + strlen(result), LC_ENV_BUFSIZE), 
hex);
+   hex[sizeof(hex) - 1] = '\0';
+   elog(FATAL, setlocale() result %s clobbered to 0x%s,
+orig_result, hex);
+   }
 
if (putenv(envbuf))
return NULL;
diff --git a/src/backend/utils/adt/pg_locale.c 
b/src/backend/utils/adt/pg_locale.c
index 4be735e..84215e0 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -183,6 +183,12 @@ pg_perm_setlocale(int category, const char *locale)
 */
if (category == LC_CTYPE)
{
+   static char save_lc_ctype[LC_ENV_BUFSIZE];
+
+   /* copy setlocale() return value before callee invokes it again 
*/
+   strlcpy(save_lc_ctype, result, sizeof(save_lc_ctype));
+   result = save_lc_ctype;
+
 #ifdef ENABLE_NLS

SetMessageEncoding(pg_bind_textdomain_codeset(textdomain(NULL)));
 #else

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-10 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Tue, Jun 09, 2015 at 12:24:02PM -0400, Tom Lane wrote:
 Yeah, my first instinct was to blame ca325941 as well, but I don't think
 any of that code executes during init_locale().  Also,
 http://www.postgresql.org/message-id/20150326040321.2492.24...@wrigleys.postgresql.org
 says it's been seen in 9.4.1.

 The return value check and error message were new in 9.4.1.  I suspect the
 underlying problem has been present far longer, undetected.

Oooh ... I'd forgotten that 6fdba8ceb was so recent.  Agreed, what we are
seeing is probably a situation that's been there for a long time, but we
were ignoring the failure up to now (and, evidently, it wasn't really
creating any problems).

 I can reproduce this with initdb --locale=C,
 postgresql-9.4.3-1-windows-binaries.zip (32-bit), Windows 7 x64, and the
 Windows ANSI code page set to CP936.  (Choose Chinese (Simplified, PRC) in
 Control Panel - Region and Language - Administrative - Language for
 non-Unicode programs.)  It is neither necessary nor sufficient to change
 Control Panel - Region and Language - Formats - Format.  Binaries from
 postgresql-9.4.3-1-windows-x64-binaries.zip do not exhibit the problem.  Note
 that CP936 is a PG_ENCODING_IS_CLIENT_ONLY() encoding.

Hm.  I could understand getting encoding difficulties in that environment,
but it's hard to see why they'd manifest like this.  Can you trace through
pg_perm_setlocale and figure out why it's reporting failure?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-09 Thread Andres Freund
On 2015-06-09 11:20:06 -0400, Tom Lane wrote:
 We've seen several reports of $SUBJECT lately.  I have no idea what's
 going on, but it occurred to me that it could be informative to tweak
 init_locale() so that it reports which category failed to be set.
 Any objections to squeezing that into today's releases?

No objection, +1. Seems fairly low risk.

The error seems odd. The only even remotely related change between 9.4.1
and .2 seems to be ca325941. Could also be a build environment change.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-09 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On 2015-06-09 11:20:06 -0400, Tom Lane wrote:
 We've seen several reports of $SUBJECT lately.  I have no idea what's
 going on, but it occurred to me that it could be informative to tweak
 init_locale() so that it reports which category failed to be set.
 Any objections to squeezing that into today's releases?

 No objection, +1. Seems fairly low risk.

 The error seems odd. The only even remotely related change between 9.4.1
 and .2 seems to be ca325941. Could also be a build environment change.

Yeah, my first instinct was to blame ca325941 as well, but I don't think
any of that code executes during init_locale().  Also,
http://www.postgresql.org/message-id/20150326040321.2492.24...@wrigleys.postgresql.org
says it's been seen in 9.4.1.  Of course, it might be premature to assume
that report had an identical cause to the later ones.

What I plan to do is this:

static void
init_locale(const char *categoryname, int category, const char *locale)
{
if (pg_perm_setlocale(category, locale) == NULL 
pg_perm_setlocale(category, C) == NULL)
elog(FATAL, could not adopt either \%s\ locale or C locale for 
%s, locale, categoryname);
}

with the obvious changes to the call sites to pass the string names of
the categories not just their numeric codes.  (We could just log the
numbers, but it'd be much harder to interpret.)  This might be overkill,
but when you don't know what you're looking for, every little bit helps ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not adopt C locale failure at startup on Windows

2015-06-09 Thread Noah Misch
On Tue, Jun 09, 2015 at 12:24:02PM -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  The error seems odd. The only even remotely related change between 9.4.1
  and .2 seems to be ca325941. Could also be a build environment change.
 
 Yeah, my first instinct was to blame ca325941 as well, but I don't think
 any of that code executes during init_locale().  Also,
 http://www.postgresql.org/message-id/20150326040321.2492.24...@wrigleys.postgresql.org
 says it's been seen in 9.4.1.

The return value check and error message were new in 9.4.1.  I suspect the
underlying problem has been present far longer, undetected.

I can reproduce this with initdb --locale=C,
postgresql-9.4.3-1-windows-binaries.zip (32-bit), Windows 7 x64, and the
Windows ANSI code page set to CP936.  (Choose Chinese (Simplified, PRC) in
Control Panel - Region and Language - Administrative - Language for
non-Unicode programs.)  It is neither necessary nor sufficient to change
Control Panel - Region and Language - Formats - Format.  Binaries from
postgresql-9.4.3-1-windows-x64-binaries.zip do not exhibit the problem.  Note
that CP936 is a PG_ENCODING_IS_CLIENT_ONLY() encoding.

 What I plan to do is this:
 
 static void
 init_locale(const char *categoryname, int category, const char *locale)
 {
 if (pg_perm_setlocale(category, locale) == NULL 
 pg_perm_setlocale(category, C) == NULL)
 elog(FATAL, could not adopt either \%s\ locale or C locale for 
 %s, locale, categoryname);
 }

Good to have.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers