Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2024-01-05 Thread Joe Conway

On 1/5/24 12:56, Robert Haas wrote:

On Sun, Aug 27, 2023 at 4:25 PM Heikki Linnakangas  wrote:

I think you got that it backwards. 'perl_locale_obj' is set to the perl
interpreter's locale, whenever we are *outside* the interpreter.


This thread has had no update for more than 4 months, so I'm marking
the CF entry RwF for now.

It can always be reopened, if Joe or Tristan or Heikki or someone else
picks it up again.



It is definitely a bug, so I do plan to get back to it at some point, 
hopefully sooner rather than later...


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2024-01-05 Thread Robert Haas
On Sun, Aug 27, 2023 at 4:25 PM Heikki Linnakangas  wrote:
> I think you got that it backwards. 'perl_locale_obj' is set to the perl
> interpreter's locale, whenever we are *outside* the interpreter.

This thread has had no update for more than 4 months, so I'm marking
the CF entry RwF for now.

It can always be reopened, if Joe or Tristan or Heikki or someone else
picks it up again.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-08-27 Thread Heikki Linnakangas

On 27/08/2023 16:41, Joe Conway wrote:

On 8/15/23 10:40, Heikki Linnakangas wrote:

If multiple interpreters are used, is the single perl_locale_obj
variable still enough? Each interpreter can have their own locale I believe.


So in other words plperl and plperlu both used in the same query? I
don't see how we could get from one to the other without going through
the outer "postgres" locale first. Or are you thinking something else?


I think you got that it backwards. 'perl_locale_obj' is set to the perl 
interpreter's locale, whenever we are *outside* the interpreter.


This crashes with the patch:

postgres=# DO LANGUAGE plperlu
$function$
   use POSIX qw(setlocale LC_NUMERIC);
   use locale;

   setlocale LC_NUMERIC, "sv_SE.utf8";
$function$;
DO
postgres=# do language plperl $$ $$;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I was going to test using plperl and plperl in the same session and 
expected the interpreters to mix up the locales they use. Maybe the 
crash is because of something like that, although I didn't expect a 
crash, just weird confusion on which locale is used.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-08-27 Thread Joe Conway

On 8/15/23 10:40, Heikki Linnakangas wrote:

On 01/08/2023 16:48, Joe Conway wrote:

Any further comments on the posted patch[1]? I would like to apply/push
this prior to the beta and minor releases next week.


I'm not sure about the placement of the uselocale() calls. In
plperl_spi_exec(), for example, I think we should switch to the global
locale right after the check_spi_usage_allowed() call. Otherwise, if an
error happens in BeginInternalSubTransaction() or in pg_verifymbstr(),
the error would be processed with the perl locale. Maybe that's
harmless, error processing hardly cares about LC_MONETARY, but seems
wrong in principle.


I guess you could probably argue that we should flip this around, and 
only enter the perl locale when calling into libperl, and exit the perl 
locale every time we reemerge under plperl.c control. That seems pretty 
drastic and potentially messy though.



Hmm, come to think of it, if BeginInternalSubTransaction() throws an
error, we just jump out of the perl interpreter? That doesn't seem cool.
But that's not new with this patch.


Hmm, true enough I guess.


If I'm reading correctly, compile_plperl_function() calls
select_perl_context(), which calls plperl_trusted_init(), which calls
uselocale(). So it leaves locale set to the perl locale. Who sets it back?


No one does it seems, at least not currently


How about adding a small wrapper around eval_pl() that sets and unsets
the locale(), just when we enter the interpreter? It's easier to see
that we are doing the calls in right places, if we make them as close as
possible to entering/exiting the interpreter. Are there other functions
in addition to eval_pl() that need to be called with the perl locale?


I can see that as a better strategy, but "other functions in addition to 
eval_pv()" (I assume you mean eval_pv rather than eval_pl) is a tricky 
one to answer.


I ran the attached script like so (from cwd src/pl/plperl) like so:
```
symbols-used.sh /lib/x86_64-linux-gnu/libperl.so.5.34 plperl.so
```
and get a fairly long list of exported libperl functions that get linked 
into plperl.so:


```
Matched symbols:
boot_DynaLoader
perl_alloc
Perl_av_extend
Perl_av_fetch
Perl_av_len
Perl_av_push
*Perl_call_list
*Perl_call_pv
*Perl_call_sv
perl_construct
Perl_croak
Perl_croak_nocontext
Perl_croak_sv
Perl_croak_xs_usage
Perl_die
*Perl_eval_pv
Perl_free_tmps
Perl_get_sv
Perl_gv_add_by_type
Perl_gv_stashpv
Perl_hv_clear
Perl_hv_common
Perl_hv_common_key_len
Perl_hv_iterinit
Perl_hv_iternext
Perl_hv_iternext_flags
Perl_hv_iternextsv
Perl_hv_ksplit
Perl_looks_like_number
Perl_markstack_grow
Perl_mg_get
Perl_newRV
Perl_newRV_noinc
Perl_newSV
Perl_newSViv
Perl_newSVpv
Perl_newSVpvn
Perl_newSVpvn_flags
Perl_newSVsv
Perl_newSVsv_flags
Perl_newSV_type
Perl_newSVuv
Perl_newXS
Perl_newXS_flags
*perl_parse
Perl_pop_scope
Perl_push_scope
*perl_run
Perl_save_item
Perl_savetmps
Perl_stack_grow
Perl_sv_2bool
Perl_sv_2bool_flags
Perl_sv_2iv
Perl_sv_2iv_flags
Perl_sv_2mortal
Perl_sv_2pv
Perl_sv_2pvbyte
Perl_sv_2pvbyte_flags
Perl_sv_2pv_flags
Perl_sv_2pvutf8
Perl_sv_2pvutf8_flags
Perl_sv_bless
Perl_sv_free
Perl_sv_free2
Perl_sv_isa
Perl_sv_newmortal
Perl_sv_setiv
Perl_sv_setiv_mg
Perl_sv_setsv
Perl_sv_setsv_flags
Perl_sys_init
Perl_sys_init3
Perl_xs_boot_epilog
Perl_xs_handshake
```

I marked the ones that look like perhaps we should care about in the 
above list with an asterisk:


*Perl_call_list
*Perl_call_pv
*Perl_call_sv
*Perl_eval_pv
*perl_run

but perhaps there are others?


/*
 * plperl_xact_callback --- cleanup at main-transaction end.
 */
static void
plperl_xact_callback(XactEvent event, void *arg)
{
/* ensure global locale is the current locale */
if (uselocale((locale_t) 0) != LC_GLOBAL_LOCALE)
perl_locale_obj = uselocale(LC_GLOBAL_LOCALE);
}


So the assumption is that the if current locale is not LC_GLOBAL_LOCALE,
then it was the perl locale. Seems true today, but this could confusion
if anything else calls uselocale(). In particular, if another PL
implementation copies this, and you use plperl and the other PL at the
same time, they would get mixed up. I think we need another "bool
perl_locale_obj_in_use" variable to track explicitly whether the perl
locale is currently active.


Or perhaps don't assume that we want the global locale and swap between 
pg_locale_obj (whatever it is) and perl_locale_obj?



If we are careful to put the uselocale() calls in the right places so
that we never ereport() while in perl locale, this callback isn't
needed. Maybe it's still a good idea, though, to be extra sure that
things get reset to a sane state if something unexpected happens.


I feel more comfortable that we have a "belt and suspenders" method to 
restore the locale that was in use by Postgres before entering perl.



If multiple interpreters are used, is the single perl_locale_obj
variable still enough? Each interpreter can have their own locale I believe.


So in other words plperl and plperlu both 

Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-08-15 Thread Heikki Linnakangas

On 01/08/2023 16:48, Joe Conway wrote:

Any further comments on the posted patch[1]? I would like to apply/push
this prior to the beta and minor releases next week.


I'm not sure about the placement of the uselocale() calls. In 
plperl_spi_exec(), for example, I think we should switch to the global 
locale right after the check_spi_usage_allowed() call. Otherwise, if an 
error happens in BeginInternalSubTransaction() or in pg_verifymbstr(), 
the error would be processed with the perl locale. Maybe that's 
harmless, error processing hardly cares about LC_MONETARY, but seems 
wrong in principle.


Hmm, come to think of it, if BeginInternalSubTransaction() throws an 
error, we just jump out of the perl interpreter? That doesn't seem cool. 
But that's not new with this patch.


If I'm reading correctly, compile_plperl_function() calls 
select_perl_context(), which calls plperl_trusted_init(), which calls 
uselocale(). So it leaves locale set to the perl locale. Who sets it back?


How about adding a small wrapper around eval_pl() that sets and unsets 
the locale(), just when we enter the interpreter? It's easier to see 
that we are doing the calls in right places, if we make them as close as 
possible to entering/exiting the interpreter. Are there other functions 
in addition to eval_pl() that need to be called with the perl locale?



/*
 * plperl_xact_callback --- cleanup at main-transaction end.
 */
static void
plperl_xact_callback(XactEvent event, void *arg)
{
/* ensure global locale is the current locale */
if (uselocale((locale_t) 0) != LC_GLOBAL_LOCALE)
perl_locale_obj = uselocale(LC_GLOBAL_LOCALE);
}


So the assumption is that the if current locale is not LC_GLOBAL_LOCALE, 
then it was the perl locale. Seems true today, but this could confusion 
if anything else calls uselocale(). In particular, if another PL 
implementation copies this, and you use plperl and the other PL at the 
same time, they would get mixed up. I think we need another "bool 
perl_locale_obj_in_use" variable to track explicitly whether the perl 
locale is currently active.


If we are careful to put the uselocale() calls in the right places so 
that we never ereport() while in perl locale, this callback isn't 
needed. Maybe it's still a good idea, though, to be extra sure that 
things get reset to a sane state if something unexpected happens.


If multiple interpreters are used, is the single perl_locale_obj 
variable still enough? Each interpreter can have their own locale I believe.


PS. please pgindent

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-08-01 Thread Tristan Partin

On Tue Aug 1, 2023 at 8:48 AM CDT, Joe Conway wrote:

On 7/3/23 12:25, Tristan Partin wrote:
> On Sat Jun 24, 2023 at 8:09 AM CDT, Joe Conway wrote:
>> Although I have not looked yet, presumably we could have similar 
>> problems with plpython. I would like to get agreement on this approach 
>> against plperl before diving into that though.

>>
>> Thoughts?
> 
> I don't see anything immediately wrong with this.


Any further comments on the posted patch[1]? I would like to apply/push 
this prior to the beta and minor releases next week.


Joe

[1] 
https://www.postgresql.org/message-id/ec6fa20d-e691-198a-4a13-e761771b9dec%40joeconway.com


None from my end.

--
Tristan Partin
Neon (https://neon.tech)




Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-08-01 Thread Joe Conway

On 7/3/23 12:25, Tristan Partin wrote:

On Sat Jun 24, 2023 at 8:09 AM CDT, Joe Conway wrote:
Although I have not looked yet, presumably we could have similar 
problems with plpython. I would like to get agreement on this approach 
against plperl before diving into that though.


Thoughts?


I don't see anything immediately wrong with this.


Any further comments on the posted patch[1]? I would like to apply/push 
this prior to the beta and minor releases next week.


Joe

[1] 
https://www.postgresql.org/message-id/ec6fa20d-e691-198a-4a13-e761771b9dec%40joeconway.com


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-07-10 Thread Tristan Partin
Here is an up to date patch given some churn on the master branch.

-- 
Tristan Partin
Neon (https://neon.tech)
From b68cec481768c7c635ec48329b4764eced264572 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Fri, 30 Jun 2023 09:31:04 -0500
Subject: [PATCH v3 1/3] Skip checking for uselocale on Windows

Windows doesn't have uselocale, so skip it.
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 0c44f19cb9..3356f72bf0 100644
--- a/meson.build
+++ b/meson.build
@@ -2438,7 +2438,7 @@ func_checks = [
   ['strsignal'],
   ['sync_file_range'],
   ['syncfs'],
-  ['uselocale'],
+  ['uselocale', {'skip': host_system == 'windows'}],
   ['wcstombs_l'],
 ]
 
-- 
Tristan Partin
Neon (https://neon.tech)

From 3207694a1d214a7d5b844f3f6dfd8378408172af Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Fri, 30 Jun 2023 11:13:29 -0500
Subject: [PATCH v3 2/3] Add locale_is_c function

In some places throughout the codebase, there are string comparisons for
locales matching C or POSIX. Encapsulate this logic into a single
function and use it.
---
 src/backend/utils/adt/pg_locale.c | 39 ++-
 src/backend/utils/init/postinit.c |  4 +---
 src/backend/utils/mb/mbutils.c|  5 ++--
 src/include/utils/pg_locale.h |  1 +
 4 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index aa9da99308..047c02dbab 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -191,6 +191,23 @@ wcstombs_l(char *dest, const wchar_t *src, size_t n, locale_t loc)
 }
 #endif
 
+
+/*
+ * Check if a locale is the C locale
+ *
+ * POSIX is an alias for C. Passing ingore_case as true will use
+ * pg_strcasecmp() instead of strcmp().
+ */
+bool
+locale_is_c(const char *locale, bool ignore_case)
+{
+	if (!ignore_case)
+		return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0;
+
+	return pg_strcasecmp(locale, "C") == 0 || pg_strcasecmp(locale, "POSIX") == 0;
+}
+
+
 /*
  * pg_perm_setlocale
  *
@@ -1276,10 +1293,8 @@ lookup_collation_cache(Oid collation, bool set_flags)
 			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
 			collctype = TextDatumGetCString(datum);
 
-			cache_entry->collate_is_c = ((strcmp(collcollate, "C") == 0) ||
-		 (strcmp(collcollate, "POSIX") == 0));
-			cache_entry->ctype_is_c = ((strcmp(collctype, "C") == 0) ||
-	   (strcmp(collctype, "POSIX") == 0));
+			cache_entry->collate_is_c = locale_is_c(collcollate, false);
+			cache_entry->ctype_is_c = locale_is_c(collctype, false);
 		}
 		else
 		{
@@ -1327,12 +1342,8 @@ lc_collate_is_c(Oid collation)
 		if (!localeptr)
 			elog(ERROR, "invalid LC_COLLATE setting");
 
-		if (strcmp(localeptr, "C") == 0)
-			result = true;
-		else if (strcmp(localeptr, "POSIX") == 0)
-			result = true;
-		else
-			result = false;
+		result = locale_is_c(localeptr, false);
+
 		return (bool) result;
 	}
 
@@ -1380,12 +1391,8 @@ lc_ctype_is_c(Oid collation)
 		if (!localeptr)
 			elog(ERROR, "invalid LC_CTYPE setting");
 
-		if (strcmp(localeptr, "C") == 0)
-			result = true;
-		else if (strcmp(localeptr, "POSIX") == 0)
-			result = true;
-		else
-			result = false;
+		result = locale_is_c(localeptr, false);
+
 		return (bool) result;
 	}
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 0f9b92b32e..a92a0c438f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -419,9 +419,7 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 		   " which is not recognized by setlocale().", ctype),
  errhint("Recreate the database with another locale or install the missing locale.")));
 
-	if (strcmp(ctype, "C") == 0 ||
-		strcmp(ctype, "POSIX") == 0)
-		database_ctype_is_c = true;
+	database_ctype_is_c = locale_is_c(ctype, false);
 
 	if (dbform->datlocprovider == COLLPROVIDER_ICU)
 	{
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 67a1ab2ab2..997156515c 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -39,6 +39,7 @@
 #include "mb/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/pg_locale.h"
 #include "utils/syscache.h"
 #include "varatt.h"
 
@@ -1237,9 +1238,7 @@ pg_bind_textdomain_codeset(const char *domainname)
 	int			new_msgenc;
 
 #ifndef WIN32
-	const char *ctype = setlocale(LC_CTYPE, NULL);
-
-	if (pg_strcasecmp(ctype, "C") == 0 || pg_strcasecmp(ctype, "POSIX") == 0)
+	if (locale_is_c(locale_ctype, true))
 #endif
 		if (encoding != PG_SQL_ASCII &&
 			raw_pg_bind_textdomain_codeset(domainname, encoding))
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 6447bea8e0..e08d96e099 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -54,6 +54,7 @@ extern PGDLLIMPORT bool 

Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-07-05 Thread Tristan Partin
Someday I will learn...

Attached is the v2.

-- 
Tristan Partin
Neon (https://neon.tech)
From 5688bc2b2c6331f437a72b6a429199c5416ccd76 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Fri, 30 Jun 2023 09:31:04 -0500
Subject: [PATCH v2 1/3] Skip checking for uselocale on Windows

Windows doesn't have uselocale, so skip it.
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index fbec997947..df76f10822 100644
--- a/meson.build
+++ b/meson.build
@@ -2439,7 +2439,7 @@ func_checks = [
   ['strsignal'],
   ['sync_file_range'],
   ['syncfs'],
-  ['uselocale'],
+  ['uselocale', {'skip': host_system == 'windows'}],
   ['wcstombs_l'],
 ]
 
-- 
Tristan Partin
Neon (https://neon.tech)

From d52ab7851e9638e36cd99859014d7aa31154e600 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Fri, 30 Jun 2023 11:13:29 -0500
Subject: [PATCH v2 2/3] Add locale_is_c function

In some places throughout the codebase, there are string comparisons for
locales matching C or POSIX. Encapsulate this logic into a single
function and use it.
---
 src/backend/utils/adt/pg_locale.c | 37 ++-
 src/backend/utils/init/postinit.c |  4 +---
 src/backend/utils/mb/mbutils.c|  5 ++---
 src/include/utils/pg_locale.h |  1 +
 4 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 514d4a9eeb..b455ef3aff 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -154,6 +154,21 @@ static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 		 UErrorCode *status);
 #endif
 
+/*
+ * Check if a locale is the C locale
+ *
+ * POSIX is an alias for C. Passing ingore_case as true will use
+ * pg_strcasecmp() instead of strcmp().
+ */
+bool
+locale_is_c(const char *locale, bool ignore_case)
+{
+	if (!ignore_case)
+		return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0;
+
+	return pg_strcasecmp(locale, "C") == 0 || pg_strcasecmp(locale, "POSIX") == 0;
+}
+
 /*
  * pg_perm_setlocale
  *
@@ -1239,10 +1254,8 @@ lookup_collation_cache(Oid collation, bool set_flags)
 			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
 			collctype = TextDatumGetCString(datum);
 
-			cache_entry->collate_is_c = ((strcmp(collcollate, "C") == 0) ||
-		 (strcmp(collcollate, "POSIX") == 0));
-			cache_entry->ctype_is_c = ((strcmp(collctype, "C") == 0) ||
-	   (strcmp(collctype, "POSIX") == 0));
+			cache_entry->collate_is_c = locale_is_c(collcollate, false);
+			cache_entry->ctype_is_c = locale_is_c(collctype, false);
 		}
 		else
 		{
@@ -1290,12 +1303,8 @@ lc_collate_is_c(Oid collation)
 		if (!localeptr)
 			elog(ERROR, "invalid LC_COLLATE setting");
 
-		if (strcmp(localeptr, "C") == 0)
-			result = true;
-		else if (strcmp(localeptr, "POSIX") == 0)
-			result = true;
-		else
-			result = false;
+		result = locale_is_c(localeptr, false);
+
 		return (bool) result;
 	}
 
@@ -1343,12 +1352,8 @@ lc_ctype_is_c(Oid collation)
 		if (!localeptr)
 			elog(ERROR, "invalid LC_CTYPE setting");
 
-		if (strcmp(localeptr, "C") == 0)
-			result = true;
-		else if (strcmp(localeptr, "POSIX") == 0)
-			result = true;
-		else
-			result = false;
+		result = locale_is_c(localeptr, false);
+
 		return (bool) result;
 	}
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 0f9b92b32e..a92a0c438f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -419,9 +419,7 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 		   " which is not recognized by setlocale().", ctype),
  errhint("Recreate the database with another locale or install the missing locale.")));
 
-	if (strcmp(ctype, "C") == 0 ||
-		strcmp(ctype, "POSIX") == 0)
-		database_ctype_is_c = true;
+	database_ctype_is_c = locale_is_c(ctype, false);
 
 	if (dbform->datlocprovider == COLLPROVIDER_ICU)
 	{
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 67a1ab2ab2..997156515c 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -39,6 +39,7 @@
 #include "mb/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/pg_locale.h"
 #include "utils/syscache.h"
 #include "varatt.h"
 
@@ -1237,9 +1238,7 @@ pg_bind_textdomain_codeset(const char *domainname)
 	int			new_msgenc;
 
 #ifndef WIN32
-	const char *ctype = setlocale(LC_CTYPE, NULL);
-
-	if (pg_strcasecmp(ctype, "C") == 0 || pg_strcasecmp(ctype, "POSIX") == 0)
+	if (locale_is_c(locale_ctype, true))
 #endif
 		if (encoding != PG_SQL_ASCII &&
 			raw_pg_bind_textdomain_codeset(domainname, encoding))
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index e2a7243542..872bef798a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -54,6 +54,7 @@ extern 

Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-07-05 Thread Tristan Partin
On Mon Jul 3, 2023 at 9:42 AM CDT, Tristan Partin wrote:
> Thanks for your patience. Attached is a patch that should cover all the
> problematic use cases of setlocale(). There are some setlocale() calls in
> tests, initdb, and ecpg left. I plan to get to ecpglib before the final
> version of this patch after I abstract over Windows not having
> uselocale(). I think leaving initdb and tests as is would be fine, but I
> am also happy to just permanently purge setlocale() from the codebase
> if people see value in that. We could also poison[0] setlocale() at that
> point.
>
> [0]: https://gcc.gnu.org/onlinedocs/cpp/Pragmas.html

Here is a v2 with best effort Windows support. My patch currently
assumes that you either have uselocale() or are Windows. I dropped the
environment variable hacks, but could bring them back if we didn't like
this requirement.

I tried to add an email[0] to discuss this with hackers, but failed to add
the CC. Let's discuss here instead given my complete inability to manage
mailing lists :).

[0]: https://www.postgresql.org/message-id/CTUJ604ZWHI1.3PFZK152XCWLX%40gonk

-- 
Tristan Partin
Neon (https://neon.tech)




Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-07-03 Thread Joe Conway

On 7/3/23 12:25, Tristan Partin wrote:

On Sat Jun 24, 2023 at 8:09 AM CDT, Joe Conway wrote:
Although I have not looked yet, presumably we could have similar 
problems with plpython. I would like to get agreement on this approach 
against plperl before diving into that though.


Thoughts?


I don't see anything immediately wrong with this. I think doing a
similar thing for plpython would make sense. Might make sense to CC any
other pl* maintainers too.


In our tree there are only plperl and plpython to worry about.

"other pl* maintainers" is a fuzzy concept since other pl's are 
scattered far and wide.


I think it is reasonable to expect such maintainers to be paying 
attention to hackers and pick up on it themselves (I say that as a pl 
maintainer myself -- plr)


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-07-03 Thread Joe Conway

On 7/3/23 12:17, Tristan Partin wrote:

The Reply-To header in your email is pointing at joe@cd, fyi. Pretty
strange.



I noticed that -- it happened only the one time, and I am not sure why. 
Seems fine now though.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-07-03 Thread Tristan Partin
On Sat Jun 24, 2023 at 8:09 AM CDT, Joe Conway wrote:
> Although I have not looked yet, presumably we could have similar 
> problems with plpython. I would like to get agreement on this approach 
> against plperl before diving into that though.
>
> Thoughts?

I don't see anything immediately wrong with this. I think doing a
similar thing for plpython would make sense. Might make sense to CC any
other pl* maintainers too.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-07-03 Thread Tristan Partin
Joe,

The Reply-To header in your email is pointing at joe@cd, fyi. Pretty
strange.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-07-03 Thread Tristan Partin
On Fri Jun 30, 2023 at 7:13 AM CDT, Joe Conway wrote:
> On 6/29/23 22:13, Tristan Partin wrote:
> > On Mon Jun 5, 2023 at 11:00 AM CDT, Heikki Linnakangas wrote:
> >> I think the uselocale() call renders ineffective the setlocale() calls 
> >> that we make later. Maybe we should replace our setlocale() calls with 
> >> uselocale(), too.
> > 
> > For what it's worth to everyone else in the thread (especially Joe), I
> > have a patch locally that fixes the mentioned bug using uselocale(). I
> > am not sure that it is worth committing for v16 given how _large_ (the
> > patch is actually quite small, +216 -235) of a change it is. I am going
> > to spend tomorrow combing over it a bit more and evaluating other
> > setlocale uses in the codebase.
>
> (moving thread to hackers)
>
> I don't see a patch attached -- how is it different than what I posted a 
> week ago and added to the commitfest here?
>
>   https://commitfest.postgresql.org/43/4413/
>
> FWIW, if you are proposing replacing all uses of setlocale() with 
> uselocale() as Heikki suggested:
>
> 1/ I don't think that is pg16 material, and almost certainly not 
> back-patchable to earlier.

I am in agreement.

> 2/ It probably does not solve all of the identified issues caused by the 
> newer perl libraries by itself, i.e. I believe the patch posted to the 
> CF is still needed.

Perhaps. I do think your patch is still valuable regardless. Works for
backpatching and is just good defensive programming. I have added myself
as a reviewer.

> 3/ I believe it is probably the right way to go for pg17+, but I would 
> love to hear opinions from Jeff Davis, Peter Eisentraut, and/or Thomas 
> Munroe (the locale code "usual suspects" ;-)), and others, about that.

Thanks for your patience. Attached is a patch that should cover all the
problematic use cases of setlocale(). There are some setlocale() calls in
tests, initdb, and ecpg left. I plan to get to ecpglib before the final
version of this patch after I abstract over Windows not having
uselocale(). I think leaving initdb and tests as is would be fine, but I
am also happy to just permanently purge setlocale() from the codebase
if people see value in that. We could also poison[0] setlocale() at that
point.

[0]: https://gcc.gnu.org/onlinedocs/cpp/Pragmas.html

-- 
Tristan Partin
Neon (https://neon.tech)
From 5688bc2b2c6331f437a72b6a429199c5416ccd76 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Fri, 30 Jun 2023 09:31:04 -0500
Subject: [PATCH v1 1/3] Skip checking for uselocale on Windows

Windows doesn't have uselocale, so skip it.
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index fbec997947..df76f10822 100644
--- a/meson.build
+++ b/meson.build
@@ -2439,7 +2439,7 @@ func_checks = [
   ['strsignal'],
   ['sync_file_range'],
   ['syncfs'],
-  ['uselocale'],
+  ['uselocale', {'skip': host_system == 'windows'}],
   ['wcstombs_l'],
 ]
 
-- 
Tristan Partin
Neon (https://neon.tech)

From d52ab7851e9638e36cd99859014d7aa31154e600 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Fri, 30 Jun 2023 11:13:29 -0500
Subject: [PATCH v1 2/3] Add locale_is_c function

In some places throughout the codebase, there are string comparisons for
locales matching C or POSIX. Encapsulate this logic into a single
function and use it.
---
 src/backend/utils/adt/pg_locale.c | 37 ++-
 src/backend/utils/init/postinit.c |  4 +---
 src/backend/utils/mb/mbutils.c|  5 ++---
 src/include/utils/pg_locale.h |  1 +
 4 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 514d4a9eeb..b455ef3aff 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -154,6 +154,21 @@ static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 		 UErrorCode *status);
 #endif
 
+/*
+ * Check if a locale is the C locale
+ *
+ * POSIX is an alias for C. Passing ingore_case as true will use
+ * pg_strcasecmp() instead of strcmp().
+ */
+bool
+locale_is_c(const char *locale, bool ignore_case)
+{
+	if (!ignore_case)
+		return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0;
+
+	return pg_strcasecmp(locale, "C") == 0 || pg_strcasecmp(locale, "POSIX") == 0;
+}
+
 /*
  * pg_perm_setlocale
  *
@@ -1239,10 +1254,8 @@ lookup_collation_cache(Oid collation, bool set_flags)
 			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
 			collctype = TextDatumGetCString(datum);
 
-			cache_entry->collate_is_c = ((strcmp(collcollate, "C") == 0) ||
-		 (strcmp(collcollate, "POSIX") == 0));
-			cache_entry->ctype_is_c = ((strcmp(collctype, "C") == 0) ||
-	   (strcmp(collctype, "POSIX") == 0));
+			cache_entry->collate_is_c = locale_is_c(collcollate, false);
+			cache_entry->ctype_is_c = locale_is_c(collctype, false);
 		}
 		else
 		{
@@ -1290,12 +1303,8 @@ lc_collate_is_c(Oid collation)
 		

Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-30 Thread Joe Conway

On 6/29/23 22:13, Tristan Partin wrote:

On Mon Jun 5, 2023 at 11:00 AM CDT, Heikki Linnakangas wrote:
I think the uselocale() call renders ineffective the setlocale() calls 
that we make later. Maybe we should replace our setlocale() calls with 
uselocale(), too.


For what it's worth to everyone else in the thread (especially Joe), I
have a patch locally that fixes the mentioned bug using uselocale(). I
am not sure that it is worth committing for v16 given how _large_ (the
patch is actually quite small, +216 -235) of a change it is. I am going
to spend tomorrow combing over it a bit more and evaluating other
setlocale uses in the codebase.


(moving thread to hackers)

I don't see a patch attached -- how is it different than what I posted a 
week ago and added to the commitfest here?


 https://commitfest.postgresql.org/43/4413/

FWIW, if you are proposing replacing all uses of setlocale() with 
uselocale() as Heikki suggested:


1/ I don't think that is pg16 material, and almost certainly not 
back-patchable to earlier.


2/ It probably does not solve all of the identified issues caused by the 
newer perl libraries by itself, i.e. I believe the patch posted to the 
CF is still needed.


3/ I believe it is probably the right way to go for pg17+, but I would 
love to hear opinions from Jeff Davis, Peter Eisentraut, and/or Thomas 
Munroe (the locale code "usual suspects" ;-)), and others, about that.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



OpenPGP_signature
Description: OpenPGP digital signature


Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-24 Thread Joe Conway

On 6/22/23 03:26, Heikki Linnakangas wrote:

On 21/06/2023 01:02, Joe Conway wrote:

On 6/19/23 19:30, Heikki Linnakangas wrote:

I think we should call "uselocale(LC_GLOBAL_LOCALE)" immediately after
returning from the perl interpreter, instead of before setlocale()
calls, if we want all Postgres code to run with the global locale. Not
sure how much performance overhead that would have.


I don't see how that is practical, or at least it does not really
address the issue. I think any loaded shared library could cause the
same problem by running newlocale() + uselocale() on init. Perhaps I
should go test that theory though.


Any shared library could do that, that's true. Any shared library could
also call 'chdir'. But most shared libraries don't. I think it's the
responsibility of the extension that loads the shared library, plperl in
this case, to make sure it doesn't mess up the environment for the
postgres backend.

Ok, fair enough.

The attached fixes all of the issues raised on this thread by 
specifically patching plperl.


8<
create or replace function finnish_to_number()
returns numeric as
$$
  select to_number('1,23', '9D99')
$$ language sql set lc_numeric to 'fi_FI.utf8';

pl_regression=# show lc_monetary;
 lc_monetary
-
 C
(1 row)

DO LANGUAGE 'plperlu'
$$
  use POSIX qw(setlocale LC_NUMERIC);
  use locale;
  setlocale LC_NUMERIC, "fi_FI.utf8";
  $n = 5/2;   # Assign numeric 2.5 to $n
  spi_exec_query('SELECT finnish_to_number()');
  # Locale-dependent conversion to string
  $a = " $n";
  # Locale-dependent output
  elog(NOTICE, "half five is $n");
$$;
NOTICE:  half five is 2,5
DO

set lc_messages ='sv_SE.UTF8';
this prints syntax error in Swedish;
FEL:  syntaxfel vid eller nära "this"
LINE 1: this prints syntax error in Swedish;
^

set lc_messages ='en_GB.utf8';
this *should* print syntax error in English;
ERROR:  syntax error at or near "this"
LINE 1: this *should* print syntax error in English;
^
set lc_monetary ='sv_SE.UTF8';
SELECT 12.34::money AS price;
  price
--
 12,34 kr
(1 row)

set lc_monetary ='en_GB.UTF8';
SELECT 12.34::money AS price;
 price

 £12.34
(1 row)

set lc_monetary ='en_US.UTF8';
SELECT 12.34::money AS price;
 price

 $12.34
(1 row)
8<

This works correctly from what I can see -- tested against pg16beta1 on 
Linux Mint with perl v5.34.0 as well as against pg15.2 on RHEL 7 with 
perl v5.16.3.


Although I have not looked yet, presumably we could have similar 
problems with plpython. I would like to get agreement on this approach 
against plperl before diving into that though.


Thoughts?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Ensure correct locale is used when executing plperl

Newer versions of libperl, via plperl, call uselocale() which
has the effect of changing the current locale away from the
global locale underneath postgres. This can result in, among other
infelicities, localeconv() grabbing the wrong locale for
numeric and monetary symbols and formatting. Fix that by arranging
to capture the perl locale and swapping with the global locale
as appropriate when entering and exiting libperl. Importantly,
this dance is also needed when exiting perl via SPI calls made
while executing perl.

Backpatch to all supported versions.

Author: Joe Conway
Reviewed-By: Tom Lane and Heikki Linnakangas
Reported by: Guido Brugnara
Discussion: https://postgr.es/m/flat/17946-3e84cb577e9551c3%40postgresql.org
Backpatch-through: 11

diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 8638642..9831361 100644
*** a/src/pl/plperl/plperl.c
--- b/src/pl/plperl/plperl.c
*** typedef struct plperl_array_info
*** 223,228 
--- 223,233 
  static HTAB *plperl_interp_hash = NULL;
  static HTAB *plperl_proc_hash = NULL;
  static plperl_interp_desc *plperl_active_interp = NULL;
+ /*
+  * Newer versions of perl call uselocale() to switch away from
+  * the global locale used by the backend. Store that here.
+  */
+ static locale_t perl_locale_obj = LC_GLOBAL_LOCALE;
  
  /* If we have an unassigned "held" interpreter, it's stored here */
  static PerlInterpreter *plperl_held_interp = NULL;
*** static char *setlocale_perl(int category
*** 302,307 
--- 307,314 
  #define setlocale_perl(a,b)  Perl_setlocale(a,b)
  #endif			/* defined(WIN32) && PERL_VERSION_LT(5, 28, 0) */
  
+ static void plperl_xact_callback(XactEvent event, void *arg);
+ 
  /*
   * Decrement the refcount of the given SV within the active Perl interpreter
   *
*** _PG_init(void)
*** 482,487 
--- 489,508 
  	 */
  	plperl_held_interp = plperl_init_interp();
  
+ 	/*
+ 	 * Grab a copy of perl locale in use, and switch back
+ 	 * to the global one. We will need to switch back and
+ 	 * forth, such that the current locale is perl's whenever
+ 	 * we are about to evaluate perl code, and the global
+ 	 * locale 

Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-22 Thread Heikki Linnakangas

On 21/06/2023 01:02, Joe Conway wrote:

On 6/19/23 19:30, Heikki Linnakangas wrote:

On 18/06/2023 21:27, Joe Conway wrote:
With the patch you're proposing, do we now have a coding rule that you
must call "uselocale(LC_GLOBAL_LOCALE)" before every and any call to
setlocale()? If so, you missed a few spots: pg_perm_setlocale,
pg_bind_textdomain_codeset, and cache_locale_time.


Well I was not proposing such a rule (trying to stay narrowly focused on
the demonstrated issue) but I suppose it might make sense. Anywhere we
use setlocale() we are depending on subsequent locale operations to use
the global locale. And uselocale(LC_GLOBAL_LOCALE) itself looks like it
ought to be pretty cheap.


The current locale affects a lot of other things than localeconv()
calls. For example, LC_MESSAGES affects all strerror() calls. Do we need
to call "uselocale(LC_GLOBAL_LOCALE)" before all possible strerror()
calls too?


That seems heavy handed


Yet I think that's exactly where this is heading. See this case (for 
gettext() rather than strerror()):


postgres=# set lc_messages ='sv_SE.UTF8';
SET
postgres=# this prints syntax error in Swedish;
FEL:  syntaxfel vid eller nära "this"
LINE 1: this prints syntax error in Swedish;
^
postgres=# load 'plperl';
LOAD
postgres=# set lc_messages ='en_GB.utf8';
SET
postgres=# this *should* print syntax error in English;
FEL:  syntaxfel vid eller nära "this"
LINE 1: this *should* print syntax error in English;
^


I think we should call "uselocale(LC_GLOBAL_LOCALE)" immediately after
returning from the perl interpreter, instead of before setlocale()
calls, if we want all Postgres code to run with the global locale. Not
sure how much performance overhead that would have.


I don't see how that is practical, or at least it does not really
address the issue. I think any loaded shared library could cause the
same problem by running newlocale() + uselocale() on init. Perhaps I
should go test that theory though.


Any shared library could do that, that's true. Any shared library could 
also call 'chdir'. But most shared libraries don't. I think it's the 
responsibility of the extension that loads the shared library, plperl in 
this case, to make sure it doesn't mess up the environment for the 
postgres backend.



Testing the patch, I bumped into this:

postgres=# create or replace function finnish_to_number() returns
numeric as $$ select to_number('1,23', '9D99'); $$ language sql set
lc_numeric to 'fi_FI.utf8';
CREATE FUNCTION
postgres=# DO LANGUAGE 'plperlu' $$
use POSIX qw(setlocale LC_NUMERIC);
use locale;

setlocale LC_NUMERIC, "fi_FI.utf8";

$n = 5/2;   # Assign numeric 2.5 to $n

spi_exec_query('SELECT finnish_to_number()');

$a = " $n"; # Locale-dependent conversion to string
elog(NOTICE, "half five is $n");   # Locale-dependent output
$$;
NOTICE:  half five is 2,5
DO
postgres=# select to_char(now(), 'Day');
WARNING:  could not determine encoding for locale "en_GB.UTF-8": codeset
is "ANSI_X3.4-1968"
 to_char
---
Tuesday
(1 row)


Do you think that is because uselocale(LC_GLOBAL_LOCALE) pulls out the
rug from under perl?


libperl is fine in this case. But cache_locale_time() also calls 
setlocale(), and your patch didn't add the "uselocale(LC_GLOBAL_LOCALE)" 
there.


It's a valid concern that "uselocale(LC_GLOBAL_LOCALE)" could pull the 
rug from under perl. I tried to find issues like that, by calling 
locale-dependent functions in plperl, with SQL functions that call 
"uselocale(LC_GLOBAL_LOCALE)" via PGLC_localeconv() in between. But I 
couldn't find any case where the perl code would misbehave. I guess 
libperl calls uselocale() before any locale-dependent function, but I 
didn't look very closely.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-20 Thread Joe Conway

On 6/19/23 19:30, Heikki Linnakangas wrote:

On 18/06/2023 21:27, Joe Conway wrote:

I have proposed a targeted fix that I believe is safe to backpatch --
attached.

IIUC, Tom was +1, but Heikki was looking for a more general solution.

My issue with the more general solution is that it will likely be too
invasive to backpatch, and at the moment at least, there are no other
confirmed bugs related to all of this (even if the current code is more
fragile than we would prefer).


Ok, I agree switching to uselocale() everywhere is too much to
backpatch. We should consider it for master though.


Makes sense


With the patch you're proposing, do we now have a coding rule that you
must call "uselocale(LC_GLOBAL_LOCALE)" before every and any call to
setlocale()? If so, you missed a few spots: pg_perm_setlocale,
pg_bind_textdomain_codeset, and cache_locale_time.


Well I was not proposing such a rule (trying to stay narrowly focused on 
the demonstrated issue) but I suppose it might make sense. Anywhere we 
use setlocale() we are depending on subsequent locale operations to use 
the global locale. And uselocale(LC_GLOBAL_LOCALE) itself looks like it 
ought to be pretty cheap.



The current locale affects a lot of other things than localeconv()
calls. For example, LC_MESSAGES affects all strerror() calls. Do we need
to call "uselocale(LC_GLOBAL_LOCALE)" before all possible strerror()
calls too?


That seems heavy handed


I think we should call "uselocale(LC_GLOBAL_LOCALE)" immediately after
returning from the perl interpreter, instead of before setlocale()
calls, if we want all Postgres code to run with the global locale. Not
sure how much performance overhead that would have.


I don't see how that is practical, or at least it does not really 
address the issue. I think any loaded shared library could cause the 
same problem by running newlocale() + uselocale() on init. Perhaps I 
should go test that theory though.



I just found out about perl's "switch_to_global_locale" function
(https://perldoc.perl.org/perlapi#switch_to_global_locale). Should we
use that?


Maybe, although it does not seem to exist on the older perl version on 
RHEL7. And same comment as above -- while it might solve the problem 
with libperl, it doesn't address similar problems with other loaded 
shared libraries.



Testing the patch, I bumped into this:

postgres=# create or replace function finnish_to_number() returns
numeric as $$ select to_number('1,23', '9D99'); $$ language sql set
lc_numeric to 'fi_FI.utf8';
CREATE FUNCTION
postgres=# DO LANGUAGE 'plperlu' $$
use POSIX qw(setlocale LC_NUMERIC);
use locale;

setlocale LC_NUMERIC, "fi_FI.utf8";

$n = 5/2;   # Assign numeric 2.5 to $n

spi_exec_query('SELECT finnish_to_number()');

$a = " $n"; # Locale-dependent conversion to string
elog(NOTICE, "half five is $n");   # Locale-dependent output
$$;
NOTICE:  half five is 2,5
DO
postgres=# select to_char(now(), 'Day');
WARNING:  could not determine encoding for locale "en_GB.UTF-8": codeset
is "ANSI_X3.4-1968"
to_char
---
   Tuesday
(1 row)


Do you think that is because uselocale(LC_GLOBAL_LOCALE) pulls out the 
rug from under perl?


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-19 Thread Heikki Linnakangas

On 18/06/2023 21:27, Joe Conway wrote:

I have proposed a targeted fix that I believe is safe to backpatch --
attached.

IIUC, Tom was +1, but Heikki was looking for a more general solution.

My issue with the more general solution is that it will likely be too
invasive to backpatch, and at the moment at least, there are no other
confirmed bugs related to all of this (even if the current code is more
fragile than we would prefer).


Ok, I agree switching to uselocale() everywhere is too much to 
backpatch. We should consider it for master though.


With the patch you're proposing, do we now have a coding rule that you 
must call "uselocale(LC_GLOBAL_LOCALE)" before every and any call to 
setlocale()? If so, you missed a few spots: pg_perm_setlocale, 
pg_bind_textdomain_codeset, and cache_locale_time.


The current locale affects a lot of other things than localeconv() 
calls. For example, LC_MESSAGES affects all strerror() calls. Do we need 
to call "uselocale(LC_GLOBAL_LOCALE)" before all possible strerror() 
calls too?


I think we should call "uselocale(LC_GLOBAL_LOCALE)" immediately after 
returning from the perl interpreter, instead of before setlocale() 
calls, if we want all Postgres code to run with the global locale. Not 
sure how much performance overhead that would have.


I just found out about perl's "switch_to_global_locale" function 
(https://perldoc.perl.org/perlapi#switch_to_global_locale). Should we 
use that?


Testing the patch, I bumped into this:

postgres=# create or replace function finnish_to_number() returns 
numeric as $$ select to_number('1,23', '9D99'); $$ language sql set 
lc_numeric to 'fi_FI.utf8';

CREATE FUNCTION
postgres=# DO LANGUAGE 'plperlu' $$
use POSIX qw(setlocale LC_NUMERIC);
use locale;

setlocale LC_NUMERIC, "fi_FI.utf8";

$n = 5/2;   # Assign numeric 2.5 to $n

spi_exec_query('SELECT finnish_to_number()');

$a = " $n"; # Locale-dependent conversion to string
elog(NOTICE, "half five is $n");   # Locale-dependent output
$$;
NOTICE:  half five is 2,5
DO
postgres=# select to_char(now(), 'Day');
WARNING:  could not determine encoding for locale "en_GB.UTF-8": codeset 
is "ANSI_X3.4-1968"

  to_char
---
 Tuesday
(1 row)

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-18 Thread Joe Conway

On 6/12/23 17:28, Joe Conway wrote:

On 6/12/23 10:44, Joe Conway wrote:

1/ how do we fix the misbehavior reported due to libperl in existing
stable branches





I was mostly trying to concentrate on #1, but 2 & 3 are worthy of
discussion.


Hmm, browsing through the perl source I came across a reference to this
(from https://perldoc.perl.org/perllocale):

---
PERL_SKIP_LOCALE_INIT

  This environment variable, available starting in Perl v5.20, if set
(to any value), tells Perl to not use the rest of the environment
variables to initialize with. Instead, Perl uses whatever the current
locale settings are. This is particularly useful in embedded
environments, see "Using embedded Perl with POSIX locales" in perlembed.
---

Seems we ought to be using that.


Turns out that that does nothing useful as far as I can tell.

So I am back to proposing the attached against pg16beta1, to be 
backpatched to pg11.


Since much of the discussion happened on pgsql-bugs, the background 
summary for hackers is this:


When plperl is first loaded, the init function eventually works its way 
to calling Perl_init_i18nl10n(). In versions of perl >= 5.20, that ends 
up at S_emulate_setlocale() which does a series of uselocale() calls. 
For reference, RHEL 7 is perl 5.16.3 while RHEL 9 is perl 5.32.1. Older 
versions of perl do not have this behavior.


The problem with uselocale() is that it changes the active locale away 
from the default global locale. Subsequent uses of setlocale() affect 
the global locale, but if that is not the active locale, it does not 
control the results of locale dependent functions such as localeconv(), 
which is what we depend on in PGLC_localeconv().


The result is illustrated in this example:
8<
psql test
psql (16beta1)
Type "help" for help.

test=# show lc_monetary;
 lc_monetary
-
 en_GB.UTF-8
(1 row)

test=# SELECT 12.34::money AS price;
 price

 £12.34
(1 row)

test=# \q
8<
psql test
psql (16beta1)
Type "help" for help.

test=# load 'plperl';
LOAD
test=# show lc_monetary;
 lc_monetary
-
 en_GB.UTF-8
(1 row)

test=# SELECT 12.34::money AS price;
 price

 $12.34
(1 row)
8<

Notice that merely loading plperl makes the currency symbol wrong.

I have proposed a targeted fix that I believe is safe to backpatch -- 
attached.


IIUC, Tom was +1, but Heikki was looking for a more general solution.

My issue with the more general solution is that it will likely be too 
invasive to backpatch, and at the moment at least, there are no other 
confirmed bugs related to all of this (even if the current code is more 
fragile than we would prefer).


I would like to commit this to all supported branches in the next few 
days, unless there are other suggestions or objections.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Ensure the global locale gets used by localeconv()

A loaded library, such as libperl, may call uselocale() underneath us.
This can result in localeconv() grabbing the wrong locale for
numeric and monetary symbols and formatting. Fix that by resetting
to the global locale determined by setlocale(). Backpatch to all
supported versions.

Author: Joe Conway
Reviewed-By: Tom Lane
Reported by: Guido Brugnara
Discussion: https://postgr.es/m/flat/17946-3e84cb577e9551c3%40postgresql.org
Backpatch-through: 11

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 31e3b16..9dba161 100644
*** a/src/backend/utils/adt/pg_locale.c
--- b/src/backend/utils/adt/pg_locale.c
*** PGLC_localeconv(void)
*** 505,510 
--- 505,517 
  	}
  
  	/*
+ 	 * Ensure the global locale will be used by localeconv().
+ 	 * This is necessary, for example, if another loaded library
+ 	 * such as libperl has done uselocale() underneath us.
+ 	 */
+ 	uselocale(LC_GLOBAL_LOCALE);
+ 
+ 	/*
  	 * This is tricky because we really don't want to risk throwing error
  	 * while the locale is set to other than our usual settings.  Therefore,
  	 * the process is: collect the usual settings, set locale to special
diff --git a/src/fe_utils/print.c b/src/fe_utils/print.c
index 7af1ccb..5d80e77 100644
*** a/src/fe_utils/print.c
--- b/src/fe_utils/print.c
*** setDecimalLocale(void)
*** 3628,3633 
--- 3628,3639 
  {
  	struct lconv *extlconv;
  
+ 	/*
+ 	 * Ensure the global locale will be used by localeconv().
+ 	 * This is necessary, for example, if another loaded library
+ 	 * has done uselocale() underneath us.
+ 	 */
+ 	uselocale(LC_GLOBAL_LOCALE);
  	extlconv = localeconv();
  
  	/* Don't accept an empty decimal_point string */


Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-15 Thread Tristan Partin
On Mon Jun 12, 2023 at 4:13 AM CDT, Heikki Linnakangas wrote:
> There are a few uselocale() calls in ecpg, and they are protected by 
> HAVE_USELOCALE. Interestingly, the calls in pg_locale.c are not, but 
> they are protected by HAVE_LOCALE_T. Seems a little inconsistent.

Patch is attached. CC-ing hackers.

-- 
Tristan Partin
Neon (https://neon.tech)
From 02a2cdb83405b5aecaec2af02e379a81161f8372 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Wed, 14 Jun 2023 11:36:00 -0500
Subject: [PATCH v1] Make uselocale protection more consistent

In ecpg, uselocale uses are protected by checking if HAVE_USELOCALE is
defined. Use the same check in pg_locale.c. Since HAVE_USELOCALE implies
HAVE_LOCALE_T, the code should be the same on _all_ platforms that
Postgres supports. Otherwise, I am sure there would have been a bug
report with pg_locale.c failing to build due to the system having
locale_t, but not uselocale.
---
 src/backend/utils/adt/pg_locale.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 31e3b16ae0..3585afb298 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2972,7 +2972,7 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 	{
-#ifdef HAVE_LOCALE_T
+#ifdef HAVE_USELOCALE
 #ifdef HAVE_WCSTOMBS_L
 		/* Use wcstombs_l for nondefault locales */
 		result = wcstombs_l(to, from, tolen, locale->info.lt);
@@ -2984,11 +2984,11 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 
 		uselocale(save_locale);
 #endif			/* HAVE_WCSTOMBS_L */
-#else			/* !HAVE_LOCALE_T */
-		/* Can't have locale != 0 without HAVE_LOCALE_T */
+#else			/* !HAVE_USELOCALE */
+		/* Can't have locale != 0 without HAVE_LOCALE_T, which HAVE_USELOCALE implies */
 		elog(ERROR, "wcstombs_l is not available");
 		result = 0;/* keep compiler quiet */
-#endif			/* HAVE_LOCALE_T */
+#endif			/* HAVE_USELOCALE */
 	}
 
 	return result;
@@ -3049,7 +3049,7 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		}
 		else
 		{
-#ifdef HAVE_LOCALE_T
+#ifdef HAVE_USELOCALE
 #ifdef HAVE_MBSTOWCS_L
 			/* Use mbstowcs_l for nondefault locales */
 			result = mbstowcs_l(to, str, tolen, locale->info.lt);
@@ -3061,11 +3061,11 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 
 			uselocale(save_locale);
 #endif			/* HAVE_MBSTOWCS_L */
-#else			/* !HAVE_LOCALE_T */
-			/* Can't have locale != 0 without HAVE_LOCALE_T */
+#else			/* !HAVE_USELOCALE */
+			/* Can't have locale != 0 without HAVE_LOCALE_T, which HAVE_USELOCALE implies */
 			elog(ERROR, "mbstowcs_l is not available");
 			result = 0;			/* keep compiler quiet */
-#endif			/* HAVE_LOCALE_T */
+#endif			/* HAVE_USELOCALE */
 		}
 
 		pfree(str);
-- 
Tristan Partin
Neon (https://neon.tech)



Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-12 Thread Joe Conway

On 6/12/23 10:44, Joe Conway wrote:

1/ how do we fix the misbehavior reported due to libperl in existing
stable branches





I was mostly trying to concentrate on #1, but 2 & 3 are worthy of
discussion.


Hmm, browsing through the perl source I came across a reference to this 
(from https://perldoc.perl.org/perllocale):


---
PERL_SKIP_LOCALE_INIT

This environment variable, available starting in Perl v5.20, if set 
(to any value), tells Perl to not use the rest of the environment 
variables to initialize with. Instead, Perl uses whatever the current 
locale settings are. This is particularly useful in embedded 
environments, see "Using embedded Perl with POSIX locales" in perlembed.

---

Seems we ought to be using that.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-12 Thread Joe Conway

(moving to hackers)

On 6/12/23 05:13, Heikki Linnakangas wrote:

On 10/06/2023 22:28, Joe Conway wrote:

On 6/10/23 15:07, Joe Conway wrote:

On 6/10/23 14:42, Tom Lane wrote:

Joe Conway  writes:

5/ The attached fixes the issue for me on pg10 and passes check-world.
Comments?


The call in PGLC_localeconv seems *very* oddly placed.  Why not
do that before it does any other locale calls?  Otherwise you don't
really have reason to believe you're saving the appropriate
values to restore later.



As far as I can tell it really only affects localeconv(), so I tried to
place it close to those. But I am fine with moving it up.


This version is against pg16 (rather than pg10), moves up that hunk,
mentions localeconv() in the comment as the reason for the call, and
fixes some whitespace sloppiness. I will plan to apply to all supported
branches.

Better?


The man page for uselocale(LC_GLOBAL_LOCALE) says: "The calling thread's
current locale is set to the global locale determined by setlocale(3)."
Does that undo the effect of calling uselocale() previously, so if you
later call setlocale(), the new locale takes effect in the thread too?
Or is it equivalent to "uselocale(LC_ALL, setlocale(NULL))", so that it
sets the thread's locale to the current global locale, but later
setlocale() calls have no effect on it?


setlocale() changes the global locale, but uselocale() changes the 
locale that is currently active, as I understand it.


Also note that uselocale man page says "Unlike setlocale(3), uselocale() 
does not allow selective replacement of individual locale  categories. 
To employ a locale that differs in only a few categories from the 
current locale, use calls to duplocale(3) and newlocale(3) to obtain a 
locale object equivalent to the current locale and modify the desired 
categories in that object."



In any case, this still doesn't feel like the right place. We have many
more setlocale() calls. Shouldn't we sprinkle them all with
uselocale(LC_GLOBAL_LOCALE)? cache_locale_time() for example. Or rather,
all the places where we use any functions that depend on the current locale.

How about we replace all setlocale() calls with uselocale()?


I don't see us backpatching something that invasive. It might be the 
right thing to do for pg17, or even pg16, but I think that is a 
different discussion



Shouldn't we restore the old thread-specific locale after the calls? I'm
not sure why libperl calls uselocale(), but we are now overwriting the
locale that it sets.


That is a good question. Though arguably perl is doing the wrong thing 
by not resetting the global locale when it is being used embedded.



We have a few other uselocale() calls in pg_locale.c, and we take
care to restore the old locale in those.


I think as long as we are relying on setlocale rather than uselocale in 
general (see above), the global locale is where we want things left.



There are a few uselocale() calls in ecpg, and they are protected by
HAVE_USELOCALE. Interestingly, the calls in pg_locale.c are not, but
they are protected by HAVE_LOCALE_T. Seems a little inconsistent.


Possibly something we should clean up, but I think that is separate from 
this fix.


In general I think we have 2 or possibly three distinct things here:

1/ how do we fix the misbehavior reported due to libperl in existing 
stable branches


2/ what makes most sense going forward (and does that mean pg16 or pg17)

3/ misc code cleanups

I was mostly trying to concentrate on #1, but 2 & 3 are worthy of 
discussion.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com