Re: [HACKERS] move collation import to backend

2017-01-18 Thread Tom Lane
Jeff Janes  writes:
> With this commit, 'make check' fails at initdb with the error:

> 2017-01-18 07:47:50.565 PST [43691] FATAL:  collation "aa_ER@saaho" for
> encoding "UTF8" already exists

Yeah, so are large chunks of the buildfarm.  Having now read the patch,
I see that the problem is that it simply ignored the de-duplication
logic that existed in initdb's implementation.  That was put there
on the basis of bitter experience, as I recall.

The new code seems to think it's sufficient to do an "if not exists"
insertion when generating abbreviated names, but that's wrong, and
even if it avoided outright failures, it would be nondeterministic
(I doubt "locale -a" is guaranteed to emit locale names in any
particular order).

I think this needs to be reverted pending redesign of the de-duplication
coding.

regards, tom lane
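
The ordering concern above can be illustrated with a small standalone sketch. It is not the logic initdb used and not a proposed patch: NameSet, set_add, strip_encoding and import_names are hypothetical names, and the encoding stripping is simplified (everything between '.' and '@' is dropped). The idea is to sort the input, register every full locale name first, and only then add abbreviated names that nothing else has claimed, so the result does not depend on the order in which "locale -a" emits names.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES 64
#define NAME_LEN  64

/* Hypothetical in-memory stand-in for the collation names created so far. */
typedef struct NameSet
{
	char		names[MAX_NAMES][NAME_LEN];
	int			count;
} NameSet;

static bool
set_contains(const NameSet *set, const char *name)
{
	for (int i = 0; i < set->count; i++)
		if (strcmp(set->names[i], name) == 0)
			return true;
	return false;
}

/* Add name unless it is already present, too long, or the set is full. */
static bool
set_add(NameSet *set, const char *name)
{
	if (set->count >= MAX_NAMES || strlen(name) >= NAME_LEN ||
		set_contains(set, name))
		return false;
	strcpy(set->names[set->count++], name);
	return true;
}

/* Copy src to dst without its ".encoding" tag; true if a tag was removed. */
static bool
strip_encoding(char *dst, const char *src)
{
	bool		changed = false;

	while (*src)
	{
		if (*src == '.')
		{
			src++;
			while (*src && *src != '@')
				src++;
			changed = true;
		}
		else
			*dst++ = *src++;
	}
	*dst = '\0';
	return changed;
}

static int
compare_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/*
 * Pass 1 registers every full name.  Pass 2 adds an abbreviated name only
 * if no full name or earlier alias already uses it.  Sorting first makes
 * the outcome independent of input order.  Returns the aliases added.
 */
static int
import_names(NameSet *set, const char **locales, int n)
{
	char		alias[NAME_LEN];
	int			aliases = 0;

	qsort(locales, n, sizeof(locales[0]), compare_names);

	for (int i = 0; i < n; i++)
		set_add(set, locales[i]);

	for (int i = 0; i < n; i++)
	{
		if (strlen(locales[i]) >= NAME_LEN)
			continue;
		if (strip_encoding(alias, locales[i]) && set_add(set, alias))
			aliases++;
	}
	return aliases;
}
```

With the four aa_ER names reported in this thread, such a scheme creates no aliases at all, because both abbreviated forms already exist as full locale names.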


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] move collation import to backend

2017-01-18 Thread Jeff Janes
On Tue, Jan 17, 2017 at 9:05 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 1/9/17 10:04 PM, Euler Taveira wrote:
> > On 18-12-2016 18:30, Peter Eisentraut wrote:
> >> Updated patch with that fix.
> >>
> > Peter, I reviewed and improved your patch.
> >
> > * I document the new function. Since collation is a database object, I
> > chose "Database Object Management Functions" section.
>
> OK
>
> > * I've added a check to any-encoding database because I got 'FATAL:
> > collation "C" already exists' on Debian 8, although, I didn't get on
> > CentOS 7. The problem is that Debian has two locales for C (C and
> > C.UTF-8) and CentOS has just one (C).
>
> OK
>
> > * I've added OidIsValid to test the new collation row.
>
> OK
>
> > * I've changed the parameter order. Schema seems more important than
> > if_not_exists. Also, we generally leave those boolean parameters for the
> > end of list. I didn't make if_not_exists optional but IMO it would be a
> > good idea (default = true).
>
> I put them that way because in an SQL command the "IF NOT EXISTS" comes
> before the schema, but I can see how it is weird that way in a function.
>
> > * You removed some #if and #ifdef while moving things around. I put them
> > back.
> > * You didn't pgindent some lines of code, but I'm sure that was to keep
> > the patch footprint small.
>
> I had left the #if in initdb, but I think your changes are better.
>
> > I'm attaching the complete patch and also an incremental patch on top of
> > your last patch.
>
> Thanks.  If there are no more comments, I will proceed with that.
>
>
With this commit, 'make check' fails at initdb with the error:

2017-01-18 07:47:50.565 PST [43691] FATAL:  collation "aa_ER@saaho" for
encoding "UTF8" already exists
2017-01-18 07:47:50.565 PST [43691] STATEMENT:  SELECT
pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');

My system:

CentOS release 6.8 (Final)
gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC)

./configure > /dev/null # no options

$ locale -a|fgrep aa_ER
aa_ER
aa_ER.utf8
aa_ER.utf8@saaho
aa_ER@saaho

I have no idea what @ means in a locale; I'm just relaying the information.

Cheers,

Jeff
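
To see why these four names collide, here is a standalone copy of the normalize_locale_name() logic quoted in the patch further down the thread, plus a small helper, alias_collides(), which is hypothetical and exists only for this illustration. Stripping the encoding tag from "aa_ER.utf8@saaho" yields "aa_ER@saaho", which "locale -a" also lists as a name in its own right.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Strip the encoding tag (".utf8", ".UTF-8", ...) from a locale name,
 * keeping any "@modifier" suffix.  Returns true if the name changed.
 * Same logic as normalize_locale_name() in the patch.
 */
static bool
normalize_locale_name(char *dst, const char *src)
{
	bool		changed = false;

	while (*src)
	{
		if (*src == '.')
		{
			/* skip the '.' and the encoding tag that follows it */
			src++;
			while ((*src >= 'A' && *src <= 'Z')
				   || (*src >= 'a' && *src <= 'z')
				   || (*src >= '0' && *src <= '9')
				   || (*src == '-'))
				src++;
			changed = true;
		}
		else
			*dst++ = *src++;
	}
	*dst = '\0';

	return changed;
}

/*
 * Hypothetical helper: true if stripping the encoding tag from "full"
 * produces exactly "other".
 */
static bool
alias_collides(const char *full, const char *other)
{
	char		alias[64];

	if (strlen(full) >= sizeof(alias))
		return false;
	return normalize_locale_name(alias, full) && strcmp(alias, other) == 0;
}
```

The same effect explains the Debian report quoted above: "C.UTF-8" normalizes to "C", which already exists.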


Re: [HACKERS] move collation import to backend

2017-01-17 Thread Peter Eisentraut
On 1/9/17 10:04 PM, Euler Taveira wrote:
> On 18-12-2016 18:30, Peter Eisentraut wrote:
>> Updated patch with that fix.
>>
> Peter, I reviewed and improved your patch.
> 
> * I document the new function. Since collation is a database object, I
> chose "Database Object Management Functions" section.

OK

> * I've added a check to any-encoding database because I got 'FATAL:
> collation "C" already exists' on Debian 8, although, I didn't get on
> CentOS 7. The problem is that Debian has two locales for C (C and
> C.UTF-8) and CentOS has just one (C).

OK

> * I've added OidIsValid to test the new collation row.

OK

> * I've changed the parameter order. Schema seems more important than
> if_not_exists. Also, we generally leave those boolean parameters for the
> end of list. I didn't make if_not_exists optional but IMO it would be a
> good idea (default = true).

I put them that way because in an SQL command the "IF NOT EXISTS" comes
before the schema, but I can see how it is weird that way in a function.

> * You removed some #if and #ifdef while moving things around. I put them back.
> * You didn't pgindent some lines of code, but I'm sure that was to keep
> the patch footprint small.

I had left the #if in initdb, but I think your changes are better.

> I'm attaching the complete patch and also an incremental patch on top of
> your last patch.

Thanks.  If there are no more comments, I will proceed with that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] move collation import to backend

2017-01-09 Thread Euler Taveira
On 18-12-2016 18:30, Peter Eisentraut wrote:
> Updated patch with that fix.
> 
Peter, I reviewed and improved your patch.

* I document the new function. Since collation is a database object, I
chose "Database Object Management Functions" section.
* I've added a check to any-encoding database because I got 'FATAL:
collation "C" already exists' on Debian 8, although, I didn't get on
CentOS 7. The problem is that Debian has two locales for C (C and
C.UTF-8) and CentOS has just one (C).
* I've added OidIsValid to test the new collation row.
* I've changed the parameter order. Schema seems more important than
if_not_exists. Also, we generally leave those boolean parameters for the
end of list. I didn't make if_not_exists optional but IMO it would be a
good idea (default = true).
* You removed some #if and #ifdef while moving things around. I put them back.
* You didn't pgindent some lines of code, but I'm sure that was to keep
the patch footprint small.
* I didn't test on Windows.
* As a last comment, you set cost = 100 and it seems reasonable because
it takes 411 ms to scan/load 923 collations in my slow VM. However,
successive executions take ~1200 ms.

I'm attaching the complete patch and also an incremental patch on top of
your last patch.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 10e3186..1e52a48 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19190,6 +19190,38 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
 in the database's default tablespace, the tablespace can be specified as 0.

 
+   
+   Operating system collations are loaded with the
+   pg_import_system_collations function, shown in .
+   
+
+   
+Collation Functions
+
+ 
+  Name Return Type Description
+ 
+
+ 
+  
+   
+pg_import_system_collations
+pg_import_system_collations(schema regnamespace, if_not_exists boolean)
+   
+   void
+   Import operating system collations
+  
+ 
+
+   
+
+   
+   pg_import_system_collations loads collations that it finds on
+   the operating system into system catalog pg_collation,
+   skipping those that are already present.
+   
+
   
 
   
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 63c2eb9..694c0f6 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -98,10 +98,21 @@ CollationCreate(const char *collname, Oid collnamespace,
 			  PointerGetDatum(collname),
 			  Int32GetDatum(-1),
 			  ObjectIdGetDatum(collnamespace)))
-		ereport(ERROR,
+	{
+		if (if_not_exists)
+		{
+			ereport(NOTICE,
+(errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("collation \"%s\" already exists, skipping",
+		collname)));
+			return InvalidOid;
+		}
+		else
+			ereport(ERROR,
 (errcode(ERRCODE_DUPLICATE_OBJECT),
  errmsg("collation \"%s\" already exists",
 		collname)));
+	}
 
 	/* open pg_collation */
 	rel = heap_open(CollationRelationId, RowExclusiveLock);
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index e108b50..cf3acea 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -139,7 +139,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters)
 			 collctype,
 			 false);
 
-	if (!newoid)
+	if (!OidIsValid(newoid))
 		return InvalidObjectAddress;
 
 	ObjectAddressSet(address, CollationRelationId, newoid);
@@ -183,6 +183,7 @@ IsThereCollationInNamespace(const char *collname, Oid nspOid)
 }
 
 
+#ifdef HAVE_LOCALE_T
 /*
  * "Normalize" a locale name, stripping off encoding tags such as
  * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
@@ -216,13 +217,15 @@ normalize_locale_name(char *new, const char *old)
 
 	return changed;
 }
+#endif	/* HAVE_LOCALE_T */
 
 
 Datum
 pg_import_system_collations(PG_FUNCTION_ARGS)
 {
-	bool		if_not_exists = PG_GETARG_BOOL(0);
-	Oid nspid = PG_GETARG_OID(1);
+#if defined(HAVE_LOCALE_T) && !defined(WIN32)
+	Oid nspid = PG_GETARG_OID(0);
+	bool		if_not_exists = PG_GETARG_BOOL(1);
 
 	FILE	   *locale_a_handle;
 	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
@@ -321,6 +324,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 	if (count == 0)
 		ereport(ERROR,
 (errmsg("no usable system locales were found")));
+#endif	/* not HAVE_LOCALE_T && not WIN32 */
 
 	PG_RETURN_VOID();
 }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b0126a9..bb8637e 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5345,7 +5345,7 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,2

Re: [HACKERS] move collation import to backend

2016-12-18 Thread Peter Eisentraut
On 11/30/16 8:18 AM, Peter Eisentraut wrote:
>> It seems not to be project style to have prototypes in the middle of the
>> file...
> 
> OK, will fix.

Updated patch with that fix.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 0c17610b698cc335bc0aed1a66d151e96f618537 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 30 Nov 2016 12:00:00 -0500
Subject: [PATCH v3] Add function to import operation system collations

Move this logic out of initdb into a user-callable function.  This
simplifies the code and makes it possible to update the standard
collations later on if additional operating system collations appear.
---
 src/backend/catalog/pg_collation.c    |  18 +++-
 src/backend/commands/collationcmds.c  | 149 +-
 src/bin/initdb/initdb.c               | 164 +-
 src/include/catalog/pg_collation_fn.h |   3 +-
 src/include/catalog/pg_proc.h         |   3 +
 src/include/commands/collationcmds.h  |   2 +
 6 files changed, 172 insertions(+), 167 deletions(-)

diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index f37cf37c4a..cda64c44a1 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -41,7 +41,8 @@ Oid
 CollationCreate(const char *collname, Oid collnamespace,
 Oid collowner,
 int32 collencoding,
-const char *collcollate, const char *collctype)
+const char *collcollate, const char *collctype,
+bool if_not_exists)
 {
 	Relation	rel;
 	TupleDesc	tupDesc;
@@ -72,10 +73,21 @@ CollationCreate(const char *collname, Oid collnamespace,
 			  PointerGetDatum(collname),
 			  Int32GetDatum(collencoding),
 			  ObjectIdGetDatum(collnamespace)))
-		ereport(ERROR,
+	{
+		if (if_not_exists)
+		{
+			ereport(NOTICE,
 (errcode(ERRCODE_DUPLICATE_OBJECT),
- errmsg("collation \"%s\" for encoding \"%s\" already exists",
+ errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping",
 		collname, pg_encoding_to_char(collencoding))));
+			return InvalidOid;
+		}
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("collation \"%s\" for encoding \"%s\" already exists",
+			collname, pg_encoding_to_char(collencoding))));
+	}
 
 	/*
 	 * Also forbid matching an any-encoding entry.  This test of course is not
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 9bba748708..eafc0a99fa 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -136,7 +136,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters)
 			 GetUserId(),
 			 GetDatabaseEncoding(),
 			 collcollate,
-			 collctype);
+			 collctype,
+			 false);
+
+	if (!newoid)
+		return InvalidObjectAddress;
 
 	ObjectAddressSet(address, CollationRelationId, newoid);
 
@@ -177,3 +181,146 @@ IsThereCollationInNamespace(const char *collname, Oid nspOid)
  errmsg("collation \"%s\" already exists in schema \"%s\"",
 		collname, get_namespace_name(nspOid))));
 }
+
+
+/*
+ * "Normalize" a locale name, stripping off encoding tags such as
+ * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
+ * -> "br_FR@euro").  Return true if a new, different name was
+ * generated.
+ */
+static bool
+normalize_locale_name(char *new, const char *old)
+{
+	char	   *n = new;
+	const char *o = old;
+	bool		changed = false;
+
+	while (*o)
+	{
+		if (*o == '.')
+		{
+			/* skip over encoding tag such as ".utf8" or ".UTF-8" */
+			o++;
+			while ((*o >= 'A' && *o <= 'Z')
+				   || (*o >= 'a' && *o <= 'z')
+				   || (*o >= '0' && *o <= '9')
+				   || (*o == '-'))
+				o++;
+			changed = true;
+		}
+		else
+			*n++ = *o++;
+	}
+	*n = '\0';
+
+	return changed;
+}
+
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{
+	bool		if_not_exists = PG_GETARG_BOOL(0);
+	Oid nspid = PG_GETARG_OID(1);
+
+	FILE	   *locale_a_handle;
+	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
+	int			count = 0;
+
+	if (!superuser())
+		ereport(ERROR,
+(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to import system collations"))));
+
+	locale_a_handle = OpenPipeStream("locale -a", "r");
+	if (locale_a_handle == NULL)
+		ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not execute command \"%s\": %m",
+		"locale -a")));
+
+	while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
+	{
+		int			i;
+		size_t		len;
+		int			enc;
+		bool		skip;
+		char		alias[NAMEDATALEN];
+
+		len = strlen(localebuf);
+
+		if (len == 0 || localebuf[len - 1] != '\n')
+		{
+			elog(DEBUG1, "locale name too long, skipped: \"%s\"", localebuf);
+			continue;
+		}
+		localebuf[len - 1] = '\0';
+
+		/*
+		 * Some systems have locale names that don't consist entirely of ASCII
+		 * letters (such as "bokmå

Re: [HACKERS] move collation import to backend

2016-12-04 Thread Haribabu Kommi
On Thu, Dec 1, 2016 at 12:18 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

>
> >
> >>>> +
> >>>> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
> >>>> +
> >>>> +Datum
> >>>> +pg_import_system_collations(PG_FUNCTION_ARGS)
> >>>> +{
> >>>
> >>> Uh?
> >>
> >> Required to avoid compiler warning about missing prototype.
> >
> > It seems not to be project style to have prototypes in the middle of the
> > file...
>
> OK, will fix.
>

Moved to next CF with "waiting on author" status.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] move collation import to backend

2016-11-30 Thread Peter Eisentraut
On 11/29/16 2:53 PM, Andres Freund wrote:
> On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:
>> On 11/12/16 10:38 AM, Andres Freund wrote:
>>>>	/*
>>>>	 * Also forbid matching an any-encoding entry.  This test of course is not
>>>>	 * backed up by the unique index, but it's not a problem since we don't
>>>>	 * support adding any-encoding entries after initdb.
>>>>	 */
>>>
>>> Note that this isn't true anymore...
>>
>> I think this is still correct, because the collation import does not
>> produce any any-encoding entries (collencoding = -1).
> 
> Well, the comment "don't support adding any-encoding entries after
> initdb." is now wrong.

I think there is a misunderstanding.  The comment says that we don't
support adding collations that have collencoding = -1 after initdb.  That
is still true.  Note that the original comment has two "any"s.  With
this patch, we would now support adding collations with collencoding <>
-1 after initdb.
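
The distinction can be sketched in isolation. The following is a simplified model, not the catalog code: CollationEntry and collation_conflicts are hypothetical names, and the integer encoding IDs used by a caller would be placeholders. It mirrors the two lookups visible in the CollationCreate() hunks in this thread: a new (name, encoding) pair is rejected if a row exists with the same name and either the same encoding or collencoding = -1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ANY_ENCODING (-1)		/* collencoding of an any-encoding entry */

/* Hypothetical, simplified view of a pg_collation row in one schema. */
typedef struct CollationEntry
{
	const char *collname;
	int			collencoding;
} CollationEntry;

/*
 * A new (name, encoding) pair conflicts if an existing row has the same
 * name and either the same encoding or the any-encoding marker.
 */
static bool
collation_conflicts(const CollationEntry *rows, size_t nrows,
					const char *name, int encoding)
{
	for (size_t i = 0; i < nrows; i++)
	{
		if (strcmp(rows[i].collname, name) != 0)
			continue;
		if (rows[i].collencoding == encoding ||
			rows[i].collencoding == ANY_ENCODING)
			return true;
	}
	return false;
}
```

Under this model an imported, encoding-specific "C" clashes with the built-in any-encoding "C", while the same name under two different specific encodings does not clash.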

> 
>>>> +
>>>> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
>>>> +
>>>> +Datum
>>>> +pg_import_system_collations(PG_FUNCTION_ARGS)
>>>> +{
>>>
>>> Uh?
>>
>> Required to avoid compiler warning about missing prototype.
> 
> It seems not to be project style to have prototypes in the middle of the
> file...

OK, will fix.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] move collation import to backend

2016-11-29 Thread Tom Lane
Andres Freund  writes:
> On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:
>> Required to avoid compiler warning about missing prototype.

> It seems not to be project style to have prototypes in the middle of the
> file...

I agree.  Please put that in builtins.h, if you can't find any better
header for it.

regards, tom lane




Re: [HACKERS] move collation import to backend

2016-11-29 Thread Andres Freund
On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:
> On 11/12/16 10:38 AM, Andres Freund wrote:
> >>/*
> >> * Also forbid matching an any-encoding entry.  This test of course is not
> >> * backed up by the unique index, but it's not a problem since we don't
> >> * support adding any-encoding entries after initdb.
> >> */
> > 
> > Note that this isn't true anymore...
> 
> I think this is still correct, because the collation import does not
> produce any any-encoding entries (collencoding = -1).

Well, the comment "don't support adding any-encoding entries after
initdb." is now wrong.

> >> +
> >> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
> >> +
> >> +Datum
> >> +pg_import_system_collations(PG_FUNCTION_ARGS)
> >> +{
> > 
> > Uh?
> 
> Required to avoid compiler warning about missing prototype.

It seems not to be project style to have prototypes in the middle of the
file...

- Andres




Re: [HACKERS] move collation import to backend

2016-11-29 Thread Peter Eisentraut
On 11/12/16 10:38 AM, Andres Freund wrote:
> E.g. what if previously present collations are now unavailable?

You get an error message when you try to use the collation.  I think
that is a different class of problems.

>>  
>>  /*
>>   * Also forbid matching an any-encoding entry.  This test of course is not
>>   * backed up by the unique index, but it's not a problem since we don't
>>   * support adding any-encoding entries after initdb.
>>   */
> 
> Note that this isn't true anymore...

I think this is still correct, because the collation import does not
produce any any-encoding entries (collencoding = -1).

>> +
>> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
>> +
>> +Datum
>> +pg_import_system_collations(PG_FUNCTION_ARGS)
>> +{
> 
> Uh?

Required to avoid compiler warning about missing prototype.

> This function needs to have permissions revoked for non-superusers, which
> AFAICS it currently doesn't.

Done.

New patch attached (includes OID change because of conflict).

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From bb6710c55df3a5f7023ddcda749e05e05e49bc59 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 29 Nov 2016 12:00:00 -0500
Subject: [PATCH v2] Add function to import operation system collations

Move this logic out of initdb into a user-callable function.  This
simplifies the code and makes it possible to update the standard
collations later on if additional operating system collations appear.
---
 src/backend/catalog/pg_collation.c    |  18 +++-
 src/backend/commands/collationcmds.c  | 151 ++-
 src/bin/initdb/initdb.c               | 164 +-
 src/include/catalog/pg_collation_fn.h |   3 +-
 src/include/catalog/pg_proc.h         |   3 +
 5 files changed, 172 insertions(+), 167 deletions(-)

diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index f37cf37..cda64c4 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -41,7 +41,8 @@ Oid
 CollationCreate(const char *collname, Oid collnamespace,
 Oid collowner,
 int32 collencoding,
-const char *collcollate, const char *collctype)
+const char *collcollate, const char *collctype,
+bool if_not_exists)
 {
 	Relation	rel;
 	TupleDesc	tupDesc;
@@ -72,10 +73,21 @@ CollationCreate(const char *collname, Oid collnamespace,
 			  PointerGetDatum(collname),
 			  Int32GetDatum(collencoding),
 			  ObjectIdGetDatum(collnamespace)))
-		ereport(ERROR,
+	{
+		if (if_not_exists)
+		{
+			ereport(NOTICE,
 (errcode(ERRCODE_DUPLICATE_OBJECT),
- errmsg("collation \"%s\" for encoding \"%s\" already exists",
+ errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping",
 		collname, pg_encoding_to_char(collencoding))));
+			return InvalidOid;
+		}
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("collation \"%s\" for encoding \"%s\" already exists",
+			collname, pg_encoding_to_char(collencoding))));
+	}
 
 	/*
 	 * Also forbid matching an any-encoding entry.  This test of course is not
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 9bba748..f4b7b65 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -136,7 +136,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters)
 			 GetUserId(),
 			 GetDatabaseEncoding(),
 			 collcollate,
-			 collctype);
+			 collctype,
+			 false);
+
+	if (!newoid)
+		return InvalidObjectAddress;
 
 	ObjectAddressSet(address, CollationRelationId, newoid);
 
@@ -177,3 +181,148 @@ IsThereCollationInNamespace(const char *collname, Oid nspOid)
  errmsg("collation \"%s\" already exists in schema \"%s\"",
 		collname, get_namespace_name(nspOid))));
 }
+
+
+/*
+ * "Normalize" a locale name, stripping off encoding tags such as
+ * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
+ * -> "br_FR@euro").  Return true if a new, different name was
+ * generated.
+ */
+static bool
+normalize_locale_name(char *new, const char *old)
+{
+	char	   *n = new;
+	const char *o = old;
+	bool		changed = false;
+
+	while (*o)
+	{
+		if (*o == '.')
+		{
+			/* skip over encoding tag such as ".utf8" or ".UTF-8" */
+			o++;
+			while ((*o >= 'A' && *o <= 'Z')
+				   || (*o >= 'a' && *o <= 'z')
+				   || (*o >= '0' && *o <= '9')
+				   || (*o == '-'))
+				o++;
+			changed = true;
+		}
+		else
+			*n++ = *o++;
+	}
+	*n = '\0';
+
+	return changed;
+}
+
+
+Datum pg_import_system_collations(PG_FUNCTION_ARGS);
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{
+	bool		if_not_exists = PG_GETARG_BOOL(0);
+	Oid nspid = PG_GETARG_OID(1);
+
+	FILE	   *locale_a_handle;
+	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
+	int			count = 0;
+
+	if 

Re: [HACKERS] move collation import to backend

2016-11-12 Thread Andres Freund
Hi,

On 2016-10-27 21:56:53 -0400, Peter Eisentraut wrote:
> Currently, initdb parses locale -a output to populate pg_collation.  If
> additional collations are installed in the operating system, it is not
> possible to repeat this process except by doing each step manually.  So I
> propose to move this to a backend function that can be called
> separately, and have initdb call that.  Running this logic in the
> backend instead of initdb also makes the code simpler.  If we add other
> collation providers such as ICU, initdb doesn't need to know about that
> at all, because all the logic would be contained in the backend.

That generally sounds like a good idea.  There are some questions, IMO:
E.g. what if previously present collations are now unavailable?

> I thought about making this a top-level command (IMPORT COLLATIONS ...
> ?) but decided against it for now, to keep it simple.

Seems ok to me.

>  
> 	/*
> 	 * Also forbid matching an any-encoding entry.  This test of course is not
> 	 * backed up by the unique index, but it's not a problem since we don't
> 	 * support adding any-encoding entries after initdb.
> 	 */

Note that this isn't true anymore...

> +
> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
> +
> +Datum
> +pg_import_system_collations(PG_FUNCTION_ARGS)
> +{

Uh?

> +	bool		if_not_exists = PG_GETARG_BOOL(0);
> +	Oid nspid = PG_GETARG_OID(1);
> +
> +	FILE	   *locale_a_handle;
> +	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
> +	int			count = 0;
> +
> + locale_a_handle = OpenPipeStream("locale -a", "r");
> + if (locale_a_handle == NULL)
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not execute command \"%s\": %m",
> + "locale -a")));

This function needs to have permissions revoked for non-superusers, which
AFAICS it currently doesn't.


Greetings,

Andres Freund




[HACKERS] move collation import to backend

2016-10-27 Thread Peter Eisentraut
Currently, initdb parses locale -a output to populate pg_collation.  If
additional collations are installed in the operating system, it is not
possible to repeat this process except by doing each step manually.  So I
propose to move this to a backend function that can be called
separately, and have initdb call that.  Running this logic in the
backend instead of initdb also makes the code simpler.  If we add other
collation providers such as ICU, initdb doesn't need to know about that
at all, because all the logic would be contained in the backend.

Here is an example:

select pg_import_system_collations(if_not_exists => false, schema =>
'test');

(Specifying the schema also allows testing this without overwriting
pg_catalog.)

I thought about making this a top-level command (IMPORT COLLATIONS ...
?) but decided against it for now, to keep it simple.  Right now, this
is more of a refactoring.  Documentation could be added if we decide so.
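
For readers unfamiliar with the parsing step being moved, here is a rough standalone sketch of reading one locale name per line, as "locale -a" prints them. It is not the patch code: count_usable_locales and NAME_LEN are hypothetical stand-ins, skipping names with non-ASCII bytes is an assumption of this sketch, and unlike the fgets() loop in the patch it also discards the remainder of an overlong line.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 64				/* stand-in for NAMEDATALEN */

/*
 * Read one locale name per line from "in" and count the usable ones.
 * A line that does not fit in the buffer is skipped in full, and so is
 * any name containing non-ASCII bytes.
 */
static int
count_usable_locales(FILE *in)
{
	char		buf[NAME_LEN];
	int			count = 0;

	while (fgets(buf, sizeof(buf), in))
	{
		size_t		len = strlen(buf);
		int			ascii = 1;

		if (len == 0 || buf[len - 1] != '\n')
		{
			/* overlong line: discard the rest of it */
			int			c;

			while ((c = fgetc(in)) != EOF && c != '\n')
				;
			continue;
		}
		buf[len - 1] = '\0';

		for (size_t i = 0; buf[i] != '\0'; i++)
			if ((unsigned char) buf[i] >= 128)
				ascii = 0;

		if (ascii && buf[0] != '\0')
			count++;
	}
	return count;
}
```

On a POSIX system a caller could pass the stream returned by popen("locale -a", "r"); the backend function in the patch uses OpenPipeStream() for the same purpose.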

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 97fb7f992b95d2ca8725011dc141dad88051a8cd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 13 Oct 2016 12:00:00 -0400
Subject: [PATCH] Add function to import operation system collations

Move this logic out of initdb into a user-callable function.  This
simplifies the code and makes it possible to update the standard
collations later on if additional operating system collations appear.
---
 src/backend/catalog/pg_collation.c    |  18 +++-
 src/backend/commands/collationcmds.c  | 146 +-
 src/bin/initdb/initdb.c               | 164 +-
 src/include/catalog/pg_collation_fn.h |   3 +-
 src/include/catalog/pg_proc.h         |   3 +
 5 files changed, 167 insertions(+), 167 deletions(-)

diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index f37cf37..cda64c4 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -41,7 +41,8 @@ Oid
 CollationCreate(const char *collname, Oid collnamespace,
 Oid collowner,
 int32 collencoding,
-const char *collcollate, const char *collctype)
+const char *collcollate, const char *collctype,
+bool if_not_exists)
 {
 	Relation	rel;
 	TupleDesc	tupDesc;
@@ -72,10 +73,21 @@ CollationCreate(const char *collname, Oid collnamespace,
 			  PointerGetDatum(collname),
 			  Int32GetDatum(collencoding),
 			  ObjectIdGetDatum(collnamespace)))
-		ereport(ERROR,
+	{
+		if (if_not_exists)
+		{
+			ereport(NOTICE,
 (errcode(ERRCODE_DUPLICATE_OBJECT),
- errmsg("collation \"%s\" for encoding \"%s\" already exists",
+ errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping",
 		collname, pg_encoding_to_char(collencoding))));
+			return InvalidOid;
+		}
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("collation \"%s\" for encoding \"%s\" already exists",
+			collname, pg_encoding_to_char(collencoding))));
+	}
 
 	/*
 	 * Also forbid matching an any-encoding entry.  This test of course is not
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 9bba748..062e3b6 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -136,7 +136,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters)
 			 GetUserId(),
 			 GetDatabaseEncoding(),
 			 collcollate,
-			 collctype);
+			 collctype,
+			 false);
+
+	if (!newoid)
+		return InvalidObjectAddress;
 
 	ObjectAddressSet(address, CollationRelationId, newoid);
 
@@ -177,3 +181,143 @@ IsThereCollationInNamespace(const char *collname, Oid nspOid)
  errmsg("collation \"%s\" already exists in schema \"%s\"",
 		collname, get_namespace_name(nspOid))));
 }
+
+
+/*
+ * "Normalize" a locale name, stripping off encoding tags such as
+ * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
+ * -> "br_FR@euro").  Return true if a new, different name was
+ * generated.
+ */
+static bool
+normalize_locale_name(char *new, const char *old)
+{
+	char	   *n = new;
+	const char *o = old;
+	bool		changed = false;
+
+	while (*o)
+	{
+		if (*o == '.')
+		{
+			/* skip over encoding tag such as ".utf8" or ".UTF-8" */
+			o++;
+			while ((*o >= 'A' && *o <= 'Z')
+				   || (*o >= 'a' && *o <= 'z')
+				   || (*o >= '0' && *o <= '9')
+				   || (*o == '-'))
+				o++;
+			changed = true;
+		}
+		else
+			*n++ = *o++;
+	}
+	*n = '\0';
+
+	return changed;
+}
+
+
+Datum pg_import_system_collations(PG_FUNCTION_ARGS);
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{
+	bool		if_not_exists = PG_GETARG_BOOL(0);
+	Oid nspid = PG_GETARG_OID(1);
+
+	FILE	   *locale_a_handle;
+	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
+	int			count = 0;
+
+	locale_a_handle = OpenPipeStream("locale -a", "r");
+	if (locale_a