I wrote:
> Marco Atzeri <marco.atz...@gmail.com> writes:
>> Building on Cygwin, latest 10 beta1 or head source,
>> make check fails as:
>> ...
>> performing post-bootstrap initialization ... 2017-05-31 23:23:22.214 
>> CEST [16860] FATAL:  collation "ja_JP" for encoding "EUC_JP" already exists

> Hmph.  Could we see the results of "locale -a | grep ja_JP" ?

Despite the lack of followup from the OP, I'm pretty troubled by this
report.  It shows that the reimplementation of OS collation data import
as pg_import_system_collations() is a whole lot more fragile than the
original coding.  We have never before trusted "locale -a" to not produce
duplicate outputs, not since the very beginning in 414c5a2e.  AFAICS,
the current coding has also lost the protections we added very shortly
after that in 853c1750f; and it has also lost the admittedly rather
arbitrary, but at least deterministic, preference order for conflicting
short aliases that was in the original initdb code.

I suppose the idea was to see whether we actually needed those defenses,
but since we have here a failure report after less than a month of beta,
it seems clear to me that we do.  I think we need to upgrade
pg_import_system_collations to have all the same logic that was there
before.

Now the hard part of that is that because pg_import_system_collations
isn't using a temporary staging table, but is just inserting directly
into pg_collation, there isn't any way for it to eliminate duplicates
unless it uses if_not_exists behavior all the time.  So there seem to
be two ways to proceed:

1. Drop pg_import_system_collations' if_not_exists argument and just
define it as adding any collations not already known in pg_collation.

2. Significantly rewrite it so that it de-dups the collation set by
hand before trying to insert into pg_collation.

#2 seems like a lot more work, but on the other hand, we might need
most of that logic anyway to get back deterministic alias handling.
However, since I cannot see any real-world use case at all for
if_not_exists = false, I figure we might as well do #1 and take
whatever simplification we can get that way.
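
For concreteness, here is a rough sketch of the kind of by-hand
de-duplication that #2 implies.  It is written in Python purely for
illustration (the real code would be C in the backend), and the function
names, the alias rule, and the tie-break order are all assumptions of
mine rather than what initdb actually did: each locale yields its full
name plus a short alias with the encoding part stripped, and when two
locales compete for the same (name, encoding) key, an exact name match
wins, otherwise the lexicographically smallest locale does.

```python
def strip_encoding(locale_name):
    """Drop the ".codeset" part of a locale name, keeping any "@modifier".

    Hypothetical alias rule, e.g. "ja_JP.eucjp" -> "ja_JP".
    """
    base, at, modifier = locale_name.partition("@")
    base = base.split(".", 1)[0]
    return base + at + modifier


def dedup_collations(locales):
    """De-duplicate (locale_name, encoding) pairs as "locale -a" might emit them.

    Returns a dict mapping (collname, encoding) -> chosen locale_name.
    Repeated input lines collapse to one entry, and conflicting short
    aliases are resolved the same way regardless of input order.
    """
    chosen = {}
    for locale_name, encoding in locales:
        # Each locale proposes its full name and its short alias.
        for collname in {locale_name, strip_encoding(locale_name)}:
            key = (collname, encoding)
            # Lower rank wins: exact name match first, then smallest locale.
            rank = (collname != locale_name, locale_name)
            current = chosen.get(key)
            if current is None or rank < (collname != current, current):
                chosen[key] = locale_name
    return chosen


if __name__ == "__main__":
    sample = [
        ("ja_JP.ujis", "EUC_JP"),
        ("ja_JP.eucjp", "EUC_JP"),
        ("ja_JP", "EUC_JP"),
        ("ja_JP", "EUC_JP"),  # duplicate line, as in the failing report
    ]
    for (collname, encoding), locale_name in sorted(dedup_collations(sample).items()):
        print(collname, encoding, locale_name)
```

With something of this shape, a duplicated "ja_JP" line produces a single
entry instead of an "already exists" failure, and the choice of which
locale backs a short alias no longer depends on the order of the input.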

I'm willing to do the legwork on this, but before I start, does
anyone have any ideas or objections?

                        regards, tom lane

