In ICU 54 and earlier, if ucol_open() is unable to find a matching
locale, it falls back to the default locale, which is taken from the
*environment*.

Using ICU 54:

  initdb -D data -N --locale="en_US.UTF-8"
  pg_ctl -D data -l logfile start
  psql postgres -c "create collation asdf(provider=icu, locale='asdf')"
  # returns true
  psql postgres -c "select 'abc' collate asdf < 'ABC' collate asdf"
  psql postgres -c "alter system set lc_messages='C'"
  pg_ctl -D data -l logfile restart
  # returns false and warns about collation version mismatch
  psql postgres -c "select 'abc' collate asdf < 'ABC' collate asdf"

ICU 55 fixed this by falling back to the root locale instead[1], which
is stable, has a collator version, and does not depend on the
environment. As far as I can tell, ICU 55 and later never fall back to
the environment when opening a collator (unless you explicitly pass
NULL to ucol_open(), which is documented).

It would be nice if we could detect when this fallback-to-environment
happens, so that we could just refuse to create the bogus collation.
But I didn't find a good way. There are non-error return codes from
ucol_open() that seem promising[2], but as far as I can tell they
don't reliably distinguish the fallback-to-environment case.

Unless someone has a better idea, I think we need to bump the minimum
required ICU version to 55. That would solve the issue in v16 and
later, but those using old versions of ICU and old versions of postgres
would still be vulnerable to these kinds of typos.
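
For anyone who wants to check which side of the ICU 55 boundary an
installation is on, here is a rough sketch. It assumes pkg-config and
ICU's icu-i18n module are installed, and that they describe the same
ICU that postgres was built against, which may not hold on every
system. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of postgres or ICU): succeeds if the
# given ICU version string has a major version of 55 or later.
icu_falls_back_to_root() {
    major=${1%%.*}                  # "54.1" -> "54"
    case $major in
        ''|*[!0-9]*) return 2 ;;    # not a plain number: cannot decide
    esac
    [ "$major" -ge 55 ]
}

# Assumption: pkg-config knows about icu-i18n. If it does not, this
# part is silently skipped.
if command -v pkg-config >/dev/null 2>&1 &&
   ver=$(pkg-config --modversion icu-i18n 2>/dev/null); then
    if icu_falls_back_to_root "$ver"; then
        echo "ICU $ver: unknown locales should fall back to root"
    else
        echo "ICU $ver: unknown locales may fall back to the environment"
    fi
fi
```

This only looks at the version number. It does not detect whether any
existing collation was actually created from a mistyped locale.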

Regards,
        Jeff Davis


[1] https://icu.unicode.org/download/55m1
[2] https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/utypes_8h.html#a3343c1c8a8377277046774691c98d78c

