Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

Peter Eisentraut Fri, 22 Sep 2017 11:35:42 -0700

On 9/21/17 16:55, Peter Geoghegan wrote:
>> I strongly prefer if there, as much as possible, is only one format for
>> inputting ICU locales.
> I agree, but the bigger issue is that we're *half way* between
> supporting only one format, and supporting two formats. AFAICT, there
> is no reason that we can't simply support one format on all ICU
> versions, and keep what ends up within pg_collation at initdb time
> essentially the same across ICU versions (except for those that are
> due to cultural/political developments).


After reviewing this thread and testing around a bit, I think the code
is mostly fine as it is, but the documentation is lacking.  Hence,
attached is a patch to expand the documentation quite a bit, especially
to document in more detail what ICU locale strings are accepted.

The documentation has always stated, albeit perhaps in obscure ways,
that we accept for locales whatever ICU accepts.  I don't think we
should restrict or override that in any way.  That would cause existing
documentation and lore on ICU to be invalid for PostgreSQL.

I specifically disagree that we should, as appears to be suggested here,
restrict the input locale strings to BCP 47 and reject or transform the
traditional ICU-specific locale syntax.  Again, that would cause
existing ICU documentation material to become less applicable to
PostgreSQL.  And basically, there is no reason for it, because I am not
aware that ICU plans to get rid of that syntax.  Moreover, we need to
support that syntax for older ICU versions anyway.  A patch has been
posted that, as I understand it, would allow BCP 47 syntax to be used
with older versions as well.  That's a nice idea, but it's a new feature
and not appropriate for PG10 at this time.

(Note that we also don't canonicalize libc locale names.)

The attached documentation patch documents both locale naming forms in
parallel.

The other attached patch contains a test suite that verifies that the
examples in the documentation actually work.  It's not meant for
committing into the mainline, but it was useful for me.

During testing with various versions I have also discovered the
following things:

- The current code appears to be of the opinion that BCP 47 locale names
are accepted as of ICU 54.  That appears to be incorrect; I find that
they work fine in ICU 52, but they don't work in ICU 4.2.  I don't know
where the cutoff is.  That might be worth changing in the code if we can
obtain more information.

- What might have led to the above mistake is the following in the
ucol_open() documentation: q{Starting with ICU 54, collation attributes
can be specified via locale keywords as well, in the old locale
extension syntax ("el@colCaseFirst=upper") or in language tag syntax
("el-u-kf-upper").}  That is correct.  If you use that syntax in earlier
versions, the case-first specification in this example is ignored.  You
need to use ucol_setAttribute() then.  (Note that the phonebook stuff
still works, because that is not a "collation attribute".)

(One of my plans for PG11 had been to allow specifying such collation
attributes via additional CREATE COLLATION clauses and pg_collation
columns, but that might be obsolete now.)

- Moreover, it is not the case that ICU accepts just any sort of
nonsense as a locale name.  For example, "el@colCaseFirst=whatever" is
rejected with U_ILLEGAL_ARGUMENT_ERROR.  Now, it might in other cases be
more liberal than we might be hoping for.  But I think they have reasons
for what they do.

One conclusion here is that there are multiple dimensions to what sort
of thing is accepted as a locale in different versions of ICU and what
the canonical format is.  If we insist that everything has to be written
in the form that is preferred today, then we'll have awkward problems if
a future version of ICU establishes even more variants that will then be
preferred.

I think there is room for incremental refinement here.  Features like
optionally checking or printing the canonical or preferred locale format
or making the locale description available via a function or system view
might all be valuable additions to a future PostgreSQL release.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 5eac8a0df7163f8374382d37b32b9c2d3580238d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <[email protected]>
Date: Fri, 22 Sep 2017 13:51:01 -0400
Subject: [PATCH 1/2] Expand collation documentation

Document better how to create custom collations and what locale strings
ICU accepts.  Explain the ICU examples in more detail.  Also update the
text on the CREATE COLLATION reference page a bit to take ICU more into
account.
---
 doc/src/sgml/charset.sgml              | 135 ++++++++++++++++++++++++++-------
 doc/src/sgml/ref/create_collation.sgml |  28 ++++---
 2 files changed, 124 insertions(+), 39 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 44e43503a6..63f7de5b43 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -515,7 +515,7 @@ <title>Managing Collations</title>
    <para>
     A collation object provided by <literal>libc</literal> maps to a
     combination of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>
-    settings.  (As
+    settings, as accepted by the <literal>setlocale()</literal> system library 
call.  (As
     the name would suggest, the main purpose of a collation is to set
     <symbol>LC_COLLATE</symbol>, which controls the sort order.  But
     it is rarely necessary in practice to have an
@@ -640,21 +640,19 @@ <title>libc collations</title>
    <title>ICU collations</title>
 
    <para>
-    Collations provided by ICU are created with names in BCP 47 language tag
+    With ICU, it is not sensible to enumerate all possible locale names.  ICU
+    uses a particular naming system for locales, but there are many more ways
+    to name a locale than there are actually distinct locales.
+    <command>initdb</command> uses the ICU APIs to extract a set of distinct
+    locales to populate the initial set of collations.  Collations provided by
+    ICU are created in the SQL environment with names in BCP 47 language tag
     format, with a <quote>private use</quote>
     extension <literal>-x-icu</literal> appended, to distinguish them from
-    libc locales.  So <literal>de-x-icu</literal> would be an example name.
+    libc locales.
    </para>
 
    <para>
-    With ICU, it is not sensible to enumerate all possible locale names.  ICU
-    uses a particular naming system for locales, but there are many more ways
-    to name a locale than there are actually distinct locales.  (In fact, any
-    string will be accepted as a locale name.)
-    See <ulink url="http://userguide.icu-project.org/locale";></ulink> for
-    information on ICU locale naming.  <command>initdb</command> uses the ICU
-    APIs to extract a set of distinct locales to populate the initial set of
-    collations.  Here are some example collations that might be created:
+    Here are some example collations that might be created:
 
     <variablelist>
      <varlistentry>
@@ -695,32 +693,104 @@ <title>ICU collations</title>
     will draw an error along the lines of <quote>collation "de-x-icu" for
     encoding "WIN874" does not exist</>.
    </para>
+  </sect4>
+  </sect3>
+
+  <sect3 id="collation-create">
+   <title>Creating New Collation Objects</title>
+
+   <para>
+    If the standard and predefined collations are not sufficient, users can
+    create their own collation objects using the SQL
+    command <xref linkend="sql-createcollation">.
+   </para>
+
+   <para>
+    The standard and predefined collations are in the
+    schema <literal>pg_catalog</literal>, like all predefined objects.
+    User-defined collations should be created in user schemas.  This also
+    ensures that they are saved by <command>pg_dump</command>.
+   </para>
+
+   <sect4>
+    <title>libc collations</title>
+
+    <para>
+     New libc collations can be created like this:
+<programlisting>
+CREATE COLLATION german (provider = libc, locale = 'de_DE');
+</programlisting>
+     The exact values that are acceptable for the <literal>locale</literal>
+     clause in this command depend on the operating system.  On Unix-like
+     systems, the command <literal>locale -a</literal> will show a list.
+    </para>
+
+    <para>
+     Since the predefined libc collations already include all collations
+     defined in the operating system when the database instance is
+     initialized, it is not often necessary to manually create new ones.
+     Reasons might be if a different naming system is desired (in which case
+     see also <xref linkend="collation-copy">) or if the operating system has
+     been upgraded to provide new locale definitions (in which case see
+     also <link 
linkend="functions-admin-collation"><function>pg_import_system_collations()</function></link>).
+    </para>
+   </sect4>
+
+   <sect4>
+    <title>ICU collations</title>
 
    <para>
     ICU allows collations to be customized beyond the basic language+country
     set that is preloaded by <command>initdb</command>.  Users are encouraged
     to define their own collation objects that make use of these facilities to
-    suit the sorting behavior to their requirements.  Here are some examples:
+    suit the sorting behavior to their requirements.
+    See <ulink url="http://userguide.icu-project.org/locale";></ulink>
+    and <ulink url="http://userguide.icu-project.org/collation/api";></ulink> 
for
+    information on ICU locale naming.  The set of acceptable names and
+    attributes depends on the particular ICU version.
+   </para>
+
+   <para>
+    Here are some examples:
 
     <variablelist>
      <varlistentry>
-      <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, 
locale = 'de-u-co-phonebk')</literal></term>
+      <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, 
locale = 'de-u-co-phonebk');</literal></term>
+      <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, 
locale = 'de@collation=phonebook');</literal></term>
       <listitem>
        <para>German collation with phone book collation type</para>
+       <para>
+        The first example selects the ICU locale using a <quote>language
+        tag</quote> per BCP 47.  The second example uses the traditional
+        ICU-specific locale syntax.  The first style is preferred going
+        forward, but it is not supported by older ICU versions.
+       </para>
+       <para>
+        Note that you can name the collation objects in the SQL environment
+        anything you want.  In this example, we follow the naming style that
+        the predefined collations use, which in turn also follow BCP 47, but
+        that is not required for user-defined collations.
+       </para>
       </listitem>
      </varlistentry>
 
      <varlistentry>
-      <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, 
locale = 'und-u-co-emoji')</literal></term>
+      <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, 
locale = 'und-u-co-emoji');</literal></term>
+      <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, 
locale = '@collation=emoji');</literal></term>
       <listitem>
        <para>
         Root collation with Emoji collation type, per Unicode Technical 
Standard #51
        </para>
+       <para>
+        Observe how in the traditional ICU locale naming system, the root
+        locale is selected by an empty string.
+       </para>
       </listitem>
      </varlistentry>
 
      <varlistentry>
-      <term><literal>CREATE COLLATION digitslast (provider = icu, locale = 
'en-u-kr-latn-digit')</literal></term>
+      <term><literal>CREATE COLLATION digitslast (provider = icu, locale = 
'en-u-kr-latn-digit');</literal></term>
+      <term><literal>CREATE COLLATION digitslast (provider = icu, locale = 
'en@colReorder=latn-digit');</literal></term>
       <listitem>
        <para>
         Sort digits after Latin letters.  (The default is digits before 
letters.)
@@ -729,7 +799,8 @@ <title>ICU collations</title>
      </varlistentry>
 
      <varlistentry>
-      <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 
'en-u-kf-upper')</literal></term>
+      <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 
'en-u-kf-upper');</literal></term>
+      <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 
'en@colCaseFirst=upper');</literal></term>
       <listitem>
        <para>
         Sort upper-case letters before lower-case letters.  (The default is
@@ -739,7 +810,8 @@ <title>ICU collations</title>
      </varlistentry>
 
      <varlistentry>
-      <term><literal>CREATE COLLATION special (provider = icu, locale = 
'en-u-kf-upper-kr-latn-digit')</literal></term>
+      <term><literal>CREATE COLLATION special (provider = icu, locale = 
'en-u-kf-upper-kr-latn-digit');</literal></term>
+      <term><literal>CREATE COLLATION special (provider = icu, locale = 
'en@colCaseFirst=upper;colReorder=latn-digit');</literal></term>
       <listitem>
        <para>
         Combines both of the above options.
@@ -748,7 +820,8 @@ <title>ICU collations</title>
      </varlistentry>
 
      <varlistentry>
-      <term><literal>CREATE COLLATION numeric (provider = icu, locale = 
'en-u-kn-true')</literal></term>
+      <term><literal>CREATE COLLATION numeric (provider = icu, locale = 
'en-u-kn-true');</literal></term>
+      <term><literal>CREATE COLLATION numeric (provider = icu, locale = 
'en@colNumeric=yes');</literal></term>
       <listitem>
        <para>
         Numeric ordering, sorts sequences of digits by their numeric value,
@@ -768,7 +841,8 @@ <title>ICU collations</title>
     repository</ulink>.
     The <ulink url="https://ssl.icu-project.org/icu-bin/locexp";>ICU Locale
     Explorer</ulink> can be used to check the details of a particular locale
-    definition.
+    definition.  The examples using the <literal>k*</literal> subtags require
+    at least ICU version 54.
    </para>
 
    <para>
@@ -779,10 +853,21 @@ <title>ICU collations</title>
     strings that compare equal according to the collation but are not
     byte-wise equal will be sorted according to their byte values.
    </para>
+
+   <note>
+    <para>
+     By design, ICU will accept almost any string as a locale name and match
+     it to the closet locale it can provide, using the fallback procedure
+     described in its documentation.  Thus, there will be no direct feedback
+     if a collation specification is composed using features that the given
+     ICU installation does not actually support.  It is therefore recommended
+     to create application-level test cases to check that the collation
+     definitions satisfy one's requirements.
+    </para>
+   </note>
    </sect4>
-   </sect3>
 
-   <sect3>
+   <sect4 id="collation-copy">
    <title>Copying Collations</title>
 
    <para>
@@ -796,13 +881,7 @@ <title>Copying Collations</title>
 CREATE COLLATION french FROM "fr-x-icu";
 </programlisting>
    </para>
-
-   <para>
-    The standard and predefined collations are in the
-    schema <literal>pg_catalog</literal>, like all predefined objects.
-    User-defined collations should be created in user schemas.  This also
-    ensures that they are saved by <command>pg_dump</command>.
-   </para>
+   </sect4>
    </sect3>
   </sect2>
  </sect1>
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 2d3e050545..f88758095f 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -93,10 +93,7 @@ <title>Parameters</title>
      <listitem>
       <para>
        Use the specified operating system locale for
-       the <symbol>LC_COLLATE</symbol> locale category.  The locale
-       must be applicable to the current database encoding.
-       (See <xref linkend="sql-createdatabase"> for the precise
-       rules.)
+       the <symbol>LC_COLLATE</symbol> locale category.
       </para>
      </listitem>
     </varlistentry>
@@ -107,10 +104,7 @@ <title>Parameters</title>
      <listitem>
       <para>
        Use the specified operating system locale for
-       the <symbol>LC_CTYPE</symbol> locale category.  The locale
-       must be applicable to the current database encoding.
-       (See <xref linkend="sql-createdatabase"> for the precise
-       rules.)
+       the <symbol>LC_CTYPE</symbol> locale category.
       </para>
      </listitem>
     </varlistentry>
@@ -173,8 +167,13 @@ <title>Notes</title>
   </para>
 
   <para>
-   See <xref linkend="collation"> for more information about collation
-   support in PostgreSQL.
+   See <xref linkend="collation-create"> for more information on how to create 
collations.
+  </para>
+
+  <para>
+   When using the <literal>libc</literal> collation provider, the locale must
+   be applicable to the current database encoding.
+   See <xref linkend="sql-createdatabase"> for the precise rules.
   </para>
  </refsect1>
 
@@ -186,7 +185,14 @@ <title>Examples</title>
    <literal>fr_FR.utf8</literal>
    (assuming the current database encoding is <literal>UTF8</literal>):
 <programlisting>
-CREATE COLLATION french (LOCALE = 'fr_FR.utf8');
+CREATE COLLATION french (locale = 'fr_FR.utf8');
+</programlisting>
+  </para>
+
+  <para>
+   To create a collation using the ICU provider using German phone book sort 
order:
+<programlisting>
+CREATE COLLATION german_phonebook (provider = icu, locale = 'de-u-co-phonebk');
 </programlisting>
   </para>
 
-- 
2.14.1

From 6e454f63b9730f10cf2502dc80a8d3034d7bc690 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <[email protected]>
Date: Fri, 22 Sep 2017 13:52:28 -0400
Subject: [PATCH 2/2] Test cases for advanced ICU features

---
 src/test/regress/expected/collate_icu_advanced.out | 257 +++++++++++++++++++++
 src/test/regress/sql/collate_icu_advanced.sql      |  49 ++++
 2 files changed, 306 insertions(+)
 create mode 100644 src/test/regress/expected/collate_icu_advanced.out
 create mode 100644 src/test/regress/sql/collate_icu_advanced.sql

diff --git a/src/test/regress/expected/collate_icu_advanced.out 
b/src/test/regress/expected/collate_icu_advanced.out
new file mode 100644
index 0000000000..46260db53b
--- /dev/null
+++ b/src/test/regress/expected/collate_icu_advanced.out
@@ -0,0 +1,257 @@
+SET client_encoding TO UTF8;
+CREATE TABLE names (name text);
+INSERT INTO names VALUES ('Göbel'), ('Goethe'), ('Goldmann'), ('Göthe'), 
('Götz');
+CREATE COLLATION "de-u-co-phonebk-x-icu/1" (provider = icu, locale = 
'de-u-co-phonebk');
+CREATE COLLATION "de-u-co-phonebk-x-icu/2" (provider = icu, locale = 
'de@collation=phonebook');
+SELECT name FROM names ORDER BY name COLLATE "de-x-icu";
+   name   
+----------
+ Göbel
+ Goethe
+ Goldmann
+ Göthe
+ Götz
+(5 rows)
+
+SELECT name FROM names ORDER BY name COLLATE "de-u-co-phonebk-x-icu/1";
+   name   
+----------
+ Göbel
+ Goethe
+ Göthe
+ Götz
+ Goldmann
+(5 rows)
+
+SELECT name FROM names ORDER BY name COLLATE "de-u-co-phonebk-x-icu/2";
+   name   
+----------
+ Göbel
+ Goethe
+ Göthe
+ Götz
+ Goldmann
+(5 rows)
+
+CREATE COLLATION "und-u-co-emoji-x-icu/1" (provider = icu, locale = 
'und-u-co-emoji');
+CREATE COLLATION "und-u-co-emoji-x-icu/2" (provider = icu, locale = 
'@collation=emoji');
+SELECT chr(x) FROM generate_series(x'1F634'::int, x'1F644'::int) AS _(x) ORDER 
BY chr(x) COLLATE "und-x-icu";
+ chr 
+-----
+ 😴
+ 😵
+ 😶
+ 😷
+ 😸
+ 😹
+ 😺
+ 😻
+ 😼
+ 😽
+ 😾
+ 😿
+ 🙀
+ 🙁
+ 🙂
+ 🙃
+ 🙄
+(17 rows)
+
+SELECT chr(x) FROM generate_series(x'1F634'::int, x'1F644'::int) AS _(x) ORDER 
BY chr(x) COLLATE "und-u-co-emoji-x-icu/1";
+ chr 
+-----
+ 🙂
+ 😶
+ 🙄
+ 😴
+ 🙃
+ 🙁
+ 😵
+ 😷
+ 😺
+ 😸
+ 😹
+ 😻
+ 😼
+ 😽
+ 🙀
+ 😿
+ 😾
+(17 rows)
+
+SELECT chr(x) FROM generate_series(x'1F634'::int, x'1F644'::int) AS _(x) ORDER 
BY chr(x) COLLATE "und-u-co-emoji-x-icu/2";
+ chr 
+-----
+ 🙂
+ 😶
+ 🙄
+ 😴
+ 🙃
+ 🙁
+ 😵
+ 😷
+ 😺
+ 😸
+ 😹
+ 😻
+ 😼
+ 😽
+ 🙀
+ 😿
+ 😾
+(17 rows)
+
+CREATE TABLE test1 (x text);
+INSERT INTO test1 VALUES ('1'), ('12'), ('123'), ('2'), ('21'), ('a'), ('b'), 
('c'), ('A'), ('B'), ('C');
+CREATE COLLATION digitslast1 (provider = icu, locale = 'en-u-kr-latn-digit');
+CREATE COLLATION digitslast2 (provider = icu, locale = 
'en@colReorder=latn-digit');
+SELECT * FROM test1 ORDER BY x COLLATE "und-x-icu";
+  x  
+-----
+ 1
+ 12
+ 123
+ 2
+ 21
+ a
+ A
+ b
+ B
+ c
+ C
+(11 rows)
+
+SELECT * FROM test1 ORDER BY x COLLATE digitslast1;
+  x  
+-----
+ a
+ A
+ b
+ B
+ c
+ C
+ 1
+ 12
+ 123
+ 2
+ 21
+(11 rows)
+
+SELECT * FROM test1 ORDER BY x COLLATE digitslast2;
+  x  
+-----
+ a
+ A
+ b
+ B
+ c
+ C
+ 1
+ 12
+ 123
+ 2
+ 21
+(11 rows)
+
+CREATE COLLATION upperfirst1 (provider = icu, locale = 'en-u-kf-upper');
+CREATE COLLATION upperfirst2 (provider = icu, locale = 
'en@colCaseFirst=upper');
+SELECT * FROM test1 ORDER BY x COLLATE upperfirst1;
+  x  
+-----
+ 1
+ 12
+ 123
+ 2
+ 21
+ A
+ a
+ B
+ b
+ C
+ c
+(11 rows)
+
+SELECT * FROM test1 ORDER BY x COLLATE upperfirst2;
+  x  
+-----
+ 1
+ 12
+ 123
+ 2
+ 21
+ A
+ a
+ B
+ b
+ C
+ c
+(11 rows)
+
+CREATE COLLATION special1 (provider = icu, locale = 
'en-u-kf-upper-kr-latn-digit');
+CREATE COLLATION special2 (provider = icu, locale = 
'en@colCaseFirst=upper;colReorder=latn-digit');
+SELECT * FROM test1 ORDER BY x COLLATE special1;
+  x  
+-----
+ A
+ a
+ B
+ b
+ C
+ c
+ 1
+ 12
+ 123
+ 2
+ 21
+(11 rows)
+
+SELECT * FROM test1 ORDER BY x COLLATE special2;
+  x  
+-----
+ A
+ a
+ B
+ b
+ C
+ c
+ 1
+ 12
+ 123
+ 2
+ 21
+(11 rows)
+
+CREATE COLLATION numeric1 (provider = icu, locale = 'en-u-kn-true');
+CREATE COLLATION numeric2 (provider = icu, locale = 'en@colNumeric=yes');
+SELECT * FROM test1 ORDER BY x COLLATE numeric1;
+  x  
+-----
+ 1
+ 2
+ 12
+ 21
+ 123
+ a
+ A
+ b
+ B
+ c
+ C
+(11 rows)
+
+SELECT * FROM test1 ORDER BY x COLLATE numeric2;
+  x  
+-----
+ 1
+ 2
+ 12
+ 21
+ 123
+ a
+ A
+ b
+ B
+ c
+ C
+(11 rows)
+
diff --git a/src/test/regress/sql/collate_icu_advanced.sql 
b/src/test/regress/sql/collate_icu_advanced.sql
new file mode 100644
index 0000000000..4d83644da6
--- /dev/null
+++ b/src/test/regress/sql/collate_icu_advanced.sql
@@ -0,0 +1,49 @@
+SET client_encoding TO UTF8;
+
+CREATE TABLE names (name text);
+
+INSERT INTO names VALUES ('Göbel'), ('Goethe'), ('Goldmann'), ('Göthe'), 
('Götz');
+
+CREATE COLLATION "de-u-co-phonebk-x-icu/1" (provider = icu, locale = 
'de-u-co-phonebk');
+CREATE COLLATION "de-u-co-phonebk-x-icu/2" (provider = icu, locale = 
'de@collation=phonebook');
+
+SELECT name FROM names ORDER BY name COLLATE "de-x-icu";
+SELECT name FROM names ORDER BY name COLLATE "de-u-co-phonebk-x-icu/1";
+SELECT name FROM names ORDER BY name COLLATE "de-u-co-phonebk-x-icu/2";
+
+
+CREATE COLLATION "und-u-co-emoji-x-icu/1" (provider = icu, locale = 
'und-u-co-emoji');
+CREATE COLLATION "und-u-co-emoji-x-icu/2" (provider = icu, locale = 
'@collation=emoji');
+
+SELECT chr(x) FROM generate_series(x'1F634'::int, x'1F644'::int) AS _(x) ORDER 
BY chr(x) COLLATE "und-x-icu";
+SELECT chr(x) FROM generate_series(x'1F634'::int, x'1F644'::int) AS _(x) ORDER 
BY chr(x) COLLATE "und-u-co-emoji-x-icu/1";
+SELECT chr(x) FROM generate_series(x'1F634'::int, x'1F644'::int) AS _(x) ORDER 
BY chr(x) COLLATE "und-u-co-emoji-x-icu/2";
+
+
+CREATE TABLE test1 (x text);
+INSERT INTO test1 VALUES ('1'), ('12'), ('123'), ('2'), ('21'), ('a'), ('b'), 
('c'), ('A'), ('B'), ('C');
+
+CREATE COLLATION digitslast1 (provider = icu, locale = 'en-u-kr-latn-digit');
+CREATE COLLATION digitslast2 (provider = icu, locale = 
'en@colReorder=latn-digit');
+
+SELECT * FROM test1 ORDER BY x COLLATE "und-x-icu";
+SELECT * FROM test1 ORDER BY x COLLATE digitslast1;
+SELECT * FROM test1 ORDER BY x COLLATE digitslast2;
+
+CREATE COLLATION upperfirst1 (provider = icu, locale = 'en-u-kf-upper');
+CREATE COLLATION upperfirst2 (provider = icu, locale = 
'en@colCaseFirst=upper');
+
+SELECT * FROM test1 ORDER BY x COLLATE upperfirst1;
+SELECT * FROM test1 ORDER BY x COLLATE upperfirst2;
+
+CREATE COLLATION special1 (provider = icu, locale = 
'en-u-kf-upper-kr-latn-digit');
+CREATE COLLATION special2 (provider = icu, locale = 
'en@colCaseFirst=upper;colReorder=latn-digit');
+
+SELECT * FROM test1 ORDER BY x COLLATE special1;
+SELECT * FROM test1 ORDER BY x COLLATE special2;
+
+CREATE COLLATION numeric1 (provider = icu, locale = 'en-u-kn-true');
+CREATE COLLATION numeric2 (provider = icu, locale = 'en@colNumeric=yes');
+
+SELECT * FROM test1 ORDER BY x COLLATE numeric1;
+SELECT * FROM test1 ORDER BY x COLLATE numeric2;
-- 
2.14.1

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

Reply via email to