Re: Refactor to introduce pg_strcoll().

2023-03-05 Thread Thomas Munro
+/* Win32 does not have UTF-8, so we need to map to UTF-16 */

I wonder if this is still true.  I think in Windows 10+ you can enable
UTF-8 support.  Then could you use strcoll_l() directly?  I struggled
to understand that, but I am a simple Unix hobbit from the shire so I
dunno.  (Perhaps the *whole OS* has to be in that mode, so you might
have to do a runtime test?  This was discussed in another thread that
mostly left me confused[1].)

And that leads to another thought.  We have an old comment
"Unfortunately, there is no strncoll(), so ...".  Curiously, Windows
does actually have strncoll_l() (as do some other libcs out there).
So after skipping the expansion to wchar_t, one might think you could
avoid the extra copy required to nul-terminate the string (and hope
that it doesn't make an extra copy internally, far from given).
Unfortunately it seems to be defined in a strange way that doesn't
look like your pg_strncoll_XXX() convention: it has just one length
parameter, not one for each string.  That is, it's designed for
comparing prefixes of strings, not for working with
non-null-terminated strings.  I'm not entirely sure if the interface
makes sense at all!  Is it measuring in 'chars' or 'encoded
characters'?  I would guess the former, like strncpy() et al, but then
what does it mean if it chops a UTF-8 sequence in half?  And at a
higher level, if you wanted to use it for our purpose, you'd
presumably need Min(s1_len, s2_len), but I wonder if there are string
pairs that would sort in a different order if the collation algorithm
could see more characters after that?  For example, in Dutch "ij" is
sometimes treated like a letter that sorts differently than "i" + "j"
normally would, so if you arbitrarily chop that "j" off while
comparing common-length prefix you might get into trouble; likewise
for "aa" in Danish.  Perhaps these sorts of problems explain why it's
not in the standard (though I see it was at some point in some kind of
draft; I don't grok the C standards process enough to track down what
happened but WG20/WG14 draft N1027[2] clearly contains strncoll_l()
alongside the stuff that we know and use today).  Or maybe I'm
underthinking it.

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com
[2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1027.pdf




Re: Refactor to introduce pg_strcoll().

2022-11-09 Thread Jeff Davis
Attached some new patches. I think you were right that the API of
pg_strcoll() was strange before, so I changed it to:

  * pg_strcoll() takes nul-terminated arguments
  * pg_strncoll() takes arguments and lengths

The new pg_strcoll()/pg_strncoll() API in 0001 seems much more
reasonable to support in the long term, potentially with other callers.

Patch 0004 exports:

  * pg_collate_icu() is for ICU and takes arguments and lengths
  * pg_collate_libc() is for libc and takes nul-terminated arguments

for a tiny speedup because varstrfastcmp_locale() has both nul-
terminated arguments and their lengths.

On Fri, 2022-11-04 at 15:06 -0700, Jeff Davis wrote:
> But I think the complex hacks are just the transformation into UTF-16
> and calling of wcscoll(). If that's done from within pg_strcoll()
> (after my patch), then I think it just works, right?

I included a patch (0005) to enable varstrfastcmp_locale on windows
with a server encoding of UTF-8. I don't have a great way of testing
this, but it seems like it should work.

> I was also considering an optimization to use stack allocation for
> short strings when doing the non-UTF8 icu comparison. I am seeing
> some
> benefit there

I also included this optimization in 0003: try to use the stack for
reasonable values; allocate on the heap for larger strings. I think
it's helping a bit.

Patch 0002 helps a lot more: for the case of ICU with a non-UTF8 server
encoding, the speedup is something like 30%. The reason is that
icu_to_uchar() currently calls ucnv_toUChars() twice: once to determine
the precise output buffer size required, and then again once it has the
buffer ready. But it's easy to just size the destination buffer
conservatively, because the maximum space required is enough to hold
twice the number of UChars as there are input characters[1], plus the
terminating nul. That means we just call ucnv_toUChars() once, to do
the actual conversion.

My perf test was to use a quick hack to disable abbreviated keys (so
that it's doing more comparisons), and then just do a large ORDER BY
... COLLATE on a table with generated text. The text is random lorem
ipsum data, with some characters replaced by multibyte characters to
exercise some more interesting paths. If someone else has some more
sophisticated collation tests I'd be interested to try them.

Regards,
Jeff Davis

[1]
https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/ucnv_8h.html#ae1049fcb893783c860fe0f9d4da84939
From 29249d17b5580e87365c008b18c83db5528db37e Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 9 Nov 2022 08:58:08 -0800
Subject: [PATCH v3 1/5] Introduce pg_strcoll() and pg_strncoll().

Hide the special cases of the platform, collation provider, and
database encoding in pg_locale.c. Simplify varlena.c.
---
 src/backend/utils/adt/pg_locale.c | 263 ++
 src/backend/utils/adt/varlena.c   | 230 +-
 src/include/utils/pg_locale.h |   3 +
 3 files changed, 268 insertions(+), 228 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8..94dc64c2d0 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -79,6 +79,12 @@
 #include 
 #endif
 
+/*
+ * This should be large enough that most strings will fit, but small enough
+ * that we feel comfortable putting it on the stack
+ */
+#define		TEXTBUFLEN			1024
+
 #define		MAX_L10N_DATA		80
 
 
@@ -1731,6 +1737,263 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
+/*
+ * win32_utf8_wcscoll
+ *
+ * Convert UTF8 arguments to wide characters and invoke wcscoll() or
+ * wcscoll_l().
+ */
+#ifdef WIN32
+static int
+win32_utf8_wcscoll(const char *arg1, size_t len1, const char *arg2, size_t len2,
+   pg_locale_t locale)
+{
+	char		a1buf[TEXTBUFLEN];
+	char		a2buf[TEXTBUFLEN];
+	char	   *a1p,
+			   *a2p;
+	int			a1len;
+	int			a2len;
+	int			r;
+	int			result;
+
+	if (len1 >= TEXTBUFLEN / 2)
+	{
+		a1len = len1 * 2 + 2;
+		a1p = palloc(a1len);
+	}
+	else
+	{
+		a1len = TEXTBUFLEN;
+		a1p = a1buf;
+	}
+	if (len2 >= TEXTBUFLEN / 2)
+	{
+		a2len = len2 * 2 + 2;
+		a2p = palloc(a2len);
+	}
+	else
+	{
+		a2len = TEXTBUFLEN;
+		a2p = a2buf;
+	}
+
+	/* API does not work for zero-length input */
+	if (len1 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg1, len1,
+(LPWSTR) a1p, a1len / 2);
+		if (!r)
+			ereport(ERROR,
+	(errmsg("could not convert string to UTF-16: error code %lu",
+			GetLastError())));
+	}
+	((LPWSTR) a1p)[r] = 0;
+
+	if (len2 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg2, len2,
+(LPWSTR) a2p, a2len / 2);
+		if (!r)
+			ereport(ERROR,
+	(errmsg("could not convert string to UTF-16: error code %lu",
+			GetLastError())));
+	}
+	((LPWSTR) a2p)[r] = 0;
+
+	errno = 0;
+#ifdef HAVE_LOCALE_T
+	if (locale)
+		result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);

Re: Refactor to introduce pg_strcoll().

2022-11-04 Thread Jeff Davis
On Tue, 2022-11-01 at 13:36 +0100, Peter Eisentraut wrote:
> But I think putting the Windows UTF-8 code (win32_utf8_wcscoll())
> from 
> varstr_cmp() into pg_strcoll() might create future complications. 
> Normally, it would be up to varstr_sortsupport() to set up a 
> Windows/UTF-8 specific comparator function, but it says there it's
> not 
> implemented.  But someone who wanted to implement that would have to 
> lift it back out of pg_strcoll, or rearrange the code in some other
> way.

Is there a reason that it wouldn't work to just use
varstrfastcmp_locale() after my patch? The comment says:

  /*  
   * There is a further exception on Windows.  When the database  
   * encoding is UTF-8 and we are not using the C collation, complex  
   * hacks are required... 

But I think the complex hacks are just the transformation into UTF-16
and calling of wcscoll(). If that's done from within pg_strcoll()
(after my patch), then I think it just works, right?

I can't easily test on windows, so perhaps I'm missing something. Does
the buildfarm have enough coverage here? Is it reasonable to try a
commit that removes the:

  #ifdef WIN32
if (GetDatabaseEncoding() == PG_UTF8 &&
!(locale && locale->provider == COLLPROVIDER_ICU))
return;
  #endif

along with my patch and see what I get out of the buildfarm?

> Perhaps you have some follow-up work 
> planned, in which case it might be better to consider this in more 
> context. 

I was also considering an optimization to use stack allocation for
short strings when doing the non-UTF8 icu comparison. I am seeing some
benefit there, but it seems to depend on the size of the stack buffer.
That suggests that maybe TEXTBUFLEN is too large (1024) and perhaps we
should drop it down, but I need to investigate more.

In general, I'm trying to slowly get the locale stuff more isolated.

Regards,
Jeff Davis




Re: Refactor to introduce pg_strcoll().

2022-11-01 Thread Peter Eisentraut

On 15.10.22 01:00, Jeff Davis wrote:

> On Thu, 2022-10-13 at 10:57 +0200, Peter Eisentraut wrote:
> > It's a bit confusing that arguments must be NUL-terminated, but the
> > length is still specified.  Maybe another sentence to explain that
> > would be helpful.
> 
> Added a comment. It was a little frustrating to get a perfectly clean
> API, because the callers do some buffer manipulation and optimizations
> of their own. I think this is an improvement, but suggestions welcome.
> 
> If win32 is used with UTF-8 and wcscoll, it ends up allocating some
> extra stack space for the temporary buffers, whereas previously it used
> the buffers on the stack of varstr_cmp(). I'm not sure if that's a
> problem or not.


Refactoring the common code from varstr_cmp() and varstrfastcmp_locale() 
looks like a clear win.


But I think putting the Windows UTF-8 code (win32_utf8_wcscoll()) from 
varstr_cmp() into pg_strcoll() might create future complications. 
Normally, it would be up to varstr_sortsupport() to set up a 
Windows/UTF-8 specific comparator function, but it says there it's not 
implemented.  But someone who wanted to implement that would have to 
lift it back out of pg_strcoll, or rearrange the code in some other way. 
 It's not a clear win, I think.  Perhaps you have some follow-up work 
planned, in which case it might be better to consider this in more 
context.  Otherwise, I'd be tempted to leave it where it was in 
varstr_cmp().  (It could still be a separate function, to keep 
varstr_cmp() itself a bit smaller.)





Re: Refactor to introduce pg_strcoll().

2022-10-14 Thread Jeff Davis
On Thu, 2022-10-13 at 10:57 +0200, Peter Eisentraut wrote:
> It's a bit confusing that arguments must be NUL-terminated, but the 
> length is still specified.  Maybe another sentence to explain that
> would 
> be helpful.

Added a comment. It was a little frustrating to get a perfectly clean
API, because the callers do some buffer manipulation and optimizations
of their own. I think this is an improvement, but suggestions welcome.

If win32 is used with UTF-8 and wcscoll, it ends up allocating some
extra stack space for the temporary buffers, whereas previously it used
the buffers on the stack of varstr_cmp(). I'm not sure if that's a
problem or not.

> The length arguments ought to be of type size_t, I think.

Changed.

Thank you.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 4d5664552a8a86418a94c37fd4ab8ca3a665c1cd Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Thu, 6 Oct 2022 10:46:36 -0700
Subject: [PATCH] Refactor: introduce pg_strcoll().

Isolate collation routines into pg_locale.c and simplify varlena.c.
---
 src/backend/utils/adt/pg_locale.c | 180 ++
 src/backend/utils/adt/varlena.c   | 207 +-
 src/include/utils/pg_locale.h |   3 +-
 3 files changed, 184 insertions(+), 206 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8..3eb6a67bdc 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1639,6 +1639,186 @@ pg_newlocale_from_collation(Oid collid)
 	return cache_entry->locale;
 }
 
+/*
+ * win32_utf8_wcscoll
+ *
+ * Convert UTF8 arguments to wide characters and invoke wcscoll() or
+ * wcscoll_l(). Will allocate on large input.
+ */
+#ifdef WIN32
+#define TEXTBUFLEN		1024
+static int
+win32_utf8_wcscoll(const char *arg1, size_t len1, const char *arg2,
+   size_t len2, pg_locale_t locale)
+{
+	char		a1buf[TEXTBUFLEN];
+	char		a2buf[TEXTBUFLEN];
+	char	   *a1p,
+			   *a2p;
+	int			a1len;
+	int			a2len;
+	int			r;
+	int			result;
+
+	if (len1 >= TEXTBUFLEN / 2)
+	{
+		a1len = len1 * 2 + 2;
+		a1p = palloc(a1len);
+	}
+	else
+	{
+		a1len = TEXTBUFLEN;
+		a1p = a1buf;
+	}
+	if (len2 >= TEXTBUFLEN / 2)
+	{
+		a2len = len2 * 2 + 2;
+		a2p = palloc(a2len);
+	}
+	else
+	{
+		a2len = TEXTBUFLEN;
+		a2p = a2buf;
+	}
+
+	/* API does not work for zero-length input */
+	if (len1 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg1, len1,
+(LPWSTR) a1p, a1len / 2);
+		if (!r)
+			ereport(ERROR,
+	(errmsg("could not convert string to UTF-16: error code %lu",
+			GetLastError())));
+	}
+	((LPWSTR) a1p)[r] = 0;
+
+	if (len2 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg2, len2,
+(LPWSTR) a2p, a2len / 2);
+		if (!r)
+			ereport(ERROR,
+	(errmsg("could not convert string to UTF-16: error code %lu",
+			GetLastError())));
+	}
+	((LPWSTR) a2p)[r] = 0;
+
+	errno = 0;
+#ifdef HAVE_LOCALE_T
+	if (locale)
+		result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	else
+#endif
+		result = wcscoll((LPWSTR) a1p, (LPWSTR) a2p);
+	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw
+ * headers */
+		ereport(ERROR,
+(errmsg("could not compare Unicode strings: %m")));
+
+	if (a1p != a1buf)
+		pfree(a1p);
+	if (a2p != a2buf)
+		pfree(a2p);
+
+	return result;
+}
+#endif
+
+/*
+ * pg_strcoll
+ *
+ * Call ucol_strcollUTF8(), ucol_strcoll(), strcoll(), strcoll_l(), wcscoll(),
+ * or wcscoll_l() as appropriate for the given locale, platform, and database
+ * encoding. If the locale is not specified, use the database collation.
+ *
+ * Arguments must be NUL-terminated so they can be passed directly to
+ * strcoll(); but we also need the lengths to pass to ucol_strcoll().
+ *
+ * If the collation is deterministic, break ties with memcmp(), and then with
+ * the string length.
+ */
+int
+pg_strcoll(const char *arg1, size_t len1, const char *arg2, size_t len2,
+		   pg_locale_t locale)
+{
+	int result;
+
+#ifdef WIN32
+	/* Win32 does not have UTF-8, so we need to map to UTF-16 */
+	if (GetDatabaseEncoding() == PG_UTF8
+		&& (!locale || locale->provider == COLLPROVIDER_LIBC))
+	{
+		result = win32_utf8_wcscoll(arg1, len1, arg2, len2, locale);
+	}
+	else
+#endif			/* WIN32 */
+	if (locale)
+	{
+		if (locale->provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+#ifdef HAVE_UCOL_STRCOLLUTF8
+			if (GetDatabaseEncoding() == PG_UTF8)
+			{
+UErrorCode	status;
+
+status = U_ZERO_ERROR;
+result = ucol_strcollUTF8(locale->info.icu.ucol,
+		  arg1, len1,
+		  arg2, len2,
+		  &status);
+if (U_FAILURE(status))
+	ereport(ERROR,
+			(errmsg("collation failed: %s", u_errorName(status))));
+			}
+			else
+#endif
+			{
+int32_t		ulen1,
+			ulen2;
+UChar	   *uchar1,
+		   *uchar2;
+
+ulen1 = icu_to_uchar(&uchar1, arg1, len1);
+ulen2 = icu_to_uchar(&uchar2, arg2, len2);
+
+result = ucol_strcoll(locale->info

Re: Refactor to introduce pg_strcoll().

2022-10-13 Thread Peter Eisentraut

On 07.10.22 01:15, Jeff Davis wrote:

> + * Call ucol_strcollUTF8(), ucol_strcoll(), strcoll(), strcoll_l(), wcscoll(),
> + * or wcscoll_l() as appropriate for the given locale, platform, and database
> + * encoding. Arguments must be NUL-terminated. If the locale is not specified,
> + * use the database collation.
> + *
> + * If the collation is deterministic, break ties with memcmp(), and then with
> + * the string length.
> + */
> +int
> +pg_strcoll(const char *arg1, int len1, const char *arg2, int len2,
> +  pg_locale_t locale)


It's a bit confusing that arguments must be NUL-terminated, but the 
length is still specified.  Maybe another sentence to explain that would 
be helpful.


The length arguments ought to be of type size_t, I think.





Refactor to introduce pg_strcoll().

2022-10-06 Thread Jeff Davis
Refactors to isolate strcoll, wcscoll, and ucol_strcoll into
pg_locale.c which seems like a better place. Most of the buffer
manipulation and equality optimizations are still left in varlena.c.

Patch attached.

I'm not able to easily test on windows, but it should be close to
correct as I just moved some code around.

Is there a good way to look for regressions (perf or correctness) when
making changes in this area, especially on windows and/or with strange
collation rules? What I did doesn't look like a problem, but there's
always a chance of a regression.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 644ee9ac28652884d91f17f4fc7755104538d9f1 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Thu, 6 Oct 2022 10:46:36 -0700
Subject: [PATCH] Refactor: introduce pg_strcoll().

Isolate collation routines into pg_locale.c and simplify varlena.c.
---
 src/backend/utils/adt/pg_locale.c | 178 +
 src/backend/utils/adt/varlena.c   | 207 +-
 src/include/utils/pg_locale.h |   3 +-
 3 files changed, 182 insertions(+), 206 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8..696ff75201 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1639,6 +1639,184 @@ pg_newlocale_from_collation(Oid collid)
 	return cache_entry->locale;
 }
 
+/*
+ * win32_utf8_wcscoll
+ *
+ * Convert UTF8 arguments to wide characters and invoke wcscoll() or
+ * wcscoll_l(). Will allocate on large input.
+ */
+#ifdef WIN32
+#define TEXTBUFLEN		1024
+static int
+win32_utf8_wcscoll(const char *arg1, int len1, const char *arg2, int len2,
+   pg_locale_t locale)
+{
+	char		a1buf[TEXTBUFLEN];
+	char		a2buf[TEXTBUFLEN];
+	char	   *a1p,
+			   *a2p;
+	int			a1len;
+	int			a2len;
+	int			r;
+	int			result;
+
+	if (len1 >= TEXTBUFLEN / 2)
+	{
+		a1len = len1 * 2 + 2;
+		a1p = palloc(a1len);
+	}
+	else
+	{
+		a1len = TEXTBUFLEN;
+		a1p = a1buf;
+	}
+	if (len2 >= TEXTBUFLEN / 2)
+	{
+		a2len = len2 * 2 + 2;
+		a2p = palloc(a2len);
+	}
+	else
+	{
+		a2len = TEXTBUFLEN;
+		a2p = a2buf;
+	}
+
+	/* API does not work for zero-length input */
+	if (len1 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg1, len1,
+(LPWSTR) a1p, a1len / 2);
+		if (!r)
+			ereport(ERROR,
+	(errmsg("could not convert string to UTF-16: error code %lu",
+			GetLastError())));
+	}
+	((LPWSTR) a1p)[r] = 0;
+
+	if (len2 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg2, len2,
+(LPWSTR) a2p, a2len / 2);
+		if (!r)
+			ereport(ERROR,
+	(errmsg("could not convert string to UTF-16: error code %lu",
+			GetLastError())));
+	}
+	((LPWSTR) a2p)[r] = 0;
+
+	errno = 0;
+#ifdef HAVE_LOCALE_T
+	if (locale)
+		result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	else
+#endif
+		result = wcscoll((LPWSTR) a1p, (LPWSTR) a2p);
+	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw
+ * headers */
+		ereport(ERROR,
+(errmsg("could not compare Unicode strings: %m")));
+
+	if (a1p != a1buf)
+		pfree(a1p);
+	if (a2p != a2buf)
+		pfree(a2p);
+
+	return result;
+}
+#endif
+
+/*
+ * pg_strcoll
+ *
+ * Call ucol_strcollUTF8(), ucol_strcoll(), strcoll(), strcoll_l(), wcscoll(),
+ * or wcscoll_l() as appropriate for the given locale, platform, and database
+ * encoding. Arguments must be NUL-terminated. If the locale is not specified,
+ * use the database collation.
+ *
+ * If the collation is deterministic, break ties with memcmp(), and then with
+ * the string length.
+ */
+int
+pg_strcoll(const char *arg1, int len1, const char *arg2, int len2,
+		   pg_locale_t locale)
+{
+	int result;
+
+#ifdef WIN32
+	/* Win32 does not have UTF-8, so we need to map to UTF-16 */
+	if (GetDatabaseEncoding() == PG_UTF8
+		&& (!locale || locale->provider == COLLPROVIDER_LIBC))
+	{
+		result = win32_utf8_wcscoll(arg1, len1, arg2, len2, locale);
+	}
+	else
+#endif			/* WIN32 */
+	if (locale)
+	{
+		if (locale->provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+#ifdef HAVE_UCOL_STRCOLLUTF8
+			if (GetDatabaseEncoding() == PG_UTF8)
+			{
+UErrorCode	status;
+
+status = U_ZERO_ERROR;
+result = ucol_strcollUTF8(locale->info.icu.ucol,
+		  arg1, len1,
+		  arg2, len2,
+		  &status);
+if (U_FAILURE(status))
+	ereport(ERROR,
+			(errmsg("collation failed: %s", u_errorName(status))));
+			}
+			else
+#endif
+			{
+int32_t		ulen1,
+			ulen2;
+UChar	   *uchar1,
+		   *uchar2;
+
+ulen1 = icu_to_uchar(&uchar1, arg1, len1);
+ulen2 = icu_to_uchar(&uchar2, arg2, len2);
+
+result = ucol_strcoll(locale->info.icu.ucol,
+	  uchar1, ulen1,
+	  uchar2, ulen2);
+
+pfree(uchar1);
+pfree(uchar2);
+			}
+#else			/* not USE_ICU */
+			/* shouldn't happen */
+			elog(ERROR, "unsupported collprovider: %c", locale->provider);
+#endif			/* not USE_ICU */
+		}
+		else
+		{
+#ifdef H