When comparing two text strings in varstr_cmp(), if the strcoll*() call returns 0, we use strcmp() as a tie-breaker to do a binary comparison, because strcoll() can return 0 for non-identical strings:

    varstr_cmp()
    {
        ...
        /*
         * In some locales strcoll() can claim that nonidentical strings are
         * equal.  Believing that would be bad news for a number of reasons,
         * so we follow Perl's lead and sort "equal" strings according to
         * strcmp().
         */
        if (result == 0)
            result = strcmp(a1p, a2p);
        ...
    }

But is this supposed to apply to ICU collations as well? If the collation provider is icu, the comparison is done using ucol_strcoll*(). I suspect that ucol_strcoll*() intentionally reports some characters as identical, so doing strcmp() may not make sense.

For example, if the two characters below are compared using ucol_strcollUTF8(), it returns 0, meaning the strings are identical:

Greek Oxia  : UTF-16 encoding : 0x1FFD
  (http://www.fileformat.info/info/unicode/char/1ffd/index.htm)
Greek Tonos : UTF-16 encoding : 0x0384
  (http://www.fileformat.info/info/unicode/char/0384/index.htm)

The characters are displayed like this:

    postgres=# select (U&'\+001FFD') , (U&'\+000384') collate ucatest;
     ?column? | ?column?
    ----------+----------
     ´        | ΄

(Although the characters in this example look similar, that might not be the reason they are treated as equal.)

Now, since ucol_strcoll*() returns 0, these strings are always compared using strcmp(), so 1FFD > 0384 returns true:

    create collation ucatest (locale = 'en_US.UTF8', provider = 'icu');

    postgres=# select (U&'\+001FFD') > (U&'\+000384') collate ucatest;
     ?column?
    ----------
     t

Whereas, if strcmp() is skipped for ICU collations:

    if (result == 0 &&
        !(mylocale && mylocale->provider == COLLPROVIDER_ICU))
        result = strcmp(a1p, a2p);
    ...

then the comparison using the ICU collation says they are identical strings:

    postgres=# select (U&'\+001FFD') > (U&'\+000384') collate ucatest;
     ?column?
    ----------
     f
    (1 row)

    postgres=# select (U&'\+001FFD') < (U&'\+000384') collate ucatest;
     ?column?
    ----------
     f
    (1 row)

    postgres=# select (U&'\+001FFD') <= (U&'\+000384') collate ucatest;
     ?column?
    ----------
     t

Now, I have verified that strcoll() returns true for 1FFD > 0384.
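
To isolate the two behaviours from the rest of varstr_cmp(), here is a minimal standalone sketch. Note that mock_icu_collate() and compare_strings() are hypothetical names made up for illustration, not PostgreSQL or ICU functions. The mock simply hard-codes the Oxia/Tonos equality observed above rather than calling ucol_strcollUTF8(), so it says nothing about how ICU treats other input.

```c
#include <assert.h>
#include <string.h>

/* UTF-8 encodings of the two code points discussed in this mail. */
#define GREEK_OXIA  "\xE1\xBF\xBD"  /* U+1FFD */
#define GREEK_TONOS "\xCE\x84"      /* U+0384 */

/* Returns 1 if s is exactly one of the two accent characters. */
static int
is_oxia_or_tonos(const char *s)
{
    return strcmp(s, GREEK_OXIA) == 0 || strcmp(s, GREEK_TONOS) == 0;
}

/*
 * Hypothetical stand-in for ucol_strcollUTF8(): reports the Oxia/Tonos
 * pair as equal and uses byte order for everything else.  This is an
 * assumption for illustration only, not real ICU behaviour in general.
 */
static int
mock_icu_collate(const char *a, const char *b)
{
    if (is_oxia_or_tonos(a) && is_oxia_or_tonos(b))
        return 0;
    return strcmp(a, b);
}

/*
 * Compare two strings.  When the collator reports equality and
 * use_tiebreak is nonzero, fall back to strcmp(), which is the
 * tie-breaker currently applied for every provider.
 */
static int
compare_strings(const char *a, const char *b, int use_tiebreak)
{
    int result;

    assert(a != NULL && b != NULL);

    result = mock_icu_collate(a, b);
    if (result == 0 && use_tiebreak)
        result = strcmp(a, b);
    return result;
}
```

With use_tiebreak set, compare_strings(GREEK_OXIA, GREEK_TONOS, 1) is positive because the first byte 0xE1 sorts after 0xCE, which matches the "t" result for 1FFD > 0384 above. With use_tiebreak cleared, the same call returns 0, matching the output obtained after skipping strcmp() for ICU.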
So it looks like the ICU API function ucol_strcoll() reports these two strings as equal by intention. That is why I feel the strcmp-if-strcoll-returns-0 tie-breaker might not be applicable to ICU. But I may be wrong; please correct me if I am missing something.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company