On Mon, Aug 25, 2025 at 3:52 AM Nishant Sharma <[email protected]> wrote: > We did debug 'Experiment 1' and we find that:- > Whatever is the Column 1 in 'ORDER BY' gets correctly ordered, > because it uses abbreviated sort optimization due to which its data > datum gets converted to abbreviated datum using > "varstr_abbrev_convert()" function, and then the comparator > function selected is > "ssup->comparator = ssup_datum_unsigned_cmp()" > for sorting operation. But in case of column 2 in 'ORDER BY' (which > is showing incorrect result for 'Experiment 1') does not use > abbreviated sort optimization and here comparator function selected > is "ssup->comparator = varlenafastcmp_locale" --> > "strncoll_icu_utf8()", which appears, uses the third party ICU > library function for comparison and does not work as expected.
Generally, there are two methods for performing collation-aware string comparisons, and they are expected to produce equivalent results. For libc collations, the first method is to use strcoll() or strcoll_l() or similar to directly compared the strings, and the other is to use strxfrm() or similar to convert the string to a normalize representation which can then be compared using memcmp(). Equivalent facilities exist for ICU; see collate_methods_icu and collate_methods_icu_utf8 in pg_local_icu.c; one or the other of those should be getting used here. Even though these two methods are expected to produce equivalent results, if there's a bug, they might not. That bug could either exist in our code or it could exist in ICU; I suspect the latter is more likely, but the former is certainly possible. What I think you want to do is try to track down two specific strings where transforming them via strnxfrm_icu() produces results that compare in one order using memcmp(), but passing the same strings to strncoll_icu() or strncoll_icu_utf8() -- whichever is appropriate -- produces a different result. If you're able to find such strings, then you can probably rephrase those examples in terms of whatever underlying functions strncoll_icu() and strxfrm_icu() are calling and we can maybe report the problem to the ICU maintainers. If no such strings seem to exist, then there's probably a problem in the PostgreSQL code someplace, and you should try to figure out why we're getting the wrong answer despite ICU apparently doing the right thing. -- Robert Haas EDB: http://www.enterprisedb.com
