+ /* Win32 does not have UTF-8, so we need to map to UTF-16 */

I wonder if this is still true. I think in Windows 10+ you can enable UTF-8 support. Then could you use strcoll_l() directly? I struggled to understand that, but I am a simple Unix hobbit from the shire so I dunno. (Perhaps the *whole OS* has to be in that mode, so you might have to do a runtime test? This was discussed in another thread that mostly left me confused[1].)

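
To make the "runtime test" idea concrete, here is a minimal sketch. It assumes that the process's active ANSI code page is the right thing to look at, using GetACP() and CP_UTF8 from the Win32 API; whether strcoll_l() then actually does the right thing with UTF-8 input is exactly the open question above. The function name process_code_page_is_utf8() is made up for illustration.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Hypothetical helper: report whether this process's active ANSI code
 * page is UTF-8.  On non-Windows builds there is nothing to check, so
 * it simply returns false.
 */
static bool
process_code_page_is_utf8(void)
{
#ifdef _WIN32
	/* CP_UTF8 is code page 65001 */
	return GetACP() == CP_UTF8;
#else
	return false;
#endif
}
```

A caller could fall back to the existing UTF-16 mapping whenever this returns false.
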
And that leads to another thought. We have an old comment "Unfortunately, there is no strncoll(), so ...". Curiously, Windows does actually have strncoll_l() (spelled _strncoll_l() there), as do some other libcs out there. So after skipping the expansion to wchar_t, one might think you could avoid the extra copy required to nul-terminate the string (and hope that it doesn't make an extra copy internally, which is far from a given).

Unfortunately it seems to be defined in a strange way that doesn't match your pg_strncoll_XXX() convention: it has just one length parameter, not one for each string. That is, it's designed for comparing prefixes of strings, not for working with non-nul-terminated strings. I'm not entirely sure the interface makes sense at all! Is it measuring in 'chars' or 'encoded characters'? I would guess the former, like strncpy() et al, but then what does it mean if it chops a UTF-8 sequence in half?

And at a higher level, if you wanted to use it for our purpose, you'd presumably need to pass Min(s1_len, s2_len) as the length, but I wonder if there are string pairs that would sort in a different order if the collation algorithm could see more characters after that? For example, in Dutch "ij" is sometimes treated like a letter that sorts differently than "i" + "j" normally would, so if you arbitrarily chop that "j" off while comparing the common-length prefix you might get into trouble; likewise for "aa" in Danish.

Perhaps these sorts of problems explain why it's not in the standard (though I see it was at some point in some kind of draft; I don't grok the C standards process enough to track down what happened, but WG20/WG14 draft N1027[2] clearly contains strncoll_l() alongside the stuff that we know and use today). Or maybe I'm underthinking it.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com
[2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1027.pdf
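
PS: to illustrate the difference between the two interface shapes discussed above, here is a rough sketch in portable C. It does not call the Windows function at all: bounded_strcoll() is a made-up helper that does the copy-and-terminate dance with one length per string, and prefix_strcoll() is a made-up stand-in for what a single-length interface would get to see if you handed it Min(s1_len, s2_len). Both names and the malloc-based buffering are assumptions for illustration only.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: compare two length-bounded strings that need not
 * be nul-terminated, by copying each into a nul-terminated buffer and
 * calling strcoll() under the current locale.
 */
static int
bounded_strcoll(const char *s1, size_t len1, const char *s2, size_t len2)
{
	char	   *b1 = malloc(len1 + 1);
	char	   *b2 = malloc(len2 + 1);
	int			result;

	if (b1 == NULL || b2 == NULL)
		abort();				/* out of memory; keep the sketch simple */

	memcpy(b1, s1, len1);
	b1[len1] = '\0';
	memcpy(b2, s2, len2);
	b2[len2] = '\0';

	result = strcoll(b1, b2);

	free(b1);
	free(b2);
	return result;
}

/*
 * Hypothetical stand-in for a single-length interface: both strings are
 * truncated to the shorter length before comparing, so anything past
 * the common prefix is invisible to the collation algorithm.
 */
static int
prefix_strcoll(const char *s1, size_t len1, const char *s2, size_t len2)
{
	size_t		n = (len1 < len2) ? len1 : len2;

	return bounded_strcoll(s1, n, s2, n);
}
```

Even in the plain "C" locale, prefix_strcoll("ij", 2, "i", 1) reports equality while bounded_strcoll() on the same arguments does not. In a locale with contractions the prefix result could presumably be wrong in less obvious ways than a tie you can break on length.
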