On Fri, Sep 9, 2016 at 6:39 AM, Dave Page <dp...@pgadmin.org> wrote:
> Looking back at my old emails, apparently ICU 5.0 and later include
> ucol_strcollUTF8() which avoids the need to convert UTF-8 characters
> to 16 bit before sorting. RHEL 6 has the older 4.2 version of ICU.

At the risk of stating the obvious, there is a reason why ICU traditionally worked with UTF-16 natively. It's the same reason why many OSes and application frameworks (e.g., Java) use UTF-16 internally, even though UTF-8 is much more popular on the web: certain low-level optimizations are possible with UTF-16 that are not possible with UTF-8.

I'm not saying that it would be just as good if we were to not use the UTF-8 optimized stuff that ICU now has. My point is that it's not useful to prejudge whether or not performance will be acceptable based on a factor like this, which is ultimately just an implementation detail. The ICU patch either performs acceptably as a substitute for something like glibc, or it does not.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers