On Fri, Sep 9, 2016 at 6:39 AM, Dave Page <dp...@pgadmin.org> wrote:
> Looking back at my old emails, apparently ICU 5.0 and later include
> ucol_strcollUTF8() which avoids the need to convert UTF-8 characters
> to 16 bit before sorting. RHEL 6 has the older 4.2 version of ICU.

At the risk of stating the obvious, there is a reason why ICU
traditionally worked with UTF-16 natively. It's the same reason why
many OSes and application frameworks (e.g., Java) use UTF-16
internally, even though UTF-8 is much more popular on the web. Which
is: there are certain low-level optimizations possible that are not
possible with UTF-8.

I'm not saying that it would be just as good if we were to not use the
UTF-8 optimized stuff that ICU now has. My point is that it's not
useful to prejudge whether or not performance will be acceptable based
on a factor like this, which is ultimately just an implementation
detail. The ICU patch either performs acceptably as a substitute for
something like glibc, or it does not.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to