Re: [HACKERS] improve Chinese locale performance

Martijn van Oosterhout Sun, 28 Jul 2013 02:41:47 -0700

On Tue, Jul 23, 2013 at 10:34:21AM -0400, Robert Haas wrote:
> I pretty much lost interest in ICU upon reading that they use UTF-16
> as their internal format.
> 
> http://userguide.icu-project.org/strings#TOC-Strings-in-ICU


The UTF-8 support has been steadily improving:

  For example, icu::Collator::compareUTF8() compares two UTF-8 strings
  incrementally, without converting all of the two strings to UTF-16 if
  there is an early base letter difference.

http://userguide.icu-project.org/strings/utf-8

For all other encodings you should be able to use an iterator. As to
performance I have no idea.

The main issue with strxfrm() is its lame API. If it supported
returning prefixes you'd be set, but as it is you need >10MB of memory
just to transform a 10MB string, even if only the first few characters
would be enough to sort...
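
To make the API complaint concrete, here is a minimal sketch of the usual
two-call strxfrm() pattern in standard C. The helper name
xfrm_whole_string is made up for illustration and is not anything in
PostgreSQL. It assumes the collation locale has already been set (at
program start that is the "C" locale).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: transform src under the
 * current LC_COLLATE so that the result can be compared with strcmp().
 * Returns a malloc'd buffer the caller must free, or NULL if the
 * allocation fails.
 */
static char *
xfrm_whole_string(const char *src, size_t *out_len)
{
    size_t needed;
    char  *buf;

    assert(src != NULL);

    /* First call: with n == 0, strxfrm() only reports the required length. */
    needed = strxfrm(NULL, src, 0);

    /* The buffer must hold the complete transformed string plus the NUL. */
    buf = malloc(needed + 1);
    if (buf == NULL)
        return NULL;

    /*
     * Second call: transforms the entire input. The interface has no way
     * to request a well-formed sort-key prefix, so the cost in time and
     * memory scales with the whole string.
     */
    strxfrm(buf, src, needed + 1);

    if (out_len != NULL)
        *out_len = needed;
    return buf;
}
```

Both calls walk the whole input, and the allocation is sized by the full
transformed length, which is where the ">10MB for a 10MB string" figure
comes from. In many locales the transformed string is larger than the
original.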

Mvg,
-- 
Martijn van Oosterhout   <[email protected]>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer
