On Wed, 2008-09-03 at 13:22 -0700, Brian Aker wrote:
> Hi!
>
> On Sep 3, 2008, at 10:03 AM, Jim Starkey wrote:
>
> > I'm planning to use ICU (IBM's International Components for Unicode)
> > for the actual collations. It's licensed under MIT's X11 license,
> > and is GPL compatible.
>
> Postgres made a go at using that (and so did one of the "P" languages
> according to Tim Bray). They all found it to be a less than desirable
> library from the standpoint of performance.

PHP 6 uses ICU and therefore UTF-16 internally, which means a lot of time is spent converting everything (script code [identifiers, ...], user input, data from the different libraries PHP uses, ...): it is (mostly) converted from UTF-8 to UTF-16, processed, and then converted back to UTF-8. This makes the code more complex, and simple benchmarks I did early in development showed quite some impact ...

And well, PHP 6 has been in development for a few years, and we recently merged all features except the Unicode-related stuff back to 5.3 -- you can imagine how nicely using ICU works ;-)

I think Yahoo! (who sponsored lots of the initial work) evaluated some other options but found ICU to be the best way, even though it's expensive, and well, I guess for them performance really matters.

johannes

