On Thu, Mar 15, 2007 at 01:28:58PM -0700, Rob Cameron wrote:
> Simon,
>
> You asked about relevance. The UTF-8 to UTF-16 bottleneck
> is widely cited in literature on XML processing performance.

And why would you do this? Simply keep the data as UTF-8. There's no
good reason for using UTF-16 at all; it's just a bad implementation
choice. IIRC either HTML or XML (yes, I know they're different, but I
forget which does it) specifies that everything is UTF-16 internally,
which is naturally a stupid thing to specify, but this can in almost
all cases simply be ignored because it's an internal implementation
detail that's not testably different.

> For example, in SAX processing, Psaila finds that transcoding
> takes > 50% of XML processing time.

But isn't XML processing something like 1-5% of your total time for
whatever you're ultimately trying to do?

Rich

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
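
To put rough numbers on the last question above, here is a minimal sketch of the arithmetic. It takes the quoted "> 50%" figure at its lower bound of 50% and the guessed 1-5% range as given. Both are figures from this thread, not measurements, and the function names are hypothetical, chosen only for illustration. The speedup bound is the usual Amdahl's-law estimate for removing one component entirely.

```python
def overall_share(transcode_share_of_xml, xml_share_of_total):
    """Fraction of total run time spent transcoding."""
    return transcode_share_of_xml * xml_share_of_total


def max_speedup(share):
    """Best-case whole-program speedup if that share of time dropped to zero."""
    return 1.0 / (1.0 - share)


# Assumed inputs: transcoding = 50% of XML processing (lower bound of the
# quoted "> 50%"), XML processing = 1% or 5% of total time (a guess).
for xml_share in (0.01, 0.05):
    share = overall_share(0.5, xml_share)
    print(f"XML = {xml_share:.0%} of total: transcoding = {share:.1%}, "
          f"best-case speedup = {max_speedup(share):.3f}x")
```

Under those assumptions transcoding accounts for roughly 0.5% to 2.5% of total run time, so even a free transcoder would speed the whole job up by only a few percent at best. If XML processing dominates a particular workload, the conclusion changes accordingly.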