On Thu, Mar 15, 2007 at 01:28:58PM -0700, Rob Cameron wrote:
> Simon,
> 
> You asked about relevance. The UTF-8 to UTF-16 bottleneck
> is widely cited in the literature on XML processing performance.

And why would you do this? Simply keep the data as UTF-8. There's no
good reason for using UTF-16 at all; it's just a bad implementation
choice. IIRC either HTML or XML (yes, I know they're different, but I
forget which one does it) specifies that everything is UTF-16
internally, which is naturally a stupid thing to specify, but in
almost all cases this can simply be ignored because it's an internal
implementation detail that's not testably different.
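
To illustrate the point about keeping the data as UTF-8, here is a
minimal sketch, not a real XML parser. It relies on one property of
UTF-8: bytes in the ASCII range never occur inside a multibyte
sequence, so markup delimiters such as "<" and ">" can be located on
the raw bytes with no transcoding. The function name start_tag_names
and the sample document are made up for this example, and the scan
ignores comments, CDATA sections and other cases a real parser must
handle.

```python
def start_tag_names(xml_utf8: bytes) -> list[str]:
    """Collect start-tag names from UTF-8 XML by scanning raw bytes.

    Hypothetical helper for illustration only: it does not handle
    comments, CDATA, or malformed input the way a real parser would.
    """
    delimiters = (b" ", b"\t", b"\n", b"\r", b">", b"/")
    names = []
    pos = 0
    while True:
        lt = xml_utf8.find(b"<", pos)
        if lt == -1:
            break
        end = lt + 1
        # Skip end tags, declarations/comments, and processing instructions.
        if xml_utf8[end:end + 1] in (b"/", b"!", b"?"):
            pos = end
            continue
        # Every delimiter is ASCII, so comparing single bytes is safe even
        # when the name itself contains multibyte UTF-8 characters.
        while end < len(xml_utf8) and xml_utf8[end:end + 1] not in delimiters:
            end += 1
        if end > lt + 1:
            # Decode only the small slice that is actually needed as text.
            names.append(xml_utf8[lt + 1:end].decode("utf-8"))
        pos = end
    return names


doc = '<日本語 attr="x"><b/>text é</日本語>'.encode("utf-8")
print(start_tag_names(doc))
```

The bulk of the document is never converted to another encoding; only
the tag names are decoded, and only because this example returns them
as Python strings.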

> For example, in SAX processing, Psaila finds that transcoding
> takes more than 50% of XML processing time.

But isn't XML processing something like 1-5% of your total time for
whatever you're ultimately trying to do?

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
