At 11:47 AM 1/1/2002 +0000, Tim Bunce wrote:
>On Mon, Dec 31, 2001 at 06:53:29AM -1000, David & Lisa Jacobs wrote:
> > From: "Dan Sugalski" <[EMAIL PROTECTED]>
> > > >Agreed. I'll probably have the encoding structure provide the terminating
> > > >bytes. As a side note don't we also have to split UTF-16 into UTF-16BE and
> > > >UTF-16LE (big endian and little endian)?
> > >
> > > I think UTF-16 can be a single encoding. The little/big endian issue can be
> > > dealt with by an I/O filter.
> >
> > Will an IO filter have an opportunity to inject itself when we mmap a file?
> > It was because you said you wanted this capability that I thought we were
> > maintaining the serialized forms of unicode encodings. Otherwise, I would
> > be highly tempted to convert the internal representation of all unicode
> > strings into an array of 4 byte ints (allows for much faster processing).
>
>That's an assumption that may not always/often be true. Especially given the
>impact on cpu data caches.

Yeah, we may want to go UTF-16 for most things. I need to check and see how
much spillover there's been to >ffff space in the Unicode spec.

					Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
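
P.S. A rough sketch of the trade-off discussed above, in Python purely for illustration. Nothing here is Parrot code, and the function names utf16_code_units and storage_bytes are made up for this example. It shows what "spillover to >ffff space" costs in UTF-16 (a two-unit surrogate pair), how the BE/LE split changes only the byte order, and how the storage compares with an array of 4 byte ints.

```python
def utf16_code_units(code_point):
    """Return the 16-bit UTF-16 code units for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point <= 0xFFFF:
        # Anything up to U+FFFF fits in a single 16-bit unit.
        return [code_point]
    # Above U+FFFF: subtract 0x10000 and split the remaining 20 bits
    # into a high surrogate and a low surrogate.
    v = code_point - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def storage_bytes(text):
    """Return (UTF-16 bytes, UTF-32 bytes) needed for text, without a BOM."""
    return len(text.encode("utf-16-be")), len(text.encode("utf-32-be"))


if __name__ == "__main__":
    # Same code unit, two serialized byte orders.
    print("A".encode("utf-16-be"), "A".encode("utf-16-le"))
    # One character above U+FFFF needs two UTF-16 units.
    print([hex(u) for u in utf16_code_units(0x10400)])
    # Bytes needed: UTF-16 versus 4 byte ints.
    print(storage_bytes("abc"), storage_bytes("a\U00010400"))
```

For text that stays at or below U+FFFF the 4-byte form takes twice the memory of UTF-16, which is the data cache concern raised above. The price of UTF-16 is that a character above U+FFFF takes two units, so indexing by character is no longer a plain array lookup.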