At 11:47 AM 1/1/2002 +0000, Tim Bunce wrote:
>On Mon, Dec 31, 2001 at 06:53:29AM -1000, David & Lisa Jacobs wrote:
> > From: "Dan Sugalski" <[EMAIL PROTECTED]>
> > > >Agreed.  I'll probably have the encoding structure provide the
> > > >terminating bytes.  As a side note don't we also have to split UTF-16
> > > >into UTF-16BE and UTF-16LE (big endian and little endian)?
> > >
> > > I think UTF-16 can be a single encoding. The little/big endian issue
> > > can be dealt with by an I/O filter.
> >
> > Will an IO filter have an opportunity to inject itself when we mmap a file?
> > It was because you said you wanted this capability that I thought we were
> > maintaining the serialized forms of Unicode encodings.  Otherwise, I would
> > be highly tempted to convert the internal representation of all Unicode
> > strings into an array of 4-byte ints (allows for much faster processing).
>
>That's an assumption that may not always/often be true. Especially given the
>impact on cpu data caches.

Yeah, we may want to go UTF-16 for most things. I need to check and see how
much spillover there's been into the space above U+FFFF in the Unicode spec.
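
For a rough sense of the trade-off, here is a minimal Python sketch (not
Parrot code; utf16_units and storage_bytes are names made up for this
illustration). It assumes the standard UTF-16 surrogate-pair scheme: a code
point at or below U+FFFF takes one 16-bit unit, anything above takes two. It
compares that against a flat array of 4-byte ints, and shows that the BE/LE
question only changes the byte order of each unit, not the unit values.

```python
def utf16_units(cp):
    """Return the UTF-16 code units (as ints) for one code point."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0xFFFF:
        # Fits in a single 16-bit unit.
        return [cp]
    # Above U+FFFF: split the 20-bit offset into a surrogate pair.
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def storage_bytes(text):
    """Return (utf16_bytes, utf32_bytes) needed to store text."""
    utf16 = sum(2 * len(utf16_units(ord(ch))) for ch in text)
    utf32 = 4 * len(text)  # one 4-byte int per code point
    return utf16, utf32


if __name__ == "__main__":
    sample = "a\U0001D11E"  # 'a' plus one code point above U+FFFF
    print(storage_bytes(sample))
    # Same code units, different byte order on the wire:
    print(sample.encode("utf-16-be").hex())
    print(sample.encode("utf-16-le").hex())
```

If code points above U+FFFF turn out to be rare in practice, UTF-16 stays at
two bytes per character for nearly all text, at the cost of variable-width
handling for the few that need a surrogate pair.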

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
