At 11:16 AM 2/12/2001 +0000, Tim Bunce wrote:
>On Fri, Feb 09, 2001 at 04:15:42PM -0500, Dan Sugalski wrote:
> >
> > >On the other side, for a string that is matched against regexps, it doesn't
> > >matter much if it has variable character length, since regexps normally read
> > >all the string anyway, and indexing characters isn't much of a concern.
> >
> > You underestimate the impact of variable-length data, I think. Regexes
> > should go rather faster on fixed-length than variable length data. How much
> > so depends on your processor. (I can guarantee that Alphas will run a
> > darned sight faster on UTF-32 than UTF-8...)
>
>Umm, don't cpu data cache size issues complicate that? What if the ~4x
>bigger UTF-32 string doesn't fit in the cache but the UTF-8 one does?
>(I'm obviously simplifying somewhere here, but you get the idea.)

Yeah, this does definitely complicate things, and how the string gets 
accessed makes a difference as well. I'm cobbling together some tests to 
see what the full effects look like--I should have something in a day or 
three, depending on local work conditions.
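
For a rough feel of the size side of the trade-off raised in the quoted question, here is a minimal sketch in Python. It only compares encoded byte counts for the same text in UTF-8 and UTF-32; it is not a regex or cache benchmark. The helper name `encoded_sizes` and the sample strings are made up for illustration.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf32_bytes) for text, with no byte-order mark."""
    utf8 = len(text.encode("utf-8"))
    # "utf-32-le" is used instead of "utf-32" so no 4-byte BOM is prepended.
    utf32 = len(text.encode("utf-32-le"))
    return utf8, utf32


# Hypothetical sample data: pure ASCII versus text with multi-byte characters.
ascii_text = "hello world " * 1000
mixed_text = "na\u00efve caf\u00e9 \u65e5\u672c\u8a9e " * 1000

for label, text in (("ascii", ascii_text), ("mixed", mixed_text)):
    u8, u32 = encoded_sizes(text)
    print(f"{label}: utf-8={u8} bytes, utf-32={u32} bytes, ratio={u32 / u8:.2f}")
```

The 4x figure is the worst case and applies to pure ASCII text; the ratio shrinks as more characters need two or three bytes in UTF-8.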

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
