On Thu, 4 Oct 2001, Glenn Maynard wrote:
> A big hit, but I wonder how much is avoidable.  The three cases for this, I
> think, are: strstr (dumb, ends up comparing continuation bytes); strstr
> that knows utf8 (avoid comparing those bytes); or converting to UCS-2 or
> UCS-4 and doing a memcmp.
>
> I think skipping continuations would be a speed hit--you'd be taking
> the (minor) hit of UTF-8 decoding logic for every character, and all
> you're saving is a few byte compares.  (Actually, a lot of byte
> compares, but it's a lot less code.)

Please substantiate any claims about performance with a realistic
measurement, not a guess. Most such guesses are naive on modern
processor architectures, where searches are typically memory-bound,
not CPU-bound.
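
As a starting point for such a measurement, here is a minimal sketch of
a timing harness. It compares plain strstr() against a search that
skips UTF-8 continuation bytes. The names utf8_aware_find, plain_find
and time_search are hypothetical and made up for this illustration, the
input is assumed to be valid UTF-8, and the UCS-4 conversion case is
left out. It is not a definitive benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical UTF-8-aware search: a match is only attempted at bytes
   that are not continuation bytes (10xxxxxx). Assumes valid UTF-8. */
const char *utf8_aware_find(const char *hay, const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0)
        return hay;
    for (const char *p = hay; *p != '\0'; p++) {
        if (((unsigned char)*p & 0xC0) == 0x80)
            continue;                   /* skip continuation bytes */
        if (strncmp(p, needle, n) == 0)
            return p;
    }
    return NULL;
}

/* Thin wrapper so strstr() has the same signature as the search above. */
const char *plain_find(const char *hay, const char *needle)
{
    return strstr(hay, needle);
}

typedef const char *(*find_fn)(const char *, const char *);

/* Returns the CPU time in seconds used by `reps` searches. The result
   is stored through a volatile pointer to discourage the compiler from
   dropping the calls; an aggressive optimizer may still distort this. */
double time_search(find_fn fn, const char *hay, const char *needle,
                   int reps)
{
    const char *volatile sink = NULL;
    clock_t start, end;

    assert(reps > 0);
    start = clock();
    for (int i = 0; i < reps; i++)
        sink = fn(hay, needle);
    end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To use it, call time_search(plain_find, hay, needle, reps) and
time_search(utf8_aware_find, hay, needle, reps) on the same data. For a
realistic figure, hay should be real text and considerably larger than
the processor caches, so that memory access is part of what is
measured.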

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
