On Sat, Oct 06, 2001 at 07:21:03PM +0100, Markus Kuhn wrote:
> Please substantiate any claims about performance by actually making a
> realistic measurement, not a guess. Most such guesses are naive on modern
> processor architectures, which typically are RAM bound for searches, not
> CPU bound.

It comes out about the same; the decoding logic offsets the gain of doing
fewer compares.  It pulls ahead of raw strstr after a simple optimization
of removing an unnecessary inner-loop conditional (about 10% on my system),
so it's not quite RAM-bound.
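For illustration, a minimal sketch of the kind of naive strstr inner loop
I mean (hypothetical code, not the version I actually benchmarked):

```c
#include <stddef.h>

/* Naive strstr with the redundant inner-loop conditional removed.
   The usual inner test is "n[i] && h[i] && h[i] == n[i]"; the h[i]
   check is unnecessary, because when h[i] is '\0' and n[i] isn't,
   the h[i] == n[i] comparison fails anyway, so the loop still exits
   correctly at the end of the haystack. */
char *my_strstr(const char *h, const char *n)
{
    size_t i;

    if (!*n)
        return (char *)h;       /* empty needle matches at the start */
    for (; *h; h++) {
        for (i = 0; n[i] && h[i] == n[i]; i++)
            ;                   /* scan while the needle keeps matching */
        if (!n[i])
            return (char *)h;   /* ran off the end of the needle: match */
    }
    return NULL;
}
```

One fewer test per inner iteration is exactly the sort of thing that only
shows up in the numbers if the loop isn't already waiting on memory.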

I hadn't looked closely at the decoding logic before; a big reason this
ends up faster is that the "(minor) hit of UTF-8 decoding logic" is in
fact no hit at all, since it's just a small table lookup.  Full UTF-8
decoding is a chain of conditionals and comes out about three times
slower (all the memory access plus a lot more branching)--measured with
Vim's UTF-8 decoder, which looks fairly standard-issue--but full
decoding isn't needed here.
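To make the "small table lookup" concrete, here's a hypothetical sketch
(names made up): a 256-entry table indexed by the lead byte gives the
sequence length in one memory access, where a full decoder would branch
several times per character:

```c
/* Sequence length implied by a UTF-8 lead byte; 0 marks continuation
   bytes and invalid lead bytes.  One lookup replaces the cascade of
   range tests a full decoder performs. */
static unsigned char utf8_len[256];

static void init_utf8_len(void)
{
    int c;

    for (c = 0; c < 256; c++) {
        if (c < 0x80)
            utf8_len[c] = 1;    /* ASCII */
        else if (c < 0xC0)
            utf8_len[c] = 0;    /* continuation byte */
        else if (c < 0xE0)
            utf8_len[c] = 2;
        else if (c < 0xF0)
            utf8_len[c] = 3;
        else if (c < 0xF8)
            utf8_len[c] = 4;
        else
            utf8_len[c] = 0;    /* invalid lead byte */
    }
}
```

For searching you only need to step from one character boundary to the
next, so the table is all the "decoding" required--no need to compute
the actual code point.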

There are also tradeoffs for "safe" behavior--should every string
function do validation logic?  The strpbrk function posted recently has
a conditional in the inner loop to report errors for invalid UTF-8
sequences; it may or may not make a speed difference.  Personally, I'd
keep validation in one place: validate strings when they're created,
and keep the library routines thinner.  (It might not matter.)
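A hypothetical sketch of the validate-once approach (made-up name;
overlong sequences and other fine points deliberately ignored): run
something like this when a string enters the program, and the search
routines can then drop their per-byte error checks entirely:

```c
#include <stdbool.h>

/* Check that s is structurally valid UTF-8: every lead byte is legal
   and is followed by the right number of continuation bytes.  This is
   a simplified sketch; it does not reject overlong encodings or
   out-of-range code points. */
bool utf8_valid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    int len, i;

    while (*p) {
        if (*p < 0x80) {                /* ASCII: always fine */
            p++;
            continue;
        }
        if ((*p & 0xE0) == 0xC0)
            len = 2;
        else if ((*p & 0xF0) == 0xE0)
            len = 3;
        else if ((*p & 0xF8) == 0xF0)
            len = 4;
        else
            return false;               /* bad lead byte */
        for (i = 1; i < len; i++)
            if ((p[i] & 0xC0) != 0x80)  /* must be a continuation byte */
                return false;
        p += len;
    }
    return true;
}
```

Pay the branching cost once per string instead of once per byte in
every inner loop of every search routine.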

-- 
Glenn Maynard
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
