On Sat, Oct 06, 2001 at 07:21:03PM +0100, Markus Kuhn wrote:
> Please substantiate any claims about performance by actually making a
> realistic measurement, not a guess. Most such guesses are naive on modern
> processor architectures, which typically are RAM bound for searches, not
> CPU bound.
It comes out about the same; the decoding logic offsets the gain of doing
fewer compares. It pulls ahead of raw strstr with a simple optimization,
removing an unnecessary inner-loop conditional (about 10% on my system), so
it's not quite RAM-bound.

I hadn't looked at decoding logic much; a big reason this ends up faster is
that the "(minor) hit of UTF-8 decoding logic" is in fact no hit at all,
since it's just a small table lookup. Full UTF-8 decoding is a bunch of
conditionals, which is about three times slower (it's all memory access,
with a lot more branching)--using Vim's UTF-8 decoder, which looks fairly
standard-issue--but that's not needed here.

There are also tradeoffs for "safe" behavior: should every string function
do validation? The strpbrk function posted recently has a conditional in
the inner loop to report errors for invalid UTF-8 sequences; it may or may
not make a speed difference. Personally, I'd keep validation in one place:
validate strings when they're created, and have thinner library routines.
(It might not matter.)

-- 
Glenn Maynard

Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
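For what it's worth, the "small table lookup" I mean can be sketched
roughly like this (a hypothetical utf8_skip table and utf8_strlen, not the
actual code under discussion): the top four bits of the lead byte index a
16-entry table giving the sequence length, so walking a string needs no
per-byte conditionals. It assumes the string was already validated, which
is the point above about doing validation once:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: UTF-8 sequence length from the lead byte's top
   four bits.  0xxx = ASCII (1 byte), 10xx = continuation (mapped to 1 so
   a scan always advances), 110x = 2 bytes, 1110 = 3 bytes, 1111 = 4. */
static const unsigned char utf8_skip_tab[16] = {
    1, 1, 1, 1, 1, 1, 1, 1,   /* 0x0-0x7: ASCII lead bytes */
    1, 1, 1, 1,               /* 0x8-0xB: continuation bytes */
    2, 2,                     /* 0xC-0xD: 2-byte sequences */
    3,                        /* 0xE:     3-byte sequences */
    4                         /* 0xF:     4-byte sequences */
};

/* Count characters in a string assumed to hold valid UTF-8; the only
   per-character work is one table lookup, no branching on byte values. */
static size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    while (*s) {
        s += utf8_skip_tab[(unsigned char)*s >> 4];
        n++;
    }
    return n;
}
```

On malformed input this can step past the terminating NUL--exactly the
tradeoff above: validating strings when they're created is what lets an
inner loop stay this thin.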