On Thu, 4 Oct 2001, Glenn Maynard wrote:

> A big hit, but I wonder how much is avoidable. The three cases for this, I
> think, are: strstr (dumb, ends up comparing continuation bytes); strstr
> that knows utf8 (avoid comparing those bytes); or converting to UCS-2 or
> UCS-4 and doing a memcmp.
>
> I think skipping continuations would be a speed hit--you'd be taking
> the (minor) hit of UTF-8 decoding logic for every character, and all
> you're saving is a few byte compares. (Actually, a lot of byte
> compares, but it's a lot less code.)
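
For concreteness, the first two cases quoted above might look roughly like the sketch below. It is an illustration only, not code from this thread: the names bytewise_search, utf8_aware_search and is_continuation are hypothetical, and both haystack and needle are assumed to be well-formed, NUL-terminated UTF-8. Under that assumption the two functions return the same result, because a well-formed needle starts with a lead byte and so can never match at a continuation byte; they differ only in how much work they do per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Case 1: plain byte-wise search; knows nothing about UTF-8. */
static const char *bytewise_search(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    return strstr(haystack, needle);
}

/* True if c is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Case 2: only attempt a match where a character starts
 * (hypothetical helper; assumes well-formed UTF-8 input). */
static const char *utf8_aware_search(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    size_t n = strlen(needle);
    if (n == 0)
        return haystack;            /* same convention as strstr */
    for (const char *p = haystack; *p != '\0'; ++p) {
        if (is_continuation((unsigned char)*p))
            continue;               /* skip the middle of a character */
        if (strncmp(p, needle, n) == 0)
            return p;
    }
    return NULL;
}
```

Which of the two is faster is not something the sketch can tell you; that depends on the C library's strstr, the data, and the memory system.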
Please substantiate any claims about performance by actually making a
realistic measurement, not a guess. Most such guesses are naive on modern
processor architectures, which typically are RAM bound for searches, not
CPU bound.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/