On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
> configurable" I discovered that a C version of find_first_bit is faster
> than the asm version now when compiled against gcc 3.3.6 and gcc 4.0.1
> (both from versions of Debian unstable). I wrote a benchmark (attached)
> that runs the code 1,000,000 times.
I suspect the old "rep scas" has always been slower than compiler-generated code, at least under your test conditions. Many of the old asms are actually _very_ old; some of them date from pre-0.01 days and are more about me learning the i386 (and gcc inline asm).

That said, I don't much like your benchmarking methodology. I suspect that quite often the code in question runs from L2 cache, not in a tight loop, so the "run it a million times" approach is not necessarily the best one.

I'll apply this one as obvious: I doubt the compiler generates bigger code or has any real downsides. I just wanted to say that, in general, I wish people didn't always time the hot-cache case ;)

		Linus