According to the Intel Technology Journal [1], page 270, code that uses the bt instruction for a common bit test idiom runs about 20% faster on a Core 2 Duo than the equivalent generic code.
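For reference, the idiom in question can be reduced to a small standalone testcase. This is only a sketch: the function name bit_is_set is made up for illustration and is not part of any patch, and n is assumed to be in the range 0..30 so that the shift is well defined for int.

```c
/* Hypothetical testcase sketch; bit_is_set is an illustrative name only. */

/* Bit test idiom: returns 1 if bit n of x is set, 0 otherwise.
   Assumes 0 <= n <= 30 so that (1 << n) is well defined for int. */
int bit_is_set(int x, int n)
{
    if (x & (1 << n))
        return 1;
    return 0;
}
```

Compiling this with something like "gcc -O2 -march=core2 -S" and inspecting the assembly should show whether the mov/shl/test sequence or a single bt is generated; which one appears depends on the compiler version and on whether the optimization is applied.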

---Quote from p.270---
The bit test instruction bt was introduced in the i386 processor. In some implementations, including the Intel NetBurst® micro-architecture, the instruction has a high latency. The Intel Core micro-architecture executes bt in a single cycle, when the bit base operand is a register. Therefore, the Intel C++/Fortran compiler uses the bt instruction to implement a common bit test idiom when optimizing for the Intel Core micro-architecture. The optimized code runs about 20% faster than the generic version on an Intel Core 2 Duo processor. Both of these versions are shown below:

C source code

  int x, n;
  ...
  if (x & (1 << n)) ...

Generic code generation

  ; edx contains x, ecx contains n.
  mov eax, 1
  shl eax, cl
  test edx, eax
  je taken

Intel Core micro-architecture code generation

  ; edx contains x, eax contains n.
  bt edx, eax
  jae taken
---/Quote---

I have a patch in testing that implements the suggested optimization for TARGET_USE_BT (including core2) targets.

[1] Inside the Intel® 10.1 Compilers: New Threadizer and New Vectorizer for Intel® Core2 Processors, Intel Technology Journal, Vol. 11, Issue 4, November 15, 2007,
    http://download.intel.com/technology/itj/2007/v11i4/1-inside/1-Inside_the_Intel_Compilers.pdf

-- 
           Summary: Generate bit test (bt) instructions
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ubizjak at gmail dot com
GCC target triplet: x86

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36473