I would expect gcc to generate comparable code for both functions below, or perhaps even better code for foo() than for bar() since the code in foo() is likely to be more common than the equivalent code in bar(). However, the code produced for foo() is suboptimal in comparison to the code for bar(). In my timings on x86 with gcc 4.3.0 at -O2, foo() appears to run about 5% slower than bar().
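As a sanity check that the two formulations really are interchangeable, here is a minimal sketch that compares them on all four null/non-null argument combinations. The helper name check_equivalence() is hypothetical and not part of the test case; the parentheses in foo() are added only to silence -Wparentheses and do not change the meaning of the expression.

```c
#include <assert.h>
#include <stddef.h>

/* The two functions from this report (foo() parenthesized for clarity). */
int foo (int *a, int *b) { return (a && b) || (!a && !b); }
int bar (int *a, int *b) { return !!a == !!b; }

/* Hypothetical helper: asserts that foo() and bar() agree on every
   null/non-null combination and returns the number of cases checked. */
int check_equivalence (void)
{
  int x = 0, y = 0;
  int *as[2];
  int *bs[2];
  int i, j, checked = 0;

  as[0] = NULL; as[1] = &x;
  bs[0] = NULL; bs[1] = &y;

  for (i = 0; i < 2; i++)
    for (j = 0; j < 2; j++)
      {
        assert (foo (as[i], bs[j]) == bar (as[i], bs[j]));
        checked++;
      }
  return checked;
}
```

Both functions return 1 when the pointers are both null or both non-null and 0 otherwise, so any difference in the generated code reflects the optimizer rather than the semantics.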
$ cat t.c && gcc -S -O2 t.c && cat t.s
int foo (int *a, int *b) { return a && b || !a && !b; }
int bar (int *a, int *b) { return !!a == !!b; }
        .file   "t.c"
        .text
        .p2align 4,,15
.globl foo
        .type   foo, @function
foo:
.LFB2:
        testq   %rdi, %rdi
        je      .L2
        testq   %rsi, %rsi
        movl    $1, %eax
        je      .L2
        rep
        ret
        .p2align 4,,10
        .p2align 3
.L2:
        testq   %rdi, %rdi
        sete    %al
        testq   %rsi, %rsi
        sete    %dl
        andl    %edx, %eax
        movzbl  %al, %eax
        ret
.LFE2:
        .size   foo, .-foo
        .p2align 4,,15
.globl bar
        .type   bar, @function
bar:
.LFB3:
        testq   %rdi, %rdi
        sete    %al
        testq   %rsi, %rsi
        setne   %dl
        xorl    %edx, %eax
        movzbl  %al, %eax
        ret
.LFE3:
        .size   bar, .-bar
        .section        .eh_frame,"a",@progbits
.Lframe1:
        .long   .LECIE1-.LSCIE1
.LSCIE1:
        .long   0x0
        .byte   0x1
        .string "zR"
        .uleb128 0x1
        .sleb128 -8
        .byte   0x10
        .uleb128 0x1
        .byte   0x3
        .byte   0xc
        .uleb128 0x7
        .uleb128 0x8
        .byte   0x90
        .uleb128 0x1
        .align 8
.LECIE1:
.LSFDE1:
        .long   .LEFDE1-.LASFDE1
.LASFDE1:
        .long   .LASFDE1-.Lframe1
        .long   .LFB2
        .long   .LFE2-.LFB2
        .uleb128 0x0
        .align 8
.LEFDE1:
.LSFDE3:
        .long   .LEFDE3-.LASFDE3
.LASFDE3:
        .long   .LASFDE3-.Lframe1
        .long   .LFB3
        .long   .LFE3-.LFB3
        .uleb128 0x0
        .align 8
.LEFDE3:
        .ident  "GCC: (GNU) 4.3.0 20080428 (Red Hat 4.3.0-8)"
        .section        .note.GNU-stack,"",@progbits

--
           Summary: suboptimal code for (a && b || !a && !b)
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sebor at roguewave dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38126