https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67317
Bug ID: 67317
Summary: [x86] Silly code generation for _addcarry_u32/_addcarry_u64
Product: gcc
Version: 5.2.0
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: inline-asm
Assignee: unassigned at gcc dot gnu.org
Reporter: myriachan at gmail dot com
Target Milestone: ---

The x86 intrinsics _addcarry_u32 and _addcarry_u64 generate silly code. For example, take the following function, which computes the result of a chained 64-bit addition (the XOR is only there to make the output clearer):

    #include <x86intrin.h>

    typedef unsigned long long u64;

    u64 testcarry(u64 a, u64 b, u64 c, u64 d)
    {
        u64 result0, result1;
        _addcarry_u64(_addcarry_u64(0, a, c, &result0), b, d, &result1);
        return result0 ^ result1;
    }

This is the code generated at -O1, -O2 and -O3:

    xor     r8d, r8d
    add     r8b, -1
    adc     rdx, rdi
    setc    r8b
    mov     rax, rdx
    add     r8b, -1
    adc     rcx, rsi
    xor     rax, rcx
    ret

The first silliness is that _addcarry_u64 does not optimize for a compile-time-constant 0 as the carry-in parameter. Instead of "adc", it should simply use "add". The second silliness is the use of r8b to store the carry flag, followed by "add r8b, -1" to move it back into the carry flag. Instead, the code should be something like this:

    add     rdx, rdi
    mov     rax, rdx
    adc     rcx, rsi
    xor     rax, rcx
    ret

Naturally, for something this simple I would use unsigned __int128, but this came up in large-number arithmetic.