https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90622
Bug ID: 90622
Summary: Suboptimal code generated for __builtin_avr_insert_bits
Product: gcc
Version: 5.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: igusarov at mail dot ru
Target Milestone: ---

Please consider the following function:

    uint8_t copy_bit_5_to_bit_2(uint8_t dst, uint8_t src)
    {
        return __builtin_avr_insert_bits(0xFFFFF5FF, src, dst);
    }

That particular map value (magic hex constant) is supposed to copy bit 5 of
argument 'src' to bit 2 of argument 'dst' while leaving all other bits of
dst unmodified. In other words, given that the bit representation of src is
[s7 s6 s5 s4 s3 s2 s1 s0], and the bit representation of dst is
[d7 d6 d5 d4 d3 d2 d1 d0], it should return [d7 d6 d5 d4 d3 s5 d1 d0].

The code generated for such a function is perfect:

    bst r22,5    # Take bit 5 of r22
    bld r24,2    # Put it into bit 2 of r24

Similar code is generated for copying any n-th bit to any m-th bit, provided
that n and m are different. Thus far everything is great.

However, the code generated for copying the n-th bit to the n-th bit is
surprisingly suboptimal. A similar function

    uint8_t copy_bit_2_to_bit_2(uint8_t dst, uint8_t src)
    {
        return __builtin_avr_insert_bits(0xFFFFF2FF, src, dst);
    }

gives:

    eor r22,r24
    andi r22,lo8(4)
    eor r24,r22

which is three instructions instead of two, so it takes an extra word of
program memory and an extra CPU cycle at runtime.

I wonder what's wrong with using the same bst/bld idiom which is
successfully used for the n-to-m copy? I would expect the following code to
be better:

    bst r22,2
    bld r24,2

It would be great if the compiler could generate it.
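
For reference, here is a rough host-side sketch of how I read the map values used above, based on the documented behaviour of the builtin (nibble n of the map selects the source for result bit n: 0xF keeps the bit of the third argument, 0..7 picks that bit of the second argument). The names insert_bits_model, xor_mask_copy_bit2 and check_same_position_copy are made up for this illustration and are not part of GCC. Nibble values 8..0xE give an undefined result bit in the real builtin; the model simply assumes 0 there. The second function models the eor/andi/eor sequence, and the third checks that it agrees with the map 0xFFFFF2FF for all inputs, i.e. that the two instruction sequences should be interchangeable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of __builtin_avr_insert_bits(map, bits, val). */
uint8_t insert_bits_model(uint32_t map, uint8_t bits, uint8_t val)
{
    uint8_t result = 0;
    for (int n = 0; n < 8; n++) {
        unsigned x = (map >> (4 * n)) & 0xFu;  /* n-th nibble of the map */
        unsigned bit;
        if (x == 0xFu)
            bit = (val >> n) & 1u;             /* keep bit n of val */
        else if (x <= 7u)
            bit = (bits >> x) & 1u;            /* take bit x of bits */
        else
            bit = 0u;                          /* undefined in the builtin; assumed 0 here */
        result |= (uint8_t)(bit << n);
    }
    return result;
}

/* Models the generated sequence: eor r22,r24 / andi r22,lo8(4) / eor r24,r22 */
uint8_t xor_mask_copy_bit2(uint8_t dst, uint8_t src)
{
    uint8_t t = (uint8_t)(src ^ dst);  /* eor r22,r24 */
    t &= 4u;                           /* andi r22,lo8(4) */
    return (uint8_t)(dst ^ t);         /* eor r24,r22 */
}

/* Exhaustive check that both forms agree for the bit-2-to-bit-2 copy. */
void check_same_position_copy(void)
{
    for (unsigned dst = 0; dst < 256; dst++)
        for (unsigned src = 0; src < 256; src++)
            assert(insert_bits_model(0xFFFFF2FFu, (uint8_t)src, (uint8_t)dst)
                   == xor_mask_copy_bit2((uint8_t)dst, (uint8_t)src));
}
```

This only models the result value on a host machine; it says nothing about which sequence the AVR backend should prefer.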