Chandler Carruth <[email protected]> writes:
> CodeGen: When bitfields fall on natural boundaries, split them up
>
> Please revert. We used to do this, and I worked very hard to specifically
> design the system to work the way it does now. There are copious comments,
> email threads, and discussions where we debated how to handle this and decided
> to do so in the exact way.
>
> If you want to re-open the discussion, cool. But please do so as a discussion,
> and cite specifically why the rationale for using fully-wide loads and stores
> is inapplicable or causes insurmountable problems.
Hm, I hadn't really realized the history on this. I guess you're talking
about r169489, "Rework the bitfield access IR generation". Looks to me
like we really did a lot more than just break bitfields up on natural
boundaries. Note that my change here doesn't change the way we lay the
bitfields out *at all*, and that all of the tests added in r169489 pass
unchanged with it.

I can understand the argument against extra complexity though: I'll get
back to that in a minute.

> On Wed, Aug 13, 2014 at 7:42 PM, Justin Bogner <[email protected]> wrote:
> > This leads to bad code being generated in most of our 32 bit backends.
>
> Also, cite your source. I tested this on ARM and x86 when I changed bitfields
> to work the way they did before your patch and we generated very good code.

I was looking into some poor code generation on something like this:

typedef struct {
  unsigned int lsb32 : 32;
  unsigned int msb7 : 7;
  unsigned int padding : 25;
} P;

void foo(P *p) {
  p->lsb32 = 0x0000;
}

On armv7, this was giving pretty poor code. At -O0:

        push    {r4, lr}
        sub     sp, sp, #8
        mov     r1, r0
        str     r0, [sp, #4]
        ldrb    r2, [r0, #5]
        mov     r3, r0
        ldrb    r12, [r3, #4]!
        ldrb    lr, [r3, #2]
        ldrb    r4, [r3, #3]
        strb    r4, [r3, #3]
        strb    lr, [r3, #2]
        strb    r2, [r0, #5]
        mov     r2, #0
        strb    r2, [r0, #3]
        strb    r2, [r0, #2]
        strb    r2, [r0, #1]
        strb    r2, [r0]
        strb    r12, [r3]
        str     r1, [sp]                @ 4-byte Spill
        add     sp, sp, #8
        pop     {r4, pc}

and at -O3:

        push    {lr}
        mov     r2, r0
        ldrb    lr, [r0, #5]
        ldrb    r12, [r2, #4]!
        ldrb    r1, [r2, #3]
        ldrb    r3, [r2, #2]
        strb    r1, [r2, #3]
        mov     r1, #0
        strb    r3, [r2, #2]
        strb    lr, [r0, #5]
        strb    r1, [r0, #3]
        strb    r1, [r0, #2]
        strb    r1, [r0, #1]
        strb    r1, [r0]
        strb    r12, [r2]
        pop     {lr}
        bx      lr

Whereas with the change I've made here, we get at -O0:

        sub     sp, sp, #8
        mov     r1, r0
        str     r0, [sp, #4]
        mov     r2, #0
        strb    r2, [r0, #3]
        strb    r2, [r0, #2]
        strb    r2, [r0, #1]
        strb    r2, [r0]
        str     r1, [sp]                @ 4-byte Spill
        add     sp, sp, #8
        bx      lr

and at -O3:

        mov     r1, #0
        strb    r1, [r0, #3]
        strb    r1, [r0, #2]
        strb    r1, [r0, #1]
        strb    r1, [r0]
        bx      lr

The code that's being generated is pretty awful, and it just gets worse if
you add more fields to the bitfield.

I'd briefly looked at i386 as well, where I saw some extra instructions at
-O0 and thought things were the same. It looks like the -O3 code is
identical there though, so my claim about most of our 32-bit backends was
a mistake. Sorry about that.

So I guess there are two ways to handle this: the change I've made, which
adds a small amount of complexity to the frontend in order to avoid
generating IR that does a lot of redundant work, or teaching the ARM
backend that a read-modify-write which masks out most of the bits should
be optimized to a narrower store. I haven't looked into how hard that
option is. Do you think the backend approach would be simpler (or perhaps
valuable enough in other situations that it's worth its cost)? If so,
I'll revert this and try that approach out.
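[Editor's note: to make the redundant work concrete, here is a minimal C
sketch of what the two lowering strategies for foo() above roughly amount
to. The 64-bit container width, the little-endian layout, and the
store_lsb32_wide/store_lsb32_split helper names are assumptions for
illustration only, not the IR Clang actually emits.]

#include <stdint.h>
#include <string.h>

/* Illustrative only: assumes the three bitfields share one 64-bit
   storage unit and that lsb32 occupies its low 32 bits (little-endian). */

typedef struct {
  unsigned int lsb32 : 32;
  unsigned int msb7 : 7;
  unsigned int padding : 25;
} P;

/* Wide-access strategy: treat the whole storage unit as one access, so
   even a store that only touches lsb32 becomes a read-modify-write of
   the neighboring bits (the ldrb/strb traffic in the armv7 output). */
static void store_lsb32_wide(P *p, uint32_t value) {
  uint64_t storage;
  memcpy(&storage, p, sizeof storage);   /* wide load                  */
  storage &= ~(uint64_t)0xffffffffu;     /* clear the low 32 bits      */
  storage |= value;                      /* insert the new field value */
  memcpy(p, &storage, sizeof storage);   /* wide store                 */
}

/* Split strategy: lsb32 ends on a natural 32-bit boundary, so the store
   never needs to read or rewrite the bytes holding msb7/padding. */
static void store_lsb32_split(P *p, uint32_t value) {
  memcpy(p, &value, sizeof value);       /* plain narrow store */
}

Under this reading, the split form corresponds to the four strb
instructions in the -O3 output above, while the wide form is what forces
the backend to reload and rewrite the bytes holding msb7 and padding.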
