Re: Optimization of bit field assignments
On Mon, Feb 12, 2024 at 5:49 PM Hugh Gleaves via Gcc wrote:
>
> I’m interested in whether it would be feasible to add an optimization that
> compacted assignments to multiple bit fields.
>
> Today, if I have a 32 bit long struct composed of say, four 8 bit fields and
> assign constants to them like this:
>
> ahb1_ptr->RCC.CFGR.MCO1_PRE = 7;
> ahb1_ptr->RCC.CFGR.I2SSC = 0;
> ahb1_ptr->RCC.CFGR.MCO1 = 3;
>
> This generates code (on Arm) like this:
>
> ahb1_ptr->RCC.CFGR.MCO1_PRE = 7;
> 0x08000230 ldr.w r1, [r3, #2056] @ 0x808
> 0x08000234 orr.w r1, r1, #117440512 @ 0x7000000
> 0x08000238 str.w r1, [r3, #2056] @ 0x808
> ahb1_ptr->RCC.CFGR.I2SSC = 0;
> 0x0800023c ldr.w r1, [r3, #2056] @ 0x808
> 0x08000240 bfc r1, #23, #1
> 0x08000244 str.w r1, [r3, #2056] @ 0x808
> ahb1_ptr->RCC.CFGR.MCO1 = 3;
> 0x08000248 ldr.w r1, [r3, #2056] @ 0x808
> 0x0800024c orr.w r1, r1, #6291456 @ 0x600000
> 0x08000250 str.w r1, [r3, #2056] @ 0x808
>
> It would be an improvement, if the compiler analyzed these assignments and
> realized they are all modifications to the same 32 bit datum, generate an
> appropriate OR and AND bitmask and then apply those to the register and do
> just a single store at the end.
>
> In other words, infer the equivalent of this:
>
> RCC->CFGR &= ~0x07E00000;
> RCC->CFGR |= 0x07600000;
>
> This strikes me as very feasible, the compiler knows the offset and bit
> length of the sub fields so all of the information needed seems to be present.

There is the store-merging pass which should already do this
when constraints allow.

Richard.

> Thoughts…
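
To illustrate the reply above, here is a minimal sketch of a case where store merging is at least permitted to apply: the object is not volatile, so the compiler is free to combine the three constant bit-field stores. The struct layout and the names `cfgr_bits` and `configure` are hypothetical, loosely modelled on the fields in the question. Bit-field layout is implementation-defined, and whether a given GCC version and target actually merges these stores is not guaranteed.

```c
/* Hypothetical layout, loosely modelled on the fields in the question.
   Not volatile, so the compiler may combine the stores below. */
struct cfgr_bits {
    unsigned int low      : 21;  /* unrelated low bits */
    unsigned int mco1     : 2;
    unsigned int i2ssc    : 1;
    unsigned int mco1_pre : 3;
    unsigned int high     : 5;   /* unrelated high bits */
};

/* Three constant stores into the same 32-bit storage unit. */
void configure(struct cfgr_bits *r)
{
    r->mco1_pre = 7;
    r->i2ssc    = 0;
    r->mco1     = 3;
}
```

Compiling this with `gcc -O2 -S` and again with `-fno-store-merging` added is one way to see whether the pass makes a difference on a particular target.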
Re: Optimization of bit field assignments
On 12/02/2024 17:47, Hugh Gleaves via Gcc wrote:
> I’m interested in whether it would be feasible to add an optimization that
> compacted assignments to multiple bit fields.
>
> Today, if I have a 32 bit long struct composed of say, four 8 bit fields and
> assign constants to them like this:
>
> ahb1_ptr->RCC.CFGR.MCO1_PRE = 7;
> ahb1_ptr->RCC.CFGR.I2SSC = 0;
> ahb1_ptr->RCC.CFGR.MCO1 = 3;
>
> This generates code (on Arm) like this:
>
> ahb1_ptr->RCC.CFGR.MCO1_PRE = 7;
> 0x08000230 ldr.w r1, [r3, #2056] @ 0x808
> 0x08000234 orr.w r1, r1, #117440512 @ 0x7000000
> 0x08000238 str.w r1, [r3, #2056] @ 0x808
> ahb1_ptr->RCC.CFGR.I2SSC = 0;
> 0x0800023c ldr.w r1, [r3, #2056] @ 0x808
> 0x08000240 bfc r1, #23, #1
> 0x08000244 str.w r1, [r3, #2056] @ 0x808
> ahb1_ptr->RCC.CFGR.MCO1 = 3;
> 0x08000248 ldr.w r1, [r3, #2056] @ 0x808
> 0x0800024c orr.w r1, r1, #6291456 @ 0x600000
> 0x08000250 str.w r1, [r3, #2056] @ 0x808
>
> It would be an improvement, if the compiler analyzed these assignments and
> realized they are all modifications to the same 32 bit datum, generate an
> appropriate OR and AND bitmask and then apply those to the register and do
> just a single store at the end.
>
> In other words, infer the equivalent of this:
>
> RCC->CFGR &= ~0x07E00000;
> RCC->CFGR |= 0x07600000;
>
> This strikes me as very feasible, the compiler knows the offset and bit
> length of the sub fields so all of the information needed seems to be present.
>
> Thoughts…

In most such cases, the underlying definition of the structure (or the pointer to the structure) is volatile, because it is a hardware register. The compiler cannot combine the register field settings, because volatile accesses must not be combined - precisely so that programmers can reliably control hardware. It is normal to want to be sure that a particular bitfield is changed, and only after that will the next bitfield be changed, and so on.
Sometimes that means the result is slower than it would have to be - but this is much better than giving wrong results when the programmer needs the changes to be handled separately. It is not uncommon for the bytes underlying a hardware register bitfield struct to be available directly as well, letting you do the bit manipulation in a local copy which you then write out in a single operation.
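
The local-copy approach described above might be sketched as follows. The union `cfgr_t`, its field layout, and the function `configure_once` are hypothetical names for illustration, not taken from any vendor header. The sketch assumes the hardware tolerates all three fields changing in one write, which is exactly the property the compiler cannot assume on its own.

```c
#include <stdint.h>

/* Hypothetical register type: a bit-field view and a raw 32-bit view
   of the same storage. Bit-field layout is implementation-defined. */
typedef union {
    struct {
        unsigned int low      : 21;
        unsigned int mco1     : 2;
        unsigned int i2ssc    : 1;
        unsigned int mco1_pre : 3;
        unsigned int high     : 5;
    } bits;
    uint32_t raw;
} cfgr_t;

void configure_once(volatile cfgr_t *reg)
{
    cfgr_t tmp;               /* ordinary, non-volatile local copy */

    tmp.raw = reg->raw;       /* one volatile read */
    tmp.bits.mco1_pre = 7;    /* plain updates the compiler may combine */
    tmp.bits.i2ssc    = 0;
    tmp.bits.mco1     = 3;
    reg->raw = tmp.raw;       /* one volatile write */
}
```

Because only the read and the final write go through the volatile object, the programmer, not the compiler, makes the decision that a single combined access is acceptable.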
Optimization of bit field assignments
I’m interested in whether it would be feasible to add an optimization that compacted assignments to multiple bit fields.

Today, if I have a 32 bit long struct composed of say, four 8 bit fields and assign constants to them like this:

ahb1_ptr->RCC.CFGR.MCO1_PRE = 7;
ahb1_ptr->RCC.CFGR.I2SSC = 0;
ahb1_ptr->RCC.CFGR.MCO1 = 3;

This generates code (on Arm) like this:

ahb1_ptr->RCC.CFGR.MCO1_PRE = 7;
0x08000230 ldr.w r1, [r3, #2056] @ 0x808
0x08000234 orr.w r1, r1, #117440512 @ 0x7000000
0x08000238 str.w r1, [r3, #2056] @ 0x808
ahb1_ptr->RCC.CFGR.I2SSC = 0;
0x0800023c ldr.w r1, [r3, #2056] @ 0x808
0x08000240 bfc r1, #23, #1
0x08000244 str.w r1, [r3, #2056] @ 0x808
ahb1_ptr->RCC.CFGR.MCO1 = 3;
0x08000248 ldr.w r1, [r3, #2056] @ 0x808
0x0800024c orr.w r1, r1, #6291456 @ 0x600000
0x08000250 str.w r1, [r3, #2056] @ 0x808

It would be an improvement, if the compiler analyzed these assignments and realized they are all modifications to the same 32 bit datum, generate an appropriate OR and AND bitmask and then apply those to the register and do just a single store at the end.

In other words, infer the equivalent of this:

RCC->CFGR &= ~0x07E00000;
RCC->CFGR |= 0x07600000;

This strikes me as very feasible, the compiler knows the offset and bit length of the sub fields so all of the information needed seems to be present.

Thoughts…
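
As a worked check of the mask arithmetic, the following sketch derives a clear mask and a set mask from field offsets and widths. The positions used (MCO1 at bits 21-22, I2SSC at bit 23, MCO1_PRE at bits 24-26) are inferred from the immediates and the `bfc` operands in the disassembly, and the helper names are hypothetical. For these positions the clear mask works out to 0x07E00000 and the set mask to 0x07600000.

```c
#include <stdint.h>

/* Field positions inferred from the disassembly (an assumption). */
enum {
    MCO1_SHIFT     = 21, MCO1_WIDTH     = 2,
    I2SSC_SHIFT    = 23, I2SSC_WIDTH    = 1,
    MCO1_PRE_SHIFT = 24, MCO1_PRE_WIDTH = 3
};

/* Mask covering `width` bits starting at bit `shift` (width < 32). */
static uint32_t field_mask(unsigned shift, unsigned width)
{
    return ((UINT32_C(1) << width) - 1u) << shift;
}

/* Every bit touched by any of the three assignments. */
uint32_t cfgr_clear_mask(void)
{
    return field_mask(MCO1_PRE_SHIFT, MCO1_PRE_WIDTH)
         | field_mask(I2SSC_SHIFT, I2SSC_WIDTH)
         | field_mask(MCO1_SHIFT, MCO1_WIDTH);
}

/* New field values (7, 0, 3) shifted into position. */
uint32_t cfgr_set_mask(void)
{
    return (UINT32_C(7) << MCO1_PRE_SHIFT)
         | (UINT32_C(0) << I2SSC_SHIFT)
         | (UINT32_C(3) << MCO1_SHIFT);
}

/* Combined update: one read-modify-write instead of three. */
uint32_t cfgr_apply(uint32_t old)
{
    return (old & ~cfgr_clear_mask()) | cfgr_set_mask();
}
```

On real hardware the result of `cfgr_apply` would be written back with a single store, which is only safe when the peripheral does not care about the order in which the fields change.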