https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551

            Bug ID: 115551
           Summary: [missed optimization] "c1 << (a + c2)" not optimized
                    into "(c1 << c2) << a"
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: pinskia at gcc dot gnu.org
  Target Milestone: ---

Created attachment 58468
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58468&action=edit
patch to show how to get a nice output, but it doesn't actually use it. Not to be
used.

"c1 << (a + c2)" not optimized into "(c1 << c2) << a"

Example:

int f(int ch) {
  unsigned long mask1 = ((((1UL))) << (1 + 4 * ((1) - 1))) << (ch * 4); 
  unsigned long mask2 = ((((1UL))) << (1 + 4 * ((ch + 1) - 1)));
  return mask1-mask2;
}

GCC currently converts this to:

mask1 = 2 << (ch * 4)
mask2 = 1 << (ch * 4 + 1)
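
As a rough sanity check (not part of the attached patch), the two forms can be
compared directly. The helper names below are hypothetical, and the loop is
deliberately limited to non-negative ch for which both shift counts stay below
the width of unsigned long; outside that range the shifts are undefined and
the two forms need not be interchangeable.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: the "(c1 << c2) << a" form, as in mask1. */
static unsigned long mask_shift_twice(int ch) {
  return (1UL << 1) << (ch * 4);
}

/* Hypothetical helper: the "c1 << (a + c2)" form, as in mask2. */
static unsigned long mask_shift_sum(int ch) {
  return 1UL << (1 + 4 * ch);
}

/* Asserts that both forms agree for every non-negative ch whose
   combined shift count is smaller than the width of unsigned long. */
static void check_masks_agree(void) {
  const int width = (int)(sizeof(unsigned long) * CHAR_BIT);
  for (int ch = 0; 1 + 4 * ch < width; ch++)
    assert(mask_shift_twice(ch) == mask_shift_sum(ch));
}
```

Calling check_masks_agree() should pass silently, which matches the
expectation that f() returns 0 for such ch.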

* * *

Related to
https://lore.kernel.org/lkml/d7ef7a6158df4ba6687233b0e00d37796b069fb3.1718791090.git.u.kleine-koe...@baylibre.com/

Result: 
* With the 2nd form the resulting binary gets ~25% smaller
* Saving nearly 500 bytes!

* * *

On ARM, the generated code for mask1 is:

lsls    r0, r0, #2
movs    r3, #2
lsl.w   r0, r3, r0

and for mask2:

lsls    r0, r0, #2
adds    r0, #1  // additional 'adds' instruction
movs    r3, #1
lsl.w   r0, r3, r0
