The feature already exists at -Os by default (i.e., all functions are by
default minimally aligned).  The suggestion here is only to let GCC change the
order in which it emits functions so as to minimize the padding it adds to
align the explicitly overaligned ones that follow.

Outside -Os, functions would continue to be optimally aligned unless overridden
by the attribute.  When their alignment is explicitly reduced by the attribute,
GCC could still be smart about ordering them so as to minimize wasted space.
For example, consider:

  __attribute__ ((aligned (4))) int f4 (int i) { return 2 * i; }
  double f (double x) { return x * x * x; }
  __attribute__ ((aligned (4))) int g4 (int i) { return i; }

for which GCC for x86_64 emits:

  0000000000000000 <f4>:        ;; unnecessarily overaligned
     0: 8d 04 3f                lea    (%rdi,%rdi,1),%eax
     3: c3                      retq   
     4: 66 90                   xchg   %ax,%ax
     6: 66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
     d: 00 00 00 

  0000000000000010 <f>:         ;; optimally aligned
    10: 66 0f 28 c8             movapd %xmm0,%xmm1
    14: f2 0f 59 c8             mulsd  %xmm0,%xmm1
    18: f2 0f 59 c1             mulsd  %xmm1,%xmm0
    1c: c3                      retq   
    1d: 0f 1f 00                nopl   (%rax)

  0000000000000020 <g4>:        ;; also unnecessarily overaligned
    20: 89 f8                   mov    %edi,%eax
    22: c3                      retq   

If GCC laid down f first instead, it would be able to avoid padding f4:

  0000000000000000 <f>:
     0: 66 0f 28 c8             movapd %xmm0,%xmm1
     4: f2 0f 59 c8             mulsd  %xmm0,%xmm1
     8: f2 0f 59 c1             mulsd  %xmm1,%xmm0
     c: c3                      retq
     d: 0f 1f 00                nopl   (%rax)

  0000000000000010 <f4>:        ;; unavoidably overaligned
    10: 8d 04 3f                lea    (%rdi,%rdi,1),%eax
    13: c3                      retq

  0000000000000014 <g4>:        ;; aligned exactly as requested
    14: 89 f8                   mov    %edi,%eax
    16: c3                      retq
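
The difference between the two layouts can be checked with a small model.
The sketch below is not GCC code: struct func, total_padding and
order_by_alignment are hypothetical names invented for illustration.  The
sizes (4, 13 and 3 bytes) are read off the listings above, and the 16-byte
alignment of f is assumed from the address it was given.  Emitting the most
strictly aligned functions first is just one simple heuristic, not
necessarily the best possible order in general.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one function: code size and required alignment. */
struct func {
    const char *name;
    size_t size;   /* bytes of code, excluding padding */
    size_t align;  /* required alignment, a power of two */
};

/* Round OFFSET up to the next multiple of ALIGN. */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Total padding needed in front of functions laid out in array order,
   starting at offset 0.  Padding after the last function is not counted. */
size_t total_padding(const struct func *f, size_t n)
{
    size_t offset = 0, padding = 0;
    for (size_t i = 0; i < n; i++) {
        size_t start = align_up(offset, f[i].align);
        padding += start - offset;
        offset = start + f[i].size;
    }
    return padding;
}

/* Stable insertion sort: most strictly aligned functions first, source
   order preserved among functions with equal alignment. */
void order_by_alignment(struct func *f, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        struct func cur = f[i];
        size_t j = i;
        while (j > 0 && f[j - 1].align < cur.align) {
            f[j] = f[j - 1];
            j--;
        }
        f[j] = cur;
    }
}
```

Calling total_padding on { f4 (4 bytes, align 4), f (13, align 16),
g4 (3, align 4) } in source order and again after order_by_alignment
compares the two layouts shown above.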

This is probably not important outside -Os, but if it's implemented for -Os it
won't cost anything to also enable it at other optimization levels.

The following discussion provides some context:

