> Patch that attempts to take into account .p2align directives that are emitted
> for (some) CODE_LABELs and also the gen_align insns that the pass itself
> inserts.  For a CODE_LABEL, say .p2align 16,,10 means either that the .p2align
> directive starts a new 16 byte page (then insns before it are never
> interesting), or nothing was skipped because more than 10 bytes would need to
> be skipped.  But that means the current group could contain only 5 or less
> bytes of instructions before the label, so again, we don't have to look at
> instructions not in the last 5 bytes.
> Another fix is that for MAX_SKIP < 7, ASM_OUTPUT_MAX_SKIP_ALIGN shouldn't emit
> the second .p2align 3, which might (and often does) skip more than MAX_SKIP
> bytes (up to 7).

Nice patch. The code looks better. I checked it on the Linux kernel.
But two notes:

1. There is no guarantee that .p2align will be translated to NOPs. Example:

# cat test.c
void f(int i)
{
        if (i == 1) F(1);
        if (i == 2) F(2);
        if (i == 3) F(3);
        if (i == 4) F(4);
        if (i == 5) F(5);
}
# gcc -o test.s test.c -O2 -S
# cat test.s
        .file   "test.c"
        .p2align 4,,15
.globl f
        .type   f, @function
        cmpl    $1, %edi
        je      .L7
        cmpl    $2, %edi
        je      .L7
        cmpl    $3, %edi
        je      .L7
        cmpl    $4, %edi
        .p2align 4,,5    <------- attempted padding
        je      .L7
        cmpl    $5, %edi
        je      .L7
        .p2align 4,,10
        .p2align 3
        xorl    %eax, %eax
        jmp     F
        .size   f, .-f
        .ident  "GCC: (GNU) 4.5.0 20090512 (experimental)"
        .section        .note.GNU-stack,"",@progbits

# gcc -o test.out test.s -O2 -c
# objdump -d test.out
0000000000000000 <f>:
   0:   83 ff 01                cmp    $0x1,%edi
   3:   74 1b                   je     20 <f+0x20>
   5:   83 ff 02                cmp    $0x2,%edi
   8:   74 16                   je     20 <f+0x20>
   a:   83 ff 03                cmp    $0x3,%edi
   d:   74 11                   je     20 <f+0x20>
   f:   83 ff 04                cmp    $0x4,%edi
  12:   74 0c                   je     20 <f+0x20>      <---- no NOP here 
  14:   83 ff 05                cmp    $0x5,%edi
  17:   74 07                   je     20 <f+0x20>
  19:   f3 c3                   repz retq 
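
To illustrate why nothing is emitted at offset 0x12: as far as I understand
the gas semantics, ".p2align N,,MAX" pads to the next 2^N boundary only when
that takes at most MAX bytes, otherwise it does nothing. A minimal sketch of
that rule follows; the function name p2align_padding is made up for this
example and is not part of gas or GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of padding bytes emitted for
   ".p2align log2_align,,max_skip" when the location counter is addr.
   Returns 0 when addr is already aligned, or when reaching the
   boundary would need more than max_skip bytes (the directive is
   then ignored entirely).  */
unsigned
p2align_padding (uint64_t addr, unsigned log2_align, unsigned max_skip)
{
  assert (log2_align < 64);
  uint64_t align = (uint64_t) 1 << log2_align;
  /* Distance to the next boundary; 0 if addr is already on one.  */
  unsigned need = (unsigned) ((align - (addr & (align - 1))) & (align - 1));
  return need <= max_skip ? need : 0;
}
```

In the listing above the ".p2align 4,,5" lands at offset 0x12, where 14 bytes
would be needed to reach 0x20, so p2align_padding (0x12, 4, 5) is 0 and the
je follows the cmp immediately.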

IMHO, it would be better to insert NOPs directly rather than .p2align. (I mean
the line
emit_insn_before (gen_align (GEN_INT (padsize)), insn); )

2. IMHO, it's a bad idea to insert something between a CMP and a conditional
jump. Quote from the Intel 64 and IA-32 Architectures Optimization Reference
Manual:

>>       Optimizing for Macro-fusion
>> Macro-fusion merges two instructions to a single μop. Intel Core
>> Microarchitecture performs this hardware optimization under limited
>> circumstances.
>> The first instruction of the macro-fused pair must be a CMP or TEST
>> instruction. This instruction can be REG-REG, REG-IMM, or a micro-fused
>> REG-MEM comparison. The second instruction (adjacent in the instruction
>> stream) should be a conditional branch.

So if we need to insert NOPs, it is better to do so _before_ the CMP.
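
A rough sketch of what I mean, not tied to the actual pass: if the padding
point is a conditional jump directly preceded by a CMP or TEST, move the
padding in front of the compare so the pair stays adjacent. The enum and
pad_position are hypothetical names for this example only; the real pass
works on RTL insns, not on an array like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction classification.  */
enum insn_kind { INSN_OTHER, INSN_CMP, INSN_TEST, INSN_JCC };

/* Given that padding is wanted before insns[idx], return the index
   before which it should actually be placed.  If insns[idx] is a
   conditional jump directly preceded by CMP or TEST, pad before the
   compare so the pair stays adjacent and can still macro-fuse.  */
size_t
pad_position (const enum insn_kind *insns, size_t n, size_t idx)
{
  assert (idx < n);
  if (idx > 0
      && insns[idx] == INSN_JCC
      && (insns[idx - 1] == INSN_CMP || insns[idx - 1] == INSN_TEST))
    return idx - 1;
  return idx;
}
```

For the sequence cmp, je, cmp, je, a request to pad before the second je
(index 3) would be redirected to index 2, i.e. before the second cmp.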



