------- Comment #18 from jakub at gcc dot gnu dot org 2009-05-13 08:30 -------

No, .p2align is the right thing to do, because GCC doesn't have 100% accurate information about instruction sizes. For inline asm, for example, it can't have it. Where branch shortening can decrease the size, it doesn't have it until the branch shortening pass, which runs too late for this machine reorg. In other cases the lengths are just upper bounds.

Say .p2align 4,,5 (align to 2^4 = 16 bytes) inserts up to 5 bytes of nop padding if that reaches the 16-byte boundary, and otherwise inserts nothing. When nothing is inserted, fewer than 11 bytes precede it in the same 16-byte window. So if the lower-bound insn size estimate determined that those 11 bytes can't hold 3 branch-changing instructions, you are fine.

Breaking the fused compare and jump (32-bit code only) is unfortunate, but inserting the padding before the cmp would often mean unnecessarily large padding.
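The skip rule behind the "fewer than 11 bytes" argument can be modelled in a few lines. This is a sketch assuming GNU as semantics for `.p2align` (the first operand is log2 of the alignment, the third is the maximum number of padding bytes allowed); `p2align_padding` is a hypothetical helper written for illustration, not part of GCC or binutils.

```python
def p2align_padding(offset: int, log2_align: int = 4, max_skip: int = 5) -> int:
    """Bytes of padding a '.p2align log2_align,,max_skip' would emit at `offset`.

    Hypothetical model: pad up to the next 2**log2_align boundary, but only
    if that takes at most max_skip bytes; otherwise emit nothing.
    """
    align = 1 << log2_align
    needed = (-offset) % align  # distance to the next aligned address
    return needed if needed <= max_skip else 0


# Offsets within a 16-byte window where the directive emits nothing even
# though the address is not yet aligned.
skipped = [r for r in range(1, 16) if p2align_padding(r) == 0]
print(skipped)
print(max(skipped))
```

The directive only gives up when reaching the boundary would take 6 or more bytes, so the unaligned offsets it skips are 1 through 10: at most 10 bytes (fewer than 11) already sit in the current 16-byte window.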
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942