------- Comment #19 from vvv at ru dot ru  2009-05-13 11:42 -------
(In reply to comment #18)
> No, .p2align is the right thing to do, given that GCC doesn't have 100%
> accurate information about instruction sizes (for e.g. inline asms it can't
> have, for stuff where branch shortening can decrease the size doesn't have
> it until the shortening branch phase which is too late for this machine
> reorg, and in other cases the lengths are just upper bounds). Say
> .p2align 16,,5 says insert a nop up to 5 bytes if you can reach the 16-byte
> boundary with it, otherwise don't insert anything. But that necessarily
> means that there were less than 11 bytes in the same 16 byte page and if the
> lower bound insn size estimation determined that in 11 bytes you can't have
> 3 branch changing instructions, you are fine. Breaking of fused compare and
> jump (32-bit code only) is unfortunate, but inserting it before the cmp
> would mean often unnecessarily large padding.

You are right, if padding is required for every 16-byte page with 4 branches
in it. But Intel writes about a "16-byte chunk", not a "16-byte page".

Quote from the Intel 64 and IA-32 Architectures Optimization Reference Manual:

Assembly/Compiler Coding Rule 10. (M impact, L generality) Do not put more
than four branches in a 16-byte chunk.

IMHO, a "chunk" here is a memory range from x to x+10h, where x is _any_
address, not only a 16-byte-aligned one.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942
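
To illustrate the "chunk" vs. "page" reading above, here is a minimal sketch in Python. It is not GCC code: the function names and the branch addresses are made up for the example, and each branch is represented only by its start address (instruction lengths are ignored, which is a simplification).

```python
def max_branches_aligned(branch_addrs, size=16):
    """Largest branch count in any aligned block [k*size, (k+1)*size).

    This is the "16-byte page" reading.
    """
    counts = {}
    for addr in branch_addrs:
        block = addr // size
        counts[block] = counts.get(block, 0) + 1
    return max(counts.values(), default=0)


def max_branches_sliding(branch_addrs, size=16):
    """Largest branch count in any window [x, x+size), for arbitrary x.

    This is the "16-byte chunk" reading, where x may be any address.
    """
    addrs = sorted(branch_addrs)
    best = 0
    lo = 0
    for hi, addr in enumerate(addrs):
        # Shrink the window until its first and last branch fit in `size` bytes.
        while addr - addrs[lo] >= size:
            lo += 1
        best = max(best, hi - lo + 1)
    return best


# Hypothetical start addresses of five branch instructions.
branches = [0x0A, 0x0C, 0x0E, 0x10, 0x12]

print(max_branches_aligned(branches))  # 3: the 16-byte boundary splits them 3 + 2
print(max_branches_sliding(branches))  # 5: all five fit in the window 0x0A..0x19
```

With these addresses, a check per aligned 16-byte page sees at most 3 branches and inserts no padding, while the window starting at 0x0A contains 5 branches, which would break the rule if "chunk" means a range starting at any address.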