On Wed, Feb 08, 2017 at 12:48:56PM +0100, Jakub Jelinek wrote: > > First of all, GCC is in stage4 and this isn't a fix for a regression, so it > is not acceptable at this point. Next stage1 will start in April or so.
I had gotten the impression there were sort of branches in the SCM; this of course should go to HEAD / trunk / master or whatever it's called. > The length of nop varies greatly across different architectures, > some architectures have different spelling for it (e.g. > ia64, s390{,x} use nop 0 and mmix uses swym 0), some architectures have > tons of different nops with different sizes (see e.g. x86_64/i686), for some > longer sizes it is often beneficial to emit a jump around the rest of nops > instead of just tons of nops. > > Even if it is counted always in nops and only nops are emitted (not really > efficient, and I guess kernel cares about efficiency), the above is Yes, efficiency is a goal, but not unchallenged. Besides, out of order micro- architectures hardly suffer from NOPs inserted other than their occupation of cache space and memory bandwidth. The worst impact I found on some in-order aarch64 CPUs was a penalty of 1 clock cycle for 2 NOPs. Note that this is a "you asked for it -- you get it" feature. Only if you explicitly request those NOPs they will be inserted; normal operation is unaffected. > definitely not acceptable for a generic hook implementation, it contains too > many ELF specific details as well as 64-bit target specific details etc. > For the label, you should use something like: > static int ppad_no; > char tmp_label[...]; > ASM_GENERATE_INTERNAL_LABEL (tmp_label, "LPPAD", ppad_no); > ppad_no++; > ASM_OUTPUT_LABEL (file, tmp_label); > the "a",@progbits is not generic enough, you want > switch_to_section (get_section ("__prolog_pads_loc", 0, NULL)); > or so. .previous doesn't work on many targets, you need to actuall switch > to the previously current section. .quad is not supported by many > assemblers, you want assemble_integer, with the appropriate size (say > derived from size of pointer), decide whether in that section everything is > aligned or unaligned, emit aligning directive if the former. Thanks a lot for this blueprint! It saved me half of the work for the rewrite! > There are many other ways in which the ABI can check, e.g. > see how i386.c (ix86_function_regparm) can change ABI if > cgraph_local_info *i = &target->local; > if (i && i->local && i->can_change_signature) > So you'd e.g. need to make sure that functions you emit the padding for > can't change signature. Bet you won't be able to handle e.g. IPA cp or many > other IPA optimizations that change signature too, while the function from > the source may still be around, perhaps nothing will call it because the > callers have been changed to call its clone with different arguments. > Or say IPA-ICF, if you want to live-patch only some function and not another > function which happens to have identical bogy in the end. Indeed. Live patching needs to take more care. But as I wrote, it is only _my_ current use case; the feature itself should be usable more widely. I wouldn't want to restrict future users on what they can optimise and what they can't. Maybe someone wants to measure exactly these effects? The point is: whatever instructions are to replace those NOPs, they will very likely need a register or two. With IPA-RA, all bets are off. Without, you can assume the caller-save regs, the scratch regs and some intra-procedure regs to be available. Please let's leave this all to the framework implementers. The same goes for nops before or after the entry point. This is only a _mechanism_. Torsten