On Wed, Feb 08, 2017 at 12:48:56PM +0100, Jakub Jelinek wrote:
> 
> First of all, GCC is in stage4 and this isn't a fix for a regression, so it
> is not acceptable at this point.  Next stage1 will start in April or so.

I had gotten the impression there were sort of branches in the SCM; this
of course should go to HEAD / trunk / master or whatever it's called.

> The length of nop varies greatly across different architectures,
> some architectures have different spelling for it (e.g.
> ia64, s390{,x} use nop 0 and mmix uses swym 0), some architectures have
> tons of different nops with different sizes (see e.g. x86_64/i686), for some
> longer sizes it is often beneficial to emit a jump around the rest of nops
> instead of just tons of nops.
> 
> Even if it is counted always in nops and only nops are emitted (not really
> efficient, and I guess kernel cares about efficiency), the above is

Yes, efficiency is a goal, but not unchallenged. Besides, out of order micro-
architectures hardly suffer from NOPs inserted other than their occupation of
cache space and memory bandwidth. The worst impact I found on some in-order
aarch64 CPUs was a penalty of 1 clock cycle for 2 NOPs.

Note that this is a "you asked for it -- you get it" feature. Only if you
explicitly request those NOPs they will be inserted; normal operation is
unaffected.

> definitely not acceptable for a generic hook implementation, it contains too
> many ELF specific details as well as 64-bit target specific details etc.
> For the label, you should use something like:
>   static int ppad_no;
>   char tmp_label[...];
>   ASM_GENERATE_INTERNAL_LABEL (tmp_label, "LPPAD", ppad_no);
>   ppad_no++;
>   ASM_OUTPUT_LABEL (file, tmp_label);
> the "a",@progbits is not generic enough, you want
>   switch_to_section (get_section ("__prolog_pads_loc", 0, NULL));
> or so.  .previous doesn't work on many targets, you need to actuall switch
> to the previously current section.  .quad is not supported by many
> assemblers, you want assemble_integer, with the appropriate size (say
> derived from size of pointer), decide whether in that section everything is
> aligned or unaligned, emit aligning directive if the former.

Thanks a lot for this blueprint! It saved me half of the work for the rewrite!

> There are many other ways in which the ABI can check, e.g.
> see how i386.c (ix86_function_regparm) can change ABI if
>           cgraph_local_info *i = &target->local;
>           if (i && i->local && i->can_change_signature)
> So you'd e.g. need to make sure that functions you emit the padding for
> can't change signature.  Bet you won't be able to handle e.g. IPA cp or many
> other IPA optimizations that change signature too, while the function from
> the source may still be around, perhaps nothing will call it because the
> callers have been changed to call its clone with different arguments.
> Or say IPA-ICF, if you want to live-patch only some function and not another
> function which happens to have identical bogy in the end.

Indeed. Live patching needs to take more care. But as I wrote, it is only
_my_ current use case; the feature itself should be usable more widely.
I wouldn't want to restrict future users on what they can optimise and what
they can't. Maybe someone wants to measure exactly these effects?

The point is: whatever instructions are to replace those NOPs, they will
very likely need a register or two. With IPA-RA, all bets are off. Without,
you can assume the caller-save regs, the scratch regs and some intra-procedure
regs to be available. Please let's leave this all to the framework
implementers. The same goes for nops before or after the entry point. This
is only a _mechanism_.

        Torsten

Reply via email to