On Wed, 5 Aug 2020, Qing Zhao wrote: > Hi, Richard, > > Thanks a lot for your careful review and detailed comments. > > > > On Aug 4, 2020, at 2:35 AM, Richard Biener <rguent...@suse.de> wrote: > > > > I have a few comments below - I'm not sure I'm qualified to fully > > review the rest though. > > Could you let me know who will be the more qualified person to fully review > the rest of middle-end change?
Jeff might be, but with the intended purpose (ROP mitigation AFAIU) it would be nice for other target maintainers to chime in (Segher for power maybe) for the question below... > >>>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all] > >>>>> command-line option and > >>>>> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function > >>>>> attribue: > >>>>> > >>>>> 1. -fzero-call-used-regs=skip and zero_call_used_regs("skip") > >>>>> > >>>>> Don't zero call-used registers upon function return. > > > > Does a return via EH unwinding also constitute a function return? I > > think you may want to have a finally handler or support in the unwinder > > for this? Then there's abnormal return via longjmp & friends, I guess > > there's nothing that can be done there besides patching glibc? > > > > In general I am missing reasoning as why to use -fzero-call-used-regs= > > in the documentation, that is, what is the thread model and what are > > the guarantees? Is there any point zeroing registers when spill slots > > are left populated with stale register contents? How do I (and why > > would I want to?) ensure that there's no information leak from the > > implementation of 'foo' to their callers? Do I need to compile all > > of 'foo' and functions called from 'foo' with -fzero-call-used-regs= > > or is it enough to annotate API boundaries I want to proptect with > > zero_call_used_regs("...")? > > > > Again - what's the intended use (and how does it fulful anything useful > > for that case)? > > The major question of the above is: what’s the motivation of the whole patch? > H.J.Lu and I have replied this question in separated emails, let’s continue > with > this high-level discussion in that thread. > > > >>>>> @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, > >>>>> tree name, > >>>>> return NULL_TREE; > >>>>> } > >>>>> > >>>>> +/* Handle a "zero_call_used_regs" attribute; arguments as in > >>>>> + struct attribute_spec.handler. */ > >>>>> + > >>>>> +static tree > >>>>> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args, > >>>>> + int ARG_UNUSED (flags), > >>>>> + bool *no_add_attris) > >>>>> +{ > >>>>> + tree decl = *node; > >>>>> + tree id = TREE_VALUE (args); > >>>>> + enum zero_call_used_regs zero_call_used_regs_type = > >>>>> zero_call_used_regs_unset; > >>>>> + > >>>>> + if (TREE_CODE (decl) != FUNCTION_DECL) > >>>>> + { > >>>>> + error_at (DECL_SOURCE_LOCATION (decl), > >>>>> + "%qE attribute applies only to functions", name); > >>>>> + *no_add_attris = true; > >>>>> + return NULL_TREE; > >>>>> + } > >>>>> + else if (DECL_INITIAL (decl)) > >>>>> + { > >>>>> + error_at (DECL_SOURCE_LOCATION (decl), > >>>>> + "cannot set %qE attribute after definition", name); > > > > Why's that? > This might not be needed, I will fix this in the next update. > > >>>>> > >>>>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c > >>>>> index 81bd2ee..ded1880 100644 > >>>>> --- a/gcc/c/c-decl.c > >>>>> +++ b/gcc/c/c-decl.c > >>>>> @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree > >>>>> newtype, tree oldtype) > >>>>> DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl); > >>>>> } > >>>>> > >>>>> + /* Merge the zero_call_used_regs_type information. */ > >>>>> + if (TREE_CODE (newdecl) == FUNCTION_DECL) > >>>>> + DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS > >>>>> (olddecl); > >>>>> + > > > > If you need this (see below) then likely cp/* needs similar adjustment > > so do other places in the middle-end (function cloning, etc) > > Will check this in cp/* and function cloning etc to see whether the copying > and merging are needed in other > places. > > Another thought, if I use “lookup_attribute” of the function decl instead of > checking these bits as you suggested > later, all these copying and merging might not be necessary anymore. I will > check on that. > > > >>>>> > >>>>> + > >>>>> +/* Emit a sequence of insns to zero the call-used-registers for the > >>>>> current > >>>>> + * function. */ > > > > No '*' on the continuation line > > Okay, will fix this. > > >>>>> + > >>>>> + /* This array holds the zero rtx with the correponding machine mode. > >>>>> */ > >>>>> + rtx zero_rtx[(int)MAX_MACHINE_MODE]; > >>>>> + for (int i = 0; i < (int) MAX_MACHINE_MODE; i++) > >>>>> + zero_rtx[i] = NULL_RTX; > >>>>> + > >>>>> + for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) > >>>>> + { > >>>>> + if (!this_target_hard_regs->x_call_used_regs[regno]) > > > > Use if (!call_used_regs[regno]) > Okay. > > > > >>>>> + continue; > >>>>> + if (fixed_regs[regno]) > >>>>> + continue; > >>>>> + if (is_live_reg_at_exit (regno)) > >>>>> + continue; > > > > How can a call-used reg be live at exit? > > Yes, this might not be needed, I will double check on this. > > > > >>>>> + if (!targetm.calls.zero_call_used_regno_p (regno, gpr_only)) > >>>>> + continue; > > > > Why does the target need some extra say here? > > Only target can decide which hard regs should be zeroed, and which hard regs > are general purpose register. I'm mostly questioning the plethora of target hooks added and whether this details are a good granularity applying to more than just x86. Did I suggest to compute a hardreg set that the middle-end says was used and is not live and leave the rest to the target? > > > >>>>> + if (used_only && !df_regs_ever_live_p (regno)) > > > > So I suppose this does not include uses by callees of this function? > > Yes, I think so. > > > >>>>> + continue; > >>>>> + > >>>>> + /* Now we can emit insn to zero this register. */ > >>>>> + rtx reg, tmp; > >>>>> + > >>>>> + machine_mode mode > >>>>> + = targetm.calls.zero_call_used_regno_mode (regno, > >>>>> + reg_raw_mode[regno]); > > > > In what case does the target ever need to adjust this (we're dealing > > with hard-regs only?)? > > For x86, for example, even though the GPR registers are 64-bit, we only need > to zero the lower 32-bit. etc. That's an optimization, yes. > > > >>>>> + if (mode == VOIDmode) > >>>>> + continue; > >>>>> + if (!have_regs_of_mode[mode]) > >>>>> + continue; > > > > When does this happen? > > This might be removed. I will check. > > > >>>>> + > >>>>> + reg = gen_rtx_REG (mode, regno); > >>>>> + if (zero_rtx[(int)mode] == NULL_RTX) > >>>>> + { > >>>>> + zero_rtx[(int)mode] = reg; > >>>>> + tmp = gen_rtx_SET (reg, const0_rtx); > >>>>> + emit_insn (tmp); > >>>>> + } > >>>>> + else > >>>>> + emit_move_insn (reg, zero_rtx[(int)mode]); > > > > Not sure but I think the canonical zero to use is CONST0_RTX (mode) > > but I may be wrong. > > You mean “const0_rtx” should be “CONST0_RTX(mode)”? > I will check on this. > > > I'd rather have the target be able to specify > > some special instruction for zeroing here. Some may have > > multi-reg set instructions for example. That said, can't we > > defer the actual zeroing to the target in full and only compute > > a hard-reg-set of to-be zerored registers here and pass that > > to a target hook? Ah, I did. > For vector regs, we have already provided this interface with > > targetm.calls.zero_all_vector_registers (used_only) > > For integer registers, do we need such target hook too? > If so, yes, it might be better to let the target decide how to zero the > registers. > > If Not, the current design might be good enough, right? But why not simplify it all to a single hook targetm.calls.zero_regs (used-not-live-hardregset, gpr_only); ? > > > >>>>> + > >>>>> + emit_insn (targetm.calls.pro_epilogue_use (reg)); > >>>>> + } > >>>>> + > >>>>> + return; > >>>>> +} > >>>>> + > >>>>> + > >>>>> /* Return a sequence to be used as the epilogue for the current > >>>>> function, > >>>>> or NULL. */ > >>>>> > >>>>> @@ -5819,6 +5961,9 @@ make_epilogue_seq (void) > >>>>> > >>>>> start_sequence (); > >>>>> emit_note (NOTE_INSN_EPILOGUE_BEG); > >>>>> + > >>>>> + gen_call_used_regs_seq (); > >>>>> + > > > > The caller eventually performs shrink-wrapping - are you sure that > > doesn't mess up things? > > My understanding is, in the standard epilogue, there is no handling of > “call-used” registers. Therefore, shrink-wrapping will not impact > “call-used” registers as well. > Our patch only handles call-used registers, so, there should be no any > interaction between this patch and shrink-wrapping. I don't know (CCed Segher, he should eventually). > > > >>>>> > >>>>> + > >>>>> + /* How to clear call-used registers upon function return. */ > >>>>> + ENUM_BITFIELD(zero_call_used_regs) zero_call_used_regs_type : 3; > >>>>> + > >>>>> + /* 11 unused bits. */ > > > > So instead of wasting "precious" bits please use lookup_attribute > > in the single place you query this value (which is once per function). > > There's no need to complicate matters by trying to maintain the above. > > Thanks for the suggestion. > Yes, I will try to use lookup_attribute in function.c instead of adding these > bits. That will save us these > precious space. Yes, I think this will simplify the code. Richard. > Thanks again. > > Qing -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)