Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Qing Zhao via Gcc-patches Mon, 14 Sep 2020 11:50:44 -0700

> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandif...@arm.com> 
> wrote:
> 
> Qing Zhao <qing.z...@oracle.com> writes:
>>> Like I mentioned earlier though, passes that run after
>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>> weren't previously used.  For example, on x86_64, the function might
>>> not use %r8 when the prologue, epilogue and returns are generated,
>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>> the “used” version of the new command-line option is supposed to clear
>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>> collected at the point that the return is generated.
>> 
>> Thanks for the information.
>> 
>>> 
>>> That's why I think it's more robust to do this later (at the beginning
>>> of pass_late_compilation) and insert the zeroing before returns that
>>> already exist.
>> 
>> Yes, looks like it’s not correct to insert the zeroing at the time when 
>> prologue, epilogue and return are generated.
>> As I also checked, “return” might be also generated as late as pass 
>> “pass_delay_slots”,  So, shall we move the
>> New pass as late as possible?
> 
> If we insert the zeroing before pass_delay_slots and describe the
> result correctly, pass_delay_slots should do the right thing.
> 
> Describing the result correctly includes ensuring that the cleared
> registers are treated as live on exit from the function, so that the
> zeroing doesn't get deleted again, or skipped by pass_delay_slots.

In the current implementation for x86, when we generating a zeroing insn as the 
following:

(insn 18 16 19 2 (set (reg:SI 1 dx)
        (const_int 0 [0])) "t10.c":11:1 -1
     (nil))
(insn 19 18 20 2 (unspec_volatile [
            (reg:SI 1 dx)
        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
     (nil))

i.e, after each zeroing insn, the register that is zeroed is marked as 
“UNSPECV_PRO_EPILOGUE_USE”, 
By doing this, we can avoid this zeroing insn from being deleted or skipped. 

Is doing this enough to describe the result correctly?
Is there other thing we need to do in addition to this?

> 
>> Can I put it immediately before “pass_final”? What’s the latest place
>> I can put it?
> 
> Like you say here…
> 
>>>>> So IMO the pass should just search for all the
>>>>> returns in a function and insert the zeroing instructions before
>>>>> each one.
>>>> 
>>>> I was considering this approach too for some time, however, there is one 
>>>> major issue with this as 
>>>> Segher mentioned, The middle end does not know some details on the 
>>>> registers, lacking such 
>>>> detailed information might result incorrect code generation at middle end.
>>>> 
>>>> For example, on x86_64 target, when “return” with pop, the scratch 
>>>> register “ECX” will be 
>>>> used for returning, then it’s incorrect to zero “ecx” before generating 
>>>> the return. Since middle end
>>>> doesn't have such information, it cannot avoid to zero “ecx”. Therefore 
>>>> incorrect code might be 
>>>> generated. 
>>>> 
>>>> Segher also mentioned that on Power, there are some scratch registers also 
>>>> are used for 
>>>> Other purpose, clearing them before return is not correct. 
>>> 
>>> But the dataflow information has to be correct between
>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>> any pass in that region could clobber the registers in the same way.
>> 
>> You mean, the data flow information will be not correct after pass_free_cfg? 
>> “pass_delay_slots” is after “pass_free_cfg”,  and there might be new 
>> “return” generated in “pass_delay_slots”, 
>> If we want to generate zeroing for the new “return” which was generated in 
>> “pass_delay_slots”, can we correctly to do so?
> 
> …the zeroing has to be done before pass_free_cfg, because the information
> isn't reliable after that point.  I think it would make sense to do it
> before pass_compute_alignments, because inserting the zeros will affect
> alignment.

Okay. 

Then there is another problem:  what about the new “return”s that are generated 
at pass_delay_slots?

Should we generate the zeroing for these new returns? Since the data flow 
information might not be correct at this pass,
It looks like that there is no correct way to add the zeroing insn for these 
new “return”, then, what should we do about this?

> 
>>> To get the registers that are live before the return, you can start with
>>> the registers that are live out from the block that contains the return,
>>> then “simulate” the return instruction backwards to get the set of
>>> registers that are live before the return instruction
>>> (see df_simulate_one_insn_backwards).
>> 
>> Okay. 
>> Currently, I am using the following to check whether a reg is live out the 
>> block that contains the return:
>> 
>> /* Check whether the hard register REGNO is live at the exit block
>> * of the current routine.  */
>> static bool
>> is_live_reg_at_exit (unsigned int regno)
>> {
>>  edge e;
>>  edge_iterator ei;
>> 
>>  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>>    {
>>      bitmap live_out = df_get_live_out (e->src);
>>      if (REGNO_REG_SET_P (live_out, regno))
>>        return true;
>>    }
>> 
>>  return false;
>> }
>> 
>> Is this correct?
> 
> df_get_live_out is the right way to get the set of live registers
> on exit from a block.  But if we search for return instructions
> and find a return instruction R, we should do:
> 
>  basic_block bb = BLOCK_FOR_INSN (R);
>  auto_bitmap live_regs;
>  bitmap_copy (regs, df_get_live_out (bb));
>  df_simulate_one_insn_backwards (bb, R, live_regs);

> 
> and then use LIVE_REGS as the set of registers that are live before R,
> and so can't be clobbered.

Okay. Thanks for the info.
> 
> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
> to see whether there's a target-specific reason why a register can't
> be clobbered.

/* Return true if is OK to use a hard register REGNO as scratch register
   in peephole2.  */
DEFHOOK
(hard_regno_scratch_ok,


Is this checking only valid for pass_peephole2?

> 
>>> In the x86_64 case you mention, the pattern is:
>>> 
>>> (define_insn "*simple_return_indirect_internal<mode>"
>>> [(simple_return)
>>>  (use (match_operand:W 0 "register_operand" "r"))]
>>> "reload_completed"
>>> …)
>>> 
>>> This (use …) tells the df machinery that the instruction needs
>>> operand 0 (= ecx).  The process above would therefore realise
>>> that ecx can't be clobbered.
>> 
>> Okay, I see.  The df will reflect this information, no need for special 
>> handling here. 
>> 
>> However, for the cases on Power as Segher mentioned, there are also some 
>> scratch registers used for
>> Other purpose, not sure whether we can correctly generate zeroing in 
>> middle-end for Power?
> 
> Segher would be better placed to answer that, but I think the process
> above has to give a conservatively-accurate list of live registers.
> If it misses a register, the other late rtl passes could clobber
> that same register.

Segher, can you comment on this? 

thanks.

Qing
> 
> Thanks,
> Richard
Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Reply via email to