Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Qing Zhao via Gcc-patches Mon, 14 Sep 2020 13:25:20 -0700

> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandif...@arm.com> 
> wrote:
> 
> Qing Zhao <qing.z...@oracle.com> writes:
>>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandif...@arm.com> 
>>> wrote:
>>> 
>>> Qing Zhao <qing.z...@oracle.com> writes:
>>>>> Like I mentioned earlier though, passes that run after
>>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>>> weren't previously used.  For example, on x86_64, the function might
>>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>>> but pass_regrename might later introduce a new use of %r8.  AIUI,
>>>>> the “used” version of the new command-line option is supposed to clear
>>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>>> collected at the point that the return is generated.
>>>> 
>>>> Thanks for the information.
>>>> 
>>>>> 
>>>>> That's why I think it's more robust to do this later (at the beginning
>>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>>> already exist.
>>>> 
>>>> Yes, it looks like it’s not correct to insert the zeroing at the time when 
>>>> the prologue, epilogue and returns are generated.
>>>> As I also checked, a “return” might be generated as late as 
>>>> “pass_delay_slots”.  So, shall we move the
>>>> new pass as late as possible?
>>> 
>>> If we insert the zeroing before pass_delay_slots and describe the
>>> result correctly, pass_delay_slots should do the right thing.
>>> 
>>> Describing the result correctly includes ensuring that the cleared
>>> registers are treated as live on exit from the function, so that the
>>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>> 
>> In the current implementation for x86, we generate a zeroing insn as 
>> follows:
>> 
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>>        (const_int 0 [0])) "t10.c":11:1 -1
>>     (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>>            (reg:SI 1 dx)
>>        ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>>     (nil))
>> 
>> i.e., after each zeroing insn, the register that is zeroed is marked with 
>> “UNSPECV_PRO_EPILOGUE_USE”. 
>> By doing this, we prevent the zeroing insn from being deleted or skipped. 
>> 
>> Is doing this enough to describe the result correctly?
>> Is there anything else we need to do in addition to this?
> 
> I guess that works, but I think it would be better to abstract
> EPILOGUE_USES into a new target-independent wrapper function that
> (a) returns true if EPILOGUE_USES itself returns true and (b) returns
> true for registers that need to be zero on return, if the zeroing
> instructions have already been inserted.  The places that currently
> test EPILOGUE_USES should then test this new wrapper function instead.

Okay, I see. 
It looks like EPILOGUE_USES is used in df-scan.c to compute the data flow 
information. If EPILOGUE_USES returns true
for registers that need to be zeroed on return, those registers will be 
included in the data flow information; as a result, later
passes will not be able to delete the insns that zero them. 

This sounds like a cleaner approach than the current one, which marks the 
registers with “UNSPECV_PRO_EPILOGUE_USE”. 
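
To check that I read the suggestion correctly, here is a minimal standalone
sketch of such a wrapper. It is a toy model, not GCC code: toy_epilogue_uses,
struct zero_state and reg_live_at_return_p are hypothetical names made up for
illustration, and the 32-register limit is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_HARD_REGS 32 /* hypothetical register count for this toy model */

/* Toy stand-in for the target's EPILOGUE_USES: in this model only
   register 7 is treated as used by the epilogue.  */
static bool
toy_epilogue_uses (unsigned regno)
{
  return regno == 7;
}

/* What the zeroing pass would record (hypothetical): which registers
   must be zero on return, and whether the zeroing insns exist yet.  */
struct zero_state
{
  uint32_t need_zero_mask;
  bool zeroing_inserted;
};

/* Hypothetical target-independent wrapper.  Returns true if
   (a) the toy EPILOGUE_USES returns true, or
   (b) the register must be zero on return and the zeroing insns
       have already been inserted.  */
static bool
reg_live_at_return_p (const struct zero_state *st, unsigned regno)
{
  assert (regno < NUM_HARD_REGS);
  if (toy_epilogue_uses (regno))
    return true;
  return st->zeroing_inserted && ((st->need_zero_mask >> regno) & 1u) != 0;
}
```

With this shape, a register in need_zero_mask only starts to count as live at
return once zeroing_inserted is set, so passes that run before the zeroing
pass would see the same answer as plain EPILOGUE_USES.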

A more detailed implementation question on this: 
Where should I put this new target-independent wrapper function? Which 
header file would be the proper place for it?

> 
> After inserting the zeroing instructions, the pass should recompute the
> live-out sets based on this.

Is it enough to recompute the live-out sets only for the blocks that contain 
a return insn? Or should we recompute them for the whole procedure? 

Which utility routine should I use to recompute the live-out sets?
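
To make the question concrete, here is a standalone toy model of backward
liveness (not GCC's df code; struct toy_block, TOY_EXIT and
toy_recompute_liveness are hypothetical names). It only illustrates how a
change in the set of registers live at function exit can reach live-out sets
through the usual fixed-point iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SUCCS 2
#define TOY_EXIT (-2) /* successor marker: edge to the function exit */

struct toy_block
{
  uint32_t use;        /* registers read before any write in the block */
  uint32_t def;        /* registers written in the block */
  int succ[MAX_SUCCS]; /* successor indices; -1 = unused, TOY_EXIT = exit */
  uint32_t live_in, live_out;
};

/* Recompute live_in/live_out for all N blocks, given the set of
   registers live at function exit, by iterating to a fixed point:
     live_out = union of the successors' live_in (or EXIT_LIVE)
     live_in  = use | (live_out & ~def)  */
static void
toy_recompute_liveness (struct toy_block *bbs, size_t n, uint32_t exit_live)
{
  for (size_t i = 0; i < n; i++)
    bbs[i].live_in = bbs[i].live_out = 0;

  bool changed = true;
  while (changed)
    {
      changed = false;
      for (size_t i = n; i-- > 0;)
        {
          uint32_t out = 0;
          for (int s = 0; s < MAX_SUCCS; s++)
            {
              int t = bbs[i].succ[s];
              if (t == TOY_EXIT)
                out |= exit_live;
              else if (t >= 0)
                {
                  assert ((size_t) t < n);
                  out |= bbs[t].live_in;
                }
            }
          uint32_t in = bbs[i].use | (out & ~bbs[i].def);
          if (out != bbs[i].live_out || in != bbs[i].live_in)
            {
              bbs[i].live_out = out;
              bbs[i].live_in = in;
              changed = true;
            }
        }
    }
}
```

In this model, if every returning block writes the zeroed register itself, the
new exit liveness is killed inside that block and does not spread to its
predecessors. Whether the same holds in the real dataflow framework is what I
would like to confirm.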

> 
>>>>> But the dataflow information has to be correct between
>>>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>>>> any pass in that region could clobber the registers in the same way.
>>>> 
>>>> You mean, the data flow information will not be correct after 
>>>> pass_free_cfg? 
>>>> “pass_delay_slots” is after “pass_free_cfg”, and there might be new 
>>>> “return”s generated in “pass_delay_slots”. 
>>>> If we want to generate zeroing for the new “return”s that were generated in 
>>>> “pass_delay_slots”, can we do so correctly?
>>> 
>>> …the zeroing has to be done before pass_free_cfg, because the information
>>> isn't reliable after that point.  I think it would make sense to do it
>>> before pass_compute_alignments, because inserting the zeros will affect
>>> alignment.
>> 
>> Okay. 
>> 
>> Then there is another problem:  what about the new “return”s that are 
>> generated at pass_delay_slots?
>> 
>> Should we generate the zeroing for these new returns? Since the data flow 
>> information might not be correct at this pass,
>> it looks like there is no correct way to add the zeroing insns for these 
>> new “return”s. What should we do about this?
> 
> pass_delay_slots isn't a problem.  It doesn't change *what* happens
> on each return path, it just changes how the instructions to achieve
> it are arranged.
> 
> So e.g. if every path through the function clears register R before
> pass_delay_slots, and if that clearing is represented as being necessary,
> then every path through the function will clear register R after the pass
> as well.

Okay, I might now understand what you mean here.

My understanding is:

In our new pass, pass_zero_call_used_regs, which is placed at the beginning 
of pass_late_compilation:

      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
++++  NEXT_PASS (pass_zero_call_used_regs);
          NEXT_PASS (pass_compute_alignments);
          NEXT_PASS (pass_variable_tracking);
          NEXT_PASS (pass_free_cfg);
          NEXT_PASS (pass_machine_reorg);
          NEXT_PASS (pass_cleanup_barriers);
          NEXT_PASS (pass_delay_slots);

When we scan the EXIT BLOCK of the routine, all the return insns are already 
there.
The later passes, including “pass_delay_slots”, will not generate any 
additional returns; they might just call “target.gen_return” or 
“target.gen_simple_return()” to replace 
“ret_rtx” or “simple_ret_rtx”. Is that right?
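
As a standalone illustration of what I expect the new pass to do at that
point, here is a toy model (not GCC code; struct toy_insn and
toy_insert_zeroing are hypothetical names). It walks a flat insn list in which
all returns already exist and puts the zeroing insns directly before each of
them.

```c
#include <assert.h>
#include <stddef.h>

enum toy_insn_kind { TOY_OTHER, TOY_ZERO, TOY_RETURN };

struct toy_insn
{
  enum toy_insn_kind kind;
  unsigned regno; /* only meaningful for TOY_ZERO */
};

/* Copy IN[0..N) to OUT, inserting one TOY_ZERO insn per entry of
   REGS[0..NREGS) immediately before every TOY_RETURN.  OUT must have
   room for CAP insns.  Returns the number of insns written.  */
static size_t
toy_insert_zeroing (const struct toy_insn *in, size_t n,
                    const unsigned *regs, size_t nregs,
                    struct toy_insn *out, size_t cap)
{
  size_t k = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (in[i].kind == TOY_RETURN)
        for (size_t r = 0; r < nregs; r++)
          {
            assert (k < cap);
            out[k++] = (struct toy_insn) { TOY_ZERO, regs[r] };
          }
      assert (k < cap);
      out[k++] = in[i];
    }
  return k;
}
```

After this step every return in the model is preceded by the same zeroing
sequence, so a later pass that only rearranges insns on each return path would
still clear the same registers on every path.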


> 
>>> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
>>> to see whether there's a target-specific reason why a register can't
>>> be clobbered.
>> 
>> /* Return true if is OK to use a hard register REGNO as scratch register
>>   in peephole2.  */
>> DEFHOOK
>> (hard_regno_scratch_ok,
>> 
>> 
>> Is this checking only valid for pass_peephole2?
> 
> No, that comment looks out of date.  The hook is already used in
> postreload, for example.

Okay, I see.

thanks.

Qing
> 
> Thanks,
> Richard