> On Sep 14, 2020, at 2:20 PM, Richard Sandiford <richard.sandif...@arm.com>
> wrote:
>
> Qing Zhao <qing.z...@oracle.com <mailto:qing.z...@oracle.com>> writes:
>>> On Sep 14, 2020, at 11:33 AM, Richard Sandiford <richard.sandif...@arm.com>
>>> wrote:
>>>
>>> Qing Zhao <qing.z...@oracle.com> writes:
>>>>> Like I mentioned earlier though, passes that run after
>>>>> pass_thread_prologue_and_epilogue can use call-clobbered registers that
>>>>> weren't previously used. For example, on x86_64, the function might
>>>>> not use %r8 when the prologue, epilogue and returns are generated,
>>>>> but pass_regrename might later introduce a new use of %r8. AIUI,
>>>>> the “used” version of the new command-line option is supposed to clear
>>>>> %r8 in these circumstances, but it wouldn't do so if the data was
>>>>> collected at the point that the return is generated.
>>>>
>>>> Thanks for the information.
>>>>
>>>>>
>>>>> That's why I think it's more robust to do this later (at the beginning
>>>>> of pass_late_compilation) and insert the zeroing before returns that
>>>>> already exist.
>>>>
>>>> Yes, looks like it’s not correct to insert the zeroing at the time when
>>>> prologue, epilogue and return are generated.
>>>> As I also checked, “return” might be also generated as late as pass
>>>> “pass_delay_slots”, So, shall we move the
>>>> New pass as late as possible?
>>>
>>> If we insert the zeroing before pass_delay_slots and describe the
>>> result correctly, pass_delay_slots should do the right thing.
>>>
>>> Describing the result correctly includes ensuring that the cleared
>>> registers are treated as live on exit from the function, so that the
>>> zeroing doesn't get deleted again, or skipped by pass_delay_slots.
>>
>> In the current implementation for x86, when we generating a zeroing insn as
>> the following:
>>
>> (insn 18 16 19 2 (set (reg:SI 1 dx)
>> (const_int 0 [0])) "t10.c":11:1 -1
>> (nil))
>> (insn 19 18 20 2 (unspec_volatile [
>> (reg:SI 1 dx)
>> ] UNSPECV_PRO_EPILOGUE_USE) "t10.c":11:1 -1
>> (nil))
>>
>> i.e, after each zeroing insn, the register that is zeroed is marked as
>> “UNSPECV_PRO_EPILOGUE_USE”,
>> By doing this, we can avoid this zeroing insn from being deleted or skipped.
>>
>> Is doing this enough to describe the result correctly?
>> Is there other thing we need to do in addition to this?
>
> I guess that works, but I think it would be better to abstract
> EPILOGUE_USES into a new target-independent wrapper function that
> (a) returns true if EPILOGUE_USES itself returns true and (b) returns
> true for registers that need to be zero on return, if the zeroing
> instructions have already been inserted. The places that currently
> test EPILOGUE_USES should then test this new wrapper function instead.
Okay, I see.
Looks like that EPILOGUE_USES is used in df-scan.c to compute the data flow
information. If EPILOUGE_USES return true
for registers that need to be zeroed on return, those registers will be
included in the data flow information, as a result, later
passes will not be able to delete them.
This sounds to be a cleaner approach than the current one that marks the
registers “UNSPECV_PRO_EPILOGUE_USE”.
A more detailed implementation question on this:
Where should I put this new target-independent wrapper function in? Which
header file will be a proper place to hold this new function?
>
> After inserting the zeroing instructions, the pass should recompute the
> live-out sets based on this.
Is only computing the live-out sets of the block that including the return insn
enough? Or we should re-compute the whole procedure?
Which utility routine I should use to recompute the live-out sets?
>
>>>>> But the dataflow information has to be correct between
>>>>> pass_thread_prologue_and_epilogue and pass_free_cfg, otherwise
>>>>> any pass in that region could clobber the registers in the same way.
>>>>
>>>> You mean, the data flow information will be not correct after
>>>> pass_free_cfg?
>>>> “pass_delay_slots” is after “pass_free_cfg”, and there might be new
>>>> “return” generated in “pass_delay_slots”,
>>>> If we want to generate zeroing for the new “return” which was generated in
>>>> “pass_delay_slots”, can we correctly to do so?
>>>
>>> …the zeroing has to be done before pass_free_cfg, because the information
>>> isn't reliable after that point. I think it would make sense to do it
>>> before pass_compute_alignments, because inserting the zeros will affect
>>> alignment.
>>
>> Okay.
>>
>> Then there is another problem: what about the new “return”s that are
>> generated at pass_delay_slots?
>>
>> Should we generate the zeroing for these new returns? Since the data flow
>> information might not be correct at this pass,
>> It looks like that there is no correct way to add the zeroing insn for these
>> new “return”, then, what should we do about this?
>
> pass_delay_slots isn't a problem. It doesn't change *what* happens
> on each return path, it just changes how the instructions to achieve
> it are arranged.
>
> So e.g. if every path through the function clears register R before
> pass_delay_slots, and if that clearing is represented as being necessary,
> then every path through the function will clear register R after the pass
> as well.
Okay, I might now understand what you mean here.
My understanding is:
In our new pass that is put in the beginning of the pass_late_compilation, I,e
pass_zero_call_used_regs;
PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
++++ NEXT_PASS (pass_zero_call_used_regs);
NEXT_PASS (pass_compute_alignments);
NEXT_PASS (pass_variable_tracking);
NEXT_PASS (pass_free_cfg);
NEXT_PASS (pass_machine_reorg);
NEXT_PASS (pass_cleanup_barriers);
NEXT_PASS (pass_delay_slots);
When we scan the EXIT BLOCK of the routine, all the return insns have already
been there.
The later passes including “pass_delay_slots” will not generate additional
returns anymore, they might just call “target.gen_return” or
“target.gen_simple_return() to replace
“ret_rtx” or “simple_ret_rtx” ?
>
>>> For extra safety, you could/should also check targetm.hard_regno_scratch_ok
>>> to see whether there's a target-specific reason why a register can't
>>> be clobbered.
>>
>> /* Return true if is OK to use a hard register REGNO as scratch register
>> in peephole2. */
>> DEFHOOK
>> (hard_regno_scratch_ok,
>>
>>
>> Is this checking only valid for pass_peephole2?
>
> No, that comment looks out of date. The hook is already used in
> postreload, for example.
Okay, I see.
thanks.
Qing
>
> Thanks,
> Richard