On Sun, Jun 25, 2023, at 8:49 AM, Jeff Law wrote:
> On 6/24/23 19:40, Stefan O'Rear wrote:
>> On Sat, Jun 24, 2023, at 11:01 AM, Jeff Law via Gcc-patches wrote:
>>> On 6/21/23 02:14, Wang, Yanzhang wrote:
>>>> Hi Jeff, sorry for the late reply.
>>>>
>>>>> The long branch handling is done at the assembler level.  So the clobbering
>>>>> of $ra isn't visible to the compiler.  Thus the compiler has to be
>>>>> extremely careful to not hold values in $ra because the assembler may
>>>>> clobber $ra.
>>>>
>>>> If the assembler modifies $ra, it seems the rules we defined in riscv.cc
>>>> will be ignored. For example, the $ra saving generated by this patch may
>>>> be modified by the assembler, and everything that depends on it will be
>>>> wrong. So implementing the long jump in the compiler is better.
>>> Basically correct.  The assembler potentially clobbers $ra.  That's why
>>> in the long jump patches $ra becomes a fixed register -- the compiler
>>> doesn't know when it's clobbered by the assembler.
>>>
>>> Even if this were done in the compiler, we'd still have to do something
>>> special with $ra.  The point at which decisions about register
>>> allocation and such are made is before the point where we know the final
>>> positions of jumps/labels.  It's a classic problem in GCC's design.
>> 
>> Do you have a reference for more information on the long jump patches?
> I can extract the patch Andrew wrote if that would be helpful.
>
>> 
>> I'm particularly curious about why $ra was selected as the temporary instead
>> of $t1 like the tail pseudoinstruction uses.
> $ra would be less disruptive from a code generation standpoint. 
> Essentially whatever register is selected has to become a fixed 
> register, meaning it's unavailable to the allocator.   Thus $t1 would be 
> a horrible choice.  Ultimately this is defined by the assembler.

To clarify: are you proposing to make ra (or t1 in the hypothetical) a fixed
register for all functions, or only those heuristically identified as
potentially larger than 1MiB?  And would this extend to forcing the creation of
stack frames for all functions, including very small functions?  I am concerned
this would result in a substantial performance regression.
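
For context on the 1MiB figure: it is the reach of a single jal, whose
immediate encodes a signed, even byte offset in the range [-2^20, 2^20 - 2].
A rough sketch of that range check follows. The helper name fits_in_jal is
mine, purely for illustration, and is not taken from GCC or binutils.

```python
# Reach of a single RISC-V `jal`: a signed, even byte offset of +/-1 MiB.
JAL_MIN = -(1 << 20)       # -1048576
JAL_MAX = (1 << 20) - 2    #  1048574

def fits_in_jal(offset: int) -> bool:
    """Return True if a PC-relative byte offset is encodable in one jal.

    Hypothetical helper for illustration only.
    """
    return offset % 2 == 0 and JAL_MIN <= offset <= JAL_MAX
```

A jump whose target falls outside this range needs a multi-instruction
sequence with a scratch register, which is where the choice between ra and t1
comes in.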

Without seeing the patch I can't know if I'm missing something obvious, but I
would say t1 has three advantages:

1. Consistency with tail, possibly simpler implementation.

2. Very few functions use all seven t-registers.  qemu linux-user in 2016 had an
off-by-one bug that corrupted t6 in sigreturn and it took months for anyone to
notice.  By contrast, ra has live data in every non-_Noreturn function.

3. Any jalr instruction which has rs1=ra has a hint effect on the return address
stack (call, return, or coroutine swap); a jalr which is intended to be treated
as a plain jump must have rs1!=ra, rs1!=t0.
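
To make point 3 concrete, here is a small sketch of the return-address-stack
hint rules for jalr as I understand them from the unprivileged ISA spec, where
the link registers are x1 (ra) and x5 (t0).  The function name ras_hint and
the returned strings are my own, chosen only for illustration.

```python
# Link registers per the RISC-V calling convention: x1 = ra, x5 = t0.
LINK_REGS = {1, 5}

def ras_hint(rd: int, rs1: int) -> str:
    """Return the return-address-stack action hinted by `jalr rd, rs1`.

    Hypothetical helper; register arguments are x-register numbers.
    """
    rd_link = rd in LINK_REGS
    rs1_link = rs1 in LINK_REGS
    if not rd_link and not rs1_link:
        return "none"            # plain indirect jump
    if not rd_link:
        return "pop"             # return
    if not rs1_link:
        return "push"            # call
    # Both are link registers: same register means push, else coroutine swap.
    return "push" if rd == rs1 else "pop, then push"
```

With rd=x0, a jump through t1 (x6) gives "none", while the same jump through
ra (x1) gives "pop", so the hardware would treat it as a return.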

-s

> jeff
