On 6/25/23 12:45, Stefan O'Rear wrote:


To clarify: are you proposing to make ra (or t1 in the hypothetical) a fixed
register for all functions, or only those heuristically identified as
potentially larger than 1MiB?  And would this extend to forcing the creation
of stack frames for all functions, including very small functions?  I am
concerned this would result in a substantial performance regression.

For the case Yanzhang is discussing (firmware and such), yes. And that's simply the cost they're going to have to pay for wanting consistent backtraces without utilizing dwarf unwind info, sframe or orc.

Normal builds won't be using those options and thus won't suffer from those performance penalties.


Without seeing the patch I can't know if I'm missing something obvious but I
would say t1 has three advantages:

1. Consistency with tail, possibly simpler implementation.

And as I've already stated, this sequence is defined by the assembler. While I do want to revisit a compiler-only solution, it's way down on my list of things to improve if I do a cost/benefit analysis. If someone wants to take a stab at it, I'm all for it. But it's not a simple problem due to the phase ordering issues.


2. Very few functions use all seven t-registers.  qemu linux-user in 2016 had an
off-by-one bug that corrupted t6 in sigreturn and it took months for anyone to
notice.  By contrast, ra has live data in every non-_Noreturn function.

That's a terrible way to evaluate the impact. The right way is to use real benchmarks. Not synthetic benchmarks. Not indirect observations that require triggering a bug in a sigreturn path. Build and run a real benchmark.




3. Any jalr instruction which has rs1=ra has a hint effect on the return address
stack (call, return, or coroutine swap); a jalr which is intended to be treated
as a plain jump must have rs1!=ra, rs1!=t0.

I'm well aware of these concerns. We support disambiguating various jump forms to facilitate different branch predictors.
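
For reference, the return-address-stack hints raised in point 3 can be sketched as a small classifier. This is only an illustration of the jalr hint table in the RISC-V unprivileged ISA specification, not code from GCC or binutils; the names ras_action, is_link_reg and jalr_ras_hint are hypothetical and were made up for this example.

```c
#include <stdbool.h>

/* Return-address-stack (RAS) actions a predictor may take for a jalr. */
enum ras_action { RAS_NONE, RAS_POP, RAS_PUSH, RAS_POP_THEN_PUSH };

/* Only x1 (ra) and x5 (t0) count as link registers for hint purposes. */
static bool is_link_reg(unsigned reg)
{
    return reg == 1 || reg == 5;
}

/* Classify "jalr rd, 0(rs1)" by register number (0..31). */
static enum ras_action jalr_ras_hint(unsigned rd, unsigned rs1)
{
    bool rd_link = is_link_reg(rd);
    bool rs1_link = is_link_reg(rs1);

    if (!rd_link)
        return rs1_link ? RAS_POP : RAS_NONE;   /* return vs. plain jump */
    if (!rs1_link || rd == rs1)
        return RAS_PUSH;                        /* ordinary call */
    return RAS_POP_THEN_PUSH;                   /* coroutine swap */
}
```

Under this reading, a long jump through t1 (x6) with rd=x0 is classified as a plain jump, while the same jump through ra (x1) would be treated as a return.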

jeff
