On 6/25/23 12:45, Stefan O'Rear wrote:
To clarify: are you proposing to make ra (or t1 in the hypothetical) a fixed
register for all functions, or only those heuristically identified as
potentially larger than 1MiB? And would this extend to forcing the creation of
stack frames for all functions, including very small functions? I am concerned
this would result in a substantial performance regression.

For the case Yanzhang is discussing (firmware and such), yes. And
that's simply the cost they're going to have to pay for wanting
consistent backtraces without utilizing dwarf unwind info, sframe or orc.
Normal builds won't be using those options and thus won't suffer from
those performance penalties.
Without seeing the patch I can't know if I'm missing something obvious but I
would say t1 has three advantages:
1. Consistency with tail, possibly simpler implementation.
And as I've already stated, this sequence is defined by the assembler.
While I do want to revisit a compiler only solution, it's way down on my
list of things to improve if I do a cost/benefit analysis. If someone
wants to take a stab at it, I'm all for it. But it's not a simple
problem due to the phase ordering issues.
2. Very few functions use all seven t-registers. qemu linux-user in 2016 had an
off-by-one bug that corrupted t6 in sigreturn and it took months for anyone to
notice. By contrast, ra has live data in every non-_Noreturn function.
That's a terrible way to evaluate the impact. The right way is to use
real benchmarks. Not synthetic benchmarks. Not indirect observations
that require triggering a bug in a sigreturn path. Build and run a real
benchmark.
3. Any jalr instruction which has rs1=ra has a hint effect on the return address
stack (call, return, or coroutine swap); a jalr which is intended to be treated
as a plain jump must have rs1!=ra, rs1!=t0.
I'm well aware of these concerns. We support disambiguating various
jump forms to facilitate different branch predictors.
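
For readers following along, the hint rule from point 3 can be sketched in C.
This is only an illustration, assuming the return-address-stack hint table in
the RISC-V unprivileged ISA, where x1 (ra) and x5 (t0) count as link
registers. The names ras_hint, is_link_reg and jalr_ras_hint are hypothetical
and do not come from GCC or from any patch in this thread.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not taken from GCC. */
enum ras_hint { RAS_NONE, RAS_POP, RAS_PUSH, RAS_POP_THEN_PUSH };

/* Assumption: x1 (ra) and x5 (t0) are the link registers for hint purposes. */
static bool is_link_reg(unsigned reg)
{
    return reg == 1 || reg == 5;
}

/* Classify the return-address-stack hint implied by "jalr rd, imm(rs1)". */
static enum ras_hint jalr_ras_hint(unsigned rd, unsigned rs1)
{
    bool rd_link = is_link_reg(rd);
    bool rs1_link = is_link_reg(rs1);

    if (!rd_link && !rs1_link)
        return RAS_NONE;            /* plain indirect jump */
    if (!rd_link)
        return RAS_POP;             /* return */
    if (!rs1_link)
        return RAS_PUSH;            /* call */
    /* Both are link registers: same register pushes, different ones swap. */
    return rd == rs1 ? RAS_PUSH : RAS_POP_THEN_PUSH;
}
```

Under these assumptions, a jump through t1 (x6) with rd=x0 carries no hint,
while the same jump through ra or t0 would be treated as a return.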
jeff