On 6/7/21 12:29 PM, Bruno Piazera Larsen wrote:
> I just tried sending mmu_idx all the way down, but I ran into a very weird bug
> in gcc. If we have to add one more parameter that GCC can't just optimize away
> we get a slowdown of at least 5x for the first test of check-acceptance (could
> be more, but the test times out after 900 seconds, so I'm not sure).
That's odd. We already have more arguments than the number of argument
registers... A 5x slowdown is distinctly odd.
> One way that I managed to get around that is saving the current MSR, setting
> it to 5, and restoring after the xlate call. The code ended up something like:
>
>     int new_idx = (5 << HFLAGS_IMMU_IDX) | (5 << HFLAGS_DMMU_IDX);
>     int clr = (7 << HFLAGS_IMMU_IDX) | (7 << HFLAGS_DMMU_IDX);
>     int old_idx = env->msr & clr;
>     clr = ~clr;
>     /* set new msr so we don't need to send the mmu_idx */
>     env->msr = (env->msr & clr) | new_idx;
>     ret = ppc_radix64_partition_scoped_xlate(...);
>     /* restore old mmu_idx */
>     env->msr = (env->msr & clr) | old_idx;
No, this is silly.
We need to do one of two things:
- make sure everything is inlined,
- reduce the number of arguments.
We're currently passing in 9 arguments, which really is too many already. We
should be using something akin to mmu_ctx_t, but probably specific to radix64
without the random stuff collected for random other mmu models.
r~