On 2015-09-01 22:51, Richard Henderson wrote:
> I've been looking at this problem off and on for the last week or so,
> prompted by the sparc performance work.  Although I haven't been able
> to get a proper sparc64 guest install working, I see the exact same
> problem with a mips guest.
> 
> On alpha or x86, which seem to perform well, perf numbers for the
> executable have about 30% of the execution time spent in cpu_exec.
> For mips, on the other hand, we spend about 30% of the time in
> routines related to tcg (re-)translation.

Indeed the problem happens on CPUs which implement the MMU as a
"software assisted TLB" (or any other marketing name), as opposed to
a hardware page-walk MMU. They can hold a limited number of TLB entries
at a given time, and require the OS to do the page walk to refill the
TLB. For that an exception is generated, and the faulting address has
to be determined. That's where the TB retranslation takes place, and
that's why it happens a lot more on these CPUs.
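
To make the refill path concrete, here is a toy model in C. It is only
a sketch and not QEMU code: the names (TLBEntry, tlb_refill,
os_page_walk, translate), the 4-entry TLB and the round-robin
replacement are all assumptions made up for the illustration. Every
miss in translate() stands for one refill exception, and in the
emulator each such exception is a point where the faulting guest
address has to be recovered.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_SIZE   4            /* deliberately tiny, to force misses */
#define PAGE_SHIFT 12
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

typedef struct {
    bool     valid;
    uint32_t vpn;               /* virtual page number */
    uint32_t pfn;               /* physical frame number */
} TLBEntry;

static TLBEntry tlb[TLB_SIZE];
static unsigned next_victim;    /* round-robin replacement index */
static unsigned refill_count;   /* number of refill exceptions taken */

/* Hypothetical stand-in for the page table walk done by the guest OS. */
static uint32_t os_page_walk(uint32_t vpn)
{
    return vpn + 0x100;
}

/* Hypothetical stand-in for the guest OS refill exception handler. */
static void tlb_refill(uint32_t vpn)
{
    tlb[next_victim] = (TLBEntry){
        .valid = true, .vpn = vpn, .pfn = os_page_walk(vpn)
    };
    next_victim = (next_victim + 1) % TLB_SIZE;
    refill_count++;
}

/* Translate a virtual address, taking a refill "exception" on a miss. */
static uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;

    for (;;) {
        for (unsigned i = 0; i < TLB_SIZE; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                return (tlb[i].pfn << PAGE_SHIFT) | (vaddr & PAGE_MASK);
            }
        }
        /* Miss: the OS refills the TLB, then the access is retried. */
        tlb_refill(vpn);
    }
}
```

With only four entries, touching a fifth page already evicts the
first one, so a working set slightly larger than the TLB takes a
refill exception on almost every new page access.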

A few years ago, I measured that about 45% of the TB translations were
actually retranslations for mips, and 60% for SH4, for a standard
workload. For comparison, these values are around 1% on i386 and around
5% on ARM.

That's why each time we add an optimization to the optimizer, we get
faster code, but we might lose because it takes longer to generate.

> Aurelien has a patch in his own branches that attempts to mitigate this
> on mips by shadow caching more tlb entries.  While this does improve
> performance a bit, it employs a linear search through a large buffer,
> with the effect of 30-ish % perf numbers for r4k_map_address.
> (One could probably improve things by hashing the data in that array,
> rather than a linear search, but...)

Yes, that is just a workaround and probably highly workload dependent,
which is why I never submitted it.
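
For reference, the hashing idea mentioned above could look roughly like
the sketch below. This is not the code from my branch: ShadowEntry,
shadow_hash, shadow_insert and shadow_lookup are hypothetical names,
and the model ignores variable page sizes, paired entries and the
global bit. It is a direct-mapped cache indexed by a hash of the
virtual page number and the ASID, so a lookup is one probe instead of a
walk through the whole buffer.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHADOW_BITS 8
#define SHADOW_SIZE (1u << SHADOW_BITS)

typedef struct {
    bool     valid;
    uint8_t  asid;              /* address space identifier */
    uint32_t vpn;               /* virtual page number */
    uint32_t pfn;               /* physical frame number */
} ShadowEntry;

static ShadowEntry shadow[SHADOW_SIZE];

/* Multiplicative hash, keeping the top SHADOW_BITS bits as the index. */
static unsigned shadow_hash(uint32_t vpn, uint8_t asid)
{
    uint32_t h = (vpn ^ asid) * 2654435761u;
    return h >> (32 - SHADOW_BITS);
}

/* Insert an evicted TLB entry; a colliding entry is simply overwritten. */
static void shadow_insert(uint32_t vpn, uint8_t asid, uint32_t pfn)
{
    shadow[shadow_hash(vpn, asid)] = (ShadowEntry){
        .valid = true, .asid = asid, .vpn = vpn, .pfn = pfn
    };
}

/* Single-probe lookup; the full tag compare rejects hash collisions. */
static bool shadow_lookup(uint32_t vpn, uint8_t asid, uint32_t *pfn)
{
    const ShadowEntry *e = &shadow[shadow_hash(vpn, asid)];

    if (e->valid && e->vpn == vpn && e->asid == asid) {
        *pfn = e->pfn;
        return true;
    }
    return false;
}
```

Since this is only a cache, losing an entry on a collision is harmless:
the access just falls back to the normal refill exception. The entries
would still have to be invalidated whenever the guest rewrites or
flushes the matching TLB entries.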

> In the past we've talked about getting rid of retranslation entirely.
> It's clever, but it certainly has its share of problems.  I gave it
> a go this weekend.

Really great that you have been able to implement that.

> The following isn't quite right.  It fails to boot on sparc even with
> our tiny test kernel.  It also triggers an abort on mips, eventually.
> But it's able to get all the way through to a prompt, and in the 
> process I can see that perf results are quite different -- much more
> like results I see for alpha.
> 
> Thoughts on the approach?

It looks like the approach we discussed with Paolo back in June:

http://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg04885.html

To me it looks like the right way to proceed; we just have to take care
that the information to store does not take too much space compared to
the actual translated code.
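
As an illustration of how small such information could be kept, here
is one possible encoding. It is an assumption of mine and not taken
from the patch: for each guest instruction, store the guest PC delta
and the host code size as variable-length integers (unsigned LEB128).
The function names and the layout are hypothetical, and the sketch
assumes guest PCs and host offsets never decrease within a TB.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Append val to p in unsigned LEB128 form: 7 bits per byte. */
static uint8_t *encode_uleb128(uint8_t *p, uint32_t val)
{
    do {
        uint8_t byte = val & 0x7f;
        val >>= 7;
        if (val != 0) {
            byte |= 0x80;       /* more bytes follow */
        }
        *p++ = byte;
    } while (val != 0);
    return p;
}

/* Read one unsigned LEB128 value; returns the position after it. */
static const uint8_t *decode_uleb128(const uint8_t *p, uint32_t *val)
{
    uint32_t result = 0;
    unsigned shift = 0;
    uint8_t byte;

    do {
        byte = *p++;
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    *val = result;
    return p;
}

/* Encode n (guest PC, end of host code) pairs as deltas; returns size. */
static size_t encode_map(uint8_t *buf, uint32_t base_pc,
                         const uint32_t *guest_pc,
                         const uint32_t *host_end, size_t n)
{
    uint8_t *p = buf;
    uint32_t prev_pc = base_pc, prev_off = 0;

    for (size_t i = 0; i < n; i++) {
        p = encode_uleb128(p, guest_pc[i] - prev_pc);
        p = encode_uleb128(p, host_end[i] - prev_off);
        prev_pc = guest_pc[i];
        prev_off = host_end[i];
    }
    return (size_t)(p - buf);
}

/* Find the guest PC whose host code contains offset host_off. */
static bool find_guest_pc(const uint8_t *buf, size_t n, uint32_t base_pc,
                          uint32_t host_off, uint32_t *pc_out)
{
    uint32_t pc = base_pc, off = 0, delta;

    for (size_t i = 0; i < n; i++) {
        buf = decode_uleb128(buf, &delta);
        pc += delta;
        buf = decode_uleb128(buf, &delta);
        off += delta;
        if (host_off < off) {
            *pc_out = pc;
            return true;
        }
    }
    return false;               /* offset is past the end of the TB */
}
```

In this sketch a typical instruction costs two bytes of side data,
against eight bytes for a raw pair of 32-bit values. The lookup is a
linear walk, but it would only run on the exception path.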

I'll take a look and test it asap.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
