Re: [Qemu-devel] [PATCH] tcg/softmmu: Increase size of TLB caches

2017-09-06 Thread Alex Bennée

Richard Henderson  writes:

> On 09/05/2017 05:33 PM, Pranith Kumar wrote:
>>> This significantly degrades performance of alpha-softmmu.
>>> It spends about 25% of all cpu time in memset.
>>
>> What workload does it degrade for? I will try to reproduce and see
>> which memset is causing this.
>
> emerge --update --ask @world
>
> which is a lot of python, iirc.
>
> It is the tlb flush memset that is causing this.
> The tlb has grown from a few kB to 300kB with your patch.

Hmm, I wonder if QEMU is doing global flushes where partial TLB flushes
would suffice? emerge is quite a memory hog, though, so it's conceivable
we are generating a lot of churn as it runs.

--
Alex Bennée



Re: [Qemu-devel] [PATCH] tcg/softmmu: Increase size of TLB caches

2017-09-06 Thread Richard Henderson
On 09/05/2017 05:33 PM, Pranith Kumar wrote:
>> This significantly degrades performance of alpha-softmmu.
>> It spends about 25% of all cpu time in memset.
> 
> What workload does it degrade for? I will try to reproduce and see
> which memset is causing this.

emerge --update --ask @world

which is a lot of python, iirc.

It is the tlb flush memset that is causing this.
The tlb has grown from a few kB to 300kB with your patch.


r~



Re: [Qemu-devel] [PATCH] tcg/softmmu: Increase size of TLB caches

2017-09-05 Thread Pranith Kumar
On Tue, Sep 5, 2017 at 5:50 PM, Richard Henderson  wrote:
> On 08/29/2017 10:23 AM, Pranith Kumar wrote:
>> This patch increases the number of entries cached in the TLB. I went
>> over a few architectures to see if increasing it is problematic. Only
>> armv6 seems to be limited to 8 bits for indexing these entries. For
>> other architectures, the TLB is increased to 4K entries (12 index
>> bits). The patch also doubles the
>> number of victim TLB entries.
>>
>> Some statistics collected from a build benchmark for various cache
>> sizes are listed below:
>>
>> | TLB bits \ vTLB entries | 8              | 16              | 32             |
>> |  8                      | 952.94 (+0.0%) | 929.99 (+2.4%)  | 919.02 (+3.6%) |
>> | 10                      | 898.92 (+5.6%) | 886.13 (+7.0%)  | 887.03 (+6.9%) |
>> | 12                      | 878.56 (+7.8%) | 873.53 (+8.3%)* | 875.34 (+8.1%) |
>>
>> The best combination for this workload came out to be 12 bits for the
>> TLB and a 16 entry vTLB cache.
>
> This significantly degrades performance of alpha-softmmu.
> It spends about 25% of all cpu time in memset.

What workload does it degrade for? I will try to reproduce and see
which memset is causing this.

-- 
Pranith



Re: [Qemu-devel] [PATCH] tcg/softmmu: Increase size of TLB caches

2017-09-05 Thread Richard Henderson
On 08/29/2017 10:23 AM, Pranith Kumar wrote:
> This patch increases the number of entries cached in the TLB. I went
> over a few architectures to see if increasing it is problematic. Only
> armv6 seems to be limited to 8 bits for indexing these entries. For
> other architectures, the TLB is increased to 4K entries (12 index
> bits). The patch also doubles the
> number of victim TLB entries.
> 
> Some statistics collected from a build benchmark for various cache
> sizes are listed below:
>
> | TLB bits \ vTLB entries | 8              | 16              | 32             |
> |  8                      | 952.94 (+0.0%) | 929.99 (+2.4%)  | 919.02 (+3.6%) |
> | 10                      | 898.92 (+5.6%) | 886.13 (+7.0%)  | 887.03 (+6.9%) |
> | 12                      | 878.56 (+7.8%) | 873.53 (+8.3%)* | 875.34 (+8.1%) |
> 
> The best combination for this workload came out to be 12 bits for the
> TLB and a 16 entry vTLB cache.

This significantly degrades performance of alpha-softmmu.
It spends about 25% of all cpu time in memset.


r~



Re: [Qemu-devel] [PATCH] tcg/softmmu: Increase size of TLB caches

2017-08-29 Thread Richard Henderson
On 08/29/2017 10:23 AM, Pranith Kumar wrote:
> This patch increases the number of entries cached in the TLB. I went
> over a few architectures to see if increasing it is problematic. Only
> armv6 seems to be limited to 8 bits for indexing these entries. For
> other architectures, the TLB is increased to 4K entries (12 index
> bits). The patch also doubles the
> number of victim TLB entries.
> 
> Some statistics collected from a build benchmark for various cache
> sizes are listed below:
>
> | TLB bits \ vTLB entries | 8              | 16              | 32             |
> |  8                      | 952.94 (+0.0%) | 929.99 (+2.4%)  | 919.02 (+3.6%) |
> | 10                      | 898.92 (+5.6%) | 886.13 (+7.0%)  | 887.03 (+6.9%) |
> | 12                      | 878.56 (+7.8%) | 873.53 (+8.3%)* | 875.34 (+8.1%) |
> 
> The best combination for this workload came out to be 12 bits for the
> TLB and a 16 entry vTLB cache.
> 
> Signed-off-by: Pranith Kumar 
> ---
>  include/exec/cpu-defs.h  | 6 +++---
>  tcg/aarch64/tcg-target.h | 1 +
>  tcg/arm/tcg-target.h     | 1 +
>  tcg/i386/tcg-target.h    | 2 ++
>  tcg/ia64/tcg-target.h    | 1 +
>  tcg/mips/tcg-target.h    | 2 ++
>  tcg/ppc/tcg-target.h     | 1 +
>  tcg/s390/tcg-target.h    | 1 +
>  tcg/sparc/tcg-target.h   | 1 +
>  tcg/tci/tcg-target.h     | 1 +
>  10 files changed, 14 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson 


r~