Re: [Qemu-devel] [PATCH] tcg/softmmu: Increase size of TLB caches
Richard Henderson writes:

> On 09/05/2017 05:33 PM, Pranith Kumar wrote:
>>> This significantly degrades performance of alpha-softmmu.
>>> It spends about 25% of all cpu time in memset.
>>
>> What workload does it degrade for? I will try to reproduce and see
>> which memset is causing this.
>
> emerge --update --ask @world
>
> which is a lot of python, iirc.
>
> It is the tlb flush memset that is causing this.
> The tlb has grown from a few kB to 300kB with your patch.

Hmm, I wonder if this is QEMU doing global flushes which could actually
be partial TLB flushes? emerge is quite a memory hog though, so it's
conceivable we are generating a lot of churn as it runs.

--
Alex Bennée
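To make the global-versus-partial distinction above concrete, here is a toy sketch in C. It is not QEMU code: the `ToyTLB` type, the `toy_tlb_flush_*` functions, the entry layout, and the choice of three MMU modes are all hypothetical and exist only to show how much less memory a per-mode flush touches than a flush of everything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: sizes and names are hypothetical, not QEMU's. */
#define TOY_TLB_BITS   12
#define TOY_TLB_SIZE   (1u << TOY_TLB_BITS)
#define TOY_MMU_MODES  3

typedef struct {
    unsigned long tag;      /* guest page this entry maps; all-ones = invalid */
    unsigned long addend;   /* host offset for the mapped page */
} ToyTLBEntry;

typedef struct {
    ToyTLBEntry table[TOY_MMU_MODES][TOY_TLB_SIZE];
} ToyTLB;

/* Global flush: invalidate every entry of every mode. Returns bytes written. */
size_t toy_tlb_flush_all(ToyTLB *tlb)
{
    memset(tlb->table, 0xff, sizeof(tlb->table));
    return sizeof(tlb->table);
}

/* Partial flush: invalidate only the modes selected in mode_mask. */
size_t toy_tlb_flush_modes(ToyTLB *tlb, unsigned mode_mask)
{
    size_t written = 0;

    for (unsigned m = 0; m < TOY_MMU_MODES; m++) {
        if (mode_mask & (1u << m)) {
            memset(tlb->table[m], 0xff, sizeof(tlb->table[m]));
            written += sizeof(tlb->table[m]);
        }
    }
    return written;
}
```

In this toy model, flushing a single mode writes one third of what a global flush writes, and entries in the other modes survive. Whether the alpha workload could actually use narrower flushes is exactly the open question in the message above.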
Re: [Qemu-devel] [PATCH] tcg/softmmu: Increase size of TLB caches
On 09/05/2017 05:33 PM, Pranith Kumar wrote:
>> This significantly degrades performance of alpha-softmmu.
>> It spends about 25% of all cpu time in memset.
>
> What workload does it degrade for? I will try to reproduce and see
> which memset is causing this.

emerge --update --ask @world

which is a lot of python, iirc.

It is the tlb flush memset that is causing this.
The tlb has grown from a few kB to 300kB with your patch.

r~
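The growth described above follows from simple arithmetic, which a rough C sketch can spell out. The function name `tlb_flush_bytes` and all of its parameters are illustrative assumptions. In particular, the 32-byte entry size used in the note below is assumed rather than read from the tree, so the sketch does not try to reproduce the exact 300kB figure.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes cleared by one full flush of a softmmu-style TLB.
 * All parameters are illustrative assumptions, not values read from QEMU:
 *   tlb_bits     - log2 of the number of main TLB entries per MMU mode
 *   vtlb_entries - victim TLB entries per MMU mode
 *   entry_bytes  - size of one TLB entry in bytes
 *   mmu_modes    - number of MMU modes, each with its own table
 */
size_t tlb_flush_bytes(unsigned tlb_bits, size_t vtlb_entries,
                       size_t entry_bytes, size_t mmu_modes)
{
    assert(tlb_bits < 32);  /* keep the shift well defined */

    size_t main_entries = (size_t)1 << tlb_bits;

    return mmu_modes * (main_entries + vtlb_entries) * entry_bytes;
}
```

With an assumed 32-byte entry, `tlb_flush_bytes(8, 8, 32, 1)` is 8448 bytes and `tlb_flush_bytes(12, 16, 32, 1)` is 131584 bytes per mode. Whatever the real entry size is, going from 8 to 12 index bits multiplies the main table by 16, so every full flush has about 16 times as much memory to clear.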
Re: [Qemu-devel] [PATCH] tcg/softmmu: Increase size of TLB caches
On Tue, Sep 5, 2017 at 5:50 PM, Richard Henderson wrote:
> On 08/29/2017 10:23 AM, Pranith Kumar wrote:
>> This patch increases the number of entries cached in the TLB. I went
>> over a few architectures to see if increasing it is problematic. Only
>> armv6 seems to have a limitation that only 8 bits can be used for
>> indexing these entries. For other architectures, the number of TLB
>> entries is increased to a 4K-sized cache. The patch also doubles the
>> number of victim TLB entries.
>>
>> Some statistics collected from a build benchmark for various cache
>> sizes are listed below:
>>
>> | TLB bits\vTLB entries | 8             | 16             | 32            |
>> | 8                     | 952.94(+0.0%) | 929.99(+2.4%)  | 919.02(+3.6%) |
>> | 10                    | 898.92(+5.6%) | 886.13(+7.0%)  | 887.03(+6.9%) |
>> | 12                    | 878.56(+7.8%) | 873.53(+8.3%)* | 875.34(+8.1%) |
>>
>> The best combination for this workload came out to be 12 bits for the
>> TLB and a 16 entry vTLB cache.
>
> This significantly degrades performance of alpha-softmmu.
> It spends about 25% of all cpu time in memset.

What workload does it degrade for? I will try to reproduce and see
which memset is causing this.

--
Pranith
Re: [Qemu-devel] [PATCH] tcg/softmmu: Increase size of TLB caches
On 08/29/2017 10:23 AM, Pranith Kumar wrote:
> This patch increases the number of entries cached in the TLB. I went
> over a few architectures to see if increasing it is problematic. Only
> armv6 seems to have a limitation that only 8 bits can be used for
> indexing these entries. For other architectures, the number of TLB
> entries is increased to a 4K-sized cache. The patch also doubles the
> number of victim TLB entries.
>
> Some statistics collected from a build benchmark for various cache
> sizes are listed below:
>
> | TLB bits\vTLB entries | 8             | 16             | 32            |
> | 8                     | 952.94(+0.0%) | 929.99(+2.4%)  | 919.02(+3.6%) |
> | 10                    | 898.92(+5.6%) | 886.13(+7.0%)  | 887.03(+6.9%) |
> | 12                    | 878.56(+7.8%) | 873.53(+8.3%)* | 875.34(+8.1%) |
>
> The best combination for this workload came out to be 12 bits for the
> TLB and a 16 entry vTLB cache.

This significantly degrades performance of alpha-softmmu.
It spends about 25% of all cpu time in memset.

r~
Re: [Qemu-devel] [PATCH] tcg/softmmu: Increase size of TLB caches
On 08/29/2017 10:23 AM, Pranith Kumar wrote:
> This patch increases the number of entries cached in the TLB. I went
> over a few architectures to see if increasing it is problematic. Only
> armv6 seems to have a limitation that only 8 bits can be used for
> indexing these entries. For other architectures, the number of TLB
> entries is increased to a 4K-sized cache. The patch also doubles the
> number of victim TLB entries.
>
> Some statistics collected from a build benchmark for various cache
> sizes are listed below:
>
> | TLB bits\vTLB entries | 8             | 16             | 32            |
> | 8                     | 952.94(+0.0%) | 929.99(+2.4%)  | 919.02(+3.6%) |
> | 10                    | 898.92(+5.6%) | 886.13(+7.0%)  | 887.03(+6.9%) |
> | 12                    | 878.56(+7.8%) | 873.53(+8.3%)* | 875.34(+8.1%) |
>
> The best combination for this workload came out to be 12 bits for the
> TLB and a 16 entry vTLB cache.
>
> Signed-off-by: Pranith Kumar
> ---
>  include/exec/cpu-defs.h  | 6 +++---
>  tcg/aarch64/tcg-target.h | 1 +
>  tcg/arm/tcg-target.h     | 1 +
>  tcg/i386/tcg-target.h    | 2 ++
>  tcg/ia64/tcg-target.h    | 1 +
>  tcg/mips/tcg-target.h    | 2 ++
>  tcg/ppc/tcg-target.h     | 1 +
>  tcg/s390/tcg-target.h    | 1 +
>  tcg/sparc/tcg-target.h   | 1 +
>  tcg/tci/tcg-target.h     | 1 +
>  10 files changed, 14 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson

r~
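For readers wondering why the number of index bits matters in the patch description above, the following is a minimal sketch, assuming a direct-mapped table indexed by the low bits of the virtual page number. The helper `tlb_index` and the 4 KiB page size (`SKETCH_PAGE_BITS`) are assumptions for illustration; real targets use different page sizes and the actual macros live in the QEMU tree.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB guest pages; real targets differ. */
#define SKETCH_PAGE_BITS 12

/* Direct-mapped lookup index: low tlb_bits bits of the virtual page number. */
unsigned tlb_index(uint64_t vaddr, unsigned tlb_bits)
{
    assert(tlb_bits > 0 && tlb_bits < 32);

    uint64_t mask = ((uint64_t)1 << tlb_bits) - 1;

    return (unsigned)((vaddr >> SKETCH_PAGE_BITS) & mask);
}
```

Under these assumptions, the addresses 0x1000 and 0x101000 both map to index 1 when only 8 bits are used, so they evict each other. With 12 bits they land in slots 0x1 and 0x101 and can coexist. That fits the direction of the benchmark table, though the table itself remains the only measured data here.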