> -----Original Message-----
> From: longpeng
> Sent: Friday, September 02, 2016 10:23 AM
> To: ehabk...@redhat.com; r...@twiddle.net; pbonz...@redhat.com;
> m...@redhat.com
> Cc: Zhaoshenglong; Gonglei (Arei); Huangpeng (Peter); Herongguang (Stephen);
> qemu-devel@nongnu.org; Longpeng(Mike)
> Subject: [PATCH v3] target-i386: present virtual L3 cache info for vcpus
>
> From: "Longpeng(Mike)" <longp...@huawei2.com>
>
There is a typo in the email address; please resend v3.

> Some software algorithms are based on the hardware's cache info. For
> example, in the x86 Linux kernel, when cpu1 wants to wake up a task on
> cpu2, cpu1 triggers a resched IPI and tells cpu2 to do the wakeup if they
> don't share a last-level cache (LLC). Conversely, cpu1 accesses cpu2's
> runqueue directly if they share the LLC.
> The relevant Linux kernel code is as below:
>
> static void ttwu_queue(struct task_struct *p, int cpu)
> {
>         struct rq *rq = cpu_rq(cpu);
>         ......
>         if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
>                 ......
>                 ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
>                 return;
>         }
>         ......
>         ttwu_do_activate(rq, p, 0); /* access target's rq directly */
>         ......
> }
>
> On real hardware, the cpus on the same socket share the L3 cache, so one
> won't trigger a resched IPI when waking up a task on another. But QEMU
> doesn't present virtual L3 cache info to the VM, so the Linux guest
> triggers lots of RES IPIs under some workloads even if the virtual cpus
> belong to the same virtual socket.
>
> For KVM, this degrades performance, because there will be lots of vmexits
> due to the guest sending IPIs.
>
> The workload is a SAP HANA testsuite. We ran it for one round (about 40
> minutes) and observed the number of RES IPIs triggered in the
> (Suse11sp3) guest during that period:
>
>          No-L3      With-L3 (this patch applied)
> cpu0:    363890     44582
> cpu1:    373405     43109
> cpu2:    340783     43797
> cpu3:    333854     43409
> cpu4:    327170     40038
> cpu5:    325491     39922
> cpu6:    319129     42391
> cpu7:    306480     41035
> cpu8:    161139     32188
> cpu9:    164649     31024
> cpu10:   149823     30398
> cpu11:   149823     32455
> cpu12:   164830     35143
> cpu13:   172269     35805
> cpu14:   179979     33898
> cpu15:   194505     32754
> avg:     268963.6   40129.8
>
> The VM's topology is "1*socket 8*cores 2*threads".
> After presenting virtual L3 cache info to the VM, the number of RES IPIs
> in the guest is reduced by 85%.
>
> What's more, for KVM, vcpus sending IPIs will cause vmexits, which are
> expensive.
> We have also tested the overall system performance when the vcpus
> actually run on separate physical sockets. With the L3 cache, performance
> improves by 7.2%~33.1% (avg: 15.7%).
>
> Signed-off-by: Longpeng(Mike) <longp...@huawei2.com>
>

Here as well.

Regards,
-Gonglei