Hi Marcelo:
mmu_recycled is always 0 for this guest -- even after almost 4 hours of uptime.
Here is a kvm_stat sample from a period where guest time was very high and qemu had
two processors at 100% on the host. I removed counters where both columns have a 0
value, for brevity.
exits 45937979 758051
fpu_reload 1416831 87
halt_exits 112911 0
halt_wakeup 31771 0
host_state_reload 2068602 263
insn_emulation 21601480 365493
io_exits 1827374 2705
irq_exits 8934818 285196
mmio_exits 421674 147
mmu_cache_miss 4817689 93680
mmu_flooded 4815273 93680
mmu_pde_zapped 51344 0
mmu_prefetch 4817625 93680
mmu_pte_updated 14803298 270104
mmu_pte_write 19859863 363785
mmu_shadow_zapped 4832106 93679
pf_fixed 32184355 468398
pf_guest 264138 0
remote_tlb_flush 10697762 280522
tlb_flush 10301338 176424
(NOTE: this is a *5*-second sample interval instead of 1 second, to allow me to
capture the data.)
Here's a sample from when the guest is relatively "well-behaved" (system time under 10%):
exits 51502194 97453
fpu_reload 1421736 227
halt_exits 138361 1927
halt_wakeup 33047 117
host_state_reload 2110190 3740
insn_emulation 24367441 47260
io_exits 1874075 2576
irq_exits 10224702 13333
mmio_exits 435154 1726
mmu_cache_miss 5414097 11258
mmu_flooded 5411548 11243
mmu_pde_zapped 52851 44
mmu_prefetch 5414031 11258
mmu_pte_updated 16854686 29901
mmu_pte_write 22526765 42285
mmu_shadow_zapped 5430025 11313
pf_fixed 36144578 67666
pf_guest 282794 430
remote_tlb_flush 12126268 14619
tlb_flush 11753162 21460
There is definitely a strong correlation between the mmu counters and high
system times in the guest. I am still trying to find out what in the guest is
stimulating it when running on RHEL3; I do not see this same behavior for an
equivalent setup running on RHEL4.
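In case it is useful for anyone reproducing this, the sampling itself is trivial. Below
is a minimal sketch of a kvm_stat-style sampler that prints the cumulative value and
per-interval delta for a handful of the counters above. It assumes debugfs is mounted
at /sys/kernel/debug and that the counter file names under /sys/kernel/debug/kvm match
the names shown in the output above.

#!/usr/bin/env python
# Minimal kvm_stat-style sampler: print cumulative value and per-interval
# delta for a few kvm debugfs counters. Assumes debugfs is mounted at
# /sys/kernel/debug; counter file names are taken from the kvm_stat output.
import os
import time

KVM_DEBUGFS = "/sys/kernel/debug/kvm"
COUNTERS = ["exits", "mmu_cache_miss", "mmu_flooded", "mmu_pte_write",
            "pf_fixed", "remote_tlb_flush"]
INTERVAL = 5   # seconds, same as the first sample above

def read_counters():
    vals = {}
    for name in COUNTERS:
        with open(os.path.join(KVM_DEBUGFS, name)) as f:
            vals[name] = int(f.read())
    return vals

prev = read_counters()
while True:
    time.sleep(INTERVAL)
    cur = read_counters()
    for name in COUNTERS:
        delta = cur[name] - prev[name]
        print("%-18s %12d %10d  (%d/sec)" % (name, cur[name], delta,
                                             delta // INTERVAL))
    print("")
    prev = cur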
By the way, I added an mmu_prefetch stat in prefetch_page() to count the number of
times the for() loop is hit with PTTYPE == 64, i.e., the number of times
paging64_prefetch_page() is invoked. (I wanted an explicit counter for this loop,
though the info seems to duplicate other entries.) That counter is listed above. As I
mentioned in a prior post, when kscand kicks in the mmu_prefetch counter climbs at
20,000+/sec, with each trip through that function taking 45k+ cycles.
kscand is an instigator shortly after boot; however, kscand is *not* the culprit once
the system has been up for 30-45 minutes. I have started instrumenting the RHEL3U8
kernel, and for the load I am running, kscand does not walk the active lists very
often once the system is up.
So, to dig deeper into what in the guest is stimulating the mmu, I collected kvmtrace
data over roughly a 2-minute interval, which caught about a 30-second period where
guest system time was steady in the 25-30% range. Summarizing the number of times a
RIP appears in a VMEXIT shows the following high runners (a rough sketch of the
counting is included after the table):
count   RIP         RHEL3-symbol
82549   0xc0140e42  follow_page      [kernel]  c0140d90  offset b2
42532   0xc0144760  handle_mm_fault  [kernel]  c01446d0  offset 90
36826   0xc013da4a  futex_wait       [kernel]  c013d870  offset 1da
29987   0xc0145cd0  zap_pte_range    [kernel]  c0145c10  offset c0
27451   0xc0144018  do_no_page       [kernel]  c0143e20  offset 1f8
(The halt entry was removed from the list since that is the ideal scenario for an exit.)
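The counting itself is nothing fancy; here is roughly what it amounts to, written out
in Python for clarity. The line format assumed here -- an event name plus a
"rip 0x..." field somewhere on each line -- is specific to my local kvmtrace text
output, so adjust the parsing to taste.

#!/usr/bin/env python
# Count how often each guest RIP appears in a VMEXIT record and print the
# top entries. The line format is an assumption about the kvmtrace text
# output; adjust the matching to your own trace format.
import re
import sys

rip_re = re.compile(r"rip\s+(0x[0-9a-f]+)")
counts = {}

for line in sys.stdin:
    if "VMEXIT" not in line:
        continue
    m = rip_re.search(line)
    if m:
        rip = m.group(1)
        counts[rip] = counts.get(rip, 0) + 1

for rip, n in sorted(counts.items(), key=lambda x: -x[1])[:10]:
    print("%8d  %s" % (n, rip))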
So the RIP correlates to follow_page() for a large percentage of the VMEXITs.
I wrote an awk script to summarize (histogram style) the TSC cycles between VMEXIT and
VMENTRY for a given address. For the first RIP, 0xc0140e42, the trace shows a delta of
between 50k and 100k cycles from the VMEXIT to the subsequent VMENTRY 82,271 times
(i.e., almost 100% of the time). Similarly, for the second one, 0xc0144760, the trace
shows a delta of between 50k and 100k cycles between VMEXIT and VMENTRY 42,403 times
(again almost 100% of the occurrences). This seems to correlate with the prefetch_page
function in kvm, though I am not 100% positive about that.
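For reference, the awk script boils down to something like the following (shown here
as Python). The record layout -- a TSC value in the first field plus the event name
and a "rip 0x..." field later on the line -- is an assumption about my local kvmtrace
text output rather than any general format, and it assumes a trace from a single vcpu
(or one already split per vcpu).

#!/usr/bin/env python
# Histogram the TSC cycles between each VMEXIT at a given RIP and the next
# VMENTRY, in 50k-cycle buckets. Field layout is an assumption about the
# kvmtrace text output; adjust the parsing as needed.
import re
import sys

target = sys.argv[1]                    # e.g. 0xc0140e42
rip_re = re.compile(r"rip\s+(0x[0-9a-f]+)")
buckets = {}                            # bucket floor -> count
exit_tsc = None

for line in sys.stdin:
    fields = line.split()
    if not fields:
        continue
    tsc = int(fields[0])
    if "VMEXIT" in line:
        m = rip_re.search(line)
        exit_tsc = tsc if (m and m.group(1) == target) else None
    elif "VMENTRY" in line and exit_tsc is not None:
        delta = tsc - exit_tsc
        b = (delta // 50000) * 50000
        buckets[b] = buckets.get(b, 0) + 1
        exit_tsc = None

for b in sorted(buckets):
    print("%9d - %9d cycles: %d" % (b, b + 50000, buckets[b]))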
I am now investigating the kernel paths leading to those functions. Any insights
would definitely be appreciated.
david
Marcelo Tosatti wrote:
> On Fri, Apr 25, 2008 at 11:33:18AM -0600, David S. Ahern wrote:
>> Most of the cycles (~80% of that 54k+) are spent in paging64_prefetch_page():
>>
>> for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
>>         gpa_t pte_gpa = gfn_to_gpa(sp->gfn);
>>         pte_gpa += (i+offset) * sizeof(pt_element_t);
>>
>>         r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &pt,
>>                                   sizeof(pt_element_t));
>>         if (r || is_present_pte(pt))
>>                 sp->spt[i] = shadow_trap_nonpresent_pte;
>>         else
>>                 sp->spt[i] = shadow_notrap_nonpresent_pte;
>> }
>>
>> This loop is run 512 times and takes a total of ~45k cycles, or ~88 cycles
>> per
>> loop.
>>
>> This function gets run >20,000/sec during some of the kscand loops.
>
> Hi David,
>
> Do you see the mmu_recycled counter increase?
>