Re: Strange CPU usage pattern in SMP guest
On Tue, Mar 30, 2010 at 10:27:43AM +0200, Sebastian Hetze wrote: > With this patch applied, the system runs without hrtimer messages since > 5 days and the timer iterrupts look fine. > > Actually, when restarting the system with the hrtimer patch applied, > we also changed the BIOS setting to disable Intel SmartStep on the host. > Since there are no hrtimer messages at all, it might be that the SmartStep > CPU frequency adjustment is the real cause for the slow interrupts in > the KVM guest. Anyone else experienced these problems? Back with SmartStep enabled, we get "hrtimer: interrupt took 119986041 ns" so it really looks like this SmartStep feature is the actual cause for the interrupts to be delayed in the first place. The system continues to run correctly so far, so the patch does fix the problems caused by these delayed interrupts. Thanx alot & best regards, Sebastian -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Strange CPU usage pattern in SMP guest
On Tue, Mar 23, 2010 at 06:18:08PM -0300, Marcelo Tosatti wrote: > On Mon, Mar 22, 2010 at 01:51:20PM +0100, Sebastian Hetze wrote: > > On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote: > > > On 03/21/2010 04:55 PM, Sebastian Hetze wrote: > > >> On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote: > > >> > > >>> On 03/21/2010 02:02 PM, Sebastian Hetze wrote: > > >>> > > 12:46:02 CPU%usr %nice%sys %iowait%irq %soft > > %steal %guest %idle > > 12:46:03 all0,20 11,35 10,968,960,402,99 > > 0,000,00 65,14 > > 12:46:03 01,00 11,007,00 15,000,001,00 > > 0,000,00 65,00 > > 12:46:03 10,007,142,046,121,02 11,22 > > 0,000,00 72,45 > > 12:46:03 20,00 15,001,00 12,000,001,00 > > 0,000,00 71,00 > > 12:46:03 30,00 11,00 23,008,000,000,00 > > 0,000,00 58,00 > > 12:46:03 40,000,00 50,000,000,000,00 > > 0,000,00 50,00 > > 12:46:03 50,00 13,00 20,004,000,001,00 > > 0,000,00 62,00 > > > > So it is only CPU4 that is showing this strange behaviour. > > > > > > >>> Can you adjust irqtop to only count cpu4? or even just post a few 'cat > > >>> /proc/interrupts' from that guest. > > >>> > > >>> Most likely the timer interrupt for cpu4 died. > > >>> > > >> I've added two keys +/- to your irqtop to focus up and down > > >> in the row of available CPUs. > > >> The irqtop for CPU4 shows a constant number of 6 local timer interrupts > > >> per update, while the other CPUs show various higher values: > > >> > > >> irqtop for cpu 4 > > >> > > >> eth0 188 > > >> Rescheduling interrupts 162 > > >> Local timer interrupts 6 > > >> ata_piix3 > > >> TLB shootdowns 1 > > >> Spurious interrupts 0 > > >> Machine check exceptions0 > > >> > > >> > > >> irqtop for cpu 5 > > >> > > >> eth0 257 > > >> Local timer interrupts251 > > >> Rescheduling interrupts 237 > > >> Spurious interrupts 0 > > >> Machine check exceptions0 > > >> > > >> So the timer interrupt for cpu4 is not completely dead but somehow > > >> broken. > > > > > > That is incredibly weird. > > > > > >> What can cause this problem? Any way to speed it up again? > > >> > > > > > > The host has 8 cpus and is only running this 6 vcpu guest, yes? > > > > > > Can you confirm the other vcpus are ticking at 250 Hz? > > > > > > What does 'top' show running on cpu 4? Pressing 'f' 'j' will add a > > > last-used-cpu field in the display. > > > > > > Marcelo, any ideas? > > > > Just to let you know, right after startup, all vcpus work fine. > > > > The following message might be related to the problem: > > hrtimer: interrupt too slow, forcing clock min delta to 165954639 ns > > > > The guest is an 32bit system running on an 64bit host. > > Sebastian, > > Please apply the attached patch to your guest kernel. > With this patch applied, the system runs without hrtimer messages since 5 days and the timer iterrupts look fine. However, I had this Clocksource tsc unstable (delta = -4398046474878 ns) message that I reported on Sunday. Actually, when restarting the system with the hrtimer patch applied, we also changed the BIOS setting to disable Intel SmartStep on the host. Since there are no hrtimer messages at all, it might be that the SmartStep CPU frequency adjustment is the real cause for the slow interrupts in the KVM guest. Anyone else experienced these problems? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Strange CPU usage pattern in SMP guest
On Mon, Mar 22, 2010 at 01:51:20PM +0100, Sebastian Hetze wrote: > On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote: > > On 03/21/2010 04:55 PM, Sebastian Hetze wrote: > >> On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote: > >> > >>> On 03/21/2010 02:02 PM, Sebastian Hetze wrote: > >>> > 12:46:02 CPU%usr %nice%sys %iowait%irq %soft %steal > %guest %idle > 12:46:03 all0,20 11,35 10,968,960,402,990,00 > 0,00 65,14 > 12:46:03 01,00 11,007,00 15,000,001,000,00 > 0,00 65,00 > 12:46:03 10,007,142,046,121,02 11,220,00 > 0,00 72,45 > 12:46:03 20,00 15,001,00 12,000,001,000,00 > 0,00 71,00 > 12:46:03 30,00 11,00 23,008,000,000,000,00 > 0,00 58,00 > 12:46:03 40,000,00 50,000,000,000,000,00 > 0,00 50,00 > 12:46:03 50,00 13,00 20,004,000,001,000,00 > 0,00 62,00 > > So it is only CPU4 that is showing this strange behaviour. > > > >>> Can you adjust irqtop to only count cpu4? or even just post a few 'cat > >>> /proc/interrupts' from that guest. > >>> > >>> Most likely the timer interrupt for cpu4 died. > >>> > >> I've added two keys +/- to your irqtop to focus up and down > >> in the row of available CPUs. > >> The irqtop for CPU4 shows a constant number of 6 local timer interrupts > >> per update, while the other CPUs show various higher values: > >> > >> irqtop for cpu 4 > >> > >> eth0 188 > >> Rescheduling interrupts 162 > >> Local timer interrupts 6 > >> ata_piix3 > >> TLB shootdowns 1 > >> Spurious interrupts 0 > >> Machine check exceptions0 > >> > >> > >> irqtop for cpu 5 > >> > >> eth0 257 > >> Local timer interrupts251 > >> Rescheduling interrupts 237 > >> Spurious interrupts 0 > >> Machine check exceptions0 > >> > >> So the timer interrupt for cpu4 is not completely dead but somehow > >> broken. > > > > That is incredibly weird. > > > >> What can cause this problem? Any way to speed it up again? > >> > > > > The host has 8 cpus and is only running this 6 vcpu guest, yes? > > > > Can you confirm the other vcpus are ticking at 250 Hz? > > > > What does 'top' show running on cpu 4? Pressing 'f' 'j' will add a > > last-used-cpu field in the display. > > > > Marcelo, any ideas? > > Just to let you know, right after startup, all vcpus work fine. > > The following message might be related to the problem: > hrtimer: interrupt too slow, forcing clock min delta to 165954639 ns > > The guest is an 32bit system running on an 64bit host. Sebastian, Please apply the attached patch to your guest kernel. commit ff28aa973164f8e5f4b2633e178162d121c4e5eb Author: Thomas Gleixner Date: Fri Nov 13 17:05:44 2009 +0100 hrtimer: Tune hrtimer_interrupt hang logic The hrtimer_interrupt hang logic adjusts min_delta_ns based on the execution time of the hrtimer callbacks. This is error-prone for virtual machines, where a guest vcpu can be scheduled out during the execution of the callbacks (and the callbacks themselves can do operations that translate to blocking operations in the hypervisor), which in can lead to large min_delta_ns rendering the system unusable. Replace the current heuristics with something more reliable. Allow the interrupt code to try 3 times to catch up with the lost time. If that fails use the total time spent in the interrupt handler to defer the next timer interrupt so the system can catch up with other things which got delayed. Limit that deferment to 100ms. The retry events and the maximum time spent in the interrupt handler are recorded and exposed via /proc/timer_list Inspired by a patch from Marcelo. Reported-by: Michael Tokarev Signed-off-by: Thomas Gleixner Tested-by: Marcelo Tosatti Cc: kvm@vger.kernel.org diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 4759917..2e1064f 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -164,10 +164,11 @@ struct hrtimer_clock_base { * @expires_next: absolute time of the next event which was scheduled * via clock_set_next_event() * @hres_active: State of high resolution mode - * @check_clocks: Indictator, when set evaluate time source and clock - * event devices whether high resolution mode can be - *
Re: Strange CPU usage pattern in SMP guest
On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote: > On 03/21/2010 04:55 PM, Sebastian Hetze wrote: >> On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote: >> >>> On 03/21/2010 02:02 PM, Sebastian Hetze wrote: >>> 12:46:02 CPU%usr %nice%sys %iowait%irq %soft %steal %guest %idle 12:46:03 all0,20 11,35 10,968,960,402,990,00 0,00 65,14 12:46:03 01,00 11,007,00 15,000,001,000,00 0,00 65,00 12:46:03 10,007,142,046,121,02 11,220,00 0,00 72,45 12:46:03 20,00 15,001,00 12,000,001,000,00 0,00 71,00 12:46:03 30,00 11,00 23,008,000,000,000,00 0,00 58,00 12:46:03 40,000,00 50,000,000,000,000,00 0,00 50,00 12:46:03 50,00 13,00 20,004,000,001,000,00 0,00 62,00 So it is only CPU4 that is showing this strange behaviour. >>> Can you adjust irqtop to only count cpu4? or even just post a few 'cat >>> /proc/interrupts' from that guest. >>> >>> Most likely the timer interrupt for cpu4 died. >>> >> I've added two keys +/- to your irqtop to focus up and down >> in the row of available CPUs. >> The irqtop for CPU4 shows a constant number of 6 local timer interrupts >> per update, while the other CPUs show various higher values: >> >> irqtop for cpu 4 >> >> eth0 188 >> Rescheduling interrupts 162 >> Local timer interrupts 6 >> ata_piix3 >> TLB shootdowns 1 >> Spurious interrupts 0 >> Machine check exceptions0 >> >> >> irqtop for cpu 5 >> >> eth0 257 >> Local timer interrupts251 >> Rescheduling interrupts 237 >> Spurious interrupts 0 >> Machine check exceptions0 >> >> So the timer interrupt for cpu4 is not completely dead but somehow >> broken. > > That is incredibly weird. > >> What can cause this problem? Any way to speed it up again? >> > > The host has 8 cpus and is only running this 6 vcpu guest, yes? > > Can you confirm the other vcpus are ticking at 250 Hz? > > What does 'top' show running on cpu 4? Pressing 'f' 'j' will add a > last-used-cpu field in the display. > > Marcelo, any ideas? Just to let you know, right after startup, all vcpus work fine. The following message might be related to the problem: hrtimer: interrupt too slow, forcing clock min delta to 165954639 ns The guest is an 32bit system running on an 64bit host. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Strange CPU usage pattern in SMP guest
On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote: > On 03/21/2010 04:55 PM, Sebastian Hetze wrote: >> On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote: >> >>> On 03/21/2010 02:02 PM, Sebastian Hetze wrote: >>> 12:46:02 CPU%usr %nice%sys %iowait%irq %soft %steal %guest %idle 12:46:03 all0,20 11,35 10,968,960,402,990,00 0,00 65,14 12:46:03 01,00 11,007,00 15,000,001,000,00 0,00 65,00 12:46:03 10,007,142,046,121,02 11,220,00 0,00 72,45 12:46:03 20,00 15,001,00 12,000,001,000,00 0,00 71,00 12:46:03 30,00 11,00 23,008,000,000,000,00 0,00 58,00 12:46:03 40,000,00 50,000,000,000,000,00 0,00 50,00 12:46:03 50,00 13,00 20,004,000,001,000,00 0,00 62,00 So it is only CPU4 that is showing this strange behaviour. >>> Can you adjust irqtop to only count cpu4? or even just post a few 'cat >>> /proc/interrupts' from that guest. >>> >>> Most likely the timer interrupt for cpu4 died. >>> >> I've added two keys +/- to your irqtop to focus up and down >> in the row of available CPUs. >> The irqtop for CPU4 shows a constant number of 6 local timer interrupts >> per update, while the other CPUs show various higher values: >> >> irqtop for cpu 4 >> >> eth0 188 >> Rescheduling interrupts 162 >> Local timer interrupts 6 >> ata_piix3 >> TLB shootdowns 1 >> Spurious interrupts 0 >> Machine check exceptions0 >> >> >> irqtop for cpu 5 >> >> eth0 257 >> Local timer interrupts251 >> Rescheduling interrupts 237 >> Spurious interrupts 0 >> Machine check exceptions0 >> >> So the timer interrupt for cpu4 is not completely dead but somehow >> broken. > > That is incredibly weird. > >> What can cause this problem? Any way to speed it up again? >> > > The host has 8 cpus and is only running this 6 vcpu guest, yes? The host is an dual quad core E5520 with hyperthrading enabled, so we see 2x4x2=16 CPUs on the host. The guest is started with 6 CPUs. > Can you confirm the other vcpus are ticking at 250 Hz? The irqtop shows different numbers for local timer interrupts on the other CPUs. The total number (summed up over all CPUs) varies between something like 700 and 1400. Any CPU can be down to 10 and next update up to 260. Only CPU4 stays at the 6 local timer interrupts. > > What does 'top' show running on cpu 4? Pressing 'f' 'j' will add a > last-used-cpu field in the display. The processes are not bound to a particular CPU, so the picture varies. Here are two shots: take1: 15 root RT -5 000 S0 0.0 0:01.70 4 migration/4 16 root 15 -5 000 S0 0.0 0:00.08 4 ksoftirqd/4 17 root RT -5 000 S0 0.0 0:00.00 4 watchdog/4 25 root 15 -5 000 S0 0.0 0:00.01 4 events/4 35 root 15 -5 000 S0 0.0 0:00.00 4 kintegrityd/4 41 root 15 -5 000 S0 0.0 0:00.03 4 kblockd/4 50 root 15 -5 000 S0 0.0 0:00.90 4 ata/4 55 root 15 -5 000 S0 0.0 0:00.00 4 kseriod 66 root 15 -5 000 S0 0.0 0:00.00 4 aio/4 73 root 15 -5 000 S0 0.0 0:00.00 4 crypto/4 80 root 15 -5 000 S0 0.0 2:11.71 4 scsi_eh_1 87 root 15 -5 000 S0 0.0 0:00.00 4 kmpathd/4 95 root 15 -5 000 S0 0.0 0:00.00 4 kondemand/4 101 root 15 -5 000 S0 0.0 0:00.00 4 kconservative/4 103 root 10 -10 000 S0 0.0 0:00.00 4 krfcommd 681 root 15 -5 000 S0 0.0 0:00.00 4 kdmflush 686 root 15 -5 000 S0 0.0 0:00.00 4 kdmflush 691 root 15 -5 000 S0 0.0 0:00.00 4 kdmflush 737 root 15 -5 000 S0 0.0 0:00.71 4 kjournald 826 root 16 -4 2100 452 312 S0 0.0 0:00.14 4 udevd 1350 root 15 -5 000 S0 0.0 0:00.00 4 kpsmoused 1444 root 15 -5 000 S0 0.0 0:00.00 4 kgameportd 1718 root 15 -5 000 S0 0.0 0:14.62 4 kjournald 2108 statd 20 0 2252 1152 760 S0 0.0 0:02.66 4 rpc.statd 2117 root 15 -5 000 S0 0.0 0:00.36 4 rpciod/4 2123 root 15 -5 00
Re: Strange CPU usage pattern in SMP guest
On 03/21/2010 04:55 PM, Sebastian Hetze wrote: On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote: On 03/21/2010 02:02 PM, Sebastian Hetze wrote: 12:46:02 CPU%usr %nice%sys %iowait%irq %soft %steal %guest %idle 12:46:03 all0,20 11,35 10,968,960,402,990,00 0,00 65,14 12:46:03 01,00 11,007,00 15,000,001,000,00 0,00 65,00 12:46:03 10,007,142,046,121,02 11,220,00 0,00 72,45 12:46:03 20,00 15,001,00 12,000,001,000,00 0,00 71,00 12:46:03 30,00 11,00 23,008,000,000,000,00 0,00 58,00 12:46:03 40,000,00 50,000,000,000,000,00 0,00 50,00 12:46:03 50,00 13,00 20,004,000,001,000,00 0,00 62,00 So it is only CPU4 that is showing this strange behaviour. Can you adjust irqtop to only count cpu4? or even just post a few 'cat /proc/interrupts' from that guest. Most likely the timer interrupt for cpu4 died. I've added two keys +/- to your irqtop to focus up and down in the row of available CPUs. The irqtop for CPU4 shows a constant number of 6 local timer interrupts per update, while the other CPUs show various higher values: irqtop for cpu 4 eth0 188 Rescheduling interrupts 162 Local timer interrupts 6 ata_piix3 TLB shootdowns 1 Spurious interrupts 0 Machine check exceptions0 irqtop for cpu 5 eth0 257 Local timer interrupts251 Rescheduling interrupts 237 Spurious interrupts 0 Machine check exceptions0 So the timer interrupt for cpu4 is not completely dead but somehow broken. That is incredibly weird. What can cause this problem? Any way to speed it up again? The host has 8 cpus and is only running this 6 vcpu guest, yes? Can you confirm the other vcpus are ticking at 250 Hz? What does 'top' show running on cpu 4? Pressing 'f' 'j' will add a last-used-cpu field in the display. Marcelo, any ideas? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Strange CPU usage pattern in SMP guest
On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote: > On 03/21/2010 02:02 PM, Sebastian Hetze wrote: >> >> 12:46:02 CPU%usr %nice%sys %iowait%irq %soft %steal >> %guest %idle >> 12:46:03 all0,20 11,35 10,968,960,402,990,00 >> 0,00 65,14 >> 12:46:03 01,00 11,007,00 15,000,001,000,00 >> 0,00 65,00 >> 12:46:03 10,007,142,046,121,02 11,220,00 >> 0,00 72,45 >> 12:46:03 20,00 15,001,00 12,000,001,000,00 >> 0,00 71,00 >> 12:46:03 30,00 11,00 23,008,000,000,000,00 >> 0,00 58,00 >> 12:46:03 40,000,00 50,000,000,000,000,00 >> 0,00 50,00 >> 12:46:03 50,00 13,00 20,004,000,001,000,00 >> 0,00 62,00 >> >> So it is only CPU4 that is showing this strange behaviour. >> > > Can you adjust irqtop to only count cpu4? or even just post a few 'cat > /proc/interrupts' from that guest. > > Most likely the timer interrupt for cpu4 died. I've added two keys +/- to your irqtop to focus up and down in the row of available CPUs. The irqtop for CPU4 shows a constant number of 6 local timer interrupts per update, while the other CPUs show various higher values: irqtop for cpu 4 eth0 188 Rescheduling interrupts 162 Local timer interrupts 6 ata_piix3 TLB shootdowns 1 Spurious interrupts 0 Machine check exceptions0 irqtop for cpu 5 eth0 257 Local timer interrupts251 Rescheduling interrupts 237 Spurious interrupts 0 Machine check exceptions0 So the timer interrupt for cpu4 is not completely dead but somehow broken. What can cause this problem? Any way to speed it up again? #!/usr/bin/python import curses import sys, os, time, optparse def read_interrupts(): global target irq = {} proc = file('/proc/interrupts') nrcpu = len(proc.readline().split()) if target < 0: target = 0; if target > nrcpu: target = nrcpu for line in proc.readlines(): vec, data = line.strip().split(':', 1) if vec in ('ERR', 'MIS'): continue counts = data.split(None, nrcpu) counts, rest = (counts[:-1], counts[-1]) if target == 0: count = sum([int(x) for x in counts]) else: count = int(counts[target-1]) try: v = int(vec) name = rest.split(None, 1)[1] except: name = rest irq[name] = count return irq def delta_interrupts(): old = read_interrupts() while True: irq = read_interrupts() delta = {} for key in irq.keys(): delta[key] = irq[key] - old[key] yield delta old = irq target = 0 label_width = 35 number_width = 10 def tui(screen): curses.use_default_colors() global target curses.noecho() def getcount(x): return x[1] def refresh(irq): screen.erase() if target > 0: title = "irqtop for cpu %d"%(target-1) else: title = "irqtop sum for all cpu's" screen.addstr(0, 0, title) row = 2 for name, count in sorted(irq.items(), key = getcount, reverse = True): if row >= screen.getmaxyx()[0]: break col = 1 screen.addstr(row, col, name) col += label_width screen.addstr(row, col, '%10d' % (count,)) row += 1 screen.refresh() for irqs in delta_interrupts(): refresh(irqs) curses.halfdelay(10) try: c = screen.getkey() if c == 'q': break if c == '+': target = target+1 if c == '-': target = target-1 except KeyboardInterrupt: break except curses.error: continue import curses.wrapper curses.wrapper(tui)
Re: Strange CPU usage pattern in SMP guest
On 03/21/2010 02:02 PM, Sebastian Hetze wrote: 12:46:02 CPU%usr %nice%sys %iowait%irq %soft %steal %guest %idle 12:46:03 all0,20 11,35 10,968,960,402,990,00 0,00 65,14 12:46:03 01,00 11,007,00 15,000,001,000,00 0,00 65,00 12:46:03 10,007,142,046,121,02 11,220,00 0,00 72,45 12:46:03 20,00 15,001,00 12,000,001,000,00 0,00 71,00 12:46:03 30,00 11,00 23,008,000,000,000,00 0,00 58,00 12:46:03 40,000,00 50,000,000,000,000,00 0,00 50,00 12:46:03 50,00 13,00 20,004,000,001,000,00 0,00 62,00 So it is only CPU4 that is showing this strange behaviour. Can you adjust irqtop to only count cpu4? or even just post a few 'cat /proc/interrupts' from that guest. Most likely the timer interrupt for cpu4 died. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Strange CPU usage pattern in SMP guest
On Sun, Mar 21, 2010 at 12:09:00PM +0200, Avi Kivity wrote: > On 03/21/2010 02:13 AM, Sebastian Hetze wrote: >> Hi *, >> >> in an 6 CPU SMP guest running on an host with 2 quad core >> Intel Xeon E5520 with hyperthrading enabled >> we see one or more guest CPUs working in a very strange >> pattern. It looks like all or nothing. We can easily identify >> the effected CPU with xosview. Here is the mpstat output >> compared to one regular working CPU: >> >> >> mpstat -P 4 1 >> Linux 2.6.31-16-generic-pae (guest) 21.03.2010 _i686_ (6 CPU) >> 00:45:19 CPU%usr %nice%sys %iowait%irq %soft %steal >> %guest %idle >> 00:45:20 40,00 100,000,000,000,000,000,00 >> 0,000,00 >> 00:45:21 40,00 100,000,000,000,000,000,00 >> 0,000,00 >> 00:45:22 40,00 100,000,000,000,000,000,00 >> 0,000,00 >> 00:45:23 40,00 100,000,000,000,000,000,00 >> 0,000,00 >> 00:45:24 40,00 66,670,000,000,00 33,330,00 >> 0,000,00 >> 00:45:25 40,00 100,000,000,000,000,000,00 >> 0,000,00 >> 00:45:26 40,00 100,000,000,000,000,000,00 >> 0,000,00 >> > > Looks like the guest is only receiving 3-4 timer interrupts per second, > so time becomes quantized. > > Please run the attached irqtop in the affected guest and report the results. > > Is the host overly busy? What host kernel, kvm, and qemu are you > running? Is the guest running an I/O workload? if so, how are the disks The host is not busy at all. In fact, currently it is running only one guest. The host is running an ubuntu 2.6.31-14-server kernel. qemu-kvm is 0.12.2-0ubuntu6. The kvm module has srcversion: 82D6B673524596F9CF3E84C as stated by modinfo. The guest occasionally is running IO workload. However, the effect is visible all the time. And it is only one out of 6 CPUs the very same guest is running. This is the output on the guest for all CPUs: mpstat -P ALL 1 12:45:59 CPU%usr %nice%sys %iowait%irq %soft %steal %guest %idle 12:46:00 all0,409,742,395,370,803,980,00 0,00 77,34 12:46:00 01,005,006,003,001,009,000,00 0,00 75,00 12:46:00 10,00 23,002,00 10,000,000,000,00 0,00 65,00 12:46:00 20,005,940,996,930,001,980,00 0,00 84,16 12:46:00 30,008,002,005,002,009,000,00 0,00 74,00 12:46:00 40,00 33,330,000,000,000,000,00 0,00 66,67 12:46:00 50,005,940,003,960,000,990,00 0,00 89,11 12:46:00 CPU%usr %nice%sys %iowait%irq %soft %steal %guest %idle 12:46:01 all0,605,813,21 24,450,403,610,00 0,00 61,92 12:46:01 01,014,047,07 31,311,016,060,00 0,00 49,49 12:46:01 10,005,002,00 19,000,002,000,00 0,00 72,00 12:46:01 20,997,921,98 35,640,002,970,00 0,00 50,50 12:46:01 31,984,952,97 13,860,006,930,00 0,00 69,31 12:46:01 40,00 33,330,000,000,000,000,00 0,00 66,67 12:46:01 50,008,083,03 22,220,001,010,00 0,00 65,66 12:46:01 CPU%usr %nice%sys %iowait%irq %soft %steal %guest %idle 12:46:02 all2,38 12,70 17,06 14,680,601,980,00 0,00 50,60 12:46:02 03,96 15,849,90 13,860,002,970,00 0,00 53,47 12:46:02 12,976,935,94 19,802,972,970,00 0,00 58,42 12:46:02 22,02 17,178,08 18,182,021,010,00 0,00 51,52 12:46:02 32,02 10,108,08 14,140,002,020,00 0,00 63,64 12:46:02 40,000,000,000,000,000,000,00 0,00 100,00 12:46:02 50,00 13,00 55,006,000,001,000,00 0,00 25,00 12:46:02 CPU%usr %nice%sys %iowait%irq %soft %steal %guest %idle 12:46:03 all0,20 11,35 10,968,960,402,990,00 0,00 65,14 12:46:03 01,00 11,007,00 15,000,001,000,00 0,00 65,00 12:46:03 10,007,142,046,121,02 11,220,00 0,00 72,45 12:46:03 20,00 15,001,00 12,000,001,000,00 0,00 71,00 12:46:03 30,00 11,00 23,008,000,000,000,00 0,00 58,00 12:46:03 40,000,00 50,000,000,000,000,00 0,00 50,00 12:46:03 50,00 13,00 20,004,000,001,000,00 0,00 6
Re: Strange CPU usage pattern in SMP guest
On 03/21/2010 02:13 AM, Sebastian Hetze wrote: Hi *, in an 6 CPU SMP guest running on an host with 2 quad core Intel Xeon E5520 with hyperthrading enabled we see one or more guest CPUs working in a very strange pattern. It looks like all or nothing. We can easily identify the effected CPU with xosview. Here is the mpstat output compared to one regular working CPU: mpstat -P 4 1 Linux 2.6.31-16-generic-pae (guest) 21.03.2010 _i686_ (6 CPU) 00:45:19 CPU%usr %nice%sys %iowait%irq %soft %steal %guest %idle 00:45:20 40,00 100,000,000,000,000,000,00 0,000,00 00:45:21 40,00 100,000,000,000,000,000,00 0,000,00 00:45:22 40,00 100,000,000,000,000,000,00 0,000,00 00:45:23 40,00 100,000,000,000,000,000,00 0,000,00 00:45:24 40,00 66,670,000,000,00 33,330,00 0,000,00 00:45:25 40,00 100,000,000,000,000,000,00 0,000,00 00:45:26 40,00 100,000,000,000,000,000,00 0,000,00 Looks like the guest is only receiving 3-4 timer interrupts per second, so time becomes quantized. Please run the attached irqtop in the affected guest and report the results. Is the host overly busy? What host kernel, kvm, and qemu are you running? Is the guest running an I/O workload? if so, how are the disks configured? -- error compiling committee.c: too many arguments to function #!/usr/bin/python import curses import sys, os, time, optparse def read_interrupts(): irq = {} proc = file('/proc/interrupts') nrcpu = len(proc.readline().split()) for line in proc.readlines(): vec, data = line.strip().split(':', 1) if vec in ('ERR', 'MIS'): continue counts = data.split(None, nrcpu) counts, rest = (counts[:-1], counts[-1]) count = sum([int(x) for x in counts]) try: v = int(vec) name = rest.split(None, 1)[1] except: name = rest irq[name] = count return irq def delta_interrupts(): old = read_interrupts() while True: irq = read_interrupts() delta = {} for key in irq.keys(): delta[key] = irq[key] - old[key] yield delta old = irq label_width = 30 number_width = 10 def tui(screen): curses.use_default_colors() curses.noecho() def getcount(x): return x[1] def refresh(irq): screen.erase() screen.addstr(0, 0, 'irqtop') row = 2 for name, count in sorted(irq.items(), key = getcount, reverse = True): if row >= screen.getmaxyx()[0]: break col = 1 screen.addstr(row, col, name) col += label_width screen.addstr(row, col, '%10d' % (count,)) row += 1 screen.refresh() for irqs in delta_interrupts(): refresh(irqs) curses.halfdelay(10) try: c = screen.getkey() if c == 'q': break except KeyboardInterrupt: break except curses.error: continue import curses.wrapper curses.wrapper(tui)