subject:"\"Re\\\: Strange CPU usage pattern in SMP guest\""

Re: Strange CPU usage pattern in SMP guest

2010-04-05 Thread Sebastian Hetze

On Tue, Mar 30, 2010 at 10:27:43AM +0200, Sebastian Hetze wrote:
> With this patch applied, the system runs without hrtimer messages since
> 5 days and the timer iterrupts look fine.
> 
> Actually, when restarting the system with the hrtimer patch applied,
> we also changed the BIOS setting to disable Intel SmartStep on the host.
> Since there are no hrtimer messages at all, it might be that the SmartStep
> CPU frequency adjustment is the real cause for the slow interrupts in
> the KVM guest. Anyone else experienced these problems?

Back with SmartStep enabled, we get "hrtimer: interrupt took 119986041 ns"
so it really looks like this SmartStep feature is the actual cause for
the interrupts to be delayed in the first place.

The system continues to run correctly so far, so the patch does fix the 
problems caused by these delayed interrupts.

Thanx alot & best regards,

  Sebastian
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Strange CPU usage pattern in SMP guest

2010-03-30 Thread Sebastian Hetze

On Tue, Mar 23, 2010 at 06:18:08PM -0300, Marcelo Tosatti wrote:
> On Mon, Mar 22, 2010 at 01:51:20PM +0100, Sebastian Hetze wrote:
> > On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote:
> > > On 03/21/2010 04:55 PM, Sebastian Hetze wrote:
> > >> On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
> > >>
> > >>> On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
> > >>>  
> >  12:46:02 CPU%usr   %nice%sys %iowait%irq   %soft  
> >  %steal  %guest   %idle
> >  12:46:03 all0,20   11,35   10,968,960,402,99
> >  0,000,00   65,14
> >  12:46:03   01,00   11,007,00   15,000,001,00
> >  0,000,00   65,00
> >  12:46:03   10,007,142,046,121,02   11,22
> >  0,000,00   72,45
> >  12:46:03   20,00   15,001,00   12,000,001,00
> >  0,000,00   71,00
> >  12:46:03   30,00   11,00   23,008,000,000,00
> >  0,000,00   58,00
> >  12:46:03   40,000,00   50,000,000,000,00
> >  0,000,00   50,00
> >  12:46:03   50,00   13,00   20,004,000,001,00
> >  0,000,00   62,00
> > 
> >  So it is only CPU4 that is showing this strange behaviour.
> > 
> > 
> > >>> Can you adjust irqtop to only count cpu4?  or even just post a few 'cat
> > >>> /proc/interrupts' from that guest.
> > >>>
> > >>> Most likely the timer interrupt for cpu4 died.
> > >>>  
> > >> I've added two keys +/- to your irqtop to focus up and down
> > >> in the row of available CPUs.
> > >> The irqtop for CPU4 shows a constant number of 6 local timer interrupts
> > >> per update, while the other CPUs show various higher values:
> > >>
> > >> irqtop for cpu 4
> > >>
> > >>   eth0  188
> > >>   Rescheduling interrupts   162
> > >>   Local timer interrupts  6
> > >>   ata_piix3
> > >>   TLB shootdowns  1
> > >>   Spurious interrupts 0
> > >>   Machine check exceptions0
> > >>
> > >>
> > >> irqtop for cpu 5
> > >>
> > >>   eth0  257
> > >>   Local timer interrupts251
> > >>   Rescheduling interrupts   237
> > >>   Spurious interrupts 0
> > >>   Machine check exceptions0
> > >>
> > >> So the timer interrupt for cpu4 is not completely dead but somehow
> > >> broken.
> > >
> > > That is incredibly weird.
> > >
> > >> What can cause this problem? Any way to speed it up again?
> > >>
> > >
> > > The host has 8 cpus and is only running this 6 vcpu guest, yes?
> > >
> > > Can you confirm the other vcpus are ticking at 250 Hz?
> > >
> > > What does 'top' show running on cpu 4?  Pressing 'f' 'j' will add a  
> > > last-used-cpu field in the display.
> > >
> > > Marcelo, any ideas?
> > 
> > Just to let you know, right after startup, all vcpus work fine.
> > 
> > The following message might be related to the problem:
> > hrtimer: interrupt too slow, forcing clock min delta to 165954639 ns
> > 
> > The guest is an 32bit system running on an 64bit host.
> 
> Sebastian,
> 
> Please apply the attached patch to your guest kernel.
> 

With this patch applied, the system runs without hrtimer messages since
5 days and the timer iterrupts look fine. However, I had this
Clocksource tsc unstable (delta = -4398046474878 ns) message that I
reported on Sunday.

Actually, when restarting the system with the hrtimer patch applied,
we also changed the BIOS setting to disable Intel SmartStep on the host.
Since there are no hrtimer messages at all, it might be that the SmartStep
CPU frequency adjustment is the real cause for the slow interrupts in
the KVM guest. Anyone else experienced these problems?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Strange CPU usage pattern in SMP guest

2010-03-23 Thread Marcelo Tosatti

On Mon, Mar 22, 2010 at 01:51:20PM +0100, Sebastian Hetze wrote:
> On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote:
> > On 03/21/2010 04:55 PM, Sebastian Hetze wrote:
> >> On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
> >>
> >>> On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
> >>>  
>  12:46:02 CPU%usr   %nice%sys %iowait%irq   %soft  %steal 
>   %guest   %idle
>  12:46:03 all0,20   11,35   10,968,960,402,990,00 
> 0,00   65,14
>  12:46:03   01,00   11,007,00   15,000,001,000,00 
> 0,00   65,00
>  12:46:03   10,007,142,046,121,02   11,220,00 
> 0,00   72,45
>  12:46:03   20,00   15,001,00   12,000,001,000,00 
> 0,00   71,00
>  12:46:03   30,00   11,00   23,008,000,000,000,00 
> 0,00   58,00
>  12:46:03   40,000,00   50,000,000,000,000,00 
> 0,00   50,00
>  12:46:03   50,00   13,00   20,004,000,001,000,00 
> 0,00   62,00
> 
>  So it is only CPU4 that is showing this strange behaviour.
> 
> 
> >>> Can you adjust irqtop to only count cpu4?  or even just post a few 'cat
> >>> /proc/interrupts' from that guest.
> >>>
> >>> Most likely the timer interrupt for cpu4 died.
> >>>  
> >> I've added two keys +/- to your irqtop to focus up and down
> >> in the row of available CPUs.
> >> The irqtop for CPU4 shows a constant number of 6 local timer interrupts
> >> per update, while the other CPUs show various higher values:
> >>
> >> irqtop for cpu 4
> >>
> >>   eth0  188
> >>   Rescheduling interrupts   162
> >>   Local timer interrupts  6
> >>   ata_piix3
> >>   TLB shootdowns  1
> >>   Spurious interrupts 0
> >>   Machine check exceptions0
> >>
> >>
> >> irqtop for cpu 5
> >>
> >>   eth0  257
> >>   Local timer interrupts251
> >>   Rescheduling interrupts   237
> >>   Spurious interrupts 0
> >>   Machine check exceptions0
> >>
> >> So the timer interrupt for cpu4 is not completely dead but somehow
> >> broken.
> >
> > That is incredibly weird.
> >
> >> What can cause this problem? Any way to speed it up again?
> >>
> >
> > The host has 8 cpus and is only running this 6 vcpu guest, yes?
> >
> > Can you confirm the other vcpus are ticking at 250 Hz?
> >
> > What does 'top' show running on cpu 4?  Pressing 'f' 'j' will add a  
> > last-used-cpu field in the display.
> >
> > Marcelo, any ideas?
> 
> Just to let you know, right after startup, all vcpus work fine.
> 
> The following message might be related to the problem:
> hrtimer: interrupt too slow, forcing clock min delta to 165954639 ns
> 
> The guest is an 32bit system running on an 64bit host.

Sebastian,

Please apply the attached patch to your guest kernel.

commit ff28aa973164f8e5f4b2633e178162d121c4e5eb
Author: Thomas Gleixner 
Date:   Fri Nov 13 17:05:44 2009 +0100

hrtimer: Tune hrtimer_interrupt hang logic

The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
execution time of the hrtimer callbacks.

This is error-prone for virtual machines, where a guest vcpu can be
scheduled out during the execution of the callbacks (and the callbacks
themselves can do operations that translate to blocking operations in
the hypervisor), which in can lead to large min_delta_ns rendering the
system unusable.

Replace the current heuristics with something more reliable. Allow the
interrupt code to try 3 times to catch up with the lost time. If that
fails use the total time spent in the interrupt handler to defer the
next timer interrupt so the system can catch up with other things
which got delayed. Limit that deferment to 100ms.

The retry events and the maximum time spent in the interrupt handler
are recorded and exposed via /proc/timer_list

Inspired by a patch from Marcelo.

Reported-by: Michael Tokarev 
Signed-off-by: Thomas Gleixner 
Tested-by: Marcelo Tosatti 
Cc: kvm@vger.kernel.org

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 4759917..2e1064f 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -164,10 +164,11 @@ struct hrtimer_clock_base {
  * @expires_next:  absolute time of the next event which was scheduled
  * via clock_set_next_event()
  * @hres_active:   State of high resolution mode
- * @check_clocks:  Indictator, when set evaluate time source and clock
- * event devices whether high resolution mode can be
- *

Re: Strange CPU usage pattern in SMP guest

2010-03-22 Thread Sebastian Hetze

On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote:
> On 03/21/2010 04:55 PM, Sebastian Hetze wrote:
>> On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
>>
>>> On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
>>>  
 12:46:02 CPU%usr   %nice%sys %iowait%irq   %soft  %steal  
 %guest   %idle
 12:46:03 all0,20   11,35   10,968,960,402,990,00   
  0,00   65,14
 12:46:03   01,00   11,007,00   15,000,001,000,00   
  0,00   65,00
 12:46:03   10,007,142,046,121,02   11,220,00   
  0,00   72,45
 12:46:03   20,00   15,001,00   12,000,001,000,00   
  0,00   71,00
 12:46:03   30,00   11,00   23,008,000,000,000,00   
  0,00   58,00
 12:46:03   40,000,00   50,000,000,000,000,00   
  0,00   50,00
 12:46:03   50,00   13,00   20,004,000,001,000,00   
  0,00   62,00

 So it is only CPU4 that is showing this strange behaviour.


>>> Can you adjust irqtop to only count cpu4?  or even just post a few 'cat
>>> /proc/interrupts' from that guest.
>>>
>>> Most likely the timer interrupt for cpu4 died.
>>>  
>> I've added two keys +/- to your irqtop to focus up and down
>> in the row of available CPUs.
>> The irqtop for CPU4 shows a constant number of 6 local timer interrupts
>> per update, while the other CPUs show various higher values:
>>
>> irqtop for cpu 4
>>
>>   eth0  188
>>   Rescheduling interrupts   162
>>   Local timer interrupts  6
>>   ata_piix3
>>   TLB shootdowns  1
>>   Spurious interrupts 0
>>   Machine check exceptions0
>>
>>
>> irqtop for cpu 5
>>
>>   eth0  257
>>   Local timer interrupts251
>>   Rescheduling interrupts   237
>>   Spurious interrupts 0
>>   Machine check exceptions0
>>
>> So the timer interrupt for cpu4 is not completely dead but somehow
>> broken.
>
> That is incredibly weird.
>
>> What can cause this problem? Any way to speed it up again?
>>
>
> The host has 8 cpus and is only running this 6 vcpu guest, yes?
>
> Can you confirm the other vcpus are ticking at 250 Hz?
>
> What does 'top' show running on cpu 4?  Pressing 'f' 'j' will add a  
> last-used-cpu field in the display.
>
> Marcelo, any ideas?

Just to let you know, right after startup, all vcpus work fine.

The following message might be related to the problem:
hrtimer: interrupt too slow, forcing clock min delta to 165954639 ns

The guest is an 32bit system running on an 64bit host.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Sebastian Hetze

On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote:
> On 03/21/2010 04:55 PM, Sebastian Hetze wrote:
>> On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
>>
>>> On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
>>>  
 12:46:02 CPU%usr   %nice%sys %iowait%irq   %soft  %steal  
 %guest   %idle
 12:46:03 all0,20   11,35   10,968,960,402,990,00   
  0,00   65,14
 12:46:03   01,00   11,007,00   15,000,001,000,00   
  0,00   65,00
 12:46:03   10,007,142,046,121,02   11,220,00   
  0,00   72,45
 12:46:03   20,00   15,001,00   12,000,001,000,00   
  0,00   71,00
 12:46:03   30,00   11,00   23,008,000,000,000,00   
  0,00   58,00
 12:46:03   40,000,00   50,000,000,000,000,00   
  0,00   50,00
 12:46:03   50,00   13,00   20,004,000,001,000,00   
  0,00   62,00

 So it is only CPU4 that is showing this strange behaviour.


>>> Can you adjust irqtop to only count cpu4?  or even just post a few 'cat
>>> /proc/interrupts' from that guest.
>>>
>>> Most likely the timer interrupt for cpu4 died.
>>>  
>> I've added two keys +/- to your irqtop to focus up and down
>> in the row of available CPUs.
>> The irqtop for CPU4 shows a constant number of 6 local timer interrupts
>> per update, while the other CPUs show various higher values:
>>
>> irqtop for cpu 4
>>
>>   eth0  188
>>   Rescheduling interrupts   162
>>   Local timer interrupts  6
>>   ata_piix3
>>   TLB shootdowns  1
>>   Spurious interrupts 0
>>   Machine check exceptions0
>>
>>
>> irqtop for cpu 5
>>
>>   eth0  257
>>   Local timer interrupts251
>>   Rescheduling interrupts   237
>>   Spurious interrupts 0
>>   Machine check exceptions0
>>
>> So the timer interrupt for cpu4 is not completely dead but somehow
>> broken.
>
> That is incredibly weird.
>
>> What can cause this problem? Any way to speed it up again?
>>
>
> The host has 8 cpus and is only running this 6 vcpu guest, yes?

The host is an dual quad core E5520 with hyperthrading enabled, so we
see 2x4x2=16 CPUs on the host. The guest is started with 6 CPUs.

> Can you confirm the other vcpus are ticking at 250 Hz?

The irqtop shows different numbers for local timer interrupts on the
other CPUs. The total number (summed up over all CPUs) varies between
something like 700 and 1400. Any CPU can be down to 10 and next update
up to 260. Only CPU4 stays at the 6 local timer interrupts.

>
> What does 'top' show running on cpu 4?  Pressing 'f' 'j' will add a  
> last-used-cpu field in the display.

The processes are not bound to a particular CPU, so the picture varies.
Here are two shots:

take1:

   15 root  RT  -5 000 S0  0.0   0:01.70 4 migration/4
   16 root  15  -5 000 S0  0.0   0:00.08 4 ksoftirqd/4
   17 root  RT  -5 000 S0  0.0   0:00.00 4 watchdog/4
   25 root  15  -5 000 S0  0.0   0:00.01 4 events/4
   35 root  15  -5 000 S0  0.0   0:00.00 4 kintegrityd/4
   41 root  15  -5 000 S0  0.0   0:00.03 4 kblockd/4
   50 root  15  -5 000 S0  0.0   0:00.90 4 ata/4
   55 root  15  -5 000 S0  0.0   0:00.00 4 kseriod
   66 root  15  -5 000 S0  0.0   0:00.00 4 aio/4
   73 root  15  -5 000 S0  0.0   0:00.00 4 crypto/4
   80 root  15  -5 000 S0  0.0   2:11.71 4 scsi_eh_1
   87 root  15  -5 000 S0  0.0   0:00.00 4 kmpathd/4
   95 root  15  -5 000 S0  0.0   0:00.00 4 kondemand/4
  101 root  15  -5 000 S0  0.0   0:00.00 4 kconservative/4
  103 root  10 -10 000 S0  0.0   0:00.00 4 krfcommd
  681 root  15  -5 000 S0  0.0   0:00.00 4 kdmflush
  686 root  15  -5 000 S0  0.0   0:00.00 4 kdmflush
  691 root  15  -5 000 S0  0.0   0:00.00 4 kdmflush
  737 root  15  -5 000 S0  0.0   0:00.71 4 kjournald
  826 root  16  -4  2100  452  312 S0  0.0   0:00.14 4 udevd
 1350 root  15  -5 000 S0  0.0   0:00.00 4 kpsmoused
 1444 root  15  -5 000 S0  0.0   0:00.00 4 kgameportd
 1718 root  15  -5 000 S0  0.0   0:14.62 4 kjournald
 2108 statd 20   0  2252 1152  760 S0  0.0   0:02.66 4 rpc.statd
 2117 root  15  -5 000 S0  0.0   0:00.36 4 rpciod/4
 2123 root  15  -5 00

Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Avi Kivity


On 03/21/2010 04:55 PM, Sebastian Hetze wrote:

On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
   

On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
 

12:46:02 CPU%usr   %nice%sys %iowait%irq   %soft  %steal  
%guest   %idle
12:46:03 all0,20   11,35   10,968,960,402,990,00
0,00   65,14
12:46:03   01,00   11,007,00   15,000,001,000,00
0,00   65,00
12:46:03   10,007,142,046,121,02   11,220,00
0,00   72,45
12:46:03   20,00   15,001,00   12,000,001,000,00
0,00   71,00
12:46:03   30,00   11,00   23,008,000,000,000,00
0,00   58,00
12:46:03   40,000,00   50,000,000,000,000,00
0,00   50,00
12:46:03   50,00   13,00   20,004,000,001,000,00
0,00   62,00

So it is only CPU4 that is showing this strange behaviour.

   

Can you adjust irqtop to only count cpu4?  or even just post a few 'cat
/proc/interrupts' from that guest.

Most likely the timer interrupt for cpu4 died.
 

I've added two keys +/- to your irqtop to focus up and down
in the row of available CPUs.
The irqtop for CPU4 shows a constant number of 6 local timer interrupts
per update, while the other CPUs show various higher values:

irqtop for cpu 4

  eth0  188
  Rescheduling interrupts   162
  Local timer interrupts  6
  ata_piix3
  TLB shootdowns  1
  Spurious interrupts 0
  Machine check exceptions0


irqtop for cpu 5

  eth0  257
  Local timer interrupts251
  Rescheduling interrupts   237
  Spurious interrupts 0
  Machine check exceptions0

So the timer interrupt for cpu4 is not completely dead but somehow
broken.


That is incredibly weird.


What can cause this problem? Any way to speed it up again?
   


The host has 8 cpus and is only running this 6 vcpu guest, yes?

Can you confirm the other vcpus are ticking at 250 Hz?

What does 'top' show running on cpu 4?  Pressing 'f' 'j' will add a 
last-used-cpu field in the display.


Marcelo, any ideas?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Sebastian Hetze

On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
> On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
>>
>> 12:46:02 CPU%usr   %nice%sys %iowait%irq   %soft  %steal  
>> %guest   %idle
>> 12:46:03 all0,20   11,35   10,968,960,402,990,00
>> 0,00   65,14
>> 12:46:03   01,00   11,007,00   15,000,001,000,00
>> 0,00   65,00
>> 12:46:03   10,007,142,046,121,02   11,220,00
>> 0,00   72,45
>> 12:46:03   20,00   15,001,00   12,000,001,000,00
>> 0,00   71,00
>> 12:46:03   30,00   11,00   23,008,000,000,000,00
>> 0,00   58,00
>> 12:46:03   40,000,00   50,000,000,000,000,00
>> 0,00   50,00
>> 12:46:03   50,00   13,00   20,004,000,001,000,00
>> 0,00   62,00
>>
>> So it is only CPU4 that is showing this strange behaviour.
>>
>
> Can you adjust irqtop to only count cpu4?  or even just post a few 'cat  
> /proc/interrupts' from that guest.
>
> Most likely the timer interrupt for cpu4 died.

I've added two keys +/- to your irqtop to focus up and down
in the row of available CPUs.
The irqtop for CPU4 shows a constant number of 6 local timer interrupts
per update, while the other CPUs show various higher values:

irqtop for cpu 4

 eth0  188
 Rescheduling interrupts   162
 Local timer interrupts  6
 ata_piix3
 TLB shootdowns  1
 Spurious interrupts 0
 Machine check exceptions0


irqtop for cpu 5

 eth0  257
 Local timer interrupts251
 Rescheduling interrupts   237
 Spurious interrupts 0
 Machine check exceptions0

So the timer interrupt for cpu4 is not completely dead but somehow
broken. What can cause this problem? Any way to speed it up again?

#!/usr/bin/python

import curses
import sys, os, time, optparse

def read_interrupts():
global target
irq = {}
proc = file('/proc/interrupts')
nrcpu = len(proc.readline().split())
if target < 0:
target = 0;
if target > nrcpu:
target = nrcpu
for line in proc.readlines():
vec, data = line.strip().split(':', 1)
if vec in ('ERR', 'MIS'):
continue
counts = data.split(None, nrcpu)
counts, rest = (counts[:-1], counts[-1])
if target == 0:
count = sum([int(x) for x in counts])
else:
count = int(counts[target-1])
try:
v = int(vec)
name = rest.split(None, 1)[1]
except:
name = rest
irq[name] = count
return irq

def delta_interrupts():
old = read_interrupts()
while True:
irq = read_interrupts()
delta = {}
for key in irq.keys():
delta[key] = irq[key] - old[key]
yield delta
old = irq

target = 0
label_width = 35
number_width = 10

def tui(screen):
curses.use_default_colors()
global target
curses.noecho()
def getcount(x):
return x[1]
def refresh(irq):
screen.erase()
if target > 0:
title = "irqtop for cpu %d"%(target-1)
else:
title = "irqtop sum for all cpu's"
screen.addstr(0, 0, title)
row = 2
for name, count in sorted(irq.items(), key = getcount, reverse = True):
if row >= screen.getmaxyx()[0]:
break
col = 1
screen.addstr(row, col, name)
col += label_width
screen.addstr(row, col, '%10d' % (count,))
row += 1
screen.refresh()

for irqs in delta_interrupts():
refresh(irqs)
curses.halfdelay(10)
try:
c = screen.getkey()
if c == 'q':
break
if c == '+':
target = target+1
if c == '-':
target = target-1
except KeyboardInterrupt:
break
except curses.error:
continue

import curses.wrapper
curses.wrapper(tui)

Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Avi Kivity


On 03/21/2010 02:02 PM, Sebastian Hetze wrote:


12:46:02 CPU%usr   %nice%sys %iowait%irq   %soft  %steal  
%guest   %idle
12:46:03 all0,20   11,35   10,968,960,402,990,00
0,00   65,14
12:46:03   01,00   11,007,00   15,000,001,000,00
0,00   65,00
12:46:03   10,007,142,046,121,02   11,220,00
0,00   72,45
12:46:03   20,00   15,001,00   12,000,001,000,00
0,00   71,00
12:46:03   30,00   11,00   23,008,000,000,000,00
0,00   58,00
12:46:03   40,000,00   50,000,000,000,000,00
0,00   50,00
12:46:03   50,00   13,00   20,004,000,001,000,00
0,00   62,00

So it is only CPU4 that is showing this strange behaviour.
   


Can you adjust irqtop to only count cpu4?  or even just post a few 'cat 
/proc/interrupts' from that guest.


Most likely the timer interrupt for cpu4 died.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Sebastian Hetze

On Sun, Mar 21, 2010 at 12:09:00PM +0200, Avi Kivity wrote:
> On 03/21/2010 02:13 AM, Sebastian Hetze wrote:
>> Hi *,
>>
>> in an 6 CPU SMP guest running on an host with 2 quad core
>> Intel Xeon E5520 with hyperthrading enabled
>> we see one or more guest CPUs working in a very strange
>> pattern. It looks like all or nothing. We can easily identify
>> the effected CPU with xosview. Here is the mpstat output
>> compared to one regular working CPU:
>>
>>
>> mpstat -P 4 1
>> Linux 2.6.31-16-generic-pae (guest)  21.03.2010  _i686_  (6 CPU)
>> 00:45:19 CPU%usr   %nice%sys %iowait%irq   %soft  %steal  
>> %guest   %idle
>> 00:45:20   40,00  100,000,000,000,000,000,00
>> 0,000,00
>> 00:45:21   40,00  100,000,000,000,000,000,00
>> 0,000,00
>> 00:45:22   40,00  100,000,000,000,000,000,00
>> 0,000,00
>> 00:45:23   40,00  100,000,000,000,000,000,00
>> 0,000,00
>> 00:45:24   40,00   66,670,000,000,00   33,330,00
>> 0,000,00
>> 00:45:25   40,00  100,000,000,000,000,000,00
>> 0,000,00
>> 00:45:26   40,00  100,000,000,000,000,000,00
>> 0,000,00
>>
>
> Looks like the guest is only receiving 3-4 timer interrupts per second,  
> so time becomes quantized.
>
> Please run the attached irqtop in the affected guest and report the results.
>
> Is the host overly busy?  What host kernel, kvm, and qemu are you  
> running?  Is the guest running an I/O workload? if so, how are the disks  

The host is not busy at all. In fact, currently it is running only one
guest. The host is running an ubuntu 2.6.31-14-server kernel. qemu-kvm
is 0.12.2-0ubuntu6. The kvm module has srcversion: 82D6B673524596F9CF3E84C
as stated by modinfo.

The guest occasionally is running IO workload. However, the effect is
visible all the time. And it is only one out of 6 CPUs the very same guest
is running. This is the output on the guest for all CPUs:

mpstat -P ALL 1
12:45:59 CPU%usr   %nice%sys %iowait%irq   %soft  %steal  
%guest   %idle
12:46:00 all0,409,742,395,370,803,980,00
0,00   77,34
12:46:00   01,005,006,003,001,009,000,00
0,00   75,00
12:46:00   10,00   23,002,00   10,000,000,000,00
0,00   65,00
12:46:00   20,005,940,996,930,001,980,00
0,00   84,16
12:46:00   30,008,002,005,002,009,000,00
0,00   74,00
12:46:00   40,00   33,330,000,000,000,000,00
0,00   66,67
12:46:00   50,005,940,003,960,000,990,00
0,00   89,11

12:46:00 CPU%usr   %nice%sys %iowait%irq   %soft  %steal  
%guest   %idle
12:46:01 all0,605,813,21   24,450,403,610,00
0,00   61,92
12:46:01   01,014,047,07   31,311,016,060,00
0,00   49,49
12:46:01   10,005,002,00   19,000,002,000,00
0,00   72,00
12:46:01   20,997,921,98   35,640,002,970,00
0,00   50,50
12:46:01   31,984,952,97   13,860,006,930,00
0,00   69,31
12:46:01   40,00   33,330,000,000,000,000,00
0,00   66,67
12:46:01   50,008,083,03   22,220,001,010,00
0,00   65,66

12:46:01 CPU%usr   %nice%sys %iowait%irq   %soft  %steal  
%guest   %idle
12:46:02 all2,38   12,70   17,06   14,680,601,980,00
0,00   50,60
12:46:02   03,96   15,849,90   13,860,002,970,00
0,00   53,47
12:46:02   12,976,935,94   19,802,972,970,00
0,00   58,42
12:46:02   22,02   17,178,08   18,182,021,010,00
0,00   51,52
12:46:02   32,02   10,108,08   14,140,002,020,00
0,00   63,64
12:46:02   40,000,000,000,000,000,000,00
0,00  100,00
12:46:02   50,00   13,00   55,006,000,001,000,00
0,00   25,00

12:46:02 CPU%usr   %nice%sys %iowait%irq   %soft  %steal  
%guest   %idle
12:46:03 all0,20   11,35   10,968,960,402,990,00
0,00   65,14
12:46:03   01,00   11,007,00   15,000,001,000,00
0,00   65,00
12:46:03   10,007,142,046,121,02   11,220,00
0,00   72,45
12:46:03   20,00   15,001,00   12,000,001,000,00
0,00   71,00
12:46:03   30,00   11,00   23,008,000,000,000,00
0,00   58,00
12:46:03   40,000,00   50,000,000,000,000,00
0,00   50,00
12:46:03   50,00   13,00   20,004,000,001,000,00
0,00   6

Re: Strange CPU usage pattern in SMP guest

2010-03-21 Thread Avi Kivity


On 03/21/2010 02:13 AM, Sebastian Hetze wrote:

Hi *,

in an 6 CPU SMP guest running on an host with 2 quad core
Intel Xeon E5520 with hyperthrading enabled
we see one or more guest CPUs working in a very strange
pattern. It looks like all or nothing. We can easily identify
the effected CPU with xosview. Here is the mpstat output
compared to one regular working CPU:


mpstat -P 4 1
Linux 2.6.31-16-generic-pae (guest) 21.03.2010  _i686_  (6 CPU)
00:45:19 CPU%usr   %nice%sys %iowait%irq   %soft  %steal  
%guest   %idle
00:45:20   40,00  100,000,000,000,000,000,00
0,000,00
00:45:21   40,00  100,000,000,000,000,000,00
0,000,00
00:45:22   40,00  100,000,000,000,000,000,00
0,000,00
00:45:23   40,00  100,000,000,000,000,000,00
0,000,00
00:45:24   40,00   66,670,000,000,00   33,330,00
0,000,00
00:45:25   40,00  100,000,000,000,000,000,00
0,000,00
00:45:26   40,00  100,000,000,000,000,000,00
0,000,00
   


Looks like the guest is only receiving 3-4 timer interrupts per second, 
so time becomes quantized.


Please run the attached irqtop in the affected guest and report the results.

Is the host overly busy?  What host kernel, kvm, and qemu are you 
running?  Is the guest running an I/O workload? if so, how are the disks 
configured?


--
error compiling committee.c: too many arguments to function

#!/usr/bin/python

import curses
import sys, os, time, optparse

def read_interrupts():
irq = {}
proc = file('/proc/interrupts')
nrcpu = len(proc.readline().split())
for line in proc.readlines():
vec, data = line.strip().split(':', 1)
if vec in ('ERR', 'MIS'):
continue
counts = data.split(None, nrcpu)
counts, rest = (counts[:-1], counts[-1])
count = sum([int(x) for x in counts])
try:
v = int(vec)
name = rest.split(None, 1)[1]
except:
name = rest
irq[name] = count
return irq

def delta_interrupts():
old = read_interrupts()
while True:
irq = read_interrupts()
delta = {}
for key in irq.keys():
delta[key] = irq[key] - old[key]
yield delta
old = irq

label_width = 30
number_width = 10

def tui(screen):
curses.use_default_colors()
curses.noecho()
def getcount(x):
return x[1]
def refresh(irq):
screen.erase()
screen.addstr(0, 0, 'irqtop')
row = 2
for name, count in sorted(irq.items(), key = getcount, reverse = True):
if row >= screen.getmaxyx()[0]:
break
col = 1
screen.addstr(row, col, name)
col += label_width
screen.addstr(row, col, '%10d' % (count,))
row += 1
screen.refresh()

for irqs in delta_interrupts():
refresh(irqs)
curses.halfdelay(10)
try:
c = screen.getkey()
if c == 'q':
break
except KeyboardInterrupt:
break
except curses.error:
continue

import curses.wrapper
curses.wrapper(tui)

Re: Strange CPU usage pattern in SMP guest

Re: Strange CPU usage pattern in SMP guest

Re: Strange CPU usage pattern in SMP guest

Re: Strange CPU usage pattern in SMP guest

Re: Strange CPU usage pattern in SMP guest

Re: Strange CPU usage pattern in SMP guest

Re: Strange CPU usage pattern in SMP guest

Re: Strange CPU usage pattern in SMP guest

Re: Strange CPU usage pattern in SMP guest

Re: Strange CPU usage pattern in SMP guest

10 matches

Site Navigation

Mail list logo

Footer information