Re: [PATCH 0/5]: Fix kdump under KVM

2009-10-29 Thread Avi Kivity

On 10/29/2009 10:34 AM, Chris Lalancette wrote:


I'm starting to take a look at this.  While this may be a generically useful
cleanup, it occurs to me that even if we call "set_irq" from the hrtimer
callback, we still have to do the "kick_vcpus" type thing.  Otherwise, we still
have the problem where if all vcpus are idle, none of them will be doing
vcpu_run anytime soon to actually inject the interrupt into the guest.

Or did I mis-understand what you are proposing?
   


The pic and ioapic code will kick vcpus that they deliver interrupts to, 
that's the natural order of things.  It was the kick from the timer 
(which is only connected indirectly to vcpus via the pic/ioapic) which 
is hacky and special.


As a bonus, if the interrupt is masked, no vcpus will be kicked.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5]: Fix kdump under KVM

2009-10-29 Thread Chris Lalancette
Avi Kivity wrote:
> On 10/28/2009 12:13 PM, Chris Lalancette wrote:
>>> The kick from i8254 code is pretty bad, as you mention.  I forget why it
>>> is needed at all - shouldn't kvm_set_irq() end up kicking the correct
>>>  
>> As I understand it, that's not quite how it works.  From what I can see, what
>> happens is that the i8254 is programmed as an hrtimer.  When the hrtimer
>> expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code).
>> That code is running in interrupt context, however, so you can't directly 
>> call
>> "set_irq" at that point.  Instead, we update the "pending" variable and defer
>> work until later on.  That "later on" is when we are doing a vcpu_run, at 
>> which
>> point we check the "pending" variable, and if set, inject the interrupt.
>>
>> The problem is that if the vcpu(s) are in idle when the hrtimer expires, and 
>> we
>> don't kick them, no vcpu will wake up, and hence none of them will ever run
>> "set_irq" to get it injected into the guest.
>>
> 
> Oh yes, I remember now.
> 
>> If you have other ideas on how we might accomplish this, I'd definitely be
>> interested in hearing them.
>>
> 
> We could schedule work to run in thread context.  Previous to the user 
> return notifier work, this would cause a reloading of a bunch of msrs 
> and would thus be quite expensive, but now switching to kernel threads 
> and back is pretty cheap.
> 
> So we could try schedule_work().  My only worry is that it uses 
> wake_up() instead of wake_up_sync(), so it could introduce delay.  I'd 
> much prefer a variant the schedules immediately.
> 
> An alternative is to make irq injection work from irq context.  It's not 
> difficult to do technically - convert mutexes to spin_lock_irqsave() - 
> I'm just worried about long irqoff times with large guests (ioapic needs 
> to iterate over all vcpus sometimes).  This is useful for other cases - 
> device assignment interrupt injection, irqfd for vhost/vbus.  So maybe 
> we should go this path, and worry about reducing irqoff times later when 
> we bump KVM_MAX_VCPUS.
> 

I'm starting to take a look at this.  While this may be a generically useful
cleanup, it occurs to me that even if we call "set_irq" from the hrtimer
callback, we still have to do the "kick_vcpus" type thing.  Otherwise, we still
have the problem where if all vcpus are idle, none of them will be doing
vcpu_run anytime soon to actually inject the interrupt into the guest.

Or did I mis-understand what you are proposing?

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5]: Fix kdump under KVM

2009-10-28 Thread Avi Kivity

On 10/28/2009 12:13 PM, Chris Lalancette wrote:



The kick from i8254 code is pretty bad, as you mention.  I forget why it
is needed at all - shouldn't kvm_set_irq() end up kicking the correct
 

As I understand it, that's not quite how it works.  From what I can see, what
happens is that the i8254 is programmed as an hrtimer.  When the hrtimer
expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code).
That code is running in interrupt context, however, so you can't directly call
"set_irq" at that point.  Instead, we update the "pending" variable and defer
work until later on.  That "later on" is when we are doing a vcpu_run, at which
point we check the "pending" variable, and if set, inject the interrupt.

The problem is that if the vcpu(s) are in idle when the hrtimer expires, and we
don't kick them, no vcpu will wake up, and hence none of them will ever run
"set_irq" to get it injected into the guest.
   


Oh yes, I remember now.


If you have other ideas on how we might accomplish this, I'd definitely be
interested in hearing them.
   


We could schedule work to run in thread context.  Previous to the user 
return notifier work, this would cause a reloading of a bunch of msrs 
and would thus be quite expensive, but now switching to kernel threads 
and back is pretty cheap.


So we could try schedule_work().  My only worry is that it uses 
wake_up() instead of wake_up_sync(), so it could introduce delay.  I'd 
much prefer a variant the schedules immediately.


An alternative is to make irq injection work from irq context.  It's not 
difficult to do technically - convert mutexes to spin_lock_irqsave() - 
I'm just worried about long irqoff times with large guests (ioapic needs 
to iterate over all vcpus sometimes).  This is useful for other cases - 
device assignment interrupt injection, irqfd for vhost/vbus.  So maybe 
we should go this path, and worry about reducing irqoff times later when 
we bump KVM_MAX_VCPUS.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5]: Fix kdump under KVM

2009-10-28 Thread Chris Lalancette
Avi Kivity wrote:
> On 10/27/2009 06:41 PM, Chris Lalancette wrote:
>> This patch series aims to get kdump working inside a KVM guest.
>> The current problem with using kdump is that KVM always delivers
>> PIT interrupts to the BSP, and the BSP only.  While this is
>> technically allowed by the MPS spec, most motherboards actually
>> deliver timer interrupts to *any* LAPIC in virtual wire mode.
>> Since a crash can occur on any CPU, timer interrupts must
>> be able to reach any CPU in order for kdump to work properly.
>>
>> Therefore, this patch series kicks all of the relevant vCPUs
>> when delivering a timer interrupt.  With these patches in
>> place, kdump in a RHEL-5 guest works properly.
>>
> 
> This is pretty expensive on large guests.  However I suppose under 
> normal conditions we won't be in virtual wire mode?

Right, exactly.  When running in normal SMP mode, your LAPIC's are in "Symmetric
I/O Mode", meaning they won't be kicked at all.  It's possible in theory for a
guest OS to program all of it's LAPIC's in virtual wire mode, but that is a
decidedly non-standard setup and not one recommended in any of the MPS
documentation that I've read.

> 
> The kick from i8254 code is pretty bad, as you mention.  I forget why it 
> is needed at all - shouldn't kvm_set_irq() end up kicking the correct 

As I understand it, that's not quite how it works.  From what I can see, what
happens is that the i8254 is programmed as an hrtimer.  When the hrtimer
expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code).
That code is running in interrupt context, however, so you can't directly call
"set_irq" at that point.  Instead, we update the "pending" variable and defer
work until later on.  That "later on" is when we are doing a vcpu_run, at which
point we check the "pending" variable, and if set, inject the interrupt.

The problem is that if the vcpu(s) are in idle when the hrtimer expires, and we
don't kick them, no vcpu will wake up, and hence none of them will ever run
"set_irq" to get it injected into the guest.

If you have other ideas on how we might accomplish this, I'd definitely be
interested in hearing them.

> vcpu (and note the pic still insists on the bsp:)
> 
> /*
>   * callback when PIC0 irq status changed
>   */
> static void pic_irq_request(void *opaque, int level)
> {
>  struct kvm *kvm = opaque;
>  struct kvm_vcpu *vcpu = kvm->bsp_vcpu;
>  struct kvm_pic *s = pic_irqchip(kvm);
>  int irq = pic_get_irq(&s->pics[0]);
> 
>  s->output = level;
>  if (vcpu && level && (s->pics[0].isr_ack & (1 << irq))) {
>  s->pics[0].isr_ack &= ~(1 << irq);
>  kvm_vcpu_kick(vcpu);
>  }
> }
> 

Yes, I looked at this too.  It's another one we could fix by changing
"kvm_vcpu_kick()" to "kvm_irq_kick_vcpus()".  However, it's not exactly required
by kdump since the linux kernel prefers to use the IOAPIC where possible.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5]: Fix kdump under KVM

2009-10-28 Thread Avi Kivity

On 10/27/2009 06:41 PM, Chris Lalancette wrote:

This patch series aims to get kdump working inside a KVM guest.
The current problem with using kdump is that KVM always delivers
PIT interrupts to the BSP, and the BSP only.  While this is
technically allowed by the MPS spec, most motherboards actually
deliver timer interrupts to *any* LAPIC in virtual wire mode.
Since a crash can occur on any CPU, timer interrupts must
be able to reach any CPU in order for kdump to work properly.

Therefore, this patch series kicks all of the relevant vCPUs
when delivering a timer interrupt.  With these patches in
place, kdump in a RHEL-5 guest works properly.
   


This is pretty expensive on large guests.  However I suppose under 
normal conditions we won't be in virtual wire mode?


The kick from i8254 code is pretty bad, as you mention.  I forget why it 
is needed at all - shouldn't kvm_set_irq() end up kicking the correct 
vcpu (and note the pic still insists on the bsp:)


/*
 * callback when PIC0 irq status changed
 */
static void pic_irq_request(void *opaque, int level)
{
struct kvm *kvm = opaque;
struct kvm_vcpu *vcpu = kvm->bsp_vcpu;
struct kvm_pic *s = pic_irqchip(kvm);
int irq = pic_get_irq(&s->pics[0]);

s->output = level;
if (vcpu && level && (s->pics[0].isr_ack & (1 << irq))) {
s->pics[0].isr_ack &= ~(1 << irq);
kvm_vcpu_kick(vcpu);
}
}

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5]: Fix kdump under KVM

2009-10-27 Thread Chris Lalancette

This patch series aims to get kdump working inside a KVM guest.
The current problem with using kdump is that KVM always delivers
PIT interrupts to the BSP, and the BSP only.  While this is
technically allowed by the MPS spec, most motherboards actually
deliver timer interrupts to *any* LAPIC in virtual wire mode.
Since a crash can occur on any CPU, timer interrupts must
be able to reach any CPU in order for kdump to work properly.

Therefore, this patch series kicks all of the relevant vCPUs
when delivering a timer interrupt.  With these patches in
place, kdump in a RHEL-5 guest works properly.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html