Re: [PATCH 5/5] Fix kdump under KVM.

2009-11-02 Thread Avi Kivity

On 11/02/2009 03:22 PM, Chris Lalancette wrote:

Avi Kivity wrote:
   

On 10/30/2009 02:23 PM, Chris Lalancette wrote:
 

In the meantime, I've gotten the "set_irq from IRQ context" that Avi
suggested
working, and the fixing up of this IOAPIC check is the last bit to actually get
kdump working.

   

There are two problems with that:

- kvm_set_irq() calls some notifiers (irq_mask_notifiers, at least, but
possibly more); need to make sure those are safe from irq context
- kvm_set_irq() will loop on all vcpus, usually incurring a cache miss
or two; with large vcpu counts this can make the irq-off time quite
large, hurting our worst-case latency

we can defer the second problem since it's only a performance issue, but
how did you deal with the first?

 

Actually, there is a problem here, but I don't think it's as you describe.
kvm_set_irq() doesn't directly call the mask notifiers; all it really does is
call the i8259/ioapic callback, which causes an interrupt to be raised on the
LAPIC in question, which causes the interrupt to be injected into the VCPU on
the next run.  That doesn't call the irq_mask_notifiers, and seems to be
IRQ-safe (with the conversion from mutex's to spin_locks).

The irq mask notifiers are called later on during an EOI.  In particular,
ioapic_mmio_write() ->  ioapic_write_indirect(), which is protected by the
ioapic_lock.  I don't see a problem here (although please point it out if I
missed it).
   


If we convert the ioapic lock to spin_lock_irq, then mask notifiers will 
be called under that lock.  It's true that mask notifiers aren't called 
directly from the kvm_set_irq path, but if we change the lock, all paths 
are affected.



The problem I do see has to do with the irq_ack notifiers.  In particular, if
you are on ia64 and ioapic_mmio_write() gets called for the IOAPIC_REG_EOI
address, we call __kvm_ioapic_update_eoi().  In that case, we drop the spin_lock
and call kvm_notify_acked_irq(), which can re-enter the ioapic code.  This could
be IRQ unsafe, after we return from kvm_notify_acked_irq() we re-acquire the
spin-lock, but in an irq-unsafe fashion.  Therefore we could take the spin-lock,
get interrupted for the timer interrupt, and try to re-acquire the lock, causing
a deadlock.  I'll have to think a bit more on how to solve that one (there's a
similar situation going on in the i8259, which is not ia64 specific).
   


We have to reaquire the lock with spin_lock_irqsave() to avoid the deadlock.


So, either I'm missing exactly what you are talking about, or there's no
deadlock with my patch (except with ia64).  Can you explain further


EOI is also called from x86 (though the lapic).

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-11-02 Thread Chris Lalancette
Avi Kivity wrote:
> On 10/30/2009 02:23 PM, Chris Lalancette wrote:
>> In the meantime, I've gotten the "set_irq from IRQ context" that Avi 
>> suggested
>> working, and the fixing up of this IOAPIC check is the last bit to actually 
>> get
>> kdump working.
>>
> 
> There are two problems with that:
> 
> - kvm_set_irq() calls some notifiers (irq_mask_notifiers, at least, but 
> possibly more); need to make sure those are safe from irq context
> - kvm_set_irq() will loop on all vcpus, usually incurring a cache miss 
> or two; with large vcpu counts this can make the irq-off time quite 
> large, hurting our worst-case latency
> 
> we can defer the second problem since it's only a performance issue, but 
> how did you deal with the first?
> 

Actually, there is a problem here, but I don't think it's as you describe.
kvm_set_irq() doesn't directly call the mask notifiers; all it really does is
call the i8259/ioapic callback, which causes an interrupt to be raised on the
LAPIC in question, which causes the interrupt to be injected into the VCPU on
the next run.  That doesn't call the irq_mask_notifiers, and seems to be
IRQ-safe (with the conversion from mutex's to spin_locks).

The irq mask notifiers are called later on during an EOI.  In particular,
ioapic_mmio_write() -> ioapic_write_indirect(), which is protected by the
ioapic_lock.  I don't see a problem here (although please point it out if I
missed it).

The problem I do see has to do with the irq_ack notifiers.  In particular, if
you are on ia64 and ioapic_mmio_write() gets called for the IOAPIC_REG_EOI
address, we call __kvm_ioapic_update_eoi().  In that case, we drop the spin_lock
and call kvm_notify_acked_irq(), which can re-enter the ioapic code.  This could
be IRQ unsafe, after we return from kvm_notify_acked_irq() we re-acquire the
spin-lock, but in an irq-unsafe fashion.  Therefore we could take the spin-lock,
get interrupted for the timer interrupt, and try to re-acquire the lock, causing
a deadlock.  I'll have to think a bit more on how to solve that one (there's a
similar situation going on in the i8259, which is not ia64 specific).

So, either I'm missing exactly what you are talking about, or there's no
deadlock with my patch (except with ia64).  Can you explain further?

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-11-01 Thread Avi Kivity

On 10/30/2009 02:23 PM, Chris Lalancette wrote:
In the meantime, I've gotten the "set_irq from IRQ context" that Avi 
suggested

working, and the fixing up of this IOAPIC check is the last bit to actually get
kdump working.
   


There are two problems with that:

- kvm_set_irq() calls some notifiers (irq_mask_notifiers, at least, but 
possibly more); need to make sure those are safe from irq context
- kvm_set_irq() will loop on all vcpus, usually incurring a cache miss 
or two; with large vcpu counts this can make the irq-off time quite 
large, hurting our worst-case latency


we can defer the second problem since it's only a performance issue, but 
how did you deal with the first?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-30 Thread Marcelo Tosatti
On Fri, Oct 30, 2009 at 12:03:20PM -0600, David S. Ahern wrote:
> 
> On 10/30/2009 09:28 AM, Chris Lalancette wrote:
> >>
> >> For VMX, the guests TSC's are now usually synchronized (which was not
> >> the case before, when this patch was written). For AMD, they start out
> >> of sync.
> >>
> >> Migration also causes them to be out of sync. So i'd prefer to postpone
> >> the removal of this limitation until there is a guarantee the TSCs 
> >> are synchronized across vcpus.
> > 
> > OK.  I'll try to get on an unsynchronized TSC machine and see if I can 
> > reproduce
> > there.
> 
> Perhaps Marcelo was referring to his patchset from December 2008
> targeting TSC synchronization?
> 
> http://www.mail-archive.com/kvm@vger.kernel.org/msg08071.html
> 
> David

Yes, unsynchronized TSCs in the guest perspective.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-30 Thread Marcelo Tosatti
On Fri, Oct 30, 2009 at 04:28:01PM +0100, Chris Lalancette wrote:
> Marcelo Tosatti wrote:
> > On Fri, Oct 30, 2009 at 01:23:57PM +0100, Chris Lalancette wrote:
> >> Marcelo Tosatti wrote:
> >>> On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
>  Marcelo Tosatti wrote:
> > On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
> >> This patch is the main point of the series.  In order for
> >> kdump to properly work inside a KVM guest, we need to make
> >> sure that all VCPUs in virtual wire APIC mode get kicked
> >> to try and pick up the timer interrupts.  To do this,
> >> we iterate over the CPUs and deliver interrupts to the
> >> proper VCPUs.
> >>
> >> I don't love the concept of doing kvm_irq_kick_vcpus() from
> >> within pit_timer_fn().  A PIT is not connected to a CPU at all,
> >> only to a PIC or APIC.  However, if a CPU enters idle, this is
> >> the only way to wake it up to check for the interrupt.
> > The reason the PIT interrupt was fixed to BSP is:
> >
> > http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html
> >
> > Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the   
> > destination overwrite in case its programmed by the guest to 
> > a single CPU would fix it?
>  Ug, nasty.  I definitely don't want to re-introduce that bug.  What 
>  exactly do
>  you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, 
>  or fix
>  the kdump issue?
> >>> The kdump issue. Something like:
> >>>
> >>> #ifdef CONFIG_X86
> >>> /* Always delivery PIT interrupt to vcpu 0 */
> >>> if (irq == 0 && dest_multiple_vcpus(entry)) {
> >>> irqe.dest_mode = 0; /* Physical mode. */
> >>> /* need to read apic_id from apic regiest since
> >>>  * it can be rewritten */
> >>> irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
> >>> }
> >>> #endif
> >>>
> >>> The rest of the code should be ready to deal with it.
> >> Do you have any more information about how to reproduce that original 
> >> issue?  I
> >> tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to
> >> install and boot it just fine with or without the "always deliver irq 0 to 
> >> vcpu
> >> 0" check above.  Either there's something in particular I need to do to 
> >> trigger
> >> the issue (hardware or software), or it's being solved another way.
> > 
> > For VMX, the guests TSC's are now usually synchronized (which was not
> > the case before, when this patch was written). For AMD, they start out
> > of sync.
> > 
> > Migration also causes them to be out of sync. So i'd prefer to postpone
> > the removal of this limitation until there is a guarantee the TSCs 
> > are synchronized across vcpus.
> 
> OK.  I'll try to get on an unsynchronized TSC machine and see if I can 
> reproduce
> there.
> 
> > 
> >> In the meantime, I've gotten the "set_irq from IRQ context" that Avi 
> >> suggested
> >> working, and the fixing up of this IOAPIC check is the last bit to 
> >> actually get
> >> kdump working.
> > 
> > The kdump fix does not require that, I believe (although it is
> > useful/required for other things, please send the patch separately).
> > 
> > The hrtimer will fire on vcpu0, then:
> > 
> > kvm_inject_pit_interrupts -> ioapic_set_irq -> apic_deliver ->
> > kvm_vcpu_kick(vcpuN)
> > 
> > As long as the dest-id = vcpu_id overwrite change is in place.
> 
> Just to follow up here, I am taking your suggestion about testing for dest_id 
> to
> multiple vcpus in the IOAPIC, and it seems to be working so far.  However, 
> that
> is not enough to fix kdump.  We also need to remove the BSP assumptions in the
> i8254 code, otherwise we will never even attempt to deliver the interrupt to
> !vcpu0.

But the limitation is only in ioapic_deliver. vcpu0 will process the
hrtimer interrupt on its entry path (through __inject_pit_timer_intr),
call into the ioapic code via kvm_set_irq, and follow the normal apic
interrupt delivery path (which includes waking up the target vcpu(s)).

> That means that we need code for Avi's suggestion about kvm_set_irq for
> this to all work.

What happens now is the kvm_set_irq work is "scheduled" to execute in
vcpu0 entry context.

Regarding KVM_REQ_PENDING_TIMER, it is implicitly used in
vcpu_enter_guest here:

if (vcpu->requests || need_resched() || signal_pending(current))

> Attached is the patch that I'm currently testing if anyone else wants to look 
> at
> it.  I obviously need to break it up into a nice series, but that will have to
> wait until next week.  My "ioapic_dest_multiple_cpus()" function could do 
> with a
> "popcnt", but I couldn't find a macro after a quick look around the tree.
> What's the recommended way to count bits in Linux?

There's a hweight_long helper.

Hum, suppose its safe to limit the irq0->bsp override to ioapic delivery

Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-30 Thread David S. Ahern

On 10/30/2009 09:28 AM, Chris Lalancette wrote:
>>
>> For VMX, the guests TSC's are now usually synchronized (which was not
>> the case before, when this patch was written). For AMD, they start out
>> of sync.
>>
>> Migration also causes them to be out of sync. So i'd prefer to postpone
>> the removal of this limitation until there is a guarantee the TSCs 
>> are synchronized across vcpus.
> 
> OK.  I'll try to get on an unsynchronized TSC machine and see if I can 
> reproduce
> there.

Perhaps Marcelo was referring to his patchset from December 2008
targeting TSC synchronization?

http://www.mail-archive.com/kvm@vger.kernel.org/msg08071.html

David
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-30 Thread Chris Lalancette
Marcelo Tosatti wrote:
> On Fri, Oct 30, 2009 at 01:23:57PM +0100, Chris Lalancette wrote:
>> Marcelo Tosatti wrote:
>>> On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
 Marcelo Tosatti wrote:
> On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
>> This patch is the main point of the series.  In order for
>> kdump to properly work inside a KVM guest, we need to make
>> sure that all VCPUs in virtual wire APIC mode get kicked
>> to try and pick up the timer interrupts.  To do this,
>> we iterate over the CPUs and deliver interrupts to the
>> proper VCPUs.
>>
>> I don't love the concept of doing kvm_irq_kick_vcpus() from
>> within pit_timer_fn().  A PIT is not connected to a CPU at all,
>> only to a PIC or APIC.  However, if a CPU enters idle, this is
>> the only way to wake it up to check for the interrupt.
> The reason the PIT interrupt was fixed to BSP is:
>
> http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html
>
> Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the   
> destination overwrite in case its programmed by the guest to 
> a single CPU would fix it?
 Ug, nasty.  I definitely don't want to re-introduce that bug.  What 
 exactly do
 you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, or 
 fix
 the kdump issue?
>>> The kdump issue. Something like:
>>>
>>> #ifdef CONFIG_X86
>>> /* Always delivery PIT interrupt to vcpu 0 */
>>> if (irq == 0 && dest_multiple_vcpus(entry)) {
>>> irqe.dest_mode = 0; /* Physical mode. */
>>> /* need to read apic_id from apic regiest since
>>>  * it can be rewritten */
>>> irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
>>> }
>>> #endif
>>>
>>> The rest of the code should be ready to deal with it.
>> Do you have any more information about how to reproduce that original issue? 
>>  I
>> tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to
>> install and boot it just fine with or without the "always deliver irq 0 to 
>> vcpu
>> 0" check above.  Either there's something in particular I need to do to 
>> trigger
>> the issue (hardware or software), or it's being solved another way.
> 
> For VMX, the guests TSC's are now usually synchronized (which was not
> the case before, when this patch was written). For AMD, they start out
> of sync.
> 
> Migration also causes them to be out of sync. So i'd prefer to postpone
> the removal of this limitation until there is a guarantee the TSCs 
> are synchronized across vcpus.

OK.  I'll try to get on an unsynchronized TSC machine and see if I can reproduce
there.

> 
>> In the meantime, I've gotten the "set_irq from IRQ context" that Avi 
>> suggested
>> working, and the fixing up of this IOAPIC check is the last bit to actually 
>> get
>> kdump working.
> 
> The kdump fix does not require that, I believe (although it is
> useful/required for other things, please send the patch separately).
> 
> The hrtimer will fire on vcpu0, then:
> 
> kvm_inject_pit_interrupts -> ioapic_set_irq -> apic_deliver ->
> kvm_vcpu_kick(vcpuN)
> 
> As long as the dest-id = vcpu_id overwrite change is in place.

Just to follow up here, I am taking your suggestion about testing for dest_id to
multiple vcpus in the IOAPIC, and it seems to be working so far.  However, that
is not enough to fix kdump.  We also need to remove the BSP assumptions in the
i8254 code, otherwise we will never even attempt to deliver the interrupt to
!vcpu0.  That means that we need code for Avi's suggestion about kvm_set_irq for
this to all work.

Attached is the patch that I'm currently testing if anyone else wants to look at
it.  I obviously need to break it up into a nice series, but that will have to
wait until next week.  My "ioapic_dest_multiple_cpus()" function could do with a
"popcnt", but I couldn't find a macro after a quick look around the tree.
What's the recommended way to count bits in Linux?

-- 
Chris Lalancette


current.patch
Description: application/mbox


Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-30 Thread Marcelo Tosatti
On Fri, Oct 30, 2009 at 01:23:57PM +0100, Chris Lalancette wrote:
> Marcelo Tosatti wrote:
> > On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
> >> Marcelo Tosatti wrote:
> >>> On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
>  This patch is the main point of the series.  In order for
>  kdump to properly work inside a KVM guest, we need to make
>  sure that all VCPUs in virtual wire APIC mode get kicked
>  to try and pick up the timer interrupts.  To do this,
>  we iterate over the CPUs and deliver interrupts to the
>  proper VCPUs.
> 
>  I don't love the concept of doing kvm_irq_kick_vcpus() from
>  within pit_timer_fn().  A PIT is not connected to a CPU at all,
>  only to a PIC or APIC.  However, if a CPU enters idle, this is
>  the only way to wake it up to check for the interrupt.
> >>> The reason the PIT interrupt was fixed to BSP is:
> >>>
> >>> http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html
> >>>
> >>> Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the   
> >>> destination overwrite in case its programmed by the guest to 
> >>> a single CPU would fix it?
> >> Ug, nasty.  I definitely don't want to re-introduce that bug.  What 
> >> exactly do
> >> you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, or 
> >> fix
> >> the kdump issue?
> > 
> > The kdump issue. Something like:
> > 
> > #ifdef CONFIG_X86
> > /* Always delivery PIT interrupt to vcpu 0 */
> > if (irq == 0 && dest_multiple_vcpus(entry)) {
> > irqe.dest_mode = 0; /* Physical mode. */
> > /* need to read apic_id from apic regiest since
> >  * it can be rewritten */
> > irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
> > }
> > #endif
> > 
> > The rest of the code should be ready to deal with it.
> 
> Do you have any more information about how to reproduce that original issue?  
> I
> tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to
> install and boot it just fine with or without the "always deliver irq 0 to 
> vcpu
> 0" check above.  Either there's something in particular I need to do to 
> trigger
> the issue (hardware or software), or it's being solved another way.

For VMX, the guests TSC's are now usually synchronized (which was not
the case before, when this patch was written). For AMD, they start out
of sync.

Migration also causes them to be out of sync. So i'd prefer to postpone
the removal of this limitation until there is a guarantee the TSCs 
are synchronized across vcpus.

> In the meantime, I've gotten the "set_irq from IRQ context" that Avi suggested
> working, and the fixing up of this IOAPIC check is the last bit to actually 
> get
> kdump working.

The kdump fix does not require that, I believe (although it is
useful/required for other things, please send the patch separately).

The hrtimer will fire on vcpu0, then:

kvm_inject_pit_interrupts -> ioapic_set_irq -> apic_deliver ->
kvm_vcpu_kick(vcpuN)

As long as the dest-id = vcpu_id overwrite change is in place.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-30 Thread Chris Lalancette
Marcelo Tosatti wrote:
> On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
>> Marcelo Tosatti wrote:
>>> On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
 This patch is the main point of the series.  In order for
 kdump to properly work inside a KVM guest, we need to make
 sure that all VCPUs in virtual wire APIC mode get kicked
 to try and pick up the timer interrupts.  To do this,
 we iterate over the CPUs and deliver interrupts to the
 proper VCPUs.

 I don't love the concept of doing kvm_irq_kick_vcpus() from
 within pit_timer_fn().  A PIT is not connected to a CPU at all,
 only to a PIC or APIC.  However, if a CPU enters idle, this is
 the only way to wake it up to check for the interrupt.
>>> The reason the PIT interrupt was fixed to BSP is:
>>>
>>> http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html
>>>
>>> Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the   
>>> destination overwrite in case its programmed by the guest to 
>>> a single CPU would fix it?
>> Ug, nasty.  I definitely don't want to re-introduce that bug.  What exactly 
>> do
>> you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, or 
>> fix
>> the kdump issue?
> 
> The kdump issue. Something like:
> 
> #ifdef CONFIG_X86
> /* Always delivery PIT interrupt to vcpu 0 */
> if (irq == 0 && dest_multiple_vcpus(entry)) {
> irqe.dest_mode = 0; /* Physical mode. */
> /* need to read apic_id from apic regiest since
>  * it can be rewritten */
> irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
> }
> #endif
> 
> The rest of the code should be ready to deal with it.

Do you have any more information about how to reproduce that original issue?  I
tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to
install and boot it just fine with or without the "always deliver irq 0 to vcpu
0" check above.  Either there's something in particular I need to do to trigger
the issue (hardware or software), or it's being solved another way.

In the meantime, I've gotten the "set_irq from IRQ context" that Avi suggested
working, and the fixing up of this IOAPIC check is the last bit to actually get
kdump working.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-28 Thread Marcelo Tosatti
On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
> Marcelo Tosatti wrote:
> > On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
> >> This patch is the main point of the series.  In order for
> >> kdump to properly work inside a KVM guest, we need to make
> >> sure that all VCPUs in virtual wire APIC mode get kicked
> >> to try and pick up the timer interrupts.  To do this,
> >> we iterate over the CPUs and deliver interrupts to the
> >> proper VCPUs.
> >>
> >> I don't love the concept of doing kvm_irq_kick_vcpus() from
> >> within pit_timer_fn().  A PIT is not connected to a CPU at all,
> >> only to a PIC or APIC.  However, if a CPU enters idle, this is
> >> the only way to wake it up to check for the interrupt.
> > 
> > The reason the PIT interrupt was fixed to BSP is:
> > 
> > http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html
> > 
> > Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the   
> > destination overwrite in case its programmed by the guest to 
> > a single CPU would fix it?
> 
> Ug, nasty.  I definitely don't want to re-introduce that bug.  What exactly do
> you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, or fix
> the kdump issue?

The kdump issue. Something like:

#ifdef CONFIG_X86
/* Always delivery PIT interrupt to vcpu 0 */
if (irq == 0 && dest_multiple_vcpus(entry)) {
irqe.dest_mode = 0; /* Physical mode. */
/* need to read apic_id from apic regiest since
 * it can be rewritten */
irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
}
#endif

The rest of the code should be ready to deal with it.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-28 Thread Chris Lalancette
Marcelo Tosatti wrote:
> On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
>> This patch is the main point of the series.  In order for
>> kdump to properly work inside a KVM guest, we need to make
>> sure that all VCPUs in virtual wire APIC mode get kicked
>> to try and pick up the timer interrupts.  To do this,
>> we iterate over the CPUs and deliver interrupts to the
>> proper VCPUs.
>>
>> I don't love the concept of doing kvm_irq_kick_vcpus() from
>> within pit_timer_fn().  A PIT is not connected to a CPU at all,
>> only to a PIC or APIC.  However, if a CPU enters idle, this is
>> the only way to wake it up to check for the interrupt.
> 
> The reason the PIT interrupt was fixed to BSP is:
> 
> http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html
> 
> Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the   
> destination overwrite in case its programmed by the guest to 
> a single CPU would fix it?

Ug, nasty.  I definitely don't want to re-introduce that bug.  What exactly do
you mean by "fix it"?  Do you mean fix the original RHEL-5.1 PAE issue, or fix
the kdump issue?

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-27 Thread Marcelo Tosatti
On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
> This patch is the main point of the series.  In order for
> kdump to properly work inside a KVM guest, we need to make
> sure that all VCPUs in virtual wire APIC mode get kicked
> to try and pick up the timer interrupts.  To do this,
> we iterate over the CPUs and deliver interrupts to the
> proper VCPUs.
> 
> I don't love the concept of doing kvm_irq_kick_vcpus() from
> within pit_timer_fn().  A PIT is not connected to a CPU at all,
> only to a PIC or APIC.  However, if a CPU enters idle, this is
> the only way to wake it up to check for the interrupt.

The reason the PIT interrupt was fixed to BSP is:

http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html

Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the   
destination overwrite in case its programmed by the guest to 
a single CPU would fix it?



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] Fix kdump under KVM.

2009-10-27 Thread Chris Lalancette
This patch is the main point of the series.  In order for
kdump to properly work inside a KVM guest, we need to make
sure that all VCPUs in virtual wire APIC mode get kicked
to try and pick up the timer interrupts.  To do this,
we iterate over the CPUs and deliver interrupts to the
proper VCPUs.

I don't love the concept of doing kvm_irq_kick_vcpus() from
within pit_timer_fn().  A PIT is not connected to a CPU at all,
only to a PIC or APIC.  However, if a CPU enters idle, this is
the only way to wake it up to check for the interrupt.

Signed-off-by: Chris Lalancette 
---
:100644 100644 d5c08fa... fe08823... M  arch/x86/kvm/i8254.c
:100644 100644 40f... 5b699c1... M  arch/x86/kvm/lapic.c
:100644 100644 40010b0... 9c4e52b... M  arch/x86/kvm/lapic.h
:100644 100644 053e49f... 975b0d6... M  include/linux/kvm_host.h
:100644 100644 cd6f92b... 717d265... M  virt/kvm/ioapic.c
:100644 100644 0d454d3... d24ac91... M  virt/kvm/irq_comm.c
 arch/x86/kvm/i8254.c |3 +--
 arch/x86/kvm/lapic.c |   12 
 arch/x86/kvm/lapic.h |1 +
 include/linux/kvm_host.h |2 ++
 virt/kvm/ioapic.c|9 -
 virt/kvm/irq_comm.c  |   12 
 6 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index d5c08fa..fe08823 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -299,8 +299,7 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer 
*data)
if (ktimer->reinject || !atomic_read(&ktimer->pending))
atomic_inc(&ktimer->pending);
 
-   if (waitqueue_active(&ktimer->kvm->bsp_vcpu->wq))
-   wake_up_interruptible(&ktimer->kvm->bsp_vcpu->wq);
+   kvm_irq_kick_vcpus(ktimer->kvm);
 
if (ktimer->t_ops->is_periodic(ktimer)) {
hrtimer_add_expires_ns(&ktimer->timer, ktimer->period);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 40f..5b699c1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1031,6 +1031,18 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
kvm_apic_local_deliver(apic, APIC_LVT0);
 }
 
+int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu)
+{
+   if (kvm_lapic_enabled(vcpu)) {
+   u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
+   if ((lvt0 & APIC_LVT_MASKED) == 0 &&
+   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+   return 1;
+   }
+
+   return 0;
+}
+
 static struct kvm_timer_ops lapic_timer_ops = {
.is_periodic = lapic_is_periodic,
 };
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 40010b0..9c4e52b 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -30,6 +30,7 @@ void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long 
cr8);
 void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);
+int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu);
 
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 053e49f..975b0d6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -542,6 +542,8 @@ int kvm_set_irq_routing(struct kvm *kvm,
unsigned flags);
 void kvm_free_irq_routing(struct kvm *kvm);
 
+void kvm_irq_kick_vcpus(struct kvm *kvm);
+
 #else
 
 static inline void kvm_free_irq_routing(struct kvm *kvm) {}
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index cd6f92b..717d265 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -163,15 +163,6 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
irqe.level = 1;
irqe.shorthand = 0;
 
-#ifdef CONFIG_X86
-   /* Always deliver PIT interrupt to vcpu 0 */
-   if (irq == 0) {
-   irqe.dest_mode = 0; /* Physical mode. */
-   /* need to read apic_id from apic regiest since
-* it can be rewritten */
-   irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id;
-   }
-#endif
return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
 }
 
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 0d454d3..d24ac91 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -293,6 +293,18 @@ void kvm_free_irq_routing(struct kvm *kvm)
kfree(kvm->irq_routing);
 }
 
+void kvm_irq_kick_vcpus(struct kvm *kvm)
+{
+   int i;
+   struct kvm_vcpu *vcpu;
+
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   if (kvm_apic_in_virtual_wire_mode(vcpu))
+   if (waitqueue_active(&vcpu->wq))
+   wake_up_interruptible(&vcpu->wq);
+   }
+}
+
 static int setup_routing_entry(struct kvm_irq_routing_table *rt,
   struct kvm_kernel_irq_routing_entry *e,