Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-08-09 Thread Gleb Natapov
On Sun, Aug 09, 2009 at 06:10:07PM +0300, Avi Kivity wrote:
> On 08/09/2009 05:57 PM, Gleb Natapov wrote:
>> On Sun, Aug 09, 2009 at 05:54:30PM +0300, Avi Kivity wrote:
>>
>>> On 08/09/2009 03:41 PM, Gleb Natapov wrote:
>>>  
 Introduce new function kvm_notifier_set_irq() that should be used
 to change irq line level from irq notifiers. When irq notifier
 change irq line level it calls into irq chip code recursively. The
 function avoids taking a lock recursively.


>>> This looks really horrible.  I don't have an alternative yet, but I'll
>>> think of one.
>>>  
>> I agree this is not nice. This is needed only for device assignment
>> case. That explains why I don't like device assignment :)
>
> Well, implementation problems can be fixed.  Other issues with device  
> assignment cannot.
>
>> The problem
>> is that the only communication channel from guest to assigned device that
>> goes through the host is interrupt injection/acknowledgement, so we try to
>> do things (lowering IRQ) on this path that usually are done somewhere else.
>>
>
> You can queue the injection somehow (work struct? special purpose queue  
> examined after unlock?) and avoid the recursive locking.
>
We can't. Line status should be update here and now. Otherwise interrupt
will be immediately reinjected. 

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-08-09 Thread Avi Kivity

On 08/09/2009 05:57 PM, Gleb Natapov wrote:

On Sun, Aug 09, 2009 at 05:54:30PM +0300, Avi Kivity wrote:
   

On 08/09/2009 03:41 PM, Gleb Natapov wrote:
 

Introduce new function kvm_notifier_set_irq() that should be used
to change irq line level from irq notifiers. When irq notifier
change irq line level it calls into irq chip code recursively. The
function avoids taking a lock recursively.

   

This looks really horrible.  I don't have an alternative yet, but I'll
think of one.
 

I agree this is not nice. This is needed only for device assignment
case. That explains why I don't like device assignment :)


Well, implementation problems can be fixed.  Other issues with device 
assignment cannot.



The problem
is that the only communication channel from guest to assigned device that
goes through the host is interrupt injection/acknowledgement, so we try to
do things (lowering IRQ) on this path that usually are done somewhere else.
   


You can queue the injection somehow (work struct? special purpose queue 
examined after unlock?) and avoid the recursive locking.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-08-09 Thread Gleb Natapov
On Sun, Aug 09, 2009 at 05:54:30PM +0300, Avi Kivity wrote:
> On 08/09/2009 03:41 PM, Gleb Natapov wrote:
>> Introduce new function kvm_notifier_set_irq() that should be used
>> to change irq line level from irq notifiers. When irq notifier
>> change irq line level it calls into irq chip code recursively. The
>> function avoids taking a lock recursively.
>>
>
> This looks really horrible.  I don't have an alternative yet, but I'll  
> think of one.
I agree this is not nice. This is needed only for device assignment
case. That explains why I don't like device assignment :) The problem
is that the only communication channel from guest to assigned device that
goes through the host is interrupt injection/acknowledgement, so we try to
do things (lowering IRQ) on this path that usually are done somewhere else.
 
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-08-09 Thread Avi Kivity

On 08/09/2009 03:41 PM, Gleb Natapov wrote:

Introduce new function kvm_notifier_set_irq() that should be used
to change irq line level from irq notifiers. When irq notifier
change irq line level it calls into irq chip code recursively. The
function avoids taking a lock recursively.
   


This looks really horrible.  I don't have an alternative yet, but I'll 
think of one.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-07-17 Thread Marcelo Tosatti
On Fri, Jul 17, 2009 at 08:32:00PM +0300, Gleb Natapov wrote:
> > There are a few issues that kvm_set_irq return code cannot handle,
> > AFAICS (please correct me if i'm wrong, or provide useful points for
> > discussion).
> > 
> > case 1)
> > 1. kvm_set_irq, delivered OK (IOW, assume guest has acked the
> > interrupt, next_timer_event += timer_period).
> > 2. hlt -> kvm_vcpu_block thinks it can sleep, because next injectable
> > timer event is the future.
> > 3. which might not be the case, since we are not certain the guest 
> > has acked the interrupt.
> > 
> > case 2)
> > 1. kvm_set_irq, delivered OK.
> > 2. guest masks irq.
> > 3. here you do no want to unhalt the vcpu on the next timer event,
> > because irq is masked at irqchip level.
> > 
> > So you need both ack notification and mask notification for proper
> > emulation of UNHALT. Could check whether the interrupt is masked on
> > injection and avoid UNHALT for case 2), though.
> > 
> > But to check that, you need to look into both PIC and IOAPIC pins, which 
> > can happen like:
> > 
> > pic_lock
> > check if pin is masked
> > pic_unlock
> > 
> > ioapic_lock
> > check if pin is masked
> > ioapic_unlock
> And what if interrupts are blocked by vcpu? You don't event know what
> cpu will receive the interrupt (may be several?).

Timers get bound to vcpus. If the IOAPIC is programmed to multiple
vcpus, you bind one timer instance to each target vcpu. So it will need
"target vcpu notifiers" (good point).

> > And only then, if interrupt can reach the processor, UNHALT the vcpu.
> > This could avoid the need to check for PIC/IOAPIC state inside the PIT /
> > other timers mask notifiers (regarding the current locking discussion).
> > 
> Timer is just a device that generates interrupt. You don't check in
> a device emulation code if CPU is halted and should be unhalted and
> timer code shouldn't do that either. The only check that decide if vcpu
> should be unhalted is in kvm_arch_vcpu_runnable() and it goes like this:
>   if there is interrupt to inject and interrupt are not masked by vcpu or
>   there is nmi to inject then unhalt vcpu.
> So what the timer code should do is to call kvm_set_irq(1) and if
> interrupt (or nmi) will be delivered to vcpu as a result of this the
> above check will unhalt vcpu.

OK, can use kvm_arch_vcpu_runnable in the comparison.

> If you want to decide in timer code what should be done to vcpu as a
> result of a timer event you'll have to reimplement entire pic/ioapic/lapic
> logic inside it. What if timer interrupt configured to send INIT to vcpu?

It'll go through kvm_set_irq (expect for LAPIC timer).

> Both cases you described are problematic only because you are trying to
> make interrupt code too smart.
> 
> > Those checks can race with another processor unmasking the pin for 
> > the irqchips, but then its a guest problem.
> Sane OSse serialize access to pic/ioapic. I think we can assume this in
> our code.
> 
> > 
> > You still need the mask notifier to be able to reset the reinjection
> > logic on unmask (see 4780c65904f0fc4e312ee2da9383eacbe04e61ea). Is that 
> > the case thats broken with RTC in QEMU?
> > 
> Yes, interrupt masking is broken with RTC in QEMU, but only because QEMU
> wanted it like this. In KVM if kvm_set_irq() returns zero it means that
> interrupt was masked so reinjection logic can be reset.
> 
> > Also, the fact that you receive an ack notification allows you to make 
> > decisions of behaviour between injection and ack. You can also measure
> > the delay between these two events, which is not used at the moment
> > but is potentially useful information.
> You can use ack notifier for this, but it doesn't require to call back
> into pic/ioapic code.

Agreed.

> > 
> > You do not have that information on AEOI style interrupts (such as timer 
> > delivery via PIT->LVT0 in NMI mode or PIC AEOI), but there's nothing 
> > you can to help such cases.
> > 
> > > > (it could check whether a GSI is masked or not on both PIC/IOAPIC/LAPIC
> > > > from a different context other than mask notifier).
> > > > 
> > > > > But I think I found the way to not drop the lock here.
> > > > 
> > > > See, this is adding more complexity than simplifying things (the
> > > > recursive check).
> > > This is not more complex than current code. (IMHO)
> > 
> > Maybe there is a better way to get rid of that recursion check, maybe 
> > move the notifiers to be called after internal irqchip state has been
> > updated, outside the lock?
> > 
> Impressible (IOW I don't see how it could be done). The ack notifier is
> the one who drives line low, so it should be called before irr is
> checked for pending interrupts. Basically the same scenario I described
> for pic in my previous mail.

Flargunbastic. You're right.

> > > had a chance to clear interrupt condition in a device. Or do I miss
> > > something here?
> > 
> > No, it looks wrong. The ack notification should be inside pic_clear_isr.
> > 
> So you agree with me that the cu

Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-07-17 Thread Gleb Natapov
On Fri, Jul 17, 2009 at 12:19:58PM -0300, Marcelo Tosatti wrote:
> On Thu, Jul 16, 2009 at 11:49:35PM +0300, Gleb Natapov wrote:
> > On Thu, Jul 16, 2009 at 03:09:30PM -0300, Marcelo Tosatti wrote:
> > > On Thu, Jul 16, 2009 at 10:51:16AM +0300, Gleb Natapov wrote:
> > > > On Wed, Jul 15, 2009 at 06:38:48PM -0300, Marcelo Tosatti wrote:
> > > > > On Wed, Jul 15, 2009 at 11:48:17PM +0300, Gleb Natapov wrote:
> > > > > > > > +   spin_unlock(&ioapic->lock);
> > > > > > > > +   kvm_notify_acked_irq(ioapic->kvm, 
> > > > > > > > KVM_IRQCHIP_IOAPIC, i);
> > > > > > > > +   spin_lock(&ioapic->lock);
> > > > > > > 
> > > > > > > While traversing the IOAPIC pins you drop the lock and acquire it 
> > > > > > > again,
> > > > > > > which is error prone: what if the redirection table is changed in
> > > > > > > between, for example?
> > > > > > > 
> > > > > > Yes, the ack notifiers is a problematic part. The only thing that
> > > > > > current ack notifiers do is set_irq_level(0) so this is not the 
> > > > > > problem
> > > > > > in practice. I'll see if I can eliminate this dropping of the lock 
> > > > > > here.
> > > > > 
> > > > > Currently, yes. But take into account that an ack notifier might do 
> > > > > other things than set_irq_level(0).
> > > > > 
> > > > For instance? Ack notifier can do whatever it wants if it does not call
> > > > back into ioapic. It may call to ioapic only by set_irq_level(0) (which
> > > > it does now from device assignment code) or by set_irq_level(1) and this
> > > > does not make sense for level triggered interrupts because not calling
> > > > set_irq_level(1) will have the same effect.
> > > 
> > > Checking whether a given gsi is masked on the
> > > PIC/IOAPIC from the PIC/IOAPIC mask notifiers
> > > (http://www.spinics.net/lists/kvm/msg19093.html), for example. 
> > > 
> > > Do you think that particular case should be handled in a different way?
> > I dislike ack notifiers and think that the should not be used if
> > possible. kvm_set_irq() return value provide you enough info to know it
> > interrupt was delivered or masked so the pit case can (and should) be
> > handled without using irq notifiers at all. See how RTC interrupt
> > coalescing is handled in qemu (it does not handle masking correctly, but
> > only because qemu community enjoys having their code broken).
> 
> There are a few issues that kvm_set_irq return code cannot handle,
> AFAICS (please correct me if i'm wrong, or provide useful points for
> discussion).
> 
> case 1)
> 1. kvm_set_irq, delivered OK (IOW, assume guest has acked the
> interrupt, next_timer_event += timer_period).
> 2. hlt -> kvm_vcpu_block thinks it can sleep, because next injectable
> timer event is the future.
> 3. which might not be the case, since we are not certain the guest 
> has acked the interrupt.
> 
> case 2)
> 1. kvm_set_irq, delivered OK.
> 2. guest masks irq.
> 3. here you do no want to unhalt the vcpu on the next timer event,
> because irq is masked at irqchip level.
> 
> So you need both ack notification and mask notification for proper
> emulation of UNHALT. Could check whether the interrupt is masked on
> injection and avoid UNHALT for case 2), though.
> 
> But to check that, you need to look into both PIC and IOAPIC pins, which 
> can happen like:
> 
> pic_lock
> check if pin is masked
> pic_unlock
> 
> ioapic_lock
> check if pin is masked
> ioapic_unlock
And what if interrupts are blocked by vcpu? You don't event know what
cpu will receive the interrupt (may be several?).

> 
> And only then, if interrupt can reach the processor, UNHALT the vcpu.
> This could avoid the need to check for PIC/IOAPIC state inside the PIT /
> other timers mask notifiers (regarding the current locking discussion).
> 
Timer is just a device that generates interrupt. You don't check in
a device emulation code if CPU is halted and should be unhalted and
timer code shouldn't do that either. The only check that decide if vcpu
should be unhalted is in kvm_arch_vcpu_runnable() and it goes like this:
  if there is interrupt to inject and interrupt are not masked by vcpu or
  there is nmi to inject then unhalt vcpu.
So what the timer code should do is to call kvm_set_irq(1) and if
interrupt (or nmi) will be delivered to vcpu as a result of this the
above check will unhalt vcpu.

If you want to decide in timer code what should be done to vcpu as a
result of a timer event you'll have to reimplement entire pic/ioapic/lapic
logic inside it. What if timer interrupt configured to send INIT to vcpu?

Both cases you described are problematic only because you are trying to
make interrupt code too smart.

> Those checks can race with another processor unmasking the pin for 
> the irqchips, but then its a guest problem.
Sane OSse serialize access to pic/ioapic. I think we can assume this in
our code.

> 
> You still need the mask notifier to be able to reset the reinjection
> logic on unmask (see 4780c65904f0fc4e312ee2da9383eacbe04

Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-07-17 Thread Marcelo Tosatti
On Thu, Jul 16, 2009 at 11:49:35PM +0300, Gleb Natapov wrote:
> On Thu, Jul 16, 2009 at 03:09:30PM -0300, Marcelo Tosatti wrote:
> > On Thu, Jul 16, 2009 at 10:51:16AM +0300, Gleb Natapov wrote:
> > > On Wed, Jul 15, 2009 at 06:38:48PM -0300, Marcelo Tosatti wrote:
> > > > On Wed, Jul 15, 2009 at 11:48:17PM +0300, Gleb Natapov wrote:
> > > > > > > + spin_unlock(&ioapic->lock);
> > > > > > > + kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, 
> > > > > > > i);
> > > > > > > + spin_lock(&ioapic->lock);
> > > > > > 
> > > > > > While traversing the IOAPIC pins you drop the lock and acquire it 
> > > > > > again,
> > > > > > which is error prone: what if the redirection table is changed in
> > > > > > between, for example?
> > > > > > 
> > > > > Yes, the ack notifiers is a problematic part. The only thing that
> > > > > current ack notifiers do is set_irq_level(0) so this is not the 
> > > > > problem
> > > > > in practice. I'll see if I can eliminate this dropping of the lock 
> > > > > here.
> > > > 
> > > > Currently, yes. But take into account that an ack notifier might do 
> > > > other things than set_irq_level(0).
> > > > 
> > > For instance? Ack notifier can do whatever it wants if it does not call
> > > back into ioapic. It may call to ioapic only by set_irq_level(0) (which
> > > it does now from device assignment code) or by set_irq_level(1) and this
> > > does not make sense for level triggered interrupts because not calling
> > > set_irq_level(1) will have the same effect.
> > 
> > Checking whether a given gsi is masked on the
> > PIC/IOAPIC from the PIC/IOAPIC mask notifiers
> > (http://www.spinics.net/lists/kvm/msg19093.html), for example. 
> > 
> > Do you think that particular case should be handled in a different way?
> I dislike ack notifiers and think that the should not be used if
> possible. kvm_set_irq() return value provide you enough info to know it
> interrupt was delivered or masked so the pit case can (and should) be
> handled without using irq notifiers at all. See how RTC interrupt
> coalescing is handled in qemu (it does not handle masking correctly, but
> only because qemu community enjoys having their code broken).

There are a few issues that kvm_set_irq return code cannot handle,
AFAICS (please correct me if i'm wrong, or provide useful points for
discussion).

case 1)
1. kvm_set_irq, delivered OK (IOW, assume guest has acked the
interrupt, next_timer_event += timer_period).
2. hlt -> kvm_vcpu_block thinks it can sleep, because next injectable
timer event is the future.
3. which might not be the case, since we are not certain the guest 
has acked the interrupt.

case 2)
1. kvm_set_irq, delivered OK.
2. guest masks irq.
3. here you do no want to unhalt the vcpu on the next timer event,
because irq is masked at irqchip level.

So you need both ack notification and mask notification for proper
emulation of UNHALT. Could check whether the interrupt is masked on
injection and avoid UNHALT for case 2), though.

But to check that, you need to look into both PIC and IOAPIC pins, which 
can happen like:

pic_lock
check if pin is masked
pic_unlock

ioapic_lock
check if pin is masked
ioapic_unlock

And only then, if interrupt can reach the processor, UNHALT the vcpu.
This could avoid the need to check for PIC/IOAPIC state inside the PIT /
other timers mask notifiers (regarding the current locking discussion).

Those checks can race with another processor unmasking the pin for 
the irqchips, but then its a guest problem.

You still need the mask notifier to be able to reset the reinjection
logic on unmask (see 4780c65904f0fc4e312ee2da9383eacbe04e61ea). Is that 
the case thats broken with RTC in QEMU?

Also, the fact that you receive an ack notification allows you to make 
decisions of behaviour between injection and ack. You can also measure
the delay between these two events, which is not used at the moment
but is potentially useful information.

You do not have that information on AEOI style interrupts (such as timer 
delivery via PIT->LVT0 in NMI mode or PIC AEOI), but there's nothing 
you can to help such cases.

> > (it could check whether a GSI is masked or not on both PIC/IOAPIC/LAPIC
> > from a different context other than mask notifier).
> > 
> > > But I think I found the way to not drop the lock here.
> > 
> > See, this is adding more complexity than simplifying things (the
> > recursive check).
> This is not more complex than current code. (IMHO)

Maybe there is a better way to get rid of that recursion check, maybe 
move the notifiers to be called after internal irqchip state has been
updated, outside the lock?

> > > > Say for example an ack notifier that takes the PIC or IOAPIC lock 
> > > > (for whatever reason).
> > > What reason? No one should take PIC or IOAPIC lock outside of PIC or
> > > IOAPIC code. This is layering violation. IOAPIC should be accessed only
> > > through its public interface (set_irq/mmio_read|write/update_eoi)
> > 
> > See

Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-07-16 Thread Gleb Natapov
On Thu, Jul 16, 2009 at 03:09:30PM -0300, Marcelo Tosatti wrote:
> On Thu, Jul 16, 2009 at 10:51:16AM +0300, Gleb Natapov wrote:
> > On Wed, Jul 15, 2009 at 06:38:48PM -0300, Marcelo Tosatti wrote:
> > > On Wed, Jul 15, 2009 at 11:48:17PM +0300, Gleb Natapov wrote:
> > > > > > +   spin_unlock(&ioapic->lock);
> > > > > > +   kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, 
> > > > > > i);
> > > > > > +   spin_lock(&ioapic->lock);
> > > > > 
> > > > > While traversing the IOAPIC pins you drop the lock and acquire it 
> > > > > again,
> > > > > which is error prone: what if the redirection table is changed in
> > > > > between, for example?
> > > > > 
> > > > Yes, the ack notifiers is a problematic part. The only thing that
> > > > current ack notifiers do is set_irq_level(0) so this is not the problem
> > > > in practice. I'll see if I can eliminate this dropping of the lock here.
> > > 
> > > Currently, yes. But take into account that an ack notifier might do 
> > > other things than set_irq_level(0).
> > > 
> > For instance? Ack notifier can do whatever it wants if it does not call
> > back into ioapic. It may call to ioapic only by set_irq_level(0) (which
> > it does now from device assignment code) or by set_irq_level(1) and this
> > does not make sense for level triggered interrupts because not calling
> > set_irq_level(1) will have the same effect.
> 
> Checking whether a given gsi is masked on the
> PIC/IOAPIC from the PIC/IOAPIC mask notifiers
> (http://www.spinics.net/lists/kvm/msg19093.html), for example. 
> 
> Do you think that particular case should be handled in a different way?
I dislike ack notifiers and think that the should not be used if
possible. kvm_set_irq() return value provide you enough info to know it
interrupt was delivered or masked so the pit case can (and should) be
handled without using irq notifiers at all. See how RTC interrupt
coalescing is handled in qemu (it does not handle masking correctly, but
only because qemu community enjoys having their code broken).

> (it could check whether a GSI is masked or not on both PIC/IOAPIC/LAPIC
> from a different context other than mask notifier).
> 
> > But I think I found the way to not drop the lock here.
> 
> See, this is adding more complexity than simplifying things (the
> recursive check).
This is not more complex than current code. (IMHO)

> 
> > > Say for example an ack notifier that takes the PIC or IOAPIC lock 
> > > (for whatever reason).
> > What reason? No one should take PIC or IOAPIC lock outside of PIC or
> > IOAPIC code. This is layering violation. IOAPIC should be accessed only
> > through its public interface (set_irq/mmio_read|write/update_eoi)
> 
> See link above.
I can't see how the code from link above can work with current code
since mask notifiers are not called from PIC.

> 
> > > 
> > > > > Also, irq_lock was held during ack and mask notifier callbacks, so 
> > > > > they
> > > > > (the callbacks) did not have to worry about concurrency.
> > > > > 
> > > > Is it possible to have more then one ack for the same interrupt line at
> > > > the same time?
> > > 
> > > Why not? But maybe this is a somewhat stupid point, because you can
> > > require the notifiers to handle that.
> > If this is possible (and it looks like it may happen if IOAPIC broadcasts
> > an interrupt vector to more then one vcpu) then it may be better to push
> > complexity into an ack notifier to not penalize a common scenario.
> > But again, if we will not drop the lock around ack notifier call they
> > will be serialized.
> > 
> > > 
> > > > > You questioned the purpose of irq_lock earlier, and thats what it is:
> > > > > synchronization between pic and ioapic blur at the irq notifiers.
> > > > > 
> > > > Pic has its own locking and it fails miserably in regards ack notifiers.
> > > > I bet nobody tried device assignment with pic. It will not work.
> > > 
> > > It fails to take irq_lock on automatic EOI because on guest entry 
> > > interrupts are disabled (see d5ecfdd25c412df9c0ecf3ab8e066461fd4f69df).
> > This explains why the code is buggy, but does not fix the bug. There are
> > two bugs at least with the pic + ack notifiers. The first one is that
> > irq notifiers are called without irq_lock held and that will trigger
> > WARN_ON(!mutex_is_locked(&kvm->irq_lock)) in kvm_set_irq() in device
> > assignment case (besides what is the point of a lock if it is not taken
> > on every code path?).
> 
> This could be fixed by switching irq_lock to a spinlock.
Yes, but then you want to minimize the amount of code that runs under a
spinlock.

> 
> >  Another one is that you can't postpone call to ack notifiers in
> > kvm_pic_read_irq() till after pic_update_irq(s). The reason is that
> > pic_update_irq() will trigger acked interrupt immediately since ack
> > notifier is the one who lowers irq line in device assignment case
> > (that is the reason I haven't done the same in ioapic case BTW).
> 
> I'm not sure i understa

Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-07-16 Thread Marcelo Tosatti
On Thu, Jul 16, 2009 at 10:51:16AM +0300, Gleb Natapov wrote:
> On Wed, Jul 15, 2009 at 06:38:48PM -0300, Marcelo Tosatti wrote:
> > On Wed, Jul 15, 2009 at 11:48:17PM +0300, Gleb Natapov wrote:
> > > > > + spin_unlock(&ioapic->lock);
> > > > > + kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, 
> > > > > i);
> > > > > + spin_lock(&ioapic->lock);
> > > > 
> > > > While traversing the IOAPIC pins you drop the lock and acquire it again,
> > > > which is error prone: what if the redirection table is changed in
> > > > between, for example?
> > > > 
> > > Yes, the ack notifiers is a problematic part. The only thing that
> > > current ack notifiers do is set_irq_level(0) so this is not the problem
> > > in practice. I'll see if I can eliminate this dropping of the lock here.
> > 
> > Currently, yes. But take into account that an ack notifier might do 
> > other things than set_irq_level(0).
> > 
> For instance? Ack notifier can do whatever it wants if it does not call
> back into ioapic. It may call to ioapic only by set_irq_level(0) (which
> it does now from device assignment code) or by set_irq_level(1) and this
> does not make sense for level triggered interrupts because not calling
> set_irq_level(1) will have the same effect.

Checking whether a given gsi is masked on the
PIC/IOAPIC from the PIC/IOAPIC mask notifiers
(http://www.spinics.net/lists/kvm/msg19093.html), for example. 

Do you think that particular case should be handled in a different way?
(it could check whether a GSI is masked or not on both PIC/IOAPIC/LAPIC
from a different context other than mask notifier).

> But I think I found the way to not drop the lock here.

See, this is adding more complexity than simplifying things (the
recursive check).

> > Say for example an ack notifier that takes the PIC or IOAPIC lock 
> > (for whatever reason).
> What reason? No one should take PIC or IOAPIC lock outside of PIC or
> IOAPIC code. This is layering violation. IOAPIC should be accessed only
> through its public interface (set_irq/mmio_read|write/update_eoi)

See link above.

> > 
> > > > Also, irq_lock was held during ack and mask notifier callbacks, so they
> > > > (the callbacks) did not have to worry about concurrency.
> > > > 
> > > Is it possible to have more then one ack for the same interrupt line at
> > > the same time?
> > 
> > Why not? But maybe this is a somewhat stupid point, because you can
> > require the notifiers to handle that.
> If this is possible (and it looks like it may happen if IOAPIC broadcasts
> an interrupt vector to more then one vcpu) then it may be better to push
> complexity into an ack notifier to not penalize a common scenario.
> But again, if we will not drop the lock around ack notifier call they
> will be serialized.
> 
> > 
> > > > You questioned the purpose of irq_lock earlier, and thats what it is:
> > > > synchronization between pic and ioapic blur at the irq notifiers.
> > > > 
> > > Pic has its own locking and it fails miserably in regards ack notifiers.
> > > I bet nobody tried device assignment with pic. It will not work.
> > 
> > It fails to take irq_lock on automatic EOI because on guest entry 
> > interrupts are disabled (see d5ecfdd25c412df9c0ecf3ab8e066461fd4f69df).
> This explains why the code is buggy, but does not fix the bug. There are
> two bugs at least with the pic + ack notifiers. The first one is that
> irq notifiers are called without irq_lock held and that will trigger
> WARN_ON(!mutex_is_locked(&kvm->irq_lock)) in kvm_set_irq() in device
> assignment case (besides what is the point of a lock if it is not taken
> on every code path?).

This could be fixed by switching irq_lock to a spinlock.

>  Another one is that you can't postpone call to ack notifiers in
> kvm_pic_read_irq() till after pic_update_irq(s). The reason is that
> pic_update_irq() will trigger acked interrupt immediately since ack
> notifier is the one who lowers irq line in device assignment case
> (that is the reason I haven't done the same in ioapic case BTW).

I'm not sure i understand this one, can you be more verbose please?

> > > irq_lock is indeed used to synchronization ioapic access, but the way
> > > it is done requires calling kvm_set_irq() with lock held and this adds
> > > unnecessary lock on a msi irq injection path.
> > > 
> > > > Using RCU to synchronize ack/mask notifier list walk versus list
> > > > addition is good, but i'm not entirely sure that smaller locks like you
> > > > are proposing is a benefit long term (things become much more tricky).
> > > Without removing irq_lock RCU is useless since the list walking is always
> > > done with irq_lock held. I see smaller locks as simplification BTW. The
> > > thing is tricky now, or so I felt while trying to understand what
> > > irq_lock actually protects. With smaller locks with names that clearly
> > > indicates which data structure it protects I feel the code is much
> > > easy to understand.
> > 
> > What is

Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-07-16 Thread Gleb Natapov
On Wed, Jul 15, 2009 at 06:38:48PM -0300, Marcelo Tosatti wrote:
> On Wed, Jul 15, 2009 at 11:48:17PM +0300, Gleb Natapov wrote:
> > > > +   spin_unlock(&ioapic->lock);
> > > > +   kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, 
> > > > i);
> > > > +   spin_lock(&ioapic->lock);
> > > 
> > > While traversing the IOAPIC pins you drop the lock and acquire it again,
> > > which is error prone: what if the redirection table is changed in
> > > between, for example?
> > > 
> > Yes, the ack notifiers is a problematic part. The only thing that
> > current ack notifiers do is set_irq_level(0) so this is not the problem
> > in practice. I'll see if I can eliminate this dropping of the lock here.
> 
> Currently, yes. But take into account that an ack notifier might do 
> other things than set_irq_level(0).
> 
For instance? Ack notifier can do whatever it wants if it does not call
back into ioapic. It may call to ioapic only by set_irq_level(0) (which
it does now from device assignment code) or by set_irq_level(1) and this
does not make sense for level triggered interrupts because not calling
set_irq_level(1) will have the same effect.

But I think I found the way to not drop the lock here.

> Say for example an ack notifier that takes the PIC or IOAPIC lock 
> (for whatever reason).
What reason? No one should take PIC or IOAPIC lock outside of PIC or
IOAPIC code. This is layering violation. IOAPIC should be accessed only
through its public interface (set_irq/mmio_read|write/update_eoi)

> 
> > > Also, irq_lock was held during ack and mask notifier callbacks, so they
> > > (the callbacks) did not have to worry about concurrency.
> > > 
> > Is it possible to have more then one ack for the same interrupt line at
> > the same time?
> 
> Why not? But maybe this is a somewhat stupid point, because you can
> require the notifiers to handle that.
If this is possible (and it looks like it may happen if IOAPIC broadcasts
an interrupt vector to more then one vcpu) then it may be better to push
complexity into an ack notifier to not penalize a common scenario.
But again, if we will not drop the lock around ack notifier call they
will be serialized.

> 
> > > You questioned the purpose of irq_lock earlier, and thats what it is:
> > > synchronization between pic and ioapic blur at the irq notifiers.
> > > 
> > Pic has its own locking and it fails miserably in regards ack notifiers.
> > I bet nobody tried device assignment with pic. It will not work.
> 
> It fails to take irq_lock on automatic EOI because on guest entry 
> interrupts are disabled (see d5ecfdd25c412df9c0ecf3ab8e066461fd4f69df).
This explains why the code is buggy, but does not fix the bug. There are
two bugs at least with the pic + ack notifiers. The first one is that
irq notifiers are called without irq_lock held and that will trigger
WARN_ON(!mutex_is_locked(&kvm->irq_lock)) in kvm_set_irq() in device
assignment case (besides what is the point of a lock if it is not taken
on every code path?). Another one is that you can't postpone call to ack
notifiers in kvm_pic_read_irq() till after pic_update_irq(s). The reason
is that pic_update_irq() will trigger acked interrupt immediately since
ack notifier is the one who lowers irq line in device assignment case
(that is the reason I haven't done the same in ioapic case BTW).

> 
> > irq_lock is indeed used to synchronization ioapic access, but the way
> > it is done requires calling kvm_set_irq() with lock held and this adds
> > unnecessary lock on a msi irq injection path.
> > 
> > > Using RCU to synchronize ack/mask notifier list walk versus list
> > > addition is good, but i'm not entirely sure that smaller locks like you
> > > are proposing is a benefit long term (things become much more tricky).
> > Without removing irq_lock RCU is useless since the list walking is always
> > done with irq_lock held. I see smaller locks as simplification BTW. The
> > thing is tricky now, or so I felt while trying to understand what
> > irq_lock actually protects. With smaller locks with names that clearly
> > indicates which data structure it protects I feel the code is much
> > easy to understand.
> 
> What is worrying is if you need to take both PIC and IOAPIC locks for 
> some operation (then have to worry about lock ordering etc).
> 
Lets not worry about far fetched theoretical cases especially since you
have the same problem now. What if you'll need to take both pic lock and
irq_lock? Except that this is not so theoretical because we need to
take them both with current code we just don't do it.

> > > Maybe it is the only way forward (and so you push this complexity to the
> > > ack/mask notifiers), but i think it can be avoided until its proven that
> > > there is contention on irq_lock (which is reduced by faster search).
> > This is not about contention. This is about existence of the lock in the
> > first place. With the patch set there is no lock at msi irq injection
> > path at a

Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-07-15 Thread Marcelo Tosatti
On Wed, Jul 15, 2009 at 11:48:17PM +0300, Gleb Natapov wrote:
> > > + spin_unlock(&ioapic->lock);
> > > + kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> > > + spin_lock(&ioapic->lock);
> > 
> > While traversing the IOAPIC pins you drop the lock and acquire it again,
> > which is error prone: what if the redirection table is changed in
> > between, for example?
> > 
> Yes, the ack notifiers is a problematic part. The only thing that
> current ack notifiers do is set_irq_level(0) so this is not the problem
> in practice. I'll see if I can eliminate this dropping of the lock here.

Currently, yes. But take into account that an ack notifier might do 
other things than set_irq_level(0).

Say for example an ack notifier that takes the PIC or IOAPIC lock 
(for whatever reason).

> > Also, irq_lock was held during ack and mask notifier callbacks, so they
> > (the callbacks) did not have to worry about concurrency.
> > 
> Is it possible to have more then one ack for the same interrupt line at
> the same time?

Why not? But maybe this is a somewhat stupid point, because you can
require the notifiers to handle that.

> > You questioned the purpose of irq_lock earlier, and thats what it is:
> > synchronization between pic and ioapic blur at the irq notifiers.
> > 
> Pic has its own locking and it fails miserably in regards ack notifiers.
> I bet nobody tried device assignment with pic. It will not work.

It fails to take irq_lock on automatic EOI because on guest entry 
interrupts are disabled (see d5ecfdd25c412df9c0ecf3ab8e066461fd4f69df).

> irq_lock is indeed used to synchronization ioapic access, but the way
> it is done requires calling kvm_set_irq() with lock held and this adds
> unnecessary lock on a msi irq injection path.
> 
> > Using RCU to synchronize ack/mask notifier list walk versus list
> > addition is good, but i'm not entirely sure that smaller locks like you
> > are proposing is a benefit long term (things become much more tricky).
> Without removing irq_lock RCU is useless since the list walking is always
> done with irq_lock held. I see smaller locks as simplification BTW. The
> thing is tricky now, or so I felt while trying to understand what
> irq_lock actually protects. With smaller locks with names that clearly
> indicates which data structure it protects I feel the code is much
> easy to understand.

What is worrying is if you need to take both PIC and IOAPIC locks for 
some operation (then have to worry about lock ordering etc).

> > Maybe it is the only way forward (and so you push this complexity to the
> > ack/mask notifiers), but i think it can be avoided until its proven that
> > there is contention on irq_lock (which is reduced by faster search).
> This is not about contention. This is about existence of the lock in the
> first place. With the patch set there is no lock at msi irq injection
> path at all and this is better than having lock no matter if the lock is
> congested or not. There is still a lock on ioapic path (which MSI does
> not use) and removing of this lock should be done only after we see
> that it is congested, because removing it involves adding a number of
> atomic accesses, so it is not clear win without lock contention.
> (removing it will also solve ack notification problem hmmm)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-07-15 Thread Gleb Natapov
On Wed, Jul 15, 2009 at 02:57:59PM -0300, Marcelo Tosatti wrote:
> On Tue, Jul 14, 2009 at 05:30:43PM +0300, Gleb Natapov wrote:
> > 
> > Signed-off-by: Gleb Natapov 
> > ---
> >  arch/ia64/kvm/kvm-ia64.c |   27 ++--
> >  arch/x86/kvm/lapic.c |5 +---
> >  arch/x86/kvm/x86.c   |   30 ++-
> >  virt/kvm/ioapic.c|   50 
> > -
> >  virt/kvm/ioapic.h|1 +
> >  5 files changed, 73 insertions(+), 40 deletions(-)
> > 
> > diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> > index 0ad09f0..dd7ef2d 100644
> > --- a/arch/ia64/kvm/kvm-ia64.c
> > +++ b/arch/ia64/kvm/kvm-ia64.c
> > @@ -850,9 +850,16 @@ static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm,
> >  
> > r = 0;
> > switch (chip->chip_id) {
> > -   case KVM_IRQCHIP_IOAPIC:
> > -   memcpy(&chip->chip.ioapic, ioapic_irqchip(kvm),
> > -   sizeof(struct kvm_ioapic_state));
> > +   case KVM_IRQCHIP_IOAPIC: {
> > +   struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
> > +   if (ioapic) {
> > +   spin_lock(&ioapic->lock);
> > +   memcpy(&chip->chip.ioapic, ioapic,
> > +  sizeof(struct kvm_ioapic_state));
> > +   spin_unlock(&ioapic->lock);
> > +   } else
> > +   r = -EINVAL;
> > +   }
> > break;
> > default:
> > r = -EINVAL;
> > @@ -867,10 +874,16 @@ static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, 
> > struct kvm_irqchip *chip)
> >  
> > r = 0;
> > switch (chip->chip_id) {
> > -   case KVM_IRQCHIP_IOAPIC:
> > -   memcpy(ioapic_irqchip(kvm),
> > -   &chip->chip.ioapic,
> > -   sizeof(struct kvm_ioapic_state));
> > +   case KVM_IRQCHIP_IOAPIC: {
> > +   struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
> > +   if (ioapic) {
> > +   spin_lock(&ioapic->lock);
> > +   memcpy(ioapic, &chip->chip.ioapic,
> > +  sizeof(struct kvm_ioapic_state));
> > +   spin_unlock(&ioapic->lock);
> > +   } else
> > +   r = -EINVAL;
> > +   }
> > break;
> > default:
> > r = -EINVAL;
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index e2e2849..61ffcfc 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -471,11 +471,8 @@ static void apic_set_eoi(struct kvm_lapic *apic)
> > trigger_mode = IOAPIC_LEVEL_TRIG;
> > else
> > trigger_mode = IOAPIC_EDGE_TRIG;
> > -   if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)) {
> > -   mutex_lock(&apic->vcpu->kvm->irq_lock);
> > +   if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
> > kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> > -   mutex_unlock(&apic->vcpu->kvm->irq_lock);
> > -   }
> >  }
> >  
> >  static void apic_send_ipi(struct kvm_lapic *apic)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 48567fa..de99b84 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2010,10 +2010,16 @@ static int kvm_vm_ioctl_get_irqchip(struct kvm 
> > *kvm, struct kvm_irqchip *chip)
> > &pic_irqchip(kvm)->pics[1],
> > sizeof(struct kvm_pic_state));
> > break;
> > -   case KVM_IRQCHIP_IOAPIC:
> > -   memcpy(&chip->chip.ioapic,
> > -   ioapic_irqchip(kvm),
> > -   sizeof(struct kvm_ioapic_state));
> > +   case KVM_IRQCHIP_IOAPIC: {
> > +   struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
> > +   if (ioapic) {
> > +   spin_lock(&ioapic->lock);
> > +   memcpy(&chip->chip.ioapic, ioapic,
> > +  sizeof(struct kvm_ioapic_state));
> > +   spin_unlock(&ioapic->lock);
> > +   } else
> > +   r = -EINVAL;
> > +   }
> > break;
> > default:
> > r = -EINVAL;
> > @@ -2042,12 +2048,16 @@ static int kvm_vm_ioctl_set_irqchip(struct kvm 
> > *kvm, struct kvm_irqchip *chip)
> > sizeof(struct kvm_pic_state));
> > spin_unlock(&pic_irqchip(kvm)->lock);
> > break;
> > -   case KVM_IRQCHIP_IOAPIC:
> > -   mutex_lock(&kvm->irq_lock);
> > -   memcpy(ioapic_irqchip(kvm),
> > -   &chip->chip.ioapic,
> > -   sizeof(struct kvm_ioapic_state));
> > -   mutex_unlock(&kvm->irq_lock);
> > +   case KVM_IRQCHIP_IOAPIC: {
> > +   struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
> > +   if (ioapic) {
> > +   spin_lock(&ioapic->lock);
> > +   memcpy(ioapic, &chip->chip.ioapic,
> > +  sizeof(struct kvm_ioapic_state));
> > +  

Re: [PATCH 08/10] Move IO APIC to its own lock.

2009-07-15 Thread Marcelo Tosatti
On Tue, Jul 14, 2009 at 05:30:43PM +0300, Gleb Natapov wrote:
> 
> Signed-off-by: Gleb Natapov 
> ---
>  arch/ia64/kvm/kvm-ia64.c |   27 ++--
>  arch/x86/kvm/lapic.c |5 +---
>  arch/x86/kvm/x86.c   |   30 ++-
>  virt/kvm/ioapic.c|   50 -
>  virt/kvm/ioapic.h|1 +
>  5 files changed, 73 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index 0ad09f0..dd7ef2d 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -850,9 +850,16 @@ static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm,
>  
>   r = 0;
>   switch (chip->chip_id) {
> - case KVM_IRQCHIP_IOAPIC:
> - memcpy(&chip->chip.ioapic, ioapic_irqchip(kvm),
> - sizeof(struct kvm_ioapic_state));
> + case KVM_IRQCHIP_IOAPIC: {
> + struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
> + if (ioapic) {
> + spin_lock(&ioapic->lock);
> + memcpy(&chip->chip.ioapic, ioapic,
> +sizeof(struct kvm_ioapic_state));
> + spin_unlock(&ioapic->lock);
> + } else
> + r = -EINVAL;
> + }
>   break;
>   default:
>   r = -EINVAL;
> @@ -867,10 +874,16 @@ static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, 
> struct kvm_irqchip *chip)
>  
>   r = 0;
>   switch (chip->chip_id) {
> - case KVM_IRQCHIP_IOAPIC:
> - memcpy(ioapic_irqchip(kvm),
> - &chip->chip.ioapic,
> - sizeof(struct kvm_ioapic_state));
> + case KVM_IRQCHIP_IOAPIC: {
> + struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
> + if (ioapic) {
> + spin_lock(&ioapic->lock);
> + memcpy(ioapic, &chip->chip.ioapic,
> +sizeof(struct kvm_ioapic_state));
> + spin_unlock(&ioapic->lock);
> + } else
> + r = -EINVAL;
> + }
>   break;
>   default:
>   r = -EINVAL;
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index e2e2849..61ffcfc 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -471,11 +471,8 @@ static void apic_set_eoi(struct kvm_lapic *apic)
>   trigger_mode = IOAPIC_LEVEL_TRIG;
>   else
>   trigger_mode = IOAPIC_EDGE_TRIG;
> - if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)) {
> - mutex_lock(&apic->vcpu->kvm->irq_lock);
> + if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
>   kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> - mutex_unlock(&apic->vcpu->kvm->irq_lock);
> - }
>  }
>  
>  static void apic_send_ipi(struct kvm_lapic *apic)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 48567fa..de99b84 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2010,10 +2010,16 @@ static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, 
> struct kvm_irqchip *chip)
>   &pic_irqchip(kvm)->pics[1],
>   sizeof(struct kvm_pic_state));
>   break;
> - case KVM_IRQCHIP_IOAPIC:
> - memcpy(&chip->chip.ioapic,
> - ioapic_irqchip(kvm),
> - sizeof(struct kvm_ioapic_state));
> + case KVM_IRQCHIP_IOAPIC: {
> + struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
> + if (ioapic) {
> + spin_lock(&ioapic->lock);
> + memcpy(&chip->chip.ioapic, ioapic,
> +sizeof(struct kvm_ioapic_state));
> + spin_unlock(&ioapic->lock);
> + } else
> + r = -EINVAL;
> + }
>   break;
>   default:
>   r = -EINVAL;
> @@ -2042,12 +2048,16 @@ static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, 
> struct kvm_irqchip *chip)
>   sizeof(struct kvm_pic_state));
>   spin_unlock(&pic_irqchip(kvm)->lock);
>   break;
> - case KVM_IRQCHIP_IOAPIC:
> - mutex_lock(&kvm->irq_lock);
> - memcpy(ioapic_irqchip(kvm),
> - &chip->chip.ioapic,
> - sizeof(struct kvm_ioapic_state));
> - mutex_unlock(&kvm->irq_lock);
> + case KVM_IRQCHIP_IOAPIC: {
> + struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
> + if (ioapic) {
> + spin_lock(&ioapic->lock);
> + memcpy(ioapic, &chip->chip.ioapic,
> +sizeof(struct kvm_ioapic_state));
> + spin_unlock(&ioapic->lock);
> + } else
> + r = -EINVAL;
> + }
>   break;
>