Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-10 Thread Avi Kivity
Gregory Haskins wrote:
 I still have not converted over to kerneldoc format as I cannot find an 
 example anywhere yet, and the documentation under 
 Documentation/kernel-doc-nano-HOWTO was a little vague.  Pointers appreciated.

   

see lib/kref.c for an example.



-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-10 Thread Avi Kivity
Gregory Haskins wrote:
 +
 +struct kvm_irqdevice;
 +
 +struct kvm_irqsink {
 +   void (*raise_intr)(struct kvm_irqsink *this, 
 +  struct kvm_irqdevice *dev);
 +
 +   void *private;
 +};
 +
 +struct kvm_irqdevice {
 +   int  (*pending)(struct kvm_irqdevice *this, int flags);
 +   int  (*read_vector)(struct kvm_irqdevice *this, int flags); 
   
   
 I'm a bit confused here.  The kvm_irqsink mechanism implies a push 
 mechanism.  - pending and - read_vector imply a pull mechanism.
 

 What I am modeling is the concept of an PIC INTR + INTA.  INTR serves to just 
 cause the processor to realize it needs to service an interrupt.  The INTA 
 cycle is what then gets invoked after INTR to figure out which vector is 
 appropriate.  Finally, the vector is called out of the IDT.  

 However, there is no actual INTA cycle on a virtualized guest.  We emulate an 
 INTA cycle by calling our pending/read routines and injecting the vector into 
 the VMCS.  Whats missing from the current HEAD is that there is no concept of 
 INTR.  This is whats new, and its why we have both a push and a pull 
 mechanism.  The old way only had the pull.  Does this make sense?  
   

Hmm.  The current code is synchronous in nature (the vcpu is never
executing while we raise an interrupt, so the INTA is never needed, as
we can ensure the cpu can process the interrupt when we inject it.  With
smp/paravirt/whatever (I realize now), this is no longer true.  The NIC
raises an interrupt, we have to interrupt the guest to see if its
current IF/cr8 permit interrupt injection, and if not,  we have to keep
the interrupt in the irqdevice and request an exit when IF/cr8 permit
injection.

This means that -pending() and -read_vector() have to be called in a
critical section, otherwise the pending interrupt might be change
between the first and second call, resulting in an injected interrupt to
software that is not ready to accept it.  If we replace read_vector by
-ack(this, int vector) we avoid this need (and also save a pointless
recalculation of the vector).

  

 +static inline int bitarray_findhighest(struct bitarray *this)
 +{
 +   if (!this- summary)
 +   return - 1;
 +   else {
 +   int word_index = __ffs(this- summary);
 +   int bit_index  = __ffs(this- pending[word_index]);
 +
 +   return word_index * BITS_PER_LONG + bit_index;  
 +   }
 +}
   
   
 Um, this is the lowest?
 

 Opps...carry over from the original code.  I forgot to reverse the polarity.

   

Note that the original code did not work as intended.  Because of the
IF/tpr issues, there is at most interrupt queued, and only when it is
ready to be accepted.  With the in-kernel apic that changes of course.

   
 +
 +static int userint_pending(struct kvm_irqdevice *this, int flags)
 +{
 +   kvm_userint *s = (kvm_userint*)this- private;
 +   int ret;
 +
 +   spin_lock_irq(s- lock);
 +
 +   if (flags  KVM_IRQFLAGS_NMI)
 +   ret = s- nmi_pending;
 +   else
 +   ret = bitarray_pending(s- irq_pending);
   
   
 You probably want:

 ret = s- nmi_pending;
 if (!(flags  KVM_IRQFLAGS_NMI))
   ret |= bitarray_pending(...);

 To avoid calling this twice when interrupts are enabled.
 

 I think there is a misunderstanding.  The return value from this function 
 should just be 0 for false, and non-zero for true.  The bitarray still holds 
 a bit for IRQ2 so you technically wouldn't need to call this twice when 
 interrupts are enabled.  Simply calling this without the flag set will give 
 you the proper behavior, so I think it is fine as written.  It also doesn't 
 hurt to do it the way you wrote it either.  Let me know if I misunderstood 
 you.

   

Ah, I forgot that nmi is also vector 2 here.



-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-10 Thread Christoph Hellwig
On Mon, Apr 09, 2007 at 04:27:54PM -0400, Gregory Haskins wrote:
 Here is the next pass of the patch with changes based on feedback.
 
 I still have not converted over to kerneldoc format as I cannot find an 
 example anywhere yet, and the documentation under 
 Documentation/kernel-doc-nano-HOWTO was a little vague.  Pointers appreciated.

This still seems to miss most/all of my feedback.


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-10 Thread Gregory Haskins
 On Tue, Apr 10, 2007 at  3:52 AM, in message [EMAIL PROTECTED],
Avi Kivity [EMAIL PROTECTED] wrote: 

 Hmm.  The current code is synchronous in nature (the vcpu is never
 executing while we raise an interrupt, so the INTA is never needed, as
 we can ensure the cpu can process the interrupt when we inject it.  With
 smp/paravirt/whatever (I realize now), this is no longer true.  

Exactly.  This is in prep for the SMP/PV stuff.  I anticipate that the vcpu 
INTR will effectively be a no-op until then.  Eventually we can add the logic 
to host-IPI the vcpu if its running, but this state is an impossibility today 
due to the synchronous nature as you have pointed out.

 The NIC raises an interrupt, we have to interrupt the guest to see if its
 current IF/cr8 permit interrupt injection, and if not,  we have to keep
 the interrupt in the irqdevice and request an exit when IF/cr8 permit
 injection.

Exactly.  The INTR really just serves to kick the cpu into a VMEXIT so we can 
evaluate the IF/TPR.  In many cases this may reveal that the CPU might not be 
able to take the interrupt right then and there, so it will remain queued by 
the irqdevice model until it can.  If the interrupt cannot be dispatched, we 
would turn on any relevant EXIT feature (e.g., interrupt-window-exiting) if we 
dont do this already (I believe this is already in there)

 
 This means that - pending() and - read_vector() have to be called in a
 critical section, otherwise the pending interrupt might be change
 between the first and second call, resulting in an injected interrupt to
 software that is not ready to accept it.  If we replace read_vector by
 - ack(this, int vector) we avoid this need (and also save a pointless
 recalculation of the vector).

I am a little confused as to what you are proposing.  The current design has 
pending only returning a boolean if there is something pending, and read_vector 
actually does the INTA cycle (i.e., returns the vector # and marks it 
accepted).  I can certainly see how a lack of critical section between pending 
and read_vector could allow a new interrupt to be injected, but I am not seeing 
why this is a problem.  If a new higher-priority interrupt comes in between the 
pending and read_vector, the new one will be returned, but I think this is 
desirable?

And if we replace read_vector() with ack(), how do we learn of the vector in 
the first place?  I am guessing that you are thinking that both pending and 
read_vector return a vector and thus the confusion.  Otherwise, I am missing 
your point completely :)  Could you elaborate?

 Note that the original code did not work as intended.  Because of the
 IF/tpr issues, there is at most interrupt queued, and only when it is
 ready to be accepted.  

Ack

 With the in- kernel apic that changes of course.

Agreed

Regards,
-Greg





-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-10 Thread Avi Kivity
Gregory Haskins wrote:
 The NIC raises an interrupt, we have to interrupt the guest to see if its
 current IF/cr8 permit interrupt injection, and if not,  we have to keep
 the interrupt in the irqdevice and request an exit when IF/cr8 permit
 injection.
 

 Exactly.  The INTR really just serves to kick the cpu into a VMEXIT so we can 
 evaluate the IF/TPR.  In many cases this may reveal that the CPU might not be 
 able to take the interrupt right then and there, so it will remain queued by 
 the irqdevice model until it can.  If the interrupt cannot be dispatched, we 
 would turn on any relevant EXIT feature (e.g., interrupt-window-exiting) if 
 we dont do this already (I believe this is already in there)

   
 This means that - pending() and - read_vector() have to be called in a
 critical section, otherwise the pending interrupt might be change
 between the first and second call, resulting in an injected interrupt to
 software that is not ready to accept it.  If we replace read_vector by
 - ack(this, int vector) we avoid this need (and also save a pointless
 recalculation of the vector).
 

 I am a little confused as to what you are proposing.  The current design has 
 pending only returning a boolean if there is something pending, and 
 read_vector actually does the INTA cycle (i.e., returns the vector # and 
 marks it accepted).  I can certainly see how a lack of critical section 
 between pending and read_vector could allow a new interrupt to be injected, 
 but I am not seeing why this is a problem.  If a new higher-priority 
 interrupt comes in between the pending and read_vector, the new one will be 
 returned, but I think this is desirable?

 And if we replace read_vector() with ack(), how do we learn of the vector in 
 the first place?  I am guessing that you are thinking that both pending and 
 read_vector return a vector and thus the confusion.  Otherwise, I am missing 
 your point completely :)  Could you elaborate?
   

That's what I thought, but it doesn't really matter:  what's important 
is that accepting an interrupt has several stages, and that they must 
act as a transaction.

The stages are:
1. determine which interrupt is to be injected (if not, abort)
2. check if IF and tpr allow injection
3. one of
 - inject the interrupt and mark it as in-service
 - request an exit on IF and/or tpr threshold

Simply indicating that an interrupt may be pending is already performed 
by the irq_sink callback.

I think it would be best to put all stages into one callback so that the 
locking is kept simple, but that's just handwaving.  Seeing the code 
would make my comments more concrete.

One way around the issue is to have a single callback, -inject_irq(), 
that is responsible for all of the above.  It would call vcpu helper 
functions to check IF and do the actual injection or exit request.


-- 
error compiling committee.c: too many arguments to function


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-10 Thread Gregory Haskins
 On Tue, Apr 10, 2007 at  8:01 AM, in message [EMAIL PROTECTED],
Avi Kivity [EMAIL PROTECTED] wrote: 
 Gregory Haskins wrote:

 And if we replace read_vector() with ack(), how do we learn of the vector in 
 the first place?  I am guessing that you are thinking that both pending and 
 read_vector return a vector and thus the confusion.  Otherwise, I am missing 
 your point completely :)  Could you elaborate?
   
 
 That's what I thought, but it doesn't really matter:  what's important 
 is that accepting an interrupt has several stages, and that they must 
 act as a transaction.
 
 The stages are:
 1. determine which interrupt is to be injected (if not, abort)
 2. check if IF and tpr allow injection
 3. one of
  -  inject the interrupt and mark it as in- service
  -  request an exit on IF and/or tpr threshold
 
 Simply indicating that an interrupt may be pending is already performed 
 by the irq_sink callback.

Yes, but the operative words here are may be.  INTR tells the system that an 
interrupt *may be* pending, but we don't yet actually know which vector (if 
any) will be eligible for injection.  Thinking down the road when we will have 
SMP/PV, we must assume that the CPU could be actually running concurrently.  If 
so, it must be stopped first before we can evaluate the state (e.g., IF, TPR, 
etc.) to know which vector (if any) can be dispatched.  The eligibility of a 
vector (or the vector itself if a new, higher-pri line is raised in between) 
may change up until the point the EXIT actually occurs.  

This is analogous to how the 8259 INTR/INTA cycle works, IIUC.  The INTR line 
is raised to get the CPUs attention asynchronously independent of CPU state, 
but its the processor controlled INTA cycle that retrieves the vector to use.  
IMHO, we want to do the same thing here.   The thread that invokes the set_pin 
and thus indirectly the INTR is one context.  You can think of this as the PIC. 
 The thread that is running the guest and VMEXITs is another context.  You can 
think of this as the processor.  The PIC signals the processor that it needs 
attention, but the vector is conveyed on the processor's terms when its ready.  
In cases where there is no parallelism (e.g., todays sync design, or of the 
vcpu happens to not be executing), the INTR is a no-op. 

Now as far as the rest of the API is concerned, pending() tells us definitively 
(and efficiently) after we have stopped that there really is something pending 
given the current state of the system.  read_vector() tells us what that vector 
actually is.  Really, read_vector() is the authoritative source of data and 
acts like our processor controlled INTA cycle.  The important thing is that 
this read_vector() method is called in the same context as the guest CPU 
thread.  The other two (irq_sink, pending) are just helpers to get us to that 
point.

Where I really think we need to have a critical section is between the 
vcpu-INTR callback, and the part of the vcpu logic that evaluates whether 
interrupts are pending (to avoid missing a newly injected interrupt while we 
are processing).  However, I really see this as an vcpu issue more than an 
issue against the irqdevice interface.  Does this make sense?

I will put together a follow-on patch that implements my thinking to help 
illustrate the concept.

Regards,
-Greg



-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-10 Thread Avi Kivity
Gregory Haskins wrote:
 On Tue, Apr 10, 2007 at  8:01 AM, in message [EMAIL PROTECTED],
 
 Avi Kivity [EMAIL PROTECTED] wrote: 
   
 Gregory Haskins wrote:
 
 And if we replace read_vector() with ack(), how do we learn of the vector 
 in 
   
 the first place?  I am guessing that you are thinking that both pending and 
 read_vector return a vector and thus the confusion.  Otherwise, I am missing 
 your point completely :)  Could you elaborate?
 
   
   
 That's what I thought, but it doesn't really matter:  what's important 
 is that accepting an interrupt has several stages, and that they must 
 act as a transaction.

 The stages are:
 1. determine which interrupt is to be injected (if not, abort)
 2. check if IF and tpr allow injection
 3. one of
  -  inject the interrupt and mark it as in- service
  -  request an exit on IF and/or tpr threshold

 Simply indicating that an interrupt may be pending is already performed 
 by the irq_sink callback.
 

 Yes, but the operative words here are may be.  INTR tells the system that 
 an interrupt *may be* pending, but we don't yet actually know which vector 
 (if any) will be eligible for injection.  Thinking down the road when we will 
 have SMP/PV, we must assume that the CPU could be actually running 
 concurrently.  If so, it must be stopped first before we can evaluate the 
 state (e.g., IF, TPR, etc.) to know which vector (if any) can be dispatched.  
 The eligibility of a vector (or the vector itself if a new, higher-pri line 
 is raised in between) may change up until the point the EXIT actually occurs. 
  

 This is analogous to how the 8259 INTR/INTA cycle works, IIUC.  The INTR line 
 is raised to get the CPUs attention asynchronously independent of CPU state, 
 but its the processor controlled INTA cycle that retrieves the vector to use. 
  IMHO, we want to do the same thing here.   The thread that invokes the 
 set_pin and thus indirectly the INTR is one context.  You can think of this 
 as the PIC.  The thread that is running the guest and VMEXITs is another 
 context.  You can think of this as the processor.  The PIC signals the 
 processor that it needs attention, but the vector is conveyed on the 
 processor's terms when its ready.  In cases where there is no parallelism 
 (e.g., todays sync design, or of the vcpu happens to not be executing), the 
 INTR is a no-op. 

 Now as far as the rest of the API is concerned, pending() tells us 
 definitively (and efficiently) after we have stopped that there really is 
 something pending given the current state of the system.  read_vector() tells 
 us what that vector actually is.  Really, read_vector() is the authoritative 
 source of data and acts like our processor controlled INTA cycle.  The 
 important thing is that this read_vector() method is called in the same 
 context as the guest CPU thread.  The other two (irq_sink, pending) are just 
 helpers to get us to that point.

 Where I really think we need to have a critical section is between the 
 vcpu-INTR callback, and the part of the vcpu logic that evaluates whether 
 interrupts are pending (to avoid missing a newly injected interrupt while we 
 are processing).  However, I really see this as an vcpu issue more than an 
 issue against the irqdevice interface.  Does this make sense?
   

Both the vcpu and the irqdevice are involved in the decision:

- the vcpu holds the IF (or rather, the interrupt window)
- the lapic holds the tpr
- but the vcpu contains the controls that allow exiting on tpr 
modification (on 64-bit and with next-gen Intel processors)

IIUC, you are suggesting (pseudocode):

  spin_lock(vcpu-irq_lock)
  nmi = !irq_window_open(vcpu)
  if (vcpu-irq-pending(nmi))
  inject(vcpu, vcpu-irq-read_vector(nmi))
  spin_unlock(vcpu-irq_lock)

where -pending() has to program a tpr exit if current tpr masks an 
interrupt.

The alternative I'm suggesting is to move all this mess into the 
irqdevice code in a single callback.  While this has less reuse 
potential, in fact the hardware itself is messy and attempts to 
generalize it may turn into a mess as well.

 I will put together a follow-on patch that implements my thinking to help 
 illustrate the concept.
   

That will be best.  After all, we're not trying to design a generic 
interface, we have a know an limited set of users.  Seeing the users 
will help evaluate the API.


-- 
error compiling committee.c: too many arguments to function


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-10 Thread Gregory Haskins
 On Tue, Apr 10, 2007 at 10:46 AM, in message [EMAIL PROTECTED],
Avi Kivity [EMAIL PROTECTED] wrote: 
 
 Both the vcpu and the irqdevice are involved in the decision:
 
 -  the vcpu holds the IF (or rather, the interrupt window)
 -  the lapic holds the tpr
 -  but the vcpu contains the controls that allow exiting on tpr 
 modification (on 64- bit and with next- gen Intel processors)

Agreed

 
 IIUC, you are suggesting (pseudocode):
 
   spin_lock(vcpu- irq_lock)
   nmi = !irq_window_open(vcpu)
   if (vcpu- irq- pending(nmi))
   inject(vcpu, vcpu- irq- read_vector(nmi))
   spin_unlock(vcpu- irq_lock)
 

Your psuedo-code is spot on.  This would also be coupled with the locks inside 
the INTR callback, of course. 

 where - pending() has to program a tpr exit if current tpr masks an 
 interrupt.

I wasn't planning on the irqdevice having this type of control because I 
assumed we would always have these exit types set, but you have highlighted an 
interesting point.  If we want to selectively enable these different types of 
exits (and I agree that it's probably a good idea), we need more intelligence 
from the irqdevice.  Now the question is how to handle/detect the different 
*types* of masking that can occur.  My current API is really just designed to 
return a boolean: true = there are injectable vectors, false = there are no 
injectable vectors (but there may be masked vectors).  

I don't like the idea of this programming being handled directly by the 
irqdevice because I think its potentially the wrong layer.  However, we do need 
to account for this somehow.  We could change the API to handle the pending 
condition as a tri-level: no-interrupts, masked-interrupts, 
injectable-interrupts?  For no and injectable, the operation is as your 
psuedo-code indicates.  For the masked case, we could enumerate different 
classes of reasons (e.g. TPR, etc) that could be used to drive the programming. 
 So the pseudo-code would now be:

   spin_lock(vcpu- irq_lock)
   nmi = !irq_window_open(vcpu)
   switch (vcpu- irq- pending(nmi, mask_reasons)) {
   case NONE_PENDING:
  break;
   case MASKED_PENDING:
  if (mask_reasons  MASKREASON_TPR)
  vcpu-exit_reasons = TPR;
  /* etc */
  break;
case PENDING:
   inject(vcpu, vcpu- irq- read_vector(nmi))
   break;
   }
   spin_unlock(vcpu- irq_lock)

Thoughts?

 
 The alternative I'm suggesting is to move all this mess into the 
 irqdevice code in a single callback.  While this has less reuse 
 potential, in fact the hardware itself is messy and attempts to 
 generalize it may turn into a mess as well.
 

I cant quite wrap my head around how I could work out all of the issues as I 
(think) have done with the current proposal, but I am open to suggestion.  Can 
you elaborate more?

 I will put together a follow- on patch that implements my thinking to help 
 illustrate the concept.
   
 
 That will be best.  After all, we're not trying to design a generic 
 interface, we have a know an limited set of users.  Seeing the users 
 will help evaluate the API.

Will do.

Regards,
-Greg




-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-10 Thread Avi Kivity
Gregory Haskins wrote:
   
 where - pending() has to program a tpr exit if current tpr masks an 
 interrupt.
 

 I wasn't planning on the irqdevice having this type of control because I 
 assumed we would always have these exit types set, but you have highlighted 
 an interesting point.  If we want to selectively enable these different types 
 of exits (and I agree that it's probably a good idea), we need more 
 intelligence from the irqdevice.  Now the question is how to handle/detect 
 the different *types* of masking that can occur.  My current API is really 
 just designed to return a boolean: true = there are injectable vectors, 
 false = there are no injectable vectors (but there may be masked vectors).  

 I don't like the idea of this programming being handled directly by the 
 irqdevice because I think its potentially the wrong layer.  However, we do 
 need to account for this somehow.  We could change the API to handle the 
 pending condition as a tri-level: no-interrupts, masked-interrupts, 
 injectable-interrupts?  For no and injectable, the operation is as your 
 psuedo-code indicates.  For the masked case, we could enumerate different 
 classes of reasons (e.g. TPR, etc) that could be used to drive the 
 programming.  So the pseudo-code would now be:

spin_lock(vcpu- irq_lock)
nmi = !irq_window_open(vcpu)
switch (vcpu- irq- pending(nmi, mask_reasons)) {
case NONE_PENDING:
   break;
case MASKED_PENDING:
   if (mask_reasons  MASKREASON_TPR)
   vcpu-exit_reasons = TPR;
   /* etc */
   break;
 case PENDING:
inject(vcpu, vcpu- irq- read_vector(nmi))
break;
}
spin_unlock(vcpu- irq_lock)

 Thoughts?

   
 The alternative I'm suggesting is to move all this mess into the 
 irqdevice code in a single callback.  While this has less reuse 
 potential, in fact the hardware itself is messy and attempts to 
 generalize it may turn into a mess as well.

 

 I cant quite wrap my head around how I could work out all of the issues as I 
 (think) have done with the current proposal, but I am open to suggestion.  
 Can you elaborate more?
   

My suggestion is basically to give up on having well separated layers of 
interrupts and vcpu but hack as necessary.  While this may sound 
heretical, it actually follows the hardware.  Note how cr8 (a cpu 
register) is aliased to the tpr (an irqdevice entity), and how VT and 
SVM (which are supposed to virtualize the cpu) actually provide some 
apic emulation capabilities.

My point is that the hardware is a tangled mess, and any attempt to 
pretend otherwise in software will lead to a brittle model that breaks 
often/is unnatural/is clumsy (like the switch above).

-- 
error compiling committee.c: too many arguments to function


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-09 Thread Gregory Haskins
 On Sun, Apr 8, 2007 at  5:58 AM, in message
[EMAIL PROTECTED], Christoph Hellwig [EMAIL PROTECTED]
wrote: 
  diff -- git a/drivers/kvm/kvm_irqdevice.h b/drivers/kvm/kvm_irqdevice.h
 new file mode 100644
 index 000..9c91d15
 ---  /dev/null
 +++ b/drivers/kvm/kvm_irqdevice.h
 @@ - 0,0 +1,206 @@
 +/*
 + * kvm_irqdevice.h
 
 No need to mention the file name in the top of file comment.  Quite 
 contrary,
 this comment can easily get out of sync and thus is not encouraged.

Ack

 
 + *
 + * Defines an interface for an abstract interrupt controller.  The model 
 + * consists of a unit with an arbitrary number of input lines (IRQ0- N), an
 + * output line (INTR), and methods for completing an interrupt- acknowledge
 + * cycle (INTA).  A particular implementation of this model will define
 + * various policies, such as irq- to- vector translation, INTA/auto- EOI 
 policy,
 + * etc.  
 + * 
 + * In addition, the INTR callback mechanism allows the unit to be wired 
 to
 + * an interruptible source in a very flexible manner. For instance, an 
 + * irqdevice could have its INTR wired to a VCPU (ala LAPIC), or another 
 + * interrupt controller (ala cascaded i8259s)
 + *
 + * Copyright (C) 2007 Novell
 + *
 + * Authors:
 + *   Gregory Haskins [EMAIL PROTECTED]
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2, as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License 
 for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along 
 with
 + * this program; if not, write to the Free Software Foundation, Inc., 59 
 Temple
 + * Place -  Suite 330, Boston, MA 02111- 1307 USA.
 
 It would be nice if you could follow the top of file comment style used
 in the rest of the kvm code instead of using a very different one.

I actually copied this header from one of the KVM files, but I will take a 
deeper look to see if I can figure out what you are talking about.  Perhaps you 
are referring to how I described the file, rather than the other details like 
the GPL section, etc.

 
 +/*
 + * kvm_irq_init()
 + *
 + * Description:
 + *Initialized the kvm_irqdevice for use.  Should be called before 
 calling
 + *any derived implementation init functions
 + * 
 + * Parameters:
 + *struct kvm_irqdevice *dev: This device
 + *
 + * Returns: (void)
 + */
 
 Please use kernel- doc style comments that can be used to auto- generate
 documentation.

I can do this if its the right thing to do...but isnt that for general kernel 
API?  This stuff is really just private to KVM.  Please advise. 

 
 +static inline int kvm_irq_pending(struct kvm_irqdevice *dev, int flags)
 +{
 +return dev- pending(dev, flags);
 +}

I respectfully disagree with these next set of objections related to the 
wrappers.  I will try to detail the reasons why

 
 This wrapper is not needed at all, just call the method directly.

I would agree with you if were were talking about disparate pointers: e.g. 
kvm_arch_ops for function context and kvm_vcpu for data context.  However, in 
this case we have the same pointer for both (e.g. we are emulating the 
implicit this from C++).  The two *always* being the same is an invariant 
that this wrapper helps to enforce.  Lack of this wrapper is just asking for a 
bug, IMHO.

 
 +static inline int kvm_irq_read_vector(struct kvm_irqdevice *dev, int flags)
 +{
 +return dev- read_vector(dev, flags);
 +}
 
 ditto
 
 +static inline int kvm_irq_set_pin(struct kvm_irqdevice *dev, int pin, 
 +  int level, int flags)
 +{
 +return dev- set_pin(dev, pin, level, flags);
 +}
 
 ditto
 
 +static inline int kvm_irq_summary(struct kvm_irqdevice *dev, void *data)
 +{
 +return dev- summary(dev, data);
 +}
 
 ditto.
 
 +static inline void kvm_irq_register_sink(struct kvm_irqdevice *dev, 
 + const struct kvm_irqsink *sink)
 +{
 +dev- sink = *sink;
 +}
 
 Completely useless wrapper, just do the assignment in the caller.

This has to do with encapsulation.  The fact that the data is stored in 
dev-sink is an implementation detail that I prefer to see hidden by the 
interface.  For instance, what if someone changes their mind about the current 
policy of a singleton callback (e.g. allow multiple callback registrations)?  
Or what if someone wants to add error checking to make sure a callback is 
registered only once? 

 Also the convention of copying the operation vectors is rather unnatural
 for linux, just set the pointer (and make sure the operations regustired
 are file- scope not local- scope as in your current code)

Ack.

 
 +static inline void kvm_irq_raise_intr(struct 

Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-09 Thread Gregory Haskins
Here is the next pass of the patch with changes based on feedback.

I still have not converted over to kerneldoc format as I cannot find an example 
anywhere yet, and the documentation under Documentation/kernel-doc-nano-HOWTO 
was a little vague.  Pointers appreciated.

-Greg

---
KVM: Add irqdevice object

The current code is geared towards using a user-mode (A)PIC.  This patch adds
an irqdevice abstraction, and implements a userint model to handle the
duties of the original code.  Later, we can develop other irqdevice models 
to handle objects like LAPIC, IOAPIC, i8259, etc, as appropriate

Signed-off-by: Gregory Haskins [EMAIL PROTECTED]
---

 drivers/kvm/Makefile|2 
 drivers/kvm/irqdevice.h |  202 +
 drivers/kvm/kvm.h   |9 +-
 drivers/kvm/kvm_main.c  |   57 +
 drivers/kvm/svm.c   |   33 ---
 drivers/kvm/userint.c   |  210 +++
 drivers/kvm/vmx.c   |   29 +++---
 7 files changed, 489 insertions(+), 53 deletions(-)

diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index c0a789f..540afbc 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -2,7 +2,7 @@
 # Makefile for Kernel-based Virtual Machine module
 #
 
-kvm-objs := kvm_main.o mmu.o x86_emulate.o
+kvm-objs := kvm_main.o mmu.o x86_emulate.o userint.o
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/drivers/kvm/irqdevice.h b/drivers/kvm/irqdevice.h
new file mode 100644
index 000..e4377d4
--- /dev/null
+++ b/drivers/kvm/irqdevice.h
@@ -0,0 +1,202 @@
+/*
+ * Defines an interface for an abstract interrupt controller.  The model 
+ * consists of a unit with an arbitrary number of input lines (IRQ0-N), an
+ * output line (INTR), and methods for completing an interrupt-acknowledge
+ * cycle (INTA).  A particular implementation of this model will define
+ * various policies, such as irq-to-vector translation, INTA/auto-EOI policy,
+ * etc.  
+ * 
+ * In addition, the INTR callback mechanism allows the unit to be wired to
+ * an interruptible source in a very flexible manner. For instance, an 
+ * irqdevice could have its INTR wired to a VCPU (ala LAPIC), or another 
+ * interrupt controller (ala cascaded i8259s)
+ *
+ * Copyright (C) 2007 Novell
+ *
+ * Authors:
+ *   Gregory Haskins [EMAIL PROTECTED]
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#ifndef __IRQDEVICE_H
+#define __IRQDEVICE_H
+
+#define KVM_IRQFLAGS_NMI  (1  0)
+#define KVM_IRQFLAGS_PEEK (1  1)
+
+struct kvm_irqdevice;
+
+struct kvm_irqsink {
+   void (*raise_intr)(struct kvm_irqsink *this, 
+  struct kvm_irqdevice *dev);
+
+   void *private;
+};
+
+struct kvm_irqdevice {
+   int  (*pending)(struct kvm_irqdevice *this, int flags);
+   int  (*read_vector)(struct kvm_irqdevice *this, int flags); 
+   int  (*set_pin)(struct kvm_irqdevice *this, int pin, int level);
+   int  (*summary)(struct kvm_irqdevice *this, void *data);
+   void (*destructor)(struct kvm_irqdevice *this);
+
+   void   *private;
+   struct kvm_irqsink  sink;
+};
+
+/*
+ * kvm_irqdevice_init()
+ *
+ * Description:
+ *Initialized the kvm_irqdevice for use.  Should be called before calling
+ *any derived implementation init functions
+ * 
+ * Parameters:
+ *struct kvm_irqdevice *dev: This device
+ *
+ * Returns: (void)
+ */
+static inline void kvm_irqdevice_init(struct kvm_irqdevice *dev)
+{
+   memset(dev, 0, sizeof(*dev));
+}
+
+/*
+ * kvm_irqdevice_pending()
+ *
+ * Description:
+ *Efficiently determines if an interrupt is pending on an irqdevice
+ *
+ * Parameters:
+ *struct kvm_irqdevice *dev: The device
+ *int flags: Modifies the behavior as follows:
+ * 
+ *   KVM_IRQFLAGS_NMI: Mask everything but NMIs
+ *
+ * Returns: (int)
+ *0 = no iterrupts pending (per flags criteria)
+ *1 = one or more interrupts are pending
+ */
+static inline int kvm_irqdevice_pending(struct kvm_irqdevice *dev, int flags)
+{
+   return dev-pending(dev, flags);
+}
+
+/*
+ * kvm_irqdevice_read_vector()
+ *
+ * Description:
+ *Read the highest priority pending vector from the device, potentially
+ *invoking auto-EOI depending on device policy
+ *
+ * Parameters:
+ *struct 

Re: [kvm-devel] [PATCH] Add irqdevice interface + userint implementation

2007-04-08 Thread Avi Kivity
Christoph Hellwig wrote:
 +/*--
 + * optimized bitarray object - works like bitarrays in bitops, but uses 
 + * a summary field to accelerate lookups.  Assumes external locking 
 + *-*/
 +
 +struct bitarray {
 +unsigned long summary; /* 1 per word in pending */
 +unsigned long pending[NR_IRQ_WORDS];
 +};
 +
 
 This should go into a separate header, probably even in include/linux/
   

It's limited to BITS_PER_LONG * BITS_PER_LONG bits (1K or 4K, depending 
on arch).  Do you see any potential users?

Making it generic would be difficult in C without preprocessor abuse.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel