Re: [PATCH] Inter-VM shared memory PCI device

2010-03-11 Thread Arnd Bergmann
On Thursday 11 March 2010, Avi Kivity wrote:
  A totally different option that avoids this whole problem would
  be to separate the signalling from the shared memory, making the
  PCI shared memory device a trivial device with a single memory BAR,
  and using a higher-level concept like a virtio-based
  serial line for the actual signalling.
 
 
 That would be much slower.  The current scheme allows for an 
 ioeventfd/irqfd short circuit which allows one guest to interrupt 
 another without involving their qemus at all.

Yes, the serial line approach would be much slower, but my point
was that we can do signaling over something else, which could
well be something building on irqfd.

Arnd
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-11 Thread Avi Kivity

On 03/11/2010 02:57 PM, Arnd Bergmann wrote:
 On Thursday 11 March 2010, Avi Kivity wrote:
   A totally different option that avoids this whole problem would
   be to separate the signalling from the shared memory, making the
   PCI shared memory device a trivial device with a single memory BAR,
   and using a higher-level concept like a virtio-based
   serial line for the actual signalling.

  That would be much slower.  The current scheme allows for an
  ioeventfd/irqfd short circuit which allows one guest to interrupt
  another without involving their qemus at all.

 Yes, the serial line approach would be much slower, but my point
 was that we can do signaling over something else, which could
 well be something building on irqfd.

Well, we could, but it seems to make things more complicated?  A card
with shared memory, and another card with an interrupt interconnect?


--
error compiling committee.c: too many arguments to function



Re: [PATCH] Inter-VM shared memory PCI device

2010-03-11 Thread Arnd Bergmann
On Thursday 11 March 2010, Avi Kivity wrote:
  That would be much slower.  The current scheme allows for an
  ioeventfd/irqfd short circuit which allows one guest to interrupt
  another without involving their qemus at all.
   
  Yes, the serial line approach would be much slower, but my point
  was that we can do signaling over something else, which could
  well be something building on irqfd.
 
 Well, we could, but it seems to make things more complicated?  A card 
 with shared memory, and another card with an interrupt interconnect?

Yes, I agree that it's more complicated if you have a specific application
in mind that needs one of each, and most use cases that want shared memory
also need an interrupt mechanism.  But it's not always the case:

- You could use ext2 with -o xip on a private mapping of a shared host file
in order to share the page cache. This does not need any interrupts.

- If you have more than two parties sharing the segment, there are different
ways to communicate, e.g. always send an interrupt to all others, or have
dedicated point-to-point connections. There is also some complexity in
trying to cover all possible cases in one driver.

I have to say that I also really like the idea of futex over shared memory,
which could potentially make this all a lot simpler. I don't know how this
would best be implemented on the host though.
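For reference, the guest-side primitive behind the futex idea could look roughly like this — a sketch only, with illustrative wrapper names; how the host would propagate wakeups between VMs is exactly the open question above:

```c
/* Sketch of futex-over-shared-memory from the guest side: peers place a
 * 32-bit futex word in the shared region and wait/wake through the
 * kernel.  Wrapper names are illustrative, not from any patch. */
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static long futex_wait(uint32_t *addr, uint32_t expected)
{
    /* Blocks only while *addr == expected; fails with EAGAIN otherwise. */
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static long futex_wake(uint32_t *addr, int nwaiters)
{
    /* Returns the number of waiters actually woken. */
    return syscall(SYS_futex, addr, FUTEX_WAKE, nwaiters, NULL, NULL, 0);
}
```

For cross-VM use the two sides would have to share the physical page backing the futex word, which is what makes the host-side implementation the hard part.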

Arnd


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Avi Kivity

On 03/09/2010 08:34 PM, Cam Macdonell wrote:

On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivity a...@redhat.com wrote:

On 03/09/2010 05:27 PM, Cam Macdonell wrote:

  Registers are used
for synchronization between guests sharing the same memory object when
interrupts are supported (this requires using the shared memory server).

How does the driver detect whether interrupts are supported or not?

At the moment, the VM ID is set to -1 if interrupts aren't supported,
but that may not be the clearest way to do things.  With UIO is there
a way to detect if the interrupt pin is on?

I suggest not designing the device to uio.  Make it a good guest-independent
device, and if uio doesn't fit it, change it.

Why not support interrupts unconditionally?  Is the device useful without
interrupts?

Currently my patch works with or without the shared memory server.  If
you give the parameter

-ivshmem 256,foo

then this will create (if necessary) and map /dev/shm/foo as the
shared region without interrupt support.  Some users of shared memory
are using it this way.

Going forward we can require the shared memory server and always have
interrupts enabled.

Can you explain how they synchronize?  Polling?  Using the network?  
Using it as a shared cache?


If it's a reasonable use case it makes sense to keep it.

Another thing comes to mind - a shared memory ID, in case a guest has 
multiple cards.





Re: [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Arnd Bergmann
On Tuesday 09 March 2010, Cam Macdonell wrote:
 
  We could make the masking in RAM, not in registers, like virtio, which would
  require no exits.  It would then be part of the application specific
  protocol and out of scope of this spec.
 
 
 This kind of implementation would be possible now since with UIO it's
 up to the application whether to mask interrupts or not and what
 interrupts mean.  We could leave the interrupt mask register for those
 who want that behaviour.  Arnd's idea would remove the need for the
 Doorbell and Mask, but we will always need at least one MMIO register
 to send whatever interrupts we do send.

You'd also have to be very careful if the notification is in RAM to
avoid races between one guest triggering an interrupt and another
guest clearing its interrupt mask.

A totally different option that avoids this whole problem would
be to separate the signalling from the shared memory, making the
PCI shared memory device a trivial device with a single memory BAR,
and using a higher-level concept like a virtio-based
serial line for the actual signalling.

Arnd


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Cam Macdonell
On Wed, Mar 10, 2010 at 2:21 AM, Avi Kivity a...@redhat.com wrote:
 On 03/09/2010 08:34 PM, Cam Macdonell wrote:

 On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivity a...@redhat.com wrote:

 On 03/09/2010 05:27 PM, Cam Macdonell wrote:

  Registers are used
 for synchronization between guests sharing the same memory object when
 interrupts are supported (this requires using the shared memory
 server).

 How does the driver detect whether interrupts are supported or not?

 At the moment, the VM ID is set to -1 if interrupts aren't supported,
 but that may not be the clearest way to do things.  With UIO is there
 a way to detect if the interrupt pin is on?

 I suggest not designing the device to uio.  Make it a good
 guest-independent device, and if uio doesn't fit it, change it.

 Why not support interrupts unconditionally?  Is the device useful without
 interrupts?

 Currently my patch works with or without the shared memory server.  If
 you give the parameter

 -ivshmem 256,foo

 then this will create (if necessary) and map /dev/shm/foo as the
 shared region without interrupt support.  Some users of shared memory
 are using it this way.

 Going forward we can require the shared memory server and always have
 interrupts enabled.

 Can you explain how they synchronize?  Polling?  Using the network?  Using
 it as a shared cache?

 If it's a reasonable use case it makes sense to keep it.

Do you mean how they synchronize without interrupts?  One project I've
been contacted about uses the shared region directly for
synchronization for simulations running in different VMs that share
data in the memory region.  In my tests spinlocks in the shared region
work between guests.
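The spinlock-in-the-shared-region approach can be sketched roughly like this — a minimal guest-side spin lock over a word in the shared mapping.  The names are illustrative, not code from the patch, and the lock word is assumed zero-initialized by whoever created the region:

```c
/* Minimal sketch (not from the patch): a spinlock placed in the shared
 * memory region, usable by any peers mapping the same region. */
#include <sched.h>
#include <stdint.h>

typedef volatile uint32_t shm_spinlock_t;

static void shm_spin_lock(shm_spinlock_t *lock)
{
    /* Atomically set the word to 1; spin while another peer holds it. */
    while (__sync_lock_test_and_set(lock, 1))
        while (*lock)
            sched_yield();
}

static void shm_spin_unlock(shm_spinlock_t *lock)
{
    __sync_lock_release(lock);  /* release-store of 0 */
}
```

This only works across guests because BAR1 is ordinary cacheable RAM backed by the same host pages in every VM, so the atomic operations are globally meaningful.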

If we want to keep the serverless implementation, do we need to
support shm_open with -chardev somehow? Something like -chardev
shm,name=foo.  Right now my qdev implementation just passes the name
to the -device option and opens it.

 Another thing comes to mind - a shared memory ID, in case a guest has
 multiple cards.

Sure, a number that can be passed on the command-line and stored in a register?

Cam


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Avi Kivity

On 03/10/2010 06:36 PM, Cam Macdonell wrote:
 On Wed, Mar 10, 2010 at 2:21 AM, Avi Kivity a...@redhat.com wrote:

 On 03/09/2010 08:34 PM, Cam Macdonell wrote:

 On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivity a...@redhat.com wrote:

 On 03/09/2010 05:27 PM, Cam Macdonell wrote:

  Registers are used
 for synchronization between guests sharing the same memory object when
 interrupts are supported (this requires using the shared memory
 server).

 How does the driver detect whether interrupts are supported or not?

 At the moment, the VM ID is set to -1 if interrupts aren't supported,
 but that may not be the clearest way to do things.  With UIO is there
 a way to detect if the interrupt pin is on?

 I suggest not designing the device to uio.  Make it a good
 guest-independent device, and if uio doesn't fit it, change it.

 Why not support interrupts unconditionally?  Is the device useful without
 interrupts?

 Currently my patch works with or without the shared memory server.  If
 you give the parameter

 -ivshmem 256,foo

 then this will create (if necessary) and map /dev/shm/foo as the
 shared region without interrupt support.  Some users of shared memory
 are using it this way.

 Going forward we can require the shared memory server and always have
 interrupts enabled.

 Can you explain how they synchronize?  Polling?  Using the network?  Using
 it as a shared cache?

 If it's a reasonable use case it makes sense to keep it.

 Do you mean how they synchronize without interrupts?  One project I've
 been contacted about uses the shared region directly for
 synchronization for simulations running in different VMs that share
 data in the memory region.  In my tests spinlocks in the shared region
 work between guests.

I see.

 If we want to keep the serverless implementation, do we need to
 support shm_open with -chardev somehow? Something like -chardev
 shm,name=foo.  Right now my qdev implementation just passes the name
 to the -device option and opens it.

I think using the file name is fine.

 Another thing comes to mind - a shared memory ID, in case a guest has
 multiple cards.

 Sure, a number that can be passed on the command-line and stored in a register?

Yes.  NICs use the MAC address and storage uses the disk serial number;
this is the same thing for shared memory.





Re: [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Avi Kivity

On 03/10/2010 04:04 PM, Arnd Bergmann wrote:
 On Tuesday 09 March 2010, Cam Macdonell wrote:
   We could make the masking in RAM, not in registers, like virtio, which would
   require no exits.  It would then be part of the application specific
   protocol and out of scope of this spec.

  This kind of implementation would be possible now since with UIO it's
  up to the application whether to mask interrupts or not and what
  interrupts mean.  We could leave the interrupt mask register for those
  who want that behaviour.  Arnd's idea would remove the need for the
  Doorbell and Mask, but we will always need at least one MMIO register
  to send whatever interrupts we do send.

 You'd also have to be very careful if the notification is in RAM to
 avoid races between one guest triggering an interrupt and another
 guest clearing its interrupt mask.

 A totally different option that avoids this whole problem would
 be to separate the signalling from the shared memory, making the
 PCI shared memory device a trivial device with a single memory BAR,
 and using a higher-level concept like a virtio-based
 serial line for the actual signalling.

That would be much slower.  The current scheme allows for an
ioeventfd/irqfd short circuit which allows one guest to interrupt
another without involving their qemus at all.





Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Arnd Bergmann
On Monday 08 March 2010, Cam Macdonell wrote:
 enum ivshmem_registers {
     IntrMask = 0,
     IntrStatus = 2,
     Doorbell = 4,
     IVPosition = 6,
     IVLiveList = 8
 };
 
 The first two registers are the interrupt mask and status registers.
 Interrupts are triggered when a message is received on the guest's eventfd
 from another VM.  Writing to the 'Doorbell' register is how synchronization
 messages are sent to other VMs.

 The IVPosition register is read-only and reports the guest's ID number.  The
 IVLiveList register is also read-only and reports a bit vector of currently
 live VM IDs.

 The Doorbell register is 16-bits, but is treated as two 8-bit values.  The
 upper 8-bits are used for the destination VM ID.  The lower 8-bits are the
 value which will be written to the destination VM and what the guest status
 register will be set to when the interrupt is triggered in the destination
 guest.  A value of 255 in the upper 8-bits will trigger a broadcast where
 the message will be sent to all other guests.

This means you have at least two intercepts for each message:

1. Sender writes to doorbell
2. Receiver gets interrupted

With optionally two more intercepts in order to avoid interrupting the
receiver every time:

3. Receiver masks interrupt in order to process data
4. Receiver unmasks interrupt when it's done and status is no longer pending

I believe you can do much better than this: combine the status and mask
bits, making this level-triggered, and move to a bitmask of all guests:

In order to send an interrupt to another guest, the sender first checks
the bit for the receiver. If it's '1', no need for any intercept, the
receiver will come back anyway. If it's zero, write a '1' bit, which
gets OR'd into the bitmask by the host. The receiver gets interrupted
at a rising edge and just leaves the bit on, until it's done processing,
then turns the bit off by writing a '1' into its own location in the mask.
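The combined status/mask scheme above could be sketched like this — hypothetical names throughout, and modeled with atomics in a single address space rather than a real guest/host split:

```c
/* Sketch of a per-guest pending bitmask that the host ORs into on a
 * doorbell write, interrupting only on a rising edge.  Names
 * (pending_mask, host_doorbell, ...) are illustrative, not from the
 * patch. */
#include <stdint.h>

static uint32_t pending_mask;          /* would live in the shared region */

/* Host side: OR the '1' bit into the mask; report whether a rising
 * edge occurred, i.e. whether an interrupt should be injected. */
static int host_doorbell(uint32_t bit)
{
    uint32_t old = __sync_fetch_and_or(&pending_mask, bit);
    return (old & bit) == 0;           /* interrupt only on 0 -> 1 */
}

/* Sender side: skip the intercept entirely if the bit is already set;
 * the receiver will come back anyway. */
static void send_notify(uint32_t receiver_bit)
{
    if (!(pending_mask & receiver_bit))
        host_doorbell(receiver_bit);   /* would be an MMIO write */
}

/* Receiver side: when done processing, clear our own bit (the real
 * device would expose this as a write-1-to-clear register). */
static void receiver_done(uint32_t my_bit)
{
    __sync_fetch_and_and(&pending_mask, ~my_bit);
}
```

The point of the scheme is visible in `send_notify`: while the receiver's bit is already set, sends cost nothing at all.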

Arnd


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Avi Kivity

On 03/08/2010 07:57 PM, Cam Macdonell wrote:
  Can you provide a spec that describes the device?  This would be useful for
  maintaining the code, writing guest drivers, and as a framework for review.

 I'm not sure if you want the Qemu command-line part as part of the
 spec here, but I've included for completeness.

I meant something from the guest's point of view, so command line syntax
is less important.  It should be equally applicable to a real PCI card
that works with the same driver.

See http://ozlabs.org/~rusty/virtio-spec/ for an example.


 The Inter-VM Shared Memory PCI device
 ---

 BARs

 The device supports two BARs.  BAR0 is a 256-byte MMIO region to
 support registers

(but might be extended in the future)

 and BAR1 is used to map the shared memory object from the host.  The size of
 BAR1 is specified on the command-line and must be a power of 2 in size.

 Registers

 BAR0 currently supports 5 registers of 16-bits each.

Suggest making registers 32-bits, friendlier towards non-x86.

  Registers are used
 for synchronization between guests sharing the same memory object when
 interrupts are supported (this requires using the shared memory server).

How does the driver detect whether interrupts are supported or not?


 When using interrupts, VMs communicate with a shared memory server that passes
 the shared memory object file descriptor using SCM_RIGHTS.  The server assigns
 each VM an ID number and sends this ID number to the Qemu process along with a
 series of eventfd file descriptors, one per guest using the shared memory
 server.  These eventfds will be used to send interrupts between guests.  Each
 guest listens on the eventfd corresponding to their ID and may use the others
 for sending interrupts to other guests.

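The fd-passing step described above is standard SCM_RIGHTS ancillary data over a unix domain socket.  A sketch, with illustrative helper names rather than the server's actual code, demonstrated over a socketpair for self-containment:

```c
/* Sketch: hand a file descriptor (plus a 1-byte payload such as the
 * assigned VM ID) to a peer over a unix domain socket. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

static int send_fd(int sock, int fd, char id)
{
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct iovec iov = { .iov_base = &id, .iov_len = 1 };
    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;       /* kernel dups the fd for the peer */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive the 1-byte payload into *id; returns the new fd or -1. */
static int recv_fd(int sock, char *id)
{
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct iovec iov = { .iov_base = id, .iov_len = 1 };
    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    int fd = -1;
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In the real server the descriptor would be the shm object or an eventfd, and the receiving end is the qemu process.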
 enum ivshmem_registers {
     IntrMask = 0,
     IntrStatus = 2,
     Doorbell = 4,
     IVPosition = 6,
     IVLiveList = 8
 };

 The first two registers are the interrupt mask and status registers.
 Interrupts are triggered when a message is received on the guest's eventfd
 from another VM.  Writing to the 'Doorbell' register is how synchronization
 messages are sent to other VMs.

 The IVPosition register is read-only and reports the guest's ID number.  The
 IVLiveList register is also read-only and reports a bit vector of currently
 live VM IDs.

That limits the number of guests to 16.

 The Doorbell register is 16-bits, but is treated as two 8-bit values.  The
 upper 8-bits are used for the destination VM ID.  The lower 8-bits are the
 value which will be written to the destination VM and what the guest status
 register will be set to when the interrupt is triggered in the destination
 guest.
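The 16-bit Doorbell encoding quoted above can be illustrated as follows — a sketch, with names that are not from the patch:

```c
/* Sketch of composing/decoding the 16-bit Doorbell value: destination
 * VM ID in the upper 8 bits, message value in the lower 8 bits, with
 * 255 as the broadcast ID. */
#include <stdint.h>

#define IVSHMEM_BROADCAST_ID 255

static uint16_t doorbell_value(uint8_t dest_vm, uint8_t msg)
{
    return (uint16_t)((dest_vm << 8) | msg);
}

static uint8_t doorbell_dest(uint16_t val) { return val >> 8; }
static uint8_t doorbell_msg(uint16_t val)  { return val & 0xff; }
```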


What happens when two interrupts are sent back-to-back to the same 
guest?  Will the first status value be lost?


Also, reading the status register requires a vmexit.  I suggest dropping 
it and requiring the application to manage this information in the 
shared memory area (where it could do proper queueing of multiple messages).



 A value of 255 in the upper 8-bits will trigger a broadcast where the message
 will be sent to all other guests.
   


Please consider adding:

- MSI support
- interrupt on a guest attaching/detaching to the shared memory device

With MSI you could also have the doorbell specify both guest ID and 
vector number, which may be useful.


Thanks for this - it definitely makes reviewing easier.




Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Avi Kivity

On 03/09/2010 02:49 PM, Arnd Bergmann wrote:

On Monday 08 March 2010, Cam Macdonell wrote:
   

enum ivshmem_registers {
 IntrMask = 0,
 IntrStatus = 2,
 Doorbell = 4,
 IVPosition = 6,
 IVLiveList = 8
};

The first two registers are the interrupt mask and status registers.
Interrupts are triggered when a message is received on the guest's eventfd from
another VM.  Writing to the 'Doorbell' register is how synchronization messages
are sent to other VMs.

The IVPosition register is read-only and reports the guest's ID number.  The
IVLiveList register is also read-only and reports a bit vector of currently
live VM IDs.

The Doorbell register is 16-bits, but is treated as two 8-bit values.  The
upper 8-bits are used for the destination VM ID.  The lower 8-bits are the
value which will be written to the destination VM and what the guest status
register will be set to when the interrupt is triggered in the destination guest.
A value of 255 in the upper 8-bits will trigger a broadcast where the message
will be sent to all other guests.
 

This means you have at least two intercepts for each message:

1. Sender writes to doorbell
2. Receiver gets interrupted

With optionally two more intercepts in order to avoid interrupting the
receiver every time:

3. Receiver masks interrupt in order to process data
4. Receiver unmasks interrupt when it's done and status is no longer pending

I believe you can do much better than this, you combine status and mask
bits, making this level triggered, and move to a bitmask of all guests:

In order to send an interrupt to another guest, the sender first checks
the bit for the receiver. If it's '1', no need for any intercept, the
receiver will come back anyway. If it's zero, write a '1' bit, which
gets OR'd into the bitmask by the host. The receiver gets interrupted
at a rising edge and just leaves the bit on, until it's done processing,
then turns the bit off by writing a '1' into its own location in the mask.
   


We could make the masking in RAM, not in registers, like virtio, which 
would require no exits.  It would then be part of the application 
specific protocol and out of scope of this spec.





Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Cam Macdonell
On Tue, Mar 9, 2010 at 6:03 AM, Avi Kivity a...@redhat.com wrote:
 On 03/09/2010 02:49 PM, Arnd Bergmann wrote:

 On Monday 08 March 2010, Cam Macdonell wrote:


 enum ivshmem_registers {
     IntrMask = 0,
     IntrStatus = 2,
     Doorbell = 4,
     IVPosition = 6,
     IVLiveList = 8
 };

 The first two registers are the interrupt mask and status registers.
 Interrupts are triggered when a message is received on the guest's eventfd
 from another VM.  Writing to the 'Doorbell' register is how synchronization
 messages are sent to other VMs.

 The IVPosition register is read-only and reports the guest's ID number.  The
 IVLiveList register is also read-only and reports a bit vector of currently
 live VM IDs.

 The Doorbell register is 16-bits, but is treated as two 8-bit values.  The
 upper 8-bits are used for the destination VM ID.  The lower 8-bits are the
 value which will be written to the destination VM and what the guest status
 register will be set to when the interrupt is triggered in the destination
 guest.  A value of 255 in the upper 8-bits will trigger a broadcast where
 the message will be sent to all other guests.


 This means you have at least two intercepts for each message:

 1. Sender writes to doorbell
 2. Receiver gets interrupted

 With optionally two more intercepts in order to avoid interrupting the
 receiver every time:

 3. Receiver masks interrupt in order to process data
 4. Receiver unmasks interrupt when it's done and status is no longer
 pending

 I believe you can do much better than this: combine the status and mask
 bits, making this level-triggered, and move to a bitmask of all guests:

 In order to send an interrupt to another guest, the sender first checks
 the bit for the receiver. If it's '1', no need for any intercept, the
 receiver will come back anyway. If it's zero, write a '1' bit, which
 gets OR'd into the bitmask by the host. The receiver gets interrupted
 at a rising edge and just leaves the bit on, until it's done processing,
 then turns the bit off by writing a '1' into its own location in the mask.


 We could make the masking in RAM, not in registers, like virtio, which would
 require no exits.  It would then be part of the application specific
 protocol and out of scope of this spec.


This kind of implementation would be possible now since with UIO it's
up to the application whether to mask interrupts or not and what
interrupts mean.  We could leave the interrupt mask register for those
who want that behaviour.  Arnd's idea would remove the need for the
Doorbell and Mask, but we will always need at least one MMIO register
to send whatever interrupts we do send.

Cam


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Avi Kivity

On 03/09/2010 05:27 PM, Cam Macdonell wrote:
   Registers are used
  for synchronization between guests sharing the same memory object when
  interrupts are supported (this requires using the shared memory server).

  How does the driver detect whether interrupts are supported or not?

 At the moment, the VM ID is set to -1 if interrupts aren't supported,
 but that may not be the clearest way to do things.  With UIO is there
 a way to detect if the interrupt pin is on?

I suggest not designing the device to uio.  Make it a good
guest-independent device, and if uio doesn't fit it, change it.

Why not support interrupts unconditionally?  Is the device useful
without interrupts?

  The Doorbell register is 16-bits, but is treated as two 8-bit values.  The
  upper 8-bits are used for the destination VM ID.  The lower 8-bits are the
  value which will be written to the destination VM and what the guest status
  register will be set to when the interrupt is triggered in the destination
  guest.

  What happens when two interrupts are sent back-to-back to the same guest?
  Will the first status value be lost?

 Right now, it would be.  I believe that eventfd has a counting
 semaphore option, that could prevent loss of status (but limits what
 the status could be).

It only counts the number of interrupts (and kvm will coalesce them anyway).

 My understanding of uio_pci interrupt handling
 is fairly new, but we could have the uio driver store the interrupt
 statuses to avoid losing them.

There's nowhere to store them if we use ioeventfd/irqfd.  I think it's
both easier and more efficient to leave this to the application (to
store into shared memory).





Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Anthony Liguori

On 03/09/2010 11:28 AM, Avi Kivity wrote:

On 03/09/2010 05:27 PM, Cam Macdonell wrote:





  Registers are used
for synchronization between guests sharing the same memory object when
interrupts are supported (this requires using the shared memory 
server).



How does the driver detect whether interrupts are supported or not?

At the moment, the VM ID is set to -1 if interrupts aren't supported,
but that may not be the clearest way to do things.  With UIO is there
a way to detect if the interrupt pin is on?


I suggest not designing the device to uio.  Make it a good 
guest-independent device, and if uio doesn't fit it, change it.


You can always fall back to reading the config space directly.  It's not 
strictly required that you stick to the UIO interface.


Why not support interrupts unconditionally?  Is the device useful 
without interrupts?


You can always just have interrupts enabled and not use them if that's 
desired.


Regards,

Anthony Liguori


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Cam Macdonell
On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivity a...@redhat.com wrote:
 On 03/09/2010 05:27 PM, Cam Macdonell wrote:


  Registers are used
 for synchronization between guests sharing the same memory object when
 interrupts are supported (this requires using the shared memory server).



 How does the driver detect whether interrupts are supported or not?


 At the moment, the VM ID is set to -1 if interrupts aren't supported,
 but that may not be the clearest way to do things.  With UIO is there
 a way to detect if the interrupt pin is on?


 I suggest not designing the device to uio.  Make it a good guest-independent
 device, and if uio doesn't fit it, change it.

 Why not support interrupts unconditionally?  Is the device useful without
 interrupts?

Currently my patch works with or without the shared memory server.  If
you give the parameter

-ivshmem 256,foo

then this will create (if necessary) and map /dev/shm/foo as the
shared region without interrupt support.  Some users of shared memory
are using it this way.

Going forward we can require the shared memory server and always have
interrupts enabled.


 The Doorbell register is 16-bits, but is treated as two 8-bit values.  The
 upper 8-bits are used for the destination VM ID.  The lower 8-bits are the
 value which will be written to the destination VM and what the guest status
 register will be set to when the interrupt is triggered in the destination
 guest.



 What happens when two interrupts are sent back-to-back to the same guest?
  Will the first status value be lost?


 Right now, it would be.  I believe that eventfd has a counting
 semaphore option, that could prevent loss of status (but limits what
 the status could be).


 It only counts the number of interrupts (and kvm will coalesce them anyway).

Right.


 My understanding of uio_pci interrupt handling
 is fairly new, but we could have the uio driver store the interrupt
 statuses to avoid losing them.


 There's nowhere to store them if we use ioeventfd/irqfd.  I think it's both
 easier and more efficient to leave this to the application (to store into
 shared memory).

Agreed.

Cam


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Avi Kivity

On 03/06/2010 01:52 AM, Cam Macdonell wrote:

Support an inter-vm shared memory device that maps a shared-memory object
as a PCI device in the guest.  This patch also supports interrupts between
guests by communicating over a unix domain socket.  This patch applies to the
qemu-kvm repository.

This device now creates a qemu character device and sends 1-byte messages to
trigger interrupts.  Writes are triggered by writing to the Doorbell register
on the shared memory PCI device.  The lower 8-bits of the value written to this
register are sent as the 1-byte message so different meanings of interrupts can
be supported.

Interrupts are supported between multiple VMs by using a shared memory server

-ivshmem size in MB,[unix:path][file]

Interrupts can also be used between host and guest by implementing a
listener on the host that talks to the shared memory server.  The shared memory
server passes file descriptors for the shared memory object and eventfds (our
interrupt mechanism) to the respective qemu instances.

Can you provide a spec that describes the device?  This would be useful 
for maintaining the code, writing guest drivers, and as a framework for 
review.





Re: [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Cam Macdonell
On Mon, Mar 8, 2010 at 2:56 AM, Avi Kivity a...@redhat.com wrote:
 On 03/06/2010 01:52 AM, Cam Macdonell wrote:

 Support an inter-vm shared memory device that maps a shared-memory object
 as a PCI device in the guest.  This patch also supports interrupts between
 guests by communicating over a unix domain socket.  This patch applies to
 the qemu-kvm repository.

 This device now creates a qemu character device and sends 1-byte messages
 to trigger interrupts.  Interrupts are triggered by writing to the Doorbell
 register on the shared memory PCI device.  The lower 8 bits of the value
 written to this register are sent as the 1-byte message so different
 meanings of interrupts can be supported.

 Interrupts are supported between multiple VMs by using a shared memory
 server

 -ivshmem <size in MB>,[unix:<path>][<file>]

 Interrupts can also be used between host and guest by implementing a
 listener on the host that talks to the shared memory server.  The shared
 memory server passes file descriptors for the shared memory object and
 eventfds (our interrupt mechanism) to the respective qemu instances.



 Can you provide a spec that describes the device?  This would be useful for
 maintaining the code, writing guest drivers, and as a framework for review.

I'm not sure if you want the Qemu command-line part as part of the
spec here, but I've included it for completeness.

Device Specification for Inter-VM shared memory device
---

Qemu Command-line
---

The command-line option for inter-vm shared memory is as follows:

-ivshmem size,[unix:]name

The size argument specifies the size of the shared memory object.  The second
argument specifies either a unix domain socket (when using the unix: prefix) or a
name for the shared memory object.

If a unix domain socket is specified, the guest will receive the shared object
from the shared memory server listening on that socket and will support
interrupts with the other guests using that server.  Each server only serves
one memory object.

If a name is specified on the command line (without 'unix:'), then the guest
will open the POSIX shared memory object with that name (in /dev/shm) and the
specified size.  The guest will NOT support interrupts but the shared memory
object can be shared between multiple guests.
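
As a rough illustration of this second mode, the sketch below opens and maps
such an object the way two cooperating processes on the host might.  The path
and size are made up for the example; the patch's own code is not shown here:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a POSIX shared memory object, which on Linux is simply a file
 * under /dev/shm.  Returns a writable shared mapping of 'size' bytes,
 * or NULL on failure. */
static void *map_shm_object(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps the object alive */
    return p == MAP_FAILED ? NULL : p;
}
```

Two processes (or two qemu instances) mapping the same path with MAP_SHARED
see each other's writes, which is all the no-interrupt mode provides.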

The Inter-VM Shared Memory PCI device
---

BARs

The device supports two BARs.  BAR0 is a 256-byte MMIO region that holds the
device registers, and BAR1 maps the shared memory object from the host.  The
size of BAR1 is specified on the command line and must be a power of 2.
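
Since PCI BAR sizes are always powers of 2, a validity check for the size
argument could look like the following; this helper is illustrative, not
taken from the patch:

```c
#include <stddef.h>

/* True if n is a nonzero power of 2: exactly one bit set, so
 * clearing the lowest set bit (n & (n - 1)) leaves zero. */
static inline int is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```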

Registers

BAR0 currently exposes five 16-bit registers.  Registers are used
for synchronization between guests sharing the same memory object when
interrupts are supported (this requires using the shared memory server).

When using interrupts, VMs communicate with a shared memory server that passes
the shared memory object file descriptor using SCM_RIGHTS.  The server assigns
each VM an ID number and sends this ID number to the Qemu process along with a
series of eventfd file descriptors, one per guest using the shared memory
server.  These eventfds are used to send interrupts between guests.  Each
guest listens on the eventfd corresponding to its own ID and may use the
others to send interrupts to other guests.
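
The SCM_RIGHTS handoff can be sketched as follows.  send_fd/recv_fd are
illustrative helpers, not code from the patch or the server; they pass one
descriptor per message over a unix domain socket:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a unix socket as SCM_RIGHTS
 * ancillary data, alongside a single dummy data byte. */
static int send_fd(int sock, int fd)
{
    char data = 0;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(fd));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a file descriptor sent with send_fd().  Returns the new
 * fd in this process, or -1 on error. */
static int recv_fd(int sock)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(fd));
    return fd;
}
```

The server would use the same mechanism to hand each qemu the shm fd and
one eventfd per peer.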

enum ivshmem_registers {
    IntrMask = 0,
    IntrStatus = 2,
    Doorbell = 4,
    IVPosition = 6,
    IVLiveList = 8
};

The first two registers are the interrupt mask and status registers.
Interrupts are triggered when a message is received on the guest's eventfd from
another VM.  Writing to the 'Doorbell' register is how synchronization messages
are sent to other VMs.

The IVPosition register is read-only and reports the guest's ID number.  The
IVLiveList register is also read-only and reports a bit vector of currently
live VM IDs.

The Doorbell register is 16 bits wide but is treated as two 8-bit values.  The
upper 8 bits hold the destination VM ID.  The lower 8 bits are the value that
will be delivered to the destination VM, and that the destination guest's
status register will be set to when the interrupt is triggered.  A value of
255 in the upper 8 bits triggers a broadcast, sending the message to all
other guests.
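
The packing described above can be sketched with a pair of hypothetical
helpers; the names are mine, not identifiers from the patch:

```c
#include <stdint.h>

#define IVSHMEM_BROADCAST 0xff /* destination 255 = all other guests */

/* Pack a Doorbell write: destination VM ID in the upper 8 bits,
 * the 8-bit message value in the lower 8 bits. */
static inline uint16_t ivshmem_doorbell(uint8_t dest, uint8_t value)
{
    return (uint16_t)((dest << 8) | value);
}

/* True if VM 'id' is marked live in the IVLiveList bit vector. */
static inline int ivshmem_vm_live(uint16_t livelist, unsigned id)
{
    return (livelist >> id) & 1;
}
```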

Cheers,
Cam


 --
 error compiling committee.c: too many arguments to function

