Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-20 Thread Christian Bornträger
On Wednesday 20 May 2009 04:58:38, Rusty Russell wrote:
 On Wed, 20 May 2009 02:21:08 am Cam Macdonell wrote:
  Avi Kivity wrote:
   Christian Bornträger wrote:
   To summarize, Anthony thinks it should use virtio, while I believe
   virtio is useful for exporting guest memory, not for importing host
   memory.

 Yes, precisely.

 But what's it *for*, this shared memory?  Implementing shared memory is
 trivial.  Using it is harder.  For example, inter-guest networking: you'd
 have to copy packets in and out, making it slow as well as losing
 abstraction.

 The only interesting idea I can think of is exposing it to userspace, and
 having that run some protocol across it for fast app <-> app comms.  But if
 that's your plan, you still have a lot of code to write!

 So I guess I'm missing the big picture here?

I can give some insights about shared memory usage in z/VM. z/VM uses so-
called discontiguous saved segments (DCSS) to share memory between guests.
(naming side note:
o discontiguous because these segments can have holes and different access
  rights, e.g. you can build a DCSS that goes from 800M-801M read-only and
  900M-910M exclusive-write.
o segments because the 2nd level of our page tables is called the segment
  table.
 )

z/VM uses these segments for several purposes:
o The monitoring subsystem uses a DCSS to get data from several components
o shared guest kernels: The CMS operating system is built as a bootable DCSS
  (called a named saved segment, NSS). All guests have the same host pages for
  the read-only parts of the CMS kernel. The local data is stored in
  exclusive-write parts of the same NSS. Linux on System z is also capable of
  using this feature (CONFIG_SHARED_KERNEL). The kernel linkage is changed to
  separate the read-only text segment from the other parts with segment-size
  alignment.
o execute-in-place: This is a Linux feature to exploit the DCSS technology.
  The goal is to share identical guest pages without the additional overhead
  of KSM etc. We have a block device driver for DCSS. This block device driver
  supports the direct_access function and therefore allows using the xip
  option of ext2. The idea is to put binaries into a read-only ext2
  filesystem. Whenever an mmap is made on this filesystem, the page is not
  mapped into the page cache. The ptes point into the DCSS memory instead.
  Since the DCSS is demand-paged by the host, no memory is wasted for unused
  parts of the binaries. In case of COW the page is copied as usual. It turned
  out that installations with many similar guests (let's say 400 guests)
  profit in terms of memory savings and quicker application startups (not for
  the first guest, of course). There is a downside: this requires a skilled
  administrator to set up.
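
To make the xip path concrete, here is a minimal userspace sketch; the mount
point, device node and file name are made-up examples (assuming a DCSS block
device mounted with "mount -t ext2 -o ro,xip /dev/dcssblk0 /mnt/xip"):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	/* hypothetical file on the xip-mounted, DCSS-backed filesystem */
	int fd = open("/mnt/xip/bin/example", O_RDONLY);
	struct stat st;
	void *p;

	if (fd < 0 || fstat(fd, &st) < 0) {
		perror("open/fstat");
		return 1;
	}

	/* With xip, the resulting ptes point into the DCSS itself; no
	 * page-cache copy is made, so all guests share the same host
	 * pages until a write triggers COW. */
	p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	printf("mapped %lld bytes at %p\n", (long long)st.st_size, p);
	munmap(p, st.st_size);
	close(fd);
	return 0;
}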

We have also experimented with networking, POSIX shared memory, and shared 
caches via DCSS. Most of these ideas turned out to be not very useful or were 
hard to implement properly.


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-20 Thread Christian Bornträger
On Tuesday 19 May 2009 20:39:24, Anthony Liguori wrote:
 Perhaps something that maps closer to the current add_buf/get_buf API.
 Something like:

 struct iovec *(*map_buf)(struct virtqueue *vq, unsigned int *out_num,
 unsigned int *in_num);
 void (*unmap_buf)(struct virtqueue *vq, struct iovec *iov, unsigned int
 out_num, unsigned int in_num);

 There's symmetry here which is good.  The one bad thing about it is it
 forces certain memory to be read-only and other memory to be
 read-write.  I don't see that as a bad thing though.

 I think we'll need an interface like this to support driver domains too,
 where the backend itself runs in a guest.  To put it another way, in QEMU,
 map_buf == virtqueue_pop and unmap_buf == virtqueue_push.


You are proposing that the guest should define some guest memory to be used as 
shared memory (some kind of replacement), right? This is fine, as long as we 
can _also_ map host memory somewhere else (e.g. after guest memory, above 1TB 
etc.). I definitely want to be able to have a 64MB guest map a 2GB shared 
memory zone. (See my other mail about the execute-in-place via DCSS use case.)


I think we should start to write down some requirements. This will help us get 
a better understanding of the necessary interface. Here are my first ideas:

o allow mapping host shared memory at any place that can be addressed via a PFN
o allow mapping beyond guest storage
o allow replacing guest memory
o read-only and read/write modes
o the driver interface should not depend on hardware-specific stuff (e.g. 
  prefer generic virtio over PCI)

More ideas are welcome.


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-20 Thread François Diakhate
On Wed, May 20, 2009 at 4:58 AM, Rusty Russell ru...@rustcorp.com.au wrote:

 The only interesting idea I can think of is exposing it to userspace, and
 having that run some protocol across it for fast app <-> app comms.  But if
 that's your plan, you still have a lot of code to write!

 So I guess I'm missing the big picture here?

Hello Rusty,

For an example, you may have a look at a paper I wrote last year
on achieving fast MPI-like message passing between guests over
shared memory [1].

For my proof-of-concept implementation, I introduced a virtual device
that allows performing DMA between guests, something for which virtio is
well suited, but also sharing some memory to transfer small messages
efficiently from userspace. To expose this shared memory to guests, I
implemented something quite similar to what Cam is proposing, which
was to expose it as the memory of a PCI device. I think it could be a
useful addition to virtio if it could abstract this.

François

[1] http://hal.archives-ouvertes.fr/docs/00/36/86/22/PDF/vhpc08.pdf


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-20 Thread Christian Bornträger
On Wednesday 20 May 2009 10:45:50, Avi Kivity wrote:
 Christian Bornträger wrote:
  o shared guest kernels: The CMS operating system is built as a bootable
  DCSS (called a named saved segment, NSS). All guests have the same host
  pages for the read-only parts of the CMS kernel. The local data is stored
  in exclusive-write parts of the same NSS. Linux on System z is also
  capable of using this feature (CONFIG_SHARED_KERNEL). The kernel linkage
  is changed to separate the read-only text segment from the other parts
  with segment-size alignment.

 How does patching (smp, kprobes/jprobes, markers/ftrace) work with this?
It does not. :-) 
Because of that, and since most distro kernels are fully modular and kernel 
updates are another problem, this feature is not used very often for Linux. It 
is used heavily in CMS, though.
Actually, we could do COW in the host, but then it is really not worth the 
effort.

  o execute-in-place: This is a Linux feature to exploit the DCSS
  technology. The goal is to share identical guest pages without the
  additional overhead of KSM etc. We have a block device driver for DCSS.
  This block device driver supports the direct_access function and
  therefore allows using the xip option of ext2. The idea is to put
  binaries into a read-only ext2 filesystem. Whenever an mmap is made on
  this filesystem, the page is not mapped into the page cache. The ptes
  point into the DCSS memory instead. Since the DCSS is demand-paged by the
  host, no memory is wasted for unused parts of the binaries. In case of COW
  the page is copied as usual. It turned out that installations with many
  similar guests (let's say 400 guests) profit in terms of memory savings
  and quicker application startups (not for the first guest, of course).
  There is a downside: this requires a skilled administrator to set up.

 ksm might be easier to admin, at the cost of some cpu time.

Yes, KSM is easier and it even finds duplicate data pages.
On the other hand, it only provides memory savings. It does not speed up 
application startup like execute-in-place does (major page faults become minor 
page faults for text pages if the page is already backed by the host).
I am not claiming that KSM is useless. Depending on the scenario you might 
want the one or the other or even both. For typical desktop use, KSM is very 
likely the better approach.



Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-20 Thread Avi Kivity

Christian Bornträger wrote:

On Wednesday 20 May 2009 10:45:50, Avi Kivity wrote:
  

Christian Bornträger wrote:


o shared guest kernels: The CMS operating system is built as a bootable
DCSS (called a named saved segment, NSS). All guests have the same host
pages for the read-only parts of the CMS kernel. The local data is stored
in exclusive-write parts of the same NSS. Linux on System z is also
capable of using this feature (CONFIG_SHARED_KERNEL). The kernel linkage
is changed to separate the read-only text segment from the other parts
with segment-size alignment.
  

How does patching (smp, kprobes/jprobes, markers/ftrace) work with this?

It does not. :-) 
Because of that, and since most distro kernels are fully modular and kernel 
updates are another problem, this feature is not used very often for Linux. It 
is used heavily in CMS, though.
Actually, we could do COW in the host, but then it is really not worth the 
effort.
  


ksm on low throttle would solve all of those problems.


Yes, KSM is easier and it even finds duplicate data pages.
On the other hand, it only provides memory savings. It does not speed up 
application startup like execute-in-place does (major page faults become minor 
page faults for text pages if the page is already backed by the host).
I am not claiming that KSM is useless. Depending on the scenario you might 
want the one or the other or even both. For typical desktop use, KSM is very 
likely the better approach.


If ksm shares pagecache, then doesn't it become effectively XIP?

We could also hook virtio dma to preemptively share pages somehow.

--
error compiling committee.c: too many arguments to function



Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-20 Thread Christian Bornträger
On Wednesday 20 May 2009 11:11:57, Avi Kivity wrote:
  Yes, KSM is easier and it even finds duplicate data pages.
  On the other hand, it only provides memory savings. It does not speed up
  application startup like execute-in-place does (major page faults become
  minor page faults for text pages if the page is already backed by the host).
  I am not claiming that KSM is useless. Depending on the scenario you might
  want the one or the other or even both. For typical desktop use, KSM is
  very likely the better approach.

 If ksm shares pagecache, then doesn't it become effectively XIP?

Not exactly; only for long-running guests with a stable working set. If the 
guest boots up, its page cache is basically empty, but the shared segment is 
populated. It's the startup where xip wins. The same is true for guests with 
quickly changing working sets. 

 We could also hook virtio dma to preemptively share pages somehow.

Yes, that is something to think about. One idea that is used on z/VM by a lot 
of customers is to have a shared read-only disk for /usr that is cached by the 
host.


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-20 Thread Anthony Liguori

Christian Bornträger wrote:

On Tuesday 19 May 2009 20:39:24, Anthony Liguori wrote:
  

Perhaps something that maps closer to the current add_buf/get_buf API.
Something like:

struct iovec *(*map_buf)(struct virtqueue *vq, unsigned int *out_num,
unsigned int *in_num);
void (*unmap_buf)(struct virtqueue *vq, struct iovec *iov, unsigned int
out_num, unsigned int in_num);

There's symmetry here which is good.  The one bad thing about it is it
forces certain memory to be read-only and other memory to be
read-write.  I don't see that as a bad thing though.

I think we'll need an interface like this to support driver domains too,
where the backend itself runs in a guest.  To put it another way, in QEMU,
map_buf == virtqueue_pop and unmap_buf == virtqueue_push.




You are proposing that the guest should define some guest memory to be used as 
shared memory (some kind of replacement), right?


No.  map_buf() returns a mapped region of memory.  Where that memory 
comes from is up to the transport.  It can be the result of an ioremap 
of a PCI BAR.


The model of virtio frontends today is:

o add buffer of guest's memory
o let backend do something with it
o get back buffer of guest's memory

The backend model (as implemented by QEMU) is:

o get buffer of mapped front-end memory
o do something with memory
o give buffer back

For implementing persistent shared memory, you need a vring with enough 
elements to hold all of the shared memory regions at one time.  This 
becomes more practical with indirect scatter/gather entries.
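
As a rough sketch of how a guest driver might use that pair (assuming 
map_buf/unmap_buf were added to struct virtqueue_ops as above; the function, 
the vq and the error handling are made-up details):

/* Sketch only: out_num entries are read-only to the caller, in_num
 * entries are writable, mirroring add_buf's out/in split. */
static int shm_send(struct virtqueue *vq, const void *msg, size_t len)
{
	unsigned int out_num, in_num;
	struct iovec *iov;

	iov = vq->vq_ops->map_buf(vq, &out_num, &in_num);
	if (IS_ERR(iov))
		return PTR_ERR(iov);
	if (in_num < 1 || iov[out_num].iov_len < len) {
		vq->vq_ops->unmap_buf(vq, iov, out_num, in_num);
		return -ENOSPC;
	}

	/* first writable entry carries the message */
	memcpy(iov[out_num].iov_base, msg, len);

	/* A persistent mapping would be kept around; unmapping here
	 * mirrors virtqueue_push on the QEMU side. */
	vq->vq_ops->unmap_buf(vq, iov, out_num, in_num);
	return 0;
}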


Of course, whether vring is used at all is a transport detail.

Regards,

Anthony Liguori


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-19 Thread Christian Bornträger
On Monday 18 May 2009 16:26:15, Avi Kivity wrote:
 Christian Borntraeger wrote:
  Sorry for the late question, but I missed your first version. Is there a
  way to change that code to use virtio instead of PCI? That would allow us
  to use this driver on s390 and maybe other virtio transports.

 Opinion differs.  See the discussion in
 http://article.gmane.org/gmane.comp.emulators.kvm.devel/30119.

 To summarize, Anthony thinks it should use virtio, while I believe
 virtio is useful for exporting guest memory, not for importing host memory.

I think the current virtio interface is not ideal for importing host memory, 
but we can change that. If you look at the dcssblk driver for s390, it allows 
a guest to map shared memory segments via a diagnose (hypercall). This driver 
uses PCI regions to map memory.

My point is that the method to map memory is completely irrelevant; we just 
need something like mmap/shmget between the guest and the host. We could 
define an interface in virtio that can be used by any transport. In the case 
of PCI this could be a simple PCI map operation. 

What do you think about something like: (CCed Rusty)
---
 include/linux/virtio.h |   26 ++
 1 file changed, 26 insertions(+)

Index: linux-2.6/include/linux/virtio.h
===
--- linux-2.6.orig/include/linux/virtio.h
+++ linux-2.6/include/linux/virtio.h
@@ -71,6 +71,31 @@ struct virtqueue_ops {
 };
 
 /**
+ * virtio_device_ops - operations for virtio devices
+ * @map_region: map host buffer at a given address
+ * vdev: the struct virtio_device we're talking about.
+ * addr: The address where the buffer should be mapped (hint only)
+ * length: The length of the mapping
+ * identifier: the token that identifies the host buffer
+ *  Returns the mapping address or an error pointer.
+ * @unmap_region: unmap host buffer from the address
+ * vdev: the struct virtio_device we're talking about.
+ * addr: The address where the buffer is mapped
+ *  Returns 0 on success or an error
+ *
+ * TBD, we might need query etc.
+ */
+struct virtio_device_ops {
+   void * (*map_region)(struct virtio_device *vdev,
+void *addr,
+size_t length,
+int identifier);
+   int (*unmap_region)(struct virtio_device *vdev, void *addr);
+/* we might need query region and other stuff */
+};
+
+
+/**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
  * @dev: underlying device.
@@ -85,6 +110,7 @@ struct virtio_device
struct device dev;
struct virtio_device_id id;
struct virtio_config_ops *config;
+   struct virtio_device_ops *ops;
/* Note that this is a Linux set_bit-style bitmap. */
unsigned long features[1];
void *priv;
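
A quick usage sketch against these proposed ops; the token value, the 2GB 
size and the function names are made-up examples:

/* Sketch only: SHM_TOKEN is a hypothetical identifier naming the host
 * buffer; what it means is up to the transport. */
#define SHM_TOKEN 42

static void *shm;

static int shm_attach(struct virtio_device *vdev)
{
	/* NULL hint: let the transport place the mapping, possibly
	 * beyond the end of guest memory. */
	shm = vdev->ops->map_region(vdev, NULL, 2UL << 30, SHM_TOKEN);
	if (IS_ERR(shm))
		return PTR_ERR(shm);
	return 0;
}

static void shm_detach(struct virtio_device *vdev)
{
	vdev->ops->unmap_region(vdev, shm);
}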





Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-19 Thread Avi Kivity

Christian Bornträger wrote:

To summarize, Anthony thinks it should use virtio, while I believe
virtio is useful for exporting guest memory, not for importing host memory.



I think the current virtio interface is not ideal for importing host memory, 
but we can change that. If you look at the dcssblk driver for s390, it allows 
a guest to map shared memory segments via a diagnose (hypercall). This driver 
uses PCI regions to map memory.


My point is that the method to map memory is completely irrelevant; we just 
need something like mmap/shmget between the guest and the host. We could 
define an interface in virtio that can be used by any transport. In the case 
of PCI this could be a simple PCI map operation. 


What do you think about something like: (CCed Rusty)
  


Exactly.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-19 Thread Cam Macdonell

Avi Kivity wrote:

Christian Bornträger wrote:

To summarize, Anthony thinks it should use virtio, while I believe
virtio is useful for exporting guest memory, not for importing host 
memory.



I think the current virtio interface is not ideal for importing host 
memory, but we can change that. If you look at the dcssblk driver for 
s390, it allows a guest to map shared memory segments via a diagnose 
(hypercall). This driver uses PCI regions to map memory.


My point is that the method to map memory is completely irrelevant; 
we just need something like mmap/shmget between the guest and the 
host. We could define an interface in virtio that can be used by any 
transport. In the case of PCI this could be a simple PCI map operation.

What do you think about something like: (CCed Rusty)
  


Exactly.



Agreed.




Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-19 Thread Anthony Liguori

Christian Bornträger wrote:

On Monday 18 May 2009 16:26:15, Avi Kivity wrote:
  

Christian Borntraeger wrote:


Sorry for the late question, but I missed your first version. Is there a
way to change that code to use virtio instead of PCI? That would allow us
to use this driver on s390 and maybe other virtio transports.
  

Opinion differs.  See the discussion in
http://article.gmane.org/gmane.comp.emulators.kvm.devel/30119.

To summarize, Anthony thinks it should use virtio, while I believe
virtio is useful for exporting guest memory, not for importing host memory.



I think the current virtio interface is not ideal for importing host memory, 
but we can change that. If you look at the dcssblk driver for s390, it allows 
a guest to map shared memory segments via a diagnose (hypercall). This driver 
uses PCI regions to map memory.


My point is that the method to map memory is completely irrelevant; we just 
need something like mmap/shmget between the guest and the host. We could 
define an interface in virtio that can be used by any transport. In the case 
of PCI this could be a simple PCI map operation. 


What do you think about something like: (CCed Rusty)
---
 include/linux/virtio.h |   26 ++
 1 file changed, 26 insertions(+)

Index: linux-2.6/include/linux/virtio.h
===
--- linux-2.6.orig/include/linux/virtio.h
+++ linux-2.6/include/linux/virtio.h
@@ -71,6 +71,31 @@ struct virtqueue_ops {
 };
 
 /**

+ * virtio_device_ops - operations for virtio devices
+ * @map_region: map host buffer at a given address
+ * vdev: the struct virtio_device we're talking about.
+ * addr: The address where the buffer should be mapped (hint only)
+ * length: The length of the mapping
+ * identifier: the token that identifies the host buffer
+ *  Returns the mapping address or an error pointer.
+ * @unmap_region: unmap host buffer from the address
+ * vdev: the struct virtio_device we're talking about.
+ * addr: The address where the buffer is mapped
+ *  Returns 0 on success or an error
+ *
+ * TBD, we might need query etc.
+ */
+struct virtio_device_ops {
+   void * (*map_region)(struct virtio_device *vdev,
+void *addr,
+size_t length,
+int identifier);
+   int (*unmap_region)(struct virtio_device *vdev, void *addr);
+/* we might need query region and other stuff */
+};
  


Perhaps something that maps closer to the current add_buf/get_buf API.  
Something like:


struct iovec *(*map_buf)(struct virtqueue *vq, unsigned int *out_num, 
unsigned int *in_num);
void (*unmap_buf)(struct virtqueue *vq, struct iovec *iov, unsigned int 
out_num, unsigned int in_num);


There's symmetry here which is good.  The one bad thing about it is it 
forces certain memory to be read-only and other memory to be 
read-write.  I don't see that as a bad thing though.


I think we'll need an interface like this to support driver domains too, 
where the backend itself runs in a guest.  To put it another way, in QEMU, 
map_buf == virtqueue_pop and unmap_buf == virtqueue_push.


Regards,

Anthony Liguori


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-19 Thread Rusty Russell
On Wed, 20 May 2009 02:21:08 am Cam Macdonell wrote:
 Avi Kivity wrote:
  Christian Bornträger wrote:
  To summarize, Anthony thinks it should use virtio, while I believe
  virtio is useful for exporting guest memory, not for importing host
  memory.

Yes, precisely.

But what's it *for*, this shared memory?  Implementing shared memory is 
trivial.  Using it is harder.  For example, inter-guest networking: you'd have 
to copy packets in and out, making it slow as well as losing abstraction.

The only interesting idea I can think of is exposing it to userspace, and 
having that run some protocol across it for fast app <-> app comms.  But if 
that's your plan, you still have a lot of code to write!

So I guess I'm missing the big picture here?

Thanks,
Rusty.



Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-18 Thread Christian Borntraeger
On Thursday 07 May 2009 18:26:07, Cam Macdonell wrote:
 Driver for inter-VM shared memory device that now supports interrupts
 between two guests.  The driver defines a counting semaphore and wait_event
 queue for different synchronization needs of users.  Initializing the
 semaphore count, sending interrupts and waiting are implemented via ioctl
 calls. 
...
 +#include <linux/pci.h>

Sorry for the late question, but I missed your first version. Is there a way to 
change that code to use virtio instead of PCI? That would allow us to use this 
driver on s390 and maybe other virtio transports.

Christian



Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-18 Thread Avi Kivity

Christian Borntraeger wrote:

Sorry for the late question, but I missed your first version. Is there a way to 
change that code to use virtio instead of PCI? That would allow us to use this 
driver on s390 and maybe other virtio transports.
  


Opinion differs.  See the discussion in 
http://article.gmane.org/gmane.comp.emulators.kvm.devel/30119.


To summarize, Anthony thinks it should use virtio, while I believe 
virtio is useful for exporting guest memory, not for importing host memory.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-18 Thread Cam Macdonell

Christian Borntraeger wrote:

On Thursday 07 May 2009 18:26:07, Cam Macdonell wrote:

Driver for inter-VM shared memory device that now supports interrupts
between two guests.  The driver defines a counting semaphore and wait_event
queue for different synchronization needs of users.  Initializing the
semaphore count, sending interrupts and waiting are implemented via ioctl
calls. 

...

+#include <linux/pci.h>


Sorry for the late question, but I missed your first version. Is there a way to 
change that code to use virtio instead of PCI? That would allow us to use this 
driver on s390 and maybe other virtio transports.

Christian



Forgive my s390 ignorance, but is there a device interface in s390 that 
can export memory and support interrupts?  I'm not opposed to virtio, 
but I like the simplicity of the PCI approach, as well as having the 
shared memory be external to any particular VM.  The current 
approach uses a shared memory object on the host as the memory 
the VMs share.


Cam




[PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-07 Thread Cam Macdonell
Driver for inter-VM shared memory device that now supports interrupts between 
two guests.  The driver defines a counting semaphore and wait_event queue for 
different synchronization needs of users.  Initializing the semaphore count, 
sending interrupts and waiting are implemented via ioctl calls. 

The synchronization mechanisms are simple and rely on existing kernel 
primitives, but I think they're flexible enough for synchronization between 
guests.  I'm contemplating more complicated designs that would use the shared 
memory to store synchronization variables, but thought I would get this 
initial patch out to get some feedback. 
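
For context, here is a hedged userspace sketch of how a guest application 
might drive the device; the device node name, the mapping size and the ioctl 
argument meanings are my assumptions, while the command values mirror the 
ivshmem_ioctl enum in the patch:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* mirrors enum ivshmem_ioctl in the driver below */
enum { set_sema, down_sema, sema_irq, wait_event, wait_event_irq };

int main(void)
{
	int fd = open("/dev/kvm_ivshmem", O_RDWR);	/* hypothetical node */
	char *shm;

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* map the shared region exported through the PCI BAR */
	shm = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (shm == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	ioctl(fd, set_sema, 1);		/* initialize the semaphore count */
	strcpy(shm, "hello, other guest");
	ioctl(fd, sema_irq, 0);		/* interrupt the peer guest */
	ioctl(fd, down_sema, 0);	/* block until the peer signals back */
	printf("peer wrote: %s\n", shm);
	return 0;
}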

Cheers,
Cam 

---
 drivers/char/Kconfig   |8 +
 drivers/char/Makefile  |2 +
 drivers/char/kvm_ivshmem.c |  430 
 3 files changed, 440 insertions(+), 0 deletions(-)
 create mode 100644 drivers/char/kvm_ivshmem.c

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 735bbe2..afa7cb8 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -1099,6 +1099,14 @@ config DEVPORT
depends on ISA || PCI
default y
 
+config KVM_IVSHMEM
+	tristate "Inter-VM Shared Memory Device"
+	depends on PCI
+	default m
+	help
+	  This device maps a region of shared memory between the host OS and any
+	  number of virtual machines.
+
 source "drivers/s390/char/Kconfig"
 
 endmenu
diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index 9caf5b5..021f06b 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -111,6 +111,8 @@ obj-$(CONFIG_PS3_FLASH) += ps3flash.o
 obj-$(CONFIG_JS_RTC)   += js-rtc.o
 js-rtc-y = rtc.o
 
+obj-$(CONFIG_KVM_IVSHMEM)  += kvm_ivshmem.o
+
 # Files generated that shall be removed upon make clean
 clean-files := consolemap_deftbl.c defkeymap.c
 
diff --git a/drivers/char/kvm_ivshmem.c b/drivers/char/kvm_ivshmem.c
new file mode 100644
index 000..a20a224
--- /dev/null
+++ b/drivers/char/kvm_ivshmem.c
@@ -0,0 +1,430 @@
+/*
+ * drivers/char/kvm_ivshmem.c - driver for KVM Inter-VM shared memory
+ * PCI device
+ *
+ * Copyright 2009 Cam Macdonell c...@cs.ualberta.ca
+ *
+ * Based on cirrusfb.c and 8139cp.c:
+ * Copyright 1999-2001 Jeff Garzik
+ * Copyright 2001-2004 Jeff Garzik
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+#include <linux/smp_lock.h>
+#include <asm/uaccess.h>
+#include <linux/interrupt.h>
+#include <linux/mutex.h>
+
+#define TRUE 1
+#define FALSE 0
+#define KVM_IVSHMEM_DEVICE_MINOR_NUM 0
+
+enum {
+	/* KVM Inter-VM shared memory device register offsets */
+	IntrMask   = 0x00,	/* Interrupt Mask */
+	IntrStatus = 0x10,	/* Interrupt Status */
+	Doorbell   = 0x20,	/* Doorbell */
+	ShmOK      = 1		/* Everything is OK */
+};
+
+typedef struct kvm_ivshmem_device {
+	void __iomem *regs;	/* mapped register BAR */
+
+	void *base_addr;	/* mapped shared-memory BAR */
+
+	unsigned int regaddr;	/* bus address of the register BAR */
+	unsigned int reg_size;	/* size of the register BAR */
+
+	unsigned int ioaddr;	/* bus address of the shared-memory BAR */
+	unsigned int ioaddr_size;	/* size of the shared-memory BAR */
+	unsigned int irq;	/* assigned interrupt line */
+
+	bool enabled;
+
+} kvm_ivshmem_device;
+
+static int event_num;
+static struct semaphore sema;
+static wait_queue_head_t wait_queue;
+
+static kvm_ivshmem_device kvm_ivshmem_dev;
+
+static int device_major_nr;
+
+static int kvm_ivshmem_ioctl(struct inode *, struct file *, unsigned int,
+		unsigned long);
+static int kvm_ivshmem_mmap(struct file *, struct vm_area_struct *);
+static int kvm_ivshmem_open(struct inode *, struct file *);
+static int kvm_ivshmem_release(struct inode *, struct file *);
+static ssize_t kvm_ivshmem_read(struct file *, char *, size_t, loff_t *);
+static ssize_t kvm_ivshmem_write(struct file *, const char *, size_t,
+		loff_t *);
+static loff_t kvm_ivshmem_lseek(struct file * filp, loff_t offset, int origin);
+
+enum ivshmem_ioctl { set_sema, down_sema, sema_irq, wait_event,
+		     wait_event_irq };
+
+static const struct file_operations kvm_ivshmem_ops = {
+	.owner   = THIS_MODULE,
+	.open    = kvm_ivshmem_open,
+	.mmap    = kvm_ivshmem_mmap,
+	.read    = kvm_ivshmem_read,
+	.ioctl   = kvm_ivshmem_ioctl,
+	.write   = kvm_ivshmem_write,
+	.llseek  = kvm_ivshmem_lseek,
+	.release = kvm_ivshmem_release,
+};
+
+static struct pci_device_id kvm_ivshmem_id_table[] = {
+	{ 0x1af4, 0x1110, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
+	{ 0 },
+};
+MODULE_DEVICE_TABLE (pci, kvm_ivshmem_id_table);
+
+static void kvm_ivshmem_remove_device(struct pci_dev* pdev);
+static int kvm_ivshmem_probe_device (struct pci_dev *pdev,
+				     const struct pci_device_id * ent);
+
+static struct pci_driver kvm_ivshmem_pci_driver = {
+	.name     = "kvm-shmem",
+	.id_table = kvm_ivshmem_id_table,
+	.probe    = kvm_ivshmem_probe_device,
+	.remove   = kvm_ivshmem_remove_device,
+};
+
+static int kvm_ivshmem_ioctl(struct inode * ino, struct file * filp,
+unsigned int cmd,