Re: [Qemu-devel] Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-26 Thread Cam Macdonell
On Thu, Mar 25, 2010 at 7:32 PM, Jamie Lokier ja...@shareable.org wrote:
 Cam Macdonell wrote:
 An irqfd can only trigger a single vector in a guest.  Right now I
 only have one eventfd per guest.  So ioeventfd/irqfd restricts the
 current implementation to a single vector that a guest can trigger.
 Without irqfd, eventfds can be used like registers: a guest writes the
 number of the vector it wants to trigger, but as you point out it is racy.

 It's not racy if you use a pipe instead of eventfd. :-)

 Actually, why not?  A byte pipe between guests would be more versatile.

A pipe between guests would be quite versatile; however, it's a design
orthogonal to shared memory.  The shared memory is how data should be
shared/communicated if someone is using this device, and the interrupts
are there to help with synchronization.


 Could it even integrate with virtio-serial, somehow?


Could virtio-serial be used as is to support a pipe between guests?

If a user wanted shared memory and a pipe, then they could setup two
devices and use them together.

Cam


Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-26 Thread Avi Kivity

On 03/26/2010 01:05 AM, Cam Macdonell wrote:



I meant a unicast doorbell: 16 bits for guest ID, 16 bits for vector number.
 

Ah, yes.  Who knew "two bit registers" was an ambiguous term.  Do you
strongly prefer the one-doorbell design?
   


Just floating out ideas.  An advantage is that it conserves register
space; this is important if we use PIO.  For MMIO this isn't so important.




Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Avi Kivity

On 03/25/2010 08:08 AM, Cam Macdonell wrote:

Support an inter-VM shared memory device that maps a shared-memory object
as a PCI device in the guest.  This patch also supports interrupts between
guests by communicating over a unix domain socket.  This patch applies to the
qemu-kvm repository.

Changes in this version are using the qdev format and optional use of MSI and
ioeventfd/irqfd.

The non-interrupt version is supported by passing the shm parameter

 -device ivshmem,size=<size in MB>,[shm=<shm name>]

which will simply map the shm object into a BAR.
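
For example, a hypothetical invocation (the size and shared memory object
name here are illustrative, not defaults):

 qemu-system-x86_64 ... -device ivshmem,size=16,shm=ivshmem_shm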

Interrupts are supported between multiple VMs by using a shared memory server
that is connected to via a socket character device

 -device ivshmem,size=<size in MB>[,chardev=<chardev name>][,irqfd=on]
 [,msi=on][,nvectors=n]
 -chardev socket,path=<path>,id=<chardev name>

The server passes file descriptors for the shared memory object and eventfds 
(our
interrupt mechanism) to the respective qemu instances.
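
As a concrete (illustrative) example, each qemu instance sharing a 16 MB
region through a server socket might be started with:

 -chardev socket,path=/tmp/ivshmem_socket,id=ivshmem_chardev
 -device ivshmem,size=16,chardev=ivshmem_chardev,msi=on,nvectors=4

where the socket path and chardev id are placeholders for whatever the
shared memory server was started with.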

When using interrupts, VMs communicate with a shared memory server that passes
the shared memory object file descriptor using SCM_RIGHTS.  The server assigns
each VM an ID number and sends this ID number to the Qemu process along with a
series of eventfd file descriptors, one per guest using the shared memory
server.  These eventfds will be used to send interrupts between guests.  Each
guest listens on the eventfd corresponding to their ID and may use the others
for sending interrupts to other guests.
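
For illustration, the receiving side of this SCM_RIGHTS handoff reduces to
a recvmsg() carrying ancillary data.  This is a minimal sketch, not code
from the patch; the function and variable names are made up:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Receive one fd (shm object or eventfd) plus a long of payload
 * (e.g. the assigned VM ID) from the shared memory server. */
static int recv_fd(int sock, long *payload)
{
    struct msghdr msg;
    struct iovec iov;
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = payload;
    iov.iov_len = sizeof(*payload);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
            break;
        }
    }
    return fd;
}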
   


Please put the spec somewhere publicly accessible with a permanent URL.  
I suggest a new qemu.git directory specs/.  It's more important than the 
code IMO.



enum ivshmem_registers {
 IntrMask = 0,
 IntrStatus = 4,
 IVPosition = 8,
 Doorbell = 12
};

The first two registers are the interrupt mask and status registers.  Mask and
status are only used with pin-based interrupts.  They are unused with MSI
interrupts.  The IVPosition register is read-only and reports the guest's ID
number.  Interrupts are triggered when a message is received on the guest's
eventfd from another VM.  To trigger an event, a guest must write to another
guest's Doorbell.  The Doorbells begin at offset 12.  A particular guest's
doorbell offset in the MMIO region is equal to

guest_id * 32 + Doorbell

The doorbell register for each guest is 32 bits.  The doorbell-per-guest
design was motivated by its intended use with ioeventfd.
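
To make the offset formula concrete, here is a sketch of how a guest-side
program (e.g. through a UIO mapping of the register BAR) would ring a
peer's doorbell.  The register base pointer is assumed to be already
mmap()ed; this is illustrative, not driver code from the patch:

#include <stdint.h>

#define IVSHMEM_DOORBELL 12   /* Doorbell from the register enum above */

/* Write 'val' to the doorbell of 'guest_id'; the byte offset follows
 * the spec, guest_id * 32 + Doorbell, divided by 4 for a uint32_t
 * index. */
static void ring_doorbell(volatile uint32_t *regs, int guest_id, uint32_t val)
{
    regs[(guest_id * 32 + IVSHMEM_DOORBELL) / 4] = val;
}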
   


You can also use a single doorbell register with ioeventfd, as it can 
match against the data written.  If you go this route, you'd have two 
doorbells, one where you write a guest ID to send an interrupt to that 
guest, and one where any write generates a multicast.
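
For reference, a host-side sketch of that datamatch idea using the
KVM_IOEVENTFD ioctl: one eventfd is registered per value of interest at
the single doorbell address.  Error handling is omitted and the names
are illustrative:

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

/* Returns an eventfd that becomes readable only when the guest writes
 * exactly 'value' to the 4-byte doorbell at guest physical address
 * 'doorbell_gpa'. */
static int watch_doorbell_value(int vm_fd, uint64_t doorbell_gpa,
                                uint64_t value)
{
    int efd = eventfd(0, 0);
    struct kvm_ioeventfd ioefd = {
        .datamatch = value,
        .addr      = doorbell_gpa,
        .len       = 4,
        .fd        = efd,
        .flags     = KVM_IOEVENTFD_FLAG_DATAMATCH,
    };
    ioctl(vm_fd, KVM_IOEVENTFD, &ioefd);
    return efd;
}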


Possible later extensions:
- multiple doorbells that trigger different vectors
- multicast doorbells


The semantics of the value written to the doorbell depends on whether the
device is using MSI or a regular pin-based interrupt.
   


I recommend against making the semantics interrupt-style dependent.  It 
means the application needs to know whether MSI is in use or not, while 
it is generally the OS that is in control of that.



Regular Interrupts
------------------

If regular interrupts are used (due to either a guest not supporting MSI or the
user specifying not to use them on the command-line) then the value written to
a guest's doorbell is what the guest's status register will be set to.

A status of (2^32 - 1) indicates that a new guest has joined.  Guests
should not send a message of this value for any other reason.

Message Signaled Interrupts
---------------------------

The important thing to remember with MSI is that it is only a signal, no
status is set (since MSI interrupts are not shared).  All information other
than the interrupt itself should be communicated via the shared memory region.
MSI is on by default.  It can be turned off with the msi=off parameter.
   



If the device uses MSI then the value written to the doorbell is the MSI vector
that will be raised.  Vector 0 is used to notify that a new guest has joined.
Vector 0 cannot be triggered by another guest since a value of 0 does not
trigger an eventfd.
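
That property is easy to demonstrate from userspace; in this small
stand-alone test, adding 0 to an eventfd's counter leaves it unreadable,
while any non-zero write makes poll() report it ready:

#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    uint64_t v = 0;
    struct pollfd p = { .fd = efd, .events = POLLIN };

    write(efd, &v, sizeof(v));      /* adds 0: counter stays at 0 */
    printf("readable after writing 0: %d\n", poll(&p, 1, 0));  /* 0 */

    v = 3;
    write(efd, &v, sizeof(v));      /* counter becomes 3 */
    printf("readable after writing 3: %d\n", poll(&p, 1, 0));  /* 1 */

    close(efd);
    return 0;
}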
   


Ah, looks like we approached the vector/guest matrix from different 
directions.



ioeventfd/irqfd
---------------

ioeventfd/irqfd is turned on by irqfd=on passed to the device parameter (it is
off by default).  When using ioeventfd/irqfd the only interrupt value that can
be passed to another guest is 1, regardless of what value is written to a
guest's Doorbell.
   


ioeventfd/irqfd are an implementation detail.  The spec should not 
depend on it.  It needs to be written as if qemu and kvm do not exist.  
Again, I recommend Rusty's virtio-pci for inspiration.


Applications should see exactly the same thing whether ioeventfd is 
enabled or not.



Sample programs, init scripts and the shared memory server are available in a
git repo here:

 www.gitorious.org/nahanni

Cam Macdonell (2):
   Support adding a file to 

Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Michael S. Tsirkin
On Thu, Mar 25, 2010 at 11:04:54AM +0200, Avi Kivity wrote:
 Again, I recommend Rusty's virtio-pci for inspiration.

Not just inspiration, how about building on virtio-pci?

-- 
MST


Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Markus Armbruster
Avi Kivity a...@redhat.com writes:

 Please put the spec somewhere publicly accessible with a permanent
 URL.  I suggest a new qemu.git directory specs/.  It's more important
 than the code IMO.

What about docs/?  It already exists.

[...]


Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Avi Kivity

On 03/25/2010 11:26 AM, Markus Armbruster wrote:

Avi Kivity a...@redhat.com writes:

Please put the spec somewhere publicly accessible with a permanent
URL.  I suggest a new qemu.git directory specs/.  It's more important
than the code IMO.

What about docs/?  It already exists.


docs/ would be internal qemu documentation.  I want it to be clear this 
is external documentation.


docs/specs/ would work for that.



Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Cam Macdonell
On Thu, Mar 25, 2010 at 3:21 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Mar 25, 2010 at 11:04:54AM +0200, Avi Kivity wrote:
 Again, I recommend Rusty's virtio-pci for inspiration.

 Not just inspiration, how about building on virtio-pci?

Virtio was discussed at some length last year on a previous version.
I did implement a virtio version that extended virtio to use memory
regions for importing host memory into a guest.

http://www.mail-archive.com/qemu-de...@nongnu.org/msg25784.html

However, Anthony thought the memory regions broke the DMA engine model
of virtio too much and so I went back to PCI.

Cam


 --
 MST




Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Cam Macdonell
On Thu, Mar 25, 2010 at 3:04 AM, Avi Kivity a...@redhat.com wrote:
 On 03/25/2010 08:08 AM, Cam Macdonell wrote:

 Support an inter-VM shared memory device that maps a shared-memory object
 as a PCI device in the guest.  This patch also supports interrupts between
 guests by communicating over a unix domain socket.  This patch applies to
 the
 qemu-kvm repository.

 Changes in this version are using the qdev format and optional use of MSI
 and
 ioeventfd/irqfd.

 The non-interrupt version is supported by passing the shm parameter

     -device ivshmem,size=<size in MB>,[shm=<shm name>]

 which will simply map the shm object into a BAR.

 Interrupts are supported between multiple VMs by using a shared memory
 server
 that is connected to with a socket character device

     -device ivshmem,size=<size in MB>[,chardev=<chardev name>][,irqfd=on]
             [,msi=on][,nvectors=n]
     -chardev socket,path=<path>,id=<chardev name>

 The server passes file descriptors for the shared memory object and
 eventfds (our
 interrupt mechanism) to the respective qemu instances.

 When using interrupts, VMs communicate with a shared memory server that
 passes
 the shared memory object file descriptor using SCM_RIGHTS.  The server
 assigns
 each VM an ID number and sends this ID number to the Qemu process along
 with a
 series of eventfd file descriptors, one per guest using the shared memory
 server.  These eventfds will be used to send interrupts between guests.
  Each
 guest listens on the eventfd corresponding to their ID and may use the
 others
 for sending interrupts to other guests.


 Please put the spec somewhere publicly accessible with a permanent URL.  I
 suggest a new qemu.git directory specs/.  It's more important than the code
 IMO.

Sorry to be pedantic, but do you want a URL, or the spec as part of a patch
that adds it as a file in qemu.git/docs/specs/?


 enum ivshmem_registers {
     IntrMask = 0,
     IntrStatus = 4,
     IVPosition = 8,
     Doorbell = 12
 };

 The first two registers are the interrupt mask and status registers.  Mask
 and
 status are only used with pin-based interrupts.  They are unused with MSI
 interrupts.  The IVPosition register is read-only and reports the guest's
 ID
 number.  Interrupts are triggered when a message is received on the
 guest's
 eventfd from another VM.  To trigger an event, a guest must write to
 another
 guest's Doorbell.  The Doorbells begin at offset 12.  A particular
 guest's
 doorbell offset in the MMIO region is equal to

 guest_id * 32 + Doorbell

 The doorbell register for each guest is 32 bits.  The doorbell-per-guest
 design was motivated by its intended use with ioeventfd.


 You can also use a single doorbell register with ioeventfd, as it can match
 against the data written.  If you go this route, you'd have two doorbells,
 one where you write a guest ID to send an interrupt to that guest, and one
 where any write generates a multicast.

I thought of using the datamatch.


 Possible later extensions:
 - multiple doorbells that trigger different vectors
 - multicast doorbells

Since the doorbells are exposed, the multicast could be done by the
driver.  If multicast is handled by qemu, then we have different
behaviour when using ioeventfd/irqfd since only one eventfd can be
triggered by a write.


 The semantics of the value written to the doorbell depends on whether the
 device is using MSI or a regular pin-based interrupt.


 I recommend against making the semantics interrupt-style dependent.  It
 means the application needs to know whether MSI is in use or not, while it
 is generally the OS that is in control of that.

It is basically the use of the status register that is the difference.
The application view of what is happening doesn't need to change,
especially with UIO: write to doorbells, block on read until an interrupt
arrives.  In the MSI case I could set the status register to the
vector that is received and then the two would be equivalent from the view
of the application.  But, if future MSI support in UIO allows MSI
information (such as vector number) to be accessible in userspace,
then applications would become MSI dependent anyway.


 Regular Interrupts
 --

 If regular interrupts are used (due to either a guest not supporting MSI
 or the
 user specifying not to use them on the command-line) then the value
 written to
 a guest's doorbell is what the guest's status register will be set to.

 A status of (2^32 - 1) indicates that a new guest has joined.  Guests
 should not send a message of this value for any other reason.

 Message Signaled Interrupts
 ---------------------------

 The important thing to remember with MSI is that it is only a signal, no
 status is set (since MSI interrupts are not shared).  All information
 other
 than the interrupt itself should be communicated via the shared memory
 region.
 MSI is on by default.  It can be turned off with the msi=off parameter.


 If the device uses MSI then the value written to the 

Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Avi Kivity

On 03/25/2010 06:50 PM, Cam Macdonell wrote:



Please put the spec somewhere publicly accessible with a permanent URL.  I
suggest a new qemu.git directory specs/.  It's more important than the code
IMO.
 

Sorry to be pedantic, but do you want a URL, or the spec as part of a patch
that adds it as a file in qemu.git/docs/specs/?
   


I leave it up to you.  If you are up to hosting it independently, then
just post a URL as part of the patch.  Otherwise, I'm sure qemu.git will
be more than happy to be the official repository for the memory sharing
device specification.  In that case, make the spec the first patch
in the series.



Possible later extensions:
- multiple doorbells that trigger different vectors
- multicast doorbells
 

Since the doorbells are exposed, the multicast could be done by the
driver.  If multicast is handled by qemu, then we have different
behaviour when using ioeventfd/irqfd since only one eventfd can be
triggered by a write.
   


Multicast by the driver would require one exit per guest signalled.  
Multicast by the shared memory server needs one exit to signal an 
eventfd, then the shared memory server signals the irqfds of all members 
of the multicast group.



The semantics of the value written to the doorbell depends on whether the
device is using MSI or a regular pin-based interrupt.

   

I recommend against making the semantics interrupt-style dependent.  It
means the application needs to know whether MSI is in use or not, while it
is generally the OS that is in control of that.
 

It is basically the use of the status register that is the difference.
The application view of what is happening doesn't need to change,
especially with UIO: write to doorbells, block on read until an interrupt
arrives.  In the MSI case I could set the status register to the
vector that is received and then the two would be equivalent from the view
of the application.  But, if future MSI support in UIO allows MSI
information (such as vector number) to be accessible in userspace,
then applications would become MSI dependent anyway.
   


Ah, I see.  You adjusted for the different behaviours in the driver.

Still I recommend dropping the status register: this allows single-MSI
and PIRQ to behave the same way.  Also it is racy: if two guests signal
a third, they will overwrite each other's status.



ioeventfd/irqfd are an implementation detail.  The spec should not depend on
it.  It needs to be written as if qemu and kvm do not exist.  Again, I
recommend Rusty's virtio-pci for inspiration.

Applications should see exactly the same thing whether ioeventfd is enabled
or not.
 

The challenge I recently encountered with this is one line in the
ioeventfd implementation, from kvm/virt/kvm/eventfd.c:

/* MMIO/PIO writes trigger an event if the addr/val match */
static int
ioeventfd_write(struct kvm_io_device *this, gpa_t addr, int len,
                const void *val)
{
        struct _ioeventfd *p = to_ioeventfd(this);

        if (!ioeventfd_in_range(p, addr, len, val))
                return -EOPNOTSUPP;

        eventfd_signal(p->eventfd, 1);
        return 0;
}

IIUC, no matter what value is written to an ioeventfd by a guest, a
value of 1 is written.  So ioeventfds work differently from eventfds.
Can we add a multivalue flag to ioeventfds so that the value that
the guest writes is written to the eventfd?
   


Eventfd values are a counter, not a register.  A read() on the other 
side returns the sum of all write()s (or eventfd_signal()s).  In the 
context of irqfd it just means the number of interrupts we coalesced.


Multivalue was considered at one time for a different need and 
rejected.  Really, to solve the race you need a queue, and that can only 
be done in the shared memory segment using locked instructions.
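
The counter behaviour is easy to see in a stand-alone test: two writes
of different "vector numbers" are read back as a single sum, which is
exactly why distinct values from two signalling guests get lost:

#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
    int efd = eventfd(0, 0);
    uint64_t v;

    v = 2; write(efd, &v, sizeof(v));   /* guest A "writes vector 2" */
    v = 3; write(efd, &v, sizeof(v));   /* guest B "writes vector 3" */

    read(efd, &v, sizeof(v));           /* counter is drained in one read */
    printf("read back: %llu\n", (unsigned long long)v);  /* 5, not 2 then 3 */

    close(efd);
    return 0;
}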




Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Cam Macdonell
On Thu, Mar 25, 2010 at 11:02 AM, Avi Kivity a...@redhat.com wrote:
 On 03/25/2010 06:50 PM, Cam Macdonell wrote:

 Please put the spec somewhere publicly accessible with a permanent URL.
  I
 suggest a new qemu.git directory specs/.  It's more important than the
 code
 IMO.


 Sorry to be pedantic, but do you want a URL, or the spec as part of a patch
 that adds it as a file in qemu.git/docs/specs/?


 I leave it up to you.  If you are up to hosting it independently, then just
 post a URL as part of the patch.  Otherwise, I'm sure qemu.git will be more
 than happy to be the official repository for the memory sharing device
 specification.  In that case, make the spec the first patch in the
 series.

Ok, I'll send it as part of the series; that way people can comment
inline easily.


 Possible later extensions:
 - multiple doorbells that trigger different vectors
 - multicast doorbells


 Since the doorbells are exposed, the multicast could be done by the
 driver.  If multicast is handled by qemu, then we have different
 behaviour when using ioeventfd/irqfd since only one eventfd can be
 triggered by a write.


 Multicast by the driver would require one exit per guest signalled.
  Multicast by the shared memory server needs one exit to signal an eventfd,
 then the shared memory server signals the irqfds of all members of the
 multicast group.

 The semantics of the value written to the doorbell depends on whether
 the
 device is using MSI or a regular pin-based interrupt.



 I recommend against making the semantics interrupt-style dependent.  It
 means the application needs to know whether MSI is in use or not, while
 it
 is generally the OS that is in control of that.


 It is basically the use of the status register that is the difference.
 The application view of what is happening doesn't need to change,
 especially with UIO: write to doorbells, block on read until an interrupt
 arrives.  In the MSI case I could set the status register to the
 vector that is received and then the two would be equivalent from the view
 of the application.  But, if future MSI support in UIO allows MSI
 information (such as vector number) to be accessible in userspace,
 then applications would become MSI dependent anyway.


 Ah, I see.  You adjusted for the different behaviours in the driver.

 Still I recommend dropping the status register: this allows single-MSI and
 PIRQ to behave the same way.  Also it is racy: if two guests signal a third,
 they will overwrite each other's status.

With shared interrupts with PIRQ and no status register, how does a
device know whether it generated the interrupt?


 ioeventfd/irqfd are an implementation detail.  The spec should not depend
 on
 it.  It needs to be written as if qemu and kvm do not exist.  Again, I
 recommend Rusty's virtio-pci for inspiration.

 Applications should see exactly the same thing whether ioeventfd is
 enabled
 or not.


 The challenge I recently encountered with this is one line in the
 eventfd implementation

 from kvm/virt/kvm/eventfd.c

 /* MMIO/PIO writes trigger an event if the addr/val match */
 static int
 ioeventfd_write(struct kvm_io_device *this, gpa_t addr, int len,
                 const void *val)
 {
         struct _ioeventfd *p = to_ioeventfd(this);

         if (!ioeventfd_in_range(p, addr, len, val))
                 return -EOPNOTSUPP;

         eventfd_signal(p->eventfd, 1);
         return 0;
 }

 IIUC, no matter what value is written to an ioeventfd by a guest, a
 value of 1 is written.  So ioeventfds work differently from eventfds.
 Can we add a multivalue flag to ioeventfds so that the value that
 the guest writes is written to the eventfd?


 Eventfd values are a counter, not a register.  A read() on the other side
 returns the sum of all write()s (or eventfd_signal()s).  In the context of
 irqfd it just means the number of interrupts we coalesced.

 Multivalue was considered at one time for a different need and rejected.
  Really, to solve the race you need a queue, and that can only be done in
 the shared memory segment using locked instructions.

I had a hunch it had probably been considered.  That explains why irqfd
doesn't have a datamatch field.  I guess supporting multiple MSI
vectors with one doorbell per guest isn't possible if only 1 bit of
information can be communicated.

So, ioeventfd/irqfd restricts MSI to 1 vector between guests.  Should
multi-MSI even be supported then in the non-ioeventfd/irqfd case?
Otherwise ioeventfd/irqfd become more than an implementation detail.




Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Avi Kivity

On 03/25/2010 07:35 PM, Cam Macdonell wrote:



Ah, I see.  You adjusted for the different behaviours in the driver.

Still I recommend dropping the status register: this allows single-MSI and
PIRQ to behave the same way.  Also it is racy: if two guests signal a third,
they will overwrite each other's status.
 

With shared interrupts with PIRQ and no status register, how does a
device know whether it generated the interrupt?
   


Right, you need a status register.  Just don't add any more information, 
since MSI cannot carry any data.



Eventfd values are a counter, not a register.  A read() on the other side
returns the sum of all write()s (or eventfd_signal()s).  In the context of
irqfd it just means the number of interrupts we coalesced.

Multivalue was considered at one time for a different need and rejected.
  Really, to solve the race you need a queue, and that can only be done in
the shared memory segment using locked instructions.
 

I had a hunch it had probably been considered.  That explains why irqfd
doesn't have a datamatch field.  I guess supporting multiple MSI
vectors with one doorbell per guest isn't possible if only 1 bit of
information can be communicated.
   


Actually you can have one doorbell supporting multiple vectors and 
guests, simply divide the data value into two bit fields, one for the 
vector and one for the guest.  A single write gets both values into the 
host, which can then use datamatch to trigger the correct eventfd (which 
is wired to an irqfd in another guest).
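
A sketch of that encoding (the 16/16 split below matches the unicast
doorbell Avi clarifies later in the thread; the helper name is made up):

#include <stdint.h>

/* Pack target guest ID and vector number into one 32-bit doorbell
 * write; the host registers one datamatch per (guest, vector) pair
 * and routes each to the corresponding guest's irqfd. */
static inline uint32_t doorbell_value(uint16_t guest_id, uint16_t vector)
{
    return ((uint32_t)guest_id << 16) | vector;
}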



So, ioeventfd/irqfd restricts MSI to 1 vector between guests.  Should
multi-MSI even be supported then in the non-ioeventfd/irqfd case?
Otherwise ioeventfd/irqfd become more than an implementation detail.
   


I lost you.  Please re-explain.





Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Cam Macdonell
On Thu, Mar 25, 2010 at 11:48 AM, Avi Kivity a...@redhat.com wrote:
 On 03/25/2010 07:35 PM, Cam Macdonell wrote:

 Ah, I see.  You adjusted for the different behaviours in the driver.

 Still I recommend dropping the status register: this allows single-MSI
 and PIRQ to behave the same way.  Also it is racy: if two guests signal
 a third, they will overwrite each other's status.


 With shared interrupts with PIRQ and no status register, how does a
 device know whether it generated the interrupt?


 Right, you need a status register.  Just don't add any more information,
 since MSI cannot carry any data.

Right.


 Eventfd values are a counter, not a register.  A read() on the other side
 returns the sum of all write()s (or eventfd_signal()s).  In the context
 of
 irqfd it just means the number of interrupts we coalesced.

 Multivalue was considered at one time for a different need and rejected.
  Really, to solve the race you need a queue, and that can only be done in
 the shared memory segment using locked instructions.


 I had a hunch it had probably been considered.  That explains why irqfd
 doesn't have a datamatch field.  I guess supporting multiple MSI
 vectors with one doorbell per guest isn't possible if only 1 bit of
 information can be communicated.


 Actually you can have one doorbell supporting multiple vectors and guests,
 simply divide the data value into two bit fields, one for the vector and one
 for the guest.  A single write gets both values into the host, which can
 then use datamatch to trigger the correct eventfd (which is wired to an
 irqfd in another guest).

At 4 bits per guest, a single write is then limited to 8 guests (with
32-bit registers); we could go to 64-bit.


 So, ioeventfd/irqfd restricts MSI to 1 vector between guests.  Should
 multi-MSI even be supported then in the non-ioeventfd/irq case?
 Otherwise ioeventfd/irqfd become more than an implementation detail.


 I lost you.  Please re-explain.

An irqfd can only trigger a single vector in a guest.  Right now I
only have one eventfd per guest.  So ioeventfd/irqfd restricts the
current implementation to a single vector that a guest can trigger.
Without irqfd, eventfds can be used like registers: a guest writes the
number of the vector it wants to trigger, but as you point out it is racy.

So, supporting multiple vectors via irqfd requires multiple eventfds
for each guest (one per vector): a total of (# of guests) x (# of
vectors) is required.  If we're limited to 8 or 16 guests that's not
too bad, but since the server opens them all we're restricted to 1024,
though that's a pretty high ceiling for this purpose.






Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Avi Kivity

On 03/25/2010 08:17 PM, Cam Macdonell wrote:



I had a hunch it had probably been considered.  That explains why irqfd
doesn't have a datamatch field.  I guess supporting multiple MSI
vectors with one doorbell per guest isn't possible if only 1 bit of
information can be communicated.

   

Actually you can have one doorbell supporting multiple vectors and guests,
simply divide the data value into two bit fields, one for the vector and one
for the guest.  A single write gets both values into the host, which can
then use datamatch to trigger the correct eventfd (which is wired to an
irqfd in another guest).
 

At 4 bits per guest, a single write is then limited to 8 guests (with
32-bit registers); we could go to 64-bit.
   


I meant a unicast doorbell: 16 bits for guest ID, 16 bits for vector number.

 

So, ioeventfd/irqfd restricts MSI to 1 vector between guests.  Should
multi-MSI even be supported then in the non-ioeventfd/irqfd case?
Otherwise ioeventfd/irqfd become more than an implementation detail.

   

I lost you.  Please re-explain.
 

An irqfd can only trigger a single vector in a guest.  Right now I
only have one eventfd per guest.  So ioeventfd/irqfd restricts the
current implementation to a single vector that a guest can trigger.
Without irqfd, eventfds can be used like registers: a guest writes the
number of the vector it wants to trigger, but as you point out it is racy.
   


You can't use eventfds as registers.  The next write will add to the 
current value.



So, supporting multiple vectors via irqfd requires multiple eventfds
for each guest (one per vector): a total of (# of guests) x (# of
vectors) is required.  If we're limited to 8 or 16 guests that's not
too bad, but since the server opens them all we're restricted to 1024,
though that's a pretty high ceiling for this purpose.
   


I'm sure we can raise the fd ulimit for this.  Note, I think the qemu
processes need the ulimit raised as well, since an fd passed via
SCM_RIGHTS probably counts as an open file.
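
For what it's worth, a sketch of how the server could raise its own
limit before creating (# of guests) x (# of vectors) eventfds; raising
the hard limit needs CAP_SYS_RESOURCE, and the helper is illustrative:

#include <sys/resource.h>

static int raise_fd_limit(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    if (rl.rlim_cur >= wanted)
        return 0;                 /* already high enough */
    rl.rlim_cur = wanted;
    if (rl.rlim_max < wanted)
        rl.rlim_max = wanted;     /* needs privilege to raise */
    return setrlimit(RLIMIT_NOFILE, &rl);
}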




Re: [Qemu-devel] Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Jamie Lokier
Cam Macdonell wrote:
 An irqfd can only trigger a single vector in a guest.  Right now I
 only have one eventfd per guest.  So ioeventfd/irqfd restricts the
 current implementation to a single vector that a guest can trigger.
 Without irqfd, eventfds can be used like registers: a guest writes the
 number of the vector it wants to trigger, but as you point out it is racy.

It's not racy if you use a pipe instead of eventfd. :-)

Actually, why not?  A byte pipe between guests would be more versatile.

Could it even integrate with virtio-serial, somehow?

-- Jamie
