Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-23 Thread Avi Kivity
Christoph Hellwig wrote:
 On Tue, May 22, 2007 at 10:00:42AM -0700, ron minnich wrote:
   
 On 5/22/07, Eric Van Hensbergen [EMAIL PROTECTED] wrote:

 
 I'm not opposed to supporting emulation environments, just don't make
 a large pile of crap the default like Xen -- and having to integrate
 PCI probing code in my guest domains is a large pile of crap.
   
 Exactly. I'm about to start a pretty large project here, using xen or
 kvm, not sure. One thing for sure, we are NOT going to use anything
 but PV devices. Full emulation is nice, but it's just plain silly if
 you don't have to do it. And we don't have to do it. So let's get the
 PV devices right, not try to shoehorn them into some framework like
 PCI.
 

 If you don't care about full virtualization kvm is the wrong project for
 you.  You might want to take a look at lguest.

   

This is incorrect.  While kvm started out as a full virtualization
project, it will expand with I/O PV and core PV.  Eventually most of the
paravirt_ops interface will have a kvm implementation.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-23 Thread Christoph Hellwig
On Wed, May 23, 2007 at 03:16:50PM +0300, Avi Kivity wrote:
 Christoph Hellwig wrote:
  On Tue, May 22, 2007 at 10:00:42AM -0700, ron minnich wrote:

  On 5/22/07, Eric Van Hensbergen [EMAIL PROTECTED] wrote:
 
  
  I'm not opposed to supporting emulation environments, just don't make
  a large pile of crap the default like Xen -- and having to integrate
  PCI probing code in my guest domains is a large pile of crap.

  Exactly. I'm about to start a pretty large project here, using xen or
  kvm, not sure. One thing for sure, we are NOT going to use anything
  but PV devices. Full emulation is nice, but it's just plain silly if
  you don't have to do it. And we don't have to do it. So let's get the
  PV devices right, not try to shoehorn them into some framework like
  PCI.
  
 
  If you don't care about full virtualization kvm is the wrong project for
  you.  You might want to take a look at lguest.
 

 
 This is incorrect.  While kvm started out as a full virtualization
 project, it will expand with I/O PV and core PV.  Eventually most of the
 paravirt_ops interface will have a kvm implementation.

The statement above was a little misworded, I think.  It should have
been an 'if you care about pure PV ...'




Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-23 Thread Eric Van Hensbergen
On 5/23/07, Carsten Otte [EMAIL PROTECTED] wrote:

 For me, plan9 does provide answers to a lot of above requirements.
 However, it does not provide capabilities for shared memory and it
 adds extra complexity. It's been designed to solve a different problem.


As a point of clarification, plan9 protocols have been used over
shared memory for resource access on virtualized systems for the past
3 years.  There are certainly ways it can be further optimized, but it
is not a restriction.  As far as complexity goes, our guest-side stack
is around 2000 lines of code (with an additional 1000 lines of support
routines that could likely be replaced by standard library or OS
services in more conventional platforms) and supports console, file
system, network, and block device access.

 I think the virtual device abstraction should provide the following
 functionality:
 - hypercall guest to host with parameters and return value
 - interrupt from host to guest with parameters
 - thin interrupt from host to guest, no parameters
 - shared memory between guest and host
 - dma access to guest memory, possibly via kmap on the host
 - copy from/to guest memory


Good list.  We can certainly work within these parameters.  It would
be nice to have some facility for direct guest-guest communication
-- however, I understand the difficulties in doing that in a secure
and safe way.  Still, having the ability to provision such a direct
interface would be nice for those that can take advantage of it.
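[Editorial note: Carsten's six-item list maps naturally onto an ops table. A minimal userspace sketch of such an abstraction follows; every name here (pv_channel_ops, toy_hypercall, and the toy backend) is hypothetical, invented for illustration — no such interface existed in kvm at the time.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical guest-side abstraction mirroring Carsten's list:
 * a hypercall with parameters and a return value, interrupt delivery
 * with and without a payload, and a shared memory region. */
struct pv_channel_ops {
	long (*hypercall)(unsigned long nr, unsigned long a0, unsigned long a1);
	void (*interrupt)(void *data, size_t len); /* host -> guest, with payload */
	void (*thin_interrupt)(void);              /* host -> guest, no payload */
	void *shared;                              /* guest/host shared region */
	size_t shared_len;
};

/* Toy backend standing in for the hypervisor side. */
static char shared_page[4096];

static long toy_hypercall(unsigned long nr, unsigned long a0, unsigned long a1)
{
	return (long)(nr + a0 + a1);	/* echo back so the path is testable */
}

static struct pv_channel_ops toy_ops = {
	.hypercall  = toy_hypercall,
	.shared     = shared_page,
	.shared_len = sizeof(shared_page),
};
```

A real implementation would back .hypercall with the architecture's trap instruction and .shared with pages mapped by the host; copy from/to guest memory and DMA-style access would live on the host side of the same structure.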

-eric



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-23 Thread Arnd Bergmann
On Wednesday 23 May 2007, Eric Van Hensbergen wrote:
 On 5/23/07, Carsten Otte [EMAIL PROTECTED] wrote:
 
  For me, plan9 does provide answers to a lot of above requirements.
  However, it does not provide capabilities for shared memory and it
  adds extra complexity. It's been designed to solve a different problem.
 
 As a point of clarification, plan9 protocols have been used over
 shared memory for resource access on virtualized systems for the past
 3 years.  There are certainly ways it can be further optimized, but it
 is not a restriction.

I think what Carsten means is to have a mmap interface over 9p, not
implementing 9p by means of shared memory, which is what I guess
you are referring to.

If you want to share memory areas between a guest and the host
or another guest, you can't do that with the regular Tread/Twrite
interface that 9p has on a file.

 As far as complexity goes, our guest-side stack 
 is around 2000 lines of code (with an additional 1000 lines of support
 routines that could likely be replaced by standard library or OS
 services in more conventional platforms) and supports console, file
 system, network, and block device access.

Another interface that I think is missing in 9p is a notification
for hotplugging. Of course you can have a long-running read on a
special file that returns the file names for virtual devices that
have been added or removed in the guest, but that sounds a little
clumsy compared to a specialized interface (e.g. Tnotify).
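[Editorial note: the file-based alternative Arnd describes — a long-running read on a special file that yields device add/remove events — can be modelled in a few lines of userspace C. The pipe stands in for the synthetic file, and the "add net/eth0" event syntax is invented for illustration.]

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Simulate a synthetic "hotplug events" file with a pipe: the writer
 * stands in for the guest kernel posting an event, the reader for a
 * management process doing the long-running read. */
static int post_and_read_event(char *buf, size_t len)
{
	int fds[2];
	const char *event = "add net/eth0\n";	/* invented event syntax */
	ssize_t n, w;

	if (pipe(fds) != 0)
		return -1;
	w = write(fds[1], event, strlen(event));  /* kernel side posts */
	if (w < 0)
		return -1;
	n = read(fds[0], buf, len - 1);           /* reader blocks until data */
	if (n < 0)
		n = 0;
	buf[n] = '\0';
	close(fds[0]);
	close(fds[1]);
	return 0;
}
```

The "clumsy" part Arnd alludes to is visible here: the reader must parse a text stream and re-stat the named files, whereas a Tnotify-style message would carry the event structurally.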

Arnd 



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-23 Thread Eric Van Hensbergen
On 5/23/07, Arnd Bergmann [EMAIL PROTECTED] wrote:
 On Wednesday 23 May 2007, Eric Van Hensbergen wrote:
  On 5/23/07, Carsten Otte [EMAIL PROTECTED] wrote:
  
   For me, plan9 does provide answers to a lot of above requirements.
   However, it does not provide capabilities for shared memory and it
   adds extra complexity. It's been designed to solve a different problem.
  
  As a point of clarification, plan9 protocols have been used over
  shared memory for resource access on virtualized systems for the past
  3 years. There are certainly ways it can be further optimized, but it
  is not a restriction.

 I think what Carsten means is to have a mmap interface over 9p, not
 implementing 9p by means of shared memory, which is what I guess
 you are referring to.

 If you want to share memory areas between a guest and the host
 or another guest, you can't do that with the regular Tread/Twrite
 interface that 9p has on a file.


Well, there's nothing strictly preventing a mmap interface over 9p (in
fact we are working with that in a Cell project internally) --
however, I'm not sure that makes the best sense for device access
anyways.  The real thing missing from the current implementation is a
better underlying transport which can pass payloads by reference to
shared memory as opposed to marshaling operations through a shared
memory transport -- however, this is what Los Alamos and IBM are
working on right now.

  As far as complexity goes, our guest-side stack
  is around 2000 lines of code (with an additional 1000 lines of support
  routines that could likely be replaced by standard library or OS
  services in more conventional platforms) and supports console, file
  system, network, and block device access.

 Another interface that I think is missing in 9p is a notification
 for hotplugging. Of course you can have a long-running read on a
 special file that returns the file names for virtual devices that
 have been added or removed in the guest, but that sounds a little
 clumsy compared to a specialized interface (e.g. Tnotify).


Discovery and hot-plugging would be synthetic file system semantic
issues that need to be resolved and in general are probably, as Rusty
and others suggested, best handled as a separate set of topics.  That
being said, specialized interfaces always seemed a bit more clunky to
me (just look at ioctl), but I suppose that's largely a matter of
taste.  The advantage of having a file system interface to event
notification is it creates a much more flexible environment, allowing
even simple shell scripting languages to resolve events versus having
to build a complex infrastructure -- and since 9p can be transitively
mounted over a network, you can build cluster management suites
without secondary layers of gorp for such things.  The LANL guys will
probably have more to say about this at their OLS talk on the KVM
management synthetic file system interface they built with 9p.

-eric



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-23 Thread Eric Van Hensbergen
On 5/23/07, Eric Van Hensbergen [EMAIL PROTECTED] wrote:
 On 5/23/07, Arnd Bergmann [EMAIL PROTECTED] wrote:
  On Wednesday 23 May 2007, Eric Van Hensbergen wrote:
   On 5/23/07, Carsten Otte [EMAIL PROTECTED] wrote:
   
For me, plan9 does provide answers to a lot of above requirements.
However, it does not provide capabilities for shared memory and it
adds extra complexity. It's been designed to solve a different problem.
   
   As a point of clarification, plan9 protocols have been used over
   shared memory for resource access on virtualized systems for the past
   3 years. There are certainly ways it can be further optimized, but it
   is not a restriction.
 
  I think what Carsten means is to have a mmap interface over 9p, not
  implementing 9p by means of shared memory, which is what I guess
  you are referring to.
 
  If you want to share memory areas between a guest and the host
  or another guest, you can't do that with the regular Tread/Twrite
  interface that 9p has on a file.
 

ugh.  I'm tired.  It's been a long week -- I realized after I fired off
that last message that you meant establishing a shared mapping versus
support for mmap operations over 9p (which devolve into Tread/Twrite).
 Sorry.  Yes -- that's correct, 9p wouldn't necessarily buy you
something like that.  In fact, the current 9p code relies on someone
else providing that basic mechanism in order for us to establish our
shared memory transport.

What Carsten described as his virtual device abstraction sounded like
a good foundation -- just don't make me use ioctl :)

-eric



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-22 Thread Christoph Hellwig
On Tue, May 22, 2007 at 07:49:51AM -0500, Eric Van Hensbergen wrote:
  In the general case, you can't pass a command line argument to Linux
  either.  kvm doesn't boot Linux; it boots the bios, which boots the boot
  sector, which boots grub, which boots Linux.  Relying on the user to
  edit the command line in grub is wrong.
 
 
 I didn't think we were talking about the general case, I thought we
 were discussing the PV case.  In the PV case, having bios/bootloader
 is unnecessary overhead.  To that same end, I don't see Windows in the
 PV case unless they magically want to to coordinate PV standards with
 us, in which case we certainly can negotiate a more sane discovery
 mechanism.

In the case of KVM, no one is speaking of pure PV.  What people have
been working on is PV acceleration of a fullvirt host, similar to how
s390 has worked for decades.  The host emulates the full architecture,
but there are some escapes for speedups.  Typical escapes would be
drivers for storage or networking, because those cannot be virtualized
very well on x86-style hardware.




Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-22 Thread Anthony Liguori
Eric Van Hensbergen wrote:
 On 5/22/07, Avi Kivity [EMAIL PROTECTED] wrote:
 Anthony Liguori wrote:
 
  In a PV environment why not just pass an initial cookie/hash/whatever
  as a command-line argument/register/memory-space to the underlying
  kernel?
 
 
  You can't pass a command line argument to Windows (at least, not 
 easily
  AFAIK).  You could get away with an MSR/CPUID flag but then you're
  relying on uniqueness which isn't guaranteed.
 

 In the general case, you can't pass a command line argument to Linux
 either.  kvm doesn't boot Linux; it boots the bios, which boots the boot
 sector, which boots grub, which boots Linux.  Relying on the user to
 edit the command line in grub is wrong.


 I didn't think we were talking about the general case, I thought we
 were discussing the PV case.

It is still useful to use PV drivers with full virtualization so it's 
something that ought to be considered.

Regards,

Anthony Liguori

   In the PV case, having bios/bootloader
 is unnecessary overhead.  To that same end, I don't see Windows in the
 PV case unless they magically want to coordinate PV standards with
 us, in which case we certainly can negotiate a more sane discovery
 mechanism.

-eric





Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-22 Thread ron minnich
On 5/22/07, Anthony Liguori [EMAIL PROTECTED] wrote:
 Eric Van Hensbergen wrote:
  On 5/22/07, Christoph Hellwig [EMAIL PROTECTED] wrote:
 
  I didn't think we were talking about the general case, I thought we
  were discussing the PV case.
 
 
  In case of KVM no one is speaking of pure PV.
 
 
 
  Why not?  It seems worthwhile to come up with something that can cover
  the whole spectrum instead of having different hypervisors (and
  interfaces).
 

 Because in a few years, almost everyone will have hardware capable of
 doing full virtualization so why bother with pure PV.

I don't know, we could shoot for a clean, simple interface that makes
PV easy to integrate into any kernel. Pick a common underlying
abstraction for all resources.
Define a simple, efficient memory channel for the comms. Lay 9p over
it. Then take it from there for each device.

I agree, from the way (e.g.) the Xen devices work, PV is a pain. But
it need not be that way.

I think from the Plan 9 side we're happy to run full PV. But we're 0%
of the world, so that may bias our importance a bit :-)

thanks

ron



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-22 Thread Eric Van Hensbergen
On 5/22/07, Anthony Liguori [EMAIL PROTECTED] wrote:
 Eric Van Hensbergen wrote:
  On 5/22/07, Christoph Hellwig [EMAIL PROTECTED] wrote:
 
 
  In case of KVM no one is speaking of pure PV.
 
 
 
  Why not?  It seems worthwhile to come up with something that can cover
  the whole spectrum instead of having different hypervisors (and
  interfaces).
 

 Because in a few years, almost everyone will have hardware capable of
 doing full virtualization so why bother with pure PV.


No matter what the capabilities, full device emulation is always going
to be wasteful.   Just because I have the hardware to run Vista,
doesn't mean I should run Vista.

  Maybe my view is skewed because I don't care to run windows.
 

 It's not just windows.  There are a lot of people who want to use
 virtualization to run RHEL2 or even RH9.  Backporting PV to these
 kernels is a huge effort.


I'm not opposed to supporting emulation environments, just don't make
a large pile of crap the default like Xen -- and having to integrate
PCI probing code in my guest domains is a large pile of crap.

  -eric



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-22 Thread ron minnich
On 5/22/07, Eric Van Hensbergen [EMAIL PROTECTED] wrote:

 I'm not opposed to supporting emulation environments, just don't make
 a large pile of crap the default like Xen -- and having to integrate
 PCI probing code in my guest domains is a large pile of crap.

Exactly. I'm about to start a pretty large project here, using xen or
kvm, not sure. One thing for sure, we are NOT going to use anything
but PV devices. Full emulation is nice, but it's just plain silly if
you don't have to do it. And we don't have to do it. So let's get the
PV devices right, not try to shoehorn them into some framework like
PCI.

What happens to these schemes if I want to try, e.g., 2^16 PV devices?
Or some other crazy thing that doesn't play well with PCI -- simple
example -- I want a 256 GB region of memory for a device. PCI rules
require me to align it on 256GB boundaries and it must be contiguous
address space. This is a hardware rule, done for hardware reasons, and
has no place in the PV world. What if I want a bit more than the basic
set of BARs that PCI gives me? Why would we apply such rules to a PV?
Why limit ourselves this early in the game?

thanks

ron



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-22 Thread ron minnich
On 5/22/07, Dor Laor [EMAIL PROTECTED] wrote:

 Don't quit so soon on us.

OK. I'll go look at Ingo's stuff.

Thanks again

ron



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-21 Thread Christian Borntraeger
 This is quite easy with KVM.  I like the approach that vmchannel has 
 taken.  A simple PCI device.  That gives you a discovery mechanism for 
 shared memory and an interrupt and then you can just implement a ring 
 queue using those mechanisms (along with a PIO port for signaling from 
 the guest to the host).  So given that underlying mechanism, the 
 question is how to expose that within the guest kernel/userspace and 
 within the host.

Sorry for answering late, but I don't like PCI as a device bus for all
platforms. s390 has no PCI and s390 has no PIO. I would prefer a new 
simple hypercall-based virtual bus. I don't know much about Windows 
driver programming, but I guess it is not that hard to add a new bus.


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-21 Thread Cornelia Huck
On Mon, 21 May 2007 13:28:03 +0200,
Arnd Bergmann [EMAIL PROTECTED] wrote:

 We've had the same discussion about PCI as virtual device abstraction
 recently when hpa made the suggestions to get a set of PCI device
 numbers registered for Linux.

(If you want to read it up, it's the thread at
http://marc.info/?t=11755452543&r=1&w=2)

 
 IIRC, the conclusion to which we came was that it is indeed helpful
 for most architecture to have a PCI device as one way to probe for
 the functionality, but not to rely on it. s390 is the obvious
 example where you can't have PCI, but you may also want to build
 a guest kernel without PCI support because of space constraints
 in a many-guests machine.
 
 What I think would be ideal is to have a new bus type in Linux
 that does not have any dependency on PCI itself, but can be
 easily implemented as a child of a PCI device.
 
 If we only need the stuff mentioned by Anthony, the interface could
 look like
 
 struct vmchannel_device {
   struct resource virt_mem;
   struct vm_device_id id;
   int irq;

   int (*signal)(struct vmchannel_device *);
   int (*irq_ack)(struct vmchannel_device *);
   struct device dev;
 };

IRQ numbers are evil :)

It should be more like a
void *vmchannel_device_handle;
which could be different things depending on what we want the
vmchannel_device to be a child of (it could be an IRQ number for
PCI devices, or something like subchannel_id if we wanted to
support channel devices).

 
 Such a device can easily be provided as a child of a PCI device,
 or as something that is purely virtual based on an hcall interface.

This looks like a flexible approach.



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-21 Thread Arnd Bergmann
On Monday 21 May 2007, Cornelia Huck wrote:
 IRQ numbers are evil :)

yes, but getting rid of them is an entirely different discussion.
I really think that in the first step, you should be able to
use its external interrupts with the same request_irq interface
as the other architectures.

Fundamentally, the s390 architecture has external interrupt numbers
as well, you're just using a different interface for registering them.
The ccw devices obviously have a better interface already, but
that doesn't help you here.
 
 It should be more like a
 void *vmchannel_device_handle;
 which could be different things depending on what we want the
 vmchannel_device to be a child of (it could be an IRQ number for
 PCI devices, or something like subchannel_id if we wanted to
 support channel devices).

No, the driver needs to know how to get at the interrupt without
caring about the bus implementation, that's why you either need
to have a callback function set by the driver (like s390 CCW
or USB have it), or visible interrupt number (like everyone does).

There is no need for a pointer back to a vmchannel_device_handle,
all information needed by the bus layer can simply be in a
subclass derived from the vmchannel_device, e.g.

struct vmchannel_pci {
    struct pci_device *parent;    /* shortcut, same as
                                   * to_pci_dev(this.vmdev.dev.parent) */
    unsigned long signal_ioport;  /* for interrupt generation */
    struct vmchannel_device vmdev;
};

You would allocate this structure in the pci_driver that registers
the vmchannel_device.
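[Editorial note: the embedding Arnd describes is the usual container_of idiom — the bus core hands the driver a generic struct vmchannel_device *, and the PCI implementation recovers its private structure by pointer arithmetic. A standalone sketch, with the structs reduced to a couple of fields and the helper name invented:]

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct vmchannel_device {
	int irq;			/* as in Arnd's sketch */
};

struct vmchannel_pci {
	unsigned long signal_ioport;	/* for interrupt generation */
	struct vmchannel_device vmdev;	/* embedded, not pointed-to */
};

/* The PCI bus implementation recovers its own structure from the
 * generic device the core handed back to it. */
static struct vmchannel_pci *to_vmchannel_pci(struct vmchannel_device *vmdev)
{
	return container_of(vmdev, struct vmchannel_pci, vmdev);
}
```

This is why no back-pointer or opaque handle is needed: the bus-specific state travels with the generic device at a fixed offset.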

Arnd 



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-21 Thread Anthony Liguori
Arnd Bergmann wrote:
 On Monday 21 May 2007, Christian Borntraeger wrote:
   
 This is quite easy with KVM.  I like the approach that vmchannel has 
 taken.  A simple PCI device.  That gives you a discovery mechanism for 
 shared memory and an interrupt and then you can just implement a ring 
 queue using those mechanisms (along with a PIO port for signaling from 
 the guest to the host).  So given that underlying mechanism, the 
 question is how to expose that within the guest kernel/userspace and 
 within the host.
   
 Sorry for answering late, but I don't like PCI as a device bus for all
 platforms. s390 has no PCI and s390 has no PIO. 

Right, I'm not interested in the lowest level implementation (PCI device 
+ PIO).  I'm more interested in the higher level interface.  The goal is 
to allow drivers to be able to be written to the higher level interface 
so that they work on any platform that implements the lower level 
interface.  On x86, that would be PCI/PIO.  On s390, that could be 
hypercall based.

 I would prefer a new 
 simple hypercall-based virtual bus. I don't know much about Windows 
 driver programming, but I guess it is not that hard to add a new bus.
 

 We've had the same discussion about PCI as virtual device abstraction
 recently when hpa made the suggestions to get a set of PCI device
 numbers registered for Linux.

 IIRC, the conclusion to which we came was that it is indeed helpful
 for most architecture to have a PCI device as one way to probe for
 the functionality, but not to rely on it. s390 is the obvious
 example where you can't have PCI, but you may also want to build
 a guest kernel without PCI support because of space constraints
 in a many-guests machine.

 What I think would be ideal is to have a new bus type in Linux
 that does not have any dependency on PCI itself, but can be
 easily implemented as a child of a PCI device.

 If we only need the stuff mentioned by Anthony, the interface could
 look like

 struct vmchannel_device {
   struct resource virt_mem;
   struct vm_device_id id;
   int irq;
   int (*signal)(struct vmchannel_device *);
   int (*irq_ack)(struct vmchannel_device *);
   struct device dev;
 };

 Such a device can easily be provided as a child of a PCI device,
 or as something that is purely virtual based on an hcall interface.
   

Yes, this is close to what I was thinking.  I'm not sure that this 
particular interface can encompass the variety of memory sharing 
mechanisms though.

When I mentioned shared memory via the PCI device, I was referring to 
the memory needed for boot strapping the device.  You still need a 
mechanism to transfer memory for things like zero-copy disk IO and 
network devices.  This may involve passing memory addresses directly, 
copying data, or page flipping.

This leads me to think that a higher level interface that provided a 
data passing interface would be more useful.  Something like:

struct vmchannel_device {
    struct vm_device_id id;
    int (*open)(struct vmchannel_device *, const char *name,
                const char *service);
    int (*release)(struct vmchannel_device *);
    ssize_t (*sendmsg)(struct vmchannel_device *, const void *, size_t);
    ssize_t (*recvmsg)(struct vmchannel_device *, void *, size_t);
    struct device dev;
};

The consuming interface of this would be a socket (PF_VIRTLINK).  The 
sockaddr would contain a name identifying a VM and a service description.
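[Editorial note: the call flow Anthony proposes can be mocked with a single in-memory message slot to show how a front end would use sendmsg/recvmsg; the toy_ names and the one-slot backend are placeholders for a real transport, not anything proposed in the thread.]

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Minimal stand-in for the proposed interface: one fixed-size
 * message slot instead of a real guest/host transport. */
struct toy_vmchannel {
	char msg[256];
	size_t msg_len;
};

static ssize_t toy_sendmsg(struct toy_vmchannel *c, const void *buf, size_t len)
{
	if (len > sizeof(c->msg))
		return -1;		/* message too large for the slot */
	memcpy(c->msg, buf, len);
	c->msg_len = len;
	return (ssize_t)len;
}

static ssize_t toy_recvmsg(struct toy_vmchannel *c, void *buf, size_t len)
{
	size_t n = c->msg_len < len ? c->msg_len : len;

	memcpy(buf, c->msg, n);
	c->msg_len = 0;			/* slot consumed */
	return (ssize_t)n;
}
```

A PV disk front end in this model would open a channel by name, sendmsg its request descriptors, and recvmsg completions; the zero-copy question Anthony raises is exactly what this datagram-style mock glosses over.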

This doesn't address the memory issues I raised above but I think it 
would be easier to special case the drivers where it mattered.  For 
instance, on x86 KVM, a PV disk driver front end would consist of 
connecting to a virtlink socket, and then transferring struct bio's.  
QEMU instances would listen on the virtlink socket in the host, and 
service them directly (QEMU can access all of the guests memory directly 
in userspace).

A PV graphics device could just be a VNC server that listened on a 
virtlink socket.

Regards,

Anthony Liguori

   Arnd 
   




Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-21 Thread ron minnich
OK, so what are we doing here? We're using a PCI abstraction, as a
common abstraction, which is not common really, because we don't have a
common abstraction? So we describe all these non-pci resources with a
pci abstraction?

I don't get it at all. I really think the resource interface idea I
mentioned, which is borrowed from Plan 9, makes  a whole lot more
sense.  IBM Austin has already shown it in practice in the papers I
referenced. It can work. A memory channel at the bottom, with a
resource sharing protocol (9p) above it, and then you describe your
resources via names and a simple file-directory model. Note that PCI
sort of tries to do this tree model, but it's all binary, and, as
noted, it's hardly universal.

All of this is trivially exported over a network, so the use of shared
memory channels in no way rules out network access. Plan 9 exports
devices over the network routinely.

If you're using a PCI abstraction, something has gone badly wrong I think.

thanks

ron



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-21 Thread ron minnich
On 5/21/07, Anthony Liguori [EMAIL PROTECTED] wrote:
 ron minnich wrote:
  OK, so what are we doing here? We're using a PCI abstraction, as a
  common abstraction, which is not really common, because we don't have a
  common abstraction? So we describe all these non-pci resources with a
  pci abstraction?
 

 No.  You're confusing PV device discovery with the actual paravirtual
 transport.  In a fully virtual environment like KVM, a PCI bus is
 present.  You need some way for the guest to detect that a PV device is
 present.  The most natural way to do this IMHO is to have an entry for
 the PV device in the PCI bus.  That will make a lot of existing code happy.


I don't think I am confusing it, now that you've explained it more
fully. I'm even less happy with it :-)

How will I explain this sort of thing to my grandchildren? :-)
"Grandpop, why do those PV devices look like a bus defined in 1994?"

Why would you not have, e.g., a 9p server for PV device config space
as well? I actually implemented that on Xen -- it was quite trivial,
and it makes more sense -- to me anyway -- than pretending a PV device
is something it's not.

What is happening, it seems to me, is that people are still trying to
use an abstraction -- PCI device -- which is not really an
abstraction, to model aspects of PV device discovery, enumeration,
configuration and operation. I'm still pretty uncomfortable with it --
well, honestly, it seems kind of gross to me. It's just as easy to
build the right abstraction underneath all this, and then, for those
OSes that have existing code that needs to be happy, present that
abstraction as a PCI bus. But making the PCI bus the underlying
abstraction is getting the order inverted, I believe.

I realize that PCI device space is a pretty handy way to do this, that
it is very convenient. I wonder what happens when you get a system
without enough holes in the config space for you to hide the PV
devices in, or that has some other weird property that breaks this
model. I've already worked with one system that had 32 PCI busses.

There are other hypervisors that made convenient choices over the
right choice, and they are paying for it. Let's try to avoid that on
kvm. Kvm has so much going for it right now.

thanks

ron

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-21 Thread Anthony Liguori
ron minnich wrote:
 On 5/21/07, Anthony Liguori [EMAIL PROTECTED] wrote:
 No.  You're confusing PV device discovery with the actual paravirtual
 transport.  In a fully virtual environment like KVM, a PCI bus is
 present.  You need some way for the guest to detect that a PV device is
 present.  The most natural way to do this IMHO is to have an entry for
 the PV device in the PCI bus.  That will make a lot of existing code 
 happy.


 I don't think I am confusing it, now that you've explained it more
 fully. I'm even less happy with it :-)

Sometimes I think the best way to make you happy is to just stop talking :-)

 How will I explain this sort of thing to my grandchildren? :-)
 "Grandpop, why do those PV devices look like a bus defined in 1994?"

 Why would you not have, e.g., a 9p server for PV device config space
 as well? I actually implemented that on Xen -- it was quite trivial,
 and it makes more sense -- to me anyway -- than pretending a PV device
 is something it's not.

 What is happening, it seems to me, is that people are still trying to
 use an abstraction -- PCI device -- which is not really an
 abstraction, to model aspects of PV device discovery, enumeration,
 configuration and operation. I'm still pretty uncomfortable with it --
 well, honestly, it seems kind of gross to me. It's just as easy to
 build the right abstraction underneath all this, and then, for those
 OSes that have existing code that needs to be happy, present that
 abstraction as a PCI bus. But making the PCI bus the underlying
 abstraction is getting the order inverted, I believe.

Okay.  The first problem here is that you're assuming that I'm 
suggesting that this whole thing mandates a PCI bus.  I'm not.  I'm merely 
saying that one possible way to implement this is by using a PCI bus to 
discover the existence of a VIRTLINK socket.  Clearly, the s390 guys 
would have to use something else.

For PV Xen where there is no PCI bus, XenBus would be used.  So very 
concretely, there are three separate classes of problems:

1) How to determine that a VM can use virtlink sockets
2) How to enumerate paravirtual devices
3) The various PV protocols for each device

Whatever Linux implements, it has to allow multiple implementations for 
#1.  For x86 VMs, PCI is just the easiest thing to do here.  You could 
do hypercalls but it gets messy on different hypervisors (vmcall with 0 
in eax may do something funky in Xen but be the probing hypercall on KVM).

For #2, I'm not really proposing anything concrete.  One possibility is 
to allow virtlink sockets to be addressed with a service and to use 
that.  That doesn't allow for enumeration though so it may not be perfect.

I'm not proposing anything at all for #3.  That's outside the scope of 
this discussion in my mind.

Now, once you have a virtlink socket, could you use p9 to implement #2 
and #3?  Sounds like something you could write a paper about :-) But 
that's a later argument.  Right now, I'm just focused on solving the 
bootstrap issue.

Hope this clarifies things a bit.

Regards,

Anthony Liguori

 I realize that PCI device space is a pretty handy way to do this, that
 it is very convenient. I wonder what happens when you get a system
 without enough holes in the config space for you to hide the PV
 devices in, or that has some other weird property that breaks this
 model. I've already worked with one system that had 32 PCI busses.

 There are other hypervisors that made convenient choices over the
 right choice, and they are paying for it. Let's try to avoid that on
 kvm. Kvm has so much going for it right now.

 thanks

 ron



-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-21 Thread Anthony Liguori
Eric Van Hensbergen wrote:
 On 5/21/07, Anthony Liguori [EMAIL PROTECTED] wrote:
 ron minnich wrote:
  OK, so what are we doing here? We're using a PCI abstraction, as a
  common abstraction, which is not really common, because we don't have a
  common abstraction? So we describe all these non-pci resources with a
  pci abstraction?
 

 No.  You're confusing PV device discovery with the actual paravirtual
 transport.

 In a PV environment why not just pass an initial cookie/hash/whatever
 as a command-line argument/register/memory-space to the underlying
 kernel?

You can't pass a command line argument to Windows (at least, not easily 
AFAIK).  You could get away with an MSR/CPUID flag but then you're 
relying on uniqueness which isn't guaranteed.

   The presence of such a kernel argument would suggest the
 existence of a hypercall interface or other such mechanism to attach
 to the initial transport(s).  Command-line arguments may be a bit too
 linux-centric to Ron's taste, but if we are going to chose something
 arbitrary like PCI, I'd prefer we chose something a bit more
 straightforward to interact with instead of doing crazy ritual dances
 to extract what should be straightforward information.  I really don't
want to have to integrate PCI parsing into my testOS/libOS kernels.

You could just hard code a PIC interrupt and rely on some static memory 
address for IO and avoid the PCI bus entirely.  The whole point of the 
PCI bus is to avoid hardcoding this sort of thing, but if you don't want 
the complexity associated with PCI, then using the older mechanisms 
seems like the obvious thing to do.

Regards,

Anthony Liguori

-eric



-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-18 Thread Anthony Liguori
ron minnich wrote:
 Hi Anthony,

  I still feel that "how about a socket interface" is still focused on
  the how to implement, and not what the interface should be.

Right.  I'm not trying to answer that question ATM.  There are a number 
of paravirt devices that would be useful in a virtual setting.  For 
instance, a PV device for providing the guest with entropy and a shared 
PV clipboard.  These devices should be simple but all current 
communication mechanisms are far too complicated.

 I also
 am not sure the socket system call interface is quite what we want,
 although it's a neat idea.  It's also not that portable outside the
  "everything is a Linux variant" world.

A filesystem interface certainly isn't very portable outside the POSIX 
world :-)

 Once it is connected, we can move data.

 This is similar to your socket idea, but consider that:
 o to see active vmics, I use 'ls'
 o I don't have to create a new sockaddr address type
 o I can control access with chmod
 o I am separating the interface from the implementation
 o This is, of course, not really 'files', but in-memory data
 structures; this can
  (and will) be fast
 o No binary data structures.
   For different domains, even on the same machine, alignment rules etc.
   are not always the same -- I hit this when I ported Plan 9 to Xen,
   esp. back when Xen relied so heavily on gcc tricks such as __align__
   and packed. Using character strings eliminates that problem.

The interface you're proposing is almost functionally identical to a 
socket.  In fact, once you open /data you've got an fd that you interact 
with in the same way as you would interact with a socket.

It's not that there's a unique value for this sort of interface in 
virtualization; I don't think you're making that argument.  Instead, 
you're making a general argument as to why this way of doing things is 
better than what Unix has been doing forever (with things like 
sockets).  That's fine, I think you have a valid point, but that's a 
larger argument to have on LKML or at a conference.  This isn't the 
place to shoe-horn this sort of thing.

A socket interface would provide a simple, well-understood interface 
that few people in the Linux community would disagree with (it's already 
there for s390).  It should also be easy enough to stream p9 over the 
socket so you can build these interfaces easily and continue your 
attempts to expose the world as a virtual filesystem :-)

Regards,

Anthony Liguori

 This is, I think, the kind of thing Eric would also like to see, but
 he can correct me.
 Thanks

 ron


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-18 Thread ron minnich
On 5/18/07, Anthony Liguori [EMAIL PROTECTED] wrote:

  I also
  am not sure the socket system call interface is quite what we want,
  although it's a neat idea.  It's also not that portable outside the
   "everything is a Linux variant" world.

 A filesystem interface certainly isn't very portable outside the POSIX
 world :-)

Actually, it's probably the most portable thing you can have.

 The interface you're proposing is almost functionally identical to a
 socket.  In fact, once you open /data you've got an fd that you interact
 with in the same way as you would interact with a socket.

Well, sure, I stole the interface from Plan 9, and they use this
interface to do sockets, among *many* other things -- and there's the
point. The interface is not just sockets. But if you're used to
sockets, it looks familiar. I only steal from the best :-)

Note, btw, that the fd has a path, and can be examined easily, and
also passed to other programs for use. That's messy and ugly with
sockets.


 It's not that there's an unique value for this sort of interface in
 virtualization; I don't think you're making that argument.  Instead,
 you're making a general argument as to why this way of doing things is
 better than what Unix has been doing forever (with things like
 sockets)

Yes, Unix has been doing it this way forever. The interface I am
proposing was
the one designed by the Unix guys -- once they realized how deficient
the Unix way of doing things had become.

But, forgetting all this argument, it still seems to me that the file
system interface is far simpler than a socket interface. No binary
structures. No new sockaddr structures needed. No alignment/padding
rules. You can actually set up a link from a shell script, or perl, or
python, or whatever, without a special set of bindings.

 A socket interface would provide a simple, well-understood interface
 that few people in the Linux community would disagree with (it's already
 there for s390).

Yes, but ... well understood to the Linux community. Can we look at a
broader scope?

We've got a golden opportunity here to build a really flexible VMIC
interface. I would hate to lose it.

Anyway, thanks for discussing this.

ron

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-17 Thread Carsten Otte
Daniel P. Berrange wrote:
 As a userspace apps service, I'd very much like to see a common sockets 
 interface for inter-VM communication that is portable across virt systems 
 like Xen & KVM. I'd see it as similar to UNIX domain sockets in style. So 
 basically any app which could do UNIX domain sockets, could be ported to 
 inter-VM sockets by just changing PF_UNIX to, say, PF_VIRT.
 Lots of interesting details around impl & security (what VMs are allowed
 to talk to each other, whether this policy should be controlled by the
 host, or allow VMs to decide for themselves).
z/VM, the premium hypervisor on 390, has had this capability for 
decades. This is called IUCV (inter user communication vehicle), where 
user really means virtual machine. It so happens the support for 
AF_IUCV was recently merged to Linux mainline. It may be worth a look, 
either for using it or because learning from existing solutions is 
always a good idea.

so long,
Carsten


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-17 Thread Anthony Liguori
Carsten Otte wrote:
 Daniel P. Berrange wrote:
   
 As a userspace apps service, I'd very much like to see a common sockets 
 interface for inter-VM communication that is portable across virt systems 
  like Xen & KVM. I'd see it as similar to UNIX domain sockets in style. So 
  basically any app which could do UNIX domain sockets, could be ported to 
  inter-VM sockets by just changing PF_UNIX to, say, PF_VIRT.
  Lots of interesting details around impl & security (what VMs are allowed
 to talk to each other, whether this policy should be controlled by the
 host, or allow VMs to decide for themselves).
 
  z/VM, the premium hypervisor on 390, has had this capability for 
  decades. This is called IUCV (inter user communication vehicle), where 
 user really means virtual machine. It so happens the support for 
 AF_IUCV was recently merged to Linux mainline. It may be worth a look, 
 either for using it or because learning from existing solutions is 
 always a good idea.
   

Is there anything that explains what the fields in sockaddr mean:

sa_family_t    siucv_family;
unsigned short siucv_port;       /* Reserved */
unsigned int   siucv_addr;       /* Reserved */
char           siucv_nodeid[8];  /* Reserved */
char           siucv_user_id[8]; /* Guest User Id */
char           siucv_name[8];    /* Application Name */

Regards,

Anthony Liguori

 so long,
 Carsten


 -
 This SF.net email is sponsored by DB2 Express
 Download DB2 Express C - the FREE version of DB2 express and take
 control of your XML. No limits. Just data. Click to get it now.
 http://sourceforge.net/powerbar/db2/
 ___
 kvm-devel mailing list
 kvm-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/kvm-devel

   


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-17 Thread Anthony Liguori
Rusty Russell wrote:
 On Wed, 2007-05-16 at 14:10 -0500, Anthony Liguori wrote:
   
 For the host, you can probably stay entirely within QEMU.  Interguest 
 communication would be a bit tricky but guest-host communication is 
 real simple.
 

 guest-host is always simple.  But it'd be great if it didn't matter to
 the guest whether it's talking to the host or another guest.

 I think shared memory is an obvious start, but it's not enough for
 inter-guest where they can't freely access each other's memory.  So you
 really want a ring-buffer of descriptors with a hypervisor-assist to say
 read/write this into the memory referred to by that descriptor.
   

I think this is getting a little ahead of ourselves.  An example of this 
idea is pretty straight-forward but it gets more complicated when trying 
to support the existing memory sharing mechanisms on various 
hypervisors.  There are a few cases to consider:

1) The target VM can access all of the memory of the guest VM with no 
penalty.  This is the case when going from guest<=>QEMU in KVM or going 
from guest<=>kernel (ignoring highmem) in KVM.  For this, you can send 
arbitrary memory to the host.

2) The target VM can access all of the memory of the guest VM with a 
penalty.  For guest<=>other userspace process in KVM, an mmap() would be 
required.  This would work for Xen provided the target VM was domain-0 
but it would incur a xc_map_foreign_range().

3) The target and source VM can only share memory based on an existing 
pool.  This is the guest with Xen and grant tables.

I think an API that covers these three cases is a bit tricky and will 
likely make undesired trade-offs.  I think it's easier to start out 
focusing on the low-speed case where there's a mandatory data-copy.

You can still pass gntref's or PFNs down this transport if you like and 
perhaps down the road we'll find that we can make a common interface for 
doing this sort of thing.

Regards,

Anthony Liguori

 I think this can be done as a simple variation of the current schemes in
 existence.

 But I'm shutting up until I have some demonstration code 8)

   
 A tricky bit of this is how to do discovery.  If you want to support 
 interguest communication, it's not really sufficient to just use strings 
  since the identifiers would have to be unique throughout the entire 
  system.  Maybe you just leave it as a guest<=>host channel and be done 
 with it.
 

 Hmm, I was going to leave that unspecified.  One thing at a time...

 Rusty.


 -
 This SF.net email is sponsored by DB2 Express
 Download DB2 Express C - the FREE version of DB2 express and take
 control of your XML. No limits. Just data. Click to get it now.
 http://sourceforge.net/powerbar/db2/
 ___
 kvm-devel mailing list
 kvm-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/kvm-devel

   


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-17 Thread Rusty Russell
On Thu, 2007-05-17 at 11:13 -0500, Anthony Liguori wrote:
 Rusty Russell wrote:
  I think shared memory is an obvious start, but it's not enough for
  inter-guest where they can't freely access each other's memory.  So you
  really want a ring-buffer of descriptors with a hypervisor-assist to say
  read/write this into the memory referred to by that descriptor.
 
 I think this is getting a little ahead of ourselves.  An example of this 
 idea is pretty straight-forward but it gets more complicated when trying 
 to support the existing memory sharing mechanisms on various 
 hypervisors.  There are a few cases to consider:

To clarify, I'm not overly interested in existing mechanisms.  I'm first
trying for something sane from a Linux driver POV, then see if it can be
implemented in terms of legacy systems.

This reflects my belief that we will see more virtualization solutions
in the medium term, so it's reasonable to look at a new system.

Cheers,
Rusty.


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-17 Thread ron minnich
On 5/16/07, Anthony Liguori [EMAIL PROTECTED] wrote:
 What do you think about a socket interface?  I'm not sure how discovery
 would work yet, but there are a few PV socket implementations for Xen at
 the moment.

Hi Anthony,

I still feel that "how about a socket interface" is still focused on
the how to implement, and not what the interface should be. I also
am not sure the socket system call interface is quite what we want,
although it's a neat idea.  It's also not that portable outside the
"everything is a Linux variant" world.

So how about this as an interface design. The communications channels
are visible in our name space at a mountpoint of our choice. Let's
call this mount point, for sake of argument, vmic.

When we mount on vmic, we see one file:
/vmic/clone

When we open and read /vmic/clone, we get a number; let's pretend for
this example we get '0'. The numbers are not important, except to
distinguish connections. Opening the clone file gets us a connection
endpoint. An ls of the directory now shows this:
/vmic/clone
/vmic/0

The directory, and the files in it, are owned by me, mode 700 or
600 or 400 as the file requires. The mode can be changed, of course,
if I wish to allow wider access to the channel. Here, already, we see
some advantage to the use of the file system for this type of
capability.

What is in the directory? Here is one proposal.
/vmic/0/data
/vmic/0/status
/vmic/0/ctl
/vmic/0/local
/vmic/0/remote
What can we do with this?
Data is pretty obvious: we can read it or write it, and that data is
received/sent from the other endpoint. Note that I'm not saying how
the data flows: it can be done in whatever manner is most efficient,
by the kernel, including zero copy. It can be different for many
reasons, but the point is that the interface is basically unchanging.
Of course, it is an error to read or write data until something at the
other end connects to the local end!

What is status? We cat it and it gets us status in some meaningful
text string. E.g.:
cat /vmic/0/status
connected /domain/name

What is local? It's our local name for the resource in this domain
What is remote? It's the name of other endpoint.

What's a name look like? I'm thinking it might look like /domain/name,
but that is just a guess ...

What is ctl? Here is where the fun begins. We might do things such as
echo bind somename > /vmic/0/ctl
this names the vmic. We might want to wait for a connection:
echo listen 1 > /vmic/0/ctl
We might want to restrict it somehow:
echo key somekey > /vmic/0/ctl
echo listendomain domainnumber > /vmic/0/ctl
or we might know there is something out there:
echo connect /domainname/somename > /vmic/0/ctl

Once it is connected, we can move data.

This is similar to your socket idea, but consider that:
o to see active vmics, I use 'ls'
o I don't have to create a new sockaddr address type
o I can control access with chmod
o I am separating the interface from the implementation
o This is, of course, not really 'files', but in-memory data
structures; this can
  (and will) be fast
o No binary data structures.
  For different domains, even on the same machine, alignment rules etc. are
  not always the same -- I hit this when I ported Plan 9 to Xen, esp. back
  when Xen relied so heavily on gcc tricks such as __align__ and packed.
  Using character strings eliminates that problem.

This is, I think, the kind of thing Eric would also like to see, but
he can correct me.
Thanks

ron

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-16 Thread Daniel P. Berrange
On Wed, May 16, 2007 at 12:28:00PM -0500, Anthony Liguori wrote:
 Eric Van Hensbergen wrote:
  On 5/11/07, Anthony Liguori [EMAIL PROTECTED] wrote:
 
  There's definitely a conversation to have here.  There are going to be a
  lot of small devices that would benefit from a common transport
  mechanism.  Someone mentioned a PV entropy device on LKML.  A
 host<=>guest filesystem is another consumer of such an interface.
 
  I'm inclined to think though that the abstraction point should be the
  transport and not the actual protocol.  My concern with standardizing on
  a protocol like 9p would be that one would lose some potential
  optimizations (like passing PFN's directly between guest and host).
 
 
  I think that there are two layers - having a standard, well defined,
  simple shared memory transport between partitions (or between
  emulators and the host system) is certainly a prerequisite.  There are
  lots of different decisions to made here:
 
 What do you think about a socket interface?  I'm not sure how discovery 
 would work yet, but there are a few PV socket implementations for Xen at 
 the moment.

As a userspace apps service, I'd very much like to see a common sockets 
interface for inter-VM communication that is portable across virt systems 
like Xen & KVM. I'd see it as similar to UNIX domain sockets in style. So 
basically any app which could do UNIX domain sockets, could be ported to 
inter-VM sockets by just changing PF_UNIX to, say, PF_VIRT.
Lots of interesting details around impl & security (what VMs are allowed
to talk to each other, whether this policy should be controlled by the
host, or allow VMs to decide for themselves).

   a) does it communicate with userspace, kernelspace, or both?
 
 sockets are usable for both userspace/kernelspace.

For userspace, it would be very easy to adapt existing sockets based
apps using IP or UNIX sockets to use inter-VM sockets, which is a big
positive.

   d) can all of these parameters be something controllable from userspace?
   e) I'm sure there are many others that I can't be bothered to think
  of on a Friday
 
 The biggest point of contention would probably be what goes in the 
 sockaddr structure.

Keeping it very simple would be some arbitrary 'path', similar to UNIX 
domain sockets in the abstract namespace?

Regards,
Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-   Perl modules: http://search.cpan.org/~danberr/  -=|
|=-   Projects: http://freshmeat.net/~danielpb/   -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-16 Thread Eric Van Hensbergen
On 5/16/07, Anthony Liguori [EMAIL PROTECTED] wrote:

 What do you think about a socket interface?  I'm not sure how discovery
 would work yet, but there are a few PV socket implementations for Xen at
 the moment.


From a functional standpoint I don't have a huge problem with it,
particularly if it's more of a pure socket and not something that tries
to look like a TCP/IP endpoint -- I would prefer something closer to
netlink.  Sockets would allow the existing 9p stuff to pretty much
work as-is.

However, all that being said, I noticed some pretty big differences
between sockets and shared memory in terms of overhead under Linux.

If you take a look at the RPC latency graph in:
http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf

You'll see that a local socket implementation has about an order of
magnitude worse latency than a PROSE/Libra inter-partition shared
memory channel.  Furthermore it will really limit our ability to trim
the fat of unnecessary copies in order to have competitive
performance.  But perhaps there's magic you can do to eliminate that.

Of course, you could always layer a socket interface for userspace
simplicity on top of a more performance-optimized underlying transport
that could be used directly by kernel-modules.

  -eric

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-16 Thread Gregory Haskins
 On Wed, May 16, 2007 at  1:28 PM, in message [EMAIL PROTECTED],
Anthony Liguori [EMAIL PROTECTED] wrote: 
 
 What do you think about a socket interface?  I'm not sure how discovery 
 would work yet, but there are a few PV socket implementations for Xen at 
 the moment.

FYI: The work I am doing is exactly that.  I am going to extend host-based unix 
domain sockets up to the KVM guest.  Not sure how well it will work yet, as I 
had to lay the LAPIC work down first for IO-completion.

-Greg


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-16 Thread Anthony Liguori
Gregory Haskins wrote:
 On Wed, May 16, 2007 at  1:28 PM, in message [EMAIL PROTECTED],
 
 Anthony Liguori [EMAIL PROTECTED] wrote: 
   
 What do you think about a socket interface?  I'm not sure how discovery 
 would work yet, but there are a few PV socket implementations for Xen at 
 the moment.
 

 FYI: The work I am doing is exactly that.  I am going to extend host-based 
 unix domain sockets up to the KVM guest.  Not sure how well it will work yet, 
 as I had to lay the LAPIC work down first for IO-completion.
   

Do you plan on introducing a new address family in the guest?

Regards,

Anthony Liguori

 -Greg

   




Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-16 Thread Anthony Liguori
Eric Van Hensbergen wrote:
 On 5/16/07, Anthony Liguori [EMAIL PROTECTED] wrote:

 What do you think about a socket interface?  I'm not sure how discovery
 would work yet, but there are a few PV socket implementations for Xen at
 the moment.


 From a functional standpoint I don't have a huge problem with it,
 particularly if it's more of a pure socket and not something that tries
 to look like a TCP/IP endpoint -- I would prefer something closer to
 netlink.  Sockets would allow the existing 9p stuff to pretty much
 work as-is.

So you would prefer assigning out types instead of using an identifier 
string in the sockaddr?

 However, all that being said, I noticed some pretty big differences
 between sockets and shared memory in terms of overhead under Linux.

 If you take a look at the RPC latency graph in:
 http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf

 You'll see that a local socket implementation has about an order of
 magnitude worse latency than a PROSE/Libra inter-partition shared
 memory channel.

You seem to suggest that the low latency is due to a very greedy (CPU 
hungry) polling algorithm.  A poll vs. interrupt model would seem to me 
to be orthogonal to using sockets as an interface.

   Furthermore it will really limit our ability to trim
 the fat of unnecessary copies in order to have competitive
 performance.  But perhaps there's magic you can do to eliminate that.

Sockets do add copies.  My initial thinking is that one can work around 
this by passing guest PFNs (or grant references in Xen).  I'm also happy 
to start out focusing on low-speed devices.

 Of course, you could always layer a socket interface for userspace
 simplicity on top of a more performance-optimized underlying transport
 that could be used directly by kernel-modules.

Right.

Regards,

Anthony Liguori

  -eric




Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-16 Thread Anthony Liguori
Gregory Haskins wrote:
 On Wed, May 16, 2007 at  2:39 PM, in message [EMAIL PROTECTED],
 
 Anthony Liguori [EMAIL PROTECTED] wrote: 
   
 Gregory Haskins wrote:
 
 On Wed, May 16, 2007 at  1:28 PM, in message [EMAIL PROTECTED],
 
 
 Anthony Liguori [EMAIL PROTECTED] wrote: 
   
   
 What do you think about a socket interface?  I'm not sure how discovery 
 would work yet, but there are a few PV socket implementations for Xen at 
 the moment.
 
 
  FYI: The work I am doing is exactly that.  I am going to extend host-based
  unix domain sockets up to the KVM guest.  Not sure how well it will work yet,
  as I had to lay the LAPIC work down first for IO-completion.
 
   
   
 Do you plan on introducing a new address family in the guest?
 

 Well, since I had to step back and lay some infrastructure groundwork I 
 haven't vetted this approach yet...so it's possible what I am about to say is 
 relatively naive:  But my primary application is to create a guest-kernel to 
 host IVMC.

This is quite easy with KVM.  I like the approach that vmchannel has 
taken.  A simple PCI device.  That gives you a discovery mechanism for 
shared memory and an interrupt and then you can just implement a ring 
queue using those mechanisms (along with a PIO port for signaling from 
the guest to the host).  So given that underlying mechanism, the 
question is how to expose that within the guest kernel/userspace and 
within the host.
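A minimal C sketch may make the shape of that ring queue concrete. Everything below (structure layout, slot count, the doorbell comment) is invented for illustration and is not taken from vmchannel or any posted patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for the shared-memory ring described above: a
 * header in the PCI BAR region plus a power-of-two array of fixed-size
 * slots.  Names and sizes are illustrative. */
#define RING_SLOTS 8                     /* must be a power of two */
#define SLOT_SIZE  64

struct vmchannel_ring {
    volatile uint32_t prod;              /* advanced by the sender   */
    volatile uint32_t cons;              /* advanced by the receiver */
    uint8_t slot[RING_SLOTS][SLOT_SIZE];
};

/* Guest side: enqueue one message.  A real driver would follow this
 * with a write to the PIO doorbell port to signal the host. */
static int ring_put(struct vmchannel_ring *r, const void *buf, size_t len)
{
    if (len > SLOT_SIZE || r->prod - r->cons == RING_SLOTS)
        return -1;                       /* oversized, or ring full */
    memcpy(r->slot[r->prod % RING_SLOTS], buf, len);
    /* a write barrier belongs here before publishing the slot */
    r->prod++;
    return 0;
}

/* Host side (e.g. QEMU): drain one message after the doorbell exit. */
static int ring_get(struct vmchannel_ring *r, void *buf)
{
    if (r->prod == r->cons)
        return -1;                       /* ring empty */
    memcpy(buf, r->slot[r->cons % RING_SLOTS], SLOT_SIZE);
    r->cons++;
    return 0;
}
```

The guest would place this structure in the shared-memory region the PCI device advertises and hit the PIO doorbell after ring_put(); the host drains with ring_get() on the resulting exit.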

For the host, you can probably stay entirely within QEMU.  Interguest 
communication would be a bit tricky but guest-host communication is 
real simple.

You could stop at exposing the channel as a socket within the guest 
kernel/userspace.  That would work, but you may also want to expose the 
ring queue within the kernel at least if there are consumers that need 
to avoid the copy.

A tricky bit of this is how to do discovery.  If you want to support 
interguest communication, it's not really sufficient to just use strings 
since the identifiers would have to be unique throughout the entire 
system.  Maybe you just leave it as a guest<->host channel and be done 
with it.

Regards,

Anthony Liguori

   For that you can just think of the guest as any other process on the host, 
 and it will just use the sockets normally as any host-process would.  There 
 might be some thunking that has to happen to deal with gpa vs va, etc, but 
 otherwise it's a standard consumer.  If you want to extend IVMC up to 
 guest-userspace, I think making some kind of new socket family makes sense in 
 the guest's stack.  PF_VIRT like someone else suggested, for instance.  But 
 since I don't need this type of IVMC I haven't really thought about this too 
 much.

 -Greg


   




Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-16 Thread Rusty Russell
On Wed, 2007-05-16 at 14:10 -0500, Anthony Liguori wrote:
 For the host, you can probably stay entirely within QEMU.  Interguest 
 communication would be a bit tricky but guest-host communication is 
 real simple.

guest-host is always simple.  But it'd be great if it didn't matter to
the guest whether it's talking to the host or another guest.

I think shared memory is an obvious start, but it's not enough for
inter-guest where they can't freely access each other's memory.  So you
really want a ring-buffer of descriptors with a hypervisor-assist to say
read/write this into the memory referred to by that descriptor.

I think this can be done as a simple variation of the current schemes in
existence.

But I'm shutting up until I have some demonstration code 8)
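In that spirit, here is a toy model of the descriptor idea. All names are invented, and the "hypervisor assist" is stubbed out as a plain memcpy in a single address space; in a real system only the hypervisor, acting on behalf of a peer that cannot map the memory itself, would perform that copy:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Instead of copying data into shared memory, the receiver posts
 * (address, length) descriptors, and the transfer into the referenced
 * buffer is done by a hypervisor assist.  Illustrative sketch only. */
struct desc {
    void    *addr;            /* would be a guest-physical address */
    uint32_t len;
    uint32_t used;            /* bytes actually written by the peer */
};

#define NDESC 16
struct desc_ring {
    uint32_t head, tail;      /* head: next free; tail: next to fill */
    struct desc d[NDESC];
};

/* Receiver posts an empty buffer for the other side to fill. */
static int post_buffer(struct desc_ring *r, void *buf, uint32_t len)
{
    if (r->head - r->tail == NDESC)
        return -1;                            /* ring full */
    r->d[r->head % NDESC] = (struct desc){ .addr = buf, .len = len };
    r->head++;
    return 0;
}

/* Stubbed "hypervisor assist": copy the sender's data into the next
 * posted buffer, so the sender never maps the receiver's memory. */
static int hv_copy_to_ring(struct desc_ring *r, const void *src, uint32_t len)
{
    struct desc *d;
    if (r->tail == r->head)
        return -1;                            /* no buffer posted */
    d = &r->d[r->tail % NDESC];
    if (len > d->len)
        return -1;                            /* buffer too small */
    memcpy(d->addr, src, len);
    d->used = len;
    r->tail++;
    return 0;
}
```

The point of the indirection is exactly the untrusted-comms case above: descriptors name memory, and only the trusted copy primitive touches both sides.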

 A tricky bit of this is how to do discovery.  If you want to support 
 interguest communication, it's not really sufficient to just use strings 
 since the identifiers would have to be unique throughout the entire 
 system.  Maybe you just leave it as a guest<->host channel and be done 
 with it.

Hmm, I was going to leave that unspecified.  One thing at a time...

Rusty.




Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-14 Thread Christian Bornträger
On Monday 14 May 2007 14:05, Avi Kivity wrote:
 But I agree that the growing code base is a problem. With the block 
 driver we can probably keep the host side in userspace, but to do the 
 same for networking is much more work. I do think (now) that it is doable.

Interesting. What kind of userspace networking do you have in mind?

One of the first tries from Carsten was to use tun/tap, which proved to be slow 
performance-wise.

What I had in mind was some kind of switch in userspace. That would allow 
non-root guests to define their own private networks. We could use Linux's fast 
pipe implementation for guest-to-guest communication. 

The question is how to connect userspace networks to the host's:
- tun/tap is quite slow
- last time we checked, netfilter offered only IP hooks (if you don't use the 
bridging code)
- raw sockets get tricky if you do in/out at the same time because you have to 
manually deal with loops

This reminds me that we actually have another party doing virtual networking 
between guests: UML. User Mode Linux can do networking/switching in 
userspace, but I cannot tell how well UML's concept works out. 

Christian



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-14 Thread Carsten Otte
Avi Kivity wrote:
 But I agree that the growing code base is a problem. With the block 
 driver we can probably keep the host side in userspace, but to do the 
 same for networking is much more work. I do think (now) that it is doable.
I agree that networking needs to be handled in the host kernel. We go 
out to userspace for signaling at this time, but that's simply broken. 
All our userspace does is do a system call next.

so long,
Carsten



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-13 Thread Dor Laor
ron minnich wrote:
 Let me ask what may seem to be a naive question to the linux world. I
 see you are doing a lot of solid work on adding block and network
 devices. The code for block and network devices
 is implemented in different ways. I've also seen this difference of
 interface/implementation on Xen.
Actually, the difference derives from the fact that block and network
are indeed different:
- block submits requests that ask the host to transfer from/to
preallocated guest data buffers via dma (request driven)
- net transmits packets that should end up in an skb on the remote
side (two way, push driven)
- net is sensitive to round-trip times; block is not, due to the device
plug for request merging

We tried different access methods for both block and network. We have
selected the current communication mechanics after doing performance
measurements.
I believe for a portable solution we need to develop a set of
primitives for sending signals (read: interrupts) back and forth, for
copying data to guest memory, and for establishing shared memory
between guests and between guest+host. These primitives need to be
implemented for each platform, and paravirtual drivers should build on
top of that.
At this point in time, we are aware that these device drivers don't do
what we'd want for a portable solution. We'll focus on getting the
kernel interfaces to sie/vt/svm proper and portable first.

so long,
Carsten

Based on the previous discussion and the s390 PV drivers I have more
gasoline to pour to the flame:

We have a working PV driver with 1Gbit performance. The reasons we don't
push it into the kernel are:
a. We should perform much better
b. It would be a painful task getting all the code review that a
   complicated network interface should get.
c. There's already a PV driver that answers a and b.
Xen's PV network driver is now pushed into the kernel.
It is optimized, and supports TSO.
By adding generic ops calls we can enjoy all of the above.

Using Xen's core PV code doesn't imply that we will have their interface
(xenstore); the interface creation and tear-down would be kvm-specific.
They could even have a plain directory structure.



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-13 Thread Anthony Liguori
Dor Laor wrote:
 push it into the kernel are:
   a. We should perform much better
   b. It would be a painful task getting all the code review that a

  complicated network interface should get.
   c. There's already a PV driver that answers a,b.
 The Xen's PV network driver is now pushed into the kernel.
   

Actually, it's not (at least not as of a few moments ago).  Furthermore, 
the plan is to completely rearchitect the netback/netfront protocol for 
the next Xen release (this effort is referred to as netchannel2).

See some of the XenSummit slides as to why this is necessary.

Regards,

Anthony Liguori

 It is optimized, and supports TSO.
 By adding generic ops calls we can enjoy all of the above.

 Using Xen's core PV code doesn't imply that we will have their interface
 {xenstore} the interface creation and tear-down would be kvm specific.
 They could even have a plain directory structure.


   




Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-13 Thread Dor Laor
Subject: Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device
driver

Dor Laor wrote:
 push it into the kernel are:
  a. We should perform much better
  b. It would be a painful task getting all the code review that a

 complicated network interface should get.
  c. There's already a PV driver that answers a,b.
 The Xen's PV network driver is now pushed into the kernel.


Actually, it's not (at least not as of a few moments ago).
Furthermore,
the plan is to completely rearchitect the netback/netfront protocol for
the next Xen release (this effort is referred to as netchannel2).

But isn't Jeremy Fitzhardinge pushing a big patch queue into the
kernel?
If we manage to plant hooks into netback/front for using net_ops,
and the code gets into the kernel, they will have to keep the
hooks for netchannel2. 


See some of the XenSummit slides as to why this is necessary.

It looks like generalizing all the level 0,1,2 features plus
performance optimizations. It's not something we couldn't upgrade to.

Regards,

Anthony Liguori

 It is optimized, and support tso.
 By adding a generic ops calls we can make enjoy all the above.

 Using Xen's core PV code doesn't imply that we will have their
interface
 {xenstore} the interface creation and tear-down would be kvm
specific.
 They could even have a plain directory structure.









Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-13 Thread Anthony Liguori
Dor Laor wrote:
 Furthermore,
   
 the plan is to completely rearchitect the netback/netfront protocol for
 the next Xen release (this effort is referred to as netchannel2).
 

 But isn't Jeremy Fitzhardinge pushing a big patch queue into the
 kernel?
   

Yes, but it's not in the kernel yet and there's no guarantee it'll get 
there in time for KVM's consumption.

 If we manage to plant hooks into netback/front for using net_ops,
 and the code gets into the kernel, they will have to keep the
 hooks for netchannel2. 

   
 See some of the XenSummit slides as to why this is necessary.
 

 It looks like generalizing all the level 0,1,2 features plus
 performance optimizations. It's not something we couldn't upgrade to.
   

I'm curious what Rusty thinks as I do not know nearly enough about the 
networking subsystem to make an educated statement here.  Would it be 
better to just try and generalize netback/netfront or build something 
from scratch?  Could the lguest driver be generalized more easily?

Regards,

Anthony Liguori

 Regards,

 Anthony Liguori

 
 It is optimized, and support tso.
 By adding a generic ops calls we can make enjoy all the above.

 Using Xen's core PV code doesn't imply that we will have their
   
 interface
   
 {xenstore} the interface creation and tear-down would be kvm
   
 specific.
   
 They could even have a plain directory structure.


   
 


   


   




Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-13 Thread Muli Ben-Yehuda
On Sun, May 13, 2007 at 11:49:14AM -0500, Anthony Liguori wrote:
 Dor Laor wrote:
  Furthermore,

  the plan is to completely rearchitect the netback/netfront
  protocol for the next Xen release (this effort is referred to as
  netchannel2).
  
 
  But isn't Jeremy Fitzhardinge pushing a big patch queue into the
  kernel?

 
 Yes, but it's not in the kernel yet and there's no guarantee it'll
 get there in time for KVM's consumption.

On the other hand, there's strong interest in having unified virtual
drivers. Given that the Xen drivers are out there, have been submitted
and have been reasonably optimized, there will be some resistance to
putting in yet another set of PV drivers. Also, the contentious
merge point as I understand it is xenbus needing review, rather than
the drivers themselves which are in pretty good shape.

Cheers,
Muli



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-13 Thread Dor Laor
Subject: Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device
driver

On Sun, May 13, 2007 at 11:49:14AM -0500, Anthony Liguori wrote:
 Dor Laor wrote:
  Furthermore,
 
  the plan is to completely rearchitect the netback/netfront
  protocol for the next Xen release (this effort is referred to as
  netchannel2).
 
 
  But isn't Jeremy Fitzhardinge pushing a big patch queue into the
  kernel?
 

 Yes, but it's not in the kernel yet and there's no guarantee it'll
 get there in time for KVM's consumption.

On the other hand, there's strong interest in having unified virtual
drivers. Given that the Xen drivers are out there, have been submitted
and have been reasonably optimized, there will be some resistance to
putting in yet another set of PV drivers. Also, the contentious
merge point as I understand it is xenbus needing review, rather than
the drivers themselves which are in pretty good shape.

Moreover, it's not that it is too complex to write a set of back/front
ends; it's just that it's already written and optimized down to the bit.
Our current implementation has all the regular bells and whistles
(rings, delayed notifications, NAPI); it is simpler than Xen's but it
lacks further optimizations and TSO/scatter-gather.
If we ever use NetChannel2 we should enjoy smart NIC features too.
It's more tempting and fun to continue to support our own implementation,
but it's more correct to reuse code.
Nevertheless, we'll be happy to hear and discuss what others are
thinking.

If the current Xen code fails to hit the kernel, then it would be even
easier for us - we'll just rip off all the Xen wrapping; the grant
tables and the flipping would go away, leaving clean, optimized network
code.
Regards,
Dor.



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-13 Thread Rusty Russell
On Sun, 2007-05-13 at 11:49 -0500, Anthony Liguori wrote:
 Dor Laor wrote:
  Furthermore,

  the plan is to completely rearchitect the netback/netfront protocol for
  the next Xen release (this effort is referred to as netchannel2).
  It looks like generalizing all the level 0,1,2 features plus
  performance optimizations. It's not something we couldn't upgrade to.
 
 I'm curious what Rusty thinks as I do not know nearly enough about the 
 networking subsystem to make an educated statement here.  Would it be 
 better to just try and generalize netback/netfront or build something 
 from scratch?  Could the lguest driver be generalized more easily?

In turn, I'm curious as to Herbert's opinions on this.

The lguest netdriver has only two features: it's small, and it does
multi-way inter-guest networking as well as guest-host.  It's not
clear how much the latter wins in real life over a point-to-point comms
system.

My interest is in a common low-level transport.  My experience is that
it's easy to create an efficient comms channel between a guest and host
(ie. one side can access the other's memory), but it's worthwhile trying
for a model which transparently allows untrusted comms (ie.
hypervisor-assisted to access the other guest's memory).  That's easier
if you only want point-to-point (see lguest's io.c for a more general
solution).

Cheers,
Rusty.




Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-12 Thread Carsten Otte

ron minnich wrote:
 Let me ask what may seem to be a naive question to the linux world. I
 see you are doing a lot of solid work on adding block and network
 devices. The code for block and network devices
 is implemented in different ways. I've also seen this difference of
 interface/implementation on Xen.
Actually, the difference derives from the fact that block and network 
are indeed different:
- block submits requests that ask the host to transfer from/to 
preallocated guest data buffers via dma (request driven)
- net transmits packets that should end up in an skb on the remote 
side (two way, push driven)
- net is sensitive to round-trip times; block is not, due to the device 
plug for request merging

We tried different access methods for both block and network. We have 
selected the current communication mechanics after doing performance 
measurements.
I believe for a portable solution we need to develop a set of 
primitives for sending signals (read: interrupts) back and forth, for 
copying data to guest memory, and for establishing shared memory 
between guests and between guest+host. These primitives need to be 
implemented for each platform, and paravirtual drivers should build on 
top of that.
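Read as a C interface, those primitives might look like the ops table below. The structure, its names, and the dummy backend are all invented for illustration and do not appear in the patches under discussion:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three primitives named above -- signaling, copying to guest
 * memory, and shared-memory setup -- gathered into an ops table that
 * each platform (s390 SIE, Intel VT, AMD SVM) would implement, with PV
 * drivers written only against these hooks.  Illustrative names. */
struct pv_transport_ops {
    int   (*signal)(int peer);                   /* "interrupt" the peer */
    int   (*copy_to_guest)(int peer, void *dst,
                           const void *src, size_t len);
    void *(*map_shared)(int peer, size_t len);   /* shared region setup */
};

/* A trivial same-process backend, standing in for a real platform port. */
static int dummy_signals;
static int dummy_signal(int peer) { (void)peer; return ++dummy_signals; }

static int dummy_copy(int peer, void *dst, const void *src, size_t len)
{
    (void)peer;
    memcpy(dst, src, len);   /* a real port would translate addresses */
    return 0;
}

static char dummy_region[4096];
static void *dummy_map(int peer, size_t len)
{
    (void)peer;
    return len <= sizeof(dummy_region) ? dummy_region : NULL;
}

static const struct pv_transport_ops dummy_ops = {
    .signal        = dummy_signal,
    .copy_to_guest = dummy_copy,
    .map_shared    = dummy_map,
};
```

Each platform would supply its own ops instance, and the PV block/net drivers would call only through the table, keeping them portable.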
At this point in time, we are aware that these device drivers don't do 
what we'd want for a portable solution. We'll focus on getting the 
kernel interfaces to sie/vt/svm proper and portable first.

so long,
Carsten



Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-11 Thread ron minnich
Let me ask what may seem to be a naive question to the linux world. I
see you are doing a lot of solid work on adding block and network
devices. The code for block and network devices
is implemented in different ways. I've also seen this difference of
interface/implementation on Xen.

Hence my question:
Why are the INTERFACES to the block and network devices different? I
can understand that the implementation -- what goes on inside the
box -- would be different. But, again, why is the interface to the
resource different in each case? Will every distinct type of I/O
device end up with a different interface?

These questions doubtless seem naive, I suppose, except I use a system
(Plan 9) in which a common interface is in fact used for the different
resources. I have been hoping that we could bring this model -- same
interface, different resource -- to the inter-vm communications. I
would like to at least raise the idea that it could be used on KVM.

Avoiding too much detail, in the plan 9 world, read and write of data
to a disk is via file read and write system calls. Same for a network.
Same for the mouse, the window system, the serial port, the console,
USB, and so on. Please see this note from IBM on what is
possible: http://domino.watson.ibm.com/library/CyberDig.nsf/0/c6c779bbf1650fa4852570670054f3ca?OpenDocument
or http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf

Different resources, same interface. In the hypervisor world, you
build one shared memory queue as a basic abstraction. On top of that
queue, you run 9P. The provider (network, block device, etc.) provides
certain resources to you, the guest domain. The resources have names. A
network can look like this, to a kvm guest (this command from a Plan 9
system):
cpu% ls /net/ether0
/net/ether0/0
/net/ether0/1
/net/ether0/2
/net/ether0/addr
/net/ether0/clone
/net/ether0/ifstats
/net/ether0/stats
To get network stats, or do I/O, one simply gains access to the
appropriate ring buffer by finding the name, and does ring-buffer
sends and receives via shared-memory queues. The I/O operations can be
very efficient.

Disk looks like this:
cpu% ls -l /dev/sdC0
--rw-r- S 0 bootes bootes   104857600 Jan 22 15:49 /dev/sdC0/9fat
--rw-r- S 0 bootes bootes 65361213440 Jan 22 15:49 /dev/sdC0/arenas
--rw-r- S 0 bootes bootes   0 Jan 22 15:49 /dev/sdC0/ctl
--rw-r- S 0 bootes bootes 82348277760 Jan 22 15:49 /dev/sdC0/data
--rw-r- S 0 bootes bootes 13072242688 Jan 22 15:49 /dev/sdC0/fossil
--rw-r- S 0 bootes bootes  3268060672 Jan 22 15:49 /dev/sdC0/isect
--rw-r- S 0 bootes bootes 512 Jan 22 15:49 /dev/sdC0/nvram
--rw-r- S 0 bootes bootes 82343245824 Jan 22 15:49 /dev/sdC0/plan9
-lrw--- S 0 bootes bootes   0 Jan 22 15:49 /dev/sdC0/raw
--rw-r- S 0 bootes bootes   536870912 Jan 22 15:49 /dev/sdC0/swap
cpu%

So the disk partitions are files, with the data file being the
whole disk. Again, on a hypervisor system, to do I/O, software could
create a connection to the file and establish the in-memory ring
buffer, for that partition. This I/O can be very efficient; IBM
research is working on zero-copy mechanisms for moving data between
domains.

The result is a single, consistent mechanism for accessing all
resources from a guest domain. The resources have names, and it is
easy to examine the status -- binary interfaces can be minimized. The
resources can be provided by in-kernel servers -- Linux drivers -- or
out-of-kernel servers -- proceses. Same interface, and yet the
implementation of the provider of the resource can be utterly
different.

We had hoped to get something like this into Xen. On Xen, for example,
the block device and ethernet device interfaces are as different as
one could imagine. Disk I/O does not steal pages from the guest. The
network does. Disk I/O is in 4k chunks, period, with a bitmap
describing which of the 8 512-byte subunits are being sent. The enet
device, on read, returns a page with your packet, but also potentially
containing bits of other domains' packets too. The interfaces are as
dissimilar as they can be, and I see no reason for such a huge
variance between what are basically read/write devices.

Another issue is that kvm, in its current form (-24) is beautifully
simple. These additions seem to detract from the beauty a bit. Might
it be worth taking a little time to consider these ideas in order to
preserve the basic elegance of KVM?

So, before we go too far down the Xen-like paravirtualized device
route, can we discuss the way this ought to look a bit?

thanks

ron


Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-11 Thread Anthony Liguori
ron minnich wrote:
 Avoiding too much detail, in the plan 9 world, read and write of data
 to a disk is via file read and write system calls.

For low speed devices, I think paravirtualization doesn't make a lot of 
sense unless it's absolutely required.  I don't know enough about s390 
to know if it supports things like uarts but if so, then emulating a 
uart would in my mind make a lot more sense than a PV console device.

  Same for a network.
 Same for the mouse, the window system, the serial port, the console,
 USB, and so on. Please see this note from IBM on what is
 possible: http://domino.watson.ibm.com/library/CyberDig.nsf/0/c6c779bbf1650fa4852570670054f3ca?OpenDocument
 or http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf
 Different resources, same interface. In the hypervisor world, you
 build one shared memory queue as a basic abstraction. On top of that
 queue, you run 9P. The provider (network, block device, etc.) provides
 certain resources to you, the guest domain. The resources have names. A
 network can look like this, to a kvm guest (this command from a Plan 9
 system):
 cpu% ls /net/ether0
 /net/ether0/0
 /net/ether0/1
 /net/ether0/2
 /net/ether0/addr
 /net/ether0/clone
 /net/ether0/ifstats
 /net/ether0/stats
   

This smells a bit like XenStore which I think most will agree was an 
unmitigated disaster.  This sort of thing gets terribly complicated to 
deal with in the corner cases.  Atomicity of multiple read/write 
operations is difficult to express.  Moreover, quite a lot of things are 
naturally expressed as a state machine, which is not straightforward to 
do in this sort of model.  This may have been all figured out in 9P but 
it's certainly not a simple thing to get right.

I think a general rule of thumb for a virtualized environment is that 
the closer you stick to the way hardware tends to do things, the less 
likely you are to screw yourself up and the easier it will be for other 
platforms to support your devices.  Implementing a full 9P client just 
to get console access in something like mini-os would be unfortunate.  
At least the posted s390 console driver behaves roughly like a uart so 
it's pretty obvious that it will be easy to implement in any OS that 
supports uarts already.

Regards,

Anthony Liguori

 To get network stats, or do I/O, one simply gains access to the
 appropriate ring buffer, by finding the name, and does the ring buffer
 sends and receives via shared memory queues. The I/O operations can be
 very efficient.

 Disk looks like this:
 cpu% ls -l /dev/sdC0
 --rw-r- S 0 bootes bootes   104857600 Jan 22 15:49 /dev/sdC0/9fat
 --rw-r- S 0 bootes bootes 65361213440 Jan 22 15:49 /dev/sdC0/arenas
 --rw-r- S 0 bootes bootes   0 Jan 22 15:49 /dev/sdC0/ctl
 --rw-r- S 0 bootes bootes 82348277760 Jan 22 15:49 /dev/sdC0/data
 --rw-r- S 0 bootes bootes 13072242688 Jan 22 15:49 /dev/sdC0/fossil
 --rw-r- S 0 bootes bootes  3268060672 Jan 22 15:49 /dev/sdC0/isect
 --rw-r- S 0 bootes bootes 512 Jan 22 15:49 /dev/sdC0/nvram
 --rw-r- S 0 bootes bootes 82343245824 Jan 22 15:49 /dev/sdC0/plan9
 -lrw--- S 0 bootes bootes   0 Jan 22 15:49 /dev/sdC0/raw
 --rw-r- S 0 bootes bootes   536870912 Jan 22 15:49 /dev/sdC0/swap
 cpu%

 So the disk partitions are files, with the data file being the
 whole disk. Again, on a hypervisor system, to do I/O, software could
 create a connection to the file and establish the in-memory ring
 buffer for that partition. This I/O can be very efficient; IBM
 research is working on zero-copy mechanisms for moving data between
 domains.

 The result is a single, consistent mechanism for accessing all
 resources from a guest domain. The resources have names, and it is
 easy to examine the status -- binary interfaces can be minimized. The
 resources can be provided by in-kernel servers -- Linux drivers -- or
 out-of-kernel servers -- processes. Same interface, and yet the
 implementation of the provider of the resource can be utterly
 different.

 We had hoped to get something like this into Xen. On Xen, for example,
 the block device and ethernet device interfaces are as different as
 one could imagine. Disk I/O does not steal pages from the guest. The
 network does. Disk I/O is in 4k chunks, period, with a bitmap
 describing which of the 8 512-byte subunits are being sent. The enet
 device, on read, returns a page with your packet, but also potentially
 containing bits of other domains' packets too. The interfaces are as
 dissimilar as they can be, and I see no reason for such a huge
 variance between what are basically read/write devices.

 Another issue is that kvm, in its current form (-24) is beautifully
 simple. These additions seem to detract from the beauty a bit. Might
 it be worth taking a little time to consider these ideas in order to
 preserve the basic elegance of KVM?

 So, before we go too far down the Xen-like paravirtualized device
 route, can we discuss 

Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver

2007-05-11 Thread Anthony Liguori
Eric Van Hensbergen wrote:
 On 5/11/07, Anthony Liguori [EMAIL PROTECTED] wrote:
   
 cpu% ls /net/ether0
 /net/ether0/0
 /net/ether0/1
 /net/ether0/2
 /net/ether0/addr
 /net/ether0/clone
 /net/ether0/ifstats
 /net/ether0/stats

   
 This smells a bit like XenStore which I think most will agree was an
 unmitigated disaster.

 

 I'd have to disagree with you, Anthony.  The Plan 9 interfaces are
 simple and built into the kernel - they don't have the
 multi-layered-stack-python-xmlrpc garbage that made up the Xen
 interfaces.
   

My point isn't that 9p is just like XenStore but rather that turning 
this idea into something that is useful and elegant is non-trivial.

 If it were just console access, I would agree with you, but it's really
 about implementing a single solution for all drivers you are accessing
 across the interface.  A single client versus dozens of different
 driver variants.

There's definitely a conversation to have here.  There are going to be a 
lot of small devices that would benefit from a common transport 
mechanism.  Someone mentioned a PV entropy device on LKML.  A 
host<->guest filesystem is another consumer of such an interface.

I'm inclined to think though that the abstraction point should be the 
transport and not the actual protocol.  My concern with standardizing on 
a protocol like 9p would be that one would lose some potential 
optimizations (like passing PFNs directly between guest and host).

   Our existing 9p client for mini-os is ~3000 LOC and
 it is a pretty naive port from the p9p code base so it could probably
 be reduced even further.  It is a very small percentage of our
 existing mini-os kernels and gives us console, disk, network, IP
 stack, file system, and control interfaces.  Of course Linux clients
 could just use v9fs with a hypervisor-shared-memory transport which I
 haven't merged yet.  We'll also be using the same set of interfaces
 for the simulator shortly.
   

So is there any reason to even tie 9p to KVM?  Why not just have a 
common PV transport that 9p can use?  For certain things, it may make 
sense (like v9fs).

Regards,

Anthony Liguori

 Oh yeah, and don't forget the fact that resource access can bridge
 seamlessly over any network and the protocol has provisions to be
 secured with authentication/encryption/digesting if desired.

 Los Alamos will be presenting 9p based control interfaces for KVM at OLS.

 -eric

 -
 This SF.net email is sponsored by DB2 Express
 Download DB2 Express C - the FREE version of DB2 express and take
 control of your XML. No limits. Just data. Click to get it now.
 http://sourceforge.net/powerbar/db2/
 ___
 kvm-devel mailing list
 kvm-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/kvm-devel

   


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel