Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Christoph Hellwig wrote: On Tue, May 22, 2007 at 10:00:42AM -0700, ron minnich wrote: On 5/22/07, Eric Van Hensbergen [EMAIL PROTECTED] wrote: I'm not opposed to supporting emulation environments, just don't make a large pile of crap the default like Xen -- and having to integrate PCI probing code in my guest domains is a large pile of crap. Exactly. I'm about to start a pretty large project here, using xen or kvm, not sure. One thing for sure, we are NOT going to use anything but PV devices. Full emulation is nice, but it's just plain silly if you don't have to do it. And we don't have to do it. So let's get the PV devices right, not try to shoehorn them into some framework like PCI. If you don't care about full virtualization kvm is the wrong project for you. You might want to take a look at lguest. This is incorrect. While kvm started out as a full virtualization project, it will expand with I/O PV and core PV. Eventually most of the paravirt_ops interface will have a kvm implementation. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Wed, May 23, 2007 at 03:16:50PM +0300, Avi Kivity wrote: Christoph Hellwig wrote: On Tue, May 22, 2007 at 10:00:42AM -0700, ron minnich wrote: On 5/22/07, Eric Van Hensbergen [EMAIL PROTECTED] wrote: I'm not opposed to supporting emulation environments, just don't make a large pile of crap the default like Xen -- and having to integrate PCI probing code in my guest domains is a large pile of crap. Exactly. I'm about to start a pretty large project here, using xen or kvm, not sure. One thing for sure, we are NOT going to use anything but PV devices. Full emulation is nice, but it's just plain silly if you don't have to do it. And we don't have to do it. So let's get the PV devices right, not try to shoehorn them into some framework like PCI. If you don't care about full virtualization kvm is the wrong project for you. You might want to take a look at lguest. This is incorrect. While kvm started out as a full virtualization project, it will expand with I/O PV and core PV. Eventually most of the paravirt_ops interface will have a kvm implementation. The statement above was a little misworded, I think. It should have been "if you care about pure PV ...".
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On 5/23/07, Carsten Otte [EMAIL PROTECTED] wrote: For me, plan9 does provide answers to a lot of above requirements. However, it does not provide capabilities for shared memory and it adds extra complexity. It's been designed to solve a different problem. As a point of clarification, plan9 protocols have been used over shared memory for resource access on virtualized systems for the past 3 years. There are certainly ways it can be further optimized, but it is not a restriction. As far as complexity goes, our guest-side stack is around 2000 lines of code (with an additional 1000 lines of support routines that could likely be replaced by standard library or OS services in more conventional platforms) and supports console, file system, network, and block device access. I think the virtual device abstraction should provide the following functionality:
- hypercall guest to host with parameters and return value
- interrupt from host to guest with parameters
- thin interrupt from host to guest, no parameters
- shared memory between guest and host
- dma access to guest memory, possibly via kmap on the host
- copy from/to guest memory
Good list. We can certainly work within these parameters. It would be nice to have some facility for direct guest-guest communication -- however, I understand the difficulties in doing that in a secure and safe way. Still, having the ability to provision such a direct interface would be nice for those that can take advantage of it. -eric
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Wednesday 23 May 2007, Eric Van Hensbergen wrote: On 5/23/07, Carsten Otte [EMAIL PROTECTED] wrote: For me, plan9 does provide answers to a lot of above requirements. However, it does not provide capabilities for shared memory and it adds extra complexity. It's been designed to solve a different problem. As a point of clarification, plan9 protocols have been used over shared memory for resource access on virtualized systems for the past 3 years. There are certainly ways it can be further optimized, but it is not a restriction. I think what Carsten means is to have a mmap interface over 9p, not implementing 9p by means of shared memory, which is what I guess you are referring to. If you want to share memory areas between a guest and the host or another guest, you can't do that with the regular Tread/Twrite interface that 9p has on a file. As far as complexity goes, our guest-side stack is around 2000 lines of code (with an additional 1000 lines of support routines that could likely be replaced by standard library or OS services in more conventional platforms) and supports console, file system, network, and block device access. Another interface that I think is missing in 9p is a notification for hotplugging. Of course you can have a long-running read on a special file that returns the file names for virtual devices that have been added or removed in the guest, but that sounds a little clumsy compared to a specialized interface (e.g. Tnotify). Arnd
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On 5/23/07, Arnd Bergmann [EMAIL PROTECTED] wrote: On Wednesday 23 May 2007, Eric Van Hensbergen wrote: On 5/23/07, Carsten Otte [EMAIL PROTECTED] wrote: For me, plan9 does provide answers to a lot of above requirements. However, it does not provide capabilities for shared memory and it adds extra complexity. It's been designed to solve a different problem. As a point of clarification, plan9 protocols have been used over shared memory for resource access on virtualized systems for the past 3 years. There are certainly ways it can be further optimized, but it is not a restriction. I think what Carsten means is to have a mmap interface over 9p, not implementing 9p by means of shared memory, which is what I guess you are referring to. If you want to share memory areas between a guest and the host or another guest, you can't do that with the regular Tread/Twrite interface that 9p has on a file. Well, there's nothing strictly preventing a mmap interface over 9p (in fact we are working with that in a Cell project internally) -- however, I'm not sure that makes the best sense for device access anyways. The real thing missing from the current implementation is a better underlying transport which can pass payloads by reference to shared memory as opposed to marshaling operations through a shared memory transport -- however, this is what Los Alamos and IBM are working on right now. As far as complexity goes, our guest-side stack is around 2000 lines of code (with an additional 1000 lines of support routines that could likely be replaced by standard library or OS services in more conventional platforms) and supports console, file system, network, and block device access. Another interface that I think is missing in 9p is a notification for hotplugging. 
Of course you can have a long-running read on a special file that returns the file names for virtual devices that have been added or removed in the guest, but that sounds a little clumsy compared to a specialized interface (e.g. Tnotify). Discovery and hot-plugging would be synthetic file system semantic issues that need to be resolved and in general are probably, as Rusty and others suggested, best handled as a separate set of topics. That being said, specialized interfaces always seemed a bit more clunky to me (just look at ioctl), but I suppose that's largely a matter of taste. The advantage of having a file system interface to event notification is it creates a much more flexible environment, allowing even simple shell scripting languages to resolve events versus having to build a complex infrastructure -- and since 9p can be transitively mounted over a network, you can build cluster management suites without secondary layers of gorp for such things. The LANL guys will probably have more to say about this at their OLS talk on the KVM management synthetic file system interface they built with 9p. -eric
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On 5/23/07, Eric Van Hensbergen [EMAIL PROTECTED] wrote: On 5/23/07, Arnd Bergmann [EMAIL PROTECTED] wrote: On Wednesday 23 May 2007, Eric Van Hensbergen wrote: On 5/23/07, Carsten Otte [EMAIL PROTECTED] wrote: For me, plan9 does provide answers to a lot of above requirements. However, it does not provide capabilities for shared memory and it adds extra complexity. It's been designed to solve a different problem. As a point of clarification, plan9 protocols have been used over shared memory for resource access on virtualized systems for the past 3 years. There are certainly ways it can be further optimized, but it is not a restriction. I think what Carsten means is to have a mmap interface over 9p, not implementing 9p by means of shared memory, which is what I guess you are referring to. If you want to share memory areas between a guest and the host or another guest, you can't do that with the regular Tread/Twrite interface that 9p has on a file. ugh. I'm tired. It's been a long week -- I realized after I fired off that last message that you meant establishing a shared mapping versus support for mmap operations over 9p (which devolve into Tread/Twrite). Sorry. Yes -- that's correct, 9p wouldn't necessarily buy you something like that. In fact, the current 9p code relies on someone else providing that basic mechanism in order for us to establish our shared memory transport. What Carsten described as his virtual device abstraction sounded like a good foundation -- just don't make me use ioctl :) -eric
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Tue, May 22, 2007 at 07:49:51AM -0500, Eric Van Hensbergen wrote: In the general case, you can't pass a command line argument to Linux either. kvm doesn't boot Linux; it boots the bios, which boots the boot sector, which boots grub, which boots Linux. Relying on the user to edit the command line in grub is wrong. I didn't think we were talking about the general case, I thought we were discussing the PV case. In the PV case, having bios/bootloader is unnecessary overhead. To that same end, I don't see Windows in the PV case unless they magically want to coordinate PV standards with us, in which case we certainly can negotiate a more sane discovery mechanism. In case of KVM no one is speaking of pure PV. What people have been working on is PV acceleration of a fullvirt host, similar to how s390 has worked for decades. The host emulates the full architecture, but there are some escapes for speedups. Typical escapes would be drivers for storage or networking because those cannot be virtualized very well on x86-style hardware.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Eric Van Hensbergen wrote: On 5/22/07, Avi Kivity [EMAIL PROTECTED] wrote: Anthony Liguori wrote: In a PV environment why not just pass an initial cookie/hash/whatever as a command-line argument/register/memory-space to the underlying kernel? You can't pass a command line argument to Windows (at least, not easily AFAIK). You could get away with an MSR/CPUID flag but then you're relying on uniqueness which isn't guaranteed. In the general case, you can't pass a command line argument to Linux either. kvm doesn't boot Linux; it boots the bios, which boots the boot sector, which boots grub, which boots Linux. Relying on the user to edit the command line in grub is wrong. I didn't think we were talking about the general case, I thought we were discussing the PV case. It is still useful to use PV drivers with full virtualization so it's something that ought to be considered. Regards, Anthony Liguori In the PV case, having bios/bootloader is unnecessary overhead. To that same end, I don't see Windows in the PV case unless they magically want to coordinate PV standards with us, in which case we certainly can negotiate a more sane discovery mechanism. -eric
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On 5/22/07, Anthony Liguori [EMAIL PROTECTED] wrote: Eric Van Hensbergen wrote: On 5/22/07, Christoph Hellwig [EMAIL PROTECTED] wrote: I didn't think we were talking about the general case, I thought we were discussing the PV case. In case of KVM no one is speaking of pure PV. Why not? It seems worthwhile to come up with something that can cover the whole spectrum instead of having different hypervisors (and interfaces). Because in a few years, almost everyone will have hardware capable of doing full virtualization so why bother with pure PV. I don't know, we could shoot for a clean, simple interface that makes PV easy to integrate into any kernel. Pick a common underlying abstraction for all resources. Define a simple, efficient memory channel for the comms. Lay 9p over it. Then take it from there for each device. I agree, from the way (e.g.) the Xen devices work, PV is a pain. But it need not be that way. I think from the Plan 9 side we're happy to run full PV. But we're 0% of the world, so that may bias our importance a bit :-) thanks ron
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On 5/22/07, Anthony Liguori [EMAIL PROTECTED] wrote: Eric Van Hensbergen wrote: On 5/22/07, Christoph Hellwig [EMAIL PROTECTED] wrote: In case of KVM no one is speaking of pure PV. Why not? It seems worthwhile to come up with something that can cover the whole spectrum instead of having different hypervisors (and interfaces). Because in a few years, almost everyone will have hardware capable of doing full virtualization so why bother with pure PV. No matter what the capabilities, full device emulation is always going to be wasteful. Just because I have the hardware to run Vista, doesn't mean I should run Vista. Maybe my view is skewed because I don't care to run windows. It's not just windows. There are a lot of people who want to use virtualization to run RHEL2 or even RH9. Backporting PV to these kernels is a huge effort. I'm not opposed to supporting emulation environments, just don't make a large pile of crap the default like Xen -- and having to integrate PCI probing code in my guest domains is a large pile of crap. -eric
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On 5/22/07, Eric Van Hensbergen [EMAIL PROTECTED] wrote: I'm not opposed to supporting emulation environments, just don't make a large pile of crap the default like Xen -- and having to integrate PCI probing code in my guest domains is a large pile of crap. Exactly. I'm about to start a pretty large project here, using xen or kvm, not sure. One thing for sure, we are NOT going to use anything but PV devices. Full emulation is nice, but it's just plain silly if you don't have to do it. And we don't have to do it. So let's get the PV devices right, not try to shoehorn them into some framework like PCI. What happens to these schemes if I want to try, e.g., 2^16 PV devices? Or some other crazy thing that doesn't play well with PCI -- simple example -- I want a 256 GB region of memory for a device. PCI rules require me to align it on 256GB boundaries and it must be contiguous address space. This is a hardware rule, done for hardware reasons, and has no place in the PV world. What if I want a bit more than the basic set of BARs that PCI gives me? Why would we apply such rules to a PV? Why limit ourselves this early in the game? thanks ron
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On 5/22/07, Dor Laor [EMAIL PROTECTED] wrote: Don't quit so soon on us. OK. I'll go look at Ingo's stuff. Thanks again ron
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
This is quite easy with KVM. I like the approach that vmchannel has taken. A simple PCI device. That gives you a discovery mechanism for shared memory and an interrupt and then you can just implement a ring queue using those mechanisms (along with a PIO port for signaling from the guest to the host). So given that underlying mechanism, the question is how to expose that within the guest kernel/userspace and within the host. Sorry for answering late, but I don't like PCI as a device bus for all platforms. s390 has no PCI and s390 has no PIO. I would prefer a new simple hypercall based virtual bus. I don't know much about windows driver programming, but I guess it is not that hard to add a new bus.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Mon, 21 May 2007 13:28:03 +0200, Arnd Bergmann [EMAIL PROTECTED] wrote: We've had the same discussion about PCI as virtual device abstraction recently when hpa made the suggestion to get a set of PCI device numbers registered for Linux. (If you want to read it up, it's the thread at http://marc.info/?t=11755452543&r=1&w=2) IIRC, the conclusion to which we came was that it is indeed helpful for most architectures to have a PCI device as one way to probe for the functionality, but not to rely on it. s390 is the obvious example where you can't have PCI, but you may also want to build a guest kernel without PCI support because of space constraints in a many-guests machine. What I think would be ideal is to have a new bus type in Linux that does not have any dependency on PCI itself, but can be easily implemented as a child of a PCI device. If we only need the stuff mentioned by Anthony, the interface could look like

struct vmchannel_device {
	struct resource virt_mem;
	struct vm_device_id id;
	int irq;
	int (*signal)(struct vmchannel_device *);
	int (*irq_ack)(struct vmchannel_device *);
	struct device dev;
};

IRQ numbers are evil :) It should be more like a void *vmchannel_device_handle; which could be different things depending on what we want the vmchannel_device to be a child of (it could be an IRQ number for PCI devices, or something like subchannel_id if we wanted to support channel devices). Such a device can easily be provided as a child of a PCI device, or as something that is purely virtual based on an hcall interface. This looks like a flexible approach.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Monday 21 May 2007, Cornelia Huck wrote: IRQ numbers are evil :) yes, but getting rid of them is an entirely different discussion. I really think that in the first step, you should be able to use its external interrupts with the same request_irq interface as the other architectures. Fundamentally, the s390 architecture has external interrupt numbers as well, you're just using a different interface for registering them. The ccw devices obviously have a better interface already, but that doesn't help you here. It should be more like a void *vmchannel_device_handle; which could be different things depending on what we want the vmchannel_device to be a child of (it could be an IRQ number for PCI devices, or something like subchannel_id if we wanted to support channel devices). No, the driver needs to know how to get at the interrupt without caring about the bus implementation, that's why you either need to have a callback function set by the driver (like s390 CCW or USB have it), or a visible interrupt number (like everyone does). There is no need for a pointer back to a vmchannel_device_handle, all information needed by the bus layer can simply be in a subclass derived from the vmchannel_device, e.g.

struct vmchannel_pci {
	struct pci_dev *parent; /* shortcut, same as to_pci_dev(this.vmdev.dev.parent) */
	unsigned long signal_ioport; /* for interrupt generation */
	struct vmchannel_device vmdev;
};

You would allocate this structure in the pci_driver that registers the vmchannel_device. Arnd
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Arnd Bergmann wrote: On Monday 21 May 2007, Christian Borntraeger wrote: This is quite easy with KVM. I like the approach that vmchannel has taken. A simple PCI device. That gives you a discovery mechanism for shared memory and an interrupt and then you can just implement a ring queue using those mechanisms (along with a PIO port for signaling from the guest to the host). So given that underlying mechanism, the question is how to expose that within the guest kernel/userspace and within the host. Sorry for answering late, but I don't like PCI as a device bus for all platforms. s390 has no PCI and s390 has no PIO. Right, I'm not interested in the lowest level implementation (PCI device + PIO). I'm more interested in the higher level interface. The goal is to allow drivers to be able to be written to the higher level interface so that they work on any platform that implements the lower level interface. On x86, that would be PCI/PIO. On s390, that could be hypercall based. I would prefer a new simple hypercall based virtual bus. I don't know much about windows driver programming, but I guess it is not that hard to add a new bus. We've had the same discussion about PCI as virtual device abstraction recently when hpa made the suggestion to get a set of PCI device numbers registered for Linux. IIRC, the conclusion to which we came was that it is indeed helpful for most architectures to have a PCI device as one way to probe for the functionality, but not to rely on it. s390 is the obvious example where you can't have PCI, but you may also want to build a guest kernel without PCI support because of space constraints in a many-guests machine. What I think would be ideal is to have a new bus type in Linux that does not have any dependency on PCI itself, but can be easily implemented as a child of a PCI device.
If we only need the stuff mentioned by Anthony, the interface could look like

struct vmchannel_device {
	struct resource virt_mem;
	struct vm_device_id id;
	int irq;
	int (*signal)(struct vmchannel_device *);
	int (*irq_ack)(struct vmchannel_device *);
	struct device dev;
};

Such a device can easily be provided as a child of a PCI device, or as something that is purely virtual based on an hcall interface. Yes, this is close to what I was thinking. I'm not sure that this particular interface can encompass the variety of memory sharing mechanisms though. When I mentioned shared memory via the PCI device, I was referring to the memory needed for boot strapping the device. You still need a mechanism to transfer memory for things like zero-copy disk IO and network devices. This may involve passing memory addresses directly, copying data, or page flipping. This leads me to think that a higher level interface that provided a data passing interface would be more useful. Something like:

struct vmchannel_device {
	struct vm_device_id id;
	int (*open)(struct vmchannel_device *, const char *name, const char *service);
	int (*release)(struct vmchannel_device *);
	ssize_t (*sendmsg)(struct vmchannel_device *, const void *, size_t);
	ssize_t (*recvmsg)(struct vmchannel_device *, void *, size_t);
	struct device dev;
};

The consuming interface of this would be a socket (PF_VIRTLINK). The sockaddr would contain a name identifying a VM and a service description. This doesn't address the memory issues I raised above but I think it would be easier to special case the drivers where it mattered. For instance, on x86 KVM, a PV disk driver front end would consist of connecting to a virtlink socket, and then transferring struct bio's. QEMU instances would listen on the virtlink socket in the host, and service them directly (QEMU can access all of the guest's memory directly in userspace). A PV graphics device could just be a VNC server that listened on a virtlink socket.
Regards, Anthony Liguori
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
OK, so what are we doing here? We're using a PCI abstraction, as a common abstraction, which is not common really, because we don't have a common abstraction? So we describe all these non-pci resources with a pci abstraction? I don't get it at all. I really think the resource interface idea I mentioned, which is borrowed from Plan 9, makes a whole lot more sense. IBM Austin has already shown it in practice in the papers I referenced. It can work. A memory channel at the bottom, with a resource sharing protocol (9p) above it, and then you describe your resources via names and a simple file-directory model. Note that PCI sort of tries to do this tree model, but it's all binary, and, as noted, it's hardly universal. All of this is trivially exported over a network, so the use of shared memory channels in no way rules out network access. Plan 9 exports devices over the network routinely. If you're using a PCI abstraction, something has gone badly wrong I think. thanks ron
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On 5/21/07, Anthony Liguori [EMAIL PROTECTED] wrote: ron minnich wrote: OK, so what are we doing here? We're using a PCI abstraction, as a common abstraction, which is not common really, because we don't have a common abstraction? So we describe all these non-pci resources with a pci abstraction? No. You're confusing PV device discovery with the actual paravirtual transport. In a fully virtual environment like KVM, a PCI bus is present. You need some way for the guest to detect that a PV device is present. The most natural way to do this IMHO is to have an entry for the PV device in the PCI bus. That will make a lot of existing code happy. I don't think I am confusing it, now that you've explained it more fully. I'm even less happy with it :-) How will I explain this sort of thing to my grandchildren? :-) grandpop, why do those PV devices look like a bus defined in 1994? Why would you not have, e.g., a 9p server for PV device config space as well? I actually implemented that on Xen -- it was quite trivial, and it makes more sense -- to me anyway -- than pretending a PV device is something it's not. What is happening, it seems to me, is that people are still trying to use an abstraction -- PCI device -- which is not really an abstraction, to model aspects of PV device discovery, enumeration, configuration and operation. I'm still pretty uncomfortable with it -- well, honestly, it seems kind of gross to me. It's just as easy to build the right abstraction underneath all this, and then, for those OSes that have existing code that needs to be happy, present that abstraction as a PCI bus. But making the PCI bus the underlying abstraction is getting the order inverted, I believe. I realize that PCI device space is a pretty handy way to do this, that it is very convenient. I wonder what happens when you get a system without enough holes in the config space for you to hide the PV devices in, or that has some other weird property that breaks this model.
I've already worked with one system that had 32 PCI busses. There are other hypervisors that made convenient choices over the right choice, and they are paying for it. Let's try to avoid that on kvm. Kvm has so much going for it right now. thanks ron
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
ron minnich wrote: On 5/21/07, Anthony Liguori [EMAIL PROTECTED] wrote: No. You're confusing PV device discovery with the actual paravirtual transport. In a fully virtual environment like KVM, a PCI bus is present. You need some way for the guest to detect that a PV device is present. The most natural way to do this IMHO is to have an entry for the PV device on the PCI bus. That will make a lot of existing code happy. I don't think I am confusing it, now that you've explained it more fully. I'm even less happy with it :-) Sometimes I think the best way to make you happy is to just stop talking :-) How will I explain this sort of thing to my grandchildren? :-) grandpop, why do those PV devices look like a bus defined in 1994? Why would you not have, e.g., a 9p server for PV device config space as well? I actually implemented that on Xen -- it was quite trivial, and it makes more sense -- to me anyway -- than pretending a PV device is something it's not. What is happening, it seems to me, is that people are still trying to use an abstraction -- PCI device -- which is not really an abstraction, to model aspects of PV device discovery, enumeration, configuration and operation. I'm still pretty uncomfortable with it -- well, honestly, it seems kind of gross to me. It's just as easy to build the right abstraction underneath all this, and then, for those OSes that have existing code that needs to be happy, present that abstraction as a PCI bus. But making the PCI bus the underlying abstraction is getting the order inverted, I believe. Okay. The first problem here is that you're assuming I'm suggesting that this whole thing mandate a PCI bus. I'm not. I'm merely saying that one possible way to implement this is by using a PCI bus to discover the existence of a VIRTLINK socket. Clearly, the s390 guys would have to use something else. For PV Xen where there is no PCI bus, XenBus would be used.
So very concretely, there are three separate classes of problems:
1) How to determine that a VM can use virtlink sockets
2) How to enumerate paravirtual devices
3) The various PV protocols for each device
Whatever Linux implements, it has to allow multiple implementations for #1. For x86 VMs, PCI is just the easiest thing to do here. You could do hypercalls but it gets messy on different hypervisors (vmcall with 0 in eax may do something funky in Xen but be the probing hypercall on KVM). For #2, I'm not really proposing anything concrete. One possibility is to allow virtlink sockets to be addressed with a service and to use that. That doesn't allow for enumeration though so it may not be perfect. I'm not proposing anything at all for #3. That's outside the scope of this discussion in my mind. Now, once you have a virtlink socket, could you use p9 to implement #2 and #3? Sounds like something you could write a paper about :-) But that's a later argument. Right now, I'm just focused on solving the bootstrap issue. Hope this clarifies things a bit. Regards, Anthony Liguori I realize that PCI device space is a pretty handy way to do this, that it is very convenient. I wonder what happens when you get a system without enough holes in the config space for you to hide the PV devices in, or that has some other weird property that breaks this model. I've already worked with one system that had 32 PCI busses. There are other hypervisors that made convenient choices over the right choice, and they are paying for it. Let's try to avoid that on kvm. Kvm has so much going for it right now. thanks ron
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Eric Van Hensbergen wrote: On 5/21/07, Anthony Liguori [EMAIL PROTECTED] wrote: ron minnich wrote: OK, so what are we doing here? We're using a PCI abstraction, as a common abstraction, which is not common really, because we don't have a common abstraction? So we describe all these non-pci resources with a pci abstraction? No. You're confusing PV device discovery with the actual paravirtual transport. In a PV environment why not just pass an initial cookie/hash/whatever as a command-line argument/register/memory-space to the underlying kernel? You can't pass a command line argument to Windows (at least, not easily AFAIK). You could get away with an MSR/CPUID flag but then you're relying on uniqueness which isn't guaranteed. The presence of such a kernel argument would suggest the existence of a hypercall interface or other such mechanism to attach to the initial transport(s). Command-line arguments may be a bit too linux-centric to Ron's taste, but if we are going to choose something arbitrary like PCI, I'd prefer we choose something a bit more straightforward to interact with instead of doing crazy ritual dances to extract what should be straightforward information. I really don't want to have to integrate PCI parsing into my testOS/libOS kernels. You could just hard code a PIC interrupt and rely on some static memory address for IO and avoid the PCI bus entirely. The whole point of the PCI bus is to avoid hardcoding this sort of thing, but if you don't want the complexity associated with PCI, then using the older mechanisms seems like the obvious thing to do. Regards, Anthony Liguori -eric
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
ron minnich wrote: Hi Anthony, I still feel that "how about a socket interface" is still focused on the how to implement, and not what the interface should be. Right. I'm not trying to answer that question ATM. There are a number of paravirt devices that would be useful in a virtual setting. For instance, a PV device for providing the guest with entropy and a shared PV clipboard. These devices should be simple but all current communication mechanisms are far too complicated. I also am not sure the socket system call interface is quite what we want, although it's a neat idea. It's also not that portable outside the everything-is-a-Linux-variant world. A filesystem interface certainly isn't very portable outside the POSIX world :-) Once it is connected, we can move data. This is similar to your socket idea, but consider that:
o to see active vmics, I use 'ls'
o I don't have to create a new sockaddr address type
o I can control access with chmod
o I am separating the interface from the implementation
o This is, of course, not really 'files', but in-memory data structures; this can (and will) be fast
o No binary data structures. For different domains, even on the same machine, alignment rules etc. are not always the same -- I hit this when I ported Plan 9 to Xen, esp. back when Xen relied so heavily on gcc tricks such as __align__ and packed. Using character strings eliminates that problem.
The interface you're proposing is almost functionally identical to a socket. In fact, once you open /data you've got an fd that you interact with in the same way as you would interact with a socket. It's not that there's a unique value for this sort of interface in virtualization; I don't think you're making that argument. Instead, you're making a general argument as to why this way of doing things is better than what Unix has been doing forever (with things like sockets). That's fine, I think you have a valid point, but that's a larger argument to have on LKML or at a conference.
This isn't the place to shoe-horn this sort of thing. A socket interface would provide a simple, well-understood interface that few people in the Linux community would disagree with (it's already there for s390). It should also be easy enough to stream p9 over the socket so you can build these interfaces easily and continue your attempts to expose the world as a virtual filesystem :-) Regards, Anthony Liguori This is, I think, the kind of thing Eric would also like to see, but he can correct me. Thanks ron
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On 5/18/07, Anthony Liguori [EMAIL PROTECTED] wrote: I also am not sure the socket system call interface is quite what we want, although it's a neat idea. It's also not that portable outside the everything-is-a-Linux-variant world. A filesystem interface certainly isn't very portable outside the POSIX world :-) Actually, it's probably the most portable thing you can have. The interface you're proposing is almost functionally identical to a socket. In fact, once you open /data you've got an fd that you interact with in the same way as you would interact with a socket. Well, sure, I stole the interface from Plan 9, and they use this interface to do sockets, among *many* other things -- and there's the point. The interface is not just sockets. But if you're used to sockets, it looks familiar. I only steal from the best :-) Note, btw, that the fd has a path, and can be examined easily, and also passed to other programs for use. That's messy and ugly with sockets. It's not that there's a unique value for this sort of interface in virtualization; I don't think you're making that argument. Instead, you're making a general argument as to why this way of doing things is better than what Unix has been doing forever (with things like sockets) Yes, Unix has been doing it this way forever. The interface I am proposing was the one designed by the Unix guys -- once they realized how deficient the Unix way of doing things had become. But, forgetting all this argument, it still seems to me that the file system interface is far simpler than a socket interface. No binary structures. No new sockaddr structures needed. No alignment/padding rules. You can actually set up a link from a shell script, or perl, or python, or whatever, without a special set of bindings. A socket interface would provide a simple, well-understood interface that few people in the Linux community would disagree with (it's already there for s390). Yes, but ... well understood to the Linux community.
Can we look at a broader scope? We've got a golden opportunity here to build a really flexible VMIC interface. I would hate to lose it. Anyway, thanks for discussing this. ron
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Daniel P. Berrange wrote: As a userspace apps service, I'd very much like to see a common sockets interface for inter-VM communication that is portable across virt systems like Xen and KVM. I'd see it as similar to UNIX domain sockets in style. So basically any app which could do UNIX domain sockets could be ported to inter-VM sockets by just changing PF_UNIX to, say, PF_VIRT. Lots of interesting details around impl security (what VMs are allowed to talk to each other, whether this policy should be controlled by the host, or allow VMs to decide for themselves). z/VM, the premium hypervisor on 390, has had this capability for decades. This is called IUCV (inter user communication vehicle), where user really means virtual machine. It so happens that support for AF_IUCV was recently merged into Linux mainline. It may be worth a look, either for using it or because learning from existing solutions is always a good idea. so long, Carsten
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Carsten Otte wrote: Daniel P. Berrange wrote: As a userspace apps service, I'd very much like to see a common sockets interface for inter-VM communication that is portable across virt systems like Xen and KVM. I'd see it as similar to UNIX domain sockets in style. So basically any app which could do UNIX domain sockets could be ported to inter-VM sockets by just changing PF_UNIX to, say, PF_VIRT. Lots of interesting details around impl security (what VMs are allowed to talk to each other, whether this policy should be controlled by the host, or allow VMs to decide for themselves). z/VM, the premium hypervisor on 390, has had this capability for decades. This is called IUCV (inter user communication vehicle), where user really means virtual machine. It so happens that support for AF_IUCV was recently merged into Linux mainline. It may be worth a look, either for using it or because learning from existing solutions is always a good idea. Is there anything that explains what the fields in the sockaddr mean:
    sa_family_t    siucv_family;
    unsigned short siucv_port;       /* Reserved */
    unsigned int   siucv_addr;       /* Reserved */
    char           siucv_nodeid[8];  /* Reserved */
    char           siucv_user_id[8]; /* Guest User Id */
    char           siucv_name[8];    /* Application Name */
Regards, Anthony Liguori so long, Carsten
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Rusty Russell wrote: On Wed, 2007-05-16 at 14:10 -0500, Anthony Liguori wrote: For the host, you can probably stay entirely within QEMU. Interguest communication would be a bit tricky but guest-host communication is real simple. guest-host is always simple. But it'd be great if it didn't matter to the guest whether it's talking to the host or another guest. I think shared memory is an obvious start, but it's not enough for inter-guest where they can't freely access each other's memory. So you really want a ring-buffer of descriptors with a hypervisor-assist to say read/write this into the memory referred to by that descriptor. I think this is getting a little ahead of ourselves. An example of this idea is pretty straight-forward but it gets more complicated when trying to support the existing memory sharing mechanisms on various hypervisors. There are a few cases to consider: 1) The target VM can access all of the memory of the guest VM with no penalty. This is the case when going from guest=QEMU in KVM or going from guest=kernel (ignoring highmem) in KVM. For this, you can send arbitrary memory to the host. 2) The target VM can access all of the memory of the guest VM with a penalty. For guest=other userspace process in KVM, an mmap() would be required. This would work for Xen provided the target VM was domain-0 but it would incur a xc_map_foreign_range(). 3) The target and source VM can only share memory based on an existing pool. This is the guest with Xen and grant tables. I think an API that covers these three cases is a bit tricky and will likely make undesired trade-offs. I think it's easier to start out focusing on the low-speed case where there's a mandatory data-copy. You can still pass gntref's or PFNs down this transport if you like and perhaps down the road we'll find that we can make a common interface for doing this sort of thing. Regards, Anthony Liguori I think this can be done as a simple variation of the current schemes in existence. 
But I'm shutting up until I have some demonstration code 8) A tricky bit of this is how to do discovery. If you want to support interguest communication, it's not really sufficient to just use strings since the identifiers would have to be unique throughout the entire system. Maybe you just leave it as a guest=host channel and be done with it. Hmm, I was going to leave that unspecified. One thing at a time... Rusty.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Thu, 2007-05-17 at 11:13 -0500, Anthony Liguori wrote: Rusty Russell wrote: I think shared memory is an obvious start, but it's not enough for inter-guest where they can't freely access each other's memory. So you really want a ring-buffer of descriptors with a hypervisor-assist to say read/write this into the memory referred to by that descriptor. I think this is getting a little ahead of ourselves. An example of this idea is pretty straight-forward but it gets more complicated when trying to support the existing memory sharing mechanisms on various hypervisors. There are a few cases to consider: To clarify, I'm not overly interested in existing mechanisms. I'm first trying for something sane from a Linux driver POV, then see if it can be implemented in terms of legacy systems. This reflects my belief that we will see more virtualization solutions in the medium term, so it's reasonable to look at a new system. Cheers, Rusty.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On 5/16/07, Anthony Liguori [EMAIL PROTECTED] wrote: What do you think about a socket interface? I'm not sure how discovery would work yet, but there are a few PV socket implementations for Xen at the moment. Hi Anthony, I still feel that "how about a socket interface" is still focused on the how to implement, and not what the interface should be. I also am not sure the socket system call interface is quite what we want, although it's a neat idea. It's also not that portable outside the everything-is-a-Linux-variant world. So how about this as an interface design. The communications channels are visible in our name space at a mountpoint of our choice. Let's call this mount point, for the sake of argument, vmic. When we mount on vmic, we see one file: /vmic/clone When we open and read /vmic/clone, we get a number; let's pretend for this example we get '0'. The numbers are not important, except to distinguish connections. Opening the clone file gets us a connection endpoint. An ls of the directory now shows this:
/vmic/clone
/vmic/0
The directory, and the files in it, are owned by me, mode 700 or 600 or 400 as the file requires. The mode can be changed, of course, if I wish to allow wider access to the channel. Here, already, we see some advantage to the use of the file system for this type of capability. What is in the directory? Here is one proposal.
/vmic/0/data
/vmic/0/status
/vmic/0/ctl
/vmic/0/local
/vmic/0/remote
What can we do with this? Data is pretty obvious: we can read it or write it, and that data is received/sent from the other endpoint. Note that I'm not saying how the data flows: it can be done in whatever manner is most efficient, by the kernel, including zero copy. It can be different for many reasons, but the point is that the interface is basically unchanging. Of course, it is an error to read or write data until something at the other end connects to the local end! What is status? We cat it and it gets us status in some meaningful text string.
E.g.:
cat /vmic/0/status
connected /domain/name
What is local? It's our local name for the resource in this domain. What is remote? It's the name of the other endpoint. What's a name look like? I'm thinking it might look like /domain/name, but that is just a guess ... What is ctl? Here is where the fun begins. We might do things such as
echo bind somename > /vmic/0/ctl
This names the vmic. We might want to wait for a connection:
echo listen 1 > /vmic/0/ctl
We might want to restrict it somehow
echo key somekey > /vmic/0/ctl
echo listendomain domainnumber > /vmic/0/ctl
or we might know there is something out there.
echo connect /domainname/somename > /vmic/0/ctl
Once it is connected, we can move data. This is similar to your socket idea, but consider that:
o to see active vmics, I use 'ls'
o I don't have to create a new sockaddr address type
o I can control access with chmod
o I am separating the interface from the implementation
o This is, of course, not really 'files', but in-memory data structures; this can (and will) be fast
o No binary data structures. For different domains, even on the same machine, alignment rules etc. are not always the same -- I hit this when I ported Plan 9 to Xen, esp. back when Xen relied so heavily on gcc tricks such as __align__ and packed. Using character strings eliminates that problem.
This is, I think, the kind of thing Eric would also like to see, but he can correct me. Thanks ron
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Wed, May 16, 2007 at 12:28:00PM -0500, Anthony Liguori wrote: Eric Van Hensbergen wrote: On 5/11/07, Anthony Liguori [EMAIL PROTECTED] wrote: There's definitely a conversation to have here. There are going to be a lot of small devices that would benefit from a common transport mechanism. Someone mentioned a PV entropy device on LKML. A host=guest filesystem is another consumer of such an interface. I'm inclined to think though that the abstraction point should be the transport and not the actual protocol. My concern with standardizing on a protocol like 9p would be that one would lose some potential optimizations (like passing PFNs directly between guest and host). I think that there are two layers - having a standard, well defined, simple shared memory transport between partitions (or between emulators and the host system) is certainly a prerequisite. There are lots of different decisions to be made here: What do you think about a socket interface? I'm not sure how discovery would work yet, but there are a few PV socket implementations for Xen at the moment. As a userspace apps service, I'd very much like to see a common sockets interface for inter-VM communication that is portable across virt systems like Xen and KVM. I'd see it as similar to UNIX domain sockets in style. So basically any app which could do UNIX domain sockets could be ported to inter-VM sockets by just changing PF_UNIX to, say, PF_VIRT. Lots of interesting details around impl security (what VMs are allowed to talk to each other, whether this policy should be controlled by the host, or allow VMs to decide for themselves). a) does it communicate with userspace, kernelspace, or both? sockets are usable for both userspace/kernelspace. For userspace, it would be very easy to adapt existing sockets-based apps using IP or UNIX sockets to use inter-VM sockets, which is a big positive. d) can all of these parameters be something controllable from userspace?
e) I'm sure there are many others that I can't be bothered to think of on a Friday The biggest point of contention would probably be what goes in the sockaddr structure. Keeping it very simple would be some arbitrary 'path', similar to UNIX domain sockets in the abstract namespace? Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On 5/16/07, Anthony Liguori [EMAIL PROTECTED] wrote: What do you think about a socket interface? I'm not sure how discovery would work yet, but there are a few PV socket implementations for Xen at the moment. From a functional standpoint I don't have a huge problem with it, particularly if it's more of a pure socket and not something that tries to look like a TCP/IP endpoint -- I would prefer something closer to netlink. Sockets would allow the existing 9p stuff to pretty much work as-is. However, all that being said, I noticed some pretty big differences between sockets and shared memory in terms of overhead under Linux. If you take a look at the RPC latency graph in: http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf You'll see that a local socket implementation has about an order of magnitude worse latency than a PROSE/Libra inter-partition shared memory channel. Furthermore it will really limit our ability to trim the fat of unnecessary copies in order to have competitive performance. But perhaps there's magic you can do to eliminate that. Of course, you could always layer a socket interface for userspace simplicity on top of a more performance-optimized underlying transport that could be used directly by kernel modules. -eric
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Wed, May 16, 2007 at 1:28 PM, in message [EMAIL PROTECTED], Anthony Liguori [EMAIL PROTECTED] wrote: What do you think about a socket interface? I'm not sure how discovery would work yet, but there are a few PV socket implementations for Xen at the moment. FYI: The work I am doing is exactly that. I am going to extend host-based unix domain sockets up to the KVM guest. Not sure how well it will work yet, as I had to lay the LAPIC work down first for IO-completion. -Greg
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Gregory Haskins wrote: On Wed, May 16, 2007 at 1:28 PM, in message [EMAIL PROTECTED], Anthony Liguori [EMAIL PROTECTED] wrote: What do you think about a socket interface? I'm not sure how discovery would work yet, but there are a few PV socket implementations for Xen at the moment. FYI: The work I am doing is exactly that. I am going to extend host-based unix domain sockets up to the KVM guest. Not sure how well it will work yet, as I had to lay the LAPIC work down first for IO-completion. Do you plan on introducing a new address family in the guest? Regards, Anthony Liguori -Greg
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Eric Van Hensbergen wrote: On 5/16/07, Anthony Liguori [EMAIL PROTECTED] wrote: What do you think about a socket interface? I'm not sure how discovery would work yet, but there are a few PV socket implementations for Xen at the moment. From a functional standpoint I don't have a huge problem with it, particularly if it's more of a pure socket and not something that tries to look like a TCP/IP endpoint -- I would prefer something closer to netlink. Sockets would allow the existing 9p stuff to pretty much work as-is. So you would prefer assigning out types instead of using an identifier string in the sockaddr? However, all that being said, I noticed some pretty big differences between sockets and shared memory in terms of overhead under Linux. If you take a look at the RPC latency graph in: http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf You'll see that a local socket implementation has about an order of magnitude worse latency than a PROSE/Libra inter-partition shared memory channel. You seem to suggest that the low latency is due to a very greedy (CPU hungry) polling algorithm. A poll vs. interrupt model would seem to me to be orthogonal to using sockets as an interface. Furthermore it will really limit our ability to trim the fat of unnecessary copies in order to have competitive performance. But perhaps there's magic you can do to eliminate that. Sockets do add copies. My initial thinking is that one can work around this by passing guest PFNs (or grant references in Xen). I'm also happy to start out focusing on low-speed devices. Of course, you could always layer a socket interface for userspace simplicity on top of a more performance-optimized underlying transport that could be used directly by kernel modules. Right. Regards, Anthony Liguori -eric
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Gregory Haskins wrote: On Wed, May 16, 2007 at 2:39 PM, in message [EMAIL PROTECTED], Anthony Liguori [EMAIL PROTECTED] wrote: Gregory Haskins wrote: On Wed, May 16, 2007 at 1:28 PM, in message [EMAIL PROTECTED], Anthony Liguori [EMAIL PROTECTED] wrote: What do you think about a socket interface? I'm not sure how discovery would work yet, but there are a few PV socket implementations for Xen at the moment. FYI: The work I am doing is exactly that. I am going to extend host-based unix domain sockets up to the KVM guest. Not sure how well it will work yet, as I had to lay the LAPIC work down first for IO-completion. Do you plan on introducing a new address family in the guest? Well, since I had to step back and lay some infrastructure groundwork I haven't vetted this approach yet...so it's possible what I am about to say is relatively naive: But my primary application is to create a guest-kernel to host IVMC. This is quite easy with KVM. I like the approach that vmchannel has taken. A simple PCI device. That gives you a discovery mechanism for shared memory and an interrupt and then you can just implement a ring queue using those mechanisms (along with a PIO port for signaling from the guest to the host). So given that underlying mechanism, the question is how to expose that within the guest kernel/userspace and within the host. For the host, you can probably stay entirely within QEMU. Interguest communication would be a bit tricky but guest-host communication is real simple. You could stop at exposing the channel as a socket within the guest kernel/userspace. That would work, but you may also want to expose the ring queue within the kernel at least if there are consumers that need to avoid the copy. A tricky bit of this is how to do discovery. If you want to support interguest communication, it's not really sufficient to just use strings, since the identifiers would have to be unique throughout the entire system. 
Maybe you just leave it as a guest-to-host channel and be done with it. Regards, Anthony Liguori For that you can just think of the guest as any other process on the host, and it will just use the sockets normally as any host-process would. There might be some thunking that has to happen to deal with gpa vs va, etc, but otherwise it's a standard consumer. If you want to extend IVMC up to guest-userspace, I think making some kind of new socket family makes sense in the guest's stack. PF_VIRT like someone else suggested, for instance. But since I don't need this type of IVMC I haven't really thought about this too much. -Greg
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Wed, 2007-05-16 at 14:10 -0500, Anthony Liguori wrote: For the host, you can probably stay entirely within QEMU. Interguest communication would be a bit tricky but guest-host communication is real simple. guest-host is always simple. But it'd be great if it didn't matter to the guest whether it's talking to the host or another guest. I think shared memory is an obvious start, but it's not enough for inter-guest where they can't freely access each other's memory. So you really want a ring-buffer of descriptors with a hypervisor-assist to say read/write this into the memory referred to by that descriptor. I think this can be done as a simple variation of the current schemes in existence. But I'm shutting up until I have some demonstration code 8) A tricky bit of this is how to do discovery. If you want to support interguest communication, it's not really sufficient to just use strings since the identifiers would have to be unique throughout the entire system. Maybe you just leave it as a guest-to-host channel and be done with it. Hmm, I was going to leave that unspecified. One thing at a time... Rusty.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Monday 14 May 2007 14:05, Avi Kivity wrote: But I agree that the growing code base is a problem. With the block driver we can probably keep the host side in userspace, but to do the same for networking is much more work. I do think (now) that it is doable. Interesting. What kind of userspace networking do you have in mind? One of the first tries from Carsten was to use tun/tap, which proved to be slow performance-wise. What I had in mind was some kind of switch in userspace. That would allow non-root guests to define their own private networks. We could use Linux's fast pipe implementation for guest-to-guest communication. The question is how to connect user space networks to the host ones? - tun/tap is quite slow - last time we checked, netfilter offered only IP hooks (if you don't use the bridging code) - raw sockets get tricky if you do in/out at the same time because you have to manually deal with loops This reminds me that we actually have another party doing virtual networking between guests: UML. User Mode Linux actually can do networking/switching in userspace, but I cannot tell how well UML's concept works out. Christian
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Avi Kivity wrote: But I agree that the growing code base is a problem. With the block driver we can probably keep the host side in userspace, but to do the same for networking is much more work. I do think (now) that it is doable. I agree that networking needs to be handled in the host kernel. We go out to userspace for signaling at this time, but that's simply broken. All our userspace does is do a system call next. so long, Carsten
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
ron minnich wrote: Let me ask what may seem to be a naive question to the Linux world. I see you are doing a lot of solid work on adding block and network devices. The code for block and network devices is implemented in different ways. I've also seen this difference of interface/implementation on Xen. Actually, the difference derives from the fact that block and network are indeed different: - block submits requests that ask the host to transfer from/to preallocated guest data buffers via dma (request driven) - net transmits packets that should end up in an skb on the remote side (two way, push driven) - net is sensitive to round-trip times; block is not, due to the device plug for request merging We tried different access methods for both block and network. We have selected the current communication mechanics after doing performance measurements. I believe for a portable solution we need to develop a set of primitives for sending signals (read: interrupts) back and forth, for copying data to guest memory, and for establishing shared memory between guests and between guest+host. These primitives need to be implemented for each platform, and paravirtual drivers should build on top of that. At this point in time, we are aware that these device drivers don't do what we'd want for a portable solution. We'll focus on getting the kernel interfaces to sie/vt/svm proper and portable first. so long, Carsten Based on the previous discussion and the s390 PV drivers I have more gasoline to pour on the flame: We have a working PV driver with 1Gbit performance. The reasons we don't push it into the kernel are: a. We should perform much better b. It would be a painful task getting all the code review that a complicated network interface should get. c. There's already a PV driver that answers a and b. Xen's PV network driver is now pushed into the kernel. It is optimized and supports TSO. By adding generic ops calls we can enjoy all of the above. 
Using Xen's core PV code doesn't imply that we will have their interface (xenstore); the interface creation and tear-down would be kvm-specific. They could even have a plain directory structure.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Dor Laor wrote: push it into the kernel are: a. We should perform much better b. It would be a painful task getting all the code review that a complicated network interface should get. c. There's already a PV driver that answers a and b. Xen's PV network driver is now pushed into the kernel. Actually, it's not (at least not as of a few moments ago). Furthermore, the plan is to completely rearchitect the netback/netfront protocol for the next Xen release (this effort is referred to as netchannel2). See some of the XenSummit slides as to why this is necessary. Regards, Anthony Liguori It is optimized and supports TSO. By adding generic ops calls we can enjoy all of the above. Using Xen's core PV code doesn't imply that we will have their interface (xenstore); the interface creation and tear-down would be kvm-specific. They could even have a plain directory structure.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Dor Laor wrote: push it into the kernel are: a. We should perform much better b. It would be a painful task getting all the code review that a complicated network interface should get. c. There's already a PV driver that answers a and b. Xen's PV network driver is now pushed into the kernel. Actually, it's not (at least not as of a few moments ago). Furthermore, the plan is to completely rearchitect the netback/netfront protocol for the next Xen release (this effort is referred to as netchannel2). But isn't Jeremy Fitzhardinge pushing a big patch queue into the kernel? If we manage to plant hooks into netback/netfront for using net_ops, and the code gets into the kernel, they will have to keep the hooks for netchannel2. See some of the XenSummit slides as to why this is necessary. It looks like generalizing all the level 0,1,2 features plus performance optimizations. It's not something we couldn't upgrade to. Regards, Anthony Liguori It is optimized and supports TSO. By adding generic ops calls we can enjoy all of the above. Using Xen's core PV code doesn't imply that we will have their interface (xenstore); the interface creation and tear-down would be kvm-specific. They could even have a plain directory structure.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Dor Laor wrote: Furthermore, the plan is to completely rearchitect the netback/netfront protocol for the next Xen release (this effort is referred to as netchannel2). But isn't Jeremy Fitzhardinge pushing a big patch queue into the kernel? Yes, but it's not in the kernel yet and there's no guarantee it'll get there in time for KVM's consumption. If we manage to plant hooks into netback/netfront for using net_ops, and the code gets into the kernel, they will have to keep the hooks for netchannel2. See some of the XenSummit slides as to why this is necessary. It looks like generalizing all the level 0,1,2 features plus performance optimizations. It's not something we couldn't upgrade to. I'm curious what Rusty thinks as I do not know nearly enough about the networking subsystem to make an educated statement here. Would it be better to just try and generalize netback/netfront or build something from scratch? Could the lguest driver be generalized more easily? Regards, Anthony Liguori It is optimized and supports TSO. By adding generic ops calls we can enjoy all of the above. Using Xen's core PV code doesn't imply that we will have their interface (xenstore); the interface creation and tear-down would be kvm-specific. They could even have a plain directory structure.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Sun, May 13, 2007 at 11:49:14AM -0500, Anthony Liguori wrote: Dor Laor wrote: Furthermore, the plan is to completely rearchitect the netback/netfront protocol for the next Xen release (this effort is referred to as netchannel2). But isn't Jeremy Fitzhardinge pushing a big patch queue into the kernel? Yes, but it's not in the kernel yet and there's no guarantee it'll get there in time for KVM's consumption. On the other hand, there's strong interest in having unified virtual drivers. Given that the Xen drivers are out there, have been submitted and have been reasonably optimized, there will be some resistance to putting in yet another set of PV drivers. Also, the contentious merge point as I understand it is xenbus needing review, rather than the drivers themselves, which are in pretty good shape. Cheers, Muli
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Sun, May 13, 2007 at 11:49:14AM -0500, Anthony Liguori wrote: Dor Laor wrote: Furthermore, the plan is to completely rearchitect the netback/netfront protocol for the next Xen release (this effort is referred to as netchannel2). But isn't Jeremy Fitzhardinge pushing a big patch queue into the kernel? Yes, but it's not in the kernel yet and there's no guarantee it'll get there in time for KVM's consumption. On the other hand, there's strong interest in having unified virtual drivers. Given that the Xen drivers are out there, have been submitted and have been reasonably optimized, there will be some resistance to putting in yet another set of PV drivers. Also, the contentious merge point as I understand it is xenbus needing review, rather than the drivers themselves, which are in pretty good shape. Moreover, it's not that it is too complex to write a set of back/front ends; it's just that it's already written and optimized down to the bit. Our current implementation has all the regular bells and whistles (rings, delayed notifications, NAPI); it is simpler than Xen's, but it lacks further optimizations and TSO/scatter-gather. If we eventually use NetChannel2 we should enjoy smart NIC features too. It's more tempting and fun to continue to support our implementation, but it's more correct to reuse code. Nevertheless, we'll be happy to hear and discuss what others are thinking. If the current Xen code fails to hit the kernel, then it would be even easier for us - we'll just rip out all the Xen wrapping; the grant tables and the flipping would go away, leaving clean, optimized network code. Regards, Dor.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
On Sun, 2007-05-13 at 11:49 -0500, Anthony Liguori wrote: Dor Laor wrote: Furthermore, the plan is to completely rearchitect the netback/netfront protocol for the next Xen release (this effort is referred to as netchannel2). It looks like generalizing all the level 0,1,2 features plus performance optimizations. It's not something we couldn't upgrade to. I'm curious what Rusty thinks as I do not know nearly enough about the networking subsystem to make an educated statement here. Would it be better to just try and generalize netback/netfront or build something from scratch? Could the lguest driver be generalized more easily? In turn, I'm curious as to Herbert's opinions on this. The lguest netdriver has only two features: it's small, and it does multi-way inter-guest networking as well as guest-host. It's not clear how much the latter wins in real life over a point-to-point comms system. My interest is in a common low-level transport. My experience is that it's easy to create an efficient comms channel between a guest and host (ie. one side can access the others' memory), but it's worthwhile trying for a model which transparently allows untrusted comms (ie. hypervisor-assisted to access the other guest's memory). That's easier if you only want point-to-point (see lguest's io.c for a more general solution). Cheers, Rusty.
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
ron minnich wrote: Let me ask what may seem to be a naive question to the Linux world. I see you are doing a lot of solid work on adding block and network devices. The code for block and network devices is implemented in different ways. I've also seen this difference of interface/implementation on Xen. Actually, the difference derives from the fact that block and network are indeed different: - block submits requests that ask the host to transfer from/to preallocated guest data buffers via dma (request driven) - net transmits packets that should end up in an skb on the remote side (two way, push driven) - net is sensitive to round-trip times; block is not, due to the device plug for request merging We tried different access methods for both block and network. We have selected the current communication mechanics after doing performance measurements. I believe for a portable solution we need to develop a set of primitives for sending signals (read: interrupts) back and forth, for copying data to guest memory, and for establishing shared memory between guests and between guest+host. These primitives need to be implemented for each platform, and paravirtual drivers should build on top of that. At this point in time, we are aware that these device drivers don't do what we'd want for a portable solution. We'll focus on getting the kernel interfaces to sie/vt/svm proper and portable first. so long, Carsten
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Let me ask what may seem to be a naive question to the Linux world. I see you are doing a lot of solid work on adding block and network devices. The code for block and network devices is implemented in different ways. I've also seen this difference of interface/implementation on Xen. Hence my question: Why are the INTERFACES to the block and network devices different? I can understand that the implementation -- what goes on inside the box -- would be different. But, again, why is the interface to the resource different in each case? Will every distinct type of I/O device end up with a different interface? These questions doubtless seem naive, I suppose, except I use a system (Plan 9) in which a common interface is in fact used for the different resources. I have been hoping that we could bring this model -- same interface, different resource -- to the inter-vm communications. I would like to at least raise the idea that it could be used on KVM. Avoiding too much detail, in the plan 9 world, read and write of data to a disk is via file read and write system calls. Same for a network. Same for the mouse, the window system, the serial port, the console, USB, and so on. Please see this note from IBM on what is possible: http://domino.watson.ibm.com/library/CyberDig.nsf/0/c6c779bbf1650fa4852570670054f3ca?OpenDocument or http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf Different resources, same interface. In the hypervisor world, you build one shared memory queue as a basic abstraction. On top of that queue, you run 9P. The provider (network, block device, etc.) provides certain resources to you, the guest domain. The resources have names. 
A network can look like this, to a kvm guest (this command from a Plan 9 system): cpu% ls /net/ether0 /net/ether0/0 /net/ether0/1 /net/ether0/2 /net/ether0/addr /net/ether0/clone /net/ether0/ifstats /net/ether0/stats To get network stats, or do I/O, one simply gains access to the appropriate ring buffer, by finding the name, and does the ring buffer sends and receives via shared memory queues. The I/O operations can be very efficient. Disk looks like this: cpu% ls -l /dev/sdC0 --rw-r----- S 0 bootes bootes 104857600 Jan 22 15:49 /dev/sdC0/9fat --rw-r----- S 0 bootes bootes 65361213440 Jan 22 15:49 /dev/sdC0/arenas --rw-r----- S 0 bootes bootes 0 Jan 22 15:49 /dev/sdC0/ctl --rw-r----- S 0 bootes bootes 82348277760 Jan 22 15:49 /dev/sdC0/data --rw-r----- S 0 bootes bootes 13072242688 Jan 22 15:49 /dev/sdC0/fossil --rw-r----- S 0 bootes bootes 3268060672 Jan 22 15:49 /dev/sdC0/isect --rw-r----- S 0 bootes bootes 512 Jan 22 15:49 /dev/sdC0/nvram --rw-r----- S 0 bootes bootes 82343245824 Jan 22 15:49 /dev/sdC0/plan9 -lrw------- S 0 bootes bootes 0 Jan 22 15:49 /dev/sdC0/raw --rw-r----- S 0 bootes bootes 536870912 Jan 22 15:49 /dev/sdC0/swap cpu% So the disk partitions are files, with the data file being the whole disk. Again, on a hypervisor system, to do I/O, software could create a connection to the file and establish the in-memory ring buffer, for that partition. This I/O can be very efficient; IBM research is working on zero-copy mechanisms for moving data between domains. The result is a single, consistent mechanism for accessing all resources from a guest domain. The resources have names, and it is easy to examine the status -- binary interfaces can be minimized. The resources can be provided by in-kernel servers -- Linux drivers -- or out-of-kernel servers -- processes. Same interface, and yet the implementation of the provider of the resource can be utterly different. We had hoped to get something like this into Xen. 
On Xen, for example, the block device and ethernet device interfaces are as different as one could imagine. Disk I/O does not steal pages from the guest. The network does. Disk I/O is in 4k chunks, period, with a bitmap describing which of the 8 512-byte subunits are being sent. The enet device, on read, returns a page with your packet, but also potentially containing bits of other domain's packets too. The interfaces are as dissimilar as they can be, and I see no reason for such a huge variance between what are basically read/write devices. Another issue is that kvm, in its current form (-24) is beautifully simple. These additions seem to detract from the beauty a bit. Might it be worth taking a little time to consider these ideas in order to preserve the basic elegance of KVM? So, before we go too far down the Xen-like paravirtualized device route, can we discuss the way this ought to look a bit? thanks ron
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
ron minnich wrote: Avoiding too much detail, in the plan 9 world, read and write of data to a disk is via file read and write system calls. For low-speed devices, I think paravirtualization doesn't make a lot of sense unless it's absolutely required. I don't know enough about s390 to know if it supports things like uarts but if so, then emulating a uart would in my mind make a lot more sense than a PV console device. Same for a network. Same for the mouse, the window system, the serial port, the console, USB, and so on. Please see this note from IBM on what is possible: http://domino.watson.ibm.com/library/CyberDig.nsf/0/c6c779bbf1650fa4852570670054f3ca?OpenDocument or http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf Different resources, same interface. In the hypervisor world, you build one shared memory queue as a basic abstraction. On top of that queue, you run 9P. The provider (network, block device, etc.) provides certain resources to you, the guest domain. The resources have names. A network can look like this, to a kvm guest (this command from a Plan 9 system): cpu% ls /net/ether0 /net/ether0/0 /net/ether0/1 /net/ether0/2 /net/ether0/addr /net/ether0/clone /net/ether0/ifstats /net/ether0/stats This smells a bit like XenStore, which I think most will agree was an unmitigated disaster. This sort of thing gets terribly complicated to deal with in the corner cases. Atomic operation of multiple read/write operations is difficult to express. Moreover, quite a lot of things are naturally expressed as a state machine, which is not straightforward to do in this sort of model. This may have been all figured out in 9P but it's certainly not a simple thing to get right. I think a general rule of thumb for a virtualized environment is that the closer you stick to the way hardware tends to do things, the less likely you are to screw yourself up and the easier it will be for other platforms to support your devices. 
Implementing a full 9P client just to get console access in something like mini-os would be unfortunate. At least the posted s390 console driver behaves roughly like a uart so it's pretty obvious that it will be easy to implement in any OS that supports uarts already. Regards, Anthony Liguori To get network stats, or do I/O, one simply gains access to the appropriate ring buffer, by finding the name, and does the ring buffer sends and receives via shared memory queues. The I/O operations can be very efficient. Disk looks like this: cpu% ls -l /dev/sdC0 --rw-r----- S 0 bootes bootes 104857600 Jan 22 15:49 /dev/sdC0/9fat --rw-r----- S 0 bootes bootes 65361213440 Jan 22 15:49 /dev/sdC0/arenas --rw-r----- S 0 bootes bootes 0 Jan 22 15:49 /dev/sdC0/ctl --rw-r----- S 0 bootes bootes 82348277760 Jan 22 15:49 /dev/sdC0/data --rw-r----- S 0 bootes bootes 13072242688 Jan 22 15:49 /dev/sdC0/fossil --rw-r----- S 0 bootes bootes 3268060672 Jan 22 15:49 /dev/sdC0/isect --rw-r----- S 0 bootes bootes 512 Jan 22 15:49 /dev/sdC0/nvram --rw-r----- S 0 bootes bootes 82343245824 Jan 22 15:49 /dev/sdC0/plan9 -lrw------- S 0 bootes bootes 0 Jan 22 15:49 /dev/sdC0/raw --rw-r----- S 0 bootes bootes 536870912 Jan 22 15:49 /dev/sdC0/swap cpu% So the disk partitions are files, with the data file being the whole disk. Again, on a hypervisor system, to do I/O, software could create a connection to the file and establish the in-memory ring buffer, for that partition. This I/O can be very efficient; IBM research is working on zero-copy mechanisms for moving data between domains. The result is a single, consistent mechanism for accessing all resources from a guest domain. The resources have names, and it is easy to examine the status -- binary interfaces can be minimized. The resources can be provided by in-kernel servers -- Linux drivers -- or out-of-kernel servers -- processes. Same interface, and yet the implementation of the provider of the resource can be utterly different. We had hoped to get something like this into Xen. 
On Xen, for example, the block device and ethernet device interfaces are as different as one could imagine. Disk I/O does not steal pages from the guest. The network does. Disk I/O is in 4k chunks, period, with a bitmap describing which of the 8 512-byte subunits are being sent. The enet device, on read, returns a page with your packet, but also potentially containing bits of other domain's packets too. The interfaces are as dissimilar as they can be, and I see no reason for such a huge variance between what are basically read/write devices. Another issue is that kvm, in its current form (-24) is beautifully simple. These additions seem to detract from the beauty a bit. Might it be worth taking a little time to consider these ideas in order to preserve the basic elegance of KVM? So, before we go too far down the Xen-like paravirtualized device route, can we discuss the way this ought to look a bit?
Re: [kvm-devel] [PATCH/RFC 7/9] Virtual network guest device driver
Eric Van Hensbergen wrote: On 5/11/07, Anthony Liguori [EMAIL PROTECTED] wrote: cpu% ls /net/ether0 /net/ether0/0 /net/ether0/1 /net/ether0/2 /net/ether0/addr /net/ether0/clone /net/ether0/ifstats /net/ether0/stats This smells a bit like XenStore, which I think most will agree was an unmitigated disaster. I'd have to disagree with you, Anthony. The Plan 9 interfaces are simple and built into the kernel - they don't have the multi-layered-stack-python-xmlrpc garbage that made up the Xen interfaces. My point isn't that 9p is just like XenStore but rather that turning this idea into something that is useful and elegant is non-trivial. If it were just console access, I would agree with you, but it's really about implementing a single solution for all drivers you are accessing across the interface. A single client versus dozens of different driver variants. There's definitely a conversation to have here. There are going to be a lot of small devices that would benefit from a common transport mechanism. Someone mentioned a PV entropy device on LKML. A host-to-guest filesystem is another consumer of such an interface. I'm inclined to think though that the abstraction point should be the transport and not the actual protocol. My concern with standardizing on a protocol like 9p would be that one would lose some potential optimizations (like passing PFNs directly between guest and host). Our existing 9p client for mini-os is ~3000 LOC and it is a pretty naive port from the p9p code base so it could probably be reduced even further. It is a very small percentage of our existing mini-os kernels and gives us console, disk, network, IP stack, file system, and control interfaces. Of course Linux clients could just use v9fs with a hypervisor-shared-memory transport which I haven't merged yet. We'll also be using the same set of interfaces for the simulator shortly. So is there any reason to even tie 9p to KVM? Why not just have a common PV transport that 9p can use? 
For certain things, it may make sense (like v9fs). Regards, Anthony Liguori Oh yeah, and don't forget the fact that resource access can bridge seamlessly over any network and the protocol has provisions to be secured with authentication/encryption/digesting if desired. Los Alamos will be presenting 9p-based control interfaces for KVM at OLS. -eric