Re: [PATCHv2 1/2] mm: export use_mm/unuse_mm to modules
On Thu, 17 Sep 2009 08:38:18 +0300 "Michael S. Tsirkin" wrote:
> Hi Andrew,
> On Tue, Aug 11, 2009 at 03:10:10PM -0700, Andrew Morton wrote:
> > On Wed, 12 Aug 2009 00:27:52 +0300 "Michael S. Tsirkin" wrote:
> > > vhost net module wants to do copy to/from user from a kernel thread,
> > > which needs use_mm (like what fs/aio has). Move that into mm/ and
> > > export to modules.
> >
> > OK by me. Please include this change in the virtio patchset. Which I
> > shall cheerfully not be looking at :)
>
> The virtio patches are somewhat delayed as we are ironing out the
> kernel/user interface with Rusty. Can the patch moving use_mm to mm/ be
> applied without exporting to modules for now? This will make it easier
> for virtio which will only have to patch in the EXPORT line.

That was 10,000 patches ago.

> I also have a small patch optimizing atomic usage in use_mm (which I did
> for virtio) and it's easier to apply it if the code is in the new place.
>
> If ok, pls let me know and I'll post the patch without the EXPORT line.

Please just send them all out.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCHv2 1/2] mm: export use_mm/unuse_mm to modules
Hi Andrew,
On Tue, Aug 11, 2009 at 03:10:10PM -0700, Andrew Morton wrote:
> On Wed, 12 Aug 2009 00:27:52 +0300 "Michael S. Tsirkin" wrote:
> > vhost net module wants to do copy to/from user from a kernel thread,
> > which needs use_mm (like what fs/aio has). Move that into mm/ and
> > export to modules.
>
> OK by me. Please include this change in the virtio patchset. Which I
> shall cheerfully not be looking at :)

The virtio patches are somewhat delayed as we are ironing out the
kernel/user interface with Rusty. Can the patch moving use_mm to mm/ be
applied without exporting to modules for now? This will make it easier
for virtio which will only have to patch in the EXPORT line.

I also have a small patch optimizing atomic usage in use_mm (which I did
for virtio) and it's easier to apply it if the code is in the new place.

If ok, pls let me know and I'll post the patch without the EXPORT line.

--
MST
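[Archive note: the pattern the patch above moves out of fs/aio.c looks roughly like the sketch below. This is an illustrative kernel-code fragment, not runnable standalone; the `vhost_worker` function and `dev->mm`/`dev->user_buf` fields are hypothetical names, not taken from the actual patch.]

```c
/*
 * Sketch of the fs/aio.c trick vhost wants: a kernel thread temporarily
 * adopts a user process's mm with use_mm() so that copy_from_user()/
 * copy_to_user() resolve against that process's address space.
 */
static void vhost_worker(struct vhost_dev *dev)
{
	struct mm_struct *mm = dev->mm;	/* mm of the process owning the device */
	char buf[64];

	use_mm(mm);		/* borrow the owner's address space */

	/* copy_from_user() is now legal from this kernel thread */
	if (copy_from_user(buf, dev->user_buf, sizeof(buf)))
		/* handle fault */;

	unuse_mm(mm);		/* drop it again */
}
```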
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
Michael S. Tsirkin wrote:
> On Wed, Sep 16, 2009 at 10:10:55AM -0400, Gregory Haskins wrote:
>>> There is no role reversal.
>>
>> So if I have virtio-blk driver running on the x86 and vhost-blk device
>> running on the ppc board, I can use the ppc board as a block-device.
>> What if I really wanted to go the other way?
>
> It seems ppc is the only one that can initiate DMA to an arbitrary
> address, so you can't do this really, or you can by tunneling each
> request back to ppc, or doing an extra data copy, but it's unlikely to
> work well.
>
> The limitation comes from hardware, not from the API we use.

Understood, but presumably it can be exposed as a sub-function of the
ppc board's register file as a DMA-controller service to the x86. This
would fall into the "tunnel requests back" category you mention above,
though I think "tunnel" implies a heavier protocol than it would
actually require. This would look more like a PIO cycle to a DMA
controller than some higher-layer protocol. You would then utilize that
DMA service inside the memctx, and the rest of vbus would work
transparently with the existing devices/drivers.

I do agree it would require some benchmarking to determine its
feasibility, which is why I was careful to say things like "may work"
;). I also do not even know if it's possible to expose the service this
way on his system. If this design is not possible or performs poorly, I
admit vbus is just as hosed as vhost in regard to the "role correction"
benefit.

Kind Regards,
-Greg
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On Wed, Sep 16, 2009 at 10:10:55AM -0400, Gregory Haskins wrote:
> > There is no role reversal.
>
> So if I have virtio-blk driver running on the x86 and vhost-blk device
> running on the ppc board, I can use the ppc board as a block-device.
> What if I really wanted to go the other way?

It seems ppc is the only one that can initiate DMA to an arbitrary
address, so you can't really do this; or you can, by tunneling each
request back to ppc or doing an extra data copy, but it's unlikely to
work well.

The limitation comes from hardware, not from the API we use.
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
Avi Kivity wrote:
> On 09/16/2009 10:22 PM, Gregory Haskins wrote:
>> Avi Kivity wrote:
>>> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>>>>> If kvm can do it, others can.
>>>>
>>>> The problem is that you seem to either hand-wave over details like
>>>> this, or you give details that are pretty much exactly what vbus
>>>> does already. My point is that I've already sat down and thought
>>>> about these issues and solved them in a freely available GPL'ed
>>>> software package.
>>>
>>> In the kernel. IMO that's the wrong place for it.
>>
>> 3) "in-kernel": You can do something like virtio-net to vhost to
>> potentially meet some of the requirements, but not all.
>>
>> In order to fully meet (3), you would need to do some of that stuff
>> you mentioned in the last reply with muxing device-nr/reg-nr. In
>> addition, we need to have a facility for mapping eventfds and
>> establishing a signaling mechanism (like PIO+qid), etc. KVM does this
>> with IRQFD/IOEVENTFD, but we don't have KVM in this case so it needs
>> to be invented.
>
> irqfd/eventfd is the abstraction layer, it doesn't need to be
> reabstracted.

Not per se, but it needs to be interfaced. How do I register that
eventfd with the fastpath in Ira's rig? How do I signal the eventfd
(x86->ppc, and ppc->x86)?

To take it to the next level, how do I organize that mechanism so that
it works for more than one IO-stream (e.g. address the various queues
within ethernet, or a different device like the console)? KVM has
IOEVENTFD and IRQFD managed with MSI and PIO. This new rig does not
have the luxury of an established IO paradigm.

Is vbus the only way to implement a solution? No. But it is _a_ way,
and it's one that was specifically designed to solve this very problem
(as well as others).

(As an aside, note that you generally will want an abstraction on top
of irqfd/eventfd, like shm-signal or virtqueues, to do shared-memory
based event mitigation, but I digress. That is a separate topic.)

>> To meet performance, this stuff has to be in kernel and there has to
>> be a way to manage it.
>
> and management belongs in userspace.

vbus does not dictate where the management must be. It's an extensible
framework, governed by what you plug into it (ala connectors and
devices).

For instance, the vbus-kvm connector in alacrityvm chooses to put
DEVADD and DEVDROP hotswap events into the interrupt stream, because
they are simple and we already needed the interrupt stream anyway for
fast-path.

As another example: venet chose to put ->call(MACQUERY) "config-space"
into its call namespace because it's simple, and we already need
->calls() for fastpath. It therefore exports an attribute to sysfs that
allows the management app to set it. I could likewise have designed the
connector or device-model differently so as to keep the mac-address and
hotswap-events somewhere else (QEMU/PCI userspace), but this seems
silly to me when they are so trivial, so I didn't.

>> Since vbus was designed to do exactly that, this is
>> what I would advocate. You could also reinvent these concepts and put
>> your own mux and mapping code in place, in addition to all the other
>> stuff that vbus does. But I am not clear why anyone would want to.
>
> Maybe they like their backward compatibility and Windows support.

This is really not relevant to this thread, since we are talking about
Ira's hardware. But if you must bring this up, then I will reiterate
that you just design the connector to interface with QEMU+PCI and you
have that too, if that was important to you.

But on that topic: since you could consider KVM a "motherboard
manufacturer" of sorts (it just happens to be virtual hardware), I
don't know why KVM seems to consider itself the only motherboard
manufacturer in the world that has to make everything look legacy. If a
company like ASUS wants to add some cutting-edge IO controller/bus,
they simply do it.

Pretty much every product release may contain a different array of
devices, many of which are not backwards compatible with any prior
silicon. The guy/gal installing Windows on that system may see a "?" in
device-manager until they load a driver that supports the new chip, and
subsequently it works. It is certainly not a requirement to make said
chip somehow work with existing drivers/facilities on bare metal, per
se. Why should virtual systems be different?

So, yeah, the current design of the vbus-kvm connector means I have to
provide a driver. This is understood, and I have no problem with that.

The only thing that I would agree has to be backwards compatible is the
BIOS/boot function. If you can't support running an image like the
Windows installer, you are hosed. If you can't use your ethernet until
you get a chance to install a driver after the install completes, it's
just like most other systems in existence. IOW: it's not a big deal.

For cases where the IO system is needed
Re: [PATCH] virtio_console: Add support for multiple ports for generic guest and host communication
Alan Cox wrote:
>> This device is very much a serial port. I don't see any reason not
>> to treat it like one.
>
> Here are a few
>
> - You don't need POSIX multi-open semantics, hangup and the like

We do actually want hangup and a few other of the tty-specific ops.
The only thing we really don't want is a baud rate.

> - Seek makes sense on some kinds of fixed attributes

I don't think we're dealing with fixed attributes. These are streams.

Fundamentally, this is a paravirtual uart. The improvement over a
standard uart is that there can be a larger number of ports, ports can
have some identification associated with them, and we are not
constrained to the emulated hardware interface, which doesn't exist on
certain platforms (like s390).

> - TTY has a relatively large memory overhead per device
> - Sysfs is what everything else uses
> - Sysfs has some rather complete lifetime management you'll need to
>   redo by hand

sysfs doesn't model streaming data, which is what this driver provides.

> - You don't need idiotic games with numbering spaces
>
> Abusing tty for this is ridiculous.

If the argument is that tty is an awkward interface that should only be
used for legacy purposes, then sure, we should just implement a new
userspace interface for this. In fact, this is probably supported by
the very existence of hvc.

On the other hand, this is fundamentally a paravirtual serial device.
Since serial devices are exposed via the tty subsystem, it seems like a
logical choice.

> In some ways putting much of it in
> kernel is ridiculous too as you can do it with a FUSE fs or simply
> export the info guest-guest using SNMP.

This device cannot be implemented as-is in userspace because it depends
on DMA, which precludes the use of something like uio_pci. We could
modify the device to avoid dma if the feeling was that there was no
interest in putting this in the kernel.

Regards,

Anthony Liguori
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On 09/16/2009 10:22 PM, Gregory Haskins wrote:
> Avi Kivity wrote:
>> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>>>> If kvm can do it, others can.
>>>
>>> The problem is that you seem to either hand-wave over details like
>>> this, or you give details that are pretty much exactly what vbus
>>> does already. My point is that I've already sat down and thought
>>> about these issues and solved them in a freely available GPL'ed
>>> software package.
>>
>> In the kernel. IMO that's the wrong place for it.
>
> 3) "in-kernel": You can do something like virtio-net to vhost to
> potentially meet some of the requirements, but not all.
>
> In order to fully meet (3), you would need to do some of that stuff
> you mentioned in the last reply with muxing device-nr/reg-nr. In
> addition, we need to have a facility for mapping eventfds and
> establishing a signaling mechanism (like PIO+qid), etc. KVM does this
> with IRQFD/IOEVENTFD, but we don't have KVM in this case so it needs
> to be invented.

irqfd/eventfd is the abstraction layer, it doesn't need to be
reabstracted.

> To meet performance, this stuff has to be in kernel and there has to
> be a way to manage it.

and management belongs in userspace.

> Since vbus was designed to do exactly that, this is
> what I would advocate. You could also reinvent these concepts and put
> your own mux and mapping code in place, in addition to all the other
> stuff that vbus does. But I am not clear why anyone would want to.

Maybe they like their backward compatibility and Windows support.

> So no, the kernel is not the wrong place for it. It's the _only_
> place for it. Otherwise, just use (1) and be done with it.

I'm talking about the config stuff, not the data path.

>> Further, if we adopt vbus, we drop compatibility with existing
>> guests or have to support both vbus and virtio-pci.
>
> We already need to support both (at least to support Ira). virtio-pci
> doesn't work here. Something else (vbus, or vbus-like) is needed.

virtio-ira.

>>> So the question is: is your position that vbus is all wrong and you
>>> wish to create a new bus-like thing to solve the problem?
>>
>> I don't intend to create anything new, I am satisfied with virtio.
>> If it works for Ira, excellent. If not, too bad.
>
> I think that about sums it up, then.

Yes. I'm all for reusing virtio, but I'm not going to switch to vbus or
support both for this esoteric use case.

>>> If so, how is it different from what I've already done? More
>>> importantly, what specific objections do you have to what I've
>>> done, as perhaps they can be fixed instead of starting over?
>>
>> The two biggest objections are:
>> - the host side is in the kernel
>
> As it needs to be.

vhost-net somehow manages to work without the config stuff in the
kernel.

> With all due respect, based on all of your comments in aggregate I
> really do not think you are truly grasping what I am actually
> building here.

Thanks.

>>> Bingo. So now it's a question of: do you want to write this layer
>>> from scratch, or re-use my framework?
>>
>> You will have to implement a connector or whatever for vbus as well.
>> vbus has more layers so it's probably smaller for vbus.
>
> Bingo! (addictive, isn't it)
> That is precisely the point.
>
> All the stuff for how to map eventfds, handle signal mitigation,
> demux device/function pointers, isolation, etc, are built in. All the
> connector has to do is transport the 4-6 verbs and provide a memory
> mapping/copy function, and the rest is reusable. The device models
> would then work in all environments unmodified, and likewise the
> connectors could use all device-models unmodified.

Well, virtio has a similar abstraction on the guest side. The host side
abstraction is limited to signalling since all configuration is in
userspace. vhost-net ought to work for lguest and s390 without change.

>> It was already implemented three times for virtio, so apparently
>> that's extensible too.
>
> And to my point, I'm trying to commoditize as much of that process as
> possible on both the front and backends (at least for cases where
> performance matters) so that you don't need to reinvent the wheel for
> each one.

Since you're interested in any-to-any connectors it makes sense to you.
I'm only interested in kvm-host-to-kvm-guest, so reducing the already
minor effort to implement a new virtio binding has little appeal to me.

>> You mean, if the x86 board was able to access the disks and dma into
>> the ppc board's memory? You'd run vhost-blk on x86 and virtio-net on
>> ppc.
>
> But as we discussed, vhost doesn't work well if you try to run it on
> the x86 side due to its assumptions about pagable "guest" memory,
> right? So is that even an option? And even still, you would still
> need to solve t
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
Avi Kivity wrote:
> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>>> If kvm can do it, others can.
>>
>> The problem is that you seem to either hand-wave over details like
>> this, or you give details that are pretty much exactly what vbus
>> does already. My point is that I've already sat down and thought
>> about these issues and solved them in a freely available GPL'ed
>> software package.
>
> In the kernel. IMO that's the wrong place for it.

In conversations with Ira, he indicated he needs kernel-to-kernel
ethernet for performance, and needs at least ethernet and console
connectivity. You could conceivably build a solution for this system 3
basic ways:

1) "completely" in userspace: use things like tuntap on the ppc boards,
and tunnel packets across a custom point-to-point connection formed
over the pci link to a userspace app on the x86 board. This app then
reinjects the packets into the x86 kernel as a raw socket or tuntap,
etc. Pretty much vanilla tuntap/vpn kind of stuff. Advantage: very
little kernel code. Problem: performance (citation: hopefully obvious).

2) "partially" in userspace: have an in-kernel virtio-net driver talk
to a userspace-based virtio-net backend. This is the (current,
non-vhost oriented) KVM/qemu model. Advantage: re-uses existing kernel
code. Problem: performance (citation: see alacrityvm numbers).

3) "in-kernel": You can do something like virtio-net to vhost to
potentially meet some of the requirements, but not all.

In order to fully meet (3), you would need to do some of that stuff you
mentioned in the last reply with muxing device-nr/reg-nr. In addition,
we need to have a facility for mapping eventfds and establishing a
signaling mechanism (like PIO+qid), etc. KVM does this with
IRQFD/IOEVENTFD, but we don't have KVM in this case so it needs to be
invented.

To meet performance, this stuff has to be in kernel and there has to be
a way to manage it. Since vbus was designed to do exactly that, this is
what I would advocate. You could also reinvent these concepts and put
your own mux and mapping code in place, in addition to all the other
stuff that vbus does. But I am not clear why anyone would want to.

So no, the kernel is not the wrong place for it. It's the _only_ place
for it. Otherwise, just use (1) and be done with it.

> Further, if we adopt vbus, we drop compatibility with existing guests
> or have to support both vbus and virtio-pci.

We already need to support both (at least to support Ira). virtio-pci
doesn't work here. Something else (vbus, or vbus-like) is needed.

> So the question is: is your position that vbus is all wrong and you
> wish to create a new bus-like thing to solve the problem?
>
> I don't intend to create anything new, I am satisfied with virtio. If
> it works for Ira, excellent. If not, too bad.

I think that about sums it up, then.

> I believe it will work without too much trouble.

Afaict it won't, for the reasons I mentioned.

>> If so, how is it different from what I've already done? More
>> importantly, what specific objections do you have to what I've done,
>> as perhaps they can be fixed instead of starting over?
>
> The two biggest objections are:
> - the host side is in the kernel

As it needs to be.

> - the guest side is a new bus instead of reusing pci (on x86/kvm),
>   making Windows support more difficult

That's a function of the vbus-connector, which is different from
vbus-core. If you don't like it (and I know you don't), we can write
one that interfaces to qemu's pci system. I just don't like the
limitations that imposes, nor do I think we need that complexity of
dealing with a split PCI model, so I chose to not implement vbus-kvm
this way.

With all due respect, based on all of your comments in aggregate I
really do not think you are truly grasping what I am actually building
here.

> I guess these two are exactly what you think are vbus' greatest
> advantages, so we'll probably have to extend our agree-to-disagree on
> this one.
>
> I also had issues with using just one interrupt vector to service all
> events, but that's easily fixed.

Again, a function of the connector.

>>> There is no guest and host in this scenario. There's a device side
>>> (ppc) and a driver side (x86). The driver side can access
>>> configuration information on the device side. How to multiplex
>>> multiple devices is an interesting exercise for whoever writes the
>>> virtio binding for that setup.
>>
>> Bingo. So now it's a question of: do you want to write this layer
>> from scratch, or re-use my framework?
>
> You will have to implement a connector or whatever for vbus as well.
> vbus has more layers so it's probably smaller for vbus.

Bingo! That is precisely the point.

All the stuff for how to map eventfds, handle signal mitigation, demux
device/function pointers, isolation, etc, are built in. All the
connector has to do is transport the 4-6 verbs and provide
Re: vhost-net todo list
On 09/16/2009 06:27 PM, Arnd Bergmann wrote:
> That scenario is probably not so relevant for KVM, unless you
> consider the guest taking over the qemu host process a valid
> security threat.

It is. We address it by using SCM_RIGHTS for all sensitive operations
and selinuxing qemu as tightly as possible.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: vhost-net todo list
On Wed, Sep 16, 2009 at 05:27:25PM +0200, Arnd Bergmann wrote:
> On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > > No, I think this is less important, because the bridge code
> > > also doesn't do this.
> >
> > True, but the reason might be that it is much harder in bridge (you
> > have to snoop multicast registrations). With macvlan you know which
> > multicasts each device wants.
>
> Right. It shouldn't be hard to do, and I'll probably get to
> that after the other changes.
>
> > > One of the problems that raw packet sockets have is the requirement
> > > for root permissions (e.g. through libvirt). Tap sockets and
> > > macvtap both don't have this limitation, so you can use them as
> > > a regular user without libvirt.
> >
> > I don't see a huge difference here.
> > If you are happy with the user being able to bypass filters in host,
> > just give her CAP_NET_RAW capability. It does not have to be root.
>
> Capabilities are nice in theory, but I've never seen them being used
> effectively in practice, where it essentially comes down to some
> SUID wrapper.

Heh, for tap people seem to just give out write access to it and that's
all. Not really different.

> Also, I might not want to allow the user to open a
> random raw socket, but only one on a specific downstream
> port of a macvlan interface, so I can filter out the data from
> that respective MAC address in an external switch.

I agree. Maybe we can fix that for raw sockets, want me to add it to
the list? :)

> That scenario is probably not so relevant for KVM, unless you
> consider the guest taking over the qemu host process a valid
> security threat.

Defence in depth is a good thing, anyway.

> Arnd <><
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On Wed, Sep 16, 2009 at 05:22:37PM +0200, Arnd Bergmann wrote:
> On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > On Wed, Sep 16, 2009 at 04:57:42PM +0200, Arnd Bergmann wrote:
> > > On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> > > > Userspace in x86 maps a PCI region, uses it for communication
> > > > with ppc?
> > >
> > > This might have portability issues. On x86 it should work, but if
> > > the host is powerpc or similar, you cannot reliably access PCI I/O
> > > memory through copy_tofrom_user but have to use memcpy_toio/fromio
> > > or readl/writel calls, which don't work on user pointers.
> > >
> > > Specifically on powerpc, copy_from_user cannot access unaligned
> > > buffers if they are on an I/O mapping.
> >
> > We are talking about doing this in userspace, not in kernel.
>
> Ok, that's fine then. I thought the idea was to use the vhost_net
> driver

It's a separate issue. We were talking generally about configuration
and setup. Gregory implemented it in kernel, Avi wants it moved to
userspace, with only the fastpath in kernel.

> to access the user memory, which would be a really cute hack
> otherwise, as you'd only need to provide the eventfds from a hardware
> specific driver and could use the regular virtio_net on the other
> side.

To do that, maybe copy to user on ppc can be fixed, or wrapped in an
arch-specific macro, so that everyone else does not have to go through
abstraction layers.

--
MST
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>> If kvm can do it, others can.
>
> The problem is that you seem to either hand-wave over details like
> this, or you give details that are pretty much exactly what vbus does
> already. My point is that I've already sat down and thought about
> these issues and solved them in a freely available GPL'ed software
> package.

In the kernel. IMO that's the wrong place for it. Further, if we adopt
vbus, we drop compatibility with existing guests or have to support
both vbus and virtio-pci.

> So the question is: is your position that vbus is all wrong and you
> wish to create a new bus-like thing to solve the problem?

I don't intend to create anything new, I am satisfied with virtio. If
it works for Ira, excellent. If not, too bad. I believe it will work
without too much trouble.

> If so, how is it different from what I've already done? More
> importantly, what specific objections do you have to what I've done,
> as perhaps they can be fixed instead of starting over?

The two biggest objections are:
- the host side is in the kernel
- the guest side is a new bus instead of reusing pci (on x86/kvm),
  making Windows support more difficult

I guess these two are exactly what you think are vbus' greatest
advantages, so we'll probably have to extend our agree-to-disagree on
this one.

I also had issues with using just one interrupt vector to service all
events, but that's easily fixed.

>> There is no guest and host in this scenario. There's a device side
>> (ppc) and a driver side (x86). The driver side can access
>> configuration information on the device side. How to multiplex
>> multiple devices is an interesting exercise for whoever writes the
>> virtio binding for that setup.
>
> Bingo. So now it's a question of: do you want to write this layer
> from scratch, or re-use my framework?

You will have to implement a connector or whatever for vbus as well.
vbus has more layers so it's probably smaller for vbus.

>>> I am talking about how we would tunnel the config space for N
>>> devices across his transport.
>>
>> Sounds trivial.
>
> No one said it was rocket science. But it does need to be designed and
> implemented end-to-end, much of which I've already done in what I hope
> is an extensible way.

It was already implemented three times for virtio, so apparently that's
extensible too.

>> Write an address containing the device number and register number to
>> one location, read or write data from another.
>
> You mean like the "u64 devh", and "u32 func" fields I have here for
> the vbus-kvm connector?
>
> http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=include/linux/vbus_pci.h;h=fe337590e644017392e4c9d9236150adb2333729;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025#l64

Probably.

>>> That sounds convenient given his hardware, but it has its own set of
>>> problems. For one, the configuration/inventory of these boards is
>>> now driven by the wrong side and has to be addressed.
>>
>> Why is it the wrong side?
>
> "Wrong" is probably too harsh a word when looking at ethernet. It's
> certainly "odd", and possibly inconvenient. It would be like having
> vhost in a KVM guest, and virtio-net running on the host. You could do
> it, but it's weird and awkward. Where it really falls apart and enters
> the "wrong" category is for non-symmetric devices, like disk-io.

It's not odd or wrong or weird or awkward. An ethernet NIC is not
symmetric: one side does DMA and issues interrupts, the other uses its
own memory. That's exactly the case with Ira's setup. If the ppc boards
were to emulate a disk controller, you'd run virtio-blk on x86 and
vhost-blk on the ppc boards.

>>> Second, the role reversal will likely not work for many models other
>>> than ethernet (e.g. virtio-console or virtio-blk drivers running on
>>> the x86 board would be naturally consuming services from the slave
>>> boards... virtio-net is an exception because 802.x is generally
>>> symmetrical).
>>
>> There is no role reversal.
>
> So if I have virtio-blk driver running on the x86 and vhost-blk device
> running on the ppc board, I can use the ppc board as a block-device.
> What if I really wanted to go the other way?

You mean, if the x86 board was able to access the disks and dma into
the ppc board's memory? You'd run vhost-blk on x86 and virtio-net on
ppc. As long as you don't use the words "guest" and "host" but keep to
"driver" and "device", it all works out.

>> The side doing dma is the device, the side accessing its own memory
>> is the driver. Just like that other 1e12 driver/device pairs out
>> there.
>
> IIUC, his ppc boards really can be seen as "guests" (they are linux
> instances that are utilizing services from the x86, not the other way
> around).

They aren't guests. Guests d
Re: vhost-net todo list
On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > No, I think this is less important, because the bridge code
> > also doesn't do this.
>
> True, but the reason might be that it is much harder in bridge (you
> have to snoop multicast registrations). With macvlan you know which
> multicasts each device wants.

Right. It shouldn't be hard to do, and I'll probably get to
that after the other changes.

> > One of the problems that raw packet sockets have is the requirement
> > for root permissions (e.g. through libvirt). Tap sockets and
> > macvtap both don't have this limitation, so you can use them as
> > a regular user without libvirt.
>
> I don't see a huge difference here.
> If you are happy with the user being able to bypass filters in host,
> just give her CAP_NET_RAW capability. It does not have to be root.

Capabilities are nice in theory, but I've never seen them being used
effectively in practice, where it essentially comes down to some
SUID wrapper. Also, I might not want to allow the user to open a
random raw socket, but only one on a specific downstream port of a
macvlan interface, so I can filter out the data from that respective
MAC address in an external switch.

That scenario is probably not so relevant for KVM, unless you
consider the guest taking over the qemu host process a valid
security threat.

	Arnd <><
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> On Wed, Sep 16, 2009 at 04:57:42PM +0200, Arnd Bergmann wrote:
> > On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> > > Userspace in x86 maps a PCI region, uses it for communication with ppc?
> >
> > This might have portability issues. On x86 it should work, but if the
> > host is powerpc or similar, you cannot reliably access PCI I/O memory
> > through copy_tofrom_user but have to use memcpy_toio/fromio or
> > readl/writel calls, which don't work on user pointers.
> >
> > Specifically on powerpc, copy_from_user cannot access unaligned buffers
> > if they are on an I/O mapping.
>
> We are talking about doing this in userspace, not in kernel.

Ok, that's fine then. I thought the idea was to use the vhost_net driver
to access the user memory, which would be a really cute hack otherwise,
as you'd only need to provide the eventfds from a hardware specific
driver and could use the regular virtio_net on the other side.

	Arnd <><
Re: vhost-net todo list
On Wed, Sep 16, 2009 at 05:08:46PM +0200, Arnd Bergmann wrote:
> On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > On Wed, Sep 16, 2009 at 04:52:40PM +0200, Arnd Bergmann wrote:
> > > On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > > > vhost-net driver projects
> > >
> > > I still think that list should include
> >
> > Yea, why not. Go wild.
> >
> > > - UDP multicast socket support
> > > - TCP socket support
> >
> > Switch to UDP unicast while we are at it?
> > tunneling raw packets over TCP looks wrong.
>
> Well, TCP is what qemu supports right now, that's why
> I added it to the list. We could add UDP unicast as
> yet another protocol in both qemu and vhost_net if there
> is demand for it. The implementation should be trivial
> based on the existing code paths.
>
> > > One thing I'm planning to work on is bridge support in macvlan,
> > > together with VEPA compliant operation, i.e. not sending back
> > > multicast frames to the origin.
> >
> > is multicast filtering already there (i.e. only getting
> > frames for groups you want)?
>
> No, I think this is less important, because the bridge code
> also doesn't do this.

True, but the reason might be that it is much harder in bridge (you have
to snoop multicast registrations). With macvlan you know which
multicasts each device wants.

> > > I'll also keep looking into macvtap, though that will be less
> > > important once you get the tap socket support running.
> >
> > Not sure I see the connection. to get an equivalent to macvtap,
> > what you need is tso etc support in packet sockets. No?
>
> I'm not worried about tso support here.
>
> One of the problems that raw packet sockets have is the requirement
> for root permissions (e.g. through libvirt). Tap sockets and
> macvtap both don't have this limitation, so you can use them as
> a regular user without libvirt.

I don't see a huge difference here.
If you are happy with the user being able to bypass filters in host,
just give her CAP_NET_RAW capability. It does not have to be root.

> 	Arnd <><
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On Wed, Sep 16, 2009 at 04:57:42PM +0200, Arnd Bergmann wrote:
> On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> > Userspace in x86 maps a PCI region, uses it for communication with ppc?
>
> This might have portability issues. On x86 it should work, but if the
> host is powerpc or similar, you cannot reliably access PCI I/O memory
> through copy_tofrom_user but have to use memcpy_toio/fromio or
> readl/writel calls, which don't work on user pointers.
>
> Specifically on powerpc, copy_from_user cannot access unaligned buffers
> if they are on an I/O mapping.
>
> 	Arnd <><

We are talking about doing this in userspace, not in kernel.

-- 
MST
Re: vhost-net todo list
On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> On Wed, Sep 16, 2009 at 04:52:40PM +0200, Arnd Bergmann wrote:
> > On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > > vhost-net driver projects
> >
> > I still think that list should include
>
> Yea, why not. Go wild.
>
> > - UDP multicast socket support
> > - TCP socket support
>
> Switch to UDP unicast while we are at it?
> tunneling raw packets over TCP looks wrong.

Well, TCP is what qemu supports right now, that's why
I added it to the list. We could add UDP unicast as
yet another protocol in both qemu and vhost_net if there
is demand for it. The implementation should be trivial
based on the existing code paths.

> > One thing I'm planning to work on is bridge support in macvlan,
> > together with VEPA compliant operation, i.e. not sending back
> > multicast frames to the origin.
>
> is multicast filtering already there (i.e. only getting
> frames for groups you want)?

No, I think this is less important, because the bridge code
also doesn't do this.

> > I'll also keep looking into macvtap, though that will be less
> > important once you get the tap socket support running.
>
> Not sure I see the connection. to get an equivalent to macvtap,
> what you need is tso etc support in packet sockets. No?

I'm not worried about tso support here.

One of the problems that raw packet sockets have is the requirement
for root permissions (e.g. through libvirt). Tap sockets and
macvtap both don't have this limitation, so you can use them as
a regular user without libvirt.

	Arnd <><
Re: vhost-net todo list
On Wed, Sep 16, 2009 at 04:52:40PM +0200, Arnd Bergmann wrote:
> On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > vhost-net driver projects
>
> I still think that list should include

Why not. But note that including things in a list will not
magically make them done :)

> - UDP multicast socket support
> - TCP socket support

Switch to UDP unicast while we are at it?
tunneling raw packets over TCP looks wrong.

> - raw packet socket support for qemu (from Or Gerlitz)
> if we have those, plus the tap support that is already on
> your list, we can use vhost-net as a generic offload
> for the host networking in qemu.
>
> > projects involving networking stack
> > - export socket from tap so vhost can use it - working on it now
> > - extend raw sockets to support GSO/checksum offloading,
> >   and teach vhost to use that capability
> >   [one way to do this: virtio net header support]
> >   will allow working with e.g. macvlan
>
> One thing I'm planning to work on is bridge support in macvlan,
> together with VEPA compliant operation, i.e. not sending back
> multicast frames to the origin.

is multicast filtering already there (i.e. only getting
frames for groups you want)?

> I'll also keep looking into macvtap, though that will be less
> important once you get the tap socket support running.

Not sure I see the connection. to get an equivalent to macvtap,
what you need is tso etc support in packet sockets. No?

> 	Arnd <><
Re: vhost-net todo list
On Wed, Sep 16, 2009 at 04:52:40PM +0200, Arnd Bergmann wrote:
> On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > vhost-net driver projects
>
> I still think that list should include

Yea, why not. Go wild.

> - UDP multicast socket support
> - TCP socket support

Switch to UDP unicast while we are at it?
tunneling raw packets over TCP looks wrong.

> - raw packet socket support for qemu (from Or Gerlitz)
> if we have those, plus the tap support that is already on
> your list, we can use vhost-net as a generic offload
> for the host networking in qemu.
>
> > projects involving networking stack
> > - export socket from tap so vhost can use it - working on it now
> > - extend raw sockets to support GSO/checksum offloading,
> >   and teach vhost to use that capability
> >   [one way to do this: virtio net header support]
> >   will allow working with e.g. macvlan
>
> One thing I'm planning to work on is bridge support in macvlan,
> together with VEPA compliant operation, i.e. not sending back
> multicast frames to the origin.

is multicast filtering already there (i.e. only getting
frames for groups you want)?

> I'll also keep looking into macvtap, though that will be less
> important once you get the tap socket support running.

Not sure I see the connection. to get an equivalent to macvtap,
what you need is tso etc support in packet sockets. No?

> 	Arnd <><
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> Userspace in x86 maps a PCI region, uses it for communication with ppc?

This might have portability issues. On x86 it should work, but if the
host is powerpc or similar, you cannot reliably access PCI I/O memory
through copy_tofrom_user but have to use memcpy_toio/fromio or
readl/writel calls, which don't work on user pointers.

Specifically on powerpc, copy_from_user cannot access unaligned buffers
if they are on an I/O mapping.

	Arnd <><
Re: vhost-net todo list
On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> vhost-net driver projects

I still think that list should include

- UDP multicast socket support
- TCP socket support
- raw packet socket support for qemu (from Or Gerlitz)

if we have those, plus the tap support that is already on
your list, we can use vhost-net as a generic offload
for the host networking in qemu.

> projects involving networking stack
> - export socket from tap so vhost can use it - working on it now
> - extend raw sockets to support GSO/checksum offloading,
>   and teach vhost to use that capability
>   [one way to do this: virtio net header support]
>   will allow working with e.g. macvlan

One thing I'm planning to work on is bridge support in macvlan,
together with VEPA compliant operation, i.e. not sending back
multicast frames to the origin.

I'll also keep looking into macvtap, though that will be less
important once you get the tap socket support running.

	Arnd <><
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
Avi Kivity wrote:
> On 09/16/2009 02:44 PM, Gregory Haskins wrote:
>> The problem isn't where to find the models...the problem is how to
>> aggregate multiple models to the guest.
>
> You mean configuration?
>
>>> You instantiate multiple vhost-nets. Multiple ethernet NICs is a
>>> supported configuration for kvm.
>>
>> But this is not KVM.
>
> If kvm can do it, others can.

The problem is that you seem to either hand-wave over details like this,
or you give details that are pretty much exactly what vbus does already.
My point is that I've already sat down and thought about these issues and
solved them in a freely available GPL'ed software package.

So the question is: is your position that vbus is all wrong and you wish
to create a new bus-like thing to solve the problem? If so, how is it
different from what I've already done? More importantly, what specific
objections do you have to what I've done, as perhaps they can be fixed
instead of starting over?

>>>> His slave boards surface themselves as PCI devices to the x86
>>>> host. So how do you use that to make multiple vhost-based devices (say
>>>> two virtio-nets, and a virtio-console) communicate across the transport?
>>>
>>> I don't really see the difference between 1 and N here.
>>
>> A KVM surfaces N virtio-devices as N pci-devices to the guest. What do
>> we do in Ira's case where the entire guest represents itself as a PCI
>> device to the host, and nothing the other way around?
>
> There is no guest and host in this scenario. There's a device side
> (ppc) and a driver side (x86). The driver side can access configuration
> information on the device side. How to multiplex multiple devices is an
> interesting exercise for whoever writes the virtio binding for that setup.

Bingo. So now it's a question of do you want to write this layer from
scratch, or re-use my framework.

>>>> There are multiple ways to do this, but what I am saying is that
>>>> whatever is conceived will start to look eerily like a vbus-connector,
>>>> since this is one of its primary purposes ;)
>>>
>>> I'm not sure if you're talking about the configuration interface or data
>>> path here.
>>
>> I am talking about how we would tunnel the config space for N devices
>> across his transport.
>
> Sounds trivial.

No one said it was rocket science. But it does need to be designed and
implemented end-to-end, much of which I've already done in what I hope is
an extensible way.

> Write an address containing the device number and
> register number to one location, read or write data from another.

You mean like the "u64 devh", and "u32 func" fields I have here for the
vbus-kvm connector?

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=include/linux/vbus_pci.h;h=fe337590e644017392e4c9d9236150adb2333729;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025#l64

> Just
> like the PCI cf8/cfc interface.

>>> They aren't in the "guest". The best way to look at it is
>>>
>>> - a device side, with a dma engine: vhost-net
>>> - a driver side, only accessing its own memory: virtio-net
>>>
>>> Given that Ira's config has the dma engine in the ppc boards, that's
>>> where vhost-net would live (the ppc boards acting as NICs to the x86
>>> board, essentially).
>>
>> That sounds convenient given his hardware, but it has its own set of
>> problems. For one, the configuration/inventory of these boards is now
>> driven by the wrong side and has to be addressed.
>
> Why is it the wrong side?

"Wrong" is probably too harsh a word when looking at ethernet. It's
certainly "odd", and possibly inconvenient. It would be like having
vhost in a KVM guest, and virtio-net running on the host. You could do
it, but it's weird and awkward. Where it really falls apart and enters
the "wrong" category is for non-symmetric devices, like disk-io.

>> Second, the role
>> reversal will likely not work for many models other than ethernet (e.g.
>> virtio-console or virtio-blk drivers running on the x86 board would be
>> naturally consuming services from the slave boards...virtio-net is an
>> exception because 802.x is generally symmetrical).
>
> There is no role reversal.

So if I have virtio-blk driver running on the x86 and vhost-blk device
running on the ppc board, I can use the ppc board as a block-device.
What if I really wanted to go the other way?

> The side doing dma is the device, the side
> accessing its own memory is the driver. Just like that other 1e12
> driver/device pairs out there.

IIUC, his ppc boards really can be seen as "guests" (they are linux
instances that are utilizing services from the x86, not the other way
around). vhost forces the model to have the ppc boards act as IO-hosts,
whereas vbus would likely work in either direction due to its more
refined abstraction layer.

>>> I have no idea, that's for Ira to solve.
>>
>> Bingo. Thus
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On 09/16/2009 02:44 PM, Gregory Haskins wrote:
> The problem isn't where to find the models...the problem is how to
> aggregate multiple models to the guest.

You mean configuration?

>> You instantiate multiple vhost-nets. Multiple ethernet NICs is a
>> supported configuration for kvm.
>
> But this is not KVM.

If kvm can do it, others can.

>>> His slave boards surface themselves as PCI devices to the x86
>>> host. So how do you use that to make multiple vhost-based devices (say
>>> two virtio-nets, and a virtio-console) communicate across the transport?
>>
>> I don't really see the difference between 1 and N here.
>
> A KVM surfaces N virtio-devices as N pci-devices to the guest. What do
> we do in Ira's case where the entire guest represents itself as a PCI
> device to the host, and nothing the other way around?

There is no guest and host in this scenario. There's a device side
(ppc) and a driver side (x86). The driver side can access configuration
information on the device side. How to multiplex multiple devices is an
interesting exercise for whoever writes the virtio binding for that setup.

>>> There are multiple ways to do this, but what I am saying is that
>>> whatever is conceived will start to look eerily like a vbus-connector,
>>> since this is one of its primary purposes ;)
>>
>> I'm not sure if you're talking about the configuration interface or data
>> path here.
>
> I am talking about how we would tunnel the config space for N devices
> across his transport.

Sounds trivial. Write an address containing the device number and
register number to one location, read or write data from another. Just
like the PCI cf8/cfc interface.

>> They aren't in the "guest". The best way to look at it is
>>
>> - a device side, with a dma engine: vhost-net
>> - a driver side, only accessing its own memory: virtio-net
>>
>> Given that Ira's config has the dma engine in the ppc boards, that's
>> where vhost-net would live (the ppc boards acting as NICs to the x86
>> board, essentially).
>
> That sounds convenient given his hardware, but it has its own set of
> problems. For one, the configuration/inventory of these boards is now
> driven by the wrong side and has to be addressed.

Why is it the wrong side?

> Second, the role
> reversal will likely not work for many models other than ethernet (e.g.
> virtio-console or virtio-blk drivers running on the x86 board would be
> naturally consuming services from the slave boards...virtio-net is an
> exception because 802.x is generally symmetrical).

There is no role reversal. The side doing dma is the device, the side
accessing its own memory is the driver. Just like that other 1e12
driver/device pairs out there.

>> I have no idea, that's for Ira to solve.
>
> Bingo. Thus my statement that the vhost proposal is incomplete. You
> have the virtio-net and vhost-net pieces covering the fast-path
> end-points, but nothing in the middle (transport, aggregation,
> config-space), and nothing on the management-side. vbus provides most
> of the other pieces, and can even support the same virtio-net protocol
> on top. The remaining part would be something like a udev script to
> populate the vbus with devices on board-insert events.

Of course vhost is incomplete, in the same sense that Linux is
incomplete. Both require userspace.

>> If he could fake the PCI
>> config space as seen by the x86 board, he would just show the normal pci
>> config and use virtio-pci (multiple channels would show up as a
>> multifunction device). Given he can't, he needs to tunnel the virtio
>> config space some other way.
>
> Right, and note that vbus was designed to solve this. This tunneling
> can, of course, be done without vbus using some other design. However,
> whatever solution is created will look incredibly close to what I've
> already done, so my point is "why reinvent it"?

virtio requires binding for this tunnelling, so does vbus. It's the same
problem with the same solution.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
Avi Kivity wrote:
> On 09/15/2009 11:08 PM, Gregory Haskins wrote:
>>> There's virtio-console, virtio-blk etc. None of these have kernel-mode
>>> servers, but these could be implemented if/when needed.
>>
>> IIUC, Ira already needs at least ethernet and console capability.
>
> He's welcome to pick up the necessary code from qemu.

The problem isn't where to find the models...the problem is how to
aggregate multiple models to the guest.

>>>> b) what do you suppose this protocol to aggregate the connections would
>>>> look like? (hint: this is what a vbus-connector does).
>>>
>>> You mean multilink? You expose the device as a multiqueue.
>>
>> No, what I mean is how do you surface multiple ethernet and consoles to
>> the guests? For Ira's case, I think he needs at minimum at least one of
>> each, and he mentioned possibly having two unique ethernets at one point.
>
> You instantiate multiple vhost-nets. Multiple ethernet NICs is a
> supported configuration for kvm.

But this is not KVM.

>> His slave boards surface themselves as PCI devices to the x86
>> host. So how do you use that to make multiple vhost-based devices (say
>> two virtio-nets, and a virtio-console) communicate across the transport?
>
> I don't really see the difference between 1 and N here.

A KVM surfaces N virtio-devices as N pci-devices to the guest. What do
we do in Ira's case where the entire guest represents itself as a PCI
device to the host, and nothing the other way around?

>> There are multiple ways to do this, but what I am saying is that
>> whatever is conceived will start to look eerily like a vbus-connector,
>> since this is one of its primary purposes ;)
>
> I'm not sure if you're talking about the configuration interface or data
> path here.

I am talking about how we would tunnel the config space for N devices
across his transport. As an aside, the vbus-kvm connector makes them
one and the same, but they do not have to be. It's all in the connector
design.

>>>> c) how do you manage the configuration, especially on a per-board basis?
>>>
>>> pci (for kvm/x86).
>>
>> Ok, for kvm understood (and I would also add "qemu" to that mix). But
>> we are talking about vhost's application in a non-kvm environment here,
>> right?
>>
>> So if the vhost-X devices are in the "guest",
>
> They aren't in the "guest". The best way to look at it is
>
> - a device side, with a dma engine: vhost-net
> - a driver side, only accessing its own memory: virtio-net
>
> Given that Ira's config has the dma engine in the ppc boards, that's
> where vhost-net would live (the ppc boards acting as NICs to the x86
> board, essentially).

That sounds convenient given his hardware, but it has its own set of
problems. For one, the configuration/inventory of these boards is now
driven by the wrong side and has to be addressed. Second, the role
reversal will likely not work for many models other than ethernet (e.g.
virtio-console or virtio-blk drivers running on the x86 board would be
naturally consuming services from the slave boards...virtio-net is an
exception because 802.x is generally symmetrical).

IIUC, vbus would support having the device models live properly on the
x86 side, solving both of these problems. It would be impossible to
reverse vhost given its current design.

>> and the x86 board is just
>> a slave...How do you tell each ppc board how many devices and what
>> config (e.g. MACs, etc) to instantiate? Do you assume that they should
>> all be symmetric and based on positional (e.g. slot) data? What if you
>> want asymmetric configurations (if not here, perhaps in a different
>> environment)?
>
> I have no idea, that's for Ira to solve.

Bingo. Thus my statement that the vhost proposal is incomplete. You
have the virtio-net and vhost-net pieces covering the fast-path
end-points, but nothing in the middle (transport, aggregation,
config-space), and nothing on the management-side. vbus provides most
of the other pieces, and can even support the same virtio-net protocol
on top. The remaining part would be something like a udev script to
populate the vbus with devices on board-insert events.

> If he could fake the PCI
> config space as seen by the x86 board, he would just show the normal pci
> config and use virtio-pci (multiple channels would show up as a
> multifunction device). Given he can't, he needs to tunnel the virtio
> config space some other way.

Right, and note that vbus was designed to solve this. This tunneling
can, of course, be done without vbus using some other design. However,
whatever solution is created will look incredibly close to what I've
already done, so my point is "why reinvent it"?

>>> Yes. virtio is really virtualization oriented.
>>
>> I would say that it's vhost in particular that is virtualization
>> oriented. virtio, as a concept, generally should work in physical
Re: [PATCH] virtio_console: Add support for multiple ports for generic guest and host communication
> This device is very much a serial port. I don't see any reason not
> to treat it like one.

Here are a few

- You don't need POSIX multi-open semantics, hangup and the like
- Seek makes sense on some kinds of fixed attributes
- TTY has a relatively large memory overhead per device
- Sysfs is what everything else uses
- Sysfs has some rather complete lifetime management you'll need to
  redo by hand
- You don't need idiotic games with numbering spaces

Abusing tty for this is ridiculous. In some ways putting much of it in
kernel is ridiculous too as you can do it with a FUSE fs or simply
export the info guest-guest using SNMP.

Alan
vhost-net todo list
Some people asked about getting involved with vhost.
Here's a short list of projects.

vhost-net driver projects
- profiling would be very helpful, I have not done any yet
- tap support - working on it now
- merged buffers - working on it now
- scalability/fairness for large # of guests - working on it now
- logging support with dirty page tracking in kernel - working on it now
- indirect buffers - worth it?
- vm exit mitigation for TX (worth it? naive implementation does not
  seem to help)
- interrupt mitigation for RX
- level triggered interrupts - what's the best thing to do here?

qemu projects
- migration support
- level triggered interrupts - what's the best thing to do here?
- upstream support for injecting interrupts from kernel,
  from qemu-kvm.git to qemu.git
  (this is a vhost dependency, without it vhost can't be upstreamed,
  or it can, but without real benefit)
- general cleanup and upstreaming

projects involving networking stack
- export socket from tap so vhost can use it - working on it now
- extend raw sockets to support GSO/checksum offloading,
  and teach vhost to use that capability
  [one way to do this: virtio net header support]
  will allow working with e.g. macvlan

long term projects
- multiqueue (involves all of vhost, qemu, virtio, networking stack)
- More testing is always good

-- 
MST
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On 09/15/2009 11:08 PM, Gregory Haskins wrote:
>>> There's virtio-console, virtio-blk etc. None of these have kernel-mode
>>> servers, but these could be implemented if/when needed.
>>
>> IIUC, Ira already needs at least ethernet and console capability.

He's welcome to pick up the necessary code from qemu.

>>> b) what do you suppose this protocol to aggregate the connections would
>>> look like? (hint: this is what a vbus-connector does).
>>
>> You mean multilink? You expose the device as a multiqueue.
>
> No, what I mean is how do you surface multiple ethernet and consoles to
> the guests? For Ira's case, I think he needs at minimum at least one of
> each, and he mentioned possibly having two unique ethernets at one point.

You instantiate multiple vhost-nets. Multiple ethernet NICs is a
supported configuration for kvm.

> His slave boards surface themselves as PCI devices to the x86
> host. So how do you use that to make multiple vhost-based devices (say
> two virtio-nets, and a virtio-console) communicate across the transport?

I don't really see the difference between 1 and N here.

> There are multiple ways to do this, but what I am saying is that
> whatever is conceived will start to look eerily like a vbus-connector,
> since this is one of its primary purposes ;)

I'm not sure if you're talking about the configuration interface or
data path here.

>>> c) how do you manage the configuration, especially on a per-board basis?
>>
>> pci (for kvm/x86).
>
> Ok, for kvm understood (and I would also add "qemu" to that mix). But
> we are talking about vhost's application in a non-kvm environment here,
> right?
>
> So if the vhost-X devices are in the "guest",

They aren't in the "guest". The best way to look at it is

- a device side, with a dma engine: vhost-net
- a driver side, only accessing its own memory: virtio-net

Given that Ira's config has the dma engine in the ppc boards, that's
where vhost-net would live (the ppc boards acting as NICs to the x86
board, essentially).

> and the x86 board is just
> a slave...How do you tell each ppc board how many devices and what
> config (e.g. MACs, etc) to instantiate? Do you assume that they should
> all be symmetric and based on positional (e.g. slot) data? What if you
> want asymmetric configurations (if not here, perhaps in a different
> environment)?

I have no idea, that's for Ira to solve. If he could fake the PCI
config space as seen by the x86 board, he would just show the normal pci
config and use virtio-pci (multiple channels would show up as a
multifunction device). Given he can't, he needs to tunnel the virtio
config space some other way.

>> Yes. virtio is really virtualization oriented.
>
> I would say that it's vhost in particular that is virtualization
> oriented. virtio, as a concept, generally should work in physical
> systems, if perhaps with some minor modifications. The biggest "limit"
> is having "virt" in its name ;)

Let me rephrase. The virtio developers are virtualization oriented. If
it works for non-virt applications, that's good, but not a design goal.

-- 
error compiling committee.c: too many arguments to function