Re: [Qemu-devel] [dpdk-dev] [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain mac address when received first packet in vhost type
On Fri, Nov 24, 2017 at 05:59:09PM +0800, Chen Hailin wrote:
> Hi Aaron Conole && Jianfeng,
>
> The stp could not work in ovs-dpdk vhostuser, because the attached
> vhost device doesn't have a MAC address.
>
> Now we have two ways to solve this problem.
> 1. The vhost learns the MAC address from a packet, as in my first patch.

I do agree with Aaron this is not the right way.

> 2. The virtio driver notifies the MAC address actively to vhost-user.

Unfortunately, AFAIK, there is no way to achieve that so far. We could
either let virtio/QEMU expose the CQ to vhost, or add a new VHOST_USER
message to carry the mac address. While vhost-user is a generic
interface, adding a virtio-net specific message also doesn't seem quite
right. Exposing the CQ is probably the best we can do. Anyway, both
need a spec change.

    --yliu

> In my opinion, if we treat it as a device, we should allocate a
> MAC address for the device when the VM is started.
>
> Which one do you think is better?
>
> Best Regards,
> Chen Hailin
> che...@arraynetworks.com.cn
>
> From: Aaron Conole
> Date: 2017-11-18 10:00
> To: Hailin Chen
> CC: ovs-...@openvswitch.org; Maxime Coquelin; cl...@arraynetworks.com.cn
> Subject: Re: [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain mac address when received first packet in vhost type
>
> Hi Hailin,
>
> Hailin Chen writes:
>
> > The stp could not work on netdev-dpdk if the network has a loop,
> > because the stp protocol negotiates the designated port by sending
> > BPDU packets which contain a MAC address. However, the device doesn't
> > have a MAC address in vhostuser type; thus, function send_bpdu_cb
> > would not send BPDU packets.
> >
> > This patch sets the MAC for the device when the first packet is received.
> >
> > Signed-off-by: Hailin Chen
> > ---
>
> Thanks for the patch.
>
> In general, I don't think this is the right approach to deal with this
> type of issue. I believe the problem statement is that the OvS bridge is
> unaware of the guest MAC address - did I get it right? In that case, I
> would think that a better way to solve this would be to have virtio tell
> the mac address of the guest. I don't recall right now if that's
> allowed in the virtio spec, but I do remember some kind of feature
> negotiation.
>
> I've CC'd Maxime, who is one of the maintainers of the virtio code on
> the DPDK side. Perhaps there is an alternate way to solve this.
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
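Option 2 above (virtio actively notifying the backend of its MAC) would, as noted, need a spec change. Purely as an illustration of what such a change might look like, here is a sketch of a new, entirely hypothetical vhost-user message payload and backend handler; the message name, request id, and handler are invented for this example and are not part of any spec:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request id for a vhost-user message that would carry
 * the guest MAC to the backend. No such message exists in the
 * vhost-user spec today; the name and value are placeholders. */
#define VHOST_USER_NET_SET_MAC 1000u

struct vhost_user_mac_payload {
    uint8_t mac[6];
};

/* Backend side: record the announced MAC in the device state, so that
 * e.g. OvS' send_bpdu_cb could build BPDUs with a valid source MAC. */
static void handle_net_set_mac(uint8_t dev_mac[6],
                               const struct vhost_user_mac_payload *p)
{
    memcpy(dev_mac, p->mac, sizeof(p->mac));
}
```

The open question from the thread remains: whether such a virtio-net specific message belongs in a device-agnostic protocol at all.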
Re: [Qemu-devel] [RFC v3 1/3] vhost-user: Add MTU protocol feature and op
On Wed, Nov 30, 2016 at 11:10:15AM +0100, Maxime Coquelin wrote:
> +#define VHOST_USER_PROTOCOL_F_MTU 4
>
> Message types
> -
> @@ -470,6 +471,18 @@ Message types
>    The first 6 bytes of the payload contain the mac address of the guest to
>    allow the vhost user backend to construct and broadcast the fake RARP.
>
> + * VHOST_USER_SET_MTU

One thing I have just thought of is that, though we currently have one
vhost-user driver (net), SPDK here (James cc'ed) is proposing vhost-user
for other devices, say SCSI. So maybe it's better to add some type
prefix here, like VHOST_USER_NET_SET_MTU?

    --yliu
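For context, the new message is gated by a protocol feature bit (bit 4 in the hunk above): the master may only send it once the slave has acked that bit during protocol-feature negotiation. A minimal sketch of that check (the function name is mine, not QEMU's or DPDK's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number taken from the patch hunk under discussion. */
#define VHOST_USER_PROTOCOL_F_MTU 4

/* The master may only send VHOST_USER_SET_MTU after the slave acked
 * this bit in the VHOST_USER_[GET|SET]_PROTOCOL_FEATURES exchange. */
static bool mtu_feature_acked(uint64_t acked_protocol_features)
{
    return (acked_protocol_features >> VHOST_USER_PROTOCOL_F_MTU) & 1;
}
```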
Re: [Qemu-devel] [dpdk-dev] dpdk/vpp and cross-version migration for vhost
On Thu, Nov 24, 2016 at 09:30:49AM +, Kevin Traynor wrote: > On 11/24/2016 06:31 AM, Yuanhan Liu wrote: > > On Tue, Nov 22, 2016 at 04:53:05PM +0200, Michael S. Tsirkin wrote: > >>>> You keep assuming that you have the VM started first and > >>>> figure out things afterwards, but this does not work. > >>>> > >>>> Think about a cluster of machines. You want to start a VM in > >>>> a way that will ensure compatibility with all hosts > >>>> in a cluster. > >>> > >>> I see. I was more considering about the case when the dst > >>> host (including the qemu and dpdk combo) is given, and > >>> then determine whether it will be a successfull migration > >>> or not. > >>> > >>> And you are asking that we need to know which host could > >>> be a good candidate before starting the migration. In such > >>> case, we indeed need some inputs from both the qemu and > >>> vhost-user backend. > >>> > >>> For DPDK, I think it could be simple, just as you said, it > >>> could be either a tiny script, or even a macro defined in > >>> the source code file (we extend it every time we add a > >>> new feature) to let the libvirt to read it. Or something > >>> else. > >> > >> There's the issue of APIs that tweak features as Maxime > >> suggested. > > > > Yes, it's a good point. > > > >> Maybe the only thing to do is to deprecate it, > > > > Looks like so. > > > >> but I feel some way for application to pass info into > >> guest might be benefitial. > > > > The two APIs are just for tweaking feature bits DPDK supports before > > any device got connected. It's another way to disable some features > > (the another obvious way is to through QEMU command lines). > > > > IMO, it's bit handy only in a case like: we have bunch of VMs. Instead > > of disabling something though qemu one by one, we could disable it > > once in DPDK. > > > > But I doubt the useful of it. It's only used in DPDK's vhost example > > after all. Nor is it used in vhost pmd, neither is it used in OVS. 
> > rte_vhost_feature_disable() is currently used in OVS, lib/netdev-dpdk.c

Hmmm. I must have checked very old code ...

> netdev_dpdk_vhost_class_init(void)
> {
>     static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>
>     /* This function can be called for different classes.  The
>      * initialization needs to be done only once */
>     if (ovsthread_once_start(&once)) {
>         rte_vhost_driver_callback_register(&virtio_net_device_ops);
>         rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4
>                                   | 1ULL << VIRTIO_NET_F_HOST_TSO6
>                                   | 1ULL << VIRTIO_NET_F_CSUM);

I saw the commit that introduced such change, but it tells no reason why
it was added.

    commit 362ca39639ae871806be5ae97d55e1cbb14afd92
    Author: mweglicx
    Date:   Thu Apr 14 17:40:06 2016 +0100

        Update relevant artifacts to add support for DPDK 16.04.

        Following changes are applied:
        - INSTALL.DPDK.md: CONFIG_RTE_BUILD_COMBINE_LIBS step has been
          removed because it is no longer present in DPDK configuration
          (combined library is created by default),
        - INSTALL.DPDK.md: VHost Cuse configuration is updated,
        - netdev-dpdk.c: Link speed definition is changed in DPDK and
          netdev_dpdk_get_features is updated accordingly,
        - netdev-dpdk.c: TSO and checksum offload has been disabled for
          vhostuser device.
        - .travis/linux-build.sh: DPDK version is updated and legacy
          flags have been removed in configuration.

        Signed-off-by: Michal Weglicki
        Signed-off-by: Panu Matilainen
        Acked-by: Daniele Di Proietto

    --yliu
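For readers unfamiliar with the API: rte_vhost_feature_disable() masks bits out of the feature set the vhost backend will offer during negotiation, which is why the OVS snippet above kills TSO and checksum offload for all vhost ports at once. A rough model of that masking (a sketch, not DPDK's implementation; bit positions follow the virtio spec):

```c
#include <assert.h>
#include <stdint.h>

/* Virtio-net feature bit positions, per the virtio specification. */
#define VIRTIO_NET_F_CSUM       0
#define VIRTIO_NET_F_HOST_TSO4  11
#define VIRTIO_NET_F_HOST_TSO6  12

/* Rough model of rte_vhost_feature_disable(): clear the disabled bits
 * from the feature set offered to the guest before negotiation. */
static uint64_t offered_features(uint64_t supported, uint64_t disabled)
{
    return supported & ~disabled;
}
```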
Re: [Qemu-devel] dpdk/vpp and cross-version migration for vhost
On Tue, Nov 22, 2016 at 04:53:05PM +0200, Michael S. Tsirkin wrote: > > > You keep assuming that you have the VM started first and > > > figure out things afterwards, but this does not work. > > > > > > Think about a cluster of machines. You want to start a VM in > > > a way that will ensure compatibility with all hosts > > > in a cluster. > > > > I see. I was more considering about the case when the dst > > host (including the qemu and dpdk combo) is given, and > > then determine whether it will be a successfull migration > > or not. > > > > And you are asking that we need to know which host could > > be a good candidate before starting the migration. In such > > case, we indeed need some inputs from both the qemu and > > vhost-user backend. > > > > For DPDK, I think it could be simple, just as you said, it > > could be either a tiny script, or even a macro defined in > > the source code file (we extend it every time we add a > > new feature) to let the libvirt to read it. Or something > > else. > > There's the issue of APIs that tweak features as Maxime > suggested. Yes, it's a good point. > Maybe the only thing to do is to deprecate it, Looks like so. > but I feel some way for application to pass info into > guest might be benefitial. The two APIs are just for tweaking feature bits DPDK supports before any device got connected. It's another way to disable some features (the another obvious way is to through QEMU command lines). IMO, it's bit handy only in a case like: we have bunch of VMs. Instead of disabling something though qemu one by one, we could disable it once in DPDK. But I doubt the useful of it. It's only used in DPDK's vhost example after all. Nor is it used in vhost pmd, neither is it used in OVS. > > > If you don't, guest visible interface will change > > > and you won't be able to migrate. > > > > > > It does not make sense to discuss feature bits specifically > > > since that is not the only part of interface. 
> > > For example, max ring size supported might change. > > > > I don't quite understand why we have to consider the max ring > > size here? Isn't it a virtio device attribute, that QEMU could > > provide such compatibility information? > > > > I mean, DPDK is supposed to support vary vring size, it's QEMU > > to give a specifc value. > > If backend supports s/g of any size up to 2^16, there's no issue. I don't know others, but I see no issues in DPDK. > ATM some backends might be assuming up to 1K s/g since > QEMU never supported bigger ones. We might classify this > as a bug, or not and add a feature flag. > > But it's just an example. There might be more values at issue > in the future. Yeah, maybe. But we could analysis it one by one. > > > Let me describe how it works in qemu/libvirt. > > > When you install a VM, you can specify compatibility > > > level (aka "machine type"), and you can query the supported compatibility > > > levels. Management uses that to find the supported compatibility > > > and stores the compatibility in XML that is migrated with the VM. > > > There's also a way to find the latest level which is the > > > default unless overridden by user, again this level > > > is recorded and then > > > - management can make sure migration destination is compatible > > > - management can avoid migration to hosts without that support > > > > Thanks for the info, it helps. > > > > ... > > > > > >>As version here is an opaque string for libvirt and qemu, > > > > > >>anything can be used - but I suggest either a list > > > > > >>of values defining the interface, e.g. > > > > > >>any_layout=on,max_ring=256 > > > > > >>or a version including the name and vendor of the backend, > > > > > >>e.g. "org.dpdk.v4.5.6". > > > > The version scheme may not be ideal here. 
Assume a QEMU is supposed > > to work with a specific DPDK version, however, user may disable some > > newer features through qemu command line, that it also could work with > > an elder DPDK version. Using the version scheme will not allow us doing > > such migration to an elder DPDK version. The MTU is a lively example > > here? (when MTU feature is provided by QEMU but is actually disabled > > by user, that it could also work with an elder DPDK without MTU support). > > > > --yliu > > OK, so does a list of values look better to you then? Yes, if there are no better way. And I think it may be better to not list all those features, literally. But instead, using the number should be better, say, features=0xdeadbeef. Listing the feature names means we have to come to an agreement in all components involved here (QEMU, libvirt, DPDK, VPP, and maybe more backends), that we have to use the exact same feature names. Though it may not be a big deal, it lacks some flexibility. A feature bits will not have this issue. --yliu > > > > > > > >> > > > > > >>Note that typically the list of supported versions can only be > > > > > >>extended, not shrunk. Also, if the host/guest interf
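The raw-number idea above (features=0xdeadbeef) reduces the management-side decision to a subset check: a destination host is a valid migration target iff it supports every feature the source guest already acked, regardless of how either side names those features. A hedged sketch of that check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Migration feasibility on raw feature bits: the destination backend
 * is compatible iff it supports every feature the source guest acked.
 * Extra destination-only features are fine - they simply will not be
 * negotiated after the restore. */
static bool migration_compatible(uint64_t src_acked, uint64_t dst_supported)
{
    return (src_acked & ~dst_supported) == 0;
}
```

This also covers the MTU example from the thread: a QEMU with the MTU feature disabled by the user yields an acked set without that bit, so an older DPDK without MTU support still passes the check.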
Re: [Qemu-devel] dpdk/vpp and cross-version migration for vhost
On Thu, Nov 17, 2016 at 07:37:09PM +0200, Michael S. Tsirkin wrote: > On Thu, Nov 17, 2016 at 05:49:36PM +0800, Yuanhan Liu wrote: > > On Thu, Nov 17, 2016 at 09:47:09AM +0100, Maxime Coquelin wrote: > > > > > > > > > On 11/17/2016 09:29 AM, Yuanhan Liu wrote: > > > >As usaual, sorry for late response :/ > > > > > > > >On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote: > > > >>Hi! > > > >>So it looks like we face a problem with cross-version > > > >>migration when using vhost. It's not new but became more > > > >>acute with the advent of vhost user. > > > >> > > > >>For users to be able to migrate between different versions > > > >>of the hypervisor the interface exposed to guests > > > >>by hypervisor must stay unchanged. > > > >> > > > >>The problem is that a qemu device is connected > > > >>to a backend in another process, so the interface > > > >>exposed to guests depends on the capabilities of that > > > >>process. > > > >> > > > >>Specifically, for vhost user interface based on virtio, this includes > > > >>the "host features" bitmap that defines the interface, as well as more > > > >>host values such as the max ring size. Adding new features/changing > > > >>values to this interface is required to make progress, but on the other > > > >>hand we need ability to get the old host features to be compatible. > > > > > > > >It looks like to the same issue of vhost-user reconnect to me. For > > > >example, > > > > > > > >- start dpdk 16.07 & qemu 2.5 > > > >- kill dpdk > > > >- start dpdk 16.11 > > > > > > > >Though DPDK 16.11 has more features comparing to dpdk 16.07 (say, > > > >indirect), > > > >above should work. Because qemu saves the negotiated features before the > > > >disconnect and stores it back after the reconnection. 
> > > > > > > >commit a463215b087c41d7ca94e51aa347cde523831873 > > > >Author: Marc-André Lureau > > > >Date: Mon Jun 6 18:45:05 2016 +0200 > > > > > > > >vhost-net: save & restore vhost-user acked features > > > > > > > >The initial vhost-user connection sets the features to be > > > > negotiated > > > >with the driver. Renegotiation isn't possible without device > > > > reset. > > > > > > > >To handle reconnection of vhost-user backend, ensure the same > > > > set of > > > >features are provided, and reuse already acked features. > > > > > > > >Signed-off-by: Marc-André Lureau > > > > > > > > > > > >So we could do similar to vhost-user? I mean, save the acked features > > > >before migration and store it back after it. This should be able to > > > >keep the compatibility. If user downgrades DPDK version, it also could > > > >be easily detected, and then exit with an error to user: migration > > > >failed due to un-compatible vhost features. > > > > > > > >Just some rough thoughts. Makes tiny sense? > > > > > > My understanding is that the management tool has to know whether > > > versions are compatible before initiating the migration: > > > > Makes sense. How about getting and restoring the acked features through > > qemu command lines then, say, through the monitor interface? > > > > With that, it would be something like: > > > > - start vhost-user backend (DPDK, VPP, or whatever) & qemu in the src host > > > > - read the acked features (through monitor interface) > > > > - start vhost-user backend in the dst host > > > > - start qemu in the dst host with the just queried acked features > > > > QEMU then is expected to use this feature set for the later vhost-user > > feature negotitation. Exit if features compatibility is broken. > > > > Thoughts? > > > > --yliu > > > You keep assuming that you have the VM started first and > figure out things afterwards, but this does not work. > > Think about a cluster of machines. 
You want to start a VM in > a way that will ensure compatibility with all hosts > in a cluster. I see. I was more considering about the case when the dst ho
Re: [Qemu-devel] dpdk/vpp and cross-version migration for vhost
On Thu, Nov 17, 2016 at 09:47:09AM +0100, Maxime Coquelin wrote: > > > On 11/17/2016 09:29 AM, Yuanhan Liu wrote: > >As usaual, sorry for late response :/ > > > >On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote: > >>Hi! > >>So it looks like we face a problem with cross-version > >>migration when using vhost. It's not new but became more > >>acute with the advent of vhost user. > >> > >>For users to be able to migrate between different versions > >>of the hypervisor the interface exposed to guests > >>by hypervisor must stay unchanged. > >> > >>The problem is that a qemu device is connected > >>to a backend in another process, so the interface > >>exposed to guests depends on the capabilities of that > >>process. > >> > >>Specifically, for vhost user interface based on virtio, this includes > >>the "host features" bitmap that defines the interface, as well as more > >>host values such as the max ring size. Adding new features/changing > >>values to this interface is required to make progress, but on the other > >>hand we need ability to get the old host features to be compatible. > > > >It looks like to the same issue of vhost-user reconnect to me. For example, > > > >- start dpdk 16.07 & qemu 2.5 > >- kill dpdk > >- start dpdk 16.11 > > > >Though DPDK 16.11 has more features comparing to dpdk 16.07 (say, indirect), > >above should work. Because qemu saves the negotiated features before the > >disconnect and stores it back after the reconnection. > > > >commit a463215b087c41d7ca94e51aa347cde523831873 > >Author: Marc-André Lureau > >Date: Mon Jun 6 18:45:05 2016 +0200 > > > >vhost-net: save & restore vhost-user acked features > > > >The initial vhost-user connection sets the features to be negotiated > >with the driver. Renegotiation isn't possible without device reset. > > > >To handle reconnection of vhost-user backend, ensure the same set of > >features are provided, and reuse already acked features. 
> > > >Signed-off-by: Marc-André Lureau > > > > > >So we could do similar to vhost-user? I mean, save the acked features > >before migration and store it back after it. This should be able to > >keep the compatibility. If user downgrades DPDK version, it also could > >be easily detected, and then exit with an error to user: migration > >failed due to un-compatible vhost features. > > > >Just some rough thoughts. Makes tiny sense? > > My understanding is that the management tool has to know whether > versions are compatible before initiating the migration: Makes sense. How about getting and restoring the acked features through qemu command lines then, say, through the monitor interface? With that, it would be something like: - start vhost-user backend (DPDK, VPP, or whatever) & qemu in the src host - read the acked features (through monitor interface) - start vhost-user backend in the dst host - start qemu in the dst host with the just queried acked features QEMU then is expected to use this feature set for the later vhost-user feature negotitation. Exit if features compatibility is broken. Thoughts? --yliu > 1. The downtime could be unpredictable if a VM has to move from hosts > to hosts multiple times, which is problematic, especially for NFV. > 2. If migration is not possible, maybe the management tool would > prefer not to interrupt the VM on current host. > > I have little experience with migration though, so I could be mistaken. > > Thanks, > Maxime > > > > > --yliu > >> > >>To solve this problem within qemu, qemu has a versioning system based on > >>a machine type concept which fundamentally is a version string, by > >>specifying that string one can get hardware compatible with a previous > >>qemu version. QEMU also reports the latest version and list of versions > >>supported so libvirt records the version at VM creation and then is > >>careful to use this machine version whenever it migrates a VM. 
> >> > >>One might wonder how is this solved with a kernel vhost backend. The > >>answer is that it mostly isn't - instead an assumption is made, that > >>qemu versions are deployed together with the kernel - this is generally > >>true for downstreams. Thus whenever qemu gains a new feature, it is > >>already supported by the kernel as well. However, if one attempts > >>migration wit
Re: [Qemu-devel] dpdk/vpp and cross-version migration for vhost
As usual, sorry for the late response :/

On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote:
> Hi!
> So it looks like we face a problem with cross-version migration when
> using vhost. It's not new but became more acute with the advent of
> vhost user.
>
> For users to be able to migrate between different versions of the
> hypervisor, the interface exposed to guests by the hypervisor must
> stay unchanged.
>
> The problem is that a qemu device is connected to a backend in another
> process, so the interface exposed to guests depends on the
> capabilities of that process.
>
> Specifically, for a vhost user interface based on virtio, this
> includes the "host features" bitmap that defines the interface, as
> well as more host values such as the max ring size. Adding new
> features/changing values of this interface is required to make
> progress, but on the other hand we need the ability to get the old
> host features to be compatible.

It looks like the same issue as vhost-user reconnect to me. For example:

- start dpdk 16.07 & qemu 2.5
- kill dpdk
- start dpdk 16.11

Though DPDK 16.11 has more features compared to dpdk 16.07 (say,
indirect), the above should work, because qemu saves the negotiated
features before the disconnect and restores them after the reconnection.

    commit a463215b087c41d7ca94e51aa347cde523831873
    Author: Marc-André Lureau
    Date:   Mon Jun 6 18:45:05 2016 +0200

        vhost-net: save & restore vhost-user acked features

        The initial vhost-user connection sets the features to be
        negotiated with the driver. Renegotiation isn't possible
        without device reset.

        To handle reconnection of vhost-user backend, ensure the same
        set of features are provided, and reuse already acked features.

        Signed-off-by: Marc-André Lureau

So we could do something similar for migration? I mean, save the acked
features before migration and restore them after it. This should be
able to keep the compatibility. If the user downgrades the DPDK
version, it could also be easily detected, and we then exit with an
error to the user: migration failed due to incompatible vhost features.

Just some rough thoughts. Do they make any sense?

    --yliu

> To solve this problem within qemu, qemu has a versioning system based
> on a machine type concept which fundamentally is a version string; by
> specifying that string one can get hardware compatible with a previous
> qemu version. QEMU also reports the latest version and the list of
> versions supported, so libvirt records the version at VM creation and
> then is careful to use this machine version whenever it migrates a VM.
>
> One might wonder how this is solved with a kernel vhost backend. The
> answer is that it mostly isn't - instead an assumption is made that
> qemu versions are deployed together with the kernel - this is
> generally true for downstreams. Thus whenever qemu gains a new
> feature, it is already supported by the kernel as well. However, if
> one attempts migration with a new qemu from a system with a new
> kernel to one with an old kernel, one would get a failure.
>
> In a world where we have multiple userspace backends, with some of
> these supplied by ISVs, this seems unrealistic.
>
> IMO we need to support vhost backend versioning, ideally in a way
> that will also work for vhost kernel backends.
>
> So I'd like to get some input from both backend and management
> developers on what a good solution would look like.
>
> If we want to emulate the qemu solution, this involves adding the
> concept of interface versions to dpdk. For example, dpdk could supply
> a file (or a utility printing it?) with the list of versions: latest
> and versions supported. libvirt could read that and
> - store the latest version at vm creation
> - pass it around with the vm
> - pass it to qemu
>
> From here, qemu could pass this over the vhost-user channel, thus
> making sure it's initialized with the correct compatible interface.
>
> As the version here is an opaque string for libvirt and qemu,
> anything can be used - but I suggest either a list of values defining
> the interface, e.g.
>     any_layout=on,max_ring=256
> or a version including the name and vendor of the backend, e.g.
> "org.dpdk.v4.5.6".
>
> Note that typically the list of supported versions can only be
> extended, not shrunk. Also, if the host/guest interface does not
> change, don't change the current version as this just creates work
> for everyone.
>
> Thoughts? Would this work well for management? dpdk? vpp?
>
> Thanks!
>
> --
> MST
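MST's first suggested encoding above is an opaque key=value list such as any_layout=on,max_ring=256. As a sketch only (the format is defined by nothing more than that example), management code could pull an integer value out of such a string like this:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find "key=" in a "k=v,k=v" compatibility string (format assumed from
 * the "any_layout=on,max_ring=256" example above) and parse its value
 * as an integer. Returns 0 on success, -1 if the key is absent. */
static int compat_get_int(const char *s, const char *key, long *out)
{
    size_t klen = strlen(key);
    const char *p = s;

    while (p) {
        /* Each iteration starts at the beginning of a k=v token. */
        if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
            *out = strtol(p + klen + 1, NULL, 10);
            return 0;
        }
        p = strchr(p, ',');
        if (p)
            p++;                /* skip the comma */
    }
    return -1;
}
```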
Re: [Qemu-devel] [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
On Tue, Oct 11, 2016 at 02:57:49PM +0800, Yuanhan Liu wrote:
> > > > > > There was an example: the vhost enqueue optimization patchset
> > > > > > from Zhihong [0] uses memset, and it introduces more than a
> > > > > > 15% drop (IIRC)

Though it doesn't matter now, I have verified it yesterday (with and
without memset), and the drop could be up to 30+%. This is to let you
know that it could behave badly if memset is not inlined.

> > > > > > on my Ivybridge server: it has no such issue on his server
> > > > > > though.
> > > > > >
> > > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > > >
> > > > > > --yliu
> > > > >
> > > > > I'd say that's weird. what's your config? any chance you
> > > > > are using an old compiler?
> > > >
> > > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more.
> > > > IIRC, he said the memset is not well optimized for Ivybridge
> > > > server.
> > >
> > > The dst is remote in that case. It's fine on Haswell but has
> > > complication in Ivy Bridge which (wasn't supposed to but) causes
> > > serious frontend issue.
> > >
> > > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> >
> > So try something like this then:
>
> Yes, I saw memset is inlined when this diff is applied.

I have another concern though: it's a trick that could let gcc do the
inlining, and I am not quite sure that's true with other compilers
(i.e. clang, icc, or even older gcc).

For this case, I think I would still prefer some trick like

    *(struct ..*) = {0, };

Or even, we could introduce rte_memset(). IIRC, that has been proposed
somehow before?

    --yliu
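The struct-assignment trick referred to above (`*(struct ..*) = {0, }`) zeroes the header with plain stores and does not depend on the compiler choosing to inline an external memset() call. A minimal sketch, with the header layout taken from the virtio spec's virtio_net_hdr:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout per the virtio spec's struct virtio_net_hdr
 * (without the mergeable-rx num_buffers field). */
struct virtio_net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* The trick from the thread: a compound-literal assignment compiles to
 * a few direct stores, with no call into libc and thus no dependence
 * on the compiler inlining memset(). */
static void clear_hdr(struct virtio_net_hdr *hdr)
{
    *hdr = (struct virtio_net_hdr){ 0 };
}
```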
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Oct 10, 2016 at 07:39:59AM +0300, Michael S. Tsirkin wrote:
> > > > > > 1. why is this memset expensive?
> > > > >
> > > > > I don't have the exact answer, but just some rough thoughts:
> > > > >
> > > > > It's an external clib function: there is a call stack and the
> > > > > IP register will bounce back and forth.
> > > >
> > > > for memset 0? gcc 5.3.1 on fedora happily inlines it.
> > >
> > > Good to know!
> > >
> > > > > overkill to use that for resetting a 14 byte structure.
> > > > >
> > > > > Some trick like
> > > > >     *(struct virtio_net_hdr *)hdr = {0, };
> > > > >
> > > > > Or even
> > > > >     hdr->xxx = 0;
> > > > >     hdr->yyy = 0;
> > > > >
> > > > > should behave better.
> > > > >
> > > > > There was an example: the vhost enqueue optimization patchset
> > > > > from Zhihong [0] uses memset, and it introduces more than a
> > > > > 15% drop (IIRC) on my Ivybridge server: it has no such issue
> > > > > on his server though.
> > > > >
> > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > >
> > > > > --yliu
> > > >
> > > > I'd say that's weird. what's your config? any chance you
> > > > are using an old compiler?
> > >
> > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more.
> > > IIRC, he said the memset is not well optimized for Ivybridge
> > > server.
> >
> > The dst is remote in that case. It's fine on Haswell but has
> > complication in Ivy Bridge which (wasn't supposed to but) causes
> > serious frontend issue.
> >
> > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
>
> So try something like this then:

Yes, I saw memset is inlined when this diff is applied. So, mind
sending a formal patch? You might want to try a build at least: it
doesn't build as is.

> Generally pointer chasing in vq->hw->vtnet_hdr_size can't be good
> for performance. Move fields used on data path into vq
> and use from there to avoid indirections?

Good suggestion!

    --yliu
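MST's closing suggestion, avoiding the vq->hw->vtnet_hdr_size pointer chase, means snapshotting hot, immutable values into the vq at setup time so the data path does one load instead of two dependent ones. A sketch with the structs reduced to the relevant fields (not the real DPDK definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-ins for the virtio PMD structs under discussion,
 * keeping only the fields the suggestion touches. */
struct virtio_hw {
    uint32_t vtnet_hdr_size;
};

struct virtqueue {
    struct virtio_hw *hw;
    uint32_t hdr_size;      /* cached copy for the data path */
};

/* Setup path: snapshot the value once; it never changes after
 * negotiation, so the copy cannot go stale. */
static void vq_cache_hot_fields(struct virtqueue *vq)
{
    vq->hdr_size = vq->hw->vtnet_hdr_size;
}

/* Data path: one load from vq instead of the dependent
 * vq->hw->vtnet_hdr_size chain. */
static uint32_t vq_hdr_size(const struct virtqueue *vq)
{
    return vq->hdr_size;
}
```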
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Tue, Oct 11, 2016 at 08:39:54AM +0200, Maxime Coquelin wrote: > > > On 10/11/2016 08:04 AM, Yuanhan Liu wrote: > >On Mon, Oct 10, 2016 at 04:54:39PM +0200, Maxime Coquelin wrote: > >> > >> > >>On 10/10/2016 04:42 PM, Yuanhan Liu wrote: > >>>On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote: > >>>>>>>At that time, a packet always use 2 descs. Since indirect desc is > >>>>>>>enabled (by default) now, the assumption is not true then. What's > >>>>>>>worse, it might even slow things a bit down. That should also be > >>>>>>>part of the reason why performance is slightly worse than before. > >>>>>>> > >>>>>>> --yliu > >>>>>> > >>>>>>I'm not sure I get what you are saying > >>>>>> > >>>>>>>commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98 > >>>>>>>Author: Yuanhan Liu > >>>>>>>Date: Mon May 2 17:46:17 2016 -0700 > >>>>>>> > >>>>>>> vhost: optimize dequeue for small packets > >>>>>>> > >>>>>>> A virtio driver normally uses at least 2 desc buffers for Tx: the > >>>>>>> first for storing the header, and the others for storing the data. > >>>>>>> > >>>>>>> Therefore, we could fetch the first data desc buf before the main > >>>>>>> loop, and do the copy first before the check of "are we done yet?". > >>>>>>> This could save one check for small packets that just have one data > >>>>>>> desc buffer and need one mbuf to store it. > >>>>>>> > >>>>>>> Signed-off-by: Yuanhan Liu > >>>>>>> Acked-by: Huawei Xie > >>>>>>> Tested-by: Rich Lane > >>>>>> > >>>>>>This fast-paths the 2-descriptors format but it's not active > >>>>>>for indirect descriptors. Is this what you mean? > >>>>> > >>>>>Yes. It's also not active when ANY_LAYOUT is actually turned on. > >>>>>>Should be a simple matter to apply this optimization for indirect. > >>>>> > >>>>>Might be. > >>>> > >>>>If I understand the code correctly, indirect descs also benefit from this > >>>>optimization, or am I missing something? > >>> > >>>Aha..., you are right! 
> >>
> >>The interesting thing is that the patch I sent on Thursday that
> >>removes header access when no offload has been negotiated[0] seems
> >>to reduce almost to zero the performance gain seen with indirect
> >>descriptors enabled.
> >
> >Didn't follow that.
> >
> >>I see this with 64 bytes packets using testpmd on both ends.
> >>
> >>When I did the patch, I would have expected the same gain with both
> >>modes, whereas I measured +1% for direct and +4% for indirect.
> >
> >IIRC, I did a test before (remove those offload code piece), and the
> >performance was basically the same before and after that. Well, there
> >might be some small difference, say 1% as you said. But the result
> >has never been steady.
> >
> >Anyway, I think your patch is good to have: I just didn't see a v2.
>
> I waited to gather some comments/feedback before sending the v2.
> I'll send it today or tomorrow.

Interesting, I see a deadlock then: I haven't looked at the code
carefully since you said there is a v2, thus I'm waiting for it.
However, you are waiting for my review. :)

Anyway, I will take time to look at it shortly.

    --yliu
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Oct 10, 2016 at 04:54:39PM +0200, Maxime Coquelin wrote: > > > On 10/10/2016 04:42 PM, Yuanhan Liu wrote: > >On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote: > >>>>>At that time, a packet always use 2 descs. Since indirect desc is > >>>>>enabled (by default) now, the assumption is not true then. What's > >>>>>worse, it might even slow things a bit down. That should also be > >>>>>part of the reason why performance is slightly worse than before. > >>>>> > >>>>> --yliu > >>>> > >>>>I'm not sure I get what you are saying > >>>> > >>>>>commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98 > >>>>>Author: Yuanhan Liu > >>>>>Date: Mon May 2 17:46:17 2016 -0700 > >>>>> > >>>>> vhost: optimize dequeue for small packets > >>>>> > >>>>> A virtio driver normally uses at least 2 desc buffers for Tx: the > >>>>> first for storing the header, and the others for storing the data. > >>>>> > >>>>> Therefore, we could fetch the first data desc buf before the main > >>>>> loop, and do the copy first before the check of "are we done yet?". > >>>>> This could save one check for small packets that just have one data > >>>>> desc buffer and need one mbuf to store it. > >>>>> > >>>>> Signed-off-by: Yuanhan Liu > >>>>> Acked-by: Huawei Xie > >>>>> Tested-by: Rich Lane > >>>> > >>>>This fast-paths the 2-descriptors format but it's not active > >>>>for indirect descriptors. Is this what you mean? > >>> > >>>Yes. It's also not active when ANY_LAYOUT is actually turned on. > >>>>Should be a simple matter to apply this optimization for indirect. > >>> > >>>Might be. > >> > >>If I understand the code correctly, indirect descs also benefit from this > >>optimization, or am I missing something? > > > >Aha..., you are right! > > The interesting thing is that the patch I send on Thursday that removes > header access when no offload has been negotiated[0] seems to reduce > almost to zero the performance seen with indirect descriptors enabled. Didn't follow that. 
> I see this with 64-byte packets using testpmd on both ends. > > When I did the patch, I would have expected the same gain with both > modes, whereas I measured +1% for direct and +4% for indirect. IIRC, I did a test before (removing that offload code piece), and the performance was basically the same before and after. Well, there might be some small difference, say 1% as you said, but the result has never been steady. Anyway, I think your patch is good to have: I just haven't seen a v2. --yliu
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote: > >>>At that time, a packet always use 2 descs. Since indirect desc is > >>>enabled (by default) now, the assumption is not true then. What's > >>>worse, it might even slow things a bit down. That should also be > >>>part of the reason why performance is slightly worse than before. > >>> > >>> --yliu > >> > >>I'm not sure I get what you are saying > >> > >>>commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98 > >>>Author: Yuanhan Liu > >>>Date: Mon May 2 17:46:17 2016 -0700 > >>> > >>>vhost: optimize dequeue for small packets > >>> > >>>A virtio driver normally uses at least 2 desc buffers for Tx: the > >>>first for storing the header, and the others for storing the data. > >>> > >>>Therefore, we could fetch the first data desc buf before the main > >>> loop, and do the copy first before the check of "are we done yet?". > >>>This could save one check for small packets that just have one data > >>>desc buffer and need one mbuf to store it. > >>> > >>>Signed-off-by: Yuanhan Liu > >>>Acked-by: Huawei Xie > >>>Tested-by: Rich Lane > >> > >>This fast-paths the 2-descriptors format but it's not active > >>for indirect descriptors. Is this what you mean? > > > >Yes. It's also not active when ANY_LAYOUT is actually turned on. > >>Should be a simple matter to apply this optimization for indirect. > > > >Might be. > > If I understand the code correctly, indirect descs also benefit from this > optimization, or am I missing something? Aha..., you are right! --yliu
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Oct 10, 2016 at 07:17:06AM +0300, Michael S. Tsirkin wrote: > On Mon, Oct 10, 2016 at 12:05:31PM +0800, Yuanhan Liu wrote: > > On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote: > > > > > And the same is done is done in DPDK: > > > > > > > > > > static inline int __attribute__((always_inline)) > > > > > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs, > > > > > uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx, > > > > > struct rte_mempool *mbuf_pool) > > > > > { > > > > > ... > > > > > /* > > > > > * A virtio driver normally uses at least 2 desc buffers > > > > > * for Tx: the first for storing the header, and others > > > > > * for storing the data. > > > > > */ > > > > > if (likely((desc->len == dev->vhost_hlen) && > > > > >(desc->flags & VRING_DESC_F_NEXT) != 0)) { > > > > > desc = &descs[desc->next]; > > > > > if (unlikely(desc->flags & VRING_DESC_F_INDIRECT)) > > > > > return -1; > > > > > > > > > > desc_addr = gpa_to_vva(dev, desc->addr); > > > > > if (unlikely(!desc_addr)) > > > > > return -1; > > > > > > > > > > rte_prefetch0((void *)(uintptr_t)desc_addr); > > > > > > > > > > desc_offset = 0; > > > > > desc_avail = desc->len; > > > > > nr_desc+= 1; > > > > > > > > > > PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); > > > > > } else { > > > > > desc_avail = desc->len - dev->vhost_hlen; > > > > > desc_offset = dev->vhost_hlen; > > > > > } > > > > > > > > Actually, the header is parsed in DPDK vhost implementation. > > > > But as Virtio PMD provides a zero'ed header, we could just parse > > > > the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated. > > > > > > host can always skip the header parse if it wants to. > > > It didn't seem worth it to add branches there but > > > if I'm wrong, by all means code it up. > > > > It's added by following commit, which yields about 10% performance > > boosts for PVP case (with 64B packet size). > > > > At that time, a packet always use 2 descs. 
Since indirect desc is > > enabled (by default) now, the assumption is not true then. What's > > worse, it might even slow things a bit down. That should also be > > part of the reason why performance is slightly worse than before. > > > > --yliu > > I'm not sure I get what you are saying > > > commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98 > > Author: Yuanhan Liu > > Date: Mon May 2 17:46:17 2016 -0700 > > > > vhost: optimize dequeue for small packets > > > > A virtio driver normally uses at least 2 desc buffers for Tx: the > > first for storing the header, and the others for storing the data. > > > > Therefore, we could fetch the first data desc buf before the main > > loop, and do the copy first before the check of "are we done yet?". > > This could save one check for small packets that just have one data > > desc buffer and need one mbuf to store it. > > > > Signed-off-by: Yuanhan Liu > > Acked-by: Huawei Xie > > Tested-by: Rich Lane > > This fast-paths the 2-descriptors format but it's not active > for indirect descriptors. Is this what you mean? Yes. It's also not active when ANY_LAYOUT is actually turned on. > Should be a simple matter to apply this optimization for indirect. Might be. --yliu
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote: > > > And the same is done is done in DPDK: > > > > > > static inline int __attribute__((always_inline)) > > > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs, > > > uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx, > > > struct rte_mempool *mbuf_pool) > > > { > > > ... > > > /* > > > * A virtio driver normally uses at least 2 desc buffers > > > * for Tx: the first for storing the header, and others > > > * for storing the data. > > > */ > > > if (likely((desc->len == dev->vhost_hlen) && > > >(desc->flags & VRING_DESC_F_NEXT) != 0)) { > > > desc = &descs[desc->next]; > > > if (unlikely(desc->flags & VRING_DESC_F_INDIRECT)) > > > return -1; > > > > > > desc_addr = gpa_to_vva(dev, desc->addr); > > > if (unlikely(!desc_addr)) > > > return -1; > > > > > > rte_prefetch0((void *)(uintptr_t)desc_addr); > > > > > > desc_offset = 0; > > > desc_avail = desc->len; > > > nr_desc+= 1; > > > > > > PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); > > > } else { > > > desc_avail = desc->len - dev->vhost_hlen; > > > desc_offset = dev->vhost_hlen; > > > } > > > > Actually, the header is parsed in DPDK vhost implementation. > > But as Virtio PMD provides a zero'ed header, we could just parse > > the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated. > > host can always skip the header parse if it wants to. > It didn't seem worth it to add branches there but > if I'm wrong, by all means code it up. It's added by following commit, which yields about 10% performance boosts for PVP case (with 64B packet size). At that time, a packet always use 2 descs. Since indirect desc is enabled (by default) now, the assumption is not true then. What's worse, it might even slow things a bit down. That should also be part of the reason why performance is slightly worse than before. 
--yliu commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98 Author: Yuanhan Liu Date: Mon May 2 17:46:17 2016 -0700 vhost: optimize dequeue for small packets A virtio driver normally uses at least 2 desc buffers for Tx: the first for storing the header, and the others for storing the data. Therefore, we could fetch the first data desc buf before the main loop, and do the copy first before the check of "are we done yet?". This could save one check for small packets that just have one data desc buffer and need one mbuf to store it. Signed-off-by: Yuanhan Liu Acked-by: Huawei Xie Tested-by: Rich Lane
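The fast path from commit 1d41d77 discussed above can be reduced to a small, self-contained model (not the actual DPDK code; names like `first_data_desc` are illustrative): when the first descriptor holds exactly the vhost header and chains to a data descriptor, the copy can start from the data descriptor before entering the main loop.

```c
#include <assert.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT 1u

/* Simplified descriptor: just length, flags and next index. */
struct desc {
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/*
 * Model of the small-packet fast path: if the first desc holds exactly
 * the vhost header and chains onward, start copying from the data desc
 * before the main loop, saving one "are we done yet?" check for packets
 * with a single data desc.  Returns the desc index the copy starts from.
 */
static uint16_t first_data_desc(const struct desc *descs, uint16_t idx,
                                uint32_t vhost_hlen, int *fast_path)
{
    const struct desc *d = &descs[idx];

    if (d->len == vhost_hlen && (d->flags & VRING_DESC_F_NEXT)) {
        *fast_path = 1;          /* header-only first desc: skip ahead */
        return d->next;
    }
    *fast_path = 0;              /* header and data share the first desc */
    return idx;
}
```

This also shows why ANY_LAYOUT hurts the optimization: a guest that merges header and data into one descriptor takes the `else` branch, and the saved check is lost, matching the observation in the thread.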
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote: > >so doing this unconditionally would be a spec violation, but if you see > >value in this, we can add a feature bit. > Right, it would be a spec violation, so it should be done conditionally. > If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER? > It would imply VIRTIO_NET_F_CSUM not set, and no GSO features set. > If negotiated, we wouldn't need to prepend a header. If we can skip the Tx header, I think we could also skip the Rx header, in the case when mrg-rx is also turned off? > From the micro-benchmark results, we can expect +10% compared to > indirect descriptors, and +5% compared to using 2 descs in the > virtqueue. > Also, it should have the same benefits as indirect descriptors for 0% > pkt loss (as we can fill 2x more packets in the virtqueue). > > What do you think? I would vote for this. It should yield maximum performance for the case where it's guaranteed that the packet size will always fit in a typical MTU (1500). --yliu
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote: > On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote: > > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote: > > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote: > > > > > > > > > > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote: > > > Yes but two points. > > > > > > 1. why is this memset expensive? > > > > I don't have the exact answer, but just some rough thoughts: > > > > It's an external clib function: there is a call stack and the > > IP register will bounce back and forth. > > for memset 0? gcc 5.3.1 on fedora happily inlines it. Good to know! > > overkill to use that for resetting a 14-byte structure. > > > > Some trick like > > *(struct virtio_net_hdr *)hdr = (struct virtio_net_hdr){ 0 }; > > > > Or even > > hdr->xxx = 0; > > hdr->yyy = 0; > > > > should behave better. > > > > There was an example: the vhost enqueue optimization patchset from > > Zhihong [0] uses memset, and it introduces a more than 15% drop (IIRC) > > on my Ivybridge server: it has no such issue on his server though. > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html > > > > --yliu > > I'd say that's weird. What's your config? Any chance you > are using an old compiler? Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC, he said the memset is not well optimized for the Ivybridge server. --yliu
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Oct 10, 2016 at 06:04:32AM +0300, Michael S. Tsirkin wrote: > > > So I guess at this point, we can teach vhost-user in qemu > > > that version 1 implies any_layout but only for machine types > > > qemu 2.8 and up. It sets a bad precedent but oh well. > > > > It should work. > > > > --yliu > > Cool. Want to post a patch? I could have a try this week (Well, it's very unlikely though). If not, it will be postponed for a while: I am traveling next week. --yliu
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Oct 10, 2016 at 02:20:22AM +0300, Michael S. Tsirkin wrote: > On Wed, Sep 28, 2016 at 10:28:48AM +0800, Yuanhan Liu wrote: > > On Tue, Sep 27, 2016 at 10:56:40PM +0300, Michael S. Tsirkin wrote: > > > On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote: > > > > On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote: > > > > > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote: > > > > > > I assume that if using Version 1 that the bit will be ignored > > > > > > > > Yes, but I will just quote what you just said: what if the guest > > > > virtio device is a legacy device? I also gave my reasons in another > > > > email why I consistently set this flag: > > > > > > > > - we have to return all features we support to the guest. > > > > > > > > We don't know the guest is a modern or legacy device. That means > > > > we should claim we support both: VERSION_1 and ANY_LAYOUT. > > > > > > > > Assume guest is a legacy device and we just set VERSION_1 (the > > > > current > > > > case), ANY_LAYOUT will never be negotiated. > > > > > > > > - I'm following the way Linux kernel takes: it also set both features. > > > > > > > > Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_? > > > > > > > > The unset after negotiation I proposed turned out it won't work: the > > > > feature is already negotiated; unsetting it only in vhost side doesn't > > > > change anything. Besides, it may break the migration as Michael stated > > > > below. > > > > > > I think the reverse. Teach vhost user that for future machine types > > > only VERSION_1 implies ANY_LAYOUT. > > So I guess at this point, we can teach vhost-user in qemu > that version 1 implies any_layout but only for machine types > qemu 2.8 and up. It sets a bad precedent but oh well. It should work. --yliu
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote: > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote: > > > > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote: > Yes but two points. > > 1. why is this memset expensive? I don't have the exact answer, but just some rough thoughts: It's an external clib function: there is a call stack and the IP register will bounce back and forth. BTW, it's kind of overkill to use that for resetting a 14-byte structure. Some trick like *(struct virtio_net_hdr *)hdr = (struct virtio_net_hdr){ 0 }; Or even hdr->xxx = 0; hdr->yyy = 0; should behave better. There was an example: the vhost enqueue optimization patchset from Zhihong [0] uses memset, and it introduces a more than 15% drop (IIRC) on my Ivybridge server: it has no such issue on his server though. [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html --yliu > Is the test completely skipping looking > at the packet otherwise? > > 2. As long as we are doing this, see > Alignment vs. Networking > > in Documentation/unaligned-memory-access.txt > > > > From the micro-benchmark results, we can expect +10% compared to > > indirect descriptors, and +5% compared to using 2 descs in the > > virtqueue. > > Also, it should have the same benefits as indirect descriptors for 0% > > pkt loss (as we can fill 2x more packets in the virtqueue). > > > > What do you think? > > > > Thanks, > > Maxime
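The two alternatives discussed above can be compared directly. A hedged sketch follows: the field layout matches the virtio spec's `struct virtio_net_hdr` (the thread's "14 bytes" suggests a variant with extra fields, e.g. `num_buffers`, but the comparison is the same either way).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout as in the virtio spec's struct virtio_net_hdr. */
struct virtio_net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* Option 1: library call -- may or may not be inlined by the compiler. */
static void clear_hdr_memset(struct virtio_net_hdr *hdr)
{
    memset(hdr, 0, sizeof(*hdr));
}

/* Option 2: structure assignment -- typically compiles to a few stores. */
static void clear_hdr_assign(struct virtio_net_hdr *hdr)
{
    *hdr = (struct virtio_net_hdr){ 0 };
}
```

As the reply notes, a recent gcc happily inlines a fixed-size `memset` of this size, so on modern compilers both forms should generate comparable code; the 15% drop observed on the Ivy Bridge server was specific to that setup.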
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Tue, Sep 27, 2016 at 10:56:40PM +0300, Michael S. Tsirkin wrote: > On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote: > > On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote: > > > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote: > > > > I assume that if using Version 1 that the bit will be ignored > > > > Yes, but I will just quote what you just said: what if the guest > > virtio device is a legacy device? I also gave my reasons in another > > email why I consistently set this flag: > > > > - we have to return all features we support to the guest. > > > > We don't know the guest is a modern or legacy device. That means > > we should claim we support both: VERSION_1 and ANY_LAYOUT. > > > > Assume guest is a legacy device and we just set VERSION_1 (the current > > case), ANY_LAYOUT will never be negotiated. > > > > - I'm following the way Linux kernel takes: it also set both features. > > > > Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_? > > > > The unset after negotiation I proposed turned out it won't work: the > > feature is already negotiated; unsetting it only in vhost side doesn't > > change anything. Besides, it may break the migration as Michael stated > > below. > > I think the reverse. Teach vhost user that for future machine types > only VERSION_1 implies ANY_LAYOUT. > > > > > Therein lies a problem. If dpdk tweaks flags, updating it > > > will break guest migration. > > > > > > One way is to require that users specify all flags fully when > > > creating the virtio net device. > > > > Like how? By a new command line option? And user has to type > > all those features? > > Make libvirt do this. users use management normally. those that don't > likely don't migrate VMs. Fair enough. > > > > QEMU could verify that all required > > > flags are set, and fail init if not. > > > > > > This has other advantages, e.g. it adds ability to > > > init device without waiting for dpdk to connect. 
Will the feature negotiation between DPDK and QEMU still exist in your proposal? > > > > > > However, enabling each new feature would now require > > > management work. How about dpdk ships the list > > > of supported features instead? > > > Management tools could read them on source and destination > > > and select features supported on both sides. > > > > That means the management tool would somehow have a dependency on > > the DPDK project, which I have no objection to at all. But, is that > > a good idea? > > It already starts the bridge somehow, does it not? Indeed. I was initially thinking about reading the DPDK source files to determine the supported feature list, but that binding is too tight. I later realized you may ask DPDK to provide a binary to dump the list, or something like that. > > > BTW, I'm not quite sure I followed your idea. I mean, how is it supposed > > to fix the ANY_LAYOUT issue here? How will this flag be set for a > > legacy device? > > > > --yliu > > For ANY_LAYOUT, I think we should just set it in qemu, > but only for new machine types. What do you mean by "new machine types"? A virtio device with a newer virtio-spec version? > This addresses migration > concerns. To make sure I followed you, do you mean the migration issue from an older "dpdk + qemu" combo to a newer "dpdk + qemu" combo (where more new features might be shipped)? Besides that, your proposal looks like a lot of work to accomplish. Are you okay with making it simple first: set it consistently like the Linux kernel does? This would at least make ANY_LAYOUT actually be enabled for legacy devices (which are also the default ones that are widely used so far). --yliu > > But there will be more new features in the future and > it is necessary to think about how we will enable them without > breaking migration. > > -- > MST
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote: > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote: > > I assume that if using Version 1 that the bit will be ignored Yes, but I will just quote what you just said: what if the guest virtio device is a legacy device? I also gave my reasons in another email why I consistently set this flag: - we have to return all features we support to the guest. We don't know whether the guest is a modern or legacy device. That means we should claim we support both: VERSION_1 and ANY_LAYOUT. Assume the guest is a legacy device and we just set VERSION_1 (the current case): ANY_LAYOUT will never be negotiated. - I'm following the way the Linux kernel takes: it also sets both features. Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_? The unset-after-negotiation I proposed turned out not to work: the feature is already negotiated; unsetting it only on the vhost side doesn't change anything. Besides, it may break migration, as Michael stated below. > Therein lies a problem. If dpdk tweaks flags, updating it > will break guest migration. > > One way is to require that users specify all flags fully when > creating the virtio net device. Like how? By a new command line option? And the user has to type all those features? > QEMU could verify that all required > flags are set, and fail init if not. > > This has other advantages, e.g. it adds ability to > init device without waiting for dpdk to connect. > > However, enabling each new feature would now require > management work. How about dpdk ships the list > of supported features instead? > Management tools could read them on source and destination > and select features supported on both sides. That means the management tool would somehow have a dependency on the DPDK project, which I have no objection to at all. But, is that a good idea? BTW, I'm not quite sure I followed your idea. I mean, how is it supposed to fix the ANY_LAYOUT issue here?
How will this flag be set for a legacy device? --yliu
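The negotiation problem in the message above can be illustrated with the actual feature bit numbers (VIRTIO_F_ANY_LAYOUT is bit 27, VIRTIO_F_VERSION_1 is bit 32). A legacy (pre-1.0) guest only sees and acks the low 32 feature bits, so unless the host advertises ANY_LAYOUT explicitly, a legacy guest can never negotiate it, no matter what VERSION_1 "implies":

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_F_ANY_LAYOUT  (1ULL << 27)
#define VIRTIO_F_VERSION_1   (1ULL << 32)

/* Negotiated features are simply the intersection of both offers. */
static uint64_t negotiate(uint64_t host, uint64_t guest)
{
    return host & guest;
}

/* A legacy (pre-1.0) guest only sees/acks the low 32 feature bits. */
static uint64_t legacy_guest_ack(uint64_t host_features)
{
    return host_features & 0xffffffffULL;
}
```

This is why the thread argues for setting both bits consistently, as the Linux kernel does: with only VERSION_1 advertised, a legacy guest ends up with neither bit.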
Re: [Qemu-devel] [PATCH v2 22/23] vhost-user: wait until backend init is completed
On Thu, Jun 30, 2016 at 05:22:43AM -0400, Marc-André Lureau wrote: > Hi > > - Original Message - > > On Fri, Jun 24, 2016 at 03:51:09PM +0200, marcandre.lur...@redhat.com wrote: > > > From: Marc-André Lureau > > > > > > The chardev waits for an initial connection before starting qemu, > > > vhost-user wants the backend negotiation to be completed. vhost-user is > > > started in the net_vhost_user_event callback, which is synchronously > > > called after the socket is connected. > > > > > > Signed-off-by: Marc-André Lureau > > > --- > > > net/vhost-user.c | 12 +++- > > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/vhost-user.c b/net/vhost-user.c > > > index 95ed2d2..4badd9e 100644 > > > --- a/net/vhost-user.c > > > +++ b/net/vhost-user.c > > > @@ -24,6 +24,7 @@ typedef struct VhostUserState { > > > VHostNetState *vhost_net; > > > int watch; > > > uint64_t acked_features; > > > +bool started; > > > } VhostUserState; > > > > > > typedef struct VhostUserChardevProps { > > > @@ -211,6 +212,7 @@ static void net_vhost_user_event(void *opaque, int > > > event) > > > return; > > > } > > > qmp_set_link(name, true, &err); > > > +s->started = true; > > > break; > > > case CHR_EVENT_CLOSED: > > > qmp_set_link(name, false, &err); > > > @@ -248,7 +250,15 @@ static int net_vhost_user_init(NetClientState *peer, > > > const char *device, > > > s->chr = chr; > > > } > > > > > > -qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, > > > nc[0].name); > > > +do { > > > +Error *err = NULL; > > > +if (qemu_chr_wait_connected(chr, &err) < 0) { > > > +error_report_err(err); > > > +return -1; > > > +} > > > +qemu_chr_add_handlers(chr, NULL, NULL, > > > + net_vhost_user_event, nc[0].name); > > > +} while (!s->started); > > > > > > I haven't looked at your patchset carefully yet, but I did a quick test, > > and showed that above code piece just breaks vhost-user: it's a dead loop > > that vhost-user net will be initiated again and again. 
> > Interesting, thanks a lot for testing! > > The vhost-user-test works just fine, as well as vhost-user-bridge. > > > The dead loop is due to the fact that we check "s->started" corresponding to the last nc, > > while we set "s->started" corresponding to the first nc. > > I see, that makes sense with multiqueue, I'll update the code. It would be > really nice to have a basic multiqueue test in vhost-user-test. My bad. It was actually my task to add the multiqueue test: it was postponed, or even forgotten :(. If you don't mind, feel free to add it. If you don't have time for that, I will spend some time to finish what I left. --yliu
Re: [Qemu-devel] [PATCH v2 22/23] vhost-user: wait until backend init is completed
On Fri, Jun 24, 2016 at 03:51:09PM +0200, marcandre.lur...@redhat.com wrote: > From: Marc-André Lureau > > The chardev waits for an initial connection before starting qemu, > vhost-user wants the backend negotiation to be completed. vhost-user is > started in the net_vhost_user_event callback, which is synchronously > called after the socket is connected. > > Signed-off-by: Marc-André Lureau > --- > net/vhost-user.c | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/net/vhost-user.c b/net/vhost-user.c > index 95ed2d2..4badd9e 100644 > --- a/net/vhost-user.c > +++ b/net/vhost-user.c > @@ -24,6 +24,7 @@ typedef struct VhostUserState { > VHostNetState *vhost_net; > int watch; > uint64_t acked_features; > +bool started; > } VhostUserState; > > typedef struct VhostUserChardevProps { > @@ -211,6 +212,7 @@ static void net_vhost_user_event(void *opaque, int event) > return; > } > qmp_set_link(name, true, &err); > +s->started = true; > break; > case CHR_EVENT_CLOSED: > qmp_set_link(name, false, &err); > @@ -248,7 +250,15 @@ static int net_vhost_user_init(NetClientState *peer, > const char *device, > s->chr = chr; > } > > -qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, nc[0].name); > +do { > +Error *err = NULL; > +if (qemu_chr_wait_connected(chr, &err) < 0) { > +error_report_err(err); > +return -1; > +} > +qemu_chr_add_handlers(chr, NULL, NULL, > + net_vhost_user_event, nc[0].name); > +} while (!s->started); I haven't looked at your patchset carefully yet, but I did a quick test, and showed that above code piece just breaks vhost-user: it's a dead loop that vhost-user net will be initiated again and again. The dead loop is due to, we check "s->started" corresponding to last nc, while we set "s->started" corresponding to the first nc. --yliu
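The dead loop reported above can be reduced to a minimal model: with multiqueue there are N `VhostUserState` structures, the event callback (run once per connection) marks only the state it was registered with (the first queue's), but the wait loop tests the state it happens to hold, the last queue's, so its predicate never becomes true. A sketch with hypothetical names, not the actual QEMU code:

```c
#include <assert.h>
#include <stdbool.h>

#define NQUEUES 2

struct state { bool started; };

/* Models net_vhost_user_event(): only the first queue's state is marked. */
static void on_connect(struct state *s)
{
    s[0].started = true;
}

/*
 * Buggy predicate: tests the last queue's state, which is never set,
 * hence the init loop spins forever re-initiating vhost-user.
 */
static bool init_done_buggy(const struct state *s)
{
    return s[NQUEUES - 1].started;
}

/* Fixed predicate: test the same state the event callback actually sets. */
static bool init_done_fixed(const struct state *s)
{
    return s[0].started;
}
```

The fix Marc-André agreed to make is exactly to align the two sides: the flag that is set and the flag that is checked must belong to the same state object.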
Re: [Qemu-devel] [PATCH 07/10] vhost-net: save & restore vhost-user acked features
On Tue, May 10, 2016 at 06:03:57PM +0200, marcandre.lur...@redhat.com wrote: > From: Marc-André Lureau > > The initial vhost-user connection sets the features to be negotiated > with the driver. Renegotiation isn't possible without device reset. > > To handle reconnection of vhost-user backend, ensure the same set of > features are provided, and reuse already acked features. > > Signed-off-by: Marc-André Lureau > --- > hw/net/vhost_net.c | 21 - > include/net/vhost-user.h | 1 + > include/net/vhost_net.h | 3 +++ > net/vhost-user.c | 10 ++ > 4 files changed, 34 insertions(+), 1 deletion(-) > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > index 805a0df..f3df18c 100644 > --- a/hw/net/vhost_net.c > +++ b/hw/net/vhost_net.c > @@ -120,6 +120,11 @@ uint64_t vhost_net_get_max_queues(VHostNetState *net) > return net->dev.max_queues; > } > > +uint64_t vhost_net_get_acked_features(VHostNetState *net) > +{ > +return net->dev.acked_features; > +} Note that you need to add a dummy implementation for !VHOST_NET. Otherwise, the build won't work. And sorry for being noisy, but here is another ping about the status of this patch set. --yliu
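The missing piece Yuanhan points out is a stub for builds without vhost-net support: QEMU keeps such stubs alongside the real implementations so callers link either way. A hedged sketch of the pattern (the config macro and the simplified `VHostNetState` below are illustrative, not the exact QEMU file layout):

```c
#include <assert.h>
#include <stdint.h>

/* Toggle at build time; 0 models a target built without vhost-net. */
#define CONFIG_VHOST_NET 0

typedef struct VHostNetState { uint64_t acked_features; } VHostNetState;

#if CONFIG_VHOST_NET
/* Real implementation: report what the backend actually acked. */
uint64_t vhost_net_get_acked_features(VHostNetState *net)
{
    return net->acked_features;
}
#else
/* Dummy implementation: no vhost-net, so no acked features. */
uint64_t vhost_net_get_acked_features(VHostNetState *net)
{
    (void)net;
    return 0;
}
#endif
```

Without the `#else` branch, any caller of the getter fails to link on targets where vhost-net is compiled out, which is the build breakage the review comment refers to.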
Re: [Qemu-devel] [PATCH 00/10] RFCv3: vhost-user: simple reconnection support
On Sun, May 15, 2016 at 12:42:12PM +0300, Michael S. Tsirkin wrote: > > FYI, I have a follow-up series (~20 patches, > > https://github.com/elmarco/qemu/tree/vhost-user-reconnect) doing > > mostly cleanups and extra checks for disconnection at run time. In > > particular, it should avoid some obvious crashers/asserts, and > > prevent qemu from running as long as the initial vhost_user_start() > > didn't succeed (so initial flags are set). I would like to know how to > > proceed with the follow-up: should I resend the whole series, or should > > we review/merge this RFC first (even though it is known to be > > incomplete in many disconnect cases that the follow-up fixes)? > > > > thanks > > I think a gradual merge is better. +1 May I ask what the plan is to merge this patch set? DPDK has some dependency on it. --yliu
Re: [Qemu-devel] [PATCH 02/10] vubr: add client mode
Hi, Just to let you know that as a client, it need to add the reconnect ability, as the QEMU (as the server) may restart as well. Well, I'm thinking that you may think it's an example bridge only, so, let it be simple. Then I'm sorry for the noise. --yliu On Tue, May 10, 2016 at 06:03:52PM +0200, marcandre.lur...@redhat.com wrote: > From: Marc-André Lureau > > If -c is specified, vubr will try to connect to the socket instead of > listening for connections. > > Signed-off-by: Marc-André Lureau > --- > tests/vhost-user-bridge.c | 38 ++ > 1 file changed, 26 insertions(+), 12 deletions(-) > > diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c > index 0779ba2..f907ce7 100644 > --- a/tests/vhost-user-bridge.c > +++ b/tests/vhost-user-bridge.c > @@ -1204,12 +1204,13 @@ vubr_accept_cb(int sock, void *ctx) > } > > static VubrDev * > -vubr_new(const char *path) > +vubr_new(const char *path, bool client) > { > VubrDev *dev = (VubrDev *) calloc(1, sizeof(VubrDev)); > dev->nregions = 0; > int i; > struct sockaddr_un un; > +CallbackFunc cb; > size_t len; > > for (i = 0; i < MAX_NR_VIRTQUEUE; i++) { > @@ -1238,19 +1239,27 @@ vubr_new(const char *path) > un.sun_family = AF_UNIX; > strcpy(un.sun_path, path); > len = sizeof(un.sun_family) + strlen(path); > -unlink(path); > > -if (bind(dev->sock, (struct sockaddr *) &un, len) == -1) { > -vubr_die("bind"); > -} > +if (!client) { > +unlink(path); > + > +if (bind(dev->sock, (struct sockaddr *) &un, len) == -1) { > +vubr_die("bind"); > +} > > -if (listen(dev->sock, 1) == -1) { > -vubr_die("listen"); > +if (listen(dev->sock, 1) == -1) { > +vubr_die("listen"); > +} > +cb = vubr_accept_cb; > +} else { > +if (connect(dev->sock, (struct sockaddr *)&un, len) == -1) { > +vubr_die("connect"); > +} > +cb = vubr_receive_cb; > } > > dispatcher_init(&dev->dispatcher); > -dispatcher_add(&dev->dispatcher, dev->sock, (void *)dev, > - vubr_accept_cb); > +dispatcher_add(&dev->dispatcher, dev->sock, (void *)dev, cb); > > DPRINT("Waiting 
for connections on UNIX socket %s ...\n", path); > return dev; > @@ -1369,8 +1378,9 @@ main(int argc, char *argv[]) > { > VubrDev *dev; > int opt; > +bool client = false; > > -while ((opt = getopt(argc, argv, "l:r:u:")) != -1) { > +while ((opt = getopt(argc, argv, "l:r:u:c")) != -1) { > > switch (opt) { > case 'l': > @@ -1386,16 +1396,20 @@ main(int argc, char *argv[]) > case 'u': > ud_socket_path = strdup(optarg); > break; > +case 'c': > +client = true; > +break; > default: > goto out; > } > } > > -DPRINT("ud socket: %s\n", ud_socket_path); > +DPRINT("ud socket: %s (%s)\n", ud_socket_path, > + client ? "client" : "server"); > DPRINT("local: %s:%s\n", lhost, lport); > DPRINT("remote:%s:%s\n", rhost, rport); > > -dev = vubr_new(ud_socket_path); > +dev = vubr_new(ud_socket_path, client); > if (!dev) { > return 1; > } > -- > 2.7.4
Re: [Qemu-devel] [PATCH 00/10] RFCv3: vhost-user: simple reconnection support
Hi Marc, I was also thinking we should do it in this way first. That simplifies things. So, feel free to add: Tested-by: Yuanhan Liu Reviewed-by: Yuanhan Liu (well, I didn't review the test case carefully). Thanks. --yliu On Tue, May 10, 2016 at 06:03:50PM +0200, marcandre.lur...@redhat.com wrote: > From: Marc-André Lureau > > Hi, > > In a previous series "RFCv2: vhost-user: shutdown and reconnection", I > proposed to add a new slave request to handle graceful shutdown, for > both qemu configurations, server or client, while keeping the guest > running with link down status. > > However, for the simple case where qemu is configured as server, and > the backend processes packets in order and disconnects, it is possible > for the backend to recover after reconnection by discarding the qemu > SET_VRING_BASE value and resuming from the used->index instead. This > simplifies the reconnection support in this particular situation (it > would make sense to have the backend declare this behaviour with an > extra flag). > > The guest won't be notified of link status changes and queues may not > be processed in a timely manner; also, qemu may assert if some > vhost-user commands happen while the backend is disconnected. > So the reconnection must happen "quickly enough" here. In order to > keep the series relatively small, these further problems will be > addressed later. > > This series demonstrates simple reconnect support for vubr > (configured as client and qemu as server), and includes some nice-to-have > fixes and a simple test.
> > Marc-André Lureau (8): > vubr: add client mode > vubr: workaround stale vring base > vhost-user: disconnect on start failure > vhost-net: do not crash if backend is not present > vhost-net: save & restore vhost-user acked features > vhost-net: save & restore vring enable state > tests: append i386 tests > test: start vhost-user reconnect test > > Tetsuya Mukawa (2): > vhost-user: add ability to know vhost-user backend disconnection > qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd > > hw/net/vhost_net.c| 39 ++- > include/net/net.h | 1 + > include/net/vhost-user.h | 1 + > include/net/vhost_net.h | 3 ++ > include/sysemu/char.h | 7 +++ > net/vhost-user.c | 32 +++- > qemu-char.c | 8 +++ > tests/Makefile| 2 +- > tests/vhost-user-bridge.c | 45 - > tests/vhost-user-test.c | 123 > +++--- > 10 files changed, 228 insertions(+), 33 deletions(-) > > -- > 2.7.4
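The recovery trick in the cover letter above — discarding the replayed `SET_VRING_BASE` value and resuming from `used->index` — works because a backend that processes descriptors strictly in order has, by definition, consumed exactly `used->idx` entries at the moment it dies. A minimal model (the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Just the piece of the used ring that matters for recovery. */
struct vring_used { uint16_t idx; /* entries the backend has consumed */ };

/*
 * On reconnect, QEMU replays SET_VRING_BASE with a possibly stale value.
 * An in-order backend can ignore it and trust the used ring instead,
 * since used->idx survives in guest memory across the backend restart.
 */
static uint16_t resume_point(uint16_t stale_vring_base,
                             const struct vring_used *used)
{
    (void)stale_vring_base;      /* deliberately discarded */
    return used->idx;            /* next avail entry to process */
}
```

This is also why the cover letter suggests a flag for the backend to declare the behaviour: an out-of-order backend cannot make this inference, and would need the full graceful-shutdown protocol from the RFCv2 series instead.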
Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
Hello, On Wed, May 04, 2016 at 10:13:49PM +0300, Michael S. Tsirkin wrote: > How do you avoid it? > > > > Management is required to make this robust, auto-reconnect > > > is handy for people bypassing management. > > > > tbh, I don't like autoreconnect. My previous series didn't include > > this and assumed the feature would be supported only when qemu is > > configured to be the server. I added reconnect upon request by users. > > I don't have better solutions so OK I guess. Yes, it's a request from me :) Well, there may be a few others who also requested that. The reason I asked for this is that, so far, we have only one vhost-user frontend, and that is QEMU. But we may have many backends; I'm aware of 4 at the time of writing, including the vubr from QEMU. While we could do the vhost-user client and reconnect implementation on all backends, it's clear that implementing it in the only frontend (QEMU) brings more benefits. OTOH, I could implement DPDK vhost-user as a client and try reconnect there, if that could make things a bit easier. --yliu
Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
On Mon, May 02, 2016 at 03:04:30PM +0300, Michael S. Tsirkin wrote: > On Mon, May 02, 2016 at 01:29:18PM +0200, Marc-André Lureau wrote: > > Hi > > > > On Mon, May 2, 2016 at 12:45 PM, Michael S. Tsirkin wrote: > > > 1. Graceful disconnect > > > - One should be able to do vmstop, disconnect, connect then vm start > > > This looks like a nice intermediate step. > > > - Why would we always assume it's always remote initiating the disconnect? > > > Monitor commands for disconnecting seem called for. > > > > Those two solutions involve VM management. This looks more complex to > > communicate/synchronize vhost-user backend & vm management & qemu. The > > use case I cover is request from the backend to shutdown, > > Right, but flushing buffers + closing the socket looks like > a cleaner interface than a ton of messages going back and forth. I'd agree with Michael on that. It needs more care when dealing with two stream buffers: you can't quit the backend right after the shutdown request; instead, you have to wait for the ACK from QEMU. Making the request from QEMU avoids that. --yliu
Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
On Mon, May 02, 2016 at 01:45:31PM +0300, Michael S. Tsirkin wrote: > On Sun, May 01, 2016 at 02:04:42PM -0700, Yuanhan Liu wrote: > > On Sun, May 01, 2016 at 02:37:19PM +0300, Michael S. Tsirkin wrote: > > > On Fri, Apr 29, 2016 at 10:48:35AM -0700, Yuanhan Liu wrote: > > > > > But, I > > > > > would worry first about a backend that crashes that it may corrupt the > > > > > VM memory too... > > > > > > > > Not quite sure I follow this. But, backend just touches the virtio > > > > related memory, so it will do no harm to the VM? > > > > > > It crashed so apparently it's not behaving as designed - > > > how can we be sure it does not harm the VM? > > > > Hi Michael, > > > > What I know so far, a crash might could corrupt the virtio memory in two > > ways: > > > > - vring_used_elem might be in stale state when we are in the middle of > > updating used vring. Say, we might have updated the "id" field, but > > leaving "len" untouched. > > > > - vring desc buff might also be in stale state, when we are copying > > data into it. > > > - a random write into VM memory due to backend bug corrupts it. > > > However, the two issues will not be real issue, as used->idx is not > > updated in both case. Thefore, those corrupted memory will not be > > processed by guest. So, no harm I'd say. Or, am I missing anything? > > Why is backend crashing? It shouldn't so maybe this means it's > memory is corrupt. That is the claim. As far as virtio memory is not corrupted (or even corrupt in above ways), there would be no issue. But, yes, you made a good point: we make no guarantees that it's not the virtio memory corruption caused the crash. > > BTW, Michael, what's your thoughts on the whole reconnect stuff? > > > > --yliu > > My thoughts are that we need to split these patchsets up. > > There are several issues here: > > > 1. Graceful disconnect > - One should be able to do vmstop, disconnect, connect then vm start > This looks like a nice intermediate step. 
> - Why would we always assume it's always remote initiating the disconnect? > Monitor commands for disconnecting seem called for. A monitor command is a good suggestion. But I'm wondering why vmstop is necessary. Basically, a disconnect is akin to a cable unplug on a physical NIC; we don't need to pause it before unplugging. > > 2. Unexpected disconnect > - If buffers are processed in order or flushed before socket close, > then on disconnect status can be recovered > from ring alone. If that is of interest, we might want to add > a protocol flag for that. F_DISCONNECT_FLUSH ? Sorry, how is this flag supposed to work here? > Without this, > only graceful disconnect or reset with guest's help can be supported. > - As Marc-André said, without graceful shutdown it is not enough to > handle ring state though. We must handle socket getting disconnected > in the middle of send/receive. While it's more work, > it does seem like a nice interface to support. Same as above: what can the backend (or QEMU) do for this case without the guest's (reset) help? > - I understand the concern that existing guests do not handle NEED_RESET > status. One way to fix this would be a new feature bit. If NEED_RESET not > handled, I could check the code by myself, but I'm thinking it might be trivial for you to answer my question: how do we know that NEED_RESET is not handled? > request hot-unplug instead. Same as above, may I know how to request a hot-unplug? > > 3. Running while disconnected > - At the moment, we wait on vm start for remote to connect, > if we support disconnecting backend without stopping > we probably should also support starting it without waiting > for connection Agreed. I guess that would answer my first question: we don't actually need to do vmstop before disconnect. --yliu > - Guests expect tx buffers to be used in a timely manner, thus: > - If vm is running we must use them in qemu, maybe discarding packets > in the process.
> There already are commands for link down, I'm not sure there's value > in doing this automatically in qemu. > - Alternatively, we should also have an option to stop vm automatically (like > we do on > disk errors) to reduce number of dropped packets. > > 4. Reconnecting > - When acting as a server, we might want an option to go back to > listening state, but it should not be the only option, > there should be a monitor command for moving it back to > listening state. > - When acting as a client, auto-reconnect might be handy sometimes, but > should not be the only > option, there should be a way to manually request connect, possibly to > a different target, so a monitor command for re-connecting seems called for. > - We'll also need monitor events for disconnects so management knows it > must re-connect/restart listening. > - If we stopped VM, there could be an option to restart on reconnect. > > > -- > MST
Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
On Sun, May 01, 2016 at 02:37:19PM +0300, Michael S. Tsirkin wrote: > On Fri, Apr 29, 2016 at 10:48:35AM -0700, Yuanhan Liu wrote: > > > But, I > > > would worry first about a backend that crashes that it may corrupt the > > > VM memory too... > > > > Not quite sure I follow this. But, backend just touches the virtio > > related memory, so it will do no harm to the VM? > > It crashed so apparently it's not behaving as designed - > how can we be sure it does not harm the VM? Hi Michael, From what I know so far, a crash could corrupt the virtio memory in two ways: - vring_used_elem might be in a stale state when we are in the middle of updating the used vring. Say, we might have updated the "id" field, but left "len" untouched. - The vring desc buffer might also be in a stale state, when we are copying data into it. However, neither issue will be a real problem, as used->idx is not updated in either case. Therefore, that corrupted memory will not be processed by the guest. So, no harm I'd say. Or, am I missing anything? BTW, Michael, what are your thoughts on the whole reconnect stuff? --yliu
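The ordering argument above, that a half-written used element is harmless because used->idx is published last, can be sketched in a few lines of C. This is an illustrative model only: the field names mimic virtio's split used ring, but the layout and the C11 atomics are simplifying assumptions, not QEMU or DPDK code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Minimal model of a virtio used ring (illustrative, not the real layout).
 * The backend publishes an entry in two steps: fill the element, then bump
 * idx.  A crash between the two steps leaves a stale element that the
 * guest never consumes, because idx still holds the old count. */
#define RING_SIZE 256

struct vring_used_elem { uint32_t id; uint32_t len; };

struct vring_used {
    uint16_t flags;
    _Atomic uint16_t idx;                 /* count of published entries */
    struct vring_used_elem ring[RING_SIZE];
};

static void backend_push_used(struct vring_used *u, uint32_t id, uint32_t len)
{
    uint16_t idx = atomic_load_explicit(&u->idx, memory_order_relaxed);
    u->ring[idx % RING_SIZE].id  = id;
    u->ring[idx % RING_SIZE].len = len;
    /* release: the element is fully written before idx moves */
    atomic_store_explicit(&u->idx, (uint16_t)(idx + 1), memory_order_release);
}

/* Guest side: consumes an entry only if idx says it was published. */
static int guest_pop_used(struct vring_used *u, uint16_t last_seen,
                          struct vring_used_elem *out)
{
    if (atomic_load_explicit(&u->idx, memory_order_acquire) == last_seen)
        return 0;                         /* nothing new published */
    *out = u->ring[last_seen % RING_SIZE];
    return 1;
}
```

If the backend crashes after writing the element but before the release store to idx, the guest's acquire load still returns the old count, so the stale element is invisible, which is exactly the "no harm" claim made here.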
Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
On Fri, Apr 29, 2016 at 12:40:09PM +0200, Marc-André Lureau wrote: > Hi > > On Thu, Apr 28, 2016 at 7:23 AM, Yuanhan Liu > wrote: > > On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lur...@redhat.com wrote: > >> From: Marc-André Lureau > >> +Slave message types > >> +--- > >> + > >> + * VHOST_USER_SLAVE_SHUTDOWN: > >> + Id: 1 > >> + Master payload: N/A > >> + Slave payload: u64 > >> + > >> + Request the master to shutdown the slave. A 0 reply is for > >> + success, in which case the slave may close all connections > >> + immediately and quit. A non-zero reply cancels the request. > >> + > >> + Before a reply comes, the master may make other requests in > >> + order to flush or sync state. > > > > Hi all, > > > > I threw this proposal as well as DPDK's implementation to our customer > > (OVS, Openstack and some other teams) who made such request before. I'm > > sorry to say that none of them really liked that we can't handle crash. > > Making reconnect work from a vhost-user backend crash is exactly something > > they are after. > > Handling crashes is not quite the same as what I propose here. Yes, I know. However, handling crashes is exactly what our customers want. And I just want to let you know that, say, I don't ask you to do that :) > I see > it as a different goal. But I doubt about usefulness and reliability > of a backend that crashes. Agreed. However, I guess you have to admit that crashes just happen. The kernel sometimes crashes, too. So, it would be great if we could make the whole stack work again after an unexpected crash (say, of OVS), without restarting all the guests. > In many case, vhost-user was designed after > kernel vhost, and qemu code has the same expectation about the kernel > or the vhost-user backend: many calls are sync and will simply > assert() on unexpected results. I guess we could at least try to diminish that, if we can't avoid it completely. > > And to handle the crash, I was thinking of the proposal from Michael.
> > That is to do reset from the guest OS. This would fix this issue > > ultimately. However, old kernel will not benefit from this, as well > > as other guest other than Linux, making it not that useful for current > > usage. > > Yes, I hope Michael can help with that, I am not very familiar with > the kernel code. > > > Thinking of that the VHOST_USER_SLAVE_SHUTDOWN just gives QEMU a chance > > to get the vring base (last used idx) from the backend, Huawei suggests > > Right, but after this message, the backend should have flushed all > pending ring packets and stop processing them. So it's also a clean > sync point. > > > that we could still make it in a consistent state after the crash, if > > we get the vring base from vring->used->idx. That worked as expected > > You can have a backend that would have already processed packets and > not updated used idx. You could also have out-of-order packets in > flights (ex: desc 1-2-3 avail, 1-3 used, 2 pending..). I can't see a > clean way to restore this, but to reset the queues and start over, > with either packet loss or packet duplication. Judging that it (a crash or restart) happens so rarely, I don't think it matters. Moreover, doesn't that happen in the real world anyway :) > If the backend > guarantees to process packets in order, it may simplify things, but it > would be a special case. Well, it's more of a backend thing: it's up to the backend to set a saner vring base, as stated in the above proposal. Therefore, I would not say it's a special case. > > > from my test. The only tricky thing might be how to detect a crash, > > and we could do a simple compare of the vring base from QEMU with > > the vring->used->idx at the initiation stage. If mismatch found, get > > it from vring->used->idx instead. > > I don't follow, would the backend restore its last vring->used->idx > after a crash? Yes, restored from the SET_VRING_BASE from QEMU. But it's a stale value, normally 0 if no start/stop happened before.
Therefore, we can't use it after the crash, instead, we could try to detect the mismatch and try to fix it at SET_VRING_ADDR request. > > > Comments/thoughts? Is it a solid enough solution to you? If so, we > > could make things much simpler, and what's most important, we could > > be able to handle crash. > > That's not so easy, many of the vhost_ops->vhost*() are followed by > assert(r>0), which will be hard to change to handle failure. But, I > would worry first about a backend that crashes that it may corrupt the > VM memory too... Not quite sure I follow this. But, backend just touches the virtio related memory, so it will do no harm to the VM? --yliu
Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lur...@redhat.com wrote: > From: Marc-André Lureau > > Signed-off-by: Marc-André Lureau > --- > docs/specs/vhost-user.txt | 15 +++ > hw/virtio/vhost-user.c| 16 > 2 files changed, 31 insertions(+) > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > index 8a635fa..60d6d13 100644 > --- a/docs/specs/vhost-user.txt > +++ b/docs/specs/vhost-user.txt > @@ -487,3 +487,18 @@ Message types >request to the master. It is passed in the ancillary data. >This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL >feature is available. > + > +Slave message types > +--- > + > + * VHOST_USER_SLAVE_SHUTDOWN: > + Id: 1 > + Master payload: N/A > + Slave payload: u64 > + > + Request the master to shutdown the slave. A 0 reply is for > + success, in which case the slave may close all connections > + immediately and quit. A non-zero reply cancels the request. > + > + Before a reply comes, the master may make other requests in > + order to flush or sync state. Hi all, I threw this proposal as well as DPDK's implementation to our customer (OVS, Openstack and some other teams) who made such request before. I'm sorry to say that none of them really liked that we can't handle crash. Making reconnect work from a vhost-user backend crash is exactly something they are after. And to handle the crash, I was thinking of the proposal from Michael. That is to do reset from the guest OS. This would fix this issue ultimately. However, old kernel will not benefit from this, as well as other guest other than Linux, making it not that useful for current usage. Thinking of that the VHOST_USER_SLAVE_SHUTDOWN just gives QEMU a chance to get the vring base (last used idx) from the backend, Huawei suggests that we could still make it in a consistent state after the crash, if we get the vring base from vring->used->idx. That worked as expected from my test. 
The only tricky thing might be how to detect a crash, and we could do a simple comparison of the vring base from QEMU with the vring->used->idx at the initiation stage. If a mismatch is found, get it from vring->used->idx instead. Comments/thoughts? Does it sound like a solid enough solution to you? If so, we could make things much simpler, and, what's most important, we would be able to handle crashes. --yliu
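The recovery rule proposed above is small enough to state directly as code. A minimal sketch, with a made-up helper name; the real check would live wherever the backend handles SET_VRING_BASE:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the proposed crash-detection rule: after reconnect, QEMU resends
 * the vring base it last saved via SET_VRING_BASE.  Across a backend crash
 * this value is stale (typically 0), while used->idx reflects what was
 * actually processed, since an in-order backend bumps used->idx only after a
 * buffer is fully handled.  So on mismatch, trust the ring. */
static uint16_t resume_vring_base(uint16_t base_from_qemu, uint16_t used_idx)
{
    return base_from_qemu == used_idx ? base_from_qemu : used_idx;
}
```

This only holds under the in-order-processing assumption discussed in the thread; with out-of-order completion there is no single index to resume from.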
Re: [Qemu-devel] [PATCH 04/18] char: add wait support for reconnect
On Fri, Apr 01, 2016 at 01:16:14PM +0200, marcandre.lur...@redhat.com wrote: > From: Marc-André Lureau > > Until now, 'wait' was solely used for listening sockets. However, it can > also be useful for 'reconnect' socket kind, where the first open must > succeed before continuing. > > This allows for instance (with other support patches) having vhost-user > wait for an initial connection to setup the vhost-net, before eventually > accepting new connections. I have met a tricky issue about this patch. Assume the socket exist in before we start QEMU, but somehow we can not connect to it, say due to permission error. In such case, QEMU would wait there forever, without any error messages. I would assume it's a bug, right? --yliu > > Signed-off-by: Marc-André Lureau > --- > qemu-char.c | 31 +++ > 1 file changed, 23 insertions(+), 8 deletions(-) > > diff --git a/qemu-char.c b/qemu-char.c > index 8702931..3e25c08 100644 > --- a/qemu-char.c > +++ b/qemu-char.c > @@ -3659,7 +3659,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, > ChardevBackend *backend, >Error **errp) > { > bool is_listen = qemu_opt_get_bool(opts, "server", false); > -bool is_waitconnect = is_listen && qemu_opt_get_bool(opts, "wait", true); > +bool is_wait= qemu_opt_get_bool(opts, "wait", is_listen); > bool is_telnet = qemu_opt_get_bool(opts, "telnet", false); > bool do_nodelay = !qemu_opt_get_bool(opts, "delay", true); > int64_t reconnect = qemu_opt_get_number(opts, "reconnect", 0); > @@ -3696,7 +3696,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, > ChardevBackend *backend, > sock->has_telnet = true; > sock->telnet = is_telnet; > sock->has_wait = true; > -sock->wait = is_waitconnect; > +sock->wait = is_wait; > sock->has_reconnect = true; > sock->reconnect = reconnect; > sock->tls_creds = g_strdup(tls_creds); > @@ -4350,7 +4350,7 @@ static CharDriverState *qmp_chardev_open_socket(const > char *id, > bool do_nodelay = sock->has_nodelay ? 
sock->nodelay : false; > bool is_listen = sock->has_server ? sock->server : true; > bool is_telnet = sock->has_telnet ? sock->telnet : false; > -bool is_waitconnect = sock->has_wait? sock->wait: false; > +bool is_wait= sock->has_wait? sock->wait: false; > int64_t reconnect = sock->has_reconnect ? sock->reconnect : 0; > ChardevCommon *common = qapi_ChardevSocket_base(sock); > QIOChannelSocket *sioc = NULL; > @@ -4424,7 +4424,7 @@ static CharDriverState *qmp_chardev_open_socket(const > char *id, > } > > sioc = qio_channel_socket_new(); > -if (s->reconnect_time) { > +if (s->reconnect_time && !is_wait) { > qio_channel_socket_connect_async(sioc, s->addr, > qemu_chr_socket_connected, > chr, NULL); > @@ -4433,7 +4433,7 @@ static CharDriverState *qmp_chardev_open_socket(const > char *id, > goto error; > } > s->listen_ioc = sioc; > -if (is_waitconnect) { > +if (is_wait) { > trace_char_socket_waiting(chr->filename); > tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr); > } > @@ -4443,9 +4443,24 @@ static CharDriverState *qmp_chardev_open_socket(const > char *id, > QIO_CHANNEL(s->listen_ioc), G_IO_IN, tcp_chr_accept, chr, > NULL); > } > } else { > -if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) { > -goto error; > -} > +do { > +Error *err = NULL; > + > +if (qio_channel_socket_connect_sync(sioc, s->addr, &err) < 0) { > +if (reconnect) { > +trace_char_socket_reconnect_error(chr->label, > + error_get_pretty(err)); > +error_free(err); > +continue; > +} else { > +error_propagate(errp, err); > +goto error; > +} > +} else { > +break; > +} > +} while (true); > + > tcp_chr_new_client(chr, sioc); > object_unref(OBJECT(sioc)); > } > -- > 2.5.5
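The do/while added by the quoted hunk can be sketched as a standalone connect-with-retry loop. The version below is an illustrative sketch, not QEMU's actual code; unlike the patch, it logs every failure and bounds the number of attempts, which would also avoid the silent forever-wait on a permanent error (such as the permission problem raised above):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to a Unix socket, retrying on failure.  Every failed attempt is
 * reported, and a permanent error (EACCES, ENOENT, ...) can only stall the
 * caller for max_tries * retry_delay_sec seconds, never forever. */
static int connect_with_retry(const char *path, int max_tries,
                              unsigned retry_delay_sec)
{
    for (int i = 0; i < max_tries; i++) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_un addr;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return fd;                    /* connected */

        fprintf(stderr, "connect %s failed: %s (try %d/%d)\n",
                path, strerror(errno), i + 1, max_tries);
        close(fd);
        sleep(retry_delay_sec);
    }
    return -1;                            /* permanent failure */
}
```

Whether QEMU should give up or keep retrying is a policy question, but at minimum reporting the error each round keeps the hang diagnosable.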
Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
On Wed, Apr 13, 2016 at 11:43:56PM +0200, Marc-André Lureau wrote: > On Wed, Apr 13, 2016 at 7:32 PM, Yuanhan Liu > wrote: > >> > >> > I'm asking because I found a seg fault issue sometimes, > >> > due to opaque is NULL. > > > > Oh, I was wrong, it's u being NULL, but not opaque. > >> > > >> > >> I would be interested to see the backtrace or have a reproducer. > > > > It's a normal test steps: start a vhost-user switch (I'm using DPDK > > vhost-switch example), kill it, and wait for a while (something like > > more than 10s or even longer), then I saw a seg fault: > > > > (gdb) p dev > > $4 = (struct vhost_dev *) 0x56571bf0 > > (gdb) p u > > $5 = (struct vhost_user *) 0x0 > > (gdb) where > > #0 0x55798612 in slave_read (opaque=0x56571bf0) > > at /home/yliu/qemu/hw/virtio/vhost-user.c:539 > > #1 0x55a343a4 in aio_dispatch (ctx=0x5655f560) at > > /home/yliu/qemu/aio-posix.c:327 > > #2 0x55a2738b in aio_ctx_dispatch (source=0x5655f560, > > callback=0x0, user_data=0x0) > > at /home/yliu/qemu/async.c:233 > > #3 0x751032a6 in g_main_context_dispatch () from > > /lib64/libglib-2.0.so.0 > > #4 0x55a3239e in glib_pollfds_poll () at > > /home/yliu/qemu/main-loop.c:213 > > #5 0x55a3247b in os_host_main_loop_wait (timeout=29875848) at > > /home/yliu/qemu/main-loop.c:258 > > #6 0x55a3252b in main_loop_wait (nonblocking=0) at > > /home/yliu/qemu/main-loop.c:506 > > #7 0x55846e35 in main_loop () at /home/yliu/qemu/vl.c:1934 > > #8 0x5584e6bf in main (argc=31, argv=0x7fffe078, > > envp=0x7fffe178) > > at /home/yliu/qemu/vl.c:4658 > > > > This patch set doesn't try to handle crashes from backend. This would > require a much more detailed study of the existing code path. A lot of > places assume the backend is fully working as expected. I think > handling backend crashes should be a different, later, patch set. Oh, sorry for not making it clear. I actually did the kill by "ctrl-c". It then is captured to send a SLAVE_SHUTDOWN request. So, I would say it's a normal quit. --yliu
Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
On Wed, Apr 13, 2016 at 05:51:15AM -0400, Marc-André Lureau wrote: > Hi > > - Original Message - > > Hi Marc, > > > > First of all, sorry again for late response! > > > > Last time I tried with your first version, I found few issues related > > with reconnect, mainly on the acked_feautres lost. While checking your > > new code, I found that you've already solved that, which is great. > > > > So, I tried harder this time, your patches work great, except that I > > found few nits. > > > > On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lur...@redhat.com wrote: > > > From: Marc-André Lureau > > ... > > > +Slave message types > > > +--- > > > + > > > + * VHOST_USER_SLAVE_SHUTDOWN: > > > + Id: 1 > > > + Master payload: N/A > > > + Slave payload: u64 > > > + > > > + Request the master to shutdown the slave. A 0 reply is for > > > + success, in which case the slave may close all connections > > > + immediately and quit. > > > > Assume we are using ovs + dpdk here, that we could have two > > vhost-user connections. While ovs tries to initiate a restart, > > it might unregister the two connections one by one. In such > > case, two VHOST_USER_SLAVE_SHUTDOWN request will be sent, > > and two replies will get. Therefore, I don't think it's a > > proper ask here to let the backend implementation to do quit > > here. > > > > On success reply, the master sent all the commands to finish the connection. > So the slave must flush/finish all pending requests first. Yes, that's okay. Here I just mean the "close __all__ connections" and "quit" part. First, the slave should do the cleanup/flush/finish for its own connection, but not for all of them, right? Second, as stated, quitting might not make sense, as we may have more connections. > I think this should be enough, otherwise we may need a new explicit message?
> > > > > > > > > switch (msg.request) { > > > +case VHOST_USER_SLAVE_SHUTDOWN: { > > > +uint64_t success = 1; /* 0 is for success */ > > > +if (dev->stop) { > > > +dev->stop(dev); > > > +success = 0; > > > +} > > > +msg.payload.u64 = success; > > > +msg.size = sizeof(msg.payload.u64); > > > +size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, > > > 0); > > > +if (size != VHOST_USER_HDR_SIZE + msg.size) { > > > +error_report("Failed to write reply."); > > > +} > > > +break; > > > > You might want to remove the slave_fd from watch list? We > > might also need to close slave_fd here, assuming that we > > will no longer use it when VHOST_USER_SLAVE_SHUTDOWN is > > received? > > Makes sense, I will change that in next update. > > > I'm asking because I found a seg fault issue sometimes, > > due to opaque is NULL. Oh, I was wrong, it's u being NULL, but not opaque. > > > > I would be interested to see the backtrace or have a reproducer. It's a normal test steps: start a vhost-user switch (I'm using DPDK vhost-switch example), kill it, and wait for a while (something like more than 10s or even longer), then I saw a seg fault: (gdb) p dev $4 = (struct vhost_dev *) 0x56571bf0 (gdb) p u $5 = (struct vhost_user *) 0x0 (gdb) where #0 0x55798612 in slave_read (opaque=0x56571bf0) at /home/yliu/qemu/hw/virtio/vhost-user.c:539 #1 0x55a343a4 in aio_dispatch (ctx=0x5655f560) at /home/yliu/qemu/aio-posix.c:327 #2 0x55a2738b in aio_ctx_dispatch (source=0x5655f560, callback=0x0, user_data=0x0) at /home/yliu/qemu/async.c:233 #3 0x751032a6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 #4 0x55a3239e in glib_pollfds_poll () at /home/yliu/qemu/main-loop.c:213 #5 0x55a3247b in os_host_main_loop_wait (timeout=29875848) at /home/yliu/qemu/main-loop.c:258 #6 0x55a3252b in main_loop_wait (nonblocking=0) at /home/yliu/qemu/main-loop.c:506 #7 0x55846e35 in main_loop () at /home/yliu/qemu/vl.c:1934 #8 0x5584e6bf in main (argc=31, argv=0x7fffe078, envp=0x7fffe178) at 
/home/yliu/qemu/vl.c:4658 --yliu
Re: [Qemu-devel] [PATCH 18/18] test: add shutdown support vubr test
On Fri, Apr 01, 2016 at 01:16:28PM +0200, marcandre.lur...@redhat.com wrote: > +static void > +vubr_handle_slave_reply(VhostUserMsg *vmsg) > +{ > +DPRINT( > +"== Vhost slave reply from QEMU > ==\n"); > +DPRINT("Request: %s (%d)\n", vubr_slave_request_str[vmsg->request], > + vmsg->request); > +DPRINT("Flags: 0x%x\n", vmsg->flags); > +DPRINT("Size:%d\n", vmsg->size); > + > +switch (vmsg->request) { > +case VHOST_USER_SLAVE_SHUTDOWN: > +DPRINT("Shutdown success: 0x%016"PRIx64"\n", vmsg->payload.u64); > +if (vmsg->payload.u64 == 0) { > +exit(0); > +} > +default: > +DPRINT("Invalid slave reply"); > +}; ^^ Minor nit: redundant ';'. --yliu
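Beyond the stray ';', note that the quoted case also lacks a break, so a shutdown reply falls through into the "Invalid slave reply" branch. For illustration, here is a standalone version of the handler with both issues addressed; the types and the return-value convention are stand-ins, not vubr's actual API:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Minimal stand-ins for the vubr types (illustrative only). */
typedef struct VhostUserMsg {
    int request;
    uint64_t u64;            /* stands in for payload.u64 */
} VhostUserMsg;

enum { VHOST_USER_SLAVE_SHUTDOWN = 1 };

/* Returns 1 when the reply asks us to exit, 0 otherwise.  exit() itself is
 * left to the caller so the decision logic stays testable.  Returning from
 * each case (instead of falling through) is the fix for the missing break. */
static int vubr_handle_slave_reply(const VhostUserMsg *vmsg)
{
    switch (vmsg->request) {
    case VHOST_USER_SLAVE_SHUTDOWN:
        printf("Shutdown reply: 0x%016" PRIx64 "\n", vmsg->u64);
        return vmsg->u64 == 0;   /* 0 means the master accepted the shutdown */
    default:
        printf("Invalid slave reply\n");
        return 0;
    }
}
```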
Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
Hi Marc, First of all, sorry again for the late response! Last time I tried with your first version, I found a few issues related to reconnect, mainly the acked_features being lost. While checking your new code, I found that you've already solved that, which is great. So, I tried harder this time; your patches work great, except that I found a few nits. On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lur...@redhat.com wrote: > From: Marc-André Lureau ... > +Slave message types > +--- > + > + * VHOST_USER_SLAVE_SHUTDOWN: > + Id: 1 > + Master payload: N/A > + Slave payload: u64 > + > + Request the master to shutdown the slave. A 0 reply is for > + success, in which case the slave may close all connections > + immediately and quit. Assume we are using OVS + DPDK here, so that we could have two vhost-user connections. While OVS tries to initiate a restart, it might unregister the two connections one by one. In such a case, two VHOST_USER_SLAVE_SHUTDOWN requests will be sent, and two replies will be received. Therefore, I don't think it's proper to ask the backend implementation to quit here. > > switch (msg.request) { > +case VHOST_USER_SLAVE_SHUTDOWN: { > +uint64_t success = 1; /* 0 is for success */ > +if (dev->stop) { > +dev->stop(dev); > +success = 0; > +} > +msg.payload.u64 = success; > +msg.size = sizeof(msg.payload.u64); > +size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0); > +if (size != VHOST_USER_HDR_SIZE + msg.size) { > +error_report("Failed to write reply."); > +} > +break; You might want to remove the slave_fd from the watch list? We might also need to close slave_fd here, assuming that we will no longer use it once VHOST_USER_SLAVE_SHUTDOWN is received? I'm asking because I found a seg fault issue sometimes, because opaque is NULL. --yliu
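The cleanup being suggested, unregister slave_fd from the event loop and close it once the shutdown has been replied to, can be sketched as follows. The struct and the `watched` flag are illustrative stand-ins; in QEMU the unregistration would be the corresponding qemu_set_fd_handler() call, so a later event-loop dispatch cannot invoke slave_read() on a dead connection:

```c
#include <assert.h>
#include <unistd.h>

/* Illustrative stand-in for the slave-channel state; not QEMU's struct. */
struct vhost_user_state {
    int slave_fd;
    int watched;   /* models the fd handler registration */
};

/* After VHOST_USER_SLAVE_SHUTDOWN is handled: drop the watch first, then
 * close, then poison the fd so nothing reuses it. */
static void slave_shutdown_cleanup(struct vhost_user_state *u)
{
    if (u->slave_fd >= 0) {
        u->watched = 0;   /* in QEMU: qemu_set_fd_handler(slave_fd, NULL, ...) */
        close(u->slave_fd);
        u->slave_fd = -1;
    }
}
```

Ordering matters here: removing the watch before closing avoids the window where the event loop still polls a closed (or recycled) descriptor.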
Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
On Tue, Mar 29, 2016 at 12:52:32PM +0200, Marc-André Lureau wrote: > Hi > > On Tue, Mar 29, 2016 at 10:10 AM, Yuanhan Liu > wrote: > > Backend crash may be not that normal in production usage, it is in > > development stage. It would be handy if we can handle that as well, > > as it's a painful experience for me that every time I do a VM2VM test > > and once my changes to backend causes a crash (unfortunately, it > > happens often in early stage), I have to reboot the two VMs. > > > > If we have reconnect, life could be easier. I then don't need worry > > that I will mess the backend and crash it any more. > > > > And again, the reason I mentioned crash here again is not because > > we need handle it, specially. Instead, I was thinking we might be > > able to handle the two reasons both. > > I think crash could be handle with queue reset Michael proposed, but > that would probably require guest side changes. Is the reply from Michael in this thread the whole story? If not, would you please give me a link to that discussion? > > >> Conceptually, I think if we allow the backend to disconnect, it makes > >> sense that qemu is actually the socket server. But it doesn't matter > >> much, it's simple to teach qemu to reconnect a timer... > > > > We already have that, right? I mean, the char-dev "reconnect" option. > > Sure, what I mean is that it sounds cleaner from a design pov for qemu > to be the server (since it is the one actually waiting for backend in > this case), Yeah, I agree with you on that. However, think of it this way: if we let QEMU be the server and the backend be the client, the client still needs to keep trying to connect to the server when the server crashes/quits/restarts. My point is, either way, we need to bear in mind that the server could also go down (due to crashes/restarts), so we have to teach the client to reconnect when disconnected.
Judging that QEMU already has the support, I'd slightly prefer to let QEMU still be the client, and do the reconnect tries, if it works. And to make it clear, both should work, but it's more like a question which one will be a better option (to us, DPDK), QEMU be the server or the client? What's sure here is that, in either way, your patches would work and are required. > beside a timer is often a pretty rough solution to > anything. As stated above, I'm afraid that is somehow needed. It might be in QEMU or backend, though. > >> So we should > >> probably allow both cases anyway. > > > > Yes, I think both should work. I may be wrong (that I may miss > > something), but it seems it's (far) easier to me to keep QEMU > > as the client, and adding the "reconnect" option, judging that > > we have all stuff to make it work ready. In this way, > > > > - if backend crashes/quits, you just need restart the backend, > > and QEMU will retry the connection. > > > > - if QEMU crashes/quits, you just need restart the QEMU, and > > then QEMU will start a fresh connection. > > It may all depend on use cases, it's not more obvious or easy than the > other to me. > > > However, if let QEMU as the server, there are 2 major works need > > to be done in the backend side (take DPDK as example, that just > > acts as vhost-user server only so far): > > > > - Introducing a new API or extending current API to let it to > > connect the server socket from QEMU. > > > > - If QEMU crashes/quits, we need add code to backend to keep > > reconnecting unless connection is established, which is something > > similar to the "reconnect" stuff in QEMU. > > As you can see, it needs more effort (though it's not something > > you care :). And it has duplicate work. > > > Ah, I am looking at this from qemu angle, backend may need to adapt if > it doesn't already handle both "socket role" (client & server). Agreed. 
> > > >> > >> > - start DPDK vhost-switch example > >> > > >> > - start QEMU, which will connect to DPDK vhost-user > >> > > >> > link is good now. > >> > > >> > - kill DPDK vhost-switch > >> > > >> > link is broken at this stage > >> > > >> > - start DPDK vhost-switch again > >> > > >> > you will find that the link is back again. > >> > > >> > > >> > Will that makes sense to you? If so, we may need do nothing (or just > >> > very few) changes at all to DPDK to get the reconnect work. > >> > >>
Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
On Fri, Mar 25, 2016 at 07:00:15PM +0100, Marc-André Lureau wrote:
> Hi
>
> On Thu, Mar 24, 2016 at 8:10 AM, Yuanhan Liu wrote:
> >> > The following series starts from the idea that the slave can request a
> >> > "managed" shutdown instead and later recover (I guess the use case for
> >> > this is to allow, for example, updating static dispatching/filter rules
> >> > etc)
> >
> > What if the backend crashes, so that no such request will be sent? And
> > I'm wondering why this request is needed, as we are able to detect
> > the disconnect now (with your patches).

Hi, sorry for the late reply, and thanks for the input.

(CC'ed more people, as they are the eager users who wish this feature to be done ASAP; they may provide more valuable input.)

> I don't think trying to handle backend crashes is really a thing we
> need to take care of.

Yes, me neither. The reason I mentioned backend crashes is that they are one of the reasons we need the vhost-user reconnect stuff; the other reason is backend restart.

A backend crash may not be that common in production usage, but it is in the development stage. It would be handy if we could handle that as well: it's a painful experience for me that every time I do a VM2VM test and my changes to the backend cause a crash (unfortunately, it happens often in the early stage), I have to reboot the two VMs. If we have reconnect, life would be easier: I wouldn't need to worry about messing up the backend and crashing it any more.

And again, the reason I mentioned crashes here is not that we need to handle them specially. Instead, I was thinking we might be able to handle both reasons at once.

> If the backend is bad enough to crash, it may as well corrupt the
> guest memory (mst: my understanding of vhost-user is that the backend
> must be trusted, or it could just throw garbage in the queue
> descriptors, with surprising consequences, or elsewhere in the guest
> memory actually, right?).
> > BTW, you meant to let QEMU be the server and the backend be the client
> > here, right? Honestly, that's what we thought of, too, at first.
> >
> > However, I'm wondering whether we could still go with QEMU as the
> > client and the backend as the server (the default and the only way
> > DPDK supports), and let QEMU try to reconnect when the backend crashes
> > and restarts. In that case, we need to enable the "reconnect" option
> > for vhost-user, and once I have done that, it basically works in my
> > test:
>
> Conceptually, I think if we allow the backend to disconnect, it makes
> sense that qemu is actually the socket server. But it doesn't matter
> much; it's simple to teach qemu to reconnect with a timer...

We already have that, right? I mean, the char-dev "reconnect" option.

> So we should probably allow both cases anyway.

Yes, I think both should work. I may be wrong (I may be missing something), but it seems (far) easier to me to keep QEMU as the client and add the "reconnect" option, judging that we have all the pieces ready to make it work. In this way:

- if the backend crashes/quits, you just restart the backend, and QEMU will retry the connection.

- if QEMU crashes/quits, you just restart QEMU, and QEMU will start a fresh connection.

However, if QEMU is the server, there are two major pieces of work to be done on the backend side (taking DPDK as an example, which acts only as a vhost-user server so far):

- introducing a new API, or extending the current API, to let it connect to the server socket from QEMU.

- if QEMU crashes/quits, we need to add code to the backend to keep reconnecting until the connection is established, which is something similar to the "reconnect" support in QEMU.

As you can see, it needs more effort (though not effort you have to care about :). And it duplicates work.

> > - start DPDK vhost-switch example
> >
> > - start QEMU, which will connect to DPDK vhost-user
> >
> >   link is good now.
> > - kill DPDK vhost-switch
> >
> >   link is broken at this stage
> >
> > - start DPDK vhost-switch again
> >
> >   you will find that the link is back again.
> >
> > Will that make sense to you? If so, we may need to make no (or just
> > very few) changes at all to DPDK to get reconnect working.
>
> The main issue with handling crashes (gone at any time) is that the
> backend may not have time to sync the used idx (at the least). It may
> already have processed incoming packets, so on reconnect, it may
> duplicate the receiving/dispatching work.

That's not the case for the DPDK backend implementation: incoming packets won
Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
On Thu, Nov 26, 2015 at 12:33:22PM +0200, Michael S. Tsirkin wrote:
> On Wed, Sep 09, 2015 at 01:09:52AM +0200, marcandre.lur...@redhat.com wrote:
> > From: Marc-André Lureau
> >
> > In a previous series "Add feature to start QEMU without vhost-user
> > backend", Tetsuya Mukawa proposed to allow the vhost-user backend to
> > disconnect and reconnect. However, Michael Tsirkin pointed out that
> > you can't do that without extra care, because the guest and hypervisor
> > don't know the slave's ring manipulation state; there might be pending
> > replies, for example, that could be lost. He suggested resetting the
> > guest queues, but this requires kernel changes, and it may have to
> > clear the ring and lose queued packets.

Hi Michael & Marc,

I said I would like to have a try last week, and I did. However, I was only able to have a closer look at the code recently.

> > The following series starts from the idea that the slave can request a
> > "managed" shutdown instead and later recover (I guess the use case for
> > this is to allow, for example, updating static dispatching/filter rules
> > etc)

What if the backend crashes, so that no such request will be sent? And I'm wondering why this request is needed, as we are able to detect the disconnect now (with your patches).

BTW, you meant to let QEMU be the server and the backend be the client here, right? Honestly, that's what we thought of, too, at first.

However, I'm wondering whether we could still go with QEMU as the client and the backend as the server (the default and the only way DPDK supports), and let QEMU try to reconnect when the backend crashes and restarts. In that case, we need to enable the "reconnect" option for vhost-user, and once I have done that, it basically works in my test:

- start DPDK vhost-switch example

- start QEMU, which will connect to DPDK vhost-user

  link is good now.
- kill DPDK vhost-switch

  link is broken at this stage

- start DPDK vhost-switch again

  you will find that the link is back again.

Will that make sense to you? If so, we may need to make no (or just very few) changes at all to DPDK to get reconnect working.

--yliu

> I'm still not sure users actually need this. I am inclined to think we
> should teach guests to respond to NEED_RESET status. Then to handle
> disconnect, we would
> - deactivate the disconnected backend
> - stop VM, and wait for a reconnect
> - set NEED_RESET status, and re-activate the backend
>   after guest reset
>
> --
> MST
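For reference, the chardev "reconnect" option discussed in this thread is expressed on the QEMU command line roughly as below. The socket path and the id names are illustrative, and the exact option spelling may differ between QEMU versions; treat this as a sketch of the configuration, not an authoritative invocation.

```
qemu-system-x86_64 \
    -chardev socket,id=char0,path=/tmp/vhost-user.sock,reconnect=1 \
    -netdev vhost-user,id=net0,chardev=char0 \
    -device virtio-net-pci,netdev=net0 \
    ...
```

With `reconnect=1`, QEMU (acting as the socket client) retries the connection every second after the backend goes away, which is the behavior exercised by the kill-and-restart test above.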
Re: [Qemu-devel] chardev: vhost-user: reconnect issue when QEMU as server
On Wed, Mar 09, 2016 at 04:04:45PM +0100, Marc-André Lureau wrote:
> Hi
>
> On Wed, Mar 9, 2016 at 2:29 PM, Yuanhan Liu wrote:
> > I could spend more time digging into it. But before that, I want to
> > know what you guys think about it. Does that proposal make sense to
> > you? Or any better ideas?
>
> See also this thread, which I have recently rebased:
> https://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg02172.html

Brilliant! Hopefully I can get some time to have a detailed look at the code and give it some tries next week.

BTW, one question: what's the current status of it? Why is the last version from half a year ago?

--yliu

> I am trying to get the following series reviewed/merged before
> discussing the rest:
> https://lists.nongnu.org/archive/html/qemu-devel/2016-02/msg05419.html
>
> Help welcome
>
> --
> Marc-André Lureau
[Qemu-devel] chardev: vhost-user: reconnect issue when QEMU as server
Hi,

Currently, the typical usage of vhost-user is QEMU as the client, while the backend (say, DPDK vhost-user) is the server. There is a major drawback: a restart of the backend (say, for an upgrade) requires restarting QEMU.

The obvious solution would be to let QEMU be the server and the backend be the client. I did a quick try before, and it works as expected the first time the client connects to the server. However, it does not work the second (or any later) time, say after a client restart. Per my understanding so far, two factors cause this issue:

1) The QEMU socket char dev is designed so that one server accepts one client only. A server will not accept another client unless no connection is established, or the existing connection is disconnected.

2) In the vhost-user case, QEMU serves as the initiator, sending vhost-user messages on its own. In other words, QEMU does not poll the socket fd, unlike DPDK vhost-user. That is to say, QEMU has no way to know when the connection is disconnected.

Because of 1), QEMU will not be able to accept the second connection from DPDK when DPDK vhost restarts.

In summary, we need to figure out a way to detect the disconnect in QEMU. I came up with a solution that might work: poll the socket fd and peek (instead of read) the data. The peek would fail if the connection is disconnected. I had a quick hack; unluckily, it didn't work as expected. I'm thinking I might have missed something.

I could spend more time digging into it. But before that, I want to know what you guys think about it. Does that proposal make sense to you? Or any better ideas?

Thanks.

--yliu
Re: [Qemu-devel] [PATCH] vhost: drop dead code
On Wed, Dec 02, 2015 at 01:50:35PM +0200, Michael S. Tsirkin wrote:
> We dropped the implementation of vhost_dev_query,
> drop it from the header file as well.
>
> Signed-off-by: Michael S. Tsirkin

Reviewed-by: Yuanhan Liu

--yliu

> ---
>  include/hw/virtio/vhost.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 7437fd4..b60d758 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -66,7 +66,6 @@ struct vhost_dev {
>  int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>                     VhostBackendType backend_type);
>  void vhost_dev_cleanup(struct vhost_dev *hdev);
> -bool vhost_dev_query(struct vhost_dev *hdev, VirtIODevice *vdev);
>  int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>  void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>  int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
> --
> MST
Re: [Qemu-devel] [PATCH] Revert "vhost: send SET_VRING_ENABLE at start/stop"
On Thu, Nov 26, 2015 at 06:52:56PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 26, 2015 at 09:46:08AM +0800, Yuanhan Liu wrote:
> > On Wed, Nov 25, 2015 at 02:42:05PM +0200, Michael S. Tsirkin wrote:
> > > This reverts commit 3a12f32229a046f4d4ab0a3a52fb01d2d5a1ab76.
> > >
> > > In case of live migration several queues can be enabled, and not only
> > > the first one. So informing the backend that only the first queue is
> > > enabled is wrong.
> >
> > Reviewed-by: Yuanhan Liu
> >
> > BTW, we should also update the spec about ring stop, right?
> >
> > --yliu
>
> Pls take a look at for_upstream in my tree, and tell me
> whether there's anything we need to clarify.

I had a quick check and found it's fine. I was thinking we still needed to clarify that there are two ways of signaling a ring stop: VHOST_USER_GET_VRING_BASE and VHOST_USER_SET_VRING_ENABLE. Sorry for the noise, then.

--yliu
Re: [Qemu-devel] [PATCH] Revert "vhost: send SET_VRING_ENABLE at start/stop"
On Wed, Nov 25, 2015 at 02:42:05PM +0200, Michael S. Tsirkin wrote: > This reverts commit 3a12f32229a046f4d4ab0a3a52fb01d2d5a1ab76. > > In case of live migration several queues can be enabled and not only the > first one. So informing backend that only the first queue is enabled is > wrong. Reviewed-by: Yuanhan Liu BTW, we should also update the spec about ring stop, right? --yliu > > Reported-by: Thibaut Collet > Cc: Yuanhan Liu > Signed-off-by: Michael S. Tsirkin > --- > hw/virtio/vhost.c | 9 - > 1 file changed, 9 deletions(-) > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > index 1794f0d..de29968 100644 > --- a/hw/virtio/vhost.c > +++ b/hw/virtio/vhost.c > @@ -1226,11 +1226,6 @@ int vhost_dev_start(struct vhost_dev *hdev, > VirtIODevice *vdev) > } > } > > -if (hdev->vhost_ops->vhost_set_vring_enable) { > -/* only enable first vq pair by default */ > -hdev->vhost_ops->vhost_set_vring_enable(hdev, hdev->vq_index == 0); > -} > - > return 0; > fail_log: > vhost_log_put(hdev, false); > @@ -1261,10 +1256,6 @@ void vhost_dev_stop(struct vhost_dev *hdev, > VirtIODevice *vdev) > hdev->vq_index + i); > } > > -if (hdev->vhost_ops->vhost_set_vring_enable) { > -hdev->vhost_ops->vhost_set_vring_enable(hdev, 0); > -} > - > vhost_log_put(hdev, true); > hdev->started = false; > hdev->log = NULL; > -- > MST
Re: [Qemu-devel] [PATCH for-2.5 1/1] vhost-user: do not send SET_VRING_ENABLE at start
On Tue, Nov 24, 2015 at 11:23:34PM +0200, Michael S. Tsirkin wrote: > On Tue, Nov 24, 2015 at 10:05:27PM +0100, Thibaut Collet wrote: > > On Tue, Nov 24, 2015 at 9:52 PM, Michael S. Tsirkin wrote: > > > On Tue, Nov 24, 2015 at 05:10:36PM +0100, Thibaut Collet wrote: > > >> This patch reverts partially commit 3a12f32229a. > > >> > > >> In case of live migration several queues can be enabled and not only the > > >> first > > >> one. So inform backend that only the first queue is enabled is wrong. > > >> > > >> Since commit 7263a0ad7899 backend is already notified of the state of > > >> the vring > > >> through the vring attach operation. This function, called during the > > >> startup > > >> sequence, provides the correct state of the vring, even in case of live > > >> migration. > > >> > > >> So nothing has to be added to give the vring state to the backend at the > > >> startup. > > >> > > >> Signed-off-by: Thibaut Collet > > >> --- > > >> hw/virtio/vhost.c | 5 - > > >> 1 file changed, 5 deletions(-) > > >> > > >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > > >> index 1794f0d..870cd12 100644 > > >> --- a/hw/virtio/vhost.c > > >> +++ b/hw/virtio/vhost.c > > >> @@ -1226,11 +1226,6 @@ int vhost_dev_start(struct vhost_dev *hdev, > > >> VirtIODevice *vdev) > > >> } > > >> } > > >> > > >> -if (hdev->vhost_ops->vhost_set_vring_enable) { > > >> -/* only enable first vq pair by default */ > > >> -hdev->vhost_ops->vhost_set_vring_enable(hdev, hdev->vq_index == > > >> 0); > > >> -} > > >> - > > >> return 0; > > >> fail_log: > > >> vhost_log_put(hdev, false); > > >> -- > > >> 2.1.4 > > > > > > Yes - and I'm beginning to think that maybe we should revert > > > all of 3a12f32229a then, for symmetry. > > > > > > > Keep the disable vring on the stop can be useful. 
> > For example, if the VM is rebooted, all the vrings will be disabled,
> > and the backend will avoid sending packets to the VM in this case (I
> > am not sure the virtio ring address is always valid during a reboot,
> > and writing data into this memory can cause unexpected behaviour in
> > this case).
>
> I think there's still some confusion:
> writing memory can still happen even if you disable the ring,
> since the TX ring is still processed, so we write into the used ring.
>
> We call GET_VRING_BASE on stop, and that ensures rings are
> stopped.

Yes, that's what I suggested first, which also makes the logic quite simple: we use GET_VRING_BASE as the sign of vring stop, instead of GET_VRING_BASE when protocol features are not negotiated and SET_VRING_ENABLE when they are.

Michael, should I submit a revert patch, or could you do it directly?

--yliu

> > > Yuanhan, Victor - what do you think?
> > >
> > > --
> > > MST
Re: [Qemu-devel] [PATCH for-2.5] vhost-user: clarify start and enable
On Mon, Nov 23, 2015 at 12:54:56PM +0200, Michael S. Tsirkin wrote: > It seems that we currently have some duplication between > started and enabled states. > > The actual reason is that enable is not documented correctly: > what it does is connecting ring to the backend. > > This is important for MQ, because a Linux guest expects TX > packets to be completed even if it disables some queues > temporarily. Thanks for the clarification. And Reviewed-by: Yuanhan Liu --yliu > > Cc: Yuanhan Liu > Cc: Victor Kaplansky > Signed-off-by: Michael S. Tsirkin > --- > docs/specs/vhost-user.txt | 20 +++- > 1 file changed, 15 insertions(+), 5 deletions(-) > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > index 7b9cd6d..0312d40 100644 > --- a/docs/specs/vhost-user.txt > +++ b/docs/specs/vhost-user.txt > @@ -148,13 +148,23 @@ a feature bit was dedicated for this purpose: > > Starting and stopping rings > -- > -Client must only process each ring when it is both started and enabled. > +Client must only process each ring when it is started. > + > +Client must only pass data between the ring and the > +backend, when the ring is enabled. > + > +If ring is started but disabled, client must process the > +ring without talking to the backend. > + > +For example, for a networking device, in the disabled state > +client must not supply any new RX packets, but must process > +and discard any TX packets. > > If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring is > initialized > in an enabled state. > > If VHOST_USER_F_PROTOCOL_FEATURES has been negotiated, the ring is > initialized > -in a disabled state. Client must not process it until ring is enabled by > +in a disabled state. Client must not pass data to/from the backend until > ring is enabled by > VHOST_USER_SET_VRING_ENABLE with parameter 1, or after it has been disabled > by > VHOST_USER_SET_VRING_ENABLE with parameter 0. 
> > @@ -166,7 +176,7 @@ descriptor is readable) on the descriptor specified by > VHOST_USER_SET_VRING_KICK, and stop ring upon receiving > VHOST_USER_GET_VRING_BASE. > > -While processing the rings (when they are started and enabled), client must > +While processing the rings (whether they are enabled or not), client must > support changing some configuration aspects on the fly. > > Multiple queue support > @@ -309,11 +319,11 @@ Message types >Id: 4 >Master payload: N/A > > - This is no longer used. Used to be sent to request stopping > + This is no longer used. Used to be sent to request disabling >all rings, but some clients interpreted it to also discard >connection state (this interpretation would lead to bugs). >It is recommended that clients either ignore this message, > - or use it to stop all rings. > + or use it to disable all rings. > > * VHOST_USER_SET_MEM_TABLE > > -- > MST
Re: [Qemu-devel] [PATCH v4 0/5] handle vhost reset/start/stop correctly
On Fri, Nov 13, 2015 at 12:24:52PM +0200, Michael S. Tsirkin wrote: > On Fri, Nov 13, 2015 at 10:03:29AM +0800, Yuanhan Liu wrote: > > On Thu, Nov 12, 2015 at 04:44:19PM +0200, Michael S. Tsirkin wrote: > > > On Thu, Nov 12, 2015 at 10:08:15PM +0800, Yuanhan Liu wrote: > > > > On Thu, Nov 12, 2015 at 03:33:41PM +0200, Michael S. Tsirkin wrote: > > > > > On Wed, Nov 11, 2015 at 09:24:36PM +0800, Yuanhan Liu wrote: > > > > > > > > > > > > Patch 1 rename RESET_DEVICE back to RESET_OWNER > > > > > > > > > > > > Patch 2 introduced a new function: vhost_net_reset(), which is > > > > > > invoked > > > > > > when reset happens, say, driver is unloaded (unbinded). > > > > > > > > > > > > Michael suggested to do that only when MQ is not negotiated. > > > > > > However, reset happens no matter MQ is enabled or negotiated > > > > > > or not, and we should give a sign to backend to reset some > > > > > > device to a proper state after it happens. > > > > > > > > > > I don't think it's needed: we set all state at start anyway. > > > > > > > > Agree with that. > > > > > > > > > > > > > > > Note that the message sent is still RESET_OWNER. It might > > > > > > not > > > > > > be a good idea, but we could not simply rename it to > > > > > > RESET_DEVICE, > > > > > > and maybe adding another RESET_DEVICE might be better. > > > > > > > > > > So modern clients don't need this at all. Old clients need something > > > > > to > > > > > stop device, but singling out reset is not a good idea: even if driver > > > > > is unloaded, you need to do that. snabbswitch (ab)uses RESET_OWNER > > > > > for > > > > > this, so maybe RESET_OWNER should be renamed DISABLE_ALL, to just make > > > > > it stop all queues. > > > > > > > > > > Does any dpdk version that was released respond to RESET_OWNER in some > > > > > way? How exactly? > > > > > > > > It just resets some states, such as closing call fd and kick fd, > > > > unmapping buf from hugetlbfs from set_mem_table. 
> > > And is this in some dpdk version that has been released?
> >
> > Yes, it's been there a long time, and you can find it in recent
> > releases (say, v2.1).
> >
> > However, QEMU only added RESET_OWNER in v2.4.0, and it's likely that
> > we will remove it as of this version, v2.5.0.
>
> I see. So I think that if MQ is negotiated, we should not send it.
> VRING_ENABLE(0) is enough. But what if MQ is not negotiated?
> In any case, it must be done after all rings are stopped in vhost_dev_stop.
> Otherwise we can not retrieve the ring index so we can't resume.
>
> I'm not sure it's worth the hassle.
> To work correctly, when MQ is not negotiated then clients
> need to stop the ring on GET_VRING_BASE, and that seems enough
> to make everything work correctly across reboot.
> If MQ is negotiated then there's VRING_ENABLE, so hacks
> are not needed anymore.

I'm okay with that. And if so, we may need to put that kind of information into the vhost-user spec, to make it a standard.

Or, I'm wondering, could we make it consistent: send GET_VRING_BASE as the only sign of vhost dev stop, no matter whether MQ is negotiated or not?

And I'm rethinking the necessity of sending VRING_ENABLE at start/stop: is it just because it's a better name than GET_VRING_BASE? With the current code, VRING_ENABLE will be sent in the following places:

- virtio_net_set_queues, which is invoked in the virtio-net initialization stage. It is also invoked after the user executes the "ethtool -L" command.

- vhost_dev_start

- vhost_dev_stop

We might need to take different actions for each of the above cases; say, we need to release some resources (such as closing the kick fd) on vhost_dev_stop. But we definitely could not do that for VRING_ENABLE messages from ethtool, as the user can re-run the ethtool command to bring those vrings back to a functional state: doing that kind of resource release would make it impossible to bring them back.
So, it might be difficult (or, more accurately, messy) for the client to figure out which case it is: when we should release some resources, and when we should not.

--yliu

> > Cc Luke for an o
[Qemu-devel] [PATCH 1/2] vhost: let SET_VRING_ENABLE message depend on protocol feature
But not depend on the PROTOCOL_F_MQ feature bit, so that we can use SET_VRING_ENABLE to signal the backend on stop, even if MQ is disabled. That's reasonable, since we will have at least one queue pair.

Signed-off-by: Yuanhan Liu
---
 hw/virtio/vhost-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 4766f98..a9c8335 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -331,7 +331,7 @@ static int vhost_user_set_vring_enable(struct vhost_dev *dev, int enable)
         .num = enable,
     };
 
-    if (!(dev->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_MQ))) {
+    if (!virtio_has_feature(dev->features, VHOST_USER_F_PROTOCOL_FEATURES)) {
         return -1;
     }
-- 
1.9.3
[Qemu-devel] [PATCH 2/2] vhost: don't send RESET_OWNER at stop
First of all, the RESET_OWNER message is sent incorrectly, as it's sent before GET_VRING_BASE, and the reset would leave that later call with nothing correct to return. Besides, sending SET_VRING_ENABLE at stop, which is already done, makes more sense than RESET_OWNER.

Signed-off-by: Yuanhan Liu
---
 hw/net/vhost_net.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index d91b7b1..14337a4 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -292,12 +292,6 @@ static void vhost_net_stop_one(struct vhost_net *net,
             int r = vhost_ops->vhost_net_set_backend(&net->dev, &file);
             assert(r >= 0);
         }
-    } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
-        for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
-            const VhostOps *vhost_ops = net->dev.vhost_ops;
-            int r = vhost_ops->vhost_reset_device(&net->dev);
-            assert(r >= 0);
-        }
     }
     if (net->nc->info->poll) {
         net->nc->info->poll(net->nc, true);
-- 
1.9.3
Re: [Qemu-devel] [PATCH v4 0/5] handle vhost reset/start/stop correctly
On Thu, Nov 12, 2015 at 04:44:19PM +0200, Michael S. Tsirkin wrote: > On Thu, Nov 12, 2015 at 10:08:15PM +0800, Yuanhan Liu wrote: > > On Thu, Nov 12, 2015 at 03:33:41PM +0200, Michael S. Tsirkin wrote: > > > On Wed, Nov 11, 2015 at 09:24:36PM +0800, Yuanhan Liu wrote: > > > > > > > > Patch 1 rename RESET_DEVICE back to RESET_OWNER > > > > > > > > Patch 2 introduced a new function: vhost_net_reset(), which is invoked > > > > when reset happens, say, driver is unloaded (unbinded). > > > > > > > > Michael suggested to do that only when MQ is not negotiated. > > > > However, reset happens no matter MQ is enabled or negotiated > > > > or not, and we should give a sign to backend to reset some > > > > device to a proper state after it happens. > > > > > > I don't think it's needed: we set all state at start anyway. > > > > Agree with that. > > > > > > > > > Note that the message sent is still RESET_OWNER. It might not > > > > be a good idea, but we could not simply rename it to > > > > RESET_DEVICE, > > > > and maybe adding another RESET_DEVICE might be better. > > > > > > So modern clients don't need this at all. Old clients need something to > > > stop device, but singling out reset is not a good idea: even if driver > > > is unloaded, you need to do that. snabbswitch (ab)uses RESET_OWNER for > > > this, so maybe RESET_OWNER should be renamed DISABLE_ALL, to just make > > > it stop all queues. > > > > > > Does any dpdk version that was released respond to RESET_OWNER in some > > > way? How exactly? > > > > It just resets some states, such as closing call fd and kick fd, > > unmapping buf from hugetlbfs from set_mem_table. > > And is this in some dpdk version that has been released? Yes, it's been there long time ago, and you can find them in recent releases (say, v2.1). However, QEMU just added RESET_OWNER since v2.4.0, and it's likely that we will remove it since this version, v2.5.0. > > > And, apparently we could do that on stop, too. 
> > So, from this pov, we don't need RESET_OWNER.
> >
> > > > Patch 3 and 4 send SET_PROTOCOL_FEATURES at start, just like we send
> > > > SET_FEATURES.
> > > >
> > > > Patch 5 sends SET_VRING_ENABLE at start/stop.
> > > >
> > > > Michael, I intended to send it when MQ is negotiated, as you
> > > > suggested; however, I found that VHOST_USER_PROTOCOL_F_MQ is defined
> > > > in vhost-user.c, which is not accessible to vhost.c.
> > > >
> > > > Exporting it to vhost.h would resolve that; however, it's not a good
> > > > idea to move vhost-user specific stuff there. We could also introduce
> > > > another vhost callback to test whether MQ is negotiated: I just don't
> > > > think it's worth it.
> > > >
> > > > Hence, here I just used a simple test: invoke set_vring_enable() just
> > > > when it is defined. Judging that the callback itself has another MQ
> > > > check, I guess it's okay.
> > > >
> > > > And sorry that it took so long to send this version.
> > >
> > > Hmm, you are saying everyone needs SET_VRING_ENABLE?
> > > Maybe we should make SET_VRING_ENABLE depend on protocol features
> > > then, and not MQ?
> >
> > I'm thinking the same. Otherwise, there is still no way to inform the
> > backend (or client) that a vhost dev is stopped when MQ is disabled
> > (which is the default state).
> >
> > So, let's assume all clients have protocol features enabled, and send
> > SET_VRING_ENABLE at start/stop? And if a client does not, it's just
> > like we are back to QEMU v2.3, where no RESET_OWNER nor
> > SET_VRING_ENABLE messages are sent on stop: it worked before, and it
> > should also work now.
>
> So maybe we should drop RESET_OWNER from stop then?

Yeah, I agree with you. Sending a reset message at stop doesn't make too much sense.

> > > I applied patches 1 and 5 for now.
> >
> > Do you have comments about patches 3 and 4? Should we set protocol
> > features just like we set features at start?
> >
> > Thanks.
> > --yliu
>
> We don't set them at start - we set them on connect.

Okay to me. I will drop them then.

--yliu
Re: [Qemu-devel] [PATCH v4 0/5] handle vhost reset/start/stop correctly
On Thu, Nov 12, 2015 at 03:33:41PM +0200, Michael S. Tsirkin wrote: > On Wed, Nov 11, 2015 at 09:24:36PM +0800, Yuanhan Liu wrote: > > > > Patch 1 rename RESET_DEVICE back to RESET_OWNER > > > > Patch 2 introduced a new function: vhost_net_reset(), which is invoked > > when reset happens, say, driver is unloaded (unbinded). > > > > Michael suggested to do that only when MQ is not negotiated. > > However, reset happens no matter MQ is enabled or negotiated > > or not, and we should give a sign to backend to reset some > > device to a proper state after it happens. > > I don't think it's needed: we set all state at start anyway. Agree with that. > > > Note that the message sent is still RESET_OWNER. It might not > > be a good idea, but we could not simply rename it to RESET_DEVICE, > > and maybe adding another RESET_DEVICE might be better. > > So modern clients don't need this at all. Old clients need something to > stop device, but singling out reset is not a good idea: even if driver > is unloaded, you need to do that. snabbswitch (ab)uses RESET_OWNER for > this, so maybe RESET_OWNER should be renamed DISABLE_ALL, to just make > it stop all queues. > > Does any dpdk version that was released respond to RESET_OWNER in some > way? How exactly? It just resets some states, such as closing call fd and kick fd, unmapping buf from hugetlbfs from set_mem_table. And, apparently we could do that on stop, too. So, from this pov, we don't need RESET_OWNER. > > > Patch 3 and 4 send SET_PROTOCOL_FEATURES at start, just like we send > > SET_FEATURES. > > > > Patch 5 send SET_VRING_ENABLE at start/stop > > > > Michael, I intended to send it when MQ is negotiated as you > > suggested, > > however, I found that VHOST_USER_PROTOCOL_F_MQ is defined in > > vhost-user.c, > > which is not accessible to vhost.c. > > > > Exporting it to vhost.h will resolve that, however, it's not a good > > idea to move vhost user specific stuff to there. 
> > We could also introduce another vhost callback to test whether MQ is
> > negotiated: I just don't think it's worth it.
> >
> > Hence, here I just used a simple test: invoke set_vring_enable() just
> > when it is defined. Judging that the callback itself has another MQ
> > check, I guess it's okay.
> >
> > And sorry that it took so long to send this version.
>
> Hmm, you are saying everyone needs SET_VRING_ENABLE?
> Maybe we should make SET_VRING_ENABLE depend on protocol features then,
> and not MQ?

I'm thinking the same. Otherwise, there is still no way to inform the backend (or client) that a vhost dev is stopped when MQ is disabled (which is the default state).

So, let's assume all clients have protocol features enabled, and send SET_VRING_ENABLE at start/stop? And if a client does not, it's just like we are back to QEMU v2.3, where no RESET_OWNER nor SET_VRING_ENABLE messages are sent on stop: it worked before, and it should also work now.

> I applied patches 1 and 5 for now.

Do you have comments about patches 3 and 4? Should we set protocol features just like we set features at start?

Thanks.

--yliu
[Qemu-devel] [PATCH v4 5/5] vhost: send SET_VRING_ENABLE at start/stop
Send SET_VRING_ENABLE at start/stop, to give the backend an explicit
sign of our state.

Signed-off-by: Yuanhan Liu
---
 hw/virtio/vhost.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index be48511..0e956b5 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1234,6 +1234,11 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         }
     }
 
+    if (hdev->vhost_ops->vhost_set_vring_enable) {
+        /* only enable first vq pair by default */
+        hdev->vhost_ops->vhost_set_vring_enable(hdev, hdev->vq_index == 0);
+    }
+
     return 0;
 fail_log:
     vhost_log_put(hdev, false);
@@ -1264,6 +1269,10 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
                              hdev->vq_index + i);
     }
 
+    if (hdev->vhost_ops->vhost_set_vring_enable) {
+        hdev->vhost_ops->vhost_set_vring_enable(hdev, 0);
+    }
+
     vhost_log_put(hdev, true);
     hdev->started = false;
     hdev->log = NULL;
-- 
1.9.0
[Qemu-devel] [PATCH v4 4/5] vhost: send SET_PROTOCOL_FEATURES at start
So that the backend can restore the protocol features after a reset.

Signed-off-by: Yuanhan Liu
---
 hw/virtio/vhost.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index de29968..be48511 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1195,6 +1195,14 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
     if (r < 0) {
         goto fail_features;
     }
+    if (hdev->vhost_ops->vhost_set_protocol_features) {
+        r = hdev->vhost_ops->vhost_set_protocol_features(hdev,
+                hdev->protocol_features);
+        if (r < 0) {
+            goto fail_features;
+        }
+    }
+
     r = hdev->vhost_ops->vhost_set_mem_table(hdev, hdev->mem);
     if (r < 0) {
         r = -errno;
-- 
1.9.0
[Qemu-devel] [PATCH v4 1/5] vhost: rename RESET_DEVICE back to RESET_OWNER
This patch basically reverts commit d1f8b30e. It turned out that it breaks stuff, so revert it: http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html CC: "Michael S. Tsirkin" Reported-by: Paolo Bonzini Signed-off-by: Yuanhan Liu --- docs/specs/vhost-user.txt | 4 ++-- hw/virtio/vhost-backend.c | 2 +- hw/virtio/vhost-user.c | 6 +++--- linux-headers/linux/vhost.h | 2 +- tests/vhost-user-bridge.c | 6 +++--- tests/vhost-user-test.c | 4 ++-- 6 files changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index e0d71e2..d319715 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -255,10 +255,10 @@ Message types as an owner of the session. This can be used on the Slave as a "session start" flag. - * VHOST_USER_RESET_DEVICE + * VHOST_USER_RESET_OWNER Id: 4 - Equivalent ioctl: VHOST_RESET_DEVICE + Equivalent ioctl: VHOST_RESET_OWNER Master payload: N/A Issued when a new connection is about to be closed. The Master will no diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c index 1d5f684..b734a60 100644 --- a/hw/virtio/vhost-backend.c +++ b/hw/virtio/vhost-backend.c @@ -156,7 +156,7 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev) static int vhost_kernel_reset_device(struct vhost_dev *dev) { -return vhost_kernel_call(dev, VHOST_RESET_DEVICE, NULL); +return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL); } static int vhost_kernel_get_vq_index(struct vhost_dev *dev, int idx) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 83c84f1..4766f98 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -43,7 +43,7 @@ typedef enum VhostUserRequest { VHOST_USER_GET_FEATURES = 1, VHOST_USER_SET_FEATURES = 2, VHOST_USER_SET_OWNER = 3, -VHOST_USER_RESET_DEVICE = 4, +VHOST_USER_RESET_OWNER = 4, VHOST_USER_SET_MEM_TABLE = 5, VHOST_USER_SET_LOG_BASE = 6, VHOST_USER_SET_LOG_FD = 7, @@ -157,7 +157,7 @@ static bool 
vhost_user_one_time_request(VhostUserRequest request) { switch (request) { case VHOST_USER_SET_OWNER: -case VHOST_USER_RESET_DEVICE: +case VHOST_USER_RESET_OWNER: case VHOST_USER_SET_MEM_TABLE: case VHOST_USER_GET_QUEUE_NUM: return true; @@ -486,7 +486,7 @@ static int vhost_user_set_owner(struct vhost_dev *dev) static int vhost_user_reset_device(struct vhost_dev *dev) { VhostUserMsg msg = { -.request = VHOST_USER_RESET_DEVICE, +.request = VHOST_USER_RESET_OWNER, .flags = VHOST_USER_VERSION, }; diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h index 14a0160..ead86db 100644 --- a/linux-headers/linux/vhost.h +++ b/linux-headers/linux/vhost.h @@ -78,7 +78,7 @@ struct vhost_memory { #define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01) /* Give up ownership, and reset the device to default values. * Allows subsequent call to VHOST_OWNER_SET to succeed. */ -#define VHOST_RESET_DEVICE _IO(VHOST_VIRTIO, 0x02) +#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02) /* Set up/modify memory layout */ #define VHOST_SET_MEM_TABLE_IOW(VHOST_VIRTIO, 0x03, struct vhost_memory) diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c index fa18ad5..864f69e 100644 --- a/tests/vhost-user-bridge.c +++ b/tests/vhost-user-bridge.c @@ -188,7 +188,7 @@ typedef enum VhostUserRequest { VHOST_USER_GET_FEATURES = 1, VHOST_USER_SET_FEATURES = 2, VHOST_USER_SET_OWNER = 3, -VHOST_USER_RESET_DEVICE = 4, +VHOST_USER_RESET_OWNER = 4, VHOST_USER_SET_MEM_TABLE = 5, VHOST_USER_SET_LOG_BASE = 6, VHOST_USER_SET_LOG_FD = 7, @@ -274,7 +274,7 @@ static const char *vubr_request_str[] = { [VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES", [VHOST_USER_SET_FEATURES] = "VHOST_USER_SET_FEATURES", [VHOST_USER_SET_OWNER] = "VHOST_USER_SET_OWNER", -[VHOST_USER_RESET_DEVICE] = "VHOST_USER_RESET_DEVICE", +[VHOST_USER_RESET_OWNER] = "VHOST_USER_RESET_OWNER", [VHOST_USER_SET_MEM_TABLE] = "VHOST_USER_SET_MEM_TABLE", [VHOST_USER_SET_LOG_BASE] = "VHOST_USER_SET_LOG_BASE", [VHOST_USER_SET_LOG_FD] = 
"VHOST_USER_SET_LOG_FD", @@ -921,7 +921,7 @@ vubr_execute_request(VubrDev *dev, VhostUserMsg *vmsg) return vubr_set_features_exec(dev, vmsg); case VHOST_USER_SET_OWNER: return vubr_set_owner_exec(dev, vmsg); -case VHOST_USER_RESET_DEVICE: +case VHOST_USER_RESET_OWNER: return vubr_reset_device_exec(dev, vmsg); case VHOST_USER_SET_MEM_TABLE: return vubr_set_mem_table_exec(dev, vmsg); diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c index b6dde75..66778c4 100644 --- a/t
[Qemu-devel] [PATCH v4 0/5] handle vhost reset/start/stop correctly
Patch 1 renames RESET_DEVICE back to RESET_OWNER.

Patch 2 introduces a new function, vhost_net_reset(), which is invoked when a reset happens, say, when the driver is unloaded (unbound).

Michael suggested doing that only when MQ is not negotiated. However, a reset happens whether or not MQ is enabled or negotiated, and we should give the backend a sign to reset the device to a proper state after it happens.

Note that the message sent is still RESET_OWNER. It might not be a good idea, but we could not simply rename it to RESET_DEVICE; maybe adding another RESET_DEVICE message would be better.

Patches 3 and 4 send SET_PROTOCOL_FEATURES at start, just like we send SET_FEATURES.

Patch 5 sends SET_VRING_ENABLE at start/stop.

Michael, I intended to send it when MQ is negotiated as you suggested; however, I found that VHOST_USER_PROTOCOL_F_MQ is defined in vhost-user.c, which is not accessible to vhost.c.

Exporting it to vhost.h would resolve that, but it's not a good idea to move vhost-user specific stuff there. We could also introduce another vhost callback to test whether MQ is negotiated; I just don't think it's worth it.

Hence, here I just used a simple test: invoke set_vring_enable() only when it is defined. Given that the callback itself has another MQ check, I guess it's okay.

And sorry that it took so long to send this version.
---

Yuanhan Liu (5):
  vhost: rename RESET_DEVICE back to RESET_OWNER
  vhost: reset vhost net when virtio_net_reset happens
  vhost: introduce vhost_set/get_protocol_features callbacks
  vhost: send SET_PROTOCOL_FEATURES at start
  vhost: send SET_VRING_ENABLE at start/stop

 docs/specs/vhost-user.txt         |  4 ++--
 hw/net/vhost_net.c                | 20 ++--
 hw/net/virtio-net.c               | 14 ++
 hw/virtio/vhost-backend.c         |  2 +-
 hw/virtio/vhost-user.c            | 13 ++---
 hw/virtio/vhost.c                 | 17 +
 include/hw/virtio/vhost-backend.h |  6 ++
 include/net/vhost_net.h           |  1 +
 linux-headers/linux/vhost.h       |  2 +-
 tests/vhost-user-bridge.c         |  6 +++---
 tests/vhost-user-test.c           |  4 ++--
 11 files changed, 71 insertions(+), 18 deletions(-)

-- 
1.9.0
[Qemu-devel] [PATCH v4 3/5] vhost: introduce vhost_set/get_protocol_features callbacks
Signed-off-by: Yuanhan Liu --- hw/virtio/vhost-user.c| 7 +++ include/hw/virtio/vhost-backend.h | 6 ++ 2 files changed, 13 insertions(+) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 4766f98..2d8bdbd 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -471,6 +471,11 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features) return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features); } +static int vhost_user_get_protocol_features(struct vhost_dev *dev, uint64_t *features) +{ +return vhost_user_get_u64(dev, VHOST_USER_GET_PROTOCOL_FEATURES, features); +} + static int vhost_user_set_owner(struct vhost_dev *dev) { VhostUserMsg msg = { @@ -616,6 +621,8 @@ const VhostOps user_ops = { .vhost_set_vring_call = vhost_user_set_vring_call, .vhost_set_features = vhost_user_set_features, .vhost_get_features = vhost_user_get_features, +.vhost_set_protocol_features = vhost_user_set_protocol_features, +.vhost_get_protocol_features = vhost_user_get_protocol_features, .vhost_set_owner = vhost_user_set_owner, .vhost_reset_device = vhost_user_reset_device, .vhost_get_vq_index = vhost_user_get_vq_index, diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h index c59cc81..7e705ce 100644 --- a/include/hw/virtio/vhost-backend.h +++ b/include/hw/virtio/vhost-backend.h @@ -62,6 +62,10 @@ typedef int (*vhost_set_features_op)(struct vhost_dev *dev, uint64_t features); typedef int (*vhost_get_features_op)(struct vhost_dev *dev, uint64_t *features); +typedef int (*vhost_set_protocol_features_op)(struct vhost_dev *dev, + uint64_t features); +typedef int (*vhost_get_protocol_features_op)(struct vhost_dev *dev, + uint64_t *features); typedef int (*vhost_set_owner_op)(struct vhost_dev *dev); typedef int (*vhost_reset_device_op)(struct vhost_dev *dev); typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx); @@ -91,6 +95,8 @@ typedef struct VhostOps { vhost_set_vring_call_op 
vhost_set_vring_call; vhost_set_features_op vhost_set_features; vhost_get_features_op vhost_get_features; +vhost_set_protocol_features_op vhost_set_protocol_features; +vhost_get_protocol_features_op vhost_get_protocol_features; vhost_set_owner_op vhost_set_owner; vhost_reset_device_op vhost_reset_device; vhost_get_vq_index_op vhost_get_vq_index; -- 1.9.0
[Qemu-devel] [PATCH v4 2/5] vhost: reset vhost net when virtio_net_reset happens
When a virtio net driver is unloaded (unbind), virtio_net_reset happens. For vhost net, we should also send a message (RESET_OWNER) to the backend to do some proper reset settings. Signed-off-by: Yuanhan Liu --- hw/net/vhost_net.c | 20 ++-- hw/net/virtio-net.c | 14 ++ include/net/vhost_net.h | 1 + 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index d91b7b1..387f851 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -292,12 +292,6 @@ static void vhost_net_stop_one(struct vhost_net *net, int r = vhost_ops->vhost_net_set_backend(&net->dev, &file); assert(r >= 0); } -} else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) { -for (file.index = 0; file.index < net->dev.nvqs; ++file.index) { -const VhostOps *vhost_ops = net->dev.vhost_ops; -int r = vhost_ops->vhost_reset_device(&net->dev); -assert(r >= 0); -} } if (net->nc->info->poll) { net->nc->info->poll(net->nc, true); @@ -382,6 +376,16 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs, assert(vhost_net_set_vnet_endian(dev, ncs[0].peer, false) >= 0); } +void vhost_net_reset(VirtIODevice *vdev) +{ +VirtIONet *n = VIRTIO_NET(vdev); +struct vhost_net *net = get_vhost_net(n->nic->ncs[0].peer); + +if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) { +net->dev.vhost_ops->vhost_reset_device(&net->dev); +} +} + void vhost_net_cleanup(struct vhost_net *net) { vhost_dev_cleanup(&net->dev); @@ -469,6 +473,10 @@ void vhost_net_stop(VirtIODevice *dev, { } +void vhost_net_reset(VirtIODevice *vdev) +{ +} + void vhost_net_cleanup(struct vhost_net *net) { } diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index a877614..75472ba 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -318,10 +318,24 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc) return info; } +static void virtio_net_vhost_reset(VirtIONet *n) +{ +VirtIODevice *vdev = VIRTIO_DEVICE(n); +NetClientState *nc = qemu_get_queue(n->nic); + 
+if (!get_vhost_net(nc->peer)) { +return; +} + +vhost_net_reset(vdev); +} + static void virtio_net_reset(VirtIODevice *vdev) { VirtIONet *n = VIRTIO_NET(vdev); +virtio_net_vhost_reset(n); + /* Reset back to compatibility mode */ n->promisc = 1; n->allmulti = 0; diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h index 3389b41..d1c1331 100644 --- a/include/net/vhost_net.h +++ b/include/net/vhost_net.h @@ -18,6 +18,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options); int vhost_net_start(VirtIODevice *dev, NetClientState *ncs, int total_queues); void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs, int total_queues); +void vhost_net_reset(VirtIODevice *dev); void vhost_net_cleanup(VHostNetState *net); -- 1.9.0
Re: [Qemu-devel] [PATCH] vhost-user: set link down when the char device is closed
Thanks. Reviewed-by: Yuanhan Liu --yliu On Wed, Nov 11, 2015 at 02:53:29PM +0800, Wen Congyang wrote: > Signed-off-by: Wen Congyang > --- > net/vhost-user.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/vhost-user.c b/net/vhost-user.c > index 0ebd7df..5071602 100644 > --- a/net/vhost-user.c > +++ b/net/vhost-user.c > @@ -188,7 +188,7 @@ static void net_vhost_user_event(void *opaque, int event) > qmp_set_link(name, true, &err); > break; > case CHR_EVENT_CLOSED: > -qmp_set_link(name, true, &err); > +qmp_set_link(name, false, &err); > vhost_user_stop(queues, ncs); > break; > } > -- > 2.5.0
Re: [Qemu-devel] [PULL 12/22] vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE
On Fri, Nov 06, 2015 at 10:01:58AM +, Peter Maydell wrote:
> On 6 November 2015 at 01:34, Yuanhan Liu wrote:
> > On Thu, Nov 05, 2015 at 11:42:15AM +, Peter Maydell wrote:
> >> On 3 October 2015 at 17:33, Michael S. Tsirkin wrote:
> >> > On Fri, Oct 02, 2015 at 06:18:51PM +0200, Paolo Bonzini wrote:
> >> >> On 24/09/2015 15:20, Michael S. Tsirkin wrote:
> >> >> > From: Yuanhan Liu
> >> >> >
> >> >> > Quote from Michael:
> >> >> >
> >> >> > We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE.
> >> >>
> >> >> Where is the corresponding Linux patch for this?
> >> >>
> >> >> I would like to fetch the updated headers for KVM, and this is breaking
> >> >> it. In fact, a patch that just renames the #define (without providing
> >> >> the old name for backwards compatibility) would be NACKed in upstream
> >> >> Linux.
> >> >>
> >> >> Paolo
> >> >
> >> > Right. And it turns out this whole approach is wrong. I intend to
> >> > revert this patch, and also drop the patch sending VHOST_RESET_OWNER on
> >> > device stop.
> >>
> >> This revert doesn't seem to have happened, I think, which means
> >> that this is one of the things which prevents a clean header-update
> >> against kvm/next. Could we get this fixed for rc0, please?
> >
> > My bad. I will fix it next week. What's the deadline for rc0 then?
>
> rc0 is 12th November (http://wiki.qemu.org/Planning/2.5). You need
> to also allow time for the patch to be reviewed and possibly taken
> via somebody's tree.

Got it, and thanks for the reminder.

	--yliu
Re: [Qemu-devel] [PULL 12/22] vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE
On Thu, Nov 05, 2015 at 11:42:15AM +, Peter Maydell wrote: > On 3 October 2015 at 17:33, Michael S. Tsirkin wrote: > > On Fri, Oct 02, 2015 at 06:18:51PM +0200, Paolo Bonzini wrote: > >> > >> > >> On 24/09/2015 15:20, Michael S. Tsirkin wrote: > >> > From: Yuanhan Liu > >> > > >> > Quote from Michael: > >> > > >> > We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE. > >> > >> Where is the corresponding Linux patch for this? > >> > >> I would like to fetch the updated headers for KVM, and this is breaking > >> it. In fact, a patch that just renames the #define (without providing > >> the old name for backwards compatibility) would be NACKed in upstream > >> Linux. > >> > >> Paolo > > > > Right. And it turns out this whole approach is wrong. I intend to > > revert this patch, and also drop the patch sending VHOST_RESET_OWNER on > > device stop. > > This revert doesn't seem to have happened, I think, which means > that this is one of the things which prevents a clean header-update > against kvm/next. Could we get this fixed for rc0, please? My bad. I will fix it next week. What's the deadline for rc0 then? BTW, as Michael stated, a simple revert is not enough. --yliu
[Qemu-devel] [PATCH v3 1/3] vhost: rename VHOST_RESET_DEVICE back to VHOST_RESET_OWNER
It turned out that it breaks stuff (VHOST_RESET_OWNER is not defined), so revert it: http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html CC: "Michael S. Tsirkin" Reported-by: Paolo Bonzini Signed-off-by: Yuanhan Liu --- docs/specs/vhost-user.txt | 4 ++-- hw/virtio/vhost-backend.c | 2 +- hw/virtio/vhost-user.c | 6 +++--- linux-headers/linux/vhost.h | 2 +- tests/vhost-user-test.c | 4 ++-- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index e0d71e2..d319715 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -255,10 +255,10 @@ Message types as an owner of the session. This can be used on the Slave as a "session start" flag. - * VHOST_USER_RESET_DEVICE + * VHOST_USER_RESET_OWNER Id: 4 - Equivalent ioctl: VHOST_RESET_DEVICE + Equivalent ioctl: VHOST_RESET_OWNER Master payload: N/A Issued when a new connection is about to be closed. The Master will no diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c index 1d5f684..b734a60 100644 --- a/hw/virtio/vhost-backend.c +++ b/hw/virtio/vhost-backend.c @@ -156,7 +156,7 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev) static int vhost_kernel_reset_device(struct vhost_dev *dev) { -return vhost_kernel_call(dev, VHOST_RESET_DEVICE, NULL); +return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL); } static int vhost_kernel_get_vq_index(struct vhost_dev *dev, int idx) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 78442ba..7c532f4 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -43,7 +43,7 @@ typedef enum VhostUserRequest { VHOST_USER_GET_FEATURES = 1, VHOST_USER_SET_FEATURES = 2, VHOST_USER_SET_OWNER = 3, -VHOST_USER_RESET_DEVICE = 4, +VHOST_USER_RESET_OWNER = 4, VHOST_USER_SET_MEM_TABLE = 5, VHOST_USER_SET_LOG_BASE = 6, VHOST_USER_SET_LOG_FD = 7, @@ -157,7 +157,7 @@ static bool vhost_user_one_time_request(VhostUserRequest request) { switch (request) { case 
VHOST_USER_SET_OWNER: -case VHOST_USER_RESET_DEVICE: +case VHOST_USER_RESET_OWNER: case VHOST_USER_SET_MEM_TABLE: case VHOST_USER_GET_QUEUE_NUM: return true; @@ -486,7 +486,7 @@ static int vhost_user_set_owner(struct vhost_dev *dev) static int vhost_user_reset_device(struct vhost_dev *dev) { VhostUserMsg msg = { -.request = VHOST_USER_RESET_DEVICE, +.request = VHOST_USER_RESET_OWNER, .flags = VHOST_USER_VERSION, }; diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h index 14a0160..ead86db 100644 --- a/linux-headers/linux/vhost.h +++ b/linux-headers/linux/vhost.h @@ -78,7 +78,7 @@ struct vhost_memory { #define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01) /* Give up ownership, and reset the device to default values. * Allows subsequent call to VHOST_OWNER_SET to succeed. */ -#define VHOST_RESET_DEVICE _IO(VHOST_VIRTIO, 0x02) +#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02) /* Set up/modify memory layout */ #define VHOST_SET_MEM_TABLE_IOW(VHOST_VIRTIO, 0x03, struct vhost_memory) diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c index a74c934..d2fc048 100644 --- a/tests/vhost-user-test.c +++ b/tests/vhost-user-test.c @@ -57,7 +57,7 @@ typedef enum VhostUserRequest { VHOST_USER_GET_FEATURES = 1, VHOST_USER_SET_FEATURES = 2, VHOST_USER_SET_OWNER = 3, -VHOST_USER_RESET_DEVICE = 4, +VHOST_USER_RESET_OWNER = 4, VHOST_USER_SET_MEM_TABLE = 5, VHOST_USER_SET_LOG_BASE = 6, VHOST_USER_SET_LOG_FD = 7, @@ -307,7 +307,7 @@ static void chr_read(void *opaque, const uint8_t *buf, int size) g_cond_signal(&s->data_cond); break; -case VHOST_USER_RESET_DEVICE: +case VHOST_USER_RESET_OWNER: s->fds_num = 0; break; -- 1.9.0
[Qemu-devel] [PATCH v3 2/3] doc: vhost-user: request naming fix
They are VHOST_USER_XXX instead of VHOST_XXX messages. Also, add VHOST_USER_GET_QUEUE_NUM to the section that requires replies.

Cc: Michael S. Tsirkin
Signed-off-by: Yuanhan Liu
---
 docs/specs/vhost-user.txt | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index d319715..0a5a346 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -112,20 +112,21 @@ The communication consists of master sending message requests and slave
 sending message replies. Most of the requests don't require replies.
 Here is a list of the ones that do:
 
- * VHOST_GET_FEATURES
- * VHOST_GET_PROTOCOL_FEATURES
- * VHOST_GET_VRING_BASE
- * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
+ * VHOST_USER_GET_FEATURES
+ * VHOST_USER_GET_PROTOCOL_FEATURES
+ * VHOST_USER_GET_VRING_BASE
+ * VHOST_USER_GET_QUEUE_NUM
+ * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
 
 There are several messages that the master sends with file descriptors passed
 in the ancillary data:
 
- * VHOST_SET_MEM_TABLE
- * VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
- * VHOST_SET_LOG_FD
- * VHOST_SET_VRING_KICK
- * VHOST_SET_VRING_CALL
- * VHOST_SET_VRING_ERR
+ * VHOST_USER_SET_MEM_TABLE
+ * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
+ * VHOST_USER_SET_LOG_FD
+ * VHOST_USER_SET_VRING_KICK
+ * VHOST_USER_SET_VRING_CALL
+ * VHOST_USER_SET_VRING_ERR
 
 If Master is unable to send the full message or receives a wrong reply it will
 close the connection. An optional reconnection mechanism can be implemented.
-- 
1.9.0
[Qemu-devel] [PATCH v3 0/3] vhost-user mq sequel works
Patch 1 basically reverts the patch that did the rename, as it may break builds for which VHOST_RESET_OWNER is no longer defined. As Michael stated, reverting it is not enough: we need to send it from the right place, and send SET_VRING_ENABLE at start/stop, which is something I will do later.

Patch 2 is a minor fix of request naming in the vhost-user spec.

Patch 3 adds the vhost-user mq unit test.

---

Yuanhan Liu (3):
  vhost: rename VHOST_RESET_DEVICE back to VHOST_RESET_OWNER
  doc: vhost-user: request naming fix
  vhost-user-test: add multiple queue test

 docs/specs/vhost-user.txt   | 25 +
 hw/virtio/vhost-backend.c   |  2 +-
 hw/virtio/vhost-user.c      |  6 +++---
 linux-headers/linux/vhost.h |  2 +-
 tests/vhost-user-test.c     | 27 +++
 5 files changed, 41 insertions(+), 21 deletions(-)

-- 
1.9.0
[Qemu-devel] [PATCH v3 3/3] vhost-user-test: add multiple queue test
Setting VHOST_USER_PROTOCOL_F_MQ protocol feature bit to claim that we support MQ feature, and simply assume we support 2 queue pairs at most. Cc: Michael S. Tsirkin Cc: Jason Wang Signed-off-by: Yuanhan Liu --- tests/vhost-user-test.c | 23 +-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c index d2fc048..50e6c75 100644 --- a/tests/vhost-user-test.c +++ b/tests/vhost-user-test.c @@ -30,11 +30,17 @@ #define HAVE_MONOTONIC_TIME #endif +#define MAX_QUEUES 2 + +#define __STR(x)#x +#define STR(x) __STR(x) + #define QEMU_CMD_ACCEL " -machine accel=tcg" #define QEMU_CMD_MEM" -m %d -object memory-backend-file,id=mem,size=%dM,"\ "mem-path=%s,share=on -numa node,memdev=mem" #define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s" -#define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=%s,vhostforce" +#define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=%s,vhostforce," \ +"queues=" STR(MAX_QUEUES) #define QEMU_CMD_NET" -device virtio-net-pci,netdev=net0 " #define QEMU_CMD_ROM" -option-rom ../pc-bios/pxe-virtio.rom" @@ -48,6 +54,7 @@ #define VHOST_MEMORY_MAX_NREGIONS8 #define VHOST_USER_F_PROTOCOL_FEATURES 30 +#define VHOST_USER_PROTOCOL_F_MQ0 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 #define VHOST_LOG_PAGE 0x1000 @@ -70,6 +77,8 @@ typedef enum VhostUserRequest { VHOST_USER_SET_VRING_ERR = 14, VHOST_USER_GET_PROTOCOL_FEATURES = 15, VHOST_USER_SET_PROTOCOL_FEATURES = 16, +VHOST_USER_GET_QUEUE_NUM = 17, +VHOST_USER_SET_VRING_ENABLE = 18, VHOST_USER_MAX } VhostUserRequest; @@ -258,7 +267,8 @@ static void chr_read(void *opaque, const uint8_t *buf, int size) /* send back features to qemu */ msg.flags |= VHOST_USER_REPLY_MASK; msg.size = sizeof(m.u64); -msg.u64 = 1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD; +msg.u64 = (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) | + (1ULL << VHOST_USER_PROTOCOL_F_MQ); p = (uint8_t *) &msg; qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size); break; @@ -307,6 +317,15 @@ static 
void chr_read(void *opaque, const uint8_t *buf, int size) g_cond_signal(&s->data_cond); break; +case VHOST_USER_GET_QUEUE_NUM: +/* send back the number of queues we support (let it be 2) to qemu */ +msg.flags |= VHOST_USER_REPLY_MASK; +msg.size = sizeof(m.u64); +msg.u64 = MAX_QUEUES; +p = (uint8_t *) &msg; +qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size); +break; + case VHOST_USER_RESET_OWNER: s->fds_num = 0; break; -- 1.9.0
Re: [Qemu-devel] [PATCH v2 5/5] vhost: send VHOST_USER_SET_VRING_ENABLE at start/stop
On Wed, Oct 21, 2015 at 05:11:44PM +0300, Michael S. Tsirkin wrote:
> On Wed, Oct 21, 2015 at 09:43:16PM +0800, Yuanhan Liu wrote:
> > On Wed, Oct 21, 2015 at 01:39:11PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Oct 21, 2015 at 05:07:18PM +0800, Yuanhan Liu wrote:
> > > > Send VHOST_USER_SET_VRING_ENABLE at start/stop when multiple queue
> > > > is negotiated, to inform the backend that we are ready or not.
> > >
> > > OK but that's only if MQ is set.
> >
> > Maybe we could just call vhost_backend_set_vring_enable()
> > unconditionally? It's a nil operation when MQ is not set.
> >
> > > If not, we need to do RESET_OWNER followed by SET_OWNER.
> >
> > Could you be more specific? Why send RESET_OWNER followed by
> > SET_OWNER?
> >
> > TBH, I'm a bit confused with RESET_OWNER now: what it does, and when
> > we are supposed to send it :(
>
> It's not well specified, but it does say it's analogous to RESET_OWNER
> in the kernel. That one is well documented:
>
> /* Set current process as the (exclusive) owner of this file descriptor.
>  * This must be called before any other vhost command. Further calls to
>  * VHOST_OWNER_SET fail until VHOST_OWNER_RESET is called. */
> #define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
> /* Give up ownership, and reset the device to default values.
>  * Allows subsequent call to VHOST_OWNER_SET to succeed. */
> #define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)

Thanks, that helps (I think). I recalled my old question, and rechecked
your answer again:

    Because we need to get the state from remote after stop.
    RESET_OWNER discards that, so we can't resume the VM.

So, if I understand it correctly this time, you want to keep the VM
state at the backend side even after the VM is shut down, so that we can
resume it with those saved states.

And why not do that when MQ is enabled? I don't see that it has anything
to do with MQ.
> > So if we want just the reset part, we need to do VHOST_RESET_OWNER > then redo everything that we did previously: VHOST_SET_OWNER > SET_VRING_CALL etc etc. > > > And, sending RESET_OWNER inside virtio_net_reset() also looks weird. > > I made a quick try before sending this patchset, and the vhost-user > > request dump doesn't look right to me: the message is sent after > > vhost dev init (GET_FEATURES, GET_PROTOCOL_FEATURE, SET_OWNER, ..., > > SET_VRING_CALL), and before peer attach (SET_VRING_ENABLE) and > > vhost_dev_start (SET_MEM_TABLE, ... SET_VRING_KICK ...): > > Food for thought. Aha... > > > > > # start of a VM > > > > VHOST_CONFIG: new virtio connection is 28 > > VHOST_CONFIG: new device, handle is 0 > > VHOST_CONFIG: read message VHOST_USER_GET_FEATURES > > VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES > > VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES > > VHOST_CONFIG: read message VHOST_USER_GET_QUEUE_NUM > > VHOST_CONFIG: read message VHOST_USER_SET_OWNER > > VHOST_CONFIG: read message VHOST_USER_GET_FEATURES > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL > > VHOST_CONFIG: vring call idx:0 file:29 > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL > > VHOST_CONFIG: vring call idx:1 file:30 > > VHOST_CONFIG: read message VHOST_USER_GET_FEATURES > > VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES > > VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES > > VHOST_CONFIG: read message VHOST_USER_GET_QUEUE_NUM > > VHOST_CONFIG: read message VHOST_USER_GET_FEATURES > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL > > ... > > ... 
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL > > VHOST_CONFIG: vring call idx:6 file:35 > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL > > VHOST_CONFIG: vring call idx:7 file:36 > > > > ==> VHOST_CONFIG: read message VHOST_USER_RESET_OWNER > > VHOST_CONFIG: read message VHOST_USER_RESET_OWNER > > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE > > VHOST_CONFIG: set queue enable: 1 to qp idx: 0 > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE > > VHOST_CONFIG: set queue enable: 0 to qp idx: 2 > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE > > VHOST_CONFIG: set queue enable: 0 to qp idx: 4 > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE > > VHOST_CONFIG: set queue enable: 0 to qp idx: 6 > > VHOST_CONFIG
Re: [Qemu-devel] [PATCH v2 1/5] Revert "vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE"
On Wed, Oct 21, 2015 at 05:13:49PM +0300, Michael S. Tsirkin wrote: > On Wed, Oct 21, 2015 at 09:04:17PM +0800, Yuanhan Liu wrote: > > On Wed, Oct 21, 2015 at 01:40:59PM +0300, Michael S. Tsirkin wrote: > > > On Wed, Oct 21, 2015 at 05:07:14PM +0800, Yuanhan Liu wrote: > > > > This reverts commit d1f8b30ec8dde0318fd1b98d24a64926feae9625. > > > > > > > > It turned out that it breaks stuff, so revert it: > > > > > > > > > > > > http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html > > > > > > > > CC: "Michael S. Tsirkin" > > > > Reported-by: Paolo Bonzini > > > > Signed-off-by: Yuanhan Liu > > > > > > > > > Are these patches dependent on each other? If yes pls send a cover letter. > > > If not pls don't send a series, post them separately. > > > > Got it. > > > > > And, pls Cc me on all vhost user patches. > > > > Isn't I always doing that? :) > > > > --yliu > > You didn't for 5/5. Oops, that obviously is a mistake. Sorry for that. --yliu > > > > > > > > --- > > > > docs/specs/vhost-user.txt | 4 ++-- > > > > hw/net/vhost_net.c | 2 +- > > > > hw/virtio/vhost-user.c | 8 > > > > linux-headers/linux/vhost.h | 2 +- > > > > tests/vhost-user-test.c | 2 +- > > > > 5 files changed, 9 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > > > > index 4eadad1..4bcd17d 100644 > > > > --- a/docs/specs/vhost-user.txt > > > > +++ b/docs/specs/vhost-user.txt > > > > @@ -211,10 +211,10 @@ Message types > > > >as an owner of the session. This can be used on the Slave as a > > > >"session start" flag. > > > > > > > > - * VHOST_USER_RESET_DEVICE > > > > + * VHOST_USER_RESET_OWNER > > > > > > > >Id: 4 > > > > - Equivalent ioctl: VHOST_RESET_DEVICE > > > > + Equivalent ioctl: VHOST_RESET_OWNER > > > >Master payload: N/A > > > > > > > >Issued when a new connection is about to be closed. 
The Master > > > > will no > > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > > > > index 2bce891..804f5c9 100644 > > > > --- a/hw/net/vhost_net.c > > > > +++ b/hw/net/vhost_net.c > > > > @@ -296,7 +296,7 @@ static void vhost_net_stop_one(struct vhost_net > > > > *net, > > > > } else if (net->nc->info->type == > > > > NET_CLIENT_OPTIONS_KIND_VHOST_USER) { > > > > for (file.index = 0; file.index < net->dev.nvqs; ++file.index) > > > > { > > > > const VhostOps *vhost_ops = net->dev.vhost_ops; > > > > -int r = vhost_ops->vhost_call(&net->dev, > > > > VHOST_RESET_DEVICE, > > > > +int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER, > > > >NULL); > > > > assert(r >= 0); > > > > } > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > > index b11c0d2..12a9104 100644 > > > > --- a/hw/virtio/vhost-user.c > > > > +++ b/hw/virtio/vhost-user.c > > > > @@ -34,7 +34,7 @@ typedef enum VhostUserRequest { > > > > VHOST_USER_GET_FEATURES = 1, > > > > VHOST_USER_SET_FEATURES = 2, > > > > VHOST_USER_SET_OWNER = 3, > > > > -VHOST_USER_RESET_DEVICE = 4, > > > > +VHOST_USER_RESET_OWNER = 4, > > > > VHOST_USER_SET_MEM_TABLE = 5, > > > > VHOST_USER_SET_LOG_BASE = 6, > > > > VHOST_USER_SET_LOG_FD = 7, > > > > @@ -102,7 +102,7 @@ static unsigned long int > > > > ioctl_to_vhost_user_request[VHOST_USER_MAX] = { > > > > VHOST_GET_FEATURES, /* VHOST_USER_GET_FEATURES */ > > > > VHOST_SET_FEATURES, /* VHOST_USER_SET_FEATURES */ > > > > VHOST_SET_OWNER,/* VHOST_USER_SET_OWNER */ > > > > -VHOST_RESET_DEVICE, /* VHOST_USER_RESET_DEVICE */ > > > > +VHOST_RESET_OWNER, /* VHOST_USER_RESET_OWNER */ > > > > VHOST_SET_MEM_TABLE,/* VHOST_USER_SET_MEM_TABLE */ > >
Re: [Qemu-devel] [PATCH v2 5/5] vhost: send VHOST_USER_SET_VRING_ENABLE at start/stop
On Wed, Oct 21, 2015 at 01:39:11PM +0300, Michael S. Tsirkin wrote:
> On Wed, Oct 21, 2015 at 05:07:18PM +0800, Yuanhan Liu wrote:
> > Send VHOST_USER_SET_VRING_ENABLE at start/stop when multiple queue
> > is negotiated, to inform the backend that we are ready or not.
>
> OK but that's only if MQ is set.

Maybe we could just call vhost_backend_set_vring_enable() unconditionally?
It's a nil operation when MQ is not set.

> If not, we need to do
> RESET_OWNER followed by SET_OWNER.

Could you be more specific? Why send RESET_OWNER followed by SET_OWNER?

TBH, I'm a bit confused by RESET_OWNER now: what it does, and when it is
supposed to be sent :( And sending RESET_OWNER inside virtio_net_reset()
also looks weird.

I made a quick try before sending this patchset, and the vhost-user request
dump doesn't look right to me: the message is sent after vhost dev init
(GET_FEATURES, GET_PROTOCOL_FEATURES, SET_OWNER, ..., SET_VRING_CALL), and
before peer attach (SET_VRING_ENABLE) and vhost_dev_start (SET_MEM_TABLE,
... SET_VRING_KICK ...):

    # start of a VM
    VHOST_CONFIG: new virtio connection is 28
    VHOST_CONFIG: new device, handle is 0
    VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
    VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES
    VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES
    VHOST_CONFIG: read message VHOST_USER_GET_QUEUE_NUM
    VHOST_CONFIG: read message VHOST_USER_SET_OWNER
    VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:0 file:29
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:1 file:30
    VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
    VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES
    VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES
    VHOST_CONFIG: read message VHOST_USER_GET_QUEUE_NUM
    VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    ... ...
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:6 file:35
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:7 file:36

==> VHOST_CONFIG: read message VHOST_USER_RESET_OWNER
    VHOST_CONFIG: read message VHOST_USER_RESET_OWNER

    VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
    VHOST_CONFIG: set queue enable: 1 to qp idx: 0
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
    VHOST_CONFIG: set queue enable: 0 to qp idx: 2
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
    VHOST_CONFIG: set queue enable: 0 to qp idx: 4
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
    VHOST_CONFIG: set queue enable: 0 to qp idx: 6
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:0 file:29
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:1 file:30
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:2 file:31
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:3 file:32
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:4 file:33
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:5 file:34
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:6 file:35
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    VHOST_CONFIG: vring call idx:7 file:36
    VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
    VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
    VHOST_CONFIG: mapped region 0 fd:37 to 0x2aaac000 sz:0xa off:0x0
    VHOST_CONFIG: mapped region 1 fd:38 to 0x2aab sz:0x8000 off:0xc
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
    VHOST_CONFIG: vring kick idx:0 file:39
    VHOST_CONFIG: virtio is not ready for processing.
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
    VHOST_CONFIG: vring kick idx:1 file:40
    VHOST_CONFIG: virtio is not ready for processing.
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
    VHOST_CONFIG: set queue enable: 1 to qp idx: 0
    VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
    VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
    VHOST
Re: [Qemu-devel] [PATCH v2 1/5] Revert "vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE"
On Wed, Oct 21, 2015 at 01:40:59PM +0300, Michael S. Tsirkin wrote: > On Wed, Oct 21, 2015 at 05:07:14PM +0800, Yuanhan Liu wrote: > > This reverts commit d1f8b30ec8dde0318fd1b98d24a64926feae9625. > > > > It turned out that it breaks stuff, so revert it: > > > > http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html > > > > CC: "Michael S. Tsirkin" > > Reported-by: Paolo Bonzini > > Signed-off-by: Yuanhan Liu > > > Are these patches dependent on each other? If yes pls send a cover letter. > If not pls don't send a series, post them separately. Got it. > And, pls Cc me on all vhost user patches. Isn't I always doing that? :) --yliu > > > --- > > docs/specs/vhost-user.txt | 4 ++-- > > hw/net/vhost_net.c | 2 +- > > hw/virtio/vhost-user.c | 8 > > linux-headers/linux/vhost.h | 2 +- > > tests/vhost-user-test.c | 2 +- > > 5 files changed, 9 insertions(+), 9 deletions(-) > > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > > index 4eadad1..4bcd17d 100644 > > --- a/docs/specs/vhost-user.txt > > +++ b/docs/specs/vhost-user.txt > > @@ -211,10 +211,10 @@ Message types > >as an owner of the session. This can be used on the Slave as a > >"session start" flag. > > > > - * VHOST_USER_RESET_DEVICE > > + * VHOST_USER_RESET_OWNER > > > >Id: 4 > > - Equivalent ioctl: VHOST_RESET_DEVICE > > + Equivalent ioctl: VHOST_RESET_OWNER > >Master payload: N/A > > > >Issued when a new connection is about to be closed. 
The Master will > > no > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > > index 2bce891..804f5c9 100644 > > --- a/hw/net/vhost_net.c > > +++ b/hw/net/vhost_net.c > > @@ -296,7 +296,7 @@ static void vhost_net_stop_one(struct vhost_net *net, > > } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) { > > for (file.index = 0; file.index < net->dev.nvqs; ++file.index) { > > const VhostOps *vhost_ops = net->dev.vhost_ops; > > -int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_DEVICE, > > +int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER, > >NULL); > > assert(r >= 0); > > } > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > index b11c0d2..12a9104 100644 > > --- a/hw/virtio/vhost-user.c > > +++ b/hw/virtio/vhost-user.c > > @@ -34,7 +34,7 @@ typedef enum VhostUserRequest { > > VHOST_USER_GET_FEATURES = 1, > > VHOST_USER_SET_FEATURES = 2, > > VHOST_USER_SET_OWNER = 3, > > -VHOST_USER_RESET_DEVICE = 4, > > +VHOST_USER_RESET_OWNER = 4, > > VHOST_USER_SET_MEM_TABLE = 5, > > VHOST_USER_SET_LOG_BASE = 6, > > VHOST_USER_SET_LOG_FD = 7, > > @@ -102,7 +102,7 @@ static unsigned long int > > ioctl_to_vhost_user_request[VHOST_USER_MAX] = { > > VHOST_GET_FEATURES, /* VHOST_USER_GET_FEATURES */ > > VHOST_SET_FEATURES, /* VHOST_USER_SET_FEATURES */ > > VHOST_SET_OWNER,/* VHOST_USER_SET_OWNER */ > > -VHOST_RESET_DEVICE, /* VHOST_USER_RESET_DEVICE */ > > +VHOST_RESET_OWNER, /* VHOST_USER_RESET_OWNER */ > > VHOST_SET_MEM_TABLE,/* VHOST_USER_SET_MEM_TABLE */ > > VHOST_SET_LOG_BASE, /* VHOST_USER_SET_LOG_BASE */ > > VHOST_SET_LOG_FD, /* VHOST_USER_SET_LOG_FD */ > > @@ -192,7 +192,7 @@ static bool > > vhost_user_one_time_request(VhostUserRequest request) > > { > > switch (request) { > > case VHOST_USER_SET_OWNER: > > -case VHOST_USER_RESET_DEVICE: > > +case VHOST_USER_RESET_OWNER: > > case VHOST_USER_SET_MEM_TABLE: > > case VHOST_USER_GET_QUEUE_NUM: > > return true; > > @@ -249,7 +249,7 @@ static int vhost_user_call(struct vhost_dev 
*dev, > > unsigned long int request, > > break; > > > > case VHOST_USER_SET_OWNER: > > -case VHOST_USER_RESET_DEVICE: > > +case VHOST_USER_RESET_OWNER: > > break; > > > > case VHOST_USER_SET_MEM_TABLE: > > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h > > index 14a0160..ead86db 100644 > > --- a/linux-headers/linux/vhost.h > > +++ b/linux-headers/linux/vhost.h > > @@ -78,7 +78,7 @@ struct vhost_memory { > > #define
Re: [Qemu-devel] [PATCH 1/3] Revert "vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE"
On Wed, Oct 21, 2015 at 12:15:52PM +0300, Michael S. Tsirkin wrote: > On Wed, Oct 21, 2015 at 04:55:16PM +0800, Yuanhan Liu wrote: > > On Fri, Oct 16, 2015 at 10:47:49AM +0300, Michael S. Tsirkin wrote: > > > On Fri, Oct 16, 2015 at 03:32:56PM +0800, Yuanhan Liu wrote: > > > > On Fri, Oct 16, 2015 at 10:21:40AM +0300, Michael S. Tsirkin wrote: > > > > > On Fri, Oct 16, 2015 at 03:18:59PM +0800, Yuanhan Liu wrote: > > > > > > This reverts commit d1f8b30ec8dde0318fd1b98d24a64926feae9625. > > > > > > > > > > > > It turned out that it breaks stuff, so revert it: > > > > > > > > > > > > > > > > > > http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html > > > > > > > > > > > > CC: "Michael S. Tsirkin" > > > > > > Reported-by: Paolo Bonzini > > > > > > Signed-off-by: Yuanhan Liu > > > > > > > > > > OK but that's not enough. We need to also > > > > > - drop patch sending this on stop > > > > > > > > I could do that. But may I know why we have to drop it on stop? > > > > > > Because we need to get the state from remote after stop. > > > RESET_OWNER discards that, so we can't resume the VM. > > > > > > > If we don't send it on stop, when will we send it? The termination > > > > of QEMU? > > > > > > If mq is negotiated, we don't need it. If mq isn't negotiated, maybe we > > > should send it on reset/driver unload. > > > > Sorry, what place do you mean exactly? > > There's no such code - you'll have to add > a callback into vhost from virtio. Say, at virtio_net_reset()? --yliu > > > > > It might be better than > > > corrupting guest memory. And if we do, maybe we should keep > > > the RESET_DEVICE name. > > > > > > > > - send VHOST_USER_SET_VRING_ENABLE on stop and start instead > > > > > > > > That indeed looks more reasonable. And actually, Huawei asked whether > > > > there is a message from QEMU to tell us that a vring is ready in > > > > our weekly meeting this morning. 
And here it is :) > > > > > > > > --yliu > > > > > > Unfortunately that's only there if MQ is negotiated. > > > > > > If MQ is not negotiated, then I think you can get away with > > > starting processing after you get the 1st kick. > > > > > > > > > > > > > > > > > > > > --- > > > > > > docs/specs/vhost-user.txt | 4 ++-- > > > > > > hw/net/vhost_net.c | 2 +- > > > > > > hw/virtio/vhost-user.c | 8 > > > > > > linux-headers/linux/vhost.h | 2 +- > > > > > > tests/vhost-user-test.c | 2 +- > > > > > > 5 files changed, 9 insertions(+), 9 deletions(-) > > > > > > > > > > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > > > > > > index 4eadad1..4bcd17d 100644 > > > > > > --- a/docs/specs/vhost-user.txt > > > > > > +++ b/docs/specs/vhost-user.txt > > > > > > @@ -211,10 +211,10 @@ Message types > > > > > >as an owner of the session. This can be used on the Slave as > > > > > > a > > > > > >"session start" flag. > > > > > > > > > > > > - * VHOST_USER_RESET_DEVICE > > > > > > + * VHOST_USER_RESET_OWNER > > > > > > > > > > > >Id: 4 > > > > > > - Equivalent ioctl: VHOST_RESET_DEVICE > > > > > > + Equivalent ioctl: VHOST_RESET_OWNER > > > > > >Master payload: N/A > > > > > > > > > > > >Issued when a new connection is about to be closed. The > > > > > > Master will no > > > > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > > > > > > index 2bce891..804f5c9 100644 > > > > > > --- a/hw/net/vhost_net.c > > > > > > +++ b/hw/net/vhost_net.c > > > > > > @@ -296,7 +296,7 @@ static void vhost_net_stop_one(struct vhost_net > > > > > > *net, > > > > > > } else if (net->nc-&g
Re: [Qemu-devel] [PATCH 1/3] Revert "vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE"
On Fri, Oct 16, 2015 at 10:47:49AM +0300, Michael S. Tsirkin wrote: > On Fri, Oct 16, 2015 at 03:32:56PM +0800, Yuanhan Liu wrote: > > On Fri, Oct 16, 2015 at 10:21:40AM +0300, Michael S. Tsirkin wrote: > > > On Fri, Oct 16, 2015 at 03:18:59PM +0800, Yuanhan Liu wrote: > > > > This reverts commit d1f8b30ec8dde0318fd1b98d24a64926feae9625. > > > > > > > > It turned out that it breaks stuff, so revert it: > > > > > > > > > > > > http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html > > > > > > > > CC: "Michael S. Tsirkin" > > > > Reported-by: Paolo Bonzini > > > > Signed-off-by: Yuanhan Liu > > > > > > OK but that's not enough. We need to also > > > - drop patch sending this on stop > > > > I could do that. But may I know why we have to drop it on stop? > > Because we need to get the state from remote after stop. > RESET_OWNER discards that, so we can't resume the VM. > > > If we don't send it on stop, when will we send it? The termination > > of QEMU? > > If mq is negotiated, we don't need it. If mq isn't negotiated, maybe we > should send it on reset/driver unload. Sorry, what place do you mean exactly? --yliu > It might be better than > corrupting guest memory. And if we do, maybe we should keep > the RESET_DEVICE name. > > > > - send VHOST_USER_SET_VRING_ENABLE on stop and start instead > > > > That indeed looks more reasonable. And actually, Huawei asked whether > > there is a message from QEMU to tell us that a vring is ready in > > our weekly meeting this morning. And here it is :) > > > > --yliu > > Unfortunately that's only there if MQ is negotiated. > > If MQ is not negotiated, then I think you can get away with > starting processing after you get the 1st kick. 
> > > > > > > > > > --- > > > > docs/specs/vhost-user.txt | 4 ++-- > > > > hw/net/vhost_net.c | 2 +- > > > > hw/virtio/vhost-user.c | 8 > > > > linux-headers/linux/vhost.h | 2 +- > > > > tests/vhost-user-test.c | 2 +- > > > > 5 files changed, 9 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > > > > index 4eadad1..4bcd17d 100644 > > > > --- a/docs/specs/vhost-user.txt > > > > +++ b/docs/specs/vhost-user.txt > > > > @@ -211,10 +211,10 @@ Message types > > > >as an owner of the session. This can be used on the Slave as a > > > >"session start" flag. > > > > > > > > - * VHOST_USER_RESET_DEVICE > > > > + * VHOST_USER_RESET_OWNER > > > > > > > >Id: 4 > > > > - Equivalent ioctl: VHOST_RESET_DEVICE > > > > + Equivalent ioctl: VHOST_RESET_OWNER > > > >Master payload: N/A > > > > > > > >Issued when a new connection is about to be closed. The Master > > > > will no > > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > > > > index 2bce891..804f5c9 100644 > > > > --- a/hw/net/vhost_net.c > > > > +++ b/hw/net/vhost_net.c > > > > @@ -296,7 +296,7 @@ static void vhost_net_stop_one(struct vhost_net > > > > *net, > > > > } else if (net->nc->info->type == > > > > NET_CLIENT_OPTIONS_KIND_VHOST_USER) { > > > > for (file.index = 0; file.index < net->dev.nvqs; ++file.index) > > > > { > > > > const VhostOps *vhost_ops = net->dev.vhost_ops; > > > > -int r = vhost_ops->vhost_call(&net->dev, > > > > VHOST_RESET_DEVICE, > > > > +int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER, > > > >NULL); > > > > assert(r >= 0); > > > > } > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > > index b11c0d2..12a9104 100644 > > > > --- a/hw/virtio/vhost-user.c > > > > +++ b/hw/virtio/vhost-user.c > > > > @@ -34,7 +34,7 @@ typedef enum VhostUserRequest { > > > > VHOST_USER_GET_FEATURES = 1, > > > > VHOST_USER_SET_FEATURES = 2, > > > > VHOST_USER_SET_OWNER = 3, > > > > -VHOST_USER_RESET_DEVICE =
[Qemu-devel] [PATCH v2 5/5] vhost: send VHOST_USER_SET_VRING_ENABLE at start/stop
Send VHOST_USER_SET_VRING_ENABLE at start/stop when multiple queue is
negotiated, to inform the backend that we are ready or not.

And exclude VHOST_USER_GET_QUEUE_NUM from the one-time requests, as we
need to get max_queues for each vhost_dev.

Suggested-by: Michael S. Tsirkin
Signed-off-by: Yuanhan Liu
---
 hw/virtio/vhost-user.c |  1 -
 hw/virtio/vhost.c      | 18 ++++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 12a9104..6532a73 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -194,7 +194,6 @@ static bool vhost_user_one_time_request(VhostUserRequest request)
     case VHOST_USER_SET_OWNER:
     case VHOST_USER_RESET_OWNER:
     case VHOST_USER_SET_MEM_TABLE:
-    case VHOST_USER_GET_QUEUE_NUM:
         return true;
     default:
         return false;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index c0ed5b2..54a4633 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1146,6 +1146,19 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         }
     }
 
+    /*
+     * Send VHOST_USER_SET_VRING_ENABLE message when multiple queue
+     * is negotiated to inform the back end that we are ready.
+     *
+     * Only set enable to 1 for first queue pair, as we enable one
+     * queue pair by default.
+     */
+    if (hdev->max_queues > 1 &&
+        hdev->vhost_ops->vhost_backend_set_vring_enable) {
+        hdev->vhost_ops->vhost_backend_set_vring_enable(hdev,
+                                                        hdev->vq_index == 0);
+    }
+
     return 0;
 
 fail_log:
     vhost_log_put(hdev, false);
@@ -1180,5 +1193,10 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
     hdev->started = false;
     hdev->log = NULL;
     hdev->log_size = 0;
+
+    if (hdev->max_queues > 1 &&
+        hdev->vhost_ops->vhost_backend_set_vring_enable) {
+        hdev->vhost_ops->vhost_backend_set_vring_enable(hdev, 0);
+    }
 }
-- 
1.9.0
[Qemu-devel] [PATCH v2 4/5] Revert "vhost-user: Send VHOST_RESET_OWNER on vhost stop"
Don't send VHOST_RESET_OWNER, for as Michael stated:

    Because we need to get the state from remote after stop.
    RESET_OWNER discards that, so we can't resume the VM.

This reverts commit 294ce717e0f212ed0763307f3eab72b4a1bdf4d0.

Cc: Luke Gorrie
Cc: Michael S. Tsirkin
Signed-off-by: Yuanhan Liu
---
 hw/net/vhost_net.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 804f5c9..95da5f8 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -293,13 +293,6 @@ static void vhost_net_stop_one(struct vhost_net *net,
                               &file);
             assert(r >= 0);
         }
-    } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
-        for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
-            const VhostOps *vhost_ops = net->dev.vhost_ops;
-            int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER,
-                                          NULL);
-            assert(r >= 0);
-        }
     }
     if (net->nc->info->poll) {
         net->nc->info->poll(net->nc, true);
-- 
1.9.0
[Qemu-devel] [PATCH v2 1/5] Revert "vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE"
This reverts commit d1f8b30ec8dde0318fd1b98d24a64926feae9625.

It turned out that it breaks stuff, so revert it:

    http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html

CC: "Michael S. Tsirkin"
Reported-by: Paolo Bonzini
Signed-off-by: Yuanhan Liu
---
 docs/specs/vhost-user.txt   | 4 ++--
 hw/net/vhost_net.c          | 2 +-
 hw/virtio/vhost-user.c      | 8 ++++----
 linux-headers/linux/vhost.h | 2 +-
 tests/vhost-user-test.c     | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 4eadad1..4bcd17d 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -211,10 +211,10 @@ Message types
       as an owner of the session. This can be used on the Slave as a
       "session start" flag.
 
- * VHOST_USER_RESET_DEVICE
+ * VHOST_USER_RESET_OWNER
 
       Id: 4
-      Equivalent ioctl: VHOST_RESET_DEVICE
+      Equivalent ioctl: VHOST_RESET_OWNER
       Master payload: N/A
 
       Issued when a new connection is about to be closed. The Master will no
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 2bce891..804f5c9 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -296,7 +296,7 @@ static void vhost_net_stop_one(struct vhost_net *net,
     } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
         for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
             const VhostOps *vhost_ops = net->dev.vhost_ops;
-            int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_DEVICE,
+            int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER,
                                           NULL);
             assert(r >= 0);
         }
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b11c0d2..12a9104 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -34,7 +34,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_FEATURES = 1,
     VHOST_USER_SET_FEATURES = 2,
     VHOST_USER_SET_OWNER = 3,
-    VHOST_USER_RESET_DEVICE = 4,
+    VHOST_USER_RESET_OWNER = 4,
     VHOST_USER_SET_MEM_TABLE = 5,
     VHOST_USER_SET_LOG_BASE = 6,
     VHOST_USER_SET_LOG_FD = 7,
@@ -102,7 +102,7 @@ static unsigned long int ioctl_to_vhost_user_request[VHOST_USER_MAX] = {
     VHOST_GET_FEATURES, /* VHOST_USER_GET_FEATURES */
     VHOST_SET_FEATURES, /* VHOST_USER_SET_FEATURES */
     VHOST_SET_OWNER,    /* VHOST_USER_SET_OWNER */
-    VHOST_RESET_DEVICE, /* VHOST_USER_RESET_DEVICE */
+    VHOST_RESET_OWNER,  /* VHOST_USER_RESET_OWNER */
     VHOST_SET_MEM_TABLE,/* VHOST_USER_SET_MEM_TABLE */
     VHOST_SET_LOG_BASE, /* VHOST_USER_SET_LOG_BASE */
     VHOST_SET_LOG_FD,   /* VHOST_USER_SET_LOG_FD */
@@ -192,7 +192,7 @@ static bool vhost_user_one_time_request(VhostUserRequest request)
 {
     switch (request) {
     case VHOST_USER_SET_OWNER:
-    case VHOST_USER_RESET_DEVICE:
+    case VHOST_USER_RESET_OWNER:
     case VHOST_USER_SET_MEM_TABLE:
     case VHOST_USER_GET_QUEUE_NUM:
         return true;
@@ -249,7 +249,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
         break;
 
     case VHOST_USER_SET_OWNER:
-    case VHOST_USER_RESET_DEVICE:
+    case VHOST_USER_RESET_OWNER:
         break;
 
     case VHOST_USER_SET_MEM_TABLE:
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index 14a0160..ead86db 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -78,7 +78,7 @@ struct vhost_memory {
 #define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
 /* Give up ownership, and reset the device to default values.
  * Allows subsequent call to VHOST_OWNER_SET to succeed. */
-#define VHOST_RESET_DEVICE _IO(VHOST_VIRTIO, 0x02)
+#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
 /* Set up/modify memory layout */
 #define VHOST_SET_MEM_TABLE    _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index 56df5cc..f181391 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -53,7 +53,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_FEATURES = 1,
     VHOST_USER_SET_FEATURES = 2,
     VHOST_USER_SET_OWNER = 3,
-    VHOST_USER_RESET_DEVICE = 4,
+    VHOST_USER_RESET_OWNER = 4,
     VHOST_USER_SET_MEM_TABLE = 5,
     VHOST_USER_SET_LOG_BASE = 6,
     VHOST_USER_SET_LOG_FD = 7,
-- 
1.9.0
[Qemu-devel] [PATCH v2 2/5] doc: vhost-user: request naming fix
They are VHOST_USER_XXX instead of VHOST_XXX messages.

Also, add VHOST_USER_GET_QUEUE_NUM to the section that requires replies.

Cc: Michael S. Tsirkin
Signed-off-by: Yuanhan Liu
---
 docs/specs/vhost-user.txt | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 4bcd17d..b23d88f 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -112,18 +112,19 @@ The communication consists of master sending message requests and slave sending
 message replies. Most of the requests don't require replies. Here is a list of
 the ones that do:
 
- * VHOST_GET_FEATURES
- * VHOST_GET_PROTOCOL_FEATURES
- * VHOST_GET_VRING_BASE
+ * VHOST_USER_GET_FEATURES
+ * VHOST_USER_GET_PROTOCOL_FEATURES
+ * VHOST_USER_GET_VRING_BASE
+ * VHOST_USER_GET_QUEUE_NUM
 
 There are several messages that the master sends with file descriptors passed
 in the ancillary data:
 
- * VHOST_SET_MEM_TABLE
- * VHOST_SET_LOG_FD
- * VHOST_SET_VRING_KICK
- * VHOST_SET_VRING_CALL
- * VHOST_SET_VRING_ERR
+ * VHOST_USER_SET_MEM_TABLE
+ * VHOST_USER_SET_LOG_FD
+ * VHOST_USER_SET_VRING_KICK
+ * VHOST_USER_SET_VRING_CALL
+ * VHOST_USER_SET_VRING_ERR
 
 If Master is unable to send the full message or receives a wrong reply it will
 close the connection. An optional reconnection mechanism can be implemented.
-- 
1.9.0
[Qemu-devel] [PATCH v2 3/5] vhost-user-test: add multiple queue test
Setting VHOST_USER_PROTOCOL_F_MQ protocol feature bit to claim that we
support MQ feature, and simply assume we support 0xff queue pairs at most.

Cc: Michael S. Tsirkin
Cc: Jason Wang
Signed-off-by: Yuanhan Liu
---
v2: use macro to define the max queues we support
---
 tests/vhost-user-test.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index f181391..8162e64 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -29,11 +29,14 @@
 #define HAVE_MONOTONIC_TIME
 #endif
 
+#define MAX_QUEUS 0xff
+
 #define QEMU_CMD_ACCEL  " -machine accel=tcg"
 #define QEMU_CMD_MEM    " -m 512 -object memory-backend-file,id=mem,size=512M,"\
                         "mem-path=%s,share=on -numa node,memdev=mem"
 #define QEMU_CMD_CHR    " -chardev socket,id=chr0,path=%s"
-#define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=chr0,vhostforce"
+#define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=chr0,vhostforce," \
+                        "queues=2"
 #define QEMU_CMD_NET    " -device virtio-net-pci,netdev=net0 "
 #define QEMU_CMD_ROM    " -option-rom ../pc-bios/pxe-virtio.rom"
 
@@ -48,6 +51,8 @@
 
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 
+#define VHOST_USER_PROTOCOL_F_MQ 0
+
 typedef enum VhostUserRequest {
     VHOST_USER_NONE = 0,
     VHOST_USER_GET_FEATURES = 1,
@@ -66,6 +71,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_VRING_ERR = 14,
     VHOST_USER_GET_PROTOCOL_FEATURES = 15,
     VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+    VHOST_USER_GET_QUEUE_NUM = 17,
+    VHOST_USER_SET_VRING_ENABLE = 18,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -232,7 +239,7 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
         /* send back features to qemu */
         msg.flags |= VHOST_USER_REPLY_MASK;
         msg.size = sizeof(m.u64);
-        msg.u64 = 0;
+        msg.u64 = (1ULL << VHOST_USER_PROTOCOL_F_MQ);
         p = (uint8_t *) &msg;
         qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
         break;
@@ -266,6 +273,16 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
          */
         qemu_set_nonblock(fd);
         break;
 
+    case VHOST_USER_GET_QUEUE_NUM:
+        /* send back the number of queues we support (let it be 2) to qemu */
+        msg.flags |= VHOST_USER_REPLY_MASK;
+        msg.size = sizeof(m.u64);
+        msg.u64 = MAX_QUEUS;
+        p = (uint8_t *) &msg;
+        qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
+        break;
+
     default:
         break;
     }
-- 
1.9.0
Re: [Qemu-devel] [PATCH 1/3] Revert "vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE"
On Fri, Oct 16, 2015 at 11:15:23AM +0300, Michael S. Tsirkin wrote: > On Fri, Oct 16, 2015 at 04:04:09PM +0800, Yuanhan Liu wrote: > > On Fri, Oct 16, 2015 at 10:47:49AM +0300, Michael S. Tsirkin wrote: > > > On Fri, Oct 16, 2015 at 03:32:56PM +0800, Yuanhan Liu wrote: > > > > On Fri, Oct 16, 2015 at 10:21:40AM +0300, Michael S. Tsirkin wrote: > > > > > On Fri, Oct 16, 2015 at 03:18:59PM +0800, Yuanhan Liu wrote: > > > > > > This reverts commit d1f8b30ec8dde0318fd1b98d24a64926feae9625. > > > > > > > > > > > > It turned out that it breaks stuff, so revert it: > > > > > > > > > > > > > > > > > > http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html > > > > > > > > > > > > CC: "Michael S. Tsirkin" > > > > > > Reported-by: Paolo Bonzini > > > > > > Signed-off-by: Yuanhan Liu > > > > > > > > > > OK but that's not enough. We need to also > > > > > - drop patch sending this on stop > > > > > > > > I could do that. But may I know why we have to drop it on stop? > > > > > > Because we need to get the state from remote after stop. > > > RESET_OWNER discards that, so we can't resume the VM. > > > > Got it, and thanks for the explanation. > > > > > > If we don't send it on stop, when will we send it? The termination > > > > of QEMU? > > > > > > If mq is negotiated, we don't need it. If mq isn't negotiated, maybe we > > > should send it on reset/driver unload. It might be better than > > > corrupting guest memory. And if we do, maybe we should keep > > > the RESET_DEVICE name. > > > > Maybe we should just ignore it now, before I (we) have clear clue > > on how to handle it correctly. > > Well if MQ is negotiated, then skipping this is clearly > the right thing to do. Let me put a bit more thoughts on this when get time. > > > > > > > > - send VHOST_USER_SET_VRING_ENABLE on stop and start instead > > > > > > > > That indeed looks more reasonable. 
And actually, Huawei asked whether > > > > there is a message from QEMU to tell us that a vring is ready in > > > > our weekly meeting this morning. And here it is :) > > > > > > > > --yliu > > > > > > Unfortunately that's only there if MQ is negotiated. > > > > Yes. > > > > > > > > If MQ is not negotiated, then I think you can get away with > > > starting processing after you get the 1st kick. > > > > We mark a vring is ready by checking the callfd and kickfd: > > if both are valid (> 0), then it's ready. > > > > --yliu > > No, that's not right. If MQ is not there, you must not look at vring > until you get at least 1 kick. In other words, until kickfd is readable. > Whether callfd is set should not matter at all. That makes sense to me. Huawei, what do you think? --yliu > If MQ is there, you should just go by VHOST_USER_SET_VRING_ENABLE. > > > > > > > > > > > > > > > > > > > --- > > > > > > docs/specs/vhost-user.txt | 4 ++-- > > > > > > hw/net/vhost_net.c | 2 +- > > > > > > hw/virtio/vhost-user.c | 8 > > > > > > linux-headers/linux/vhost.h | 2 +- > > > > > > tests/vhost-user-test.c | 2 +- > > > > > > 5 files changed, 9 insertions(+), 9 deletions(-) > > > > > > > > > > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > > > > > > index 4eadad1..4bcd17d 100644 > > > > > > --- a/docs/specs/vhost-user.txt > > > > > > +++ b/docs/specs/vhost-user.txt > > > > > > @@ -211,10 +211,10 @@ Message types > > > > > >as an owner of the session. This can be used on the Slave as > > > > > > a > > > > > >"session start" flag. > > > > > > > > > > > > - * VHOST_USER_RESET_DEVICE > > > > > > + * VHOST_USER_RESET_OWNER > > > > > > > > > > > >Id: 4 > > > > > > - Equivalent ioctl: VHOST_RESET_DEVICE > > > > > > + Equivalent ioctl: VHOST_RESET_OWNER > > > > > >
Re: [Qemu-devel] [PATCH 1/3] Revert "vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE"
On Fri, Oct 16, 2015 at 10:47:49AM +0300, Michael S. Tsirkin wrote: > On Fri, Oct 16, 2015 at 03:32:56PM +0800, Yuanhan Liu wrote: > > On Fri, Oct 16, 2015 at 10:21:40AM +0300, Michael S. Tsirkin wrote: > > > On Fri, Oct 16, 2015 at 03:18:59PM +0800, Yuanhan Liu wrote: > > > > This reverts commit d1f8b30ec8dde0318fd1b98d24a64926feae9625. > > > > > > > > It turned out that it breaks stuff, so revert it: > > > > > > > > > > > > http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html > > > > > > > > CC: "Michael S. Tsirkin" > > > > Reported-by: Paolo Bonzini > > > > Signed-off-by: Yuanhan Liu > > > > > > OK but that's not enough. We need to also > > > - drop patch sending this on stop > > > > I could do that. But may I know why we have to drop it on stop? > > Because we need to get the state from remote after stop. > RESET_OWNER discards that, so we can't resume the VM. Got it, and thanks for the explanation. > > If we don't send it on stop, when will we send it? The termination > > of QEMU? > > If mq is negotiated, we don't need it. If mq isn't negotiated, maybe we > should send it on reset/driver unload. It might be better than > corrupting guest memory. And if we do, maybe we should keep > the RESET_DEVICE name. Maybe we should just ignore it now, before I (we) have clear clue on how to handle it correctly. > > > > - send VHOST_USER_SET_VRING_ENABLE on stop and start instead > > > > That indeed looks more reasonable. And actually, Huawei asked whether > > there is a message from QEMU to tell us that a vring is ready in > > our weekly meeting this morning. And here it is :) > > > > --yliu > > Unfortunately that's only there if MQ is negotiated. Yes. > > If MQ is not negotiated, then I think you can get away with > starting processing after you get the 1st kick. We mark a vring is ready by checking the callfd and kickfd: if both are valid (> 0), then it's ready. 
--yliu > > > > > > > > > --- > > > > docs/specs/vhost-user.txt | 4 ++-- > > > > hw/net/vhost_net.c | 2 +- > > > > hw/virtio/vhost-user.c | 8 > > > > linux-headers/linux/vhost.h | 2 +- > > > > tests/vhost-user-test.c | 2 +- > > > > 5 files changed, 9 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > > > > index 4eadad1..4bcd17d 100644 > > > > --- a/docs/specs/vhost-user.txt > > > > +++ b/docs/specs/vhost-user.txt > > > > @@ -211,10 +211,10 @@ Message types > > > >as an owner of the session. This can be used on the Slave as a > > > >"session start" flag. > > > > > > > > - * VHOST_USER_RESET_DEVICE > > > > + * VHOST_USER_RESET_OWNER > > > > > > > >Id: 4 > > > > - Equivalent ioctl: VHOST_RESET_DEVICE > > > > + Equivalent ioctl: VHOST_RESET_OWNER > > > >Master payload: N/A > > > > > > > >Issued when a new connection is about to be closed. The Master > > > > will no > > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > > > > index 2bce891..804f5c9 100644 > > > > --- a/hw/net/vhost_net.c > > > > +++ b/hw/net/vhost_net.c > > > > @@ -296,7 +296,7 @@ static void vhost_net_stop_one(struct vhost_net > > > > *net, > > > > } else if (net->nc->info->type == > > > > NET_CLIENT_OPTIONS_KIND_VHOST_USER) { > > > > for (file.index = 0; file.index < net->dev.nvqs; ++file.index) > > > > { > > > > const VhostOps *vhost_ops = net->dev.vhost_ops; > > > > -int r = vhost_ops->vhost_call(&net->dev, > > > > VHOST_RESET_DEVICE, > > > > +int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER, > > > >NULL); > > > > assert(r >= 0); > > > > } > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > > index b11c0d2..12a9104 100644 > > > > --- a/hw/virtio/vhost-user.c > > > > +++ b/hw/virtio/vhost-user.c > > > > @@ -34,7 +34,7 @@ typedef enum VhostUserRequest { >
Re: [Qemu-devel] [PATCH 3/3] vhost-user-test: add multiple queue test
On Fri, Oct 16, 2015 at 10:20:08AM +0300, Michael S. Tsirkin wrote: > On Fri, Oct 16, 2015 at 03:19:01PM +0800, Yuanhan Liu wrote: > > Setting VHOST_USER_PROTOCOL_F_MQ protocol feature bit to claim that we > > support MQ feature, and simply assume we support 2 queue pairs at most. > > Well - let's add a macro, or something? Yeah, a macro is better. --yliu > > > > > Cc: Michael S. Tsirkin > > Cc: Jason Wang > > Signed-off-by: Yuanhan Liu > > --- > > tests/vhost-user-test.c | 18 -- > > 1 file changed, 16 insertions(+), 2 deletions(-) > > > > diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c > > index f181391..f0aa36f 100644 > > --- a/tests/vhost-user-test.c > > +++ b/tests/vhost-user-test.c > > @@ -33,7 +33,7 @@ > > #define QEMU_CMD_MEM" -m 512 -object > > memory-backend-file,id=mem,size=512M,"\ > > "mem-path=%s,share=on -numa node,memdev=mem" > > #define QEMU_CMD_CHR" -chardev socket,id=chr0,path=%s" > > -#define QEMU_CMD_NETDEV " -netdev > > vhost-user,id=net0,chardev=chr0,vhostforce" > > +#define QEMU_CMD_NETDEV " -netdev > > vhost-user,id=net0,chardev=chr0,vhostforce,queues=2" > > #define QEMU_CMD_NET" -device virtio-net-pci,netdev=net0 " > > #define QEMU_CMD_ROM" -option-rom ../pc-bios/pxe-virtio.rom" > > > > @@ -48,6 +48,8 @@ > > > > #define VHOST_USER_F_PROTOCOL_FEATURES 30 > > > > +#define VHOST_USER_PROTOCOL_F_MQ 0 > > + > > typedef enum VhostUserRequest { > > VHOST_USER_NONE = 0, > > VHOST_USER_GET_FEATURES = 1, > > @@ -66,6 +68,8 @@ typedef enum VhostUserRequest { > > VHOST_USER_SET_VRING_ERR = 14, > > VHOST_USER_GET_PROTOCOL_FEATURES = 15, > > VHOST_USER_SET_PROTOCOL_FEATURES = 16, > > +VHOST_USER_GET_QUEUE_NUM = 17, > > +VHOST_USER_SET_VRING_ENABLE = 18, > > VHOST_USER_MAX > > } VhostUserRequest; > > > > @@ -232,7 +236,7 @@ static void chr_read(void *opaque, const uint8_t *buf, > > int size) > > /* send back features to qemu */ > > msg.flags |= VHOST_USER_REPLY_MASK; > > msg.size = sizeof(m.u64); > > -msg.u64 = 0; > > +msg.u64 = (1ULL << 
VHOST_USER_PROTOCOL_F_MQ); > > p = (uint8_t *) &msg; > > qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size); > > break; > > @@ -266,6 +270,16 @@ static void chr_read(void *opaque, const uint8_t *buf, > > int size) > > */ > > qemu_set_nonblock(fd); > > break; > > + > > +case VHOST_USER_GET_QUEUE_NUM: > > + /* send back the number of queues we support (let it be 2) to > > qemu */ > > +msg.flags |= VHOST_USER_REPLY_MASK; > > +msg.size = sizeof(m.u64); > > +msg.u64 = 2; > > +p = (uint8_t *) &msg; > > +qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size); > > +break; > > + > > default: > > break; > > } > > -- > > 1.9.0
Re: [Qemu-devel] [PATCH 1/3] Revert "vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE"
On Fri, Oct 16, 2015 at 10:21:40AM +0300, Michael S. Tsirkin wrote: > On Fri, Oct 16, 2015 at 03:18:59PM +0800, Yuanhan Liu wrote: > > This reverts commit d1f8b30ec8dde0318fd1b98d24a64926feae9625. > > > > It turned out that it breaks stuff, so revert it: > > > > http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html > > > > CC: "Michael S. Tsirkin" > > Reported-by: Paolo Bonzini > > Signed-off-by: Yuanhan Liu > > OK but that's not enough. We need to also > - drop patch sending this on stop I could do that. But may I know why we have to drop it on stop? If we don't send it on stop, when will we send it? The termination of QEMU? > - send VHOST_USER_SET_VRING_ENABLE on stop and start instead That indeed looks more reasonable. And actually, Huawei asked whether there is a message from QEMU to tell us that a vring is ready in our weekly meeting this morning. And here it is :) --yliu > > > --- > > docs/specs/vhost-user.txt | 4 ++-- > > hw/net/vhost_net.c | 2 +- > > hw/virtio/vhost-user.c | 8 > > linux-headers/linux/vhost.h | 2 +- > > tests/vhost-user-test.c | 2 +- > > 5 files changed, 9 insertions(+), 9 deletions(-) > > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > > index 4eadad1..4bcd17d 100644 > > --- a/docs/specs/vhost-user.txt > > +++ b/docs/specs/vhost-user.txt > > @@ -211,10 +211,10 @@ Message types > >as an owner of the session. This can be used on the Slave as a > >"session start" flag. > > > > - * VHOST_USER_RESET_DEVICE > > + * VHOST_USER_RESET_OWNER > > > >Id: 4 > > - Equivalent ioctl: VHOST_RESET_DEVICE > > + Equivalent ioctl: VHOST_RESET_OWNER > >Master payload: N/A > > > >Issued when a new connection is about to be closed. 
The Master will > > no > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > > index 2bce891..804f5c9 100644 > > --- a/hw/net/vhost_net.c > > +++ b/hw/net/vhost_net.c > > @@ -296,7 +296,7 @@ static void vhost_net_stop_one(struct vhost_net *net, > > } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) { > > for (file.index = 0; file.index < net->dev.nvqs; ++file.index) { > > const VhostOps *vhost_ops = net->dev.vhost_ops; > > -int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_DEVICE, > > +int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER, > >NULL); > > assert(r >= 0); > > } > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > index b11c0d2..12a9104 100644 > > --- a/hw/virtio/vhost-user.c > > +++ b/hw/virtio/vhost-user.c > > @@ -34,7 +34,7 @@ typedef enum VhostUserRequest { > > VHOST_USER_GET_FEATURES = 1, > > VHOST_USER_SET_FEATURES = 2, > > VHOST_USER_SET_OWNER = 3, > > -VHOST_USER_RESET_DEVICE = 4, > > +VHOST_USER_RESET_OWNER = 4, > > VHOST_USER_SET_MEM_TABLE = 5, > > VHOST_USER_SET_LOG_BASE = 6, > > VHOST_USER_SET_LOG_FD = 7, > > @@ -102,7 +102,7 @@ static unsigned long int > > ioctl_to_vhost_user_request[VHOST_USER_MAX] = { > > VHOST_GET_FEATURES, /* VHOST_USER_GET_FEATURES */ > > VHOST_SET_FEATURES, /* VHOST_USER_SET_FEATURES */ > > VHOST_SET_OWNER,/* VHOST_USER_SET_OWNER */ > > -VHOST_RESET_DEVICE, /* VHOST_USER_RESET_DEVICE */ > > +VHOST_RESET_OWNER, /* VHOST_USER_RESET_OWNER */ > > VHOST_SET_MEM_TABLE,/* VHOST_USER_SET_MEM_TABLE */ > > VHOST_SET_LOG_BASE, /* VHOST_USER_SET_LOG_BASE */ > > VHOST_SET_LOG_FD, /* VHOST_USER_SET_LOG_FD */ > > @@ -192,7 +192,7 @@ static bool > > vhost_user_one_time_request(VhostUserRequest request) > > { > > switch (request) { > > case VHOST_USER_SET_OWNER: > > -case VHOST_USER_RESET_DEVICE: > > +case VHOST_USER_RESET_OWNER: > > case VHOST_USER_SET_MEM_TABLE: > > case VHOST_USER_GET_QUEUE_NUM: > > return true; > > @@ -249,7 +249,7 @@ static int vhost_user_call(struct vhost_dev 
*dev, > > unsigned long int request, > > break; > > > > case VHOST_USER_SET_OWNER: > > -case VHOST_USER_RESET_DEVICE: > > +case VHOST_USER_RESET_OWNER: > > break; > > > > case VHOST_USER_SET_MEM_TABLE: > > diff --git a/l
[Qemu-devel] [PATCH 1/3] Revert "vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE"
This reverts commit d1f8b30ec8dde0318fd1b98d24a64926feae9625.

It turned out that it breaks stuff, so revert it:

http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html

CC: "Michael S. Tsirkin"
Reported-by: Paolo Bonzini
Signed-off-by: Yuanhan Liu
---
 docs/specs/vhost-user.txt   | 4 ++--
 hw/net/vhost_net.c          | 2 +-
 hw/virtio/vhost-user.c      | 8 ++++----
 linux-headers/linux/vhost.h | 2 +-
 tests/vhost-user-test.c     | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 4eadad1..4bcd17d 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -211,10 +211,10 @@ Message types
       as an owner of the session. This can be used on the Slave as a
       "session start" flag.
 
- * VHOST_USER_RESET_DEVICE
+ * VHOST_USER_RESET_OWNER
 
       Id: 4
-      Equivalent ioctl: VHOST_RESET_DEVICE
+      Equivalent ioctl: VHOST_RESET_OWNER
       Master payload: N/A
 
       Issued when a new connection is about to be closed. The Master will no
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 2bce891..804f5c9 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -296,7 +296,7 @@ static void vhost_net_stop_one(struct vhost_net *net,
     } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
         for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
             const VhostOps *vhost_ops = net->dev.vhost_ops;
-            int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_DEVICE,
+            int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER,
                                           NULL);
             assert(r >= 0);
         }
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b11c0d2..12a9104 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -34,7 +34,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_FEATURES = 1,
     VHOST_USER_SET_FEATURES = 2,
     VHOST_USER_SET_OWNER = 3,
-    VHOST_USER_RESET_DEVICE = 4,
+    VHOST_USER_RESET_OWNER = 4,
     VHOST_USER_SET_MEM_TABLE = 5,
     VHOST_USER_SET_LOG_BASE = 6,
     VHOST_USER_SET_LOG_FD = 7,
@@ -102,7 +102,7 @@ static unsigned long int ioctl_to_vhost_user_request[VHOST_USER_MAX] = {
     VHOST_GET_FEATURES,     /* VHOST_USER_GET_FEATURES */
     VHOST_SET_FEATURES,     /* VHOST_USER_SET_FEATURES */
     VHOST_SET_OWNER,        /* VHOST_USER_SET_OWNER */
-    VHOST_RESET_DEVICE,     /* VHOST_USER_RESET_DEVICE */
+    VHOST_RESET_OWNER,      /* VHOST_USER_RESET_OWNER */
     VHOST_SET_MEM_TABLE,    /* VHOST_USER_SET_MEM_TABLE */
     VHOST_SET_LOG_BASE,     /* VHOST_USER_SET_LOG_BASE */
     VHOST_SET_LOG_FD,       /* VHOST_USER_SET_LOG_FD */
@@ -192,7 +192,7 @@ static bool vhost_user_one_time_request(VhostUserRequest request)
 {
     switch (request) {
     case VHOST_USER_SET_OWNER:
-    case VHOST_USER_RESET_DEVICE:
+    case VHOST_USER_RESET_OWNER:
     case VHOST_USER_SET_MEM_TABLE:
     case VHOST_USER_GET_QUEUE_NUM:
         return true;
@@ -249,7 +249,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
         break;
 
     case VHOST_USER_SET_OWNER:
-    case VHOST_USER_RESET_DEVICE:
+    case VHOST_USER_RESET_OWNER:
         break;
 
     case VHOST_USER_SET_MEM_TABLE:
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index 14a0160..ead86db 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -78,7 +78,7 @@ struct vhost_memory {
 #define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
 /* Give up ownership, and reset the device to default values.
  * Allows subsequent call to VHOST_OWNER_SET to succeed. */
-#define VHOST_RESET_DEVICE _IO(VHOST_VIRTIO, 0x02)
+#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
 
 /* Set up/modify memory layout */
 #define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index 56df5cc..f181391 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -53,7 +53,7 @@ typedef enum VhostUserRequest {
    VHOST_USER_GET_FEATURES = 1,
    VHOST_USER_SET_FEATURES = 2,
    VHOST_USER_SET_OWNER = 3,
-   VHOST_USER_RESET_DEVICE = 4,
+   VHOST_USER_RESET_OWNER = 4,
    VHOST_USER_SET_MEM_TABLE = 5,
    VHOST_USER_SET_LOG_BASE = 6,
    VHOST_USER_SET_LOG_FD = 7,
-- 
1.9.0
[Qemu-devel] [PATCH 2/3] doc: vhost-user: request naming fix
They are VHOST_USER_XXX instead of VHOST_XXX messages. Also, add
VHOST_USER_GET_QUEUE_NUM to the section that requires replies.

Cc: Michael S. Tsirkin
Signed-off-by: Yuanhan Liu
---
 docs/specs/vhost-user.txt | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 4bcd17d..b23d88f 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -112,18 +112,19 @@ The communication consists of master sending message requests and slave sending
 message replies. Most of the requests don't require replies. Here is a list of
 the ones that do:
 
- * VHOST_GET_FEATURES
- * VHOST_GET_PROTOCOL_FEATURES
- * VHOST_GET_VRING_BASE
+ * VHOST_USER_GET_FEATURES
+ * VHOST_USER_GET_PROTOCOL_FEATURES
+ * VHOST_USER_GET_VRING_BASE
+ * VHOST_USER_GET_QUEUE_NUM
 
 There are several messages that the master sends with file descriptors passed
 in the ancillary data:
 
- * VHOST_SET_MEM_TABLE
- * VHOST_SET_LOG_FD
- * VHOST_SET_VRING_KICK
- * VHOST_SET_VRING_CALL
- * VHOST_SET_VRING_ERR
+ * VHOST_USER_SET_MEM_TABLE
+ * VHOST_USER_SET_LOG_FD
+ * VHOST_USER_SET_VRING_KICK
+ * VHOST_USER_SET_VRING_CALL
+ * VHOST_USER_SET_VRING_ERR
 
 If Master is unable to send the full message or receives a wrong reply it will
 close the connection. An optional reconnection mechanism can be implemented.
-- 
1.9.0
[Qemu-devel] [PATCH 3/3] vhost-user-test: add multiple queue test
Setting the VHOST_USER_PROTOCOL_F_MQ protocol feature bit to claim that we
support the MQ feature, and simply assume we support 2 queue pairs at most.

Cc: Michael S. Tsirkin
Cc: Jason Wang
Signed-off-by: Yuanhan Liu
---
 tests/vhost-user-test.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index f181391..f0aa36f 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -33,7 +33,7 @@
 #define QEMU_CMD_MEM    " -m 512 -object memory-backend-file,id=mem,size=512M,"\
                         "mem-path=%s,share=on -numa node,memdev=mem"
 #define QEMU_CMD_CHR    " -chardev socket,id=chr0,path=%s"
-#define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=chr0,vhostforce"
+#define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=2"
 #define QEMU_CMD_NET    " -device virtio-net-pci,netdev=net0 "
 #define QEMU_CMD_ROM    " -option-rom ../pc-bios/pxe-virtio.rom"
 
@@ -48,6 +48,8 @@
 
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 
+#define VHOST_USER_PROTOCOL_F_MQ 0
+
 typedef enum VhostUserRequest {
     VHOST_USER_NONE = 0,
     VHOST_USER_GET_FEATURES = 1,
@@ -66,6 +68,8 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_VRING_ERR = 14,
     VHOST_USER_GET_PROTOCOL_FEATURES = 15,
     VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+    VHOST_USER_GET_QUEUE_NUM = 17,
+    VHOST_USER_SET_VRING_ENABLE = 18,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -232,7 +236,7 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
         /* send back features to qemu */
         msg.flags |= VHOST_USER_REPLY_MASK;
         msg.size = sizeof(m.u64);
-        msg.u64 = 0;
+        msg.u64 = (1ULL << VHOST_USER_PROTOCOL_F_MQ);
         p = (uint8_t *) &msg;
         qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
         break;
@@ -266,6 +270,16 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
          */
         qemu_set_nonblock(fd);
         break;
+
+    case VHOST_USER_GET_QUEUE_NUM:
+        /* send back the number of queues we support (let it be 2) to qemu */
+        msg.flags |= VHOST_USER_REPLY_MASK;
+        msg.size = sizeof(m.u64);
+        msg.u64 = 2;
+        p = (uint8_t *) &msg;
+        qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
+        break;
+
     default:
         break;
     }
-- 
1.9.0
Re: [Qemu-devel] [PULL 12/22] vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE
On Sat, Oct 03, 2015 at 07:33:16PM +0300, Michael S. Tsirkin wrote: > On Fri, Oct 02, 2015 at 06:18:51PM +0200, Paolo Bonzini wrote: > > > > > > On 24/09/2015 15:20, Michael S. Tsirkin wrote: > > > From: Yuanhan Liu > > > > > > Quote from Michael: > > > > > > We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE. > > > > Where is the corresponding Linux patch for this? > > > > I would like to fetch the updated headers for KVM, and this is breaking > > it. In fact, a patch that just renames the #define (without providing > > the old name for backwards compatibility) would be NACKed in upstream Linux. > > > > Paolo > > Right. And it turns out this whole approach is wrong. I intend to > revert this patch, I was considering to put this patch as the last one in this set, so that we could drop (or revert) it if it's turned out to be wrong (I had vague feeling that it was wrong). Luckily, this patch could be reverted successfully (I firstly thought there might be conflicts). Besides that, we need one extra manual fix: [yliu@yliu-dev ~/qemu]$ gg 'VHOST.*RESET_DEVICE' hw/virtio/vhost-user.c:195:case VHOST_USER_RESET_DEVICE: Michael, shall I send the revert request, or you will do that? > and also drop the patch sending VHOST_RESET_OWNER on > device stop. Something wrong there? --yliu > > > > Suggested-by: Michael S. Tsirkin > > > Signed-off-by: Yuanhan Liu > > > Reviewed-by: Michael S. Tsirkin > > > Signed-off-by: Michael S. 
Tsirkin > > > Signed-off-by: Yuanhan Liu > > > Reviewed-by: Marcel Apfelbaum > > > --- > > > linux-headers/linux/vhost.h | 2 +- > > > hw/net/vhost_net.c | 2 +- > > > hw/virtio/vhost-user.c | 6 +++--- > > > tests/vhost-user-test.c | 2 +- > > > docs/specs/vhost-user.txt | 4 ++-- > > > 5 files changed, 8 insertions(+), 8 deletions(-) > > > > > > diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h > > > index ead86db..14a0160 100644 > > > --- a/linux-headers/linux/vhost.h > > > +++ b/linux-headers/linux/vhost.h > > > @@ -78,7 +78,7 @@ struct vhost_memory { > > > #define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01) > > > /* Give up ownership, and reset the device to default values. > > > * Allows subsequent call to VHOST_OWNER_SET to succeed. */ > > > -#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02) > > > +#define VHOST_RESET_DEVICE _IO(VHOST_VIRTIO, 0x02) > > > > > > /* Set up/modify memory layout */ > > > #define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct > > > vhost_memory) > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > > > index 9d32d76..b7d29b7 100644 > > > --- a/hw/net/vhost_net.c > > > +++ b/hw/net/vhost_net.c > > > @@ -287,7 +287,7 @@ static void vhost_net_stop_one(struct vhost_net *net, > > > } else if (net->nc->info->type == > > > NET_CLIENT_OPTIONS_KIND_VHOST_USER) { > > > for (file.index = 0; file.index < net->dev.nvqs; ++file.index) { > > > const VhostOps *vhost_ops = net->dev.vhost_ops; > > > -int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER, > > > +int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_DEVICE, > > >NULL); > > > assert(r >= 0); > > > } > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > index 7fe35c6..9cb2f52 100644 > > > --- a/hw/virtio/vhost-user.c > > > +++ b/hw/virtio/vhost-user.c > > > @@ -32,7 +32,7 @@ typedef enum VhostUserRequest { > > > VHOST_USER_GET_FEATURES = 1, > > > VHOST_USER_SET_FEATURES = 2, > > > VHOST_USER_SET_OWNER = 3, > > > -VHOST_USER_RESET_OWNER = 
4, > > > +VHOST_USER_RESET_DEVICE = 4, > > > VHOST_USER_SET_MEM_TABLE = 5, > > > VHOST_USER_SET_LOG_BASE = 6, > > > VHOST_USER_SET_LOG_FD = 7, > > > @@ -98,7 +98,7 @@ static unsigned long int > > > ioctl_to_vhost_user_request[VHOST_USER_MAX] = { > > > VHOST_GET_FEATURES, /* VHOST_USER_GET_FEATURES */ > > > VHOST_SET_FEATURES, /* VHOST_USER_SET_FEATURES */ > > > VHOST_SET_OWNER,/* VHOST_USER_SET_OWNER */ > > > -VHOST_RESET_OWNER, /* VHOST_USER_RESET_OWNER */ > > > +VHOST_RESET_DEVICE, /* VHOST_USER_RES
[Qemu-devel] [PATCH v12 5/7] vhost: introduce vhost_backend_get_vq_index method
Subtracting the idx with the base(dev->vq_index) for vhost-kernel, and then adding it back for vhost-user doesn't seem right. Here introduces a new method vhost_backend_get_vq_index() for getting the right vq index for following vhost messages calls. Suggested-by: Jason Wang Signed-off-by: Yuanhan Liu Reviewed-by: Jason Wang Tested-by: Marcel Apfelbaum --- hw/virtio/vhost-backend.c | 10 +- hw/virtio/vhost-user.c| 12 ++-- hw/virtio/vhost.c | 15 ++- include/hw/virtio/vhost-backend.h | 2 ++ 4 files changed, 27 insertions(+), 12 deletions(-) diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c index 4d68a27..72d1392 100644 --- a/hw/virtio/vhost-backend.c +++ b/hw/virtio/vhost-backend.c @@ -42,11 +42,19 @@ static int vhost_kernel_cleanup(struct vhost_dev *dev) return close(fd); } +static int vhost_kernel_get_vq_index(struct vhost_dev *dev, int idx) +{ +assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs); + +return idx - dev->vq_index; +} + static const VhostOps kernel_ops = { .backend_type = VHOST_BACKEND_TYPE_KERNEL, .vhost_call = vhost_kernel_call, .vhost_backend_init = vhost_kernel_init, -.vhost_backend_cleanup = vhost_kernel_cleanup +.vhost_backend_cleanup = vhost_kernel_cleanup, +.vhost_backend_get_vq_index = vhost_kernel_get_vq_index, }; int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 694fde5..5018fd6 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -393,9 +393,17 @@ static int vhost_user_cleanup(struct vhost_dev *dev) return 0; } +static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx) +{ +assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs); + +return idx; +} + const VhostOps user_ops = { .backend_type = VHOST_BACKEND_TYPE_USER, .vhost_call = vhost_user_call, .vhost_backend_init = vhost_user_init, -.vhost_backend_cleanup = vhost_user_cleanup -}; +.vhost_backend_cleanup = vhost_user_cleanup, 
+.vhost_backend_get_vq_index = vhost_user_get_vq_index, +}; diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index a08c36b..7a7812d 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -719,7 +719,7 @@ static int vhost_virtqueue_start(struct vhost_dev *dev, { hwaddr s, l, a; int r; -int vhost_vq_index = idx - dev->vq_index; +int vhost_vq_index = dev->vhost_ops->vhost_backend_get_vq_index(dev, idx); struct vhost_vring_file file = { .index = vhost_vq_index }; @@ -728,7 +728,6 @@ static int vhost_virtqueue_start(struct vhost_dev *dev, }; struct VirtQueue *vvq = virtio_get_queue(vdev, idx); -assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs); vq->num = state.num = virtio_queue_get_num(vdev, idx); r = dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_NUM, &state); @@ -822,12 +821,12 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev, struct vhost_virtqueue *vq, unsigned idx) { -int vhost_vq_index = idx - dev->vq_index; +int vhost_vq_index = dev->vhost_ops->vhost_backend_get_vq_index(dev, idx); struct vhost_vring_state state = { .index = vhost_vq_index, }; int r; -assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs); + r = dev->vhost_ops->vhost_call(dev, VHOST_GET_VRING_BASE, &state); if (r < 0) { fprintf(stderr, "vhost VQ %d ring restore failed: %d\n", idx, r); @@ -1066,17 +1065,15 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int n, { struct VirtQueue *vvq = virtio_get_queue(vdev, n); int r, index = n - hdev->vq_index; +struct vhost_vring_file file; -assert(n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs); - -struct vhost_vring_file file = { -.index = index -}; if (mask) { file.fd = event_notifier_get_fd(&hdev->vqs[index].masked_notifier); } else { file.fd = event_notifier_get_fd(virtio_queue_get_guest_notifier(vvq)); } + +file.index = hdev->vhost_ops->vhost_backend_get_vq_index(hdev, n); r = hdev->vhost_ops->vhost_call(hdev, VHOST_SET_VRING_CALL, &file); assert(r >= 0); } diff --git 
a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h index e472f29..e1dfc6d 100644 --- a/include/hw/virtio/vhost-backend.h +++ b/include/hw/virtio/vhost-backend.h @@ -24,12 +24,14 @@ typedef int (*vhost_call)(struct vhost_dev *dev, unsigned long int request, void *arg);
[Qemu-devel] [PATCH v12 3/7] vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE
Quote from Michael:

> Hi Michael,
>
> I'm wondering why we need a payload here, as we don't do that for
> VHOST_SET_OWNER. I mean, stopping one or few queue pairs when a
> connect is about to be close doesn't make sense to me. Instead,
> we should clean up all queue pair when VHOST_RESET_OWNER message
> is received, right?

We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE.

http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg00050.html

To make it sound clear that it's a per-device message, instead of per-queue
or something else.

Suggested-by: Michael S. Tsirkin
Signed-off-by: Yuanhan Liu
Reviewed-by: Marcel Apfelbaum
Tested-by: Marcel Apfelbaum
---
 docs/specs/vhost-user.txt   | 4 ++--
 hw/net/vhost_net.c          | 2 +-
 hw/virtio/vhost-user.c      | 6 +++---
 linux-headers/linux/vhost.h | 2 +-
 tests/vhost-user-test.c     | 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index eec3251..af4b1b9 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -194,10 +194,10 @@ Message types
       as an owner of the session. This can be used on the Slave as a
       "session start" flag.
 
- * VHOST_USER_RESET_OWNER
+ * VHOST_USER_RESET_DEVICE
 
       Id: 4
-      Equivalent ioctl: VHOST_RESET_OWNER
+      Equivalent ioctl: VHOST_RESET_DEVICE
       Master payload: N/A
 
       Issued when a new connection is about to be closed. The Master will no
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 9d32d76..b7d29b7 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -287,7 +287,7 @@ static void vhost_net_stop_one(struct vhost_net *net,
     } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
         for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
             const VhostOps *vhost_ops = net->dev.vhost_ops;
-            int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER,
+            int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_DEVICE,
                                           NULL);
             assert(r >= 0);
         }
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 7fe35c6..9cb2f52 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -32,7 +32,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_GET_FEATURES = 1,
     VHOST_USER_SET_FEATURES = 2,
     VHOST_USER_SET_OWNER = 3,
-    VHOST_USER_RESET_OWNER = 4,
+    VHOST_USER_RESET_DEVICE = 4,
     VHOST_USER_SET_MEM_TABLE = 5,
     VHOST_USER_SET_LOG_BASE = 6,
     VHOST_USER_SET_LOG_FD = 7,
@@ -98,7 +98,7 @@ static unsigned long int ioctl_to_vhost_user_request[VHOST_USER_MAX] = {
     VHOST_GET_FEATURES,     /* VHOST_USER_GET_FEATURES */
     VHOST_SET_FEATURES,     /* VHOST_USER_SET_FEATURES */
     VHOST_SET_OWNER,        /* VHOST_USER_SET_OWNER */
-    VHOST_RESET_OWNER,      /* VHOST_USER_RESET_OWNER */
+    VHOST_RESET_DEVICE,     /* VHOST_USER_RESET_DEVICE */
     VHOST_SET_MEM_TABLE,    /* VHOST_USER_SET_MEM_TABLE */
     VHOST_SET_LOG_BASE,     /* VHOST_USER_SET_LOG_BASE */
     VHOST_SET_LOG_FD,       /* VHOST_USER_SET_LOG_FD */
@@ -222,7 +222,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
         break;
 
     case VHOST_USER_SET_OWNER:
-    case VHOST_USER_RESET_OWNER:
+    case VHOST_USER_RESET_DEVICE:
         break;
 
     case VHOST_USER_SET_MEM_TABLE:
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index ead86db..14a0160 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -78,7 +78,7 @@ struct vhost_memory {
 #define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
 /* Give up ownership, and reset the device to default values.
  * Allows subsequent call to VHOST_OWNER_SET to succeed. */
-#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
+#define VHOST_RESET_DEVICE _IO(VHOST_VIRTIO, 0x02)
 
 /* Set up/modify memory layout */
 #define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index 75fedf0..e301db7 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -58,7 +58,7 @@ typedef enum VhostUserRequest {
    VHOST_USER_GET_FEATURES = 1,
    VHOST_USER_SET_FEATURES = 2,
    VHOST_USER_SET_OWNER = 3,
-   VHOST_USER_RESET_OWNER = 4,
+   VHOST_USER_RESET_DEVICE = 4,
    VHOST_USER_SET_MEM_TABLE = 5,
    VHOST_USER_SET_LOG_BASE = 6,
    VHOST_USER_SET_LOG_FD = 7,
-- 
1.9.0
[Qemu-devel] [PATCH v12 7/7] vhost-user: add a new message to disable/enable a specific virt queue.
From: Changchun Ouyang Add a new message, VHOST_USER_SET_VRING_ENABLE, to enable or disable a specific virt queue, which is similar to attach/detach queue for tap device. virtio driver on guest doesn't have to use max virt queue pair, it could enable any number of virt queue ranging from 1 to max virt queue pair. Signed-off-by: Changchun Ouyang Signed-off-by: Yuanhan Liu Reviewed-by: Jason Wang Tested-by: Marcel Apfelbaum --- docs/specs/vhost-user.txt | 12 +++- hw/net/vhost_net.c| 18 ++ hw/net/virtio-net.c | 8 hw/virtio/vhost-user.c| 19 +++ include/hw/virtio/vhost-backend.h | 2 ++ include/net/vhost_net.h | 2 ++ 6 files changed, 60 insertions(+), 1 deletion(-) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 477edd1..904a6f2 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -149,7 +149,9 @@ VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the number of requested queues is bigger than that. As all queues share one connection, the master uses a unique index for each -queue in the sent message to identify a specified queue. +queue in the sent message to identify a specified queue. One queue pair +is enabled initially. More queues are enabled dynamically, by sending +message VHOST_USER_SET_VRING_ENABLE. Message types - @@ -328,3 +330,11 @@ Message types Query how many queues the backend supports. This request should be sent only when VHOST_USER_PROTOCOL_F_MQ is set in quried protocol features by VHOST_USER_GET_PROTOCOL_FEATURES. + + * VHOST_USER_SET_VRING_ENABLE + + Id: 18 + Equivalent ioctl: N/A + Master payload: vring state description + + Signal slave to enable or disable corresponding vring. 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index e2cb40b..1dcac1d 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -422,6 +422,19 @@ VHostNetState *get_vhost_net(NetClientState *nc) return vhost_net; } + +int vhost_set_vring_enable(NetClientState *nc, int enable) +{ +VHostNetState *net = get_vhost_net(nc); +const VhostOps *vhost_ops = net->dev.vhost_ops; + +if (vhost_ops->vhost_backend_set_vring_enable) { +return vhost_ops->vhost_backend_set_vring_enable(&net->dev, enable); +} + +return 0; +} + #else uint64_t vhost_net_get_max_queues(VHostNetState *net) { @@ -472,4 +485,9 @@ VHostNetState *get_vhost_net(NetClientState *nc) { return 0; } + +int vhost_set_vring_enable(NetClientState *nc, int enable) +{ +return 0; +} #endif diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index f72eebf..916222f 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -406,6 +406,10 @@ static int peer_attach(VirtIONet *n, int index) return 0; } +if (nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) { +vhost_set_vring_enable(nc->peer, 1); +} + if (nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) { return 0; } @@ -421,6 +425,10 @@ static int peer_detach(VirtIONet *n, int index) return 0; } +if (nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) { +vhost_set_vring_enable(nc->peer, 0); +} + if (nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) { return 0; } diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index d6850d5..aca10a2 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -48,6 +48,7 @@ typedef enum VhostUserRequest { VHOST_USER_GET_PROTOCOL_FEATURES = 15, VHOST_USER_SET_PROTOCOL_FEATURES = 16, VHOST_USER_GET_QUEUE_NUM = 17, +VHOST_USER_SET_VRING_ENABLE = 18, VHOST_USER_MAX } VhostUserRequest; @@ -290,6 +291,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request, case VHOST_USER_SET_VRING_NUM: case VHOST_USER_SET_VRING_BASE: +case VHOST_USER_SET_VRING_ENABLE: 
memcpy(&msg.state, arg, sizeof(struct vhost_vring_state)); msg.size = sizeof(m.state); break; @@ -406,6 +408,22 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque) return 0; } +static int vhost_user_set_vring_enable(struct vhost_dev *dev, int enable) +{ +struct vhost_vring_state state = { +.index = dev->vq_index, +.num = enable, +}; + +assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER); + +if (!(dev->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_MQ))) { +return -1; +} + +return vhost_user_call(dev, VHOST_USER_SET_VRING_ENABLE, &state); +} + static int vhost_user_cleanup(struct vhost_dev *dev) { assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER); @@ -428,4 +446,5 @@ const VhostOps user_ops = {
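As a rough illustration of how a slave might act on VHOST_USER_SET_VRING_ENABLE, here is a minimal, self-contained sketch. The struct and handler names are hypothetical; only the index/num payload layout mirrors the vring state description used by the patch (where num is 1 to enable and 0 to disable):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the vring-state payload carried by
 * VHOST_USER_SET_VRING_ENABLE (fields mirror struct vhost_vring_state). */
struct vring_state {
    unsigned int index;  /* which vring the message targets */
    unsigned int num;    /* 1 = enable, 0 = disable */
};

#define MAX_VRINGS 8

/* Toy slave-side state: one enabled flag per vring. */
static bool vring_enabled[MAX_VRINGS];

/* Sketch of how a slave could apply the message: validate the index,
 * then flip the per-vring flag. */
static int handle_set_vring_enable(const struct vring_state *state)
{
    if (state->index >= MAX_VRINGS) {
        return -1;  /* reject out-of-range vring indexes */
    }
    vring_enabled[state->index] = (state->num != 0);
    return 0;
}
```

A real slave would additionally stop polling/processing a disabled ring; this sketch only tracks the flag.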
[Qemu-devel] [PATCH v12 4/7] vhost-user: add VHOST_USER_GET_QUEUE_NUM message
This is for querying how many queues the backend supports if it has MQ support (when the VHOST_USER_PROTOCOL_F_MQ flag is set in the queried protocol features). vhost_net_get_max_queues() is the interface to export that value, and to tell whether the backend supports the # of queues the user requested, which is done in the following patch. Signed-off-by: Yuanhan Liu Reviewed-by: Marcel Apfelbaum Tested-by: Marcel Apfelbaum --- v12: mark VHOST_USER_GET_QUEUE_NUM as a message that needs a reply. --- docs/specs/vhost-user.txt | 12 hw/net/vhost_net.c| 12 hw/virtio/vhost-user.c| 15 ++- include/hw/virtio/vhost.h | 1 + include/net/vhost_net.h | 1 + 5 files changed, 40 insertions(+), 1 deletion(-) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index af4b1b9..5346df2 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -115,6 +115,7 @@ the ones that do: * VHOST_GET_FEATURES * VHOST_GET_PROTOCOL_FEATURES * VHOST_GET_VRING_BASE + * VHOST_USER_GET_QUEUE_NUM There are several messages that the master sends with file descriptors passed in the ancillary data: @@ -301,3 +302,14 @@ Message types Bits (0-7) of the payload contain the vring index. Bit 8 is the invalid FD flag. This flag is set when there is no file descriptor in the ancillary data. + + * VHOST_USER_GET_QUEUE_NUM + + Id: 17 + Equivalent ioctl: N/A + Master payload: N/A + Slave payload: u64 + + Query how many queues the backend supports. This request should be + sent only when VHOST_USER_PROTOCOL_F_MQ is set in queried protocol + features by VHOST_USER_GET_PROTOCOL_FEATURES. 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index b7d29b7..f663e5a 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -122,6 +122,11 @@ void vhost_net_ack_features(struct vhost_net *net, uint64_t features) vhost_ack_features(&net->dev, vhost_net_get_feature_bits(net), features); } +uint64_t vhost_net_get_max_queues(VHostNetState *net) +{ +return net->dev.max_queues; +} + static int vhost_net_get_fd(NetClientState *backend) { switch (backend->info->type) { @@ -144,6 +149,8 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options) goto fail; } +net->dev.max_queues = 1; + if (backend_kernel) { r = vhost_net_get_fd(options->net_backend); if (r < 0) { @@ -414,6 +421,11 @@ VHostNetState *get_vhost_net(NetClientState *nc) return vhost_net; } #else +uint64_t vhost_net_get_max_queues(VHostNetState *net) +{ +return 1; +} + struct vhost_net *vhost_net_init(VhostNetOptions *options) { error_report("vhost-net support is not compiled in"); diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 9cb2f52..694fde5 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -25,7 +25,9 @@ #define VHOST_MEMORY_MAX_NREGIONS8 #define VHOST_USER_F_PROTOCOL_FEATURES 30 -#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x0ULL +#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x1ULL + +#define VHOST_USER_PROTOCOL_F_MQ0 typedef enum VhostUserRequest { VHOST_USER_NONE = 0, @@ -45,6 +47,7 @@ typedef enum VhostUserRequest { VHOST_USER_SET_VRING_ERR = 14, VHOST_USER_GET_PROTOCOL_FEATURES = 15, VHOST_USER_SET_PROTOCOL_FEATURES = 16, +VHOST_USER_GET_QUEUE_NUM = 17, VHOST_USER_MAX } VhostUserRequest; @@ -211,6 +214,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request, switch (msg_request) { case VHOST_USER_GET_FEATURES: case VHOST_USER_GET_PROTOCOL_FEATURES: +case VHOST_USER_GET_QUEUE_NUM: need_reply = 1; break; @@ -315,6 +319,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request, switch (msg_request) { case 
VHOST_USER_GET_FEATURES: case VHOST_USER_GET_PROTOCOL_FEATURES: +case VHOST_USER_GET_QUEUE_NUM: if (msg.size != sizeof(m.u64)) { error_report("Received bad msg size."); return -1; @@ -366,6 +371,14 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque) if (err < 0) { return err; } + +/* query the max queues we support if backend supports Multiple Queue */ +if (dev->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_MQ)) { +err = vhost_user_call(dev, VHOST_USER_GET_QUEUE_NUM, &dev->max_queues); +if (err < 0) { +return err; +} +} } return 0; diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index 6467c73..c3758f3 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -48,6 +48,7 @@ struct vhost_dev { unsigned long long acked_features; unsigned long long backend_features; unsigned long long p
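The gating logic added to vhost_user_init() above — query the queue count only when the backend advertised the MQ protocol feature, otherwise assume a single queue — can be sketched in isolation like this. fake_backend_queue_num() is an invented stand-in for the VHOST_USER_GET_QUEUE_NUM round trip:

```c
#include <assert.h>
#include <stdint.h>

#define VHOST_USER_PROTOCOL_F_MQ 0

/* Hypothetical transport stub: in QEMU this would be a vhost_user_call()
 * sending VHOST_USER_GET_QUEUE_NUM and reading a u64 reply back. */
static uint64_t fake_backend_queue_num(void)
{
    return 4;  /* pretend the backend supports 4 queues */
}

/* Sketch of the master-side decision: only ask the backend for its max
 * queue count when the MQ protocol feature bit was negotiated. */
static uint64_t query_max_queues(uint64_t protocol_features)
{
    if (protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_MQ)) {
        return fake_backend_queue_num();
    }
    return 1;  /* no MQ support: a single queue pair */
}
```

The value returned here corresponds to dev.max_queues in the patch, which vhost_net_get_max_queues() later exposes to the queue-count check.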
[Qemu-devel] [PATCH v12 2/7] vhost-user: add protocol feature negotiation
From: "Michael S. Tsirkin" Support a separate bitmask for vhost-user protocol features, and messages to get/set protocol features. Invoke them at init. No features are defined yet. [ leverage vhost_user_call for request handling -- Yuanhan Liu ] Signed-off-by: Michael S. Tsirkin Signed-off-by: Yuanhan Liu Reviewed-by: Marcel Apfelbaum Tested-by: Marcel Apfelbaum --- docs/specs/vhost-user.txt | 37 + hw/net/vhost_net.c| 2 ++ hw/virtio/vhost-user.c| 31 +++ include/hw/virtio/vhost.h | 1 + 4 files changed, 71 insertions(+) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 650bb18..eec3251 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -113,6 +113,7 @@ message replies. Most of the requests don't require replies. Here is a list of the ones that do: * VHOST_GET_FEATURES + * VHOST_GET_PROTOCOL_FEATURES * VHOST_GET_VRING_BASE There are several messages that the master sends with file descriptors passed @@ -127,6 +128,13 @@ in the ancillary data: If Master is unable to send the full message or receives a wrong reply it will close the connection. An optional reconnection mechanism can be implemented. +Any protocol extensions are gated by protocol feature bits, +which allows full backwards compatibility on both master +and slave. +As older slaves don't support negotiating protocol features, +a feature bit was dedicated for this purpose: +#define VHOST_USER_F_PROTOCOL_FEATURES 30 + Message types - @@ -138,6 +146,8 @@ Message types Slave payload: u64 Get from the underlying vhost implementation the features bitmask. + Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for + VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES. * VHOST_USER_SET_FEATURES @@ -146,6 +156,33 @@ Message types Master payload: u64 Enable features in the underlying vhost implementation using a bitmask. 
+ Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for + VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES. + + * VHOST_USER_GET_PROTOCOL_FEATURES + + Id: 15 + Equivalent ioctl: N/A + Master payload: N/A + Slave payload: u64 + + Get the protocol feature bitmask from the underlying vhost implementation. + Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in + VHOST_USER_GET_FEATURES. + Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support + this message even before VHOST_USER_SET_FEATURES was called. + + * VHOST_USER_SET_PROTOCOL_FEATURES + + Id: 16 + Equivalent Ioctl: N/A + Master payload: u64 + + Enable protocol features in the underlying vhost implementation. + Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in + VHOST_USER_GET_FEATURES. + Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support + this message even before VHOST_USER_SET_FEATURES was called. * VHOST_USER_SET_OWNER diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index 1d76b94..9d32d76 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -152,8 +152,10 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options) net->dev.backend_features = qemu_has_vnet_hdr(options->net_backend) ? 
0 : (1ULL << VHOST_NET_F_VIRTIO_NET_HDR); net->backend = r; +net->dev.protocol_features = 0; } else { net->dev.backend_features = 0; +net->dev.protocol_features = 0; net->backend = -1; } net->nc = options->net_backend; diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 13677ac..7fe35c6 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -24,6 +24,8 @@ #include #define VHOST_MEMORY_MAX_NREGIONS8 +#define VHOST_USER_F_PROTOCOL_FEATURES 30 +#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x0ULL typedef enum VhostUserRequest { VHOST_USER_NONE = 0, @@ -41,6 +43,8 @@ typedef enum VhostUserRequest { VHOST_USER_SET_VRING_KICK = 12, VHOST_USER_SET_VRING_CALL = 13, VHOST_USER_SET_VRING_ERR = 14, +VHOST_USER_GET_PROTOCOL_FEATURES = 15, +VHOST_USER_SET_PROTOCOL_FEATURES = 16, VHOST_USER_MAX } VhostUserRequest; @@ -206,11 +210,13 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request, switch (msg_request) { case VHOST_USER_GET_FEATURES: +case VHOST_USER_GET_PROTOCOL_FEATURES: need_reply = 1; break; case VHOST_USER_SET_FEATURES: case VHOST_USER_SET_LOG_BASE: +case VHOST_USER_SET_PROTOCOL_FEATURES: msg.u64 = *((__u64 *) arg); msg.size = sizeof(m.u64); break; @@ -308,6 +314,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int req
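Put together, the handshake this patch implements — check feature bit 30 from GET_FEATURES, fetch the slave's protocol features, mask them to the bits the master understands, and send the result back with SET_PROTOCOL_FEATURES — might look like this stripped-down sketch. The function and parameter names are invented for illustration; the mask is 0, matching the patch ("No features are defined yet"):

```c
#include <assert.h>
#include <stdint.h>

#define VHOST_USER_F_PROTOCOL_FEATURES 30
/* Bits this master understands; none are defined yet in this patch. */
#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x0ULL

/* Sketch of the init-time negotiation.  slave_features stands in for the
 * GET_FEATURES reply, slave_protocol_features for GET_PROTOCOL_FEATURES;
 * *out_set_msg receives the payload that would be sent back via
 * SET_PROTOCOL_FEATURES.  Returns the negotiated protocol features. */
static uint64_t negotiate_protocol_features(uint64_t slave_features,
                                            uint64_t slave_protocol_features,
                                            uint64_t *out_set_msg)
{
    if (!(slave_features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
        /* Old slave: protocol-feature messages must not be sent at all. */
        *out_set_msg = 0;
        return 0;
    }
    uint64_t negotiated = slave_protocol_features &
                          VHOST_USER_PROTOCOL_FEATURE_MASK;
    *out_set_msg = negotiated;
    return negotiated;
}
```

With the mask at 0 the negotiated set is always empty; later patches in the series widen the mask (e.g. to include the MQ bit).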
[Qemu-devel] [PATCH v12 6/7] vhost-user: add multiple queue support
From: Changchun Ouyang This patch is initially based on a patch from Nikolay Nikolaev. It adds vhost-user multiple queue support, by creating a nc and vhost_net pair for each queue. QEMU exits if it finds that the backend can't support the number of requested queues (given by the queues=# option). The max number is queried by a new message, VHOST_USER_GET_QUEUE_NUM, which is sent only when the protocol feature VHOST_USER_PROTOCOL_F_MQ is present. The max queue check is done at the vhost-user initiation stage. We initiate one queue first, which, in the meantime, also gets the max_queues the backend supports. In older versions, it was reported that some messages were sent more times than necessary. Here we reached an agreement with Michael that we could categorize vhost-user messages into 2 types: dev-specific messages, which should be sent only once, and vring-specific messages, which should be sent per queue. Here I introduced a helper function vhost_user_dev_request(), which lists the following messages as dev-specific messages: VHOST_USER_SET_OWNER VHOST_USER_RESET_DEVICE VHOST_USER_SET_MEM_TABLE VHOST_USER_GET_QUEUE_NUM For the above messages, we simply send them once and ignore any later attempts. Signed-off-by: Nikolay Nikolaev Signed-off-by: Changchun Ouyang Signed-off-by: Yuanhan Liu Reviewed-by: Jason Wang Tested-by: Marcel Apfelbaum --- v12: rename vhost_user_one_time_request() to vhost_user_dev_request(). use vhost_net_set_vq_index() instead of direct assignment. 
v11: invoke qmp_set_link() directly --- docs/specs/vhost-user.txt | 15 + hw/net/vhost_net.c| 20 --- hw/virtio/vhost-user.c| 22 hw/virtio/vhost.c | 5 +- net/vhost-user.c | 141 ++ qapi-schema.json | 6 +- qemu-options.hx | 5 +- 7 files changed, 152 insertions(+), 62 deletions(-) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 5346df2..477edd1 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -136,6 +136,21 @@ As older slaves don't support negotiating protocol features, a feature bit was dedicated for this purpose: #define VHOST_USER_F_PROTOCOL_FEATURES 30 +Multiple queue support +-- + +Multiple queue is treated as a protocol extension, hence the slave has to +implement protocol features first. The multiple queues feature is supported +only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set: +#define VHOST_USER_PROTOCOL_F_MQ0 + +The max number of queues the slave supports can be queried with message +VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the number of +requested queues is bigger than that. + +As all queues share one connection, the master uses a unique index for each +queue in the sent message to identify a specified queue. 
+ Message types - diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index f663e5a..e2cb40b 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -138,6 +138,11 @@ static int vhost_net_get_fd(NetClientState *backend) } } +static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index) +{ +net->dev.vq_index = vq_index; +} + struct vhost_net *vhost_net_init(VhostNetOptions *options) { int r; @@ -148,8 +153,11 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options) fprintf(stderr, "vhost-net requires net backend to be setup\n"); goto fail; } +net->nc = options->net_backend; net->dev.max_queues = 1; +net->dev.nvqs = 2; +net->dev.vqs = net->vqs; if (backend_kernel) { r = vhost_net_get_fd(options->net_backend); @@ -164,11 +172,10 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options) net->dev.backend_features = 0; net->dev.protocol_features = 0; net->backend = -1; -} -net->nc = options->net_backend; -net->dev.nvqs = 2; -net->dev.vqs = net->vqs; +/* vhost-user needs vq_index to initiate a specific queue pair */ +vhost_net_set_vq_index(net, net->nc->queue_index * net->dev.nvqs); +} r = vhost_dev_init(&net->dev, options->opaque, options->backend_type); @@ -196,11 +203,6 @@ fail: return NULL; } -static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index) -{ -net->dev.vq_index = vq_index; -} - static int vhost_net_set_vnet_endian(VirtIODevice *dev, NetClientState *peer, bool set) { diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 5018fd6..d6850d5 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -187,6 +187,19 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg, 0 : -1; } +static bool vhost_user_dev_request(V
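The dev-specific vs. vring-specific split described in this patch's commit message can be sketched as follows. The enum values mirror the series; should_skip() is a hypothetical condensation of the check added to vhost_user_call(), where vq_index == 0 identifies the first queue pair:

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the vhost-user request ids relevant here. */
typedef enum {
    VHOST_USER_SET_OWNER = 3,
    VHOST_USER_RESET_DEVICE = 4,
    VHOST_USER_SET_MEM_TABLE = 5,
    VHOST_USER_SET_VRING_NUM = 8,
    VHOST_USER_GET_QUEUE_NUM = 17,
} VhostUserRequest;

/* Mirrors the patch's vhost_user_dev_request(): device-wide messages
 * are sent once; everything else is per-vring. */
static bool vhost_user_dev_request(VhostUserRequest request)
{
    switch (request) {
    case VHOST_USER_SET_OWNER:
    case VHOST_USER_RESET_DEVICE:
    case VHOST_USER_SET_MEM_TABLE:
    case VHOST_USER_GET_QUEUE_NUM:
        return true;
    default:
        return false;
    }
}

/* The send path skips device-wide requests for every queue pair except
 * the first one, so each is delivered exactly once per device. */
static bool should_skip(VhostUserRequest request, int vq_index)
{
    return vhost_user_dev_request(request) && vq_index != 0;
}
```

This is what prevents, e.g., VHOST_USER_SET_MEM_TABLE from being re-sent N times when N queue pairs are initialized.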
[Qemu-devel] [PATCH v12 0/7] vhost-user multiple queue support
Hi, Here is the updated patch set for enabling vhost-user multiple queue. It works well in my tests, and it also passes Marcel's testing (thanks to him for testing!). --- There are only a few minor changes compared with the last version, such as typo fixes, a function rename, and so on. The reason I made another formal version, after addressing all comments I got so far, is that I hope Michael could consider merging this patch set. And if it's still not ready, Michael, please kindly tell me so that I can fix it. This patch set introduces 2 more vhost-user messages: VHOST_USER_GET_QUEUE_NUM, for querying how many queues the backend supports, and VHOST_USER_SET_VRING_ENABLE, for enabling/disabling a specific virt queue. Both of the two new messages are treated as vhost protocol extensions, and that's why Michael's patch "vhost-user: add protocol feature negotiation" is also included here. Patches 1-5 are all preparatory work for actually enabling multiple queue. Patch 6 is the major patch for enabling multiple queue; it also tries to address two major concerns from Michael: no feedback from the backend if it can't support the # of requested queues, and all messages being sent N times. It also fixes a hidden bug. Patch 7 introduces the VHOST_USER_SET_VRING_ENABLE message, to enable or disable a specific vring. v12: - typo fixes, function rename, and so on. v11: - typo fixes pointed out by Eric - define a dummy vhost_net_get_max_queues() when !CONFIG_VHOST_NET. - As suggested by Jason Wang, invoke qmp_set_link() directly to remove duplicate code. v10: - typo fixes pointed out by Eric - [PATCH 6]: don't treat VHOST_USER_SET/GET_[PROTOCOL]_FEATURES as one-time requests, as the two feature bits need to be stored per-device. v9: - As suggested by Jason Wang, patch 5 introduces a new vhost backend method: vhost_backend_get_vq_index(). - Use qemu_find_net_clients_except() at net/vhost-user.c for gathering all related ncs so that we could register the chr dev event handler once. 
Which is also suggested by Jason Wang. Thanks. --yliu --- Changchun Ouyang (2): vhost-user: add multiple queue support vhost-user: add a new message to disable/enable a specific virt queue. Michael S. Tsirkin (1): vhost-user: add protocol feature negotiation Yuanhan Liu (4): vhost-user: use VHOST_USER_XXX macro for switch statement vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE vhost-user: add VHOST_USER_GET_QUEUE_NUM message vhost: introduce vhost_backend_get_vq_index method docs/specs/vhost-user.txt | 78 - hw/net/vhost_net.c| 54 --- hw/net/virtio-net.c | 8 +++ hw/virtio/vhost-backend.c | 10 ++- hw/virtio/vhost-user.c| 139 +++-- hw/virtio/vhost.c | 20 +++--- include/hw/virtio/vhost-backend.h | 4 ++ include/hw/virtio/vhost.h | 2 + include/net/vhost_net.h | 3 + linux-headers/linux/vhost.h | 2 +- net/vhost-user.c | 141 +- qapi-schema.json | 6 +- qemu-options.hx | 5 +- tests/vhost-user-test.c | 2 +- 14 files changed, 377 insertions(+), 97 deletions(-) -- 1.9.0
[Qemu-devel] [PATCH v12 1/7] vhost-user: use VHOST_USER_XXX macro for switch statement
This lets vhost_user_call handle extended requests, such as VHOST_USER_GET/SET_PROTOCOL_FEATURES, instead of invoking vhost_user_read/write and constructing the msg again by ourselves. Signed-off-by: Yuanhan Liu Reviewed-by: Marcel Apfelbaum Tested-by: Marcel Apfelbaum --- hw/virtio/vhost-user.c | 38 ++ 1 file changed, 22 insertions(+), 16 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index e7ab829..13677ac 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -193,27 +193,33 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request, assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER); -msg_request = vhost_user_request_translate(request); +/* only translate vhost ioctl requests */ +if (request > VHOST_USER_MAX) { +msg_request = vhost_user_request_translate(request); +} else { +msg_request = request; +} + msg.request = msg_request; msg.flags = VHOST_USER_VERSION; msg.size = 0; -switch (request) { -case VHOST_GET_FEATURES: +switch (msg_request) { +case VHOST_USER_GET_FEATURES: need_reply = 1; break; -case VHOST_SET_FEATURES: -case VHOST_SET_LOG_BASE: +case VHOST_USER_SET_FEATURES: +case VHOST_USER_SET_LOG_BASE: msg.u64 = *((__u64 *) arg); msg.size = sizeof(m.u64); break; -case VHOST_SET_OWNER: -case VHOST_RESET_OWNER: +case VHOST_USER_SET_OWNER: +case VHOST_USER_RESET_OWNER: break; -case VHOST_SET_MEM_TABLE: +case VHOST_USER_SET_MEM_TABLE: for (i = 0; i < dev->mem->nregions; ++i) { struct vhost_memory_region *reg = dev->mem->regions + i; ram_addr_t ram_addr; @@ -246,30 +252,30 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request, break; -case VHOST_SET_LOG_FD: +case VHOST_USER_SET_LOG_FD: fds[fd_num++] = *((int *) arg); break; -case VHOST_SET_VRING_NUM: -case VHOST_SET_VRING_BASE: +case VHOST_USER_SET_VRING_NUM: +case VHOST_USER_SET_VRING_BASE: memcpy(&msg.state, arg, sizeof(struct vhost_vring_state)); msg.size = sizeof(m.state); break; -case 
VHOST_GET_VRING_BASE: +case VHOST_USER_GET_VRING_BASE: memcpy(&msg.state, arg, sizeof(struct vhost_vring_state)); msg.size = sizeof(m.state); need_reply = 1; break; -case VHOST_SET_VRING_ADDR: +case VHOST_USER_SET_VRING_ADDR: memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr)); msg.size = sizeof(m.addr); break; -case VHOST_SET_VRING_KICK: -case VHOST_SET_VRING_CALL: -case VHOST_SET_VRING_ERR: +case VHOST_USER_SET_VRING_KICK: +case VHOST_USER_SET_VRING_CALL: +case VHOST_USER_SET_VRING_ERR: file = arg; msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK; msg.size = sizeof(m.u64); -- 1.9.0
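The guard added at the top of vhost_user_call() — translate only kernel vhost ioctl codes, and pass values that are already vhost-user message ids through untouched — reduces to something like the sketch below. translate() is a dummy stand-in for vhost_user_request_translate(); real ioctl codes are large _IO()/_IOW() encodings, far above VHOST_USER_MAX:

```c
#include <assert.h>

#define VHOST_USER_MAX 18   /* one past the last vhost-user request id */

/* Dummy stand-in for vhost_user_request_translate(): maps a kernel vhost
 * ioctl code onto its vhost-user message id (a lookup table in reality). */
static unsigned long translate(unsigned long ioctl_code)
{
    (void)ioctl_code;
    return 1;  /* pretend every ioctl maps to VHOST_USER_GET_FEATURES */
}

/* Mirrors the patch's guard: small values are already vhost-user message
 * ids and must not be translated a second time. */
static unsigned long to_msg_request(unsigned long request)
{
    return request > VHOST_USER_MAX ? translate(request) : request;
}
```

This is what allows later patches to call vhost_user_call() directly with VHOST_USER_GET/SET_PROTOCOL_FEATURES ids instead of hand-building messages.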
Re: [Qemu-devel] [PATCH v11 2/7] vhost-user: add protocol feature negotiation
On Thu, Sep 24, 2015 at 01:13:24PM +0300, Marcel Apfelbaum wrote: > >diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > >index 1d76b94..9d32d76 100644 > >--- a/hw/net/vhost_net.c > >+++ b/hw/net/vhost_net.c > >@@ -152,8 +152,10 @@ struct vhost_net *vhost_net_init(VhostNetOptions > >*options) > > net->dev.backend_features = qemu_has_vnet_hdr(options->net_backend) > > ? 0 : (1ULL << VHOST_NET_F_VIRTIO_NET_HDR); > > net->backend = r; > >+net->dev.protocol_features = 0; > > } else { > > net->dev.backend_features = 0; > >+net->dev.protocol_features = 0; > > net->backend = -1; > > } > > Maybe protocol_features assignment should be outside the if clause. > (assigned to 0 in both cases) Yeah, we could do that. However, it seems it would take more effort, handling patch conflicts while rebasing, than it's worth. Therefore I will keep it. --yliu
Re: [Qemu-devel] [PATCH v11 2/7] vhost-user: add protocol feature negotiation
On Thu, Sep 24, 2015 at 01:13:24PM +0300, Marcel Apfelbaum wrote: > On 09/23/2015 07:19 AM, Yuanhan Liu wrote: > >From: "Michael S. Tsirkin" > > > >Support a separate bitmask for vhost-user protocol features, > >and messages to get/set protocol features. > > > >Invoke them at init. > > > >No features are defined yet. > > > >[ leverage vhost_user_call for request handling -- Yuanhan Liu ] > > > >Signed-off-by: Michael S. Tsirkin > >Signed-off-by: Yuanhan Liu > >--- > > docs/specs/vhost-user.txt | 37 + > > hw/net/vhost_net.c| 2 ++ > > hw/virtio/vhost-user.c| 31 +++ > > include/hw/virtio/vhost.h | 1 + > > 4 files changed, 71 insertions(+) > > > >diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > >index 650bb18..70da3b1 100644 > >--- a/docs/specs/vhost-user.txt > >+++ b/docs/specs/vhost-user.txt > >@@ -113,6 +113,7 @@ message replies. Most of the requests don't require > >replies. Here is a list of > > the ones that do: > > > > * VHOST_GET_FEATURES > >+ * VHOST_GET_PROTOCOL_FEATURES > > * VHOST_GET_VRING_BASE > > > > There are several messages that the master sends with file descriptors > > passed > >@@ -127,6 +128,13 @@ in the ancillary data: > > If Master is unable to send the full message or receives a wrong reply it > > will > > close the connection. An optional reconnection mechanism can be > > implemented. > > > >+Any protocol extensions are gated by protocol feature bits, > >+which allows full backwards compatibility on both master > >+and slave. > >+As older slaves don't support negotiating protocol features, > >+a feature bit was dedicated for this purpose: > >+#define VHOST_USER_F_PROTOCOL_FEATURES 30 > >+ > > Message types > > - > > > >@@ -138,6 +146,8 @@ Message types > >Slave payload: u64 > > > >Get from the underlying vhost implementation the features bitmask. > >+ Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for > >+ VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES. 
> > > > * VHOST_USER_SET_FEATURES > > > >@@ -146,6 +156,33 @@ Message types > >Master payload: u64 > > > >Enable features in the underlying vhost implementation using a > > bitmask. > >+ Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for > >+ VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES. > >+ > >+ * VHOST_USER_GET_PROTOCOL_FEATURES > >+ > >+ Id: 15 > >+ Equivalent ioctl: VHOST_GET_FEATURES > > VHOST_USER_GET_PROTOCOL_FEATURES does not have an equivalent ioctl Oops, I didn't notice that. Will fix it soon, and thank you for the review! --yliu > > >+ Master payload: N/A > >+ Slave payload: u64 > >+ > >+ Get the protocol feature bitmask from the underlying vhost > >implementation. > >+ Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in > >+ VHOST_USER_GET_FEATURES. > >+ Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support > >+ this message even before VHOST_USER_SET_FEATURES was called. > >+ > >+ * VHOST_USER_SET_PROTOCOL_FEATURES > >+ > >+ Id: 16 > >+ Ioctl: VHOST_SET_FEATURES > > Same here > > >+ Master payload: u64 > >+ > >+ Enable protocol features in the underlying vhost implementation. > >+ Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in > >+ VHOST_USER_GET_FEATURES. > >+ Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support > >+ this message even before VHOST_USER_SET_FEATURES was called. > > > > * VHOST_USER_SET_OWNER > > > >diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > >index 1d76b94..9d32d76 100644 > >--- a/hw/net/vhost_net.c > >+++ b/hw/net/vhost_net.c > >@@ -152,8 +152,10 @@ struct vhost_net *vhost_net_init(VhostNetOptions > >*options) > > net->dev.backend_features = qemu_has_vnet_hdr(options->net_backend) > > ? 0 : (1ULL << VHOST_NET_F_VIRTIO_NET_HDR); > > net->backend = r; > >+net->dev.protocol_features = 0; > > } else { > > net->dev.backend_features = 0; > >+net->dev.protocol_features = 0; > >
Re: [Qemu-devel] [PATCH v11 6/7] vhost-user: add multiple queue support
On Thu, Sep 24, 2015 at 01:34:31PM +0800, Jason Wang wrote: > > > Some nitpicks and comments. If you plan to send another version, please > consider to fix them. I will, but I'd like to hold a while before getting more comments from Michael; I don't want to repost a whole new version too often for minor changes. BTW, Michael, is this patchset ready for merge? If not, please kindly tell me what is wrong so that I can fix it. TBH, I'd really like to see it merged soon, as I'm going to be on holiday for two weeks starting from this Saturday. I can still reply to emails, and even make patches during that time, but I guess I'll only have limited time for that. > > Reviewed-by: Jason Wang Thank you! > > Btw, it's better to extend current vhost-user unit test to support > multiqueue. Please consider this on top this series. Yeah, I will. But, may I do that later? (And if possible, after the merge?) > > > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt > > index 43db9b4..cfc9d41 100644 > > --- a/docs/specs/vhost-user.txt > > +++ b/docs/specs/vhost-user.txt > > @@ -135,6 +135,21 @@ As older slaves don't support negotiating protocol > > features, > > a feature bit was dedicated for this purpose: > > #define VHOST_USER_F_PROTOCOL_FEATURES 30 > > > > +Multiple queue support > > +-- > > + > > +Multiple queue is treated as a protocol extension, hence the slave has to > > +implement protocol features first. The multiple queues feature is supported > > +only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set: > > +#define VHOST_USER_PROTOCOL_F_MQ0 > > + > > +The max number of queues the slave supports can be queried with message > > +VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the number of > > +requested queues is bigger than that. > > + > > +As all queues share one connection, the master uses a unique index for each > > +queue in the sent message to identify a specified queue. 
> > + > > Message types > > - > > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > > index f663e5a..ad82b5c 100644 > > --- a/hw/net/vhost_net.c > > +++ b/hw/net/vhost_net.c > > @@ -148,8 +148,11 @@ struct vhost_net *vhost_net_init(VhostNetOptions > > *options) > > fprintf(stderr, "vhost-net requires net backend to be setup\n"); > > goto fail; > > } > > +net->nc = options->net_backend; > > > > net->dev.max_queues = 1; > > +net->dev.nvqs = 2; > > +net->dev.vqs = net->vqs; > > > > if (backend_kernel) { > > r = vhost_net_get_fd(options->net_backend); > > @@ -164,11 +167,10 @@ struct vhost_net *vhost_net_init(VhostNetOptions > > *options) > > net->dev.backend_features = 0; > > net->dev.protocol_features = 0; > > net->backend = -1; > > -} > > -net->nc = options->net_backend; > > > > -net->dev.nvqs = 2; > > -net->dev.vqs = net->vqs; > > +/* vhost-user needs vq_index to initiate a specific queue pair */ > > +net->dev.vq_index = net->nc->queue_index * net->dev.nvqs; > > Better use vhost_net_set_vq_index() here. I could do that. 
> > > +} > > > > r = vhost_dev_init(&net->dev, options->opaque, > > options->backend_type); > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > index 5018fd6..e42fde6 100644 > > --- a/hw/virtio/vhost-user.c > > +++ b/hw/virtio/vhost-user.c > > @@ -187,6 +187,19 @@ static int vhost_user_write(struct vhost_dev *dev, > > VhostUserMsg *msg, > > 0 : -1; > > } > > > > +static bool vhost_user_one_time_request(VhostUserRequest request) > > +{ > > +switch (request) { > > +case VHOST_USER_SET_OWNER: > > +case VHOST_USER_RESET_DEVICE: > > +case VHOST_USER_SET_MEM_TABLE: > > +case VHOST_USER_GET_QUEUE_NUM: > > +return true; > > +default: > > +return false; > > +} > > +} > > + > > static int vhost_user_call(struct vhost_dev *dev, unsigned long int > > request, > > void *arg) > > { > > @@ -207,6 +220,15 @@ static int vhost_user_call(struct vhost_dev *dev, > > unsigned long int request, > > msg_request = request; > > } > > > > +/* > > + * For non-vring specific requests, like VHOST_USER_SET_MEM_TABLE, > > + * we just need send it once in the first time. For later such > > + * request, we just ignore it. > > + */ > > +if (vhost_user_one_time_request(msg_request) && dev->vq_index != 0) { > > +return 0; > > +} > > Looks like vhost_user_dev_request() is a better name here. Yeah, indeed. Thanks. > > > + > > msg.request = msg_request; > > msg.flags = VHOST_USER_VERSION; > > msg.size = 0; > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > > index 7a7812d..c0ed5b2 100644 > > --- a/hw/virtio/vhost.c > > +++ b/hw/virtio/vhost.c > > @@ -874,8 +874,9 @@ static void vhost_eventfd_del(MemoryListener *listener, > > static in
Re: [Qemu-devel] [PATCH v11 6/7] vhost-user: add multiple queue support
On Wed, Sep 23, 2015 at 12:20:00PM +0800, Yuanhan Liu wrote: > From: Changchun Ouyang > [...] > static void net_vhost_user_event(void *opaque, int event) > { > -VhostUserState *s = opaque; > +const char *name = opaque; > +NetClientState *ncs[MAX_QUEUE_NUM]; > +VhostUserState *s; > +Error *err = NULL; > +int queues; > > +queues = qemu_find_net_clients_except(name, ncs, > + NET_CLIENT_OPTIONS_KIND_NIC, > + MAX_QUEUE_NUM); > +s = DO_UPCAST(VhostUserState, nc, ncs[0]); > switch (event) { > case CHR_EVENT_OPENED: > -vhost_user_start(s); > -net_vhost_link_down(s, false); > +if (vhost_user_start(queues, ncs) < 0) { > +exit(1); > +} > +qmp_set_link(name, true, &err); > error_report("chardev \"%s\" went up", s->chr->label); > break; > case CHR_EVENT_CLOSED: > -net_vhost_link_down(s, true); > -vhost_user_stop(s); > +qmp_set_link(name, true, &err); > +vhost_user_stop(queues, ncs); Sigh... A silly Copy & Paste typo. Here is the updated patch: --yliu ---8<--- >From 3793772867024dfabb9eb8eb5c53ae6714f88618 Mon Sep 17 00:00:00 2001 From: Changchun Ouyang Date: Tue, 15 Sep 2015 14:28:38 +0800 Subject: [PATCH v11 6/7] vhost-user: add multiple queue support This patch is initially based on a patch from Nikolay Nikolaev. It adds vhost-user multiple queue support, by creating a nc and vhost_net pair for each queue. QEMU exits if it finds that the backend can't support the number of requested queues (given by the queues=# option). The max number is queried by a new message, VHOST_USER_GET_QUEUE_NUM, which is sent only when the protocol feature VHOST_USER_PROTOCOL_F_MQ is present. The max queue check is done at the vhost-user initiation stage. We initiate one queue first, which, in the meantime, also gets the max_queues the backend supports. In older versions, it was reported that some messages were sent more times than necessary. 
Here we reached an agreement with Michael that we could categorize vhost-user messages into two types: non-vring-specific messages, which should be sent only once, and vring-specific messages, which should be sent per queue. Here I introduced a helper function vhost_user_one_time_request(), which lists the following messages as non-vring-specific messages: VHOST_USER_SET_OWNER VHOST_USER_RESET_DEVICE VHOST_USER_SET_MEM_TABLE VHOST_USER_GET_QUEUE_NUM For the above messages, we simply ignore every occurrence after the first. Signed-off-by: Nikolay Nikolaev Signed-off-by: Changchun Ouyang Signed-off-by: Yuanhan Liu --- v11: invoke qmp_set_link() directly -- Suggested by Jason Wang v10: don't treat VHOST_USER_SET/GET_[PROTOCOL]_FEATURES as one-time requests, as the two feature bits need to be stored per device. v9: as suggested by Jason Wang, invoke qemu_chr_add_handlers() once only, and invoke qemu_find_net_clients_except() in the handler to gather all related ncs for initiating all queues. This addresses a hidden bug in the older version in a more QEMU-like way. v8: set net->dev.vq_index correctly inside vhost_net_init() based on the value from net->nc->queue_index. --- docs/specs/vhost-user.txt | 15 + hw/net/vhost_net.c| 10 ++-- hw/virtio/vhost-user.c| 22 hw/virtio/vhost.c | 5 +- net/vhost-user.c | 141 ++ qapi-schema.json | 6 +- qemu-options.hx | 5 +- 7 files changed, 147 insertions(+), 57 deletions(-) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 43db9b4..cfc9d41 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -135,6 +135,21 @@ As older slaves don't support negotiating protocol features, a feature bit was dedicated for this purpose: #define VHOST_USER_F_PROTOCOL_FEATURES 30 +Multiple queue support +-- + +Multiple queue is treated as a protocol extension, hence the slave has to +implement protocol features first.
The multiple queues feature is supported +only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set: +#define VHOST_USER_PROTOCOL_F_MQ0 + +The max number of queues the slave supports can be queried with message +VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the number of +requested queues is bigger than that. + +As all queues share one connection, the master uses a unique index for each +queue in the sent message to identify a specified queue. + Message types - diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index f663e5a..ad82b5c 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/v
[Qemu-devel] [PATCH v11 5/7] vhost: introduce vhost_backend_get_vq_index method
Minusing the idx with the base(dev->vq_index) for vhost-kernel, and then adding it back for vhost-user doesn't seem right. Here introduces a new method vhost_backend_get_vq_index() for getting the right vq index for following vhost messages calls. Suggested-by: Jason Wang Signed-off-by: Yuanhan Liu Reviewed-by: Jason Wang --- hw/virtio/vhost-backend.c | 10 +- hw/virtio/vhost-user.c| 12 ++-- hw/virtio/vhost.c | 15 ++- include/hw/virtio/vhost-backend.h | 2 ++ 4 files changed, 27 insertions(+), 12 deletions(-) diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c index 4d68a27..72d1392 100644 --- a/hw/virtio/vhost-backend.c +++ b/hw/virtio/vhost-backend.c @@ -42,11 +42,19 @@ static int vhost_kernel_cleanup(struct vhost_dev *dev) return close(fd); } +static int vhost_kernel_get_vq_index(struct vhost_dev *dev, int idx) +{ +assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs); + +return idx - dev->vq_index; +} + static const VhostOps kernel_ops = { .backend_type = VHOST_BACKEND_TYPE_KERNEL, .vhost_call = vhost_kernel_call, .vhost_backend_init = vhost_kernel_init, -.vhost_backend_cleanup = vhost_kernel_cleanup +.vhost_backend_cleanup = vhost_kernel_cleanup, +.vhost_backend_get_vq_index = vhost_kernel_get_vq_index, }; int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 694fde5..5018fd6 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -393,9 +393,17 @@ static int vhost_user_cleanup(struct vhost_dev *dev) return 0; } +static int vhost_user_get_vq_index(struct vhost_dev *dev, int idx) +{ +assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs); + +return idx; +} + const VhostOps user_ops = { .backend_type = VHOST_BACKEND_TYPE_USER, .vhost_call = vhost_user_call, .vhost_backend_init = vhost_user_init, -.vhost_backend_cleanup = vhost_user_cleanup -}; +.vhost_backend_cleanup = vhost_user_cleanup, +.vhost_backend_get_vq_index = 
vhost_user_get_vq_index, +}; diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index a08c36b..7a7812d 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -719,7 +719,7 @@ static int vhost_virtqueue_start(struct vhost_dev *dev, { hwaddr s, l, a; int r; -int vhost_vq_index = idx - dev->vq_index; +int vhost_vq_index = dev->vhost_ops->vhost_backend_get_vq_index(dev, idx); struct vhost_vring_file file = { .index = vhost_vq_index }; @@ -728,7 +728,6 @@ static int vhost_virtqueue_start(struct vhost_dev *dev, }; struct VirtQueue *vvq = virtio_get_queue(vdev, idx); -assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs); vq->num = state.num = virtio_queue_get_num(vdev, idx); r = dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_NUM, &state); @@ -822,12 +821,12 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev, struct vhost_virtqueue *vq, unsigned idx) { -int vhost_vq_index = idx - dev->vq_index; +int vhost_vq_index = dev->vhost_ops->vhost_backend_get_vq_index(dev, idx); struct vhost_vring_state state = { .index = vhost_vq_index, }; int r; -assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs); + r = dev->vhost_ops->vhost_call(dev, VHOST_GET_VRING_BASE, &state); if (r < 0) { fprintf(stderr, "vhost VQ %d ring restore failed: %d\n", idx, r); @@ -1066,17 +1065,15 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int n, { struct VirtQueue *vvq = virtio_get_queue(vdev, n); int r, index = n - hdev->vq_index; +struct vhost_vring_file file; -assert(n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs); - -struct vhost_vring_file file = { -.index = index -}; if (mask) { file.fd = event_notifier_get_fd(&hdev->vqs[index].masked_notifier); } else { file.fd = event_notifier_get_fd(virtio_queue_get_guest_notifier(vvq)); } + +file.index = hdev->vhost_ops->vhost_backend_get_vq_index(hdev, n); r = hdev->vhost_ops->vhost_call(hdev, VHOST_SET_VRING_CALL, &file); assert(r >= 0); } diff --git a/include/hw/virtio/vhost-backend.h 
b/include/hw/virtio/vhost-backend.h index e472f29..e1dfc6d 100644 --- a/include/hw/virtio/vhost-backend.h +++ b/include/hw/virtio/vhost-backend.h @@ -24,12 +24,14 @@ typedef int (*vhost_call)(struct vhost_dev *dev, unsigned long int request, void *arg); typedef int (*vhost_backend
[Qemu-devel] [PATCH v11 4/7] vhost-user: add VHOST_USER_GET_QUEUE_NUM message
This is for querying how many queues the backend supports if it has mq support (when the VHOST_USER_PROTOCOL_F_MQ flag is set in the queried protocol features). vhost_net_get_max_queues() is the interface to export that value, and to tell whether the backend supports the number of queues the user requested, which is done in the following patch. Signed-off-by: Yuanhan Liu --- v11: define a dummy vhost_net_get_max_queues when !CONFIG_VHOST_NET. --- docs/specs/vhost-user.txt | 11 +++ hw/net/vhost_net.c| 12 hw/virtio/vhost-user.c| 15 ++- include/hw/virtio/vhost.h | 1 + include/net/vhost_net.h | 1 + 5 files changed, 39 insertions(+), 1 deletion(-) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index ccbbcbb..43db9b4 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -301,3 +301,14 @@ Message types Bits (0-7) of the payload contain the vring index. Bit 8 is the invalid FD flag. This flag is set when there is no file descriptor in the ancillary data. + + * VHOST_USER_GET_QUEUE_NUM + + Id: 17 + Equivalent ioctl: N/A + Master payload: N/A + Slave payload: u64 + + Query how many queues the backend supports. This request should be + sent only when VHOST_USER_PROTOCOL_F_MQ is set in queried protocol + features by VHOST_USER_GET_PROTOCOL_FEATURES.
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index b7d29b7..f663e5a 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -122,6 +122,11 @@ void vhost_net_ack_features(struct vhost_net *net, uint64_t features) vhost_ack_features(&net->dev, vhost_net_get_feature_bits(net), features); } +uint64_t vhost_net_get_max_queues(VHostNetState *net) +{ +return net->dev.max_queues; +} + static int vhost_net_get_fd(NetClientState *backend) { switch (backend->info->type) { @@ -144,6 +149,8 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options) goto fail; } +net->dev.max_queues = 1; + if (backend_kernel) { r = vhost_net_get_fd(options->net_backend); if (r < 0) { @@ -414,6 +421,11 @@ VHostNetState *get_vhost_net(NetClientState *nc) return vhost_net; } #else +uint64_t vhost_net_get_max_queues(VHostNetState *net) +{ +return 1; +} + struct vhost_net *vhost_net_init(VhostNetOptions *options) { error_report("vhost-net support is not compiled in"); diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 9cb2f52..694fde5 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -25,7 +25,9 @@ #define VHOST_MEMORY_MAX_NREGIONS8 #define VHOST_USER_F_PROTOCOL_FEATURES 30 -#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x0ULL +#define VHOST_USER_PROTOCOL_FEATURE_MASK 0x1ULL + +#define VHOST_USER_PROTOCOL_F_MQ0 typedef enum VhostUserRequest { VHOST_USER_NONE = 0, @@ -45,6 +47,7 @@ typedef enum VhostUserRequest { VHOST_USER_SET_VRING_ERR = 14, VHOST_USER_GET_PROTOCOL_FEATURES = 15, VHOST_USER_SET_PROTOCOL_FEATURES = 16, +VHOST_USER_GET_QUEUE_NUM = 17, VHOST_USER_MAX } VhostUserRequest; @@ -211,6 +214,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request, switch (msg_request) { case VHOST_USER_GET_FEATURES: case VHOST_USER_GET_PROTOCOL_FEATURES: +case VHOST_USER_GET_QUEUE_NUM: need_reply = 1; break; @@ -315,6 +319,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request, switch (msg_request) { case 
VHOST_USER_GET_FEATURES: case VHOST_USER_GET_PROTOCOL_FEATURES: +case VHOST_USER_GET_QUEUE_NUM: if (msg.size != sizeof(m.u64)) { error_report("Received bad msg size."); return -1; @@ -366,6 +371,14 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque) if (err < 0) { return err; } + +/* query the max queues we support if backend supports Multiple Queue */ +if (dev->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_MQ)) { +err = vhost_user_call(dev, VHOST_USER_GET_QUEUE_NUM, &dev->max_queues); +if (err < 0) { +return err; +} +} } return 0; diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index 6467c73..c3758f3 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -48,6 +48,7 @@ struct vhost_dev { unsigned long long acked_features; unsigned long long backend_features; unsigned long long protocol_features; +unsigned long long max_queues; bool started; bool log_enabled; unsigned long long log_size; diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h index 840d4b1..8db723e 100644 --- a/include/net/vhost_net.h +++ b/include/net/vhost_net.h @@ -13,6 +13,7 @@
[Qemu-devel] [PATCH v11 3/7] vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE
Quote from Michael: We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE. Suggested-by: Michael S. Tsirkin Signed-off-by: Yuanhan Liu --- docs/specs/vhost-user.txt | 4 ++-- hw/net/vhost_net.c | 2 +- hw/virtio/vhost-user.c | 6 +++--- linux-headers/linux/vhost.h | 2 +- tests/vhost-user-test.c | 2 +- 5 files changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 70da3b1..ccbbcbb 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -194,10 +194,10 @@ Message types as an owner of the session. This can be used on the Slave as a "session start" flag. - * VHOST_USER_RESET_OWNER + * VHOST_USER_RESET_DEVICE Id: 4 - Equivalent ioctl: VHOST_RESET_OWNER + Equivalent ioctl: VHOST_RESET_DEVICE Master payload: N/A Issued when a new connection is about to be closed. The Master will no diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index 9d32d76..b7d29b7 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -287,7 +287,7 @@ static void vhost_net_stop_one(struct vhost_net *net, } else if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) { for (file.index = 0; file.index < net->dev.nvqs; ++file.index) { const VhostOps *vhost_ops = net->dev.vhost_ops; -int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER, +int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_DEVICE, NULL); assert(r >= 0); } diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 7fe35c6..9cb2f52 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -32,7 +32,7 @@ typedef enum VhostUserRequest { VHOST_USER_GET_FEATURES = 1, VHOST_USER_SET_FEATURES = 2, VHOST_USER_SET_OWNER = 3, -VHOST_USER_RESET_OWNER = 4, +VHOST_USER_RESET_DEVICE = 4, VHOST_USER_SET_MEM_TABLE = 5, VHOST_USER_SET_LOG_BASE = 6, VHOST_USER_SET_LOG_FD = 7, @@ -98,7 +98,7 @@ static unsigned long int ioctl_to_vhost_user_request[VHOST_USER_MAX] = { VHOST_GET_FEATURES, /* VHOST_USER_GET_FEATURES */ VHOST_SET_FEATURES, 
/* VHOST_USER_SET_FEATURES */ VHOST_SET_OWNER,/* VHOST_USER_SET_OWNER */ -VHOST_RESET_OWNER, /* VHOST_USER_RESET_OWNER */ +VHOST_RESET_DEVICE, /* VHOST_USER_RESET_DEVICE */ VHOST_SET_MEM_TABLE,/* VHOST_USER_SET_MEM_TABLE */ VHOST_SET_LOG_BASE, /* VHOST_USER_SET_LOG_BASE */ VHOST_SET_LOG_FD, /* VHOST_USER_SET_LOG_FD */ @@ -222,7 +222,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request, break; case VHOST_USER_SET_OWNER: -case VHOST_USER_RESET_OWNER: +case VHOST_USER_RESET_DEVICE: break; case VHOST_USER_SET_MEM_TABLE: diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h index ead86db..14a0160 100644 --- a/linux-headers/linux/vhost.h +++ b/linux-headers/linux/vhost.h @@ -78,7 +78,7 @@ struct vhost_memory { #define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01) /* Give up ownership, and reset the device to default values. * Allows subsequent call to VHOST_OWNER_SET to succeed. */ -#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02) +#define VHOST_RESET_DEVICE _IO(VHOST_VIRTIO, 0x02) /* Set up/modify memory layout */ #define VHOST_SET_MEM_TABLE_IOW(VHOST_VIRTIO, 0x03, struct vhost_memory) diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c index 75fedf0..e301db7 100644 --- a/tests/vhost-user-test.c +++ b/tests/vhost-user-test.c @@ -58,7 +58,7 @@ typedef enum VhostUserRequest { VHOST_USER_GET_FEATURES = 1, VHOST_USER_SET_FEATURES = 2, VHOST_USER_SET_OWNER = 3, -VHOST_USER_RESET_OWNER = 4, +VHOST_USER_RESET_DEVICE = 4, VHOST_USER_SET_MEM_TABLE = 5, VHOST_USER_SET_LOG_BASE = 6, VHOST_USER_SET_LOG_FD = 7, -- 1.9.0