Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?
On 3/13/19 5:39 PM, Cornelia Huck wrote:
> On Wed, 13 Mar 2019 11:26:04 +0800
> Dongli Zhang wrote:
>
>> On 3/13/19 1:33 AM, Cornelia Huck wrote:
>>> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
>>> Dongli Zhang wrote:
>>>
>>>> I observed that there is one msix vector for config and one shared vector
>>>> for all queues in below qemu cmdline, when the num-queues for virtio-blk
>>>> is more than the number of possible cpus:
>>>>
>>>> qemu: "-smp 4" while "-device virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"
>>>>
>>>> # cat /proc/interrupts
>>>>            CPU0       CPU1       CPU2       CPU3
>>>> ... ...
>>>>  24:          0          0          0          0   PCI-MSI 65536-edge      virtio0-config
>>>>  25:          0          0          0         59   PCI-MSI 65537-edge      virtio0-virtqueues
>>>> ... ...
>>>>
>>>> However, when num-queues is the same as number of possible cpus:
>>>>
>>>> qemu: "-smp 4" while "-device virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"
>>>>
>>>> # cat /proc/interrupts
>>>>            CPU0       CPU1       CPU2       CPU3
>>>> ... ...
>>>>  24:          0          0          0          0   PCI-MSI 65536-edge      virtio0-config
>>>>  25:          2          0          0          0   PCI-MSI 65537-edge      virtio0-req.0
>>>>  26:          0         35          0          0   PCI-MSI 65538-edge      virtio0-req.1
>>>>  27:          0          0         32          0   PCI-MSI 65539-edge      virtio0-req.2
>>>>  28:          0          0          0          0   PCI-MSI 65540-edge      virtio0-req.3
>>>> ... ...
>>>>
>>>> In above case, there is one msix vector per queue.
>>>
>>> Please note that this is pci-specific...
>>>
>>>> This is because the max number of queues is not limited by the number of
>>>> possible cpus.
>>>>
>>>> By default, nvme (regardless of write_queues and poll_queues) and
>>>> xen-blkfront limit the number of queues with num_possible_cpus().
>>>
>>> ...and these are probably pci-specific as well.
>>
>> Not pci-specific, but per-cpu as well.
>
> Ah, I meant that those are pci devices.
>
>>>> Is this by design on purpose, or can we fix with below?
>>>>
>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>> index 4bc083b..df95ce3 100644
>>>> --- a/drivers/block/virtio_blk.c
>>>> +++ b/drivers/block/virtio_blk.c
>>>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>>>>  	if (err)
>>>>  		num_vqs = 1;
>>>>
>>>> +	num_vqs = min(num_possible_cpus(), num_vqs);
>>>> +
>>>>  	vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
>>>>  	if (!vblk->vqs)
>>>>  		return -ENOMEM;
>>>
>>> virtio-blk, however, is not pci-specific.
>>>
>>> If we are using the ccw transport on s390, a completely different
>>> interrupt mechanism is in use ('floating' interrupts, which are not
>>> per-cpu). A check like that should therefore not go into the generic
>>> driver.
>>
>> So far there seem to be two options.
>>
>> The 1st option is to ask the qemu user to always specify "-num-queues" with the
>> same number of vcpus when running x86 guest with pci for virtio-blk or
>> virtio-scsi, in order to assign a vector for each queue.
>
> That does seem like an extra burden for the user: IIUC, things work
> even if you have too many queues, it's just not optimal. It sounds like
> something that can be done by a management layer (e.g. libvirt), though.
>
>> Or, is it fine for virtio folks to add a new hook to 'struct virtio_config_ops'
>> so that different platforms (e.g., pci or ccw) would use different ways to limit
>> the max number of queues in guest, with something like below?
>
> That sounds better, as both transports and drivers can opt-in here.
>
> However, maybe it would be even better to try to come up with a better
> strategy of allocating msix vectors in virtio-pci. More vectors in the
> num_queues > num_cpus case, even if they still need to be shared?
> Individual vectors for n-1 cpus and then a shared one for the remaining
> queues?
>
> It might even be device-specific: Have some low-traffic status queues
> share a vector, and provide an individual vector for high-traffic
> queues. Would need some device<->transport interface, obviously.

This sounds a little bit similar to multiple hctx maps?

So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw
queues are assigned to virtio-blk, blk_mq_alloc_tag_set() would use at most
nr_cpu_ids hw queues:

2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
... ...
3021         /*
3022          * There is no use for more h/w queues than cpus if we just have
3023          * a single map
3024          */
3025         if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
3026                 set->nr_hw_queues = nr_cpu_ids;

Even the block layer would limit the number of hw queues to nr_cpu_ids when
there is only a single map (set->nr_maps == 1).
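The snippet Dongli refers to above ("something like below") is not reproduced
in this digest. Purely for illustration, a transport-level queue-limit hook
might be shaped roughly as follows; the structure and function names are
invented for this sketch and are not the actual proposal from the thread:

/*
 * Illustrative sketch only -- not the patch discussed in the thread.
 * The idea: let each transport report how many virtqueues are worth
 * requesting, so virtio-pci can cap at num_possible_cpus() (one MSI-X
 * vector per queue) while virtio-ccw, with floating interrupts, need
 * not impose a limit.  All names below are hypothetical.
 */
struct virtio_device;

struct virtio_queue_limit_sketch {
	/* Largest number of request virtqueues worth asking for. */
	unsigned int (*max_nr_queues)(struct virtio_device *vdev);
};

/* A driver such as virtio_blk could then clamp its request: */
static unsigned int clamp_num_vqs(unsigned int requested,
				  unsigned int transport_limit)
{
	return requested < transport_limit ? requested : transport_limit;
}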
[RFC] vhost: select TAP if VHOST is configured
If VHOST_NET is configured but TUN and TAP are not, then the kernel will
build but vhost will not work correctly since it can't set up the
necessary tap device. A solution is to select it.

Fixes: 9a393b5d5988 ("tap: tap as an independent module")
Signed-off-by: Stephen Hemminger
---
 drivers/vhost/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index b580885243f7..a24c69598241 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,7 +1,8 @@
 config VHOST_NET
 	tristate "Host kernel accelerator for virtio net"
-	depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP)
+	depends on NET && EVENTFD
 	select VHOST
+	select TAP
 	---help---
 	  This kernel module can be loaded in host kernel to accelerate
 	  guest networking with virtio_net. Not to be confused with virtio_net
-- 
2.17.1
Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
On Tue, Mar 12, 2019 at 01:53:37PM -0700, James Bottomley wrote:
> I've got to say: optimize what? What code do we ever have in the
> kernel that kmap's a page and then doesn't do anything with it? You can
> guarantee that on kunmap the page is either referenced (needs
> invalidating) or updated (needs flushing). The in-kernel use of kmap is
> always
>
> kmap
> do something with the mapped page
> kunmap
>
> In a very short interval. It seems just a simplification to make
> kunmap do the flush if needed rather than try to have the users
> remember. The thing which makes this really simple is that on most
> architectures flush and invalidate is the same operation. If you
> really want to optimize you can use the referenced and dirty bits on
> the kmapped pte to tell you what operation to do, but if your flush is
> your invalidate, you simply assume the data needs flushing on kunmap
> without checking anything.

I agree that this would be a good way to simplify the API. Now we'd just
need volunteers to implement this for all architectures that need cache
flushing, and then remove the explicit flushing in the callers..

>> Which means after we fix vhost to add the flush_dcache_page after
>> kunmap, Parisc will get a double hit (but it also means Parisc was
>> the only one of those archs that needed explicit cache flushes, where
>> vhost worked correctly so far.. so it kind of proves your point of
>> giving up being the safe choice).
>
> What double hit? If there's no cache to flush then cache flush is a
> no-op. It's also a highly pipelineable no-op because the CPU has the L1
> cache within easy reach. The only event when flush takes a large
> amount of time is if we actually have dirty data to write back to main
> memory.

I've heard people complaining that on some microarchitectures even
no-op cache flushes are relatively expensive. Don't ask me why, but if
we can easily avoid double flushes we should do that.
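A minimal sketch of the simplification discussed above -- folding the dcache
flush into the unmap side so callers no longer need to remember
flush_dcache_page() themselves. The helper below is hypothetical; it is not
an existing kernel interface:

#include <linux/highmem.h>	/* kmap(), kunmap() */
#include <asm/cacheflush.h>	/* flush_dcache_page() */

/*
 * Hypothetical wrapper: unmap a highmem page and flush its dcache in
 * one step.  On architectures where flush and invalidate are the same
 * operation, flushing unconditionally here is cheap when the cache is
 * clean and correct when it is dirty.
 */
static inline void kunmap_and_flush(struct page *page)
{
	flush_dcache_page(page);
	kunmap(page);
}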
Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?
On Wed, 13 Mar 2019 11:26:04 +0800
Dongli Zhang wrote:

> On 3/13/19 1:33 AM, Cornelia Huck wrote:
>> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
>> Dongli Zhang wrote:
>>
>>> I observed that there is one msix vector for config and one shared vector
>>> for all queues in below qemu cmdline, when the num-queues for virtio-blk
>>> is more than the number of possible cpus:
>>>
>>> qemu: "-smp 4" while "-device virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"
>>>
>>> # cat /proc/interrupts
>>>            CPU0       CPU1       CPU2       CPU3
>>> ... ...
>>>  24:          0          0          0          0   PCI-MSI 65536-edge      virtio0-config
>>>  25:          0          0          0         59   PCI-MSI 65537-edge      virtio0-virtqueues
>>> ... ...
>>>
>>> However, when num-queues is the same as number of possible cpus:
>>>
>>> qemu: "-smp 4" while "-device virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"
>>>
>>> # cat /proc/interrupts
>>>            CPU0       CPU1       CPU2       CPU3
>>> ... ...
>>>  24:          0          0          0          0   PCI-MSI 65536-edge      virtio0-config
>>>  25:          2          0          0          0   PCI-MSI 65537-edge      virtio0-req.0
>>>  26:          0         35          0          0   PCI-MSI 65538-edge      virtio0-req.1
>>>  27:          0          0         32          0   PCI-MSI 65539-edge      virtio0-req.2
>>>  28:          0          0          0          0   PCI-MSI 65540-edge      virtio0-req.3
>>> ... ...
>>>
>>> In above case, there is one msix vector per queue.
>>
>> Please note that this is pci-specific...
>>
>>> This is because the max number of queues is not limited by the number of
>>> possible cpus.
>>>
>>> By default, nvme (regardless of write_queues and poll_queues) and
>>> xen-blkfront limit the number of queues with num_possible_cpus().
>>
>> ...and these are probably pci-specific as well.
>
> Not pci-specific, but per-cpu as well.

Ah, I meant that those are pci devices.

>>> Is this by design on purpose, or can we fix with below?
>>>
>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>> index 4bc083b..df95ce3 100644
>>> --- a/drivers/block/virtio_blk.c
>>> +++ b/drivers/block/virtio_blk.c
>>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>>>  	if (err)
>>>  		num_vqs = 1;
>>>
>>> +	num_vqs = min(num_possible_cpus(), num_vqs);
>>> +
>>>  	vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
>>>  	if (!vblk->vqs)
>>>  		return -ENOMEM;
>>
>> virtio-blk, however, is not pci-specific.
>>
>> If we are using the ccw transport on s390, a completely different
>> interrupt mechanism is in use ('floating' interrupts, which are not
>> per-cpu). A check like that should therefore not go into the generic
>> driver.
>
> So far there seem to be two options.
>
> The 1st option is to ask the qemu user to always specify "-num-queues" with the
> same number of vcpus when running x86 guest with pci for virtio-blk or
> virtio-scsi, in order to assign a vector for each queue.

That does seem like an extra burden for the user: IIUC, things work
even if you have too many queues, it's just not optimal. It sounds like
something that can be done by a management layer (e.g. libvirt), though.

> Or, is it fine for virtio folks to add a new hook to 'struct virtio_config_ops'
> so that different platforms (e.g., pci or ccw) would use different ways to limit
> the max number of queues in guest, with something like below?

That sounds better, as both transports and drivers can opt-in here.

However, maybe it would be even better to try to come up with a better
strategy of allocating msix vectors in virtio-pci. More vectors in the
num_queues > num_cpus case, even if they still need to be shared?
Individual vectors for n-1 cpus and then a shared one for the remaining
queues?

It might even be device-specific: Have some low-traffic status queues
share a vector, and provide an individual vector for high-traffic
queues. Would need some device<->transport interface, obviously.
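To make the "individual vectors for the first n-1, shared vector for the
rest" idea above concrete, the queue-to-vector mapping could look roughly
like the sketch below; it is illustrative only, not how virtio-pci currently
assigns vectors:

/*
 * Sketch of the allocation strategy floated above: each of the first
 * (num_vectors - 1) queues gets its own MSI-X vector, and every
 * remaining queue shares the last one.  Not existing virtio-pci code.
 */
static unsigned int queue_to_vector(unsigned int queue,
				    unsigned int num_vectors)
{
	if (queue < num_vectors - 1)
		return queue;		/* dedicated vector */

	return num_vectors - 1;		/* shared vector for the rest */
}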