Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command
> This needs to be proposed to the virtio spec first. And actually we > need more than this: > > 1) we still need a way to deal with the device without this feature > 2) driver can't depend solely on what is advertised by the device (e.g > device can choose to advertise a very long timeout) I think that I wasn't clear, sorry. I'm not talking about a new virtio feature, I'm talking about a situation when: * virtio_net issues a control command. * the device gets the command, but somehow, completes the command after timeout. * virtio_net assumes that the command failed (timeout), and issues a different control command. * virtio_net will then call virtqueue_wait_for_used, and will immediately get the previous response (If I'm not wrong). So, this is not a new feature that I'm proposing, just a situation that may occur due to cvq timeouts. Anyhow, your solution calling BAD_RING if we reach a timeout should prevent this situation. Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net V2] virtio-net: correctly enable callback during start_xmit
On Fri, Dec 16, 2022 at 11:43 AM Jason Wang wrote: > > On Thu, Dec 15, 2022 at 5:35 PM Michael S. Tsirkin wrote: > > > > On Thu, Dec 15, 2022 at 05:15:43PM +0800, Jason Wang wrote: > > > On Thu, Dec 15, 2022 at 5:02 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Thu, Dec 15, 2022 at 11:27:19AM +0800, Jason Wang wrote: > > > > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables > > > > > virtqueue callback via the following statement: > > > > > > > > > > do { > > > > >.. > > > > > } while (use_napi && kick && > > > > >unlikely(!virtqueue_enable_cb_delayed(sq->vq))); > > > > > > > > > > When NAPI is used and kick is false, the callback won't be enabled > > > > > here. And when the virtqueue is about to be full, the tx will be > > > > > disabled, but we still don't enable tx interrupt which will cause a TX > > > > > hang. This could be observed when using pktgen with burst enabled. > > > > > > > > > > Fixing this by trying to enable tx interrupt after we disable TX when > > > > > we're not using napi or kick is false. > > > > > > > > > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively") > > > > > Signed-off-by: Jason Wang > > > > > --- > > > > > The patch is needed for -stable. > > > > > Changes since V1: > > > > > - enable tx interrupt after we disable tx > > > > > --- > > > > > drivers/net/virtio_net.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > > index 86e52454b5b5..dcf3a536d78a 100644 > > > > > --- a/drivers/net/virtio_net.c > > > > > +++ b/drivers/net/virtio_net.c > > > > > @@ -1873,7 +1873,7 @@ static netdev_tx_t start_xmit(struct sk_buff > > > > > *skb, struct net_device *dev) > > > > >*/ > > > > > if (sq->vq->num_free < 2+MAX_SKB_FRAGS) { > > > > > netif_stop_subqueue(dev, qnum); > > > > > - if (!use_napi && > > > > > + if ((!use_napi || !kick) && > > > > > unlikely(!virtqueue_enable_cb_delayed(sq->vq))) { > > > > > /* More just got used, free them then recheck. > > > > > */ > > > > > free_old_xmit_skbs(sq, false); > > > > > > > > This will work but the following lines are: > > > > > > > >if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) { > > > > netif_start_subqueue(dev, qnum); > > > > virtqueue_disable_cb(sq->vq); > > > > } > > > > > > > > > > > > and I thought we are supposed to keep callbacks enabled with napi? > > > > > > This seems to be the opposite logic of commit a7766ef18b33 that > > > disables callbacks for NAPI. > > > > > > It said: > > > > > > There are currently two cases where we poll TX vq not in response to a > > > callback: start xmit and rx napi. We currently do this with callbacks > > > enabled which can cause extra interrupts from the card. Used not to > > > be > > > a big issue as we run with interrupts disabled but that is no longer > > > the > > > case, and in some cases the rate of spurious interrupts is so high > > > linux detects this and actually kills the interrupt. > > > > > > My undersatnding is that it tries to disable callbacks on TX. > > > > I think we want to disable callbacks while polling, yes. here we are not > > polling, and I think we want a callback because otherwise nothing will > > orphan skbs and a socket can be blocked, not transmitting anything - a > > deadlock. > > I'm not sure how I got here, did you mean a partial revert of > a7766ef18b33 (the part that disables TX callbacks on start_xmit)? Michael, any idea on this? Thanks > > Btw, I plan to remove non NAPI mode completely, since it was disabled > by default for years and we don't see any complaint, then we may have > modern features like BQL and better TCP performance. In that sense we > may simply keep tx callback open as most of modern NIC did. > > > > > > > One of the ideas of napi is to free on napi callback, not here > > > > immediately. > > > > > > > > I think it is easier to just do a separate branch here. Along the > > > > lines of: > > > > > > > > if (use_napi) { > > > > if > > > > (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) > > > > virtqueue_napi_schedule(napi, vq); > > > > > > This seems to be a new logic and it causes some delay in processing TX > > > (unnecessary NAPI). > > > > That's good, we overloaded the queue so we are already going > > too fast, deferring tx so queue has chance to drain > > will allow better batching in the qdisc. > > I meant, compare to > > 1) schedule NAPI and poll TX > > The current code did > > 2) poll TX immediately > > 2) seems faster? > > Thanks > > > > > > > } else { > > > > ... old code ... > > > >
Re: [PATCH] virtio_blk: mark all zone fields LE
On Fri, Dec 23, 2022 at 3:32 AM Michael S. Tsirkin wrote: > > zone is a virtio 1.x feature so all fields are LE, > they are handled as such, but have mistakenly been labeled > __virtioXX in the header. This results in a bunch of sparse warnings. > > Use the __leXX tags to make sparse happy. > > Signed-off-by: Michael S. Tsirkin Acked-by: Jason Wang Thanks > --- > include/uapi/linux/virtio_blk.h | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h > index f4d5ee7c6f30..ec3c3779406f 100644 > --- a/include/uapi/linux/virtio_blk.h > +++ b/include/uapi/linux/virtio_blk.h > @@ -141,11 +141,11 @@ struct virtio_blk_config { > > /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */ > struct virtio_blk_zoned_characteristics { > - __virtio32 zone_sectors; > - __virtio32 max_open_zones; > - __virtio32 max_active_zones; > - __virtio32 max_append_sectors; > - __virtio32 write_granularity; > + __le32 zone_sectors; > + __le32 max_open_zones; > + __le32 max_active_zones; > + __le32 max_append_sectors; > + __le32 write_granularity; > __u8 model; > __u8 unused2[3]; > } zoned; > @@ -245,11 +245,11 @@ struct virtio_blk_outhdr { > */ > struct virtio_blk_zone_descriptor { > /* Zone capacity */ > - __virtio64 z_cap; > + __le64 z_cap; > /* The starting sector of the zone */ > - __virtio64 z_start; > + __le64 z_start; > /* Zone write pointer position in sectors */ > - __virtio64 z_wp; > + __le64 z_wp; > /* Zone type */ > __u8 z_type; > /* Zone state */ > @@ -258,7 +258,7 @@ struct virtio_blk_zone_descriptor { > }; > > struct virtio_blk_zone_report { > - __virtio64 nr_zones; > + __le64 nr_zones; > __u8 reserved[56]; > struct virtio_blk_zone_descriptor zones[]; > }; > -- > MST > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vdpa_sim: get rid of DMA ops
Looks good from the DMA subsysten POV: Acked-by: Christoph Hellwig Thanks for doing the work! ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] vdpa_sim: get rid of DMA ops
We used to (ab)use the DMA ops for setting up identical mappings in the IOTLB. This patch tries to get rid of the those unnecessary DMA ops by maintaining a simple identical/passthrough mappings by default. When bound to virtio_vdpa driver, DMA API will simply use PA as the IOVA and we will be all fine. When the vDPA bus tries to setup customized mapping (e.g when bound to vhost-vDPA), the identical/passthrough mapping will be removed. Signed-off-by: Jason Wang --- Note: - This patch depends on the series "[PATCH V3 0/4] Vendor stats support in vdpasim_net" --- drivers/vdpa/vdpa_sim/vdpa_sim.c | 170 --- drivers/vdpa/vdpa_sim/vdpa_sim.h | 2 +- 2 files changed, 22 insertions(+), 150 deletions(-) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index 45d3f84b7937..187fa3a0e5d5 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -17,7 +17,6 @@ #include #include #include -#include #include #include "vdpa_sim.h" @@ -45,13 +44,6 @@ static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa) return container_of(vdpa, struct vdpasim, vdpa); } -static struct vdpasim *dev_to_sim(struct device *dev) -{ - struct vdpa_device *vdpa = dev_to_vdpa(dev); - - return vdpa_to_sim(vdpa); -} - static void vdpasim_vq_notify(struct vringh *vring) { struct vdpasim_virtqueue *vq = @@ -104,8 +96,12 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim) >iommu_lock); } - for (i = 0; i < vdpasim->dev_attr.nas; i++) + for (i = 0; i < vdpasim->dev_attr.nas; i++) { vhost_iotlb_reset(>iommu[i]); + vhost_iotlb_add_range(>iommu[i], 0, ULONG_MAX, + 0, VHOST_MAP_RW); + vdpasim->iommu_pt[i] = true; + } vdpasim->running = true; spin_unlock(>iommu_lock); @@ -115,133 +111,6 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim) ++vdpasim->generation; } -static int dir_to_perm(enum dma_data_direction dir) -{ - int perm = -EFAULT; - - switch (dir) { - case DMA_FROM_DEVICE: - perm = VHOST_MAP_WO; - break; - case DMA_TO_DEVICE: - perm = VHOST_MAP_RO; - break; - case DMA_BIDIRECTIONAL: - perm = VHOST_MAP_RW; - break; - default: - break; - } - - return perm; -} - -static dma_addr_t vdpasim_map_range(struct vdpasim *vdpasim, phys_addr_t paddr, - size_t size, unsigned int perm) -{ - struct iova *iova; - dma_addr_t dma_addr; - int ret; - - /* We set the limit_pfn to the maximum (ULONG_MAX - 1) */ - iova = alloc_iova(>iova, size >> iova_shift(>iova), - ULONG_MAX - 1, true); - if (!iova) - return DMA_MAPPING_ERROR; - - dma_addr = iova_dma_addr(>iova, iova); - - spin_lock(>iommu_lock); - ret = vhost_iotlb_add_range(>iommu[0], (u64)dma_addr, - (u64)dma_addr + size - 1, (u64)paddr, perm); - spin_unlock(>iommu_lock); - - if (ret) { - __free_iova(>iova, iova); - return DMA_MAPPING_ERROR; - } - - return dma_addr; -} - -static void vdpasim_unmap_range(struct vdpasim *vdpasim, dma_addr_t dma_addr, - size_t size) -{ - spin_lock(>iommu_lock); - vhost_iotlb_del_range(>iommu[0], (u64)dma_addr, - (u64)dma_addr + size - 1); - spin_unlock(>iommu_lock); - - free_iova(>iova, iova_pfn(>iova, dma_addr)); -} - -static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page, - unsigned long offset, size_t size, - enum dma_data_direction dir, - unsigned long attrs) -{ - struct vdpasim *vdpasim = dev_to_sim(dev); - phys_addr_t paddr = page_to_phys(page) + offset; - int perm = dir_to_perm(dir); - - if (perm < 0) - return DMA_MAPPING_ERROR; - - return vdpasim_map_range(vdpasim, paddr, size, perm); -} - -static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr, - size_t size, enum dma_data_direction dir, - unsigned long attrs) -{ - struct vdpasim *vdpasim = dev_to_sim(dev); - - vdpasim_unmap_range(vdpasim, dma_addr, size); -} - -static void *vdpasim_alloc_coherent(struct device *dev, size_t size, - dma_addr_t *dma_addr, gfp_t flag, - unsigned long attrs) -{ - struct vdpasim *vdpasim = dev_to_sim(dev); - phys_addr_t paddr; - void *addr; - - addr = kmalloc(size, flag); - if (!addr) { -
[PATCH V3 4/4] vdpa_sim_net: vendor satistics
This patch adds support for basic vendor stats that include counters for tx, rx and cvq. Acked-by: Eugenio Pérez Signed-off-by: Jason Wang --- drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 219 ++- 1 file changed, 213 insertions(+), 6 deletions(-) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c index 5abd4efd9028..e827708adcca 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -37,6 +38,34 @@ #define VDPASIM_NET_AS_NUM 2 #define VDPASIM_NET_GROUP_NUM 2 +struct vdpasim_dataq_stats { + struct u64_stats_sync syncp; + u64 pkts; + u64 bytes; + u64 drops; + u64 errors; + u64 overruns; +}; + +struct vdpasim_cq_stats { + struct u64_stats_sync syncp; + u64 requests; + u64 successes; + u64 errors; +}; + +struct vdpasim_net{ + struct vdpasim vdpasim; + struct vdpasim_dataq_stats tx_stats; + struct vdpasim_dataq_stats rx_stats; + struct vdpasim_cq_stats cq_stats; +}; + +static struct vdpasim_net *sim_to_net(struct vdpasim *vdpasim) +{ + return container_of(vdpasim, struct vdpasim_net, vdpasim); +} + static void vdpasim_net_complete(struct vdpasim_virtqueue *vq, size_t len) { /* Make sure data is wrote before advancing index */ @@ -97,9 +126,11 @@ static virtio_net_ctrl_ack vdpasim_handle_ctrl_mac(struct vdpasim *vdpasim, static void vdpasim_handle_cvq(struct vdpasim *vdpasim) { struct vdpasim_virtqueue *cvq = >vqs[2]; + struct vdpasim_net *net = sim_to_net(vdpasim); virtio_net_ctrl_ack status = VIRTIO_NET_ERR; struct virtio_net_ctrl_hdr ctrl; size_t read, write; + u64 requests = 0, errors = 0, successes = 0; int err; if (!(vdpasim->features & (1ULL << VIRTIO_NET_F_CTRL_VQ))) @@ -115,10 +146,13 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim) if (err <= 0) break; + ++requests; read = vringh_iov_pull_iotlb(>vring, >in_iov, , sizeof(ctrl)); - if (read != sizeof(ctrl)) + if (read != sizeof(ctrl)) { + ++errors; break; + } switch (ctrl.class) { case VIRTIO_NET_CTRL_MAC: @@ -128,6 +162,11 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim) break; } + if (status == VIRTIO_NET_OK) + ++successes; + else + ++errors; + /* Make sure data is wrote before advancing index */ smp_wmb(); @@ -145,6 +184,12 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim) cvq->cb(cvq->private); local_bh_enable(); } + + u64_stats_update_begin(>cq_stats.syncp); + net->cq_stats.requests += requests; + net->cq_stats.errors += errors; + net->cq_stats.successes += successes; + u64_stats_update_end(>cq_stats.syncp); } static void vdpasim_net_work(struct work_struct *work) @@ -152,8 +197,10 @@ static void vdpasim_net_work(struct work_struct *work) struct vdpasim *vdpasim = container_of(work, struct vdpasim, work); struct vdpasim_virtqueue *txq = >vqs[1]; struct vdpasim_virtqueue *rxq = >vqs[0]; + struct vdpasim_net *net = sim_to_net(vdpasim); ssize_t read, write; - int pkts = 0; + u64 tx_pkts = 0, rx_pkts = 0, tx_bytes = 0, rx_bytes = 0; + u64 rx_drops = 0, rx_overruns = 0, rx_errors = 0, tx_errors = 0; int err; spin_lock(>lock); @@ -172,14 +219,21 @@ static void vdpasim_net_work(struct work_struct *work) while (true) { err = vringh_getdesc_iotlb(>vring, >out_iov, NULL, >head, GFP_ATOMIC); - if (err <= 0) + if (err <= 0) { + if (err) + ++tx_errors; break; + } + ++tx_pkts; read = vringh_iov_pull_iotlb(>vring, >out_iov, vdpasim->buffer, PAGE_SIZE); + tx_bytes += read; + if (!receive_filter(vdpasim, read)) { + ++rx_drops; vdpasim_net_complete(txq, 0); continue; } @@ -187,19 +241,25 @@ static void vdpasim_net_work(struct work_struct *work) err = vringh_getdesc_iotlb(>vring, NULL, >in_iov, >head, GFP_ATOMIC); if (err <= 0) { +
[PATCH V3 3/4] vdpa_sim: support vendor statistics
This patch adds a new config ops callback to allow individual simulator to implement the vendor stats callback. Acked-by: Eugenio Pérez Signed-off-by: Jason Wang --- drivers/vdpa/vdpa_sim/vdpa_sim.c | 14 ++ drivers/vdpa/vdpa_sim/vdpa_sim.h | 3 +++ 2 files changed, 17 insertions(+) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index 341da107e7da..45d3f84b7937 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -424,6 +424,18 @@ static int vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx, return 0; } +static int vdpasim_get_vq_stats(struct vdpa_device *vdpa, u16 idx, + struct sk_buff *msg, + struct netlink_ext_ack *extack) +{ + struct vdpasim *vdpasim = vdpa_to_sim(vdpa); + + if (vdpasim->dev_attr.get_stats) + return vdpasim->dev_attr.get_stats(vdpasim, idx, + msg, extack); + return -EOPNOTSUPP; +} + static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa) { return VDPASIM_QUEUE_ALIGN; @@ -710,6 +722,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = { .set_vq_ready = vdpasim_set_vq_ready, .get_vq_ready = vdpasim_get_vq_ready, .set_vq_state = vdpasim_set_vq_state, + .get_vendor_vq_stats= vdpasim_get_vq_stats, .get_vq_state = vdpasim_get_vq_state, .get_vq_align = vdpasim_get_vq_align, .get_vq_group = vdpasim_get_vq_group, @@ -743,6 +756,7 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = { .set_vq_ready = vdpasim_set_vq_ready, .get_vq_ready = vdpasim_get_vq_ready, .set_vq_state = vdpasim_set_vq_state, + .get_vendor_vq_stats= vdpasim_get_vq_stats, .get_vq_state = vdpasim_get_vq_state, .get_vq_align = vdpasim_get_vq_align, .get_vq_group = vdpasim_get_vq_group, diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h index 51c070a543f1..d2a08c0abad7 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h @@ -48,6 +48,9 @@ struct vdpasim_dev_attr { work_func_t work_fn; void (*get_config)(struct vdpasim *vdpasim, void *config); void (*set_config)(struct vdpasim *vdpasim, const void *config); + int (*get_stats)(struct vdpasim *vdpasim, u16 idx, +struct sk_buff *msg, +struct netlink_ext_ack *extack); }; /* State of each vdpasim device */ -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V3 2/4] vdpasim: customize allocation size
Allow individual simulator to customize the allocation size. Reviewed-by: Stefano Garzarella Acked-by: Eugenio Pérez Signed-off-by: Jason Wang --- drivers/vdpa/vdpa_sim/vdpa_sim.c | 5 - drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 + drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 1 + drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 1 + 4 files changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index 118dbc8e5d67..341da107e7da 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -254,6 +254,9 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr, struct device *dev; int i, ret = -ENOMEM; + if (!dev_attr->alloc_size) + return ERR_PTR(-EINVAL); + if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) { if (config->device_features & ~dev_attr->supported_features) @@ -269,7 +272,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr, vdpa = __vdpa_alloc_device(NULL, ops, dev_attr->ngroups, dev_attr->nas, - sizeof(struct vdpasim), + dev_attr->alloc_size, dev_attr->name, false); if (IS_ERR(vdpa)) { ret = PTR_ERR(vdpa); diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h index 0e78737dcc16..51c070a543f1 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h @@ -37,6 +37,7 @@ struct vdpasim_dev_attr { struct vdpa_mgmt_dev *mgmt_dev; const char *name; u64 supported_features; + size_t alloc_size; size_t config_size; size_t buffer_size; int nvqs; diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c index f745926237a8..5117959bed8a 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c @@ -378,6 +378,7 @@ static int vdpasim_blk_dev_add(struct vdpa_mgmt_dev *mdev, const char *name, dev_attr.nvqs = VDPASIM_BLK_VQ_NUM; dev_attr.ngroups = VDPASIM_BLK_GROUP_NUM; dev_attr.nas = VDPASIM_BLK_AS_NUM; + dev_attr.alloc_size = sizeof(struct vdpasim); dev_attr.config_size = sizeof(struct virtio_blk_config); dev_attr.get_config = vdpasim_blk_get_config; dev_attr.work_fn = vdpasim_blk_work; diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c index e8a115fbe49f..5abd4efd9028 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c @@ -253,6 +253,7 @@ static int vdpasim_net_dev_add(struct vdpa_mgmt_dev *mdev, const char *name, dev_attr.nvqs = VDPASIM_NET_VQ_NUM; dev_attr.ngroups = VDPASIM_NET_GROUP_NUM; dev_attr.nas = VDPASIM_NET_AS_NUM; + dev_attr.alloc_size = sizeof(struct vdpasim); dev_attr.config_size = sizeof(struct virtio_net_config); dev_attr.get_config = vdpasim_net_get_config; dev_attr.work_fn = vdpasim_net_work; -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V3 0/4] Vendor stats support in vdpasim_net
Hi All: This series implements vendor stats in vdpasim_net. Please review. Thanks Changes since V2: - rebase to Michale's tree (linux-next branch) - remove unnecessary blank lines - twaek the error code handling Changes since V1: - typo fixes - move the duplicated get_vendor_vq_stats() in vdpasim_batch_config_ops to vdpa_sim_config_ops - use -EOPNOTSUPP instead of -EINVAL in vdpasim_get_vq_stats - introdce a dedicated variable to record the successful cvq request and track the number of requests correctly Jason Wang (4): vdpa_sim: switch to use __vdpa_alloc_device() vdpasim: customize allocation size vdpa_sim: support vendor statistics vdpa_sim_net: vendor satistics drivers/vdpa/vdpa_sim/vdpa_sim.c | 30 +++- drivers/vdpa/vdpa_sim/vdpa_sim.h | 4 + drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 1 + drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 218 ++- 4 files changed, 243 insertions(+), 10 deletions(-) -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V3 1/4] vdpa_sim: switch to use __vdpa_alloc_device()
This allows us to control the allocation size of the structure. Reviewed-by: Stefano Garzarella Acked-by: Eugenio Pérez Signed-off-by: Jason Wang --- drivers/vdpa/vdpa_sim/vdpa_sim.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index cb88891b44a8..118dbc8e5d67 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -249,6 +249,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr, const struct vdpa_dev_set_config *config) { const struct vdpa_config_ops *ops; + struct vdpa_device *vdpa; struct vdpasim *vdpasim; struct device *dev; int i, ret = -ENOMEM; @@ -266,14 +267,16 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr, else ops = _config_ops; - vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops, - dev_attr->ngroups, dev_attr->nas, - dev_attr->name, false); - if (IS_ERR(vdpasim)) { - ret = PTR_ERR(vdpasim); + vdpa = __vdpa_alloc_device(NULL, ops, + dev_attr->ngroups, dev_attr->nas, + sizeof(struct vdpasim), + dev_attr->name, false); + if (IS_ERR(vdpa)) { + ret = PTR_ERR(vdpa); goto err_alloc; } + vdpasim = vdpa_to_sim(vdpa); vdpasim->dev_attr = *dev_attr; INIT_WORK(>work, dev_attr->work_fn); spin_lock_init(>lock); -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command
On Thu, Dec 22, 2022 at 5:19 PM Eugenio Perez Martin wrote: > > On Thu, Dec 22, 2022 at 7:05 AM Jason Wang wrote: > > > > We used to busy waiting on the cvq command this tends to be > > problematic since: > > > > 1) CPU could wait for ever on a buggy/malicous device > > 2) There's no wait to terminate the process that triggers the cvq > >command > > > > So this patch switch to use sleep with a timeout (1s) instead of busy > > polling for the cvq command forever. This gives the scheduler a breath > > and can let the process can respond to a signal. > > > > Signed-off-by: Jason Wang > > --- > > drivers/net/virtio_net.c | 15 --- > > 1 file changed, 8 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 8225496ccb1e..69173049371f 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -405,6 +405,7 @@ static void disable_rx_mode_work(struct virtnet_info > > *vi) > > vi->rx_mode_work_enabled = false; > > spin_unlock_bh(>rx_mode_lock); > > > > + virtqueue_wake_up(vi->cvq); > > flush_work(>rx_mode_work); > > } > > > > @@ -1497,6 +1498,11 @@ static bool try_fill_recv(struct virtnet_info *vi, > > struct receive_queue *rq, > > return !oom; > > } > > > > +static void virtnet_cvq_done(struct virtqueue *cvq) > > +{ > > + virtqueue_wake_up(cvq); > > +} > > + > > static void skb_recv_done(struct virtqueue *rvq) > > { > > struct virtnet_info *vi = rvq->vdev->priv; > > @@ -2024,12 +2030,7 @@ static bool virtnet_send_command(struct virtnet_info > > *vi, u8 class, u8 cmd, > > if (unlikely(!virtqueue_kick(vi->cvq))) > > return vi->ctrl->status == VIRTIO_NET_OK; > > > > - /* Spin for a response, the kick causes an ioport write, trapping > > -* into the hypervisor, so the request should be handled > > immediately. > > -*/ > > - while (!virtqueue_get_buf(vi->cvq, ) && > > - !virtqueue_is_broken(vi->cvq)) > > - cpu_relax(); > > + virtqueue_wait_for_used(vi->cvq, ); > > > > return vi->ctrl->status == VIRTIO_NET_OK; > > } > > @@ -3524,7 +3525,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi) > > > > /* Parameters for control virtqueue, if any */ > > if (vi->has_cvq) { > > - callbacks[total_vqs - 1] = NULL; > > + callbacks[total_vqs - 1] = virtnet_cvq_done; > > If we're using CVQ callback, what is the actual use of the timeout? Because we can't sleep forever since locks could be held like RTNL_LOCK. > > I'd say there is no right choice neither in the right timeout value > nor in the action to take. In the next version, I tend to put BAD_RING() to prevent future requests. > Why not simply trigger the cmd and do all > the changes at command return? I don't get this, sorry. > > I suspect the reason is that it complicates the code. For example, > having the possibility of many in flight commands, races between their > completion, etc. Actually the cvq command was serialized through RTNL_LOCK, so we don't need to worry about this. In the next version I can add ASSERT_RTNL(). Thanks > The virtio standard does not even cover unordered > used commands if I'm not wrong. > > Is there any other fundamental reason? > > Thanks! > > > names[total_vqs - 1] = "control"; > > } > > > > -- > > 2.25.1 > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command
On Thu, Dec 22, 2022 at 11:55 PM Alvaro Karsz wrote: > > My point is that the device may complete the control command after the > timeout, This needs to be proposed to the virtio spec first. And actually we need more than this: 1) we still need a way to deal with the device without this feature 2) driver can't depend solely on what is advertised by the device (e.g device can choose to advertise a very long timeout) > so, if I'm not mistaken, next time we send a control command and call > virtqueue_wait_for_used we'll get the previous response. > In the next version, I will first put BAD_RING() to prevent future requests for cvq. Note that the patch can't fix all the issues, we need more things on top. But it's a good step and it will behave much better than the current code. Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[GIT PULL] virtio,vhost,vdpa: features, fixes, cleanups
The following changes since commit 830b3c68c1fb1e9176028d02ef86f3cf76aa2476: Linux 6.1 (2022-12-11 14:15:18 -0800) are available in the Git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus for you to fetch changes up to 98dd6b2ef50d6f7876606a86c8d8a767c9fef6f5: virtio_blk: mark all zone fields LE (2022-12-22 14:32:36 -0500) Note: merging this upstream results in a conflict between commit: de4eda9de2d9 ("use less confusing names for iov_iter direction initializers") from Linus' tree and commit: ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") from this tree. This resolution below fixes it up, due to Stephen Rothwell diff --cc drivers/vhost/vsock.c index cd6f7776013a,830bc823addc.. --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@@ -165,8 -157,9 +157,9 @@@ vhost_transport_do_send_pkt(struct vhos break; } - iov_iter_init(_iter, READ, >iov[out], in, iov_len); + iov_iter_init(_iter, ITER_DEST, >iov[out], in, iov_len); - payload_len = pkt->len - pkt->off; + payload_len = skb->len; + hdr = virtio_vsock_hdr(skb); /* If the packet is greater than the space available in the * buffer, we split it using multiple buffers. @@@ -366,18 -340,21 +340,22 @@@ vhost_vsock_alloc_skb(struct vhost_virt return NULL; } - pkt = kzalloc(sizeof(*pkt), GFP_KERNEL); - if (!pkt) + len = iov_length(vq->iov, out); + + /* len contains both payload and hdr */ + skb = virtio_vsock_alloc_skb(len, GFP_KERNEL); + if (!skb) return NULL; - iov_iter_init(_iter, WRITE, vq->iov, out, len); + len = iov_length(vq->iov, out); + iov_iter_init(_iter, ITER_SOURCE, vq->iov, out, len); - nbytes = copy_from_iter(>hdr, sizeof(pkt->hdr), _iter); - if (nbytes != sizeof(pkt->hdr)) { + hdr = virtio_vsock_hdr(skb); + nbytes = copy_from_iter(hdr, sizeof(*hdr), _iter); + if (nbytes != sizeof(*hdr)) { vq_err(vq, "Expected %zu bytes for pkt->hdr, got %zu bytes\n", - sizeof(pkt->hdr), nbytes); - kfree(pkt); + sizeof(*hdr), nbytes); + kfree_skb(skb); return NULL; } It can also be found in linux-next, see next-20221220. virtio,vhost,vdpa: features, fixes, cleanups zoned block device support lifetime stats support (for virtio devices backed by memory supporting that) vsock rework to use skbuffs ifcvf features provisioning new SolidNET DPU driver Signed-off-by: Michael S. Tsirkin Alvaro Karsz (5): Add SolidRun vendor id New PCI quirk for SolidRun SNET DPU. virtio: vdpa: new SolidNET DPU driver. virtio_blk: add VIRTIO_BLK_F_LIFETIME feature support virtio: vdpa: fix snprintf size argument in snet_vdpa driver Angus Chen (2): virtio_pci: modify ENOENT to EINVAL virtio_blk: use UINT_MAX instead of -1U Bobby Eshleman (1): virtio/vsock: replace virtio_vsock_pkt with sk_buff Cindy Lu (2): vhost_vdpa: fix the crash in unmap a large memory vdpa_sim_net: should not drop the multicast/broadcast packet Colin Ian King (1): RDMA/mlx5: remove variable i Davidlohr Bueso (2): tools/virtio: remove stray characters tools/virtio: remove smp_read_barrier_depends() Dawei Li (1): virtio: Implementing attribute show with sysfs_emit Dmitry Fomichev (2): virtio-blk: use a helper to handle request queuing errors virtio-blk: add support for zoned block devices Eli Cohen (8): vdpa/mlx5: Fix rule forwarding VLAN to TIR vdpa/mlx5: Return error on vlan ctrl commands if not supported vdpa/mlx5: Fix wrong mac address deletion vdpa/mlx5: Avoid using reslock in event_handler vdpa/mlx5: Avoid overwriting CVQ iotlb vdpa/mlx5: Move some definitions to a new header file vdpa/mlx5: Add debugfs subtree vdpa/mlx5: Add RX counters to debugfs Eugenio Pérez (1): vdpa_sim_net: Offer VIRTIO_NET_F_STATUS Harshit Mogalapalli (1): vduse: Validate vq_num in vduse_validate_config() Jason Wang (2): vdpa: conditionally fill max max queue pair for stats vdpasim: fix memory leak when freeing IOTLBs Michael S. Tsirkin (3): virtio_blk: temporary variable type tweak virtio_blk: zone append in header type tweak virtio_blk: mark all zone fields LE Michael Sammler (1): virtio_pmem: populate numa information Rafael Mendonca (1): virtio_blk: Fix signedness bug in virtblk_prep_rq() Ricardo Cañuelo (2): tools/virtio: initialize spinlocks in vring_test.c docs: driver-api: virtio: virtio on Linux Rong
[PATCH] virtio_blk: mark all zone fields LE
zone is a virtio 1.x feature so all fields are LE, they are handled as such, but have mistakenly been labeled __virtioXX in the header. This results in a bunch of sparse warnings. Use the __leXX tags to make sparse happy. Signed-off-by: Michael S. Tsirkin --- include/uapi/linux/virtio_blk.h | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h index f4d5ee7c6f30..ec3c3779406f 100644 --- a/include/uapi/linux/virtio_blk.h +++ b/include/uapi/linux/virtio_blk.h @@ -141,11 +141,11 @@ struct virtio_blk_config { /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */ struct virtio_blk_zoned_characteristics { - __virtio32 zone_sectors; - __virtio32 max_open_zones; - __virtio32 max_active_zones; - __virtio32 max_append_sectors; - __virtio32 write_granularity; + __le32 zone_sectors; + __le32 max_open_zones; + __le32 max_active_zones; + __le32 max_append_sectors; + __le32 write_granularity; __u8 model; __u8 unused2[3]; } zoned; @@ -245,11 +245,11 @@ struct virtio_blk_outhdr { */ struct virtio_blk_zone_descriptor { /* Zone capacity */ - __virtio64 z_cap; + __le64 z_cap; /* The starting sector of the zone */ - __virtio64 z_start; + __le64 z_start; /* Zone write pointer position in sectors */ - __virtio64 z_wp; + __le64 z_wp; /* Zone type */ __u8 z_type; /* Zone state */ @@ -258,7 +258,7 @@ struct virtio_blk_zone_descriptor { }; struct virtio_blk_zone_report { - __virtio64 nr_zones; + __le64 nr_zones; __u8 reserved[56]; struct virtio_blk_zone_descriptor zones[]; }; -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command
My point is that the device may complete the control command after the timeout, so, if I'm not mistaken, next time we send a control command and call virtqueue_wait_for_used we'll get the previous response. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2 2/4] vdpasim: customize allocation size
On Thu, Dec 22, 2022 at 4:22 PM Eugenio Perez Martin wrote: > > On Thu, Dec 22, 2022 at 6:01 AM Jason Wang wrote: > > > > Allow individual simulator to customize the allocation size. > > > > Signed-off-by: Jason Wang > > --- > > drivers/vdpa/vdpa_sim/vdpa_sim.c | 8 ++-- > > drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 + > > drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 1 + > > drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 1 + > > 4 files changed, 9 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c > > b/drivers/vdpa/vdpa_sim/vdpa_sim.c > > index 757afef86ba0..55aaa023a6e2 100644 > > --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c > > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c > > @@ -253,7 +253,10 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr > > *dev_attr, > > struct vdpa_device *vdpa; > > struct vdpasim *vdpasim; > > struct device *dev; > > - int i, ret = -ENOMEM; > > + int i, ret = -EINVAL; > > + > > + if (!dev_attr->alloc_size) > > + goto err_alloc; > > It's weird that we need an error goto here and the next chunk of code > may return ERR_PTR(-EINVAL) directly. It's better to return directly > here in my opinion. Right, let me fix it. > > > > > if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) { > > if (config->device_features & > > @@ -268,9 +271,10 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr > > *dev_attr, > > else > > ops = _config_ops; > > > > + ret = -ENOMEM; > > Nitpicking again, > > To set ret = -ENOMEM here is weird to me, since the next failure on > __vdpa_alloc_device will override it. I'd set it right before > dma_set_mask_and_coherent, closer to its actual usage. Yes. > > Actually, I'd propagate dma_set_mask_and_coherent error and set ret = > -ENOMEM right before *alloc functions, but it wasn't done that way > before this series, so not a reason to NAK to me. Yes and I think this needs a separate fix. Thanks > > Thanks! > > > vdpa = __vdpa_alloc_device(NULL, ops, > >dev_attr->ngroups, dev_attr->nas, > > - sizeof(struct vdpasim), > > + dev_attr->alloc_size, > >dev_attr->name, false); > > if (IS_ERR(vdpa)) { > > ret = PTR_ERR(vdpa); > > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h > > b/drivers/vdpa/vdpa_sim/vdpa_sim.h > > index 0e78737dcc16..51c070a543f1 100644 > > --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h > > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h > > @@ -37,6 +37,7 @@ struct vdpasim_dev_attr { > > struct vdpa_mgmt_dev *mgmt_dev; > > const char *name; > > u64 supported_features; > > + size_t alloc_size; > > size_t config_size; > > size_t buffer_size; > > int nvqs; > > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > > b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > > index c6db1a1baf76..4f7c35f59aa5 100644 > > --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > > @@ -378,6 +378,7 @@ static int vdpasim_blk_dev_add(struct vdpa_mgmt_dev > > *mdev, const char *name, > > dev_attr.nvqs = VDPASIM_BLK_VQ_NUM; > > dev_attr.ngroups = VDPASIM_BLK_GROUP_NUM; > > dev_attr.nas = VDPASIM_BLK_AS_NUM; > > + dev_attr.alloc_size = sizeof(struct vdpasim); > > dev_attr.config_size = sizeof(struct virtio_blk_config); > > dev_attr.get_config = vdpasim_blk_get_config; > > dev_attr.work_fn = vdpasim_blk_work; > > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c > > b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c > > index c3cb225ea469..20cd5cdff919 100644 > > --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c > > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c > > @@ -249,6 +249,7 @@ static int vdpasim_net_dev_add(struct vdpa_mgmt_dev > > *mdev, const char *name, > > dev_attr.nvqs = VDPASIM_NET_VQ_NUM; > > dev_attr.ngroups = VDPASIM_NET_GROUP_NUM; > > dev_attr.nas = VDPASIM_NET_AS_NUM; > > + dev_attr.alloc_size = sizeof(struct vdpasim); > > dev_attr.config_size = sizeof(struct virtio_net_config); > > dev_attr.get_config = vdpasim_net_get_config; > > dev_attr.work_fn = vdpasim_net_work; > > -- > > 2.25.1 > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command
Hi Alvaro: On Thu, Dec 22, 2022 at 2:44 PM Alvaro Karsz wrote: > > Hi Jason, > > Adding timeout to the cvq is a great idea IMO. > > > - /* Spin for a response, the kick causes an ioport write, trapping > > -* into the hypervisor, so the request should be handled > > immediately. > > -*/ > > - while (!virtqueue_get_buf(vi->cvq, ) && > > - !virtqueue_is_broken(vi->cvq)) > > - cpu_relax(); > > + virtqueue_wait_for_used(vi->cvq, ); > > Do you think that we should continue like nothing happened in case of a > timeout? We could, but we should not depend on a device to do this since it's not reliable. More below. > Shouldn't we reset the device? We can't depend on device, there's probably another loop in reset(): E.g in vp_reset() we had: while (vp_modern_get_status(mdev)) msleep(1); > What happens if a device completes the control command after timeout? Maybe we could have a BAD_RING() here in this case (and more check in vq->broken in this case). Thanks > > Thanks > > Alvaro > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization