Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

2021-12-12 Thread Michael S. Tsirkin
On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> it ended up with. I have the following questions,
> 
> 1. legacy guest support: from the past conversations it doesn't seem the
> support will be completely dropped from the table, is my understanding
> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> which is backed by the spec at
> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
> if there's request/need to support wilder legacy virtio versions earlier
> beyond.

I personally feel it's less work to add in kernel than try to
work around it in userspace. Jason feels differently.
Maybe post the patches and this will prove to Jason it's not
too terrible?

> 2. suppose some form of legacy guest support needs to be there, how do we
> deal with the bogus assumption below in vdpa_get_config() in the short term?
> It looks one of the intuitive fix is to move the vdpa_set_features call out
> of vdpa_get_config() to vdpa_set_config().
> 
>     /*
>  * Config accesses aren't supposed to trigger before features are
> set.
>  * If it does happen we assume a legacy guest.
>  */
>     if (!vdev->features_valid)
>     vdpa_set_features(vdev, 0);
>     ops->get_config(vdev, offset, buf, len);
> 
> I can post a patch to fix 2) if there's consensus already reached.
> 
> Thanks,
> -Siwei

I'm not sure how important it is to change that.
In any case it only affects transitional devices, right?
Legacy only should not care ...


> On 3/2/2021 2:53 AM, Jason Wang wrote:
> > 
> > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > Detecting it isn't enough though, we will need a new ioctl to 
> > > > > > > notify
> > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > know what the use
> > > > > > case there will be for kernel to leverage such info
> > > > > > directly? Is there a
> > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > differentiation (legacy v.s. modern) needed?
> > > > > BTW a good API could be
> > > > > 
> > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > 
> > > > > we did it per vring but maybe that was a mistake ...
> > > > 
> > > > Actually, I wonder whether it's good time to just not support
> > > > legacy driver
> > > > for vDPA. Consider:
> > > > 
> > > > 1) It's definition is no-normative
> > > > 2) A lot of budren of codes
> > > > 
> > > > So qemu can still present the legacy device since the config
> > > > space or other
> > > > stuffs that is presented by vhost-vDPA is not expected to be
> > > > accessed by
> > > > guest directly. Qemu can do the endian conversion when necessary
> > > > in this
> > > > case?
> > > > 
> > > > Thanks
> > > > 
> > > Overall I would be fine with this approach but we need to avoid breaking
> > > working userspace, qemu releases with vdpa support are out there and
> > > seem to work for people. Any changes need to take that into account
> > > and document compatibility concerns.
> > 
> > 
> > Agree, let me check.
> > 
> > 
> > >   I note that any hardware
> > > implementation is already broken for legacy except on platforms with
> > > strong ordering which might be helpful in reducing the scope.
> > 
> > 
> > Yes.
> > 
> > Thanks
> > 
> > 
> > > 
> > > 
> > 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v1 1/7] vdpa: Provide interface to read driver features

2021-12-12 Thread Michael S. Tsirkin
On Thu, Dec 09, 2021 at 02:29:17PM -0800, Si-Wei Liu wrote:
> 
> 
> On 12/8/2021 10:47 PM, Eli Cohen wrote:
> > On Wed, Dec 08, 2021 at 03:57:21PM -0800, Si-Wei Liu wrote:
> > > 
> > > On 12/8/2021 12:14 PM, Eli Cohen wrote:
> > > > Provide an interface to read the negotiated features. This is needed
> > > > when building the netlink message in vdpa_dev_net_config_fill().
> > > > 
> > > > Also fix the implementation of vdpa_dev_net_config_fill() to use the
> > > > negotiated features instead of the device features.
> > > Are we sure the use of device feature is a bug rather than expected
> > > behavior?
> > I think so since in vdpa_dev_net_config_fill() we're returning the
> > current configuration so you you don't want to mislead the user with
> > wrong information.
> You seem to imply vdpa_dev_net_config_fill() should return the
> current/running driver config rather than the initial setting of the device
> side before feature negotiation kicks in. This seems to work for the running
> link status which is propagated to the configuration space, but how do you
> infer the other untrackable running config the driver is operating, such as
> mtu, the effective value of which is not written back to config space or
> propagated back to vdpa backend?
> 
> > 
> > > How do users determine the configured number of queue pairs at any
> > > point in case of a non-multiqueue supporting guest/driver or that the 
> > > device
> > > is being reset (where actual_features == 0)?
> > I would think if you read during reset (which is really a short period
> > of time), you get what there is at the moment.
> I would stress out the case if guest not supporting multi-queue. For e.g
> guest firmware might only support single queue without control queue, which
> is not considered to be transient. Since the validity of the value connects
> to feature negotiation, how does the host admin user infer feature
> negotiation is complete? What is the administrative value if they have to
> keep track of the change without knowing the effective negotiation status?
> 
> Thanks,
> -Siwei

Right now I think the assumption is that features can be
overwritten multiple times.
I would say the way to detect feature negotiation is complete for
a modern guest would be by using set_status.
Legacy guests also kick before set_status, once they do
feature negotiation is complete.

Having said all that, ability to read config before FEATURES_OK
only caused us pain so far. I think we should at least discourage it
at the spec level.

> > 
> > > Maybe we should differentiate
> > > the static device config against driver/device running state here. I 
> > > thought
> > > the change to vdpa_dev_net_config_fill() in this patch would break vdpa 
> > > tool
> > > user's expectation that the value of max_vqp config is a fixed value since
> > > the vdpa creation time.
> > That was not intended to be fixed AFAIU.
> > 
> > > Perhaps worth adding another attribute to represent the running state
> > > (cur_max_qps) based on the negotiated features.
> > > 
> > You can get that information by running e.g. ethtool -l ens1.
> > 
> > > -Siwei
> > > 
> > > > To make APIs clearer, make the following name changes to struct
> > > > vdpa_config_ops so they better describe their operations:
> > > > 
> > > > get_features -> get_device_features
> > > > set_features -> set_driver_features
> > > > 
> > > > Finally, add get_driver_features to return the negotiated features and
> > > > add implementation to all the upstream drivers.
> > > > 
> > > > Signed-off-by: Eli Cohen 
> > > > ---
> > > >drivers/vdpa/alibaba/eni_vdpa.c| 16 
> > > >drivers/vdpa/ifcvf/ifcvf_main.c| 16 
> > > >drivers/vdpa/mlx5/net/mlx5_vnet.c  | 16 
> > > >drivers/vdpa/vdpa.c|  2 +-
> > > >drivers/vdpa/vdpa_sim/vdpa_sim.c   | 21 +++--
> > > >drivers/vdpa/vdpa_user/vduse_dev.c | 16 
> > > >drivers/vdpa/virtio_pci/vp_vdpa.c  | 16 
> > > >drivers/vhost/vdpa.c   |  2 +-
> > > >drivers/virtio/virtio_vdpa.c   |  2 +-
> > > >include/linux/vdpa.h   | 14 +-
> > > >10 files changed, 87 insertions(+), 34 deletions(-)
> > > > 
> > > > diff --git a/drivers/vdpa/alibaba/eni_vdpa.c 
> > > > b/drivers/vdpa/alibaba/eni_vdpa.c
> > > > index 3f788794571a..ac0975660f4d 100644
> > > > --- a/drivers/vdpa/alibaba/eni_vdpa.c
> > > > +++ b/drivers/vdpa/alibaba/eni_vdpa.c
> > > > @@ -58,7 +58,7 @@ static struct virtio_pci_legacy_device 
> > > > *vdpa_to_ldev(struct vdpa_device *vdpa)
> > > > return &eni_vdpa->ldev;
> > > >}
> > > > -static u64 eni_vdpa_get_features(struct vdpa_device *vdpa)
> > > > +static u64 eni_vdpa_get_device_features(struct vdpa_device *vdpa)
> > > >{
> > > > struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
> > > > u64 features = vp_legacy_get_features(ldev);
> > > > @@ -69,7 +

Re: [PATCH 1/5] uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()

2021-12-12 Thread Dan Williams
On Wed, Dec 8, 2021 at 10:38 PM Christoph Hellwig  wrote:
>
> These two wrappers are never used.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/nvdimm/pmem.c |  4 ++--
>  include/linux/uio.h   | 20 +---
>  2 files changed, 3 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 4190c8c46ca88..8294f1c701baa 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -302,8 +302,8 @@ static long pmem_dax_direct_access(struct dax_device 
> *dax_dev,
>  }
>
>  /*
> - * Use the 'no check' versions of copy_from_iter_flushcache() and
> - * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
> + * Use the 'no check' versions of _copy_from_iter_flushcache() and
> + * _copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
>   * checking, both file offset and device offset, is handled by
>   * dax_iomap_actor()
>   */

This comment change does not make sense since it is saying why pmem is
using the "_" versions. However, I assume this whole comment goes away
in a later patch.

> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 6350354f97e90..494d552c1d663 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -196,7 +196,7 @@ bool copy_from_iter_full_nocache(void *addr, size_t 
> bytes, struct iov_iter *i)
>  #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
>  /*
>   * Note, users like pmem that depend on the stricter semantics of
> - * copy_from_iter_flushcache() than copy_from_iter_nocache() must check for
> + * _copy_from_iter_flushcache() than _copy_from_iter_nocache() must check for
>   * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
>   * destination is flushed from the cache on return.
>   */

Same here.

> @@ -211,24 +211,6 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, 
> struct iov_iter *i);
>  #define _copy_mc_to_iter _copy_to_iter
>  #endif
>
> -static __always_inline __must_check
> -size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter 
> *i)
> -{
> -   if (unlikely(!check_copy_size(addr, bytes, false)))
> -   return 0;
> -   else
> -   return _copy_from_iter_flushcache(addr, bytes, i);
> -}
> -
> -static __always_inline __must_check
> -size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
> -{
> -   if (unlikely(!check_copy_size(addr, bytes, true)))
> -   return 0;
> -   else
> -   return _copy_mc_to_iter(addr, bytes, i);
> -}
> -
>  size_t iov_iter_zero(size_t bytes, struct iov_iter *);
>  unsigned long iov_iter_alignment(const struct iov_iter *i);
>  unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
> --
> 2.30.2
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous

2021-12-12 Thread Dan Williams
On Wed, Dec 8, 2021 at 10:38 PM Christoph Hellwig  wrote:
>
> Remove the pointless wrappers.
>
> Signed-off-by: Christoph Hellwig 

Looks good, not sure why those ever existed.

Reviewed-by: Dan Williams 

> ---
>  drivers/dax/super.c |  8 
>  include/linux/dax.h | 12 ++--
>  2 files changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e7152a6c4cc40..e18155f43a635 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -208,17 +208,17 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
>  }
>  EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
>
> -bool __dax_synchronous(struct dax_device *dax_dev)
> +bool dax_synchronous(struct dax_device *dax_dev)
>  {
> return test_bit(DAXDEV_SYNC, &dax_dev->flags);
>  }
> -EXPORT_SYMBOL_GPL(__dax_synchronous);
> +EXPORT_SYMBOL_GPL(dax_synchronous);
>
> -void __set_dax_synchronous(struct dax_device *dax_dev)
> +void set_dax_synchronous(struct dax_device *dax_dev)
>  {
> set_bit(DAXDEV_SYNC, &dax_dev->flags);
>  }
> -EXPORT_SYMBOL_GPL(__set_dax_synchronous);
> +EXPORT_SYMBOL_GPL(set_dax_synchronous);
>
>  bool dax_alive(struct dax_device *dax_dev)
>  {
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 87ae4c9b1d65b..3bd1fdb5d5f4b 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -48,16 +48,8 @@ void put_dax(struct dax_device *dax_dev);
>  void kill_dax(struct dax_device *dax_dev);
>  void dax_write_cache(struct dax_device *dax_dev, bool wc);
>  bool dax_write_cache_enabled(struct dax_device *dax_dev);
> -bool __dax_synchronous(struct dax_device *dax_dev);
> -static inline bool dax_synchronous(struct dax_device *dax_dev)
> -{
> -   return  __dax_synchronous(dax_dev);
> -}
> -void __set_dax_synchronous(struct dax_device *dax_dev);
> -static inline void set_dax_synchronous(struct dax_device *dax_dev)
> -{
> -   __set_dax_synchronous(dax_dev);
> -}
> +bool dax_synchronous(struct dax_device *dax_dev);
> +void set_dax_synchronous(struct dax_device *dax_dev);
>  /*
>   * Check if given mapping is supported by the file / underlying device.
>   */
> --
> 2.30.2
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 3/5] dax: remove the DAXDEV_F_SYNC flag

2021-12-12 Thread Dan Williams
On Wed, Dec 8, 2021 at 10:38 PM Christoph Hellwig  wrote:
>
> Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and
> just let the drivers call set_dax_synchronous directly.
>
> Signed-off-by: Christoph Hellwig 

Sure, looks good to me.

Reviewed-by: Dan Williams 

> ---
>  drivers/dax/bus.c| 3 ++-
>  drivers/dax/super.c  | 6 +-
>  drivers/md/dm.c  | 2 +-
>  drivers/nvdimm/pmem.c| 7 +++
>  drivers/s390/block/dcssblk.c | 4 ++--
>  fs/fuse/virtio_fs.c  | 2 +-
>  include/linux/dax.h  | 8 ++--
>  7 files changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 6683d42c32c56..da2a14d096d29 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -1324,11 +1324,12 @@ struct dev_dax *devm_create_dev_dax(struct 
> dev_dax_data *data)
>  * No dax_operations since there is no access to this device outside 
> of
>  * mmap of the resulting character device.
>  */
> -   dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
> +   dax_dev = alloc_dax(dev_dax, NULL);
> if (IS_ERR(dax_dev)) {
> rc = PTR_ERR(dax_dev);
> goto err_alloc_dax;
> }
> +   set_dax_synchronous(dax_dev);
>
> /* a device_dax instance is dead while the driver is not attached */
> kill_dax(dax_dev);
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e18155f43a635..e81d5ee57390f 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -345,8 +345,7 @@ static struct dax_device *dax_dev_get(dev_t devt)
> return dax_dev;
>  }
>
> -struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
> -   unsigned long flags)
> +struct dax_device *alloc_dax(void *private, const struct dax_operations *ops)
>  {
> struct dax_device *dax_dev;
> dev_t devt;
> @@ -366,9 +365,6 @@ struct dax_device *alloc_dax(void *private, const struct 
> dax_operations *ops,
>
> dax_dev->ops = ops;
> dax_dev->private = private;
> -   if (flags & DAXDEV_F_SYNC)
> -   set_dax_synchronous(dax_dev);
> -
> return dax_dev;
>
>   err_dev:
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 4e997c02bb0a0..f4b972af10928 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1765,7 +1765,7 @@ static struct mapped_device *alloc_dev(int minor)
> sprintf(md->disk->disk_name, "dm-%d", minor);
>
> if (IS_ENABLED(CONFIG_FS_DAX)) {
> -   md->dax_dev = alloc_dax(md, &dm_dax_ops, 0);
> +   md->dax_dev = alloc_dax(md, &dm_dax_ops);
> if (IS_ERR(md->dax_dev)) {
> md->dax_dev = NULL;
> goto bad;
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 8294f1c701baa..85b3339286bd8 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -402,7 +402,6 @@ static int pmem_attach_disk(struct device *dev,
> struct gendisk *disk;
> void *addr;
> int rc;
> -   unsigned long flags = 0UL;
>
> pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
> if (!pmem)
> @@ -495,13 +494,13 @@ static int pmem_attach_disk(struct device *dev,
> nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range);
> disk->bb = &pmem->bb;
>
> -   if (is_nvdimm_sync(nd_region))
> -   flags = DAXDEV_F_SYNC;
> -   dax_dev = alloc_dax(pmem, &pmem_dax_ops, flags);
> +   dax_dev = alloc_dax(pmem, &pmem_dax_ops);
> if (IS_ERR(dax_dev)) {
> rc = PTR_ERR(dax_dev);
> goto out;
> }
> +   if (is_nvdimm_sync(nd_region))
> +   set_dax_synchronous(dax_dev);
> rc = dax_add_host(dax_dev, disk);
> if (rc)
> goto out_cleanup_dax;
> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> index e65e83764d1ce..10823debc09bd 100644
> --- a/drivers/s390/block/dcssblk.c
> +++ b/drivers/s390/block/dcssblk.c
> @@ -686,13 +686,13 @@ dcssblk_add_store(struct device *dev, struct 
> device_attribute *attr, const char
> if (rc)
> goto put_dev;
>
> -   dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops,
> -   DAXDEV_F_SYNC);
> +   dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops);
> if (IS_ERR(dev_info->dax_dev)) {
> rc = PTR_ERR(dev_info->dax_dev);
> dev_info->dax_dev = NULL;
> goto put_dev;
> }
> +   set_dax_synchronous(dev_info->dax_dev);
> rc = dax_add_host(dev_info->dax_dev, dev_info->gd);
> if (rc)
> goto out_dax;
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 242cc1c0d7ed7..5c03a0364a9bb 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -850,7 +850,7 @@ static int virtio_fs_setup_dax

Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods

2021-12-12 Thread Dan Williams
On Wed, Dec 8, 2021 at 10:38 PM Christoph Hellwig  wrote:
>
> These methods indirect the actual DAX read/write path.  In the end pmem
> uses magic flush and mc safe variants and fuse and dcssblk use plain ones
> while device mapper picks redirects to the underlying device.
>
> Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
> special variants, then use them everywhere as they fall back to the plain
> ones on s390 anyway and remove an indirect call from the read/write path
> as well as a lot of boilerplate code.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/dax/super.c   | 36 ++--
>  drivers/md/dm-linear.c| 20 -
>  drivers/md/dm-log-writes.c| 80 ---
>  drivers/md/dm-stripe.c| 20 -
>  drivers/md/dm.c   | 50 --
>  drivers/nvdimm/pmem.c | 20 -
>  drivers/s390/block/dcssblk.c  | 14 --
>  fs/dax.c  |  5 ---
>  fs/fuse/virtio_fs.c   | 19 +
>  include/linux/dax.h   |  9 ++--
>  include/linux/device-mapper.h |  4 --
>  11 files changed, 37 insertions(+), 240 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e81d5ee57390f..ff676a07480c8 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -105,6 +105,10 @@ enum dax_device_flags {
> DAXDEV_WRITE_CACHE,
> /* flag to check if device supports synchronous flush */
> DAXDEV_SYNC,
> +   /* do not use uncached operations to write data */
> +   DAXDEV_CACHED,
> +   /* do not use mcsafe operations to read data */
> +   DAXDEV_NOMCSAFE,

Linus did not like the mcsafe name, and this brings it back. Let's
flip the polarity to positively indicate which routine to use, and to
match the 'nofault' style which says "copy and handle faults".

/* do not leave the caches dirty after writes */
DAXDEV_NOCACHE

/* handle CPU fetch exceptions during reads */
DAXDEV_NOMC

...and then flip the use cases around.

Otherwise, nice cleanup. In retrospect I took the feedback to push
this decision to a driver a bit too literally, this is much better.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods

2021-12-12 Thread Dan Williams
On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal  wrote:
>
> On Thu, Dec 09, 2021 at 07:38:27AM +0100, Christoph Hellwig wrote:
> > These methods indirect the actual DAX read/write path.  In the end pmem
> > uses magic flush and mc safe variants and fuse and dcssblk use plain ones
> > while device mapper picks redirects to the underlying device.
> >
> > Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
> > special variants, then use them everywhere as they fall back to the plain
> > ones on s390 anyway and remove an indirect call from the read/write path
> > as well as a lot of boilerplate code.
> >
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  drivers/dax/super.c   | 36 ++--
> >  drivers/md/dm-linear.c| 20 -
> >  drivers/md/dm-log-writes.c| 80 ---
> >  drivers/md/dm-stripe.c| 20 -
> >  drivers/md/dm.c   | 50 --
> >  drivers/nvdimm/pmem.c | 20 -
> >  drivers/s390/block/dcssblk.c  | 14 --
> >  fs/dax.c  |  5 ---
> >  fs/fuse/virtio_fs.c   | 19 +
> >  include/linux/dax.h   |  9 ++--
> >  include/linux/device-mapper.h |  4 --
> >  11 files changed, 37 insertions(+), 240 deletions(-)
> >
>
> [..]
> > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > index 5c03a0364a9bb..754319ce2a29b 100644
> > --- a/fs/fuse/virtio_fs.c
> > +++ b/fs/fuse/virtio_fs.c
> > @@ -753,20 +753,6 @@ static long virtio_fs_direct_access(struct dax_device 
> > *dax_dev, pgoff_t pgoff,
> >   return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
> >  }
> >
> > -static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
> > -pgoff_t pgoff, void *addr,
> > -size_t bytes, struct iov_iter *i)
> > -{
> > - return copy_from_iter(addr, bytes, i);
> > -}
> > -
> > -static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
> > -pgoff_t pgoff, void *addr,
> > -size_t bytes, struct iov_iter *i)
> > -{
> > - return copy_to_iter(addr, bytes, i);
> > -}
> > -
> >  static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> >pgoff_t pgoff, size_t nr_pages)
> >  {
> > @@ -783,8 +769,6 @@ static int virtio_fs_zero_page_range(struct dax_device 
> > *dax_dev,
> >
> >  static const struct dax_operations virtio_fs_dax_ops = {
> >   .direct_access = virtio_fs_direct_access,
> > - .copy_from_iter = virtio_fs_copy_from_iter,
> > - .copy_to_iter = virtio_fs_copy_to_iter,
> >   .zero_page_range = virtio_fs_zero_page_range,
> >  };
> >
> > @@ -853,7 +837,8 @@ static int virtio_fs_setup_dax(struct virtio_device 
> > *vdev, struct virtio_fs *fs)
> >   fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
> >   if (IS_ERR(fs->dax_dev))
> >   return PTR_ERR(fs->dax_dev);
> > -
> > + set_dax_cached(fs->dax_dev);
>
> Looks good to me from virtiofs point of view.
>
> Reviewed-by: Vivek Goyal 
>
> Going forward, I am wondering should virtiofs use flushcache version as
> well. What if host filesystem is using DAX and mapping persistent memory
> pfn directly into qemu address space. I have never tested that.
>
> Right now we are relying on applications to do fsync/msync on virtiofs
> for data persistence.

This sounds like it would need coordination with a paravirtualized
driver that can indicate whether the host side is pmem or not, like
the virtio_pmem driver. However, if the guest sends any fsync/msync
you would still need to go explicitly cache flush any dirty page
because you can't necessarily trust that the guest did that already.

>
> > + set_dax_nomcsafe(fs->dax_dev);
> >   return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
> >   fs->dax_dev);
> >  }
>
> Thanks
> Vivek
>
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter

2021-12-12 Thread Dan Williams
On Fri, Dec 10, 2021 at 6:05 AM Vivek Goyal  wrote:
>
> On Thu, Dec 09, 2021 at 07:38:28AM +0100, Christoph Hellwig wrote:
> > While using the MC-safe copy routines is rather pointless on a virtual 
> > device
> > like virtiofs,
>
> I was wondering about that. Is it completely pointless.
>
> Typically we are just mapping host page cache into qemu address space.
> That shows as virtiofs device pfn in guest and that pfn is mapped into
> guest application address space in mmap() call.
>
> Given on host its DRAM, so I would not expect machine check on load side
> so there was no need to use machine check safe variant.

That's a broken assumption, DRAM experiences multi-bit ECC errors.
Machine checks, data aborts, etc existed before PMEM.

>  But what if host
> filesystem is on persistent memory and using DAX. In that case load in
> guest can trigger a machine check. Not sure if that machine check will
> actually travel into the guest and unblock read() operation or not.
>
> But this sounds like a good change from virtiofs point of view, anyway.
>
> Thanks
> Vivek
>
>
> > it also isn't harmful at all.  So just use _copy_mc_to_iter
> > unconditionally to simplify the code.
> >
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  drivers/dax/super.c | 10 --
> >  fs/fuse/virtio_fs.c |  1 -
> >  include/linux/dax.h |  1 -
> >  3 files changed, 12 deletions(-)
> >
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index ff676a07480c8..fe783234ca669 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -107,8 +107,6 @@ enum dax_device_flags {
> >   DAXDEV_SYNC,
> >   /* do not use uncached operations to write data */
> >   DAXDEV_CACHED,
> > - /* do not use mcsafe operations to read data */
> > - DAXDEV_NOMCSAFE,
> >  };
> >
> >  /**
> > @@ -171,8 +169,6 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, 
> > pgoff_t pgoff, void *addr,
> >* via access_ok() in vfs_red, so use the 'no check' version to bypass
> >* the HARDENED_USERCOPY overhead.
> >*/
> > - if (test_bit(DAXDEV_NOMCSAFE, &dax_dev->flags))
> > - return _copy_to_iter(addr, bytes, i);
> >   return _copy_mc_to_iter(addr, bytes, i);
> >  }
> >
> > @@ -242,12 +238,6 @@ void set_dax_cached(struct dax_device *dax_dev)
> >  }
> >  EXPORT_SYMBOL_GPL(set_dax_cached);
> >
> > -void set_dax_nomcsafe(struct dax_device *dax_dev)
> > -{
> > - set_bit(DAXDEV_NOMCSAFE, &dax_dev->flags);
> > -}
> > -EXPORT_SYMBOL_GPL(set_dax_nomcsafe);
> > -
> >  bool dax_alive(struct dax_device *dax_dev)
> >  {
> >   lockdep_assert_held(&dax_srcu);
> > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > index 754319ce2a29b..d9c20b148ac19 100644
> > --- a/fs/fuse/virtio_fs.c
> > +++ b/fs/fuse/virtio_fs.c
> > @@ -838,7 +838,6 @@ static int virtio_fs_setup_dax(struct virtio_device 
> > *vdev, struct virtio_fs *fs)
> >   if (IS_ERR(fs->dax_dev))
> >   return PTR_ERR(fs->dax_dev);
> >   set_dax_cached(fs->dax_dev);
> > - set_dax_nomcsafe(fs->dax_dev);
> >   return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
> >   fs->dax_dev);
> >  }
> > diff --git a/include/linux/dax.h b/include/linux/dax.h
> > index d22cbf03d37d2..d267331bc37e7 100644
> > --- a/include/linux/dax.h
> > +++ b/include/linux/dax.h
> > @@ -90,7 +90,6 @@ static inline bool daxdev_mapping_supported(struct 
> > vm_area_struct *vma,
> >  #endif
> >
> >  void set_dax_cached(struct dax_device *dax_dev);
> > -void set_dax_nomcsafe(struct dax_device *dax_dev);
> >
> >  struct writeback_control;
> >  #if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
> > --
> > 2.30.2
> >
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 5/5] dax: always use _copy_mc_to_iter in dax_copy_to_iter

2021-12-12 Thread Dan Williams
On Wed, Dec 8, 2021 at 10:38 PM Christoph Hellwig  wrote:
>
> While using the MC-safe copy routines is rather pointless on a virtual device
> like virtiofs, it also isn't harmful at all.  So just use _copy_mc_to_iter
> unconditionally to simplify the code.

>From a correctness perspective, yes, but from a performance perspective, see:

enable_copy_mc_fragile()

...on those platforms fast-string copy implementation is replaced with
a manual unrolled copy. So this will cause a performance regression on
those platforms.

How about let's keep this as is / still only use it for PMEM where end
users are already dealing with the performance difference across
platforms? I considered exporting an indicator of which backend
routine has been selected from arch/x86/lib/copy_mc.c, but it got
messy quickly so I fell back to just keeping the status quo.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[GIT PULL] vhost: cleanups and fixes

2021-12-12 Thread Michael S. Tsirkin
The following changes since commit 0fcfb00b28c0b7884635dacf38e46d60bf3d4eb1:

  Linux 5.16-rc4 (2021-12-05 14:08:22 -0800)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to bb47620be322c5e9e372536cb6b54e17b3a00258:

  vdpa: Consider device id larger than 31 (2021-12-08 15:41:50 -0500)


virtio,vdpa: bugfixes

Misc bugfixes.

Signed-off-by: Michael S. Tsirkin 


Arnd Bergmann (1):
  virtio: always enter drivers/virtio/

Dan Carpenter (3):
  vduse: fix memory corruption in vduse_dev_ioctl()
  vdpa: check that offsets are within bounds
  vduse: check that offset is within bounds in get_config()

Parav Pandit (1):
  vdpa: Consider device id larger than 31

Wei Wang (1):
  virtio/vsock: fix the transport to work with VMADDR_CID_ANY

Will Deacon (1):
  virtio_ring: Fix querying of maximum DMA mapping size for virtio device

 drivers/Makefile| 3 +--
 drivers/vdpa/vdpa.c | 3 ++-
 drivers/vdpa/vdpa_user/vduse_dev.c  | 6 --
 drivers/vhost/vdpa.c| 2 +-
 drivers/virtio/virtio_ring.c| 2 +-
 net/vmw_vsock/virtio_transport_common.c | 3 ++-
 6 files changed, 11 insertions(+), 8 deletions(-)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [GIT PULL] vhost: cleanups and fixes

2021-12-12 Thread Michael S. Tsirkin
The email subject is wrong. it's just bugfixes.
But the tag is ok, and that's what matters, right?

On Sun, Dec 12, 2021 at 05:59:58PM -0500, Michael S. Tsirkin wrote:
> The following changes since commit 0fcfb00b28c0b7884635dacf38e46d60bf3d4eb1:
> 
>   Linux 5.16-rc4 (2021-12-05 14:08:22 -0800)
> 
> are available in the Git repository at:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
> 
> for you to fetch changes up to bb47620be322c5e9e372536cb6b54e17b3a00258:
> 
>   vdpa: Consider device id larger than 31 (2021-12-08 15:41:50 -0500)
> 
> 
> virtio,vdpa: bugfixes
> 
> Misc bugfixes.
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> 
> Arnd Bergmann (1):
>   virtio: always enter drivers/virtio/
> 
> Dan Carpenter (3):
>   vduse: fix memory corruption in vduse_dev_ioctl()
>   vdpa: check that offsets are within bounds
>   vduse: check that offset is within bounds in get_config()
> 
> Parav Pandit (1):
>   vdpa: Consider device id larger than 31
> 
> Wei Wang (1):
>   virtio/vsock: fix the transport to work with VMADDR_CID_ANY
> 
> Will Deacon (1):
>   virtio_ring: Fix querying of maximum DMA mapping size for virtio device
> 
>  drivers/Makefile| 3 +--
>  drivers/vdpa/vdpa.c | 3 ++-
>  drivers/vdpa/vdpa_user/vduse_dev.c  | 6 --
>  drivers/vhost/vdpa.c| 2 +-
>  drivers/virtio/virtio_ring.c| 2 +-
>  net/vmw_vsock/virtio_transport_common.c | 3 ++-
>  6 files changed, 11 insertions(+), 8 deletions(-)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

2021-12-12 Thread Jason Wang
On Sun, Dec 12, 2021 at 5:26 PM Michael S. Tsirkin  wrote:
>
> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> > it ended up with. I have the following questions,
> >
> > 1. legacy guest support: from the past conversations it doesn't seem the
> > support will be completely dropped from the table, is my understanding
> > correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> > which is backed by the spec at
> > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
> > if there's request/need to support wilder legacy virtio versions earlier
> > beyond.
>
> I personally feel it's less work to add in kernel than try to
> work around it in userspace. Jason feels differently.
> Maybe post the patches and this will prove to Jason it's not
> too terrible?

That's one way, other than the config access before setting features,
we need to deal with other stuffs:

1) VIRTIO_F_ORDER_PLATFORM
2) there could be a parent device that only support 1.0 device

And a lot of other stuff summarized in spec 7.4 which seems not an
easy task. Various vDPA parent drivers were written under the
assumption that only modern devices are supported.

Thanks

>
> > 2. suppose some form of legacy guest support needs to be there, how do we
> > deal with the bogus assumption below in vdpa_get_config() in the short term?
> > It looks one of the intuitive fix is to move the vdpa_set_features call out
> > of vdpa_get_config() to vdpa_set_config().
> >
> > /*
> >  * Config accesses aren't supposed to trigger before features are
> > set.
> >  * If it does happen we assume a legacy guest.
> >  */
> > if (!vdev->features_valid)
> > vdpa_set_features(vdev, 0);
> > ops->get_config(vdev, offset, buf, len);
> >
> > I can post a patch to fix 2) if there's consensus already reached.
> >
> > Thanks,
> > -Siwei
>
> I'm not sure how important it is to change that.
> In any case it only affects transitional devices, right?
> Legacy only should not care ...
>
>
> > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > >
> > > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > Detecting it isn't enough though, we will need a new ioctl to 
> > > > > > > > notify
> > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > know what the use
> > > > > > > case there will be for kernel to leverage such info
> > > > > > > directly? Is there a
> > > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > BTW a good API could be
> > > > > >
> > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > >
> > > > > > we did it per vring but maybe that was a mistake ...
> > > > >
> > > > > Actually, I wonder whether it's good time to just not support
> > > > > legacy driver
> > > > > for vDPA. Consider:
> > > > >
> > > > > 1) It's definition is no-normative
> > > > > 2) A lot of budren of codes
> > > > >
> > > > > So qemu can still present the legacy device since the config
> > > > > space or other
> > > > > stuffs that is presented by vhost-vDPA is not expected to be
> > > > > accessed by
> > > > > guest directly. Qemu can do the endian conversion when necessary
> > > > > in this
> > > > > case?
> > > > >
> > > > > Thanks
> > > > >
> > > > Overall I would be fine with this approach but we need to avoid breaking
> > > > working userspace, qemu releases with vdpa support are out there and
> > > > seem to work for people. Any changes need to take that into account
> > > > and document compatibility concerns.
> > >
> > >
> > > Agree, let me check.
> > >
> > >
> > > >   I note that any hardware
> > > > implementation is already broken for legacy except on platforms with
> > > > strong ordering which might be helpful in reducing the scope.
> > >
> > >
> > > Yes.
> > >
> > > Thanks
> > >
> > >
> > > >
> > > >
> > >
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v1 6/7] vdpa: Add support for querying control virtqueue index

2021-12-12 Thread Jason Wang
On Thu, Dec 9, 2021 at 5:21 PM Eli Cohen  wrote:
>
> On Thu, Dec 09, 2021 at 05:12:01PM +0800, Jason Wang wrote:
> > On Thu, Dec 9, 2021 at 4:26 PM Eli Cohen  wrote:
> > >
> > > On Thu, Dec 09, 2021 at 04:17:22PM +0800, Jason Wang wrote:
> > > > On Thu, Dec 9, 2021 at 4:04 PM Eli Cohen  wrote:
> > > > >
> > > > > On Thu, Dec 09, 2021 at 03:55:48PM +0800, Jason Wang wrote:
> > > > > > On Thu, Dec 9, 2021 at 3:09 PM Eli Cohen  wrote:
> > > > > > >
> > > > > > > On Thu, Dec 09, 2021 at 01:44:56PM +0800, Jason Wang wrote:
> > > > > > > > On Thu, Dec 9, 2021 at 4:15 AM Eli Cohen  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Add netlink attribute and callback function to query the 
> > > > > > > > > control VQ
> > > > > > > > > index of a device.
> > > > > > > > >
> > > > > > > > > Example:
> > > > > > > > >
> > > > > > > > > $ vdpa dev config show vdpa-a
> > > > > > > > >   vdpa-a: mac 00:00:00:00:88:88 link up link_announce false 
> > > > > > > > > max_vq_pairs 5 \
> > > > > > > > >   mtu 9000 ctrl_vq_idx 10
> > > > > > > >
> > > > > > > >
> > > > > > > > I still wonder about the motivation for this.
> > > > > > > To be able to show the stats for CVQ.
> > > > > >
> > > > > > Right.
> > > > > >
> > > > > > >
> > > > > > > > And as discussed, the
> > > > > > > > ctrl_vq_idx varies depending on whether MQ is negotiated.
> > > > > > >
> > > > > > > I handle this according to the spec and it depends on MQ.
> > > > > >
> > > > > > Yes, but there could be a chance that the cvq index changes after 
> > > > > > the
> > > > > > vdpa dev config show.
> > > > > >
> > > > >
> > > > > It depends on:
> > > > > VIRTIO_NET_F_CTRL_VQ
> > > > > VIRTIO_NET_F_MQ
> > > > >
> > > > > which can change only if the features are re-negotiated and that 
> > > > > happens
> > > > > on driver in itialization.
> > > > >
> > > > > And on max_virtqueue_pairs which is also set at driver initialization.
> > > > >
> > > > > So I don't see any real issue here. Am I missing something?
> > > >
> > > > No. I meant technically there could be a re-negotiation that happens
> > > > in the middle:
> > > >
> > > > 1) vdpa dev config show
> > > > 2) feature renegotiation which change the cvq index
> > > > 3) getting cvq stats
> > > >
> > > > So the cvq index might be changed. E.g it could be changed from a cvq
> > > > to a rx queue. It might be easier to make the get index and stats
> > > > atomic.
> > > >
> > >
> > > The interface to read VQ stats is based on the user providing the index
> > > and getting the statistics.
> > >
> > > If we want to follow your suggestion then we need maybe to use a
> > > slightly modified command:
> > >
> > > For regular VQ:
> > > dpa dev vstats show vdpa-a qidx 0
> > >
> > > For CVQ:
> > > vdpa dev vstats show vdpa-a cvq
> >
> > This might be better, but we need to make sure cvq only works for the
> > device that has a cvq.
> >
>
> OK.
>
> > >
> > > And that raises another question. Do we want to report the CVQ index in
> > > config show?
> >
> > If we go with stats show vdpa-a cvq, we don't need to report the cvq index.
> >
>
> Maybe we should leave some kind of indication as to whether there is a
> CVQ. Maybe not the index itself by just "CVQ" so a user will know if
> it's worth trying to read cvq stats.
>
> Otherwise, the user will attempt to read and fail.
> What do you think?

It should work.

Thanks

>
> > Thanks
> >
> > >
> > > > Thanks
> > > >
> > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Signed-off-by: Eli Cohen 
> > > > > > > > > ---
> > > > > > > > > v0 -> v1:
> > > > > > > > > 1. Use logic defined in the spec to deduce virtqueue index.
> > > > > > > > >
> > > > > > > > >  drivers/vdpa/vdpa.c   | 25 +
> > > > > > > > >  include/uapi/linux/vdpa.h |  1 +
> > > > > > > > >  2 files changed, 26 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> > > > > > > > > index 3bf016e03512..b4d4b8a7ca4e 100644
> > > > > > > > > --- a/drivers/vdpa/vdpa.c
> > > > > > > > > +++ b/drivers/vdpa/vdpa.c
> > > > > > > > > @@ -712,6 +712,27 @@ static int 
> > > > > > > > > vdpa_nl_cmd_dev_get_dumpit(struct sk_buff *msg, struct 
> > > > > > > > > netlink_callba
> > > > > > > > > return msg->len;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static int vdpa_dev_net_ctrl_vq_fill(struct vdpa_device 
> > > > > > > > > *vdev,
> > > > > > > > > +struct sk_buff *msg,
> > > > > > > > > +struct virtio_net_config 
> > > > > > > > > *config,
> > > > > > > > > +u64 features)
> > > > > > > > > +{
> > > > > > > > > +   u16 N;
> > > > > > > > > +
> > > > > > > > > +   /* control VQ index, if available, is deduced 
> > > > > > > > > according to the logic
> > > > > > > > > +* described in the virtio spec in section 

RE: [iproute2-next 0/4] vdpa tool to query and set config layout

2021-12-12 Thread Parav Pandit via Virtualization



> From: David Ahern 
> Sent: Friday, December 3, 2021 10:27 PM
> 
> On 12/1/21 9:22 PM, Parav Pandit wrote:
> > This series implements querying and setting of the mac address and mtu
> > device config fields of the vdpa device of type net.
> >
> > An example of query and set as below.
> >
> > $ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu
> > 9000
> >
> > $ vdpa dev config show
> > bar: mac 00:11:22:33:44:55 link up link_announce false mtu 9000
> >
> > $ vdpa dev config show -jp
> > {
> > "config": {
> > "bar": {
> > "mac": "00:11:22:33:44:55",
> > "link ": "up",
> > "link_announce ": false,
> > "mtu": 1500,
> > }
> > }
> > }
> >
> > patch summary:
> > patch-1 updates the kernel headers
> > patch-2 implements the query command
> > patch-3 implements setting the mac address of vdpa dev config space
> > patch-4 implements setting the mtu of vdpa dev config space
> >
> >
> > Parav Pandit (4):
> >   vdpa: Update kernel headers
> >   vdpa: Enable user to query vdpa device config layout
> >   vdpa: Enable user to set mac address of vdpa device
> >   vdpa: Enable user to set mtu of the vdpa device
> >
> >  include/uapi/linux/virtio_net.h |  81 +
> >  vdpa/include/uapi/linux/vdpa.h  |   7 ++
> >  vdpa/vdpa.c | 198 ++--
> >  3 files changed, 277 insertions(+), 9 deletions(-)  create mode
> > 100644 include/uapi/linux/virtio_net.h
> >
> 
> please update man page(s)
Ok. I will update them and also update patches to address comments in other two 
patches in v2.
Thanks.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio-net: make copy len check in xdp_linearize_page

2021-12-12 Thread Xuan Zhuo
On Mon, 13 Dec 2021 12:50:12 +0800, mengensun8...@gmail.com wrote:
> From: mengensun 
>
> xdp_linearize_page asume ring elem size is smaller then page size
> when copy the first ring elem, but, there may be a elem size bigger
> then page size.
>
> add_recvbuf_mergeable may add a hole to ring elem, the hole size is
> not sure, according EWMA.
>
> so, fix it by check copy len,if checked failed, just dropped the
> whole frame, not make the memory dirty after the page.
>
> Signed-off-by: mengensun 
> Reviewed-by: MengLong Dong 
> Reviewed-by: ZhengXiong Jiang 

LGTM

Reviewed-by: Xuan Zhuo 

> ---
>  drivers/net/virtio_net.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 36a4b7c195d5..844bdbd67ff7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -662,8 +662,12 @@ static struct page *xdp_linearize_page(struct 
> receive_queue *rq,
>  int page_off,
>  unsigned int *len)
>  {
> - struct page *page = alloc_page(GFP_ATOMIC);
> + struct page *page;
>
> + if (*len > PAGE_SIZE - page_off)
> + return NULL;
> +
> + page = alloc_page(GFP_ATOMIC);
>   if (!page)
>   return NULL;
>
> --
> 2.27.0
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio-net: make copy len check in xdp_linearize_page

2021-12-12 Thread Jason Wang
On Mon, Dec 13, 2021 at 12:50 PM  wrote:
>
> From: mengensun 
>
> xdp_linearize_page asume ring elem size is smaller then page size
> when copy the first ring elem, but, there may be a elem size bigger
> then page size.
>
> add_recvbuf_mergeable may add a hole to ring elem, the hole size is
> not sure, according EWMA.

The logic is to try to avoid dropping packets in this case, so I
wonder if it's better to "fix" the add_recvbuf_mergeable().

Or another idea is to switch to use XDP generic here where we can use
skb_linearize() which should be more robust and we can drop the
xdp_linearize_page() logic completely.

Thanks

>
> so, fix it by check copy len,if checked failed, just dropped the
> whole frame, not make the memory dirty after the page.
>
> Signed-off-by: mengensun 
> Reviewed-by: MengLong Dong 
> Reviewed-by: ZhengXiong Jiang 
> ---
>  drivers/net/virtio_net.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 36a4b7c195d5..844bdbd67ff7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -662,8 +662,12 @@ static struct page *xdp_linearize_page(struct 
> receive_queue *rq,
>int page_off,
>unsigned int *len)
>  {
> -   struct page *page = alloc_page(GFP_ATOMIC);
> +   struct page *page;
>
> +   if (*len > PAGE_SIZE - page_off)
> +   return NULL;
> +
> +   page = alloc_page(GFP_ATOMIC);
> if (!page)
> return NULL;
>
> --
> 2.27.0
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization