Re: [PATCH v2] MAINTAINERS: Remove Ohad Ben-Cohen from hwspinlock subsystem

2023-12-18 Thread Ohad Ben Cohen
On Mon, Dec 18, 2023 at 3:29 PM Bagas Sanjaya  wrote:
> Commit 62c46d55688894 ("MAINTAINERS: Removing Ohad from remoteproc/rpmsg
> maintenance") removes his MAINTAINERS entry in regards to remoteproc
> subsystem due to his inactivity (the last commit with his Signed-off-by
> is 99c429cb4e628e ("remoteproc/wkup_m3: Use MODULE_DEVICE_TABLE to
> export alias") which is authored in 2015 and his last LKML message prior
> to 62c46d55688894 was [1]).
>
> Also remove his MAINTAINERS entry for the hwspinlock subsystem, as there is
> no point in Cc'ing a maintainer who has not responded in a long time.
>
> [1]: 
> https://lore.kernel.org/r/CAK=Wgbbcyi36ef1-PV8VS=m6nfoqnfgudwy6v7ocnkt0ddr...@mail.gmail.com/
>
> Signed-off-by: Bagas Sanjaya 
> ---

Acked-by: Ohad Ben Cohen 



Re: [PATCH] MAINTAINERS: Remove Ohad Ben-Cohen from hwspinlock subsystem

2023-12-16 Thread Ohad Ben Cohen
Hi Bagas,

On Sat, Dec 16, 2023 at 1:10 PM Bagas Sanjaya  wrote:
> --- a/CREDITS
> +++ b/CREDITS
> @@ -323,6 +323,7 @@ N: Ohad Ben Cohen
>  E: o...@wizery.com
>  D: Remote Processor (remoteproc) subsystem
>  D: Remote Processor Messaging (rpmsg) subsystem
> +D: Hardware spinlock (hwspinlock) subsystem

Please also add:

D: OMAP hwspinlock driver
D: OMAP remoteproc driver

Thanks,
Ohad.



[PATCH v2] net/sctp: fix race condition in sctp_destroy_sock

2021-04-13 Thread Or Cohen
If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock
held and sp->do_auto_asconf is true, then an element is removed
from the auto_asconf_splist without any proper locking.

This can happen in the following functions:
1. In sctp_accept, if sctp_sock_migrate fails.
2. In inet_create or inet6_create, if there is a bpf program
   attached to BPF_CGROUP_INET_SOCK_CREATE which denies
   creation of the sctp socket.

The bug is fixed by acquiring addr_wq_lock in sctp_destroy_sock
instead of sctp_close.

This addresses CVE-2021-23133.

Reported-by: Or Cohen 
Reviewed-by: Xin Long 
Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications")
Signed-off-by: Or Cohen 
---
Changes in v2:
- Removed a comment in sctp_init_sock.
- Added a CVE number.

 net/sctp/socket.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index a710917c5ac7..b9b3d899a611 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1520,11 +1520,9 @@ static void sctp_close(struct sock *sk, long timeout)
 
/* Supposedly, no process has access to the socket, but
 * the net layers still may.
-* Also, sctp_destroy_sock() needs to be called with addr_wq_lock
-* held and that should be grabbed before socket lock.
 */
-   spin_lock_bh(&net->sctp.addr_wq_lock);
-   bh_lock_sock_nested(sk);
+   local_bh_disable();
+   bh_lock_sock(sk);
 
/* Hold the sock, since sk_common_release() will put sock_put()
 * and we have just a little more cleanup.
@@ -1533,7 +1531,7 @@ static void sctp_close(struct sock *sk, long timeout)
sk_common_release(sk);
 
bh_unlock_sock(sk);
-   spin_unlock_bh(&net->sctp.addr_wq_lock);
+   local_bh_enable();
 
sock_put(sk);
 
@@ -4993,9 +4991,6 @@ static int sctp_init_sock(struct sock *sk)
sk_sockets_allocated_inc(sk);
sock_prot_inuse_add(net, sk->sk_prot, 1);
 
-   /* Nothing can fail after this block, otherwise
-* sctp_destroy_sock() will be called without addr_wq_lock held
-*/
if (net->sctp.default_auto_asconf) {
spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
list_add_tail(&sp->auto_asconf_list,
@@ -5030,7 +5025,9 @@ static void sctp_destroy_sock(struct sock *sk)
 
if (sp->do_auto_asconf) {
sp->do_auto_asconf = 0;
+   spin_lock_bh(&sock_net(sk)->sctp.addr_wq_lock);
list_del(&sp->auto_asconf_list);
+   spin_unlock_bh(&sock_net(sk)->sctp.addr_wq_lock);
}
sctp_endpoint_free(sp->ep);
local_bh_disable();
-- 
2.7.4
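
A minimal userspace sketch of the locking change, assuming a pthread mutex as a stand-in for addr_wq_lock and a hand-rolled list in place of the kernel's list_head. All struct and function names below are hypothetical and not taken from the patch. The point it illustrates is that the destroy path takes the lock itself around the list removal, so it no longer matters whether the caller (sctp_close, a failed sctp_accept, or a denied socket creation) already held it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-net auto_asconf list and its lock. */
struct node {
	struct node *prev, *next;
};

struct registry {
	pthread_mutex_t lock;	/* plays the role of addr_wq_lock */
	struct node head;	/* circular list head */
	size_t count;
};

struct sock_like {
	struct node link;
	int do_auto_asconf;
};

static void registry_init(struct registry *r)
{
	pthread_mutex_init(&r->lock, NULL);
	r->head.prev = r->head.next = &r->head;
	r->count = 0;
}

/* Init path: append to the tail of the list under the lock. */
static void sock_register(struct registry *r, struct sock_like *s)
{
	pthread_mutex_lock(&r->lock);
	s->link.prev = r->head.prev;
	s->link.next = &r->head;
	r->head.prev->next = &s->link;
	r->head.prev = &s->link;
	s->do_auto_asconf = 1;
	r->count++;
	pthread_mutex_unlock(&r->lock);
}

/* Destroy path: takes the lock itself, so every caller is covered. */
static void sock_destroy(struct registry *r, struct sock_like *s)
{
	if (s->do_auto_asconf) {
		s->do_auto_asconf = 0;
		pthread_mutex_lock(&r->lock);
		s->link.prev->next = s->link.next;
		s->link.next->prev = s->link.prev;
		r->count--;
		pthread_mutex_unlock(&r->lock);
	}
}

/* Hypothetical self-check: register two entries, destroy both. */
static void registry_demo(void)
{
	struct registry r;
	struct sock_like a = {0}, b = {0};

	registry_init(&r);
	sock_register(&r, &a);
	sock_register(&r, &b);
	sock_destroy(&r, &a);
	assert(r.count == 1 && r.head.next == &b.link);
	sock_destroy(&r, &b);
	assert(r.count == 0 && r.head.next == &r.head);
}
```

The sketch ignores bottom-half context, which is why the real patch uses spin_lock_bh() and keeps local_bh_disable() in sctp_close.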



[PATCH] net/sctp: fix race condition in sctp_destroy_sock

2021-04-13 Thread Or Cohen
If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock
held and sp->do_auto_asconf is true, then an element is removed
from the auto_asconf_splist without any proper locking.

This can happen in the following functions:
1. In sctp_accept, if sctp_sock_migrate fails.
2. In inet_create or inet6_create, if there is a bpf program
   attached to BPF_CGROUP_INET_SOCK_CREATE which denies
   creation of the sctp socket.

The bug is fixed by acquiring addr_wq_lock in sctp_destroy_sock
instead of sctp_close.

Reported-by: Or Cohen 
Reviewed-by: Xin Long 
Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications")
Signed-off-by: Or Cohen 
---
 net/sctp/socket.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index a710917c5ac7..9af232d4fb6b 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1520,11 +1520,9 @@ static void sctp_close(struct sock *sk, long timeout)
 
/* Supposedly, no process has access to the socket, but
 * the net layers still may.
-* Also, sctp_destroy_sock() needs to be called with addr_wq_lock
-* held and that should be grabbed before socket lock.
 */
-   spin_lock_bh(&net->sctp.addr_wq_lock);
-   bh_lock_sock_nested(sk);
+   local_bh_disable();
+   bh_lock_sock(sk);
 
/* Hold the sock, since sk_common_release() will put sock_put()
 * and we have just a little more cleanup.
@@ -1533,7 +1531,7 @@ static void sctp_close(struct sock *sk, long timeout)
sk_common_release(sk);
 
bh_unlock_sock(sk);
-   spin_unlock_bh(&net->sctp.addr_wq_lock);
+   local_bh_enable();
 
sock_put(sk);
 
@@ -5030,7 +5028,9 @@ static void sctp_destroy_sock(struct sock *sk)
 
if (sp->do_auto_asconf) {
sp->do_auto_asconf = 0;
+   spin_lock_bh(&sock_net(sk)->sctp.addr_wq_lock);
list_del(&sp->auto_asconf_list);
+   spin_unlock_bh(&sock_net(sk)->sctp.addr_wq_lock);
}
sctp_endpoint_free(sp->ep);
local_bh_disable();
-- 
2.7.4



Re: [PATCH] vdpa/mlx5: Fix wrong use of bit numbers

2021-03-02 Thread Eli Cohen
On Mon, Mar 01, 2021 at 10:33:14AM -0500, Michael S. Tsirkin wrote:
> On Mon, Mar 01, 2021 at 03:52:45PM +0800, Jason Wang wrote:
> > 
> > > On 2021/3/1 2:28 PM, Eli Cohen wrote:
> > > VIRTIO_F_VERSION_1 is a bit number. Use BIT_ULL() with mask
> > > conditionals.
> > > 
> > > Also, in mlx5_vdpa_is_little_endian() use BIT_ULL for consistency with
> > > the rest of the code.
> > > 
> > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > devices")
> > > Signed-off-by: Eli Cohen 
> > 
> > 
> > Acked-by: Jason Wang 
> 
> And CC stable I guess?

Is this a question or a request? :-)

> 
> > 
> > > ---
> > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index dc7031132fff..7d21b857a94a 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -821,7 +821,7 @@ static int create_virtqueue(struct mlx5_vdpa_net 
> > > *ndev, struct mlx5_vdpa_virtque
> > >   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, 
> > > mvq->fwqp.mqp.qpn);
> > >   MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
> > >   MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
> > > -  !!(ndev->mvdev.actual_features & VIRTIO_F_VERSION_1));
> > > +  !!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1)));
> > >   MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
> > >   MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
> > >   MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
> > > @@ -1578,7 +1578,7 @@ static void teardown_virtqueues(struct 
> > > mlx5_vdpa_net *ndev)
> > >   static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev 
> > > *mvdev)
> > >   {
> > >   return virtio_legacy_is_little_endian() ||
> > > - (mvdev->actual_features & (1ULL << VIRTIO_F_VERSION_1));
> > > + (mvdev->actual_features & BIT_ULL(VIRTIO_F_VERSION_1));
> > >   }
> > >   static __virtio16 cpu_to_mlx5vdpa16(struct mlx5_vdpa_dev *mvdev, u16 
> > > val)
> 


[PATCH] vdpa/mlx5: Fix wrong use of bit numbers

2021-02-28 Thread Eli Cohen
VIRTIO_F_VERSION_1 is a bit number. Use BIT_ULL() with mask
conditionals.

Also, in mlx5_vdpa_is_little_endian() use BIT_ULL for consistency with
the rest of the code.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index dc7031132fff..7d21b857a94a 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -821,7 +821,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn);
MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
-!!(ndev->mvdev.actual_features & VIRTIO_F_VERSION_1));
+!!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1)));
MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
@@ -1578,7 +1578,7 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
 static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
 {
return virtio_legacy_is_little_endian() ||
-   (mvdev->actual_features & (1ULL << VIRTIO_F_VERSION_1));
+   (mvdev->actual_features & BIT_ULL(VIRTIO_F_VERSION_1));
 }
 
 static __virtio16 cpu_to_mlx5vdpa16(struct mlx5_vdpa_dev *mvdev, u16 val)
-- 
2.30.1
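
As a hedged illustration of the bug this patch describes (a userspace sketch, not the driver code): VIRTIO_F_VERSION_1 is bit number 32, so ANDing the feature word with it directly tests bit 5 (32 == 0x20) rather than bit 32. The BIT_ULL() macro below is a local stand-in for the kernel macro of the same name, and the helper names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number (not a mask), as defined by the virtio specification. */
#define VIRTIO_F_VERSION_1 32

/* Local stand-in for the kernel's BIT_ULL() macro. */
#define BIT_ULL(nr) (1ULL << (nr))

/* Buggy check: ANDs with the bit number 32 (0x20), i.e. tests bit 5. */
static bool version_1_buggy(uint64_t features)
{
	return !!(features & VIRTIO_F_VERSION_1);
}

/* Fixed check: converts the bit number to a mask before testing. */
static bool version_1_fixed(uint64_t features)
{
	return !!(features & BIT_ULL(VIRTIO_F_VERSION_1));
}

/* Hypothetical self-check showing where the two forms disagree. */
static void version_1_demo(void)
{
	uint64_t modern = BIT_ULL(VIRTIO_F_VERSION_1);	/* only bit 32 set */
	uint64_t bit5 = BIT_ULL(5);			/* only bit 5 set */

	assert(!version_1_buggy(modern));	/* misses a modern device */
	assert(version_1_fixed(modern));
	assert(version_1_buggy(bit5));		/* false positive on bit 5 */
	assert(!version_1_fixed(bit5));
}
```

The two forms agree only when bit 5 and bit 32 happen to have the same value in the negotiated feature word.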



Not yet merged patches

2021-02-28 Thread Eli Cohen
Hi Michael,

I see that you did not include these in your latest pull request.

https://lkml.org/lkml/2021/2/10/1386
https://lkml.org/lkml/2021/2/10/1383
https://lkml.org/lkml/2021/2/18/124

Are you going to merge them?



Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-24 Thread Eli Cohen
On Wed, Feb 24, 2021 at 02:12:01AM -0500, Michael S. Tsirkin wrote:
> On Wed, Feb 24, 2021 at 02:55:13PM +0800, Jason Wang wrote:
> > 
> > > On 2021/2/24 2:47 PM, Michael S. Tsirkin wrote:
> > > On Wed, Feb 24, 2021 at 08:45:20AM +0200, Eli Cohen wrote:
> > > > On Wed, Feb 24, 2021 at 12:17:58AM -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Feb 24, 2021 at 11:20:01AM +0800, Jason Wang wrote:
> > > > > > > On 2021/2/24 3:35 AM, Si-Wei Liu wrote:
> > > > > > > 
> > > > > > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > > > > > > On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
> > > > > > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > > > > > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is 
> > > > > > > > > > > > > invoked
> > > > > > > > > > > > > for legacy") made an exception for legacy guests to 
> > > > > > > > > > > > > reset
> > > > > > > > > > > > > features to 0, when config space is accessed before 
> > > > > > > > > > > > > features
> > > > > > > > > > > > > are set. We should relieve the verify_min_features() 
> > > > > > > > > > > > > check
> > > > > > > > > > > > > and allow features reset to 0 for this case.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > It's worth noting that not just legacy guests could 
> > > > > > > > > > > > > access
> > > > > > > > > > > > > config space before features are set. For instance, 
> > > > > > > > > > > > > when
> > > > > > > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern 
> > > > > > > > > > > > > driver
> > > > > > > > > > > > > will try to access and validate the MTU present in 
> > > > > > > > > > > > > the config
> > > > > > > > > > > > > space before virtio features are set.
> > > > > > > > > > > > This looks like a spec violation:
> > > > > > > > > > > > 
> > > > > > > > > > > > "
> > > > > > > > > > > > 
> > > > > > > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > > > > > set.
> > > > > > > > > > > > This field specifies the maximum MTU for the driver to 
> > > > > > > > > > > > use.
> > > > > > > > > > > > "
> > > > > > > > > > > > 
> > > > > > > > > > > > Do we really want to workaround this?
> > > > > > > > > > > > 
> > > > > > > > > > > > Thanks
> > > > > > > > > > > And also:
> > > > > > > > > > > 
> > > > > > > > > > > The driver MUST follow this sequence to initialize a 
> > > > > > > > > > > device:
> > > > > > > > > > > 1. Reset the device.
> > > > > > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has
> > > > > > > > > > > noticed the device.
> > > > > > > > > > > 3. Set the DRIVER status bit: the guest OS knows how to 
> > > > > > > > > > > drive the
> > > > > > > > > > > device.
> > > > > > > > > > > 4. Read device feature bits, and write 

Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-23 Thread Eli Cohen
On Wed, Feb 24, 2021 at 02:55:13PM +0800, Jason Wang wrote:
> 
> On 2021/2/24 2:47 PM, Michael S. Tsirkin wrote:
> > On Wed, Feb 24, 2021 at 08:45:20AM +0200, Eli Cohen wrote:
> > > On Wed, Feb 24, 2021 at 12:17:58AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Feb 24, 2021 at 11:20:01AM +0800, Jason Wang wrote:
> > > > > On 2021/2/24 3:35 AM, Si-Wei Liu wrote:
> > > > > > 
> > > > > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > > > > On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
> > > > > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > > > > > > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > > > > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is 
> > > > > > > > > > > > invoked
> > > > > > > > > > > > for legacy") made an exception for legacy guests to 
> > > > > > > > > > > > reset
> > > > > > > > > > > > features to 0, when config space is accessed before 
> > > > > > > > > > > > features
> > > > > > > > > > > > are set. We should relieve the verify_min_features() 
> > > > > > > > > > > > check
> > > > > > > > > > > > and allow features reset to 0 for this case.
> > > > > > > > > > > > 
> > > > > > > > > > > > It's worth noting that not just legacy guests could 
> > > > > > > > > > > > access
> > > > > > > > > > > > config space before features are set. For instance, when
> > > > > > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern 
> > > > > > > > > > > > driver
> > > > > > > > > > > > will try to access and validate the MTU present in the 
> > > > > > > > > > > > config
> > > > > > > > > > > > space before virtio features are set.
> > > > > > > > > > > This looks like a spec violation:
> > > > > > > > > > > 
> > > > > > > > > > > "
> > > > > > > > > > > 
> > > > > > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > > > > set.
> > > > > > > > > > > This field specifies the maximum MTU for the driver to 
> > > > > > > > > > > use.
> > > > > > > > > > > "
> > > > > > > > > > > 
> > > > > > > > > > > Do we really want to workaround this?
> > > > > > > > > > > 
> > > > > > > > > > > Thanks
> > > > > > > > > > And also:
> > > > > > > > > > 
> > > > > > > > > > The driver MUST follow this sequence to initialize a device:
> > > > > > > > > > 1. Reset the device.
> > > > > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has
> > > > > > > > > > noticed the device.
> > > > > > > > > > 3. Set the DRIVER status bit: the guest OS knows how to 
> > > > > > > > > > drive the
> > > > > > > > > > device.
> > > > > > > > > > 4. Read device feature bits, and write the subset of 
> > > > > > > > > > feature bits
> > > > > > > > > > understood by the OS and driver to the
> > > > > > > > > > device. During this step the driver MAY read (but MUST NOT 
> > > > > > > > > > write)
> > > > > > > > > > the device-specific configuration
> > > > > > > > > > fields to check that it can support the device before 
> > >

Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-23 Thread Eli Cohen
On Wed, Feb 24, 2021 at 12:17:58AM -0500, Michael S. Tsirkin wrote:
> On Wed, Feb 24, 2021 at 11:20:01AM +0800, Jason Wang wrote:
> > 
> > On 2021/2/24 3:35 AM, Si-Wei Liu wrote:
> > > 
> > > 
> > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
> > > > > > 
> > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > > > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > > > > features to 0, when config space is accessed before features
> > > > > > > > > are set. We should relieve the verify_min_features() check
> > > > > > > > > and allow features reset to 0 for this case.
> > > > > > > > > 
> > > > > > > > > It's worth noting that not just legacy guests could access
> > > > > > > > > config space before features are set. For instance, when
> > > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > > > > will try to access and validate the MTU present in the config
> > > > > > > > > space before virtio features are set.
> > > > > > > > This looks like a spec violation:
> > > > > > > > 
> > > > > > > > "
> > > > > > > > 
> > > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > set.
> > > > > > > > This field specifies the maximum MTU for the driver to use.
> > > > > > > > "
> > > > > > > > 
> > > > > > > > Do we really want to workaround this?
> > > > > > > > 
> > > > > > > > Thanks
> > > > > > > And also:
> > > > > > > 
> > > > > > > The driver MUST follow this sequence to initialize a device:
> > > > > > > 1. Reset the device.
> > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has
> > > > > > > noticed the device.
> > > > > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > > > > > device.
> > > > > > > 4. Read device feature bits, and write the subset of feature bits
> > > > > > > understood by the OS and driver to the
> > > > > > > device. During this step the driver MAY read (but MUST NOT write)
> > > > > > > the device-specific configuration
> > > > > > > fields to check that it can support the device before accepting 
> > > > > > > it.
> > > > > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > > > > > feature bits after this step.
> > > > > > > 6. Re-read device status to ensure the FEATURES_OK bit is still 
> > > > > > > set:
> > > > > > > otherwise, the device does not
> > > > > > > support our subset of features and the device is unusable.
> > > > > > > 7. Perform device-specific setup, including discovery of 
> > > > > > > virtqueues
> > > > > > > for the device, optional per-bus setup,
> > > > > > > reading and possibly writing the device’s virtio configuration
> > > > > > > space, and population of virtqueues.
> > > > > > > 8. Set the DRIVER_OK status bit. At this point the device is 
> > > > > > > “live”.
> > > > > > > 
> > > > > > > 
> > > > > > > so accessing config space before FEATURES_OK is a spec
> > > > > > > violation, right?
> > > > > > It is, but it's not relevant to what this commit tries to address. I
> > > > > > thought the legacy guest still needs to be supported.
> > > > > > 
> > > > > > That said, a separate patch has to be posted to fix the guest
> > > > > > driver issue where this discrepancy was introduced in
> > > > > > virtnet_validate() (since commit fe36cbe067). But it's not
> > > > > > technically related to this patch.
> > > > > > 
> > > > > > -Siwei
> > > > > 
> > > > > I think it's a bug to read config space in validate, we should
> > > > > move it to
> > > > > virtnet_probe().
> > > > > 
> > > > > Thanks
> > > > I take it back, reading but not writing seems to be explicitly
> > > > allowed by spec.
> > > > So our way to detect a legacy guest is bogus, need to think what is
> > > > the best way to handle this.
> > > Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
> > > legacy guest? Supposedly only config space write access needs to be
> > > guarded before setting FEATURES_OK.
> > 
> > 
> > I agree. My understanding is that all vDPA devices must be modern devices
> > (since VIRTIO_F_ACCESS_PLATFORM is mandated) rather than transitional devices.
> > 
> > Thanks
> 
> Well mlx5 has some code to handle legacy guests ...
> Eli, could you comment? Is that support unused right now?
> 

If you mean support for version 1.0, well the knob is there but it's not
set in the firmware I use. Not sure if we will support this.

> 
> > 
> > > 
> > > -Siwie
> > > 
> > > > > > > 
> > > > > > > > > Rejecting reset to 0
> > > > > > > > > prematurely causes correct MTU and link status unable to load
> > > > > > > > > for the very first config space 
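
The rule debated in this thread (step 4 of the quoted initialization sequence allows config space reads during feature negotiation, while writes have to wait until after FEATURES_OK) can be sketched as a small guard function. This is an illustration under stated assumptions, not code from QEMU or the kernel: config_access_allowed() is a hypothetical helper, and treating the DRIVER bit as the point from which reads are tolerated is one reading of the sequence. The status bit values are the ones defined by the virtio specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device status bits, as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE	1
#define VIRTIO_CONFIG_S_DRIVER		2
#define VIRTIO_CONFIG_S_DRIVER_OK	4
#define VIRTIO_CONFIG_S_FEATURES_OK	8

/*
 * Hypothetical guard: reads are tolerated once the DRIVER bit is set
 * (step 4 permits them during feature negotiation), while writes
 * require FEATURES_OK (step 7).
 */
static bool config_access_allowed(uint8_t status, bool is_write)
{
	if (is_write)
		return !!(status & VIRTIO_CONFIG_S_FEATURES_OK);
	return !!(status & VIRTIO_CONFIG_S_DRIVER);
}

/* Hypothetical walk through steps 2 to 5 of the sequence. */
static void config_access_demo(void)
{
	uint8_t status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;

	assert(config_access_allowed(status, false));	/* read in step 4 */
	assert(!config_access_allowed(status, true));	/* write is too early */

	status |= VIRTIO_CONFIG_S_FEATURES_OK;
	assert(config_access_allowed(status, true));	/* write in step 7 */
}
```

Under this reading, a config space read before FEATURES_OK says nothing about whether the guest is legacy, which is the concern raised above about the detection heuristic.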

Re: [PATCH v2] vdpa/mlx5: Enable user to add/delete vdpa device

2021-02-23 Thread Eli Cohen
On Tue, Feb 23, 2021 at 07:56:16AM -0500, Michael S. Tsirkin wrote:
> On Tue, Feb 23, 2021 at 02:54:42PM +0200, Eli Cohen wrote:
> > On Tue, Feb 23, 2021 at 07:52:34AM -0500, Michael S. Tsirkin wrote:
> > > 
> > > I think I have them in the linux next branch, no?
> > > 
> > 
> > You do.
> 
> I guess there's a conflict with some other patch in that tree then.
> Can you rebase please?
> 

Parav, will send later today.

> -- 
> MST
> 


Re: [PATCH v2] vdpa/mlx5: Enable user to add/delete vdpa device

2021-02-23 Thread Eli Cohen
On Tue, Feb 23, 2021 at 07:52:34AM -0500, Michael S. Tsirkin wrote:
> 
> I think I have them in the linux next branch, no?
> 

You do.


Re: [PATCH] vdpa/mlx5: Extract correct pointer from driver data

2021-02-23 Thread Eli Cohen
On Tue, Feb 23, 2021 at 07:32:49AM -0500, Michael S. Tsirkin wrote:
> On Tue, Feb 16, 2021 at 07:50:21AM +0200, Eli Cohen wrote:
> > struct mlx5_vdpa_net pointer was stored in drvdata. Extract it as well
> > in mlx5v_remove().
> > 
> > Fixes: 74c9729dd892 ("vdpa/mlx5: Connect mlx5_vdpa to auxiliary bus")
> > Signed-off-by: Eli Cohen 
> 
> Sorry which tree this is for? Couldn't apply.
> 

Drop it. The patch that adds support for management bus implicitly
addresses the issue.

> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 6b0a42183622..4103d3b64a2a 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -2036,9 +2036,9 @@ static int mlx5v_probe(struct auxiliary_device *adev,
> >  
> >  static void mlx5v_remove(struct auxiliary_device *adev)
> >  {
> > -   struct mlx5_vdpa_dev *mvdev = dev_get_drvdata(&adev->dev);
> > +   struct mlx5_vdpa_net *ndev = dev_get_drvdata(&adev->dev);
> >  
> > -   vdpa_unregister_device(&mvdev->vdev);
> > +   vdpa_unregister_device(&ndev->mvdev.vdev);
> >  }
> >  
> >  static const struct auxiliary_device_id mlx5v_id_table[] = {
> > -- 
> > 2.29.2
> 


Re: [PATCH v2] vdpa/mlx5: Enable user to add/delete vdpa device

2021-02-23 Thread Eli Cohen
On Tue, Feb 23, 2021 at 07:29:32AM -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 18, 2021 at 09:41:57AM +0200, Eli Cohen wrote:
> > Allow controlling vdpa device creation and destruction using the vdpa
> > management tool.
> > 
> > Examples:
> > 1. List the management devices
> > $ vdpa mgmtdev show
> > pci/0000:3b:00.1:
> >   supported_classes net
> > 
> > 2. Create vdpa instance
> > $ vdpa dev add mgmtdev pci/0000:3b:00.1 name vdpa0
> > 
> > 3. Show vdpa devices
> > $ vdpa dev show
> > vdpa0: type network mgmtdev pci/0000:3b:00.1 vendor_id  max_vqs 16 \
> > max_vq_size 256
> > 
> > Signed-off-by: Eli Cohen 
> > Reviewed-by: Parav Pandit 
> 
> Not sure which tree this is for, I could not apply this.
> 

Depends on Parav's vdpa tool patches. We'll send the entire series again
- Parav's and my patches.

> > ---
> > v0->v1:
> > set mgtdev->ndev NULL on dev delete
> > v1->v2: Resend
> > 
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 79 +++
> >  1 file changed, 70 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index a51b0f86afe2..08fb481ddc4f 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -1974,23 +1974,32 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev)
> > }
> >  }
> >  
> > -static int mlx5v_probe(struct auxiliary_device *adev,
> > -  const struct auxiliary_device_id *id)
> > +struct mlx5_vdpa_mgmtdev {
> > +   struct vdpa_mgmt_dev mgtdev;
> > +   struct mlx5_adev *madev;
> > +   struct mlx5_vdpa_net *ndev;
> > +};
> > +
> > +static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char 
> > *name)
> >  {
> > -   struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
> > -   struct mlx5_core_dev *mdev = madev->mdev;
> > +   struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct 
> > mlx5_vdpa_mgmtdev, mgtdev);
> > struct virtio_net_config *config;
> > struct mlx5_vdpa_dev *mvdev;
> > struct mlx5_vdpa_net *ndev;
> > +   struct mlx5_core_dev *mdev;
> > u32 max_vqs;
> > int err;
> >  
> > +   if (mgtdev->ndev)
> > +   return -ENOSPC;
> > +
> > +   mdev = mgtdev->madev->mdev;
> > /* we save one virtqueue for control virtqueue should we require it */
> > max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues);
> > max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
> >  
> > ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
> > mdev->device, &mlx5_vdpa_ops,
> > -2 * mlx5_vdpa_max_qps(max_vqs), NULL);
> > +2 * mlx5_vdpa_max_qps(max_vqs), name);
> > if (IS_ERR(ndev))
> > return PTR_ERR(ndev);
> >  
> > @@ -2018,11 +2027,12 @@ static int mlx5v_probe(struct auxiliary_device 
> > *adev,
> > if (err)
> > goto err_res;
> >  
> > -   err = vdpa_register_device(&mvdev->vdev);
> > +   mvdev->vdev.mdev = &mgtdev->mgtdev;
> > +   err = _vdpa_register_device(&mvdev->vdev);
> > if (err)
> > goto err_reg;
> >  
> > -   dev_set_drvdata(&adev->dev, ndev);
> > +   mgtdev->ndev = ndev;
> > return 0;
> >  
> >  err_reg:
> > @@ -2035,11 +2045,62 @@ static int mlx5v_probe(struct auxiliary_device 
> > *adev,
> > return err;
> >  }
> >  
> > +static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct 
> > vdpa_device *dev)
> > +{
> > +   struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct 
> > mlx5_vdpa_mgmtdev, mgtdev);
> > +
> > +   _vdpa_unregister_device(dev);
> > +   mgtdev->ndev = NULL;
> > +}
> > +
> > +static const struct vdpa_mgmtdev_ops mdev_ops = {
> > +   .dev_add = mlx5_vdpa_dev_add,
> > +   .dev_del = mlx5_vdpa_dev_del,
> > +};
> > +
> > +static struct virtio_device_id id_table[] = {
> > +   { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
> > +   { 0 },
> > +};
> > +
> > +static int mlx5v_probe(struct auxiliary_device *adev,
> > +  const struct auxiliary_device_id *id)
> > +
> > +{
> > +   struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
> > +   struct mlx5_core_dev *mdev = madev->mdev;
> > +   struct mlx5_vdpa_mgmtdev *mgtdev;

Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-21 Thread Eli Cohen
On Sun, Feb 21, 2021 at 04:52:05PM -0500, Michael S. Tsirkin wrote:
> On Sun, Feb 21, 2021 at 04:44:37PM +0200, Eli Cohen wrote:
> > On Fri, Feb 19, 2021 at 06:54:58AM -0500, Si-Wei Liu wrote:
> > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > for legacy") made an exception for legacy guests to reset
> > > features to 0, when config space is accessed before features
> > > are set. We should relieve the verify_min_features() check
> > > and allow features reset to 0 for this case.
> > > 
> > > It's worth noting that not just legacy guests could access
> > > config space before features are set. For instance, when
> > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > will try to access and validate the MTU present in the config
> > > > space before virtio features are set. Rejecting reset to 0
> > > > prematurely prevents the correct MTU and link status from being
> > > > loaded on the very first config space access, causing issues such as
> > > > the guest showing an inaccurate MTU value or failing to reject an
> > > > out-of-range MTU.
> > > 
> > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > devices")
> > > Signed-off-by: Si-Wei Liu 
> > > ---
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > >  1 file changed, 1 insertion(+), 14 deletions(-)
> > > 
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 7c1f789..540dd67 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct 
> > > vdpa_device *vdev)
> > >   return mvdev->mlx_features;
> > >  }
> > >  
> > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> > > -{
> > > - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > - return -EOPNOTSUPP;
> > > -
> > > - return 0;
> > > -}
> > > -
> > 
> > > But what if VIRTIO_F_ACCESS_PLATFORM is not offered? This does not
> > support such cases.
> 
> Did you mean "catch such cases" rather than "support"?
> 

Actually I meant this driver/device does not support such cases.

> 
> > Maybe we should call verify_min_features() from mlx5_vdpa_set_status()
> > just before attempting to call setup_driver().
> > 
> > >  static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > >  {
> > >   int err;
> > > @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct 
> > > vdpa_device *vdev, u64 features)
> > >  {
> > >   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > >   struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > - int err;
> > >  
> > >   print_features(mvdev, features, true);
> > >  
> > > - err = verify_min_features(mvdev, features);
> > > - if (err)
> > > - return err;
> > > -
> > >   ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
> > >   ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > >   ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
> > > - return err;
> > > + return 0;
> > >  }
> > >  
> > >  static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct 
> > > vdpa_callback *cb)
> > > -- 
> > > 1.8.3.1
> > > 
> 


Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-21 Thread Eli Cohen
On Fri, Feb 19, 2021 at 06:54:58AM -0500, Si-Wei Liu wrote:
> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> for legacy") made an exception for legacy guests to reset
> features to 0, when config space is accessed before features
> are set. We should relieve the verify_min_features() check
> and allow features reset to 0 for this case.
> 
> It's worth noting that not just legacy guests could access
> config space before features are set. For instance, when
> feature VIRTIO_NET_F_MTU is advertised some modern driver
> will try to access and validate the MTU present in the config
> space before virtio features are set. Rejecting reset to 0
> prematurely prevents the correct MTU and link status from being
> loaded on the very first config space access, causing issues such as
> the guest showing an inaccurate MTU value or failing to reject an
> out-of-range MTU.
> 
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> Signed-off-by: Si-Wei Liu 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--
>  1 file changed, 1 insertion(+), 14 deletions(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 7c1f789..540dd67 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device 
> *vdev)
>   return mvdev->mlx_features;
>  }
>  
> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> -{
> - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> - return -EOPNOTSUPP;
> -
> - return 0;
> -}
> -

But what if VIRTIO_F_ACCESS_PLATFORM is not offered? This does not
support such cases.

Maybe we should call verify_min_features() from mlx5_vdpa_set_status()
just before attempting to call setup_driver().
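
As a rough sketch of that suggestion (simplified stand-in types, not the
actual driver code; struct sketch_dev and the sketch_* helpers are
hypothetical, and marking the device FAILED is only one possible way to
reject the features): set_features accepts any value, including 0, and the
minimum-feature check is deferred to the DRIVER_OK transition.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))
#define VIRTIO_F_ACCESS_PLATFORM   33   /* feature bit number from the virtio spec */
#define VIRTIO_CONFIG_S_DRIVER_OK  0x04
#define VIRTIO_CONFIG_S_FAILED     0x80

/* Hypothetical, stripped-down stand-in for the driver's device state. */
struct sketch_dev {
	uint64_t actual_features;
	uint8_t status;
	int driver_ready;	/* set once the (imaginary) setup_driver() ran */
};

static int verify_min_features(uint64_t features)
{
	if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
		return -EOPNOTSUPP;
	return 0;
}

/* Accept any feature set here, including a reset to 0. */
static int sketch_set_features(struct sketch_dev *d, uint64_t features)
{
	d->actual_features = features;
	return 0;
}

/* Check the minimum features only when the driver is about to start. */
static void sketch_set_status(struct sketch_dev *d, uint8_t status)
{
	if (!status) {
		/* device reset */
		d->status = 0;
		d->driver_ready = 0;
		return;
	}

	if ((status ^ d->status) & VIRTIO_CONFIG_S_DRIVER_OK) {
		if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
			if (verify_min_features(d->actual_features)) {
				/* refuse to start; DRIVER_OK stays unset */
				d->status |= VIRTIO_CONFIG_S_FAILED;
				return;
			}
			d->driver_ready = 1;	/* setup_driver() would go here */
		}
	}
	d->status = status;
}
```

With this ordering a guest that reads config space before negotiating
features is not rejected early, while a guest that never accepts
VIRTIO_F_ACCESS_PLATFORM still cannot reach DRIVER_OK.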

>  static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>  {
>   int err;
> @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device 
> *vdev, u64 features)
>  {
>   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>   struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> - int err;
>  
>   print_features(mvdev, features, true);
>  
> - err = verify_min_features(mvdev, features);
> - if (err)
> - return err;
> -
>   ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
>   ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>   ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
> - return err;
> + return 0;
>  }
>  
>  static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct 
> vdpa_callback *cb)
> -- 
> 1.8.3.1
> 


[PATCH v1] vdpa/mlx5: Fix suspend/resume index restoration

2021-02-18 Thread Eli Cohen
When we suspend the VM, the VDPA interface will be reset. When the VM is
resumed again, clear_virtqueues() will clear the available and used
indices, resulting in hardware virtqueue objects becoming out of sync.
We can avoid this function altogether since qemu will clear them if
required, e.g. when the VM went through a reboot.

Moreover, since the hw available and used indices should always be
identical on query and should be restored to the same value
for virtqueues that complete in order, we set the single value provided
by set_vq_state(). In get_vq_state() we return the value of the hardware
used index.

Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change 
map")
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
v0->v1:
Fix subject prefix

 drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b8e9d525d66c..a51b0f86afe2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *m
return;
}
mvq->avail_idx = attr.available_index;
+   mvq->used_idx = attr.used_index;
 }
 
 static void suspend_vqs(struct mlx5_vdpa_net *ndev)
@@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device 
*vdev, u16 idx,
return -EINVAL;
}
 
+   mvq->used_idx = state->avail_index;
mvq->avail_idx = state->avail_index;
return 0;
 }
@@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device 
*vdev, u16 idx, struct vdpa
 * that cares about emulating the index after vq is stopped.
 */
if (!mvq->initialized) {
-   state->avail_index = mvq->avail_idx;
+   state->avail_index = mvq->used_idx;
return 0;
}
 
@@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device 
*vdev, u16 idx, struct vdpa
mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
return err;
}
-   state->avail_index = attr.available_index;
+   state->avail_index = attr.used_index;
return 0;
 }
 
@@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct mlx5_vdpa_net 
*ndev)
}
 }
 
-static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
-{
-   int i;
-
-   for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
-   ndev->vqs[i].avail_idx = 0;
-   ndev->vqs[i].used_idx = 0;
-   }
-}
-
 /* TODO: cross-endian support */
 static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
 {
@@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
if (!status) {
mlx5_vdpa_info(mvdev, "performing device reset\n");
teardown_driver(ndev);
-   clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(&ndev->mvdev);
ndev->mvdev.status = 0;
++mvdev->generation;
-- 
2.29.2



[PATCH v2] vdpa/mlx5: Enable user to add/delete vdpa device

2021-02-18 Thread Eli Cohen
Allow to control vdpa device creation and destruction using the vdpa
management tool.

Examples:
1. List the management devices
$ vdpa mgmtdev show
pci/0000:3b:00.1:
  supported_classes net

2. Create vdpa instance
$ vdpa dev add mgmtdev pci/0000:3b:00.1 name vdpa0

3. Show vdpa devices
$ vdpa dev show
vdpa0: type network mgmtdev pci/0000:3b:00.1 vendor_id  max_vqs 16 \
max_vq_size 256

Signed-off-by: Eli Cohen 
Reviewed-by: Parav Pandit 
---
v0->v1:
set mgtdev->ndev NULL on dev delete
v1->v2: Resend

 drivers/vdpa/mlx5/net/mlx5_vnet.c | 79 +++
 1 file changed, 70 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index a51b0f86afe2..08fb481ddc4f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1974,23 +1974,32 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev)
}
 }
 
-static int mlx5v_probe(struct auxiliary_device *adev,
-  const struct auxiliary_device_id *id)
+struct mlx5_vdpa_mgmtdev {
+   struct vdpa_mgmt_dev mgtdev;
+   struct mlx5_adev *madev;
+   struct mlx5_vdpa_net *ndev;
+};
+
+static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name)
 {
-   struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
-   struct mlx5_core_dev *mdev = madev->mdev;
+   struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct 
mlx5_vdpa_mgmtdev, mgtdev);
struct virtio_net_config *config;
struct mlx5_vdpa_dev *mvdev;
struct mlx5_vdpa_net *ndev;
+   struct mlx5_core_dev *mdev;
u32 max_vqs;
int err;
 
+   if (mgtdev->ndev)
+   return -ENOSPC;
+
+   mdev = mgtdev->madev->mdev;
/* we save one virtqueue for control virtqueue should we require it */
max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues);
max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
 
ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
mdev->device, &mlx5_vdpa_ops,
-2 * mlx5_vdpa_max_qps(max_vqs), NULL);
+2 * mlx5_vdpa_max_qps(max_vqs), name);
if (IS_ERR(ndev))
return PTR_ERR(ndev);
 
@@ -2018,11 +2027,12 @@ static int mlx5v_probe(struct auxiliary_device *adev,
if (err)
goto err_res;
 
-   err = vdpa_register_device(&mvdev->vdev);
+   mvdev->vdev.mdev = &mgtdev->mgtdev;
+   err = _vdpa_register_device(&mvdev->vdev);
if (err)
goto err_reg;
 
-   dev_set_drvdata(&adev->dev, ndev);
+   mgtdev->ndev = ndev;
return 0;
 
 err_reg:
@@ -2035,11 +2045,62 @@ static int mlx5v_probe(struct auxiliary_device *adev,
return err;
 }
 
+static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device 
*dev)
+{
+   struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct 
mlx5_vdpa_mgmtdev, mgtdev);
+
+   _vdpa_unregister_device(dev);
+   mgtdev->ndev = NULL;
+}
+
+static const struct vdpa_mgmtdev_ops mdev_ops = {
+   .dev_add = mlx5_vdpa_dev_add,
+   .dev_del = mlx5_vdpa_dev_del,
+};
+
+static struct virtio_device_id id_table[] = {
+   { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
+   { 0 },
+};
+
+static int mlx5v_probe(struct auxiliary_device *adev,
+  const struct auxiliary_device_id *id)
+
+{
+   struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
+   struct mlx5_core_dev *mdev = madev->mdev;
+   struct mlx5_vdpa_mgmtdev *mgtdev;
+   int err;
+
+   mgtdev = kzalloc(sizeof(*mgtdev), GFP_KERNEL);
+   if (!mgtdev)
+   return -ENOMEM;
+
+   mgtdev->mgtdev.ops = &mdev_ops;
+   mgtdev->mgtdev.device = mdev->device;
+   mgtdev->mgtdev.id_table = id_table;
+   mgtdev->madev = madev;
+
+   err = vdpa_mgmtdev_register(&mgtdev->mgtdev);
+   if (err)
+   goto reg_err;
+
+   dev_set_drvdata(&adev->dev, mgtdev);
+
+   return 0;
+
+reg_err:
+   kfree(mgtdev);
+   return err;
+}
+
 static void mlx5v_remove(struct auxiliary_device *adev)
 {
-   struct mlx5_vdpa_dev *mvdev = dev_get_drvdata(&adev->dev);
+   struct mlx5_vdpa_mgmtdev *mgtdev;
 
-   vdpa_unregister_device(&mvdev->vdev);
+   mgtdev = dev_get_drvdata(&adev->dev);
+   vdpa_mgmtdev_unregister(&mgtdev->mgtdev);
+   kfree(mgtdev);
 }
 
 static const struct auxiliary_device_id mlx5v_id_table[] = {
-- 
2.29.2



Re: [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration

2021-02-18 Thread Eli Cohen
On Wed, Feb 17, 2021 at 04:20:14PM -0500, Michael S. Tsirkin wrote:
> On Wed, Feb 17, 2021 at 11:42:48AM -0800, Si-Wei Liu wrote:
> > 
> > 
> > On 2/16/2021 8:20 AM, Eli Cohen wrote:
> > > > When we suspend the VM, the VDPA interface will be reset. When the VM is
> > > > resumed again, clear_virtqueues() will clear the available and used
> > > > indices, resulting in hardware virtqueue objects becoming out of sync.
> > > > We can avoid this function altogether since qemu will clear them if
> > > > required, e.g. when the VM went through a reboot.
> > > > 
> > > > Moreover, since the hw available and used indices should always be
> > > > identical on query and should be restored to the same value
> > > > for virtqueues that complete in order, we set the single value provided
> > > > by set_vq_state(). In get_vq_state() we return the value of the hardware
> > > > used index.
> > > 
> > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > devices")
> > > Signed-off-by: Eli Cohen 
> > Acked-by: Si-Wei Liu 
> 
> 
> Seems to also fix b35ccebe3ef76168aa2edaa35809c0232cb3578e, right?
> 

Right.

> 
> > > ---
> > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 -
> > >   1 file changed, 4 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index b8e9d525d66c..a51b0f86afe2 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, 
> > > struct mlx5_vdpa_virtqueue *m
> > >   return;
> > >   }
> > >   mvq->avail_idx = attr.available_index;
> > > + mvq->used_idx = attr.used_index;
> > >   }
> > >   static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > @@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct 
> > > vdpa_device *vdev, u16 idx,
> > >   return -EINVAL;
> > >   }
> > > + mvq->used_idx = state->avail_index;
> > >   mvq->avail_idx = state->avail_index;
> > >   return 0;
> > >   }
> > > @@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct 
> > > vdpa_device *vdev, u16 idx, struct vdpa
> > >* that cares about emulating the index after vq is stopped.
> > >*/
> > >   if (!mvq->initialized) {
> > > - state->avail_index = mvq->avail_idx;
> > > + state->avail_index = mvq->used_idx;
> > >   return 0;
> > >   }
> > > @@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct 
> > > vdpa_device *vdev, u16 idx, struct vdpa
> > >   mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
> > >   return err;
> > >   }
> > > - state->avail_index = attr.available_index;
> > > + state->avail_index = attr.used_index;
> > >   return 0;
> > >   }
> > > @@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct 
> > > mlx5_vdpa_net *ndev)
> > >   }
> > >   }
> > > -static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
> > > -{
> > > - int i;
> > > -
> > > - for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
> > > - ndev->vqs[i].avail_idx = 0;
> > > - ndev->vqs[i].used_idx = 0;
> > > - }
> > > -}
> > > -
> > >   /* TODO: cross-endian support */
> > >   static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev 
> > > *mvdev)
> > >   {
> > > @@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
> > > *vdev, u8 status)
> > >   if (!status) {
> > >   mlx5_vdpa_info(mvdev, "performing device reset\n");
> > >   teardown_driver(ndev);
> > > - clear_virtqueues(ndev);
> > > >   mlx5_vdpa_destroy_mr(&ndev->mvdev);
> > >   ndev->mvdev.status = 0;
> > >   ++mvdev->generation;
> 


Re: [PATCH 2/2 v1] vdpa/mlx5: Enable user to add/delete vdpa device

2021-02-18 Thread Eli Cohen
On Wed, Feb 17, 2021 at 12:13:37PM -0500, Michael S. Tsirkin wrote:
> On Wed, Feb 17, 2021 at 01:31:36PM +0200, Eli Cohen wrote:
> > Allow to control vdpa device creation and destruction using the vdpa
> > management tool.
> > 
> > Examples:
> > 1. List the management devices
> > $ vdpa mgmtdev show
> > > pci/0000:3b:00.1:
> > >   supported_classes net
> > > 
> > > 2. Create vdpa instance
> > > $ vdpa dev add mgmtdev pci/0000:3b:00.1 name vdpa0
> > 
> > 3. Show vdpa devices
> > $ vdpa dev show
> > vdpa0: type network mgmtdev pci/0000:3b:00.1 vendor_id  max_vqs 16 \
> > max_vq_size 256
> > 
> > Signed-off-by: Eli Cohen 
> > Reviewed-by: Parav Pandit 
> 
> where's the rest of the patchset though? I only got 2/2 ... confused.

Sorry, I had two unrelated patches that git format-patch named
0001... and 0002..., and git send-email added 1/2 and 2/2 even though I sent
them separately.

I will send them again.

> 
> > ---
> > v0->v1:
> > set mgtdev->ndev NULL on dev delete 
> > 
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 79 +++
> >  1 file changed, 70 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index a51b0f86afe2..08fb481ddc4f 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -1974,23 +1974,32 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev)
> > }
> >  }
> >  
> > -static int mlx5v_probe(struct auxiliary_device *adev,
> > -  const struct auxiliary_device_id *id)
> > +struct mlx5_vdpa_mgmtdev {
> > +   struct vdpa_mgmt_dev mgtdev;
> > +   struct mlx5_adev *madev;
> > +   struct mlx5_vdpa_net *ndev;
> > +};
> > +
> > +static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char 
> > *name)
> >  {
> > -   struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
> > -   struct mlx5_core_dev *mdev = madev->mdev;
> > +   struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct 
> > mlx5_vdpa_mgmtdev, mgtdev);
> > struct virtio_net_config *config;
> > struct mlx5_vdpa_dev *mvdev;
> > struct mlx5_vdpa_net *ndev;
> > +   struct mlx5_core_dev *mdev;
> > u32 max_vqs;
> > int err;
> >  
> > +   if (mgtdev->ndev)
> > +   return -ENOSPC;
> > +
> > +   mdev = mgtdev->madev->mdev;
> > /* we save one virtqueue for control virtqueue should we require it */
> > max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues);
> > max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
> >  
> > ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
> > > mdev->device, &mlx5_vdpa_ops,
> > -2 * mlx5_vdpa_max_qps(max_vqs), NULL);
> > +2 * mlx5_vdpa_max_qps(max_vqs), name);
> > if (IS_ERR(ndev))
> > return PTR_ERR(ndev);
> >  
> > @@ -2018,11 +2027,12 @@ static int mlx5v_probe(struct auxiliary_device 
> > *adev,
> > if (err)
> > goto err_res;
> >  
> > > -   err = vdpa_register_device(&mvdev->vdev);
> > > +   mvdev->vdev.mdev = &mgtdev->mgtdev;
> > > +   err = _vdpa_register_device(&mvdev->vdev);
> > > if (err)
> > > goto err_reg;
> > >  
> > > -   dev_set_drvdata(&adev->dev, ndev);
> > +   mgtdev->ndev = ndev;
> > return 0;
> >  
> >  err_reg:
> > @@ -2035,11 +2045,62 @@ static int mlx5v_probe(struct auxiliary_device 
> > *adev,
> > return err;
> >  }
> >  
> > +static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct 
> > vdpa_device *dev)
> > +{
> > +   struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct 
> > mlx5_vdpa_mgmtdev, mgtdev);
> > +
> > +   _vdpa_unregister_device(dev);
> > +   mgtdev->ndev = NULL;
> > +}
> > +
> > +static const struct vdpa_mgmtdev_ops mdev_ops = {
> > +   .dev_add = mlx5_vdpa_dev_add,
> > +   .dev_del = mlx5_vdpa_dev_del,
> > +};
> > +
> > +static struct virtio_device_id id_table[] = {
> > +   { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
> > +   { 0 },
> > +};
> > +
> > +static int mlx5v_probe(struct auxiliary_device *adev,
> > +  const struct auxiliary_device_id *id)
> > +
> > +{
> > +   struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
> > +   struct mlx5_core_dev

[PATCH 2/2 v1] vdpa/mlx5: Enable user to add/delete vdpa device

2021-02-17 Thread Eli Cohen
Allow to control vdpa device creation and destruction using the vdpa
management tool.

Examples:
1. List the management devices
$ vdpa mgmtdev show
pci/0000:3b:00.1:
  supported_classes net

2. Create vdpa instance
$ vdpa dev add mgmtdev pci/0000:3b:00.1 name vdpa0

3. Show vdpa devices
$ vdpa dev show
vdpa0: type network mgmtdev pci/0000:3b:00.1 vendor_id  max_vqs 16 \
max_vq_size 256

Signed-off-by: Eli Cohen 
Reviewed-by: Parav Pandit 
---
v0->v1:
set mgtdev->ndev NULL on dev delete 

 drivers/vdpa/mlx5/net/mlx5_vnet.c | 79 +++
 1 file changed, 70 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index a51b0f86afe2..08fb481ddc4f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1974,23 +1974,32 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev)
}
 }
 
-static int mlx5v_probe(struct auxiliary_device *adev,
-  const struct auxiliary_device_id *id)
+struct mlx5_vdpa_mgmtdev {
+   struct vdpa_mgmt_dev mgtdev;
+   struct mlx5_adev *madev;
+   struct mlx5_vdpa_net *ndev;
+};
+
+static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name)
 {
-   struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
-   struct mlx5_core_dev *mdev = madev->mdev;
+   struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct 
mlx5_vdpa_mgmtdev, mgtdev);
struct virtio_net_config *config;
struct mlx5_vdpa_dev *mvdev;
struct mlx5_vdpa_net *ndev;
+   struct mlx5_core_dev *mdev;
u32 max_vqs;
int err;
 
+   if (mgtdev->ndev)
+   return -ENOSPC;
+
+   mdev = mgtdev->madev->mdev;
/* we save one virtqueue for control virtqueue should we require it */
max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues);
max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
 
ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
mdev->device, &mlx5_vdpa_ops,
-2 * mlx5_vdpa_max_qps(max_vqs), NULL);
+2 * mlx5_vdpa_max_qps(max_vqs), name);
if (IS_ERR(ndev))
return PTR_ERR(ndev);
 
@@ -2018,11 +2027,12 @@ static int mlx5v_probe(struct auxiliary_device *adev,
if (err)
goto err_res;
 
-   err = vdpa_register_device(&mvdev->vdev);
+   mvdev->vdev.mdev = &mgtdev->mgtdev;
+   err = _vdpa_register_device(&mvdev->vdev);
if (err)
goto err_reg;
 
-   dev_set_drvdata(&adev->dev, ndev);
+   mgtdev->ndev = ndev;
return 0;
 
 err_reg:
@@ -2035,11 +2045,62 @@ static int mlx5v_probe(struct auxiliary_device *adev,
return err;
 }
 
+static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device 
*dev)
+{
+   struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct 
mlx5_vdpa_mgmtdev, mgtdev);
+
+   _vdpa_unregister_device(dev);
+   mgtdev->ndev = NULL;
+}
+
+static const struct vdpa_mgmtdev_ops mdev_ops = {
+   .dev_add = mlx5_vdpa_dev_add,
+   .dev_del = mlx5_vdpa_dev_del,
+};
+
+static struct virtio_device_id id_table[] = {
+   { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
+   { 0 },
+};
+
+static int mlx5v_probe(struct auxiliary_device *adev,
+  const struct auxiliary_device_id *id)
+
+{
+   struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
+   struct mlx5_core_dev *mdev = madev->mdev;
+   struct mlx5_vdpa_mgmtdev *mgtdev;
+   int err;
+
+   mgtdev = kzalloc(sizeof(*mgtdev), GFP_KERNEL);
+   if (!mgtdev)
+   return -ENOMEM;
+
+   mgtdev->mgtdev.ops = &mdev_ops;
+   mgtdev->mgtdev.device = mdev->device;
+   mgtdev->mgtdev.id_table = id_table;
+   mgtdev->madev = madev;
+
+   err = vdpa_mgmtdev_register(&mgtdev->mgtdev);
+   if (err)
+   goto reg_err;
+
+   dev_set_drvdata(&adev->dev, mgtdev);
+
+   return 0;
+
+reg_err:
+   kfree(mgtdev);
+   return err;
+}
+
 static void mlx5v_remove(struct auxiliary_device *adev)
 {
-   struct mlx5_vdpa_dev *mvdev = dev_get_drvdata(&adev->dev);
+   struct mlx5_vdpa_mgmtdev *mgtdev;
 
-   vdpa_unregister_device(&mvdev->vdev);
+   mgtdev = dev_get_drvdata(&adev->dev);
+   vdpa_mgmtdev_unregister(&mgtdev->mgtdev);
+   kfree(mgtdev);
 }
 
 static const struct auxiliary_device_id mlx5v_id_table[] = {
-- 
2.29.2



Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-16 Thread Eli Cohen
On Tue, Feb 16, 2021 at 04:25:20PM -0800, Si-Wei Liu wrote:
> 
> > > The saved mvq->avail_idx will be used to recreate hardware virtq object 
> > > and
> > > the used index in create_virtqueue(), once status DRIVER_OK is set. I
> > > suspect we should pass the index to mvq->used_idx in
> > > mlx5_vdpa_set_vq_state() below instead.
> > > 
> > Right, that's what I am checking but still no final conclusions. I need
> > to harness hardware guy to provide me with clear answers.
> OK. Could you update what you find from the hardware guy and let us know
> e.g. if the current firmware interface would suffice?
> 

The answer I got is that upon query_virtqueue, the hardware available and
used indices should always return the same value for virtqueues that
complete in order - that's the case for network virtqueues. The value
returned is the consumer index of the hardware. These values should be
provided when creating a virtqueue; in case of attaching to an existing
virtqueue (e.g. after suspend and resume), the values can be non zero.

Currently there's a bug in the firmware where for RX virtqueue, the
value returned for the available index is wrong. However, the value
returned for used index is the correct value.

Therefore, we need to return the hardware used index in get_vq_state()
and restore this value into both the new object's available and used
indices.
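
A minimal sketch of that rule, using simplified stand-in types rather than
the real driver structures (struct hw_vq_attr, struct sketch_vq and the
sketch_* helpers are hypothetical names): the state reported on query is
taken from the hardware used index, and on restore the single saved value
seeds both the available and the used index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for what a hardware virtqueue query returns. */
struct hw_vq_attr {
	uint16_t available_index;	/* may be wrong for RX due to the fw bug */
	uint16_t used_index;		/* reliable consumer index */
};

/* Hypothetical stand-in for the driver's saved per-virtqueue state. */
struct sketch_vq {
	uint16_t avail_idx;
	uint16_t used_idx;
};

/* get_vq_state(): report the hardware used index as the vq state. */
static uint16_t sketch_get_vq_state(const struct hw_vq_attr *attr)
{
	return attr->used_index;
}

/* set_vq_state(): restore the one saved value into both indices. */
static void sketch_set_vq_state(struct sketch_vq *vq, uint16_t avail_index)
{
	vq->used_idx = avail_index;
	vq->avail_idx = avail_index;
}
```

This only holds for virtqueues that complete in order, where the two
indices are expected to be equal whenever the queue is queried.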


[PATCH 2/2] vdpa/mlx5: Enable user to add/delete vdpa device

2021-02-16 Thread Eli Cohen
Allow to control vdpa device creation and destruction using the vdpa
management tool.

Examples:
1. List the management devices
$ vdpa mgmtdev show
pci/0000:3b:00.1:
  supported_classes net

2. Create vdpa instance
$ vdpa dev add mgmtdev pci/0000:3b:00.1 name vdpa0

3. Show vdpa devices
$ vdpa dev show
vdpa0: type network mgmtdev pci/0000:3b:00.1 vendor_id  max_vqs 16 \
max_vq_size 256

Signed-off-by: Eli Cohen 
Reviewed-by: Parav Pandit 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 76 +++
 1 file changed, 67 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index a51b0f86afe2..acf7dc61acb0 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1974,23 +1974,32 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev)
}
 }
 
-static int mlx5v_probe(struct auxiliary_device *adev,
-  const struct auxiliary_device_id *id)
+struct mlx5_vdpa_mgmtdev {
+   struct vdpa_mgmt_dev mgtdev;
+   struct mlx5_adev *madev;
+   struct mlx5_vdpa_net *ndev;
+};
+
+static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name)
 {
-   struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
-   struct mlx5_core_dev *mdev = madev->mdev;
+   struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct 
mlx5_vdpa_mgmtdev, mgtdev);
struct virtio_net_config *config;
struct mlx5_vdpa_dev *mvdev;
struct mlx5_vdpa_net *ndev;
+   struct mlx5_core_dev *mdev;
u32 max_vqs;
int err;
 
+   if (mgtdev->ndev)
+   return -ENOSPC;
+
+   mdev = mgtdev->madev->mdev;
/* we save one virtqueue for control virtqueue should we require it */
max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues);
max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
 
ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
mdev->device, &mlx5_vdpa_ops,
-2 * mlx5_vdpa_max_qps(max_vqs), NULL);
+2 * mlx5_vdpa_max_qps(max_vqs), name);
if (IS_ERR(ndev))
return PTR_ERR(ndev);
 
@@ -2018,11 +2027,12 @@ static int mlx5v_probe(struct auxiliary_device *adev,
if (err)
goto err_res;
 
-   err = vdpa_register_device(&mvdev->vdev);
+   mvdev->vdev.mdev = &mgtdev->mgtdev;
+   err = _vdpa_register_device(&mvdev->vdev);
if (err)
goto err_reg;
 
-   dev_set_drvdata(&adev->dev, ndev);
+   mgtdev->ndev = ndev;
return 0;
 
 err_reg:
@@ -2035,11 +2045,59 @@ static int mlx5v_probe(struct auxiliary_device *adev,
return err;
 }
 
+static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device 
*dev)
+{
+   _vdpa_unregister_device(dev);
+}
+
+static const struct vdpa_mgmtdev_ops mdev_ops = {
+   .dev_add = mlx5_vdpa_dev_add,
+   .dev_del = mlx5_vdpa_dev_del,
+};
+
+static struct virtio_device_id id_table[] = {
+   { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
+   { 0 },
+};
+
+static int mlx5v_probe(struct auxiliary_device *adev,
+  const struct auxiliary_device_id *id)
+
+{
+   struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
+   struct mlx5_core_dev *mdev = madev->mdev;
+   struct mlx5_vdpa_mgmtdev *mgtdev;
+   int err;
+
+   mgtdev = kzalloc(sizeof(*mgtdev), GFP_KERNEL);
+   if (!mgtdev)
+   return -ENOMEM;
+
+   mgtdev->mgtdev.ops = &mdev_ops;
+   mgtdev->mgtdev.device = mdev->device;
+   mgtdev->mgtdev.id_table = id_table;
+   mgtdev->madev = madev;
+
+   err = vdpa_mgmtdev_register(&mgtdev->mgtdev);
+   if (err)
+   goto reg_err;
+
+   dev_set_drvdata(&adev->dev, mgtdev);
+
+   return 0;
+
+reg_err:
+   kfree(mgtdev);
+   return err;
+}
+
 static void mlx5v_remove(struct auxiliary_device *adev)
 {
-   struct mlx5_vdpa_dev *mvdev = dev_get_drvdata(&adev->dev);
+   struct mlx5_vdpa_mgmtdev *mgtdev;
 
-   vdpa_unregister_device(&mvdev->vdev);
+   mgtdev = dev_get_drvdata(&adev->dev);
+   vdpa_mgmtdev_unregister(&mgtdev->mgtdev);
+   kfree(mgtdev);
 }
 
 static const struct auxiliary_device_id mlx5v_id_table[] = {
-- 
2.29.2



[PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration

2021-02-16 Thread Eli Cohen
When we suspend the VM, the VDPA interface will be reset. When the VM is
resumed again, clear_virtqueues() will clear the available and used
indices, resulting in hardware virtqueue objects becoming out of sync.
We can avoid this function altogether since qemu will clear them if
required, e.g. when the VM went through a reboot.

Moreover, since the hw available and used indices should always be
identical on query and should be restored to the same value
for virtqueues that complete in order, we set the single value provided
by set_vq_state(). In get_vq_state() we return the value of the hardware
used index.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b8e9d525d66c..a51b0f86afe2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *m
return;
}
mvq->avail_idx = attr.available_index;
+   mvq->used_idx = attr.used_index;
 }
 
 static void suspend_vqs(struct mlx5_vdpa_net *ndev)
@@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device 
*vdev, u16 idx,
return -EINVAL;
}
 
+   mvq->used_idx = state->avail_index;
mvq->avail_idx = state->avail_index;
return 0;
 }
@@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device 
*vdev, u16 idx, struct vdpa
 * that cares about emulating the index after vq is stopped.
 */
if (!mvq->initialized) {
-   state->avail_index = mvq->avail_idx;
+   state->avail_index = mvq->used_idx;
return 0;
}
 
@@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device 
*vdev, u16 idx, struct vdpa
mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
return err;
}
-   state->avail_index = attr.available_index;
+   state->avail_index = attr.used_index;
return 0;
 }
 
@@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct mlx5_vdpa_net 
*ndev)
}
 }
 
-static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
-{
-   int i;
-
-   for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
-   ndev->vqs[i].avail_idx = 0;
-   ndev->vqs[i].used_idx = 0;
-   }
-}
-
 /* TODO: cross-endian support */
 static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
 {
@@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
if (!status) {
mlx5_vdpa_info(mvdev, "performing device reset\n");
teardown_driver(ndev);
-   clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(&ndev->mvdev);
ndev->mvdev.status = 0;
++mvdev->generation;
-- 
2.29.2



Re: [PATCH v2 3/3] vdpa/mlx5: defer clear_virtqueues to until DRIVER_OK

2021-02-16 Thread Eli Cohen
On Thu, Feb 11, 2021 at 09:33:14AM +0200, Eli Cohen wrote:
> On Wed, Feb 10, 2021 at 01:48:00PM -0800, Si-Wei Liu wrote:
> > While virtq is stopped,  get_vq_state() is supposed to
> > be  called to  get  sync'ed  with  the latest internal
> > avail_index from device. The saved avail_index is used
> > to restate  the virtq  once device is started.  Commit
> > b35ccebe3ef7 introduced the clear_virtqueues() routine
> > to  reset  the saved  avail_index,  however, the index
> > gets cleared a bit earlier before get_vq_state() tries
> > to read it. This would cause consistency problems when
> > virtq is restarted, e.g. through a series of link down
> > and link up events. We  could  defer  the  clearing of
> > avail_index  to  until  the  device  is to be started,
> > i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
> > set_status().
> > 
> > Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after 
> > change map")
> > Signed-off-by: Si-Wei Liu 
> > Acked-by: Jason Wang 
> 
> Acked-by: Eli Cohen 
> 

I take it back. I think we don't need to clear the indexes at all. In
case we need to restore indexes we'll get the right values through
set_vq_state(). If we suspend the virtqueue due to VM being suspended,
qemu will query first and will provide the queried value. In case of
VM reboot, it will provide 0 in set_vq_state().

I am sending a patch that addresses both reboot and suspend.

> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 7c1f789..ce6aae8 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -1777,7 +1777,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
> > *vdev, u8 status)
> > if (!status) {
> > mlx5_vdpa_info(mvdev, "performing device reset\n");
> > teardown_driver(ndev);
> > -   clear_virtqueues(ndev);
> > > mlx5_vdpa_destroy_mr(&ndev->mvdev);
> > ndev->mvdev.status = 0;
> > ++mvdev->generation;
> > @@ -1786,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
> > *vdev, u8 status)
> >  
> > if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
> > if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > +   clear_virtqueues(ndev);
> > err = setup_driver(ndev);
> > if (err) {
> > mlx5_vdpa_warn(mvdev, "failed to setup 
> > driver\n");
> > -- 
> > 1.8.3.1
> > 


Re: [PATCH] vdpa/mlx5: Extract correct pointer from driver data

2021-02-16 Thread Eli Cohen
On Tue, Feb 16, 2021 at 09:37:34AM +0200, Leon Romanovsky wrote:
> On Tue, Feb 16, 2021 at 08:42:26AM +0200, Eli Cohen wrote:
> > On Tue, Feb 16, 2021 at 08:35:51AM +0200, Leon Romanovsky wrote:
> > > On Tue, Feb 16, 2021 at 07:50:22AM +0200, Eli Cohen wrote:
> > > > struct mlx5_vdpa_net pointer was stored in drvdata. Extract it as well
> > > > in mlx5v_remove().
> > > >
> > > > Fixes: 74c9729dd892 ("vdpa/mlx5: Connect mlx5_vdpa to auxiliary bus")
> > > > Signed-off-by: Eli Cohen 
> > > > ---
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 6b0a42183622..4103d3b64a2a 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -2036,9 +2036,9 @@ static int mlx5v_probe(struct auxiliary_device 
> > > > *adev,
> > > >
> > > >  static void mlx5v_remove(struct auxiliary_device *adev)
> > > >  {
> > > > -   struct mlx5_vdpa_dev *mvdev = dev_get_drvdata(&adev->dev);
> > > > +   struct mlx5_vdpa_net *ndev = dev_get_drvdata(&adev->dev);
> > > >
> > > > -   vdpa_unregister_device(&mvdev->vdev);
> > > > +   vdpa_unregister_device(&ndev->mvdev.vdev);
> > > >  }
> > >
> > > IMHO, the more correct solution is to fix the dev_set_drvdata() call,
> > > because we are registering/unregistering/allocating "struct 
> > > mlx5_vdpa_dev".
> > >
> >
> > We're allocating "struct mlx5_vdpa_net". "struct mlx5_vdpa_dev" is just
> > a member field of "struct mlx5_vdpa_net".
> 
> I referred to these lines in the mlx5v_probe():
>   1986 err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
>   1987 if (err)
>   1988 goto err_mtu;
>   1989
>   1990 err = alloc_resources(ndev);
>   1991 if (err)
>   1992 goto err_res;
>   1993
>   1994 err = vdpa_register_device(&mvdev->vdev);
> 
> So it is better for mlx5v_remove() to be symmetrical.
> 

It's "struct mlx5_vdpa_net" that is being allocated here so it makes
sense to set this pointer as the driver data.

> Thanks
> 
> >
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 88dde3455bfd..079b8fe669af 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -1995,7 +1995,7 @@ static int mlx5v_probe(struct auxiliary_device 
> > > *adev,
> > >   if (err)
> > >   goto err_reg;
> > >
> > > - dev_set_drvdata(&adev->dev, ndev);
> > > + dev_set_drvdata(&adev->dev, mvdev);
> > >   return 0;
> > >
> > >  err_reg:
> > >
> > > >
> > > >  static const struct auxiliary_device_id mlx5v_id_table[] = {
> > >
> > > > --
> > > > 2.29.2
> > > >


Re: [PATCH] vdpa/mlx5: Extract correct pointer from driver data

2021-02-15 Thread Eli Cohen
On Tue, Feb 16, 2021 at 08:35:51AM +0200, Leon Romanovsky wrote:
> On Tue, Feb 16, 2021 at 07:50:22AM +0200, Eli Cohen wrote:
> > struct mlx5_vdpa_net pointer was stored in drvdata. Extract it as well
> > in mlx5v_remove().
> >
> > Fixes: 74c9729dd892 ("vdpa/mlx5: Connect mlx5_vdpa to auxiliary bus")
> > Signed-off-by: Eli Cohen 
> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 6b0a42183622..4103d3b64a2a 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -2036,9 +2036,9 @@ static int mlx5v_probe(struct auxiliary_device *adev,
> >
> >  static void mlx5v_remove(struct auxiliary_device *adev)
> >  {
> > -   struct mlx5_vdpa_dev *mvdev = dev_get_drvdata(&adev->dev);
> > +   struct mlx5_vdpa_net *ndev = dev_get_drvdata(&adev->dev);
> >
> > -   vdpa_unregister_device(&mvdev->vdev);
> > +   vdpa_unregister_device(&ndev->mvdev.vdev);
> >  }
> 
> IMHO, the more correct solution is to fix the dev_set_drvdata() call,
> because we are registering/unregistering/allocating "struct mlx5_vdpa_dev".
> 

We're allocating "struct mlx5_vdpa_net". "struct mlx5_vdpa_dev" is just
a member field of "struct mlx5_vdpa_net".
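
The point about the outer and the embedded struct can be shown with a small
user-space sketch. This is not driver code: the toy_* names and the reduced
struct layouts are assumptions made for illustration, and only the three
struct names mirror the driver. Whatever type the probe side stores as
driver data is the type the remove side has to read back.

```c
#include <stddef.h>

/* Reduced stand-ins for the kernel types; the fields are illustrative only. */
struct vdpa_device { int index; };
struct mlx5_vdpa_dev { struct vdpa_device vdev; };
struct mlx5_vdpa_net { struct mlx5_vdpa_dev mvdev; int mtu; };

/* Hypothetical replacement for struct device and its drvdata accessors. */
struct toy_device { void *driver_data; };

static void toy_set_drvdata(struct toy_device *dev, void *data)
{
	dev->driver_data = data;
}

static void *toy_get_drvdata(const struct toy_device *dev)
{
	return dev->driver_data;
}

/* Probe side: store the outer object, which is what was allocated. */
static void toy_probe(struct toy_device *dev, struct mlx5_vdpa_net *ndev)
{
	toy_set_drvdata(dev, ndev);
}

/* Remove side: read back the same type, then reach the embedded member. */
static struct vdpa_device *toy_remove_target(struct toy_device *dev)
{
	struct mlx5_vdpa_net *ndev = toy_get_drvdata(dev);

	return &ndev->mvdev.vdev;
}
```

Storing the embedded "struct mlx5_vdpa_dev" instead would also be
consistent, as long as both sides agree on the type.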

> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 88dde3455bfd..079b8fe669af 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1995,7 +1995,7 @@ static int mlx5v_probe(struct auxiliary_device *adev,
>   if (err)
>   goto err_reg;
> 
> - dev_set_drvdata(&adev->dev, ndev);
> + dev_set_drvdata(&adev->dev, mvdev);
>   return 0;
> 
>  err_reg:
> 
> >
> >  static const struct auxiliary_device_id mlx5v_id_table[] = {
> 
> > --
> > 2.29.2
> >


[PATCH] vdpa/mlx5: Extract correct pointer from driver data

2021-02-15 Thread Eli Cohen
A struct mlx5_vdpa_net pointer is stored in drvdata, so extract the same
type in mlx5v_remove().

Fixes: 74c9729dd892 ("vdpa/mlx5: Connect mlx5_vdpa to auxiliary bus")
Signed-off-by: Eli Cohen 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 6b0a42183622..4103d3b64a2a 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2036,9 +2036,9 @@ static int mlx5v_probe(struct auxiliary_device *adev,
 
 static void mlx5v_remove(struct auxiliary_device *adev)
 {
-   struct mlx5_vdpa_dev *mvdev = dev_get_drvdata(&adev->dev);
+   struct mlx5_vdpa_net *ndev = dev_get_drvdata(&adev->dev);
 
-   vdpa_unregister_device(&mvdev->vdev);
+   vdpa_unregister_device(&ndev->mvdev.vdev);
 }
 
 static const struct auxiliary_device_id mlx5v_id_table[] = {
-- 
2.29.2



Re: [PATCH v2 3/3] vdpa/mlx5: defer clear_virtqueues to until DRIVER_OK

2021-02-10 Thread Eli Cohen
On Wed, Feb 10, 2021 at 01:48:00PM -0800, Si-Wei Liu wrote:
> While virtq is stopped, get_vq_state() is supposed to be called to get
> sync'ed with the latest internal avail_index from the device. The saved
> avail_index is used to restore the virtq once the device is started. Commit
> b35ccebe3ef7 introduced the clear_virtqueues() routine to reset the saved
> avail_index; however, the index gets cleared before get_vq_state() tries
> to read it. This would cause consistency problems when the virtq is
> restarted, e.g. through a series of link down and link up events. We could
> defer the clearing of avail_index until the device is to be started, i.e.
> until VIRTIO_CONFIG_S_DRIVER_OK is set again in set_status().
> 
> Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map")
> Signed-off-by: Si-Wei Liu 
> Acked-by: Jason Wang 

Acked-by: Eli Cohen 

> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 7c1f789..ce6aae8 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1777,7 +1777,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
> *vdev, u8 status)
>   if (!status) {
>   mlx5_vdpa_info(mvdev, "performing device reset\n");
>   teardown_driver(ndev);
> - clear_virtqueues(ndev);
>   mlx5_vdpa_destroy_mr(&ndev->mvdev);
>   ndev->mvdev.status = 0;
>   ++mvdev->generation;
> @@ -1786,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
> *vdev, u8 status)
>  
>   if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
>   if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> + clear_virtqueues(ndev);
>   err = setup_driver(ndev);
>   if (err) {
>   mlx5_vdpa_warn(mvdev, "failed to setup 
> driver\n");
> -- 
> 1.8.3.1
> 
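
The ordering described in the commit message above can be sketched in plain
C. This is a user-space model, not the driver: the defer_* names and the
single-queue device are assumptions for illustration. Only the DRIVER_OK bit
value (4, as in VIRTIO_CONFIG_S_DRIVER_OK) is taken from the virtio headers.
The saved index survives a reset so that it can still be read, and it is
cleared only on the transition into DRIVER_OK.

```c
#include <stdint.h>

#define DEFER_DRIVER_OK 0x4 /* same bit value as VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical one-queue device holding a saved available index. */
struct defer_dev {
	uint8_t status;
	uint16_t avail_idx;
};

static void defer_set_status(struct defer_dev *d, uint8_t status)
{
	if (!status) {
		/* Reset: tear down, but leave the saved index readable. */
		d->status = 0;
		return;
	}

	/* Clear the saved index only when DRIVER_OK is newly set. */
	if (((status ^ d->status) & DEFER_DRIVER_OK) &&
	    (status & DEFER_DRIVER_OK))
		d->avail_idx = 0;

	d->status = status;
}

/* Stand-in for get_vq_state(): report the saved index. */
static uint16_t defer_get_vq_state(const struct defer_dev *d)
{
	return d->avail_idx;
}
```

In this model a reset followed by defer_get_vq_state() still returns the
index saved before the reset, which is the behaviour the patch aims for.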


Re: [PATCH v2 2/3] vdpa/mlx5: fix feature negotiation across device reset

2021-02-10 Thread Eli Cohen
On Wed, Feb 10, 2021 at 01:47:59PM -0800, Si-Wei Liu wrote:
> The mlx_features field denotes which set of virtio features the
> device supports. In principle, this field need not be cleared
> during virtio device reset, as this capability is static and
> does not change across reset.
> 
> In fact, the current code appears to assume that mlx_features
> can be reloaded from firmware via the .get_features op after the
> device is reset (via the .set_status op), which is unfortunately
> not true. The userspace VMM might save a copy of the backend's
> capable features and won't call into the kernel again to get
> them on reset. This causes all virtio features to be disabled on
> newly created virtqs after device reset, while the guest holds
> a mismatched view of the available features. For example, the
> guest may still assume tx checksum offload is available after
> reset and feature negotiation, causing frames with bogus
> (incomplete) checksums to be transmitted on the wire.
> 
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> Signed-off-by: Si-Wei Liu 

Acked-by: Eli Cohen 

> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 25 +++--
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index b8416c4..7c1f789 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1486,16 +1486,8 @@ static u64 mlx_to_vritio_features(u16 dev_features)
>  static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
>  {
>   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> - struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> - u16 dev_features;
>  
> - dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, 
> device_features_bits_mask);
> - ndev->mvdev.mlx_features = mlx_to_vritio_features(dev_features);
> - if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0))
> - ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_F_VERSION_1);
> - ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_F_ACCESS_PLATFORM);
> - print_features(mvdev, ndev->mvdev.mlx_features, false);
> - return ndev->mvdev.mlx_features;
> + return mvdev->mlx_features;
>  }
>  
>  static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> @@ -1788,7 +1780,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
> *vdev, u8 status)
>   clear_virtqueues(ndev);
>   mlx5_vdpa_destroy_mr(&ndev->mvdev);
>   ndev->mvdev.status = 0;
> - ndev->mvdev.mlx_features = 0;
>   ++mvdev->generation;
>   return;
>   }
> @@ -1907,6 +1898,19 @@ static int mlx5_get_vq_irq(struct vdpa_device *vdv, 
> u16 idx)
>   .free = mlx5_vdpa_free,
>  };
>  
> +static void query_virtio_features(struct mlx5_vdpa_net *ndev)
> +{
> + struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
> + u16 dev_features;
> +
> + dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, 
> device_features_bits_mask);
> + mvdev->mlx_features = mlx_to_vritio_features(dev_features);
> + if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0))
> + mvdev->mlx_features |= BIT_ULL(VIRTIO_F_VERSION_1);
> + mvdev->mlx_features |= BIT_ULL(VIRTIO_F_ACCESS_PLATFORM);
> + print_features(mvdev, mvdev->mlx_features, false);
> +}
> +
>  static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
>  {
>   u16 hw_mtu;
> @@ -2005,6 +2009,7 @@ static int mlx5v_probe(struct auxiliary_device *adev,
>   init_mvqs(ndev);
>   mutex_init(&ndev->reslock);
>   config = &ndev->config;
> + query_virtio_features(ndev);
>   err = query_mtu(mdev, &ndev->mtu);
>   if (err)
>   goto err_mtu;
> -- 
> 1.8.3.1
> 


Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-10 Thread Eli Cohen
On Wed, Feb 10, 2021 at 12:59:03AM -0800, Si-Wei Liu wrote:
> 
> 
> On 2/9/2021 7:53 PM, Jason Wang wrote:
> > 
> > On 2021/2/10 10:30 AM, Si-Wei Liu wrote:
> > > 
> > > 
> > > On 2/8/2021 10:37 PM, Jason Wang wrote:
> > > > 
> > > > On 2021/2/9 2:12 PM, Eli Cohen wrote:
> > > > > On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote:
> > > > > > On 2021/2/8 6:04 PM, Eli Cohen wrote:
> > > > > > > On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote:
> > > > > > > > On 2021/2/8 2:37 PM, Eli Cohen wrote:
> > > > > > > > > On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:
> > > > > > > > > > On 2021/2/6 7:07 AM, Si-Wei Liu wrote:
> > > > > > > > > > > On 2/3/2021 11:36 PM, Eli Cohen wrote:
> > > > > > > > > > > > > When a change of memory map occurs, the hardware resources are destroyed
> > > > > > > > > > > > > and then re-created again with the new memory map. In such case, we need
> > > > > > > > > > > > > to restore the hardware available and used indices. The driver failed to
> > > > > > > > > > > > > restore the used index which is added here.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Also, since the driver also fails to reset the available and used
> > > > > > > > > > > > > indices upon device reset, fix this here to avoid regression caused by
> > > > > > > > > > > > > the fact that used index may not be zero upon device reset.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > > > > > > > > > > > Signed-off-by: Eli Cohen
> > > > > > > > > > > > ---
> > > > > > > > > > > > v0 -> v1:
> > > > > > > > > > > > Clear indices upon device reset
> > > > > > > > > > > > 
> > > > > > > > > > > >      drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 
> > > > > > > > > > > > ++
> > > > > > > > > > > >      1 file changed, 18 insertions(+)
> > > > > > > > > > > > 
> > > > > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > index 88dde3455bfd..b5fe6d2ad22f 100644
> > > > > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > > > > > > > > > > >      u64 device_addr;
> > > > > > > > > > > >      u64 driver_addr;
> > > > > > > > > > > >      u16 avail_index;
> > > > > > > > > > > > +    u16 used_index;
> > > > > > > > > > > >      bool ready;
> > > > > > > > > > > >      struct vdpa_callback cb;
> > > > > > > > > > > >      bool restore;
> > > > > > > > > > > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {

Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-08 Thread Eli Cohen
On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote:
> 
> On 2021/2/8 6:04 PM, Eli Cohen wrote:
> > On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote:
> > > On 2021/2/8 2:37 PM, Eli Cohen wrote:
> > > > On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:
> > > > > On 2021/2/6 7:07 AM, Si-Wei Liu wrote:
> > > > > > On 2/3/2021 11:36 PM, Eli Cohen wrote:
> > > > > > > When a change of memory map occurs, the hardware resources are destroyed
> > > > > > > and then re-created again with the new memory map. In such case, we need
> > > > > > > to restore the hardware available and used indices. The driver failed to
> > > > > > > restore the used index which is added here.
> > > > > > > 
> > > > > > > Also, since the driver also fails to reset the available and used
> > > > > > > indices upon device reset, fix this here to avoid regression caused by
> > > > > > > the fact that used index may not be zero upon device reset.
> > > > > > > 
> > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > > > > > > Signed-off-by: Eli Cohen
> > > > > > > ---
> > > > > > > v0 -> v1:
> > > > > > > Clear indices upon device reset
> > > > > > > 
> > > > > > >     drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++
> > > > > > >     1 file changed, 18 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > index 88dde3455bfd..b5fe6d2ad22f 100644
> > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > > > > > >     u64 device_addr;
> > > > > > >     u64 driver_addr;
> > > > > > >     u16 avail_index;
> > > > > > > +    u16 used_index;
> > > > > > >     bool ready;
> > > > > > >     struct vdpa_callback cb;
> > > > > > >     bool restore;
> > > > > > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > > > > > >     u32 virtq_id;
> > > > > > >     struct mlx5_vdpa_net *ndev;
> > > > > > >     u16 avail_idx;
> > > > > > > +    u16 used_idx;
> > > > > > >     int fw_state;
> > > > > > >       /* keep last in the struct */
> > > > > > > @@ -804,6 +806,7 @@ static int create_virtqueue(struct 
> > > > > > > mlx5_vdpa_net
> > > > > > > *ndev, struct mlx5_vdpa_virtque
> > > > > > >       obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in,
> > > > > > > obj_context);
> > > > > > >     MLX5_SET(virtio_net_q_object, obj_context, 
> > > > > > > hw_available_index,
> > > > > > > mvq->avail_idx);
> > > > > > > +    MLX5_SET(virtio_net_q_object, obj_context, hw_used_index,
> > > > > > > mvq->used_idx);
> > > > > > >     MLX5_SET(virtio_net_q_object, obj_context,
> > > > > > > queue_feature_bit_mask_12_3,
> > > > > > >      get_features_12_3(ndev->mvdev.actual_features));
> > > > > > >     vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context,
> > > > > > > virtio_q_context);
> > > > > > > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net
> > > > > > > *ndev, struct mlx5_vdpa_virtqueue *m
> > > > > > >     struct mlx5_virtq_attr {
> > > > > > >     u8 state;
> > > > > > >     u16 available_index;
> > > > > > > +    u16 used_index;
> > > > > > >     };
> > > > > > >       static int query_virtqueue(struct mlx5_vdpa_net *n

Re: [PATCH] vdpa/mlx5: fix param validation in mlx5_vdpa_get_config()

2021-02-08 Thread Eli Cohen
On Mon, Feb 08, 2021 at 05:17:41PM +0100, Stefano Garzarella wrote:
> It's legal to have 'offset + len' equal to
> sizeof(struct virtio_net_config), since 'ndev->config' is a
> 'struct virtio_net_config', so we can safely copy its content under
> this condition.
> 
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Stefano Garzarella 

Acked-by: Eli Cohen 

BTW, vdpa_sim has the same error; you may want to fix it too.

> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index dc88559a8d49..10e9b09932eb 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1820,7 +1820,7 @@ static void mlx5_vdpa_get_config(struct vdpa_device 
> *vdev, unsigned int offset,
>   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>   struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>  
> - if (offset + len < sizeof(struct virtio_net_config))
> + if (offset + len <= sizeof(struct virtio_net_config))
>   memcpy(buf, (u8 *)&ndev->config + offset, len);
>  }
>  
> -- 
> 2.29.2
> 
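
The boundary case fixed by the patch above can be checked with a small
user-space sketch. The struct toy_config type and the toy_get_config() helper
are hypothetical stand-ins (the real code copies from "struct
virtio_net_config"), and the comparison is written in a form that also avoids
wrap-around in offset + len, which is an addition of this sketch rather than
part of the patch. A read that ends exactly at the end of the structure is
legal; one byte more is not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct virtio_net_config; only its size matters. */
struct toy_config {
	unsigned char bytes[12];
};

/* Copy len bytes starting at offset if the whole range lies inside cfg. */
static bool toy_get_config(const struct toy_config *cfg, size_t offset,
			   void *buf, size_t len)
{
	/* Equivalent to offset + len <= sizeof(*cfg), without overflow. */
	if (offset > sizeof(*cfg) || len > sizeof(*cfg) - offset)
		return false;

	memcpy(buf, (const unsigned char *)cfg + offset, len);
	return true;
}
```

With the original "<" comparison, reading the whole structure in one call
(offset 0, len equal to the structure size) would have been rejected.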


Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-08 Thread Eli Cohen
On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote:
> 
> On 2021/2/8 2:37 PM, Eli Cohen wrote:
> > On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:
> > > On 2021/2/6 7:07 AM, Si-Wei Liu wrote:
> > > > 
> > > > On 2/3/2021 11:36 PM, Eli Cohen wrote:
> > > > > When a change of memory map occurs, the hardware resources are destroyed
> > > > > and then re-created again with the new memory map. In such case, we need
> > > > > to restore the hardware available and used indices. The driver failed to
> > > > > restore the used index which is added here.
> > > > > 
> > > > > Also, since the driver also fails to reset the available and used
> > > > > indices upon device reset, fix this here to avoid regression caused by
> > > > > the fact that used index may not be zero upon device reset.
> > > > > 
> > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > > > > Signed-off-by: Eli Cohen 
> > > > > ---
> > > > > v0 -> v1:
> > > > > Clear indices upon device reset
> > > > > 
> > > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++
> > > > >    1 file changed, 18 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 88dde3455bfd..b5fe6d2ad22f 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > > > >    u64 device_addr;
> > > > >    u64 driver_addr;
> > > > >    u16 avail_index;
> > > > > +    u16 used_index;
> > > > >    bool ready;
> > > > >    struct vdpa_callback cb;
> > > > >    bool restore;
> > > > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > > > >    u32 virtq_id;
> > > > >    struct mlx5_vdpa_net *ndev;
> > > > >    u16 avail_idx;
> > > > > +    u16 used_idx;
> > > > >    int fw_state;
> > > > >      /* keep last in the struct */
> > > > > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net
> > > > > *ndev, struct mlx5_vdpa_virtque
> > > > >      obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in,
> > > > > obj_context);
> > > > >    MLX5_SET(virtio_net_q_object, obj_context, hw_available_index,
> > > > > mvq->avail_idx);
> > > > > +    MLX5_SET(virtio_net_q_object, obj_context, hw_used_index,
> > > > > mvq->used_idx);
> > > > >    MLX5_SET(virtio_net_q_object, obj_context,
> > > > > queue_feature_bit_mask_12_3,
> > > > >     get_features_12_3(ndev->mvdev.actual_features));
> > > > >    vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context,
> > > > > virtio_q_context);
> > > > > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net
> > > > > *ndev, struct mlx5_vdpa_virtqueue *m
> > > > >    struct mlx5_virtq_attr {
> > > > >    u8 state;
> > > > >    u16 available_index;
> > > > > +    u16 used_index;
> > > > >    };
> > > > >      static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct
> > > > > mlx5_vdpa_virtqueue *mvq,
> > > > > @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct
> > > > > mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
> > > > >    memset(attr, 0, sizeof(*attr));
> > > > >    attr->state = MLX5_GET(virtio_net_q_object, obj_context, 
> > > > > state);
> > > > >    attr->available_index = MLX5_GET(virtio_net_q_object,
> > > > > obj_context, hw_available_index);
> > > > > +    attr->used_index = MLX5_GET(virtio_net_q_object, obj_context,
> > > > > hw_used_index);
> > > > >    kfree(out);
> > > > >    return 0;
> > > > >    @@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct
> > > > > mlx5_vdpa_net *ndev)

Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-07 Thread Eli Cohen
On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:
> 
> On 2021/2/6 7:07 AM, Si-Wei Liu wrote:
> > 
> > 
> > On 2/3/2021 11:36 PM, Eli Cohen wrote:
> > > When a change of memory map occurs, the hardware resources are destroyed
> > > and then re-created again with the new memory map. In such case, we need
> > > to restore the hardware available and used indices. The driver failed to
> > > restore the used index which is added here.
> > > 
> > > Also, since the driver also fails to reset the available and used
> > > indices upon device reset, fix this here to avoid regression caused by
> > > the fact that used index may not be zero upon device reset.
> > > 
> > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5
> > > devices")
> > > Signed-off-by: Eli Cohen 
> > > ---
> > > v0 -> v1:
> > > Clear indices upon device reset
> > > 
> > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++
> > >   1 file changed, 18 insertions(+)
> > > 
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 88dde3455bfd..b5fe6d2ad22f 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > >   u64 device_addr;
> > >   u64 driver_addr;
> > >   u16 avail_index;
> > > +    u16 used_index;
> > >   bool ready;
> > >   struct vdpa_callback cb;
> > >   bool restore;
> > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > >   u32 virtq_id;
> > >   struct mlx5_vdpa_net *ndev;
> > >   u16 avail_idx;
> > > +    u16 used_idx;
> > >   int fw_state;
> > >     /* keep last in the struct */
> > > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net
> > > *ndev, struct mlx5_vdpa_virtque
> > >     obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in,
> > > obj_context);
> > >   MLX5_SET(virtio_net_q_object, obj_context, hw_available_index,
> > > mvq->avail_idx);
> > > +    MLX5_SET(virtio_net_q_object, obj_context, hw_used_index,
> > > mvq->used_idx);
> > >   MLX5_SET(virtio_net_q_object, obj_context,
> > > queue_feature_bit_mask_12_3,
> > >    get_features_12_3(ndev->mvdev.actual_features));
> > >   vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context,
> > > virtio_q_context);
> > > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net
> > > *ndev, struct mlx5_vdpa_virtqueue *m
> > >   struct mlx5_virtq_attr {
> > >   u8 state;
> > >   u16 available_index;
> > > +    u16 used_index;
> > >   };
> > >     static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct
> > > mlx5_vdpa_virtqueue *mvq,
> > > @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct
> > > mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
> > >   memset(attr, 0, sizeof(*attr));
> > >   attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
> > >   attr->available_index = MLX5_GET(virtio_net_q_object,
> > > obj_context, hw_available_index);
> > > +    attr->used_index = MLX5_GET(virtio_net_q_object, obj_context,
> > > hw_used_index);
> > >   kfree(out);
> > >   return 0;
> > >   @@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct
> > > mlx5_vdpa_net *ndev)
> > >   }
> > >   }
> > >   +static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
> > > +{
> > > +    int i;
> > > +
> > > +    for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
> > > +    ndev->vqs[i].avail_idx = 0;
> > > +    ndev->vqs[i].used_idx = 0;
> > > +    }
> > > +}
> > > +
> > >   /* TODO: cross-endian support */
> > >   static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev
> > > *mvdev)
> > >   {
> > > @@ -1610,6 +1625,7 @@ static int save_channel_info(struct
> > > mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
> > >   return err;
> > >     ri->avail_index = attr.available_index;
> > > +    ri->used_index = attr.used_index;
> > >   ri->ready = mvq->ready;
> > >   ri->num_ent = mvq->num_ent;
> > >

Re: [PATCH 3/3] mlx5_vdpa: defer clear_virtqueues to until DRIVER_OK

2021-02-07 Thread Eli Cohen
On Sat, Feb 06, 2021 at 04:29:24AM -0800, Si-Wei Liu wrote:
> While virtq is stopped, get_vq_state() is supposed to be called to get
> sync'ed with the latest internal avail_index from the device. The saved
> avail_index is used to restore the virtq once the device is started. Commit
> b35ccebe3ef7 introduced the clear_virtqueues() routine to reset the saved
> avail_index; however, the index gets cleared before get_vq_state() tries
> to read it. This would cause consistency problems when the virtq is
> restarted, e.g. through a series of link down and link up events. We could
> defer the clearing of avail_index until the device is to be started, i.e.
> until VIRTIO_CONFIG_S_DRIVER_OK is set again in set_status().


Not sure I understand the scenario. You are talking about reset of the
device followed by up/down events on the interface. How can you trigger
this?

> 
> Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map")
> Signed-off-by: Si-Wei Liu 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index aa6f8cd..444ab58 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
> *vdev, u8 status)
>   if (!status) {
>   mlx5_vdpa_info(mvdev, "performing device reset\n");
>   teardown_driver(ndev);
> - clear_virtqueues(ndev);
>   mlx5_vdpa_destroy_mr(&ndev->mvdev);
>   ndev->mvdev.status = 0;
>   ++mvdev->generation;
> @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
> *vdev, u8 status)
>  
>   if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
>   if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> + clear_virtqueues(ndev);
>   err = setup_driver(ndev);
>   if (err) {
>   mlx5_vdpa_warn(mvdev, "failed to setup 
> driver\n");
> -- 
> 1.8.3.1
> 


Re: [PATCH 1/3] mlx5_vdpa: should exclude header length and fcs from mtu

2021-02-07 Thread Eli Cohen
On Sat, Feb 06, 2021 at 04:29:22AM -0800, Si-Wei Liu wrote:
> When feature VIRTIO_NET_F_MTU is negotiated on mlx5_vdpa,
> 22 extra bytes worth of MTU length is shown in guest.
> This is because the mlx5_query_port_max_mtu API returns
> the "hardware" MTU value, which does not just contain the
> Ethernet payload, but includes extra lengths starting
> from the Ethernet header up to the FCS altogether.
> 
> Fix the MTU so packets won't get dropped silently.
> 
> Signed-off-by: Si-Wei Liu 

Acked-by: Eli Cohen 

> ---
>  drivers/vdpa/mlx5/core/mlx5_vdpa.h |  4 
>  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 15 ++-
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
> b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> index 08f742f..b6cc53b 100644
> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> @@ -4,9 +4,13 @@
>  #ifndef __MLX5_VDPA_H__
>  #define __MLX5_VDPA_H__
>  
> +#include 
> +#include 
>  #include 
>  #include 
>  
> +#define MLX5V_ETH_HARD_MTU (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)
> +
>  struct mlx5_vdpa_direct_mr {
>   u64 start;
>   u64 end;
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index dc88559..b8416c4 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1907,6 +1907,19 @@ static int mlx5_get_vq_irq(struct vdpa_device *vdv, 
> u16 idx)
>   .free = mlx5_vdpa_free,
>  };
>  
> +static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
> +{
> + u16 hw_mtu;
> + int err;
> +
> + err = mlx5_query_nic_vport_mtu(mdev, &hw_mtu);
> + if (err)
> + return err;
> +
> + *mtu = hw_mtu - MLX5V_ETH_HARD_MTU;
> + return 0;
> +}
> +
>  static int alloc_resources(struct mlx5_vdpa_net *ndev)
>  {
>   struct mlx5_vdpa_net_resources *res = &ndev->res;
> @@ -1992,7 +2005,7 @@ static int mlx5v_probe(struct auxiliary_device *adev,
>   init_mvqs(ndev);
>   mutex_init(&ndev->reslock);
>   config = &ndev->config;
> - err = mlx5_query_nic_vport_mtu(mdev, &ndev->mtu);
> + err = query_mtu(mdev, &ndev->mtu);
>   if (err)
>   goto err_mtu;
>  
> -- 
> 1.8.3.1
> 
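
The 22 bytes mentioned in the commit message above follow from the three
header constants used by MLX5V_ETH_HARD_MTU. The sketch below repeats their
values as they are defined in the Linux headers so that it builds in user
space; the toy_payload_mtu() helper is a hypothetical name, not a driver
function.

```c
#include <stdint.h>

/* Values as defined in <linux/if_ether.h> and <linux/if_vlan.h>. */
#define ETH_HLEN	14	/* destination MAC + source MAC + EtherType */
#define VLAN_HLEN	4	/* one 802.1Q tag */
#define ETH_FCS_LEN	4	/* frame check sequence */

#define MLX5V_ETH_HARD_MTU (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)

/* Convert a hardware MTU (whole frame) into the payload MTU shown to the guest. */
static uint16_t toy_payload_mtu(uint16_t hw_mtu)
{
	return (uint16_t)(hw_mtu - MLX5V_ETH_HARD_MTU);
}
```

For example, a hardware MTU of 1522 corresponds to a payload MTU of 1500.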


Re: [PATCH 2/3] mlx5_vdpa: fix feature negotiation across device reset

2021-02-07 Thread Eli Cohen
On Sat, Feb 06, 2021 at 04:29:23AM -0800, Si-Wei Liu wrote:
> The mlx_features field denotes which set of virtio features the
> device supports. In principle, this field need not be cleared
> during virtio device reset, as this capability is static and
> does not change across reset.
> 
> In fact, the current code appears to assume that mlx_features
> can be reloaded from firmware via the .get_features op after the
> device is reset (via the .set_status op), which is unfortunately
> not true. The userspace VMM might save a copy of the backend's
> capable features and won't call into the kernel again to get
> them on reset. This causes all virtio features to be disabled on
> newly created virtqs after device reset, while the guest holds
> a mismatched view of the available features. For example, the
> guest may still assume tx checksum offload is available after
> reset and feature negotiation, causing frames with bogus
> (incomplete) checksums to be transmitted on the wire.
> 
> Signed-off-by: Si-Wei Liu 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index b8416c4..aa6f8cd 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1788,7 +1788,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
> *vdev, u8 status)
>   clear_virtqueues(ndev);
>   mlx5_vdpa_destroy_mr(&ndev->mvdev);
>   ndev->mvdev.status = 0;
> - ndev->mvdev.mlx_features = 0;
>   ++mvdev->generation;
>   return;
>   }

Since we assume that device capabilities don't change, I think I would
get the features through a call done in mlx5v_probe after the netdev
object is created and change mlx5_vdpa_get_features() to just return
ndev->mvdev.mlx_features.

Did you actually see this issue in action? If you did, can you share
with us how you triggered it?
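
A rough sketch of what I mean, as a user-space model rather than a patch.
The sketch_* names, the struct layout and the feature value are all
hypothetical; only the idea (query once at probe, return the cached value
afterwards, leave it alone on reset) comes from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_dev {
	uint64_t mlx_features;    /* device capability, cached once at probe */
	uint64_t actual_features; /* features negotiated with the guest */
	uint8_t status;
	unsigned int fw_queries;  /* counts how often firmware was asked */
};

/* Hypothetical stand-in for the firmware capability query. */
uint64_t sketch_query_fw_features(struct sketch_dev *dev)
{
	dev->fw_queries++;
	return 0x23ULL; /* arbitrary illustrative feature bits */
}

/* Probe: read the static capability exactly once. */
void sketch_probe(struct sketch_dev *dev)
{
	dev->mlx_features = sketch_query_fw_features(dev);
}

/* get_features: no firmware access, just return the cached value. */
uint64_t sketch_get_features(const struct sketch_dev *dev)
{
	return dev->mlx_features;
}

/* Reset: clear per-session state but keep the cached capability. */
void sketch_reset(struct sketch_dev *dev)
{
	dev->status = 0;
	dev->actual_features = 0;
}
```

With this shape it no longer matters whether userspace calls
get_features again after a reset, since the capability is never cleared.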

> -- 
> 1.8.3.1
> 


[PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-03 Thread Eli Cohen
When a change of memory map occurs, the hardware resources are destroyed
and then re-created with the new memory map. In such a case, we need
to restore the hardware available and used indices. The driver failed to
restore the used index; this patch adds that.

Also, since the driver fails to reset the available and used indices
upon device reset, fix this here to avoid a regression caused by the
fact that the used index may not be zero upon device reset.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
v0 -> v1:
Clear indices upon device reset

 drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 88dde3455bfd..b5fe6d2ad22f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
u64 device_addr;
u64 driver_addr;
u16 avail_index;
+   u16 used_index;
bool ready;
struct vdpa_callback cb;
bool restore;
@@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
u32 virtq_id;
struct mlx5_vdpa_net *ndev;
u16 avail_idx;
+   u16 used_idx;
int fw_state;
 
/* keep last in the struct */
@@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
 
obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
mvq->avail_idx);
+   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
mvq->used_idx);
MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
 get_features_12_3(ndev->mvdev.actual_features));
vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
virtio_q_context);
@@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *m
 struct mlx5_virtq_attr {
u8 state;
u16 available_index;
+   u16 used_index;
 };
 
 static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq,
@@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqueu
memset(attr, 0, sizeof(*attr));
attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, 
hw_available_index);
+   attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, 
hw_used_index);
kfree(out);
return 0;
 
@@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct mlx5_vdpa_net 
*ndev)
}
 }
 
+static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
+{
+   int i;
+
+   for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
+   ndev->vqs[i].avail_idx = 0;
+   ndev->vqs[i].used_idx = 0;
+   }
+}
+
 /* TODO: cross-endian support */
 static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
 {
@@ -1610,6 +1625,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqu
return err;
 
ri->avail_index = attr.available_index;
+   ri->used_index = attr.used_index;
ri->ready = mvq->ready;
ri->num_ent = mvq->num_ent;
ri->desc_addr = mvq->desc_addr;
@@ -1654,6 +1670,7 @@ static void restore_channels_info(struct mlx5_vdpa_net 
*ndev)
continue;
 
mvq->avail_idx = ri->avail_index;
+   mvq->used_idx = ri->used_index;
mvq->ready = ri->ready;
mvq->num_ent = ri->num_ent;
mvq->desc_addr = ri->desc_addr;
@@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
if (!status) {
mlx5_vdpa_info(mvdev, "performing device reset\n");
teardown_driver(ndev);
+   clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(&ndev->mvdev);
ndev->mvdev.status = 0;
ndev->mvdev.mlx_features = 0;
-- 
2.29.2
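
The flow of the patch above can be modelled in user space roughly as
follows. This is only a sketch under simplifying assumptions: the
firmware object is a plain struct, there is no locking or error handling,
and every sketch_* name is hypothetical. The field names loosely follow
the ones in the diff.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_VQS 2

/* Stand-in for the firmware virtqueue object. */
struct sketch_hw_vq {
	uint16_t hw_available_index;
	uint16_t hw_used_index;
};

/* Driver-side virtqueue state. */
struct sketch_vq {
	uint16_t avail_idx;
	uint16_t used_idx;
};

/* State saved across a change of memory map. */
struct sketch_restore_info {
	uint16_t avail_index;
	uint16_t used_index;
};

/* Before teardown: read both indices back from the hardware object. */
void sketch_save_channel_info(struct sketch_restore_info *ri,
			      const struct sketch_hw_vq *hw)
{
	ri->avail_index = hw->hw_available_index;
	ri->used_index = hw->hw_used_index;
}

/* After the map change: copy the saved indices into the driver state. */
void sketch_restore_channel_info(struct sketch_vq *vq,
				 const struct sketch_restore_info *ri)
{
	vq->avail_idx = ri->avail_index;
	vq->used_idx = ri->used_index;
}

/* Creating the hardware object programs both indices from driver state. */
void sketch_create_virtqueue(struct sketch_hw_vq *hw,
			     const struct sketch_vq *vq)
{
	hw->hw_available_index = vq->avail_idx;
	hw->hw_used_index = vq->used_idx;
}

/* Device reset: both indices must start from zero again. */
void sketch_clear_virtqueues(struct sketch_vq *vqs, int n)
{
	int i;

	for (i = n - 1; i >= 0; i--) {
		vqs[i].avail_idx = 0;
		vqs[i].used_idx = 0;
	}
}
```

The point of the second half of the patch is visible here: without the
clear step, a queue created after reset would be programmed with a stale
used index.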



Re: [PATCH] vdpa/mlx5: Restore the hardware used index after change map

2021-02-03 Thread Eli Cohen
On Wed, Feb 03, 2021 at 12:33:26PM -0800, Si-Wei Liu wrote:
> On Tue, Feb 2, 2021 at 10:48 PM Eli Cohen  wrote:
> >
> > On Tue, Feb 02, 2021 at 09:14:02AM -0800, Si-Wei Liu wrote:
> > > On Tue, Feb 2, 2021 at 6:34 AM Eli Cohen  wrote:
> > > >
> > > > When a change of memory map occurs, the hardware resources are destroyed
> > > > and then re-created again with the new memory map. In such case, we need
> > > > to restore the hardware available and used indices. The driver failed to
> > > > restore the used index which is added here.
> > > >
> > > > Fixes 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > > devices")
> > > > Signed-off-by: Eli Cohen 
> > > > ---
> > > > > This patch is being sent again as a single patch that fixes hot
> > > > > memory addition to a QEMU process.
> > > >
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > > >  1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 88dde3455bfd..839f57c64a6f 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > > > u64 device_addr;
> > > > u64 driver_addr;
> > > > u16 avail_index;
> > > > +   u16 used_index;
> > > > bool ready;
> > > > struct vdpa_callback cb;
> > > > bool restore;
> > > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > > > u32 virtq_id;
> > > > struct mlx5_vdpa_net *ndev;
> > > > u16 avail_idx;
> > > > +   u16 used_idx;
> > > > int fw_state;
> > > >
> > > > /* keep last in the struct */
> > > > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net 
> > > > *ndev, struct mlx5_vdpa_virtque
> > > >
> > > > obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, 
> > > > obj_context);
> > > > MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
> > > > mvq->avail_idx);
> > > > +   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
> > > > mvq->used_idx);
> > >
> > > The saved indexes will apply to the new virtqueue object whenever it
> > > is created. In virtio spec, these indexes will reset back to zero when
> > > the virtio device is reset. But I don't see how it's done today. IOW,
> > > I don't see where avail_idx and used_idx get cleared from the mvq for
> > > device reset via set_status().
> > >
> >
> > Right, but this is not strictly related to this patch. I will post
> > another patch to fix this.
> 
> Better to post these two patches in a series. Or else it may cause VM
> reboot problem as that is where the device gets reset. The avail_index
> did not as the correct value will be written to by driver right after,
> but used_idx introduced by this patch is supplied by device hence this
> patch alone would introduce regression.
> 

Thinking it over, I think this should all be fixed in a single patch.
This fix alone introduces a regression, as you pointed out, and there's
no point in fixing it in another patch.

> >
> > BTW, can you describe a scenario that would cause a reset (through
> > calling set_status()) that happens after the VQ has been used?
> 
> You can try reboot the guest, that'll be the easy way to test.
> 
> -Siwei
> 
> >
> > > -Siwei
> > >
> > >
> > > > MLX5_SET(virtio_net_q_object, obj_context, 
> > > > queue_feature_bit_mask_12_3,
> > > >  get_features_12_3(ndev->mvdev.actual_features));
> > > > vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
> > > > virtio_q_context);
> > > > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net 
> > > > *ndev, struct mlx5_vdpa_virtqueue *m
> > > >  struct mlx5_virtq_attr {
> > > > u8 state;
> > > > u16 available_index;
> > > > +   u16 used_index;
> > > >  };
> > > >
> > > >  static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct 
> > > > mlx5_vdpa_virtqueue *mvq,
> > > > @@ -105

Re: [PATCH] vdpa/mlx5: Restore the hardware used index after change map

2021-02-03 Thread Eli Cohen
On Wed, Feb 03, 2021 at 12:33:26PM -0800, Si-Wei Liu wrote:
> On Tue, Feb 2, 2021 at 10:48 PM Eli Cohen  wrote:
> >
> > On Tue, Feb 02, 2021 at 09:14:02AM -0800, Si-Wei Liu wrote:
> > > On Tue, Feb 2, 2021 at 6:34 AM Eli Cohen  wrote:
> > > >
> > > > When a change of memory map occurs, the hardware resources are destroyed
> > > > and then re-created again with the new memory map. In such case, we need
> > > > to restore the hardware available and used indices. The driver failed to
> > > > restore the used index which is added here.
> > > >
> > > > Fixes 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > > devices")
> > > > Signed-off-by: Eli Cohen 
> > > > ---
> > > > > This patch is being sent again as a single patch that fixes hot
> > > > > memory addition to a QEMU process.
> > > >
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > > >  1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 88dde3455bfd..839f57c64a6f 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > > > u64 device_addr;
> > > > u64 driver_addr;
> > > > u16 avail_index;
> > > > +   u16 used_index;
> > > > bool ready;
> > > > struct vdpa_callback cb;
> > > > bool restore;
> > > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > > > u32 virtq_id;
> > > > struct mlx5_vdpa_net *ndev;
> > > > u16 avail_idx;
> > > > +   u16 used_idx;
> > > > int fw_state;
> > > >
> > > > /* keep last in the struct */
> > > > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net 
> > > > *ndev, struct mlx5_vdpa_virtque
> > > >
> > > > obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, 
> > > > obj_context);
> > > > MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
> > > > mvq->avail_idx);
> > > > +   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
> > > > mvq->used_idx);
> > >
> > > The saved indexes will apply to the new virtqueue object whenever it
> > > is created. In virtio spec, these indexes will reset back to zero when
> > > the virtio device is reset. But I don't see how it's done today. IOW,
> > > I don't see where avail_idx and used_idx get cleared from the mvq for
> > > device reset via set_status().
> > >
> >
> > Right, but this is not strictly related to this patch. I will post
> > another patch to fix this.
> 
> Better to post these two patches in a series. Or else it may cause VM
> reboot problem as that is where the device gets reset. The avail_index
> did not as the correct value will be written to by driver right after,
> but used_idx introduced by this patch is supplied by device hence this
> patch alone would introduce regression.
> 

OK, will do.

> >
> > BTW, can you describe a scenario that would cause a reset (through
> > calling set_status()) that happens after the VQ has been used?
> 
> You can try reboot the guest, that'll be the easy way to test.
> 

Thanks!

> -Siwei
> 
> >
> > > -Siwei
> > >
> > >
> > > > MLX5_SET(virtio_net_q_object, obj_context, 
> > > > queue_feature_bit_mask_12_3,
> > > >  get_features_12_3(ndev->mvdev.actual_features));
> > > > vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
> > > > virtio_q_context);
> > > > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net 
> > > > *ndev, struct mlx5_vdpa_virtqueue *m
> > > >  struct mlx5_virtq_attr {
> > > > u8 state;
> > > > u16 available_index;
> > > > +   u16 used_index;
> > > >  };
> > > >
> > > >  static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct 
> > > > mlx5_vdpa_virtqueue *mvq,
> > > > @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net 
> > > > *ndev, struct mlx5_vdpa_virtqueu
> > > > memset(attr, 0, sizeo

Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue

2021-02-03 Thread Eli Cohen
On Wed, Feb 03, 2021 at 03:19:40PM -0800, Si-Wei Liu wrote:
> On Tue, Feb 2, 2021 at 9:16 PM Jason Wang  wrote:
> >
> >
> > On 2021/2/3 上午1:54, Si-Wei Liu wrote:
> > > On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen  wrote:
> > >> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
> > >>> Thanks Eli and Jason for clarifications. See inline.
> > >>>
> > >>> On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen  wrote:
> > >>>> On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> > >>>>> On 2021/2/2 下午12:15, Si-Wei Liu wrote:
> > >>>>>> On Mon, Feb 1, 2021 at 7:13 PM Jason Wang  
> > >>>>>> wrote:
> > >>>>>>> On 2021/2/2 上午3:17, Si-Wei Liu wrote:
> > >>>>>>>> On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu 
> > >>>>>>>>  wrote:
> > >>>>>>>>> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen  wrote:
> > >>>>>>>>>> suspend_vq should only suspend the VQ on not save the current 
> > >>>>>>>>>> available
> > >>>>>>>>>> index. This is done when a change of map occurs when the driver 
> > >>>>>>>>>> calls
> > >>>>>>>>>> save_channel_info().
> > >>>>>>>>> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > >>>>>>>>> which doesn't save the available index as save_channel_info() 
> > >>>>>>>>> doesn't
> > >>>>>>>>> get called in that path at all. How does it handle the case that
> > >>>>>>>>> aget_vq_state() is called from userspace (e.g. QEMU) while the
> > >>>>>>>>> hardware VQ object was torn down, but userspace still wants to 
> > >>>>>>>>> access
> > >>>>>>>>> the queue index?
> > >>>>>>>>>
> > >>>>>>>>> Refer to 
> > >>>>>>>>> https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei@oracle.com/
> > >>>>>>>>>
> > >>>>>>>>> vhost VQ 0 ring restore failed: -1: Resource temporarily 
> > >>>>>>>>> unavailable (11)
> > >>>>>>>>> vhost VQ 1 ring restore failed: -1: Resource temporarily 
> > >>>>>>>>> unavailable (11)
> > >>>>>>>>>
> > >>>>>>>>> QEMU will complain with the above warning while VM is being 
> > >>>>>>>>> rebooted
> > >>>>>>>>> or shut down.
> > >>>>>>>>>
> > >>>>>>>>> Looks to me either the kernel driver should cover this 
> > >>>>>>>>> requirement, or
> > >>>>>>>>> the userspace has to bear the burden in saving the index and not 
> > >>>>>>>>> call
> > >>>>>>>>> into kernel if VQ is destroyed.
> > >>>>>>>> Actually, the userspace doesn't have the insights whether virt 
> > >>>>>>>> queue
> > >>>>>>>> will be destroyed if just changing the device status via 
> > >>>>>>>> set_status().
> > >>>>>>>> Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave 
> > >>>>>>>> like
> > >>>>>>>> so. Hence this still looks to me to be Mellanox specifics and
> > >>>>>>>> mlx5_vdpa implementation detail that shouldn't expose to userspace.
> > >>>>>>> So I think we can simply drop this patch?
> > >>>>>> Yep, I think so. To be honest I don't know why it has anything to do
> > >>>>>> with the memory hotplug issue.
> > >>>>>
> > > >>>>> Eli may know more, my understanding is that, during memory hotplug, 
> > >>>>> qemu
> > >>>>> need to propagate new memory mappings via set_map(). For mellanox, it 
> > >>>>> means
> > >>>>> it needs to rebuild memory keys, so the virtqueue needs to be 
> > >>>>> suspended.
> > >>

Re: [PATCH] vdpa/mlx5: Restore the hardware used index after change map

2021-02-02 Thread Eli Cohen
On Tue, Feb 02, 2021 at 09:14:02AM -0800, Si-Wei Liu wrote:
> On Tue, Feb 2, 2021 at 6:34 AM Eli Cohen  wrote:
> >
> > When a change of memory map occurs, the hardware resources are destroyed
> > and then re-created again with the new memory map. In such case, we need
> > to restore the hardware available and used indices. The driver failed to
> > restore the used index which is added here.
> >
> > Fixes 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > Signed-off-by: Eli Cohen 
> > ---
> > This patch is being sent again as a single patch that fixes hot memory
> > addition to a QEMU process.
> >
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 88dde3455bfd..839f57c64a6f 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > u64 device_addr;
> > u64 driver_addr;
> > u16 avail_index;
> > +   u16 used_index;
> > bool ready;
> > struct vdpa_callback cb;
> > bool restore;
> > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > u32 virtq_id;
> > struct mlx5_vdpa_net *ndev;
> > u16 avail_idx;
> > +   u16 used_idx;
> > int fw_state;
> >
> > /* keep last in the struct */
> > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
> > struct mlx5_vdpa_virtque
> >
> > obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
> > MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
> > mvq->avail_idx);
> > +   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
> > mvq->used_idx);
> 
> The saved indexes will apply to the new virtqueue object whenever it
> is created. In virtio spec, these indexes will reset back to zero when
> the virtio device is reset. But I don't see how it's done today. IOW,
> I don't see where avail_idx and used_idx get cleared from the mvq for
> device reset via set_status().
> 

Right, but this is not strictly related to this patch. I will post
another patch to fix this.

BTW, can you describe a scenario that would cause a reset (through
calling set_status()) that happens after the VQ has been used?

> -Siwei
> 
> 
> > MLX5_SET(virtio_net_q_object, obj_context, 
> > queue_feature_bit_mask_12_3,
> >  get_features_12_3(ndev->mvdev.actual_features));
> > vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
> > virtio_q_context);
> > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, 
> > struct mlx5_vdpa_virtqueue *m
> >  struct mlx5_virtq_attr {
> > u8 state;
> > u16 available_index;
> > +   u16 used_index;
> >  };
> >
> >  static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct 
> > mlx5_vdpa_virtqueue *mvq,
> > @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net 
> > *ndev, struct mlx5_vdpa_virtqueu
> > memset(attr, 0, sizeof(*attr));
> > attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
> > attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, 
> > hw_available_index);
> > +   attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, 
> > hw_used_index);
> > kfree(out);
> > return 0;
> >
> > @@ -1610,6 +1615,7 @@ static int save_channel_info(struct mlx5_vdpa_net 
> > *ndev, struct mlx5_vdpa_virtqu
> > return err;
> >
> > ri->avail_index = attr.available_index;
> > +   ri->used_index = attr.used_index;
> > ri->ready = mvq->ready;
> > ri->num_ent = mvq->num_ent;
> > ri->desc_addr = mvq->desc_addr;
> > @@ -1654,6 +1660,7 @@ static void restore_channels_info(struct 
> > mlx5_vdpa_net *ndev)
> > continue;
> >
> > mvq->avail_idx = ri->avail_index;
> > +   mvq->used_idx = ri->used_index;
> > mvq->ready = ri->ready;
> > mvq->num_ent = ri->num_ent;
> > mvq->desc_addr = ri->desc_addr;
> > --
> > 2.29.2
> >


[PATCH] vdpa/mlx5: Restore the hardware used index after change map

2021-02-02 Thread Eli Cohen
When a change of memory map occurs, the hardware resources are destroyed
and then re-created with the new memory map. In such a case, we need
to restore the hardware available and used indices. The driver failed to
restore the used index; this patch adds that.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
This patch is being sent again as a single patch that fixes hot memory
addition to a QEMU process.

 drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 88dde3455bfd..839f57c64a6f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
u64 device_addr;
u64 driver_addr;
u16 avail_index;
+   u16 used_index;
bool ready;
struct vdpa_callback cb;
bool restore;
@@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
u32 virtq_id;
struct mlx5_vdpa_net *ndev;
u16 avail_idx;
+   u16 used_idx;
int fw_state;
 
/* keep last in the struct */
@@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
 
obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
mvq->avail_idx);
+   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
mvq->used_idx);
MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
 get_features_12_3(ndev->mvdev.actual_features));
vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
virtio_q_context);
@@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *m
 struct mlx5_virtq_attr {
u8 state;
u16 available_index;
+   u16 used_index;
 };
 
 static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq,
@@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqueu
memset(attr, 0, sizeof(*attr));
attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, 
hw_available_index);
+   attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, 
hw_used_index);
kfree(out);
return 0;
 
@@ -1610,6 +1615,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqu
return err;
 
ri->avail_index = attr.available_index;
+   ri->used_index = attr.used_index;
ri->ready = mvq->ready;
ri->num_ent = mvq->num_ent;
ri->desc_addr = mvq->desc_addr;
@@ -1654,6 +1660,7 @@ static void restore_channels_info(struct mlx5_vdpa_net 
*ndev)
continue;
 
mvq->avail_idx = ri->avail_index;
+   mvq->used_idx = ri->used_index;
mvq->ready = ri->ready;
mvq->num_ent = ri->num_ent;
mvq->desc_addr = ri->desc_addr;
-- 
2.29.2



Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue

2021-02-02 Thread Eli Cohen
On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
> Thanks Eli and Jason for clarifications. See inline.
> 
> On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen  wrote:
> >
> > On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> > >
> > > On 2021/2/2 下午12:15, Si-Wei Liu wrote:
> > > > On Mon, Feb 1, 2021 at 7:13 PM Jason Wang  wrote:
> > > > >
> > > > > On 2021/2/2 上午3:17, Si-Wei Liu wrote:
> > > > > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu 
> > > > > >  wrote:
> > > > > > > On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen  wrote:
> > > > > > > > suspend_vq should only suspend the VQ on not save the current 
> > > > > > > > available
> > > > > > > > index. This is done when a change of map occurs when the driver 
> > > > > > > > calls
> > > > > > > > save_channel_info().
> > > > > > > Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > > > > > > which doesn't save the available index as save_channel_info() 
> > > > > > > doesn't
> > > > > > > get called in that path at all. How does it handle the case that
> > > > > > > aget_vq_state() is called from userspace (e.g. QEMU) while the
> > > > > > > hardware VQ object was torn down, but userspace still wants to 
> > > > > > > access
> > > > > > > the queue index?
> > > > > > >
> > > > > > > Refer to 
> > > > > > > https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei@oracle.com/
> > > > > > >
> > > > > > > vhost VQ 0 ring restore failed: -1: Resource temporarily 
> > > > > > > unavailable (11)
> > > > > > > vhost VQ 1 ring restore failed: -1: Resource temporarily 
> > > > > > > unavailable (11)
> > > > > > >
> > > > > > > QEMU will complain with the above warning while VM is being 
> > > > > > > rebooted
> > > > > > > or shut down.
> > > > > > >
> > > > > > > Looks to me either the kernel driver should cover this 
> > > > > > > requirement, or
> > > > > > > the userspace has to bear the burden in saving the index and not 
> > > > > > > call
> > > > > > > into kernel if VQ is destroyed.
> > > > > > Actually, the userspace doesn't have the insights whether virt queue
> > > > > > will be destroyed if just changing the device status via 
> > > > > > set_status().
> > > > > > Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave 
> > > > > > like
> > > > > > so. Hence this still looks to me to be Mellanox specifics and
> > > > > > mlx5_vdpa implementation detail that shouldn't expose to userspace.
> > > > >
> > > > > So I think we can simply drop this patch?
> > > > Yep, I think so. To be honest I don't know why it has anything to do
> > > > with the memory hotplug issue.
> > >
> > >
> > > > Eli may know more, my understanding is that, during memory hotplug, qemu
> > > need to propagate new memory mappings via set_map(). For mellanox, it 
> > > means
> > > it needs to rebuild memory keys, so the virtqueue needs to be suspended.
> > >
> >
> > I think Siwei was asking why the first patch was related to the hotplug
> > issue.
> 
> I was thinking how consistency is assured when saving/restoring this
> h/w avail_index against the one in the virtq memory, particularly in
> the region_add/.region_del() context (e.g. the hotplug case). Problem
> is I don't see explicit memory barrier when guest thread updates the
> avail_index, how does the device make sure the h/w avail_index is
> uptodate while guest may race with updating the virtq's avail_index in
> the mean while? Maybe I completely miss something obvious?

If you're asking about synchronization upon hot plug of memory, the
hardware always goes to read the available index from memory when a new
hardware object is associated with a virtqueue. You can argue then that
you don't need to restore the available index, and you may be right, but
this is the current interface to the firmware.


If you're asking in general how sync is assured when the guest updates
the available index, can you please

Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue

2021-02-01 Thread Eli Cohen
On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> 
> On 2021/2/2 下午12:15, Si-Wei Liu wrote:
> > On Mon, Feb 1, 2021 at 7:13 PM Jason Wang  wrote:
> > > 
> > > On 2021/2/2 上午3:17, Si-Wei Liu wrote:
> > > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu  
> > > > wrote:
> > > > > On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen  wrote:
> > > > > > suspend_vq should only suspend the VQ on not save the current 
> > > > > > available
> > > > > > index. This is done when a change of map occurs when the driver 
> > > > > > calls
> > > > > > save_channel_info().
> > > > > Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > > > > which doesn't save the available index as save_channel_info() doesn't
> > > > > get called in that path at all. How does it handle the case that
> > > > > aget_vq_state() is called from userspace (e.g. QEMU) while the
> > > > > hardware VQ object was torn down, but userspace still wants to access
> > > > > the queue index?
> > > > > 
> > > > > Refer to 
> > > > > https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei@oracle.com/
> > > > > 
> > > > > vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable 
> > > > > (11)
> > > > > vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable 
> > > > > (11)
> > > > > 
> > > > > QEMU will complain with the above warning while VM is being rebooted
> > > > > or shut down.
> > > > > 
> > > > > Looks to me either the kernel driver should cover this requirement, or
> > > > > the userspace has to bear the burden in saving the index and not call
> > > > > into kernel if VQ is destroyed.
> > > > Actually, the userspace doesn't have the insights whether virt queue
> > > > will be destroyed if just changing the device status via set_status().
> > > > Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave like
> > > > so. Hence this still looks to me to be Mellanox specifics and
> > > > mlx5_vdpa implementation detail that shouldn't expose to userspace.
> > > 
> > > So I think we can simply drop this patch?
> > Yep, I think so. To be honest I don't know why it has anything to do
> > with the memory hotplug issue.
> 
> 
> Eli may know more, my understanding is that, during memory hotplug, qemu
> need to propagate new memory mappings via set_map(). For mellanox, it means
> it needs to rebuild memory keys, so the virtqueue needs to be suspended.
> 

I think Siwei was asking why the first patch was related to the hotplug
issue.

But you're correct. When memory is added, I get a new memory map. This
requires me to build a new memory key object which covers the new memory
map. Since the virtqueue objects reference this memory key, I need
to destroy them and build new ones referencing the new memory key.
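
To make the dependency concrete, here is a small user-space model of that
sequence. It is a sketch only: the sketch_* names, the integer key ids and
the struct layout are hypothetical and do not correspond to real driver or
firmware interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_VQS 2

/* Memory key object covering the current memory map. */
struct sketch_mkey {
	uint32_t id;
	uint64_t map_size;
};

/* Virtqueue object; it references the key it was created with. */
struct sketch_hw_vq {
	int live;
	uint32_t mkey_id;
};

struct sketch_ndev {
	struct sketch_mkey mkey;
	struct sketch_hw_vq vqs[SKETCH_NUM_VQS];
	uint32_t next_mkey_id;
};

/* Build a fresh key for the given map; each key gets a new id. */
void sketch_create_mkey(struct sketch_ndev *ndev, uint64_t map_size)
{
	ndev->mkey.id = ++ndev->next_mkey_id;
	ndev->mkey.map_size = map_size;
}

/* Create all virtqueue objects against the current key. */
void sketch_setup_vqs(struct sketch_ndev *ndev)
{
	int i;

	for (i = 0; i < SKETCH_NUM_VQS; i++) {
		ndev->vqs[i].live = 1;
		ndev->vqs[i].mkey_id = ndev->mkey.id;
	}
}

/* Destroy all virtqueue objects, dropping their key references. */
void sketch_teardown_vqs(struct sketch_ndev *ndev)
{
	int i;

	for (i = 0; i < SKETCH_NUM_VQS; i++) {
		ndev->vqs[i].live = 0;
		ndev->vqs[i].mkey_id = 0;
	}
}

/* New memory map: queues go first, then the key, then new queues. */
void sketch_change_map(struct sketch_ndev *ndev, uint64_t new_map_size)
{
	sketch_teardown_vqs(ndev);
	sketch_create_mkey(ndev, new_map_size);
	sketch_setup_vqs(ndev);
}
```

The queue indices have to be saved before the teardown step and restored
before the setup step, which is what the second patch is about.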

> Thanks
> 
> 
> > 
> > -Siwei
> > 
> > > Thanks
> > > 
> > > 
> > > > > -Siwei
> > > > > 
> > > > > 
> > > > > > Signed-off-by: Eli Cohen 
> > > > > > ---
> > > > > >drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 
> > > > > >1 file changed, 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > index 88dde3455bfd..549ded074ff3 100644
> > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net 
> > > > > > *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > 
> > > > > >static void suspend_vq(struct mlx5_vdpa_net *ndev, struct 
> > > > > > mlx5_vdpa_virtqueue *mvq)
> > > > > >{
> > > > > > -   struct mlx5_virtq_attr attr;
> > > > > > -
> > > > > >   if (!mvq->initialized)
> > > > > >   return;
> > > > > > 
> > > > > > @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net 
> > > > > > *ndev, struct mlx5_vdpa_virtqueue *m
> > > > > > 
> > > > > >   if (modify_virtqueue(ndev, mvq, 
> > > > > > MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > > > > > >   mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend 
> > > > > > > failed\n");
> > > > > > > -
> > > > > > > -   if (query_virtqueue(ndev, mvq, &attr)) {
> > > > > > > -   mlx5_vdpa_warn(&ndev->mvdev, "failed to query 
> > > > > > > virtqueue\n");
> > > > > > -   return;
> > > > > > -   }
> > > > > > -   mvq->avail_idx = attr.available_index;
> > > > > >}
> > > > > > 
> > > > > >static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > > > > --
> > > > > > 2.29.2
> > > > > > 
> 


Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue

2021-02-01 Thread Eli Cohen
On Mon, Feb 01, 2021 at 08:15:29PM -0800, Si-Wei Liu wrote:
> On Mon, Feb 1, 2021 at 7:13 PM Jason Wang  wrote:
> >
> >
> > > On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu  
> > > wrote:
> > >> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen  wrote:
> > >>> suspend_vq should only suspend the VQ on not save the current available
> > >>> index. This is done when a change of map occurs when the driver calls
> > >>> save_channel_info().
> > >> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > >> which doesn't save the available index as save_channel_info() doesn't
> > >> get called in that path at all. How does it handle the case that
> > >> aget_vq_state() is called from userspace (e.g. QEMU) while the
> > >> hardware VQ object was torn down, but userspace still wants to access
> > >> the queue index?
> > >>
> > >> Refer to 
> > >> https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei@oracle.com/
> > >>
> > >> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > >> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > >>
> > >> QEMU will complain with the above warning while VM is being rebooted
> > >> or shut down.
> > >>
> > >> Looks to me either the kernel driver should cover this requirement, or
> > >> the userspace has to bear the burden in saving the index and not call
> > >> into kernel if VQ is destroyed.
> > > Actually, the userspace doesn't have the insights whether virt queue
> > > will be destroyed if just changing the device status via set_status().
> > > Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave like
> > > so. Hence this still looks to me to be Mellanox specifics and
> > > mlx5_vdpa implementation detail that shouldn't expose to userspace.
> >
> >
> > So I think we can simply drop this patch?
> 
> Yep, I think so. To be honest I don't know why it has anything to do
> with the memory hotplug issue.

No relation. That's why I put them in two different patches. Only the
second one is the fix as I stated in the cover letter.

Anyway, let's just take the second patch.

Michael, do you need me to send PATCH 2 again as a single patch or can
you just take it?


> 
> -Siwei
> 
> >
> > Thanks
> >
> >
> > >> -Siwei
> > >>
> > >>
> > >>> Signed-off-by: Eli Cohen 
> > >>> ---
> > >>>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 
> > >>>   1 file changed, 8 deletions(-)
> > >>>
> > >>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > >>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >>> index 88dde3455bfd..549ded074ff3 100644
> > >>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >>> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, 
> > >>> struct mlx5_vdpa_virtqueue *mvq)
> > >>>
> > >>>   static void suspend_vq(struct mlx5_vdpa_net *ndev, struct 
> > >>> mlx5_vdpa_virtqueue *mvq)
> > >>>   {
> > >>> -   struct mlx5_virtq_attr attr;
> > >>> -
> > >>>  if (!mvq->initialized)
> > >>>  return;
> > >>>
> > >>> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net 
> > >>> *ndev, struct mlx5_vdpa_virtqueue *m
> > >>>
> > >>>  if (modify_virtqueue(ndev, mvq, 
> > >>> MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > >>>  mlx5_vdpa_warn(>mvdev, "modify to suspend 
> > >>> failed\n");
> > >>> -
> > >>> -   if (query_virtqueue(ndev, mvq, )) {
> > >>> -   mlx5_vdpa_warn(>mvdev, "failed to query 
> > >>> virtqueue\n");
> > >>> -   return;
> > >>> -   }
> > >>> -   mvq->avail_idx = attr.available_index;
> > >>>   }
> > >>>
> > >>>   static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > >>> --
> > >>> 2.29.2
> > >>>
> >


Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue

2021-02-01 Thread Eli Cohen
On Tue, Feb 02, 2021 at 11:12:51AM +0800, Jason Wang wrote:
> 
> On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu  wrote:
> > > On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen  wrote:
> > > > suspend_vq should only suspend the VQ on not save the current available
> > > > index. This is done when a change of map occurs when the driver calls
> > > > save_channel_info().
> > > Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > > which doesn't save the available index as save_channel_info() doesn't
> > > get called in that path at all. How does it handle the case that
> > > aget_vq_state() is called from userspace (e.g. QEMU) while the
> > > hardware VQ object was torn down, but userspace still wants to access
> > > the queue index?
> > > 
> > > Refer to 
> > > https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei@oracle.com/
> > > 
> > > vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > > vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > > 
> > > QEMU will complain with the above warning while VM is being rebooted
> > > or shut down.
> > > 
> > > Looks to me either the kernel driver should cover this requirement, or
> > > the userspace has to bear the burden in saving the index and not call
> > > into kernel if VQ is destroyed.
> > Actually, the userspace doesn't have the insights whether virt queue
> > will be destroyed if just changing the device status via set_status().
> > Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave like
> > so. Hence this still looks to me to be Mellanox specifics and
> > mlx5_vdpa implementation detail that shouldn't expose to userspace.
> 
> 
> So I think we can simply drop this patch?
> 

Yes, I agree. Let's just avoid it.

> Thanks
> 
> 
> > > -Siwei
> > > 
> > > 
> > > > Signed-off-by: Eli Cohen 
> > > > ---
> > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 
> > > >   1 file changed, 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 88dde3455bfd..549ded074ff3 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, 
> > > > struct mlx5_vdpa_virtqueue *mvq)
> > > > 
> > > >   static void suspend_vq(struct mlx5_vdpa_net *ndev, struct 
> > > > mlx5_vdpa_virtqueue *mvq)
> > > >   {
> > > > -   struct mlx5_virtq_attr attr;
> > > > -
> > > >  if (!mvq->initialized)
> > > >  return;
> > > > 
> > > > @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net 
> > > > *ndev, struct mlx5_vdpa_virtqueue *m
> > > > 
> > > >  if (modify_virtqueue(ndev, mvq, 
> > > > MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > > >  mlx5_vdpa_warn(>mvdev, "modify to suspend 
> > > > failed\n");
> > > > -
> > > > -   if (query_virtqueue(ndev, mvq, )) {
> > > > -   mlx5_vdpa_warn(>mvdev, "failed to query 
> > > > virtqueue\n");
> > > > -   return;
> > > > -   }
> > > > -   mvq->avail_idx = attr.available_index;
> > > >   }
> > > > 
> > > >   static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > > --
> > > > 2.29.2
> > > > 
> 


Re: [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map

2021-01-31 Thread Eli Cohen
On Mon, Feb 01, 2021 at 02:00:35PM +0800, Jason Wang wrote:
> 
> On 2021/2/1 1:52 PM, Eli Cohen wrote:
> > On Mon, Feb 01, 2021 at 11:36:23AM +0800, Jason Wang wrote:
> > > On 2021/2/1 2:55 AM, Eli Cohen wrote:
> > > > On Fri, Jan 29, 2021 at 11:49:45AM +0800, Jason Wang wrote:
> > > > > On 2021/1/28 9:41 PM, Eli Cohen wrote:
> > > > > > When a change of memory map occurs, the hardware resources are 
> > > > > > destroyed
> > > > > > and then re-created again with the new memory map. In such case, we 
> > > > > > need
> > > > > > to restore the hardware available and used indices. The driver 
> > > > > > failed to
> > > > > > restore the used index which is added here.
> > > > > > 
> > > > > > Fixes 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > > > > devices")
> > > > > > Signed-off-by: Eli Cohen 
> > > > > A question. Does this mean after a vq is suspended, the hw used index 
> > > > > is not
> > > > > equal to vq used index?
> > > > Surely there is just one "Used index" for a VQ. What I was trying to say
> > > > is that after the VQ is suspended, I read the used index by querying the
> > > > hardware. The read result is the used index that the hardware wrote to
> > > > memory.
> > > 
> > > Just to make sure I understand here. So it looks to me we had two index. 
> > > The
> > > first is the used index which is stored in the memory/virtqueue, the 
> > > second
> > > is the one that is stored by the device.
> > > 
> > It is the structures defined in the virtio spec in 2.6.6 for the
> > available ring and 2.6.8 for the used ring. As you know these the
> > available ring is written to by the driver and read by the device. The
> > opposite happens for the used index.
> 
> 
> Right, so for used index it was wrote by device. And the device should have
> an internal used index value that is used to write to the used ring. And the
> code is used to sync the device internal used index if I understand this
> correctly.
> 
> 
> > The reason I need to restore the last known indices is for the new
> > hardware objects to sync on the last state and take over from there.
> 
> 
> Right, after the vq suspending, the questions are:
> 
> 1) is hardware internal used index might not be the same with the used index
> in the virtqueue?
> 

Generally the answer is no because the hardware is the only one writing
it. New objects start up with the initial value configured to them upon
creation. This value was zero before this change.
You could argue that since the hardware has access to virtqueue memory,
it could just read the value from there but it does not.

> or
> 
> 2) can we simply sync the virtqueue's used index to the hardware's used
> index?
> 
Theoretically it could be done but that's not how the hardware works.
One reason is that it is not supposed to read from that area. But it is
really a hardware implementation detail.
> > 
> > > >After the I create the new hardware object, I need to tell it
> > > > what is the used index (and the available index) as a way to sync it
> > > > with the existing VQ.
> > > 
> > > For avail index I understand that the hardware index is not synced with 
> > > the
> > > avail index stored in the memory/virtqueue. The question is used index, if
> > > the hardware one is not synced with the one in the virtqueue. It means 
> > > after
> > > vq is suspended,  some requests is not completed by the hardware (e.g the
> > > buffer were not put to used ring).
> > > 
> > > This may have implications to live migration, it means those 
> > > unaccomplished
> > > requests needs to be migrated to the destination and resubmitted to the
> > > device. This looks not easy.
> > > 
> > > Thanks
> > > 
> > > 
> > > > This sync is especially important when a change of map occurs while the
> > > > VQ was already used (hence the indices are likely to be non zero). This
> > > > can be triggered by hot adding memory after the VQs have been used.
> > > > 
> > > > > Thanks
> > > > > 
> > > > > 
> > > > > > ---
> > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > > > > > 1 file changed, 7 insertions(+)
> > &

Re: [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map

2021-01-31 Thread Eli Cohen
On Mon, Feb 01, 2021 at 11:36:23AM +0800, Jason Wang wrote:
> 
> On 2021/2/1 2:55 AM, Eli Cohen wrote:
> > On Fri, Jan 29, 2021 at 11:49:45AM +0800, Jason Wang wrote:
> > > On 2021/1/28 9:41 PM, Eli Cohen wrote:
> > > > When a change of memory map occurs, the hardware resources are destroyed
> > > > and then re-created again with the new memory map. In such case, we need
> > > > to restore the hardware available and used indices. The driver failed to
> > > > restore the used index which is added here.
> > > > 
> > > > Fixes 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > > devices")
> > > > Signed-off-by: Eli Cohen 
> > > 
> > > A question. Does this mean after a vq is suspended, the hw used index is 
> > > not
> > > equal to vq used index?
> > Surely there is just one "Used index" for a VQ. What I was trying to say
> > is that after the VQ is suspended, I read the used index by querying the
> > hardware. The read result is the used index that the hardware wrote to
> > memory.
> 
> 
> Just to make sure I understand here. So it looks to me we had two index. The
> first is the used index which is stored in the memory/virtqueue, the second
> is the one that is stored by the device.
> 

These are the structures defined in the virtio spec, in section 2.6.6 for
the available ring and section 2.6.8 for the used ring. As you know, the
available ring is written to by the driver and read by the device. The
opposite happens for the used index.
The reason I need to restore the last known indices is for the new
hardware objects to sync on the last state and take over from there.
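
For readers who do not have the spec at hand, the two split-virtqueue
structures referred to above look roughly like this. It is a sketch only:
the toy_ prefix is hypothetical, the fields are little-endian on the wire
(byte order is ignored here), and the optional trailing event fields are
left out. The helper toy_pending() is an added illustration of how the
free-running 16-bit indices are compared, not something taken from the
driver.

```c
#include <stdint.h>

/* Available ring: written by the driver, read by the device. */
struct toy_vring_avail {
	uint16_t flags;
	uint16_t idx;		/* next slot the driver will fill, free-running */
	uint16_t ring[];	/* descriptor chain head indices */
};

/* One completed buffer reported by the device. */
struct toy_vring_used_elem {
	uint32_t id;		/* head index of the completed descriptor chain */
	uint32_t len;		/* bytes written into the buffer */
};

/* Used ring: written by the device, read by the driver. */
struct toy_vring_used {
	uint16_t flags;
	uint16_t idx;		/* next slot the device will fill, free-running */
	struct toy_vring_used_elem ring[];
};

/*
 * Number of entries published since 'last_seen'. Both values are
 * free-running 16-bit counters, so the subtraction is done modulo 2^16
 * and stays correct across wrap-around.
 */
uint16_t toy_pending(uint16_t idx, uint16_t last_seen)
{
	return (uint16_t)(idx - last_seen);
}
```

Because the indices only ever move forward and wrap at 16 bits, a new
hardware object that starts from the wrong index has no way to tell which
entries were already processed, which is why the last known values have to
be handed over.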

> 
> >   After the I create the new hardware object, I need to tell it
> > what is the used index (and the available index) as a way to sync it
> > with the existing VQ.
> 
> 
> For avail index I understand that the hardware index is not synced with the
> avail index stored in the memory/virtqueue. The question is used index, if
> the hardware one is not synced with the one in the virtqueue. It means after
> vq is suspended,  some requests is not completed by the hardware (e.g the
> buffer were not put to used ring).
> 
> This may have implications to live migration, it means those unaccomplished
> requests needs to be migrated to the destination and resubmitted to the
> device. This looks not easy.
> 
> Thanks
> 
> 
> > 
> > This sync is especially important when a change of map occurs while the
> > VQ was already used (hence the indices are likely to be non zero). This
> > can be triggered by hot adding memory after the VQs have been used.
> > 
> > > Thanks
> > > 
> > > 
> > > > ---
> > > >drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > > >1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 549ded074ff3..3fc8588cecae 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > > > u64 device_addr;
> > > > u64 driver_addr;
> > > > u16 avail_index;
> > > > +   u16 used_index;
> > > > bool ready;
> > > > struct vdpa_callback cb;
> > > > bool restore;
> > > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > > > u32 virtq_id;
> > > > struct mlx5_vdpa_net *ndev;
> > > > u16 avail_idx;
> > > > +   u16 used_idx;
> > > > int fw_state;
> > > > /* keep last in the struct */
> > > > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net 
> > > > *ndev, struct mlx5_vdpa_virtque
> > > > obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, 
> > > > obj_context);
> > > > MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
> > > > mvq->avail_idx);
> > > > +   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
> > > > mvq->used_idx);
> > > > MLX5_SET(virtio_net_q_object, obj_context, 
> > > > queue_feature_bit_mask_12_3,
> > > >  get_features_12_3(ndev->mvdev.actual_features));
> > > > vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
> > > > virtio_q_context);

Re: [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map

2021-01-31 Thread Eli Cohen
On Fri, Jan 29, 2021 at 11:49:45AM +0800, Jason Wang wrote:
> 
> On 2021/1/28 9:41 PM, Eli Cohen wrote:
> > When a change of memory map occurs, the hardware resources are destroyed
> > and then re-created again with the new memory map. In such case, we need
> > to restore the hardware available and used indices. The driver failed to
> > restore the used index which is added here.
> > 
> > Fixes 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > Signed-off-by: Eli Cohen 
> 
> 
> A question. Does this mean after a vq is suspended, the hw used index is not
> equal to vq used index?

Surely there is just one "Used index" for a VQ. What I was trying to say
is that after the VQ is suspended, I read the used index by querying the
hardware. The read result is the used index that the hardware wrote to
memory. After I create the new hardware object, I need to tell it
what the used index (and the available index) is, as a way to sync it
with the existing VQ.

This sync is especially important when a change of map occurs while the
VQ was already used (hence the indices are likely to be non zero). This
can be triggered by hot adding memory after the VQs have been used. 
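
The save and restore sequence can be sketched in user space as follows.
This is a toy model, assuming hypothetical names (toy_hw_vq,
toy_restore_info and the toy_* functions); only the field names
hw_available_index and hw_used_index are borrowed from the patch. The
"without_used" variant models the behaviour before the fix, where a new
object started with a used index of zero.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hardware virtqueue object. */
struct toy_hw_vq {
	uint16_t hw_available_index;
	uint16_t hw_used_index;
};

/* Hypothetical stand-in for the state saved across a map change. */
struct toy_restore_info {
	uint16_t avail_index;
	uint16_t used_index;
};

/* Models querying the suspended object for both indices. */
void toy_save(const struct toy_hw_vq *hw, struct toy_restore_info *ri)
{
	ri->avail_index = hw->hw_available_index;
	ri->used_index = hw->hw_used_index;
}

/* Models creating a new object with explicit initial indices. */
void toy_create(struct toy_hw_vq *hw, uint16_t avail, uint16_t used)
{
	hw->hw_available_index = avail;
	hw->hw_used_index = used;
}

/* Before the fix: only the available index survives re-creation. */
void toy_recreate_without_used(struct toy_hw_vq *hw)
{
	struct toy_restore_info ri;

	toy_save(hw, &ri);
	toy_create(hw, ri.avail_index, 0);
}

/* With the fix: both indices are carried over to the new object. */
void toy_recreate_with_used(struct toy_hw_vq *hw)
{
	struct toy_restore_info ri;

	toy_save(hw, &ri);
	toy_create(hw, ri.avail_index, ri.used_index);
}
```

With a queue that has already been used (say available index 10 and used
index 7), the first variant leaves the new object at used index 0 while the
ring in memory is at 7; the second variant keeps the two in step.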

> 
> Thanks
> 
> 
> > ---
> >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 549ded074ff3..3fc8588cecae 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > u64 device_addr;
> > u64 driver_addr;
> > u16 avail_index;
> > +   u16 used_index;
> > bool ready;
> > struct vdpa_callback cb;
> > bool restore;
> > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > u32 virtq_id;
> > struct mlx5_vdpa_net *ndev;
> > u16 avail_idx;
> > +   u16 used_idx;
> > int fw_state;
> > /* keep last in the struct */
> > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
> > struct mlx5_vdpa_virtque
> > obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
> > MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
> > mvq->avail_idx);
> > +   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
> > mvq->used_idx);
> > MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
> >  get_features_12_3(ndev->mvdev.actual_features));
> > vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
> > virtio_q_context);
> > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, 
> > struct mlx5_vdpa_virtqueue *m
> >   struct mlx5_virtq_attr {
> > u8 state;
> > u16 available_index;
> > +   u16 used_index;
> >   };
> >   static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct 
> > mlx5_vdpa_virtqueue *mvq,
> > @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net 
> > *ndev, struct mlx5_vdpa_virtqueu
> > memset(attr, 0, sizeof(*attr));
> > attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
> > attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, 
> > hw_available_index);
> > +   attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, 
> > hw_used_index);
> > kfree(out);
> > return 0;
> > @@ -1602,6 +1607,7 @@ static int save_channel_info(struct mlx5_vdpa_net 
> > *ndev, struct mlx5_vdpa_virtqu
> > return err;
> > ri->avail_index = attr.available_index;
> > +   ri->used_index = attr.used_index;
> > ri->ready = mvq->ready;
> > ri->num_ent = mvq->num_ent;
> > ri->desc_addr = mvq->desc_addr;
> > @@ -1646,6 +1652,7 @@ static void restore_channels_info(struct 
> > mlx5_vdpa_net *ndev)
> > continue;
> > mvq->avail_idx = ri->avail_index;
> > +   mvq->used_idx = ri->used_index;
> > mvq->ready = ri->ready;
> > mvq->num_ent = ri->num_ent;
> > mvq->desc_addr = ri->desc_addr;
> 


[PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map

2021-01-28 Thread Eli Cohen
When a change of memory map occurs, the hardware resources are destroyed
and then re-created again with the new memory map. In such case, we need
to restore the hardware available and used indices. The driver failed to
restore the used index which is added here.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 549ded074ff3..3fc8588cecae 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
u64 device_addr;
u64 driver_addr;
u16 avail_index;
+   u16 used_index;
bool ready;
struct vdpa_callback cb;
bool restore;
@@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
u32 virtq_id;
struct mlx5_vdpa_net *ndev;
u16 avail_idx;
+   u16 used_idx;
int fw_state;
 
/* keep last in the struct */
@@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
 
obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
mvq->avail_idx);
+   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
mvq->used_idx);
MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
 get_features_12_3(ndev->mvdev.actual_features));
vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
virtio_q_context);
@@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *m
 struct mlx5_virtq_attr {
u8 state;
u16 available_index;
+   u16 used_index;
 };
 
 static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq,
@@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqueu
memset(attr, 0, sizeof(*attr));
attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, 
hw_available_index);
+   attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, 
hw_used_index);
kfree(out);
return 0;
 
@@ -1602,6 +1607,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqu
return err;
 
ri->avail_index = attr.available_index;
+   ri->used_index = attr.used_index;
ri->ready = mvq->ready;
ri->num_ent = mvq->num_ent;
ri->desc_addr = mvq->desc_addr;
@@ -1646,6 +1652,7 @@ static void restore_channels_info(struct mlx5_vdpa_net 
*ndev)
continue;
 
mvq->avail_idx = ri->avail_index;
+   mvq->used_idx = ri->used_index;
mvq->ready = ri->ready;
mvq->num_ent = ri->num_ent;
mvq->desc_addr = ri->desc_addr;
-- 
2.29.2



[PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue

2021-01-28 Thread Eli Cohen
suspend_vq should only suspend the VQ and not save the current available
index. This is done when a change of map occurs when the driver calls
save_channel_info().

Signed-off-by: Eli Cohen 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 88dde3455bfd..549ded074ff3 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
 
 static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
 {
-   struct mlx5_virtq_attr attr;
-
if (!mvq->initialized)
return;
 
@@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqueue *m
 
if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
-
-   if (query_virtqueue(ndev, mvq, &attr)) {
-   mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
-   return;
-   }
-   mvq->avail_idx = attr.available_index;
 }
 
 static void suspend_vqs(struct mlx5_vdpa_net *ndev)
-- 
2.29.2



[PATCH 0/2] Fix failure to hot add memory

2021-01-28 Thread Eli Cohen
Hi Michael,
The following two patches fix a failure to update the hardware with the
updated used index. This causes hot adding memory to the guest to fail,
since hot adding memory results in a memory map update followed by a
teardown and re-creation of the resources.

The first patch just removes unnecessary code. The second one is the
actual fix.

Eli Cohen (2):
  vdpa/mlx5: Avoid unnecessary query virtqueue
  vdpa/mlx5: Restore the hardware used index after change map

 drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

-- 
2.29.2



Re: [PATCH v1] vdpa/mlx5: Fix memory key MTT population

2021-01-20 Thread Eli Cohen
On Wed, Jan 20, 2021 at 03:52:00AM -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 20, 2021 at 10:11:54AM +0200, Eli Cohen wrote:
> > On Wed, Jan 20, 2021 at 02:57:05AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Jan 20, 2021 at 07:36:19AM +0200, Eli Cohen wrote:
> > > > On Fri, Jan 08, 2021 at 04:38:55PM +0800, Jason Wang wrote:
> > > > 
> > > > Hi Michael,
> > > > this patch is a fix. Are you going to merge it?
> > > 
> > > yes - in the next pull request.
> > > 
> > 
> > Great thanks.
> > Can you send the path to your git tree where you keep the patches you
> > intend to merge?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
> 
> Note I often rebase it (e.g. just did).
> 

Great, thanks!

> -- 
> MST
> 


Re: [PATCH v1] vdpa/mlx5: Fix memory key MTT population

2021-01-20 Thread Eli Cohen
On Wed, Jan 20, 2021 at 02:57:05AM -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 20, 2021 at 07:36:19AM +0200, Eli Cohen wrote:
> > On Fri, Jan 08, 2021 at 04:38:55PM +0800, Jason Wang wrote:
> > 
> > Hi Michael,
> > this patch is a fix. Are you going to merge it?
> 
> yes - in the next pull request.
> 

Great thanks.
Can you send the path to your git tree where you keep the patches you
intend to merge?

> > > 
> > > On 2021/1/7 3:18 PM, Eli Cohen wrote:
> > > > map_direct_mr() assumed that the number of scatter/gather entries
> > > > returned by dma_map_sg_attrs() was equal to the number of segments in
> > > > the sgl list. This led to wrong population of the mkey object. Fix this
> > > > by properly referring to the returned value.
> > > > 
> > > > The hardware expects each MTT entry to contain the DMA address of a
> > > > contiguous block of memory of size (1 << mr->log_size) bytes.
> > > > dma_map_sg_attrs() can coalesce several sg entries into a single
> > > > scatter/gather entry of contiguous DMA range so we need to scan the list
> > > > and refer to the size of each s/g entry.
> > > > 
> > > > In addition, get rid of fill_sg() which effect is overwritten by
> > > > populate_mtts().
> > > > 
> > > > Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
> > > > Signed-off-by: Eli Cohen 
> > > > ---
> > > > V0->V1:
> > > > 1. Fix typos
> > > > 2. Improve changelog
> > > 
> > > 
> > > Acked-by: Jason Wang 
> > > 
> > > 
> > > > 
> > > >   drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
> > > >   drivers/vdpa/mlx5/core/mr.c| 28 
> > > >   2 files changed, 13 insertions(+), 16 deletions(-)
> > > > 
> > > > diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
> > > > b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > index 5c92a576edae..08f742fd2409 100644
> > > > --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > @@ -15,6 +15,7 @@ struct mlx5_vdpa_direct_mr {
> > > > struct sg_table sg_head;
> > > > int log_size;
> > > > int nsg;
> > > > +   int nent;
> > > > struct list_head list;
> > > > u64 offset;
> > > >   };
> > > > diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> > > > index 4b6195666c58..d300f799efcd 100644
> > > > --- a/drivers/vdpa/mlx5/core/mr.c
> > > > +++ b/drivers/vdpa/mlx5/core/mr.c
> > > > @@ -25,17 +25,6 @@ static int get_octo_len(u64 len, int page_shift)
> > > > return (npages + 1) / 2;
> > > >   }
> > > > -static void fill_sg(struct mlx5_vdpa_direct_mr *mr, void *in)
> > > > -{
> > > > -   struct scatterlist *sg;
> > > > -   __be64 *pas;
> > > > -   int i;
> > > > -
> > > > -   pas = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
> > > > -   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
> > > > -   (*pas) = cpu_to_be64(sg_dma_address(sg));
> > > > -}
> > > > -
> > > >   static void mlx5_set_access_mode(void *mkc, int mode)
> > > >   {
> > > > MLX5_SET(mkc, mkc, access_mode_1_0, mode & 0x3);
> > > > @@ -45,10 +34,18 @@ static void mlx5_set_access_mode(void *mkc, int 
> > > > mode)
> > > >   static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
> > > >   {
> > > > struct scatterlist *sg;
> > > > +   int nsg = mr->nsg;
> > > > +   u64 dma_addr;
> > > > +   u64 dma_len;
> > > > +   int j = 0;
> > > > int i;
> > > > -   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
> > > > -   mtt[i] = cpu_to_be64(sg_dma_address(sg));
> > > > +   for_each_sg(mr->sg_head.sgl, sg, mr->nent, i) {
> > > > +   for (dma_addr = sg_dma_address(sg), dma_len = 
> > > > sg_dma_len(sg);
> > > > +nsg && dma_len;
> > > > +nsg--, dma_addr += BIT(mr->log_size), dma_len -= 
> > > > BIT(mr->log_size))
> > > > + 

Re: [PATCH v1] vdpa/mlx5: Fix memory key MTT population

2021-01-19 Thread Eli Cohen
On Fri, Jan 08, 2021 at 04:38:55PM +0800, Jason Wang wrote:

Hi Michael,
this patch is a fix. Are you going to merge it?

> 
> On 2021/1/7 3:18 PM, Eli Cohen wrote:
> > map_direct_mr() assumed that the number of scatter/gather entries
> > returned by dma_map_sg_attrs() was equal to the number of segments in
> > the sgl list. This led to wrong population of the mkey object. Fix this
> > by properly referring to the returned value.
> > 
> > The hardware expects each MTT entry to contain the DMA address of a
> > contiguous block of memory of size (1 << mr->log_size) bytes.
> > dma_map_sg_attrs() can coalesce several sg entries into a single
> > scatter/gather entry of contiguous DMA range so we need to scan the list
> > and refer to the size of each s/g entry.
> > 
> > In addition, get rid of fill_sg() which effect is overwritten by
> > populate_mtts().
> > 
> > Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
> > Signed-off-by: Eli Cohen 
> > ---
> > V0->V1:
> > 1. Fix typos
> > 2. Improve changelog
> 
> 
> Acked-by: Jason Wang 
> 
> 
> > 
> >   drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
> >   drivers/vdpa/mlx5/core/mr.c| 28 
> >   2 files changed, 13 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
> > b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > index 5c92a576edae..08f742fd2409 100644
> > --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > @@ -15,6 +15,7 @@ struct mlx5_vdpa_direct_mr {
> > struct sg_table sg_head;
> > int log_size;
> > int nsg;
> > +   int nent;
> > struct list_head list;
> > u64 offset;
> >   };
> > diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> > index 4b6195666c58..d300f799efcd 100644
> > --- a/drivers/vdpa/mlx5/core/mr.c
> > +++ b/drivers/vdpa/mlx5/core/mr.c
> > @@ -25,17 +25,6 @@ static int get_octo_len(u64 len, int page_shift)
> > return (npages + 1) / 2;
> >   }
> > -static void fill_sg(struct mlx5_vdpa_direct_mr *mr, void *in)
> > -{
> > -   struct scatterlist *sg;
> > -   __be64 *pas;
> > -   int i;
> > -
> > -   pas = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
> > -   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
> > -   (*pas) = cpu_to_be64(sg_dma_address(sg));
> > -}
> > -
> >   static void mlx5_set_access_mode(void *mkc, int mode)
> >   {
> > MLX5_SET(mkc, mkc, access_mode_1_0, mode & 0x3);
> > @@ -45,10 +34,18 @@ static void mlx5_set_access_mode(void *mkc, int mode)
> >   static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
> >   {
> > struct scatterlist *sg;
> > +   int nsg = mr->nsg;
> > +   u64 dma_addr;
> > +   u64 dma_len;
> > +   int j = 0;
> > int i;
> > -   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
> > -   mtt[i] = cpu_to_be64(sg_dma_address(sg));
> > +   for_each_sg(mr->sg_head.sgl, sg, mr->nent, i) {
> > +   for (dma_addr = sg_dma_address(sg), dma_len = sg_dma_len(sg);
> > +nsg && dma_len;
> > +nsg--, dma_addr += BIT(mr->log_size), dma_len -= 
> > BIT(mr->log_size))
> > +   mtt[j++] = cpu_to_be64(dma_addr);
> > +   }
> >   }
> >   static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct 
> > mlx5_vdpa_direct_mr *mr)
> > @@ -64,7 +61,6 @@ static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, 
> > struct mlx5_vdpa_direct
> > return -ENOMEM;
> > MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
> > -   fill_sg(mr, in);
> > mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
> > MLX5_SET(mkc, mkc, lw, !!(mr->perm & VHOST_MAP_WO));
> > MLX5_SET(mkc, mkc, lr, !!(mr->perm & VHOST_MAP_RO));
> > @@ -276,8 +272,8 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, 
> > struct mlx5_vdpa_direct_mr
> >   done:
> > mr->log_size = log_entity_size;
> > mr->nsg = nsg;
> > -   err = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, 
> > DMA_BIDIRECTIONAL, 0);
> > -   if (!err)
> > +   mr->nent = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, 
> > DMA_BIDIRECTIONAL, 0);
> > +   if (!mr->nent)
> > goto err_map;
> > err = create_direct_mr(mvdev, mr);
> 


Re: Change eats memory on my server

2021-01-18 Thread Eli Cohen
On Mon, Jan 18, 2021 at 10:30:56AM +0100, Thomas Zimmermann wrote:
> Hi
> 
> Am 18.01.21 um 10:13 schrieb Eli Cohen:
> > On Mon, Jan 18, 2021 at 08:54:07AM +0100, Thomas Zimmermann wrote:
> > > Hi
> > > 
> > > Am 18.01.21 um 08:43 schrieb Christian König:
> > > > Hi Eli,
> > > > 
> > > > have you already tried using kmemleak?
> > > > 
> > > > This sounds like a leak of memory allocated using kmalloc(), so kmemleak
> > > > should be able to catch it.
> > > 
> > > I have an idea what happens here. When the refcount is 0 in kmap, a new 
> > > page
> > > mapping for the BO is being established. But VRAM helpers unmap the 
> > > previous
> > > pages only on BO moves or frees; not in kunmap. So the old mapping might
> > > still be around. I'll send out a test patch later today.
> > > 
> > 
> > > Great! Looking forward to testing it.
> 
> Here's the patch against the latest DRM tree. v5.11-rc3 should work as well.
> 
> I was able to reproduce the memory leak locally and found that the patch
> fixes it. Please give it a try.
> 

Thomas, thanks for looking into it. My first impression is that the
patch indeed fixes the leak.

I will report again later today.

> Best regards
> Thomas
> 
> > 
> > > Best regards
> > > Thomas
> > > 
> > > > 
> > > > Regards,
> > > > Christian.
> > > > 
> > > > Am 17.01.21 um 06:08 schrieb Eli Cohen:
> > > > > On Fri, Jan 15, 2021 at 10:03:50AM +0100, Thomas Zimmermann wrote:
> > > > > > Could you please double-check that 3fb91f56aea4 ("drm/udl: Retrieve 
> > > > > > USB
> > > > > > device from struct drm_device.dev") works correctly
> > > > > Checked again, it does not seem to leak.
> > > > > 
> > > > > > and that 823efa922102
> > > > > > ("drm/cma-helper: Remove empty drm_gem_cma_prime_vunmap()") is 
> > > > > > broken?
> > > > > > 
> > > > > Yes, this one leaks, as does the one preceding it:
> > > > > 
> > > > > 1086db71a1db ("drm/vram-helper: Remove invariant parameters from
> > > > > internal kmap function")
> > > > > > For one of the broken commits, could you please send us the output 
> > > > > > of
> > > > > > 
> > > > > >     dmesg | grep -i drm
> > > > > > 
> > > > > > after most of the memory got leaked?
> > > > > > 
> > > > > I ran the following script in the shell:
> > > > > 
> > > > > while true; do cat /proc/meminfo | grep MemFree:; sleep 5; done
> > > > > 
> > > > > and this is what I saw before I got disconnected from the shell:
> > > > > 
> > > > > MemFree:  148208 kB
> > > > > MemFree:  148304 kB
> > > > > MemFree:  146660 kB
> > > > > Connection to nps-server-24 closed by remote host.
> > > > > Connection to nps-server-24 closed.
> > > > > 
> > > > > 
> > > > > > I also monitored the output of dmesg | grep -i drm
> > > > > The last output I was able to save on disk is this:
> > > > > 
> > > > > [   46.140720] ast :03:00.0: [drm] Using P2A bridge for 
> > > > > configuration
> > > > > [   46.140737] ast :03:00.0: [drm] AST 2500 detected
> > > > > [   46.140754] ast :03:00.0: [drm] Analog VGA only
> > > > > [   46.140772] ast :03:00.0: [drm] dram MCLK=800 Mhz type=7
> > > > > bus_width=16
> > > > > [   46.153553] [drm] Initialized ast 0.1.0 20120228 for :03:00.0
> > > > > on minor 0
> > > > > [   46.165097] fbcon: astdrmfb (fb0) is primary device
> > > > > [   46.391381] ast :03:00.0: [drm] fb0: astdrmfb frame buffer 
> > > > > device
> > > > > [   56.097697] systemd[1]: Starting Load Kernel Module drm...
> > > > > [   56.343556] systemd[1]: modprobe@drm.service: Succeeded.
> > > > > [   56.350382] systemd[1]: Finished Load Kernel Module drm.
> > > > > [13319.469462] [   2683] 70889  2683    55586    0    73728
> > > > > 138 0 tdrm
> > > > > [13320.658386] [   2683] 70889  2683    55586    0    73728
> > > > > 138 0 tdrm
> > > > > [13321.800970] [   2683] 70889  2683    55586    0    73728
> > > > > 138 0 tdrm
> > > > 
> > > > ___
> > > > dri-devel mailing list
> > > > dri-de...@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > > 
> > > -- 
> > > Thomas Zimmermann
> > > Graphics Driver Developer
> > > SUSE Software Solutions Germany GmbH
> > > Maxfeldstr. 5, 90409 Nürnberg, Germany
> > > (HRB 36809, AG Nürnberg)
> > > Geschäftsführer: Felix Imendörffer
> > > 
> > 
> > 
> > 
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer







Re: Change eats memory on my server

2021-01-18 Thread Eli Cohen
On Mon, Jan 18, 2021 at 02:20:49PM +0100, Thomas Zimmermann wrote:
> Hi
> 
> Am 18.01.21 um 14:16 schrieb Eli Cohen:
> > On Mon, Jan 18, 2021 at 10:30:56AM +0100, Thomas Zimmermann wrote:
> > > 
> > > Here's the patch against the latest DRM tree. v5.11-rc3 should work as 
> > > well.
> > > 
> > > I was able to reproduce the memory leak locally and found that the patch
> > > fixes it. Please give it a try.
> > > 
> > 
> > As far as I am concerned, this issue is fixed by the patch you sent.
> > 
> > Thanks for looking into it.
> 
> OK, great. I'll prepare the real patch soon. Can I add your Reported-by and
> Tested-by tags?

Yes, sure.

> 
> Best regards
> Thomas
> 
> > 
> > Eli
> > 
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> 





Re: Change eats memory on my server

2021-01-18 Thread Eli Cohen
On Mon, Jan 18, 2021 at 10:30:56AM +0100, Thomas Zimmermann wrote:
> 
> Here's the patch against the latest DRM tree. v5.11-rc3 should work as well.
> 
> I was able to reproduce the memory leak locally and found that the patch
> fixes it. Please give it a try.
> 

As far as I am concerned, this issue is fixed by the patch you sent.

Thanks for looking into it.

Eli


Re: Change eats memory on my server

2021-01-18 Thread Eli Cohen
On Mon, Jan 18, 2021 at 08:54:07AM +0100, Thomas Zimmermann wrote:
> Hi
> 
> Am 18.01.21 um 08:43 schrieb Christian König:
> > Hi Eli,
> > 
> > have you already tried using kmemleak?
> > 
> > This sounds like a leak of memory allocated using kmalloc(), so kmemleak
> > should be able to catch it.
> 
> I have an idea what happens here. When the refcount is 0 in kmap, a new page
> mapping for the BO is being established. But VRAM helpers unmap the previous
> pages only on BO moves or frees; not in kunmap. So the old mapping might
> still be around. I'll send out a test patch later today.
> 

Great! Looking forward to testing it.
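
For readers following the thread, the behaviour Thomas describes can be sketched as a toy model in plain C. This is only an illustration under stated assumptions, not the DRM VRAM helper code: toy_bo, toy_kmap(), toy_kunmap() and the reuse_existing flag are all invented names, and the flag stands for one possible way to avoid the leak rather than for the patch that was actually sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy buffer object; every name here is hypothetical, not the DRM API. */
struct toy_bo {
	void *vaddr;		/* most recent mapping, NULL if none */
	unsigned int use_count;	/* kmap/kunmap reference count */
	unsigned int live_maps;	/* mappings created and never torn down */
};

static char toy_pages[16];	/* stand-in for the mapped pages */

/*
 * When the refcount is 0 a mapping is established. In the leaky variant
 * (reuse_existing == false) this happens even if an older mapping is
 * still around, so live_maps grows on every cycle.
 */
static void *toy_kmap(struct toy_bo *bo, bool reuse_existing)
{
	if (bo->use_count == 0) {
		if (!(reuse_existing && bo->vaddr != NULL)) {
			bo->vaddr = toy_pages;
			bo->live_maps++;
		}
	}
	bo->use_count++;
	return bo->vaddr;
}

/* Drops the reference only; the mapping itself is kept, as in the thread. */
static void toy_kunmap(struct toy_bo *bo)
{
	assert(bo->use_count > 0);
	bo->use_count--;
}
```

With reuse_existing false, every kmap/kunmap cycle adds one mapping that is never torn down; with it true, the count stays at one however many cycles run.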

> Best regards
> Thomas
> 
> > 
> > Regards,
> > Christian.
> > 
> > Am 17.01.21 um 06:08 schrieb Eli Cohen:
> > > On Fri, Jan 15, 2021 at 10:03:50AM +0100, Thomas Zimmermann wrote:
> > > > Could you please double-check that 3fb91f56aea4 ("drm/udl: Retrieve USB
> > > > device from struct drm_device.dev") works correctly
> > > Checked again, it does not seem to leak.
> > > 
> > > > and that 823efa922102
> > > > ("drm/cma-helper: Remove empty drm_gem_cma_prime_vunmap()") is broken?
> > > > 
> > > Yes, this one leaks, as does the one preceding it:
> > > 
> > > 1086db71a1db ("drm/vram-helper: Remove invariant parameters from
> > > internal kmap function")
> > > > For one of the broken commits, could you please send us the output of
> > > > 
> > > >    dmesg | grep -i drm
> > > > 
> > > > after most of the memory got leaked?
> > > > 
> > > I ran the following script in the shell:
> > > 
> > > while true; do cat /proc/meminfo | grep MemFree:; sleep 5; done
> > > 
> > > and this is what I saw before I got disconnected from the shell:
> > > 
> > > MemFree:  148208 kB
> > > MemFree:  148304 kB
> > > MemFree:  146660 kB
> > > Connection to nps-server-24 closed by remote host.
> > > Connection to nps-server-24 closed.
> > > 
> > > 
> > > > I also monitored the output of dmesg | grep -i drm
> > > The last output I was able to save on disk is this:
> > > 
> > > [   46.140720] ast :03:00.0: [drm] Using P2A bridge for configuration
> > > [   46.140737] ast :03:00.0: [drm] AST 2500 detected
> > > [   46.140754] ast :03:00.0: [drm] Analog VGA only
> > > [   46.140772] ast :03:00.0: [drm] dram MCLK=800 Mhz type=7
> > > bus_width=16
> > > [   46.153553] [drm] Initialized ast 0.1.0 20120228 for :03:00.0
> > > on minor 0
> > > [   46.165097] fbcon: astdrmfb (fb0) is primary device
> > > [   46.391381] ast :03:00.0: [drm] fb0: astdrmfb frame buffer device
> > > [   56.097697] systemd[1]: Starting Load Kernel Module drm...
> > > [   56.343556] systemd[1]: modprobe@drm.service: Succeeded.
> > > [   56.350382] systemd[1]: Finished Load Kernel Module drm.
> > > [13319.469462] [   2683] 70889  2683    55586    0    73728
> > > 138 0 tdrm
> > > [13320.658386] [   2683] 70889  2683    55586    0    73728
> > > 138 0 tdrm
> > > [13321.800970] [   2683] 70889  2683    55586    0    73728
> > > 138 0 tdrm
> > 
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> 





Re: Change eats memory on my server

2021-01-18 Thread Eli Cohen
On Mon, Jan 18, 2021 at 08:57:26AM +0100, Christian König wrote:
> Am 18.01.21 um 08:49 schrieb Eli Cohen:
> > On Mon, Jan 18, 2021 at 08:43:12AM +0100, Christian König wrote:
> > > Hi Eli,
> > > 
> > > have you already tried using kmemleak?
> > > 
> > > This sounds like a leak of memory allocated using kmalloc(), so kmemleak
> > > should be able to catch it.
> > > 
> > Hi Christian,
> > 
> > I have the following configured but I did not see any visible complaint
> > in dmesg.
> > 
> > CONFIG_HAVE_DEBUG_KMEMLEAK=y
> > CONFIG_DEBUG_KMEMLEAK=y
> > CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
> > 
> > Any other configuration that I need to set?
> 
> As long as you don't have any kernel parameters to enable it I think you
> need to do "echo scan > /sys/kernel/debug/kmemleak" to start a scan.
> 
> The result can then be queried using "cat /sys/kernel/debug/kmemleak".
> 

There are some minor leaks that I noticed a while ago coming from
SELinux. I don't think these leaks are killing my server, but here they
are. Maybe someone from SELinux would like to address them.


unreferenced object 0x8884a5cd5000 (size 512):
  comm "swapper/0", pid 1, jiffies 4294736382 (age 8097.039s)
  hex dump (first 32 bytes):
03 00 00 00 05 00 00 00 03 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 ad 4e ad de  .N..
  backtrace:
[<28e4d3ae>] selinux_sb_alloc_security+0x2e/0xf0
[<9037afcc>] security_sb_alloc+0x2b/0x50
[<a8f69eea>] alloc_super+0x140/0x590
[<b417f227>] sget_fc+0xa9/0x380
[<41b639cf>] get_tree_single+0x26/0x100
[<bf572b76>] vfs_get_tree+0x4c/0x140
[<c0aa3dd6>] vfs_kern_mount.part.0+0x75/0xd0
[<aa61ad1d>] kern_mount+0x2f/0x60
[<6ce5ffac>] init_sel_fs+0xf6/0x1a6
[<d3ba532d>] do_one_initcall+0xbb/0x3a0
[<84b518fb>] do_initcalls+0xff/0x129
[<a0cc02b2>] kernel_init_freeable+0x14c/0x178
[<5767353a>] kernel_init+0xd/0x120
[<d425dea7>] ret_from_fork+0x22/0x30
unreferenced object 0x888498ae6a78 (size 8):
  comm "(journald)", pid 379, jiffies 4294738985 (age 8094.447s)
  hex dump (first 8 bytes):
01 00 00 00 00 00 00 00  
  backtrace:
[<727257f3>] selinux_key_alloc+0x33/0xa0
[<23fcc23d>] security_key_alloc+0x3b/0x60
[<9b8f5c5c>] key_alloc+0x46e/0x900
[<a49c5ee1>] keyring_alloc+0x27/0x70
[<47d4e2e0>] install_session_keyring_to_cred+0xd7/0x120
[<92fa69fa>] join_session_keyring+0x109/0x1b0
[<c31be2c8>] __do_sys_keyctl+0x2c2/0x310
[<a99bb85a>] do_syscall_64+0x33/0x40
[<acf36f32>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0x888498ae6aa0 (size 8):
  comm "(journald)", pid 379, jiffies 4294738985 (age 8094.447s)
  hex dump (first 8 bytes):
01 00 00 00 00 00 00 00  
  backtrace:
[<727257f3>] selinux_key_alloc+0x33/0xa0
[<23fcc23d>] security_key_alloc+0x3b/0x60
[<9b8f5c5c>] key_alloc+0x46e/0x900
[<d752137d>] key_create_or_update+0x45a/0x760
[<cfad8dc7>] __do_sys_add_key+0x144/0x2a0
[<a99bb85a>] do_syscall_64+0x33/0x40
[<acf36f32>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0x88812bdab6e0 (size 8):
  comm "(lymouthd)", pid 573, jiffies 4294744193 (age 8089.252s)
  hex dump (first 8 bytes):
01 00 00 00 00 00 00 00  
  backtrace:
[<727257f3>] selinux_key_alloc+0x33/0xa0
[<23fcc23d>] security_key_alloc+0x3b/0x60
[<9b8f5c5c>] key_alloc+0x46e/0x900
[<a49c5ee1>] keyring_alloc+0x27/0x70
[<47d4e2e0>] install_session_keyring_to_cred+0xd7/0x120
[<92fa69fa>] join_session_keyring+0x109/0x1b0
[<c31be2c8>] __do_sys_keyctl+0x2c2/0x310
[<a99bb85a>] do_syscall_64+0x33/0x40
[<acf36f32>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0x88812bdab708 (size 8):
  comm "(lymouthd)", pid 573, jiffies 4294744193 (age 8089.252s)
  hex dump (first 8 bytes):
01 00 00 00 00 00 00 00  
  backtrace:
[<727257f3>] selinux_key_alloc+0x33/0xa0
[<23fcc23d>] security_key_alloc+0x3b/0x60
[<9b8f5c5c>] key_alloc+0x46e/0x900
[<000

Re: Change eats memory on my server

2021-01-17 Thread Eli Cohen
On Mon, Jan 18, 2021 at 08:43:12AM +0100, Christian König wrote:
> Hi Eli,
> 
> have you already tried using kmemleak?
> 
> This sounds like a leak of memory allocated using kmalloc(), so kmemleak
> should be able to catch it.
> 

Hi Christian,

I have the following configured but I did not see any visible complaint
in dmesg.

CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000

Any other configuration that I need to set?


Re: Change eats memory on my server

2021-01-16 Thread Eli Cohen
On Fri, Jan 15, 2021 at 10:03:50AM +0100, Thomas Zimmermann wrote:
> 
> Could you please double-check that 3fb91f56aea4 ("drm/udl: Retrieve USB
> device from struct drm_device.dev") works correctly

Checked again, it does not seem to leak.

> and that 823efa922102
> ("drm/cma-helper: Remove empty drm_gem_cma_prime_vunmap()") is broken?
>

Yes, this one leaks, as does the one preceding it: 

1086db71a1db ("drm/vram-helper: Remove invariant parameters from internal kmap function")
 
> For one of the broken commits, could you please send us the output of
> 
>   dmesg | grep -i drm
> 
> after most of the memory got leaked?
> 

I ran the following script in the shell:

while true; do cat /proc/meminfo | grep MemFree:; sleep 5; done

and this is what I saw before I got disconnected from the shell:

MemFree:  148208 kB
MemFree:  148304 kB
MemFree:  146660 kB
Connection to nps-server-24 closed by remote host.
Connection to nps-server-24 closed.


I also monitored the output of dmesg | grep -i drm
The last output I was able to save on disk is this:

[   46.140720] ast :03:00.0: [drm] Using P2A bridge for configuration
[   46.140737] ast :03:00.0: [drm] AST 2500 detected
[   46.140754] ast :03:00.0: [drm] Analog VGA only
[   46.140772] ast :03:00.0: [drm] dram MCLK=800 Mhz type=7 bus_width=16
[   46.153553] [drm] Initialized ast 0.1.0 20120228 for :03:00.0 on minor 0
[   46.165097] fbcon: astdrmfb (fb0) is primary device
[   46.391381] ast :03:00.0: [drm] fb0: astdrmfb frame buffer device
[   56.097697] systemd[1]: Starting Load Kernel Module drm...
[   56.343556] systemd[1]: modprobe@drm.service: Succeeded.
[   56.350382] systemd[1]: Finished Load Kernel Module drm.
[13319.469462] [   2683] 70889  2683    55586    0    73728  138  0 tdrm
[13320.658386] [   2683] 70889  2683    55586    0    73728  138  0 tdrm
[13321.800970] [   2683] 70889  2683    55586    0    73728  138  0 tdrm


Re: [PATCH 00/18] drivers: Remove oprofile and dcookies

2021-01-15 Thread William Cohen
On 1/14/21 4:50 PM, Robert Richter wrote:
> On 14.01.21 17:04:24, Viresh Kumar wrote:
>> Hello,
>>
>> The "oprofile" user-space tools don't use the kernel OPROFILE support
>> any more, and haven't in a long time. User-space has been converted to
>> the perf interfaces.
>>
>> Remove oprofile and dcookies (whose only user is oprofile) support from
>> the kernel.
>>
>> This was suggested here [1] earlier.
>>
>> This is build/boot tested by kernel test robot (Intel) and Linaro's
>> Tuxmake[2] for a lot of architectures and no failures were reported.
>>
>> --
>> Viresh
>>
>> [1] 
>> https://lore.kernel.org/lkml/CAHk-=whw9t3ztv8ia2sjwyqs1vojus14p_qhj3v5-9pcbmg...@mail.gmail.com/
>> [2] https://lwn.net/Articles/841624/
>>
>> Viresh Kumar (18):
>>   arch: alpha: Remove CONFIG_OPROFILE support
>>   arch: arm: Remove CONFIG_OPROFILE support
>>   arch: arc: Remove CONFIG_OPROFILE support
>>   arch: hexagon: Don't select HAVE_OPROFILE
>>   arch: ia64: Remove CONFIG_OPROFILE support
>>   arch: ia64: Remove rest of perfmon support
>>   arch: microblaze: Remove CONFIG_OPROFILE support
>>   arch: mips: Remove CONFIG_OPROFILE support
>>   arch: parisc: Remove CONFIG_OPROFILE support
>>   arch: powerpc: Stop building and using oprofile
>>   arch: powerpc: Remove oprofile
>>   arch: s390: Remove CONFIG_OPROFILE support
>>   arch: sh: Remove CONFIG_OPROFILE support
>>   arch: sparc: Remove CONFIG_OPROFILE support
>>   arch: x86: Remove CONFIG_OPROFILE support
>>   arch: xtensa: Remove CONFIG_OPROFILE support
>>   drivers: Remove CONFIG_OPROFILE support
>>   fs: Remove dcookies support
> 
> After oprofile userland moved to version 1.x, the kernel support for
> it isn't needed anymore. The switch was back in 2014 when oprofile
> started using the perf syscall:
> 
>  
> https://sourceforge.net/p/oprofile/oprofile/ci/ba9edea2bdfe2c9475749fc83105632bd916b96c
> 
> Since then I haven't received any significant patches to implement new
> features or add support for newer platforms in the kernel. There
> haven't been bug reports sent or questions asked on the mailing list
> for quite a while, which indicates there are few or no users. Users
> (if any) should switch to oprofile 1.x or the perf tool. No need to
> carry kernel support any longer with us.
> 
> So time to get rid of it. For the whole series:
> 
> Acked-by: Robert Richter 

The oprofile daemon that used the older oprofile kernel support was removed
before the OProfile 1.0 release by the following commit in August 2014:

https://sourceforge.net/p/oprofile/oprofile/ci/0c142c3a096d3e9ec42cc9b0ddad994fea60d135

At this point it makes sense to clean up the kernel and remove this unused code.

Acked-by: William Cohen 

> 
> 
> ___
> oprofile-list mailing list
> oprofile-l...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/oprofile-list
> 



Change eats memory on my server

2021-01-14 Thread Eli Cohen
Hi Thomas,

After a long bisection, I found that this patch,

commit 1086db71a1dbbfb32ffb42cf0d540b69956f951e
Author: Thomas Zimmermann 
Date:   Tue Nov 3 10:30:06 2020 +0100

drm/vram-helper: Remove invariant parameters from internal kmap function

is the offending patch causing the kernel to eat my server's memory. It
will eat all 24 GB of RAM after around 7 hours.

It's a Supermicro server. The output of dmidecode is below:


# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.1.1 present.
Table at 0x6F01B000.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: American Megatrends Inc.
Version: 2.0
Release Date: 11/30/2017
Address: 0xF
Runtime Size: 64 kB
ROM Size: 32 MB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 5.12

Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Supermicro
Product Name: Super Server
Version: 0123456789
Serial Number: 0123456789
UUID: ----0cc47af973ca
Wake-up Type: Power Switch
SKU Number: To be filled by O.E.M.
Family: To be filled by O.E.M.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: Supermicro
Product Name: X11DPT-B
Version: 1.02
Serial Number: HM179S003332
Asset Tag: To be filled by O.E.M.
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: To be filled by O.E.M.
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0

Handle 0x0003, DMI type 3, 22 bytes
Chassis Information
Manufacturer: Supermicro
Type: Main Server Chassis
Lock: Not Present
Version: 0123456789
Serial Number: 0123456789
Asset Tag: To be filled by O.E.M.
Boot-up State: Safe
Power Supply State: Safe
Thermal State: Safe
Security Status: None
OEM Information: 0x
Height: Unspecified
Number Of Power Cords: 1
Contained Elements: 0
SKU Number: To be filled by O.E.M.

Handle 0x0004, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: JVGA1
Internal Connector Type: None
External Reference Designator: VGA
External Connector Type: DB-15 female
Port Type: Video Port

Handle 0x0005, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: JLAN1
Internal Connector Type: None
External Reference Designator: IPMI_LAN
External Connector Type: RJ-45
Port Type: Network Port

Handle 0x0006, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: JUSB1
Internal Connector Type: None
External Reference Designator: USB0/1(3.0)
External Connector Type: Access Bus (USB)
Port Type: USB

Handle 0x0007, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: TPM/PORT80
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other

Handle 0x0008, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: FAN3
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other

Handle 0x0009, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: FAN4
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other

Handle 0x000A, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: JCOM1 - COM1
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other

Handle 0x000B, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: 

Re: [PATCH V2] mlx5: vdpa: fix possible uninitialized var

2021-01-13 Thread Eli Cohen
On Thu, Jan 14, 2021 at 03:09:04PM +0800, Jason Wang wrote:
> When compiling with -Werror=maybe-uninitialized, gcc may complains the
Maybe you want to fix this to: gcc may complain about possible...

Other than that:
Acked-by: Eli Cohen 

> possible uninitialized umem. Since the callers won't pass value other
> than 1 to 3, making 3 as default to fix the compiler warning.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index f1d54814db97..07ccc61cd6f6 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -703,7 +703,7 @@ static void umem_destroy(struct mlx5_vdpa_net *ndev, 
> struct mlx5_vdpa_virtqueue
>   case 2:
>   umem = &mvq->umem2;
>   break;
> - case 3:
> + default:
>   umem = &mvq->umem3;
>   break;
>   }
> -- 
> 2.25.1
> 


Re: [PATCH 21/21] vdpasim: control virtqueue support

2021-01-11 Thread Eli Cohen
On Wed, Dec 16, 2020 at 02:48:18PM +0800, Jason Wang wrote:
> This patch introduces the control virtqueue support for vDPA
> simulator. This is a requirement for supporting advanced features like
> multiqueue.
> 
> A requirement for control virtqueue is to isolate its memory access
> from the rx/tx virtqueues. This is because when using vDPA device
> for VM, the control virqueue is not directly assigned to VM. Userspace
> (Qemu) will present a shadow control virtqueue to control for
> recording the device states.
> 
> The isolation is done via the virtqueue groups and ASID support in
> vDPA through vhost-vdpa. The simulator is extended to have:
> 
> 1) three virtqueues: RXVQ, TXVQ and CVQ (control virtqueue)
> 2) two virtqueue groups: group 0 contains RXVQ and TXVQ; group 1
>    contains CVQ
> 3) two address spaces and the simulator simply implements the address
>    spaces by mapping it 1:1 to IOTLB.
> 
> For the VM use cases, userspace(Qemu) may set AS 0 to group 0 and AS 1
> to group 1. So we have:
> 
> 1) The IOTLB for virtqueue group 0 contains the mappings of the guest, so
>    RX and TX can be assigned to the guest directly.
> 2) The IOTLB for virtqueue group 1 contains the mappings of CVQ, whose
>    buffers are allocated and managed by the VMM only. So the CVQ of
>    vhost-vdpa is visible to the VMM only, and the guest cannot access the
>    CVQ of vhost-vdpa.
> 
> For the other use cases, since AS 0 is associated with all virtqueue
> groups by default, all virtqueues share the same mapping.
> 
> To demonstrate the function, VIRTIO_NET_F_CTRL_MAC_ADDR is
> implemented in the simulator for the driver to set the MAC address.
> 

Hi Jason,

Is there any version of qemu/libvirt available with which I can see the
control virtqueue in action?

> Signed-off-by: Jason Wang 
> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 189 +++
>  1 file changed, 166 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c 
> b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index fe90a783bde4..0fd06ac491cd 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -60,14 +60,18 @@ struct vdpasim_virtqueue {
>  #define VDPASIM_QUEUE_MAX 256
>  #define VDPASIM_DEVICE_ID 0x1
>  #define VDPASIM_VENDOR_ID 0
> -#define VDPASIM_VQ_NUM 0x2
> +#define VDPASIM_VQ_NUM 0x3
> +#define VDPASIM_AS_NUM 0x2
> +#define VDPASIM_GROUP_NUM 0x2
>  #define VDPASIM_NAME "vdpasim-netdev"
>  
>  static u64 vdpasim_features = (1ULL << VIRTIO_F_ANY_LAYOUT) |
> (1ULL << VIRTIO_F_VERSION_1)  |
> (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> +   (1ULL << VIRTIO_NET_F_MTU) |
> (1ULL << VIRTIO_NET_F_MAC) |
> -   (1ULL << VIRTIO_NET_F_MTU);
> +   (1ULL << VIRTIO_NET_F_CTRL_VQ) |
> +   (1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR);
>  
>  /* State of each vdpasim device */
>  struct vdpasim {
> @@ -147,11 +151,17 @@ static void vdpasim_reset(struct vdpasim *vdpasim)
>  {
>   int i;
>  
> - for (i = 0; i < VDPASIM_VQ_NUM; i++)
> + spin_lock(&vdpasim->iommu_lock);
> +
> + for (i = 0; i < VDPASIM_VQ_NUM; i++) {
>   vdpasim_vq_reset(&vdpasim->vqs[i]);
> + vringh_set_iotlb(&vdpasim->vqs[i].vring,
> +  &vdpasim->iommu[0]);
> + }
>  
> - spin_lock(&vdpasim->iommu_lock);
> - vhost_iotlb_reset(vdpasim->iommu);
> + for (i = 0; i < VDPASIM_AS_NUM; i++) {
> + vhost_iotlb_reset(&vdpasim->iommu[i]);
> + }
>   spin_unlock(&vdpasim->iommu_lock);
>  
>   vdpasim->features = 0;
> @@ -191,6 +201,81 @@ static bool receive_filter(struct vdpasim *vdpasim, 
> size_t len)
>   return false;
>  }
>  
> +virtio_net_ctrl_ack vdpasim_handle_ctrl_mac(struct vdpasim *vdpasim,
> + u8 cmd)
> +{
> + struct vdpasim_virtqueue *cvq = &vdpasim->vqs[2];
> + virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> + size_t read;
> +
> + switch (cmd) {
> + case VIRTIO_NET_CTRL_MAC_ADDR_SET:
> + read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->in_iov,
> +  (void *)vdpasim->config.mac,
> +  ETH_ALEN);
> + if (read == ETH_ALEN)
> + status = VIRTIO_NET_OK;
> + break;
> + default:
> + break;
> + }
> +
> + return status;
> +}
> +
> +static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
> +{
> + struct vdpasim_virtqueue *cvq = &vdpasim->vqs[2];
> + virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
> + struct virtio_net_ctrl_hdr ctrl;
> + size_t read, write;
> + int err;
> +
> + if (!(vdpasim->features & (1ULL << VIRTIO_NET_F_CTRL_VQ)))
> + return;
> +
> + if (!cvq->ready)
> + return;
> +
> + while (true) {
> + err = vringh_getdesc_iotlb(&cvq->vring, &cvq->in_iov,
> +  

Re: [PATCH] mlx5: vdpa: fix possible uninitialized var

2021-01-09 Thread Eli Cohen
On Fri, Jan 08, 2021 at 04:24:43PM +0800, Jason Wang wrote:
> Upstream: posted
> 
> When compiling with -Werror=maybe-uninitialized, gcc may complains the
> possible uninitialized umem. Fix that.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index f1d54814db97..a6ad83d8d8e2 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -706,6 +706,9 @@ static void umem_destroy(struct mlx5_vdpa_net *ndev, 
> struct mlx5_vdpa_virtqueue
>   case 3:
>   umem = &mvq->umem3;
>   break;
> + default:
> + WARN(1, "unsupported umem num %d\n", num);
> + return;
>   }
>  
>   MLX5_SET(destroy_umem_in, in, opcode, MLX5_CMD_OP_DESTROY_UMEM);

Since the "default" case will never be executed, maybe it's better to
just change "case 3:" to "default:" and avoid the WARN().
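
A minimal sketch of that suggestion, using stand-in types rather than the mlx5 structures (toy_umem, toy_vq and pick_umem() are hypothetical names). Because the last case becomes "default:", every path through the switch assigns the pointer, which is what lets the compiler stop warning about a possibly uninitialized variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's umem and virtqueue types. */
struct toy_umem {
	int id;
};

struct toy_vq {
	struct toy_umem umem1;
	struct toy_umem umem2;
	struct toy_umem umem3;
};

/*
 * Callers are expected to pass 1..3 only, but the compiler cannot
 * prove that. Turning the last case into "default:" means umem is
 * assigned on every path, with no extra runtime check.
 */
static struct toy_umem *pick_umem(struct toy_vq *vq, int num)
{
	struct toy_umem *umem;

	assert(vq != NULL);

	switch (num) {
	case 1:
		umem = &vq->umem1;
		break;
	case 2:
		umem = &vq->umem2;
		break;
	default:	/* was "case 3:" */
		umem = &vq->umem3;
		break;
	}

	return umem;
}
```

The trade-off is that an out-of-range value silently selects the third umem instead of being reported, which is acceptable only as long as no caller passes anything outside 1..3.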

> -- 
> 2.25.1
> 


[PATCH v1] vdpa/mlx5: Fix memory key MTT population

2021-01-06 Thread Eli Cohen
map_direct_mr() assumed that the number of scatter/gather entries
returned by dma_map_sg_attrs() was equal to the number of segments in
the sgl list. This led to wrong population of the mkey object. Fix this
by properly referring to the returned value.

The hardware expects each MTT entry to contain the DMA address of a
contiguous block of memory of size (1 << mr->log_size) bytes.
dma_map_sg_attrs() can coalesce several sg entries into a single
scatter/gather entry of contiguous DMA range so we need to scan the list
and refer to the size of each s/g entry.

In addition, get rid of fill_sg(), whose effect is overwritten by
populate_mtts().

Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
Signed-off-by: Eli Cohen 
---
V0->V1:
1. Fix typos
2. Improve changelog 


 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
 drivers/vdpa/mlx5/core/mr.c| 28 
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 5c92a576edae..08f742fd2409 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -15,6 +15,7 @@ struct mlx5_vdpa_direct_mr {
struct sg_table sg_head;
int log_size;
int nsg;
+   int nent;
struct list_head list;
u64 offset;
 };
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 4b6195666c58..d300f799efcd 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -25,17 +25,6 @@ static int get_octo_len(u64 len, int page_shift)
return (npages + 1) / 2;
 }
 
-static void fill_sg(struct mlx5_vdpa_direct_mr *mr, void *in)
-{
-   struct scatterlist *sg;
-   __be64 *pas;
-   int i;
-
-   pas = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
-   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
-   (*pas) = cpu_to_be64(sg_dma_address(sg));
-}
-
 static void mlx5_set_access_mode(void *mkc, int mode)
 {
MLX5_SET(mkc, mkc, access_mode_1_0, mode & 0x3);
@@ -45,10 +34,18 @@ static void mlx5_set_access_mode(void *mkc, int mode)
 static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
 {
struct scatterlist *sg;
+   int nsg = mr->nsg;
+   u64 dma_addr;
+   u64 dma_len;
+   int j = 0;
int i;
 
-   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
-   mtt[i] = cpu_to_be64(sg_dma_address(sg));
+   for_each_sg(mr->sg_head.sgl, sg, mr->nent, i) {
+   for (dma_addr = sg_dma_address(sg), dma_len = sg_dma_len(sg);
+nsg && dma_len;
+nsg--, dma_addr += BIT(mr->log_size), dma_len -= BIT(mr->log_size))
+   mtt[j++] = cpu_to_be64(dma_addr);
+   }
 }
 
 static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
@@ -64,7 +61,6 @@ static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct
return -ENOMEM;
 
MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
-   fill_sg(mr, in);
mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
MLX5_SET(mkc, mkc, lw, !!(mr->perm & VHOST_MAP_WO));
MLX5_SET(mkc, mkc, lr, !!(mr->perm & VHOST_MAP_RO));
@@ -276,8 +272,8 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr
 done:
mr->log_size = log_entity_size;
mr->nsg = nsg;
-   err = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
-   if (!err)
+   mr->nent = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
+   if (!mr->nent)
goto err_map;
 
err = create_direct_mr(mvdev, mr);
-- 
2.28.0
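
The MTT walk described in the changelog above can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
struct dma_seg and fill_mtt() are hypothetical names standing in for the
mapped scatterlist and populate_mtts(), every segment length is assumed to
be a multiple of the block size, and the cpu_to_be64() conversion is left
out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatter/gather entry. */
struct dma_seg {
	uint64_t addr;	/* DMA address of the contiguous range */
	uint64_t len;	/* length in bytes, assumed block-aligned */
};

/*
 * Emit one table entry per (1 << log_size) block of every mapped segment.
 * nent is the number of mapped segments; max_entries bounds the table.
 * Returns the number of entries written.
 */
static size_t fill_mtt(const struct dma_seg *segs, size_t nent,
		       unsigned int log_size, uint64_t *mtt,
		       size_t max_entries)
{
	uint64_t block;
	size_t i, j = 0;

	assert(log_size < 64);
	block = UINT64_C(1) << log_size;

	for (i = 0; i < nent; i++) {
		uint64_t addr = segs[i].addr;
		uint64_t len = segs[i].len;

		/* A coalesced segment may span several blocks. */
		assert(len % block == 0);
		for (; len && j < max_entries; addr += block, len -= block)
			mtt[j++] = addr;
	}
	return j;
}
```

With two mapped segments of three blocks and one block, the table ends up
with four entries even though only two segments were walked, which is the
case the old code got wrong.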



Re: [PATCH] vdpa/mlx5: Fix memory key MTT population

2021-01-06 Thread Eli Cohen
On Thu, Jan 07, 2021 at 12:15:53PM +0800, Jason Wang wrote:
> 
> On 2021/1/6 5:05 PM, Eli Cohen wrote:
> > map_direct_mr() assumed that the number of scatter/gather entries
> > returned by dma_map_sg_attrs() was equal to the number of segments in
> > the sgl list. This led to wrong population of the mkey object. Fix this
> > by properly referring to the returned value.
> > 
> > In addition, get rid of fill_sg() whjich effect is overwritten bu
> > populate_mtts().
> 
> 
> Typo.
> 
Will fix, thanks.
> 
> > 
> > Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
> > Signed-off-by: Eli Cohen 
> > ---
> >   drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
> >   drivers/vdpa/mlx5/core/mr.c| 28 
> >   2 files changed, 13 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
> > b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > index 5c92a576edae..08f742fd2409 100644
> > --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > @@ -15,6 +15,7 @@ struct mlx5_vdpa_direct_mr {
> > struct sg_table sg_head;
> > int log_size;
> > int nsg;
> > +   int nent;
> > struct list_head list;
> > u64 offset;
> >   };
> > diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> > index 4b6195666c58..d300f799efcd 100644
> > --- a/drivers/vdpa/mlx5/core/mr.c
> > +++ b/drivers/vdpa/mlx5/core/mr.c
> > @@ -25,17 +25,6 @@ static int get_octo_len(u64 len, int page_shift)
> > return (npages + 1) / 2;
> >   }
> > -static void fill_sg(struct mlx5_vdpa_direct_mr *mr, void *in)
> > -{
> > -   struct scatterlist *sg;
> > -   __be64 *pas;
> > -   int i;
> > -
> > -   pas = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
> > -   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
> > -   (*pas) = cpu_to_be64(sg_dma_address(sg));
> > -}
> > -
> >   static void mlx5_set_access_mode(void *mkc, int mode)
> >   {
> > MLX5_SET(mkc, mkc, access_mode_1_0, mode & 0x3);
> > @@ -45,10 +34,18 @@ static void mlx5_set_access_mode(void *mkc, int mode)
> >   static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
> >   {
> > struct scatterlist *sg;
> > +   int nsg = mr->nsg;
> > +   u64 dma_addr;
> > +   u64 dma_len;
> > +   int j = 0;
> > int i;
> > -   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
> > -   mtt[i] = cpu_to_be64(sg_dma_address(sg));
> > +   for_each_sg(mr->sg_head.sgl, sg, mr->nent, i) {
> > +   for (dma_addr = sg_dma_address(sg), dma_len = sg_dma_len(sg);
> > +nsg && dma_len;
> > +nsg--, dma_addr += BIT(mr->log_size), dma_len -= 
> > BIT(mr->log_size))
> > +   mtt[j++] = cpu_to_be64(dma_addr);
> 
> 
> It looks to me the mtt entry is also limited by log_size. It's better to
> explain this a little bit in the commit log.

Actually, each MTT entry covers (1 << mr->log_size) bytes of contiguous memory.
I will add an explanation.

> 
> Thanks
> 
> 
> > +   }
> >   }
> >   static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct 
> > mlx5_vdpa_direct_mr *mr)
> > @@ -64,7 +61,6 @@ static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, 
> > struct mlx5_vdpa_direct
> > return -ENOMEM;
> > MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
> > -   fill_sg(mr, in);
> > mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
> > MLX5_SET(mkc, mkc, lw, !!(mr->perm & VHOST_MAP_WO));
> > MLX5_SET(mkc, mkc, lr, !!(mr->perm & VHOST_MAP_RO));
> > @@ -276,8 +272,8 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, 
> > struct mlx5_vdpa_direct_mr
> >   done:
> > mr->log_size = log_entity_size;
> > mr->nsg = nsg;
> > -   err = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, 
> > DMA_BIDIRECTIONAL, 0);
> > -   if (!err)
> > +   mr->nent = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, 
> > DMA_BIDIRECTIONAL, 0);
> > +   if (!mr->nent)
> > goto err_map;
> > err = create_direct_mr(mvdev, mr);
> 


[PATCH] vdpa/mlx5: Fix memory key MTT population

2021-01-06 Thread Eli Cohen
map_direct_mr() assumed that the number of scatter/gather entries
returned by dma_map_sg_attrs() was equal to the number of segments in
the sgl list. This led to wrong population of the mkey object. Fix this
by properly referring to the returned value.

In addition, get rid of fill_sg() whjich effect is overwritten bu
populate_mtts().

Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
Signed-off-by: Eli Cohen 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
 drivers/vdpa/mlx5/core/mr.c| 28 
 2 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 5c92a576edae..08f742fd2409 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -15,6 +15,7 @@ struct mlx5_vdpa_direct_mr {
struct sg_table sg_head;
int log_size;
int nsg;
+   int nent;
struct list_head list;
u64 offset;
 };
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 4b6195666c58..d300f799efcd 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -25,17 +25,6 @@ static int get_octo_len(u64 len, int page_shift)
return (npages + 1) / 2;
 }
 
-static void fill_sg(struct mlx5_vdpa_direct_mr *mr, void *in)
-{
-   struct scatterlist *sg;
-   __be64 *pas;
-   int i;
-
-   pas = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
-   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
-   (*pas) = cpu_to_be64(sg_dma_address(sg));
-}
-
 static void mlx5_set_access_mode(void *mkc, int mode)
 {
MLX5_SET(mkc, mkc, access_mode_1_0, mode & 0x3);
@@ -45,10 +34,18 @@ static void mlx5_set_access_mode(void *mkc, int mode)
 static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
 {
struct scatterlist *sg;
+   int nsg = mr->nsg;
+   u64 dma_addr;
+   u64 dma_len;
+   int j = 0;
int i;
 
-   for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
-   mtt[i] = cpu_to_be64(sg_dma_address(sg));
+   for_each_sg(mr->sg_head.sgl, sg, mr->nent, i) {
+   for (dma_addr = sg_dma_address(sg), dma_len = sg_dma_len(sg);
+nsg && dma_len;
+nsg--, dma_addr += BIT(mr->log_size), dma_len -= BIT(mr->log_size))
+   mtt[j++] = cpu_to_be64(dma_addr);
+   }
 }
 
 static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
@@ -64,7 +61,6 @@ static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct
return -ENOMEM;
 
MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
-   fill_sg(mr, in);
mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
MLX5_SET(mkc, mkc, lw, !!(mr->perm & VHOST_MAP_WO));
MLX5_SET(mkc, mkc, lr, !!(mr->perm & VHOST_MAP_RO));
@@ -276,8 +272,8 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr
 done:
mr->log_size = log_entity_size;
mr->nsg = nsg;
-   err = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
-   if (!err)
+   mr->nent = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
+   if (!mr->nent)
goto err_map;
 
err = create_direct_mr(mvdev, mr);
-- 
2.28.0



Re: [PATCH 12/21] vhost-vdpa: introduce uAPI to get the number of virtqueue groups

2020-12-30 Thread Eli Cohen
On Wed, Dec 16, 2020 at 02:48:09PM +0800, Jason Wang wrote:
> Following the vDPA support for multiple address spaces, this patch
> introduces a uAPI for userspace to query the number of virtqueue
> groups supported by the vDPA device.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vdpa.c   | 4 
>  include/uapi/linux/vhost.h | 3 +++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 060d5b5b7e64..1ba5901b28e7 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -536,6 +536,10 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>   case VHOST_VDPA_GET_VRING_NUM:
>   r = vhost_vdpa_get_vring_num(v, argp);
>   break;
> + case VHOST_VDPA_GET_GROUP_NUM:
> + r = copy_to_user(argp, &v->vdpa->ngroups,
> +  sizeof(v->vdpa->ngroups));
> + break;

Are this and the other ioctls already supported in qemu?

>   case VHOST_SET_LOG_BASE:
>   case VHOST_SET_LOG_FD:
>   r = -ENOIOCTLCMD;
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index 59c6c0fbaba1..8a4e6e426bbf 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -145,4 +145,7 @@
>  /* Get the valid iova range */
>  #define VHOST_VDPA_GET_IOVA_RANGE _IOR(VHOST_VIRTIO, 0x78, \
>struct vhost_vdpa_iova_range)
> +/* Get the number of virtqueue groups. */
> +#define VHOST_VDPA_GET_GROUP_NUM _IOR(VHOST_VIRTIO, 0x79, unsigned int)
> +
>  #endif
> -- 
> 2.25.1
> 


Re: [PATCH 07/21] vdpa: multiple address spaces support

2020-12-30 Thread Eli Cohen
On Wed, Dec 30, 2020 at 12:04:30PM +0800, Jason Wang wrote:
> 
> On 2020/12/29 3:28 PM, Eli Cohen wrote:
> > > @@ -43,6 +43,8 @@ struct vdpa_vq_state {
> > >* @index: device index
> > >* @features_valid: were features initialized? for legacy guests
> > >* @nvqs: the number of virtqueues
> > > + * @ngroups: the number of virtqueue groups
> > > + * @nas: the number of address spaces
> > I am not sure these can be categorised as part of the state of the VQ.
> > It's more of a property so maybe we can have a callback to get the
> > properties of the VQ?
> 
> 
> Or maybe there's a misunderstanding of the patch.
> 

Yes, I misinterpreted the hunk. No issue here.

> Those two attributes actually belong to vdpa_device rather than
> vdpa_vq_state.
> 
> Thanks
> 


Re: [PATCH 12/21] vhost-vdpa: introduce uAPI to get the number of virtqueue groups

2020-12-29 Thread Eli Cohen
On Wed, Dec 16, 2020 at 02:48:09PM +0800, Jason Wang wrote:
> Following the vDPA support for multiple address spaces, this patch
> introduces a uAPI for userspace to query the number of virtqueue
> groups supported by the vDPA device.

Can you explain what exactly you mean by userspace? Is it just qemu, or
is it intended for the virtio_net driver run by the qemu process?
Also, can you say for what purpose?

> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vdpa.c   | 4 
>  include/uapi/linux/vhost.h | 3 +++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 060d5b5b7e64..1ba5901b28e7 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -536,6 +536,10 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>   case VHOST_VDPA_GET_VRING_NUM:
>   r = vhost_vdpa_get_vring_num(v, argp);
>   break;
> + case VHOST_VDPA_GET_GROUP_NUM:
> + r = copy_to_user(argp, &v->vdpa->ngroups,
> +  sizeof(v->vdpa->ngroups));
> + break;
>   case VHOST_SET_LOG_BASE:
>   case VHOST_SET_LOG_FD:
>   r = -ENOIOCTLCMD;
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index 59c6c0fbaba1..8a4e6e426bbf 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -145,4 +145,7 @@
>  /* Get the valid iova range */
>  #define VHOST_VDPA_GET_IOVA_RANGE _IOR(VHOST_VIRTIO, 0x78, \
>struct vhost_vdpa_iova_range)
> +/* Get the number of virtqueue groups. */
> +#define VHOST_VDPA_GET_GROUP_NUM _IOR(VHOST_VIRTIO, 0x79, unsigned int)
> +
>  #endif
> -- 
> 2.25.1
> 


Re: [PATCH 11/21] vhost-vdpa: introduce asid based IOTLB

2020-12-29 Thread Eli Cohen
On Wed, Dec 16, 2020 at 02:48:08PM +0800, Jason Wang wrote:
> This patch converts the vhost-vDPA device to support multiple IOTLBs
> tagged via ASID via hlist. This will be used for supporting multiple
> address spaces in the following patches.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vdpa.c | 106 ---
>  1 file changed, 80 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index feb6a58df22d..060d5b5b7e64 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -33,13 +33,21 @@ enum {
>  
>  #define VHOST_VDPA_DEV_MAX (1U << MINORBITS)
>  
> +#define VHOST_VDPA_IOTLB_BUCKETS 16
> +
> +struct vhost_vdpa_as {
> + struct hlist_node hash_link;
> + struct vhost_iotlb iotlb;
> + u32 id;
> +};
> +
>  struct vhost_vdpa {
>   struct vhost_dev vdev;
>   struct iommu_domain *domain;
>   struct vhost_virtqueue *vqs;
>   struct completion completion;
>   struct vdpa_device *vdpa;
> - struct vhost_iotlb *iotlb;
> + struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];
>   struct device dev;
>   struct cdev cdev;
>   atomic_t opened;
> @@ -49,12 +57,64 @@ struct vhost_vdpa {
>   struct eventfd_ctx *config_ctx;
>   int in_batch;
>   struct vdpa_iova_range range;
> + int used_as;
>  };
>  
>  static DEFINE_IDA(vhost_vdpa_ida);
>  
>  static dev_t vhost_vdpa_major;
>  
> +static struct vhost_vdpa_as *asid_to_as(struct vhost_vdpa *v, u32 asid)
> +{
> + struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];
> + struct vhost_vdpa_as *as;
> +
> + hlist_for_each_entry(as, head, hash_link)
> + if (as->id == asid)
> + return as;
> +
> + return NULL;
> +}
> +
> +static struct vhost_vdpa_as *vhost_vdpa_alloc_as(struct vhost_vdpa *v, u32 
> asid)
> +{
> + struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];
> + struct vhost_vdpa_as *as;
> +
> + if (asid_to_as(v, asid))
> + return NULL;
> +
> + as = kmalloc(sizeof(*as), GFP_KERNEL);
> + if (!as)
> + return NULL;
> +
> + vhost_iotlb_init(&as->iotlb, 0, 0);
> + as->id = asid;
> + hlist_add_head(&as->hash_link, head);
> + ++v->used_as;
> +
> + return as;
> +}
> +
> +static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)

The return value is never checked. I think the function should either be
made void or have its return value checked by the callers.

> +{
> + struct vhost_vdpa_as *as = asid_to_as(v, asid);
> +
> + /* Remove default address space is not allowed */
> + if (asid == 0)
> + return -EINVAL;

Can you explain why? I think this causes a memory leak, since nothing
will ever free the address space with id 0.

> +
> + if (!as)
> + return -EINVAL;
> +
> + hlist_del(&as->hash_link);
> + vhost_iotlb_reset(&as->iotlb);
> + kfree(as);
> + --v->used_as;
> +
> + return 0;
> +}
> +
>  static void handle_vq_kick(struct vhost_work *work)
>  {
>   struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
> @@ -525,15 +585,6 @@ static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v,
>   }
>  }
>  
> -static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v)
> -{
> - struct vhost_iotlb *iotlb = v->iotlb;
> -
> - vhost_vdpa_iotlb_unmap(v, iotlb, 0ULL, 0ULL - 1);
> - kfree(v->iotlb);
> - v->iotlb = NULL;
> -}
> -
>  static int perm_to_iommu_flags(u32 perm)
>  {
>   int flags = 0;
> @@ -745,7 +796,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev 
> *dev, u32 asid,
>   struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);
>   struct vdpa_device *vdpa = v->vdpa;
>   const struct vdpa_config_ops *ops = vdpa->config;
> - struct vhost_iotlb *iotlb = v->iotlb;
> + struct vhost_vdpa_as *as = asid_to_as(v, 0);
> + struct vhost_iotlb *iotlb = &as->iotlb;
>   int r = 0;
>  
>   if (asid != 0)
> @@ -856,6 +908,13 @@ static void vhost_vdpa_set_iova_range(struct vhost_vdpa 
> *v)
>   }
>  }
>  
> +static void vhost_vdpa_cleanup(struct vhost_vdpa *v)
> +{
> + vhost_dev_cleanup(&v->vdev);
> + kfree(v->vdev.vqs);
> + vhost_vdpa_remove_as(v, 0);
> +}
> +
>  static int vhost_vdpa_open(struct inode *inode, struct file *filep)
>  {
>   struct vhost_vdpa *v;
> @@ -886,15 +945,12 @@ static int vhost_vdpa_open(struct inode *inode, struct 
> file *filep)
>   vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
>  vhost_vdpa_process_iotlb_msg);
>  
> - v->iotlb = vhost_iotlb_alloc(0, 0);
> - if (!v->iotlb) {
> - r = -ENOMEM;
> - goto err_init_iotlb;
> - }
> + if (!vhost_vdpa_alloc_as(v, 0))
> + goto err_alloc_as;
>  
>   r = vhost_vdpa_alloc_domain(v);
>   if (r)
> - goto err_alloc_domain;
> + goto err_alloc_as;
>  
>   vhost_vdpa_set_iova_range(v);
>  
> @@ -902,11 +958,8 @@ static int 

Re: [PATCH 11/21] vhost-vdpa: introduce asid based IOTLB

2020-12-29 Thread Eli Cohen
On Wed, Dec 16, 2020 at 02:48:08PM +0800, Jason Wang wrote:
> This patch converts the vhost-vDPA device to support multiple IOTLBs
> tagged via ASID via hlist. This will be used for supporting multiple
> address spaces in the following patches.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vdpa.c | 106 ---
>  1 file changed, 80 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index feb6a58df22d..060d5b5b7e64 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -33,13 +33,21 @@ enum {
>  
>  #define VHOST_VDPA_DEV_MAX (1U << MINORBITS)
>  
> +#define VHOST_VDPA_IOTLB_BUCKETS 16
> +
> +struct vhost_vdpa_as {
> + struct hlist_node hash_link;
> + struct vhost_iotlb iotlb;
> + u32 id;
> +};
> +
>  struct vhost_vdpa {
>   struct vhost_dev vdev;
>   struct iommu_domain *domain;
>   struct vhost_virtqueue *vqs;
>   struct completion completion;
>   struct vdpa_device *vdpa;
> - struct vhost_iotlb *iotlb;
> + struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];
>   struct device dev;
>   struct cdev cdev;
>   atomic_t opened;
> @@ -49,12 +57,64 @@ struct vhost_vdpa {
>   struct eventfd_ctx *config_ctx;
>   int in_batch;
>   struct vdpa_iova_range range;
> + int used_as;
>  };
>  
>  static DEFINE_IDA(vhost_vdpa_ida);
>  
>  static dev_t vhost_vdpa_major;
>  
> +static struct vhost_vdpa_as *asid_to_as(struct vhost_vdpa *v, u32 asid)
> +{
> + struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];
> + struct vhost_vdpa_as *as;
> +
> + hlist_for_each_entry(as, head, hash_link)
> + if (as->id == asid)
> + return as;
> +
> + return NULL;
> +}
> +
> +static struct vhost_vdpa_as *vhost_vdpa_alloc_as(struct vhost_vdpa *v, u32 
> asid)
> +{
> + struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];
> + struct vhost_vdpa_as *as;
> +
> + if (asid_to_as(v, asid))
> + return NULL;
> +
> + as = kmalloc(sizeof(*as), GFP_KERNEL);

kzalloc()? See comment below.

> + if (!as)
> + return NULL;
> +
> + vhost_iotlb_init(&as->iotlb, 0, 0);
> + as->id = asid;
> + hlist_add_head(&as->hash_link, head);
> + ++v->used_as;

Although you eventually ended up removing used_as, this is a bug since
you're incrementing a random value. Maybe it's better to be on the safe
side and use kzalloc() for as above.
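
To make the suggestion concrete, here is a minimal userspace sketch of a
bucketed ASID table with zeroed allocation. It is an illustration only:
struct as_entry, struct as_table and the as_*() helpers are hypothetical
names, calloc() stands in for kzalloc(), a plain singly linked list stands
in for the hlist, and unlike the patch the remove helper does not
special-case ASID 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define AS_BUCKETS 16	/* mirrors VHOST_VDPA_IOTLB_BUCKETS in the patch */

/* Hypothetical userspace stand-in for struct vhost_vdpa_as. */
struct as_entry {
	struct as_entry *next;
	uint32_t id;
};

struct as_table {
	struct as_entry *bucket[AS_BUCKETS];
	int used;
};

static struct as_entry *as_lookup(const struct as_table *t, uint32_t asid)
{
	struct as_entry *e;

	for (e = t->bucket[asid % AS_BUCKETS]; e; e = e->next)
		if (e->id == asid)
			return e;
	return NULL;
}

/* Returns NULL if the ASID already exists or the allocation fails. */
static struct as_entry *as_alloc(struct as_table *t, uint32_t asid)
{
	struct as_entry *e;

	if (as_lookup(t, asid))
		return NULL;

	/* calloc() zeroes every field, as kzalloc() would in the kernel. */
	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;

	e->id = asid;
	e->next = t->bucket[asid % AS_BUCKETS];
	t->bucket[asid % AS_BUCKETS] = e;
	t->used++;
	return e;
}

/* Returns 0 on success, -1 if the ASID is not present. */
static int as_remove(struct as_table *t, uint32_t asid)
{
	struct as_entry **pp = &t->bucket[asid % AS_BUCKETS];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == asid) {
			struct as_entry *e = *pp;

			*pp = e->next;
			free(e);
			t->used--;
			assert(t->used >= 0);
			return 0;
		}
	}
	return -1;
}
```

The table itself must also start out zeroed (for example
struct as_table t = { 0 };) for the used counter to be meaningful.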

> +
> + return as;
> +}
> +
> +static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
> +{
> + struct vhost_vdpa_as *as = asid_to_as(v, asid);
> +
> + /* Remove default address space is not allowed */
> + if (asid == 0)
> + return -EINVAL;
> +
> + if (!as)
> + return -EINVAL;
> +
> + hlist_del(&as->hash_link);
> + vhost_iotlb_reset(&as->iotlb);
> + kfree(as);
> + --v->used_as;
> +
> + return 0;
> +}
> +
>  static void handle_vq_kick(struct vhost_work *work)
>  {
>   struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
> @@ -525,15 +585,6 @@ static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v,
>   }
>  }
>  
> -static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v)
> -{
> - struct vhost_iotlb *iotlb = v->iotlb;
> -
> - vhost_vdpa_iotlb_unmap(v, iotlb, 0ULL, 0ULL - 1);
> - kfree(v->iotlb);
> - v->iotlb = NULL;
> -}
> -
>  static int perm_to_iommu_flags(u32 perm)
>  {
>   int flags = 0;
> @@ -745,7 +796,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev 
> *dev, u32 asid,
>   struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);
>   struct vdpa_device *vdpa = v->vdpa;
>   const struct vdpa_config_ops *ops = vdpa->config;
> - struct vhost_iotlb *iotlb = v->iotlb;
> + struct vhost_vdpa_as *as = asid_to_as(v, 0);
> + struct vhost_iotlb *iotlb = &as->iotlb;
>   int r = 0;
>  
>   if (asid != 0)
> @@ -856,6 +908,13 @@ static void vhost_vdpa_set_iova_range(struct vhost_vdpa 
> *v)
>   }
>  }
>  
> +static void vhost_vdpa_cleanup(struct vhost_vdpa *v)
> +{
> + vhost_dev_cleanup(&v->vdev);
> + kfree(v->vdev.vqs);
> + vhost_vdpa_remove_as(v, 0);
> +}
> +
>  static int vhost_vdpa_open(struct inode *inode, struct file *filep)
>  {
>   struct vhost_vdpa *v;
> @@ -886,15 +945,12 @@ static int vhost_vdpa_open(struct inode *inode, struct 
> file *filep)
>   vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
>  vhost_vdpa_process_iotlb_msg);
>  
> - v->iotlb = vhost_iotlb_alloc(0, 0);
> - if (!v->iotlb) {
> - r = -ENOMEM;
> - goto err_init_iotlb;
> - }
> + if (!vhost_vdpa_alloc_as(v, 0))
> + goto err_alloc_as;
>  
>   r = vhost_vdpa_alloc_domain(v);
>   if (r)
> - goto err_alloc_domain;
> + goto err_alloc_as;
>  
>   vhost_vdpa_set_iova_range(v);
>  
> @@ -902,11 +958,8 @@ static int 

Re: [PATCH 11/21] vhost-vdpa: introduce asid based IOTLB

2020-12-29 Thread Eli Cohen
On Wed, Dec 16, 2020 at 02:48:08PM +0800, Jason Wang wrote:
> This patch converts the vhost-vDPA device to support multiple IOTLBs
> tagged via ASID via hlist. This will be used for supporting multiple
> address spaces in the following patches.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vdpa.c | 106 ---
>  1 file changed, 80 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index feb6a58df22d..060d5b5b7e64 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -33,13 +33,21 @@ enum {
>  
>  #define VHOST_VDPA_DEV_MAX (1U << MINORBITS)
>  
> +#define VHOST_VDPA_IOTLB_BUCKETS 16
> +
> +struct vhost_vdpa_as {
> + struct hlist_node hash_link;
> + struct vhost_iotlb iotlb;
> + u32 id;
> +};
> +
>  struct vhost_vdpa {
>   struct vhost_dev vdev;
>   struct iommu_domain *domain;
>   struct vhost_virtqueue *vqs;
>   struct completion completion;
>   struct vdpa_device *vdpa;
> - struct vhost_iotlb *iotlb;
> + struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];
>   struct device dev;
>   struct cdev cdev;
>   atomic_t opened;
> @@ -49,12 +57,64 @@ struct vhost_vdpa {
>   struct eventfd_ctx *config_ctx;
>   int in_batch;
>   struct vdpa_iova_range range;
> + int used_as;

This is not really used: it is not used in this patch and is removed later.

>  };
>  
>  static DEFINE_IDA(vhost_vdpa_ida);
>  
>  static dev_t vhost_vdpa_major;
>  
> +static struct vhost_vdpa_as *asid_to_as(struct vhost_vdpa *v, u32 asid)
> +{
> + struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];
> + struct vhost_vdpa_as *as;
> +
> + hlist_for_each_entry(as, head, hash_link)
> + if (as->id == asid)
> + return as;
> +
> + return NULL;
> +}
> +
> +static struct vhost_vdpa_as *vhost_vdpa_alloc_as(struct vhost_vdpa *v, u32 
> asid)
> +{
> + struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];
> + struct vhost_vdpa_as *as;
> +
> + if (asid_to_as(v, asid))
> + return NULL;
> +
> + as = kmalloc(sizeof(*as), GFP_KERNEL);
> + if (!as)
> + return NULL;
> +
> + vhost_iotlb_init(&as->iotlb, 0, 0);
> + as->id = asid;
> + hlist_add_head(&as->hash_link, head);
> + ++v->used_as;
> +
> + return as;
> +}
> +
> +static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
> +{
> + struct vhost_vdpa_as *as = asid_to_as(v, asid);
> +
> + /* Remove default address space is not allowed */
> + if (asid == 0)
> + return -EINVAL;
> +
> + if (!as)
> + return -EINVAL;
> +
> + hlist_del(&as->hash_link);
> + vhost_iotlb_reset(&as->iotlb);
> + kfree(as);
> + --v->used_as;
> +
> + return 0;
> +}
> +
>  static void handle_vq_kick(struct vhost_work *work)
>  {
>   struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
> @@ -525,15 +585,6 @@ static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v,
>   }
>  }
>  
> -static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v)
> -{
> - struct vhost_iotlb *iotlb = v->iotlb;
> -
> - vhost_vdpa_iotlb_unmap(v, iotlb, 0ULL, 0ULL - 1);
> - kfree(v->iotlb);
> - v->iotlb = NULL;
> -}
> -
>  static int perm_to_iommu_flags(u32 perm)
>  {
>   int flags = 0;
> @@ -745,7 +796,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev 
> *dev, u32 asid,
>   struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);
>   struct vdpa_device *vdpa = v->vdpa;
>   const struct vdpa_config_ops *ops = vdpa->config;
> - struct vhost_iotlb *iotlb = v->iotlb;
> + struct vhost_vdpa_as *as = asid_to_as(v, 0);
> + struct vhost_iotlb *iotlb = &as->iotlb;
>   int r = 0;
>  
>   if (asid != 0)
> @@ -856,6 +908,13 @@ static void vhost_vdpa_set_iova_range(struct vhost_vdpa 
> *v)
>   }
>  }
>  
> +static void vhost_vdpa_cleanup(struct vhost_vdpa *v)
> +{
> + vhost_dev_cleanup(&v->vdev);
> + kfree(v->vdev.vqs);
> + vhost_vdpa_remove_as(v, 0);
> +}
> +
>  static int vhost_vdpa_open(struct inode *inode, struct file *filep)
>  {
>   struct vhost_vdpa *v;
> @@ -886,15 +945,12 @@ static int vhost_vdpa_open(struct inode *inode, struct 
> file *filep)
>   vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
>  vhost_vdpa_process_iotlb_msg);
>  
> - v->iotlb = vhost_iotlb_alloc(0, 0);
> - if (!v->iotlb) {
> - r = -ENOMEM;
> - goto err_init_iotlb;
> - }
> + if (!vhost_vdpa_alloc_as(v, 0))
> + goto err_alloc_as;
>  
>   r = vhost_vdpa_alloc_domain(v);
>   if (r)
> - goto err_alloc_domain;
> + goto err_alloc_as;
>  
>   vhost_vdpa_set_iova_range(v);
>  
> @@ -902,11 +958,8 @@ static int vhost_vdpa_open(struct inode *inode, struct 
> file *filep)
>  
>   return 0;
>  
> -err_alloc_domain:
> - vhost_vdpa_iotlb_free(v);
> 

Re: [PATCH 10/21] vhost: support ASID in IOTLB API

2020-12-29 Thread Eli Cohen
On Wed, Dec 16, 2020 at 02:48:07PM +0800, Jason Wang wrote:
> This patch allows userspace to send ASID-based IOTLB messages to
> vhost. The idea is to use the reserved u32 field in the existing V2
> IOTLB message. The vhost device should advertise this capability via
> the VHOST_BACKEND_F_IOTLB_ASID backend feature.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vdpa.c |  5 -
>  drivers/vhost/vhost.c| 23 ++-
>  drivers/vhost/vhost.h|  4 ++--
>  include/uapi/linux/vhost_types.h |  5 -
>  4 files changed, 28 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 03a9b3311c6c..feb6a58df22d 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -739,7 +739,7 @@ static int vhost_vdpa_process_iotlb_update(struct 
> vhost_vdpa *v,
>   return ret;
>  }
>  
> -static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
> +static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u32 asid,
>   struct vhost_iotlb_msg *msg)
>  {
>   struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);
> @@ -748,6 +748,9 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev 
> *dev,
>   struct vhost_iotlb *iotlb = v->iotlb;
>   int r = 0;
>  
> + if (asid != 0)
> + return -EINVAL;
> +
>   r = vhost_dev_check_owner(dev);
>   if (r)
>   return r;
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a262e12c6dc2..7477b724c29b 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -468,7 +468,7 @@ void vhost_dev_init(struct vhost_dev *dev,
>   struct vhost_virtqueue **vqs, int nvqs,
>   int iov_limit, int weight, int byte_weight,
>   bool use_worker,
> - int (*msg_handler)(struct vhost_dev *dev,
> + int (*msg_handler)(struct vhost_dev *dev, u32 asid,
>  struct vhost_iotlb_msg *msg))
>  {
>   struct vhost_virtqueue *vq;
> @@ -1084,11 +1084,14 @@ static bool umem_access_ok(u64 uaddr, u64 size, int 
> access)
>   return true;
>  }
>  
> -static int vhost_process_iotlb_msg(struct vhost_dev *dev,
> +static int vhost_process_iotlb_msg(struct vhost_dev *dev, u16 asid,
>  struct vhost_iotlb_msg *msg)
>  {
>   int ret = 0;
>  
> + if (asid != 0)
> + return -EINVAL;
> +
>   mutex_lock(&dev->mutex);
>   vhost_dev_lock_vqs(dev);
>   switch (msg->type) {
> @@ -1135,6 +1138,7 @@ ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
>   struct vhost_iotlb_msg msg;
>   size_t offset;
>   int type, ret;
> + u16 asid = 0;

You assume asid occupies just 16 bits. So maybe you should reserve the
other 16 bits for future extension:

struct vhost_msg_v2 {
__u32 type;
-   __u32 reserved;
+   __u16 asid;
+   __u16 reserved;
union {

Moreover, maybe this should be reflected in previous patches that use
the asid:

-static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb 
*iotlb)
+static int mlx5_vdpa_set_map(struct vdpa_device *vdev, u16 asid,
+struct vhost_iotlb *iotlb)

-static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
+static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u16 asid,
struct vhost_iotlb_msg *msg)

etc.
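
As a rough illustration of the layout point, splitting the reserved word
into two 16-bit fields leaves the header size and the payload offset
unchanged. The struct names and the 64-byte payload array below are
stand-ins chosen for this sketch, not the real uapi definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the current header: type followed by a 32-bit reserved word. */
struct msg_v2_orig {
	uint32_t type;
	uint32_t reserved;
	uint8_t payload[64];	/* placeholder for the message union */
};

/* Suggested split: a 16-bit asid plus 16 bits kept in reserve. */
struct msg_v2_split {
	uint32_t type;
	uint16_t asid;
	uint16_t reserved;
	uint8_t payload[64];	/* placeholder for the message union */
};

/* The split must not change the overall size or move the payload. */
static_assert(sizeof(struct msg_v2_split) == sizeof(struct msg_v2_orig),
	      "header split changed the message size");
static_assert(offsetof(struct msg_v2_split, payload) ==
	      offsetof(struct msg_v2_orig, payload),
	      "header split moved the payload");
```

The compile-time checks only cover the layout; how the asid bytes map onto
the old reserved word still depends on endianness.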

>  
>   ret = copy_from_iter(&type, sizeof(type), from);
>   if (ret != sizeof(type)) {
> @@ -1150,7 +1154,16 @@ ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
>   offset = offsetof(struct vhost_msg, iotlb) - sizeof(int);
>   break;
>   case VHOST_IOTLB_MSG_V2:
> - offset = sizeof(__u32);
> + if (vhost_backend_has_feature(dev->vqs[0],
> +   VHOST_BACKEND_F_IOTLB_ASID)) {
> + ret = copy_from_iter(&asid, sizeof(asid), from);
> + if (ret != sizeof(asid)) {
> + ret = -EINVAL;
> + goto done;
> + }
> + offset = sizeof(__u16);
> + } else
> + offset = sizeof(__u32);
>   break;
>   default:
>   ret = -EINVAL;
> @@ -1165,9 +1178,9 @@ ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
>   }
>  
>   if (dev->msg_handler)
> - ret = dev->msg_handler(dev, &msg);
> + ret = dev->msg_handler(dev, asid, &msg);
>   else
> - ret = vhost_process_iotlb_msg(dev, &msg);
> + ret = vhost_process_iotlb_msg(dev, asid, &msg);
>   if (ret) {
>   ret = -EFAULT;
>   goto done;
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index b063324c7669..19753a90875c 100644
> --- a/drivers/vhost/vhost.h
> +++ 

Re: [PATCH 07/21] vdpa: multiple address spaces support

2020-12-28 Thread Eli Cohen
On Wed, Dec 16, 2020 at 02:48:04PM +0800, Jason Wang wrote:
> This patch introduces multiple address space support for vDPA
> devices. The idea is to identify a specific address space via a
> dedicated identifier - ASID.
> 
> During vDPA device allocation, the vDPA device driver needs to report the
> number of address spaces supported by the device; the DMA mapping
> ops of the vDPA device then need to be extended to support ASID.
> 
> This helps to isolate the environments for virtqueues that will not
> be assigned directly. E.g. in the case of virtio-net, the control
> virtqueue will not be assigned directly to the guest.
> 
> As a start, simply claim 1 virtqueue group and 1 address space for
> all vDPA devices. vhost-vDPA will simply reject a device with
> more than 1 virtqueue group or address space.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vdpa/ifcvf/ifcvf_main.c   |  2 +-
>  drivers/vdpa/mlx5/net/mlx5_vnet.c |  5 +++--
>  drivers/vdpa/vdpa.c   |  4 +++-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c  | 10 ++
>  drivers/vhost/vdpa.c  | 14 +-
>  include/linux/vdpa.h  | 23 ---
>  6 files changed, 38 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
> index c629f4fcc738..8a43f562b169 100644
> --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> @@ -445,7 +445,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct 
> pci_device_id *id)
>  
>   adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
>   dev, _vdpa_ops,
> - IFCVF_MAX_QUEUE_PAIRS * 2, 1);
> + IFCVF_MAX_QUEUE_PAIRS * 2, 1, 1);
>  
>   if (adapter == NULL) {
>   IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 719b52fcc547..7aaf0a4ee80d 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1804,7 +1804,8 @@ static u32 mlx5_vdpa_get_generation(struct vdpa_device 
> *vdev)
>   return mvdev->generation;
>  }
>  
> -static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb 
> *iotlb)
> +static int mlx5_vdpa_set_map(struct vdpa_device *vdev, unsigned int asid,
> +  struct vhost_iotlb *iotlb)
>  {
>   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>   struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> @@ -1947,7 +1948,7 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev)
>   max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
>  
>   ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
> mdev->device, _vdpa_ops,
> -  2 * mlx5_vdpa_max_qps(max_vqs), 1);
> +  2 * mlx5_vdpa_max_qps(max_vqs), 1, 1);
>   if (IS_ERR(ndev))
>   return ndev;
>  
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index 46399746ec7c..05195fa7865d 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -63,6 +63,7 @@ static void vdpa_release_dev(struct device *d)
>   * @config: the bus operations that is supported by this device
>   * @nvqs: number of virtqueues supported by this device
>   * @ngroups: number of groups supported by this device
> + * @nas: number of address spaces supported by this device
>   * @size: size of the parent structure that contains private data
>   *
>   * Driver should use vdpa_alloc_device() wrapper macro instead of
> @@ -74,7 +75,7 @@ static void vdpa_release_dev(struct device *d)
>  struct vdpa_device *__vdpa_alloc_device(struct device *parent,
>   const struct vdpa_config_ops *config,
>   int nvqs, unsigned int ngroups,
> - size_t size)
> + unsigned int nas, size_t size)
>  {
>   struct vdpa_device *vdev;
>   int err = -EINVAL;
> @@ -102,6 +103,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device 
> *parent,
>   vdev->features_valid = false;
>   vdev->nvqs = nvqs;
>   vdev->ngroups = ngroups;
> + vdev->nas = nas;
>  
>   err = dev_set_name(>dev, "vdpa%u", vdev->index);
>   if (err)
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c 
> b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 5d554b3cd152..140de452 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -359,7 +359,7 @@ static struct vdpasim *vdpasim_create(void)
>   ops = _net_config_ops;
>  
>   vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops,
> - VDPASIM_VQ_NUM, 1);
> + VDPASIM_VQ_NUM, 1, 1);
>   if (!vdpasim)
>   goto err_alloc;
>  
> @@ -606,7 +606,7 @@ static 

Re: [PATCH 00/21] Control VQ support in vDPA

2020-12-16 Thread Eli Cohen
On Wed, Dec 16, 2020 at 02:47:57PM +0800, Jason Wang wrote:

Hi Jason,
I saw the patchset and will start reviewing it on Dec 27. I am out of
the office next week.

> Hi All:
> 
> This series tries to add the support for control virtqueue in vDPA.
> 
> Control virtqueue is used by networking device for accepting various
> commands from the driver. It's a must to support multiqueue and other
> configurations.
> 
> When used by vhost-vDPA bus driver for VM, the control virtqueue
> should be shadowed via userspace VMM (Qemu) instead of being assigned
> directly to Guest. This is because Qemu needs to know the device state
> in order to start and stop device correctly (e.g for Live Migration).
> 
> This requires isolating the memory mapping for the control virtqueue
> presented by vhost-vDPA to prevent the guest from accessing it directly.
> 
> To achieve this, vDPA introduces two new abstractions:
> 
> - address space: identified through an address space id (ASID), with a
>  set of memory mappings maintained for it
> - virtqueue group: the minimal set of virtqueues that must share an
>  address space
> 
> Device needs to advertise the following attributes to vDPA:
> 
> - the number of address spaces supported in the device
> - the number of virtqueue groups supported in the device
> - the mappings from a specific virtqueue to its virtqueue groups
> 
> The mappings from virtqueue to virtqueue groups is fixed and defined
> by vDPA device driver. E.g:
> 
> - For the device that has hardware ASID support, it can simply
>   advertise a per virtqueue virtqueue group.
> - For the device that does not have hardware ASID support, it can
>   simply advertise a single virtqueue group that contains all
>   virtqueues. Or if it wants a software emulated control virtqueue, it
>   can advertise two virtqueue groups, one is for cvq, another is for
>   the rest virtqueues.
> 
> vDPA also allows changing the association between virtqueue group and
> address space. So in the case of control virtqueue, userspace
> VMM(Qemu) may use a dedicated address space for the control virtqueue
> group to isolate the memory mapping.
> 
> The vhost/vhost-vDPA is also extended for the userspace to:
> 
> - query the number of virtqueue groups and address spaces supported by
>   the device
> - query the virtqueue group for a specific virtqueue
> - associate a virtqueue group with an address space
> - send ASID based IOTLB commands
> 
> This will help userspace VMM(Qemu) to detect whether the control vq
> could be supported and isolate memory mappings of control virtqueue
> from the others.
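
A minimal sketch of the two-level mapping described above, written as
plain userspace C rather than kernel code. The struct, the function
names and the queue/group counts are all hypothetical and chosen only
for illustration; they are not the API introduced by this series. The
virtqueue-to-group table is fixed by the device driver, while the
group-to-ASID table is what userspace is allowed to change.

```c
#include <assert.h>
#include <stddef.h>

#define NVQS    3   /* hypothetical device: rx, tx and control virtqueue */
#define NGROUPS 2   /* hypothetical: data vqs in group 0, cvq in group 1 */
#define NAS     2   /* hypothetical number of address spaces */

struct dev_model {
    unsigned int vq_group[NVQS];      /* fixed, defined by the device driver */
    unsigned int group_asid[NGROUPS]; /* changeable by the userspace VMM */
};

/* Associate a virtqueue group with an address space; -1 on a bad index. */
static int set_group_asid(struct dev_model *d, unsigned int group,
                          unsigned int asid)
{
    assert(d != NULL);
    if (group >= NGROUPS || asid >= NAS)
        return -1;
    d->group_asid[group] = asid;
    return 0;
}

/* Resolve the address space a given virtqueue currently uses. */
static unsigned int vq_asid(const struct dev_model *d, unsigned int vq)
{
    assert(d != NULL && vq < NVQS);
    return d->group_asid[d->vq_group[vq]];
}
```

With vq_group = {0, 0, 1}, moving group 1 to ASID 1 changes the address
space seen by the control virtqueue only; the data virtqueues stay in
ASID 0.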
> 
> To demonstrate the usage, vDPA simulator is extended to support
> setting MAC address via an emulated control virtqueue.
> 
> Please review.
> 
> Changes since RFC:
> 
> - tweak vhost uAPI documentation
> - switch to use device specific IOTLB really in patch 4
> - tweak the commit log
> - fix that ASID in vhost is claimed to be 32 bit but is actually
>   16 bit
> - fix use after free when using ASID with IOTLB batching requests
> - switch to use Stefano's patch for having separated iov
> - remove unused "used_as" variable
> - fix the iotlb/asid checking in vhost_vdpa_unmap()
> 
> Thanks
> 
> Jason Wang (20):
>   vhost: move the backend feature bits to vhost_types.h
>   virtio-vdpa: don't set callback if virtio doesn't need it
>   vhost-vdpa: passing iotlb to IOMMU mapping helpers
>   vhost-vdpa: switch to use vhost-vdpa specific IOTLB
>   vdpa: add the missing comment for nvqs in struct vdpa_device
>   vdpa: introduce virtqueue groups
>   vdpa: multiple address spaces support
>   vdpa: introduce config operations for associating ASID to a virtqueue
> group
>   vhost_iotlb: split out IOTLB initialization
>   vhost: support ASID in IOTLB API
>   vhost-vdpa: introduce asid based IOTLB
>   vhost-vdpa: introduce uAPI to get the number of virtqueue groups
>   vhost-vdpa: introduce uAPI to get the number of address spaces
>   vhost-vdpa: uAPI to get virtqueue group id
>   vhost-vdpa: introduce uAPI to set group ASID
>   vhost-vdpa: support ASID based IOTLB API
>   vdpa_sim: advertise VIRTIO_NET_F_MTU
>   vdpa_sim: factor out buffer completion logic
>   vdpa_sim: filter destination mac address
>   vdpasim: control virtqueue support
> 
> Stefano Garzarella (1):
>   vdpa_sim: split vdpasim_virtqueue's iov field in out_iov and in_iov
> 
>  drivers/vdpa/ifcvf/ifcvf_main.c   |   9 +-
>  drivers/vdpa/mlx5/net/mlx5_vnet.c |  11 +-
>  drivers/vdpa/vdpa.c   |   8 +-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c  | 292 --
>  drivers/vhost/iotlb.c |  23 ++-
>  drivers/vhost/vdpa.c  | 246 -
>  drivers/vhost/vhost.c |  23 ++-
>  drivers/vhost/vhost.h |   4 +-
>  drivers/virtio/virtio_vdpa.c  |   2 +-
>  include/linux/vdpa.h  |  42 -
>  include/linux/vhost_iotlb.h   |   2 +
>  include/uapi/linux/vhost.h|  25 ++-
>  

[PATCH v1] vdpa/mlx5: Use write memory barrier after updating CQ index

2020-12-09 Thread Eli Cohen
Make sure to put a DMA write memory barrier after updating the CQ
consumer index so the hardware knows that there are available CQE slots
in the queue.

Failure to do this can cause the RX doorbell record to be updated
before the CQ consumer index, resulting in CQ overrun.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
V0 -> V1
Use dma_wmb() instead of wmb()

 drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index db87abc3cb60..43b0069ff8b1 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -479,6 +479,11 @@ static int mlx5_vdpa_poll_one(struct mlx5_vdpa_cq *vcq)
 static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue *mvq, int num)
 {
 	mlx5_cq_set_ci(&mvq->cq.mcq);
+
+	/* make sure CQ consumer update is visible to the hardware before updating
+	 * RX doorbell record.
+	 */
+	dma_wmb();
 	rx_post(&mvq->vqqp, num);
 	if (mvq->event_cb.callback)
 		mvq->event_cb.callback(mvq->event_cb.private);
-- 
2.27.0



Re: [PATCH] vdpa/mlx5: Use write memory barrier after updating CQ index

2020-12-09 Thread Eli Cohen
On Wed, Dec 09, 2020 at 03:05:42AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 09, 2020 at 08:58:46AM +0200, Eli Cohen wrote:
> > On Wed, Dec 09, 2020 at 01:46:22AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Dec 09, 2020 at 08:02:30AM +0200, Eli Cohen wrote:
> > > > On Tue, Dec 08, 2020 at 04:45:04PM -0500, Michael S. Tsirkin wrote:
> > > > > On Sun, Dec 06, 2020 at 12:57:19PM +0200, Eli Cohen wrote:
> > > > > > Make sure to put write memory barrier after updating CQ consumer 
> > > > > > index
> > > > > > so the hardware knows that there are available CQE slots in the 
> > > > > > queue.
> > > > > > 
> > > > > > Failure to do this can cause the update of the RX doorbell record 
> > > > > > to get
> > > > > > updated before the CQ consumer index resulting in CQ overrun.
> > > > > > 
> > > > > > Change-Id: Ib0ae4c118cce524c9f492b32569179f3c1f04cc1
> > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > > > > devices")
> > > > > > Signed-off-by: Eli Cohen 
> > > > > 
> > > > > Aren't both memory writes?
> > > > 
> > > > Not sure what exactly you mean here.
> > > 
> > > Both updates are CPU writes into RAM that hardware then reads
> > > using DMA.
> > > 
> > 
> > You mean why I did not put a memory barrier right after updating the
> > receive doorbell record?
> 
> Sorry about being unclear.  I just tried to give justification for why
> dma_wmb seems more appropriate than wmb here. If you need to
> order memory writes wrt writes to card, that is different, but generally
> writeX and friends will handle the ordering for you, except when
> using relaxed memory mappings - then wmb is generally necessary.
> 

Bear in mind, we're writing to memory (not I/O memory). In this case, we
want this write to be visible to the DMA device.

https://www.kernel.org/doc/Documentation/memory-barriers.txt gives a
similar example using dma_wmb() to make sure updates are visible to
the hardware before notifying the hardware to come and inspect this
memory.
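
As a rough illustration of the ordering in question, here is a userspace
sketch, not the driver code. It assumes a C11 release fence as a
stand-in for dma_wmb(), and the struct and function names are
hypothetical. The point is only the sequence: update the consumer index,
issue the barrier, then update the doorbell record, so that an observer
who sees the new doorbell value also sees the new consumer index.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two memory locations read by the device. */
struct fake_queue {
    uint32_t cq_consumer_index;   /* plays the role of the CQ consumer index */
    _Atomic uint32_t rx_doorbell; /* plays the role of the RX doorbell record */
};

/* Release `num` CQE slots, then advertise `num` new receive buffers. */
static void handle_completions(struct fake_queue *q, uint32_t num)
{
    uint32_t db;

    assert(q != NULL);

    /* 1. Tell the consumer of this memory that CQE slots are free. */
    q->cq_consumer_index += num;

    /* 2. Userspace analogue of dma_wmb(): the store above is ordered
     *    before the doorbell store below.
     */
    atomic_thread_fence(memory_order_release);

    /* 3. Only now publish the new receive buffers. */
    db = atomic_load_explicit(&q->rx_doorbell, memory_order_relaxed);
    atomic_store_explicit(&q->rx_doorbell, db + num, memory_order_relaxed);
}
```

Without step 2 the two stores may become visible in the opposite order,
which is the CQ overrun scenario described in the commit message.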


> > I thought about this and I think it is not required. Suppose it takes a
> > very long time until the hardware can actually see this update. The worst
> > effect would be that the hardware will drop received packets if it
> > sees none available due to the delayed update. Eventually it will see
> > the update and will continue working.
> > 
> > If I put a memory barrier, I add some delay waiting for the CPU to flush
> > the write before continuing. I tried both options while checking packet
> > rate and couldn't see a noticeable difference in either case.
> 
> 
> makes sense.
> 
> > > > > And given that, isn't dma_wmb() sufficient here?
> > > > 
> > > > I agree that dma_wmb() is more appropriate here.
> > > > 
> > > > > 
> > > > > 
> > > > > > ---
> > > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
> > > > > >  1 file changed, 5 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > index 1f4089c6f9d7..295f46eea2a5 100644
> > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > @@ -478,6 +478,11 @@ static int mlx5_vdpa_poll_one(struct 
> > > > > > mlx5_vdpa_cq *vcq)
> > > > > >  static void mlx5_vdpa_handle_completions(struct 
> > > > > > mlx5_vdpa_virtqueue *mvq, int num)
> > > > > >  {
> > > > > > mlx5_cq_set_ci(>cq.mcq);
> > > > > > +
> > > > > > +   /* make sure CQ cosumer update is visible to the hardware 
> > > > > > before updating
> > > > > > +* RX doorbell record.
> > > > > > +*/
> > > > > > +   wmb();
> > > > > > rx_post(>vqqp, num);
> > > > > > if (mvq->event_cb.callback)
> > > > > > mvq->event_cb.callback(mvq->event_cb.private);
> > > > > > -- 
> > > > > > 2.27.0
> > > > > 
> > > 
> 


Re: [PATCH] vdpa/mlx5: Use write memory barrier after updating CQ index

2020-12-08 Thread Eli Cohen
On Wed, Dec 09, 2020 at 01:46:22AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 09, 2020 at 08:02:30AM +0200, Eli Cohen wrote:
> > On Tue, Dec 08, 2020 at 04:45:04PM -0500, Michael S. Tsirkin wrote:
> > > On Sun, Dec 06, 2020 at 12:57:19PM +0200, Eli Cohen wrote:
> > > > Make sure to put write memory barrier after updating CQ consumer index
> > > > so the hardware knows that there are available CQE slots in the queue.
> > > > 
> > > > Failure to do this can cause the update of the RX doorbell record to get
> > > > updated before the CQ consumer index resulting in CQ overrun.
> > > > 
> > > > Change-Id: Ib0ae4c118cce524c9f492b32569179f3c1f04cc1
> > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > > devices")
> > > > Signed-off-by: Eli Cohen 
> > > 
> > > Aren't both memory writes?
> > 
> > Not sure what exactly you mean here.
> 
> Both updates are CPU writes into RAM that hardware then reads
> using DMA.
> 

You mean why I did not put a memory barrier right after updating the
receive doorbell record?

I thought about this and I think it is not required. Suppose it takes a
very long time until the hardware can actually see this update. The worst
effect would be that the hardware will drop received packets if it
sees none available due to the delayed update. Eventually it will see
the update and will continue working.

If I put a memory barrier, I add some delay waiting for the CPU to flush
the write before continuing. I tried both options while checking packet
rate and couldn't see a noticeable difference in either case.

> > > And given that, isn't dma_wmb() sufficient here?
> > 
> > I agree that dma_wmb() is more appropriate here.
> > 
> > > 
> > > 
> > > > ---
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
> > > >  1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 1f4089c6f9d7..295f46eea2a5 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -478,6 +478,11 @@ static int mlx5_vdpa_poll_one(struct mlx5_vdpa_cq 
> > > > *vcq)
> > > >  static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue 
> > > > *mvq, int num)
> > > >  {
> > > > mlx5_cq_set_ci(>cq.mcq);
> > > > +
> > > > +   /* make sure CQ cosumer update is visible to the hardware 
> > > > before updating
> > > > +* RX doorbell record.
> > > > +*/
> > > > +   wmb();
> > > > rx_post(>vqqp, num);
> > > > if (mvq->event_cb.callback)
> > > > mvq->event_cb.callback(mvq->event_cb.private);
> > > > -- 
> > > > 2.27.0
> > > 
> 


Re: [PATCH] vdpa/mlx5: Use write memory barrier after updating CQ index

2020-12-08 Thread Eli Cohen
On Tue, Dec 08, 2020 at 04:45:04PM -0500, Michael S. Tsirkin wrote:
> On Sun, Dec 06, 2020 at 12:57:19PM +0200, Eli Cohen wrote:
> > Make sure to put write memory barrier after updating CQ consumer index
> > so the hardware knows that there are available CQE slots in the queue.
> > 
> > Failure to do this can cause the update of the RX doorbell record to get
> > updated before the CQ consumer index resulting in CQ overrun.
> > 
> > Change-Id: Ib0ae4c118cce524c9f492b32569179f3c1f04cc1
> > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > devices")
> > Signed-off-by: Eli Cohen 
> 
> Aren't both memory writes?

Not sure what exactly you mean here.

> And given that, isn't dma_wmb() sufficient here?

I agree that dma_wmb() is more appropriate here.

> 
> 
> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 1f4089c6f9d7..295f46eea2a5 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -478,6 +478,11 @@ static int mlx5_vdpa_poll_one(struct mlx5_vdpa_cq *vcq)
> >  static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue *mvq, 
> > int num)
> >  {
> > mlx5_cq_set_ci(>cq.mcq);
> > +
> > +   /* make sure CQ cosumer update is visible to the hardware before 
> > updating
> > +* RX doorbell record.
> > +*/
> > +   wmb();
> > rx_post(>vqqp, num);
> > if (mvq->event_cb.callback)
> > mvq->event_cb.callback(mvq->event_cb.private);
> > -- 
> > 2.27.0
> 


Re: [PATCH] vdpa/mlx5: Use write memory barrier after updating CQ index

2020-12-08 Thread Eli Cohen
On Mon, Dec 07, 2020 at 10:51:44AM +0800, Jason Wang wrote:
> 
> On 2020/12/6 下午6:57, Eli Cohen wrote:
> > Make sure to put write memory barrier after updating CQ consumer index
> > so the hardware knows that there are available CQE slots in the queue.
> > 
> > Failure to do this can cause the update of the RX doorbell record to get
> > updated before the CQ consumer index resulting in CQ overrun.
> > 
> > Change-Id: Ib0ae4c118cce524c9f492b32569179f3c1f04cc1

Michael, I left this gerrit ID by mistake. Can you remove it before
merging?

> > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > devices")
> > Signed-off-by: Eli Cohen 
> > ---
> >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
> >   1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 1f4089c6f9d7..295f46eea2a5 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -478,6 +478,11 @@ static int mlx5_vdpa_poll_one(struct mlx5_vdpa_cq *vcq)
> >   static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue *mvq, 
> > int num)
> >   {
> > mlx5_cq_set_ci(>cq.mcq);
> > +
> > +   /* make sure CQ cosumer update is visible to the hardware before 
> > updating
> > +* RX doorbell record.
> > +*/
> > +   wmb();
> > rx_post(>vqqp, num);
> > if (mvq->event_cb.callback)
> > mvq->event_cb.callback(mvq->event_cb.private);
> 
> 
> Acked-by: Jason Wang 
> 
> 


[PATCH] vdpa/mlx5: Use write memory barrier after updating CQ index

2020-12-06 Thread Eli Cohen
Make sure to put write memory barrier after updating CQ consumer index
so the hardware knows that there are available CQE slots in the queue.

Failure to do this can cause the update of the RX doorbell record to get
updated before the CQ consumer index resulting in CQ overrun.

Change-Id: Ib0ae4c118cce524c9f492b32569179f3c1f04cc1
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1f4089c6f9d7..295f46eea2a5 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -478,6 +478,11 @@ static int mlx5_vdpa_poll_one(struct mlx5_vdpa_cq *vcq)
 static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue *mvq, int num)
 {
 	mlx5_cq_set_ci(&mvq->cq.mcq);
+
+	/* make sure CQ consumer update is visible to the hardware before updating
+	 * RX doorbell record.
+	 */
+	wmb();
 	rx_post(&mvq->vqqp, num);
 	if (mvq->event_cb.callback)
 		mvq->event_cb.callback(mvq->event_cb.private);
-- 
2.27.0



Re: [PATCH] vdpa/mlx5: Use random MAC for the vdpa net instance

2020-12-05 Thread Eli Cohen
On Fri, Dec 04, 2020 at 10:53:28AM +0800, Jason Wang wrote:
> 
> On 2020/12/3 下午8:24, Eli Cohen wrote:
> > > > It is mentioned in Parav's patchset that this will be coming in a
> > > > subsequent patch to his vdpa tool.
> > > So I think kernel has two options:
> > > - require a mac when device is created, we supply it to guest
> > Yes, the driver should always set VIRTIO_NET_F_MAC and provide a MAC -
> > either random or whatever configured using the vdpa too.
> 
> 
> A questions here, I think current mlx5 vdpa works for VF only. So I think
> the VF should have a given MAC? If yes, can we use that MAC?
> 
The MAC assigned to the VF is used by the NIC implementation. Both the
regular NIC driver and the VDPA implementation can co-exist, so we can't
use the NIC's MAC for VDPA. We want to steer traffic based on its destination
MAC address to either VDPA or regular NIC.
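
To make the idea concrete, here is a small userspace sketch; it is not
the driver code, and the function names are hypothetical. It generates a
random unicast, locally administered MAC, which is the shape of address
the kernel helper eth_random_addr() produces, and then shows the kind of
destination-MAC comparison that steering between the two consumers
relies on. rand() is used only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAC_LEN 6

/* Fill `addr` with a random unicast, locally administered MAC address. */
static void random_mac(uint8_t addr[MAC_LEN])
{
    assert(addr != NULL);
    for (int i = 0; i < MAC_LEN; i++)
        addr[i] = (uint8_t)(rand() & 0xff); /* illustration only */
    addr[0] &= 0xfe; /* clear the multicast bit: unicast */
    addr[0] |= 0x02; /* set the locally administered bit */
}

/* Hypothetical steering check: is this frame for the vdpa instance? */
static int dest_is_vdpa(const uint8_t dst[MAC_LEN],
                        const uint8_t vdpa_mac[MAC_LEN])
{
    return memcmp(dst, vdpa_mac, MAC_LEN) == 0;
}
```

Because the locally administered bit is set, the generated address
cannot collide with a vendor-assigned (universally administered) MAC
such as the one the VF's regular NIC would normally carry.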


Re: [PATCH] vdpa/mlx5: Use random MAC for the vdpa net instance

2020-12-03 Thread Eli Cohen
On Thu, Dec 03, 2020 at 07:15:37AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 03, 2020 at 02:09:29PM +0200, Eli Cohen wrote:
> > On Thu, Dec 03, 2020 at 05:44:17AM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Dec 03, 2020 at 08:49:28AM +0200, Eli Cohen wrote:
> > > > On Wed, Dec 02, 2020 at 05:00:22PM -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Dec 02, 2020 at 09:48:25PM +0800, Jason Wang wrote:
> > > > > > 
> > > > > > On 2020/12/2 下午5:23, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Dec 02, 2020 at 07:57:14AM +0200, Eli Cohen wrote:
> > > > > > > > On Wed, Dec 02, 2020 at 12:18:36PM +0800, Jason Wang wrote:
> > > > > > > > > On 2020/12/1 下午5:23, Cindy Lu wrote:
> > > > > > > > > > On Mon, Nov 30, 2020 at 11:33 PM Michael S. 
> > > > > > > > > > Tsirkin  wrote:
> > > > > > > > > > > On Mon, Nov 30, 2020 at 06:41:45PM +0800, Cindy Lu wrote:
> > > > > > > > > > > > On Mon, Nov 30, 2020 at 5:33 PM Michael S. 
> > > > > > > > > > > > Tsirkin  wrote:
> > > > > > > > > > > > > On Mon, Nov 30, 2020 at 11:27:59AM +0200, Eli Cohen 
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > On Mon, Nov 30, 2020 at 04:00:51AM -0500, Michael 
> > > > > > > > > > > > > > S. Tsirkin wrote:
> > > > > > > > > > > > > > > On Mon, Nov 30, 2020 at 08:27:46AM +0200, Eli 
> > > > > > > > > > > > > > > Cohen wrote:
> > > > > > > > > > > > > > > > On Sun, Nov 29, 2020 at 03:08:22PM -0500, 
> > > > > > > > > > > > > > > > Michael S. Tsirkin wrote:
> > > > > > > > > > > > > > > > > On Sun, Nov 29, 2020 at 08:43:51AM +0200, Eli 
> > > > > > > > > > > > > > > > > Cohen wrote:
> > > > > > > > > > > > > > > > > > We should not try to use the VF MAC address 
> > > > > > > > > > > > > > > > > > as that is used by the
> > > > > > > > > > > > > > > > > > regular (e.g. mlx5_core) NIC 
> > > > > > > > > > > > > > > > > > implementation. Instead, use a random
> > > > > > > > > > > > > > > > > > generated MAC address.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Suggested by: Cindy Lu
> > > > > > > > > > > > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA 
> > > > > > > > > > > > > > > > > > driver for supported mlx5 devices")
> > > > > > > > > > > > > > > > > > Signed-off-by: Eli Cohen
> > > > > > > > > > > > > > > > > I didn't realise it's possible to use VF in 
> > > > > > > > > > > > > > > > > two ways
> > > > > > > > > > > > > > > > > with and without vdpa.
> > > > > > > > > > > > > > > > Using a VF you can create quite a few 
> > > > > > > > > > > > > > > > resources, e.g. send queues
> > > > > > > > > > > > > > > > recieve queues, virtio_net queues etc. So you 
> > > > > > > > > > > > > > > > can possibly create
> > > > > > > > > > > > > > > > several instances of vdpa net devices and nic 
> > > > > > > > > > > > > > > > net devices.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Could you include a bit more description on 
> > > > > > > > > > > > > > > > > the failure
> > > > > > > > > > > > > > > > > mode?
> > > > > >

Re: [PATCH] vdpa/mlx5: Use random MAC for the vdpa net instance

2020-12-03 Thread Eli Cohen
On Thu, Dec 03, 2020 at 05:44:17AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 03, 2020 at 08:49:28AM +0200, Eli Cohen wrote:
> > On Wed, Dec 02, 2020 at 05:00:22PM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Dec 02, 2020 at 09:48:25PM +0800, Jason Wang wrote:
> > > > 
> > > > On 2020/12/2 下午5:23, Michael S. Tsirkin wrote:
> > > > > On Wed, Dec 02, 2020 at 07:57:14AM +0200, Eli Cohen wrote:
> > > > > > On Wed, Dec 02, 2020 at 12:18:36PM +0800, Jason Wang wrote:
> > > > > > > On 2020/12/1 下午5:23, Cindy Lu wrote:
> > > > > > > > On Mon, Nov 30, 2020 at 11:33 PM Michael S. 
> > > > > > > > Tsirkin  wrote:
> > > > > > > > > On Mon, Nov 30, 2020 at 06:41:45PM +0800, Cindy Lu wrote:
> > > > > > > > > > On Mon, Nov 30, 2020 at 5:33 PM Michael S. 
> > > > > > > > > > Tsirkin  wrote:
> > > > > > > > > > > On Mon, Nov 30, 2020 at 11:27:59AM +0200, Eli Cohen wrote:
> > > > > > > > > > > > On Mon, Nov 30, 2020 at 04:00:51AM -0500, Michael S. 
> > > > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > > > > On Mon, Nov 30, 2020 at 08:27:46AM +0200, Eli Cohen 
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > On Sun, Nov 29, 2020 at 03:08:22PM -0500, Michael 
> > > > > > > > > > > > > > S. Tsirkin wrote:
> > > > > > > > > > > > > > > On Sun, Nov 29, 2020 at 08:43:51AM +0200, Eli 
> > > > > > > > > > > > > > > Cohen wrote:
> > > > > > > > > > > > > > > > We should not try to use the VF MAC address as 
> > > > > > > > > > > > > > > > that is used by the
> > > > > > > > > > > > > > > > regular (e.g. mlx5_core) NIC implementation. 
> > > > > > > > > > > > > > > > Instead, use a random
> > > > > > > > > > > > > > > > generated MAC address.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Suggested by: Cindy Lu
> > > > > > > > > > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA 
> > > > > > > > > > > > > > > > driver for supported mlx5 devices")
> > > > > > > > > > > > > > > > Signed-off-by: Eli Cohen
> > > > > > > > > > > > > > > I didn't realise it's possible to use VF in two 
> > > > > > > > > > > > > > > ways
> > > > > > > > > > > > > > > with and without vdpa.
> > > > > > > > > > > > > > Using a VF you can create quite a few resources, 
> > > > > > > > > > > > > > e.g. send queues
> > > > > > > > > > > > > > recieve queues, virtio_net queues etc. So you can 
> > > > > > > > > > > > > > possibly create
> > > > > > > > > > > > > > several instances of vdpa net devices and nic net 
> > > > > > > > > > > > > > devices.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Could you include a bit more description on the 
> > > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > mode?
> > > > > > > > > > > > > > Well, using the MAC address of the nic vport is 
> > > > > > > > > > > > > > wrong since that is the
> > > > > > > > > > > > > > MAC of the regular NIC implementation of mlx5_core.
> > > > > > > > > > > > > Right but ATM it doesn't coexist with vdpa so what's 
> > > > > > > > > > > > > the problem?
> > > > > > > > > > > > > 
> > > > > > > > > > > > This call is wrong:  mlx5_qu

Re: [PATCH] vdpa/mlx5: Use random MAC for the vdpa net instance

2020-12-02 Thread Eli Cohen
On Wed, Dec 02, 2020 at 05:00:22PM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 02, 2020 at 09:48:25PM +0800, Jason Wang wrote:
> > 
> > On 2020/12/2 下午5:23, Michael S. Tsirkin wrote:
> > > On Wed, Dec 02, 2020 at 07:57:14AM +0200, Eli Cohen wrote:
> > > > On Wed, Dec 02, 2020 at 12:18:36PM +0800, Jason Wang wrote:
> > > > > On 2020/12/1 下午5:23, Cindy Lu wrote:
> > > > > > On Mon, Nov 30, 2020 at 11:33 PM Michael S. 
> > > > > > Tsirkin  wrote:
> > > > > > > On Mon, Nov 30, 2020 at 06:41:45PM +0800, Cindy Lu wrote:
> > > > > > > > On Mon, Nov 30, 2020 at 5:33 PM Michael S. 
> > > > > > > > Tsirkin  wrote:
> > > > > > > > > On Mon, Nov 30, 2020 at 11:27:59AM +0200, Eli Cohen wrote:
> > > > > > > > > > On Mon, Nov 30, 2020 at 04:00:51AM -0500, Michael S. 
> > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > > On Mon, Nov 30, 2020 at 08:27:46AM +0200, Eli Cohen wrote:
> > > > > > > > > > > > On Sun, Nov 29, 2020 at 03:08:22PM -0500, Michael S. 
> > > > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > > > > On Sun, Nov 29, 2020 at 08:43:51AM +0200, Eli Cohen 
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > We should not try to use the VF MAC address as that 
> > > > > > > > > > > > > > is used by the
> > > > > > > > > > > > > > regular (e.g. mlx5_core) NIC implementation. 
> > > > > > > > > > > > > > Instead, use a random
> > > > > > > > > > > > > > generated MAC address.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Suggested by: Cindy Lu
> > > > > > > > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver 
> > > > > > > > > > > > > > for supported mlx5 devices")
> > > > > > > > > > > > > > Signed-off-by: Eli Cohen
> > > > > > > > > > > > > I didn't realise it's possible to use VF in two ways
> > > > > > > > > > > > > with and without vdpa.
> > > > > > > > > > > > Using a VF you can create quite a few resources, e.g. 
> > > > > > > > > > > > send queues
> > > > > > > > > > > > recieve queues, virtio_net queues etc. So you can 
> > > > > > > > > > > > possibly create
> > > > > > > > > > > > several instances of vdpa net devices and nic net 
> > > > > > > > > > > > devices.
> > > > > > > > > > > > 
> > > > > > > > > > > > > Could you include a bit more description on the 
> > > > > > > > > > > > > failure
> > > > > > > > > > > > > mode?
> > > > > > > > > > > > Well, using the MAC address of the nic vport is wrong 
> > > > > > > > > > > > since that is the
> > > > > > > > > > > > MAC of the regular NIC implementation of mlx5_core.
> > > > > > > > > > > Right but ATM it doesn't coexist with vdpa so what's the 
> > > > > > > > > > > problem?
> > > > > > > > > > > 
> > > > > > > > > > This call is wrong:  mlx5_query_nic_vport_mac_address()
> > > > > > > > > > 
> > > > > > > > > > > > > Is switching to a random mac for such an unusual
> > > > > > > > > > > > > configuration really justified?
> > > > > > > > > > > > Since I can't use the NIC's MAC address, I have two 
> > > > > > > > > > > > options:
> > > > > > > > > > > > 1. To get the MAC address as was chosen by the user 
> > > > > > > > > > > > administering the
> > > > > > > > > > > >  NIC. This should invoke the set_config 

Re: [PATCH] vdpa/mlx5: Use random MAC for the vdpa net instance

2020-12-02 Thread Eli Cohen
On Wed, Dec 02, 2020 at 04:23:11AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 02, 2020 at 07:57:14AM +0200, Eli Cohen wrote:
> > On Wed, Dec 02, 2020 at 12:18:36PM +0800, Jason Wang wrote:
> > > 
> > > On 2020/12/1 下午5:23, Cindy Lu wrote:
> > > > On Mon, Nov 30, 2020 at 11:33 PM Michael S. Tsirkin  
> > > > wrote:
> > > > > On Mon, Nov 30, 2020 at 06:41:45PM +0800, Cindy Lu wrote:
> > > > > > On Mon, Nov 30, 2020 at 5:33 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > > On Mon, Nov 30, 2020 at 11:27:59AM +0200, Eli Cohen wrote:
> > > > > > > > On Mon, Nov 30, 2020 at 04:00:51AM -0500, Michael S. Tsirkin 
> > > > > > > > wrote:
> > > > > > > > > On Mon, Nov 30, 2020 at 08:27:46AM +0200, Eli Cohen wrote:
> > > > > > > > > > On Sun, Nov 29, 2020 at 03:08:22PM -0500, Michael S. 
> > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > > On Sun, Nov 29, 2020 at 08:43:51AM +0200, Eli Cohen wrote:
> > > > > > > > > > > > We should not try to use the VF MAC address as that is 
> > > > > > > > > > > > used by the
> > > > > > > > > > > > regular (e.g. mlx5_core) NIC implementation. Instead, 
> > > > > > > > > > > > use a random
> > > > > > > > > > > > generated MAC address.
> > > > > > > > > > > > 
> > > > > > > > > > > > Suggested by: Cindy Lu 
> > > > > > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for 
> > > > > > > > > > > > supported mlx5 devices")
> > > > > > > > > > > > Signed-off-by: Eli Cohen 
> > > > > > > > > > > I didn't realise it's possible to use VF in two ways
> > > > > > > > > > > with and without vdpa.
> > > > > > > > > > Using a VF you can create quite a few resources, e.g. send 
> > > > > > > > > > queues
> > > > > > > > > > > receive queues, virtio_net queues etc. So you can possibly 
> > > > > > > > > > create
> > > > > > > > > > several instances of vdpa net devices and nic net devices.
> > > > > > > > > > 
> > > > > > > > > > > Could you include a bit more description on the failure
> > > > > > > > > > > mode?
> > > > > > > > > > Well, using the MAC address of the nic vport is wrong since 
> > > > > > > > > > that is the
> > > > > > > > > > MAC of the regular NIC implementation of mlx5_core.
> > > > > > > > > Right but ATM it doesn't coexist with vdpa so what's the 
> > > > > > > > > problem?
> > > > > > > > > 
> > > > > > > > This call is wrong:  mlx5_query_nic_vport_mac_address()
> > > > > > > > 
> > > > > > > > > > > Is switching to a random mac for such an unusual
> > > > > > > > > > > configuration really justified?
> > > > > > > > > > Since I can't use the NIC's MAC address, I have two options:
> > > > > > > > > > 1. To get the MAC address as was chosen by the user 
> > > > > > > > > > administering the
> > > > > > > > > > NIC. This should invoke the set_config callback. 
> > > > > > > > > > Unfortunately this
> > > > > > > > > > is not implemented yet.
> > > > > > > > > > 
> > > > > > > > > > 2. Use a random MAC address. This is OK since if (1) is 
> > > > > > > > > > implemented it
> > > > > > > > > > can always override this random configuration.
> > > > > > > > > > 
> > > > > > > > > > > It looks like changing a MAC could break some guests,
> > > > > > > > > > > can it not?
> > > > > > > > > > > 
> > > > > > > > > > No, it will not. The current version of mlx5

Re: [PATCH] vdpa/mlx5: Use random MAC for the vdpa net instance

2020-12-01 Thread Eli Cohen
On Wed, Dec 02, 2020 at 12:18:36PM +0800, Jason Wang wrote:
> 
> On 2020/12/1 下午5:23, Cindy Lu wrote:
> > On Mon, Nov 30, 2020 at 11:33 PM Michael S. Tsirkin  wrote:
> > > On Mon, Nov 30, 2020 at 06:41:45PM +0800, Cindy Lu wrote:
> > > > On Mon, Nov 30, 2020 at 5:33 PM Michael S. Tsirkin  
> > > > wrote:
> > > > > On Mon, Nov 30, 2020 at 11:27:59AM +0200, Eli Cohen wrote:
> > > > > > On Mon, Nov 30, 2020 at 04:00:51AM -0500, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Nov 30, 2020 at 08:27:46AM +0200, Eli Cohen wrote:
> > > > > > > > On Sun, Nov 29, 2020 at 03:08:22PM -0500, Michael S. Tsirkin 
> > > > > > > > wrote:
> > > > > > > > > On Sun, Nov 29, 2020 at 08:43:51AM +0200, Eli Cohen wrote:
> > > > > > > > > > We should not try to use the VF MAC address as that is used 
> > > > > > > > > > by the
> > > > > > > > > > regular (e.g. mlx5_core) NIC implementation. Instead, use a 
> > > > > > > > > > random
> > > > > > > > > > generated MAC address.
> > > > > > > > > > 
> > > > > > > > > > Suggested by: Cindy Lu 
> > > > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for 
> > > > > > > > > > supported mlx5 devices")
> > > > > > > > > > Signed-off-by: Eli Cohen 
> > > > > > > > > I didn't realise it's possible to use VF in two ways
> > > > > > > > > with and without vdpa.
> > > > > > > > Using a VF you can create quite a few resources, e.g. send 
> > > > > > > > queues
> > > > > > > > > receive queues, virtio_net queues etc. So you can possibly 
> > > > > > > > create
> > > > > > > > several instances of vdpa net devices and nic net devices.
> > > > > > > > 
> > > > > > > > > Could you include a bit more description on the failure
> > > > > > > > > mode?
> > > > > > > > Well, using the MAC address of the nic vport is wrong since 
> > > > > > > > that is the
> > > > > > > > MAC of the regular NIC implementation of mlx5_core.
> > > > > > > Right but ATM it doesn't coexist with vdpa so what's the problem?
> > > > > > > 
> > > > > > This call is wrong:  mlx5_query_nic_vport_mac_address()
> > > > > > 
> > > > > > > > > Is switching to a random mac for such an unusual
> > > > > > > > > configuration really justified?
> > > > > > > > Since I can't use the NIC's MAC address, I have two options:
> > > > > > > > 1. To get the MAC address as was chosen by the user 
> > > > > > > > administering the
> > > > > > > > NIC. This should invoke the set_config callback. 
> > > > > > > > Unfortunately this
> > > > > > > > is not implemented yet.
> > > > > > > > 
> > > > > > > > 2. Use a random MAC address. This is OK since if (1) is 
> > > > > > > > implemented it
> > > > > > > > can always override this random configuration.
> > > > > > > > 
> > > > > > > > > It looks like changing a MAC could break some guests,
> > > > > > > > > can it not?
> > > > > > > > > 
> > > > > > > > No, it will not. The current version of mlx5 VDPA does not 
> > > > > > > > allow regular
> > > > > > > > NIC driver and VDPA to co-exist. I have patches ready that 
> > > > > > > > enable that
> > > > > > > > from steering point of view. I will post them here once other 
> > > > > > > > patches on
> > > > > > > > which they depend will be merged.
> > > > > > > > 
> > > > > > > > https://patchwork.ozlabs.org/project/netdev/patch/20201120230339.651609-12-sae...@nvidia.com/
> > > > > > > Could you be more explicit on the following points:
> > > > > > > - which configuration is broken ATM (as in, tw

Re: [PATCH] vdpa/mlx5: Use random MAC for the vdpa net instance

2020-11-30 Thread Eli Cohen
On Mon, Nov 30, 2020 at 04:33:09AM -0500, Michael S. Tsirkin wrote:
> On Mon, Nov 30, 2020 at 11:27:59AM +0200, Eli Cohen wrote:
> > On Mon, Nov 30, 2020 at 04:00:51AM -0500, Michael S. Tsirkin wrote:
> > > On Mon, Nov 30, 2020 at 08:27:46AM +0200, Eli Cohen wrote:
> > > > On Sun, Nov 29, 2020 at 03:08:22PM -0500, Michael S. Tsirkin wrote:
> > > > > On Sun, Nov 29, 2020 at 08:43:51AM +0200, Eli Cohen wrote:
> > > > > > We should not try to use the VF MAC address as that is used by the
> > > > > > regular (e.g. mlx5_core) NIC implementation. Instead, use a random
> > > > > > generated MAC address.
> > > > > > 
> > > > > > Suggested by: Cindy Lu 
> > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > > > > devices")
> > > > > > Signed-off-by: Eli Cohen 
> > > > > 
> > > > > I didn't realise it's possible to use VF in two ways
> > > > > with and without vdpa.
> > > > 
> > > > > Using a VF you can create quite a few resources, e.g. send queues,
> > > > > receive queues, virtio_net queues etc. So you can possibly create
> > > > > several instances of vdpa net devices and nic net devices.
> > > > 
> > > > > Could you include a bit more description on the failure
> > > > > mode?
> > > > 
> > > > Well, using the MAC address of the nic vport is wrong since that is the
> > > > MAC of the regular NIC implementation of mlx5_core.
> > > 
> > > Right but ATM it doesn't coexist with vdpa so what's the problem?
> > > 
> > 
> > This call is wrong:  mlx5_query_nic_vport_mac_address()
> > 
> > > > > Is switching to a random mac for such an unusual
> > > > > configuration really justified?
> > > > 
> > > > Since I can't use the NIC's MAC address, I have two options:
> > > > 1. To get the MAC address as was chosen by the user administering the
> > > >NIC. This should invoke the set_config callback. Unfortunately this
> > > >is not implemented yet.
> > > > 
> > > > 2. Use a random MAC address. This is OK since if (1) is implemented it
> > > >can always override this random configuration.
> > > > 
> > > > > It looks like changing a MAC could break some guests,
> > > > > can it not?
> > > > >
> > > > 
> > > > > No, it will not. The current version of mlx5 VDPA does not allow the regular
> > > > > NIC driver and VDPA to co-exist. I have patches ready that enable that
> > > > > from a steering point of view. I will post them here once the other patches
> > > > > on which they depend are merged.
> > > > 
> > > > https://patchwork.ozlabs.org/project/netdev/patch/20201120230339.651609-12-sae...@nvidia.com/
> > > 
> > > Could you be more explicit on the following points:
> > > - which configuration is broken ATM (as in, two device have identical
> > >   macs? any other issues)?
> > 
> > > The only wrong thing is the call to mlx5_query_nic_vport_mac_address().
> > > It's not breaking anything, yet it is wrong. The random MAC address setting
> > > is required for the steering patches.
> 
> Okay so I'm not sure the Fixes tag at least is appropriate if it's a
> dependency of a new feature.
> 

OK, let's leave it for now. I will push it along with the steering patches.
This means that the VDPA net device instance is created with a MAC of
all zeros, which also means the link is down. You can set a MAC, which will
let the link come up. The vdpa driver will not get a callback, as I
stated, but since the current mode of steering directs all traffic to the
vdpa instance, it will work. In the future this must be fixed.

> > > - why won't device MAC change from guest point of view?
> > > 
> > 
> > It's lack of implementation in qemu as far as I know.
> 
> Sorry not sure I understand. What's not implemented in QEMU?
> 

The vdpa config operation set_config() should be called whenever the MAC is
changed, e.g. when the administrator of the vdpa net device changes the MAC.
This does not happen, which is a bug.

> > > 
> > > > > > ---
> > > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
> > > > > >  1 file changed, 1 insertion(+), 4 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > index 1fa6fcac8299..80d06d958b8b 100644
> > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > @@ -1955,10 +1955,7 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev 
> > > > > > *mdev)
> > > > > > if (err)
> > > > > > goto err_mtu;
> > > > > >  
> > > > > > -   err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac);
> > > > > > -   if (err)
> > > > > > -   goto err_mtu;
> > > > > > -
> > > > > > +   eth_random_addr(config->mac);
> > > > > > mvdev->vdev.dma_dev = mdev->device;
> > > > > > > err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
> > > > > > if (err)
> > > > > > -- 
> > > > > > 2.26.2
> > > > > 
> > > 
> 
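
For readers following the thread, the behaviour described in the reply above (a vdpa net instance created with an all-zero MAC is treated as link down, and setting a MAC lets the link come up) can be modelled in user space. This is only a sketch under stated assumptions: `struct sketch_net_config`, `sketch_link_up()` and `sketch_set_mac()` are hypothetical names invented for illustration and are not part of the mlx5 vdpa driver or of the vdpa config ops.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, simplified device config; not the driver's real structure. */
struct sketch_net_config {
	uint8_t mac[ETH_ALEN];
};

/* As described in the reply above: an all-zero MAC means the link is down. */
static int sketch_link_up(const struct sketch_net_config *cfg)
{
	static const uint8_t zero[ETH_ALEN];

	return memcmp(cfg->mac, zero, ETH_ALEN) != 0;
}

/* Stand-in for an administrator assigning a MAC to the instance. */
static void sketch_set_mac(struct sketch_net_config *cfg,
			   const uint8_t mac[ETH_ALEN])
{
	memcpy(cfg->mac, mac, ETH_ALEN);
}
```

In this model a freshly zeroed config reports link down, and any non-zero MAC passed to `sketch_set_mac()` brings it up; whether the real driver is notified of the change is exactly the missing set_config() callback discussed in the thread.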


Re: [PATCH] vdpa/mlx5: Use random MAC for the vdpa net instance

2020-11-30 Thread Eli Cohen
On Mon, Nov 30, 2020 at 04:00:51AM -0500, Michael S. Tsirkin wrote:
> On Mon, Nov 30, 2020 at 08:27:46AM +0200, Eli Cohen wrote:
> > On Sun, Nov 29, 2020 at 03:08:22PM -0500, Michael S. Tsirkin wrote:
> > > On Sun, Nov 29, 2020 at 08:43:51AM +0200, Eli Cohen wrote:
> > > > We should not try to use the VF MAC address as that is used by the
> > > > regular (e.g. mlx5_core) NIC implementation. Instead, use a random
> > > > generated MAC address.
> > > > 
> > > > Suggested by: Cindy Lu 
> > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > > devices")
> > > > Signed-off-by: Eli Cohen 
> > > 
> > > I didn't realise it's possible to use VF in two ways
> > > with and without vdpa.
> > 
> > > Using a VF you can create quite a few resources, e.g. send queues,
> > > receive queues, virtio_net queues etc. So you can possibly create
> > > several instances of vdpa net devices and nic net devices.
> > 
> > > Could you include a bit more description on the failure
> > > mode?
> > 
> > Well, using the MAC address of the nic vport is wrong since that is the
> > MAC of the regular NIC implementation of mlx5_core.
> 
> Right but ATM it doesn't coexist with vdpa so what's the problem?
> 

This call is wrong:  mlx5_query_nic_vport_mac_address()

> > > Is switching to a random mac for such an unusual
> > > configuration really justified?
> > 
> > Since I can't use the NIC's MAC address, I have two options:
> > 1. To get the MAC address as was chosen by the user administering the
> >NIC. This should invoke the set_config callback. Unfortunately this
> >is not implemented yet.
> > 
> > 2. Use a random MAC address. This is OK since if (1) is implemented it
> >can always override this random configuration.
> > 
> > > It looks like changing a MAC could break some guests,
> > > can it not?
> > >
> > 
> > > No, it will not. The current version of mlx5 VDPA does not allow the regular
> > > NIC driver and VDPA to co-exist. I have patches ready that enable that
> > > from a steering point of view. I will post them here once the other patches
> > > on which they depend are merged.
> > 
> > https://patchwork.ozlabs.org/project/netdev/patch/20201120230339.651609-12-sae...@nvidia.com/
> 
> Could you be more explicit on the following points:
> - which configuration is broken ATM (as in, two device have identical
>   macs? any other issues)?

The only wrong thing is the call to mlx5_query_nic_vport_mac_address().
It's not breaking anything, yet it is wrong. The random MAC address setting
is required for the steering patches.

> - why won't device MAC change from guest point of view?
> 

It's a lack of implementation in QEMU, as far as I know.

> 
> > > > ---
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
> > > >  1 file changed, 1 insertion(+), 4 deletions(-)
> > > > 
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 1fa6fcac8299..80d06d958b8b 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -1955,10 +1955,7 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev 
> > > > *mdev)
> > > > if (err)
> > > > goto err_mtu;
> > > >  
> > > > -   err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac);
> > > > -   if (err)
> > > > -   goto err_mtu;
> > > > -
> > > > +   eth_random_addr(config->mac);
> > > > mvdev->vdev.dma_dev = mdev->device;
> > > > > err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
> > > > if (err)
> > > > -- 
> > > > 2.26.2
> > > 
> 


Re: [PATCH] vdpa/mlx5: Use random MAC for the vdpa net instance

2020-11-29 Thread Eli Cohen
On Sun, Nov 29, 2020 at 03:08:22PM -0500, Michael S. Tsirkin wrote:
> On Sun, Nov 29, 2020 at 08:43:51AM +0200, Eli Cohen wrote:
> > We should not try to use the VF MAC address as that is used by the
> > regular (e.g. mlx5_core) NIC implementation. Instead, use a random
> > generated MAC address.
> > 
> > Suggested by: Cindy Lu 
> > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > devices")
> > Signed-off-by: Eli Cohen 
> 
> I didn't realise it's possible to use VF in two ways
> with and without vdpa.

Using a VF you can create quite a few resources, e.g. send queues,
receive queues, virtio_net queues etc. So you can possibly create
several instances of vdpa net devices and nic net devices.

> Could you include a bit more description on the failure
> mode?

Well, using the MAC address of the nic vport is wrong since that is the
MAC of the regular NIC implementation of mlx5_core.

> Is switching to a random mac for such an unusual
> configuration really justified?

Since I can't use the NIC's MAC address, I have two options:
1. To get the MAC address as was chosen by the user administering the
   NIC. This should invoke the set_config callback. Unfortunately this
   is not implemented yet.

2. Use a random MAC address. This is OK since if (1) is implemented it
   can always override this random configuration.

> It looks like changing a MAC could break some guests,
> can it not?
>

No, it will not. The current version of mlx5 VDPA does not allow the regular
NIC driver and VDPA to co-exist. I have patches ready that enable that
from a steering point of view. I will post them here once the other patches
on which they depend are merged.

https://patchwork.ozlabs.org/project/netdev/patch/20201120230339.651609-12-sae...@nvidia.com/
 
> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 1fa6fcac8299..80d06d958b8b 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -1955,10 +1955,7 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev)
> > if (err)
> > goto err_mtu;
> >  
> > -   err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac);
> > -   if (err)
> > -   goto err_mtu;
> > -
> > +   eth_random_addr(config->mac);
> > mvdev->vdev.dma_dev = mdev->device;
> > err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
> > if (err)
> > -- 
> > 2.26.2
> 


[PATCH] vdpa/mlx5: Use random MAC for the vdpa net instance

2020-11-28 Thread Eli Cohen
We should not try to use the VF MAC address as that is used by the
regular (e.g. mlx5_core) NIC implementation. Instead, use a randomly
generated MAC address.

Suggested-by: Cindy Lu 
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1fa6fcac8299..80d06d958b8b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1955,10 +1955,7 @@ void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev)
 	if (err)
 		goto err_mtu;
 
-	err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac);
-	if (err)
-		goto err_mtu;
-
+	eth_random_addr(config->mac);
 	mvdev->vdev.dma_dev = mdev->device;
 	err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
 	if (err)
-- 
2.26.2
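
As a rough illustration of what the patch above switches to, the following user-space sketch mimics the address properties that the kernel's eth_random_addr() provides: random bytes with the multicast bit cleared and the locally administered bit set. It is an assumption-laden stand-in, not kernel code: `sketch_random_mac()` is a hypothetical name, and `rand()` merely substitutes for the kernel's random source.

```c
#include <stdint.h>
#include <stdlib.h>

#define ETH_ALEN 6

/*
 * Hypothetical user-space stand-in for eth_random_addr().
 * rand() replaces the kernel's random byte source for illustration only.
 */
static void sketch_random_mac(uint8_t addr[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);

	addr[0] &= 0xfe;	/* clear the multicast (group) bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}

/* Returns 1 if every byte of the address is zero, 0 otherwise. */
static int sketch_mac_is_zero(const uint8_t addr[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i++)
		if (addr[i])
			return 0;
	return 1;
}
```

Because the locally administered bit is always set, an address produced this way is never all zeros and never collides with a vendor-assigned (globally unique) MAC such as the one owned by the regular NIC.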



Re: [PATCH] dcookies: Make dcookies depend on CONFIG_OPROFILE

2020-10-28 Thread William Cohen
On 10/27/20 12:54 PM, Linus Torvalds wrote:
> On Tue, Oct 27, 2020 at 1:52 AM Christoph Hellwig  wrote:
>>
>> Is it time to deprecate and eventually remove oprofile while we're at
>> it?
> 
> I think it's well past time.
> 
> I think the user-space "oprofile" program doesn't actually use the
> legacy kernel code any more, and hasn't for a long time.
> 
> But I might be wrong. Adding William Cohen to the cc, since he seems
> to still maintain it to make sure it builds etc.
> 
>  Linus
> 

Hi,

Yes, the current OProfile code uses the existing Linux perf infrastructure and 
doesn't use the old oprofile kernel code.  I have thought about removing that 
old oprofile driver code from the kernel, but have not submitted patches for it. I 
would be fine with eliminating that code from the kernel.

-Will



Re: [PATCH] vdpa/mlx5: Fix error return in map_direct_mr()

2020-10-26 Thread Eli Cohen
On Mon, Oct 26, 2020 at 03:06:37PM +0800, Jing Xiangfeng wrote:
> Fix to return the variable "err" from the error handling case instead
> of "ret".
> 
> Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
> Signed-off-by: Jing Xiangfeng 

Acked-by: Eli Cohen 

> ---
>  drivers/vdpa/mlx5/core/mr.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> index ef1c550f8266..4b6195666c58 100644
> --- a/drivers/vdpa/mlx5/core/mr.c
> +++ b/drivers/vdpa/mlx5/core/mr.c
> @@ -239,7 +239,6 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, 
> struct mlx5_vdpa_direct_mr
>   u64 paend;
>   struct scatterlist *sg;
>   struct device *dma = mvdev->mdev->device;
> - int ret;
>  
>   for (map = vhost_iotlb_itree_first(iotlb, mr->start, mr->end - 1);
>map; map = vhost_iotlb_itree_next(map, start, mr->end - 1)) {
> @@ -277,8 +276,8 @@ static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, 
> struct mlx5_vdpa_direct_mr
>  done:
>   mr->log_size = log_entity_size;
>   mr->nsg = nsg;
> - ret = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
> - if (!ret)
> + err = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
> + if (!err)
> + if (!err)
>   goto err_map;
>  
>   err = create_direct_mr(mvdev, mr);
> -- 
> 2.17.1
> 
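
For context on the error path touched by the quoted patch: dma_map_sg_attrs() reports the number of entries it mapped and returns 0 on failure, so a zero result is not itself an error code. The sketch below shows the general idiom of turning such a zero count into a negative errno before jumping to a cleanup label. It is a generic illustration, not the code from the patch: `fake_map_entries()` and `sketch_map_region()` are hypothetical names, and the choice of -ENOMEM is an assumption.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a mapping helper that, like dma_map_sg_attrs(),
 * returns the number of entries mapped and 0 on failure.
 */
static int fake_map_entries(int nents, int fail)
{
	return fail ? 0 : nents;
}

/* Maps a region and returns 0 on success or a negative errno on failure. */
static int sketch_map_region(int nents, int fail, int *mapped)
{
	int err;
	int n;

	n = fake_map_entries(nents, fail);
	if (!n) {
		/* A zero count carries no error code, so pick one explicitly. */
		err = -ENOMEM;
		goto err_map;
	}

	*mapped = n;
	return 0;

err_map:
	*mapped = 0;
	return err;
}
```

Keeping the mapped count and the error code in separate variables makes it harder to return a stale or zero value from the failure path.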

