Re: [PATCH net-next v2] net/mlx5e: Fix uninitialised struct field moder.comps

2021-04-20 Thread Leon Romanovsky
On Tue, Apr 20, 2021 at 05:28:43PM +0800, Zhu Yanjun wrote:
> On Tue, Apr 20, 2021 at 5:21 PM Leon Romanovsky  wrote:
> >
> > On Tue, Apr 20, 2021 at 03:09:03PM +0800, Zhu Yanjun wrote:
> > > On Tue, Apr 20, 2021 at 3:01 PM wangyunjian  
> > > wrote:
> > > >
> > > > From: Yunjian Wang 
> > > >
> > > > The 'comps' struct field in 'moder' is not being initialized in
> > > > mlx5e_get_def_rx_moderation() and mlx5e_get_def_tx_moderation().
> > > > So initialize 'moder' to zero to avoid the issue.
> >
> > Please state that it is a false alarm and that this patch doesn't fix
> > anything except silencing a broken static-analyzer tool.
> >
> > > >
> > > > Addresses-Coverity: ("Uninitialized scalar variable")
> > > > Signed-off-by: Yunjian Wang 
> > > > ---
> > > > v2: also update mlx5e_get_def_tx_moderation(), which needs the same fix
> > > > ---
> > > >  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
> > > > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > > > index 5db63b9f3b70..17a817b7e539 100644
> > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > > > @@ -4868,7 +4868,7 @@ static bool slow_pci_heuristic(struct 
> > > > mlx5_core_dev *mdev)
> > > >
> > > >  static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8 
> > > > cq_period_mode)
> > > >  {
> > > > -   struct dim_cq_moder moder;
> > >
> > > > +   struct dim_cq_moder moder = {};
> > >
> > > If I remember correctly, some gcc versions will report errors about this "{}".
> >
> > Kernel doesn't support such compilers.
> 
> Are you sure? Why are you so certain?

Yes, I'm sure.

Please read this whole discussion, I hope that it will answer your
question about why I'm so sure.
https://lore.kernel.org/linux-rdma/20200730192026.110246-1-yepeilin...@gmail.com/

> 
> Zhu Yanjun
> 
> >
> > Thanks
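
The empty-brace initializer debated in this thread zero-initializes every member of the struct, which is why it silences Coverity's uninitialized-field report even though no real bug exists. A minimal userspace sketch of the effect (the struct layout here is assumed for illustration and is not the real `struct dim_cq_moder`):

```c
#include <assert.h>

/* Stand-in struct; field names mirror the discussion, but the real
 * kernel layout may differ. */
struct cq_moder {
	unsigned short usec;
	unsigned short pkts;
	unsigned char comps;
};

/* Equivalent of the patched helpers: "= {}" zeroes all members,
 * including 'comps', which a plain "struct cq_moder moder;"
 * declaration leaves indeterminate. */
struct cq_moder get_def_moderation(void)
{
	struct cq_moder moder = {};

	moder.usec = 16; /* illustrative value, not the driver default */
	return moder;
}
```

gcc and clang both accept `= {}` as a GNU extension (it was later standardized in C23); the lore link above covers why the kernel treats compilers that reject it as unsupported.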


Re: [PATCH net-next v2] net/mlx5e: Fix uninitialised struct field moder.comps

2021-04-20 Thread Leon Romanovsky
On Tue, Apr 20, 2021 at 03:09:03PM +0800, Zhu Yanjun wrote:
> On Tue, Apr 20, 2021 at 3:01 PM wangyunjian  wrote:
> >
> > From: Yunjian Wang 
> >
> > The 'comps' struct field in 'moder' is not being initialized in
> > mlx5e_get_def_rx_moderation() and mlx5e_get_def_tx_moderation().
> > So initialize 'moder' to zero to avoid the issue.

Please state that it is a false alarm and that this patch doesn't fix
anything except silencing a broken static-analyzer tool.

> >
> > Addresses-Coverity: ("Uninitialized scalar variable")
> > Signed-off-by: Yunjian Wang 
> > ---
> > v2: also update mlx5e_get_def_tx_moderation(), which needs the same fix
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
> > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index 5db63b9f3b70..17a817b7e539 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -4868,7 +4868,7 @@ static bool slow_pci_heuristic(struct mlx5_core_dev 
> > *mdev)
> >
> >  static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
> >  {
> > -   struct dim_cq_moder moder;
> 
> > +   struct dim_cq_moder moder = {};
> 
> If I remember correctly, some gcc versions will report errors about this "{}".

Kernel doesn't support such compilers.

Thanks


Re: [PATCH] net/rds: Avoid potential use after free in rds_send_remove_from_sock

2021-04-20 Thread Leon Romanovsky
On Tue, Apr 20, 2021 at 12:09:06PM +0300, Leon Romanovsky wrote:
> On Tue, Apr 06, 2021 at 07:09:12PM -0500, Aditya Pakki wrote:
> > In case of rs failure in rds_send_remove_from_sock(), the 'rm' resource
> > is freed and later accessed under a spinlock, causing a potential
> > use-after-free. Set the freed pointer to NULL to avoid undefined behavior.
> > 
> > Signed-off-by: Aditya Pakki 
> > ---
> >  net/rds/message.c | 1 +
> >  net/rds/send.c| 2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> Dave, Jakub
> 
> Please revert this patch, given responses from Eric and Al together
> with this response from Greg here 
> https://lore.kernel.org/lkml/YH5/i7ovsjsmq...@kroah.com

https://lore.kernel.org/lkml/yh5%2fi7ovsjsmq...@kroah.com/

> 
> BTW, I looked at the rds code too and agree with Eric; this patch
> is total garbage.
> 
> Thanks


Re: [PATCH] net/rds: Avoid potential use after free in rds_send_remove_from_sock

2021-04-20 Thread Leon Romanovsky
On Tue, Apr 06, 2021 at 07:09:12PM -0500, Aditya Pakki wrote:
> In case of rs failure in rds_send_remove_from_sock(), the 'rm' resource
> is freed and later accessed under a spinlock, causing a potential
> use-after-free. Set the freed pointer to NULL to avoid undefined behavior.
> 
> Signed-off-by: Aditya Pakki 
> ---
>  net/rds/message.c | 1 +
>  net/rds/send.c| 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)

Dave, Jakub

Please revert this patch, given responses from Eric and Al together
with this response from Greg here 
https://lore.kernel.org/lkml/YH5/i7ovsjsmq...@kroah.com

BTW, I looked at the rds code too and agree with Eric; this patch
is total garbage.

Thanks
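
The objection from Eric and Al (referenced above) boils down to aliasing: assigning NULL to one local pointer after kfree() does not clear any other reference to the same object, so the use-after-free window is untouched. A hypothetical userspace sketch of that point (names are illustrative, not the rds code; a stack object stands in for the freed allocation to keep the demo well-defined):

```c
#include <assert.h>
#include <stddef.h>

struct message { int id; };

/* Nulling 'rm' is a store to one local variable only. 'list_entry',
 * which models a list still holding the message, keeps its old value
 * and would still dangle after a real kfree(rm). */
int alias_survives_nulling(void)
{
	struct message msg = { 42 };
	struct message *rm = &msg;
	struct message *list_entry = rm; /* second reference to the object */

	rm = NULL; /* the pattern the patch adds */
	return list_entry != NULL && list_entry->id == 42;
}
```

This is why nulling a pointer only helps against double-free or reuse through *that same variable*, not against another context touching the object under its own reference.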


Re: [PATCH iproute2] lib: move get_task_name() from rdma

2021-04-19 Thread Leon Romanovsky
On Mon, Apr 19, 2021 at 03:34:58PM +0200, Andrea Claudi wrote:
> The function get_task_name() is used to get the name of a process from
> its pid, and its implementation is similar to ip/iptuntap.c:pid_name().
> 
> Move it to lib/fs.c to use a single implementation and make it easily
> reusable.
> 
> Signed-off-by: Andrea Claudi 
> ---
>  include/utils.h |  1 +
>  ip/iptuntap.c   | 31 +--
>  lib/fs.c| 24 
>  rdma/res.c  | 24 
>  rdma/res.h  |  1 -
>  5 files changed, 26 insertions(+), 55 deletions(-)
> 

Thanks,
Acked-by: Leon Romanovsky 


Re: [RFC v1 PATCH 3/3] driver: update all the code that use soc_device_match

2021-04-18 Thread Leon Romanovsky
On Mon, Apr 19, 2021 at 12:27:22PM +0800, Alice Guo (OSS) wrote:
> From: Alice Guo 
> 
> Update all the code that uses soc_device_match(), now that
> soc_device_match() can return -EPROBE_DEFER.
> 
> Signed-off-by: Alice Guo 
> ---
>  drivers/bus/ti-sysc.c |  2 +-
>  drivers/clk/renesas/r8a7795-cpg-mssr.c|  4 +++-
>  drivers/clk/renesas/rcar-gen2-cpg.c   |  2 +-
>  drivers/clk/renesas/rcar-gen3-cpg.c   |  2 +-
>  drivers/dma/fsl-dpaa2-qdma/dpaa2-qdma.c   |  7 ++-
>  drivers/dma/ti/k3-psil.c  |  3 +++
>  drivers/dma/ti/k3-udma.c  |  2 +-
>  drivers/gpu/drm/bridge/nwl-dsi.c  |  2 +-
>  drivers/gpu/drm/meson/meson_drv.c |  4 +++-
>  drivers/gpu/drm/omapdrm/dss/dispc.c   |  2 +-
>  drivers/gpu/drm/omapdrm/dss/dpi.c |  4 +++-
>  drivers/gpu/drm/omapdrm/dss/dsi.c |  3 +++
>  drivers/gpu/drm/omapdrm/dss/dss.c |  3 +++
>  drivers/gpu/drm/omapdrm/dss/hdmi4_core.c  |  3 +++
>  drivers/gpu/drm/omapdrm/dss/venc.c|  4 +++-
>  drivers/gpu/drm/omapdrm/omap_drv.c|  3 +++
>  drivers/gpu/drm/rcar-du/rcar_du_crtc.c|  4 +++-
>  drivers/gpu/drm/rcar-du/rcar_lvds.c   |  2 +-
>  drivers/gpu/drm/tidss/tidss_dispc.c   |  4 +++-
>  drivers/iommu/ipmmu-vmsa.c|  7 +--
>  drivers/media/platform/rcar-vin/rcar-core.c   |  2 +-
>  drivers/media/platform/rcar-vin/rcar-csi2.c   |  2 +-
>  drivers/media/platform/vsp1/vsp1_uif.c|  4 +++-
>  drivers/mmc/host/renesas_sdhi_core.c  |  2 +-
>  drivers/mmc/host/renesas_sdhi_internal_dmac.c |  2 +-
>  drivers/mmc/host/sdhci-of-esdhc.c | 21 ++-
>  drivers/mmc/host/sdhci-omap.c |  2 +-
>  drivers/mmc/host/sdhci_am654.c|  2 +-
>  drivers/net/ethernet/renesas/ravb_main.c  |  4 +++-
>  drivers/net/ethernet/ti/am65-cpsw-nuss.c  |  2 +-
>  drivers/net/ethernet/ti/cpsw.c|  2 +-
>  drivers/net/ethernet/ti/cpsw_new.c|  2 +-
>  drivers/phy/ti/phy-omap-usb2.c|  4 +++-
>  drivers/pinctrl/renesas/core.c|  2 +-
>  drivers/pinctrl/renesas/pfc-r8a7790.c |  5 -
>  drivers/pinctrl/renesas/pfc-r8a7794.c |  5 -
>  drivers/soc/fsl/dpio/dpio-driver.c| 13 
>  drivers/soc/renesas/r8a774c0-sysc.c   |  5 -
>  drivers/soc/renesas/r8a7795-sysc.c|  2 +-
>  drivers/soc/renesas/r8a77990-sysc.c   |  5 -
>  drivers/soc/ti/k3-ringacc.c   |  2 +-
>  drivers/staging/mt7621-pci/pci-mt7621.c   |  2 +-
>  drivers/thermal/rcar_gen3_thermal.c   |  4 +++-
>  drivers/thermal/ti-soc-thermal/ti-bandgap.c   | 10 +++--
>  drivers/usb/gadget/udc/renesas_usb3.c |  2 +-
>  drivers/usb/host/ehci-platform.c  |  4 +++-
>  drivers/usb/host/xhci-rcar.c  |  2 +-
>  drivers/watchdog/renesas_wdt.c|  2 +-
>  48 files changed, 131 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/bus/ti-sysc.c b/drivers/bus/ti-sysc.c
> index 5fae60f8c135..00c59aa217c1 100644
> --- a/drivers/bus/ti-sysc.c
> +++ b/drivers/bus/ti-sysc.c
> @@ -2909,7 +2909,7 @@ static int sysc_init_soc(struct sysc *ddata)
>   }
>  
>   match = soc_device_match(sysc_soc_feat_match);
> - if (!match)
> + if (!match || IS_ERR(match))
>   return 0;
>  
>   if (match->data)
> diff --git a/drivers/clk/renesas/r8a7795-cpg-mssr.c 
> b/drivers/clk/renesas/r8a7795-cpg-mssr.c
> index c32d2c678046..90a18336a4c3 100644
> --- a/drivers/clk/renesas/r8a7795-cpg-mssr.c
> +++ b/drivers/clk/renesas/r8a7795-cpg-mssr.c
> @@ -439,6 +439,7 @@ static const unsigned int r8a7795es2_mod_nullify[] 
> __initconst = {
>  
>  static int __init r8a7795_cpg_mssr_init(struct device *dev)
>  {
> + const struct soc_device_attribute *match;
>   const struct rcar_gen3_cpg_pll_config *cpg_pll_config;
>   u32 cpg_mode;
>   int error;
> @@ -453,7 +454,8 @@ static int __init r8a7795_cpg_mssr_init(struct device 
> *dev)
>   return -EINVAL;
>   }
>  
> - if (soc_device_match(r8a7795es1)) {
> + match = soc_device_match(r8a7795es1);
> + if (!IS_ERR(match) && match) {

"if (!IS_ERR_OR_NULL(match))" in all places.

Thanks
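
The `IS_ERR_OR_NULL()` helper suggested in the review folds the two checks into one: kernel APIs like `soc_device_match()` can now return NULL (no match) or an `ERR_PTR()`-encoded errno such as `-EPROBE_DEFER`. A self-contained userspace sketch of the kernel's `<linux/err.h>` convention (the helpers are reproduced here so the example compiles on its own):

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095
#define EPROBE_DEFER 517

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers live in the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* The shape Leon asks for: one test instead of "!match || IS_ERR(match)". */
static int soc_matched(const void *match)
{
	return !IS_ERR_OR_NULL(match);
}
```

Using the single predicate everywhere also keeps the "no match" and "deferred" paths from silently diverging across the 48 touched files.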


[PATCH rdma-next 0/2] Two fixes to -next

2021-04-18 Thread Leon Romanovsky
From: Leon Romanovsky 

The two fixes are targeted at -next. Maor's change fixes DM code
that was accepted in this cycle, and Parav's change doesn't qualify
for the urgency of -rc8.

Thanks

Maor Gottlieb (1):
  RDMA/mlx5: Fix type assignment for ICM DM

Parav Pandit (1):
  IB/mlx5: Set right RoCE l3 type and roce version while deleting GID

 drivers/infiniband/hw/mlx5/dm.c   | 23 +++
 drivers/infiniband/hw/mlx5/main.c |  8 +++
 .../net/ethernet/mellanox/mlx5/core/lib/gid.c |  4 ++--
 3 files changed, 19 insertions(+), 16 deletions(-)

-- 
2.30.2



[PATCH mlx5-next 1/2] IB/mlx5: Set right RoCE l3 type and roce version while deleting GID

2021-04-18 Thread Leon Romanovsky
From: Parav Pandit 

Currently, when a GID is deleted, all fields of the RoCE address are
zeroed out in the SET_ROCE_ADDRESS command for the specified index.

roce_version = 0 means RoCEv1 in the SET_ROCE_ADDRESS command.

This assumes that the device always has RoCEv1 enabled, which is not
always correct; for example, a Subfunction does not support RoCEv1.

Due to this assumption, a previously added RoCEv2 GID is always deleted
as a RoCEv1 GID. This results in the syndrome below:

mlx5_core.sf mlx5_core.sf.4: mlx5_cmd_check:777:(pid 4256): 
SET_ROCE_ADDRESS(0x761) op_mod(0x0) failed, status bad parameter(0x3), syndrome 
(0x12822d)

Hence, set the right RoCE version, provided by the core, during GID
deletion.

Signed-off-by: Parav Pandit 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/main.c | 8 
 drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 7ea6137f8d12..6d1dd09a4388 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -555,15 +555,15 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u32 
port_num,
 unsigned int index, const union ib_gid *gid,
 const struct ib_gid_attr *attr)
 {
-   enum ib_gid_type gid_type = IB_GID_TYPE_ROCE;
+   enum ib_gid_type gid_type;
u16 vlan_id = 0x;
u8 roce_version = 0;
u8 roce_l3_type = 0;
u8 mac[ETH_ALEN];
int ret;
 
+   gid_type = attr->gid_type;
if (gid) {
-   gid_type = attr->gid_type;
ret = rdma_read_gid_l2_fields(attr, &vlan_id, &mac[0]);
if (ret)
return ret;
@@ -575,7 +575,7 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u32 
port_num,
break;
case IB_GID_TYPE_ROCE_UDP_ENCAP:
roce_version = MLX5_ROCE_VERSION_2;
-   if (ipv6_addr_v4mapped((void *)gid))
+   if (gid && ipv6_addr_v4mapped((void *)gid))
roce_l3_type = MLX5_ROCE_L3_TYPE_IPV4;
else
roce_l3_type = MLX5_ROCE_L3_TYPE_IPV6;
@@ -602,7 +602,7 @@ static int mlx5_ib_del_gid(const struct ib_gid_attr *attr,
   __always_unused void **context)
 {
return set_roce_addr(to_mdev(attr->device), attr->port_num,
-attr->index, NULL, NULL);
+attr->index, NULL, attr);
 }
 
 __be16 mlx5_get_roce_udp_sport_min(const struct mlx5_ib_dev *dev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
index a68738c8f4bc..215fa8bdce48 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
@@ -142,10 +142,10 @@ int mlx5_core_roce_gid_set(struct mlx5_core_dev *dev, 
unsigned int index,
}
 
ether_addr_copy(addr_mac, mac);
-   MLX5_SET_RA(in_addr, roce_version, roce_version);
-   MLX5_SET_RA(in_addr, roce_l3_type, roce_l3_type);
memcpy(addr_l3_addr, gid, gidsz);
}
+   MLX5_SET_RA(in_addr, roce_version, roce_version);
+   MLX5_SET_RA(in_addr, roce_l3_type, roce_l3_type);
 
if (MLX5_CAP_GEN(dev, num_vhca_ports) > 0)
MLX5_SET(set_roce_address_in, in, vhca_port_num, port_num);
-- 
2.30.2



Re: [PATCH iproute2] rdma: stat: fix return code

2021-04-18 Thread Leon Romanovsky
On Sun, Apr 18, 2021 at 02:56:30PM +0200, Andrea Claudi wrote:
> libmnl defines MNL_CB_OK as 1 and MNL_CB_ERROR as -1. rdma uses these
> return codes, and stat_qp_show_parse_cb() should do the same.
> 
> Fixes: 16ce4d23661a ("rdma: stat: initialize ret in stat_qp_show_parse_cb()")
> Reported-by: Leon Romanovsky 
> Signed-off-by: Andrea Claudi 
> ---
>  rdma/stat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Thanks,
Acked-by: Leon Romanovsky 
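
The return-code convention behind both iproute2 fixes in this thread: libmnl callbacks report status as MNL_CB_ERROR (-1), MNL_CB_STOP (0), or MNL_CB_OK (1), so defaulting `ret` to 0 tells libmnl to stop iterating rather than signalling success. A minimal sketch with the constants written out (libmnl normally provides them in `<libmnl/libmnl.h>`; the attribute checks are stubbed):

```c
#include <assert.h>

/* libmnl callback return values, reproduced for a self-contained example. */
#define MNL_CB_ERROR -1
#define MNL_CB_STOP   0
#define MNL_CB_OK     1

/* Skeleton of a parse callback: success must be MNL_CB_OK (1), never 0.
 * 'have_attrs' stands in for the real mandatory-attribute checks. */
static int qp_show_parse_cb(int have_attrs)
{
	int ret = MNL_CB_OK; /* the corrected default from the fix above */

	if (!have_attrs)
		return MNL_CB_ERROR;
	return ret;
}
```

With the original `int ret = 0;`, a run where the nested-attribute loop never executes would return MNL_CB_STOP and quietly end the dump.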


Re: [PATCH iproute2] rdma: stat: initialize ret in stat_qp_show_parse_cb()

2021-04-18 Thread Leon Romanovsky
On Sun, Apr 18, 2021 at 02:00:38PM +0200, Andrea Claudi wrote:
> On Sun, Apr 18, 2021 at 1:07 PM Leon Romanovsky  wrote:
> >
> > On Wed, Apr 14, 2021 at 12:50:57AM +0200, Andrea Claudi wrote:
> > > In the unlikely case in which the mnl_attr_for_each_nested() cycle is
> > > not executed, this function returns an uninitialized value.
> > >
> > > Fix this by initializing ret to 0.
> > >
> > > Fixes: 5937552b42e4 ("rdma: Add "stat qp show" support")
> > > Signed-off-by: Andrea Claudi 
> > > ---
> > >  rdma/stat.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/rdma/stat.c b/rdma/stat.c
> > > index 75d45288..3abedae7 100644
> > > --- a/rdma/stat.c
> > > +++ b/rdma/stat.c
> > > @@ -307,7 +307,7 @@ static int stat_qp_show_parse_cb(const struct 
> > > nlmsghdr *nlh, void *data)
> > >   struct rd *rd = data;
> > >   const char *name;
> > >   uint32_t idx;
> > > - int ret;
> > > + int ret = 0;
> >
> > It should be MNL_CB_OK which is 1 and not 0.
> >
> > Thanks.
> >
> 
> Hi Leon, and thanks for pointing this out.
> As this is already merged, I'll submit a fix.

Thanks

> 
> Regards,
> Andrea
> 
> > >
> > >   mnl_attr_parse(nlh, 0, rd_attr_cb, tb);
> > >   if (!tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_DEV_NAME] 
> > > ||
> > > --
> > > 2.30.2
> > >
> >
> 


Re: [PATCH v4 05/23] ice: Add devlink params support

2021-04-18 Thread Leon Romanovsky
On Wed, Apr 14, 2021 at 12:21:08AM +, Saleem, Shiraz wrote:
> > Subject: RE: [PATCH v4 05/23] ice: Add devlink params support

<...>

> > > Why not just allow the setting to apply dynamically during a 'set'
> > > itself with an unplug/plug of the auxdev with correct type.
> > >
> > This suggestion came up in the internal discussion too.
> > However, such a task needs to synchronize with the devlink reload command
> > and also with the driver remove() sequence. So, locking-wise, and
> > depending on the amount of config change, it is close to what reload
> > will do.
> 
> Holding this mutex across the auxiliary device unplug/plug in "set" won't
> cut it?
> https://elixir.bootlin.com/linux/v5.12-rc7/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L1304

Like Parav said, we are working to fix it and already have one working
solution; unfortunately, it has one eyebrow-raising change, and we are
trying another one.

You can take a look here to get a sense of the scope:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=devlink-core

Thanks


Re: [PATCH iproute2] rdma: stat: initialize ret in stat_qp_show_parse_cb()

2021-04-18 Thread Leon Romanovsky
On Wed, Apr 14, 2021 at 12:50:57AM +0200, Andrea Claudi wrote:
> In the unlikely case in which the mnl_attr_for_each_nested() cycle is
> not executed, this function returns an uninitialized value.
> 
> Fix this by initializing ret to 0.
> 
> Fixes: 5937552b42e4 ("rdma: Add "stat qp show" support")
> Signed-off-by: Andrea Claudi 
> ---
>  rdma/stat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/rdma/stat.c b/rdma/stat.c
> index 75d45288..3abedae7 100644
> --- a/rdma/stat.c
> +++ b/rdma/stat.c
> @@ -307,7 +307,7 @@ static int stat_qp_show_parse_cb(const struct nlmsghdr 
> *nlh, void *data)
>   struct rd *rd = data;
>   const char *name;
>   uint32_t idx;
> - int ret;
> + int ret = 0;

It should be MNL_CB_OK which is 1 and not 0.

Thanks.

>  
>   mnl_attr_parse(nlh, 0, rd_attr_cb, tb);
>   if (!tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_DEV_NAME] ||
> -- 
> 2.30.2
> 


Re: [RFC V2 PATCH 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-04-18 Thread Leon Romanovsky
On Tue, Apr 13, 2021 at 11:22:16AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> In an Isolation VM, all memory shared with the host needs to be marked
> visible to the host via a hvcall. vmbus_establish_gpadl() already does
> this for the netvsc rx/tx ring buffers. The page buffer used by
> vmbus_sendpacket_pagebuffer() still needs to be handled. Use the DMA API
> to map/unmap this memory when sending/receiving packets, and the Hyper-V
> DMA ops callback will use the swiotlb functions to allocate a bounce
> buffer and copy data from/to it.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/net/hyperv/hyperv_net.h   |  11 +++
>  drivers/net/hyperv/netvsc.c   | 137 --
>  drivers/net/hyperv/rndis_filter.c |   3 +
>  3 files changed, 144 insertions(+), 7 deletions(-)

<...>

> + packet->dma_range = kzalloc(sizeof(struct dma_range) * page_count,
> +   GFP_KERNEL);
> + if (!packet->dma_range)
> + return -ENOMEM;
> +
> + for (i = 0; i < page_count; i++) {
> + char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
> +  + pb[i].offset);
> + u32 len = pb[i].len;
> +
> + dma = dma_map_single(&hv_dev->device, src, len,
> +  DMA_TO_DEVICE);
> + if (dma_mapping_error(&hv_dev->device, dma))
> + return -ENOMEM;

Don't you leak dma_range here?

BTW, it will be easier if you CC everyone on all patches, so we will be
able to get the whole context.

Thanks
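
The leak Leon points at: the quoted error path returns -ENOMEM without unmapping the pages already mapped and without freeing `dma_range`. The usual kernel shape is an unwind loop before the return; a hypothetical userspace sketch of that pattern (`map_page()`/`unmap_page()` stand in for `dma_map_single()`/`dma_unmap_single()`, and the handle/array types are simplified):

```c
#include <assert.h>
#include <stdlib.h>

static int unmapped; /* counts unmap calls for the test below */

static int map_page(int i)    { return i < 3 ? i + 1 : 0; } /* 0 = failure */
static void unmap_page(int h) { (void)h; unmapped++; }

/* On a mid-loop failure, release everything acquired so far and the
 * tracking array itself, instead of returning with both leaked. */
static int map_packet(int page_count, int **ranges_out)
{
	int *ranges = calloc(page_count, sizeof(*ranges));
	int i;

	if (!ranges)
		return -1;
	for (i = 0; i < page_count; i++) {
		ranges[i] = map_page(i);
		if (!ranges[i])
			goto err_unmap;
	}
	*ranges_out = ranges;
	return 0;

err_unmap:
	while (--i >= 0)
		unmap_page(ranges[i]);
	free(ranges); /* the step missing from the quoted patch */
	return -1;
}
```

The `goto err_unmap` + reverse-order release is the idiomatic kernel error-handling shape, and it keeps the success path free of cleanup clutter.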


Re: [PATCH rdma-next v2 0/5] Get rid of custom made module dependency

2021-04-17 Thread Leon Romanovsky
On Sun, Apr 18, 2021 at 12:09:16AM +0530, Devesh Sharma wrote:
> On Sat, Apr 17, 2021 at 1:44 PM Leon Romanovsky  wrote:
> >
> > On Wed, Apr 14, 2021 at 07:15:37PM +0530, Devesh Sharma wrote:
> > > On Mon, Apr 12, 2021 at 1:10 PM Leon Romanovsky  wrote:
> > > >
> > > > On Thu, Apr 08, 2021 at 08:42:57PM +0530, Devesh Sharma wrote:
> > > > > On Thu, Apr 8, 2021 at 5:14 PM Leon Romanovsky  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Apr 08, 2021 at 05:06:24PM +0530, Devesh Sharma wrote:
> > > > > > > On Sat, Apr 3, 2021 at 5:12 PM Leon Romanovsky  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Sat, Apr 03, 2021 at 03:52:13PM +0530, Devesh Sharma wrote:
> > > > > > > > > On Thu, Apr 1, 2021 at 12:27 PM Leon Romanovsky 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > From: Leon Romanovsky 
> > > > > > > > > >
> > > > > > > > > > Changelog:
> > > > > > > > > > v2:
> > > > > > > > > >  * kbuild spotted that I didn't delete all code in patch 
> > > > > > > > > > #5, so deleted
> > > > > > > > > >even more ulp_ops derefences.
> > > > > > > > > > v1: 
> > > > > > > > > > https://lore.kernel.org/linux-rdma/20210329085212.257771-1-l...@kernel.org
> > > > > > > > > >  * Go much deeper and removed useless ULP indirection
> > > > > > > > > > v0: 
> > > > > > > > > > https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
> > > > > > > > > > ---
> > > > > > > > > >
> > > > > > > > > > The following series fixes issue spotted in [1], where 
> > > > > > > > > > bnxt_re driver
> > > > > > > > > > messed with module reference counting in order to implement 
> > > > > > > > > > symbol
> > > > > > > > > > dependency of bnxt_re and bnxt modules. All of this is 
> > > > > > > > > > done, when in
> > > > > > > > > > upstream we have only one ULP user of that bnxt module. The 
> > > > > > > > > > simple
> > > > > > > > > > declaration of exported symbol would do the trick.
> > > > > > > > > >
> > > > > > > > > > This series removes that custom module_get/_put, which is 
> > > > > > > > > > not supposed
> > > > > > > > > > to be in the driver from the beginning and get rid of nasty 
> > > > > > > > > > indirection
> > > > > > > > > > logic that isn't relevant for the upstream code.
> > > > > > > > > >
> > > > > > > > > > Such small changes allow us to simplify the bnxt code and 
> > > > > > > > > > my hope that
> > > > > > > > > > Devesh will continue where I stopped and remove struct 
> > > > > > > > > > bnxt_ulp_ops too.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > [1] 
> > > > > > > > > > https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
> > > > > > > > > >
> > > > > > > > > > Leon Romanovsky (5):
> > > > > > > > > >   RDMA/bnxt_re: Depend on bnxt ethernet driver and not 
> > > > > > > > > > blindly select it
> > > > > > > > > >   RDMA/bnxt_re: Create direct symbolic link between bnxt 
> > > > > > > > > > modules
> > > > > > > > > >   RDMA/bnxt_re: Get rid of custom module reference counting
> > > > > > > > > >   net/bnxt: Remove useless check of non-existent ULP id
> > > > > > > > > >   net/bnxt: Use direct API instead of useless indirection


Re: [PATCH rdma-next v2 0/5] Get rid of custom made module dependency
Re: [PATCH rdma-next v2 0/5] Get rid of custom made module dependency

2021-04-17 Thread Leon Romanovsky
On Wed, Apr 14, 2021 at 07:15:37PM +0530, Devesh Sharma wrote:
> On Mon, Apr 12, 2021 at 1:10 PM Leon Romanovsky  wrote:
> >
> > On Thu, Apr 08, 2021 at 08:42:57PM +0530, Devesh Sharma wrote:
> > > On Thu, Apr 8, 2021 at 5:14 PM Leon Romanovsky  wrote:
> > > >
> > > > On Thu, Apr 08, 2021 at 05:06:24PM +0530, Devesh Sharma wrote:
> > > > > On Sat, Apr 3, 2021 at 5:12 PM Leon Romanovsky  
> > > > > wrote:
> > > > > >
> > > > > > On Sat, Apr 03, 2021 at 03:52:13PM +0530, Devesh Sharma wrote:
> > > > > > > On Thu, Apr 1, 2021 at 12:27 PM Leon Romanovsky  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > From: Leon Romanovsky 
> > > > > > > >
> > > > > > > > Changelog:
> > > > > > > > v2:
> > > > > > > >  * kbuild spotted that I didn't delete all code in patch #5, so 
> > > > > > > > deleted
> > > > > > > >even more ulp_ops derefences.
> > > > > > > > v1: 
> > > > > > > > https://lore.kernel.org/linux-rdma/20210329085212.257771-1-l...@kernel.org
> > > > > > > >  * Go much deeper and removed useless ULP indirection
> > > > > > > > v0: 
> > > > > > > > https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
> > > > > > > > ---
> > > > > > > >
> > > > > > > > The following series fixes issue spotted in [1], where bnxt_re 
> > > > > > > > driver
> > > > > > > > messed with module reference counting in order to implement 
> > > > > > > > symbol
> > > > > > > > dependency of bnxt_re and bnxt modules. All of this is done, 
> > > > > > > > when in
> > > > > > > > upstream we have only one ULP user of that bnxt module. The 
> > > > > > > > simple
> > > > > > > > declaration of exported symbol would do the trick.
> > > > > > > >
> > > > > > > > This series removes that custom module_get/_put, which is not 
> > > > > > > > supposed
> > > > > > > > to be in the driver from the beginning and get rid of nasty 
> > > > > > > > indirection
> > > > > > > > logic that isn't relevant for the upstream code.
> > > > > > > >
> > > > > > > > Such small changes allow us to simplify the bnxt code and my 
> > > > > > > > hope that
> > > > > > > > Devesh will continue where I stopped and remove struct 
> > > > > > > > bnxt_ulp_ops too.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > [1] 
> > > > > > > > https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
> > > > > > > >
> > > > > > > > Leon Romanovsky (5):
> > > > > > > >   RDMA/bnxt_re: Depend on bnxt ethernet driver and not blindly 
> > > > > > > > select it
> > > > > > > >   RDMA/bnxt_re: Create direct symbolic link between bnxt modules
> > > > > > > >   RDMA/bnxt_re: Get rid of custom module reference counting
> > > > > > > >   net/bnxt: Remove useless check of non-existent ULP id
> > > > > > > >   net/bnxt: Use direct API instead of useless indirection
> > > > > > > >
> > > > > > > >  drivers/infiniband/hw/bnxt_re/Kconfig |   4 +-
> > > > > > > >  drivers/infiniband/hw/bnxt_re/main.c  |  93 ++-
> > > > > > > >  drivers/net/ethernet/broadcom/bnxt/bnxt.c |   4 +-
> > > > > > > >  drivers/net/ethernet/broadcom/bnxt/bnxt.h |   1 -
> > > > > > > >  drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 245 
> > > > > > > > +++---
> > > > > > > >  drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h |  32 +--
> > > > > > > >  6 files changed, 119 insertions(+), 260 deletions(-)
> > > > > > >
> > > > > > > Hi Leon,
> > > > > > >
> > > > > > > After a couple of internal discussions we reached a conclusion to
> > > > > > > implement the Auxbus driver interface and fix the problem once 
> > > > > > > and for
> > > > > > > all.
> > > > > >
> > > > > > Thanks Devesh,
> > > > > >
> > > > > > Jason, it looks like we can proceed with this patchset, because in
> > > > > > auxbus mode this module refcount and ULP indirection logics will be
> > > > > > removed anyway.
> > > > > >
> > > > > > Thanks
> > > > > Hi Leon,
> > > > >
> > > > > In my internal testing, I am seeing a crash using the 3rd patch. I am
> > > > > spending a few cycles on debugging it. Expect my input in a day or so.
> > > >
> > > > Can you please post the kernel crash report here?
> > > > I don't see how function rename in patch #3 can cause to the crash.
> > > Hey, unfortunately my kdump service config is giving me a tough time on
> > > my host. I will share if I get it.
> >
> > Any news here?
> Expect something by this Friday. yesterday was a holiday in India.

Any update?
This series has been pending for close to three weeks already, and I would
like to make progress with it.

Thanks


Re: [PATCH rdma-next v1 0/7] Add MEMIC operations support

2021-04-13 Thread Leon Romanovsky
On Tue, Apr 13, 2021 at 03:55:27PM -0300, Jason Gunthorpe wrote:
> On Sun, Apr 11, 2021 at 03:29:17PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky 
> > 
> > Changelog:
> > v1: 
> >  * Changed logic of patch #6 per-Jason's request. 
> > v0: 
> > https://lore.kernel.org/linux-rdma/20210318111548.674749-1-l...@kernel.org
> > 
> > ---
> > 
> > Hi,
> > 
> > This series from Maor extends MEMIC to support atomic operations from
> > the host in addition to already supported regular read/write.
> > 
> > Thanks
> > 
> > Maor Gottlieb (7):
> >   net/mlx5: Add MEMIC operations related bits
> >   RDMA/uverbs: Make UVERBS_OBJECT_METHODS to consider line number
> >   RDMA/mlx5: Move all DM logic to separate file
> >   RDMA/mlx5: Re-organize the DM code
> >   RDMA/mlx5: Add support to MODIFY_MEMIC command
> >   RDMA/mlx5: Add support in MEMIC operations
> >   RDMA/mlx5: Expose UAPI to query DM
> 
> This looks OK now, can you update the shared branch?

63f9c44bca5e net/mlx5: Add MEMIC operations related bits

mlx5-next was updated, thanks

> 
> Thanks,
> Jason


Re: [PATCH for-next v3 0/2] Introduce rdma_set_min_rnr_timer() and use it in RDS

2021-04-13 Thread Leon Romanovsky
On Tue, Apr 13, 2021 at 11:13:38AM +, Haakon Bugge wrote:
> 
> 
> > On 13 Apr 2021, at 08:29, Leon Romanovsky  wrote:
> > 
> > On Mon, Apr 12, 2021 at 07:58:47PM -0300, Jason Gunthorpe wrote:
> >> On Wed, Mar 31, 2021 at 08:43:12PM +0200, Håkon Bugge wrote:
> >>> ib_modify_qp() is an expensive operation on some HCAs running
> >>> virtualized. This series removes two ib_modify_qp() calls from RDS.
> >>> 
> >>> I am sending this as a v3, even though it is the first sent to
> >>> net. This because the IB Core commit has reach v3.
> >>> 
> >>> Håkon Bugge (2):
> >>>  IB/cma: Introduce rdma_set_min_rnr_timer()
> >>>  rds: ib: Remove two ib_modify_qp() calls
> >> 
> >> Applied to rdma for-next, thanks
> > 
> > Jason,
> > 
> > It should be 
> > +   WARN_ON(id->qp_type != IB_QPT_RC && id->qp_type != IB_QPT_XRC_TGT);
> 
> Without the return, you will arm the setting of the timer and subsequently
> get an error from modify_qp later.

The addition of WARN_ON() means that it is a programmer error to pass
such input. Historically, in-kernel APIs don't need protection from
other kernel developers.

Thanks

> 
> 
> Håkon
> 
> > 
> > and not
> > +   if (WARN_ON(id->qp_type != IB_QPT_RC && id->qp_type != IB_QPT_XRC_TGT))
> > +   return -EINVAL;
> > 
> > Thanks
> > 
> >> 
> >> Jason
> 
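
The distinction Leon draws in this thread: the kernel's WARN_ON(cond) evaluates to the condition, so it can be used bare (warn and continue) or inside an if() (warn and bail out). For an in-kernel API, he prefers the bare form: misuse is a programmer error, and the later modify_qp would fail anyway. A userspace sketch with a stand-in WARN_ON() (the QP type values are assumed from the upstream enum and are illustrative):

```c
#include <assert.h>

static int warn_count;

/* Stand-in for the kernel macro: records the warning and evaluates
 * to the condition, like the real WARN_ON() (GNU statement expression). */
#define WARN_ON(cond) \
	({ int __c = !!(cond); if (__c) warn_count++; __c; })

#define QPT_RC      2   /* assumed IB_QPT_RC value */
#define QPT_XRC_TGT 10  /* assumed IB_QPT_XRC_TGT value */

/* Leon's preferred shape: warn on a bad qp_type but still proceed,
 * with no "if (WARN_ON(...)) return -EINVAL;" short-circuit. */
static int set_min_rnr_timer(int qp_type)
{
	WARN_ON(qp_type != QPT_RC && qp_type != QPT_XRC_TGT);
	return 0;
}
```

Both forms produce the same backtrace in dmesg; the disagreement is only about whether the API should additionally refuse the call.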


Re: [PATCH for-next v3 0/2] Introduce rdma_set_min_rnr_timer() and use it in RDS

2021-04-12 Thread Leon Romanovsky
On Mon, Apr 12, 2021 at 07:58:47PM -0300, Jason Gunthorpe wrote:
> On Wed, Mar 31, 2021 at 08:43:12PM +0200, Håkon Bugge wrote:
> > ib_modify_qp() is an expensive operation on some HCAs running
> > virtualized. This series removes two ib_modify_qp() calls from RDS.
> > 
> > I am sending this as a v3, even though it is the first sent to
> > net. This because the IB Core commit has reach v3.
> > 
> > Håkon Bugge (2):
> >   IB/cma: Introduce rdma_set_min_rnr_timer()
> >   rds: ib: Remove two ib_modify_qp() calls
> 
> Applied to rdma for-next, thanks

Jason,

It should be 
+   WARN_ON(id->qp_type != IB_QPT_RC && id->qp_type != IB_QPT_XRC_TGT);

and not
+   if (WARN_ON(id->qp_type != IB_QPT_RC && id->qp_type != IB_QPT_XRC_TGT))
+   return -EINVAL;

Thanks

> 
> Jason


Re: [PATCH net-next 4/5] bnxt_en: Refactor __bnxt_vf_reps_destroy().

2021-04-12 Thread Leon Romanovsky
On Mon, Apr 12, 2021 at 10:51:25AM -0700, Michael Chan wrote:
> On Mon, Apr 12, 2021 at 10:33 AM Leon Romanovsky  wrote:
> >
> > On Mon, Apr 12, 2021 at 09:31:33AM -0700, Michael Chan wrote:
> > > On Mon, Apr 12, 2021 at 12:37 AM Leon Romanovsky  wrote:
> > > >
> > > > On Sun, Apr 11, 2021 at 08:18:14PM -0400, Michael Chan wrote:
> > > > > Add a new helper function __bnxt_free_one_vf_rep() to free one VF rep.
> > > > > We also reinitialize the VF rep fields to proper initial values so that
> > > > > the function can be used without freeing the VF rep data structure.  
> > > > > This
> > > > > will be used in subsequent patches to free and recreate VF reps after
> > > > > error recovery.
> > > > >
> > > > > Reviewed-by: Edwin Peer 
> > > > > Reviewed-by: Sriharsha Basavapatna 
> > > > > 
> > > > > Signed-off-by: Michael Chan 
> > > > > ---
> > > > >  drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c | 21 
> > > > > ++-
> > > > >  1 file changed, 16 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c 
> > > > > b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
> > > > > index b5d6cd63bea7..a4ac11f5b0e5 100644
> > > > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
> > > > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
> > > > > @@ -288,6 +288,21 @@ void bnxt_vf_reps_open(struct bnxt *bp)
> > > > >   bnxt_vf_rep_open(bp->vf_reps[i]->dev);
> > > > >  }
> > > > >
> > > > > +static void __bnxt_free_one_vf_rep(struct bnxt *bp, struct 
> > > > > bnxt_vf_rep *vf_rep)
> > > > > +{
> > > > > + if (!vf_rep)
> > > > > + return;
> > > >
> > > > How can it be NULL if you check that vf_rep != NULL before calling
> > > > __bnxt_free_one_vf_rep()?
> > > >
> > >
> > > For this patch, the if (!vf_rep) check here is redundant.  But it is
> > > needed in the next patch (patch 5) that calls this function from
> > > bnxt_vf_reps_free() in a different context.  Thanks.
> >
> > So add it in the patch that needs it.
> >
> 
> As stated in the changelog, we added more code to make this function
> more general and usable from another context.  The check for !vf_rep
> is part of that.  In my opinion, I think it is ok to keep it here
> given that the intent of this patch is to create a more general
> function.  Thanks.

I disagree, but given that Dave already merged this series, it
doesn't matter anymore.

Thanks



Re: [PATCH net-next 4/5] bnxt_en: Refactor __bnxt_vf_reps_destroy().

2021-04-12 Thread Leon Romanovsky
On Mon, Apr 12, 2021 at 09:31:33AM -0700, Michael Chan wrote:
> On Mon, Apr 12, 2021 at 12:37 AM Leon Romanovsky  wrote:
> >
> > On Sun, Apr 11, 2021 at 08:18:14PM -0400, Michael Chan wrote:
> > > Add a new helper function __bnxt_free_one_vf_rep() to free one VF rep.
> > > We also reinitialize the VF rep fields to proper initial values so that
> > > the function can be used without freeing the VF rep data structure.  This
> > > will be used in subsequent patches to free and recreate VF reps after
> > > error recovery.
> > >
> > > Reviewed-by: Edwin Peer 
> > > Reviewed-by: Sriharsha Basavapatna 
> > > Signed-off-by: Michael Chan 
> > > ---
> > >  drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c | 21 ++-
> > >  1 file changed, 16 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c 
> > > b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
> > > index b5d6cd63bea7..a4ac11f5b0e5 100644
> > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
> > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
> > > @@ -288,6 +288,21 @@ void bnxt_vf_reps_open(struct bnxt *bp)
> > >   bnxt_vf_rep_open(bp->vf_reps[i]->dev);
> > >  }
> > >
> > > +static void __bnxt_free_one_vf_rep(struct bnxt *bp, struct bnxt_vf_rep 
> > > *vf_rep)
> > > +{
> > > + if (!vf_rep)
> > > + return;
> >
> > How can it be NULL if you check that vf_rep != NULL before calling
> > __bnxt_free_one_vf_rep()?
> >
> 
> For this patch, the if (!vf_rep) check here is redundant.  But it is
> needed in the next patch (patch 5) that calls this function from
> bnxt_vf_reps_free() in a different context.  Thanks.

So add it in the patch that needs it.

Thanks


Re: [PATCH v4 net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-12 Thread Leon Romanovsky
On Mon, Apr 12, 2021 at 08:35:32AM +, Dexuan Cui wrote:
> > From: Leon Romanovsky 
> > Sent: Monday, April 12, 2021 12:46 AM
> > To: Dexuan Cui 
> > > ...
> > > +#define ANA_MAJOR_VERSION0
> > > +#define ANA_MINOR_VERSION1
> > > +#define ANA_MICRO_VERSION1
> > 
> > Please don't introduce driver versions.
> 
> This is not the usual "driver version", though it's called  "drv version" :-)
> As you can see, the driver does not use the macro MODULE_VERSION().
> 
> Here the "drv version" actually means the version of the VF-to-PF protocol,
> with which the Azure Network Adapter ethernet NIC driver (i.e. the VF driver)
> talks to the PF driver.  The protocol version determines the formats of the
> messages that are sent from the VF driver to the PF driver, e.g. query the
> MAC address, create Send/Receive queues, configure RSS, etc.
> 
> Currently the protocol version is 0.1.1. You may ask why it's called
> "drv version" rather than "protocol version" -- it's because the PF driver
> calls it that way, so I think here the VF driver may as well use the same
> name. BTW, the "drv ver" info is passed to the PF driver in the below
> function:

Ohh, yes, the "driver version" is not the ideal name for that.

I already looked at it in the previous patch, came to the conclusion about
the protocol, and forgot :(.

> 
> static int mana_query_client_cfg(struct ana_context *ac, u32 drv_major_ver,
>  u32 drv_minor_ver, u32 drv_micro_ver,
>  u16 *max_num_vports)
> {
> struct gdma_context *gc = ac->gdma_dev->gdma_context;
> struct ana_query_client_cfg_resp resp = {};
> struct ana_query_client_cfg_req req = {};
> struct device *dev = gc->dev;
> int err = 0;
> 
> mana_gd_init_req_hdr(&req.hdr, ANA_QUERY_CLIENT_CONFIG,
>  sizeof(req), sizeof(resp));
> req.drv_major_ver = drv_major_ver;
> req.drv_minor_ver = drv_minor_ver;
> req.drv_micro_ver = drv_micro_ver;
> 
> err = mana_send_request(ac, &req, sizeof(req), &resp, sizeof(resp));
> 
> Thanks,
> Dexuan
> 


Re: [PATCH v4 net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-12 Thread Leon Romanovsky
On Sun, Apr 11, 2021 at 07:34:55PM -0700, Dexuan Cui wrote:
> Add a VF driver for Microsoft Azure Network Adapter (MANA) that will be
> available in the future.
> 
> Co-developed-by: Haiyang Zhang 
> Signed-off-by: Haiyang Zhang 
> Co-developed-by: Shachar Raindel 
> Signed-off-by: Shachar Raindel 
> Signed-off-by: Dexuan Cui 
> ---
>  MAINTAINERS   |4 +-
>  drivers/net/ethernet/Kconfig  |1 +
>  drivers/net/ethernet/Makefile |1 +
>  drivers/net/ethernet/microsoft/Kconfig|   29 +
>  drivers/net/ethernet/microsoft/Makefile   |5 +
>  drivers/net/ethernet/microsoft/mana/Makefile  |6 +
>  drivers/net/ethernet/microsoft/mana/gdma.h|  728 +++
>  .../net/ethernet/microsoft/mana/gdma_main.c   | 1525 +
>  .../net/ethernet/microsoft/mana/hw_channel.c  |  854 
>  .../net/ethernet/microsoft/mana/hw_channel.h  |  190 ++
>  drivers/net/ethernet/microsoft/mana/mana.h|  549 +
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 1924 +
>  .../ethernet/microsoft/mana/mana_ethtool.c|  252 +++
>  .../net/ethernet/microsoft/mana/shm_channel.c |  298 +++
>  .../net/ethernet/microsoft/mana/shm_channel.h |   21 +
>  15 files changed, 6386 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/ethernet/microsoft/Kconfig
>  create mode 100644 drivers/net/ethernet/microsoft/Makefile
>  create mode 100644 drivers/net/ethernet/microsoft/mana/Makefile
>  create mode 100644 drivers/net/ethernet/microsoft/mana/gdma.h
>  create mode 100644 drivers/net/ethernet/microsoft/mana/gdma_main.c
>  create mode 100644 drivers/net/ethernet/microsoft/mana/hw_channel.c
>  create mode 100644 drivers/net/ethernet/microsoft/mana/hw_channel.h
>  create mode 100644 drivers/net/ethernet/microsoft/mana/mana.h
>  create mode 100644 drivers/net/ethernet/microsoft/mana/mana_en.c
>  create mode 100644 drivers/net/ethernet/microsoft/mana/mana_ethtool.c
>  create mode 100644 drivers/net/ethernet/microsoft/mana/shm_channel.c
>  create mode 100644 drivers/net/ethernet/microsoft/mana/shm_channel.h
> 

<...>

> +/* Microsoft Azure Network Adapter (MANA)'s definitions
> + *
> + * Structures labeled with "HW DATA" are exchanged with the hardware. All of
> + * them are naturally aligned and hence don't need __packed.
> + */
> +
> +#define ANA_MAJOR_VERSION0
> +#define ANA_MINOR_VERSION1
> +#define ANA_MICRO_VERSION1

Please don't introduce driver versions.

Thanks


Re: [PATCH rdma-next v2 0/5] Get rid of custom made module dependency

2021-04-12 Thread Leon Romanovsky
On Thu, Apr 08, 2021 at 08:42:57PM +0530, Devesh Sharma wrote:
> On Thu, Apr 8, 2021 at 5:14 PM Leon Romanovsky  wrote:
> >
> > On Thu, Apr 08, 2021 at 05:06:24PM +0530, Devesh Sharma wrote:
> > > On Sat, Apr 3, 2021 at 5:12 PM Leon Romanovsky  wrote:
> > > >
> > > > On Sat, Apr 03, 2021 at 03:52:13PM +0530, Devesh Sharma wrote:
> > > > > On Thu, Apr 1, 2021 at 12:27 PM Leon Romanovsky  
> > > > > wrote:
> > > > > >
> > > > > > From: Leon Romanovsky 
> > > > > >
> > > > > > Changelog:
> > > > > > v2:
> > > > > >  * kbuild spotted that I didn't delete all code in patch #5, so
> > > > > >    deleted even more ulp_ops dereferences.
> > > > > > v1: 
> > > > > > https://lore.kernel.org/linux-rdma/20210329085212.257771-1-l...@kernel.org
> > > > > >  * Go much deeper and removed useless ULP indirection
> > > > > > v0: 
> > > > > > https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
> > > > > > ---
> > > > > >
> > > > > > The following series fixes issue spotted in [1], where bnxt_re 
> > > > > > driver
> > > > > > messed with module reference counting in order to implement symbol
> > > > > > dependency of bnxt_re and bnxt modules. All of this is done, when in
> > > > > > upstream we have only one ULP user of that bnxt module. The simple
> > > > > > declaration of exported symbol would do the trick.
> > > > > >
> > > > > > This series removes that custom module_get/_put, which is not 
> > > > > > supposed
> > > > > > to be in the driver from the beginning and get rid of nasty 
> > > > > > indirection
> > > > > > logic that isn't relevant for the upstream code.
> > > > > >
> > > > > > Such small changes allow us to simplify the bnxt code and my hope 
> > > > > > that
> > > > > > Devesh will continue where I stopped and remove struct bnxt_ulp_ops 
> > > > > > too.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > [1] 
> > > > > > https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
> > > > > >
> > > > > > Leon Romanovsky (5):
> > > > > >   RDMA/bnxt_re: Depend on bnxt ethernet driver and not blindly 
> > > > > > select it
> > > > > >   RDMA/bnxt_re: Create direct symbolic link between bnxt modules
> > > > > >   RDMA/bnxt_re: Get rid of custom module reference counting
> > > > > >   net/bnxt: Remove useless check of non-existent ULP id
> > > > > >   net/bnxt: Use direct API instead of useless indirection
> > > > > >
> > > > > >  drivers/infiniband/hw/bnxt_re/Kconfig |   4 +-
> > > > > >  drivers/infiniband/hw/bnxt_re/main.c  |  93 ++-
> > > > > >  drivers/net/ethernet/broadcom/bnxt/bnxt.c |   4 +-
> > > > > >  drivers/net/ethernet/broadcom/bnxt/bnxt.h |   1 -
> > > > > >  drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 245 
> > > > > > +++---
> > > > > >  drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h |  32 +--
> > > > > >  6 files changed, 119 insertions(+), 260 deletions(-)
> > > > >
> > > > > Hi Leon,
> > > > >
> > > > > After a couple of internal discussions we reached a conclusion to
> > > > > implement the Auxbus driver interface and fix the problem once and for
> > > > > all.
> > > >
> > > > Thanks Devesh,
> > > >
> > > > Jason, it looks like we can proceed with this patchset, because in
> > > > auxbus mode this module refcount and ULP indirection logics will be
> > > > removed anyway.
> > > >
> > > > Thanks
> > > Hi Leon,
> > >
> > > In my internal testing, I am seeing a crash using the 3rd patch. I am
> > > spending a few cycles on debugging it. expect my input in a day or so.
> >
> > Can you please post the kernel crash report here?
> > I don't see how a function rename in patch #3 can cause the crash.
> Hey, unfortunately my kdump service config is giving me tough time on
> my host. I will share if I get it.

Any news here?

> >
> > Thanks
> >
> > > >
> > > > > >
> > > > > > --
> > > > > > 2.30.2
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -Regards
> > > > > Devesh
> > > >
> > > >
> > >
> > >
> > > --
> > > -Regards
> > > Devesh
> >
> >
> 
> 
> -- 
> -Regards
> Devesh




Re: [PATCH net-next 4/5] bnxt_en: Refactor __bnxt_vf_reps_destroy().

2021-04-12 Thread Leon Romanovsky
On Sun, Apr 11, 2021 at 08:18:14PM -0400, Michael Chan wrote:
> Add a new helper function __bnxt_free_one_vf_rep() to free one VF rep.
> We also reinitialize the VF rep fields to proper initial values so that
> the function can be used without freeing the VF rep data structure.  This
> will be used in subsequent patches to free and recreate VF reps after
> error recovery.
> 
> Reviewed-by: Edwin Peer 
> Reviewed-by: Sriharsha Basavapatna 
> Signed-off-by: Michael Chan 
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c | 21 ++-
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c 
> b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
> index b5d6cd63bea7..a4ac11f5b0e5 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c
> @@ -288,6 +288,21 @@ void bnxt_vf_reps_open(struct bnxt *bp)
>   bnxt_vf_rep_open(bp->vf_reps[i]->dev);
>  }
>  
> +static void __bnxt_free_one_vf_rep(struct bnxt *bp, struct bnxt_vf_rep 
> *vf_rep)
> +{
> + if (!vf_rep)
> + return;

How can it be NULL if you check that vf_rep != NULL before calling
__bnxt_free_one_vf_rep()?

Thanks

> +
> + if (vf_rep->dst) {
> + dst_release((struct dst_entry *)vf_rep->dst);
> + vf_rep->dst = NULL;
> + }
> + if (vf_rep->tx_cfa_action != CFA_HANDLE_INVALID) {
> + hwrm_cfa_vfr_free(bp, vf_rep->vf_idx);
> + vf_rep->tx_cfa_action = CFA_HANDLE_INVALID;
> + }
> +}
> +
>  static void __bnxt_vf_reps_destroy(struct bnxt *bp)
>  {
>   u16 num_vfs = pci_num_vf(bp->pdev);
> @@ -297,11 +312,7 @@ static void __bnxt_vf_reps_destroy(struct bnxt *bp)
>   for (i = 0; i < num_vfs; i++) {
>   vf_rep = bp->vf_reps[i];
>   if (vf_rep) {
> - dst_release((struct dst_entry *)vf_rep->dst);
> -
> - if (vf_rep->tx_cfa_action != CFA_HANDLE_INVALID)
> - hwrm_cfa_vfr_free(bp, vf_rep->vf_idx);
> -
> + __bnxt_free_one_vf_rep(bp, vf_rep);
>   if (vf_rep->dev) {
>   /* if register_netdev failed, then netdev_ops
>* would have been set to NULL
> -- 
> 2.18.1
> 


Re: [PATCH net-next 4/7] net: ipa: ipa_stop() does not return an error

2021-04-12 Thread Leon Romanovsky
On Sun, Apr 11, 2021 at 08:42:15AM -0500, Alex Elder wrote:
> On 4/11/21 8:28 AM, Leon Romanovsky wrote:
> >> I think *not* checking an available return value is questionable
> >> practice.  I'd really rather have a build option for a
> >> "__need_not_check" tag and have "must_check" be the default.
> > __need_not_check == void ???
> 
> I'm not sure I understand your statement here, but...

We are talking about the same thing. My point was that __need_not_check
is actually void. The API author was supposed to declare that by
declaring that the function doesn't return anything.

Thanks


Re: [PATCH net-next 4/7] net: ipa: ipa_stop() does not return an error

2021-04-11 Thread Leon Romanovsky
On Sun, Apr 11, 2021 at 08:09:55AM -0500, Alex Elder wrote:
> On 4/11/21 1:34 AM, Leon Romanovsky wrote:
> > On Fri, Apr 09, 2021 at 01:07:19PM -0500, Alex Elder wrote:
> >> In ipa_modem_stop(), if the modem netdev pointer is non-null we call
> >> ipa_stop().  We check for an error and if one is returned we handle
> >> it.  But ipa_stop() never returns an error, so this extra handling
> >> is unnecessary.  Simplify the code in ipa_modem_stop() based on the
> >> knowledge no error handling is needed at this spot.
> >>
> >> Signed-off-by: Alex Elder 
> >> ---
> >>  drivers/net/ipa/ipa_modem.c | 18 --
> >>  1 file changed, 4 insertions(+), 14 deletions(-)
> > 
> > <...>
> > 
> >> +  /* Stop the queue and disable the endpoints if it's open */
> >>if (netdev) {
> >> -  /* Stop the queue and disable the endpoints if it's open */
> >> -  ret = ipa_stop(netdev);
> >> -  if (ret)
> >> -  goto out_set_state;
> >> -
> >> +  (void)ipa_stop(netdev);
> > 
> > This void casting is not needed here and in more general case sometimes
> > even be seen as a mistake, for example if the returned attribute declared
> > as __must_check.
> 
> I accept your point but I feel like it's sort of a 50/50 thing.
> 
> I think *not* checking an available return value is questionable
> practice.  I'd really rather have a build option for a
> "__need_not_check" tag and have "must_check" be the default.

__need_not_check == void ???

> 
> The void cast here says "I know this returns a result, but I am
> intentionally not checking it."  If it had been __must_check I
> would certainly have checked it.  
> 
> That being said, I don't really care that much, so I'll plan
> to post version 2, which will drop this cast (I'll probably
> add a comment though).

Thanks

> 
> Thanks.
> 
>   -Alex
> 
> > 
> > Thanks
> > 
> 


[PATCH rdma-next v1 5/7] RDMA/mlx5: Add support to MODIFY_MEMIC command

2021-04-11 Thread Leon Romanovsky
From: Maor Gottlieb 

Add two functions to allocate and deallocate MEMIC operations
by using the MODIFY_MEMIC command.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/dm.c | 36 +
 drivers/infiniband/hw/mlx5/dm.h |  2 ++
 2 files changed, 38 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/dm.c b/drivers/infiniband/hw/mlx5/dm.c
index 29eb5c9df3a4..a648554210f8 100644
--- a/drivers/infiniband/hw/mlx5/dm.c
+++ b/drivers/infiniband/hw/mlx5/dm.c
@@ -111,6 +111,42 @@ void mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, 
phys_addr_t addr,
spin_unlock(&dm->lock);
 }
 
+void mlx5_cmd_dealloc_memic_op(struct mlx5_dm *dm, phys_addr_t addr,
+  u8 operation)
+{
+   u32 in[MLX5_ST_SZ_DW(modify_memic_in)] = {};
+   struct mlx5_core_dev *dev = dm->dev;
+
+   MLX5_SET(modify_memic_in, in, opcode, MLX5_CMD_OP_MODIFY_MEMIC);
+   MLX5_SET(modify_memic_in, in, op_mod, MLX5_MODIFY_MEMIC_OP_MOD_DEALLOC);
+   MLX5_SET(modify_memic_in, in, memic_operation_type, operation);
+   MLX5_SET64(modify_memic_in, in, memic_start_addr, addr - dev->bar_addr);
+
+   mlx5_cmd_exec_in(dev, modify_memic, in);
+}
+
+static int mlx5_cmd_alloc_memic_op(struct mlx5_dm *dm, phys_addr_t addr,
+  u8 operation, phys_addr_t *op_addr)
+{
+   u32 out[MLX5_ST_SZ_DW(modify_memic_out)] = {};
+   u32 in[MLX5_ST_SZ_DW(modify_memic_in)] = {};
+   struct mlx5_core_dev *dev = dm->dev;
+   int err;
+
+   MLX5_SET(modify_memic_in, in, opcode, MLX5_CMD_OP_MODIFY_MEMIC);
+   MLX5_SET(modify_memic_in, in, op_mod, MLX5_MODIFY_MEMIC_OP_MOD_ALLOC);
+   MLX5_SET(modify_memic_in, in, memic_operation_type, operation);
+   MLX5_SET64(modify_memic_in, in, memic_start_addr, addr - dev->bar_addr);
+
+   err = mlx5_cmd_exec_inout(dev, modify_memic, in, out);
+   if (err)
+   return err;
+
+   *op_addr = dev->bar_addr +
+  MLX5_GET64(modify_memic_out, out, memic_operation_addr);
+   return 0;
+}
+
 static int add_dm_mmap_entry(struct ib_ucontext *context,
 struct mlx5_ib_dm_memic *mdm, u64 address)
 {
diff --git a/drivers/infiniband/hw/mlx5/dm.h b/drivers/infiniband/hw/mlx5/dm.h
index 39e66b51264d..6857bbdb201c 100644
--- a/drivers/infiniband/hw/mlx5/dm.h
+++ b/drivers/infiniband/hw/mlx5/dm.h
@@ -49,5 +49,7 @@ struct ib_dm *mlx5_ib_alloc_dm(struct ib_device *ibdev,
   struct uverbs_attr_bundle *attrs);
 void mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, phys_addr_t addr,
u64 length);
+void mlx5_cmd_dealloc_memic_op(struct mlx5_dm *dm, phys_addr_t addr,
+  u8 operation);
 
 #endif /* _MLX5_IB_DM_H */
-- 
2.30.2



[PATCH rdma-next v1 7/7] RDMA/mlx5: Expose UAPI to query DM

2021-04-11 Thread Leon Romanovsky
From: Maor Gottlieb 

Expose UAPI to query MEMIC DM, this will let user space application
that didn't allocate the DM but has access to by owning the matching
command FD to retrieve its information.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/dm.c  | 47 +++-
 drivers/infiniband/hw/mlx5/dm.h  |  1 +
 include/uapi/rdma/mlx5_user_ioctl_cmds.h |  8 
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/dm.c b/drivers/infiniband/hw/mlx5/dm.c
index bec0489428f8..235aad6beacb 100644
--- a/drivers/infiniband/hw/mlx5/dm.c
+++ b/drivers/infiniband/hw/mlx5/dm.c
@@ -293,6 +293,7 @@ static struct ib_dm *handle_alloc_dm_memic(struct 
ib_ucontext *ctx,
kref_init(&dm->ref);
xa_init(&dm->ops);
mutex_init(&dm->ops_xa_lock);
+   dm->req_length = attr->length;
 
err = mlx5_cmd_alloc_memic(dm_db, &dm->base.dev_addr,
   dm->base.size, attr->alignment);
@@ -473,6 +474,38 @@ static int mlx5_ib_dealloc_dm(struct ib_dm *ibdm,
}
 }
 
+static int UVERBS_HANDLER(MLX5_IB_METHOD_DM_QUERY)(
+   struct uverbs_attr_bundle *attrs)
+{
+   struct ib_dm *ibdm =
+   uverbs_attr_get_obj(attrs, MLX5_IB_ATTR_QUERY_DM_REQ_HANDLE);
+   struct mlx5_ib_dm *dm = to_mdm(ibdm);
+   struct mlx5_ib_dm_memic *memic;
+   u64 start_offset;
+   u16 page_idx;
+   int err;
+
+   if (dm->type != MLX5_IB_UAPI_DM_TYPE_MEMIC)
+   return -EOPNOTSUPP;
+
+   memic = to_memic(ibdm);
+   page_idx = memic->mentry.rdma_entry.start_pgoff & 0x;
+   err = uverbs_copy_to(attrs, MLX5_IB_ATTR_QUERY_DM_RESP_PAGE_INDEX,
+&page_idx, sizeof(page_idx));
+   if (err)
+   return err;
+
+   start_offset = memic->base.dev_addr & ~PAGE_MASK;
+   err =  uverbs_copy_to(attrs, MLX5_IB_ATTR_QUERY_DM_RESP_START_OFFSET,
+ &start_offset, sizeof(start_offset));
+   if (err)
+   return err;
+
+   return uverbs_copy_to(attrs, MLX5_IB_ATTR_QUERY_DM_RESP_LENGTH,
+ &memic->req_length,
+ sizeof(memic->req_length));
+}
+
 void mlx5_ib_dm_mmap_free(struct mlx5_ib_dev *dev,
  struct mlx5_user_mmap_entry *mentry)
 {
@@ -498,6 +531,17 @@ void mlx5_ib_dm_mmap_free(struct mlx5_ib_dev *dev,
}
 }
 
+DECLARE_UVERBS_NAMED_METHOD(
+   MLX5_IB_METHOD_DM_QUERY,
+   UVERBS_ATTR_IDR(MLX5_IB_ATTR_QUERY_DM_REQ_HANDLE, UVERBS_OBJECT_DM,
+   UVERBS_ACCESS_READ, UA_MANDATORY),
+   UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_QUERY_DM_RESP_START_OFFSET,
+   UVERBS_ATTR_TYPE(u64), UA_MANDATORY),
+   UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_QUERY_DM_RESP_PAGE_INDEX,
+   UVERBS_ATTR_TYPE(u16), UA_MANDATORY),
+   UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_QUERY_DM_RESP_LENGTH,
+   UVERBS_ATTR_TYPE(u64), UA_MANDATORY));
+
 ADD_UVERBS_ATTRIBUTES_SIMPLE(
mlx5_ib_dm, UVERBS_OBJECT_DM, UVERBS_METHOD_DM_ALLOC,
UVERBS_ATTR_PTR_OUT(MLX5_IB_ATTR_ALLOC_DM_RESP_START_OFFSET,
@@ -524,7 +568,8 @@ DECLARE_UVERBS_NAMED_METHOD(
UA_OPTIONAL));
 
 DECLARE_UVERBS_GLOBAL_METHODS(UVERBS_OBJECT_DM,
- &UVERBS_METHOD(MLX5_IB_METHOD_DM_MAP_OP_ADDR));
+ &UVERBS_METHOD(MLX5_IB_METHOD_DM_MAP_OP_ADDR),
+ &UVERBS_METHOD(MLX5_IB_METHOD_DM_QUERY));
 
 const struct uapi_definition mlx5_ib_dm_defs[] = {
UAPI_DEF_CHAIN_OBJ_TREE(UVERBS_OBJECT_DM, &mlx5_ib_dm),
diff --git a/drivers/infiniband/hw/mlx5/dm.h b/drivers/infiniband/hw/mlx5/dm.h
index 4d0ffad29d8f..9674a80d8d70 100644
--- a/drivers/infiniband/hw/mlx5/dm.h
+++ b/drivers/infiniband/hw/mlx5/dm.h
@@ -31,6 +31,7 @@ struct mlx5_ib_dm_memic {
struct xarray   ops;
struct mutexops_xa_lock;
struct kref ref;
+   size_t  req_length;
 };
 
 struct mlx5_ib_dm_icm {
diff --git a/include/uapi/rdma/mlx5_user_ioctl_cmds.h 
b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
index c6fbc5211717..3798cbcb9021 100644
--- a/include/uapi/rdma/mlx5_user_ioctl_cmds.h
+++ b/include/uapi/rdma/mlx5_user_ioctl_cmds.h
@@ -43,6 +43,7 @@ enum mlx5_ib_create_flow_action_attrs {
 
 enum mlx5_ib_dm_methods {
MLX5_IB_METHOD_DM_MAP_OP_ADDR  = (1U << UVERBS_ID_NS_SHIFT),
+   MLX5_IB_METHOD_DM_QUERY,
 };
 
 enum mlx5_ib_dm_map_op_addr_attrs {
@@ -52,6 +53,13 @@ enum mlx5_ib_dm_map_op_addr_attrs {
MLX5_IB_ATTR_DM_MAP_OP_ADDR_RESP_PAGE_INDEX,
 };
 
+enum mlx5_ib_query_dm_attrs {
+   MLX5_IB_ATTR_QUERY_DM_REQ_HANDLE = (1U << UVERBS_

[PATCH rdma-next v1 2/7] RDMA/uverbs: Make UVERBS_OBJECT_METHODS to consider line number

2021-04-11 Thread Leon Romanovsky
From: Maor Gottlieb 

In order to support multiple method declarations in the same file, we
should use the line number as part of the name.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 include/rdma/uverbs_named_ioctl.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/rdma/uverbs_named_ioctl.h 
b/include/rdma/uverbs_named_ioctl.h
index f04f5126f61e..ee7873f872c3 100644
--- a/include/rdma/uverbs_named_ioctl.h
+++ b/include/rdma/uverbs_named_ioctl.h
@@ -20,7 +20,7 @@
 
 /* These are static so they do not need to be qualified */
 #define UVERBS_METHOD_ATTRS(method_id) _method_attrs_##method_id
-#define UVERBS_OBJECT_METHODS(object_id) _object_methods_##object_id
+#define UVERBS_OBJECT_METHODS(object_id) _UVERBS_NAME(_object_methods_##object_id, __LINE__)
 
 #define DECLARE_UVERBS_NAMED_METHOD(_method_id, ...)   \
static const struct uverbs_attr_def *const UVERBS_METHOD_ATTRS(\
-- 
2.30.2



[PATCH rdma-next v1 6/7] RDMA/mlx5: Add support in MEMIC operations

2021-04-11 Thread Leon Romanovsky
From: Maor Gottlieb 

MEMIC buffer, in addition to regular read and write operations, can
support atomic operations from the host.

Introduce and implement new UAPI to allocate address space for MEMIC
operations such as atomic. This includes:

1. Expose a new IOCTL to request mapping of a MEMIC operation.
2. Hold the operation addresses in a list, so the same operation on the
   same DM is allocated only once.
3. Manage a refcount on the mlx5_ib_dm object, so it is kept valid
   until all addresses are unmapped.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/dm.c  | 192 +--
 drivers/infiniband/hw/mlx5/dm.h  |  12 ++
 drivers/infiniband/hw/mlx5/main.c|   7 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   1 +
 include/uapi/rdma/mlx5_user_ioctl_cmds.h |  11 ++
 5 files changed, 209 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/dm.c b/drivers/infiniband/hw/mlx5/dm.c
index a648554210f8..bec0489428f8 100644
--- a/drivers/infiniband/hw/mlx5/dm.c
+++ b/drivers/infiniband/hw/mlx5/dm.c
@@ -148,16 +148,126 @@ static int mlx5_cmd_alloc_memic_op(struct mlx5_dm *dm, 
phys_addr_t addr,
 }
 
 static int add_dm_mmap_entry(struct ib_ucontext *context,
-struct mlx5_ib_dm_memic *mdm, u64 address)
+struct mlx5_user_mmap_entry *mentry, u8 mmap_flag,
+size_t size, u64 address)
 {
-   mdm->mentry.mmap_flag = MLX5_IB_MMAP_TYPE_MEMIC;
-   mdm->mentry.address = address;
+   mentry->mmap_flag = mmap_flag;
+   mentry->address = address;
+
return rdma_user_mmap_entry_insert_range(
-   context, &mdm->mentry.rdma_entry, mdm->base.size,
+   context, &mentry->rdma_entry, size,
MLX5_IB_MMAP_DEVICE_MEM << 16,
(MLX5_IB_MMAP_DEVICE_MEM << 16) + (1UL << 16) - 1);
 }
 
+static void mlx5_ib_dm_memic_free(struct kref *kref)
+{
+   struct mlx5_ib_dm_memic *dm =
+   container_of(kref, struct mlx5_ib_dm_memic, ref);
+   struct mlx5_ib_dev *dev = to_mdev(dm->base.ibdm.device);
+
+   mlx5_cmd_dealloc_memic(&dev->dm, dm->base.dev_addr, dm->base.size);
+   kfree(dm);
+}
+
+static int copy_op_to_user(struct mlx5_ib_dm_op_entry *op_entry,
+  struct uverbs_attr_bundle *attrs)
+{
+   u64 start_offset;
+   u16 page_idx;
+   int err;
+
+   page_idx = op_entry->mentry.rdma_entry.start_pgoff & 0x;
+   start_offset = op_entry->op_addr & ~PAGE_MASK;
+   err = uverbs_copy_to(attrs, MLX5_IB_ATTR_DM_MAP_OP_ADDR_RESP_PAGE_INDEX,
+&page_idx, sizeof(page_idx));
+   if (err)
+   return err;
+
+   return uverbs_copy_to(attrs,
+ MLX5_IB_ATTR_DM_MAP_OP_ADDR_RESP_START_OFFSET,
+ &start_offset, sizeof(start_offset));
+}
+
+static int map_existing_op(struct mlx5_ib_dm_memic *dm, u8 op,
+  struct uverbs_attr_bundle *attrs)
+{
+   struct mlx5_ib_dm_op_entry *op_entry;
+
+   op_entry = xa_load(&dm->ops, op);
+   if (!op_entry)
+   return -ENOENT;
+
+   return copy_op_to_user(op_entry, attrs);
+}
+
+static int UVERBS_HANDLER(MLX5_IB_METHOD_DM_MAP_OP_ADDR)(
+   struct uverbs_attr_bundle *attrs)
+{
+   struct ib_uobject *uobj = uverbs_attr_get_uobject(
+   attrs, MLX5_IB_ATTR_DM_MAP_OP_ADDR_REQ_HANDLE);
+   struct mlx5_ib_dev *dev = to_mdev(uobj->context->device);
+   struct ib_dm *ibdm = uobj->object;
+   struct mlx5_ib_dm_memic *dm = to_memic(ibdm);
+   struct mlx5_ib_dm_op_entry *op_entry;
+   int err;
+   u8 op;
+
+   err = uverbs_copy_from(&op, attrs, MLX5_IB_ATTR_DM_MAP_OP_ADDR_REQ_OP);
+   if (err)
+   return err;
+
+   if (!(MLX5_CAP_DEV_MEM(dev->mdev, memic_operations) & BIT(op)))
+   return -EOPNOTSUPP;
+
+   mutex_lock(&dm->ops_xa_lock);
+   err = map_existing_op(dm, op, attrs);
+   if (!err || err != -ENOENT)
+   goto err_unlock;
+
+   op_entry = kzalloc(sizeof(*op_entry), GFP_KERNEL);
+   if (!op_entry)
+   goto err_unlock;
+
+   err = mlx5_cmd_alloc_memic_op(&dev->dm, dm->base.dev_addr, op,
+ &op_entry->op_addr);
+   if (err) {
+   kfree(op_entry);
+   goto err_unlock;
+   }
+   op_entry->op = op;
+   op_entry->dm = dm;
+
+   err = add_dm_mmap_entry(uobj->context, &op_entry->mentry,
+   MLX5_IB_MMAP_TYPE_MEMIC_OP, dm->base.size,
+   op_entry->op_addr & PAGE_MASK);
+   if (err) {
+   mlx5_cmd_dealloc_memic_op(&dev

[PATCH rdma-next v1 4/7] RDMA/mlx5: Re-organize the DM code

2021-04-11 Thread Leon Romanovsky
From: Maor Gottlieb 

1. Inline the checks from check_dm_type_support() into their
   respective allocation functions.
2. Fix a use-after-free when the driver fails to copy the MEMIC address
   to the user, by moving the allocation code into the respective
   functions; this avoids the explicit call to free the DM in the
   error flow.
3. Split the mlx5_ib_dm struct into memic and icm types for proper
   type safety throughout.

Fixes: dc2316eba73f ("IB/mlx5: Fix device memory flows")
Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/dm.c   | 193 ++
 drivers/infiniband/hw/mlx5/dm.h   |  29 +++--
 drivers/infiniband/hw/mlx5/main.c |   8 +-
 3 files changed, 114 insertions(+), 116 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/dm.c b/drivers/infiniband/hw/mlx5/dm.c
index 21fefcd405ef..29eb5c9df3a4 100644
--- a/drivers/infiniband/hw/mlx5/dm.c
+++ b/drivers/infiniband/hw/mlx5/dm.c
@@ -112,64 +112,50 @@ void mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, 
phys_addr_t addr,
 }
 
 static int add_dm_mmap_entry(struct ib_ucontext *context,
-struct mlx5_ib_dm *mdm, u64 address)
+struct mlx5_ib_dm_memic *mdm, u64 address)
 {
mdm->mentry.mmap_flag = MLX5_IB_MMAP_TYPE_MEMIC;
mdm->mentry.address = address;
return rdma_user_mmap_entry_insert_range(
-   context, &mdm->mentry.rdma_entry, mdm->size,
+   context, &mdm->mentry.rdma_entry, mdm->base.size,
MLX5_IB_MMAP_DEVICE_MEM << 16,
(MLX5_IB_MMAP_DEVICE_MEM << 16) + (1UL << 16) - 1);
 }
 
-static inline int check_dm_type_support(struct mlx5_ib_dev *dev, u32 type)
-{
-   switch (type) {
-   case MLX5_IB_UAPI_DM_TYPE_MEMIC:
-   if (!MLX5_CAP_DEV_MEM(dev->mdev, memic))
-   return -EOPNOTSUPP;
-   break;
-   case MLX5_IB_UAPI_DM_TYPE_STEERING_SW_ICM:
-   case MLX5_IB_UAPI_DM_TYPE_HEADER_MODIFY_SW_ICM:
-   if (!capable(CAP_SYS_RAWIO) || !capable(CAP_NET_RAW))
-   return -EPERM;
-
-   if (!(MLX5_CAP_FLOWTABLE_NIC_RX(dev->mdev, sw_owner) ||
- MLX5_CAP_FLOWTABLE_NIC_TX(dev->mdev, sw_owner) ||
- MLX5_CAP_FLOWTABLE_NIC_RX(dev->mdev, sw_owner_v2) ||
- MLX5_CAP_FLOWTABLE_NIC_TX(dev->mdev, sw_owner_v2)))
-   return -EOPNOTSUPP;
-   break;
-   }
-
-   return 0;
-}
-
-static int handle_alloc_dm_memic(struct ib_ucontext *ctx, struct mlx5_ib_dm 
*dm,
-struct ib_dm_alloc_attr *attr,
-struct uverbs_attr_bundle *attrs)
+static struct ib_dm *handle_alloc_dm_memic(struct ib_ucontext *ctx,
+  struct ib_dm_alloc_attr *attr,
+  struct uverbs_attr_bundle *attrs)
 {
struct mlx5_dm *dm_db = &to_mdev(ctx->device)->dm;
+   struct mlx5_ib_dm_memic *dm;
u64 start_offset;
u16 page_idx;
int err;
u64 address;
 
-   dm->size = roundup(attr->length, MLX5_MEMIC_BASE_SIZE);
+   if (!MLX5_CAP_DEV_MEM(dm_db->dev, memic))
+   return ERR_PTR(-EOPNOTSUPP);
+
+   dm = kzalloc(sizeof(*dm), GFP_KERNEL);
+   if (!dm)
+   return ERR_PTR(-ENOMEM);
+
+   dm->base.type = MLX5_IB_UAPI_DM_TYPE_MEMIC;
+   dm->base.size = roundup(attr->length, MLX5_MEMIC_BASE_SIZE);
 
-   err = mlx5_cmd_alloc_memic(dm_db, &dm->dev_addr,
-  dm->size, attr->alignment);
+   err = mlx5_cmd_alloc_memic(dm_db, &dm->base.dev_addr,
+  dm->base.size, attr->alignment);
if (err) {
kfree(dm);
-   return err;
+   return ERR_PTR(err);
}
 
-   address = dm->dev_addr & PAGE_MASK;
+   address = dm->base.dev_addr & PAGE_MASK;
err = add_dm_mmap_entry(ctx, dm, address);
if (err) {
-   mlx5_cmd_dealloc_memic(dm_db, dm->dev_addr, dm->size);
+   mlx5_cmd_dealloc_memic(dm_db, dm->base.dev_addr, dm->base.size);
kfree(dm);
-   return err;
+   return ERR_PTR(err);
}
 
	page_idx = dm->mentry.rdma_entry.start_pgoff & 0xFFFF;
@@ -180,51 +166,74 @@ static int handle_alloc_dm_memic(struct ib_ucontext *ctx, struct mlx5_ib_dm *dm,
if (err)
goto err_copy;
 
-   start_offset = dm->dev_addr & ~PAGE_MASK;
+   start_offset = dm->base.dev_addr & ~PAGE_MASK;
err = uverbs_copy_to(attrs,
 MLX5_IB_ATTR_ALLOC_DM_RESP_START_OFFSET,
 &start_offset, sizeof(start_offset));

[PATCH rdma-next v1 3/7] RDMA/mlx5: Move all DM logic to separate file

2021-04-11 Thread Leon Romanovsky
From: Maor Gottlieb 

Move all device memory related code to a separate file.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/Makefile  |   1 +
 drivers/infiniband/hw/mlx5/cmd.c | 101 
 drivers/infiniband/hw/mlx5/cmd.h |   3 -
 drivers/infiniband/hw/mlx5/dm.c  | 342 +++
 drivers/infiniband/hw/mlx5/dm.h  |  42 
 drivers/infiniband/hw/mlx5/main.c| 236 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  24 --
 drivers/infiniband/hw/mlx5/mr.c  |   1 +
 8 files changed, 388 insertions(+), 362 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/dm.c
 create mode 100644 drivers/infiniband/hw/mlx5/dm.h

diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile
index b4c009bb0db6..f43380106bd0 100644
--- a/drivers/infiniband/hw/mlx5/Makefile
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -6,6 +6,7 @@ mlx5_ib-y := ah.o \
 cong.o \
 counters.o \
 cq.o \
+dm.o \
 doorbell.o \
 gsi.o \
 ib_virt.o \
diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index 234f29912ba9..a8db8a051170 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -47,107 +47,6 @@ int mlx5_cmd_query_cong_params(struct mlx5_core_dev *dev, int cong_point,
return mlx5_cmd_exec_inout(dev, query_cong_params, in, out);
 }
 
-int mlx5_cmd_alloc_memic(struct mlx5_dm *dm, phys_addr_t *addr,
-u64 length, u32 alignment)
-{
-   struct mlx5_core_dev *dev = dm->dev;
-   u64 num_memic_hw_pages = MLX5_CAP_DEV_MEM(dev, memic_bar_size)
-   >> PAGE_SHIFT;
-   u64 hw_start_addr = MLX5_CAP64_DEV_MEM(dev, memic_bar_start_addr);
-   u32 max_alignment = MLX5_CAP_DEV_MEM(dev, log_max_memic_addr_alignment);
-   u32 num_pages = DIV_ROUND_UP(length, PAGE_SIZE);
-   u32 out[MLX5_ST_SZ_DW(alloc_memic_out)] = {};
-   u32 in[MLX5_ST_SZ_DW(alloc_memic_in)] = {};
-   u32 mlx5_alignment;
-   u64 page_idx = 0;
-   int ret = 0;
-
-   if (!length || (length & MLX5_MEMIC_ALLOC_SIZE_MASK))
-   return -EINVAL;
-
-   /* mlx5 device sets alignment as 64*2^driver_value
-* so normalizing is needed.
-*/
-   mlx5_alignment = (alignment < MLX5_MEMIC_BASE_ALIGN) ? 0 :
-alignment - MLX5_MEMIC_BASE_ALIGN;
-   if (mlx5_alignment > max_alignment)
-   return -EINVAL;
-
-   MLX5_SET(alloc_memic_in, in, opcode, MLX5_CMD_OP_ALLOC_MEMIC);
-   MLX5_SET(alloc_memic_in, in, range_size, num_pages * PAGE_SIZE);
-   MLX5_SET(alloc_memic_in, in, memic_size, length);
-   MLX5_SET(alloc_memic_in, in, log_memic_addr_alignment,
-mlx5_alignment);
-
-   while (page_idx < num_memic_hw_pages) {
-   spin_lock(&dm->lock);
-   page_idx = bitmap_find_next_zero_area(dm->memic_alloc_pages,
- num_memic_hw_pages,
- page_idx,
- num_pages, 0);
-
-   if (page_idx < num_memic_hw_pages)
-   bitmap_set(dm->memic_alloc_pages,
-  page_idx, num_pages);
-
-   spin_unlock(&dm->lock);
-
-   if (page_idx >= num_memic_hw_pages)
-   break;
-
-   MLX5_SET64(alloc_memic_in, in, range_start_addr,
-  hw_start_addr + (page_idx * PAGE_SIZE));
-
-   ret = mlx5_cmd_exec_inout(dev, alloc_memic, in, out);
-   if (ret) {
-   spin_lock(&dm->lock);
-   bitmap_clear(dm->memic_alloc_pages,
-page_idx, num_pages);
-   spin_unlock(&dm->lock);
-
-   if (ret == -EAGAIN) {
-   page_idx++;
-   continue;
-   }
-
-   return ret;
-   }
-
-   *addr = dev->bar_addr +
-   MLX5_GET64(alloc_memic_out, out, memic_start_addr);
-
-   return 0;
-   }
-
-   return -ENOMEM;
-}
-
-void mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, phys_addr_t addr, u64 length)
-{
-   struct mlx5_core_dev *dev = dm->dev;
-   u64 hw_start_addr = MLX5_CAP64_DEV_MEM(dev, memic_bar_start_addr);
-   u32 num_pages = DIV_ROUND_UP(length, PAGE_SIZE);
-   u32 in[MLX5_ST_SZ_DW(dealloc_memic_in)] = {};
-   u64 start_page_idx;
-   int err;
-
-   addr -= dev->bar_addr;
-   start_page_idx = (addr - hw_start_addr) >> PAGE_SHIFT;
-
-   MLX5_SET(dealloc_memic_in,

[PATCH mlx5-next v1 1/7] net/mlx5: Add MEMIC operations related bits

2021-04-11 Thread Leon Romanovsky
From: Maor Gottlieb 

Add the MEMIC operations bits and structures to the mlx5_ifc file.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Leon Romanovsky 
---
 include/linux/mlx5/mlx5_ifc.h | 42 ++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index df5d91c8b2d4..0ec4ea0a9c92 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -133,6 +133,7 @@ enum {
MLX5_CMD_OP_PAGE_FAULT_RESUME = 0x204,
MLX5_CMD_OP_ALLOC_MEMIC   = 0x205,
MLX5_CMD_OP_DEALLOC_MEMIC = 0x206,
+   MLX5_CMD_OP_MODIFY_MEMIC  = 0x207,
MLX5_CMD_OP_CREATE_EQ = 0x301,
MLX5_CMD_OP_DESTROY_EQ= 0x302,
MLX5_CMD_OP_QUERY_EQ  = 0x303,
@@ -1015,7 +1016,11 @@ struct mlx5_ifc_device_mem_cap_bits {
 
u8 header_modify_sw_icm_start_address[0x40];
 
-   u8 reserved_at_180[0x680];
+   u8 reserved_at_180[0x80];
+
+   u8 memic_operations[0x20];
+
+   u8 reserved_at_220[0x5e0];
 };
 
 struct mlx5_ifc_device_event_cap_bits {
@@ -10399,6 +10404,41 @@ struct mlx5_ifc_destroy_vport_lag_in_bits {
u8 reserved_at_40[0x40];
 };
 
+enum {
+   MLX5_MODIFY_MEMIC_OP_MOD_ALLOC,
+   MLX5_MODIFY_MEMIC_OP_MOD_DEALLOC,
+};
+
+struct mlx5_ifc_modify_memic_in_bits {
+   u8 opcode[0x10];
+   u8 uid[0x10];
+
+   u8 reserved_at_20[0x10];
+   u8 op_mod[0x10];
+
+   u8 reserved_at_40[0x20];
+
+   u8 reserved_at_60[0x18];
+   u8 memic_operation_type[0x8];
+
+   u8 memic_start_addr[0x40];
+
+   u8 reserved_at_c0[0x140];
+};
+
+struct mlx5_ifc_modify_memic_out_bits {
+   u8 status[0x8];
+   u8 reserved_at_8[0x18];
+
+   u8 syndrome[0x20];
+
+   u8 reserved_at_40[0x40];
+
+   u8 memic_operation_addr[0x40];
+
+   u8 reserved_at_c0[0x140];
+};
+
 struct mlx5_ifc_alloc_memic_in_bits {
u8 opcode[0x10];
u8 reserved_at_10[0x10];
-- 
2.30.2



[PATCH rdma-next v1 0/7] Add MEMIC operations support

2021-04-11 Thread Leon Romanovsky
From: Leon Romanovsky 

Changelog:
v1:
 * Changed logic of patch #6 per Jason's request.
v0: https://lore.kernel.org/linux-rdma/20210318111548.674749-1-l...@kernel.org

---

Hi,

This series from Maor extends MEMIC to support atomic operations from
the host in addition to already supported regular read/write.

Thanks

Maor Gottlieb (7):
  net/mlx5: Add MEMIC operations related bits
  RDMA/uverbs: Make UVERBS_OBJECT_METHODS to consider line number
  RDMA/mlx5: Move all DM logic to separate file
  RDMA/mlx5: Re-organize the DM code
  RDMA/mlx5: Add support to MODIFY_MEMIC command
  RDMA/mlx5: Add support in MEMIC operations
  RDMA/mlx5: Expose UAPI to query DM

 drivers/infiniband/hw/mlx5/Makefile  |   1 +
 drivers/infiniband/hw/mlx5/cmd.c | 101 
 drivers/infiniband/hw/mlx5/cmd.h |   3 -
 drivers/infiniband/hw/mlx5/dm.c  | 584 +++
 drivers/infiniband/hw/mlx5/dm.h  |  68 +++
 drivers/infiniband/hw/mlx5/main.c| 243 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  25 +-
 drivers/infiniband/hw/mlx5/mr.c  |   1 +
 include/linux/mlx5/mlx5_ifc.h|  42 +-
 include/rdma/uverbs_named_ioctl.h|   2 +-
 include/uapi/rdma/mlx5_user_ioctl_cmds.h |  19 +
 11 files changed, 720 insertions(+), 369 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/dm.c
 create mode 100644 drivers/infiniband/hw/mlx5/dm.h

-- 
2.30.2



Re: [PATCH v4 01/23] iidc: Introduce iidc.h

2021-04-11 Thread Leon Romanovsky
On Fri, Apr 09, 2021 at 01:38:37AM +, Saleem, Shiraz wrote:
> > Subject: Re: [PATCH v4 01/23] iidc: Introduce iidc.h
> > 
> > On Wed, Apr 07, 2021 at 07:43:24PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 07, 2021 at 08:58:49PM +, Saleem, Shiraz wrote:
> > > > > Subject: Re: [PATCH v4 01/23] iidc: Introduce iidc.h
> > > > >
> > > > > On Tue, Apr 06, 2021 at 04:01:03PM -0500, Shiraz Saleem wrote:
> > > > >
> > > > > > +/* Following APIs are implemented by core PCI driver */ struct
> > > > > > +iidc_core_ops {
> > > > > > +   /* APIs to allocate resources such as VEB, VSI, Doorbell queues,
> > > > > > +* completion queues, Tx/Rx queues, etc...
> > > > > > +*/
> > > > > > +   int (*alloc_res)(struct iidc_core_dev_info *cdev_info,
> > > > > > +struct iidc_res *res,
> > > > > > +int partial_acceptable);
> > > > > > +   int (*free_res)(struct iidc_core_dev_info *cdev_info,
> > > > > > +   struct iidc_res *res);
> > > > > > +
> > > > > > +   int (*request_reset)(struct iidc_core_dev_info *cdev_info,
> > > > > > +enum iidc_reset_type reset_type);
> > > > > > +
> > > > > > +   int (*update_vport_filter)(struct iidc_core_dev_info *cdev_info,
> > > > > > +  u16 vport_id, bool enable);
> > > > > > +   int (*vc_send)(struct iidc_core_dev_info *cdev_info, u32 vf_id, 
> > > > > > u8
> > *msg,
> > > > > > +  u16 len);
> > > > > > +};
> > > > >
> > > > > What is this? There is only one implementation:
> > > > >
> > > > > static const struct iidc_core_ops ops = {
> > > > >   .alloc_res  = ice_cdev_info_alloc_res,
> > > > >   .free_res   = ice_cdev_info_free_res,
> > > > >   .request_reset  = ice_cdev_info_request_reset,
> > > > >   .update_vport_filter= 
> > > > > ice_cdev_info_update_vsi_filter,
> > > > >   .vc_send= ice_cdev_info_vc_send,
> > > > > };
> > > > >
> > > > > So export and call the functions directly.
> > > >
> > > > No. Then we end up requiring ice to be loaded even when just want to
> > > > use irdma with x722 [whose ethernet driver is "i40e"].
> > >
> > > So what? What does it matter to load a few extra kb of modules?
> > 
> > And if user cares about it, he will blacklist that module anyway.
> > 
>  blacklist ice when you just have an x722 card? How does that solve anything? 
> You wont be able to load irdma then.

You will blacklist i40e if you want solely irdma functionality.

Thanks

> 
> Shiraz


Re: [PATCH net-next 4/7] net: ipa: ipa_stop() does not return an error

2021-04-10 Thread Leon Romanovsky
On Fri, Apr 09, 2021 at 01:07:19PM -0500, Alex Elder wrote:
> In ipa_modem_stop(), if the modem netdev pointer is non-null we call
> ipa_stop().  We check for an error and if one is returned we handle
> it.  But ipa_stop() never returns an error, so this extra handling
> is unnecessary.  Simplify the code in ipa_modem_stop() based on the
> knowledge no error handling is needed at this spot.
> 
> Signed-off-by: Alex Elder 
> ---
>  drivers/net/ipa/ipa_modem.c | 18 --
>  1 file changed, 4 insertions(+), 14 deletions(-)

<...>

> + /* Stop the queue and disable the endpoints if it's open */
>   if (netdev) {
> - /* Stop the queue and disable the endpoints if it's open */
> - ret = ipa_stop(netdev);
> - if (ret)
> - goto out_set_state;
> -
> + (void)ipa_stop(netdev);

This void casting is not needed here, and in the more general case it can
even be seen as a mistake, for example if the return value is declared
as __must_check.

Thanks


Re: [PATCH rdma-next v2 0/5] Get rid of custom made module dependency

2021-04-08 Thread Leon Romanovsky
On Thu, Apr 08, 2021 at 08:53:47AM -0300, Jason Gunthorpe wrote:
> On Thu, Apr 08, 2021 at 02:44:45PM +0300, Leon Romanovsky wrote:
> 
> > > In my internal testing, I am seeing a crash using the 3rd patch. I am
> > > spending a few cycles on debugging it. expect my input in a day or so.
> > 
> > Can you please post the kernel crash report here?
> > I don't see how function rename in patch #3 can cause to the crash.
> 
> I looked too, I'm also quite surprised that 1,2,3 alone have a
> bug.. Is there some condition where ulp_probe can be null?

My speculative guess is that they are testing a non-upstream kernel/module.

> 
> Ugh the is_bnxt_re_dev() is horribly gross too

The whole bnxt* code is very creative. The function bnxt_re_from_netdev()
below is junk too.

It is interesting to see how my review skills from 2017 improved over the years :).

Thanks

> 
> Jason


Re: [PATCH rdma-next v2 0/5] Get rid of custom made module dependency

2021-04-08 Thread Leon Romanovsky
On Thu, Apr 08, 2021 at 05:06:24PM +0530, Devesh Sharma wrote:
> On Sat, Apr 3, 2021 at 5:12 PM Leon Romanovsky  wrote:
> >
> > On Sat, Apr 03, 2021 at 03:52:13PM +0530, Devesh Sharma wrote:
> > > On Thu, Apr 1, 2021 at 12:27 PM Leon Romanovsky  wrote:
> > > >
> > > > From: Leon Romanovsky 
> > > >
> > > > Changelog:
> > > > v2:
> > > >  * kbuild spotted that I didn't delete all code in patch #5, so deleted
> > > >even more ulp_ops derefences.
> > > > v1: 
> > > > https://lore.kernel.org/linux-rdma/20210329085212.257771-1-l...@kernel.org
> > > >  * Go much deeper and removed useless ULP indirection
> > > > v0: 
> > > > https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
> > > > ---
> > > >
> > > > The following series fixes issue spotted in [1], where bnxt_re driver
> > > > messed with module reference counting in order to implement symbol
> > > > dependency of bnxt_re and bnxt modules. All of this is done, when in
> > > > upstream we have only one ULP user of that bnxt module. The simple
> > > > declaration of exported symbol would do the trick.
> > > >
> > > > This series removes that custom module_get/_put, which is not supposed
> > > > to be in the driver from the beginning and get rid of nasty indirection
> > > > logic that isn't relevant for the upstream code.
> > > >
> > > > Such small changes allow us to simplify the bnxt code and my hope that
> > > > Devesh will continue where I stopped and remove struct bnxt_ulp_ops too.
> > > >
> > > > Thanks
> > > >
> > > > [1] 
> > > > https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
> > > >
> > > > Leon Romanovsky (5):
> > > >   RDMA/bnxt_re: Depend on bnxt ethernet driver and not blindly select it
> > > >   RDMA/bnxt_re: Create direct symbolic link between bnxt modules
> > > >   RDMA/bnxt_re: Get rid of custom module reference counting
> > > >   net/bnxt: Remove useless check of non-existent ULP id
> > > >   net/bnxt: Use direct API instead of useless indirection
> > > >
> > > >  drivers/infiniband/hw/bnxt_re/Kconfig |   4 +-
> > > >  drivers/infiniband/hw/bnxt_re/main.c  |  93 ++-
> > > >  drivers/net/ethernet/broadcom/bnxt/bnxt.c |   4 +-
> > > >  drivers/net/ethernet/broadcom/bnxt/bnxt.h |   1 -
> > > >  drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 245 +++---
> > > >  drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h |  32 +--
> > > >  6 files changed, 119 insertions(+), 260 deletions(-)
> > >
> > > Hi Leon,
> > >
> > > After a couple of internal discussions we reached a conclusion to
> > > implement the Auxbus driver interface and fix the problem once and for
> > > all.
> >
> > Thanks Devesh,
> >
> > Jason, it looks like we can proceed with this patchset, because in
> > auxbus mode this module refcount and ULP indirection logics will be
> > removed anyway.
> >
> > Thanks
> Hi Leon,
> 
> In my internal testing, I am seeing a crash using the 3rd patch. I am
> spending a few cycles on debugging it. expect my input in a day or so.

Can you please post the kernel crash report here?
I don't see how the function rename in patch #3 can cause the crash.

Thanks

> >
> > > >
> > > > --
> > > > 2.30.2
> > > >
> > >
> > >
> > > --
> > > -Regards
> > > Devesh
> >
> >
> 
> 
> -- 
> -Regards
> Devesh




Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-08 Thread Leon Romanovsky
On Wed, Apr 07, 2021 at 09:59:52PM +, Dexuan Cui wrote:
> > From: Leon Romanovsky 
> > Sent: Wednesday, April 7, 2021 5:45 AM
> > > >
> > > > BTW, you don't need to write { 0 }, the {} is enough.
> > >
> > > Thanks for the suggestion! I'll use {0} in v2.
> > 
> > You missed the point, "{ 0 }" change to be "{}" without 0.
> 
> Got it. Will make the suggested change.

The numbers are not important. If you are curious, read this thread; it
talks about {}, {0}, memset(0,..) and padding :)
https://lore.kernel.org/linux-rdma/20200730192026.110246-1-yepeilin...@gmail.com/

> 
> FWIW, {0} and { 0 } are still widely used, but it looks like
> {} is indeed more preferred:
> 
> $ grep "= {};" drivers/net/  -nr  | wc -l
> 829
> 
> $ grep "= {0};" drivers/net/  -nr  | wc -l
> 708
> 
> $ grep "= {};" kernel/  -nr  | wc -l
> 29
> 
> $ grep "= {0};" kernel/  -nr  | wc -l
> 4


Re: [PATCH v4 01/23] iidc: Introduce iidc.h

2021-04-08 Thread Leon Romanovsky
On Wed, Apr 07, 2021 at 07:43:24PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 07, 2021 at 08:58:49PM +, Saleem, Shiraz wrote:
> > > Subject: Re: [PATCH v4 01/23] iidc: Introduce iidc.h
> > > 
> > > On Tue, Apr 06, 2021 at 04:01:03PM -0500, Shiraz Saleem wrote:
> > > 
> > > > +/* Following APIs are implemented by core PCI driver */ struct
> > > > +iidc_core_ops {
> > > > +   /* APIs to allocate resources such as VEB, VSI, Doorbell queues,
> > > > +* completion queues, Tx/Rx queues, etc...
> > > > +*/
> > > > +   int (*alloc_res)(struct iidc_core_dev_info *cdev_info,
> > > > +struct iidc_res *res,
> > > > +int partial_acceptable);
> > > > +   int (*free_res)(struct iidc_core_dev_info *cdev_info,
> > > > +   struct iidc_res *res);
> > > > +
> > > > +   int (*request_reset)(struct iidc_core_dev_info *cdev_info,
> > > > +enum iidc_reset_type reset_type);
> > > > +
> > > > +   int (*update_vport_filter)(struct iidc_core_dev_info *cdev_info,
> > > > +  u16 vport_id, bool enable);
> > > > +   int (*vc_send)(struct iidc_core_dev_info *cdev_info, u32 vf_id, 
> > > > u8 *msg,
> > > > +  u16 len);
> > > > +};
> > > 
> > > What is this? There is only one implementation:
> > > 
> > > static const struct iidc_core_ops ops = {
> > >   .alloc_res  = ice_cdev_info_alloc_res,
> > >   .free_res   = ice_cdev_info_free_res,
> > >   .request_reset  = ice_cdev_info_request_reset,
> > >   .update_vport_filter= ice_cdev_info_update_vsi_filter,
> > >   .vc_send= ice_cdev_info_vc_send,
> > > };
> > > 
> > > So export and call the functions directly.
> > 
> > No. Then we end up requiring ice to be loaded even when just want to
> > use irdma with x722 [whose ethernet driver is "i40e"].
> 
> So what? What does it matter to load a few extra kb of modules?

And if a user cares about it, they will blacklist that module anyway.

Thanks


Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-07 Thread Leon Romanovsky
On Wed, Apr 07, 2021 at 03:05:26PM +, Haiyang Zhang wrote:
> 
> 
> > -Original Message-
> > From: Leon Romanovsky 
> > Sent: Wednesday, April 7, 2021 10:55 AM
> > To: Haiyang Zhang 
> > Cc: Dexuan Cui ; da...@davemloft.net;
> > k...@kernel.org; KY Srinivasan ; Stephen Hemminger
> > ; wei@kernel.org; Wei Liu
> > ; netdev@vger.kernel.org; linux-
> > ker...@vger.kernel.org; linux-hyp...@vger.kernel.org
> > Subject: Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure
> > Network Adapter (MANA)
> > 
> > On Wed, Apr 07, 2021 at 02:41:45PM +, Haiyang Zhang wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Leon Romanovsky 
> > > > Sent: Wednesday, April 7, 2021 8:51 AM
> > > > To: Dexuan Cui 
> > > > Cc: da...@davemloft.net; k...@kernel.org; KY Srinivasan
> > > > ; Haiyang Zhang ;
> > Stephen
> > > > Hemminger ; wei@kernel.org; Wei Liu
> > > > ; netdev@vger.kernel.org; linux-
> > > > ker...@vger.kernel.org; linux-hyp...@vger.kernel.org
> > > > Subject: Re: [PATCH net-next] net: mana: Add a driver for Microsoft
> > > > Azure Network Adapter (MANA)
> > > >
> > > > On Wed, Apr 07, 2021 at 08:40:13AM +, Dexuan Cui wrote:
> > > > > > From: Leon Romanovsky 
> > > > > > Sent: Wednesday, April 7, 2021 1:10 AM
> > > > > >
> > > > > > <...>
> > > > > >
> > > > > > > +int gdma_verify_vf_version(struct pci_dev *pdev) {
> > > > > > > + struct gdma_context *gc = pci_get_drvdata(pdev);
> > > > > > > + struct gdma_verify_ver_req req = { 0 };
> > > > > > > + struct gdma_verify_ver_resp resp = { 0 };
> > > > > > > + int err;
> > > > > > > +
> > > > > > > + gdma_init_req_hdr(&req.hdr,
> > GDMA_VERIFY_VF_DRIVER_VERSION,
> > > > > > > +   sizeof(req), sizeof(resp));
> > > > > > > +
> > > > > > > + req.protocol_ver_min = GDMA_PROTOCOL_FIRST;
> > > > > > > + req.protocol_ver_max = GDMA_PROTOCOL_LAST;
> > > > > > > +
> > > > > > > + err = gdma_send_request(gc, sizeof(req), &req, sizeof(resp),
> > &resp);
> > > > > > > + if (err || resp.hdr.status) {
> > > > > > > + pr_err("VfVerifyVersionOutput: %d, status=0x%x\n",
> > err,
> > > > > > > +resp.hdr.status);
> > > > > > > + return -EPROTO;
> > > > > > > + }
> > > > > > > +
> > > > > > > + return 0;
> > > > > > > +}
> > > > > >
> > > > > > <...>
> > > > > > > + err = gdma_verify_vf_version(pdev);
> > > > > > > + if (err)
> > > > > > > + goto remove_irq;
> > > > > >
> > > > > > Will this VF driver be used in the guest VM? What will prevent
> > > > > > from users
> > > > to
> > > > > > change it?
> > > > > > I think that such version negotiation scheme is not allowed.
> > > > >
> > > > > Yes, the VF driver is expected to run in a Linux VM that runs on 
> > > > > Azure.
> > > > >
> > > > > Currently gdma_verify_vf_version() just tells the PF driver that
> > > > > the VF
> > > > driver
> > > > > is only able to support GDMA_PROTOCOL_V1, and want to use
> > > > > GDMA_PROTOCOL_V1's message formats to talk to the PF driver later.
> > > > >
> > > > > enum {
> > > > > GDMA_PROTOCOL_UNDEFINED = 0,
> > > > > GDMA_PROTOCOL_V1 = 1,
> > > > > GDMA_PROTOCOL_FIRST = GDMA_PROTOCOL_V1,
> > > > > GDMA_PROTOCOL_LAST = GDMA_PROTOCOL_V1,
> > > > > GDMA_PROTOCOL_VALUE_MAX
> > > > > };
> > > > >
> > > > > The PF driver is supposed to always support GDMA_PROTOCOL_V1, so I
> > > > expect
> > > > > here gdma_verify_vf_version() should succeed. If a user changes
> > > > > the Linux
> > > > VF
> > > > > driver and try to use a protocol version not supported by the PF
> > > > > driver,
> > > > then
> > > > > gdma_verify_vf_version() will fail; later, if the VF driver tries
> > > > > to talk to the
> > > > PF
> > > > > driver using an unsupported message format, the PF driver will
> > > > > return a
> > > > failure.
> > > >
> > > > The worry is not for the current code, but for the future one when
> > > > you will support v2, v3 e.t.c. First, your code will look like a
> > > > spaghetti and second, users will try and mix vX with "unsupported"
> > > > commands just for the fun.
> > >
> > > In the future, if the protocol version updated on the host side,
> > > guests need to support different host versions because not all hosts
> > > are updated (simultaneously). So this negotiation is necessary to know
> > > the supported version, and decide the proper command version to use.
> > 
> > And how do other paravirtual drivers solve this negotiation scheme?
> 
> I saw some other drivers used version negotiation too, for example:

I see, thanks.

> 
> /**
>  *  ixgbevf_negotiate_api_version_vf - Negotiate supported API version
>  *  @hw: pointer to the HW structure
>  *  @api: integer containing requested API version
>  **/
> static int ixgbevf_negotiate_api_version_vf(struct ixgbe_hw *hw, int api)
> {
> 
> Thanks,
> - Haiyang


Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-07 Thread Leon Romanovsky
On Wed, Apr 07, 2021 at 02:41:45PM +, Haiyang Zhang wrote:
> 
> 
> > -Original Message-
> > From: Leon Romanovsky 
> > Sent: Wednesday, April 7, 2021 8:51 AM
> > To: Dexuan Cui 
> > Cc: da...@davemloft.net; k...@kernel.org; KY Srinivasan
> > ; Haiyang Zhang ; Stephen
> > Hemminger ; wei@kernel.org; Wei Liu
> > ; netdev@vger.kernel.org; linux-
> > ker...@vger.kernel.org; linux-hyp...@vger.kernel.org
> > Subject: Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure
> > Network Adapter (MANA)
> > 
> > On Wed, Apr 07, 2021 at 08:40:13AM +, Dexuan Cui wrote:
> > > > From: Leon Romanovsky 
> > > > Sent: Wednesday, April 7, 2021 1:10 AM
> > > >
> > > > <...>
> > > >
> > > > > +int gdma_verify_vf_version(struct pci_dev *pdev)
> > > > > +{
> > > > > + struct gdma_context *gc = pci_get_drvdata(pdev);
> > > > > + struct gdma_verify_ver_req req = { 0 };
> > > > > + struct gdma_verify_ver_resp resp = { 0 };
> > > > > + int err;
> > > > > +
> > > > > + gdma_init_req_hdr(&req.hdr, GDMA_VERIFY_VF_DRIVER_VERSION,
> > > > > +   sizeof(req), sizeof(resp));
> > > > > +
> > > > > + req.protocol_ver_min = GDMA_PROTOCOL_FIRST;
> > > > > + req.protocol_ver_max = GDMA_PROTOCOL_LAST;
> > > > > +
> > > > > + err = gdma_send_request(gc, sizeof(req), &req, sizeof(resp), 
> > > > > &resp);
> > > > > + if (err || resp.hdr.status) {
> > > > > + pr_err("VfVerifyVersionOutput: %d, status=0x%x\n", err,
> > > > > +resp.hdr.status);
> > > > > + return -EPROTO;
> > > > > + }
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > >
> > > > <...>
> > > > > + err = gdma_verify_vf_version(pdev);
> > > > > + if (err)
> > > > > + goto remove_irq;
> > > >
> > > > Will this VF driver be used in the guest VM? What will prevent from 
> > > > users
> > to
> > > > change it?
> > > > I think that such version negotiation scheme is not allowed.
> > >
> > > Yes, the VF driver is expected to run in a Linux VM that runs on Azure.
> > >
> > > Currently gdma_verify_vf_version() just tells the PF driver that the VF
> > driver
> > > is only able to support GDMA_PROTOCOL_V1, and want to use
> > > GDMA_PROTOCOL_V1's message formats to talk to the PF driver later.
> > >
> > > enum {
> > > GDMA_PROTOCOL_UNDEFINED = 0,
> > > GDMA_PROTOCOL_V1 = 1,
> > > GDMA_PROTOCOL_FIRST = GDMA_PROTOCOL_V1,
> > > GDMA_PROTOCOL_LAST = GDMA_PROTOCOL_V1,
> > > GDMA_PROTOCOL_VALUE_MAX
> > > };
> > >
> > > The PF driver is supposed to always support GDMA_PROTOCOL_V1, so I
> > expect
> > > here gdma_verify_vf_version() should succeed. If a user changes the Linux
> > VF
> > > driver and try to use a protocol version not supported by the PF driver,
> > then
> > > gdma_verify_vf_version() will fail; later, if the VF driver tries to talk 
> > > to the
> > PF
> > > driver using an unsupported message format, the PF driver will return a
> > failure.
> > 
> > The worry is not for the current code, but for the future one when you will
> > support v2, v3 e.t.c. First, your code will look like a spaghetti and
> > second, users will try and mix vX with "unsupported" commands just for the
> > fun.
> 
> In the future, if the protocol version updated on the host side, guests need 
> to support different host versions because not all hosts are updated 
> (simultaneously). So this negotiation is necessary to know the supported 
> version, and decide the proper command version to use. 

And how do other paravirtual drivers solve this negotiation scheme?

> 
> If any user try "unsupported commands just for the fun", the host will deny 
> and return an error.
> 
> Thanks,
> - Haiyang


Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-07 Thread Leon Romanovsky
On Wed, Apr 07, 2021 at 08:40:13AM +, Dexuan Cui wrote:
> > From: Leon Romanovsky 
> > Sent: Wednesday, April 7, 2021 1:10 AM
> > 
> > <...>
> > 
> > > +int gdma_verify_vf_version(struct pci_dev *pdev)
> > > +{
> > > + struct gdma_context *gc = pci_get_drvdata(pdev);
> > > + struct gdma_verify_ver_req req = { 0 };
> > > + struct gdma_verify_ver_resp resp = { 0 };
> > > + int err;
> > > +
> > > + gdma_init_req_hdr(&req.hdr, GDMA_VERIFY_VF_DRIVER_VERSION,
> > > +   sizeof(req), sizeof(resp));
> > > +
> > > + req.protocol_ver_min = GDMA_PROTOCOL_FIRST;
> > > + req.protocol_ver_max = GDMA_PROTOCOL_LAST;
> > > +
> > > + err = gdma_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> > > + if (err || resp.hdr.status) {
> > > + pr_err("VfVerifyVersionOutput: %d, status=0x%x\n", err,
> > > +resp.hdr.status);
> > > + return -EPROTO;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > 
> > <...>
> > > + err = gdma_verify_vf_version(pdev);
> > > + if (err)
> > > + goto remove_irq;
> > 
> > Will this VF driver be used in the guest VM? What will prevent from users to
> > change it?
> > I think that such version negotiation scheme is not allowed.
> 
> Yes, the VF driver is expected to run in a Linux VM that runs on Azure.
> 
> Currently gdma_verify_vf_version() just tells the PF driver that the VF driver
> is only able to support GDMA_PROTOCOL_V1, and want to use
> GDMA_PROTOCOL_V1's message formats to talk to the PF driver later.
> 
> enum {
> GDMA_PROTOCOL_UNDEFINED = 0,
> GDMA_PROTOCOL_V1 = 1,
> GDMA_PROTOCOL_FIRST = GDMA_PROTOCOL_V1,
> GDMA_PROTOCOL_LAST = GDMA_PROTOCOL_V1,
> GDMA_PROTOCOL_VALUE_MAX
> };
> 
> The PF driver is supposed to always support GDMA_PROTOCOL_V1, so I expect
> here gdma_verify_vf_version() should succeed. If a user changes the Linux VF
> driver and try to use a protocol version not supported by the PF driver, then
> gdma_verify_vf_version() will fail; later, if the VF driver tries to talk to 
> the PF
> driver using an unsupported message format, the PF driver will return a 
> failure.

The worry is not about the current code, but about the future one, when you
will support v2, v3, etc. First, your code will look like spaghetti and
second, users will try to mix vX with "unsupported" commands just for
fun.

Thanks

> 
> Thanks,
> -- Dexuan


Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-07 Thread Leon Romanovsky
On Wed, Apr 07, 2021 at 08:28:45AM +, Dexuan Cui wrote:
> > From: Leon Romanovsky 
> > Sent: Wednesday, April 7, 2021 1:15 AM
> > > ...
> > > int gdma_test_eq(struct gdma_context *gc, struct gdma_queue *eq)
> > > {
> > > struct gdma_generate_test_event_req req = { 0 };
> > > struct gdma_general_resp resp = { 0 };
> > 
> > BTW, you don't need to write { 0 }, the {} is enough.
>  
> Thanks for the suggestion! I'll use {0} in v2. 

You missed the point, "{ 0 }" change to be "{}" without 0.

Thanks


Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-07 Thread Leon Romanovsky
On Wed, Apr 07, 2021 at 08:02:17AM +, Dexuan Cui wrote:
> > From: Andrew Lunn 
> > Sent: Tuesday, April 6, 2021 6:08 PM
> > To: Dexuan Cui 
> > 
> > > +static int gdma_query_max_resources(struct pci_dev *pdev)
> > > +{
> > > + struct gdma_context *gc = pci_get_drvdata(pdev);
> > > + struct gdma_general_req req = { 0 };
> > > + struct gdma_query_max_resources_resp resp = { 0 };
> > > + int err;
> > 
> > Network drivers need to use reverse christmas tree. I spotted a number
> > of functions getting this wrong. Please review the whole driver.
> 
> Hi Andrew, thanks for the quick comments!
> 
> I think In general the patch follows the reverse christmas tree style.
> 
> For the function you pointed out, I didn't follow the reverse
> christmas tree style because I wanted to keep the variable defines
> more natural, i.e. first define 'req' and then 'resp'.
> 
> I can swap the 2 lines here, i.e. define 'resp' first, but this looks a little
> strange to me, because in some other functions the 'req' should be
> defined first, e.g. 
> 
> int gdma_test_eq(struct gdma_context *gc, struct gdma_queue *eq)
> {
> struct gdma_generate_test_event_req req = { 0 };
> struct gdma_general_resp resp = { 0 };

BTW, you don't need to write { 0 }, the {} is enough.

Thanks


Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-07 Thread Leon Romanovsky
On Tue, Apr 06, 2021 at 04:23:21PM -0700, Dexuan Cui wrote:
> Add a VF driver for Microsoft Azure Network Adapter (MANA) that will be
> available in the future.
> 
> Co-developed-by: Haiyang Zhang 
> Signed-off-by: Haiyang Zhang 
> Signed-off-by: Dexuan Cui 
> ---
>  MAINTAINERS   |4 +-
>  drivers/net/ethernet/Kconfig  |1 +
>  drivers/net/ethernet/Makefile |1 +
>  drivers/net/ethernet/microsoft/Kconfig|   29 +
>  drivers/net/ethernet/microsoft/Makefile   |5 +
>  drivers/net/ethernet/microsoft/mana/Makefile  |6 +
>  drivers/net/ethernet/microsoft/mana/gdma.h|  731 +++
>  .../net/ethernet/microsoft/mana/gdma_main.c   | 1500 +
>  .../net/ethernet/microsoft/mana/hw_channel.c  |  851 
>  .../net/ethernet/microsoft/mana/hw_channel.h  |  181 ++
>  drivers/net/ethernet/microsoft/mana/mana.h|  529 +
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 1861 +
>  .../ethernet/microsoft/mana/mana_ethtool.c|  276 +++
>  .../net/ethernet/microsoft/mana/shm_channel.c |  290 +++
>  .../net/ethernet/microsoft/mana/shm_channel.h |   19 +
>  15 files changed, 6283 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/ethernet/microsoft/Kconfig
>  create mode 100644 drivers/net/ethernet/microsoft/Makefile
>  create mode 100644 drivers/net/ethernet/microsoft/mana/Makefile
>  create mode 100644 drivers/net/ethernet/microsoft/mana/gdma.h
>  create mode 100644 drivers/net/ethernet/microsoft/mana/gdma_main.c
>  create mode 100644 drivers/net/ethernet/microsoft/mana/hw_channel.c
>  create mode 100644 drivers/net/ethernet/microsoft/mana/hw_channel.h
>  create mode 100644 drivers/net/ethernet/microsoft/mana/mana.h
>  create mode 100644 drivers/net/ethernet/microsoft/mana/mana_en.c
>  create mode 100644 drivers/net/ethernet/microsoft/mana/mana_ethtool.c
>  create mode 100644 drivers/net/ethernet/microsoft/mana/shm_channel.c
>  create mode 100644 drivers/net/ethernet/microsoft/mana/shm_channel.h

<...>

> +int gdma_verify_vf_version(struct pci_dev *pdev)
> +{
> + struct gdma_context *gc = pci_get_drvdata(pdev);
> + struct gdma_verify_ver_req req = { 0 };
> + struct gdma_verify_ver_resp resp = { 0 };
> + int err;
> +
> + gdma_init_req_hdr(&req.hdr, GDMA_VERIFY_VF_DRIVER_VERSION,
> +   sizeof(req), sizeof(resp));
> +
> + req.protocol_ver_min = GDMA_PROTOCOL_FIRST;
> + req.protocol_ver_max = GDMA_PROTOCOL_LAST;
> +
> + err = gdma_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> + if (err || resp.hdr.status) {
> + pr_err("VfVerifyVersionOutput: %d, status=0x%x\n", err,
> +resp.hdr.status);
> + return -EPROTO;
> + }
> +
> + return 0;
> +}

<...>

> +static int gdma_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> +{
> + struct gdma_context *gc;
> + void __iomem *bar0_va;
> + int bar = 0;
> + int err;
> +
> + err = pci_enable_device(pdev);
> + if (err)
> + return -ENXIO;
> +
> + pci_set_master(pdev);
> +
> + err = pci_request_regions(pdev, "gdma");
> + if (err)
> + goto disable_dev;
> +
> + err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> + if (err)
> + goto release_region;
> +
> + err = -ENOMEM;
> + gc = vzalloc(sizeof(*gc));
> + if (!gc)
> + goto release_region;
> +
> + bar0_va = pci_iomap(pdev, bar, 0);
> + if (!bar0_va)
> + goto free_gc;
> +
> + gc->bar0_va = bar0_va;
> + gc->pci_dev = pdev;
> +
> + pci_set_drvdata(pdev, gc);
> +
> + gdma_init_registers(pdev);
> +
> + shm_channel_init(&gc->shm_channel, gc->shm_base);
> +
> + err = gdma_setup_irqs(pdev);
> + if (err)
> + goto unmap_bar;
> +
> + mutex_init(&gc->eq_test_event_mutex);
> +
> + err = hwc_create_channel(gc);
> + if (err)
> + goto remove_irq;
> +
> + err = gdma_verify_vf_version(pdev);
> + if (err)
> + goto remove_irq;

Will this VF driver be used in the guest VM? What will prevent users from
changing it?
I think that such a version negotiation scheme is not allowed.

Thanks


Re: [PATCH rdma-next 02/10] RDMA/core: Enable Relaxed Ordering in __ib_alloc_pd()

2021-04-05 Thread Leon Romanovsky
On Mon, Apr 05, 2021 at 02:01:16PM -0400, Tom Talpey wrote:
> On 4/5/2021 1:23 AM, Leon Romanovsky wrote:
> > From: Avihai Horon 
> > 
> > Enable Relaxed Ordering in __ib_alloc_pd() allocation of the
> > local_dma_lkey.
> > 
> > This will take effect only for devices that don't pre-allocate the lkey
> > but allocate it per PD allocation.
> > 
> > Signed-off-by: Avihai Horon 
> > Reviewed-by: Michael Guralnik 
> > Signed-off-by: Leon Romanovsky 
> > ---
> >   drivers/infiniband/core/verbs.c  | 3 ++-
> >   drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c | 1 +
> >   2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/infiniband/core/verbs.c 
> > b/drivers/infiniband/core/verbs.c
> > index a1782f8a6ca0..9b719f7d6fd5 100644
> > --- a/drivers/infiniband/core/verbs.c
> > +++ b/drivers/infiniband/core/verbs.c
> > @@ -287,7 +287,8 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, 
> > unsigned int flags,
> > if (device->attrs.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
> > pd->local_dma_lkey = device->local_dma_lkey;
> > else
> > -   mr_access_flags |= IB_ACCESS_LOCAL_WRITE;
> > +   mr_access_flags |=
> > +   IB_ACCESS_LOCAL_WRITE | IB_ACCESS_RELAXED_ORDERING;
> 
> So, do local_dma_lkey's get relaxed ordering unconditionally?

Yes, in mlx5, this lkey is created on the fly.

Thanks


Re: [PATCH rdma-next 01/10] RDMA: Add access flags to ib_alloc_mr() and ib_mr_pool_init()

2021-04-05 Thread Leon Romanovsky
On Tue, Apr 06, 2021 at 07:27:17AM +0200, Christoph Hellwig wrote:
> On Tue, Apr 06, 2021 at 08:23:28AM +0300, Leon Romanovsky wrote:
> > The same proposal (enable unconditionally) was raised during
> > submission preparations and we decided to follow same pattern
> > as other verbs objects which receive flag parameter.
> 
> A flags argument can be added when it actually is needed.  Using it
> to pass an argument enabled by all ULPs just gets us back to the bad
> old days of complete crap APIs someone drew up on a whiteboard.

Let's wait till Jason wakes up before jumping to conclusions.
It was his request to update all ULPs.

> 
> I think we need to:
> 
>  a) document the semantics
>  b) sort out any technical concerns
>  c) just enable the damn thing

Sure

> 
> instead of requiring some form of cargo culting.


Re: [PATCH rdma-next 01/10] RDMA: Add access flags to ib_alloc_mr() and ib_mr_pool_init()

2021-04-05 Thread Leon Romanovsky
On Mon, Apr 05, 2021 at 03:46:18PM +0200, Christoph Hellwig wrote:
> On Mon, Apr 05, 2021 at 08:23:55AM +0300, Leon Romanovsky wrote:
> > From: Avihai Horon 
> > 
> > Add access flags parameter to ib_alloc_mr() and to ib_mr_pool_init(),
> > and refactor relevant code. This parameter is used to pass MR access
> > flags during MR allocation.
> > 
> > In the following patches, the new access flags parameter will be used
> > to enable Relaxed Ordering for ib_alloc_mr() and ib_mr_pool_init() users.
> 
> So this weirds up a new RELAXED_ORDERING flag without ever mentioning
> that flag in the commit log, never mind what it actually does.

We will improve commit messages.

Thanks


Re: [PATCH rdma-next 01/10] RDMA: Add access flags to ib_alloc_mr() and ib_mr_pool_init()

2021-04-05 Thread Leon Romanovsky
On Mon, Apr 05, 2021 at 08:27:16AM -0700, Bart Van Assche wrote:
> On 4/4/21 10:23 PM, Leon Romanovsky wrote:
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > index bed4cfe50554..59138174affa 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -2444,10 +2444,10 @@ struct ib_device_ops {
> >struct ib_udata *udata);
> > int (*dereg_mr)(struct ib_mr *mr, struct ib_udata *udata);
> > struct ib_mr *(*alloc_mr)(struct ib_pd *pd, enum ib_mr_type mr_type,
> > - u32 max_num_sg);
> > + u32 max_num_sg, u32 access);
> > struct ib_mr *(*alloc_mr_integrity)(struct ib_pd *pd,
> > u32 max_num_data_sg,
> > -   u32 max_num_meta_sg);
> > +   u32 max_num_meta_sg, u32 access);
> > int (*advise_mr)(struct ib_pd *pd,
> >  enum ib_uverbs_advise_mr_advice advice, u32 flags,
> >  struct ib_sge *sg_list, u32 num_sge,
> > @@ -4142,11 +4142,10 @@ static inline int ib_dereg_mr(struct ib_mr *mr)
> >  }
> >  
> >  struct ib_mr *ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type,
> > - u32 max_num_sg);
> > + u32 max_num_sg, u32 access);
> >  
> > -struct ib_mr *ib_alloc_mr_integrity(struct ib_pd *pd,
> > -   u32 max_num_data_sg,
> > -   u32 max_num_meta_sg);
> > +struct ib_mr *ib_alloc_mr_integrity(struct ib_pd *pd, u32 max_num_data_sg,
> > +   u32 max_num_meta_sg, u32 access);
> >  
> >  /**
> >   * ib_update_fast_reg_key - updates the key portion of the fast_reg MR
> > diff --git a/include/rdma/mr_pool.h b/include/rdma/mr_pool.h
> > index e77123bcb43b..2a0ee791037d 100644
> > --- a/include/rdma/mr_pool.h
> > +++ b/include/rdma/mr_pool.h
> > @@ -11,7 +11,8 @@ struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct 
> > list_head *list);
> >  void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr 
> > *mr);
> >  
> >  int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
> > -   enum ib_mr_type type, u32 max_num_sg, u32 max_num_meta_sg);
> > +   enum ib_mr_type type, u32 max_num_sg, u32 max_num_meta_sg,
> > +   u32 access);
> >  void ib_mr_pool_destroy(struct ib_qp *qp, struct list_head *list);
> >  
> >  #endif /* _RDMA_MR_POOL_H */
> 
> Does the new 'access' argument only control whether or not PCIe relaxed
> ordering is enabled? It seems wrong to me to make enabling of PCIe
> relaxed ordering configurable. I think this mechanism should be enabled
> unconditionally if the HCA supports it.

The same proposal (enable unconditionally) was raised during
submission preparations and we decided to follow same pattern
as other verbs objects which receive flag parameter.

Thanks

> 
> Thanks,
> 
> Bart.
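With the signature change proposed in this patch, a ULP that opts in would pass the access flags at MR allocation time. A minimal kernel-side sketch under that assumption (error handling elided; this is illustrative, not a verbatim ULP conversion):

```c
/* Sketch: allocating an FRWR MR with the new 'access' argument,
 * requesting relaxed ordering. Per the series, devices that don't
 * support RO simply ignore the flag. */
struct ib_mr *mr;

mr = ib_alloc_mr(qp->pd, IB_MR_TYPE_MEM_REG, max_num_sg,
		 IB_ACCESS_RELAXED_ORDERING);
if (IS_ERR(mr))
	return PTR_ERR(mr);
```

Bart's point is that if RO is harmless where unsupported, the core could set the flag unconditionally instead of threading it through every caller.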


Re: [PATCH rdma-next 00/10] Enable relaxed ordering for ULPs

2021-04-05 Thread Leon Romanovsky
On Mon, Apr 05, 2021 at 11:42:31PM +, Chuck Lever III wrote:
> 
> 
> > On Apr 5, 2021, at 4:07 PM, Jason Gunthorpe  wrote:
> > 
> > On Mon, Apr 05, 2021 at 03:41:15PM +0200, Christoph Hellwig wrote:
> >> On Mon, Apr 05, 2021 at 08:23:54AM +0300, Leon Romanovsky wrote:
> >>> From: Leon Romanovsky 
> >>> 
> >>>> From Avihai,
> >>> 
> >>> Relaxed Ordering is a PCIe mechanism that relaxes the strict ordering
> >>> imposed on PCI transactions, and thus, can improve performance.
> >>> 
> >>> Until now, relaxed ordering could be set only by user space applications
> >>> for user MRs. The following patch series enables relaxed ordering for the
> >>> kernel ULPs as well. Relaxed ordering is an optional capability, and as
> >>> such, it is ignored by vendors that don't support it.
> >>> 
> >>> The following test results show the performance improvement achieved
> >>> with relaxed ordering. The test was performed on a NVIDIA A100 in order
> >>> to check performance of storage infrastructure over xprtrdma:
> >> 
> >> Isn't the Nvidia A100 a GPU not actually supported by Linux at all?
> >> What does that have to do with storage protocols?
> > 
> > I think it is a typo (or at least it makes no sense to be talking
> > about NFS with a GPU chip). Probably it should be a DGX A100, which is a
> > dual socket AMD server with a lot of PCIe, and xprtrdma is an NFS-RDMA
> > workload.
> 
> We need to get a better idea what correctness testing has been done,
> and whether positive correctness testing results can be replicated
> on a variety of platforms.

I will ask to provide more details.

> 
> I have an old Haswell dual-socket system in my lab, but otherwise
> I'm not sure I have a platform that would be interesting for such a
> test.

We don't have such old systems either.

> 
> 
> > AMD dual socket systems are well known to benefit from relaxed
> > ordering, people have been doing this in userspace for a while now
> > with the opt in.
> 
> 
> --
> Chuck Lever
> 
> 
> 


Re: [PATCH rdma-next 00/10] Enable relaxed ordering for ULPs

2021-04-05 Thread Leon Romanovsky
On Tue, Apr 06, 2021 at 10:37:38AM +0800, Honggang LI wrote:
> On Mon, Apr 05, 2021 at 08:23:54AM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky 
> > 
> > From Avihai,
> > 
> > Relaxed Ordering is a PCIe mechanism that relaxes the strict ordering
> > imposed on PCI transactions, and thus, can improve performance.
> > 
> > Until now, relaxed ordering could be set only by user space applications
> > for user MRs. The following patch series enables relaxed ordering for the
> > kernel ULPs as well. Relaxed ordering is an optional capability, and as
> > such, it is ignored by vendors that don't support it.
> > 
> > The following test results show the performance improvement achieved
> 
> Did you test this patchset with CPU does not support relaxed ordering?

I don't think so. The CPUs that don't support RO are Intel's fourth/fifth
generation, and they are not interesting from a performance point of view.

> 
> We observed significantly performance degradation when run perftest with
> relaxed ordering enabled over old CPU.
> 
> https://github.com/linux-rdma/perftest/issues/116

The perftest case is slightly different, but you pointed to a valid point.
We forgot to call pcie_relaxed_ordering_enabled() before setting the RO bit,
and arguably this needs to be done in perftest too.

Thanks

> 
> thanks
> 


Re: [PATCH rdma-next 00/10] Enable relaxed ordering for ULPs

2021-04-05 Thread Leon Romanovsky
On Mon, Apr 05, 2021 at 03:41:15PM +0200, Christoph Hellwig wrote:
> On Mon, Apr 05, 2021 at 08:23:54AM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky 
> > 
> > From Avihai,
> > 
> > Relaxed Ordering is a PCIe mechanism that relaxes the strict ordering
> > imposed on PCI transactions, and thus, can improve performance.
> > 
> > Until now, relaxed ordering could be set only by user space applications
> > for user MRs. The following patch series enables relaxed ordering for the
> > kernel ULPs as well. Relaxed ordering is an optional capability, and as
> > such, it is ignored by vendors that don't support it.
> > 
> > The following test results show the performance improvement achieved
> > with relaxed ordering. The test was performed on a NVIDIA A100 in order
> > to check performance of storage infrastructure over xprtrdma:
> 
> Isn't the Nvidia A100 a GPU not actually supported by Linux at all?
> What does that have to do with storage protocols?

This system is in use by a storage-oriented customer of ours who performed the
test. He runs the drivers/infiniband/* stack from upstream, simply backported
to a specific kernel version.

The performance boost is seen on other systems too.

> 
> Also if you enable this for basically all kernel ULPs, why not have
> an opt-out into strict ordering for the cases that need it (if there are
> any).

The RO property is optional; it can only improve performance. In addition, no
in-kernel ULP needs strict ordering. I may be mistaken here (Jason will
correct me), but it is because of two things: the ULP doesn't touch the data
before the CQE, and the DMA API prohibits it.

Thanks


Re: [PATCH net 1/2] net: hns3: Remove the left over redundant check & assignment

2021-04-05 Thread Leon Romanovsky
On Mon, Apr 05, 2021 at 12:26:37PM +, Salil Mehta wrote:
> Hi Leon,
> Thanks for the review.
> 
> > From: Leon Romanovsky [mailto:l...@kernel.org]
> > Sent: Sunday, April 4, 2021 7:26 AM
> > To: Salil Mehta 
> > Cc: da...@davemloft.net; k...@kernel.org; netdev@vger.kernel.org;
> > linux-ker...@vger.kernel.org; Linuxarm ;
> > linux...@openeuler.org
> > Subject: Re: [PATCH net 1/2] net: hns3: Remove the left over redundant check
> > & assignment
> > 
> > On Sat, Apr 03, 2021 at 02:35:19AM +0100, Salil Mehta wrote:
> > > This removes the left over check and assignment which is no longer used
> > > anywhere in the function and should have been removed as part of the
> > > below mentioned patch.
> > >
> > > Fixes: 012fcb52f67c ("net: hns3: activate reset timer when calling
> > reset_event")
> > > Signed-off-by: Salil Mehta 
> > > ---
> > >  drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 --
> > >  1 file changed, 2 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> > b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> > > index e3f81c7e0ce7..7ad0722383f5 100644
> > > --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> > > +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> > > @@ -3976,8 +3976,6 @@ static void hclge_reset_event(struct pci_dev *pdev,
> > struct hnae3_handle *handle)
> > >* want to make sure we throttle the reset request. Therefore, we will
> > >* not allow it again before 3*HZ times.
> > >*/
> > > - if (!handle)
> > > - handle = &hdev->vport[0].nic;
> > 
> > The comment above should be updated too, and probably the signature of
> > hclge_reset_event() worth to be changed.
> 
> 
> Yes, true. Both the comment and the prototype will be updated in the near future.
> I can assure you this did not go unnoticed during the change. There are
> some internal subtleties which I am trying to sort out. Those might come
> as part of a different patch-set which deals with other related changes as well.

I can buy such an explanation for the change in the function signature, but I
have a hard time believing that an extra commit is needed to update the comment
above.

Thanks

> 
> The current change (and some others) will pave the way for the necessary
> refactoring of the code being done.
> 
> 
> > 
> > Thanks
> > 
> > >
> > >   if (time_before(jiffies, (hdev->last_reset_time +
> > > HCLGE_RESET_INTERVAL))) {
> > > --
> > > 2.17.1
> > >


Re: [PATCH rdma-next 1/8] RDMA/core: Check if client supports IB device or not

2021-04-05 Thread Leon Romanovsky
On Mon, Apr 05, 2021 at 09:20:32AM +0300, Gal Pressman wrote:
> On 05/04/2021 8:49, Leon Romanovsky wrote:
> > From: Parav Pandit 
> > 
> > RDMA devices are of different transports (iWARP, IB, RoCE) and have
> > different attributes.
> > Not all clients are interested in all types of devices.
> > 
> > Implement a generic callback that each IB client can implement to decide
> > if client add() or remove() should be done by the IB core or not for a
> > given IB device, client combination.
> > 
> > Signed-off-by: Parav Pandit 
> > Signed-off-by: Leon Romanovsky 
> > ---
> >  drivers/infiniband/core/device.c | 3 +++
> >  include/rdma/ib_verbs.h  | 9 +
> >  2 files changed, 12 insertions(+)
> > 
> > diff --git a/drivers/infiniband/core/device.c 
> > b/drivers/infiniband/core/device.c
> > index c660cef66ac6..c9af2deba8c1 100644
> > --- a/drivers/infiniband/core/device.c
> > +++ b/drivers/infiniband/core/device.c
> > @@ -691,6 +691,9 @@ static int add_client_context(struct ib_device *device,
> > if (!device->kverbs_provider && !client->no_kverbs_req)
> > return 0;
> >  
> > +   if (client->is_supported && !client->is_supported(device))
> > +   return 0;
> 
> Isn't it better to remove the kverbs_provider flag (from previous if 
> statement)
> and unify it with this generic support check?

I thought about it, but didn't find it worthwhile. The kverbs_provider flag
needs to be provided by the device, and all ULPs except uverbs would have the
same check.

Thanks


[PATCH rdma-next 8/8] net/rds: Move to client_supported callback

2021-04-04 Thread Leon Romanovsky
From: Parav Pandit 

Use the newly introduced client_supported() callback to avoid client
addition if the RDMA device is not of IB type or if it doesn't support
the memory management extensions (FRWR).

Signed-off-by: Parav Pandit 
Signed-off-by: Leon Romanovsky 
---
 net/rds/ib.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index 24c9a9005a6f..bd2ff7d5a718 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -125,18 +125,23 @@ void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
queue_work(rds_wq, &rds_ibdev->free_work);
 }
 
-static int rds_ib_add_one(struct ib_device *device)
+static bool rds_client_supported(struct ib_device *device)
 {
-   struct rds_ib_device *rds_ibdev;
-   int ret;
-
/* Only handle IB (no iWARP) devices */
if (device->node_type != RDMA_NODE_IB_CA)
-   return -EOPNOTSUPP;
+   return false;
 
/* Device must support FRWR */
if (!(device->attrs.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS))
-   return -EOPNOTSUPP;
+   return false;
+
+   return true;
+}
+
+static int rds_ib_add_one(struct ib_device *device)
+{
+   struct rds_ib_device *rds_ibdev;
+   int ret;
 
rds_ibdev = kzalloc_node(sizeof(struct rds_ib_device), GFP_KERNEL,
 ibdev_to_node(device));
@@ -288,7 +293,8 @@ static void rds_ib_remove_one(struct ib_device *device, 
void *client_data)
 struct ib_client rds_ib_client = {
.name   = "rds_ib",
.add= rds_ib_add_one,
-   .remove = rds_ib_remove_one
+   .remove = rds_ib_remove_one,
+   .is_supported = rds_client_supported,
 };
 
 static int rds_ib_conn_info_visitor(struct rds_connection *conn,
-- 
2.30.2



[PATCH rdma-next 4/8] IB/core: Skip device which doesn't have necessary capabilities

2021-04-04 Thread Leon Romanovsky
From: Parav Pandit 

If the device doesn't have the multicast capability, avoid client registration
for it. This saves 16 KB of memory for an RDMA device consisting of 128
ports.

If the device doesn't support subnet administration, avoid client
registration for it. This saves 8 KB of memory for an RDMA device
consisting of 128 ports.

Signed-off-by: Parav Pandit 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/multicast.c | 15 ++-
 drivers/infiniband/core/sa_query.c  | 15 ++-
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/multicast.c 
b/drivers/infiniband/core/multicast.c
index a5dd4b7a74bc..8c81acc24e3e 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -44,11 +44,13 @@
 
 static int mcast_add_one(struct ib_device *device);
 static void mcast_remove_one(struct ib_device *device, void *client_data);
+static bool mcast_client_supported(struct ib_device *device);
 
 static struct ib_client mcast_client = {
.name   = "ib_multicast",
.add= mcast_add_one,
-   .remove = mcast_remove_one
+   .remove = mcast_remove_one,
+   .is_supported = mcast_client_supported,
 };
 
 static struct ib_sa_client sa_client;
@@ -816,6 +818,17 @@ static void mcast_event_handler(struct ib_event_handler 
*handler,
}
 }
 
+static bool mcast_client_supported(struct ib_device *device)
+{
+   u32 i;
+
+   rdma_for_each_port(device, i) {
+   if (rdma_cap_ib_mcast(device, i))
+   return true;
+   }
+   return false;
+}
+
 static int mcast_add_one(struct ib_device *device)
 {
struct mcast_device *dev;
diff --git a/drivers/infiniband/core/sa_query.c 
b/drivers/infiniband/core/sa_query.c
index 9a4a49c37922..7e00e24d9423 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -176,11 +176,13 @@ static const struct nla_policy 
ib_nl_policy[LS_NLA_TYPE_MAX] = {
 
 static int ib_sa_add_one(struct ib_device *device);
 static void ib_sa_remove_one(struct ib_device *device, void *client_data);
+static bool ib_sa_client_supported(struct ib_device *device);
 
 static struct ib_client sa_client = {
.name   = "sa",
.add= ib_sa_add_one,
-   .remove = ib_sa_remove_one
+   .remove = ib_sa_remove_one,
+   .is_supported = ib_sa_client_supported,
 };
 
 static DEFINE_XARRAY_FLAGS(queries, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
@@ -2293,6 +2295,17 @@ static void ib_sa_event(struct ib_event_handler *handler,
}
 }
 
+static bool ib_sa_client_supported(struct ib_device *device)
+{
+   unsigned int i;
+
+   rdma_for_each_port(device, i) {
+   if (rdma_cap_ib_sa(device, i))
+   return true;
+   }
+   return false;
+}
+
 static int ib_sa_add_one(struct ib_device *device)
 {
struct ib_sa_device *sa_dev;
-- 
2.30.2



[PATCH rdma-next 7/8] net/smc: Move to client_supported callback

2021-04-04 Thread Leon Romanovsky
From: Parav Pandit 

Use the newly introduced client_supported() callback to avoid client
addition if the RDMA device is not of IB type.

Signed-off-by: Parav Pandit 
Signed-off-by: Leon Romanovsky 
---
 net/smc/smc_ib.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 6b65c5d1f957..f7186d9d1299 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -767,6 +767,11 @@ void smc_ib_ndev_change(struct net_device *ndev, unsigned 
long event)
mutex_unlock(&smc_ib_devices.mutex);
 }
 
+static bool smc_client_supported(struct ib_device *ibdev)
+{
+   return ibdev->node_type == RDMA_NODE_IB_CA;
+}
+
 /* callback function for ib_register_client() */
 static int smc_ib_add_dev(struct ib_device *ibdev)
 {
@@ -774,9 +779,6 @@ static int smc_ib_add_dev(struct ib_device *ibdev)
u8 port_cnt;
int i;
 
-   if (ibdev->node_type != RDMA_NODE_IB_CA)
-   return -EOPNOTSUPP;
-
smcibdev = kzalloc(sizeof(*smcibdev), GFP_KERNEL);
if (!smcibdev)
return -ENOMEM;
@@ -840,6 +842,7 @@ static struct ib_client smc_ib_client = {
.name   = "smc_ib",
.add= smc_ib_add_dev,
.remove = smc_ib_remove_dev,
+   .is_supported = smc_client_supported,
 };
 
 int __init smc_ib_register_client(void)
-- 
2.30.2



[PATCH rdma-next 6/8] IB/opa_vnic: Move to client_supported callback

2021-04-04 Thread Leon Romanovsky
From: Parav Pandit 

Move to the newly introduced client_supported callback.
Avoid client registration if the IB device doesn't have the OPA VNIC
capability.

Signed-off-by: Parav Pandit 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c 
b/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c
index cecf0f7cadf9..58658eba97dd 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c
@@ -121,6 +121,7 @@ static struct ib_client opa_vnic_client = {
.name   = opa_vnic_driver_name,
.add= opa_vnic_vema_add_one,
.remove = opa_vnic_vema_rem_one,
+   .is_supported = rdma_cap_opa_vnic,
 };
 
 /**
@@ -993,9 +994,6 @@ static int opa_vnic_vema_add_one(struct ib_device *device)
struct opa_vnic_ctrl_port *cport;
int rc, size = sizeof(*cport);
 
-   if (!rdma_cap_opa_vnic(device))
-   return -EOPNOTSUPP;
-
size += device->phys_port_cnt * sizeof(struct opa_vnic_vema_port);
cport = kzalloc(size, GFP_KERNEL);
if (!cport)
-- 
2.30.2



[PATCH rdma-next 5/8] IB/IPoIB: Skip device which doesn't have InfiniBand port

2021-04-04 Thread Leon Romanovsky
From: Parav Pandit 

Skip RDMA devices which don't have an InfiniBand port, using the newly
introduced client_supported() callback.

Signed-off-by: Parav Pandit 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 8f769ebaacc6..b02c10dea242 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -93,6 +93,7 @@ static struct net_device *ipoib_get_net_dev_by_params(
struct ib_device *dev, u32 port, u16 pkey,
const union ib_gid *gid, const struct sockaddr *addr,
void *client_data);
+static bool ipoib_client_supported(struct ib_device *device);
 static int ipoib_set_mac(struct net_device *dev, void *addr);
 static int ipoib_ioctl(struct net_device *dev, struct ifreq *ifr,
   int cmd);
@@ -102,6 +103,7 @@ static struct ib_client ipoib_client = {
.add= ipoib_add_one,
.remove = ipoib_remove_one,
.get_net_dev_by_params = ipoib_get_net_dev_by_params,
+   .is_supported = ipoib_client_supported,
 };
 
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
@@ -2530,6 +2532,17 @@ static struct net_device *ipoib_add_port(const char 
*format,
return ERR_PTR(-ENOMEM);
 }
 
+static bool ipoib_client_supported(struct ib_device *device)
+{
+   u32 i;
+
+   rdma_for_each_port(device, i) {
+   if (rdma_protocol_ib(device, i))
+   return true;
+   }
+   return false;
+}
+
 static int ipoib_add_one(struct ib_device *device)
 {
struct list_head *dev_list;
-- 
2.30.2



[PATCH rdma-next 1/8] RDMA/core: Check if client supports IB device or not

2021-04-04 Thread Leon Romanovsky
From: Parav Pandit 

RDMA devices are of different transports (iWARP, IB, RoCE) and have
different attributes.
Not all clients are interested in all types of devices.

Implement a generic callback that each IB client can implement to decide
if client add() or remove() should be done by the IB core or not for a
given IB device, client combination.

Signed-off-by: Parav Pandit 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/device.c | 3 +++
 include/rdma/ib_verbs.h  | 9 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c660cef66ac6..c9af2deba8c1 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -691,6 +691,9 @@ static int add_client_context(struct ib_device *device,
if (!device->kverbs_provider && !client->no_kverbs_req)
return 0;
 
+   if (client->is_supported && !client->is_supported(device))
+   return 0;
+
down_write(&device->client_data_rwsem);
/*
 * So long as the client is registered hold both the client and device
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 59138174affa..777fbcbd4858 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2756,6 +2756,15 @@ struct ib_client {
const union ib_gid *gid,
const struct sockaddr *addr,
void *client_data);
+   /*
+* Returns if the client is supported for a given device or not.
+* @dev: An RDMA device to check if client can support this RDMA or not.
+*
+* A client that is interested in specific device attributes, should
+* implement it to check if client can be supported for this device or
+* not.
+*/
+   bool (*is_supported)(struct ib_device *dev);
 
refcount_t uses;
struct completion uses_zero;
-- 
2.30.2



[PATCH rdma-next 3/8] IB/cm: Skip device which doesn't support IB CM

2021-04-04 Thread Leon Romanovsky
From: Parav Pandit 

There are at least 3 types of RDMA devices which do not support IB CM.
They are:
(1) a (eswitch) switchdev RDMA device,
(2) an iWARP device, and
(3) an RDMA device without the RoCE capability.

Hence, avoid IB CM initialization for such devices.

This saves 8 KB of memory for an eswitch device consisting of 512 ports and
also avoids unnecessary initialization for all three device types above.

Signed-off-by: Parav Pandit 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/cm.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 8a7791ebae69..5025f2c1347b 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -87,6 +87,7 @@ struct cm_id_private;
 struct cm_work;
 static int cm_add_one(struct ib_device *device);
 static void cm_remove_one(struct ib_device *device, void *client_data);
+static bool cm_supported(struct ib_device *device);
 static void cm_process_work(struct cm_id_private *cm_id_priv,
struct cm_work *work);
 static int cm_send_sidr_rep_locked(struct cm_id_private *cm_id_priv,
@@ -103,7 +104,8 @@ static int cm_send_rej_locked(struct cm_id_private 
*cm_id_priv,
 static struct ib_client cm_client = {
.name   = "cm",
.add= cm_add_one,
-   .remove = cm_remove_one
+   .remove = cm_remove_one,
+   .is_supported = cm_supported,
 };
 
 static struct ib_cm {
@@ -4371,6 +4373,17 @@ static void cm_remove_port_fs(struct cm_port *port)
 
 }
 
+static bool cm_supported(struct ib_device *device)
+{
+   u32 i;
+
+   rdma_for_each_port(device, i) {
+   if (rdma_cap_ib_cm(device, i))
+   return true;
+   }
+   return false;
+}
+
 static int cm_add_one(struct ib_device *ib_device)
 {
struct cm_device *cm_dev;
-- 
2.30.2



[PATCH rdma-next 0/8] Generalize if ULP supported check

2021-04-04 Thread Leon Romanovsky
From: Leon Romanovsky 

Hi,

This series adds a new callback to check whether an IB client is supported or
not. Such a general callback allows us to save memory footprint by not
starting clients on devices where they are not going to work anyway.

Thanks

Parav Pandit (8):
  RDMA/core: Check if client supports IB device or not
  RDMA/cma: Skip device which doesn't support CM
  IB/cm: Skip device which doesn't support IB CM
  IB/core: Skip device which doesn't have necessary capabilities
  IB/IPoIB: Skip device which doesn't have InfiniBand port
  IB/opa_vnic: Move to client_supported callback
  net/smc: Move to client_supported callback
  net/rds: Move to client_supported callback

 drivers/infiniband/core/cm.c  | 15 +-
 drivers/infiniband/core/cma.c | 15 +-
 drivers/infiniband/core/device.c  |  3 +++
 drivers/infiniband/core/multicast.c   | 15 +-
 drivers/infiniband/core/sa_query.c| 15 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 13 
 .../infiniband/ulp/opa_vnic/opa_vnic_vema.c   |  4 +---
 include/rdma/ib_verbs.h   |  9 +
 net/rds/ib.c  | 20 ---
 net/smc/smc_ib.c  |  9 ++---
 10 files changed, 101 insertions(+), 17 deletions(-)

-- 
2.30.2



[PATCH rdma-next 2/8] RDMA/cma: Skip device which doesn't support CM

2021-04-04 Thread Leon Romanovsky
From: Parav Pandit 

A switchdev RDMA device does not support IB CM. When such a device is
added to the RDMA CM's device list and an application invokes
rdma_listen(), cma attempts to listen on that device even though its
IB CM attribute is disabled.

Due to this, the rdma_listen() call fails to listen on other,
non-switchdev devices as well.

The following error message can be seen:

infiniband mlx5_0: RDMA CMA: cma_listen_on_dev, error -38

The failing call flow is:

rdma_listen()
  cma_listen_on_all()
cma_listen_on_dev()
  _cma_attach_to_dev()
  rdma_listen() <- fails on a specific switchdev device

Hence, when an IB device supports neither IB CM nor IW CM, avoid adding
such a device to the cma list.

Signed-off-by: Parav Pandit 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/cma.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 42a1c8955c50..80156faf90de 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -157,11 +157,13 @@ EXPORT_SYMBOL(rdma_res_to_id);
 
 static int cma_add_one(struct ib_device *device);
 static void cma_remove_one(struct ib_device *device, void *client_data);
+static bool cma_supported(struct ib_device *device);
 
 static struct ib_client cma_client = {
.name   = "cma",
.add= cma_add_one,
-   .remove = cma_remove_one
+   .remove = cma_remove_one,
+   .is_supported = cma_supported,
 };
 
 static struct ib_sa_client sa_client;
@@ -4870,6 +4872,17 @@ static void cma_process_remove(struct cma_device *cma_dev)
wait_for_completion(&cma_dev->comp);
 }
 
+static bool cma_supported(struct ib_device *device)
+{
+   u32 i;
+
+   rdma_for_each_port(device, i) {
+   if (rdma_cap_ib_cm(device, i) || rdma_cap_iw_cm(device, i))
+   return true;
+   }
+   return false;
+}
+
 static int cma_add_one(struct ib_device *device)
 {
struct rdma_id_private *to_destroy;
-- 
2.30.2



[PATCH rdma-next 09/10] net/smc: Enable Relaxed Ordering

2021-04-04 Thread Leon Romanovsky
From: Avihai Horon 

Enable Relaxed Ordering for smc.

Relaxed Ordering is an optional access flag and as such, it is ignored
by vendors that don't support it.

Signed-off-by: Avihai Horon 
Reviewed-by: Michael Guralnik 
Signed-off-by: Leon Romanovsky 
---
 net/smc/smc_ib.c | 3 ++-
 net/smc/smc_wr.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 4e91ed3dc265..6b65c5d1f957 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -579,7 +579,8 @@ int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags,
return 0; /* already done */
 
buf_slot->mr_rx[link_idx] =
-   ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, 1 << buf_slot->order, 0);
+   ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, 1 << buf_slot->order,
+   IB_ACCESS_RELAXED_ORDERING);
if (IS_ERR(buf_slot->mr_rx[link_idx])) {
int rc;
 
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index cbc73a7e4d59..78e9650621f1 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -555,7 +555,8 @@ static void smc_wr_init_sge(struct smc_link *lnk)
lnk->wr_reg.wr.num_sge = 0;
lnk->wr_reg.wr.send_flags = IB_SEND_SIGNALED;
lnk->wr_reg.wr.opcode = IB_WR_REG_MR;
-   lnk->wr_reg.access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;
+   lnk->wr_reg.access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |
+IB_ACCESS_RELAXED_ORDERING;
 }
 
 void smc_wr_free_link(struct smc_link *lnk)
-- 
2.30.2



[PATCH rdma-next 10/10] xprtrdma: Enable Relaxed Ordering

2021-04-04 Thread Leon Romanovsky
From: Avihai Horon 

Enable Relaxed Ordering for xprtrdma.

Relaxed Ordering is an optional access flag and as such, it is ignored
by vendors that don't support it.

Signed-off-by: Avihai Horon 
Reviewed-by: Michael Guralnik 
Signed-off-by: Leon Romanovsky 
---
 net/sunrpc/xprtrdma/frwr_ops.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index cfbdd197cdfe..f9334c0a1a13 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -135,7 +135,8 @@ int frwr_mr_init(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr *mr)
struct ib_mr *frmr;
int rc;
 
-   frmr = ib_alloc_mr(ep->re_pd, ep->re_mrtype, depth, 0);
+   frmr = ib_alloc_mr(ep->re_pd, ep->re_mrtype, depth,
+  IB_ACCESS_RELAXED_ORDERING);
if (IS_ERR(frmr))
goto out_mr_err;
 
@@ -339,9 +340,10 @@ struct rpcrdma_mr_seg *frwr_map(struct rpcrdma_xprt *r_xprt,
reg_wr = &mr->frwr.fr_regwr;
reg_wr->mr = ibmr;
reg_wr->key = ibmr->rkey;
-   reg_wr->access = writing ?
-IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
-IB_ACCESS_REMOTE_READ;
+   reg_wr->access =
+   (writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
+  IB_ACCESS_REMOTE_READ) |
+   IB_ACCESS_RELAXED_ORDERING;
 
mr->mr_handle = ibmr->rkey;
mr->mr_length = ibmr->length;
-- 
2.30.2



[PATCH rdma-next 06/10] nvme-rdma: Enable Relaxed Ordering

2021-04-04 Thread Leon Romanovsky
From: Avihai Horon 

Enable Relaxed Ordering for nvme.

Relaxed Ordering is an optional access flag and as such, it is ignored
by vendors that don't support it.

Signed-off-by: Avihai Horon 
Reviewed-by: Max Gurtovoy 
Reviewed-by: Michael Guralnik 
Signed-off-by: Leon Romanovsky 
---
 drivers/nvme/host/rdma.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 4dbc17311e0b..8f106b20b39c 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -532,9 +532,8 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 */
pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev, queue->pi_support) + 1;
ret = ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs,
- queue->queue_size,
- IB_MR_TYPE_MEM_REG,
- pages_per_mr, 0, 0);
+ queue->queue_size, IB_MR_TYPE_MEM_REG,
+ pages_per_mr, 0, IB_ACCESS_RELAXED_ORDERING);
if (ret) {
dev_err(queue->ctrl->ctrl.device,
"failed to initialize MR pool sized %d for QID %d\n",
@@ -545,7 +544,8 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
if (queue->pi_support) {
ret = ib_mr_pool_init(queue->qp, &queue->qp->sig_mrs,
  queue->queue_size, IB_MR_TYPE_INTEGRITY,
- pages_per_mr, pages_per_mr, 0);
+ pages_per_mr, pages_per_mr,
+ IB_ACCESS_RELAXED_ORDERING);
if (ret) {
dev_err(queue->ctrl->ctrl.device,
"failed to initialize PI MR pool sized %d for QID %d\n",
@@ -1382,9 +1382,9 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
req->reg_wr.wr.num_sge = 0;
req->reg_wr.mr = req->mr;
req->reg_wr.key = req->mr->rkey;
-   req->reg_wr.access = IB_ACCESS_LOCAL_WRITE |
-IB_ACCESS_REMOTE_READ |
-IB_ACCESS_REMOTE_WRITE;
+   req->reg_wr.access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ |
+IB_ACCESS_REMOTE_WRITE |
+IB_ACCESS_RELAXED_ORDERING;
 
sg->addr = cpu_to_le64(req->mr->iova);
put_unaligned_le24(req->mr->length, sg->length);
@@ -1488,9 +1488,8 @@ static int nvme_rdma_map_sg_pi(struct nvme_rdma_queue *queue,
wr->wr.send_flags = 0;
wr->mr = req->mr;
wr->key = req->mr->rkey;
-   wr->access = IB_ACCESS_LOCAL_WRITE |
-IB_ACCESS_REMOTE_READ |
-IB_ACCESS_REMOTE_WRITE;
+   wr->access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ |
+IB_ACCESS_REMOTE_WRITE | IB_ACCESS_RELAXED_ORDERING;
 
sg->addr = cpu_to_le64(req->mr->iova);
put_unaligned_le24(req->mr->length, sg->length);
-- 
2.30.2



[PATCH rdma-next 08/10] net/rds: Enable Relaxed Ordering

2021-04-04 Thread Leon Romanovsky
From: Avihai Horon 

Enable Relaxed Ordering for rds.

Relaxed Ordering is an optional access flag and as such, it is ignored
by vendors that don't support it.

Signed-off-by: Avihai Horon 
Reviewed-by: Michael Guralnik 
Signed-off-by: Leon Romanovsky 
---
 net/rds/ib_frmr.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/rds/ib_frmr.c b/net/rds/ib_frmr.c
index 694eb916319e..1a60c2c90c78 100644
--- a/net/rds/ib_frmr.c
+++ b/net/rds/ib_frmr.c
@@ -76,7 +76,7 @@ static struct rds_ib_mr *rds_ib_alloc_frmr(struct rds_ib_device *rds_ibdev,
 
frmr = &ibmr->u.frmr;
frmr->mr = ib_alloc_mr(rds_ibdev->pd, IB_MR_TYPE_MEM_REG,
-  pool->max_pages, 0);
+  pool->max_pages, IB_ACCESS_RELAXED_ORDERING);
if (IS_ERR(frmr->mr)) {
pr_warn("RDS/IB: %s failed to allocate MR", __func__);
err = PTR_ERR(frmr->mr);
@@ -156,9 +156,8 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
reg_wr.wr.num_sge = 0;
reg_wr.mr = frmr->mr;
reg_wr.key = frmr->mr->rkey;
-   reg_wr.access = IB_ACCESS_LOCAL_WRITE |
-   IB_ACCESS_REMOTE_READ |
-   IB_ACCESS_REMOTE_WRITE;
+   reg_wr.access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ |
+   IB_ACCESS_REMOTE_WRITE | IB_ACCESS_RELAXED_ORDERING;
reg_wr.wr.send_flags = IB_SEND_SIGNALED;
 
ret = ib_post_send(ibmr->ic->i_cm_id->qp, ®_wr.wr, NULL);
-- 
2.30.2



[PATCH rdma-next 02/10] RDMA/core: Enable Relaxed Ordering in __ib_alloc_pd()

2021-04-04 Thread Leon Romanovsky
From: Avihai Horon 

Enable Relaxed Ordering in __ib_alloc_pd() allocation of the
local_dma_lkey.

This will take effect only for devices that don't pre-allocate the lkey
but allocate it per PD allocation.

Signed-off-by: Avihai Horon 
Reviewed-by: Michael Guralnik 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/verbs.c  | 3 ++-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index a1782f8a6ca0..9b719f7d6fd5 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -287,7 +287,8 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
if (device->attrs.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
pd->local_dma_lkey = device->local_dma_lkey;
else
-   mr_access_flags |= IB_ACCESS_LOCAL_WRITE;
+   mr_access_flags |=
+   IB_ACCESS_LOCAL_WRITE | IB_ACCESS_RELAXED_ORDERING;
 
if (flags & IB_PD_UNSAFE_GLOBAL_RKEY) {
pr_warn("%s: enabling unsafe global rkey\n", caller);
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
index b3fa783698a0..d74827694f92 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
@@ -66,6 +66,7 @@ struct ib_mr *pvrdma_get_dma_mr(struct ib_pd *pd, int acc)
int ret;
 
/* Support only LOCAL_WRITE flag for DMA MRs */
+   acc &= ~IB_ACCESS_RELAXED_ORDERING;
if (acc & ~IB_ACCESS_LOCAL_WRITE) {
dev_warn(&dev->pdev->dev,
 "unsupported dma mr access flags %#x\n", acc);
-- 
2.30.2



[PATCH rdma-next 07/10] cifs: smbd: Enable Relaxed Ordering

2021-04-04 Thread Leon Romanovsky
From: Avihai Horon 

Enable Relaxed Ordering for smbd.

Relaxed Ordering is an optional access flag and as such, it is ignored
by vendors that don't support it.

Signed-off-by: Avihai Horon 
Reviewed-by: Michael Guralnik 
Signed-off-by: Leon Romanovsky 
---
 fs/cifs/smbdirect.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 647098a5cf3b..1e86dc8bbe85 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2178,8 +2178,10 @@ static void smbd_mr_recovery_work(struct work_struct *work)
continue;
}
 
-   smbdirect_mr->mr = ib_alloc_mr(info->pd, info->mr_type,
-  info->max_frmr_depth, 0);
+   smbdirect_mr->mr =
+   ib_alloc_mr(info->pd, info->mr_type,
+   info->max_frmr_depth,
+   IB_ACCESS_RELAXED_ORDERING);
if (IS_ERR(smbdirect_mr->mr)) {
log_rdma_mr(ERR, "ib_alloc_mr failed mr_type=%x max_frmr_depth=%x\n",
info->mr_type,
@@ -2244,7 +2246,8 @@ static int allocate_mr_list(struct smbd_connection *info)
if (!smbdirect_mr)
goto out;
smbdirect_mr->mr = ib_alloc_mr(info->pd, info->mr_type,
-  info->max_frmr_depth, 0);
+  info->max_frmr_depth,
+  IB_ACCESS_RELAXED_ORDERING);
if (IS_ERR(smbdirect_mr->mr)) {
log_rdma_mr(ERR, "ib_alloc_mr failed mr_type=%x max_frmr_depth=%x\n",
info->mr_type, info->max_frmr_depth);
@@ -2406,9 +2409,10 @@ struct smbd_mr *smbd_register_mr(
reg_wr->wr.send_flags = IB_SEND_SIGNALED;
reg_wr->mr = smbdirect_mr->mr;
reg_wr->key = smbdirect_mr->mr->rkey;
-   reg_wr->access = writing ?
-   IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
-   IB_ACCESS_REMOTE_READ;
+   reg_wr->access =
+   (writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
+  IB_ACCESS_REMOTE_READ) |
+   IB_ACCESS_RELAXED_ORDERING;
 
/*
 * There is no need for waiting for complemtion on ib_post_send
-- 
2.30.2



[PATCH rdma-next 05/10] RDMA/srp: Enable Relaxed Ordering

2021-04-04 Thread Leon Romanovsky
From: Avihai Horon 

Enable Relaxed Ordering for srp.

Relaxed Ordering is an optional access flag and as such, it is ignored
by vendors that don't support it.

Signed-off-by: Avihai Horon 
Reviewed-by: Max Gurtovoy 
Reviewed-by: Michael Guralnik 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8481ad769ba4..0026660c3ead 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -436,7 +436,8 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
mr_type = IB_MR_TYPE_MEM_REG;
 
for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
-   mr = ib_alloc_mr(pd, mr_type, max_page_list_len, 0);
+   mr = ib_alloc_mr(pd, mr_type, max_page_list_len,
+IB_ACCESS_RELAXED_ORDERING);
if (IS_ERR(mr)) {
ret = PTR_ERR(mr);
if (ret == -ENOMEM)
@@ -1487,9 +1488,8 @@ static int srp_map_finish_fr(struct srp_map_state *state,
wr.wr.send_flags = 0;
wr.mr = desc->mr;
wr.key = desc->mr->rkey;
-   wr.access = (IB_ACCESS_LOCAL_WRITE |
-IB_ACCESS_REMOTE_READ |
-IB_ACCESS_REMOTE_WRITE);
+   wr.access = (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ |
+IB_ACCESS_REMOTE_WRITE | IB_ACCESS_RELAXED_ORDERING);
 
*state->fr.next++ = desc;
state->nmdesc++;
-- 
2.30.2



[PATCH rdma-next 04/10] RDMA/rtrs: Enable Relaxed Ordering

2021-04-04 Thread Leon Romanovsky
From: Avihai Horon 

Enable Relaxed Ordering for the rtrs client and server.

Relaxed Ordering is an optional access flag and as such, it is ignored
by vendors that don't support it.

Signed-off-by: Avihai Horon 
Reviewed-by: Michael Guralnik 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c |  6 --
 drivers/infiniband/ulp/rtrs/rtrs-srv.c | 15 ---
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 0d3960ed5b2b..a3fbb47a3574 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -1099,7 +1099,8 @@ static int rtrs_clt_read_req(struct rtrs_clt_io_req *req)
.mr = req->mr,
.key = req->mr->rkey,
.access = (IB_ACCESS_LOCAL_WRITE |
-  IB_ACCESS_REMOTE_WRITE),
+  IB_ACCESS_REMOTE_WRITE |
+  IB_ACCESS_RELAXED_ORDERING),
};
wr = &rwr.wr;
 
@@ -1260,7 +1261,8 @@ static int alloc_sess_reqs(struct rtrs_clt_sess *sess)
goto out;
 
req->mr = ib_alloc_mr(sess->s.dev->ib_pd, IB_MR_TYPE_MEM_REG,
- sess->max_pages_per_mr, 0);
+ sess->max_pages_per_mr,
+ IB_ACCESS_RELAXED_ORDERING);
if (IS_ERR(req->mr)) {
err = PTR_ERR(req->mr);
req->mr = NULL;
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
index 575f31ff20fd..c28ed5e2245d 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
@@ -312,8 +312,8 @@ static int rdma_write_sg(struct rtrs_srv_op *id)
rwr.mr = srv_mr->mr;
rwr.wr.send_flags = 0;
rwr.key = srv_mr->mr->rkey;
-   rwr.access = (IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_WRITE);
+   rwr.access = (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |
+ IB_ACCESS_RELAXED_ORDERING);
msg = srv_mr->iu->buf;
msg->buf_id = cpu_to_le16(id->msg_id);
msg->type = cpu_to_le16(RTRS_MSG_RKEY_RSP);
@@ -432,8 +432,8 @@ static int send_io_resp_imm(struct rtrs_srv_con *con, struct rtrs_srv_op *id,
rwr.wr.send_flags = 0;
rwr.mr = srv_mr->mr;
rwr.key = srv_mr->mr->rkey;
-   rwr.access = (IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_WRITE);
+   rwr.access = (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |
+ IB_ACCESS_RELAXED_ORDERING);
msg = srv_mr->iu->buf;
msg->buf_id = cpu_to_le16(id->msg_id);
msg->type = cpu_to_le16(RTRS_MSG_RKEY_RSP);
@@ -638,7 +638,7 @@ static int map_cont_bufs(struct rtrs_srv_sess *sess)
goto free_sg;
}
mr = ib_alloc_mr(sess->s.dev->ib_pd, IB_MR_TYPE_MEM_REG,
-sgt->nents, 0);
+sgt->nents, IB_ACCESS_RELAXED_ORDERING);
if (IS_ERR(mr)) {
err = PTR_ERR(mr);
goto unmap_sg;
@@ -823,8 +823,9 @@ static int process_info_req(struct rtrs_srv_con *con,
rwr[mri].wr.send_flags = 0;
rwr[mri].mr = mr;
rwr[mri].key = mr->rkey;
-   rwr[mri].access = (IB_ACCESS_LOCAL_WRITE |
-  IB_ACCESS_REMOTE_WRITE);
+   rwr[mri].access =
+   (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |
+IB_ACCESS_RELAXED_ORDERING);
reg_wr = &rwr[mri].wr;
}
 
-- 
2.30.2



[PATCH rdma-next 03/10] RDMA/iser: Enable Relaxed Ordering

2021-04-04 Thread Leon Romanovsky
From: Avihai Horon 

Enable Relaxed Ordering for iser.

Relaxed Ordering is an optional access flag and as such, it is ignored
by vendors that don't support it.

Signed-off-by: Avihai Horon 
Reviewed-by: Max Gurtovoy 
Reviewed-by: Michael Guralnik 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/ulp/iser/iser_memory.c | 10 --
 drivers/infiniband/ulp/iser/iser_verbs.c  |  6 --
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index afec40da9b58..29849c9e82e8 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -271,9 +271,8 @@ iser_reg_sig_mr(struct iscsi_iser_task *iser_task,
wr->wr.send_flags = 0;
wr->mr = mr;
wr->key = mr->rkey;
-   wr->access = IB_ACCESS_LOCAL_WRITE |
-IB_ACCESS_REMOTE_READ |
-IB_ACCESS_REMOTE_WRITE;
+   wr->access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ |
+IB_ACCESS_REMOTE_WRITE | IB_ACCESS_RELAXED_ORDERING;
rsc->mr_valid = 1;
 
sig_reg->sge.lkey = mr->lkey;
@@ -318,9 +317,8 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
wr->wr.num_sge = 0;
wr->mr = mr;
wr->key = mr->rkey;
-   wr->access = IB_ACCESS_LOCAL_WRITE  |
-IB_ACCESS_REMOTE_WRITE |
-IB_ACCESS_REMOTE_READ;
+   wr->access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |
+IB_ACCESS_REMOTE_READ | IB_ACCESS_RELAXED_ORDERING;
 
rsc->mr_valid = 1;
 
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 3c370ee25f2f..1e236b1cf29b 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -121,7 +121,8 @@ iser_create_fastreg_desc(struct iser_device *device,
else
mr_type = IB_MR_TYPE_MEM_REG;
 
-   desc->rsc.mr = ib_alloc_mr(pd, mr_type, size, 0);
+   desc->rsc.mr = ib_alloc_mr(pd, mr_type, size,
+   IB_ACCESS_RELAXED_ORDERING);
if (IS_ERR(desc->rsc.mr)) {
ret = PTR_ERR(desc->rsc.mr);
iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret);
@@ -129,7 +130,8 @@ iser_create_fastreg_desc(struct iser_device *device,
}
 
if (pi_enable) {
-   desc->rsc.sig_mr = ib_alloc_mr_integrity(pd, size, size, 0);
+   desc->rsc.sig_mr = ib_alloc_mr_integrity(pd, size, size,
+   IB_ACCESS_RELAXED_ORDERING);
if (IS_ERR(desc->rsc.sig_mr)) {
ret = PTR_ERR(desc->rsc.sig_mr);
iser_err("Failed to allocate sig_mr err=%d\n", ret);
-- 
2.30.2



[PATCH rdma-next 01/10] RDMA: Add access flags to ib_alloc_mr() and ib_mr_pool_init()

2021-04-04 Thread Leon Romanovsky
From: Avihai Horon 

Add access flags parameter to ib_alloc_mr() and to ib_mr_pool_init(),
and refactor relevant code. This parameter is used to pass MR access
flags during MR allocation.

In the following patches, the new access flags parameter will be used
to enable Relaxed Ordering for ib_alloc_mr() and ib_mr_pool_init() users.

Signed-off-by: Avihai Horon 
Reviewed-by: Max Gurtovoy 
Reviewed-by: Michael Guralnik 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/mr_pool.c |  7 +-
 drivers/infiniband/core/rw.c  | 12 ++--
 drivers/infiniband/core/verbs.c   | 23 +--
 drivers/infiniband/hw/bnxt_re/ib_verbs.c  |  2 +-
 drivers/infiniband/hw/bnxt_re/ib_verbs.h  |  2 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h|  2 +-
 drivers/infiniband/hw/cxgb4/mem.c |  2 +-
 drivers/infiniband/hw/hns/hns_roce_device.h   |  2 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c   |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c |  3 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h  |  2 +-
 drivers/infiniband/hw/mlx4/mr.c   |  2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h  | 12 ++--
 drivers/infiniband/hw/mlx5/mr.c   | 61 
 drivers/infiniband/hw/mlx5/wr.c   | 69 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h   |  2 +-
 drivers/infiniband/hw/qedr/verbs.c|  2 +-
 drivers/infiniband/hw/qedr/verbs.h|  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c  |  3 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_verbs.h   |  2 +-
 drivers/infiniband/sw/rdmavt/mr.c |  3 +-
 drivers/infiniband/sw/rdmavt/mr.h |  2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c |  2 +-
 drivers/infiniband/sw/siw/siw_verbs.c |  2 +-
 drivers/infiniband/sw/siw/siw_verbs.h |  2 +-
 drivers/infiniband/ulp/iser/iser_verbs.c  |  4 +-
 drivers/infiniband/ulp/rtrs/rtrs-clt.c|  2 +-
 drivers/infiniband/ulp/rtrs/rtrs-srv.c|  2 +-
 drivers/infiniband/ulp/srp/ib_srp.c   |  2 +-
 drivers/nvme/host/rdma.c  |  4 +-
 fs/cifs/smbdirect.c   |  7 +-
 include/rdma/ib_verbs.h   | 11 ++-
 include/rdma/mr_pool.h|  3 +-
 net/rds/ib_frmr.c |  2 +-
 net/smc/smc_ib.c  |  2 +-
 net/sunrpc/xprtrdma/frwr_ops.c|  2 +-
 37 files changed, 163 insertions(+), 105 deletions(-)

diff --git a/drivers/infiniband/core/mr_pool.c b/drivers/infiniband/core/mr_pool.c
index c0e2df128b34..b869c3487475 100644
--- a/drivers/infiniband/core/mr_pool.c
+++ b/drivers/infiniband/core/mr_pool.c
@@ -34,7 +34,8 @@ void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr *mr)
 EXPORT_SYMBOL(ib_mr_pool_put);
 
 int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
-   enum ib_mr_type type, u32 max_num_sg, u32 max_num_meta_sg)
+   enum ib_mr_type type, u32 max_num_sg, u32 max_num_meta_sg,
+   u32 access)
 {
struct ib_mr *mr;
unsigned long flags;
@@ -43,9 +44,9 @@ int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
for (i = 0; i < nr; i++) {
if (type == IB_MR_TYPE_INTEGRITY)
mr = ib_alloc_mr_integrity(qp->pd, max_num_sg,
-  max_num_meta_sg);
+  max_num_meta_sg, access);
else
-   mr = ib_alloc_mr(qp->pd, type, max_num_sg);
+   mr = ib_alloc_mr(qp->pd, type, max_num_sg, access);
if (IS_ERR(mr)) {
ret = PTR_ERR(mr);
goto out;
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index a588c2038479..d5a0038e82a4 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -110,7 +110,7 @@ static int rdma_rw_init_one_mr(struct ib_qp *qp, u32 port_num,
 
reg->reg_wr.wr.opcode = IB_WR_REG_MR;
reg->reg_wr.mr = reg->mr;
-   reg->reg_wr.access = IB_ACCESS_LOCAL_WRITE;
+   reg->reg_wr.access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_RELAXED_ORDERING;
if (rdma_protocol_iwarp(qp->device, port_num))
reg->reg_wr.access |= IB_ACCESS_REMOTE_WRITE;
count++;
@@ -437,7 +437,8 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
ctx->reg->reg_wr.wr.wr_cqe = NULL;
ctx->reg->reg_wr.wr.num_sge = 0;
ctx->reg->reg_wr.wr.send_flags = 0;
-   ctx->reg->reg_wr.access = IB_ACCESS_LOCAL_WRITE;
+   ctx->reg->reg_wr.access =
+   IB_ACCESS_LOCAL_WRITE | IB_ACCESS_RELAXED_ORDERING;
if (rdma

[PATCH rdma-next 00/10] Enable relaxed ordering for ULPs

2021-04-04 Thread Leon Romanovsky
From: Leon Romanovsky 

From Avihai,

Relaxed Ordering is a PCIe mechanism that relaxes the strict ordering
imposed on PCI transactions, and thus, can improve performance.

Until now, relaxed ordering could be set only by user space applications
for user MRs. The following patch series enables relaxed ordering for the
kernel ULPs as well. Relaxed ordering is an optional capability, and as
such, it is ignored by vendors that don't support it.

The following test results show the performance improvement achieved
with relaxed ordering. The test was performed on an NVIDIA A100 in order
to check the performance of storage infrastructure over xprtrdma:

Without Relaxed Ordering:
READ: bw=16.5GiB/s (17.7GB/s), 16.5GiB/s-16.5GiB/s (17.7GB/s-17.7GB/s),
io=1987GiB (2133GB), run=120422-120422msec

With relaxed ordering:
READ: bw=72.9GiB/s (78.2GB/s), 72.9GiB/s-72.9GiB/s (78.2GB/s-78.2GB/s),
io=2367GiB (2542GB), run=32492-32492msec

Thanks

Avihai Horon (10):
  RDMA: Add access flags to ib_alloc_mr() and ib_mr_pool_init()
  RDMA/core: Enable Relaxed Ordering in __ib_alloc_pd()
  RDMA/iser: Enable Relaxed Ordering
  RDMA/rtrs: Enable Relaxed Ordering
  RDMA/srp: Enable Relaxed Ordering
  nvme-rdma: Enable Relaxed Ordering
  cifs: smbd: Enable Relaxed Ordering
  net/rds: Enable Relaxed Ordering
  net/smc: Enable Relaxed Ordering
  xprtrdma: Enable Relaxed Ordering

 drivers/infiniband/core/mr_pool.c |  7 +-
 drivers/infiniband/core/rw.c  | 12 ++--
 drivers/infiniband/core/verbs.c   | 26 +--
 drivers/infiniband/hw/bnxt_re/ib_verbs.c  |  2 +-
 drivers/infiniband/hw/bnxt_re/ib_verbs.h  |  2 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h|  2 +-
 drivers/infiniband/hw/cxgb4/mem.c |  2 +-
 drivers/infiniband/hw/hns/hns_roce_device.h   |  2 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c   |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c |  3 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h  |  2 +-
 drivers/infiniband/hw/mlx4/mr.c   |  2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h  | 12 ++--
 drivers/infiniband/hw/mlx5/mr.c   | 61 
 drivers/infiniband/hw/mlx5/wr.c   | 69 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h   |  2 +-
 drivers/infiniband/hw/qedr/verbs.c|  2 +-
 drivers/infiniband/hw/qedr/verbs.h|  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c  |  4 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_verbs.h   |  2 +-
 drivers/infiniband/sw/rdmavt/mr.c |  3 +-
 drivers/infiniband/sw/rdmavt/mr.h |  2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c |  2 +-
 drivers/infiniband/sw/siw/siw_verbs.c |  2 +-
 drivers/infiniband/sw/siw/siw_verbs.h |  2 +-
 drivers/infiniband/ulp/iser/iser_memory.c | 10 ++-
 drivers/infiniband/ulp/iser/iser_verbs.c  |  6 +-
 drivers/infiniband/ulp/rtrs/rtrs-clt.c|  6 +-
 drivers/infiniband/ulp/rtrs/rtrs-srv.c| 15 ++--
 drivers/infiniband/ulp/srp/ib_srp.c   |  8 +--
 drivers/nvme/host/rdma.c  | 19 +++--
 fs/cifs/smbdirect.c   | 17 +++--
 include/rdma/ib_verbs.h   | 11 ++-
 include/rdma/mr_pool.h|  3 +-
 net/rds/ib_frmr.c |  7 +-
 net/smc/smc_ib.c  |  3 +-
 net/smc/smc_wr.c  |  3 +-
 net/sunrpc/xprtrdma/frwr_ops.c| 10 +--
 39 files changed, 209 insertions(+), 140 deletions(-)

-- 
2.30.2



Re: [PATCH] net/mlx5: fix kfree mismatch in indir_table.c

2021-04-04 Thread Leon Romanovsky
On Mon, Apr 05, 2021 at 10:53:39AM +0800, Xiaoming Ni wrote:
> Memory allocated by kvzalloc() should be freed by kvfree().
> 
> Fixes: 34ca65352ddf2 ("net/mlx5: E-Switch, Indirect table infrastructur")
> Signed-off-by: Xiaoming Ni 
> ---
>  .../net/ethernet/mellanox/mlx5/core/esw/indir_table.c  | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 

Thanks,
Reviewed-by: Leon Romanovsky 


Re: [PATCH net-next 05/12] ionic: add hw timestamp support files

2021-04-04 Thread Leon Romanovsky
On Sun, Apr 04, 2021 at 04:05:26PM -0700, Richard Cochran wrote:
> On Thu, Apr 01, 2021 at 10:56:03AM -0700, Shannon Nelson wrote:
> > @@ -0,0 +1,589 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright(c) 2017 - 2021 Pensando Systems, Inc */
> > +
> > +#include 
> > +#include 
> > +
> > +#include "ionic.h"
> > +#include "ionic_bus.h"
> > +#include "ionic_lif.h"
> > +#include "ionic_ethtool.h"
> > +
> > +static int ionic_hwstamp_tx_mode(int config_tx_type)
> > +{
> > +   switch (config_tx_type) {
> > +   case HWTSTAMP_TX_OFF:
> > +   return IONIC_TXSTAMP_OFF;
> > +   case HWTSTAMP_TX_ON:
> > +   return IONIC_TXSTAMP_ON;
> > +   case HWTSTAMP_TX_ONESTEP_SYNC:
> > +   return IONIC_TXSTAMP_ONESTEP_SYNC;
> > +#ifdef HAVE_HWSTAMP_TX_ONESTEP_P2P
> > +   case HWTSTAMP_TX_ONESTEP_P2P:
> > +   return IONIC_TXSTAMP_ONESTEP_P2P;
> > +#endif
> 
> This ifdef is not needed.  (I guess you have to support older kernel
> versions, but my understanding of the policy is that new code
> shouldn't carry such stuff).

The HAVE_HWSTAMP_TX_ONESTEP_P2P macro doesn't exist in the kernel, so the
ifdef should be deleted.

Thanks


Re: [PATCH rdma-next 6/7] RDMA/mlx5: Add support in MEMIC operations

2021-04-04 Thread Leon Romanovsky
On Thu, Apr 01, 2021 at 02:47:04PM -0300, Jason Gunthorpe wrote:
> On Thu, Mar 18, 2021 at 01:15:47PM +0200, Leon Romanovsky wrote:
> > From: Maor Gottlieb 
> > 
> > MEMIC buffer, in addition to regular read and write operations, can
> > support atomic operations from the host.
> > 
> > Introduce and implement new UAPI to allocate address space for MEMIC
> > operations such as atomic. This includes:

<...>

> It looks mostly fine otherwise, the error flows are a bit hard to read
> though, when a new type is added this should also get re-organized so
> we don't do stuff like:
> 
> err_free:
>   /* In MEMIC error flow, dm will be freed internally */
>   if (type != MLX5_IB_UAPI_DM_TYPE_MEMIC)
>   kfree(dm);

I actually liked it, because the "re-organized" code was harder to read
than this simple check. But OK, let's try again.

Thanks


Re: [PATCH mlx5-next v8 0/4] Dynamically assign MSI-X vectors count

2021-04-04 Thread Leon Romanovsky
On Sun, Mar 14, 2021 at 02:42:52PM +0200, Leon Romanovsky wrote:
> -
> Changelog
> v8:
>  * Added "physical/virtual function" words near PF and VF acronyms.
> v7: https://lore.kernel.org/linux-pci/20210301075524.441609-1-l...@kernel.org
>  * Rebase on top v5.12-rc1
>  * More english fixes
>  * Returned to static sysfs creation model as was implemented in v0/v1.

<...>

> Leon Romanovsky (4):
>   PCI: Add a sysfs file to change the MSI-X table size of SR-IOV VFs
>   net/mlx5: Add dynamic MSI-X capabilities bits
>   net/mlx5: Dynamically assign MSI-X vectors count
>   net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks

applied to mlx5-next with changes asked by Bjorn.

e71b75f73763 net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks
604774add516 net/mlx5: Dynamically assign MSI-X vectors count
0b989c1e3705 net/mlx5: Add dynamic MSI-X capabilities bits
c3d5c2d96d69 PCI/IOV: Add sysfs MSI-X vector assignment interface

Thanks


Re: [PATCH mlx5-next v8 1/4] PCI: Add a sysfs file to change the MSI-X table size of SR-IOV VFs

2021-04-04 Thread Leon Romanovsky
On Fri, Apr 02, 2021 at 07:24:26PM -0500, Bjorn Helgaas wrote:
> Possible subject, since this adds *two* files, not just "a file":
> 
>   PCI/IOV: Add sysfs MSI-X vector assignment interface

Sure

> 
> On Sun, Mar 14, 2021 at 02:42:53PM +0200, Leon Romanovsky wrote:
> > A typical cloud provider SR-IOV use case is to create many VFs for use by
> > guest VMs. The VFs may not be assigned to a VM until a customer requests a
> > VM of a certain size, e.g., number of CPUs. A VF may need MSI-X vectors
> > proportional to the number of CPUs in the VM, but there is no standard way
> > to change the number of MSI-X vectors supported by a VF.
> > ...
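For context, the two attributes are meant to be driven together from user space; a simulated walk-through (plain files stand in for the real sysfs nodes, and the BDF paths and vector counts are illustrative):

```shell
# Scratch directory mimicking the sysfs layout this patch creates.
d=$(mktemp -d)
mkdir -p "$d/0000:08:00.0" "$d/0000:08:00.2"

# PF exposes the read-only pool size via sriov_vf_total_msix...
echo 32 > "$d/0000:08:00.0/sriov_vf_total_msix"
# ...and the admin writes a per-VF count before binding a driver.
echo 8  > "$d/0000:08:00.2/sriov_vf_msix_count"

cat "$d/0000:08:00.0/sriov_vf_total_msix"   # -> 32
cat "$d/0000:08:00.2/sriov_vf_msix_count"   # -> 8
rm -r "$d"
```

On real hardware the write is rejected with EBUSY while a driver is bound to the VF, per the code below.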
> 
> > +#ifdef CONFIG_PCI_MSI
> > +static ssize_t sriov_vf_msix_count_store(struct device *dev,
> > +struct device_attribute *attr,
> > +const char *buf, size_t count)
> > +{
> > +   struct pci_dev *vf_dev = to_pci_dev(dev);
> > +   struct pci_dev *pdev = pci_physfn(vf_dev);
> > +   int val, ret;
> > +
> > +   ret = kstrtoint(buf, 0, &val);
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (val < 0)
> > +   return -EINVAL;
> > +
> > +   device_lock(&pdev->dev);
> > +   if (!pdev->driver || !pdev->driver->sriov_set_msix_vec_count) {
> > +   ret = -EOPNOTSUPP;
> > +   goto err_pdev;
> > +   }
> > +
> > +   device_lock(&vf_dev->dev);
> > +   if (vf_dev->driver) {
> > +   /*
> > +* A driver is already attached to this VF and has configured
> > +* itself based on the current MSI-X vector count. Changing
> > +* the vector size could mess up the driver, so block it.
> > +*/
> > +   ret = -EBUSY;
> > +   goto err_dev;
> > +   }
> > +
> > +   ret = pdev->driver->sriov_set_msix_vec_count(vf_dev, val);
> > +
> > +err_dev:
> > +   device_unlock(&vf_dev->dev);
> > +err_pdev:
> > +   device_unlock(&pdev->dev);
> > +   return ret ? : count;
> > +}
> > +static DEVICE_ATTR_WO(sriov_vf_msix_count);
> > +
> > +static ssize_t sriov_vf_total_msix_show(struct device *dev,
> > +   struct device_attribute *attr,
> > +   char *buf)
> > +{
> > +   struct pci_dev *pdev = to_pci_dev(dev);
> > +   u32 vf_total_msix = 0;
> > +
> > +   device_lock(dev);
> > +   if (!pdev->driver || !pdev->driver->sriov_get_vf_total_msix)
> > +   goto unlock;
> > +
> > +   vf_total_msix = pdev->driver->sriov_get_vf_total_msix(pdev);
> > +unlock:
> > +   device_unlock(dev);
> > +   return sysfs_emit(buf, "%u\n", vf_total_msix);
> > +}
> > +static DEVICE_ATTR_RO(sriov_vf_total_msix);
> 
> Can you reverse the order of sriov_vf_total_msix_show() and
> sriov_vf_msix_count_store()?  Currently we have:
> 
>   VF stuff (msix_count_store)
>   PF stuff (total_msix)
>   more VF stuff related to the above (vf_dev_attrs, are_visible)
> 
> so the total_msix bit is mixed in the middle.

No problem, I'll do.

> 
> > +#endif
> > +
> > +static struct attribute *sriov_vf_dev_attrs[] = {
> > +#ifdef CONFIG_PCI_MSI
> > +   &dev_attr_sriov_vf_msix_count.attr,
> > +#endif
> > +   NULL,
> > +};
> > +
> > +static umode_t sriov_vf_attrs_are_visible(struct kobject *kobj,
> > + struct attribute *a, int n)
> > +{
> > +   struct device *dev = kobj_to_dev(kobj);
> > +   struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > +   if (!pdev->is_virtfn)
> > +   return 0;
> > +
> > +   return a->mode;
> > +}
> > +
> > +const struct attribute_group sriov_vf_dev_attr_group = {
> > +   .attrs = sriov_vf_dev_attrs,
> > +   .is_visible = sriov_vf_attrs_are_visible,
> > +};
> > +
> >  int pci_iov_add_virtfn(struct pci_dev *dev, int id)
> >  {
> > int i;
> > @@ -400,18 +487,21 @@ static DEVICE_ATTR_RO(sriov_stride);
> >  static DEVICE_ATTR_RO(sriov_vf_device);
> >  static DEVICE_ATTR_RW(sriov_drivers_autoprobe);
> > 
> > -static struct attribute *sriov_dev_attrs[] = {
> > +static struct attribute *sriov_pf_dev_attrs[] = {
> 
> This and the related sriov_pf_attrs_are_visible change below are nice.
> Would you mind splitting them to a preliminary patch, since they
> really aren't related to the concept of *this* 

Re: [PATCH net 1/2] net: hns3: Remove the left over redundant check & assignment

2021-04-03 Thread Leon Romanovsky
On Sat, Apr 03, 2021 at 02:35:19AM +0100, Salil Mehta wrote:
> This removes the leftover check and assignment, which is no longer used
> anywhere in the function and should have been removed as part of the
> below-mentioned patch.
> 
> Fixes: 012fcb52f67c ("net: hns3: activate reset timer when calling 
> reset_event")
> Signed-off-by: Salil Mehta 
> ---
>  drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> index e3f81c7e0ce7..7ad0722383f5 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> @@ -3976,8 +3976,6 @@ static void hclge_reset_event(struct pci_dev *pdev, 
> struct hnae3_handle *handle)
>* want to make sure we throttle the reset request. Therefore, we will
>* not allow it again before 3*HZ times.
>*/
> - if (!handle)
> - handle = &hdev->vport[0].nic;

The comment above should be updated too, and the signature of
hclge_reset_event() is probably worth changing as well.

Thanks

>  
>   if (time_before(jiffies, (hdev->last_reset_time +
> HCLGE_RESET_INTERVAL))) {
> -- 
> 2.17.1
> 


Re: [PATCH rdma-next v2 0/5] Get rid of custom made module dependency

2021-04-03 Thread Leon Romanovsky
On Sat, Apr 03, 2021 at 03:52:13PM +0530, Devesh Sharma wrote:
> On Thu, Apr 1, 2021 at 12:27 PM Leon Romanovsky  wrote:
> >
> > From: Leon Romanovsky 
> >
> > Changelog:
> > v2:
> >  * kbuild spotted that I didn't delete all code in patch #5, so deleted
> >even more ulp_ops derefences.
> > v1: 
> > https://lore.kernel.org/linux-rdma/20210329085212.257771-1-l...@kernel.org
> >  * Go much deeper and removed useless ULP indirection
> > v0: 
> > https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
> > ---
> >
> > The following series fixes an issue spotted in [1], where the bnxt_re
> > driver messed with module reference counting in order to implement a
> > symbol dependency between the bnxt_re and bnxt modules. All of this
> > despite upstream having only one ULP user of the bnxt module; a simple
> > declaration of an exported symbol would do the trick.
> >
> > This series removes that custom module_get/_put, which was never supposed
> > to be in the driver in the first place, and gets rid of the nasty
> > indirection logic that isn't relevant for the upstream code.
> >
> > These small changes allow us to simplify the bnxt code, and my hope is
> > that Devesh will continue where I stopped and remove struct bnxt_ulp_ops too.
> >
> > Thanks
> >
> > [1] 
> > https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
> >
> > Leon Romanovsky (5):
> >   RDMA/bnxt_re: Depend on bnxt ethernet driver and not blindly select it
> >   RDMA/bnxt_re: Create direct symbolic link between bnxt modules
> >   RDMA/bnxt_re: Get rid of custom module reference counting
> >   net/bnxt: Remove useless check of non-existent ULP id
> >   net/bnxt: Use direct API instead of useless indirection
> >
> >  drivers/infiniband/hw/bnxt_re/Kconfig |   4 +-
> >  drivers/infiniband/hw/bnxt_re/main.c  |  93 ++-
> >  drivers/net/ethernet/broadcom/bnxt/bnxt.c |   4 +-
> >  drivers/net/ethernet/broadcom/bnxt/bnxt.h |   1 -
> >  drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 245 +++---
> >  drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h |  32 +--
> >  6 files changed, 119 insertions(+), 260 deletions(-)
> 
> Hi Leon,
> 
> After a couple of internal discussions we reached a conclusion to
> implement the Auxbus driver interface and fix the problem once and for
> all.

Thanks Devesh,

Jason, it looks like we can proceed with this patchset, because in
auxbus mode this module refcount and the ULP indirection logic will be
removed anyway.

Thanks

> >
> > --
> > 2.30.2
> >
> 
> 
> -- 
> -Regards
> Devesh




Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-04-01 Thread Leon Romanovsky
On Wed, Mar 31, 2021 at 08:23:40PM -0500, Bjorn Helgaas wrote:
> [+cc Rafael, in case you're interested in the driver core issue here]
> 
> On Wed, Mar 31, 2021 at 07:08:07AM +0300, Leon Romanovsky wrote:
> > On Tue, Mar 30, 2021 at 03:41:41PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Mar 30, 2021 at 04:47:16PM -0300, Jason Gunthorpe wrote:
> > > > On Tue, Mar 30, 2021 at 10:00:19AM -0500, Bjorn Helgaas wrote:
> > > > > On Tue, Mar 30, 2021 at 10:57:38AM -0300, Jason Gunthorpe wrote:
> > > > > > On Mon, Mar 29, 2021 at 08:29:49PM -0500, Bjorn Helgaas wrote:
> > > > > > 
> > > > > > > I think I misunderstood Greg's subdirectory comment.  We already 
> > > > > > > have
> > > > > > > directories like this:
> > > > > > 
> > > > > > Yes, IIRC, Greg's remark applies if you have to start creating
> > > > > > directories with manual kobjects.
> > > > > > 
> > > > > > > and aspm_ctrl_attr_group (for "link") is nicely done with static
> > > > > > > attributes.  So I think we could do something like this:
> > > > > > > 
> > > > > > >   /sys/bus/pci/devices/:01:00.0/   # PF directory
> > > > > > > sriov/ # SR-IOV related stuff
> > > > > > >   vf_total_msix
> > > > > > >   vf_msix_count_BB:DD.F# includes bus/dev/fn of first 
> > > > > > > VF
> > > > > > >   ...
> > > > > > >   vf_msix_count_BB:DD.F# includes bus/dev/fn of last 
> > > > > > > VF
> > > > > > 
> > > > > > It looks a bit odd that it isn't a subdirectory, but this seems
> > > > > > reasonable.
> > > > > 
> > > > > Sorry, I missed your point; you'll have to lay it out more explicitly.
> > > > > I did intend that "sriov" *is* a subdirectory of the :01:00.0
> > > > > directory.  The full paths would be:
> > > > >
> > > > >   /sys/bus/pci/devices/:01:00.0/sriov/vf_total_msix
> > > > >   /sys/bus/pci/devices/:01:00.0/sriov/vf_msix_count_BB:DD.F
> > > > >   ...
> > > > 
> > > > Sorry, I was meaning what you first proposed:
> > > > 
> > > >/sys/bus/pci/devices/:01:00.0/sriov/BB:DD.F/vf_msix_count
> > > > 
> > > > Which has the extra sub directory to organize the child VFs.
> > > > 
> > > > Keep in mind there is going to be alot of VFs here, > 1k - so this
> > > > will be a huge directory.
> > > 
> > > With :01:00.0/sriov/vf_msix_count_BB:DD.F, sriov/ will contain
> > > 1 + 1K files ("vf_total_msix" + 1 per VF).
> > > 
> > > With :01:00.0/sriov/BB:DD.F/vf_msix_count, sriov/ will contain
> > > 1 file and 1K subdirectories.
> > 
> > This is racy by design: in order to add the new file and create the
> > BB:DD.F directory, the VF will need to do it after or during its
> > creation. During PF creation, those BB:DD.F values are unknown to the PF.
> >
> > The race here is that the PF/VF directory creation events have already
> > been sent, but the new directory structure is not ready yet.
> >
> > From a code perspective, we will need to add something similar to
> > pci_iov_sysfs_link(), with the code that you didn't like in previous
> > variants (the one that messes with the sysfs_create_file API).
> >
> > It doesn't look good for large SR-IOV systems with >1K VFs to have a
> > gazillion subdirectories inside the PF, while it is more natural to see
> > them in the VF.
> >
> > So I'm completely puzzled why you want to put these files on the PF and
> > not on the VF, as v0, v7 and v8 proposed.
> 
> On both mlx5 and NVMe, the "assign vectors to VF" functionality is
> implemented by the PF, so I think it's reasonable to explore the idea
> of "associate the vector assignment sysfs file with the PF."

As you said, it is our (Linux kernel) implementation, but from the
user-space perspective it looks different. The user "configures" a
specific VF and doesn't need to know that we are routing the request
through the PF.
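As a hedged usage sketch of the interface this thread discusses (file names follow the v8 patch: sriov_vf_total_msix on the PF, sriov_vf_msix_count on each VF; the PCI addresses are hypothetical, and the snippet runs against a mock directory tree rather than real sysfs):

```shell
# Build a mock PF/VF sysfs layout so this is runnable anywhere;
# on a real system the base would be /sys/bus/pci/devices/.
base=$(mktemp -d)
pf="$base/0000:01:00.0"; vf="$base/0000:01:00.2"
mkdir -p "$pf" "$vf"
echo 128 > "$pf/sriov_vf_total_msix"   # pool exposed by the PF driver
echo 0   > "$vf/sriov_vf_msix_count"

# Query how many MSI-X vectors the PF can distribute among its VFs:
cat "$pf/sriov_vf_total_msix"

# Assign 4 vectors to one VF, before any driver is bound to it:
echo 4 > "$vf/sriov_vf_msix_count"
cat "$vf/sriov_vf_msix_count"
```

On real hardware the write would be rejected with -EBUSY while a VF driver is bound, matching the device_lock() checks in the patch.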

> 
> Assume 1K VFs.  Either way we have >1K subdirectories of
> /sys/devices/pci:00/.  I think we should avoid an extra
> subdirectory level, so I think the choices on the table are:

Right, we already have 

Re: [PATCH for-next v3 1/2] IB/cma: Introduce rdma_set_min_rnr_timer()

2021-04-01 Thread Leon Romanovsky
On Wed, Mar 31, 2021 at 08:43:13PM +0200, Håkon Bugge wrote:
> Introduce the ability for kernel ULPs to adjust the minimum RNR Retry
> timer. The INIT -> RTR transition executed by RDMA CM will be used for
> this adjustment. This avoids an additional ib_modify_qp() call.
> 
> rdma_set_min_rnr_timer() must be called before the call to
> rdma_connect() on the active side and before the call to rdma_accept()
> on the passive side.
> 
> The default value of RNR Retry timer is zero, which translates to 655
> ms. When the receiver is not ready to accept incoming send messages, it
> encodes the RNR Retry timer value in the NAK. The requestor will then
> wait at least the specified time value before retrying the send.
> 
> The 5-bit value to be supplied to the rdma_set_min_rnr_timer() is
> documented in IBTA Table 45: "Encoding for RNR NAK Timer Field".
> 
> Signed-off-by: Håkon Bugge 
> Acked-by: Jason Gunthorpe 
> ---
>  drivers/infiniband/core/cma.c  | 41 
> ++
>  drivers/infiniband/core/cma_priv.h |  2 ++
>  include/rdma/rdma_cm.h |  2 ++
>  3 files changed, 45 insertions(+)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 9409651..5ce097d 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -852,6 +852,7 @@ static void cma_id_put(struct rdma_id_private *id_priv)
>   id_priv->id.qp_type = qp_type;
>   id_priv->tos_set = false;
>   id_priv->timeout_set = false;
> + id_priv->min_rnr_timer_set = false;
>   id_priv->gid_type = IB_GID_TYPE_IB;
>   spin_lock_init(&id_priv->lock);
>   mutex_init(&id_priv->qp_mutex);
> @@ -1141,6 +1142,9 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct 
> ib_qp_attr *qp_attr,
>   if ((*qp_attr_mask & IB_QP_TIMEOUT) && id_priv->timeout_set)
>   qp_attr->timeout = id_priv->timeout;
>  
> + if ((*qp_attr_mask & IB_QP_MIN_RNR_TIMER) && id_priv->min_rnr_timer_set)
> + qp_attr->min_rnr_timer = id_priv->min_rnr_timer;
> +
>   return ret;
>  }
>  EXPORT_SYMBOL(rdma_init_qp_attr);
> @@ -2615,6 +2619,43 @@ int rdma_set_ack_timeout(struct rdma_cm_id *id, u8 
> timeout)
>  }
>  EXPORT_SYMBOL(rdma_set_ack_timeout);
>  
> +/**
> + * rdma_set_min_rnr_timer() - Set the minimum RNR Retry timer of the
> + * QP associated with a connection identifier.
> + * @id: Communication identifier to associate with the service type.
> + * @min_rnr_timer: 5-bit value encoded as Table 45: "Encoding for RNR NAK
> + *  Timer Field" in the IBTA specification.
> + *
> + * This function should be called before rdma_connect() on active
> + * side, and on passive side before rdma_accept(). The timer value
> + * will be associated with the local QP. When it receives a send it is
> + * not ready to handle, typically if the receive queue is empty, an RNR
> + * Retry NAK is returned to the requester with the min_rnr_timer
> + * encoded. The requester will then wait at least the time specified
> + * in the NAK before retrying. The default is zero, which translates
> + * to a minimum RNR Timer value of 655 ms.
> + *
> + * Return: 0 for success
> + */
> +int rdma_set_min_rnr_timer(struct rdma_cm_id *id, u8 min_rnr_timer)
> +{
> + struct rdma_id_private *id_priv;
> +
> + /* It is a five-bit value */
> + if (min_rnr_timer & 0xe0)
> + return -EINVAL;
> +
> + if (id->qp_type != IB_QPT_RC && id->qp_type != IB_QPT_XRC_TGT)
> + return -EINVAL;

This is an in-kernel API, so it is safe to use WARN_ON() instead of
returning an error, which RDS is not checking anyway.

Thanks


[PATCH rdma-next v2 5/5] net/bnxt: Use direct API instead of useless indirection

2021-03-31 Thread Leon Romanovsky
From: Leon Romanovsky 

There is no need for any indirection complexity for a single ULP user;
remove it all in favour of direct calls to the exported symbols. This
allows us to greatly simplify the code.

Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/bnxt_re/main.c  | 70 +++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 83 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 26 ++
 4 files changed, 55 insertions(+), 126 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c 
b/drivers/infiniband/hw/bnxt_re/main.c
index 8bfbf0231a9e..b3f8fe8314a8 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -351,24 +351,6 @@ static struct bnxt_ulp_ops bnxt_re_ulp_ops = {
 
 /* RoCE -> Net driver */
 
-/* Driver registration routines used to let the networking driver (bnxt_en)
- * to know that the RoCE driver is now installed
- */
-static int bnxt_re_unregister_netdev(struct bnxt_re_dev *rdev)
-{
-   struct bnxt_en_dev *en_dev;
-   int rc;
-
-   if (!rdev)
-   return -EINVAL;
-
-   en_dev = rdev->en_dev;
-
-   rc = en_dev->en_ops->bnxt_unregister_device(rdev->en_dev,
-   BNXT_ROCE_ULP);
-   return rc;
-}
-
 static int bnxt_re_register_netdev(struct bnxt_re_dev *rdev)
 {
struct bnxt_en_dev *en_dev;
@@ -379,28 +361,11 @@ static int bnxt_re_register_netdev(struct bnxt_re_dev 
*rdev)
 
en_dev = rdev->en_dev;
 
-   rc = en_dev->en_ops->bnxt_register_device(en_dev, BNXT_ROCE_ULP,
- &bnxt_re_ulp_ops, rdev);
+   rc = bnxt_register_dev(en_dev, &bnxt_re_ulp_ops, rdev);
rdev->qplib_res.pdev = rdev->en_dev->pdev;
return rc;
 }
 
-static int bnxt_re_free_msix(struct bnxt_re_dev *rdev)
-{
-   struct bnxt_en_dev *en_dev;
-   int rc;
-
-   if (!rdev)
-   return -EINVAL;
-
-   en_dev = rdev->en_dev;
-
-
-   rc = en_dev->en_ops->bnxt_free_msix(rdev->en_dev, BNXT_ROCE_ULP);
-
-   return rc;
-}
-
 static int bnxt_re_request_msix(struct bnxt_re_dev *rdev)
 {
int rc = 0, num_msix_want = BNXT_RE_MAX_MSIX, num_msix_got;
@@ -413,9 +378,8 @@ static int bnxt_re_request_msix(struct bnxt_re_dev *rdev)
 
num_msix_want = min_t(u32, BNXT_RE_MAX_MSIX, num_online_cpus());
 
-   num_msix_got = en_dev->en_ops->bnxt_request_msix(en_dev, BNXT_ROCE_ULP,
-rdev->msix_entries,
-num_msix_want);
+   num_msix_got =
+   bnxt_req_msix_vecs(en_dev, rdev->msix_entries, num_msix_want);
if (num_msix_got < BNXT_RE_MIN_MSIX) {
rc = -EINVAL;
goto done;
@@ -471,7 +435,7 @@ static int bnxt_re_net_ring_free(struct bnxt_re_dev *rdev,
req.ring_id = cpu_to_le16(fw_ring_id);
bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&resp,
sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
-   rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
+   rc = bnxt_send_msg(en_dev, &fw_msg);
if (rc)
ibdev_err(&rdev->ibdev, "Failed to free HW ring:%d :%#x",
  req.ring_id, rc);
@@ -508,7 +472,7 @@ static int bnxt_re_net_ring_alloc(struct bnxt_re_dev *rdev,
req.int_mode = ring_attr->mode;
bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&resp,
sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
-   rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
+   rc = bnxt_send_msg(en_dev, &fw_msg);
if (!rc)
*fw_ring_id = le16_to_cpu(resp.ring_id);
 
@@ -535,7 +499,7 @@ static int bnxt_re_net_stats_ctx_free(struct bnxt_re_dev 
*rdev,
req.stat_ctx_id = cpu_to_le32(fw_stats_ctx_id);
bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&req,
sizeof(req), DFLT_HWRM_CMD_TIMEOUT);
-   rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
+   rc = bnxt_send_msg(en_dev, &fw_msg);
if (rc)
ibdev_err(&rdev->ibdev, "Failed to free HW stats context %#x",
  rc);
@@ -567,7 +531,7 @@ static int bnxt_re_net_stats_ctx_alloc(struct bnxt_re_dev 
*rdev,
req.stat_ctx_flags = STAT_CTX_ALLOC_REQ_STAT_CTX_FLAGS_ROCE;
bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&resp,
sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
-   rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP

[PATCH rdma-next v2 0/5] Get rid of custom made module dependency

2021-03-31 Thread Leon Romanovsky
From: Leon Romanovsky 

Changelog:
v2:
 * kbuild spotted that I didn't delete all code in patch #5, so deleted
   even more ulp_ops derefences.
v1: https://lore.kernel.org/linux-rdma/20210329085212.257771-1-l...@kernel.org
 * Go much deeper and removed useless ULP indirection
v0: https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
---

The following series fixes an issue spotted in [1], where the bnxt_re
driver messed with module reference counting in order to implement a
symbol dependency between the bnxt_re and bnxt modules. All of this
despite upstream having only one ULP user of the bnxt module; a simple
declaration of an exported symbol would do the trick.

This series removes that custom module_get/_put, which was never supposed
to be in the driver in the first place, and gets rid of the nasty
indirection logic that isn't relevant for the upstream code.

These small changes allow us to simplify the bnxt code, and my hope is
that Devesh will continue where I stopped and remove struct bnxt_ulp_ops too.

Thanks

[1] https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org

Leon Romanovsky (5):
  RDMA/bnxt_re: Depend on bnxt ethernet driver and not blindly select it
  RDMA/bnxt_re: Create direct symbolic link between bnxt modules
  RDMA/bnxt_re: Get rid of custom module reference counting
  net/bnxt: Remove useless check of non-existent ULP id
  net/bnxt: Use direct API instead of useless indirection

 drivers/infiniband/hw/bnxt_re/Kconfig |   4 +-
 drivers/infiniband/hw/bnxt_re/main.c  |  93 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |   1 -
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 245 +++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h |  32 +--
 6 files changed, 119 insertions(+), 260 deletions(-)

-- 
2.30.2



[PATCH rdma-next v2 3/5] RDMA/bnxt_re: Get rid of custom module reference counting

2021-03-31 Thread Leon Romanovsky
From: Leon Romanovsky 

Instead of manually messing with the parent driver's module reference
counting, rely on the exported-symbol mechanism to ensure that the
proper probe/remove chain is performed.

Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/bnxt_re/main.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c 
b/drivers/infiniband/hw/bnxt_re/main.c
index 140c54ee5916..8bfbf0231a9e 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -601,13 +601,6 @@ static struct bnxt_re_dev *bnxt_re_from_netdev(struct 
net_device *netdev)
return container_of(ibdev, struct bnxt_re_dev, ibdev);
 }
 
-static void bnxt_re_dev_unprobe(struct net_device *netdev,
-   struct bnxt_en_dev *en_dev)
-{
-   dev_put(netdev);
-   module_put(en_dev->pdev->driver->driver.owner);
-}
-
 static struct bnxt_en_dev *bnxt_re_dev_probe(struct net_device *netdev)
 {
struct bnxt_en_dev *en_dev;
@@ -628,10 +621,6 @@ static struct bnxt_en_dev *bnxt_re_dev_probe(struct 
net_device *netdev)
return ERR_PTR(-ENODEV);
}
 
-   /* Bump net device reference count */
-   if (!try_module_get(pdev->driver->driver.owner))
-   return ERR_PTR(-ENODEV);
-
dev_hold(netdev);
 
return en_dev;
@@ -1558,13 +1547,12 @@ static int bnxt_re_dev_init(struct bnxt_re_dev *rdev, 
u8 wqe_mode)
 
 static void bnxt_re_dev_unreg(struct bnxt_re_dev *rdev)
 {
-   struct bnxt_en_dev *en_dev = rdev->en_dev;
struct net_device *netdev = rdev->netdev;
 
bnxt_re_dev_remove(rdev);
 
if (netdev)
-   bnxt_re_dev_unprobe(netdev, en_dev);
+   dev_put(netdev);
 }
 
 static int bnxt_re_dev_reg(struct bnxt_re_dev **rdev, struct net_device 
*netdev)
@@ -1586,7 +1574,7 @@ static int bnxt_re_dev_reg(struct bnxt_re_dev **rdev, 
struct net_device *netdev)
*rdev = bnxt_re_dev_add(netdev, en_dev);
if (!*rdev) {
rc = -ENOMEM;
-   bnxt_re_dev_unprobe(netdev, en_dev);
+   dev_put(netdev);
goto exit;
}
 exit:
-- 
2.30.2



[PATCH rdma-next v2 4/5] net/bnxt: Remove useless check of non-existent ULP id

2021-03-31 Thread Leon Romanovsky
From: Leon Romanovsky 

There is no other bnxt ULP driver upstream, so all checks of the ULP id
are useless; remove them and convert the double-array table to a proper
pointer structure.

Signed-off-by: Leon Romanovsky 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 189 +++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h |   6 +-
 2 files changed, 73 insertions(+), 122 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index a918e374f3c5..f7af900afaed 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -29,34 +29,26 @@ static int bnxt_register_dev(struct bnxt_en_dev *edev, int 
ulp_id,
 {
struct net_device *dev = edev->net;
struct bnxt *bp = netdev_priv(dev);
+   unsigned int max_stat_ctxs;
struct bnxt_ulp *ulp;
 
ASSERT_RTNL();
-   if (ulp_id >= BNXT_MAX_ULP)
-   return -EINVAL;
 
-   ulp = &edev->ulp_tbl[ulp_id];
-   if (rcu_access_pointer(ulp->ulp_ops)) {
-   netdev_err(bp->dev, "ulp id %d already registered\n", ulp_id);
-   return -EBUSY;
-   }
-   if (ulp_id == BNXT_ROCE_ULP) {
-   unsigned int max_stat_ctxs;
+   max_stat_ctxs = bnxt_get_max_func_stat_ctxs(bp);
+   if (max_stat_ctxs <= BNXT_MIN_ROCE_STAT_CTXS ||
+   bp->cp_nr_rings == max_stat_ctxs)
+   return -ENOMEM;
 
-   max_stat_ctxs = bnxt_get_max_func_stat_ctxs(bp);
-   if (max_stat_ctxs <= BNXT_MIN_ROCE_STAT_CTXS ||
-   bp->cp_nr_rings == max_stat_ctxs)
-   return -ENOMEM;
-   }
+   ulp = kzalloc(sizeof(*ulp), GFP_KERNEL);
+   if (!ulp)
+   return -ENOMEM;
 
-   atomic_set(&ulp->ref_count, 0);
+   edev->ulp_tbl = ulp;
ulp->handle = handle;
rcu_assign_pointer(ulp->ulp_ops, ulp_ops);
 
-   if (ulp_id == BNXT_ROCE_ULP) {
-   if (test_bit(BNXT_STATE_OPEN, &bp->state))
-   bnxt_hwrm_vnic_cfg(bp, 0);
-   }
+   if (test_bit(BNXT_STATE_OPEN, &bp->state))
+   bnxt_hwrm_vnic_cfg(bp, 0);
 
return 0;
 }
@@ -69,15 +61,9 @@ static int bnxt_unregister_dev(struct bnxt_en_dev *edev, int 
ulp_id)
int i = 0;
 
ASSERT_RTNL();
-   if (ulp_id >= BNXT_MAX_ULP)
-   return -EINVAL;
 
-   ulp = &edev->ulp_tbl[ulp_id];
-   if (!rcu_access_pointer(ulp->ulp_ops)) {
-   netdev_err(bp->dev, "ulp id %d not registered\n", ulp_id);
-   return -EINVAL;
-   }
-   if (ulp_id == BNXT_ROCE_ULP && ulp->msix_requested)
+   ulp = edev->ulp_tbl;
+   if (ulp->msix_requested)
edev->en_ops->bnxt_free_msix(edev, ulp_id);
 
if (ulp->max_async_event_id)
@@ -91,6 +77,8 @@ static int bnxt_unregister_dev(struct bnxt_en_dev *edev, int 
ulp_id)
msleep(100);
i++;
}
+   kfree(ulp);
+   edev->ulp_tbl = NULL;
return 0;
 }
 
@@ -99,8 +87,8 @@ static void bnxt_fill_msix_vecs(struct bnxt *bp, struct 
bnxt_msix_entry *ent)
struct bnxt_en_dev *edev = bp->edev;
int num_msix, idx, i;
 
-   num_msix = edev->ulp_tbl[BNXT_ROCE_ULP].msix_requested;
-   idx = edev->ulp_tbl[BNXT_ROCE_ULP].msix_base;
+   num_msix = edev->ulp_tbl->msix_requested;
+   idx = edev->ulp_tbl->msix_base;
for (i = 0; i < num_msix; i++) {
ent[i].vector = bp->irq_tbl[idx + i].vector;
ent[i].ring_idx = idx + i;
@@ -126,13 +114,11 @@ static int bnxt_req_msix_vecs(struct bnxt_en_dev *edev, 
int ulp_id,
int rc = 0;
 
ASSERT_RTNL();
-   if (ulp_id != BNXT_ROCE_ULP)
-   return -EINVAL;
 
if (!(bp->flags & BNXT_FLAG_USING_MSIX))
return -ENODEV;
 
-   if (edev->ulp_tbl[ulp_id].msix_requested)
+   if (edev->ulp_tbl->msix_requested)
return -EAGAIN;
 
max_cp_rings = bnxt_get_max_func_cp_rings(bp);
@@ -148,8 +134,8 @@ static int bnxt_req_msix_vecs(struct bnxt_en_dev *edev, int 
ulp_id,
max_idx = min_t(int, bp->total_irqs, max_cp_rings);
idx = max_idx - avail_msix;
}
-   edev->ulp_tbl[ulp_id].msix_base = idx;
-   edev->ulp_tbl[ulp_id].msix_requested = avail_msix;
+   edev->ulp_tbl->msix_base = idx;
+   edev->ulp_tbl->msix_requested = avail_msix;
hw_resc = &bp->hw_resc;
total_vecs = idx + avail_msix;
if (bp->total_irqs < total_vecs ||
@@ -162,7 +148,7 @@ static int bnxt_req_msix_vecs(struct bnxt_en_dev *edev, int 
ulp_id,
}
}
if (rc) {
-   edev->ulp_tbl[ulp_

[PATCH rdma-next v2 2/5] RDMA/bnxt_re: Create direct symbolic link between bnxt modules

2021-03-31 Thread Leon Romanovsky
From: Leon Romanovsky 

Convert the indirect probe call to its direct equivalent to create a
symbol link between the RDMA and netdev modules. This gives us the
ability to remove the custom module reference counting that doesn't
belong in the driver.

Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/bnxt_re/main.c  | 7 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 --
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 -
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 1 +
 4 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c 
b/drivers/infiniband/hw/bnxt_re/main.c
index b30d37f0bad2..140c54ee5916 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -610,15 +610,10 @@ static void bnxt_re_dev_unprobe(struct net_device *netdev,
 
 static struct bnxt_en_dev *bnxt_re_dev_probe(struct net_device *netdev)
 {
-   struct bnxt *bp = netdev_priv(netdev);
struct bnxt_en_dev *en_dev;
struct pci_dev *pdev;
 
-   /* Call bnxt_en's RoCE probe via indirect API */
-   if (!bp->ulp_probe)
-   return ERR_PTR(-EINVAL);
-
-   en_dev = bp->ulp_probe(netdev);
+   en_dev = bnxt_ulp_probe(netdev);
if (IS_ERR(en_dev))
return en_dev;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index a680fd9c68ea..3f0e4bde5dc9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -12859,8 +12859,6 @@ static int bnxt_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
if (!BNXT_CHIP_P4_PLUS(bp))
bp->flags |= BNXT_FLAG_DOUBLE_DB;
 
-   bp->ulp_probe = bnxt_ulp_probe;
-
rc = bnxt_init_mac_addr(bp);
if (rc) {
dev_err(&pdev->dev, "Unable to initialize mac address.\n");
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 1259e68cba2a..eb0314d7a9b1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1745,7 +1745,6 @@ struct bnxt {
(BNXT_CHIP_P4(bp) || BNXT_CHIP_P5(bp))
 
struct bnxt_en_dev  *edev;
-   struct bnxt_en_dev *(*ulp_probe)(struct net_device *);
 
struct bnxt_napi**bnapi;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index 64dbbb04b043..a918e374f3c5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -491,3 +491,4 @@ struct bnxt_en_dev *bnxt_ulp_probe(struct net_device *dev)
}
return bp->edev;
 }
+EXPORT_SYMBOL(bnxt_ulp_probe);
-- 
2.30.2



[PATCH rdma-next v2 1/5] RDMA/bnxt_re: Depend on bnxt ethernet driver and not blindly select it

2021-03-31 Thread Leon Romanovsky
From: Leon Romanovsky 

The "select" kconfig keyword provides a reverse dependency; however, it
doesn't check that the selected symbol meets its own dependencies.
"select" is usually reserved for non-visible symbols, so instead of
trying to keep dependencies in sync with the BNXT ethernet driver,
simply "depend on" it, as the Kconfig documentation suggests.

* CONFIG_PCI is already required by BNXT
* CONFIG_NETDEVICES and CONFIG_ETHERNET are needed to choose BNXT
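A minimal Kconfig sketch of the pitfall described above (the driver symbol names are hypothetical; only BNXT is real):

```kconfig
# "select" forces BNXT on even when BNXT's own dependencies (PCI,
# NETDEVICES, ETHERNET, ...) are unmet, which can produce broken configs:
config HYPOTHETICAL_RDMA_DRIVER_BAD
	tristate "RDMA driver (wrong pattern)"
	select BNXT            # ignores BNXT's "depends on" clauses

# "depends on" only makes the driver selectable once BNXT itself is
# enabled, so BNXT's dependency chain is honoured automatically:
config HYPOTHETICAL_RDMA_DRIVER_GOOD
	tristate "RDMA driver (recommended pattern)"
	depends on BNXT
```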

Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/bnxt_re/Kconfig | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/Kconfig 
b/drivers/infiniband/hw/bnxt_re/Kconfig
index 0feac5132ce1..6a17f5cdb020 100644
--- a/drivers/infiniband/hw/bnxt_re/Kconfig
+++ b/drivers/infiniband/hw/bnxt_re/Kconfig
@@ -2,9 +2,7 @@
 config INFINIBAND_BNXT_RE
tristate "Broadcom Netxtreme HCA support"
depends on 64BIT
-   depends on ETHERNET && NETDEVICES && PCI && INET && DCB
-   select NET_VENDOR_BROADCOM
-   select BNXT
+   depends on INET && DCB && BNXT
help
  This driver supports Broadcom NetXtreme-E 10/25/40/50 gigabit
  RoCE HCAs.  To compile this driver as a module, choose M here:
-- 
2.30.2



Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-30 Thread Leon Romanovsky
On Tue, Mar 30, 2021 at 03:41:41PM -0500, Bjorn Helgaas wrote:
> On Tue, Mar 30, 2021 at 04:47:16PM -0300, Jason Gunthorpe wrote:
> > On Tue, Mar 30, 2021 at 10:00:19AM -0500, Bjorn Helgaas wrote:
> > > On Tue, Mar 30, 2021 at 10:57:38AM -0300, Jason Gunthorpe wrote:
> > > > On Mon, Mar 29, 2021 at 08:29:49PM -0500, Bjorn Helgaas wrote:
> > > > 
> > > > > I think I misunderstood Greg's subdirectory comment.  We already have
> > > > > directories like this:
> > > > 
> > > > Yes, IIRC, Greg's remark applies if you have to start creating
> > > > directories with manual kobjects.
> > > > 
> > > > > and aspm_ctrl_attr_group (for "link") is nicely done with static
> > > > > attributes.  So I think we could do something like this:
> > > > > 
> > > > >   /sys/bus/pci/devices/:01:00.0/   # PF directory
> > > > > sriov/ # SR-IOV related stuff
> > > > >   vf_total_msix
> > > > >   vf_msix_count_BB:DD.F# includes bus/dev/fn of first VF
> > > > >   ...
> > > > >   vf_msix_count_BB:DD.F# includes bus/dev/fn of last VF
> > > > 
> > > > It looks a bit odd that it isn't a subdirectory, but this seems
> > > > reasonable.
> > > 
> > > Sorry, I missed your point; you'll have to lay it out more explicitly.
> > > I did intend that "sriov" *is* a subdirectory of the :01:00.0
> > > directory.  The full paths would be:
> > >
> > >   /sys/bus/pci/devices/:01:00.0/sriov/vf_total_msix
> > >   /sys/bus/pci/devices/:01:00.0/sriov/vf_msix_count_BB:DD.F
> > >   ...
> > 
> > Sorry, I was meaning what you first proposed:
> > 
> >/sys/bus/pci/devices/:01:00.0/sriov/BB:DD.F/vf_msix_count
> > 
> > Which has the extra sub directory to organize the child VFs.
> > 
> > Keep in mind there is going to be a lot of VFs here, > 1k - so this
> > will be a huge directory.
> 
> With :01:00.0/sriov/vf_msix_count_BB:DD.F, sriov/ will contain
> 1 + 1K files ("vf_total_msix" + 1 per VF).
> 
> With :01:00.0/sriov/BB:DD.F/vf_msix_count, sriov/ will contain
> 1 file and 1K subdirectories.

This is racy by design: in order to add a new file and create the BB:DD.F
directory, the VF will need to do it after or during its creation.
During PF creation, those BB:DD.F values are unknown to the PF.

The race here is that the PF/VF directory creation events are already
sent while the new directory structure is not ready yet.

From a code perspective, we will need to add something similar to
pci_iov_sysfs_link() with the code that you didn't like in previous
variants (the one that messes with the sysfs_create_file API).

It doesn't look good for large SR-IOV systems with >1K VFs to have a
gazillion subdirectories inside the PF, while the more natural place to
see them is in the VF.

So I'm completely puzzled why you want to do these files on PF and not
on VF as v0, v7 and v8 proposed.

Thanks
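For reference, the two layouts under discussion differ only in whether the per-VF value is a file-name suffix or a subdirectory. A throwaway userspace sketch of the resulting paths (illustrative string building only, not kernel code):

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

static char buf[160];

/* "Flat" layout: sriov/ holds vf_total_msix plus one file per VF,
 * i.e. 1 + 1K files in a single directory. */
int flat_path_is(const char *pf, const char *vf, const char *expect)
{
	snprintf(buf, sizeof(buf),
		 "/sys/bus/pci/devices/%s/sriov/vf_msix_count_%s", pf, vf);
	return strcmp(buf, expect) == 0;
}

/* "Nested" layout: sriov/ holds one file plus 1K per-VF
 * subdirectories, each with its own vf_msix_count. */
int nested_path_is(const char *pf, const char *vf, const char *expect)
{
	snprintf(buf, sizeof(buf),
		 "/sys/bus/pci/devices/%s/sriov/%s/vf_msix_count", pf, vf);
	return strcmp(buf, expect) == 0;
}
```

The BB:DD.F values here are placeholders; in the kernel these names would come from the VF's kobject at VF creation time, which is exactly where the race described above comes from.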


[PATCH rdma-next v1 4/5] net/bnxt: Remove useless check of non-existent ULP id

2021-03-29 Thread Leon Romanovsky
From: Leon Romanovsky 

There is no other bnxt ULP driver in the upstream kernel, so all checks
for the ULP id are useless; remove them and convert the double array
table to a proper pointer structure.
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 189 +++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h |   6 +-
 2 files changed, 73 insertions(+), 122 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index a918e374f3c5..f7af900afaed 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -29,34 +29,26 @@ static int bnxt_register_dev(struct bnxt_en_dev *edev, int ulp_id,
 {
struct net_device *dev = edev->net;
struct bnxt *bp = netdev_priv(dev);
+   unsigned int max_stat_ctxs;
struct bnxt_ulp *ulp;
 
ASSERT_RTNL();
-   if (ulp_id >= BNXT_MAX_ULP)
-   return -EINVAL;
 
-   ulp = &edev->ulp_tbl[ulp_id];
-   if (rcu_access_pointer(ulp->ulp_ops)) {
-   netdev_err(bp->dev, "ulp id %d already registered\n", ulp_id);
-   return -EBUSY;
-   }
-   if (ulp_id == BNXT_ROCE_ULP) {
-   unsigned int max_stat_ctxs;
+   max_stat_ctxs = bnxt_get_max_func_stat_ctxs(bp);
+   if (max_stat_ctxs <= BNXT_MIN_ROCE_STAT_CTXS ||
+   bp->cp_nr_rings == max_stat_ctxs)
+   return -ENOMEM;
 
-   max_stat_ctxs = bnxt_get_max_func_stat_ctxs(bp);
-   if (max_stat_ctxs <= BNXT_MIN_ROCE_STAT_CTXS ||
-   bp->cp_nr_rings == max_stat_ctxs)
-   return -ENOMEM;
-   }
+   ulp = kzalloc(sizeof(*ulp), GFP_KERNEL);
+   if (!ulp)
+   return -ENOMEM;
 
-   atomic_set(&ulp->ref_count, 0);
+   edev->ulp_tbl = ulp;
ulp->handle = handle;
rcu_assign_pointer(ulp->ulp_ops, ulp_ops);
 
-   if (ulp_id == BNXT_ROCE_ULP) {
-   if (test_bit(BNXT_STATE_OPEN, &bp->state))
-   bnxt_hwrm_vnic_cfg(bp, 0);
-   }
+   if (test_bit(BNXT_STATE_OPEN, &bp->state))
+   bnxt_hwrm_vnic_cfg(bp, 0);
 
return 0;
 }
@@ -69,15 +61,9 @@ static int bnxt_unregister_dev(struct bnxt_en_dev *edev, int ulp_id)
int i = 0;
 
ASSERT_RTNL();
-   if (ulp_id >= BNXT_MAX_ULP)
-   return -EINVAL;
 
-   ulp = &edev->ulp_tbl[ulp_id];
-   if (!rcu_access_pointer(ulp->ulp_ops)) {
-   netdev_err(bp->dev, "ulp id %d not registered\n", ulp_id);
-   return -EINVAL;
-   }
-   if (ulp_id == BNXT_ROCE_ULP && ulp->msix_requested)
+   ulp = edev->ulp_tbl;
+   if (ulp->msix_requested)
edev->en_ops->bnxt_free_msix(edev, ulp_id);
 
if (ulp->max_async_event_id)
@@ -91,6 +77,8 @@ static int bnxt_unregister_dev(struct bnxt_en_dev *edev, int ulp_id)
msleep(100);
i++;
}
+   kfree(ulp);
+   edev->ulp_tbl = NULL;
return 0;
 }
 
@@ -99,8 +87,8 @@ static void bnxt_fill_msix_vecs(struct bnxt *bp, struct bnxt_msix_entry *ent)
struct bnxt_en_dev *edev = bp->edev;
int num_msix, idx, i;
 
-   num_msix = edev->ulp_tbl[BNXT_ROCE_ULP].msix_requested;
-   idx = edev->ulp_tbl[BNXT_ROCE_ULP].msix_base;
+   num_msix = edev->ulp_tbl->msix_requested;
+   idx = edev->ulp_tbl->msix_base;
for (i = 0; i < num_msix; i++) {
ent[i].vector = bp->irq_tbl[idx + i].vector;
ent[i].ring_idx = idx + i;
@@ -126,13 +114,11 @@ static int bnxt_req_msix_vecs(struct bnxt_en_dev *edev, int ulp_id,
int rc = 0;
 
ASSERT_RTNL();
-   if (ulp_id != BNXT_ROCE_ULP)
-   return -EINVAL;
 
if (!(bp->flags & BNXT_FLAG_USING_MSIX))
return -ENODEV;
 
-   if (edev->ulp_tbl[ulp_id].msix_requested)
+   if (edev->ulp_tbl->msix_requested)
return -EAGAIN;
 
max_cp_rings = bnxt_get_max_func_cp_rings(bp);
@@ -148,8 +134,8 @@ static int bnxt_req_msix_vecs(struct bnxt_en_dev *edev, int ulp_id,
max_idx = min_t(int, bp->total_irqs, max_cp_rings);
idx = max_idx - avail_msix;
}
-   edev->ulp_tbl[ulp_id].msix_base = idx;
-   edev->ulp_tbl[ulp_id].msix_requested = avail_msix;
+   edev->ulp_tbl->msix_base = idx;
+   edev->ulp_tbl->msix_requested = avail_msix;
hw_resc = &bp->hw_resc;
total_vecs = idx + avail_msix;
if (bp->total_irqs < total_vecs ||
@@ -162,7 +148,7 @@ static int bnxt_req_msix_vecs(struct bnxt_en_dev *edev, int ulp_id,
}
}
if (rc) {
-   edev->ulp_tbl[ulp_id].msix_requested = 0;
+

[PATCH rdma-next v1 5/5] net/bnxt: Use direct API instead of useless indirection

2021-03-29 Thread Leon Romanovsky
From: Leon Romanovsky 

There is no need for any indirection complexity for one ULP user;
remove all of it in favour of direct calls to the exported symbols.
This allows us to greatly simplify the code.

Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/bnxt_re/main.c  | 70 ---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 70 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 26 +++
 4 files changed, 50 insertions(+), 118 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 8bfbf0231a9e..b3f8fe8314a8 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -351,24 +351,6 @@ static struct bnxt_ulp_ops bnxt_re_ulp_ops = {
 
 /* RoCE -> Net driver */
 
-/* Driver registration routines used to let the networking driver (bnxt_en)
- * to know that the RoCE driver is now installed
- */
-static int bnxt_re_unregister_netdev(struct bnxt_re_dev *rdev)
-{
-   struct bnxt_en_dev *en_dev;
-   int rc;
-
-   if (!rdev)
-   return -EINVAL;
-
-   en_dev = rdev->en_dev;
-
-   rc = en_dev->en_ops->bnxt_unregister_device(rdev->en_dev,
-   BNXT_ROCE_ULP);
-   return rc;
-}
-
 static int bnxt_re_register_netdev(struct bnxt_re_dev *rdev)
 {
struct bnxt_en_dev *en_dev;
@@ -379,28 +361,11 @@ static int bnxt_re_register_netdev(struct bnxt_re_dev *rdev)
 
en_dev = rdev->en_dev;
 
-   rc = en_dev->en_ops->bnxt_register_device(en_dev, BNXT_ROCE_ULP,
- &bnxt_re_ulp_ops, rdev);
+   rc = bnxt_register_dev(en_dev, &bnxt_re_ulp_ops, rdev);
rdev->qplib_res.pdev = rdev->en_dev->pdev;
return rc;
 }
 
-static int bnxt_re_free_msix(struct bnxt_re_dev *rdev)
-{
-   struct bnxt_en_dev *en_dev;
-   int rc;
-
-   if (!rdev)
-   return -EINVAL;
-
-   en_dev = rdev->en_dev;
-
-
-   rc = en_dev->en_ops->bnxt_free_msix(rdev->en_dev, BNXT_ROCE_ULP);
-
-   return rc;
-}
-
 static int bnxt_re_request_msix(struct bnxt_re_dev *rdev)
 {
int rc = 0, num_msix_want = BNXT_RE_MAX_MSIX, num_msix_got;
@@ -413,9 +378,8 @@ static int bnxt_re_request_msix(struct bnxt_re_dev *rdev)
 
num_msix_want = min_t(u32, BNXT_RE_MAX_MSIX, num_online_cpus());
 
-   num_msix_got = en_dev->en_ops->bnxt_request_msix(en_dev, BNXT_ROCE_ULP,
-rdev->msix_entries,
-num_msix_want);
+   num_msix_got =
+   bnxt_req_msix_vecs(en_dev, rdev->msix_entries, num_msix_want);
if (num_msix_got < BNXT_RE_MIN_MSIX) {
rc = -EINVAL;
goto done;
@@ -471,7 +435,7 @@ static int bnxt_re_net_ring_free(struct bnxt_re_dev *rdev,
req.ring_id = cpu_to_le16(fw_ring_id);
bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&resp,
sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
-   rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
+   rc = bnxt_send_msg(en_dev, &fw_msg);
if (rc)
ibdev_err(&rdev->ibdev, "Failed to free HW ring:%d :%#x",
  req.ring_id, rc);
@@ -508,7 +472,7 @@ static int bnxt_re_net_ring_alloc(struct bnxt_re_dev *rdev,
req.int_mode = ring_attr->mode;
bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&resp,
sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
-   rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
+   rc = bnxt_send_msg(en_dev, &fw_msg);
if (!rc)
*fw_ring_id = le16_to_cpu(resp.ring_id);
 
@@ -535,7 +499,7 @@ static int bnxt_re_net_stats_ctx_free(struct bnxt_re_dev *rdev,
req.stat_ctx_id = cpu_to_le32(fw_stats_ctx_id);
bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&req,
sizeof(req), DFLT_HWRM_CMD_TIMEOUT);
-   rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP, &fw_msg);
+   rc = bnxt_send_msg(en_dev, &fw_msg);
if (rc)
ibdev_err(&rdev->ibdev, "Failed to free HW stats context %#x",
  rc);
@@ -567,7 +531,7 @@ static int bnxt_re_net_stats_ctx_alloc(struct bnxt_re_dev *rdev,
req.stat_ctx_flags = STAT_CTX_ALLOC_REQ_STAT_CTX_FLAGS_ROCE;
bnxt_re_fill_fw_msg(&fw_msg, (void *)&req, sizeof(req), (void *)&resp,
sizeof(resp), DFLT_HWRM_CMD_TIMEOUT);
-   rc = en_dev->en_ops->bnxt_send_fw_msg(en_dev, BNXT_ROCE_ULP
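The simplification in this patch can be sketched outside the kernel. Below is a minimal userspace analogy (hypothetical names mirroring the patch, not the real bnxt API) of replacing an ops-table indirection with a directly exported call:

```c
#include <assert.h>

/* Before: the ULP consumer reaches the provider through a table of
 * function pointers, even though there is only one implementation. */
struct en_ops {
	int (*send_fw_msg)(int msg);
};

static int send_fw_msg_impl(int msg)
{
	return msg * 2;	/* stand-in for the real firmware call */
}

static const struct en_ops ops_table = {
	.send_fw_msg = send_fw_msg_impl,
};

int send_via_ops(int msg)
{
	return ops_table.send_fw_msg(msg);	/* indirect call */
}

/* After: with a single in-tree user, expose one symbol and call it
 * directly; the ops table and its ULP-id checks disappear. */
int bnxt_send_msg(int msg)
{
	return send_fw_msg_impl(msg);	/* direct call */
}
```

Both paths behave identically; the direct variant just drops a layer that only ever had one consumer.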

[PATCH rdma-next v1 1/5] RDMA/bnxt_re: Depend on bnxt ethernet driver and not blindly select it

2021-03-29 Thread Leon Romanovsky
From: Leon Romanovsky 

The "select" kconfig keyword provides reverse dependency, however it
doesn't check that selected symbol meets its own dependencies. Usually
"select" is used for non-visible symbols, so instead of trying to keep
dependencies in sync with BNXT ethernet driver, simply "depends on" it,
like Kconfig documentation suggests.

* CONFIG_PCI is already required by BNXT
* CONFIG_NETDEVICES and CONFIG_ETHERNET are needed to choose BNXT

Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/bnxt_re/Kconfig | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/Kconfig b/drivers/infiniband/hw/bnxt_re/Kconfig
index 0feac5132ce1..6a17f5cdb020 100644
--- a/drivers/infiniband/hw/bnxt_re/Kconfig
+++ b/drivers/infiniband/hw/bnxt_re/Kconfig
@@ -2,9 +2,7 @@
 config INFINIBAND_BNXT_RE
tristate "Broadcom Netxtreme HCA support"
depends on 64BIT
-   depends on ETHERNET && NETDEVICES && PCI && INET && DCB
-   select NET_VENDOR_BROADCOM
-   select BNXT
+   depends on INET && DCB && BNXT
help
  This driver supports Broadcom NetXtreme-E 10/25/40/50 gigabit
  RoCE HCAs.  To compile this driver as a module, choose M here:
-- 
2.30.2
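The before/after can be sketched as two Kconfig fragments (simplified, not the complete bnxt_re entry):

```kconfig
# Before: reverse dependency. "select" forces BNXT on without checking
# BNXT's own dependencies, so those must be duplicated here and kept
# in sync with the ethernet driver forever.
config INFINIBAND_BNXT_RE
	tristate "Broadcom Netxtreme HCA support"
	depends on ETHERNET && NETDEVICES && PCI && INET && DCB
	select NET_VENDOR_BROADCOM
	select BNXT

# After: plain forward dependency. The option only becomes visible once
# BNXT (with all of its own prerequisites) is already enabled.
config INFINIBAND_BNXT_RE
	tristate "Broadcom Netxtreme HCA support"
	depends on 64BIT
	depends on INET && DCB && BNXT
```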



[PATCH rdma-next v1 2/5] RDMA/bnxt_re: Create direct symbolic link between bnxt modules

2021-03-29 Thread Leon Romanovsky
From: Leon Romanovsky 

Convert the indirect probe call to its direct equivalent to create a
symbol link between the RDMA and netdev modules. This will give us
the ability to remove the custom module reference counting that
doesn't belong in the driver.

Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/bnxt_re/main.c  | 7 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 --
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 -
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 1 +
 4 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index b30d37f0bad2..140c54ee5916 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -610,15 +610,10 @@ static void bnxt_re_dev_unprobe(struct net_device *netdev,
 
 static struct bnxt_en_dev *bnxt_re_dev_probe(struct net_device *netdev)
 {
-   struct bnxt *bp = netdev_priv(netdev);
struct bnxt_en_dev *en_dev;
struct pci_dev *pdev;
 
-   /* Call bnxt_en's RoCE probe via indirect API */
-   if (!bp->ulp_probe)
-   return ERR_PTR(-EINVAL);
-
-   en_dev = bp->ulp_probe(netdev);
+   en_dev = bnxt_ulp_probe(netdev);
if (IS_ERR(en_dev))
return en_dev;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index a680fd9c68ea..3f0e4bde5dc9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -12859,8 +12859,6 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (!BNXT_CHIP_P4_PLUS(bp))
bp->flags |= BNXT_FLAG_DOUBLE_DB;
 
-   bp->ulp_probe = bnxt_ulp_probe;
-
rc = bnxt_init_mac_addr(bp);
if (rc) {
dev_err(&pdev->dev, "Unable to initialize mac address.\n");
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 1259e68cba2a..eb0314d7a9b1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1745,7 +1745,6 @@ struct bnxt {
(BNXT_CHIP_P4(bp) || BNXT_CHIP_P5(bp))
 
struct bnxt_en_dev  *edev;
-   struct bnxt_en_dev *(*ulp_probe)(struct net_device *);
 
struct bnxt_napi**bnapi;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index 64dbbb04b043..a918e374f3c5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -491,3 +491,4 @@ struct bnxt_en_dev *bnxt_ulp_probe(struct net_device *dev)
}
return bp->edev;
 }
+EXPORT_SYMBOL(bnxt_ulp_probe);
-- 
2.30.2



[PATCH rdma-next v1 3/5] RDMA/bnxt_re: Get rid of custom module reference counting

2021-03-29 Thread Leon Romanovsky
From: Leon Romanovsky 

Instead of manually messing with parent driver module reference
counting rely on export symbol mechanism to ensure that proper
probe/remove chain is performed.

Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/bnxt_re/main.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 140c54ee5916..8bfbf0231a9e 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -601,13 +601,6 @@ static struct bnxt_re_dev *bnxt_re_from_netdev(struct net_device *netdev)
return container_of(ibdev, struct bnxt_re_dev, ibdev);
 }
 
-static void bnxt_re_dev_unprobe(struct net_device *netdev,
-   struct bnxt_en_dev *en_dev)
-{
-   dev_put(netdev);
-   module_put(en_dev->pdev->driver->driver.owner);
-}
-
 static struct bnxt_en_dev *bnxt_re_dev_probe(struct net_device *netdev)
 {
struct bnxt_en_dev *en_dev;
@@ -628,10 +621,6 @@ static struct bnxt_en_dev *bnxt_re_dev_probe(struct net_device *netdev)
return ERR_PTR(-ENODEV);
}
 
-   /* Bump net device reference count */
-   if (!try_module_get(pdev->driver->driver.owner))
-   return ERR_PTR(-ENODEV);
-
dev_hold(netdev);
 
return en_dev;
@@ -1558,13 +1547,12 @@ static int bnxt_re_dev_init(struct bnxt_re_dev *rdev, u8 wqe_mode)
 
 static void bnxt_re_dev_unreg(struct bnxt_re_dev *rdev)
 {
-   struct bnxt_en_dev *en_dev = rdev->en_dev;
struct net_device *netdev = rdev->netdev;
 
bnxt_re_dev_remove(rdev);
 
if (netdev)
-   bnxt_re_dev_unprobe(netdev, en_dev);
+   dev_put(netdev);
 }
 
 static int bnxt_re_dev_reg(struct bnxt_re_dev **rdev, struct net_device *netdev)
@@ -1586,7 +1574,7 @@ static int bnxt_re_dev_reg(struct bnxt_re_dev **rdev, struct net_device *netdev)
*rdev = bnxt_re_dev_add(netdev, en_dev);
if (!*rdev) {
rc = -ENOMEM;
-   bnxt_re_dev_unprobe(netdev, en_dev);
+   dev_put(netdev);
goto exit;
}
 exit:
-- 
2.30.2



[PATCH rdma-next v1 0/5] Get rid of custom made module dependency

2021-03-29 Thread Leon Romanovsky
From: Leon Romanovsky 

Changelog:
v1: 
 * Go much deeper and removed useless ULP indirection
v0: https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org
---

The following series fixes an issue spotted in [1], where the bnxt_re
driver messed with module reference counting in order to implement a
symbol dependency between the bnxt_re and bnxt modules. All of this is
done while upstream has only one ULP user of the bnxt module, so a
simple declaration of an exported symbol would do the trick.

This series removes that custom module_get/_put, which was never
supposed to be in the driver, and gets rid of the nasty indirection
logic that isn't relevant for the upstream code.

Such small changes allow us to simplify the bnxt code, and my hope is
that Devesh will continue where I stopped and remove struct
bnxt_ulp_ops too.

Thanks

[1] https://lore.kernel.org/linux-rdma/20210324142524.1135319-1-l...@kernel.org

Leon Romanovsky (5):
  RDMA/bnxt_re: Depend on bnxt ethernet driver and not blindly select it
  RDMA/bnxt_re: Create direct symbolic link between bnxt modules
  RDMA/bnxt_re: Get rid of custom module reference counting
  net/bnxt: Remove useless check of non-existent ULP id
  net/bnxt: Use direct API instead of useless indirection

 drivers/infiniband/hw/bnxt_re/Kconfig |   4 +-
 drivers/infiniband/hw/bnxt_re/main.c  |  93 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |   1 -
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 240 +++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h |  32 +--
 6 files changed, 118 insertions(+), 256 deletions(-)

-- 
2.30.2



Re: [PATCH net-next] qrtr: move to staging

2021-03-28 Thread Leon Romanovsky
On Mon, Mar 29, 2021 at 07:30:08AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Mar 29, 2021 at 08:17:06AM +0300, Leon Romanovsky wrote:
> > On Sun, Mar 28, 2021 at 02:26:21PM +0200, Greg Kroah-Hartman wrote:
> > > There does not seem to be any developers willing to maintain the
> > > net/qrtr/ code, so move it to drivers/staging/ so that it can be removed
> > > from the kernel tree entirely in a few kernel releases if no one steps
> > > up to maintain it.
> > > 
> > > Reported-by: Matthew Wilcox 
> > > Cc: Du Cheng 
> > > Signed-off-by: Greg Kroah-Hartman 
> > > ---
> > 
> > Greg,
> > 
> > Why don't you simply delete it like other code that is not maintained?
> 
> "normally" we have been giving code a chance by having it live in
> drivers/staging/ for a bit before removing it to allow anyone that
> actually cares about the codebase to notice it before removing it.

I don't know about the netdev view on this, but for the RDMA code,
code in staging means _does_not_exist_. We took this decision
after/during the Lustre fiasco.

Thanks


Re: [PATCH net-next] qrtr: move to staging

2021-03-28 Thread Leon Romanovsky
On Sun, Mar 28, 2021 at 02:26:21PM +0200, Greg Kroah-Hartman wrote:
> There does not seem to be any developers willing to maintain the
> net/qrtr/ code, so move it to drivers/staging/ so that it can be removed
> from the kernel tree entirely in a few kernel releases if no one steps
> up to maintain it.
> 
> Reported-by: Matthew Wilcox 
> Cc: Du Cheng 
> Signed-off-by: Greg Kroah-Hartman 
> ---

Greg,

Why don't you simply delete it like other code that is not maintained?

Thanks


Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-26 Thread Leon Romanovsky
On Fri, Mar 26, 2021 at 02:12:09PM -0300, Jason Gunthorpe wrote:
> On Fri, Mar 26, 2021 at 12:08:31PM -0500, Bjorn Helgaas wrote:
> 
> > Leon has implemented a ton of variations, but I don't think having all
> > the files in the PF directory was one of them.
> 
> If you promise this is the last of this bike painting adventure then
> let's do it.

So Bjorn, can you promise?

Thanks

> 
> Jason

