Re: [PATCH 1/3] IB/uverbs: reject invalid or unknown opcodes

2015-08-20 Thread Christoph Hellwig
On Wed, Aug 19, 2015 at 07:50:23PM +0000, Hefty, Sean wrote:
  AFAIK, this path is rarely (never?) actually used. I think all the
  drivers we have can post directly from userspace.
 
 I didn't think the ipath or qib drivers post from userspace.

Makes sense with their software IB stack.  Guess the idea to get rid
of this path is dead, would have been too nice..
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] IB/uverbs: reject invalid or unknown opcodes

2015-08-20 Thread Sagi Grimberg

On 8/19/2015 8:54 PM, Jason Gunthorpe wrote:

On Wed, Aug 19, 2015 at 07:48:02PM +0200, Christoph Hellwig wrote:

On Wed, Aug 19, 2015 at 11:46:14AM -0600, Jason Gunthorpe wrote:

Reviewed-by: Jason Gunthorpe jguntho...@obsidianresearch.com

AFAIK, this path is rarely (never?) actually used. I think all the
drivers we have can post directly from userspace.


Oh, interesting.  Is there any chance to deprecate it?  Not having
to care for the uverbs command would really help with some of the
upcoming changes I have in my mind.


Hmm, we'd need a survey of the userspace side to see if it is rarely
or never...

And we'd have to talk to the soft XXX guys to see if they plan to use
it..


Checked in librxe (user-space softroce). Looks like posts are going via
this path...


Re: [PATCH 3/3] IB: remove xrc_remote_srq_num from struct ib_send_wr

2015-08-20 Thread Sagi Grimberg

On 8/19/2015 7:37 PM, Christoph Hellwig wrote:

The field is only initialized in mlx5, but never used.

If we want to add proper XRC support it should be done with a new
struct ib_xrc_wr.

This shrinks the various WR structures by another 4 bytes.

Signed-off-by: Christoph Hellwig h...@lst.de
---
  drivers/infiniband/hw/mlx5/qp.c | 1 -
  include/rdma/ib_verbs.h | 1 -
  2 files changed, 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 04df156..83a290f 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -2634,7 +2634,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	switch (ibqp->qp_type) {
 	case IB_QPT_XRC_INI:
 		xrc = seg;
-		xrc->xrc_srqn = htonl(wr->xrc_remote_srq_num);
 		seg += sizeof(*xrc);
 		size += sizeof(*xrc) / 16;
 		/* fall through */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9b29c78..b855189 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1100,7 +1100,6 @@ struct ib_send_wr {
__be32  imm_data;
u32 invalidate_rkey;
} ex;
-   u32 xrc_remote_srq_num; /* XRC TGT QPs only */
  };

  struct ib_rdma_wr {



Looks OK to me,

Reviewed-by: Sagi Grimberg sa...@mellanox.com

This will need Eli's ack though...


Re: [PATCH 1/3] IB/uverbs: reject invalid or unknown opcodes

2015-08-20 Thread Sagi Grimberg

On 8/19/2015 7:37 PM, Christoph Hellwig wrote:

We have many WR opcodes that are only supported in kernel space
and/or require optional information to be copied into the WR
structure.  Reject all those not explicitly handled so that we
can't pass invalid information to drivers.

Cc: sta...@vger.kernel.org
Signed-off-by: Christoph Hellwig h...@lst.de
---
  drivers/infiniband/core/uverbs_cmd.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a15318a..f9f3921 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2372,6 +2372,12 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
 		next->send_flags = user_wr->send_flags;
 
 		if (is_ud) {
+			if (next->opcode != IB_WR_SEND &&
+			    next->opcode != IB_WR_SEND_WITH_IMM) {
+				ret = -EINVAL;
+				goto out_put;
+			}
+
 			next->wr.ud.ah = idr_read_ah(user_wr->wr.ud.ah,
						     file->ucontext);
 			if (!next->wr.ud.ah) {
@@ -2413,7 +2419,8 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
 			next->wr.atomic.rkey = user_wr->wr.atomic.rkey;
 			break;
 		default:
-			break;
+			ret = -EINVAL;
+			goto out_put;
 		}
 	}




Reviewed-by: Sagi Grimberg sa...@mellanox.com

Haggai, can you also have a look?
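[Editor's note: the whitelist pattern in the patch above — allow only SEND/SEND_WITH_IMM on UD QPs and reject anything not explicitly handled — can be illustrated with a small userspace sketch. The opcode names and the error value are local stand-ins, not the kernel's enum ib_wr_opcode.]

```c
#include <assert.h>

/* Illustrative subset of WR opcodes; the real enum lives in
 * include/rdma/ib_verbs.h. */
enum wr_opcode {
	WR_SEND,
	WR_SEND_WITH_IMM,
	WR_RDMA_WRITE,
	WR_ATOMIC_CMP_AND_SWP,
	WR_LOCAL_INV,		/* kernel-only opcode, never valid from user space */
};

/* Mirrors the patch's policy: UD QPs may only post SEND/SEND_WITH_IMM,
 * and any opcode not explicitly handled is rejected with -EINVAL. */
static int validate_opcode(int is_ud, enum wr_opcode op)
{
	if (is_ud)
		return (op == WR_SEND || op == WR_SEND_WITH_IMM) ? 0 : -22;

	switch (op) {
	case WR_SEND:
	case WR_SEND_WITH_IMM:
	case WR_RDMA_WRITE:
	case WR_ATOMIC_CMP_AND_SWP:
		return 0;	/* explicitly handled */
	default:
		return -22;	/* -EINVAL: unknown or kernel-only */
	}
}
```

The point of the default-deny shape is that newly added kernel-only opcodes fail safely instead of passing uninitialized WR fields to drivers.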


Re: shrink struct ib_send_wr

2015-08-20 Thread Sagi Grimberg

  - patch 2 now explicitly replaces the weird overloading in the mlx5
driver with an explicit embedding of struct ib_send_wr, similar
to what we do for all other MRs.


This is on the user-space memory registration path.

Haggai, can you grab it for a Tested-by tag?


Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API

2015-08-20 Thread Sagi Grimberg

1. if register - call ib_map_mr_sg (which calls dma_map_sg)
else do dma_map_sg
2. if registered - call ib_dma_unmap_sg (which calls dma_unmap_sg)
else do dma_unmap_sg


 From what I've seen in the ULPs the flow control is generally such
that the MR is 'consumed' even if it isn't used by a send.


Not really. If registration is not needed, an MR is not consumed. In
fact, in svcrdma the IB code path never uses those, and the iWARP code
path always uses those for RDMA_READs and not RDMA_WRITEs. Also, isert
uses those only when signature is enabled and registration is required.



So lkey usage is simply split into things that absolutely don't need a
MR, and things that maybe do. The maybe side can go ahead and always
consume the MR resource, but optimize the implementation to a SG list
to avoid a performance hit.

Then the whole API becomes symmetric. The ULP says, 'here is a
scatterlist and an lkey MR, make me an ib_sg list' and the core
either packs it as is into the sg, or it spins up the MR and packs
that.


Always consuming an MR resource is an extra lock acquire given these
are always kept in a pool structure.


I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave
it to the ULP like it does today (at least in the first stage...)


I'm fine with the first stage, but we still really do need to figure
out how to get better code sharing in our API here..

Maybe we can do the rkey side right away until we can figure out how
to harmonize the rkey sg/mr usage?


I'm fine with that. I agree we still need to do better.
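[Editor's note: the "if register - call ib_map_mr_sg, else dma_map_sg" flow quoted at the top of this thread can be sketched in userspace. Everything below is hypothetical stubbed state — it models only the control flow under discussion, not the real verbs or DMA API.]

```c
#include <assert.h>

/* Minimal stand-in for a scatterlist plus registration state. */
struct sg_state {
	int dma_mapped;
	int mr_registered;
};

static void dma_map_sg_stub(struct sg_state *sg)   { sg->dma_mapped = 1; }
static void dma_unmap_sg_stub(struct sg_state *sg) { sg->dma_mapped = 0; }

/* "if register - call ib_map_mr_sg (which calls dma_map_sg),
 *  else do dma_map_sg": registration implies the DMA mapping. */
static void map_for_io(struct sg_state *sg, int need_reg)
{
	dma_map_sg_stub(sg);		/* ib_map_mr_sg would do this internally */
	if (need_reg)
		sg->mr_registered = 1;	/* spin up the MR */
}

/* The teardown mirrors the setup: invalidate the MR (if any), then unmap. */
static void unmap_after_io(struct sg_state *sg)
{
	if (sg->mr_registered)
		sg->mr_registered = 0;
	dma_unmap_sg_stub(sg);
}
```

The symmetry Jason argues for is visible here: the ULP makes one map call and one unmap call, and the "maybe MR" decision is hidden behind them.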


[PATCH v1 for-next 2/7] IB/core: Allow setting create flags in QP init attribute

2015-08-20 Thread Eran Ben Elisha
Allow setting IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK at create_flags in
ib_uverbs_create_qp_ex.

Signed-off-by: Eran Ben Elisha era...@mellanox.com
---
 drivers/infiniband/core/uverbs_cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 3cc2261..fbde0c6 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1843,7 +1843,7 @@ static int create_qp(struct ib_uverbs_file *file,
 		   sizeof(cmd->create_flags))
 		attr.create_flags = cmd->create_flags;
 
-	if (attr.create_flags) {
+	if (attr.create_flags & ~IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK) {
 		ret = -EINVAL;
 		goto err_put;
 	}
-- 
1.8.3.1
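[Editor's note: the patch above changes "reject any create flag" into "reject any flag outside the supported set". That mask idiom can be shown with a tiny sketch; the flag values and the -EINVAL constant are made up for illustration.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for enum ib_qp_create_flags. */
#define QP_CREATE_IPOIB_UD_LSO   (1u << 0)
#define QP_CREATE_BLOCK_MCAST_LB (1u << 1)

/* Reject any bit outside the supported set, so flags the kernel does
 * not understand fail with -EINVAL instead of being silently dropped. */
static int check_create_flags(uint32_t flags)
{
	if (flags & ~QP_CREATE_BLOCK_MCAST_LB)
		return -22;	/* -EINVAL */
	return 0;
}
```

Extending support later means adding one more bit to the allowed mask, which is why this shape ages better than `if (flags) return -EINVAL;`.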



[PATCH v1 for-next 4/7] net/mlx4_en: Implement mcast loopback prevention for ETH qps

2015-08-20 Thread Eran Ben Elisha
From: Maor Gottlieb ma...@mellanox.com

Set the mcast loopback prevention bit in the QPC for ETH MLX QPs (not
RSS QPs), when the firmware supports this feature. In addition, all rx
ring QPs need to be updated in order not to enforce loopback checks.
This prevents receiving the packets we sent both from the network stack
and from the HCA. Loopback prevention is done by comparing the counter
indices of the sending and receiving QPs. If they're equal, packets
aren't looped back.

Signed-off-by: Maor Gottlieb ma...@mellanox.com
Signed-off-by: Eran Ben Elisha era...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx4/en_main.c  | 22 
 drivers/net/ethernet/mellanox/mlx4/en_resources.c | 25 +++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |  3 ++-
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index a946e4b..005f910 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -123,6 +123,28 @@ void mlx4_en_update_loopback_state(struct net_device *dev,
 	 */
 	if (mlx4_is_mfunc(priv->mdev->dev) || priv->validate_loopback)
 		priv->flags |= MLX4_EN_FLAG_ENABLE_HW_LOOPBACK;
+
+	mutex_lock(&priv->mdev->state_lock);
+	if (priv->mdev->dev->caps.flags2 &
+	    MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB &&
+	    priv->rss_map.indir_qp.qpn) {
+		int i;
+		int err = 0;
+		int loopback = !!(features & NETIF_F_LOOPBACK);
+
+		for (i = 0; i < priv->rx_ring_num; i++) {
+			int ret;
+
+			ret = mlx4_en_change_mcast_lb(priv,
+						      &priv->rss_map.qps[i],
+						      loopback);
+			if (!err)
+				err = ret;
+		}
+		if (err)
+			mlx4_warn(priv->mdev, "failed to change mcast loopback\n");
+	}
+	mutex_unlock(&priv->mdev->state_lock);
 }
 
 static int mlx4_en_get_profile(struct mlx4_en_dev *mdev)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_resources.c b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
index e482fa1b..12aab5a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_resources.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
@@ -69,6 +69,15 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride,
 	context->pri_path.counter_index = priv->counter_index;
 	context->cqn_send = cpu_to_be32(cqn);
 	context->cqn_recv = cpu_to_be32(cqn);
+	if (!rss &&
+	    (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_LB_SRC_CHK) &&
+	    context->pri_path.counter_index !=
+			    MLX4_SINK_COUNTER_INDEX(mdev->dev)) {
+		/* disable multicast loopback to qp with same counter */
+		if (!(dev->features & NETIF_F_LOOPBACK))
+			context->pri_path.fl |= MLX4_FL_ETH_SRC_CHECK_MC_LB;
+		context->pri_path.control |= MLX4_CTRL_ETH_SRC_CHECK_IF_COUNTER;
+	}
 	context->db_rec_addr = cpu_to_be64(priv->res.db.dma << 2);
 	if (!(dev->features & NETIF_F_HW_VLAN_CTAG_RX))
 		context->param3 |= cpu_to_be32(1 << 30);
@@ -80,6 +89,22 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride,
 	}
 }
 
+int mlx4_en_change_mcast_lb(struct mlx4_en_priv *priv, struct mlx4_qp *qp,
+			    int loopback)
+{
+	int ret;
+	struct mlx4_update_qp_params qp_params;
+
+	memset(&qp_params, 0, sizeof(qp_params));
+	if (!loopback)
+		qp_params.flags = MLX4_UPDATE_QP_PARAMS_FLAGS_ETH_CHECK_MC_LB;
+
+	ret = mlx4_update_qp(priv->mdev->dev, qp->qpn,
+			     MLX4_UPDATE_QP_ETH_SRC_CHECK_MC_LB,
+			     &qp_params);
+
+	return ret;
+}
 
 int mlx4_en_map_buffer(struct mlx4_buf *buf)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 666d166..7db86d4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -797,7 +797,8 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride,
 void mlx4_en_sqp_event(struct mlx4_qp *qp, enum mlx4_event event);
 int mlx4_en_map_buffer(struct mlx4_buf *buf);
 void mlx4_en_unmap_buffer(struct mlx4_buf *buf);
-
+int mlx4_en_change_mcast_lb(struct mlx4_en_priv *priv, struct mlx4_qp *qp,
+			    int loopback);
 void mlx4_en_calc_rx_buf(struct net_device *dev);
 int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv);
 void mlx4_en_release_rss_steer(struct mlx4_en_priv *priv);
-- 
1.8.3.1
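[Editor's note: the HW rule this patch programs — drop an incoming multicast packet when the sender QP's counter index equals the receiver QP's counter index — can be stated as a few lines of C. The function and its values are hypothetical; the real decision happens in hardware based on the QPC bits set above.]

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the packet should be delivered, 0 if it is dropped as
 * our own looped-back multicast.  src_check_mc_lb models the
 * MLX4_FL_ETH_SRC_CHECK_MC_LB enable bit. */
static int deliver_mcast(uint32_t sender_counter_idx,
			 uint32_t receiver_counter_idx,
			 int src_check_mc_lb)
{
	if (src_check_mc_lb && sender_counter_idx == receiver_counter_idx)
		return 0;	/* same counter index: our own packet, drop */
	return 1;		/* different sender (or check disabled): deliver */
}
```

This is why the series allocates per-function counter indices: two QPs on the same host get distinct indices only when they belong to different senders, so genuine peers are never filtered.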


[PATCH v1 for-next 3/7] net/mlx4_core: Add support for filtering multicast loopback

2015-08-20 Thread Eran Ben Elisha
From: Maor Gottlieb ma...@mellanox.com

Update device capabilities regarding HW filtering multicast loopback support.

Add the MLX4_UPDATE_QP_ETH_SRC_CHECK_MC_LB attribute to mlx4_update_qp to
enable changing the QP context to support filtering incoming multicast
loopback traffic according to the sender's counter index.

Set the corresponding bits in the QP context to force the loopback source
checks if the attribute is given and the HW supports it.

Signed-off-by: Maor Gottlieb ma...@mellanox.com
Signed-off-by: Eran Ben Elisha era...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx4/fw.c|  6 +
 drivers/net/ethernet/mellanox/mlx4/qp.c| 19 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  | 30 +-
 include/linux/mlx4/device.h|  2 ++
 include/linux/mlx4/qp.h| 24 +
 5 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index e30bf57..5218e1e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -154,6 +154,8 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 flags)
 		[26] = "Port ETS Scheduler support",
 		[27] = "Port beacon support",
 		[28] = "RX-ALL support",
+		[31] = "Modifying loopback source checks using UPDATE_QP support",
+		[32] = "Loopback source checks support",
 	};
 	int i;
 
@@ -946,6 +948,10 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	MLX4_GET(field32, outbox, QUERY_DEV_CAP_EXT_2_FLAGS_OFFSET);
 	if (field32 & (1 << 16))
 		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_UPDATE_QP;
+	if (field32 & (1 << 18))
+		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB;
+	if (field32 & (1 << 19))
+		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_LB_SRC_CHK;
 	if (field32 & (1 << 26))
 		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_VLAN_CONTROL;
 	if (field32 & (1 << 20))
diff --git a/drivers/net/ethernet/mellanox/mlx4/qp.c b/drivers/net/ethernet/mellanox/mlx4/qp.c
index 2026863..b162495 100644
--- a/drivers/net/ethernet/mellanox/mlx4/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx4/qp.c
@@ -436,6 +436,23 @@ int mlx4_update_qp(struct mlx4_dev *dev, u32 qpn,
 		cmd->qp_context.pri_path.grh_mylmc = params->smac_index;
 	}
 
+	if (attr & MLX4_UPDATE_QP_ETH_SRC_CHECK_MC_LB) {
+		if (!(dev->caps.flags2 &
+		      MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB)) {
+			mlx4_warn(dev,
+				  "Trying to set src check LB, but it isn't supported\n");
+			err = -ENOTSUPP;
+			goto out;
+		}
+		pri_addr_path_mask |=
+			1ULL << MLX4_UPD_QP_PATH_MASK_ETH_SRC_CHECK_MC_LB;
+		if (params->flags &
+		    MLX4_UPDATE_QP_PARAMS_FLAGS_ETH_CHECK_MC_LB) {
+			cmd->qp_context.pri_path.fl |=
+				MLX4_FL_ETH_SRC_CHECK_MC_LB;
+		}
+	}
+
 	if (attr & MLX4_UPDATE_QP_VSD) {
 		qp_mask |= 1ULL << MLX4_UPD_QP_MASK_VSD;
 		if (params->flags & MLX4_UPDATE_QP_PARAMS_FLAGS_VSD_ENABLE)
@@ -458,7 +475,7 @@ int mlx4_update_qp(struct mlx4_dev *dev, u32 qpn,
 	err = mlx4_cmd(dev, mailbox->dma, qpn & 0xffffff, 0,
 		       MLX4_CMD_UPDATE_QP, MLX4_CMD_TIME_CLASS_A,
 		       MLX4_CMD_NATIVE);
-
+out:
 	mlx4_free_cmd_mailbox(dev, mailbox);
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 731423c..502f335 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -770,9 +770,12 @@ static int update_vport_qp_param(struct mlx4_dev *dev,
 		}
 	}
 
+	/* preserve IF_COUNTER flag */
+	qpc->pri_path.vlan_control &=
+			MLX4_CTRL_ETH_SRC_CHECK_IF_COUNTER;
 	if (vp_oper->state.link_state == IFLA_VF_LINK_STATE_DISABLE &&
 	    dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_UPDATE_QP) {
-		qpc->pri_path.vlan_control =
+		qpc->pri_path.vlan_control |=
 			MLX4_VLAN_CTRL_ETH_TX_BLOCK_TAGGED |
 			MLX4_VLAN_CTRL_ETH_TX_BLOCK_PRIO_TAGGED |
 			MLX4_VLAN_CTRL_ETH_TX_BLOCK_UNTAGGED |
@@ -780,12 +783,12 @@ static int update_vport_qp_param(struct mlx4_dev *dev,
 			MLX4_VLAN_CTRL_ETH_RX_BLOCK_UNTAGGED |
 			MLX4_VLAN_CTRL_ETH_RX_BLOCK_TAGGED;
} else if (0 != 

[PATCH v1 for-next 7/7] IB/mlx4: Add support for blocking multicast loopback QP creation user flag

2015-08-20 Thread Eran Ben Elisha
MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is now supported via the
counter-based implementation added earlier in this series.

In addition, this flag was previously supported only for IB_QPT_UD;
with the new implementation it is supported for all QP types.

Support IB_USER_VERBS_EX_CMD_CREATE_QP in order to get the flag from
user space using the extended create_qp command.

Signed-off-by: Eran Ben Elisha era...@mellanox.com
---
 drivers/infiniband/hw/mlx4/main.c |  3 ++-
 drivers/infiniband/hw/mlx4/qp.c   | 13 -
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 26a96b8..1079ee5 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2310,7 +2310,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 
 	ibdev->ib_dev.uverbs_ex_cmd_mask |=
 		(1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE) |
-		(1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ);
+		(1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ) |
+		(1ull << IB_USER_VERBS_EX_CMD_CREATE_QP);
 
 	mlx4_ib_alloc_eqs(dev, ibdev);
 
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 2871949..ba25a1b 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -758,9 +758,6 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 	} else {
 		qp->sq_no_prefetch = 0;
 
-		if (init_attr->create_flags & IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK)
-			qp->flags |= MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK;
-
 		if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO)
 			qp->flags |= MLX4_IB_QP_LSO;
 
@@ -834,6 +831,9 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 			goto err_proxy;
 	}
 
+	if (init_attr->create_flags & IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK)
+		qp->flags |= MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK;
+
 	err = mlx4_qp_alloc(dev->dev, qpn, &qp->mqp, gfp);
 	if (err)
 		goto err_qpn;
@@ -1098,6 +1098,7 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
 {
 	struct mlx4_ib_qp *qp = NULL;
 	int err;
+	int sup_u_create_flags = MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK;
 	u16 xrcdn = 0;
 	gfp_t gfp;
 
@@ -1121,8 +1122,10 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
 	}
 
 	if (init_attr->create_flags &&
-	    (udata ||
-	     ((init_attr->create_flags & ~(MLX4_IB_SRIOV_SQP | MLX4_IB_QP_CREATE_USE_GFP_NOIO)) &&
+	    ((udata && init_attr->create_flags & ~(sup_u_create_flags)) ||
+	     ((init_attr->create_flags & ~(MLX4_IB_SRIOV_SQP |
+					   MLX4_IB_QP_CREATE_USE_GFP_NOIO |
+					   MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK)) &&
 	      init_attr->qp_type != IB_QPT_UD) ||
 	     ((init_attr->create_flags & MLX4_IB_SRIOV_SQP) &&
 	      init_attr->qp_type > IB_QPT_GSI)))
-- 
1.8.3.1



[PATCH v1 for-next 5/7] IB/mlx4: Add IB counters table

2015-08-20 Thread Eran Ben Elisha
This is an infrastructure step for allocating and attaching more than
one counter to QPs on the same port. Allocate a counters table and
manage the insertion and removals of the counters in load and unload of
mlx4 IB.

Signed-off-by: Eran Ben Elisha era...@mellanox.com
---
 drivers/infiniband/hw/mlx4/mad.c | 25 ++
 drivers/infiniband/hw/mlx4/main.c| 63 
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  9 +-
 drivers/infiniband/hw/mlx4/qp.c  |  8 +++--
 4 files changed, 81 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 1cd75ff..68f2567 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -824,18 +824,29 @@ static int iboe_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 {
 	struct mlx4_counter counter_stats;
 	struct mlx4_ib_dev *dev = to_mdev(ibdev);
-	int err;
+	struct counter_index *tmp_counter;
+	int err = IB_MAD_RESULT_FAILURE, stats_avail = 0;
 
 	if (in_mad->mad_hdr.mgmt_class != IB_MGMT_CLASS_PERF_MGMT)
 		return -EINVAL;
 
 	memset(&counter_stats, 0, sizeof(counter_stats));
-	err = mlx4_get_counter_stats(dev->dev,
-				     dev->counters[port_num - 1].index,
-				     &counter_stats, 0);
-	if (err)
-		err = IB_MAD_RESULT_FAILURE;
-	else {
+	mutex_lock(&dev->counters_table[port_num - 1].mutex);
+	list_for_each_entry(tmp_counter,
+			    &dev->counters_table[port_num - 1].counters_list,
+			    list) {
+		err = mlx4_get_counter_stats(dev->dev,
+					     tmp_counter->index,
+					     &counter_stats, 0);
+		if (err) {
+			err = IB_MAD_RESULT_FAILURE;
+			stats_avail = 0;
+			break;
+		}
+		stats_avail = 1;
+	}
+	mutex_unlock(&dev->counters_table[port_num - 1].mutex);
+	if (stats_avail) {
 		memset(out_mad->data, 0, sizeof out_mad->data);
 		switch (counter_stats.counter_mode & 0xf) {
 		case 0:
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index efecdf0..26a96b8 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1247,6 +1247,22 @@ static int add_gid_entry(struct ib_qp *ibqp, union ib_gid *gid)
 	return 0;
 }
 
+static void mlx4_ib_delete_counters_table(struct mlx4_ib_dev *ibdev,
+					  struct mlx4_ib_counters *ctr_table)
+{
+	struct counter_index *counter, *tmp_count;
+
+	mutex_lock(&ctr_table->mutex);
+	list_for_each_entry_safe(counter, tmp_count, &ctr_table->counters_list,
+				 list) {
+		if (counter->allocated)
+			mlx4_counter_free(ibdev->dev, counter->index);
+		list_del(&counter->list);
+		kfree(counter);
+	}
+	mutex_unlock(&ctr_table->mutex);
+}
+
 int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 		   union ib_gid *gid)
 {
@@ -2131,6 +2147,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	int num_req_counters;
 	int allocated;
 	u32 counter_index;
+	struct counter_index *new_counter_index = NULL;
 
 	pr_info_once("%s", mlx4_ib_version);
 
@@ -2302,6 +2319,11 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	if (init_node_data(ibdev))
 		goto err_map;
 
+	for (i = 0; i < ibdev->num_ports; ++i) {
+		mutex_init(&ibdev->counters_table[i].mutex);
+		INIT_LIST_HEAD(&ibdev->counters_table[i].counters_list);
+	}
+
 	num_req_counters = mlx4_is_bonded(dev) ? 1 : ibdev->num_ports;
 	for (i = 0; i < num_req_counters; ++i) {
 		mutex_init(&ibdev->qp1_proxy_lock[i]);
@@ -2320,15 +2342,34 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 			counter_index = mlx4_get_default_counter_index(dev,
 								       i + 1);
 		}
-		ibdev->counters[i].index = counter_index;
-		ibdev->counters[i].allocated = allocated;
+		new_counter_index = kmalloc(sizeof(*new_counter_index),
+					    GFP_KERNEL);
+		if (!new_counter_index) {
+			if (allocated)
+				mlx4_counter_free(ibdev->dev, counter_index);
+			goto err_counter;
+		}
+		new_counter_index->index = counter_index;
+		new_counter_index->allocated = allocated;
+		list_add_tail(&new_counter_index->list,
+ 

[PATCH v1 for-next 1/7] IB/core: Extend ib_uverbs_create_qp

2015-08-20 Thread Eran Ben Elisha
ib_uverbs_ex_create_qp follows the extension verbs
mechanism. New features (for example, the QP creation flags
field, which is added in a downstream patch) can be used
via user-space libraries without breaking the ABI.

Signed-off-by: Eran Ben Elisha era...@mellanox.com
---
 drivers/infiniband/core/uverbs.h  |   1 +
 drivers/infiniband/core/uverbs_cmd.c  | 259 +-
 drivers/infiniband/core/uverbs_main.c |   1 +
 include/uapi/rdma/ib_user_verbs.h |  26 
 4 files changed, 222 insertions(+), 65 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 3863d33..94bbd8c 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -272,5 +272,6 @@ IB_UVERBS_DECLARE_EX_CMD(create_flow);
 IB_UVERBS_DECLARE_EX_CMD(destroy_flow);
 IB_UVERBS_DECLARE_EX_CMD(query_device);
 IB_UVERBS_DECLARE_EX_CMD(create_cq);
+IB_UVERBS_DECLARE_EX_CMD(create_qp);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a15318a..3cc2261 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1741,66 +1741,65 @@ ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
 	return in_len;
 }
 
-ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
-			    struct ib_device *ib_dev,
-			    const char __user *buf, int in_len,
-			    int out_len)
-{
-	struct ib_uverbs_create_qp      cmd;
-	struct ib_uverbs_create_qp_resp resp;
-	struct ib_udata                 udata;
-	struct ib_uqp_object           *obj;
-	struct ib_device               *device;
-	struct ib_pd                   *pd = NULL;
-	struct ib_xrcd                 *xrcd = NULL;
-	struct ib_uobject              *uninitialized_var(xrcd_uobj);
-	struct ib_cq                   *scq = NULL, *rcq = NULL;
-	struct ib_srq                  *srq = NULL;
-	struct ib_qp                   *qp;
-	struct ib_qp_init_attr          attr;
-	int ret;
-
-	if (out_len < sizeof resp)
-		return -ENOSPC;
-
-	if (copy_from_user(&cmd, buf, sizeof cmd))
-		return -EFAULT;
+static int create_qp(struct ib_uverbs_file *file,
+		     struct ib_udata *ucore,
+		     struct ib_udata *uhw,
+		     struct ib_uverbs_ex_create_qp *cmd,
+		     size_t cmd_sz,
+		     int (*cb)(struct ib_uverbs_file *file,
+			       struct ib_uverbs_ex_create_qp_resp *resp,
+			       struct ib_udata *udata),
+		     void *context)
+{
+	struct ib_uqp_object           *obj;
+	struct ib_device               *device;
+	struct ib_pd                   *pd = NULL;
+	struct ib_xrcd                 *xrcd = NULL;
+	struct ib_uobject              *uninitialized_var(xrcd_uobj);
+	struct ib_cq                   *scq = NULL, *rcq = NULL;
+	struct ib_srq                  *srq = NULL;
+	struct ib_qp                   *qp;
+	char                           *buf;
+	struct ib_qp_init_attr          attr;
+	struct ib_uverbs_ex_create_qp_resp resp;
+	int ret;
 
-	if (cmd.qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
+	if (cmd->qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
 		return -EPERM;
 
-	INIT_UDATA(&udata, buf + sizeof cmd,
-		   (unsigned long) cmd.response + sizeof resp,
-		   in_len - sizeof cmd, out_len - sizeof resp);
-
 	obj = kzalloc(sizeof *obj, GFP_KERNEL);
 	if (!obj)
 		return -ENOMEM;
 
-	init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext, &qp_lock_class);
+	init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext,
+		  &qp_lock_class);
 	down_write(&obj->uevent.uobject.mutex);
 
-	if (cmd.qp_type == IB_QPT_XRC_TGT) {
-		xrcd = idr_read_xrcd(cmd.pd_handle, file->ucontext, &xrcd_uobj);
+	if (cmd->qp_type == IB_QPT_XRC_TGT) {
+		xrcd = idr_read_xrcd(cmd->pd_handle, file->ucontext,
+				     &xrcd_uobj);
 		if (!xrcd) {
 			ret = -EINVAL;
 			goto err_put;
 		}
 		device = xrcd->device;
 	} else {
-		if (cmd.qp_type == IB_QPT_XRC_INI) {
-			cmd.max_recv_wr = cmd.max_recv_sge = 0;
+		if (cmd->qp_type == IB_QPT_XRC_INI) {
+			cmd->max_recv_wr = 0;
+			cmd->max_recv_sge = 0;
 		} else {
-			if (cmd.is_srq) {
-				srq = idr_read_srq(cmd.srq_handle, file->ucontext);
+			if (cmd->is_srq) {
+ 

[PATCH v1 for-next 0/7] Add support for multicast loopback prevention to mlx4

2015-08-20 Thread Eran Ben Elisha
Hi Doug,

This patch-set adds a new implementation of multicast loopback prevention for
the mlx4 driver.  The current implementation is very limited, especially when
the link layer is Ethernet. The new implementation is based on a HW feature
that drops incoming multicast packets if the sender QP counter index is equal
to the receiver QP counter index.

Patch 0001 extends ib_uverbs_create_qp in order to allow receiving the
multicast loopback flag in create_flags.
Patch 0002 adds an infrastructure for the counters' loopback prevention in
mlx4_core.
Patch 0003 modifies mlx4_en QPs to use the new loopback prevention mode.
Patches 0004-0006 implement this feature for the mlx4_ib driver.
Patch 0007 allows setting IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK in the
create_flags field both from uverbs and ib_create_qp.

Changes from v0:
  Move loopback assignment outside the for loop according to Yuval's comment
  rebase over to-be-rebased/for-4.3 


Thanks,
Eran.

Eran Ben Elisha (5):
  IB/core: Extend ib_uverbs_create_qp
  IB/core: Allow setting create flags in QP init attribute
  IB/mlx4: Add IB counters table
  IB/mlx4: Add counter based implementation for QP multicast loopback
block
  IB/mlx4: Add support for blocking multicast loopback QP creation user
flag

Maor Gottlieb (2):
  net/mlx4_core: Add support for filtering multicast loopback
  net/mlx4_en: Implement mcast loopback prevention for ETH qps

 drivers/infiniband/core/uverbs.h   |   1 +
 drivers/infiniband/core/uverbs_cmd.c   | 259 +++--
 drivers/infiniband/core/uverbs_main.c  |   1 +
 drivers/infiniband/hw/mlx4/mad.c   |  25 +-
 drivers/infiniband/hw/mlx4/main.c  |  66 --
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |  10 +-
 drivers/infiniband/hw/mlx4/qp.c|  88 ++-
 drivers/net/ethernet/mellanox/mlx4/en_main.c   |  22 ++
 drivers/net/ethernet/mellanox/mlx4/en_resources.c  |  25 ++
 drivers/net/ethernet/mellanox/mlx4/fw.c|   6 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |   3 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c|  19 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  30 ++-
 include/linux/mlx4/device.h|   2 +
 include/linux/mlx4/qp.h|  24 +-
 include/uapi/rdma/ib_user_verbs.h  |  26 +++
 16 files changed, 498 insertions(+), 109 deletions(-)

-- 
1.8.3.1



[PATCH for-next 10/10] IB/mlx5: Support RoCE

2015-08-20 Thread Achiad Shochat
Advertise RoCE support for IB/core layer and set the hardware to
work in RoCE mode.

Signed-off-by: Achiad Shochat ach...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c | 48 +++
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 27dab5d..3e2e24e8 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1586,6 +1586,32 @@ static void destroy_dev_resources(struct mlx5_ib_resources *devr)
 	mlx5_ib_dealloc_pd(devr->p0);
 }
 
+static u32 get_core_cap_flags(struct ib_device *ibdev)
+{
+	struct mlx5_ib_dev *dev = to_mdev(ibdev);
+	enum rdma_link_layer ll = mlx5_ib_port_link_layer(ibdev, 1);
+	u8 l3_type_cap = MLX5_CAP_ROCE(dev->mdev, l3_type);
+	u8 roce_version_cap = MLX5_CAP_ROCE(dev->mdev, roce_version);
+	u32 ret = 0;
+
+	if (ll == IB_LINK_LAYER_INFINIBAND)
+		return RDMA_CORE_PORT_IBA_IB;
+
+	if (!(l3_type_cap & MLX5_ROCE_L3_TYPE_IPV4_CAP))
+		return 0;
+
+	if (!(l3_type_cap & MLX5_ROCE_L3_TYPE_IPV6_CAP))
+		return 0;
+
+	if (roce_version_cap & MLX5_ROCE_VERSION_1_CAP)
+		ret |= RDMA_CORE_PORT_IBA_ROCE;
+
+	if (roce_version_cap & MLX5_ROCE_VERSION_2_CAP)
+		ret |= RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP;
+
+	return ret;
+}
+
 static int mlx5_port_immutable(struct ib_device *ibdev, u8 port_num,
			       struct ib_port_immutable *immutable)
 {
@@ -1598,7 +1624,7 @@ static int mlx5_port_immutable(struct ib_device *ibdev, u8 port_num,
 
 	immutable->pkey_tbl_len = attr.pkey_tbl_len;
 	immutable->gid_tbl_len = attr.gid_tbl_len;
-	immutable->core_cap_flags = RDMA_CORE_PORT_IBA_IB;
+	immutable->core_cap_flags = get_core_cap_flags(ibdev);
 	immutable->max_mad_size = IB_MGMT_MAD_SIZE;
 
 	return 0;
@@ -1606,13 +1632,28 @@ static int mlx5_port_immutable(struct ib_device *ibdev, u8 port_num,
 
 static int mlx5_enable_roce(struct mlx5_ib_dev *dev)
 {
+	int err;
+
 	rwlock_init(&dev->roce.netdev_lock);
 	dev->roce.nb.notifier_call = mlx5_netdev_event;
-	return register_netdevice_notifier(&dev->roce.nb);
+	err = register_netdevice_notifier(&dev->roce.nb);
+	if (err)
+		return err;
+
+	err = mlx5_nic_vport_enable_roce(dev->mdev);
+	if (err)
+		goto err_unregister_netdevice_notifier;
+
+	return 0;
+
+err_unregister_netdevice_notifier:
+	unregister_netdevice_notifier(&dev->roce.nb);
+	return err;
 }
 
 static void mlx5_disable_roce(struct mlx5_ib_dev *dev)
 {
+	mlx5_nic_vport_disable_roce(dev->mdev);
 	unregister_netdevice_notifier(&dev->roce.nb);
 }
 
@@ -1627,8 +1668,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
 	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
 
-	/* don't create IB instance over Eth ports, no RoCE yet! */
-	if (ll == IB_LINK_LAYER_ETHERNET)
+	if ((ll == IB_LINK_LAYER_ETHERNET) && !MLX5_CAP_GEN(mdev, roce))
 		return NULL;
 
 	printk_once(KERN_INFO "%s", mlx5_version);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
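[Editorial note] The mlx5_enable_roce() hunk above follows the usual kernel setup/unwind idiom: each completed step gets a matching undo label, taken in reverse order on failure. A minimal userspace sketch of that control flow, assuming hypothetical register/enable helpers that merely stand in for the notifier and firmware calls:

```c
#include <assert.h>
#include <stdbool.h>

static bool notifier_registered;
static bool roce_enabled;

static int register_notifier(void) { notifier_registered = true; return 0; }
static void unregister_notifier(void) { notifier_registered = false; }

/* fail_enable lets a caller force the second step to fail. */
static int enable_roce(bool fail_enable)
{
	if (fail_enable)
		return -1;
	roce_enabled = true;
	return 0;
}

/* Mirrors mlx5_enable_roce(): if step 2 fails, step 1 is undone. */
static int setup(bool fail_enable)
{
	int err;

	err = register_notifier();
	if (err)
		return err;

	err = enable_roce(fail_enable);
	if (err)
		goto err_unregister_notifier;

	return 0;

err_unregister_notifier:
	unregister_notifier();
	return err;
}
```

On the failure path nothing registered by setup() is left behind, which is exactly what the goto label in the patch guarantees.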


[PATCH for-next 02/10] IB/mlx5: Support IB device's callback for getting its netdev

2015-08-20 Thread Achiad Shochat
For Eth ports only.
Maintain a net device pointer in mlx5_ib_device and update it
upon NETDEV_REGISTER and NETDEV_UNREGISTER events if the
net-device and IB device have the same PCI parent device.
Implement the get_netdev callback to return this net device.

Signed-off-by: Achiad Shochat ach...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c| 64 +++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 10 ++
 2 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 8540e00..5a176d7 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -85,6 +85,41 @@ mlx5_ib_port_link_layer(struct ib_device *device, u8 port_num)
 	return mlx5_port_type_cap_to_rdma_ll(port_type_cap);
 }
 
+static int mlx5_netdev_event(struct notifier_block *this,
+			     unsigned long event, void *ptr)
+{
+	struct net_device *ndev = netdev_notifier_info_to_dev(ptr);
+	struct mlx5_ib_dev *ibdev = container_of(this, struct mlx5_ib_dev,
+						 roce.nb);
+
+	if ((event != NETDEV_UNREGISTER) && (event != NETDEV_REGISTER))
+		return NOTIFY_DONE;
+
+	write_lock(&ibdev->roce.netdev_lock);
+	if (ndev->dev.parent == &ibdev->mdev->pdev->dev)
+		ibdev->roce.netdev = (event == NETDEV_UNREGISTER) ? NULL : ndev;
+	write_unlock(&ibdev->roce.netdev_lock);
+
+	return NOTIFY_DONE;
+}
+
+static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
+					     u8 port_num)
+{
+	struct mlx5_ib_dev *ibdev = to_mdev(device);
+	struct net_device *ndev;
+
+	/* Ensure ndev does not disappear before we invoke dev_hold()
+	 */
+	read_lock(&ibdev->roce.netdev_lock);
+	ndev = ibdev->roce.netdev;
+	if (ndev)
+		dev_hold(ndev);
+	read_unlock(&ibdev->roce.netdev_lock);
+
+	return ndev;
+}
+
 static int mlx5_use_mad_ifc(struct mlx5_ib_dev *dev)
 {
 	return !dev->mdev->issi;
@@ -1398,6 +1433,18 @@ static int mlx5_port_immutable(struct ib_device *ibdev, u8 port_num,
 	return 0;
 }
 
+static int mlx5_enable_roce(struct mlx5_ib_dev *dev)
+{
+	rwlock_init(&dev->roce.netdev_lock);
+	dev->roce.nb.notifier_call = mlx5_netdev_event;
+	return register_netdevice_notifier(&dev->roce.nb);
+}
+
+static void mlx5_disable_roce(struct mlx5_ib_dev *dev)
+{
+	unregister_netdevice_notifier(&dev->roce.nb);
+}
+
 static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 {
 	struct mlx5_ib_dev *dev;
@@ -1471,6 +1518,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.query_device	= mlx5_ib_query_device;
 	dev->ib_dev.query_port		= mlx5_ib_query_port;
 	dev->ib_dev.get_link_layer	= mlx5_ib_port_link_layer;
+	if (ll == IB_LINK_LAYER_ETHERNET)
+		dev->ib_dev.get_netdev	= mlx5_ib_get_netdev;
 	dev->ib_dev.query_gid		= mlx5_ib_query_gid;
 	dev->ib_dev.query_pkey		= mlx5_ib_query_pkey;
 	dev->ib_dev.modify_device	= mlx5_ib_modify_device;
@@ -1530,9 +1579,15 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 
 	mutex_init(&dev->cap_mask_mutex);
 
+	if (ll == IB_LINK_LAYER_ETHERNET) {
+		err = mlx5_enable_roce(dev);
+		if (err)
+			goto err_dealloc;
+	}
+
 	err = create_dev_resources(&dev->devr);
 	if (err)
-		goto err_dealloc;
+		goto err_disable_roce;
 
 	err = mlx5_ib_odp_init_one(dev);
 	if (err)
@@ -1569,6 +1624,10 @@ err_odp:
 err_rsrc:
 	destroy_dev_resources(&dev->devr);
 
+err_disable_roce:
+	if (ll == IB_LINK_LAYER_ETHERNET)
+		mlx5_disable_roce(dev);
+
 err_dealloc:
 	ib_dealloc_device((struct ib_device *)dev);
 
@@ -1578,11 +1637,14 @@ err_dealloc:
 static void mlx5_ib_remove(struct mlx5_core_dev *mdev, void *context)
 {
 	struct mlx5_ib_dev *dev = context;
+	enum rdma_link_layer ll = mlx5_ib_port_link_layer(&dev->ib_dev, 1);
 
 	ib_unregister_device(&dev->ib_dev);
 	destroy_umrc_res(dev);
 	mlx5_ib_odp_remove_one(dev);
 	destroy_dev_resources(&dev->devr);
+	if (ll == IB_LINK_LAYER_ETHERNET)
+		mlx5_disable_roce(dev);
 	ib_dealloc_device(&dev->ib_dev);
 }
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 7cae098..81df6d4 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -418,9 +418,19 @@ struct mlx5_ib_resources {
 	struct ib_srq	*s1;
 };
 
+struct mlx5_roce {
+	/* Protect mlx5_ib_get_netdev from invoking dev_hold() with a NULL
+	 * netdev pointer
+	 */
+	rwlock_t		netdev_lock;
+	struct net_device	*netdev;
+	struct notifier_block	nb;
+};

[PATCH for-next 07/10] IB/mlx5: Set network_hdr_type upon RoCE responder completion

2015-08-20 Thread Achiad Shochat
When handling a responder completion, if the link layer is Ethernet,
set the work completion's network_hdr_type field according to the CQE's
info and set the IB_WC_WITH_NETWORK_HDR_TYPE flag.

Signed-off-by: Achiad Shochat ach...@mellanox.com
---
 drivers/infiniband/hw/mlx5/cq.c | 17 +
 include/linux/mlx5/device.h |  6 ++
 2 files changed, 23 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 5c9eeea..bd6c738 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -170,6 +170,7 @@ enum {
 static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
 			     struct mlx5_ib_qp *qp)
 {
+	enum rdma_link_layer ll = rdma_port_get_link_layer(qp->ibqp.device, 1);
 	struct mlx5_ib_dev *dev = to_mdev(qp->ibqp.device);
 	struct mlx5_ib_srq *srq;
 	struct mlx5_ib_wq *wq;
@@ -228,6 +229,22 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
 	g = (be32_to_cpu(cqe->flags_rqpn) >> 28) & 3;
 	wc->wc_flags |= g ? IB_WC_GRH : 0;
 	wc->pkey_index = be32_to_cpu(cqe->imm_inval_pkey) & 0xffff;
+
+	if (ll != IB_LINK_LAYER_ETHERNET)
+		return;
+
+	switch (wc->sl & 0x3) {
+	case MLX5_CQE_ROCE_L3_HEADER_TYPE_GRH:
+		wc->network_hdr_type = RDMA_NETWORK_IB;
+		break;
+	case MLX5_CQE_ROCE_L3_HEADER_TYPE_IPV6:
+		wc->network_hdr_type = RDMA_NETWORK_IPV6;
+		break;
+	case MLX5_CQE_ROCE_L3_HEADER_TYPE_IPV4:
+		wc->network_hdr_type = RDMA_NETWORK_IPV4;
+		break;
+	}
+	wc->wc_flags |= IB_WC_WITH_NETWORK_HDR_TYPE;
 }
 
 static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index b943cd9..294 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -628,6 +628,12 @@ enum {
 };
 
 enum {
+	MLX5_CQE_ROCE_L3_HEADER_TYPE_GRH	= 0x0,
+	MLX5_CQE_ROCE_L3_HEADER_TYPE_IPV6	= 0x1,
+	MLX5_CQE_ROCE_L3_HEADER_TYPE_IPV4	= 0x2,
+};
+
+enum {
 	CQE_L2_OK	= 1 << 0,
 	CQE_L3_OK	= 1 << 1,
 	CQE_L4_OK	= 1 << 2,
-- 
1.8.3.1

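[Editorial note] The switch added to handle_responder() above reads the L3 header type from the two low bits of the CQE's sl field. A compact sketch of that mapping; the NET_* values here are illustrative stand-ins for the kernel's RDMA_NETWORK_* constants:

```c
#include <assert.h>

/* CQE L3 header types, matching the device.h hunk above. */
enum { L3_GRH = 0x0, L3_IPV6 = 0x1, L3_IPV4 = 0x2 };

/* Illustrative stand-ins for RDMA_NETWORK_IB/IPV6/IPV4. */
enum { NET_IB, NET_IPV6, NET_IPV4, NET_UNKNOWN };

static int cqe_sl_to_network_hdr_type(unsigned char sl)
{
	switch (sl & 0x3) {	/* only the two low bits encode the type */
	case L3_GRH:  return NET_IB;
	case L3_IPV6: return NET_IPV6;
	case L3_IPV4: return NET_IPV4;
	default:      return NET_UNKNOWN;
	}
}
```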


[PATCH for-next 06/10] IB/mlx5: Extend query_device/port to support RoCE

2015-08-20 Thread Achiad Shochat
Use the vport access functions to retrieve the Ethernet-specific
information and return it in ib_query_device and ib_query_port.

Signed-off-by: Achiad Shochat ach...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c | 75 +++
 include/linux/mlx5/driver.h   |  7 
 2 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 5a176d7..612dc3a 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -40,6 +40,7 @@
 #include <linux/io-mapping.h>
 #include <linux/sched.h>
 #include <rdma/ib_user_verbs.h>
+#include <rdma/ib_addr.h>
 #include <linux/mlx5/vport.h>
 #include <rdma/ib_smi.h>
 #include <rdma/ib_umem.h>
@@ -120,6 +121,50 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
 	return ndev;
 }
 
+static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
+				struct ib_port_attr *props)
+{
+	struct mlx5_ib_dev *dev = to_mdev(device);
+	struct net_device *ndev;
+	enum ib_mtu ndev_ib_mtu;
+
+	memset(props, 0, sizeof(*props));
+
+	props->port_cap_flags  |= IB_PORT_CM_SUP;
+	props->port_cap_flags  |= IB_PORT_IP_BASED_GIDS;
+
+	props->gid_tbl_len      = MLX5_CAP_ROCE(dev->mdev,
+						roce_address_table_size);
+	props->max_mtu          = IB_MTU_4096;
+	props->max_msg_sz       = 1 << MLX5_CAP_GEN(dev->mdev, log_max_msg);
+	props->pkey_tbl_len     = 1;
+	props->state            = IB_PORT_DOWN;
+	props->phys_state       = 3;
+
+	mlx5_query_nic_vport_qkey_viol_cntr(dev->mdev,
+					    (u16 *)&props->qkey_viol_cntr);
+
+	ndev = mlx5_ib_get_netdev(device, port_num);
+	if (!ndev)
+		return 0;
+
+	if (netif_running(ndev) && netif_carrier_ok(ndev)) {
+		props->state      = IB_PORT_ACTIVE;
+		props->phys_state = 5;
+	}
+
+	ndev_ib_mtu = iboe_get_mtu(ndev->mtu);
+
+	dev_put(ndev);
+
+	props->active_mtu	= min(props->max_mtu, ndev_ib_mtu);
+
+	props->active_width	= IB_WIDTH_4X;  /* TODO */
+	props->active_speed	= IB_SPEED_QDR; /* TODO */
+
+	return 0;
+}
+
 static int mlx5_use_mad_ifc(struct mlx5_ib_dev *dev)
 {
 	return !dev->mdev->issi;
@@ -158,13 +203,21 @@ static int mlx5_query_system_image_guid(struct ib_device *ibdev,
 
 	case MLX5_VPORT_ACCESS_METHOD_HCA:
 		err = mlx5_query_hca_vport_system_image_guid(mdev, &tmp);
-		if (!err)
-			*sys_image_guid = cpu_to_be64(tmp);
-		return err;
+		break;
+
+	case MLX5_VPORT_ACCESS_METHOD_NIC:
+		err = mlx5_query_nic_vport_system_image_guid(mdev, &tmp);
+		break;
 
 	default:
 		return -EINVAL;
 	}
+
+	if (!err)
+		*sys_image_guid = cpu_to_be64(tmp);
+
+	return err;
+
 }
 
 static int mlx5_query_max_pkeys(struct ib_device *ibdev,
@@ -218,13 +271,20 @@ static int mlx5_query_node_guid(struct mlx5_ib_dev *dev,
 
 	case MLX5_VPORT_ACCESS_METHOD_HCA:
 		err = mlx5_query_hca_vport_node_guid(dev->mdev, &tmp);
-		if (!err)
-			*node_guid = cpu_to_be64(tmp);
-		return err;
+		break;
+
+	case MLX5_VPORT_ACCESS_METHOD_NIC:
+		err = mlx5_query_nic_vport_node_guid(dev->mdev, &tmp);
+		break;
 
 	default:
 		return -EINVAL;
 	}
+
+	if (!err)
+		*node_guid = cpu_to_be64(tmp);
+
+	return err;
 }
 
 struct mlx5_reg_node_desc {
@@ -521,6 +581,9 @@ int mlx5_ib_query_port(struct ib_device *ibdev, u8 port,
 	case MLX5_VPORT_ACCESS_METHOD_HCA:
 		return mlx5_query_hca_port(ibdev, port, props);
 
+	case MLX5_VPORT_ACCESS_METHOD_NIC:
+		return mlx5_query_port_roce(ibdev, port, props);
+
 	default:
 		return -EINVAL;
 	}
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 5722d88..74e833d 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -600,13 +600,6 @@ extern struct workqueue_struct *mlx5_core_wq;
 	.struct_offset_bytes = offsetof(struct ib_unpacked_ ## header, field),	\
 	.struct_size_bytes   = sizeof((struct ib_unpacked_ ## header *)0)->field
 
-struct ib_field {
-	size_t struct_offset_bytes;
-	size_t struct_size_bytes;
-	int    offset_bits;
-	int    size_bits;
-};
-
 static inline struct mlx5_core_dev *pci2mlx5_core_dev(struct pci_dev *pdev)
 {
 	return pci_get_drvdata(pdev);
-- 
1.8.3.1

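[Editorial note] The refactor above hoists the common `if (!err) *guid = cpu_to_be64(tmp);` tail out of the switch, so each access method only fills the temporary. A small sketch of the resulting shape; the method constants and query helpers here are stand-ins, not the mlx5 API:

```c
#include <assert.h>
#include <stdint.h>

enum { METHOD_MAD, METHOD_HCA, METHOD_NIC };

static int query_hca_guid(uint64_t *v) { *v = 0x1111; return 0; }
static int query_nic_guid(uint64_t *v) { *v = 0x2222; return 0; }

/* Each case only fetches tmp; the common tail stores the result once
 * (the kernel code also byte-swaps with cpu_to_be64() at that point). */
static int query_guid(int method, uint64_t *guid)
{
	uint64_t tmp;
	int err;

	switch (method) {
	case METHOD_HCA:
		err = query_hca_guid(&tmp);
		break;
	case METHOD_NIC:
		err = query_nic_guid(&tmp);
		break;
	default:
		return -1;
	}

	if (!err)
		*guid = tmp;
	return err;
}
```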

[PATCH for-next 09/10] IB/mlx5: Add RoCE fields to Address Vector

2015-08-20 Thread Achiad Shochat
Set the address handle and QP address path fields according to the
link layer type (IB/Eth).

Signed-off-by: Achiad Shochat ach...@mellanox.com
---
 drivers/infiniband/hw/mlx5/ah.c  | 32 +--
 drivers/infiniband/hw/mlx5/main.c| 21 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  5 +++--
 drivers/infiniband/hw/mlx5/qp.c  | 42 ++--
 include/linux/mlx5/qp.h  | 21 --
 5 files changed, 96 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
index 6608058..745efa4 100644
--- a/drivers/infiniband/hw/mlx5/ah.c
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -32,8 +32,10 @@
 
 #include "mlx5_ib.h"
 
-struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr,
-			   struct mlx5_ib_ah *ah)
+static struct ib_ah *create_ib_ah(struct mlx5_ib_dev *dev,
+				  struct mlx5_ib_ah *ah,
+				  struct ib_ah_attr *ah_attr,
+				  enum rdma_link_layer ll)
 {
 	if (ah_attr->ah_flags & IB_AH_GRH) {
 		memcpy(ah->av.rgid, &ah_attr->grh.dgid, 16);
@@ -44,9 +46,20 @@ struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr,
 		ah->av.tclass = ah_attr->grh.traffic_class;
 	}
 
-	ah->av.rlid = cpu_to_be16(ah_attr->dlid);
-	ah->av.fl_mlid = ah_attr->src_path_bits & 0x7f;
-	ah->av.stat_rate_sl = (ah_attr->static_rate << 4) | (ah_attr->sl & 0xf);
+	ah->av.stat_rate_sl = (ah_attr->static_rate << 4);
+
+	if (ll == IB_LINK_LAYER_ETHERNET) {
+		memcpy(ah->av.rmac, ah_attr->dmac, sizeof(ah_attr->dmac));
+		ah->av.udp_sport =
+			mlx5_get_roce_udp_sport(dev,
+						ah_attr->port_num,
+						ah_attr->grh.sgid_index);
+		ah->av.stat_rate_sl |= (ah_attr->sl & 0x7) << 1;
+	} else {
+		ah->av.rlid = cpu_to_be16(ah_attr->dlid);
+		ah->av.fl_mlid = ah_attr->src_path_bits & 0x7f;
+		ah->av.stat_rate_sl |= (ah_attr->sl & 0xf);
+	}
 
 	return &ah->ibah;
 }
@@ -54,12 +67,19 @@ struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr,
 struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 {
 	struct mlx5_ib_ah *ah;
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	enum rdma_link_layer ll;
+
+	ll = pd->device->get_link_layer(pd->device, ah_attr->port_num);
+
+	if (ll == IB_LINK_LAYER_ETHERNET && !(ah_attr->ah_flags & IB_AH_GRH))
+		return ERR_PTR(-EINVAL);
 
 	ah = kzalloc(sizeof(*ah), GFP_ATOMIC);
 	if (!ah)
 		return ERR_PTR(-ENOMEM);
 
-	return create_ib_ah(ah_attr, ah); /* never fails */
+	return create_ib_ah(dev, ah, ah_attr, ll); /* never fails */
 }
 
 int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 975e332..27dab5d 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -41,6 +41,7 @@
 #include <linux/sched.h>
 #include <rdma/ib_user_verbs.h>
 #include <rdma/ib_addr.h>
+#include <rdma/ib_cache.h>
 #include <linux/mlx5/vport.h>
 #include <rdma/ib_smi.h>
 #include <rdma/ib_umem.h>
@@ -252,6 +253,26 @@ static int mlx5_ib_del_gid(struct ib_device *device, u8 port_num,
 	return set_roce_addr(device, port_num, index, NULL, NULL);
 }
 
+__be16 mlx5_get_roce_udp_sport(struct mlx5_ib_dev *dev, u8 port_num,
+			       int index)
+{
+	struct ib_gid_attr attr;
+	union ib_gid gid;
+
+	if (ib_get_cached_gid(&dev->ib_dev, port_num, index, &gid, &attr))
+		return 0;
+
+	if (!attr.ndev)
+		return 0;
+
+	dev_put(attr.ndev);
+
+	if (attr.gid_type != IB_GID_TYPE_ROCE_UDP_ENCAP)
+		return 0;
+
+	return cpu_to_be16(MLX5_CAP_ROCE(dev->mdev, r_roce_min_src_udp_port));
+}
+
 static int mlx5_use_mad_ifc(struct mlx5_ib_dev *dev)
 {
 	return !dev->mdev->issi;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 81df6d4..c5704f2 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -538,8 +538,6 @@ void mlx5_ib_free_srq_wqe(struct mlx5_ib_srq *srq, int wqe_index);
 int mlx5_MAD_IFC(struct mlx5_ib_dev *dev, int ignore_mkey, int ignore_bkey,
		 u8 port, const struct ib_wc *in_wc, const struct ib_grh *in_grh,
		 const void *in_mad, void *response_mad);
-struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr,
-			   struct mlx5_ib_ah *ah);
 struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
 int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr);
 int mlx5_ib_destroy_ah(struct ib_ah *ah);
@@ -676,6 +674,9 @@ static 

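[Editorial note] In the ah.c hunk above, stat_rate_sl packs the static rate into the high nibble and the SL into the low bits, with the 3-bit Ethernet SL shifted up one bit ((sl & 0x7) << 1) while the 4-bit IB SL occupies the low nibble (sl & 0xf). A sketch of that packing:

```c
#include <assert.h>

/* Mirrors the stat_rate_sl packing in create_ib_ah() above. */
static unsigned char pack_stat_rate_sl(unsigned rate, unsigned sl, int is_eth)
{
	unsigned char v = (unsigned char)(rate << 4);

	if (is_eth)
		v |= (sl & 0x7) << 1;	/* 3-bit Ethernet SL at bits 3:1 */
	else
		v |= (sl & 0xf);	/* 4-bit IB SL at bits 3:0 */
	return v;
}
```

The Ethernet encoding leaves bit 0 free, which is why the same byte can carry both layouts.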
[PATCH for-next 04/10] net/mlx5_core: Introduce access functions to enable/disable RoCE

2015-08-20 Thread Achiad Shochat
A mlx5 Ethernet port must be explicitly enabled for RoCE.
When RoCE is not enabled on the port, the NIC will refuse to create
QPs attached to it and incoming RoCE packets will be considered by the
NIC as plain Ethernet packets.

Signed-off-by: Achiad Shochat ach...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/vport.c | 52 +
 include/linux/mlx5/vport.h  |  3 ++
 2 files changed, 55 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 54ab63b..245ff4a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -70,6 +70,17 @@ static int mlx5_query_nic_vport_context(struct mlx5_core_dev *mdev, u32 *out,
 	return mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out, outlen);
 }
 
+static int mlx5_modify_nic_vport_context(struct mlx5_core_dev *mdev, void *in,
+					 int inlen)
+{
+	u32 out[MLX5_ST_SZ_DW(modify_nic_vport_context_out)];
+
+	MLX5_SET(modify_nic_vport_context_in, in, opcode,
+		 MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
+
+	return mlx5_cmd_exec_check_status(mdev, in, inlen, out, sizeof(out));
+}
+
 void mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev, u8 *addr)
 {
 	u32 *out;
@@ -350,3 +361,44 @@ int mlx5_query_hca_vport_node_guid(struct mlx5_core_dev *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_hca_vport_node_guid);
+
+enum mlx5_vport_roce_state {
+	MLX5_VPORT_ROCE_DISABLED = 0,
+	MLX5_VPORT_ROCE_ENABLED  = 1,
+};
+
+static int mlx5_nic_vport_update_roce_state(struct mlx5_core_dev *mdev,
+					    enum mlx5_vport_roce_state state)
+{
+	void *in;
+	int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+	int err;
+
+	in = mlx5_vzalloc(inlen);
+	if (!in) {
+		mlx5_core_warn(mdev, "failed to allocate inbox\n");
+		return -ENOMEM;
+	}
+
+	MLX5_SET(modify_nic_vport_context_in, in, field_select.roce_en, 1);
+	MLX5_SET(modify_nic_vport_context_in, in, nic_vport_context.roce_en,
+		 state);
+
+	err = mlx5_modify_nic_vport_context(mdev, in, inlen);
+
+	kvfree(in);
+
+	return err;
+}
+
+int mlx5_nic_vport_enable_roce(struct mlx5_core_dev *mdev)
+{
+	return mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_ENABLED);
+}
+EXPORT_SYMBOL_GPL(mlx5_nic_vport_enable_roce);
+
+int mlx5_nic_vport_disable_roce(struct mlx5_core_dev *mdev)
+{
+	return mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_DISABLED);
+}
+EXPORT_SYMBOL_GPL(mlx5_nic_vport_disable_roce);
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index 967e0fd..4c9ac60 100644
--- a/include/linux/mlx5/vport.h
+++ b/include/linux/mlx5/vport.h
@@ -52,4 +52,7 @@ int mlx5_query_hca_vport_system_image_guid(struct mlx5_core_dev *dev,
 int mlx5_query_hca_vport_node_guid(struct mlx5_core_dev *dev,
				   u64 *node_guid);
 
+int mlx5_nic_vport_enable_roce(struct mlx5_core_dev *mdev);
+int mlx5_nic_vport_disable_roce(struct mlx5_core_dev *mdev);
+
 #endif /* __MLX5_VPORT_H__ */
-- 
1.8.3.1

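[Editorial note] The field_select bitmask set by mlx5_nic_vport_update_roce_state() above tells the firmware which vport context fields the command may touch, so toggling roce_en leaves every other field alone. A simplified sketch of that masked-update pattern; the field layout here is illustrative, not the real nic_vport_context:

```c
#include <assert.h>
#include <stdint.h>

#define FS_ROCE_EN	(1u << 0)
#define FS_MTU		(1u << 1)

struct vport_ctx {
	uint32_t roce_en;
	uint32_t mtu;
};

/* Only fields named in field_select are copied from the input. */
static void modify_vport_ctx(struct vport_ctx *hw,
			     const struct vport_ctx *in,
			     uint32_t field_select)
{
	if (field_select & FS_ROCE_EN)
		hw->roce_en = in->roce_en;
	if (field_select & FS_MTU)
		hw->mtu = in->mtu;
}
```

This is why the patch can pass a zero-filled input buffer and set just two fields: everything outside the select mask is ignored by the device.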


[PATCH for-next 00/10] Add RoCE support to the mlx5 driver

2015-08-20 Thread Achiad Shochat
Hi Doug,

This patchset adds RoCE V1 and RoCE V2 support to the mlx5 device
driver.

This patchset was applied and tested over the patchset "Add RoCE v2
support" which was sent to the mailing list by Matan Barak.

Achiad Shochat (10):
  IB/mlx5: Support IB device's callback for getting the link layer
  IB/mlx5: Support IB device's callback for getting its netdev
  net/mlx5_core: Break down the vport mac address query function
  net/mlx5_core: Introduce access functions to enable/disable RoCE
  net/mlx5_core: Introduce access functions to query vport RoCE fields
  IB/mlx5: Extend query_device/port to support RoCE
  IB/mlx5: Set network_hdr_type upon RoCE responder completion
  IB/mlx5: Support IB device's callbacks for adding/deleting GIDs
  IB/mlx5: Add RoCE fields to Address Vector
  IB/mlx5: Support RoCE

 drivers/infiniband/hw/mlx5/ah.c |  32 ++-
 drivers/infiniband/hw/mlx5/cq.c |  17 ++
 drivers/infiniband/hw/mlx5/main.c   | 318 ++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h|  15 +-
 drivers/infiniband/hw/mlx5/qp.c |  42 +++-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c | 139 ++-
 include/linux/mlx5/device.h |  26 ++
 include/linux/mlx5/driver.h |   7 -
 include/linux/mlx5/mlx5_ifc.h   |  10 +-
 include/linux/mlx5/qp.h |  21 +-
 include/linux/mlx5/vport.h  |   8 +
 11 files changed, 578 insertions(+), 57 deletions(-)

-- 
1.8.3.1



[PATCH for-next 03/10] net/mlx5_core: Break down the vport mac address query function

2015-08-20 Thread Achiad Shochat
Introduce a new function called mlx5_query_nic_vport_context().
This function gets all the NIC vport attributes from the device.

The MAC address is just one of the NIC vport attributes, so
mlx5_query_nic_vport_mac_address() is now just a wrapper function
around mlx5_query_nic_vport_context().

More NIC vport attributes will be used in following commits.

Signed-off-by: Achiad Shochat ach...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/vport.c | 27 -
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index b94177e..54ab63b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -57,12 +57,25 @@ u8 mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod)
 }
 EXPORT_SYMBOL(mlx5_query_vport_state);
 
+static int mlx5_query_nic_vport_context(struct mlx5_core_dev *mdev, u32 *out,
+					int outlen)
+{
+	u32 in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
+
+	memset(in, 0, sizeof(in));
+
+	MLX5_SET(query_nic_vport_context_in, in, opcode,
+		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+
+	return mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out, outlen);
+}
+
 void mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev, u8 *addr)
 {
-	u32  in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
 	u32 *out;
 	int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
 	u8 *out_addr;
+	int err;
 
 	out = mlx5_vzalloc(outlen);
 	if (!out)
@@ -71,15 +84,9 @@ void mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev, u8 *addr)
 	out_addr = MLX5_ADDR_OF(query_nic_vport_context_out, out,
				nic_vport_context.permanent_address);
 
-	memset(in, 0, sizeof(in));
-
-	MLX5_SET(query_nic_vport_context_in, in, opcode,
-		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
-
-	memset(out, 0, outlen);
-	mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out, outlen);
-
-	ether_addr_copy(addr, &out_addr[2]);
+	err = mlx5_query_nic_vport_context(mdev, out, outlen);
+	if (!err)
+		ether_addr_copy(addr, &out_addr[2]);
 
 	kvfree(out);
 }
-- 
1.8.3.1

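[Editorial note] The shape of the refactor above — one helper fetching the whole NIC vport context, with attribute getters as thin wrappers that pick out a field — can be sketched as follows; the struct fields and values are made up for illustration:

```c
#include <assert.h>
#include <string.h>

struct nic_vport_ctx {
	unsigned char permanent_address[6];
	unsigned int qkey_viol_cntr;
};

/* Stand-in for the device; real code issues QUERY_NIC_VPORT_CONTEXT. */
static struct nic_vport_ctx hw_ctx = {
	{ 0xaa, 0xbb, 0xcc, 0x11, 0x22, 0x33 }, 7
};

static int query_nic_vport_context(struct nic_vport_ctx *out)
{
	*out = hw_ctx;
	return 0;
}

/* Thin wrapper over the generic query, like the refactored MAC getter. */
static int query_nic_vport_mac_address(unsigned char addr[6])
{
	struct nic_vport_ctx ctx;
	int err = query_nic_vport_context(&ctx);

	if (!err)
		memcpy(addr, ctx.permanent_address, 6);
	return err;
}
```

Later attribute getters (qkey violation counter, GUIDs) reuse the same context query instead of each issuing its own firmware command.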


[PATCH for-next 08/10] IB/mlx5: Support IB device's callbacks for adding/deleting GIDs

2015-08-20 Thread Achiad Shochat
These callbacks write into the mlx5 RoCE address table.
Upon del_gid we write a zero'd GID.

Signed-off-by: Achiad Shochat ach...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c | 89 +++
 include/linux/mlx5/device.h   | 20 +
 2 files changed, 109 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 612dc3a..975e332 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -165,6 +165,93 @@ static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
 	return 0;
 }
 
+static void ib_gid_to_mlx5_roce_addr(const union ib_gid *gid,
+				     const struct ib_gid_attr *attr,
+				     void *mlx5_addr)
+{
+#define MLX5_SET_RA(p, f, v) MLX5_SET(roce_addr_layout, p, f, v)
+	char *mlx5_addr_l3_addr	= MLX5_ADDR_OF(roce_addr_layout, mlx5_addr,
+					       source_l3_address);
+	void *mlx5_addr_mac	= MLX5_ADDR_OF(roce_addr_layout, mlx5_addr,
+					       source_mac_47_32);
+
+	if (!gid)
+		return;
+
+	ether_addr_copy(mlx5_addr_mac, attr->ndev->dev_addr);
+
+	if (is_vlan_dev(attr->ndev)) {
+		MLX5_SET_RA(mlx5_addr, vlan_valid, 1);
+		MLX5_SET_RA(mlx5_addr, vlan_id, vlan_dev_vlan_id(attr->ndev));
+	}
+
+	switch (attr->gid_type) {
+	case IB_GID_TYPE_IB:
+		MLX5_SET_RA(mlx5_addr, roce_version, MLX5_ROCE_VERSION_1);
+		break;
+	case IB_GID_TYPE_ROCE_UDP_ENCAP:
+		MLX5_SET_RA(mlx5_addr, roce_version, MLX5_ROCE_VERSION_2);
+		break;
+
+	default:
+		WARN_ON(true);
+	}
+
+	if (attr->gid_type != IB_GID_TYPE_IB) {
+		if (ipv6_addr_v4mapped((void *)gid))
+			MLX5_SET_RA(mlx5_addr, roce_l3_type,
+				    MLX5_ROCE_L3_TYPE_IPV4);
+		else
+			MLX5_SET_RA(mlx5_addr, roce_l3_type,
+				    MLX5_ROCE_L3_TYPE_IPV6);
+	}
+
+	if ((attr->gid_type == IB_GID_TYPE_IB) ||
+	    !ipv6_addr_v4mapped((void *)gid))
+		memcpy(mlx5_addr_l3_addr, gid, sizeof(*gid));
+	else
+		memcpy(&mlx5_addr_l3_addr[12], &gid->raw[12], 4);
+}
+
+static int set_roce_addr(struct ib_device *device, u8 port_num,
+			 unsigned int index,
+			 const union ib_gid *gid,
+			 const struct ib_gid_attr *attr)
+{
+	struct mlx5_ib_dev *dev = to_mdev(device);
+	u32  in[MLX5_ST_SZ_DW(set_roce_address_in)];
+	u32 out[MLX5_ST_SZ_DW(set_roce_address_out)];
+	void *in_addr = MLX5_ADDR_OF(set_roce_address_in, in, roce_address);
+	enum rdma_link_layer ll = mlx5_ib_port_link_layer(device, port_num);
+
+	if (ll != IB_LINK_LAYER_ETHERNET)
+		return -EINVAL;
+
+	memset(in, 0, sizeof(in));
+
+	ib_gid_to_mlx5_roce_addr(gid, attr, in_addr);
+
+	MLX5_SET(set_roce_address_in, in, roce_address_index, index);
+	MLX5_SET(set_roce_address_in, in, opcode, MLX5_CMD_OP_SET_ROCE_ADDRESS);
+
+	memset(out, 0, sizeof(out));
+	return mlx5_cmd_exec(dev->mdev, in, sizeof(in), out, sizeof(out));
+}
+
+static int mlx5_ib_add_gid(struct ib_device *device, u8 port_num,
+			   unsigned int index, const union ib_gid *gid,
+			   const struct ib_gid_attr *attr,
+			   __always_unused void **context)
+{
+	return set_roce_addr(device, port_num, index, gid, attr);
+}
+
+static int mlx5_ib_del_gid(struct ib_device *device, u8 port_num,
+			   unsigned int index, __always_unused void **context)
+{
+	return set_roce_addr(device, port_num, index, NULL, NULL);
+}
+
 static int mlx5_use_mad_ifc(struct mlx5_ib_dev *dev)
 {
 	return !dev->mdev->issi;
@@ -1584,6 +1671,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	if (ll == IB_LINK_LAYER_ETHERNET)
 		dev->ib_dev.get_netdev	= mlx5_ib_get_netdev;
 	dev->ib_dev.query_gid		= mlx5_ib_query_gid;
+	dev->ib_dev.add_gid		= mlx5_ib_add_gid;
+	dev->ib_dev.del_gid		= mlx5_ib_del_gid;
 	dev->ib_dev.query_pkey		= mlx5_ib_query_pkey;
 	dev->ib_dev.modify_device	= mlx5_ib_modify_device;
 	dev->ib_dev.modify_port		= mlx5_ib_modify_port;
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 294..90406b4 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -279,6 +279,26 @@ enum {
 };
 
 enum {
+	MLX5_ROCE_VERSION_1	= 0,
+	MLX5_ROCE_VERSION_2	= 2,
+};
+
+enum {
+	MLX5_ROCE_VERSION_1_CAP	= 1 << MLX5_ROCE_VERSION_1,
+	MLX5_ROCE_VERSION_2_CAP	= 1 << MLX5_ROCE_VERSION_2,
+};

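[Editorial note] The ipv6_addr_v4mapped() checks above distinguish IPv4 GIDs, which RoCE stores as IPv4-mapped IPv6 addresses (::ffff:a.b.c.d); only the last 4 bytes are copied in that case. A sketch of the test itself:

```c
#include <assert.h>
#include <string.h>

/* A RoCE GID carrying an IPv4 address is an IPv4-mapped IPv6 address:
 * 80 zero bits, 16 one bits, then the 4 IPv4 bytes (RFC 4291). This is
 * the property the ipv6_addr_v4mapped() checks in the patch rely on. */
static int gid_is_v4mapped(const unsigned char gid[16])
{
	static const unsigned char prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};
	return memcmp(gid, prefix, 12) == 0;
}
```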
[PATCH for-next 01/10] IB/mlx5: Support IB device's callback for getting the link layer

2015-08-20 Thread Achiad Shochat
Make the existing mlx5_ib_port_link_layer() signature match
the ib device callback signature (add a port_num parameter).
Refactor it to use a sub-function so that the link layer can
be queried even before the ibdev is created.

Signed-off-by: Achiad Shochat ach...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 085c24b..8540e00 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -64,11 +64,9 @@ static char mlx5_version[] =
	DRIVER_VERSION " (" DRIVER_RELDATE ")\n";
 
 static enum rdma_link_layer
-mlx5_ib_port_link_layer(struct ib_device *device)
+mlx5_port_type_cap_to_rdma_ll(int port_type_cap)
 {
-	struct mlx5_ib_dev *dev = to_mdev(device);
-
-	switch (MLX5_CAP_GEN(dev->mdev, port_type)) {
+	switch (port_type_cap) {
 	case MLX5_CAP_PORT_TYPE_IB:
 		return IB_LINK_LAYER_INFINIBAND;
 	case MLX5_CAP_PORT_TYPE_ETH:
@@ -78,6 +76,15 @@ mlx5_ib_port_link_layer(struct ib_device *device)
 	}
 }
 
+static enum rdma_link_layer
+mlx5_ib_port_link_layer(struct ib_device *device, u8 port_num)
+{
+	struct mlx5_ib_dev *dev = to_mdev(device);
+	int port_type_cap = MLX5_CAP_GEN(dev->mdev, port_type);
+
+	return mlx5_port_type_cap_to_rdma_ll(port_type_cap);
+}
+
 static int mlx5_use_mad_ifc(struct mlx5_ib_dev *dev)
 {
 	return !dev->mdev->issi;
@@ -94,7 +101,7 @@ static int mlx5_get_vport_access_method(struct ib_device *ibdev)
 	if (mlx5_use_mad_ifc(to_mdev(ibdev)))
 		return MLX5_VPORT_ACCESS_METHOD_MAD;
 
-	if (mlx5_ib_port_link_layer(ibdev) ==
+	if (mlx5_ib_port_link_layer(ibdev, 1) ==
 	    IB_LINK_LAYER_ETHERNET)
 		return MLX5_VPORT_ACCESS_METHOD_NIC;
 
@@ -1394,11 +1401,16 @@ static int mlx5_port_immutable(struct ib_device *ibdev, u8 port_num,
 static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 {
 	struct mlx5_ib_dev *dev;
+	enum rdma_link_layer ll;
+	int port_type_cap;
 	int err;
 	int i;
 
+	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
+	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
+
 	/* don't create IB instance over Eth ports, no RoCE yet! */
-	if (MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH)
+	if (ll == IB_LINK_LAYER_ETHERNET)
		return NULL;
 
 	printk_once(KERN_INFO "%s", mlx5_version);
@@ -1458,6 +1470,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 
 	dev->ib_dev.query_device	= mlx5_ib_query_device;
 	dev->ib_dev.query_port		= mlx5_ib_query_port;
+	dev->ib_dev.get_link_layer	= mlx5_ib_port_link_layer;
 	dev->ib_dev.query_gid		= mlx5_ib_query_gid;
 	dev->ib_dev.query_pkey		= mlx5_ib_query_pkey;
 	dev->ib_dev.modify_device	= mlx5_ib_modify_device;
-- 
1.8.3.1

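[Editorial note] The point of splitting out mlx5_port_type_cap_to_rdma_ll() above is that the capability-to-link-layer mapping needs no ib_device, so mlx5_ib_add() can consult it before the ibdev exists. The shape of the split, sketched with stand-in constants (not the real MLX5_CAP_PORT_TYPE_* values):

```c
#include <assert.h>

enum ll { LL_UNSPECIFIED, LL_IB, LL_ETH };
enum { CAP_PORT_TYPE_IB, CAP_PORT_TYPE_ETH };

/* Pure mapping: usable before any device wrapper object is created. */
static enum ll port_type_cap_to_rdma_ll(int port_type_cap)
{
	switch (port_type_cap) {
	case CAP_PORT_TYPE_IB:
		return LL_IB;
	case CAP_PORT_TYPE_ETH:
		return LL_ETH;
	default:
		return LL_UNSPECIFIED;
	}
}
```

The ib_device-taking callback then becomes a thin wrapper that extracts the capability and calls the pure function, which is what patch 01/10 does.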


Re: [PATCH 1/3] IB/uverbs: reject invalid or unknown opcodes

2015-08-20 Thread Steve Wise

On 8/20/2015 3:49 AM, Sagi Grimberg wrote:

On 8/19/2015 8:54 PM, Jason Gunthorpe wrote:

On Wed, Aug 19, 2015 at 07:48:02PM +0200, Christoph Hellwig wrote:

On Wed, Aug 19, 2015 at 11:46:14AM -0600, Jason Gunthorpe wrote:

Reviewed-by: Jason Gunthorpe jguntho...@obsidianresearch.com

AFAIK, this path is rarely (never?) actually used. I think all the
drivers we have can post directly from userspace.


Oh, interesting.  Is there any chance to deprecate it?  Not having
to care for the uvers command would really help with some of the
upcoming changes I have in my mind.


Hmm, we'd need a survey of the userspace side to see if it is rarely
or never...

And we'd have to talk to the soft XXX guys to see if they plan to use
it..


Checked in librxe (user-space softroce). Looks like posts are going via
this path...


Ditto for the soft iWARP stack, which is still out-of-linux.

