[PATCH for-4.3] iw_cxgb4: Add support for clip

2015-08-25 Thread Hariprasad Shenai
Add support for ipv6 address handling clip api provided by lld

Signed-off-by: Hariprasad Shenai 
---
 drivers/infiniband/hw/cxgb4/cm.c | 76 +---
 1 file changed, 72 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 3ad8dc7..d699719 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -50,6 +50,7 @@
 #include 
 
 #include "iw_cxgb4.h"
+#include "clip_tbl.h"
 
 static char *states[] = {
"idle",
@@ -298,6 +299,16 @@ void _c4iw_free_ep(struct kref *kref)
if (test_bit(QP_REFERENCED, &ep->com.flags))
deref_qp(ep);
if (test_bit(RELEASE_RESOURCES, &ep->com.flags)) {
+   if (ep->com.remote_addr.ss_family == AF_INET6) {
+   struct sockaddr_in6 *sin6 =
+   (struct sockaddr_in6 *)
+   &ep->com.mapped_local_addr;
+
+   cxgb4_clip_release(
+   ep->com.dev->rdev.lldi.ports[0],
+   (const u32 *)&sin6->sin6_addr.s6_addr,
+   1);
+   }
remove_handle(ep->com.dev, &ep->com.dev->hwtid_idr, ep->hwtid);
cxgb4_remove_tid(ep->com.dev->rdev.lldi.tids, 0, ep->hwtid);
dst_release(ep->dst);
@@ -442,6 +453,12 @@ static void act_open_req_arp_failure(void *handle, struct 
sk_buff *skb)
kfree_skb(skb);
connect_reply_upcall(ep, -EHOSTUNREACH);
state_set(&ep->com, DEAD);
+   if (ep->com.remote_addr.ss_family == AF_INET6) {
+   struct sockaddr_in6 *sin6 =
+   (struct sockaddr_in6 *)&ep->com.mapped_local_addr;
+   cxgb4_clip_release(ep->com.dev->rdev.lldi.ports[0],
+  (const u32 *)&sin6->sin6_addr.s6_addr, 1);
+   }
remove_handle(ep->com.dev, &ep->com.dev->atid_idr, ep->atid);
cxgb4_free_atid(ep->com.dev->rdev.lldi.tids, ep->atid);
dst_release(ep->dst);
@@ -640,6 +657,7 @@ static int send_connect(struct c4iw_ep *ep)
struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *)
   &ep->com.mapped_remote_addr;
int win;
+   int ret;
 
wrlen = (ep->com.remote_addr.ss_family == AF_INET) ?
roundup(sizev4, 16) :
@@ -693,6 +711,11 @@ static int send_connect(struct c4iw_ep *ep)
opt2 |= CONG_CNTRL_V(CONG_ALG_TAHOE);
opt2 |= T5_ISS_F;
}
+
+   if (ep->com.remote_addr.ss_family == AF_INET6)
+   cxgb4_clip_get(ep->com.dev->rdev.lldi.ports[0],
+  (const u32 *)&la6->sin6_addr.s6_addr, 1);
+
t4_set_arp_err_handler(skb, ep, act_open_req_arp_failure);
 
if (is_t4(ep->com.dev->rdev.lldi.adapter_type)) {
@@ -790,7 +813,11 @@ static int send_connect(struct c4iw_ep *ep)
}
 
set_bit(ACT_OPEN_REQ, &ep->com.history);
-   return c4iw_l2t_send(&ep->com.dev->rdev, skb, ep->l2t);
+   ret = c4iw_l2t_send(&ep->com.dev->rdev, skb, ep->l2t);
+   if (ret && ep->com.remote_addr.ss_family == AF_INET6)
+   cxgb4_clip_release(ep->com.dev->rdev.lldi.ports[0],
+  (const u32 *)&la6->sin6_addr.s6_addr, 1);
+   return ret;
 }
 
 static void send_mpa_req(struct c4iw_ep *ep, struct sk_buff *skb,
@@ -2091,6 +2118,15 @@ static int act_open_rpl(struct c4iw_dev *dev, struct 
sk_buff *skb)
case CPL_ERR_CONN_EXIST:
if (ep->retry_count++ < ACT_OPEN_RETRY_COUNT) {
set_bit(ACT_RETRY_INUSE, &ep->com.history);
+   if (ep->com.remote_addr.ss_family == AF_INET6) {
+   struct sockaddr_in6 *sin6 =
+   (struct sockaddr_in6 *)
+   &ep->com.mapped_local_addr;
+   cxgb4_clip_release(
+   ep->com.dev->rdev.lldi.ports[0],
+   (const u32 *)
+   &sin6->sin6_addr.s6_addr, 1);
+   }
remove_handle(ep->com.dev, &ep->com.dev->atid_idr,
atid);
cxgb4_free_atid(t, atid);
@@ -2118,6 +2154,12 @@ static int act_open_rpl(struct c4iw_dev *dev, struct 
sk_buff *skb)
connect_reply_upcall(ep, status2errno(status));
state_set(&ep->com, DEAD);
 
+   if (ep->com.remote_addr.ss_family == AF_INET6) {
+   struct sockaddr_in6 *sin6 =
+   (struct sockaddr_in6 *)&ep->com.mapped_local_addr;
+   cxgb4_clip_release(ep->com.dev->rdev.lldi.ports[0],
+  (const u32 *)

Re: [PATCH for-next 00/10] Add RoCE support to the mlx5 driver

2015-08-25 Thread Achiad Shochat

On 8/24/2015 11:40 PM, Tom Talpey wrote:

On 8/20/2015 12:46 PM, Achiad Shochat wrote:

Hi Doug,

This patchset adds RoCE V1 and RoCE V2 support to the mlx5 device
driver.


Question - assuming I read them correctly, these patches add the
RoCE v1 and RoCE v2 support on a per-port basis. That is, a port
can be either IB, RoCE v1 or RoCE v2, but not a combination.

Has any thought been put toward supporting these protocols on a
per-QP basis, i.e. the caller of rdma_connect() can specify the
desired protocol? Or to have some sort of discovery?

I know that there may be implementation restrictions on today's
devices, but it's my personal belief that future devices will
support multiple protocols (perhaps beyond the three above), and
laying the groundwork for this today will be important.

Tom.



This patchset was applied and tested over patchset "Add RoCE v2
support" which was sent to the mailing list by Matan Barak.

Achiad Shochat (10):
   IB/mlx5: Support IB device's callback for getting the link layer
   IB/mlx5: Support IB device's callback for getting its netdev
   net/mlx5_core: Break down the vport mac address query function
   net/mlx5_core: Introduce access functions to enable/disable RoCE
   net/mlx5_core: Introduce access functions to query vport RoCE fields
   IB/mlx5: Extend query_device/port to support RoCE
   IB/mlx5: Set network_hdr_type upon RoCE responder completion
   IB/mlx5: Support IB device's callbacks for adding/deleting GIDs
   IB/mlx5: Add RoCE fields to Address Vector
   IB/mlx5: Support RoCE

  drivers/infiniband/hw/mlx5/ah.c |  32 ++-
  drivers/infiniband/hw/mlx5/cq.c |  17 ++
  drivers/infiniband/hw/mlx5/main.c   | 318
++--
  drivers/infiniband/hw/mlx5/mlx5_ib.h|  15 +-
  drivers/infiniband/hw/mlx5/qp.c |  42 +++-
  drivers/net/ethernet/mellanox/mlx5/core/vport.c | 139 ++-
  include/linux/mlx5/device.h |  26 ++
  include/linux/mlx5/driver.h |   7 -
  include/linux/mlx5/mlx5_ifc.h   |  10 +-
  include/linux/mlx5/qp.h |  21 +-
  include/linux/mlx5/vport.h  |   8 +
  11 files changed, 578 insertions(+), 57 deletions(-)





Hello Tom,

RoCE v1 and RoCE v2 are supported on a per-GID basis, not sure what got 
you to understand it is per-port.


Achiad

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] IB/uverbs: reject invalid or unknown opcodes

2015-08-25 Thread Christoph Hellwig
On Mon, Aug 24, 2015 at 10:59:21AM +0300, Haggai Eran wrote:
> > It looks odd to me as well, but it's not really something I want to
> > change in this series.  Note that sparse annoted types like __be32
> > aren't really common in userspace, but with a bit of effort they can
> > be supported.  We have them and regularly run sparse for xfsprogs for
> > example.
> 
> I have to try it with libibverbs sometime. It doesn't use uapi yet
> though IIRC - it has its own version of the header files.

Yes, I noticed that.  And the WR opcodes aren't even exported in the
uapi header, and use a shared namespace with the in-kernel only ones.

It's all a giant mess unfortunately.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 00/10] Add RoCE support to the mlx5 driver

2015-08-25 Thread Tom Talpey

On 8/25/2015 4:35 AM, Achiad Shochat wrote:

On 8/24/2015 11:40 PM, Tom Talpey wrote:

On 8/20/2015 12:46 PM, Achiad Shochat wrote:

Hi Doug,

This patchset adds RoCE V1 and RoCE V2 support to the mlx5 device
driver.


Question - assuming I read them correctly, these patches add the
RoCE v1 and RoCE v2 support on a per-port basis. That is, a port
can be either IB, RoCE v1 or RoCE v2, but not a combination.

Has any thought been put toward supporting these protocols on a
per-QP basis, i.e. the caller of rdma_connect() can specify the
desired protocol? Or to have some sort of discovery?

I know that there may be implementation restrictions on today's
devices, but it's my personal belief that future devices will
support multiple protocols (perhaps beyond the three above), and
laying the groundwork for this today will be important.

Tom.





RoCE v1 and RoCE v2 are supported on a per-GID basis, not sure what got
you to understand it is per-port.


Because the protocol capabilities were being marked at the ib_device
level. Ok, it's good that the protocol is per-endpoint.

But I don't understand how it will work per-GID. What if the target
node is RoCEv2 and on another subnet? How will it discover the remote's
capability and establish the right protocol? How does the initiator
select the protocol, if there is a choice?

Another way of asking this question is, why is all this stuff in the
driver, at the bottom of the stack? I think it should be in the
rdmacm layer.

Tom.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] IB/ipoib: Clean up send-only multicast joins

2015-08-25 Thread Hal Rosenstock
On 8/21/2015 7:34 PM, Jason Gunthorpe wrote:
> Even though we don't expect the group to be created by the SM we
> sill need to provide all the parameters to force the SM to validate
> they are correct.

Out of curiosity, has it been observed that there was inconsistency in
these additional IPoIB parameters between broadcast and non broadcast
groups on client side or is this just to be defensive to make sure this
does not occur ?

-- Hal

> 
> Signed-off-by: Jason Gunthorpe 
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 47 
> +-
>  1 file changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index 0d23e0568deb..c0e702c577d5 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -448,8 +448,8 @@ out_locked:
>   return status;
>  }
>  
> -static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast 
> *mcast,
> -  int create)
> +/* priv->lock must be held when calling */
> +static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast 
> *mcast)
>  {
>   struct ipoib_dev_priv *priv = netdev_priv(dev);
>   struct ib_sa_multicast *multicast;
> @@ -471,7 +471,14 @@ static void ipoib_mcast_join(struct net_device *dev, 
> struct ipoib_mcast *mcast,
>   IB_SA_MCMEMBER_REC_PKEY |
>   IB_SA_MCMEMBER_REC_JOIN_STATE;
>  
> - if (create) {
> + if (mcast != priv->broadcast) {
> + /*
> +  * RFC 4391:
> +  *  The MGID MUST use the same P_Key, Q_Key, SL, MTU,
> +  *  and HopLimit as those used in the broadcast-GID.  The rest
> +  *  of attributes SHOULD follow the values used in the
> +  *  broadcast-GID as well.
> +  */
>   comp_mask |=
>   IB_SA_MCMEMBER_REC_QKEY |
>   IB_SA_MCMEMBER_REC_MTU_SELECTOR |
> @@ -492,19 +499,35 @@ static void ipoib_mcast_join(struct net_device *dev, 
> struct ipoib_mcast *mcast,
>   rec.sl= priv->broadcast->mcmember.sl;
>   rec.flow_label= priv->broadcast->mcmember.flow_label;
>   rec.hop_limit = priv->broadcast->mcmember.hop_limit;
> +
> + /*
> +  * Historically Linux IPoIB has never properly supported SEND
> +  * ONLY join. It emulated it by not providing all the required
> +  * attributes, which is enough to prevent group creation and
> +  * detect if there are full members or not. A major problem
> +  * with supporting SEND ONLY is detecting when the group is
> +  * auto-destroyed as IPoIB will cache the MLID..
> +  */
> +#if 1
> + if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
> + comp_mask &= ~IB_SA_MCMEMBER_REC_TRAFFIC_CLASS;
> +#else
> + if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
> + rec.join_state = 4;
> +#endif
>   }
>  
> + spin_unlock_irq(&priv->lock);
>   multicast = ib_sa_join_multicast(&ipoib_sa_client, priv->ca, priv->port,
>&rec, comp_mask, GFP_KERNEL,
>ipoib_mcast_join_complete, mcast);
> + spin_lock_irq(&priv->lock);
>   if (IS_ERR(multicast)) {
>   ret = PTR_ERR(multicast);
>   ipoib_warn(priv, "ib_sa_join_multicast failed, status %d\n", 
> ret);
> - spin_lock_irq(&priv->lock);
>   /* Requeue this join task with a backoff delay */
>   __ipoib_mcast_schedule_join_thread(priv, mcast, 1);
>   clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
> - spin_unlock_irq(&priv->lock);
>   complete(&mcast->done);
>   }
>  }
> @@ -517,7 +540,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
>   struct ib_port_attr port_attr;
>   unsigned long delay_until = 0;
>   struct ipoib_mcast *mcast = NULL;
> - int create = 1;
>  
>   if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags))
>   return;
> @@ -566,7 +588,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
>   if (IS_ERR_OR_NULL(priv->broadcast->mc) &&
>   !test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) {
>   mcast = priv->broadcast;
> - create = 0;
>   if (mcast->backoff > 1 &&
>   time_before(jiffies, mcast->delay_until)) {
>   delay_until = mcast->delay_until;
> @@ -590,13 +611,7 @@ void ipoib_mcast_join_task(struct work_struct *work)
>   /* Found the next unjoined group */
>   init_completion(&mcast->don

Re: [PATCH 2/2] IB/ipoib: Suppress warning for send only join failures

2015-08-25 Thread Hal Rosenstock
On 8/21/2015 7:34 PM, Jason Gunthorpe wrote:
> We expect send only joins to fail, it just means there are no listeners
> for the group. The correct thing to do is silently drop the packet
> at source.
> 
> Eg avahi will full join 224.0.0.251 which causes a send only IGMP packet
> to 224.0.0.22, and then a warning level kmessage like this:
> 
>  ib0: sendonly multicast join failed for 
> ff12:401b::::::0016, status -22
> 
> If there is no IP router listening to IGMP.
> 
> Signed-off-by: Jason Gunthorpe 
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index c0e702c577d5..2d43ec542b63 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -393,8 +393,13 @@ static int ipoib_mcast_join_complete(int status,
>   goto out_locked;
>   }
>   } else {
> - if (mcast->logcount++ < 20) {
> - if (status == -ETIMEDOUT || status == -EAGAIN) {
> + bool silent_fail =
> + test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
> + status == -EINVAL;

Aren't there other reasons that send only join might have EINVAL
indicated ? Maybe it's better to be overly silent rather than overly
verbose as to not spam the log but it seems like it would make debug of
such cases harder.

> +
> + if (mcast->logcount < 20) {
> + if (status == -ETIMEDOUT || status == -EAGAIN ||
> + silent_fail) {
>   ipoib_dbg_mcast(priv, "%smulticast join failed 
> for %pI6, status %d\n",
>   
> test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
>   mcast->mcmember.mgid.raw, 
> status);

ipoib_dbg_mcast logging is conditionalized on CONFIG_INFINIBAND_IPOIB_DEBUG

> @@ -403,6 +408,9 @@ static int ipoib_mcast_join_complete(int status,
>   
> test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
>  mcast->mcmember.mgid.raw, status);
>   }
> +
> + if (!silent_fail)
> + mcast->logcount++;
>   }
>  
>   if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 00/10] Add RoCE support to the mlx5 driver

2015-08-25 Thread Achiad Shochat

On 8/25/2015 3:32 PM, Tom Talpey wrote:

On 8/25/2015 4:35 AM, Achiad Shochat wrote:

On 8/24/2015 11:40 PM, Tom Talpey wrote:

On 8/20/2015 12:46 PM, Achiad Shochat wrote:

Hi Doug,

This patchset adds RoCE V1 and RoCE V2 support to the mlx5 device
driver.


Question - assuming I read them correctly, these patches add the
RoCE v1 and RoCE v2 support on a per-port basis. That is, a port
can be either IB, RoCE v1 or RoCE v2, but not a combination.

Has any thought been put toward supporting these protocols on a
per-QP basis, i.e. the caller of rdma_connect() can specify the
desired protocol? Or to have some sort of discovery?

I know that there may be implementation restrictions on today's
devices, but it's my personal belief that future devices will
support multiple protocols (perhaps beyond the three above), and
laying the groundwork for this today will be important.

Tom.





RoCE v1 and RoCE v2 are supported on a per-GID basis, not sure what got
you to understand it is per-port.


Because the protocol capabilities were being marked at the ib_device
level. Ok, it's good that the protocol is per-endpoint.

But I don't understand how it will work per-GID. What if the target
node is RoCEv2 and on another subnet? How will it discover the remote's
capability and establish the right protocol? How does the initiator
select the protocol, if there is a choice?

Another way of asking this question is, why is all this stuff in the
driver, at the bottom of the stack? I think it should be in the
rdmacm layer.

Tom.



Selection of the RoCE version to be used per connection and discovering 
the RoCE versions supported by the target is up to the application.


It is not done by the driver of course, neither by the kernel stack.

The driver only exposes add/del_gid callbacks, used by the ib_core to 
configure the device GID table entries (including RoCE version per 
entry) based on netdev IPs configured in the system.



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH for-4.3] iw_cxgb4: Add support for clip

2015-08-25 Thread Steve Wise
Acked-by: Steve Wise 


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] IB/ipoib: Clean up send-only multicast joins

2015-08-25 Thread Jason Gunthorpe
On Tue, Aug 25, 2015 at 08:58:34AM -0400, Hal Rosenstock wrote:
> On 8/21/2015 7:34 PM, Jason Gunthorpe wrote:
> > Even though we don't expect the group to be created by the SM we
> > sill need to provide all the parameters to force the SM to validate
> > they are correct.
> 
> Out of curiosity, has it been observed that there was inconsistency in
> these additional IPoIB parameters between broadcast and non broadcast
> groups on client side or is this just to be defensive to make sure this
> does not occur ?

Not that I've seen. This is one half a code clean up, one half
better following the spec..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] IB/ipoib: Suppress warning for send only join failures

2015-08-25 Thread Jason Gunthorpe
On Tue, Aug 25, 2015 at 08:59:13AM -0400, Hal Rosenstock wrote:
> > -   if (mcast->logcount++ < 20) {
> > -   if (status == -ETIMEDOUT || status == -EAGAIN) {
> > +   bool silent_fail =
> > +   test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
> > +   status == -EINVAL;
> 
> Aren't there other reasons that send only join might have EINVAL
> indicated ?

Not sure, the layers below all eat the detailed error code. Hopefully
EINVAL isn't re-used.

> Maybe it's better to be overly silent rather than overly
> verbose as to not spam the log but it seems like it would make debug of
> such cases harder.

It makes debugging harder to have worthless messages because they
obscure what is going on. The first time I saw this I assumed there
was an issue, but it turns out to be an expected failure.

The other issue is the way the rate limiting works:

> > +   if (mcast->logcount < 20) {
> > +   if (status == -ETIMEDOUT || status == -EAGAIN ||
> > +   silent_fail) {
> > ipoib_dbg_mcast(priv, "%smulticast join failed 
> > for %pI6, status %d\n",
> > 
> > test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ? "sendonly " : "",
> > mcast->mcmember.mgid.raw, 
> > status);

So wasting logcount with expected failures just results in eating
unexpected failures...

> ipoib_dbg_mcast logging is conditionalized on CONFIG_INFINIBAND_IPOIB_DEBUG

Most distros turn this off so the change only impacts people trying to
debug this stuff.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 for-next 0/7] Add support for multicast loopback prevention to mlx4

2015-08-25 Thread Christoph Lameter
On Thu, 20 Aug 2015, Eran Ben Elisha wrote:

> This patch-set adds a new  implementation for multicast loopback prevention 
> for
> mlx4 driver.  The current implementation is very limited, especially if link
> layer is Ethernet. The new implementation is based on HW feature of dropping
> incoming multicast packets if the sender QP counter index is equal to the
> receiver counter index.

Do you have this in a git tree somewhere for testing?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] IB/ipoib: Clean up send-only multicast joins

2015-08-25 Thread Doug Ledford
On 08/21/2015 07:34 PM, Jason Gunthorpe wrote:
> Even though we don't expect the group to be created by the SM we
> sill need to provide all the parameters to force the SM to validate
> they are correct.

Why does this patch embed locking changes that, as far I can tell, are
not needed by the rest of the patch?  If the locking changes are needed
for some reason, then they likely need to be their own patch with their
own changelog.

> Signed-off-by: Jason Gunthorpe 
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 47 
> +-
>  1 file changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index 0d23e0568deb..c0e702c577d5 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -448,8 +448,8 @@ out_locked:
>   return status;
>  }
>  
> -static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast 
> *mcast,
> -  int create)
> +/* priv->lock must be held when calling */
> +static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast 
> *mcast)
>  {
>   struct ipoib_dev_priv *priv = netdev_priv(dev);
>   struct ib_sa_multicast *multicast;
> @@ -471,7 +471,14 @@ static void ipoib_mcast_join(struct net_device *dev, 
> struct ipoib_mcast *mcast,
>   IB_SA_MCMEMBER_REC_PKEY |
>   IB_SA_MCMEMBER_REC_JOIN_STATE;
>  
> - if (create) {
> + if (mcast != priv->broadcast) {
> + /*
> +  * RFC 4391:
> +  *  The MGID MUST use the same P_Key, Q_Key, SL, MTU,
> +  *  and HopLimit as those used in the broadcast-GID.  The rest
> +  *  of attributes SHOULD follow the values used in the
> +  *  broadcast-GID as well.
> +  */
>   comp_mask |=
>   IB_SA_MCMEMBER_REC_QKEY |
>   IB_SA_MCMEMBER_REC_MTU_SELECTOR |
> @@ -492,19 +499,35 @@ static void ipoib_mcast_join(struct net_device *dev, 
> struct ipoib_mcast *mcast,
>   rec.sl= priv->broadcast->mcmember.sl;
>   rec.flow_label= priv->broadcast->mcmember.flow_label;
>   rec.hop_limit = priv->broadcast->mcmember.hop_limit;
> +
> + /*
> +  * Historically Linux IPoIB has never properly supported SEND
> +  * ONLY join. It emulated it by not providing all the required
> +  * attributes, which is enough to prevent group creation and
> +  * detect if there are full members or not. A major problem
> +  * with supporting SEND ONLY is detecting when the group is
> +  * auto-destroyed as IPoIB will cache the MLID..
> +  */
> +#if 1
> + if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
> + comp_mask &= ~IB_SA_MCMEMBER_REC_TRAFFIC_CLASS;
> +#else
> + if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
> + rec.join_state = 4;
> +#endif
>   }
>  
> + spin_unlock_irq(&priv->lock);
>   multicast = ib_sa_join_multicast(&ipoib_sa_client, priv->ca, priv->port,
>&rec, comp_mask, GFP_KERNEL,
>ipoib_mcast_join_complete, mcast);
> + spin_lock_irq(&priv->lock);
>   if (IS_ERR(multicast)) {
>   ret = PTR_ERR(multicast);
>   ipoib_warn(priv, "ib_sa_join_multicast failed, status %d\n", 
> ret);
> - spin_lock_irq(&priv->lock);
>   /* Requeue this join task with a backoff delay */
>   __ipoib_mcast_schedule_join_thread(priv, mcast, 1);
>   clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
> - spin_unlock_irq(&priv->lock);
>   complete(&mcast->done);
>   }
>  }
> @@ -517,7 +540,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
>   struct ib_port_attr port_attr;
>   unsigned long delay_until = 0;
>   struct ipoib_mcast *mcast = NULL;
> - int create = 1;
>  
>   if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags))
>   return;
> @@ -566,7 +588,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
>   if (IS_ERR_OR_NULL(priv->broadcast->mc) &&
>   !test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) {
>   mcast = priv->broadcast;
> - create = 0;
>   if (mcast->backoff > 1 &&
>   time_before(jiffies, mcast->delay_until)) {
>   delay_until = mcast->delay_until;
> @@ -590,13 +611,7 @@ void ipoib_mcast_join_task(struct work_struct *work)
>   /* Found the next unjoined group */
>   init_completion(&mcast->done);
>

Re: [PATCH 1/2] IB/ipoib: Clean up send-only multicast joins

2015-08-25 Thread Jason Gunthorpe
On Tue, Aug 25, 2015 at 01:50:05PM -0400, Doug Ledford wrote:
> On 08/21/2015 07:34 PM, Jason Gunthorpe wrote:
> > Even though we don't expect the group to be created by the SM we
> > sill need to provide all the parameters to force the SM to validate
> > they are correct.
> 
> Why does this patch embed locking changes that, as far I can tell, are
> not needed by the rest of the patch?

test_bit was lowered into ipoib_mcast_join, which requires pushing the
lock unlock down as well. The set_bit side holds that lock.

> If the locking changes are needed for some reason, then they likely
> need to be their own patch with their own changelog.

It doesn't make sense to move the locking without the code motion that
motivates it, IMHO.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] IB/ipoib: Clean up send-only multicast joins

2015-08-25 Thread Doug Ledford
On 08/25/2015 02:22 PM, Jason Gunthorpe wrote:
> On Tue, Aug 25, 2015 at 01:50:05PM -0400, Doug Ledford wrote:
>> On 08/21/2015 07:34 PM, Jason Gunthorpe wrote:
>>> Even though we don't expect the group to be created by the SM we
>>> sill need to provide all the parameters to force the SM to validate
>>> they are correct.
>>
>> Why does this patch embed locking changes that, as far I can tell, are
>> not needed by the rest of the patch?
> 
> test_bit was lowered into ipoib_mcast_join, which requires pushing the
> lock unlock down as well. The set_bit side holds that lock.

I see the confusion.  The test bit of SENDONLY isn't protected by the
lock, just the setting and clearing of BUSY.  There isn't any need to
push the locking down into mcast_join just because we are checking
SENDONLY in mcast_join.

>> If the locking changes are needed for some reason, then they likely
>> need to be their own patch with their own changelog.
> 
> It doesn't make sense to move the locking without the code motion that
> motivates it, IMHO.

Sure, I agree with you on that point.  I thought you were changing the
locking for some other reason that I wasn't groking, I didn't think you
were doing that for the SENDONLY flag test.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] IB/ipoib: Clean up send-only multicast joins

2015-08-25 Thread Jason Gunthorpe
On Tue, Aug 25, 2015 at 02:35:27PM -0400, Doug Ledford wrote:
> On 08/25/2015 02:22 PM, Jason Gunthorpe wrote:
> > On Tue, Aug 25, 2015 at 01:50:05PM -0400, Doug Ledford wrote:
> >> On 08/21/2015 07:34 PM, Jason Gunthorpe wrote:
> >>> Even though we don't expect the group to be created by the SM we
> >>> sill need to provide all the parameters to force the SM to validate
> >>> they are correct.
> >>
> >> Why does this patch embed locking changes that, as far I can tell, are
> >> not needed by the rest of the patch?
> > 
> > test_bit was lowered into ipoib_mcast_join, which requires pushing the
> > lock unlock down as well. The set_bit side holds that lock.
> 
> I see the confusion.  The test bit of SENDONLY isn't protected by the
> lock, just the setting and clearing of BUSY.

Well, I don't like to see locking elided unless necessary. The flags
is clearly lock protected data, the lock should be held when accessing
it - even if one can reason some of the locking away today. That just
increases maintainability and clarity.

In this instance pushing the locking is trivial. Do you really want it
gone?

.. and looking at this, I feel justified in this position, because
I noticed a bug in how flags is manipulated.

This:

static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
   struct ib_sa_mcmember_rec *mcmember)
{
[..]
if (test_and_set_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) 
{
[..]
clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags);
return ret;
}

Has two unlocked sets for flags. The above is called directly from
the sa callback.

While, the clear_bit/etc in ipoib_mcast_restart_task has a lock held
and no obvious exclusion with the above (the mcast is on the
multicast_list by now)..

Sure looks like these two race:

if (test_and_set_bit(IPOIB_MCAST_FLAG_ATTACHED, 
&mcast->flags))..
clear_bit(IPOIB_MCAST_FLAG_FOUND, &mcast->flags);

Resulting in corruption of the flags.

This ugly untested thing sort it..

>From ccfc99859d221ea4dada20e388d50e2cc6be580c Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe 
Date: Tue, 25 Aug 2015 13:27:02 -0600
Subject: [PATCH] IB/ipoib: Do not write to mcast->flags without holding a lock

At a minimum the ipoib_mcast_restart_task could be called at any
time and it will also write the flags resulting in corruption
of the flags value.

Signed-off-by: Jason Gunthorpe 
---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 2d43ec542b63..077586a867bf 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -219,9 +219,9 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast 
*mcast,
/* Set the multicast MTU and cached Q_Key before we attach if it's
 * the broadcast group.
 */
+   spin_lock_irq(&priv->lock);
if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
sizeof (union ib_gid))) {
-   spin_lock_irq(&priv->lock);
if (!priv->broadcast) {
spin_unlock_irq(&priv->lock);
return -EAGAIN;
@@ -244,7 +244,6 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast 
*mcast,

IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
 
priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
-   spin_unlock_irq(&priv->lock);
priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
set_qkey = 1;
}
@@ -254,19 +253,24 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast 
*mcast,
ipoib_warn(priv, "multicast group %pI6 already 
attached\n",
   mcast->mcmember.mgid.raw);
 
+   spin_unlock_irq(&priv->lock);
return 0;
}
 
+   spin_unlock_irq(&priv->lock);
ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid),
 &mcast->mcmember.mgid, set_qkey);
if (ret < 0) {
ipoib_warn(priv, "couldn't attach QP to multicast group 
%pI6\n",
   mcast->mcmember.mgid.raw);
 
+   spin_lock_irq(&priv->lock);
clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags);
+   spin_unlock_irq(&priv->lock);
return ret;
}
-   }
+   } else
+   spin_unlock_irq(&priv->lock);
 
{
struct ib_ah_attr av = {
-- 
2.1.4


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majo

Re: [PATCH for-next 00/10] Add RoCE support to the mlx5 driver

2015-08-25 Thread Haggai Eran
On 25/08/2015 16:06, Achiad Shochat wrote:
> On 8/25/2015 3:32 PM, Tom Talpey wrote:
>> But I don't understand how it will work per-GID. What if the target
>> node is RoCEv2 and on another subnet? How will it discover the remote's
>> capability and establish the right protocol? How does the initiator
>> select the protocol, if there is a choice?
>>
>> Another way of asking this question is, why is all this stuff in the
>> driver, at the bottom of the stack? I think it should be in the
>> rdmacm layer.
>>
>> Tom.
>>
> 
> Selection of the RoCE version to be used per connection and discovering
> the RoCE versions supported by the target is up to the application.
> 
> It is not done by the driver of course, neither by the kernel stack.
> 
> The driver only exposes add/del_gid callbacks, used by the ib_core to
> configure the device GID table entries (including RoCE version per
> entry) based on netdev IPs configured in the system.

Let me add that when using RDMA CM, it is in charge of choosing the gid
index for a connection, and needs to be aware of the different
protocols. The patchset that includes the core changes for RoCE v2 [1]
allows configuring the preferred protocol for each port. There is also a
heuristic that selects RoCE v2 (which is routable) when the address
resolution detects more than one hop to the destination.

Regards,
Haggai

[1] http://www.spinics.net/lists/linux-rdma/msg28120.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html