Re: IB/core: Use GID table in AH creation and dmac resolution

2015-11-03 Thread Somnath Kotur
Thanks Dan and Matan.

We will take a look and revert on this

Thanks
Som

On Wed, Nov 4, 2015 at 9:31 AM, Somnath Kotur
<somnath.ko...@avagotech.com> wrote:
> Thanks Dan and Matan.
>
> We will take a look and revert on this
>
> Thanks
> Som
>
> On Tue, Nov 3, 2015 at 7:14 PM, Matan Barak <mat...@mellanox.com> wrote:
>>
>>
>>
>> On 11/3/2015 3:11 PM, Dan Carpenter wrote:
>>>
>>> Hello Matan Barak,
>>>
>>> This is a semi-automatic email about new static checker warnings.
>>>
>>> The patch dbf727de7440: "IB/core: Use GID table in AH creation and
>>> dmac resolution" from Oct 15, 2015, leads to the following Smatch
>>> complaint:
>>>
>>> drivers/infiniband/hw/ocrdma/ocrdma_ah.c:157 ocrdma_create_ah()
>>>  error: we previously assumed 'sgid_attr.ndev' could be null (see
>>> line 146)
>>>
>>> drivers/infiniband/hw/ocrdma/ocrdma_ah.c
>>> 145 }
>>> 146 if (sgid_attr.ndev) {
>>>  ^^
>>> Patch introduces a NULL check.
>>>
>>> 147         if (is_vlan_dev(sgid_attr.ndev))
>>> 148                 vlan_tag = vlan_dev_vlan_id(sgid_attr.ndev);
>>> 149         dev_put(sgid_attr.ndev);
>>> 150 }
>>> 151
>>> 152 if ((pd->uctx) &&
>>> 153     (!rdma_is_multicast_addr((struct in6_addr *)attr->grh.dgid.raw)) &&
>>> 154     (!rdma_link_local_addr((struct in6_addr *)attr->grh.dgid.raw))) {
>>> 155         status = rdma_addr_find_dmac_by_grh(&sgid, &attr->grh.dgid,
>>> 156                                             attr->dmac, &vlan_tag,
>>> 157                                             sgid_attr.ndev->ifindex);
>>>
>>> 
>>> Patch introduces this new dereference.  The warning might be a false
>>> positive if "pd->uctx" or rdma_is_multicast_addr() imply it's non-NULL
>>> but I don't know this code well enough to say for sure.  Hence this
>>> email.  :)
>>>
>>> 158 if (status) {
>>> 159 pr_err("%s(): Failed to resolve dmac from
>>> gid."
>>>
>>> regards,
>>> dan carpenter
>>>
>>
>> Thanks for the catch Dan.
>> As I wrote in the commit message - "ocrdma driver changes were done by
>> Somnath Kotur <somnath.ko...@avagotech.com>"
>> Somnath, RoCE implies non-NULL ndev, but dereferencing ifindex after
>> dev_put doesn't seem to be safe.
>> Could you please take a look?
>>
>> Thanks,
>> Matan
>>
>
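One way to address the concern raised above is to copy whatever is needed from the net device while the reference is still held, and only then drop it. The sketch below is a userspace mock of that ordering, not the ocrdma code: `struct mock_netdev`, `mock_dev_put()` and `lookup_vlan_and_ifindex()` are hypothetical stand-ins for the kernel types and helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct mock_netdev {
	int refcnt;
	int ifindex;
	int vlan_id;	/* 0 means "not a VLAN device" in this mock */
};

/* Mock of dev_put(): drop one reference. */
static void mock_dev_put(struct mock_netdev *ndev)
{
	assert(ndev->refcnt > 0);
	ndev->refcnt--;
}

/*
 * Copy out everything that is needed while the reference is held,
 * then drop it. Returns 0 on success, -1 if there is no device.
 */
static int lookup_vlan_and_ifindex(struct mock_netdev *ndev,
				   int *vlan_tag, int *ifindex)
{
	if (!ndev)
		return -1;

	*vlan_tag = ndev->vlan_id;
	*ifindex = ndev->ifindex;	/* read before the put, not after */
	mock_dev_put(ndev);

	/* From here on only the saved copies may be used. */
	return 0;
}
```

With this shape the later dmac lookup would take the saved `ifindex` value rather than dereferencing the device pointer again, and the NULL case is handled in one place.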
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH for-next V5 12/12] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.

2015-06-11 Thread Somnath Kotur
Hi,
  Yes , Matan and I need to work together and revisit this patch in light
of the split patch series and remove any references to RoCE v2...

Thanks for the feedback Jason and apologies for the oversight, we should
have worked this out internally before sending out V5

Regards
Som

 -Original Message-
 From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
 Sent: Thursday, June 11, 2015 9:41 AM
 To: Matan Barak
 Cc: Doug Ledford; Or Gerlitz; Moni Shoua; Sean Hefty; Somnath Kotur;
linux-
 r...@vger.kernel.org; Somnath Kotur; Devesh Sharma
 Subject: Re: [PATCH for-next V5 12/12] RDMA/ocrdma: Changes in driver to
 incorporate the moving of GID Table mgmt to IB/Core.

 On Mon, Jun 08, 2015 at 05:12:15PM +0300, Matan Barak wrote:
  From: Somnath Kotur somnath.ko...@emulex.com
 
  1.Check and set port capability flags to indicate RoCEV2 support.

 ??? This series has nothing to do with RoCEv2 now, what is this about?

  mutex_init(&dev->dev_lock);
  -   dev->sgid_tbl = kzalloc(sizeof(union ib_gid) *
  -   OCRDMA_MAX_SGID, GFP_KERNEL);

 Should sgid_tbl be dropped from the structure?

  +int ocrdma_modify_gid(struct ib_device *ibdev, u8 port_num, unsigned
 int index,
  + const union ib_gid *gid, const struct ib_gid_attr
*attr,
  + void **context)
  +{
  +   struct ocrdma_dev *dev;
  +
  +   dev = get_ocrdma_dev(ibdev);
 
  return 0;
   }

 Empty modify gid? Shouldn't it be completely empty?

 This is correct? This HW sends the full SGID in the WQE?

  +enum {
  + OCRDMA_L3_TYPE_IB_GRH   = 0x00,
  + OCRDMA_L3_TYPE_IPV4 = 0x01,
  + OCRDMA_L3_TYPE_IPV6 = 0x02
  +};

 These added constants are not used? Probably others as well?

 Jason


RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache

2015-04-14 Thread Somnath Kotur


 -Original Message-
 From: Hefty, Sean [mailto:sean.he...@intel.com]
 Sent: Tuesday, April 14, 2015 11:02 PM
 To: Matan Barak; Somnath Kotur; rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org
 Subject: RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
 
  This is a part of the GID meta info. The user should be able to choose
  between RoCE V1 (which is represented here by IB_GID_TYPE_IB) and
 RoCE
  V2 - just as a user could choose between IPv6 and IPv4.
 
 IPv4 and IPv6 are different protocols, not different formats for the same
 address.  How does RoCE v2 not break every app? 
It does not break every app: the choice of which GID type to use is made by
the RDMA-CM, based on a network topology hint obtained from the IP stack.
Please refer to patch 15/33: IB/Core: Changes to the IB Core infrastructure for
RoCEv2 support.
Of course, if the user does not want to go with the choice made by the
RDMA-CM, there is the option of overriding it using the configfs patch
(PATCH 14/33).
Hope that clarifies.

Thanks
Som
 


RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache

2015-04-07 Thread Somnath Kotur
Hi Sean,

 -Original Message-
 From: Hefty, Sean [mailto:sean.he...@intel.com]
 Sent: Wednesday, April 08, 2015 6:00 AM
 To: Somnath Kotur; rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org; Matan Barak
 Subject: RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
 
  In order to manage multiple types, vlans and MACs per GID, we need to
  store them along the GID itself. We store the net device as well, as
  sometimes GIDs should be handled according to the net device they came
  from. Since populating the GID table should be identical for every
  RoCE provider, the GIDs table should be handled in ib_core.
 
  Add a GID cache table that supports lockless find, add and delete of
  GIDs. The lockless nature comes from using a unique sequence number
  per table entry and detecting, while reading or writing, that this
  sequence number has not changed.
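
The per-entry sequence number described in the paragraph above can be illustrated in userspace with C11 atomics. This is only a sketch of the general technique (a seqlock-style read with 0 reserved for "entry being changed"), not the code from the patch: `struct gid_entry`, `entry_init()`, `entry_write()` and `entry_read()` are hypothetical names, and writers are assumed to be serialized by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct gid_entry {
	atomic_uint seq;		/* 0 means "entry is being changed" */
	_Atomic uint64_t gid[2];	/* 128-bit GID stored as two words */
};

static void entry_init(struct gid_entry *e)
{
	atomic_init(&e->seq, 1);
	atomic_init(&e->gid[0], 0);
	atomic_init(&e->gid[1], 0);
}

/* Writer side; callers must serialize writers (e.g. with a mutex). */
static void entry_write(struct gid_entry *e, const uint64_t gid[2])
{
	unsigned int next =
		atomic_load_explicit(&e->seq, memory_order_relaxed) + 1;

	if (next == 0)	/* skip the reserved value on wrap-around */
		next = 1;

	/* Mark the entry as changing, then publish data, then the new seq. */
	atomic_store_explicit(&e->seq, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&e->gid[0], gid[0], memory_order_relaxed);
	atomic_store_explicit(&e->gid[1], gid[1], memory_order_relaxed);
	atomic_store_explicit(&e->seq, next, memory_order_release);
}

/* Reader side; returns false if a write was in progress or intervened. */
static bool entry_read(struct gid_entry *e, uint64_t out[2])
{
	unsigned int before, after;

	before = atomic_load_explicit(&e->seq, memory_order_acquire);
	if (before == 0)
		return false;

	out[0] = atomic_load_explicit(&e->gid[0], memory_order_relaxed);
	out[1] = atomic_load_explicit(&e->gid[1], memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	after = atomic_load_explicit(&e->seq, memory_order_relaxed);

	return before == after;
}
```

A reader that gets `false` would typically retry or report the entry as busy; which of the two is appropriate depends on the caller.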
 
  To use this RoCE GID cache table, providers must implement a
  modify_gid callback. The table is managed exclusively by this
  roce_gid_cache and the provider just needs to write the data to the
  hardware.
 
  Signed-off-by: Matan Barak mat...@mellanox.com
  Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
  ---
   drivers/infiniband/core/Makefile |   3 +-
   drivers/infiniband/core/core_priv.h  |  24 ++
   drivers/infiniband/core/roce_gid_cache.c | 518
 
 Why does RoCE need such a complex gid cache?  If a gid cache is needed at
 all, why should it be restricted to RoCE only?  And why is such a complex
 synchronization scheme needed?  Seriously, how many times will GIDs
 change and how many readers at once do you expect to have?
 
 
  diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
  65994a1..1866595 100644
  --- a/include/rdma/ib_verbs.h
  +++ b/include/rdma/ib_verbs.h
  @@ -64,6 +64,36 @@ union ib_gid {
  } global;
   };
 
  +extern union ib_gid zgid;
  +
  +enum ib_gid_type {
  +   /* If link layer is Ethernet, this is RoCE V1 */
 
 I don't understand this comment.  Does RoCE v2 not run on Ethernet?
 
Yes, this comment could probably use rewording.
  +   IB_GID_TYPE_IB= 0,
  +   IB_GID_TYPE_ROCE_V2   = 1,
  +   IB_GID_TYPE_SIZE
  +};
 
 Can you explain the purpose of defining a 'GID type'.  A GID is just a global
 address.  Why does it matter to anyone using it how it was constructed?

This is part of the RoCE V2 specification. Please refer to Section A17.8.
The GID type determines the protocol for outbound packet generation, i.e.
RoCE V1 (Ethertype 0x8915) or RoCE V2 (IPv4 or IPv6).
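
As a rough illustration of the point above, the mapping from an L3 selection to what goes on the wire could look like the sketch below. The enum `l3_type` and both function names are hypothetical; the Ethertype values are 0x8915 for RoCE V1 and the standard 0x0800 / 0x86DD for IPv4 / IPv6, and the next-header values (0x1b after a GRH, 0x11 for UDP) follow the ocrdma UD patch in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical L3 selector for this sketch; not the kernel's enum. */
enum l3_type { L3_IB_GRH, L3_IPV4, L3_IPV6 };

#define ETH_TYPE_ROCE_V1	0x8915
#define ETH_TYPE_IPV4		0x0800
#define ETH_TYPE_IPV6		0x86DD

/* Ethertype to place in the Ethernet header; 0 for an unknown type. */
static uint16_t l3_type_to_ethertype(enum l3_type t)
{
	switch (t) {
	case L3_IB_GRH:
		return ETH_TYPE_ROCE_V1;
	case L3_IPV4:
		return ETH_TYPE_IPV4;
	case L3_IPV6:
		return ETH_TYPE_IPV6;
	}
	return 0;
}

/* Next-header / protocol byte: 0x1b after a GRH, 0x11 (UDP) otherwise. */
static uint8_t ethertype_to_next_header(uint16_t ethertype)
{
	return ethertype == ETH_TYPE_ROCE_V1 ? 0x1b : 0x11;
}
```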
 
  +
  +struct ib_gid_attr {
  +   enum ib_gid_type    gid_type;
  +   struct net_device   *ndev;
  +};
  +
  +struct ib_roce_gid_cache_entry {
  +   /* seq number of 0 indicates entry being changed. */
  +   unsigned int        seq;
  +   union ib_gid        gid;
  +   struct ib_gid_attr  attr;
  +   void                *context;
  +};
  +
  +struct ib_roce_gid_cache {
  +   int                 active;
  +   int                 sz;
  +   /* locking against multiple writes in data_vec */
  +   struct mutex        lock;
  +   struct ib_roce_gid_cache_entry *data_vec;
  +};
  +
   enum rdma_node_type {
  /* IB values map to NodeInfo:NodeType. */
  RDMA_NODE_IB_CA = 1,
  @@ -265,7 +295,9 @@ enum ib_port_cap_flags {
  IB_PORT_BOOT_MGMT_SUP   = 1 << 23,
  IB_PORT_LINK_LATENCY_SUP= 1 << 24,
  IB_PORT_CLIENT_REG_SUP  = 1 << 25,
  -   IB_PORT_IP_BASED_GIDS   = 1 << 26
  +   IB_PORT_IP_BASED_GIDS   = 1 << 26,
  +   IB_PORT_ROCE            = 1 << 27,
  +   IB_PORT_ROCE_V2         = 1 << 28,
 
 Why does RoCE suddenly require a port capability bit?  RoCE runs today
 without setting any bit.
Again, this is part of the RoCE V2 spec; please refer to Section A17.5.1,
Query HCA (snippet pasted below):
"A new RoCE Supported capability bit shall be added to the Port Attributes
list. This capability bit applies exclusively to ports of the new
RoCEv2 type."


Thanks
Som


RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache

2015-03-26 Thread Somnath Kotur
Hi Matan/Moni,
Could either of you please respond to both of Bart's 
queries?

Thanks
Somnath

 -Original Message-
 From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
 Sent: Thursday, March 26, 2015 5:13 AM
 To: Somnath Kotur; rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org; Matan Barak
 Subject: Re: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
 
 On 03/25/2015 02:19 PM, Somnath Kotur wrote:
  +   if (cache->data_vec[ix].attr.ndev &&
  +       cache->data_vec[ix].attr.ndev != old_net_dev)
 
 A few lines earlier the memory old_net_dev points at was freed. If two
 instances of this function run concurrently, what prevents that the
 old_net_dev memory has been reallocated and hence that attr.ndev ==
 old_net_dev although both pointers refer(red) to different network devices
 ?
 
  +   ACCESS_ONCE(cache->data_vec[ix].seq) = orig_seq;
 
 Invoking write_gid() is only safe if the caller serializes write_gid() calls.
 Apparently the cache->lock mutex is used for that purpose. So why is it
 necessary to use ACCESS_ONCE() here? Why is it needed to prevent the
 compiler from coalescing this write with another write into the same
 structure?
 
  +   /* Make sure the sequence number we remeber was read
 
 This looks like a typo - shouldn't the above read "remember"?
 
 BTW, the style of that comment is recommended only for networking code
 and not for IB code. Have you verified this patch with checkpatch ?
 
  +   mutex_lock(&cache->lock);
  +
  +   for (ix = 0; ix < cache->sz; ix++)
  +       if (cache->data_vec[ix].attr.ndev == ndev)
  +           write_gid(ib_dev, port, cache, ix, &zgid, &zattr);
  +
  +   mutex_unlock(&cache->lock);
  +   return 0;
 
 The traditional Linux kernel coding style is one blank line before
 mutex_lock() and after mutex_unlock() but not after mutex_lock() nor
 before mutex_unlock().
 
  +   orig_seq = ACCESS_ONCE(cache->data_vec[index].seq);
  +   /* Make sure we read the sequence number before copying the
  +    * gid to local storage. */
  +   smp_rmb();
 
 Please use READ_ONCE() instead of ACCESS_ONCE() as recommended in
 <linux/compiler.h>.
 
  +static void free_roce_gid_cache(struct ib_device *ib_dev, u8 port)
  +{
  +   int i;
  +   struct ib_roce_gid_cache *cache =
  +       ib_dev->cache.roce_gid_cache[port - 1];
  +
  +   if (!cache)
  +       return;
  +
  +   for (i = 0; i < cache->sz; ++i) {
  +       if (memcmp(&cache->data_vec[i].gid, &zgid,
  +                  sizeof(cache->data_vec[i].gid)))
  +           write_gid(ib_dev, port, cache, i, &zgid, &zattr);
  +   }
  +   kfree(cache->data_vec);
  +   kfree(cache);
  +}
 
 Overwriting data just before it is freed is not useful. Please use
 CONFIG_SLUB_DEBUG=y to debug use-after-free issues instead of such
 code.
 
 Bart.


[PATCH v3 for-next 22/33] IB/mlx4: Lock with RCU instead of RTNL

2015-03-24 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

The function eth_link_query_port() used to take the RTNL lock whenever a
call to netdev_master_upper_dev_get() was necessary. This made it
impossible to call the function while the RTNL lock is already held.
Calling netdev_master_upper_dev_get_rcu() under the RCU read lock instead
solves this problem.

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/main.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index d8b227e..32cd009 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -367,14 +367,15 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->state		= IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
 	props->active_mtu	= IB_MTU_256;
-	if (is_bonded)
-		rtnl_lock(); /* required to get upper dev */
 	down_read(&iboe->sem);
 	ndev = iboe->netdevs[port - 1];
-	if (ndev && is_bonded)
-		ndev = netdev_master_upper_dev_get(ndev);
+	if (ndev && is_bonded) {
+		rcu_read_lock(); /* required to get upper dev */
+		ndev = netdev_master_upper_dev_get_rcu(ndev);
+		rcu_read_unlock();
+	}
 	if (!ndev)
-		goto out_unlock;
+		goto unlock;
 
 	tmp = iboe_get_mtu(ndev->mtu);
 	props->active_mtu = tmp ? min(props->max_mtu, tmp) : IB_MTU_256;
@@ -382,10 +383,8 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->state		= (netif_running(ndev) && netif_carrier_ok(ndev)) ?
 					IB_PORT_ACTIVE : IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
-out_unlock:
+unlock:
 	up_read(&iboe->sem);
-	if (is_bonded)
-		rtnl_unlock();
 out:
 	mlx4_free_cmd_mailbox(mdev->dev, mailbox);
 	return err;
-- 
2.1.0



[PATCH v3 for-next 17/33] RDMA/ocrdma: changes to support RoCE-v2 in UD path

2015-03-24 Thread Somnath Kotur
From: Devesh Sharma devesh.sha...@emulex.com

To support the UD protocol, this patch adds the following
changes to the existing UD implementation:

1. AH creation resolves the GID type for a given index.
2. The protocol header is built based on the GID type.
3. Work completion reports the l3-type if the f/w supports RoCE-v2
   and sets the IB_WC_WITH_NETWORK_HDR_TYPE flag in wc->wc_flags.

Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
Signed-off-by: Devesh Sharma devesh.sha...@emulex.com
---
 drivers/infiniband/hw/ocrdma/ocrdma.h   |  1 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c| 69 -
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |  5 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 23 --
 4 files changed, 81 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h 
b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 97f971a..302fd0e 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -341,6 +341,7 @@ struct ocrdma_ah {
struct ocrdma_av *av;
u16 sgid_index;
u32 id;
+   u8 hdr_type;
 };
 
 struct ocrdma_qp_hwq_info {
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 7ecd230..1bb72a0 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -39,6 +39,20 @@
 
 #define OCRDMA_VID_PCP_SHIFT   0xD
 
+static u16 ocrdma_hdr_type_to_proto_num(u8 hdr_type)
+{
+   switch (hdr_type) {
+   case OCRDMA_L3_TYPE_IB_GRH:
+   return (u16)0x8915;
+   case OCRDMA_L3_TYPE_IPV4:
+   return (u16)0x0800;
+   case OCRDMA_L3_TYPE_IPV6:
+   return (u16)0x86dd;
+   default:
+   return 0;
+   }
+}
+
 static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
struct ib_ah_attr *attr, union ib_gid *sgid,
int pdid, bool *isvlan, u16 vlan_tag)
@@ -47,22 +61,33 @@ static inline int set_av_attr(struct ocrdma_dev *dev, 
struct ocrdma_ah *ah,
struct ocrdma_eth_vlan eth;
struct ocrdma_grh grh;
int eth_sz;
+   u16 proto_num = 0;
+   u8 nxthdr = 0x11;
+   struct iphdr ipv4;
+   union {
+   struct sockaddr _sockaddr;
+   struct sockaddr_in  _sockaddr_in;
+   struct sockaddr_in6 _sockaddr_in6;
+   } sgid_addr, dgid_addr;
 
 	memset(&eth, 0, sizeof(eth));
 	memset(&grh, 0, sizeof(grh));
+	/* Protocol Number */
+	proto_num = ocrdma_hdr_type_to_proto_num(ah->hdr_type);
+	nxthdr = (proto_num == 0x8915) ? 0x1b : 0x11;
 
 	/* VLAN */
 	if (!vlan_tag || (vlan_tag > 0xFFF))
 		vlan_tag = dev->pvid;
 	if (vlan_tag && (vlan_tag < 0x1000)) {
 		eth.eth_type = cpu_to_be16(0x8100);
-		eth.roce_eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE);
+		eth.roce_eth_type = cpu_to_be16(proto_num);
 		vlan_tag |= (dev->sl & 0x07) << OCRDMA_VID_PCP_SHIFT;
 		eth.vlan_tag = cpu_to_be16(vlan_tag);
 		eth_sz = sizeof(struct ocrdma_eth_vlan);
 		*isvlan = true;
 	} else {
-		eth.eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE);
+		eth.eth_type = cpu_to_be16(proto_num);
 		eth_sz = sizeof(struct ocrdma_eth_basic);
 	}
 	/* MAC */
@@ -71,18 +96,34 @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
 	if (status)
 		return status;
 	ah->sgid_index = attr->grh.sgid_index;
-	memcpy(&grh.sgid[0], sgid->raw, sizeof(union ib_gid));
-	memcpy(&grh.dgid[0], attr->grh.dgid.raw, sizeof(attr->grh.dgid.raw));
-
-	grh.tclass_flow = cpu_to_be32((6 << 28) |
-			(attr->grh.traffic_class << 24) |
-			attr->grh.flow_label);
-	/* 0x1b is next header value in GRH */
-	grh.pdid_hoplimit = cpu_to_be32((pdid << 16) |
-			(0x1b << 8) | attr->grh.hop_limit);
 	/* Eth HDR */
 	memcpy(&ah->av->eth_hdr, &eth, eth_sz);
-	memcpy((u8 *)ah->av + eth_sz, &grh, sizeof(struct ocrdma_grh));
+	if (ah->hdr_type == RDMA_NETWORK_IPV4) {
+		*((__be16 *)&ipv4) = htons((4 << 12) | (5 << 8) |
+					   attr->grh.traffic_class);
+		ipv4.id = cpu_to_be16(pdid);
+		ipv4.frag_off = htons(IP_DF);
+		ipv4.tot_len = htons(0);
+		ipv4.ttl = attr->grh.hop_limit;
+		ipv4.protocol = nxthdr;
+		rdma_gid2ip(&sgid_addr._sockaddr, sgid);
+		ipv4.saddr = sgid_addr._sockaddr_in.sin_addr.s_addr;
+		rdma_gid2ip(&dgid_addr._sockaddr, &attr->grh.dgid);
+		ipv4.daddr = dgid_addr._sockaddr_in.sin_addr.s_addr;
+		memcpy((u8 *)ah->av + eth_sz, &ipv4, sizeof(struct iphdr));
+	} else {
+		memcpy(&grh.sgid[0], sgid->raw

[PATCH v3 for-next 24/33] IB/mlx4: Advertise RoCE support in port capabilities

2015-03-24 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

The port capability flags should indicate which RoCE modes (V1 or V2)
the port supports. The mlx4 driver sets these flags according to the
capabilities reported by the HW.
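
A small standalone sketch of the translation described above, from hardware capability bits to port capability flags. The `PORT_CAP_*` bit positions mirror bits 27 and 28 proposed for IB_PORT_ROCE and IB_PORT_ROCE_V2 in this series; the `HW_CAP_*` bits and the function name are hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Port capability bits as proposed in this series. */
#define PORT_CAP_ROCE		(1u << 27)
#define PORT_CAP_ROCE_V2	(1u << 28)

/* Hypothetical hardware capability bits, for illustration only. */
#define HW_CAP_ROCE_V1		(1u << 0)
#define HW_CAP_ROCE_V1_V2	(1u << 1)

/* Derive the RoCE-related port capability flags from the HW report. */
static uint32_t roce_port_caps(uint32_t hw_caps)
{
	uint32_t caps = 0;

	if (hw_caps & HW_CAP_ROCE_V1)
		caps |= PORT_CAP_ROCE;
	/* Hardware that does V2 also does V1, so both bits are set. */
	if (hw_caps & HW_CAP_ROCE_V1_V2)
		caps |= PORT_CAP_ROCE_V2 | PORT_CAP_ROCE;

	return caps;
}
```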

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/main.c |  6 ++
 drivers/net/ethernet/mellanox/mlx4/fw.c   |  5 -
 drivers/net/ethernet/mellanox/mlx4/main.c |  6 +-
 include/linux/mlx4/device.h   | 13 ++---
 4 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 32cd009..bf87a95 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -359,6 +359,12 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 						IB_WIDTH_4X : IB_WIDTH_1X;
 	props->active_speed	= IB_SPEED_QDR;
 	props->port_cap_flags	= IB_PORT_CM_SUP | IB_PORT_IP_BASED_GIDS;
+
+	if (mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE)
+		props->port_cap_flags	|= IB_PORT_ROCE;
+	if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2)
+		props->port_cap_flags	|= IB_PORT_ROCE_V2 | IB_PORT_ROCE;
+
 	props->gid_tbl_len	= mdev->dev->caps.gid_table_len[port];
 	props->max_msg_sz	= mdev->dev->caps.max_msg_sz;
 	props->pkey_tbl_len	= 1;
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c 
b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 3702fd1..d573e73 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -146,7 +146,8 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 flags)
 		[17] = "Asymmetric EQs support",
 		[18] = "More than 80 VFs support",
 		[19] = "Performance optimized for limited rule configuration flow steering support",
-		[21] = "Port Remap support"
+		[21] = "Port Remap support",
+		[22] = "RoCEv2 support"
 	};
 	int i;
int i;
 
@@ -852,6 +853,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_EQE_STRIDE;
 	MLX4_GET(dev_cap->bmme_flags, outbox,
 		 QUERY_DEV_CAP_BMME_FLAGS_OFFSET);
+	if (dev_cap->bmme_flags & MLX4_FLAG_ROCE_V1_V2)
+		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_ROCE_V1_V2;
 	if (dev_cap->bmme_flags & MLX4_FLAG_PORT_REMAP)
 		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_PORT_REMAP;
 	MLX4_GET(field, outbox, QUERY_DEV_CAP_CONFIG_DEV_OFFSET);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index 1893a57..29c60fd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -386,8 +386,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	if (mlx4_priv(dev)->pci_dev_data & MLX4_PCI_DEV_FORCE_SENSE_PORT)
 		dev->caps.flags |= MLX4_DEV_CAP_FLAG_SENSE_SUPPORT;
 	/* Don't do sense port on multifunction devices (for now at least) */
-	if (mlx4_is_mfunc(dev))
+	/* Don't do enable RoCE V2 on multifunction devices */
+	if (mlx4_is_mfunc(dev)) {
 		dev->caps.flags &= ~MLX4_DEV_CAP_FLAG_SENSE_SUPPORT;
+		dev_cap->flags2 &= ~MLX4_DEV_CAP_FLAG2_ROCE_V1_V2;
+		mlx4_dbg(dev, "RoCE V2 is not supported when SR-IOV is enabled\n");
+	}
 
 	if (mlx4_low_memory_profile()) {
 		dev->caps.log_num_macs  = MLX4_MIN_LOG_NUM_MAC;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 9a05e73..9bdf157 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -202,7 +202,8 @@ enum {
 	MLX4_DEV_CAP_FLAG2_SYS_EQS		= 1LL <<  17,
 	MLX4_DEV_CAP_FLAG2_80_VFS		= 1LL <<  18,
 	MLX4_DEV_CAP_FLAG2_FS_A0		= 1LL <<  19,
-	MLX4_DEV_CAP_FLAG2_PORT_REMAP		= 1LL <<  21
+	MLX4_DEV_CAP_FLAG2_PORT_REMAP		= 1LL <<  21,
+	MLX4_DEV_CAP_FLAG2_ROCE_V1_V2		= 1LL <<  22
 };
 
 enum {
@@ -250,6 +251,7 @@ enum {
 	MLX4_BMME_FLAG_TYPE_2_WIN	= 1 <<  9,
 	MLX4_BMME_FLAG_RESERVED_LKEY	= 1 << 10,
 	MLX4_BMME_FLAG_FAST_REG_WR	= 1 << 11,
+	MLX4_BMME_FLAG_ROCE_V1_V2	= 1 << 19,
 	MLX4_BMME_FLAG_PORT_REMAP	= 1 << 24,
 	MLX4_BMME_FLAG_VSD_INIT2RTR	= 1 << 28,
 };
@@ -258,6 +260,10 @@ enum {
MLX4_FLAG_PORT_REMAP= MLX4_BMME_FLAG_PORT_REMAP
 };
 
+enum {
+   MLX4_FLAG_ROCE_V1_V2= MLX4_BMME_FLAG_ROCE_V1_V2
+};
+
 enum mlx4_event {
MLX4_EVENT_TYPE_COMP   = 0x00,
MLX4_EVENT_TYPE_PATH_MIG   = 0x01,
@@ -888,9 +894,10 @@ struct mlx4_mad_ifc {
if (((dev)-caps.port_mask[port

[PATCH v3 for-next 25/33] IB/mlx4: Implement ib_device callback - get_netdev

2015-03-24 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

This is a new callback that is required for RoCEv2 support.
In port aggregation mode it must return the netdev of the
active port, so the mlx4 core driver needs support for identifying
that port.

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/main.c | 29 +
 drivers/net/ethernet/mellanox/mlx4/main.c | 18 ++
 include/linux/mlx4/driver.h   |  1 +
 3 files changed, 48 insertions(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index bf87a95..04e6603 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -47,6 +47,8 @@
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
 
+#include <net/bonding.h>
+
 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/cmd.h>
 #include <linux/mlx4/qp.h>
@@ -1527,6 +1529,32 @@ unlock:
 	mutex_unlock(&ibdev->qp1_proxy_lock[port - 1]);
 }
 
+static struct net_device *mlx4_ib_get_netdev(struct ib_device *device, u8 port_num)
+{
+	struct mlx4_ib_dev *ibdev = to_mdev(device);
+
+	if (mlx4_is_bonded(ibdev->dev)) {
+		struct net_device *dev;
+		struct net_device *upper = NULL;
+
+		rcu_read_lock();
+
+		dev = mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+		if (dev)
+			upper = netdev_master_upper_dev_get_rcu(dev);
+		else
+			goto unlock;
+		if (upper)
+			dev = bond_option_active_slave_get_rcu(netdev_priv(upper));
+unlock:
+		rcu_read_unlock();
+
+		return dev;
+	}
+
+	return mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+}
+
 static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 				 struct net_device *dev,
 				 unsigned long event)
@@ -1806,6 +1834,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	ibdev->ib_dev.attach_mcast	= mlx4_ib_mcg_attach;
 	ibdev->ib_dev.detach_mcast	= mlx4_ib_mcg_detach;
 	ibdev->ib_dev.process_mad	= mlx4_ib_process_mad;
+	ibdev->ib_dev.get_netdev	= mlx4_ib_get_netdev;
 
 	if (!mlx4_is_slave(ibdev->dev)) {
 		ibdev->ib_dev.alloc_fmr		= mlx4_ib_fmr_alloc;
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index 29c60fd..3f469d3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1241,6 +1241,24 @@ int mlx4_port_map_set(struct mlx4_dev *dev, struct 
mlx4_port_map *v2p)
 }
 EXPORT_SYMBOL_GPL(mlx4_port_map_set);
 
+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport)
+{
+   struct mlx4_priv *priv = mlx4_priv(dev);
+
+   if (!pport)
+   return -EINVAL;
+   *pport = 0;
+
+	if (vport == 1)
+		*pport = priv->v2p.port1;
+	else if (vport == 2)
+		*pport = priv->v2p.port2;
+   if (!*pport)
+   return -EINVAL;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(mlx4_port_map_get);
+
 static int mlx4_load_fw(struct mlx4_dev *dev)
 {
struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 5a06d96..a992971 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -81,6 +81,7 @@ struct mlx4_port_map {
 };
 
 int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p);
+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport);
 
 void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, 
int port);
 
-- 
2.1.0



[PATCH v3 for-next 32/33] IB/mlx4: Create and use another QP1 for RoCEv2

2015-03-24 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

The mlx4 driver uses a special QP to implement the GSI QP. This kind of
QP allows the InfiniBand headers to be built in SW and placed before the
payload that comes in with the WR. The mlx4 HW builds the packet,
calculates the ICRC and puts it at the end of the payload. This ICRC
calculation, however, depends on the QP configuration, which is determined
when the QP is modified (roce_mode during INIT-RTR). On the other hand, ICRC
verification when a packet is received does not depend on this
configuration.
Therefore, 2 GSI QPs for send (one for each RoCE version) and 1
GSI QP for receive are required.
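
The arrangement described above (two send QPs, one receive QP) can be pictured with the following userspace sketch. All names here are hypothetical, and two details are assumptions made only for the example: that the primary QP doubles as the RoCE v1 send QP and the receive QP, and that sends fall back to the primary QP when no RoCE v2 QP has been set up.

```c
#include <assert.h>
#include <stddef.h>

enum roce_version { ROCE_V1, ROCE_V2 };

/* Hypothetical stand-in for a QP handle. */
struct mock_qp {
	int qpn;
};

/* One primary GSI QP plus an optional extra send-only QP for RoCE v2. */
struct mock_gsi {
	struct mock_qp primary;
	struct mock_qp *roce_v2_send;	/* NULL when RoCE v2 is not set up */
};

/* Pick the QP whose configuration matches the RoCE version being sent. */
static struct mock_qp *gsi_send_qp(struct mock_gsi *gsi, enum roce_version v)
{
	if (v == ROCE_V2 && gsi->roce_v2_send)
		return gsi->roce_v2_send;
	return &gsi->primary;
}

/* Receive does not depend on the RoCE mode, so one QP is enough. */
static struct mock_qp *gsi_recv_qp(struct mock_gsi *gsi)
{
	return &gsi->primary;
}
```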

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   7 ++
 drivers/infiniband/hw/mlx4/qp.c  | 155 +++
 2 files changed, 144 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h 
b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 018bda6..a853330 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -159,11 +159,18 @@ struct mlx4_ib_wq {
 	unsigned		tail;
 };
 
+enum {
+	MLX4_IB_QP_CREATE_ROCE_V2_GSI = IB_QP_CREATE_RESERVED_START
+};
+
 enum mlx4_ib_qp_flags {
 	MLX4_IB_QP_LSO = IB_QP_CREATE_IPOIB_UD_LSO,
 	MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK = IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK,
 	MLX4_IB_QP_NETIF = IB_QP_CREATE_NETIF_QP,
 	MLX4_IB_QP_CREATE_USE_GFP_NOIO = IB_QP_CREATE_USE_GFP_NOIO,
+
+	/* Mellanox specific flags start from IB_QP_CREATE_RESERVED_START */
+	MLX4_IB_ROCE_V2_GSI_QP = MLX4_IB_QP_CREATE_ROCE_V2_GSI,
 	MLX4_IB_SRIOV_TUNNEL_QP = 1 << 30,
 	MLX4_IB_SRIOV_SQP = 1 << 31,
 };
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index fb37415..b54f315 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -81,6 +81,7 @@ struct mlx4_ib_sqp {
u32 send_psn;
struct ib_ud_header ud_header;
u8  header_buf[MLX4_IB_UD_HEADER_SIZE];
+   struct ib_qp*roce_v2_gsi;
 };
 
 enum {
@@ -150,7 +151,10 @@ static int is_sqp(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp)
 			}
 		}
 	}
-	return proxy_sqp;
+	if (proxy_sqp)
+		return 1;
+
+	return !!(qp->flags & MLX4_IB_ROCE_V2_GSI_QP);
 }
 
 /* used for INIT/CLOSE port logic */
@@ -672,6 +676,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct 
ib_pd *pd,
qp = sqp-qp;
qp-pri.vid = 0x;
qp-alt.vid = 0x;
+   sqp-roce_v2_gsi = NULL;
} else {
qp = kzalloc(sizeof (struct mlx4_ib_qp), gfp);
if (!qp)
@@ -1029,9 +1034,17 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, 
struct mlx4_ib_qp *qp,
del_gid_entries(qp);
 }
 
-static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr)
+static int get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr)
 {
 	/* Native or PPF */
+	if ((!mlx4_is_mfunc(dev->dev) || mlx4_is_master(dev->dev)) &&
+	    attr->create_flags & MLX4_IB_QP_CREATE_ROCE_V2_GSI) {
+		int sqpn;
+		int res = mlx4_qp_reserve_range(dev->dev, 1, 1, &sqpn, 0);
+
+		return res ? -abs(res) : sqpn;
+	}
+
 	if (!mlx4_is_mfunc(dev->dev) ||
 	    (mlx4_is_master(dev->dev) &&
 	     attr->create_flags & MLX4_IB_SRIOV_SQP)) {
@@ -1039,6 +1052,7 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct 
ib_qp_init_attr *attr)
(attr-qp_type == IB_QPT_SMI ? 0 : 2) +
attr-port_num - 1;
}
+
/* PF or VF -- creating proxies */
if (attr-qp_type == IB_QPT_SMI)
return dev-dev-caps.qp0_proxy[attr-port_num - 1];
@@ -1046,9 +1060,9 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct 
ib_qp_init_attr *attr)
return dev-dev-caps.qp1_proxy[attr-port_num - 1];
 }
 
-struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
-   struct ib_qp_init_attr *init_attr,
-   struct ib_udata *udata)
+static struct ib_qp *_mlx4_ib_create_qp(struct ib_pd *pd,
+   struct ib_qp_init_attr *init_attr,
+   struct ib_udata *udata)
 {
struct mlx4_ib_qp *qp = NULL;
int err;
@@ -1066,6 +1080,7 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
MLX4_IB_SRIOV_TUNNEL_QP |
MLX4_IB_SRIOV_SQP |
MLX4_IB_QP_NETIF |
+   MLX4_IB_QP_CREATE_ROCE_V2_GSI

[PATCH v3 for-next 19/33] RDMA/ocrdma: changes to support user AH creation

2015-03-24 Thread Somnath Kotur
From: Devesh Sharma devesh.sha...@emulex.com

To support user-space AH, this patch uses the ahid field to convey
the l3-type to the user-space library. The library is responsible
for decoding the l3-type out of ahid.
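
To make the packing concrete, here is a standalone sketch of how the l3-type could be encoded into and decoded out of a 32-bit ahid word. The mask and shift values match the `OCRDMA_AH_*` constants in this patch; the function names are hypothetical and this is not the library's actual decoder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the OCRDMA_AH_* constants in ocrdma_ah.h. */
#define AH_ID_MASK		0x3FFu
#define AH_VLAN_VALID_MASK	0x01u
#define AH_VLAN_VALID_SHIFT	0x1F
#define AH_L3_TYPE_MASK		0x03u
#define AH_L3_TYPE_SHIFT	0x1D

/* Kernel side: pack AH id, l3-type and the VLAN-valid bit into one word. */
static uint32_t ahid_encode(uint32_t id, uint8_t l3_type, bool vlan_valid)
{
	uint32_t v = id & AH_ID_MASK;

	v |= ((uint32_t)l3_type & AH_L3_TYPE_MASK) << AH_L3_TYPE_SHIFT;
	if (vlan_valid)
		v |= AH_VLAN_VALID_MASK << AH_VLAN_VALID_SHIFT;
	return v;
}

/* User-space side: pull the individual fields back out. */
static uint32_t ahid_id(uint32_t ahid)
{
	return ahid & AH_ID_MASK;
}

static uint8_t ahid_l3_type(uint32_t ahid)
{
	return (ahid >> AH_L3_TYPE_SHIFT) & AH_L3_TYPE_MASK;
}

static bool ahid_vlan_valid(uint32_t ahid)
{
	return (ahid >> AH_VLAN_VALID_SHIFT) & AH_VLAN_VALID_MASK;
}
```

With these values the l3-type occupies bits 29 and 30, directly below the VLAN-valid bit 31, and the AH id stays in the low 10 bits.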

Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
Signed-off-by: Devesh Sharma devesh.sha...@emulex.com
---
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 5 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 1bb72a0..65a39cc 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -191,6 +191,11 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
 		ahid_addr = pd->uctx->ah_tbl.va + attr->dlid;
 		*ahid_addr = 0;
 		*ahid_addr |= ah->id & OCRDMA_AH_ID_MASK;
+		if (ocrdma_is_rocev2_supported(dev)) {
+			*ahid_addr |= ((u32)ah->hdr_type &
+				       OCRDMA_AH_L3_TYPE_MASK) <<
+				       OCRDMA_AH_L3_TYPE_SHIFT;
+		}
 		if (isvlan)
 			*ahid_addr |= (OCRDMA_AH_VLAN_VALID_MASK <<
 				       OCRDMA_AH_VLAN_VALID_SHIFT);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
index 726a87c..ed45ecd 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
@@ -31,9 +31,10 @@
 enum {
OCRDMA_AH_ID_MASK   = 0x3FF,
OCRDMA_AH_VLAN_VALID_MASK   = 0x01,
-   OCRDMA_AH_VLAN_VALID_SHIFT  = 0x1F
+   OCRDMA_AH_VLAN_VALID_SHIFT  = 0x1F,
+   OCRDMA_AH_L3_TYPE_MASK  = 0x03,
+   OCRDMA_AH_L3_TYPE_SHIFT = 0x1D /* 29 bits */
 };
-
 struct ib_ah *ocrdma_create_ah(struct ib_pd *, struct ib_ah_attr *);
 int ocrdma_destroy_ah(struct ib_ah *);
 int ocrdma_query_ah(struct ib_ah *, struct ib_ah_attr *);
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
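
For readers following the ahid layout, here is a minimal userspace sketch of how the id, l3-type and VLAN-valid fields could be packed and unpacked with the masks and shifts from the ocrdma_ah.h hunk. The helper names (ahid_encode, ahid_l3_type, ahid_id, ahid_vlan_valid) are hypothetical and are not part of the driver or of any user-space library.

```c
#include <stdint.h>

/* Mask and shift values copied from the ocrdma_ah.h hunk. */
enum {
	OCRDMA_AH_ID_MASK          = 0x3FF,
	OCRDMA_AH_VLAN_VALID_MASK  = 0x01,
	OCRDMA_AH_VLAN_VALID_SHIFT = 0x1F,
	OCRDMA_AH_L3_TYPE_MASK     = 0x03,
	OCRDMA_AH_L3_TYPE_SHIFT    = 0x1D
};

/* Hypothetical helper: pack id, l3-type and VLAN-valid into one 32-bit word. */
uint32_t ahid_encode(uint32_t id, uint32_t l3_type, int isvlan)
{
	uint32_t ahid = id & OCRDMA_AH_ID_MASK;

	ahid |= (l3_type & OCRDMA_AH_L3_TYPE_MASK) << OCRDMA_AH_L3_TYPE_SHIFT;
	if (isvlan)
		ahid |= (uint32_t)OCRDMA_AH_VLAN_VALID_MASK <<
			OCRDMA_AH_VLAN_VALID_SHIFT;
	return ahid;
}

/* Hypothetical decoders, as a user-space library might implement them. */
uint32_t ahid_l3_type(uint32_t ahid)
{
	return (ahid >> OCRDMA_AH_L3_TYPE_SHIFT) & OCRDMA_AH_L3_TYPE_MASK;
}

uint32_t ahid_id(uint32_t ahid)
{
	return ahid & OCRDMA_AH_ID_MASK;
}

int ahid_vlan_valid(uint32_t ahid)
{
	return (ahid >> OCRDMA_AH_VLAN_VALID_SHIFT) & OCRDMA_AH_VLAN_VALID_MASK;
}
```

With these values the l3-type occupies bits 29 and 30, directly below the VLAN-valid bit 31, so the three fields do not overlap.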


[PATCH v3 for-next 21/33] IB/mlx4: Replace spin_lock with rw_semaphore

2015-03-24 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

Protection of iboe->netdevs no longer needs to be taken from an atomic context,
so replacing the spin_lock with an rw_semaphore is allowed and makes more sense.

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/main.c| 27 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  2 +-
 2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 91caffc..d8b227e 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -369,7 +369,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->active_mtu	= IB_MTU_256;
 	if (is_bonded)
 		rtnl_lock(); /* required to get upper dev */
-	spin_lock_bh(&iboe->lock);
+	down_read(&iboe->sem);
 	ndev = iboe->netdevs[port - 1];
 	if (ndev && is_bonded)
 		ndev = netdev_master_upper_dev_get(ndev);
@@ -383,7 +383,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 					IB_PORT_ACTIVE : IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
 out_unlock:
-	spin_unlock_bh(&iboe->lock);
+	up_read(&iboe->sem);
 	if (is_bonded)
 		rtnl_unlock();
 out:
@@ -825,11 +825,11 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 	if (!mqp->port)
 		return 0;
 
-	spin_lock_bh(&mdev->iboe.lock);
+	down_read(&mdev->iboe.sem);
 	ndev = mdev->iboe.netdevs[mqp->port - 1];
 	if (ndev)
 		dev_hold(ndev);
-	spin_unlock_bh(&mdev->iboe.lock);
+	up_read(&mdev->iboe.sem);
 
 	if (ndev) {
 		ret = 1;
@@ -1330,7 +1330,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
 	struct mlx4_dev *dev = mdev->dev;
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
-	struct net_device *ndev;
 	struct mlx4_ib_gid_entry *ge;
 	enum mlx4_protocol prot =  MLX4_PROT_IB_IPV6;
 	struct mlx4_flow_reg_id reg_id = {0, 0};
@@ -1370,13 +1369,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	mutex_lock(&mqp->mutex);
 	ge = find_gid_entry(mqp, gid->raw);
 	if (ge) {
-		spin_lock_bh(&mdev->iboe.lock);
-		ndev = ge->added ? mdev->iboe.netdevs[ge->port - 1] : NULL;
-		if (ndev)
-			dev_hold(ndev);
-		spin_unlock_bh(&mdev->iboe.lock);
-		if (ndev)
-			dev_put(ndev);
 		list_del(&ge->list);
 		kfree(ge);
 	} else
@@ -1543,7 +1535,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 
 	iboe = &ibdev->iboe;
 
-	spin_lock_bh(&iboe->lock);
+	down_write(&iboe->sem);
 	mlx4_foreach_ib_transport_port(port, ibdev->dev) {
 
 		iboe->netdevs[port - 1] =
@@ -1555,7 +1547,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 			update_qps_port = port;
 
 	}
-	spin_unlock_bh(&iboe->lock);
+	up_write(&iboe->sem);
 
 	if (update_qps_port > 0)
 		mlx4_ib_update_qps(ibdev, dev, update_qps_port);
@@ -1848,7 +1840,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 
 	mlx4_ib_alloc_eqs(dev, ibdev);
 
-	spin_lock_init(&iboe->lock);
+	init_rwsem(&iboe->sem);
 
 	if (init_node_data(ibdev))
 		goto err_map;
@@ -2153,7 +2145,8 @@ static void handle_bonded_port_state_event(struct work_struct *work)
 	struct ib_event ibev;
 
 	kfree(ew);
-	spin_lock_bh(&ibdev->iboe.lock);
+
+	down_read(&ibdev->iboe.sem);
 	for (i = 0; i < MLX4_MAX_PORTS; ++i) {
 		struct net_device *curr_netdev = ibdev->iboe.netdevs[i];
 
@@ -2165,7 +2158,7 @@ static void handle_bonded_port_state_event(struct work_struct *work)
 		bonded_port_state = (bonded_port_state != IB_PORT_ACTIVE) ?
 			curr_port_state : IB_PORT_ACTIVE;
 	}
-	spin_unlock_bh(&ibdev->iboe.lock);
+	up_read(&ibdev->iboe.sem);
 
 	ibev.device = &ibdev->ib_dev;
 	ibev.element.port_num = 1;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index e3805a4..166ebf9 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -455,7 +455,7 @@ struct mlx4_ib_sriov {
 };
 
 struct mlx4_ib_iboe {
-	spinlock_t		lock;
+	struct rw_semaphore	sem; /* guard from concurrent access to data in this struct */
 	struct net_device      *netdevs[MLX4_MAX_PORTS];
 	atomic64_t		mac[MLX4_MAX_PORTS];
 	struct notifier_block	nb;
-- 
2.1.0


[PATCH v3 for-next 28/33] IB/mlx4: Translate cache gid index to real index

2015-03-24 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

When a QP is modified with a path, the given sgid_index is not necessarily
the index that the HW knows. This is due to optimizations that can save
space in the HW table. Therefore, translation is required.

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/qp.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 847f9ec..d7d7c5a 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1256,14 +1256,18 @@ static int _mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 	path->static_rate = 0;
 
 	if (ah->ah_flags & IB_AH_GRH) {
-		if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len[port]) {
+		int real_sgid_index = mlx4_ib_gid_index_to_real_index(dev,
+								      port,
+								      ah->grh.sgid_index);
+
+		if (real_sgid_index >= dev->dev->caps.gid_table_len[port]) {
 			pr_err("sgid_index (%u) too large. max is %d\n",
-			       ah->grh.sgid_index, dev->dev->caps.gid_table_len[port] - 1);
+			       real_sgid_index, dev->dev->caps.gid_table_len[port] - 1);
 			return -1;
 		}
 
 		path->grh_mylmc |= 1 << 7;
-		path->mgid_index = ah->grh.sgid_index;
+		path->mgid_index = real_sgid_index;
 		path->hop_limit  = ah->grh.hop_limit;
 		path->tclass_flowlabel =
 			cpu_to_be32((ah->grh.traffic_class << 20) |
-- 
2.1.0
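
The translation step described in the commit message can be sketched in plain C as a lookup from the cache index to the HW index, followed by a range check. This is only an illustration under stated assumptions: struct gid_xlate, MAX_GIDS and cache_to_real_index() are hypothetical names, and the sketch also rejects negative results, which the hunk above does not show.

```c
#include <errno.h>

#define MAX_GIDS 8 /* assumption: small table size chosen for illustration */

/* Hypothetical per-port translation state. */
struct gid_xlate {
	int real_index[MAX_GIDS]; /* cache index -> HW index, -1 if unmapped */
	int hw_table_len;         /* number of valid HW table slots */
};

/* Return the HW index for a cache index, or a negative errno value. */
int cache_to_real_index(const struct gid_xlate *x, int cache_index)
{
	int real;

	if (cache_index < 0 || cache_index >= MAX_GIDS)
		return -EINVAL;

	real = x->real_index[cache_index];
	if (real < 0 || real >= x->hw_table_len)
		return -EINVAL;

	return real;
}
```

Two cache indexes may map to the same HW index, which is what allows the HW table to be smaller than the cache table.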



[PATCH v3 for-next 33/33] IB/cma: Join and leave multicast groups with IGMP

2015-03-24 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

Since RoCEv2 runs over an IP header, IGMP join and leave requests
must be sent to the network when joining and leaving
multicast groups.

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/cma.c   | 78 ++---
 drivers/infiniband/core/multicast.c | 18 -
 include/rdma/ib_sa.h|  3 ++
 3 files changed, 92 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 6f345e2..8f997d7 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -38,6 +38,7 @@
 #include <linux/in6.h>
 #include <linux/mutex.h>
 #include <linux/random.h>
+#include <linux/igmp.h>
 #include <linux/idr.h>
 #include <linux/inetdevice.h>
 #include <linux/slab.h>
@@ -196,6 +197,7 @@ struct cma_multicast {
 	void			*context;
 	struct sockaddr_storage	addr;
 	struct kref		mcref;
+	bool			igmp_joined;
 };
 
 struct cma_work {
@@ -283,6 +285,26 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
 	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
 }
 
+static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join)
+{
+	struct in_device *in_dev = NULL;
+
+	if (ndev) {
+		rtnl_lock();
+		in_dev = __in_dev_get_rtnl(ndev);
+		if (in_dev) {
+			if (join)
+				ip_mc_inc_group(in_dev,
+						*(__be32 *)(mgid->raw+12));
+			else
+				ip_mc_dec_group(in_dev,
+						*(__be32 *)(mgid->raw+12));
+		}
+		rtnl_unlock();
+	}
+	return (in_dev) ? 0 : -ENODEV;
+}
+
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 			      struct cma_device *cma_dev)
 {
@@ -1076,6 +1098,20 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
 			kfree(mc);
 			break;
 		case IB_LINK_LAYER_ETHERNET:
+			if (mc->igmp_joined) {
+				struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
+				struct net_device *ndev = NULL;
+
+				if (dev_addr->bound_dev_if)
+					ndev = dev_get_by_index(&init_net,
+								dev_addr->bound_dev_if);
+				if (ndev) {
+					cma_igmp_send(ndev,
+						      &mc->multicast.ib->rec.mgid,
+						      false);
+					dev_put(ndev);
+				}
+			}
 			kref_put(&mc->mcref, release_mc);
 			break;
 		default:
@@ -3356,7 +3392,7 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 {
 	struct iboe_mcast_work *work;
 	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
-	int err;
+	int err = 0;
 	struct sockaddr *addr = (struct sockaddr *)&mc->addr;
 	struct net_device *ndev = NULL;
 
@@ -3388,13 +3424,30 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 	mc->multicast.ib->rec.rate = iboe_get_rate(ndev);
 	mc->multicast.ib->rec.hop_limit = 1;
 	mc->multicast.ib->rec.mtu = iboe_get_mtu(ndev->mtu);
+	mc->multicast.ib->rec.ifindex = dev_addr->bound_dev_if;
+	mc->multicast.ib->rec.net = &init_net;
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &mc->multicast.ib->rec.port_gid);
+
+	if (addr->sa_family == AF_INET) {
+		mc->multicast.ib->rec.gid_type =
+			id_priv->cma_dev->default_gid_type;
+		if (mc->multicast.ib->rec.gid_type == IB_GID_TYPE_ROCE_V2)
+			err = cma_igmp_send(ndev, &mc->multicast.ib->rec.mgid,
+					    true);
+		if (!err) {
+			mc->igmp_joined = true;
+			mc->multicast.ib->rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+		}
+	} else {
+		mc->multicast.ib->rec.gid_type = IB_GID_TYPE_IB;
+	}
 	dev_put(ndev);
-	if (!mc->multicast.ib->rec.mtu) {
+	if (err || !mc->multicast.ib->rec.mtu) {
 		err = -EINVAL;
 		goto out2;
 	}
-	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
-		    &mc->multicast.ib->rec.port_gid);
+
 	work->id = id_priv;
 	work->mc = mc;
INIT_WORK(work
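
cma_igmp_send() above reads the IPv4 group address from bytes 12..15 of the multicast GID. A minimal userspace sketch of that extraction follows, assuming the GID holds an IPv4-mapped address. The helper names gid_to_ipv4_group and ipv4_group_is_multicast are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy the last four bytes of a 16-byte GID.
 * The bytes stay in network byte order, as in the __be32 read in the patch.
 */
void gid_to_ipv4_group(const uint8_t gid[16], uint8_t group[4])
{
	memcpy(group, gid + 12, 4);
}

/* IPv4 multicast addresses fall in 224.0.0.0/4 (top nibble 0xE). */
int ipv4_group_is_multicast(const uint8_t group[4])
{
	return (group[0] & 0xF0) == 0xE0;
}
```

A caller would typically check ipv4_group_is_multicast() before using the extracted address as an IGMP group.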

[PATCH v3 for-next 26/33] IB/mlx4: Implement ib_device callback - modify_gid

2015-03-24 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

This is a new callback that is required for RoCEv2 support.
In RoCE, the GID table is managed in the IB core driver. The role of the
mlx4 driver is to synchronize the HW with the entries in the GID table.
Since the same GID value may appear more than once in the GID table
(though with different attributes), the mlx4 driver must maintain a
reference counting mechanism and populate the HW with a single value.
Since an index to the GID table is not necessarily the same as the index
of the matching entry in the HW GID table, a translation between indexes
is required.

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/main.c| 226 +++
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  18 +++
 include/linux/mlx4/cmd.h |   3 +-
 include/linux/mlx4/device.h  |   3 +-
 4 files changed, 248 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 04e6603..96a6ec0 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1555,6 +1555,230 @@ unlock:
 	return mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
 }
 
+static int mlx4_ib_update_gids_v1(struct gid_entry *gids,
+				  struct mlx4_ib_dev *ibdev,
+				  u8 port_num)
+{
+	struct mlx4_cmd_mailbox *mailbox;
+	int err;
+	struct mlx4_dev *dev = ibdev->dev;
+	int i;
+	union ib_gid *gid_tbl;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox))
+		return -ENOMEM;
+
+	gid_tbl = mailbox->buf;
+
+	for (i = 0; i < MLX4_MAX_PORT_GIDS; ++i)
+		memcpy(&gid_tbl[i], &gids[i].gid, sizeof(union ib_gid));
+
+	err = mlx4_cmd(dev, mailbox->dma,
+		       MLX4_SET_PORT_GID_TABLE << 8 | port_num,
+		       1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+		       MLX4_CMD_WRAPPED);
+	if (mlx4_is_bonded(dev))
+		err += mlx4_cmd(dev, mailbox->dma,
+				MLX4_SET_PORT_GID_TABLE << 8 | 2,
+				1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+				MLX4_CMD_WRAPPED);
+
+	mlx4_free_cmd_mailbox(dev, mailbox);
+	return err;
+}
+
+static int mlx4_ib_update_gids_v1_v2(struct gid_entry *gids,
+				     struct mlx4_ib_dev *ibdev,
+				     u8 port_num)
+{
+	struct mlx4_cmd_mailbox *mailbox;
+	int err;
+	struct mlx4_dev *dev = ibdev->dev;
+	int i;
+	struct {
+		union ib_gid	gid;
+		__be32		rsrvd1[2];
+		__be16		rsrvd2;
+		u8		type;
+		u8		version;
+		__be32		rsrvd3;
+	} *gid_tbl;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox))
+		return -ENOMEM;
+
+	gid_tbl = mailbox->buf;
+	for (i = 0; i < MLX4_MAX_PORT_GIDS; ++i) {
+		memcpy(&gid_tbl[i].gid, &gids[i].gid, sizeof(union ib_gid));
+		if (gids[i].gid_type == IB_GID_TYPE_ROCE_V2) {
+			gid_tbl[i].version = 2;
+			if (!ipv6_addr_v4mapped((struct in6_addr *)&gids[i].gid))
+				gid_tbl[i].type = 1;
+		}
+	}
+
+	err = mlx4_cmd(dev, mailbox->dma,
+		       MLX4_SET_PORT_ROCE_ADDR << 8 | port_num,
+		       1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+		       MLX4_CMD_WRAPPED);
+	if (mlx4_is_bonded(dev))
+		err += mlx4_cmd(dev, mailbox->dma,
+				MLX4_SET_PORT_ROCE_ADDR << 8 | 2,
+				1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+				MLX4_CMD_WRAPPED);
+
+	mlx4_free_cmd_mailbox(dev, mailbox);
+	return err;
+}
+
+static int mlx4_ib_update_gids(struct gid_entry *gids,
+			       struct mlx4_ib_dev *ibdev,
+			       u8 port_num)
+{
+	if (ibdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2)
+		return mlx4_ib_update_gids_v1_v2(gids, ibdev, port_num);
+
+	return mlx4_ib_update_gids_v1(gids, ibdev, port_num);
+}
+
+static int mlx4_ib_modify_gid(struct ib_device *device,
+			      u8 port_num, unsigned int index,
+			      const union ib_gid *gid,
+			      const struct ib_gid_attr *attr,
+			      void **context)
+{
+	struct mlx4_ib_dev *ibdev = to_mdev(device);
+	struct mlx4_ib_iboe *iboe = &ibdev->iboe;
+	struct mlx4_port_gid_table   *port_gid_table;
+	int free = -1, found = -1;
+   int ret
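
The reference counting described in the commit message can be illustrated with a small userspace sketch: a HW table that stores each GID value once and counts how many GID-table entries refer to it. This is not the mlx4 implementation (the patch body is truncated above); HW_GIDS, struct hw_gid_table, hw_gid_add() and hw_gid_del() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define HW_GIDS 4  /* assumption: tiny HW table for illustration */
#define GID_LEN 16

struct hw_gid_entry {
	uint8_t gid[GID_LEN];
	int refcount;         /* 0 means the slot is free */
};

struct hw_gid_table {
	struct hw_gid_entry e[HW_GIDS];
};

/*
 * Add a GID: reuse the slot that already holds the same value, otherwise
 * take the first free slot. Returns the HW index, or -1 if the table is full.
 */
int hw_gid_add(struct hw_gid_table *t, const uint8_t gid[GID_LEN])
{
	int i, free_idx = -1;

	for (i = 0; i < HW_GIDS; i++) {
		if (t->e[i].refcount == 0) {
			if (free_idx < 0)
				free_idx = i;
			continue;
		}
		if (!memcmp(t->e[i].gid, gid, GID_LEN)) {
			t->e[i].refcount++;
			return i;
		}
	}
	if (free_idx < 0)
		return -1;

	memcpy(t->e[free_idx].gid, gid, GID_LEN);
	t->e[free_idx].refcount = 1;
	return free_idx;
}

/* Drop one reference. Returns the remaining count, or -1 on a bad index. */
int hw_gid_del(struct hw_gid_table *t, int idx)
{
	if (idx < 0 || idx >= HW_GIDS || t->e[idx].refcount == 0)
		return -1;
	return --t->e[idx].refcount;
}
```

The index returned by hw_gid_add() is the "real" index that a cache index would be translated to. A slot becomes reusable once its count drops to zero.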

[PATCH v3 for-next 06/33] net: Add info for NETDEV_CHANGEUPPER event

2015-03-24 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

Consumers of the NETDEV_CHANGEUPPER event sometimes want
to know which upper device was linked/unlinked and which
operation was carried out. Add this extra information to the
notifier info block.

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 include/linux/netdevice.h | 14 ++
 net/core/dev.c| 12 ++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f36f7d3..599d7c8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3466,6 +3466,20 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
netdev_features_t features);
 
+enum netdev_changeupper_event {
+   NETDEV_CHANGEUPPER_LINK,
+   NETDEV_CHANGEUPPER_UNLINK,
+};
+
+struct netdev_changeupper_info {
+   struct netdev_notifier_info info; /* must be first */
+   enum netdev_changeupper_event   event;
+   struct net_device   *upper;
+};
+
+void netdev_changeupper_info_change(struct net_device *dev,
+   struct netdev_changeupper_info *info);
+
 struct netdev_bonding_info {
ifslave slave;
ifbond  master;
diff --git a/net/core/dev.c b/net/core/dev.c
index ea714fc..1ef1bd5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5118,6 +5118,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
   void *private)
 {
struct netdev_adjacent *i, *j, *to_i, *to_j;
+   struct netdev_changeupper_info changeupper_info;
int ret = 0;
 
ASSERT_RTNL();
@@ -5173,7 +5174,10 @@ static int __netdev_upper_dev_link(struct net_device *dev,
 			goto rollback_lower_mesh;
 	}
 
-	call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
+	changeupper_info.event = NETDEV_CHANGEUPPER_LINK;
+	changeupper_info.upper = upper_dev;
+	call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, dev,
+				      &changeupper_info.info);
 	return 0;
 
 rollback_lower_mesh:
@@ -5269,6 +5273,7 @@ void netdev_upper_dev_unlink(struct net_device *dev,
 			     struct net_device *upper_dev)
 {
 	struct netdev_adjacent *i, *j;
+	struct netdev_changeupper_info changeupper_info;
 	ASSERT_RTNL();
 
 	__netdev_adjacent_dev_unlink_neighbour(dev, upper_dev);
@@ -5290,7 +5295,10 @@ void netdev_upper_dev_unlink(struct net_device *dev,
 	list_for_each_entry(i, &upper_dev->all_adj_list.upper, list)
 		__netdev_adjacent_dev_unlink(dev, i->dev);
 
-	call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
+	changeupper_info.event = NETDEV_CHANGEUPPER_UNLINK;
+	changeupper_info.upper = upper_dev;
+	call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, dev,
+				      &changeupper_info.info);
 }
 EXPORT_SYMBOL(netdev_upper_dev_unlink);
 
-- 
2.1.0



[PATCH v3 for-next 13/33] IB/core: Add rdma_network_type to wc

2015-03-24 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

Providers should tell IB core the wc's network type.
This is used in order to search for the proper GID in the
GID table. When using HCAs that can't provide this info,
IB core tries to examine the packet in depth and extract
the GID type by itself.

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/verbs.c | 106 ++--
 include/rdma/ib_verbs.h |  30 
 2 files changed, 131 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2f5fd7a..2e7ccad 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -195,8 +195,84 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 }
 EXPORT_SYMBOL(ib_create_ah);
 
+static int ib_get_grh_header_version(const void *h)
+{
+	const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+	struct iphdr ip4h_checked;
+	const struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+
+	if (ip6h->version != 6)
+		return (ip4h->version == 4) ? 4 : 0;
+	/* version may be 6 or 4 */
+	if (ip4h->ihl != 5) /* IPv4 header length must be 5 for RR */
+		return 6;
+	/* Verify checksum.
+	   We can't write on scattered buffers so we need to copy to
+	   temp buffer.
+	 */
+	memcpy(&ip4h_checked, ip4h, sizeof(ip4h_checked));
+	ip4h_checked.check = 0;
+	ip4h_checked.check = ip_fast_csum((u8 *)&ip4h_checked, 5);
+	/* if IPv4 header checksum is OK, believe it */
+	if (ip4h->check == ip4h_checked.check)
+		return 4;
+	return 6;
+}
+
+static int ib_get_dgid_sgid_by_grh(const void *h,
+				   enum rdma_network_type net_type,
+				   union ib_gid *dgid, union ib_gid *sgid)
+{
+	switch (net_type) {
+	case RDMA_NETWORK_IPV4: {
+		const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+
+		ipv6_addr_set_v4mapped(ip4h->daddr, (struct in6_addr *)dgid);
+		ipv6_addr_set_v4mapped(ip4h->saddr, (struct in6_addr *)sgid);
+		return 0;
+	}
+	case RDMA_NETWORK_IPV6: {
+		struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+
+		memcpy(dgid, &ip6h->daddr, sizeof(*dgid));
+		memcpy(sgid, &ip6h->saddr, sizeof(*sgid));
+		return 0;
+	}
+	case RDMA_NETWORK_IB: {
+		struct ib_grh *grh = (struct ib_grh *)h;
+
+		memcpy(dgid, &grh->dgid, sizeof(*dgid));
+		memcpy(sgid, &grh->sgid, sizeof(*sgid));
+		return 0;
+	}
+	}
+
+	return -EINVAL;
+}
+
+static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
+						     u8 port_num,
+						     const struct ib_grh *grh)
+{
+	int grh_version;
+
+	if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND)
+		return RDMA_NETWORK_IB;
+
+	grh_version = ib_get_grh_header_version(grh);
+
+	if (grh_version == 4)
+		return RDMA_NETWORK_IPV4;
+
+	if (grh->next_hdr == IPPROTO_UDP)
+		return RDMA_NETWORK_IPV6;
+
+	return RDMA_NETWORK_IB;
+}
+
 struct find_gid_index_context {
 	u16 vlan_id;
+	enum ib_gid_type gid_type;
 };
 
 static bool find_gid_index(const union ib_gid *gid,
@@ -206,6 +282,9 @@ static bool find_gid_index(const union ib_gid *gid,
 	struct find_gid_index_context *ctx =
 		(struct find_gid_index_context *)context;
 
+	if (ctx->gid_type != gid_attr->gid_type)
+		return false;
+
 	if ((!!(ctx->vlan_id != 0xffff) == !is_vlan_dev(gid_attr->ndev)) ||
 	    (is_vlan_dev(gid_attr->ndev) &&
 	     vlan_dev_vlan_id(gid_attr->ndev) != ctx->vlan_id))
@@ -216,9 +295,11 @@ static bool find_gid_index(const union ib_gid *gid,
 
 static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num,
 				   u16 vlan_id, union ib_gid *sgid,
+				   enum ib_gid_type gid_type,
 				   u16 *gid_index)
 {
-	struct find_gid_index_context context = {.vlan_id = vlan_id};
+	struct find_gid_index_context context = {.vlan_id = vlan_id,
+						 .gid_type = gid_type};
 
 	return ib_find_gid_by_filter(device, sgid, port_num, find_gid_index,
 				     &context, gid_index);
@@ -232,9 +313,24 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 	int ret;
 	int is_eth = (rdma_port_get_link_layer(device, port_num) ==
 			IB_LINK_LAYER_ETHERNET);
+   enum rdma_network_type net_type = RDMA_NETWORK_IB
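
The header-version heuristic in ib_get_grh_header_version() can be reproduced in userspace on a raw 40-byte GRH-sized buffer. The sketch below follows the same decision order (IPv6 version nibble at offset 0, IPv4 header at offset 20, then the IPv4 header checksum) but uses a hand-written checksum instead of the kernel's ip_fast_csum(). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement checksum over a 20-byte IPv4 header, computed as if the
 * checksum field (bytes 10 and 11) were zero.
 */
uint16_t ipv4_hdr_csum(const uint8_t *hdr)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < 20; i += 2) {
		if (i == 10)
			continue; /* skip the stored checksum */
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	}
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Returns 4, 6, or 0 (neither), mirroring the decision order of the patch. */
int grh_header_version(const uint8_t h[40])
{
	const uint8_t *ip4 = h + 20;       /* IPv4 header sits in the last 20 bytes */
	unsigned int v6 = h[0] >> 4;       /* version nibble of an IPv6 header */
	unsigned int v4 = ip4[0] >> 4;     /* version nibble of an IPv4 header */
	unsigned int ihl = ip4[0] & 0x0F;  /* IPv4 header length in 32-bit words */
	uint16_t stored;

	if (v6 != 6)
		return v4 == 4 ? 4 : 0;
	/* Ambiguous: the first nibble is 6, but it may still be IPv4. */
	if (ihl != 5)
		return 6;
	stored = (uint16_t)((ip4[10] << 8) | ip4[11]);
	return stored == ipv4_hdr_csum(ip4) ? 4 : 6;
}
```

The checksum test is what resolves the ambiguous case: with an IPv4 header the first 20 bytes of the buffer are not meaningful, so their first nibble may happen to be 6.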

[PATCH v3 for-next 07/33] IB/core: Add RoCE cache bonding support

2015-03-24 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

Bonding needs special handling: when working in
active-backup mode, only the currently selected slave
should occupy the default GIDs and the master's GID.
Listen to bonding events and add the required GIDs
only to the active slave in the RoCE cache
GID table.

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/roce_gid_mgmt.c | 291 ++--
 drivers/net/bonding/bond_options.c  |  13 --
 include/net/bonding.h   |   7 +
 3 files changed, 282 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index c0cbb23..362327f 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -37,6 +37,7 @@
 
 /* For in6_dev_get/in6_dev_put */
 #include <net/addrconf.h>
+#include <net/bonding.h>
 
 #include <rdma/ib_cache.h>
 #include <rdma/ib_addr.h>
@@ -55,16 +56,17 @@ struct  update_gid_event_work {
 	enum gid_op_type gid_op;
 };
 
-#define ROCE_NETDEV_CALLBACK_SZ		2
+#define ROCE_NETDEV_CALLBACK_SZ		3
 struct netdev_event_work_cmd {
 	roce_netdev_callback	cb;
 	roce_netdev_filter	filter;
+	struct net_device	*ndev;
+	struct net_device	*f_ndev;
 };
 
 struct netdev_event_work {
 	struct work_struct		work;
 	struct netdev_event_work_cmd	cmds[ROCE_NETDEV_CALLBACK_SZ];
-	struct net_device		*ndev;
 };
 
 struct roce_rescan_work {
@@ -127,22 +129,96 @@ static void update_gid(enum gid_op_type gid_op, struct ib_device *ib_dev,
 	}
 }
 
+#define IS_NETDEV_BONDING_MASTER(ndev)	\
+	(((ndev)->priv_flags &		\
+	  (IFF_BONDING | IFF_MASTER)) == (IFF_BONDING | IFF_MASTER))
+
+enum bonding_slave_state {
+	BONDING_SLAVE_STATE_ACTIVE	= 1UL << 0,
+	BONDING_SLAVE_STATE_INACTIVE	= 1UL << 1,
+	BONDING_SLAVE_STATE_NA		= 1UL << 2,
+};
+
+static enum bonding_slave_state is_eth_active_slave_of_bonding(struct net_device *idev,
+							       struct net_device *upper)
+{
+	if (upper && IS_NETDEV_BONDING_MASTER(upper)) {
+		struct net_device *pdev;
+
+		rcu_read_lock();
+		pdev = bond_option_active_slave_get_rcu(netdev_priv(upper));
+		rcu_read_unlock();
+		if (pdev)
+			return idev == pdev ? BONDING_SLAVE_STATE_ACTIVE :
+				BONDING_SLAVE_STATE_INACTIVE;
+	}
+
+	return BONDING_SLAVE_STATE_NA;
+}
+
+static bool is_upper_dev_rcu(struct net_device *dev, struct net_device *upper)
+{
+	struct net_device *_upper = NULL;
+	struct list_head *iter;
+
+	rcu_read_lock();
+	netdev_for_each_all_upper_dev_rcu(dev, _upper, iter) {
+		if (_upper == upper)
+			break;
+	}
+
+	rcu_read_unlock();
+	return _upper == upper;
+}
+
+static int _is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
+				  struct net_device *idev, void *cookie,
+				  unsigned long bond_state)
+{
+	struct net_device *ndev = (struct net_device *)cookie;
+	struct net_device *rdev;
+	int res;
+
+	if (!idev)
+		return 0;
+
+	rcu_read_lock();
+	rdev = rdma_vlan_dev_real_dev(ndev);
+	if (!rdev)
+		rdev = ndev;
+
+	res = ((is_upper_dev_rcu(idev, ndev) &&
+	       (is_eth_active_slave_of_bonding(idev, rdev) &
+		bond_state)) ||
+	       rdev == idev);
+
+	rcu_read_unlock();
+	return res;
+}
+
 static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
 				 struct net_device *idev, void *cookie)
 {
-	struct net_device *rdev;
-	struct net_device *mdev;
-	struct net_device *ndev = (struct net_device *)cookie;
+	return _is_eth_port_of_netdev(ib_dev, port, idev, cookie,
+				      BONDING_SLAVE_STATE_ACTIVE |
+				      BONDING_SLAVE_STATE_NA);
+}
 
+static int is_eth_port_inactive_slave(struct ib_device *ib_dev, u8 port,
+				      struct net_device *idev, void *cookie)
+{
+	struct net_device *mdev;
+	int res;
 	if (!idev)
 		return 0;
 
 	rcu_read_lock();
 	mdev = netdev_master_upper_dev_get_rcu(idev);
-	rdev = rdma_vlan_dev_real_dev(ndev);
+	res = is_eth_active_slave_of_bonding(idev, mdev) ==
+		BONDING_SLAVE_STATE_INACTIVE;
 	rcu_read_unlock();
 
-	return (rdev ? rdev : ndev) == (mdev ? mdev : idev);
+	return res;
 }
 
 static int pass_all_filter(struct ib_device *ib_dev, u8 port,
@@ -151,17 +227,49 @@ static int
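
The filters in this patch classify a port as active slave, inactive slave, or not bonded, and then test that state against a bitmask of accepted states. A minimal userspace sketch of that idea follows. struct port_model and the helper names are hypothetical, and the bonding lookup is replaced by two plain flags.

```c
/* Same bit layout as enum bonding_slave_state in the patch. */
enum slave_state {
	SLAVE_STATE_ACTIVE   = 1 << 0,
	SLAVE_STATE_INACTIVE = 1 << 1,
	SLAVE_STATE_NA       = 1 << 2, /* not enslaved to a bond */
};

/* Hypothetical stand-in for what the kernel derives from the netdevs. */
struct port_model {
	int has_bond_master; /* non-zero if the netdev is a bond slave */
	int is_active_slave; /* meaningful only when has_bond_master is set */
};

enum slave_state port_slave_state(const struct port_model *p)
{
	if (!p->has_bond_master)
		return SLAVE_STATE_NA;
	return p->is_active_slave ? SLAVE_STATE_ACTIVE : SLAVE_STATE_INACTIVE;
}

/* A port passes the filter when its state is one of the accepted states. */
int port_matches(const struct port_model *p, unsigned long accepted)
{
	return (port_slave_state(p) & accepted) != 0;
}
```

Passing SLAVE_STATE_ACTIVE | SLAVE_STATE_NA accepts active slaves and non-bonded ports, which corresponds to the mask used by is_eth_port_of_netdev() above. Passing only SLAVE_STATE_INACTIVE selects the ports whose GIDs should be removed.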

[PATCH v3 for-next 09/33] IB/core: Report gid_type and gid_ndev through sysfs

2015-03-24 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

Since we've added GID attributes to the RoCE GID table,
users need a convenient way to query them.
Add the GID type and the related net device to IB's sysfs.

The new attributes are available in:
/sys/class/infiniband/<device>/ports/<port>/gid_attrs/ndevs/<index>
/sys/class/infiniband/<device>/ports/<port>/gid_attrs/types/<index>

The <index> corresponds to the index of the respective GID in:
/sys/class/infiniband/<device>/ports/<port>/gids/<index>

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/core_priv.h  |   2 +
 drivers/infiniband/core/roce_gid_cache.c |  13 +++
 drivers/infiniband/core/sysfs.c  | 184 ++-
 3 files changed, 197 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 128d2b3..b5bbbdf 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -71,6 +71,8 @@ void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
 				  roce_netdev_callback cb,
 				  void *cookie);
 
+const char *roce_gid_cache_type_str(enum ib_gid_type gid_type);
+
 int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
 			   union ib_gid *gid, struct ib_gid_attr *attr);
 
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
index 1f30dad..b6180eb 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -50,6 +50,11 @@ enum gid_attr_find_mask {
 	GID_ATTR_FIND_MASK_DEFAULT	= 1UL << 3,
 };
 
+static const char * const gid_type_str[] = {
+	[IB_GID_TYPE_IB]	= "IB/RoCE V1\n",
+	[IB_GID_TYPE_ROCE_V2]	= "RoCE V2\n",
+};
+
 static inline int start_port(struct ib_device *ib_dev)
 {
 	return (ib_dev->node_type == RDMA_NODE_IB_SWITCH) ? 0 : 1;
@@ -60,6 +65,14 @@ struct dev_put_rcu {
 	struct net_device	*ndev;
 };
 
+const char *roce_gid_cache_type_str(enum ib_gid_type gid_type)
+{
+	if (gid_type < ARRAY_SIZE(gid_type_str) && gid_type_str[gid_type])
+		return gid_type_str[gid_type];
+
+	return "Invalid GID type";
+}
+
 static void put_ndev(struct rcu_head *rcu)
 {
 	struct dev_put_rcu *put_rcu =
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 5cee246..887c2f8 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -37,12 +37,22 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/string.h>
+#include <linux/netdevice.h>
 
 #include <rdma/ib_mad.h>
 
+struct ib_port;
+
+struct gid_attr_group {
+	struct ib_port		*port;
+	struct kobject		kobj;
+	struct attribute_group	ndev;
+	struct attribute_group	type;
+};
 struct ib_port {
 	struct kobject         kobj;
 	struct ib_device      *ibdev;
+	struct gid_attr_group *gid_attr_group;
 	struct attribute_group gid_group;
 	struct attribute_group pkey_group;
 	u8                     port_num;
@@ -84,6 +94,24 @@ static const struct sysfs_ops port_sysfs_ops = {
 	.show = port_attr_show
 };
 
+static ssize_t gid_attr_show(struct kobject *kobj,
+			     struct attribute *attr, char *buf)
+{
+	struct port_attribute *port_attr =
+		container_of(attr, struct port_attribute, attr);
+	struct ib_port *p = container_of(kobj, struct gid_attr_group,
+					 kobj)->port;
+
+	if (!port_attr->show)
+		return -EIO;
+
+	return port_attr->show(p, port_attr, buf);
+}
+
+static const struct sysfs_ops gid_attr_sysfs_ops = {
+	.show = gid_attr_show
+};
+
 static ssize_t state_show(struct ib_port *p, struct port_attribute *unused,
 			  char *buf)
 {
@@ -281,6 +309,46 @@ static struct attribute *port_default_attrs[] = {
 	NULL
 };
 
+static size_t print_ndev(struct ib_gid_attr *gid_attr, char *buf)
+{
+	if (!gid_attr->ndev)
+		return -EINVAL;
+
+	return sprintf(buf, "%s\n", gid_attr->ndev->name);
+}
+
+static size_t print_gid_type(struct ib_gid_attr *gid_attr, char *buf)
+{
+	return sprintf(buf, "%s", roce_gid_cache_type_str(gid_attr->gid_type));
+}
+
+static ssize_t _show_port_gid_attr(struct ib_port *p,
+				   struct port_attribute *attr,
+				   char *buf,
+				   size_t (*print)(struct ib_gid_attr *gid_attr,
+						   char *buf))
+{
+	struct port_table_attribute *tab_attr =
+		container_of(attr, struct port_table_attribute, attr);
+	union ib_gid gid;
+	struct ib_gid_attr gid_attr;
+	ssize_t ret;
+	va_list args;
+
+   rcu_read_lock
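
From user space, the attribute files follow the path layout given in the commit message. The sketch below builds such a path and shows a bounds-checked type-string lookup in the style of roce_gid_cache_type_str(). The function names are hypothetical, the device name "mlx4_0" in the usage note is only an assumed example, and the strings omit the trailing newline used in the patch.

```c
#include <stddef.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Index order mirrors IB_GID_TYPE_IB and IB_GID_TYPE_ROCE_V2 in the patch. */
static const char *const gid_type_names[] = {
	"IB/RoCE V1",
	"RoCE V2",
};

/* Bounds-checked lookup, so an unknown type never indexes past the table. */
const char *gid_type_name(unsigned int type)
{
	if (type < ARRAY_SIZE(gid_type_names) && gid_type_names[type])
		return gid_type_names[type];
	return "Invalid GID type";
}

/*
 * Build /sys/class/infiniband/<device>/ports/<port>/gid_attrs/<attr>/<index>.
 * attr is "ndevs" or "types". Returns the path length, or -1 on truncation.
 */
int gid_attr_path(char *buf, size_t len, const char *device,
		  unsigned int port, const char *attr, unsigned int index)
{
	int n = snprintf(buf, len,
			 "/sys/class/infiniband/%s/ports/%u/gid_attrs/%s/%u",
			 device, port, attr, index);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For example, gid_attr_path(buf, sizeof(buf), "mlx4_0", 1, "types", 0) fills buf with the path of the type attribute for GID index 0 on port 1, which can then be opened and read like any other sysfs file.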

[PATCH v3 for-next 03/33] IB/core: Add RoCE GID population

2015-03-24 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

In order to populate the GID table, we need to listen for
events:
(a) IB device has been added or removed - used in order
to allocate/deallocate the cache and populate
the GID table internally.
(b) inet events - add new GIDs (according to the IP addresses)
to the table.
(c) netdev up/down/change_addr - if a netdev is built on top of our
RoCE device, we need to add/delete its IPs.

When an event is received, multiple entries (each with a
different GID type) are added.

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/Makefile |   2 +-
 drivers/infiniband/core/core_priv.h  |  26 ++
 drivers/infiniband/core/device.c |  80 +
 drivers/infiniband/core/roce_gid_cache.c |  68 
 drivers/infiniband/core/roce_gid_mgmt.c  | 516 +++
 include/rdma/ib_addr.h   |   2 +-
 include/rdma/ib_verbs.h  |   9 +
 7 files changed, 701 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/roce_gid_mgmt.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 9b63bdf..2c94963 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=   ib_uverbs.o 
ib_ucm.o \
 
 ib_core-y :=   packer.o ud_header.o verbs.o sysfs.o \
device.o fmr_pool.o cache.o netlink.o \
-   roce_gid_cache.o
+   roce_gid_cache.o roce_gid_mgmt.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
 
diff --git a/drivers/infiniband/core/core_priv.h 
b/drivers/infiniband/core/core_priv.h
index a502daa..12797d9 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -39,6 +39,8 @@
 
 #include <rdma/ib_verbs.h>
 
+extern struct workqueue_struct *roce_gid_mgmt_wq;
+
 int  ib_device_register_sysfs(struct ib_device *device,
  int (*port_callback)(struct ib_device *,
   u8, struct kobject *));
@@ -53,6 +55,22 @@ void ib_cache_cleanup(void);
 int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
struct ib_qp_attr *qp_attr, int *qp_attr_mask);
 
+typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port,
+ struct net_device *idev, void *cookie);
+
+typedef int (*roce_netdev_filter)(struct ib_device *device, u8 port,
+struct net_device *idev, void *cookie);
+
+void ib_dev_roce_ports_of_netdev(struct ib_device *ib_dev,
+roce_netdev_filter filter,
+void *filter_cookie,
+roce_netdev_callback cb,
+void *cookie);
+void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
+ void *filter_cookie,
+ roce_netdev_callback cb,
+ void *cookie);
+
 int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
   union ib_gid *gid, struct ib_gid_attr *attr);
 
@@ -66,6 +84,9 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, 
union ib_gid *gid,
 
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
+int roce_gid_cache_setup(void);
+void roce_gid_cache_cleanup(void);
+
 int roce_add_gid(struct ib_device *ib_dev, u8 port,
 union ib_gid *gid, struct ib_gid_attr *attr);
 
@@ -75,4 +96,9 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
 int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
 struct net_device *ndev);
 
+int roce_gid_mgmt_init(void);
+void roce_gid_mgmt_cleanup(void);
+
+int roce_rescan_device(struct ib_device *ib_dev);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8616a95..5ce57bf 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -39,6 +39,7 @@
 #include <linux/init.h>
 #include <linux/mutex.h>
 #include <rdma/rdma_netlink.h>
+#include <rdma/ib_addr.h>
 
 #include "core_priv.h"
 
@@ -640,6 +641,82 @@ int ib_query_gid(struct ib_device *device,
 EXPORT_SYMBOL(ib_query_gid);
 
 /**
+ * ib_dev_roce_ports_of_netdev - enumerate RoCE ports of ibdev in
+ *  respect of netdev
+ * @ib_dev : IB device we want to query
+ * @filter: Should we call the callback?
+ * @filter_cookie: Cookie passed to filter
+ * @cb: Callback to call for each found RoCE ports
+ * @cookie: Cookie passed back to the callback
+ *
+ * Enumerates all of the physical RoCE ports of ib_dev RoCE ports
+ * which are relaying Ethernet packets to a specific
+ * (possibly

[PATCH v3 for-next 11/33] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-03-24 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Previously, we resolved the dmac and took the smac and vlan
from the resolved address. Change that: find a net device
that matches the IP and vlan of the network packet, then
query the RoCE GID cache for this net device, GID and
GID type.

ocrdma driver changes were done by Somnath Kotur <somnath.ko...@emulex.com>

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/addr.c   |   3 +-
 drivers/infiniband/core/cm.c |  30 --
 drivers/infiniband/core/cma.c|   9 --
 drivers/infiniband/core/core_priv.h  |   4 +-
 drivers/infiniband/core/sa_query.c   |   4 -
 drivers/infiniband/core/ucma.c   |   1 -
 drivers/infiniband/core/uverbs_cmd.c |   3 +-
 drivers/infiniband/core/verbs.c  | 162 ++-
 drivers/infiniband/hw/mlx4/ah.c  |  15 ++-
 drivers/infiniband/hw/mlx4/mad.c |  12 ++-
 drivers/infiniband/hw/mlx4/mcg.c |   2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   2 +-
 drivers/infiniband/hw/mlx4/qp.c  |  48 +++--
 drivers/infiniband/hw/ocrdma/ocrdma.h|   1 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |  20 ++--
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c |  17 ++--
 include/rdma/ib_addr.h   |   2 +-
 include/rdma/ib_sa.h |   2 -
 include/rdma/ib_verbs.h  |  11 +--
 19 files changed, 190 insertions(+), 158 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f80da50..43af7f5 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -458,7 +458,7 @@ static void resolve_cb(int status, struct sockaddr 
*src_addr,
 }
 
 int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 
*dmac,
-  u16 *vlan_id)
+  u16 *vlan_id, int if_index)
 {
int ret = 0;
struct rdma_dev_addr dev_addr;
@@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union 
ib_gid *dgid, u8 *dmac,
return ret;
 
    memset(&dev_addr, 0, sizeof(dev_addr));
+   dev_addr.bound_dev_if = if_index;
 
    ctx.addr = &dev_addr;
    init_completion(&ctx.comp);
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d88f2ae..7974e74 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -178,8 +178,6 @@ struct cm_av {
struct ib_ah_attr ah_attr;
u16 pkey_index;
u8 timeout;
-   u8  valid;
-   u8  smac[ETH_ALEN];
 };
 
 struct cm_work {
@@ -382,7 +380,6 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, 
struct cm_av *av)
                         &av->ah_attr);
    av->timeout = path->packet_life_time + 1;
 
-   av->valid = 1;
return 0;
 }
 
@@ -1563,7 +1560,6 @@ static int cm_req_handler(struct cm_work *work)
    cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
 
    memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, ETH_ALEN);
-   work->path[0].vlan_id = cm_id_priv->av.ah_attr.vlan_id;
    ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
    if (ret) {
        ib_get_cached_gid(work->port->cm_dev->ib_device,
@@ -3511,32 +3507,6 @@ static int cm_init_qp_rtr_attr(struct cm_id_private 
*cm_id_priv,
*qp_attr_mask = IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU |
IB_QP_DEST_QPN | IB_QP_RQ_PSN;
    qp_attr->ah_attr = cm_id_priv->av.ah_attr;
-   if (!cm_id_priv->av.valid) {
-       spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-       return -EINVAL;
-   }
-   if (cm_id_priv->av.ah_attr.vlan_id != 0xffff) {
-       qp_attr->vlan_id = cm_id_priv->av.ah_attr.vlan_id;
-       *qp_attr_mask |= IB_QP_VID;
-   }
-   if (!is_zero_ether_addr(cm_id_priv->av.smac)) {
-       memcpy(qp_attr->smac, cm_id_priv->av.smac,
-              sizeof(qp_attr->smac));
-       *qp_attr_mask |= IB_QP_SMAC;
-   }
-   if (cm_id_priv->alt_av.valid) {
-       if (cm_id_priv->alt_av.ah_attr.vlan_id != 0xffff) {
-           qp_attr->alt_vlan_id =
-               cm_id_priv->alt_av.ah_attr.vlan_id;
-           *qp_attr_mask |= IB_QP_ALT_VID;
-       }
-       if (!is_zero_ether_addr(cm_id_priv->alt_av.smac)) {
-           memcpy(qp_attr->alt_smac,
-                  cm_id_priv->alt_av.smac,
-                  sizeof(qp_attr->alt_smac));
-           *qp_attr_mask |= IB_QP_ALT_SMAC

[PATCH v3 for-next 05/33] net/bonding: make DRV macros private

2015-03-24 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

The bonding module currently defines four macros with
generic names that pollute the global namespace:
DRV_VERSION
DRV_RELDATE
DRV_NAME
DRV_DESCRIPTION

Fix that by moving those defines into a private
header file, bonding_priv.h.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
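
As a note on how bond_version is assembled: the macro relies on the C rule
that adjacent string literals are merged at compile time. The sketch below
reproduces the four defines in userspace, with the quoting restored as an
assumption because the archive stripped the quote characters. The helper
functions bond_banner() and is_bonding_driver() are hypothetical and exist
only for illustration.

```c
#include <string.h>

#define DRV_VERSION     "3.7.1"
#define DRV_RELDATE     "April 27, 2011"
#define DRV_NAME        "bonding"
#define DRV_DESCRIPTION "Ethernet Channel Bonding Driver"

/* Adjacent string literals are concatenated by the compiler. */
#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"

/* Hypothetical helper: return the assembled banner string. */
static const char *bond_banner(void)
{
    return bond_version;
}

/* Hypothetical helper: compare a driver name against DRV_NAME. */
static int is_bonding_driver(const char *name)
{
    return strcmp(name, DRV_NAME) == 0;
}
```

Because names such as DRV_NAME are this generic, any other file that pulled
in the public header and defined its own DRV_NAME would collide with them,
which is the namespace pollution the commit message describes.
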
 drivers/net/bonding/bond_main.c|  2 ++
 drivers/net/bonding/bond_procfs.c  |  1 +
 drivers/net/bonding/bonding_priv.h | 26 ++
 include/net/bonding.h  |  7 ---
 4 files changed, 29 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/bonding/bonding_priv.h

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 468c70e..55f2d3e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -81,6 +81,8 @@
 #include <net/bond_3ad.h>
 #include <net/bond_alb.h>
 
+#include "bonding_priv.h"
+
 /* Module parameters */
 
 /* monitor all links that often (in milliseconds). =0 disables monitoring */
diff --git a/drivers/net/bonding/bond_procfs.c 
b/drivers/net/bonding/bond_procfs.c
index 976f5ad..b50a002 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -4,6 +4,7 @@
 #include <net/netns/generic.h>
 #include <net/bonding.h>
 
+#include "bonding_priv.h"
 
 static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(RCU)
diff --git a/drivers/net/bonding/bonding_priv.h 
b/drivers/net/bonding/bonding_priv.h
new file mode 100644
index 0000000..c093e91
--- /dev/null
+++ b/drivers/net/bonding/bonding_priv.h
@@ -0,0 +1,26 @@
+/*
+ * Bond several ethernet interfaces into a Cisco, running 'Etherchannel'.
+ *
+ * Portions are (c) Copyright 1995 Simon "Guru Aleph-Null" Janes
+ * NCM: Network and Communications Management, Inc.
+ *
+ * BUT, I'm the one who modified it for ethernet, so:
+ * (c) Copyright 1999, Thomas Davis, tada...@lbl.gov
+ *
+ * This software may be used and distributed according to the terms
+ * of the GNU Public License, incorporated herein by reference.
+ *
+ */
+
+#ifndef _BONDING_PRIV_H
+#define _BONDING_PRIV_H
+
+#define DRV_VERSION    "3.7.1"
+#define DRV_RELDATE    "April 27, 2011"
+#define DRV_NAME       "bonding"
+#define DRV_DESCRIPTION    "Ethernet Channel Bonding Driver"
+
+#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"
+
+#endif
+
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 4c2b0f4..a124173 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -30,13 +30,6 @@
 #include <net/bond_alb.h>
 #include <net/bond_options.h>
 
-#define DRV_VERSION    "3.7.1"
-#define DRV_RELDATE    "April 27, 2011"
-#define DRV_NAME       "bonding"
-#define DRV_DESCRIPTION    "Ethernet Channel Bonding Driver"
-
-#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"
-
 #define BOND_MAX_ARP_TARGETS   16
 
 #define BOND_DEFAULT_MIIMON    100
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 for-next 02/33] IB/core: Add kref to IB devices

2015-03-24 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Previously, we used the device_mutex lock to protect the
device list. That meant that, in order to guarantee a
device isn't freed while we use it, we had to lock all
devices.

Add a kref per IB device. Before an IB device is
unregistered, we wait until it is no longer held.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
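
The hold/put/wait scheme described above can be sketched in userspace. This
is a single-threaded model under stated assumptions, not the kernel code:
an atomic counter stands in for struct kref, a boolean flag stands in for
struct completion, and all names (fake_device, dev_hold, dev_put,
dev_begin_unregister) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum reg_state { DEV_REGISTERED, DEV_UNREGISTERING, DEV_UNREGISTERED };

struct fake_device {
    atomic_int refcount;    /* stands in for struct kref */
    bool freed_signal;      /* stands in for struct completion */
    enum reg_state state;
};

static void dev_init(struct fake_device *d)
{
    atomic_init(&d->refcount, 1);   /* like kref_init(), start at one */
    d->freed_signal = false;
    d->state = DEV_REGISTERED;
}

static void dev_hold(struct fake_device *d)
{
    atomic_fetch_add(&d->refcount, 1);
}

/* Returns 1 when this call dropped the last reference, else 0. */
static int dev_put(struct fake_device *d)
{
    if (atomic_fetch_sub(&d->refcount, 1) != 1)
        return 0;
    if (d->state >= DEV_UNREGISTERING)
        d->freed_signal = true;     /* the patch calls complete() here */
    return 1;
}

/* Unregister drops the initial reference. The kernel version would then
 * block in wait_for_completion() until the last holder calls put. */
static void dev_begin_unregister(struct fake_device *d)
{
    d->state = DEV_UNREGISTERING;
    dev_put(d);
}
```

The point of the design is that a user only pins the one device it is
working on, instead of holding device_mutex and thereby blocking changes to
every device.
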
 drivers/infiniband/core/device.c | 41 
 include/rdma/ib_verbs.h  |  6 ++
 2 files changed, 47 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..8616a95 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -261,6 +261,39 @@ out:
return ret;
 }
 
+static void ib_device_complete_cb(struct kref *kref)
+{
+   struct ib_device *device = container_of(kref, struct ib_device,
+   refcount);
+
+   if (device->reg_state >= IB_DEV_UNREGISTERING)
+       complete(&device->free);
+}
+
+/**
+ * ib_device_hold - increase the reference count of device
+ * @device: ib device to prevent from being free'd
+ *
+ * Prevent the device from being free'd.
+ */
+void ib_device_hold(struct ib_device *device)
+{
+   kref_get(&device->refcount);
+}
+EXPORT_SYMBOL(ib_device_hold);
+
+/**
+ * ib_device_put - decrease the reference count of device
+ * @device: allows this device to be free'd
+ *
+ * Puts the ib_device and allows it to be free'd.
+ */
+int ib_device_put(struct ib_device *device)
+{
+   return kref_put(&device->refcount, ib_device_complete_cb);
+}
+EXPORT_SYMBOL(ib_device_put);
+
 /**
  * ib_register_device - Register an IB device with IB core
  * @device:Device to register
@@ -312,6 +345,9 @@ int ib_register_device(struct ib_device *device,
 
    list_add_tail(&device->core_list, &device_list);
 
+   kref_init(&device->refcount);
+   init_completion(&device->free);
+
    device->reg_state = IB_DEV_REGISTERED;
 
{
@@ -342,6 +378,8 @@ void ib_unregister_device(struct ib_device *device)
 
    mutex_lock(&device_mutex);
 
+   device->reg_state = IB_DEV_UNREGISTERING;
+
    list_for_each_entry_reverse(client, &client_list, list)
        if (client->remove)
            client->remove(device);
@@ -355,6 +393,9 @@ void ib_unregister_device(struct ib_device *device)
 
ib_device_unregister_sysfs(device);
 
+   ib_device_put(device);
+   wait_for_completion(&device->free);
+
    spin_lock_irqsave(&device->client_data_lock, flags);
    list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
        kfree(context);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1866595..a7593b0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1716,6 +1716,7 @@ struct ib_device {
enum {
IB_DEV_UNINITIALIZED,
IB_DEV_REGISTERED,
+   IB_DEV_UNREGISTERING,
IB_DEV_UNREGISTERED
}reg_state;
 
@@ -1728,6 +1729,8 @@ struct ib_device {
u32  local_dma_lkey;
u8   node_type;
u8   phys_port_cnt;
+   struct kref                  refcount;
+   struct completion            free;
 };
 
 struct ib_client {
@@ -1741,6 +1744,9 @@ struct ib_client {
 struct ib_device *ib_alloc_device(size_t size);
 void ib_dealloc_device(struct ib_device *device);
 
+void ib_device_hold(struct ib_device *device);
+int ib_device_put(struct ib_device *device);
+
 int ib_register_device(struct ib_device *device,
   int (*port_callback)(struct ib_device *,
u8, struct kobject *));
-- 
2.1.0



[PATCH v3 for-next 04/33] IB/core: Add default GID for RoCE GID Cache

2015-03-24 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

When RoCE is used, a default GID address should be generated
for every supported RoCE type. These default GID addresses are
generated based on the IPv6 link-local address, but in contrast
to the GID based on the regular IPv6 link-local (as we generate
GID per IP address), these GIDs are also available if the net
device is down (in order to support loopback).
Moreover, these default GID addresses can't be deleted.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
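
To illustrate what a GID "based on the IPv6 link-local address" looks like,
here is a userspace sketch. It assumes the default GID uses the usual IPv6
link-local construction from the netdev's MAC address: the fe80::/64 prefix
followed by a modified EUI-64 interface ID. The diff is truncated in this
archive, so that assumption is not confirmed by the text, and
mac_to_default_gid() is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Build a 16-byte link-local style GID from a 6-byte MAC address:
 * fe80::/64 prefix followed by the modified EUI-64 interface ID. */
static void mac_to_default_gid(const uint8_t mac[6], uint8_t gid[16])
{
    memset(gid, 0, 16);
    gid[0] = 0xfe;
    gid[1] = 0x80;

    gid[8]  = mac[0] ^ 0x02;    /* flip the universal/local bit */
    gid[9]  = mac[1];
    gid[10] = mac[2];
    gid[11] = 0xff;             /* ff:fe filler between the MAC halves */
    gid[12] = 0xfe;
    gid[13] = mac[3];
    gid[14] = mac[4];
    gid[15] = mac[5];
}
```

Since the value depends only on the MAC address, it can be computed while
the net device is down, which matches the loopback requirement in the
commit message.
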
 drivers/infiniband/core/core_priv.h  |  12 +++
 drivers/infiniband/core/roce_gid_cache.c | 179 ---
 drivers/infiniband/core/roce_gid_mgmt.c  |  43 ++--
 include/net/addrconf.h   |  31 ++
 include/rdma/ib_verbs.h  |   1 +
 net/ipv6/addrconf.c  |  31 --
 6 files changed, 243 insertions(+), 54 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h 
b/drivers/infiniband/core/core_priv.h
index 12797d9..128d2b3 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -84,6 +84,16 @@ int roce_gid_cache_find_gid_by_port(struct ib_device 
*ib_dev, union ib_gid *gid,
 
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
+enum roce_gid_cache_default_mode {
+   ROCE_GID_CACHE_DEFAULT_MODE_SET,
+   ROCE_GID_CACHE_DEFAULT_MODE_DELETE
+};
+
+void roce_gid_cache_set_default_gid(struct ib_device *ib_dev, u8 port,
+   struct net_device *ndev,
+   unsigned long gid_type_mask,
+   enum roce_gid_cache_default_mode mode);
+
 int roce_gid_cache_setup(void);
 void roce_gid_cache_cleanup(void);
 
@@ -100,5 +110,7 @@ int roce_gid_mgmt_init(void);
 void roce_gid_mgmt_cleanup(void);
 
 int roce_rescan_device(struct ib_device *ib_dev);
+unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port);
+
 
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/roce_gid_cache.c 
b/drivers/infiniband/core/roce_gid_cache.c
index 1d0f841..1f30dad 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -34,6 +34,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <rdma/ib_cache.h>
+#include <net/addrconf.h>
 
 #include "core_priv.h"
 
@@ -43,8 +44,10 @@ EXPORT_SYMBOL_GPL(zgid);
 static const struct ib_gid_attr zattr;
 
 enum gid_attr_find_mask {
-   GID_ATTR_FIND_MASK_GID_TYPE = 1UL << 0,
-   GID_ATTR_FIND_MASK_NETDEV   = 1UL << 1,
+   GID_ATTR_FIND_MASK_GID      = 1UL << 0,
+   GID_ATTR_FIND_MASK_GID_TYPE = 1UL << 1,
+   GID_ATTR_FIND_MASK_NETDEV   = 1UL << 2,
+   GID_ATTR_FIND_MASK_DEFAULT  = 1UL << 3,
 };
 
 static inline int start_port(struct ib_device *ib_dev)
@@ -69,7 +72,8 @@ static void put_ndev(struct rcu_head *rcu)
 static int write_gid(struct ib_device *ib_dev, u8 port,
 struct ib_roce_gid_cache *cache, int ix,
 const union ib_gid *gid,
-const struct ib_gid_attr *attr)
+const struct ib_gid_attr *attr,
+bool  default_gid)
 {
unsigned int orig_seq;
int ret;
@@ -83,6 +87,7 @@ static int write_gid(struct ib_device *ib_dev, u8 port,
 */
smp_wmb();
 
+   cache->data_vec[ix].default_gid = default_gid;
    ret = ib_dev->modify_gid(ib_dev, port, ix, gid, attr,
                             &cache->data_vec[ix].context);
 
@@ -132,7 +137,8 @@ static int write_gid(struct ib_device *ib_dev, u8 port,
 }
 
 static int find_gid(struct ib_roce_gid_cache *cache, union ib_gid *gid,
-   const struct ib_gid_attr *val, unsigned long mask)
+   const struct ib_gid_attr *val, bool default_gid,
+   unsigned long mask)
 {
int i;
unsigned int orig_seq;
@@ -152,13 +158,18 @@ static int find_gid(struct ib_roce_gid_cache *cache, 
union ib_gid *gid,
            attr->gid_type != val->gid_type)
            continue;
 
-       if (memcmp(gid, &cache->data_vec[i].gid, sizeof(*gid)))
+       if (mask & GID_ATTR_FIND_MASK_GID &&
+           memcmp(gid, &cache->data_vec[i].gid, sizeof(*gid)))
            continue;
 
        if (mask & GID_ATTR_FIND_MASK_NETDEV &&
            attr->ndev != val->ndev)
            continue;
 
+       if (mask & GID_ATTR_FIND_MASK_DEFAULT &&
+           cache->data_vec[i].default_gid != default_gid)
+           continue;
+
+
/* We have a match, verify that the data we
 * compared is valid. Make sure that the
 * sequence number we read is the last to be
@@ -176,12 +187,19 @@ static int find_gid(struct ib_roce_gid_cache *cache, 
union ib_gid *gid,
return -1;
 }
 
+static

[PATCH v3 for-next 08/33] IB/core: GID attribute should be returned from verbs API and cache API

2015-03-24 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Along with the GID itself, we now store the GID's attributes.
These attributes contain important meta information regarding
the GID itself, for example the netdevice. Thus, this information
needs to be returned in APIs. This patch changes the following APIs:
(a) ib_get_cached_gid
(b) ib_find_cached_gid
(c) ib_find_cached_gid_by_port
(d) ib_query_gid

It also updates the callers of those APIs and uses the RoCE GID
cache when needed.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
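
The lookup in ib_get_cached_gid ends up with three outcomes: use the RoCE
GID cache, fall back to the IB cache, or ask the caller to retry. A minimal
userspace sketch of that dispatch follows. The struct fake_dev type and the
pick_gid_source() helper are hypothetical; only the return-code convention
(0, -EAGAIN, -EINVAL) is taken from ib_cache_use_roce_gid_cache() in the
diff.

```c
#include <errno.h>

enum link_layer { LINK_IB, LINK_ETHERNET };

struct fake_dev {
    enum link_layer ll;
    int has_roce_cache;     /* non-zero once the RoCE GID cache exists */
};

/* Mirrors ib_cache_use_roce_gid_cache(): 0 = use the RoCE cache,
 * -EAGAIN = Ethernet port whose cache is not ready, -EINVAL = IB port. */
static int use_roce_cache(const struct fake_dev *d)
{
    if (d->ll == LINK_ETHERNET)
        return d->has_roce_cache ? 0 : -EAGAIN;
    return -EINVAL;
}

enum gid_source { FROM_ROCE_CACHE = 1, FROM_IB_CACHE = 2 };

/* Returns which cache would serve the lookup, or a negative errno. */
static int pick_gid_source(const struct fake_dev *d)
{
    int ret = use_roce_cache(d);

    if (!ret)
        return FROM_ROCE_CACHE;
    if (ret == -EAGAIN)
        return ret;             /* caller should retry later */
    return FROM_IB_CACHE;       /* -EINVAL here only means "not RoCE" */
}
```

Note that -EINVAL from the helper is not an error at this level; it simply
selects the legacy IB path, where the attribute is zeroed and the GID type
is set to IB_GID_TYPE_IB.
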
 drivers/infiniband/core/cache.c| 225 +
 drivers/infiniband/core/cm.c   |   6 +-
 drivers/infiniband/core/cma.c  |  84 ++---
 drivers/infiniband/core/device.c   |  29 +++-
 drivers/infiniband/core/mad.c  |   2 +-
 drivers/infiniband/core/multicast.c|   3 +-
 drivers/infiniband/core/sa_query.c |   7 +-
 drivers/infiniband/core/sysfs.c|   2 +-
 drivers/infiniband/core/uverbs_marshall.c  |   4 +-
 drivers/infiniband/core/verbs.c|   7 +-
 drivers/infiniband/hw/mlx4/qp.c|   5 +-
 drivers/infiniband/hw/mthca/mthca_av.c |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c|   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c  |   3 +-
 include/rdma/ib_cache.h|  44 -
 include/rdma/ib_sa.h   |   4 +-
 include/rdma/ib_verbs.h|   7 +-
 19 files changed, 352 insertions(+), 88 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 80f6cf2..882d491 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -42,6 +42,8 @@
 
 #include "core_priv.h"
 
+#define __IB_ONLY
+
 struct ib_pkey_cache {
int table_len;
u16 table[0];
@@ -69,16 +71,16 @@ static inline int end_port(struct ib_device *device)
        0 : device->phys_port_cnt;
 }
 
-int ib_get_cached_gid(struct ib_device *device,
-                     u8                port_num,
-                     int               index,
-                     union ib_gid     *gid)
+static int __IB_ONLY __ib_get_cached_gid(struct ib_device *device,
+                                        u8                port_num,
+                                        int               index,
+                                        union ib_gid     *gid)
 {
    struct ib_gid_cache *cache;
    unsigned long flags;
    int ret = 0;
 
-   if (port_num < start_port(device) || port_num > end_port(device))
+   if (!device->cache.gid_cache)
        return -EINVAL;
 
    read_lock_irqsave(&device->cache.lock, flags);
@@ -94,43 +96,183 @@ int ib_get_cached_gid(struct ib_device *device,
 
return ret;
 }
+
+int ib_cache_use_roce_gid_cache(struct ib_device *device, u8 port_num)
+{
+   if (rdma_port_get_link_layer(device, port_num) ==
+       IB_LINK_LAYER_ETHERNET) {
+       if (device->cache.roce_gid_cache)
+           return 0;
+       else
+           return -EAGAIN;
+   }
+
+   return -EINVAL;
+}
+EXPORT_SYMBOL(ib_cache_use_roce_gid_cache);
+
+int ib_get_cached_gid(struct ib_device *device,
+                     u8                port_num,
+                     int               index,
+                     union ib_gid     *gid,
+                     struct ib_gid_attr *attr)
+{
+   int ret;
+
+   if (port_num < start_port(device) || port_num > end_port(device))
+       return -EINVAL;
+
+   ret = ib_cache_use_roce_gid_cache(device, port_num);
+   if (!ret)
+   return roce_gid_cache_get_gid(device, port_num, index, gid,
+ attr);
+
+   if (ret == -EAGAIN)
+   return ret;
+
+   ret = __ib_get_cached_gid(device, port_num, index, gid);
+
+   if (!ret && attr) {
+       memset(attr, 0, sizeof(*attr));
+       attr->gid_type = IB_GID_TYPE_IB;
+   }
+
+   return ret;
+}
 EXPORT_SYMBOL(ib_get_cached_gid);
 
-int ib_find_cached_gid(struct ib_device *device,
-  union ib_gid *gid,
-  u8   *port_num,
-  u16  *index)
+static int __IB_ONLY ___ib_find_cached_gid_by_port(struct ib_device *device,
+  u8   port_num,
+  const union ib_gid *gid,
+  u16  *index)
 {
struct ib_gid_cache *cache;
+   u8 p = port_num - start_port(device);
+   int i;
+
+   if (!ib_cache_use_roce_gid_cache(device

[PATCH v3 for-next 00/33] RoCE V1/V2 per GID

2015-03-24 Thread Somnath Kotur
 it.
(5) cma_configfs should depend on both address translation and configfs.
(6) ocrdma driver redefined zgid.
(7) Added event information for NETDEV_CHANGEUPPER event.

Changes from V1:
(1) Addressed Shachar and Haggai's comments
(2) Fixed multicast support
(3) Generalized bonding support
(4) Added default GID after the IB device's net device was removed from bonding
(5) Fixed bugs in mlx4 implementation regarding multicast
(6) Fixed bugs in mlx4 when using XRC QPs after this patchset was applied
(7) Fixed bug when the RoCE gid cache didn't exist
(8) Moved the bonding's DRV macros to a private header
(9) Support non-configfs configurations

Devesh Sharma (3):
  RDMA/ocrdma: changes to support RoCE-v2 in UD path
  RDMA/ocrdma: changes to support RoCE-v2 in RC path
  RDMA/ocrdma: changes to support user AH creation

Maor Gottlieb (1):
  net/mlx4_core: Add handlning of R-RoCE over IPV4 in qp attach flow

Matan Barak (14):
  IB/core: Add RoCE GID cache
  IB/core: Add kref to IB devices
  IB/core: Add RoCE GID population
  IB/core: Add default GID for RoCE GID Cache
  net/bonding: make DRV macros private
  net: Add info for NETDEV_CHANGEUPPER event
  IB/core: Add RoCE cache bonding support
  IB/core: GID attribute should be returned from verbs API and cache API
  IB/core: Report gid_type and gid_ndev through sysfs
  IB/core: Support find sgid index using a filter function
  IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
  IB/core: Add gid_type to path and rdma_id_private
  IB/core: Add rdma_network_type to wc
  IB/cma: Add configfs for rdma_cm

Moni Shoua (13):
  IB/mlx4: Remove gid table management for RoCE
  IB/mlx4: Replace spin_lock with rw_semaphore
  IB/mlx4: Lock with RCU instead of RTNL
  net/mlx4: Postpone the registration of net_device
  IB/mlx4: Advertise RoCE support in port capabilities
  IB/mlx4: Implement ib_device callback - get_netdev
  IB/mlx4: Implement ib_device callback - modify_gid
  IB/mlx4: Configure device to work in RoCEv2
  IB/mlx4: Translate cache gid index to real index
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
  IB/mlx4: Create and use another QP1 for RoCEv2
  IB/cma: Join and leave multicast groups with IGMP

Somnath Kotur (2):
  IB/Core: Changes to the IB Core infrastructure for RoCEv2 support
  RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table
mgmt to IB/Core.

 drivers/infiniband/Kconfig |   5 +
 drivers/infiniband/core/Makefile   |   5 +-
 drivers/infiniband/core/addr.c |  11 +-
 drivers/infiniband/core/cache.c| 249 ++--
 drivers/infiniband/core/cm.c   |  49 +-
 drivers/infiniband/core/cma.c  | 233 +--
 drivers/infiniband/core/cma_configfs.c | 222 +++
 drivers/infiniband/core/core_priv.h|  92 ++-
 drivers/infiniband/core/device.c   | 150 -
 drivers/infiniband/core/mad.c  |   2 +-
 drivers/infiniband/core/multicast.c|  17 +-
 drivers/infiniband/core/roce_gid_cache.c   | 825 +
 drivers/infiniband/core/roce_gid_mgmt.c| 804 
 drivers/infiniband/core/sa_query.c |  12 +-
 drivers/infiniband/core/sysfs.c| 186 +-
 drivers/infiniband/core/ucma.c |   1 -
 drivers/infiniband/core/ud_header.c| 153 -
 drivers/infiniband/core/uverbs_cmd.c   |   3 +-
 drivers/infiniband/core/uverbs_marshall.c  |   5 +-
 drivers/infiniband/core/verbs.c| 266 ++--
 drivers/infiniband/hw/mlx4/ah.c|  15 +-
 drivers/infiniband/hw/mlx4/mad.c   |  12 +-
 drivers/infiniband/hw/mlx4/main.c  | 758 +--
 drivers/infiniband/hw/mlx4/mcg.c   |   2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |  33 +-
 drivers/infiniband/hw/mlx4/qp.c| 337 --
 drivers/infiniband/hw/mthca/mthca_av.c |   2 +-
 drivers/infiniband/hw/mthca/mthca_qp.c |   2 +-
 drivers/infiniband/hw/ocrdma/ocrdma.h  |  12 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c   |  94 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h   |   5 +-
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c   |  50 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c | 233 +--
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h  |  18 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c|  54 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h|   4 +
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c|   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c  |   3 +-
 drivers/net/bonding/bond_main.c|   2 +
 drivers/net/bonding/bond_options.c |  13 -
 drivers/net/bonding/bond_procfs.c

[PATCH v3 for-next 15/33] IB/Core: Changes to the IB Core infrastructure for RoCEv2 support

2015-03-24 Thread Somnath Kotur
1. Choose sgid_index and type from all the matching entries in RDMA-CM
   based on hint from the IP stack.
2. Set hop_limit for the IP Packet based on above hint from IP stack
3. Define a RDMA_NETWORK enum type.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
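
Points 1 and 2 above can be sketched in userspace as follows. The enums and
struct fake_path are stand-ins for rdma_network_type, ib_gid_type and the
path record. The body of ib_network_to_gid_type() is not shown in this
archive, so the mapping in net_to_gid_type() is an assumption. The value 64
is what the kernel defines for IPV6_DEFAULT_HOPLIMIT.

```c
enum net_type { NET_IB, NET_IPV4, NET_IPV6 };       /* stand-in for rdma_network_type */
enum gid_type { GID_TYPE_IB, GID_TYPE_ROCE_V2 };    /* stand-in for ib_gid_type */

#define DEFAULT_HOPLIMIT 64     /* value of IPV6_DEFAULT_HOPLIMIT */

struct fake_path {
    enum gid_type gid_type;
    int hop_limit;
};

/* Assumed mapping: any routed IP network implies RoCE v2. */
static enum gid_type net_to_gid_type(enum net_type n)
{
    return n == NET_IB ? GID_TYPE_IB : GID_TYPE_ROCE_V2;
}

/* Same branch structure as the cma_resolve_iboe_route() hunk. */
static void fill_path(struct fake_path *p, enum net_type n)
{
    if (n != NET_IB) {
        p->gid_type = net_to_gid_type(n);
        p->hop_limit = DEFAULT_HOPLIMIT;    /* packet may cross routers */
    } else {
        p->hop_limit = 1;                   /* stay on the local link */
    }
}
```

In the resolver hunks the network type is only set when the route goes
through a gateway, so traffic on the local subnet keeps the old hop limit
of 1.
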
 drivers/infiniband/core/addr.c  |  8 +
 drivers/infiniband/core/cma.c   | 10 +-
 drivers/infiniband/core/verbs.c | 77 ++---
 include/rdma/ib_addr.h  |  1 +
 include/rdma/ib_verbs.h |  9 +
 5 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 43af7f5..da24c0e 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -257,6 +257,9 @@ static int addr4_resolve(struct sockaddr_in *src_in,
goto put;
}
 
+   if (rt->rt_uses_gateway)
+       addr->network = RDMA_NETWORK_IPV4;
+
    ret = dst_fetch_ha(&rt->dst, addr, &fl4.daddr);
 put:
ip_rt_put(rt);
@@ -271,6 +274,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 {
struct flowi6 fl6;
struct dst_entry *dst;
+   struct rt6_info *rt;
int ret;
 
    memset(&fl6, 0, sizeof fl6);
@@ -282,6 +286,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
    if ((ret = dst->error))
        goto put;
 
+   rt = (struct rt6_info *)dst;
    if (ipv6_addr_any(&fl6.saddr)) {
        ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
                                 &fl6.daddr, 0, &fl6.saddr);
@@ -305,6 +310,9 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
goto put;
}
 
+   if (rt->rt6i_flags & RTF_GATEWAY)
+       addr->network = RDMA_NETWORK_IPV6;
+
    ret = dst_fetch_ha(dst, addr, &fl6.daddr);
 put:
dst_release(dst);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 8dec040..6f345e2 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1952,6 +1952,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private 
*id_priv)
 {
    struct rdma_route *route = &id_priv->id.route;
    struct rdma_addr *addr = &route->addr;
+   enum ib_gid_type network_gid_type;
struct cma_work *work;
int ret;
struct net_device *ndev = NULL;
@@ -1990,7 +1991,14 @@ static int cma_resolve_iboe_route(struct rdma_id_private 
*id_priv)
    rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
                &route->path_rec->dgid);
 
-   route->path_rec->hop_limit = 1;
+   /* Use the hint from IP Stack to select GID Type */
+   network_gid_type = ib_network_to_gid_type(addr->dev_addr.network);
+   if (addr->dev_addr.network != RDMA_NETWORK_IB) {
+       route->path_rec->gid_type = network_gid_type;
+       route->path_rec->hop_limit = IPV6_DEFAULT_HOPLIMIT;
+   } else {
+       route->path_rec->hop_limit = 1;
+   }
    route->path_rec->reversible = 1;
    route->path_rec->pkey = cpu_to_be16(0xffff);
    route->path_rec->mtu_selector = IB_SA_EQ;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2e7ccad..3586996 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -195,11 +195,11 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct 
ib_ah_attr *ah_attr)
 }
 EXPORT_SYMBOL(ib_create_ah);
 
-static int ib_get_grh_header_version(const void *h)
+static int ib_get_grh_header_version(const union rdma_network_hdr *h)
 {
-   const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+   const struct iphdr *ip4h = (struct iphdr *)&h->roce4grh;
    struct iphdr ip4h_checked;
-   const struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+   const struct ipv6hdr *ip6h = (struct ipv6hdr *)&h->ibgrh;
 
    if (ip6h->version != 6)
        return (ip4h->version == 4) ? 4 : 0;
@@ -219,37 +219,6 @@ static int ib_get_grh_header_version(const void *h)
return 6;
 }
 
-static int ib_get_dgid_sgid_by_grh(const void *h,
-  enum rdma_network_type net_type,
-  union ib_gid *dgid, union ib_gid *sgid)
-{
-   switch (net_type) {
-   case RDMA_NETWORK_IPV4: {
-       const struct iphdr *ip4h = (struct iphdr *)(h + 20);
-
-       ipv6_addr_set_v4mapped(ip4h->daddr, (struct in6_addr *)dgid);
-       ipv6_addr_set_v4mapped(ip4h->saddr, (struct in6_addr *)sgid);
-       return 0;
-   }
-   case RDMA_NETWORK_IPV6: {
-       struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
-
-       memcpy(dgid, &ip6h->daddr, sizeof(*dgid));
-       memcpy(sgid, &ip6h->saddr, sizeof(*sgid));
-       return 0;
-   }
-   case RDMA_NETWORK_IB: {
-       struct ib_grh *grh = (struct ib_grh *)h;
-
-       memcpy(dgid, &grh->dgid, sizeof(*dgid

[PATCH v3 for-next 16/33] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.

2015-03-24 Thread Somnath Kotur
1. Check and set port capability flags to indicate RoCEv2 support.
2. Change query_gid hook to return value from IB/Core GID Mgmt APIs.
3. Get rid of all the netdev notifier chain subscription code as well as
   maintenance of SGID Table in memory.
4. Implement get_netdev hook in driver.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma.h   |  10 ++
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c|   3 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  | 233 +---
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |  13 ++
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |  33 +++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |   4 +
 6 files changed, 64 insertions(+), 232 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h 
b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 16ee36e..97f971a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -100,6 +100,7 @@ struct ocrdma_dev_attr {
u8 local_ca_ack_delay;
u8 ird;
u8 num_ird_pages;
+   u8 roce_flags;
 };
 
 struct ocrdma_dma_mem {
@@ -575,4 +576,13 @@ static inline u8 ocrdma_is_enabled_and_synced(u32 state)
(state & OCRDMA_STATE_FLAG_SYNC);
 }
 
+static inline bool ocrdma_is_rocev2_supported(struct ocrdma_dev *dev)
+{
+   return (dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV4 <<
+   OCRDMA_ROUDP_FLAGS_SHIFT) ||
+   dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV6 <<
+   OCRDMA_ROUDP_FLAGS_SHIFT)) ?
+   true : false;
+}
+
 #endif
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index e5f0244..20f9e8f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -1112,6 +1112,9 @@ static void ocrdma_get_attr(struct ocrdma_dev *dev,
attr->local_ca_ack_delay = (rsp->max_pd_ca_ack_delay &
OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_MASK) >>
OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_SHIFT;
+   attr->roce_flags = (rsp->max_pd_ca_ack_delay &
+   OCRDMA_MBX_QUERY_CFG_L3_TYPE_MASK) >>
+   OCRDMA_MBX_QUERY_CFG_L3_TYPE_SHIFT;
attr->max_mw = rsp->max_mw;
attr->max_mr = rsp->max_mr;
attr->max_mr_size = ((u64)rsp->max_mr_size_hi << 32) |
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 7a2b59a..a81492f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -51,8 +51,6 @@ static LIST_HEAD(ocrdma_dev_list);
 static DEFINE_SPINLOCK(ocrdma_devlist_lock);
 static DEFINE_IDR(ocrdma_dev_id);
 
-static union ib_gid ocrdma_zero_sgid;
-
 void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
 {
u8 mac_addr[6];
@@ -67,135 +65,6 @@ void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
guid[6] = mac_addr[4];
guid[7] = mac_addr[5];
 }
-
-static bool ocrdma_add_sgid(struct ocrdma_dev *dev, union ib_gid *new_sgid)
-{
-   int i;
-   unsigned long flags;
-
-   memset(&ocrdma_zero_sgid, 0, sizeof(union ib_gid));
-
-
-   spin_lock_irqsave(&dev->sgid_lock, flags);
-   for (i = 0; i < OCRDMA_MAX_SGID; i++) {
-   if (!memcmp(&dev->sgid_tbl[i], &ocrdma_zero_sgid,
-   sizeof(union ib_gid))) {
-   /* found free entry */
-   memcpy(&dev->sgid_tbl[i], new_sgid,
-  sizeof(union ib_gid));
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return true;
-   } else if (!memcmp(&dev->sgid_tbl[i], new_sgid,
-  sizeof(union ib_gid))) {
-   /* entry already present, no addition is required. */
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return false;
-   }
-   }
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return false;
-}
-
-static bool ocrdma_del_sgid(struct ocrdma_dev *dev, union ib_gid *sgid)
-{
-   int found = false;
-   int i;
-   unsigned long flags;
-
-
-   spin_lock_irqsave(&dev->sgid_lock, flags);
-   /* first is default sgid, which cannot be deleted. */
-   for (i = 1; i < OCRDMA_MAX_SGID; i++) {
-   if (!memcmp(&dev->sgid_tbl[i], sgid, sizeof(union ib_gid))) {
-   /* found matching entry */
-   memset(&dev->sgid_tbl[i], 0, sizeof(union ib_gid));
-   found = true;
-   break;
-   }
-   }
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return found;
-}
-
-static int ocrdma_addr_event(unsigned long event, struct

[PATCH v3 for-next 14/33] IB/cma: Add configfs for rdma_cm

2015-03-24 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Users would like to control the behaviour of rdma_cm.
For example, old applications which don't set the
required RoCE gid type could be executed on RoCE V2
network types. In order to support this configuration,
we implement a configfs for rdma_cm.

In order to use the configfs, one needs to mount it and
mkdir a directory named after the IB device inside the
rdma_cm directory.

The patch adds support for a single configuration file,
default_roce_mode. The mode can either be IB & RoCEv1 or
RoCEv2.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/Kconfig   |   5 +
 drivers/infiniband/core/Makefile |   2 +
 drivers/infiniband/core/cma.c|  54 +++-
 drivers/infiniband/core/cma_configfs.c   | 222 +++
 drivers/infiniband/core/core_priv.h  |  15 +++
 drivers/infiniband/core/roce_gid_cache.c |  13 ++
 6 files changed, 307 insertions(+), 4 deletions(-)
 create mode 100644 drivers/infiniband/core/cma_configfs.c

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index b899531..20bda60 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -54,6 +54,11 @@ config INFINIBAND_ADDR_TRANS
depends on INFINIBAND
default y
 
+config CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS
+   bool
+   depends on INFINIBAND_ADDR_TRANS && CONFIGFS_FS
+   default y
+
 source drivers/infiniband/hw/mthca/Kconfig
 source drivers/infiniband/hw/ipath/Kconfig
 source drivers/infiniband/hw/qib/Kconfig
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 2c94963..f6bc8c5 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -24,6 +24,8 @@ iw_cm-y :=iwcm.o iwpm_util.o iwpm_msg.o
 
 rdma_cm-y :=   cma.o
 
+rdma_cm-$(CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS) += cma_configfs.o
+
 rdma_ucm-y :=  ucma.o
 
 ib_addr-y :=   addr.o
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9afa410..8dec040 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -55,6 +55,7 @@
 #include <rdma/ib_cm.h>
 #include <rdma/ib_sa.h>
 #include <rdma/iw_cm.h>
+#include "core_priv.h"
 
 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("Generic RDMA CM Agent");
@@ -91,6 +92,7 @@ struct cma_device {
struct completion   comp;
atomic_t refcount;
struct list_head id_list;
+   enum ib_gid_type default_gid_type;
 };
 
 struct rdma_bind_list {
@@ -103,6 +105,42 @@ enum {
CMA_OPTION_AFONLY,
 };
 
+void cma_ref_dev(struct cma_device *cma_dev)
+{
+   atomic_inc(&cma_dev->refcount);
+}
+
+struct cma_device *cma_enum_devices_by_ibdev(cma_device_filter filter,
+void   *cookie)
+{
+   struct cma_device *cma_dev;
+   struct cma_device *found_cma_dev = NULL;
+
+   mutex_lock(&lock);
+
+   list_for_each_entry(cma_dev, &dev_list, list)
+   if (filter(cma_dev->device, cookie)) {
+   found_cma_dev = cma_dev;
+   break;
+   }
+
+   if (found_cma_dev)
+   cma_ref_dev(found_cma_dev);
+   mutex_unlock(&lock);
+   return found_cma_dev;
+}
+
+enum ib_gid_type cma_get_default_gid_type(struct cma_device *cma_dev)
+{
+   return cma_dev->default_gid_type;
+}
+
+void cma_set_default_gid_type(struct cma_device *cma_dev,
+ enum ib_gid_type default_gid_type)
+{
+   cma_dev->default_gid_type = default_gid_type;
+}
+
 /*
  * Device removal can occur at anytime, so we need extra handling to
  * serialize notifying the user of device removal with other callbacks.
@@ -248,15 +286,16 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 
ip_ver)
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
  struct cma_device *cma_dev)
 {
-   atomic_inc(&cma_dev->refcount);
+   cma_ref_dev(cma_dev);
id_priv->cma_dev = cma_dev;
+   id_priv->gid_type = cma_dev->default_gid_type;
id_priv->id.device = cma_dev->device;
id_priv->id.route.addr.dev_addr.transport =
rdma_node_get_transport(cma_dev->device->node_type);
list_add_tail(&id_priv->list, &cma_dev->id_list);
 }
 
-static inline void cma_deref_dev(struct cma_device *cma_dev)
+void cma_deref_dev(struct cma_device *cma_dev)
 {
if (atomic_dec_and_test(&cma_dev->refcount))
complete(&cma_dev->comp);
@@ -385,7 +424,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 
ret = ib_find_cached_gid_by_port(cma_dev-device,
 iboe_gid,
-IB_GID_TYPE_IB

[PATCH v3 for-next 12/33] IB/core: Add gid_type to path and rdma_id_private

2015-03-24 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

When using rdma cm, we want to take the gid_type from
the rdma_id_private. This is a prerequisite for adding
an API from user-space/configfs that sets
the gid_type of a CM connection.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/cm.c  | 19 ++-
 drivers/infiniband/core/cma.c |  2 ++
 drivers/infiniband/core/sa_query.c|  3 ++-
 drivers/infiniband/core/uverbs_marshall.c |  1 +
 include/rdma/ib_sa.h  |  1 +
 5 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 7974e74..22dac05 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -358,9 +358,8 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, 
struct cm_av *av)
read_lock_irqsave(&cm.device_lock, flags);
list_for_each_entry(cm_dev, &cm.device_list, list) {
if (!ib_find_cached_gid(cm_dev->ib_device, &path->sgid,
-   IB_GID_TYPE_IB, path->net,
-   path->ifindex,
-   &p, NULL)) {
+   path->gid_type, path->net,
+   path->ifindex, &p, NULL)) {
port = cm_dev->port[p-1];
break;
}
@@ -1521,6 +1520,8 @@ static int cm_req_handler(struct cm_work *work)
struct ib_cm_id *cm_id;
struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
struct cm_req_msg *req_msg;
+   union ib_gid gid;
+   struct ib_gid_attr gid_attr;
int ret;
 
req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
@@ -1560,11 +1561,19 @@ static int cm_req_handler(struct cm_work *work)
cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
 
memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, ETH_ALEN);
-   ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
+   ret = ib_get_cached_gid(work->port->cm_dev->ib_device,
+   work->port->port_num,
+   cm_id_priv->av.ah_attr.grh.sgid_index,
+   &gid, &gid_attr);
+   if (!ret) {
+   work->path[0].gid_type = gid_attr.gid_type;
+   ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
+   }
if (ret) {
ib_get_cached_gid(work->port->cm_dev->ib_device,
  work->port->port_num, 0, &work->path[0].sgid,
- NULL);
+ &gid_attr);
+   work->path[0].gid_type = gid_attr.gid_type;
ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
   &work->path[0].sgid, sizeof work->path[0].sgid,
   NULL, 0);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 659676c..9afa410 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -146,6 +146,7 @@ struct rdma_id_private {
u8  tos;
u8  reuseaddr;
u8  afonly;
+   enum ib_gid_type gid_type;
 };
 
 struct cma_multicast {
@@ -1936,6 +1937,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private 
*id_priv)
ndev = dev_get_by_index(&init_net, addr->dev_addr.bound_dev_if);
route->path_rec->net = &init_net;
route->path_rec->ifindex = addr->dev_addr.bound_dev_if;
+   route->path_rec->gid_type = id_priv->gid_type;
}
if (!ndev) {
ret = -ENODEV;
diff --git a/drivers/infiniband/core/sa_query.c 
b/drivers/infiniband/core/sa_query.c
index 705b6b8..f770049 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -546,7 +546,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 
port_num,
ah_attr->ah_flags = IB_AH_GRH;
ah_attr->grh.dgid = rec->dgid;
 
-   ret = ib_find_cached_gid(device, &rec->sgid, IB_GID_TYPE_IB,
+   ret = ib_find_cached_gid(device, &rec->sgid, rec->gid_type,
 rec->net, rec->ifindex, &port_num,
 &gid_index);
if (ret)
@@ -676,6 +676,7 @@ static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query,
  mad->data, &rec);
rec.net = NULL;
rec.ifindex = 0;
+   rec.gid_type = IB_GID_TYPE_IB;
memset(rec.dmac, 0, ETH_ALEN);
query->callback(status, &rec, query->context);
} else
diff --git a/drivers/infiniband/core/uverbs_marshall.c 
b/drivers/infiniband/core/uverbs_marshall.c
index 7d2f14c..af020f8 100644

[PATCH v3 for-next 10/33] IB/core: Support find sgid index using a filter function

2015-03-24 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Sometimes an sgid index needs to be found based on variable parameters.
For example, when the CM gets a packet from the network, it needs to
find the sgid_index that matches the appropriate L2 attributes
of the packet. Extending the cache's API to include Ethernet L2
attributes is problematic, since they may be vastly extended
in the future. As a result, we add a find function that
takes a user-supplied filter function and searches the GID table
until a match is found.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
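Note (illustration only, not part of the patch): the filter-based lookup
described in the commit message can be sketched in plain userspace C. All
demo_* names and types below are hypothetical stand-ins for union ib_gid,
struct ib_gid_attr and the RoCE GID cache; the sketch assumes a flat array
and does not model the cache's locking or sequence counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for union ib_gid and struct ib_gid_attr. */
struct demo_gid { uint8_t raw[16]; };
struct demo_gid_attr { int gid_type; int ifindex; };

struct demo_gid_entry {
	struct demo_gid gid;
	struct demo_gid_attr attr;
	bool valid;
};

typedef bool (*demo_gid_filter)(const struct demo_gid *gid,
				const struct demo_gid_attr *attr,
				void *context);

/*
 * Store in *index the first valid entry whose GID equals *gid and which the
 * filter accepts (a NULL filter accepts everything). Returns 0 on a match
 * and -1 when no entry matches.
 */
int demo_find_gid_by_filter(const struct demo_gid_entry *table,
			    size_t table_len, const struct demo_gid *gid,
			    demo_gid_filter filter, void *context,
			    uint16_t *index)
{
	size_t i;

	assert(table != NULL && gid != NULL);
	for (i = 0; i < table_len; i++) {
		if (!table[i].valid)
			continue;
		if (memcmp(&table[i].gid, gid, sizeof(*gid)) != 0)
			continue;
		if (filter && !filter(&table[i].gid, &table[i].attr, context))
			continue;
		if (index)
			*index = (uint16_t)i;
		return 0;
	}
	return -1;
}

/* Example filter: accept only entries bound to the ifindex in *context. */
bool demo_match_ifindex(const struct demo_gid *gid,
			const struct demo_gid_attr *attr, void *context)
{
	(void)gid;
	return attr->ifindex == *(const int *)context;
}
```

The point of the callback is that new L2 attributes can be matched by
writing a new filter, without changing the signature of the find function.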
 drivers/infiniband/core/cache.c  | 24 
 drivers/infiniband/core/core_priv.h  |  9 +
 drivers/infiniband/core/roce_gid_cache.c | 66 
 include/rdma/ib_cache.h  | 27 +
 4 files changed, 126 insertions(+)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 882d491..ae86fe8 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -273,6 +273,30 @@ int ib_find_cached_gid_by_port(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_find_cached_gid_by_port);
 
+int ib_find_gid_by_filter(struct ib_device *device,
+ union ib_gid *gid,
+ u8 port_num,
+ bool (*filter)(const union ib_gid *gid,
+const struct ib_gid_attr *,
+void *),
+ void *context, u16 *index)
+{
+   /* Look for a RoCE device with the specified GID. */
+   if (!ib_cache_use_roce_gid_cache(device, port_num))
+   return roce_gid_cache_find_gid_by_filter(device, gid,
+port_num, filter,
+context, index);
+
+   /* Only RoCE GID cache supports filter function */
+   if (filter)
+   return -ENOSYS;
+
+   /* If no RoCE devices with the specified GID, look for IB device. */
+   return __ib_find_cached_gid_by_port(device, port_num,
+   gid, index);
+}
+EXPORT_SYMBOL(ib_find_gid_by_filter);
+
 int ib_get_cached_pkey(struct ib_device *device,
   u8port_num,
   int   index,
diff --git a/drivers/infiniband/core/core_priv.h 
b/drivers/infiniband/core/core_priv.h
index b5bbbdf..949844c 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -84,6 +84,15 @@ int roce_gid_cache_find_gid_by_port(struct ib_device 
*ib_dev, union ib_gid *gid,
enum ib_gid_type gid_type, u8 port,
struct net *net, int if_index, u16 *index);
 
+int roce_gid_cache_find_gid_by_filter(struct ib_device *ib_dev,
+ union ib_gid *gid,
+ u8 port,
+ bool (*filter)(const union ib_gid *gid,
+const struct ib_gid_attr *,
+void *),
+ void *context,
+ u16 *index);
+
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
 enum roce_gid_cache_default_mode {
diff --git a/drivers/infiniband/core/roce_gid_cache.c 
b/drivers/infiniband/core/roce_gid_cache.c
index b6180eb..bd51d97 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -455,6 +455,72 @@ int roce_gid_cache_find_gid_by_port(struct ib_device 
*ib_dev, union ib_gid *gid,
return -ENOENT;
 }
 
+int roce_gid_cache_find_gid_by_filter(struct ib_device *ib_dev,
+ union ib_gid *gid,
+ u8 port,
+ bool (*filter)(const union ib_gid *,
+const struct ib_gid_attr *,
+void *),
+ void *context,
+ u16 *index)
+{
+   struct ib_roce_gid_cache *cache;
+   unsigned int i;
+   bool found = false;
+
+   if (!ib_dev->cache.roce_gid_cache)
+   return -ENOSYS;
+
+   if (port < start_port(ib_dev) ||
+   port > start_port(ib_dev) + ib_dev->phys_port_cnt ||
+   rdma_port_get_link_layer(ib_dev, port) !=
+   IB_LINK_LAYER_ETHERNET)
+   return -ENOSYS;
+
+   cache = ib_dev->cache.roce_gid_cache[port - start_port(ib_dev)];
+
+   if (!cache || !cache->active)
+   return -ENOENT;
+
+   for (i = 0; i < cache->sz; i++) {
+   unsigned int orig_seq;
+   struct ib_gid_attr attr

[PATCH v3 for-next 18/33] RDMA/ocrdma: changes to support RoCE-v2 in RC path

2015-03-24 Thread Somnath Kotur
From: Devesh Sharma <devesh.sha...@emulex.com>

To support RoCE-v2, this patch implements the following changes:
1. Get the GID type for a given sgid.
2. Based on the GID type, get the IPv4 L3 addresses
   and give those to FW.
3. Provide the l3-type to FW.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 30 --
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index 20f9e8f..147fccf 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -2433,7 +2433,13 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
union ib_gid sgid, zgid;
struct ib_gid_attr sgid_attr;
u32 vlan_id = 0xFFFF;
-   u8 mac_addr[6];
+   u8 mac_addr[6], hdr_type;
+   union {
+   struct sockaddr _sockaddr;
+   struct sockaddr_in  _sockaddr_in;
+   struct sockaddr_in6 _sockaddr_in6;
+   } sgid_addr, dgid_addr;
+
struct ocrdma_dev *dev = get_ocrdma_dev(qp->ibqp.device);
 
if ((ah_attr->ah_flags & IB_AH_GRH) == 0)
@@ -2448,6 +2454,8 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
cmd->params.hop_lmt_rq_psn |=
(ah_attr->grh.hop_limit << OCRDMA_QP_PARAMS_HOP_LMT_SHIFT);
cmd->flags |= OCRDMA_QP_PARA_FLOW_LBL_VALID;
+
+   /* GIDs */
memcpy(&cmd->params.dgid[0], &ah_attr->grh.dgid.raw[0],
   sizeof(cmd->params.dgid));
 
@@ -2471,17 +2479,35 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
return status;
cmd->params.dmac_b0_to_b3 = mac_addr[0] | (mac_addr[1] << 8) |
(mac_addr[2] << 16) | (mac_addr[3] << 24);
+   hdr_type = ib_gid_to_network_type(sgid_attr.gid_type, &sgid);
+   if (hdr_type == RDMA_NETWORK_IPV4) {
+   status = rdma_gid2ip(&sgid_addr._sockaddr, &sgid);
+   if (status)
+   return status;
+   status = rdma_gid2ip(&dgid_addr._sockaddr, &ah_attr->grh.dgid);
+   if (status)
+   return status;
+   memcpy(&cmd->params.dgid[0],
+  &dgid_addr._sockaddr_in.sin_addr.s_addr, 4);
+   memcpy(&cmd->params.sgid[0],
+  &sgid_addr._sockaddr_in.sin_addr.s_addr, 4);
+   }
/* convert them to LE format. */
ocrdma_cpu_to_le32(&cmd->params.dgid[0], sizeof(cmd->params.dgid));
ocrdma_cpu_to_le32(&cmd->params.sgid[0], sizeof(cmd->params.sgid));
cmd->params.vlan_dmac_b4_to_b5 = mac_addr[4] | (mac_addr[5] << 8);
-   if (attr_mask & IB_QP_VID) {
+   if (vlan_id < 0x1000) {
cmd->params.vlan_dmac_b4_to_b5 |=
vlan_id << OCRDMA_QP_PARAMS_VLAN_SHIFT;
cmd->flags |= OCRDMA_QP_PARA_VLAN_EN_VALID;
cmd->params.rnt_rc_sl_fl |=
(dev->sl & 0x07) << OCRDMA_QP_PARAMS_SL_SHIFT;
}
+
+   cmd->params.max_sge_recv_flags |=
+((hdr_type <<
+OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_SHIFT) &
+OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_MASK);
return 0;
 }
 
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 for-next 29/33] net/mlx4_core: Add handling of R-RoCE over IPV4 in qp attach flow

2015-03-24 Thread Somnath Kotur
From: Maor Gottlieb <ma...@mellanox.com>

When a QP is attached for R-RoCE over IPv4, the IPv4 bit should be
enabled in the IB flow spec.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
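Note (illustration only, not part of the patch): the IPv4 case in the diff
below copies and masks only bytes 12..15 of the GID, while the IPv6 case
uses all 16 bytes. A minimal userspace sketch of that match/mask idea
follows. The demo_* names are hypothetical, the struct is a simplified
stand-in for struct mlx4_spec_ib, and the masked comparison is an assumption
about how such a rule would be evaluated, not a description of the hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the GID fields of mlx4_spec_ib. */
struct demo_spec_ib {
	uint8_t dst_gid[16];
	uint8_t dst_gid_msk[16];
	int is_ipv4;
};

/* IPv4: match only the last 4 bytes, where a v4-mapped GID holds the address. */
void demo_fill_spec_ipv4(struct demo_spec_ib *spec, const uint8_t gid[16])
{
	assert(spec != NULL && gid != NULL);
	memset(spec, 0, sizeof(*spec));
	memcpy(spec->dst_gid + 12, gid + 12, 4);
	memset(spec->dst_gid_msk + 12, 0xff, 4);
	spec->is_ipv4 = 1;
}

/* IPv6: match the whole 16-byte GID. */
void demo_fill_spec_ipv6(struct demo_spec_ib *spec, const uint8_t gid[16])
{
	assert(spec != NULL && gid != NULL);
	memset(spec, 0, sizeof(*spec));
	memcpy(spec->dst_gid, gid, 16);
	memset(spec->dst_gid_msk, 0xff, 16);
	spec->is_ipv4 = 0;
}

/* Masked compare: returns 1 when gid matches the spec under its mask. */
int demo_spec_matches(const struct demo_spec_ib *spec, const uint8_t gid[16])
{
	int i;

	for (i = 0; i < 16; i++) {
		uint8_t m = spec->dst_gid_msk[i];

		if ((gid[i] & m) != (spec->dst_gid[i] & m))
			return 0;
	}
	return 1;
}
```

With the IPv4 fill, two GIDs that differ only in their first 12 bytes match
the same spec; with the IPv6 fill they do not.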
 drivers/net/ethernet/mellanox/mlx4/mcg.c | 14 --
 include/linux/mlx4/device.h  |  6 ++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c 
b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index a3867e7..cdf07b9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -858,7 +858,9 @@ static int parse_trans_rule(struct mlx4_dev *dev, struct 
mlx4_spec_list *spec,
break;
 
case MLX4_NET_TRANS_RULE_ID_IB:
-   rule_hw->ib.l3_qpn = spec->ib.l3_qpn;
+   rule_hw->ib.l3_qpn = spec->ib.l3_qpn |
+   (spec->ib.roce_type == MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4 ?
+0x80 : 0);
rule_hw->ib.qpn_mask = spec->ib.qpn_msk;
memcpy(&rule_hw->ib.dst_gid, &spec->ib.dst_gid, 16);
memcpy(&rule_hw->ib.dst_gid_msk, &spec->ib.dst_gid_msk, 16);
@@ -1377,10 +1379,18 @@ int mlx4_trans_to_dmfs_attach(struct mlx4_dev *dev, 
struct mlx4_qp *qp,
memcpy(spec.eth.dst_mac_msk, mac_mask, ETH_ALEN);
break;
 
+   case MLX4_PROT_IB_IPV4:
+   spec.id = MLX4_NET_TRANS_RULE_ID_IB;
+   memcpy(spec.ib.dst_gid + 12, gid + 12, 4);
+   memset(spec.ib.dst_gid_msk + 12, 0xff, 4);
+   spec.ib.roce_type = MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4;
+
+   break;
case MLX4_PROT_IB_IPV6:
spec.id = MLX4_NET_TRANS_RULE_ID_IB;
memcpy(spec.ib.dst_gid, gid, 16);
-   memset(spec.ib.dst_gid_msk, 0xff, 16);
+   memset(spec.ib.dst_gid_msk, 0xff, 16);
+   spec.ib.roce_type = MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV6;
break;
default:
return -EINVAL;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index dd1488c..58b0b8c 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -369,6 +369,11 @@ enum mlx4_protocol {
MLX4_PROT_FCOE
 };
 
+enum mlx4_flow_roce_type {
+   MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV6 = 0,
+   MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4
+};
+
 enum {
MLX4_MTT_FLAG_PRESENT   = 1
 };
@@ -1096,6 +1101,7 @@ struct mlx4_spec_ipv4 {
 struct mlx4_spec_ib {
__be32  l3_qpn;
__be32  qpn_msk;
+   enum mlx4_flow_roce_type roce_type;
u8  dst_gid[16];
u8  dst_gid_msk[16];
 };
-- 
2.1.0



[PATCH v3 for-next 20/33] IB/mlx4: Remove gid table management for RoCE

2015-03-24 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

RoCE GID table management has moved to the InfiniBand core driver.
The core driver is now responsible for populating the GID table and for
supplying query and lookup functions for GIDs. HW drivers are responsible
only for modifying the GID table in the network adapters.
The query_gid hook should now return the answer from the cache when the
link layer is Ethernet.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
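Note (illustration only, not part of the patch): the new Ethernet branch of
mlx4_ib_query_gid() in the diff below answers from the core cache and turns
-EAGAIN into a zero GID with a success return. A minimal userspace sketch of
that behaviour follows. The demo_* names are hypothetical, and the meaning
given to -EAGAIN here ("slot not populated yet") is an assumption made for
the sketch; the excerpt does not show when the real cache returns it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct demo_gid { uint8_t raw[16]; };

/* Hypothetical cache slot; "ready" is an assumption of this sketch. */
struct demo_cache_slot {
	struct demo_gid gid;
	int ready;
};

/* Returns 0 on success, -EINVAL for a bad index, -EAGAIN if not ready. */
int demo_get_cached_gid(const struct demo_cache_slot *cache, int len,
			int index, struct demo_gid *gid)
{
	assert(cache != NULL && gid != NULL);
	if (index < 0 || index >= len)
		return -EINVAL;
	if (!cache[index].ready)
		return -EAGAIN;
	*gid = cache[index].gid;
	return 0;
}

/* Driver-side hook: report a zero GID instead of failing on -EAGAIN. */
int demo_query_gid(const struct demo_cache_slot *cache, int len,
		   int index, struct demo_gid *gid)
{
	int ret = demo_get_cached_gid(cache, len, index, gid);

	if (ret == -EAGAIN) {
		memset(gid, 0, sizeof(*gid));
		return 0;
	}
	return ret;
}
```

Other errors, such as an out-of-range index, are still passed back to the
caller unchanged.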
 drivers/infiniband/hw/mlx4/main.c| 495 +--
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   4 -
 2 files changed, 14 insertions(+), 485 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 6fa5e49..91caffc 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -45,6 +45,7 @@
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
 #include <rdma/ib_addr.h>
+#include <rdma/ib_cache.h>
 
 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/cmd.h>
@@ -74,13 +75,6 @@ static const char mlx4_ib_version[] =
DRV_NAME ": Mellanox ConnectX InfiniBand driver v"
DRV_VERSION " (" DRV_RELDATE ")\n";
 
-struct update_gid_work {
-   struct work_struct  work;
-   union ib_gid gids[128];
-   struct mlx4_ib_dev *dev;
-   int port;
-};
-
 static void do_slave_init(struct mlx4_ib_dev *ibdev, int slave, int do_init);
 
 static struct workqueue_struct *wq;
@@ -474,23 +468,21 @@ out:
return err;
 }
 
-static int iboe_query_gid(struct ib_device *ibdev, u8 port, int index,
- union ib_gid *gid)
-{
-   struct mlx4_ib_dev *dev = to_mdev(ibdev);
-
-   *gid = dev-iboe.gid_table[port - 1][index];
-
-   return 0;
-}
-
 static int mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
 union ib_gid *gid)
 {
-   if (rdma_port_get_link_layer(ibdev, port) == IB_LINK_LAYER_INFINIBAND)
+   int ret;
+
+   if (ib_cache_use_roce_gid_cache(ibdev, port))
return __mlx4_ib_query_gid(ibdev, port, index, gid, 0);
-   else
-   return iboe_query_gid(ibdev, port, index, gid);
+
+   ret = ib_get_cached_gid(ibdev, port, index, gid, NULL);
+   if (ret == -EAGAIN) {
+   memcpy(gid, &zgid, sizeof(*gid));
+   return 0;
+   }
+
+   return ret;
 }
 
 int __mlx4_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
@@ -1480,273 +1472,6 @@ static struct device_attribute *mlx4_class_attributes[] 
= {
dev_attr_board_id
 };
 
-static void mlx4_addrconf_ifid_eui48(u8 *eui, u16 vlan_id,
-struct net_device *dev)
-{
-   memcpy(eui, dev->dev_addr, 3);
-   memcpy(eui + 5, dev->dev_addr + 3, 3);
-   if (vlan_id < 0x1000) {
-   eui[3] = vlan_id >> 8;
-   eui[4] = vlan_id & 0xff;
-   } else {
-   eui[3] = 0xff;
-   eui[4] = 0xfe;
-   }
-   eui[0] ^= 2;
-}
-
-static void update_gids_task(struct work_struct *work)
-{
-   struct update_gid_work *gw = container_of(work, struct update_gid_work, 
work);
-   struct mlx4_cmd_mailbox *mailbox;
-   union ib_gid *gids;
-   int err;
-   struct mlx4_dev *dev = gw-dev-dev;
-   int is_bonded = mlx4_is_bonded(dev);
-
-   if (!gw-dev-ib_active)
-   return;
-
-   mailbox = mlx4_alloc_cmd_mailbox(dev);
-   if (IS_ERR(mailbox)) {
-   pr_warn(update gid table failed %ld\n, PTR_ERR(mailbox));
-   return;
-   }
-
-   gids = mailbox-buf;
-   memcpy(gids, gw-gids, sizeof gw-gids);
-
-   err = mlx4_cmd(dev, mailbox-dma, MLX4_SET_PORT_GID_TABLE  8 | 
gw-port,
-  1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
-  MLX4_CMD_WRAPPED);
-   if (err)
-   pr_warn(set port command failed\n);
-   else
-   if ((gw-port == 1) || !is_bonded)
-   mlx4_ib_dispatch_event(gw-dev,
-  is_bonded ? 1 : gw-port,
-  IB_EVENT_GID_CHANGE);
-
-   mlx4_free_cmd_mailbox(dev, mailbox);
-   kfree(gw);
-}
-
-static void reset_gids_task(struct work_struct *work)
-{
-   struct update_gid_work *gw =
-   container_of(work, struct update_gid_work, work);
-   struct mlx4_cmd_mailbox *mailbox;
-   union ib_gid *gids;
-   int err;
-   struct mlx4_dev *dev = gw-dev-dev;
-
-   if (!gw-dev-ib_active)
-   return;
-
-   mailbox = mlx4_alloc_cmd_mailbox(dev);
-   if (IS_ERR(mailbox)) {
-   pr_warn(reset gid table failed\n);
-   goto free;
-   }
-
-   gids = mailbox-buf;
-   memcpy(gids, gw-gids, sizeof(gw-gids));
-
-   if (mlx4_ib_port_link_layer(gw-dev-ib_dev, gw-port

[PATCH v3 for-next 23/33] net/mlx4: Postpone the registration of net_device

2015-03-24 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

The mlx4 network driver was registered in the context of the 'add'
function of the core driver (called when HW should be registered).
This causes the NETDEV_REGISTER netdev event to be sent in a context
where the get_protocol_dev() callback still returns NULL, which may
confuse listeners of netdev events.
This patch is a preparation for the patch that implements the
get_netdev() callback in the IB/mlx4 driver.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
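Note (illustration only, not part of the patch): the ordering this patch
aims for is "add the context to the list first, then activate". A minimal
userspace sketch of that two-step registration follows. All demo_* names are
hypothetical, and a fixed-size global array stands in for the driver's
locked context list.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dev { int id; };

/* Hypothetical interface with an optional second-phase callback. */
struct demo_interface {
	void *(*add)(struct demo_dev *dev);
	void (*activate)(struct demo_dev *dev, void *context); /* may be NULL */
};

#define DEMO_MAX_CTX 8
struct demo_registry {
	void *contexts[DEMO_MAX_CTX];
	size_t count;
};

struct demo_registry demo_reg;	/* zero-initialized global registry */

/* Returns 0 on success, -1 if add() failed or the registry is full. */
int demo_add_device(struct demo_registry *reg,
		    const struct demo_interface *intf, struct demo_dev *dev)
{
	void *ctx;

	assert(reg != NULL && intf != NULL && intf->add != NULL);
	if (reg->count >= DEMO_MAX_CTX)
		return -1;
	ctx = intf->add(dev);
	if (!ctx)
		return -1;
	reg->contexts[reg->count++] = ctx;	/* lookups can now find ctx */
	if (intf->activate)
		intf->activate(dev, ctx);	/* events fire only after that */
	return 0;
}

/* Example callbacks: activate records whether ctx was already registered. */
struct demo_ctx { int saw_registered; } demo_ctx_storage;

void *demo_example_add(struct demo_dev *dev)
{
	(void)dev;
	return &demo_ctx_storage;
}

void demo_example_activate(struct demo_dev *dev, void *context)
{
	struct demo_ctx *ctx = context;
	size_t i;

	(void)dev;
	for (i = 0; i < demo_reg.count; i++)
		if (demo_reg.contexts[i] == context)
			ctx->saw_registered = 1;
}
```

Anything the activate step triggers (here, the check in
demo_example_activate) runs when the context is already discoverable, which
is the property the commit message asks for.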
 drivers/net/ethernet/mellanox/mlx4/en_main.c | 36 
 drivers/net/ethernet/mellanox/mlx4/intf.c|  3 +++
 include/linux/mlx4/driver.h  |  1 +
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c 
b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index 2859ac6..64b4f8d2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -219,6 +219,26 @@ static void mlx4_en_remove(struct mlx4_dev *dev, void 
*endev_ptr)
kfree(mdev);
 }
 
+static void mlx4_en_activate(struct mlx4_dev *dev, void *ctx)
+{
+   int i;
+   struct mlx4_en_dev *mdev = ctx;
+
+   /* Create a netdev for each port */
+   mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
+   mlx4_info(mdev, "Activating port:%d\n", i);
+   if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
+   mdev->pndev[i] = NULL;
+   }
+
+   /* register notifier */
+   mdev->nb.notifier_call = mlx4_en_netdev_event;
+   if (register_netdevice_notifier(&mdev->nb)) {
+   mdev->nb.notifier_call = NULL;
+   mlx4_err(mdev, "Failed to create notifier\n");
+   }
+}
+
 static void *mlx4_en_add(struct mlx4_dev *dev)
 {
struct mlx4_en_dev *mdev;
@@ -292,21 +312,6 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
mutex_init(&mdev->state_lock);
mdev->device_up = true;
 
-   /* Setup ports */
-
-   /* Create a netdev for each port */
-   mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
-   mlx4_info(mdev, "Activating port:%d\n", i);
-   if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
-   mdev->pndev[i] = NULL;
-   }
-   /* register notifier */
-   mdev->nb.notifier_call = mlx4_en_netdev_event;
-   if (register_netdevice_notifier(&mdev->nb)) {
-   mdev->nb.notifier_call = NULL;
-   mlx4_err(mdev, "Failed to create notifier\n");
-   }
-
return mdev;
 
 err_mr:
@@ -330,6 +335,7 @@ static struct mlx4_interface mlx4_en_interface = {
.event  = mlx4_en_event,
.get_dev= mlx4_en_get_netdev,
.protocol   = MLX4_PROT_ETH,
+   .activate   = mlx4_en_activate,
 };
 
 static void mlx4_en_verify_params(void)
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c 
b/drivers/net/ethernet/mellanox/mlx4/intf.c
index a1a5985..ccd4030 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -63,8 +63,11 @@ static void mlx4_add_device(struct mlx4_interface *intf, 
struct mlx4_priv *priv)
spin_lock_irq(&priv->ctx_lock);
list_add_tail(&dev_ctx->list, &priv->ctx_list);
spin_unlock_irq(&priv->ctx_lock);
+   if (intf->activate)
+   intf->activate(&priv->dev, dev_ctx->context);
} else
kfree(dev_ctx);
+
 }
 
 static void mlx4_remove_device(struct mlx4_interface *intf, struct mlx4_priv 
*priv)
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 9553a73..5a06d96 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -59,6 +59,7 @@ struct mlx4_interface {
void(*event) (struct mlx4_dev *dev, void *context,
  enum mlx4_dev_event event, unsigned 
long param);
void *  (*get_dev)(struct mlx4_dev *dev, void *context, 
u8 port);
+   void(*activate)(struct mlx4_dev *dev, void 
*context);
struct list_headlist;
enum mlx4_protocol  protocol;
int flags;
-- 
2.1.0



[PATCH v3 for-next 31/33] IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers

2015-03-24 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

RoCEv2 packets are sent over IP/UDP protocols.
The mlx4 driver uses a type of RAW QP to send packets for QP1 and
therefore needs to build the network headers below BTH in software.

This patch adds an option to build QP1 packets with IP and UDP headers if
RoCEv2 is requested.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
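Note (illustration only, not part of the patch): in the diff below, the
header layout is chosen from the source GID. UDP is used when the GID type
is RoCE v2, and the IP version is 4 when the GID is an IPv4-mapped address
and 6 otherwise. A minimal userspace sketch of that decision follows. The
demo_* names are hypothetical, and the prefix check is a plain-C stand-in
for the kernel's ipv6_addr_v4mapped().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the relevant enum ib_gid_type values. */
enum demo_gid_type {
	DEMO_GID_TYPE_IB,	/* IB / RoCE v1: GRH, no IP/UDP headers */
	DEMO_GID_TYPE_ROCE_V2	/* RoCE v2: IP + UDP headers */
};

/*
 * Returns 0 when no IP header is needed (GID type is not RoCE v2),
 * 4 for an IPv4-mapped source GID (::ffff:a.b.c.d), and 6 otherwise.
 */
int demo_qp1_ip_version(enum demo_gid_type type, const uint8_t sgid[16])
{
	static const uint8_t v4mapped_prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	assert(sgid != NULL);
	if (type != DEMO_GID_TYPE_ROCE_V2)
		return 0;
	if (memcmp(sgid, v4mapped_prefix, sizeof(v4mapped_prefix)) == 0)
		return 4;
	return 6;
}
```

A result of 0 corresponds to the pre-existing path in which ip_version
stays 0 and is_udp stays false.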
 drivers/infiniband/hw/mlx4/qp.c | 84 +
 1 file changed, 52 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 1141cf0..fb37415 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -32,6 +32,8 @@
  */
 
 #include <linux/log2.h>
+#include <linux/if_ether.h>
+#include <net/ip.h>
 #include <linux/slab.h>
 #include <linux/netdevice.h>
 
@@ -2169,16 +2171,7 @@ static int build_sriov_qp0_header(struct mlx4_ib_sqp 
*sqp,
return 0;
 }
 
-static void mlx4_u64_to_smac(u8 *dst_mac, u64 src_mac)
-{
-   int i;
-
-   for (i = ETH_ALEN; i; i--) {
-   dst_mac[i - 1] = src_mac  0xff;
-   src_mac = 8;
-   }
-}
-
+#define MLX4_ROCEV2_QP1_SPORT 0xC000
 static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
void *wqe, unsigned *mlx_seg_len)
 {
@@ -2198,6 +2191,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct ib_send_wr *wr,
bool is_eth;
bool is_vlan = false;
bool is_grh;
+   bool is_udp = false;
+   int ip_version = 0;
 
send_size = 0;
for (i = 0; i  wr-num_sge; ++i)
@@ -2206,6 +2201,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct ib_send_wr *wr,
is_eth = rdma_port_get_link_layer(sqp-qp.ibqp.device, sqp-qp.port) == 
IB_LINK_LAYER_ETHERNET;
is_grh = mlx4_ib_ah_grh_present(ah);
if (is_eth) {
+   struct ib_gid_attr gid_attr;
+
if (mlx4_is_mfunc(to_mdev(ib_dev)-dev)) {
/* When multi-function is enabled, the ib_core gid
 * indexes don't necessarily match the hw ones, so
@@ -2216,23 +2213,31 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct ib_send_wr *wr,
if (err)
return err;
} else  {
-   err = ib_get_cached_gid(ib_dev,
+   err = ib_get_cached_gid(sqp->qp.ibqp.device,
be32_to_cpu(ah->av.ib.port_pd) >> 24,
-   ah->av.ib.gid_index, &sgid,
-   NULL);
+   ah->av.ib.gid_index, &sgid, &gid_attr);
if (!err && !memcmp(&sgid, &zgid, sizeof(sgid)))
err = -ENOENT;
-   if (err)
+   if (!err) {
+   is_udp = (gid_attr.gid_type == IB_GID_TYPE_ROCE_V2) ? true : false;
+   if (is_udp) {
+   if (ipv6_addr_v4mapped((struct in6_addr *)&sgid))
+   ip_version = 4;
+   else
+   ip_version = 6;
+   is_grh = false;
+   }
+   } else {
return err;
+   }
}
-
if (ah->av.eth.vlan != cpu_to_be16(0xffff)) {
vlan = be16_to_cpu(ah->av.eth.vlan) & 0x0fff;
is_vlan = 1;
}
}
err = ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh,
-   0, 0, 0, &sqp->ud_header);
+ ip_version, is_udp, 0, &sqp->ud_header);
if (err)
return err;
 
@@ -2243,12 +2248,14 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct ib_send_wr *wr,
sqp-ud_header.lrh.source_lid = cpu_to_be16(ah-av.ib.g_slid  
0x7f);
}
 
-   if (is_grh) {
+   if (is_grh || (ip_version == 6)) {
sqp-ud_header.grh.traffic_class =
(be32_to_cpu(ah-av.ib.sl_tclass_flowlabel)  20)  
0xff;
sqp-ud_header.grh.flow_label=
ah-av.ib.sl_tclass_flowlabel  cpu_to_be32(0xf);
-   sqp-ud_header.grh.hop_limit = ah-av.ib.hop_limit;
+
+   sqp-ud_header.grh.hop_limit = (is_udp) ?
+   IPV6_DEFAULT_HOPLIMIT : ah-av.ib.hop_limit;
if (is_eth)
memcpy(sqp-ud_header.grh.source_gid.raw, sgid.raw, 16);
else {
@@ -2272,6 +2279,26 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct

[PATCH v3 for-next 30/33] IB/core: Initialize UD header structure with IP and UDP headers

2015-03-24 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

ib_ud_header_init() is used to format InfiniBand headers in a buffer,
up to (but not including) the BTH. For RoCEv2, this function must also
be able to build IP and UDP headers.
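
As an illustration of the IPv4 header checksum that such a header needs, here is a minimal userspace sketch. It is not the kernel's ip_fast_csum(); the name ipv4_header_checksum is hypothetical, and the sketch assumes a 20-byte header without options whose checksum field is zeroed before the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): one's-complement checksum over a
 * 20-byte IPv4 header. The caller is assumed to have zeroed the checksum
 * field (bytes 10 and 11). The result is returned in host byte order.
 */
uint16_t ipv4_header_checksum(const uint8_t hdr[20])
{
	uint32_t sum = 0;
	size_t i;

	assert(hdr != NULL);

	/* Add the header as ten big-endian 16-bit words. */
	for (i = 0; i < 20; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

The caller would store the returned value into bytes 10 and 11 of the header in network byte order.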

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/ud_header.c| 153 ++---
 drivers/infiniband/hw/mlx4/qp.c|   7 +-
 drivers/infiniband/hw/mthca/mthca_qp.c |   2 +-
 include/rdma/ib_pack.h |  44 --
 4 files changed, 186 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/core/ud_header.c 
b/drivers/infiniband/core/ud_header.c
index 72feee6..a4d4072 100644
--- a/drivers/infiniband/core/ud_header.c
+++ b/drivers/infiniband/core/ud_header.c
@@ -35,6 +35,7 @@
 #include <linux/string.h>
 #include <linux/export.h>
 #include <linux/if_ether.h>
+#include <linux/ip.h>
 
 #include <rdma/ib_pack.h>
 
@@ -116,6 +117,68 @@ static const struct ib_field vlan_table[]  = {
  .size_bits= 16 }
 };
 
+static const struct ib_field ip4_table[]  = {
+   { STRUCT_FIELD(ip4, ver_len),
+ .offset_words = 0,
+ .offset_bits  = 0,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, tos),
+ .offset_words = 0,
+ .offset_bits  = 8,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, tot_len),
+ .offset_words = 0,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, id),
+ .offset_words = 1,
+ .offset_bits  = 0,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, frag_off),
+ .offset_words = 1,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, ttl),
+ .offset_words = 2,
+ .offset_bits  = 0,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, protocol),
+ .offset_words = 2,
+ .offset_bits  = 8,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, check),
+ .offset_words = 2,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, saddr),
+ .offset_words = 3,
+ .offset_bits  = 0,
+ .size_bits= 32 },
+   { STRUCT_FIELD(ip4, daddr),
+ .offset_words = 4,
+ .offset_bits  = 0,
+ .size_bits= 32 }
+};
+
+static const struct ib_field udp_table[]  = {
+   { STRUCT_FIELD(udp, sport),
+ .offset_words = 0,
+ .offset_bits  = 0,
+ .size_bits= 16 },
+   { STRUCT_FIELD(udp, dport),
+ .offset_words = 0,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(udp, length),
+ .offset_words = 1,
+ .offset_bits  = 0,
+ .size_bits= 16 },
+   { STRUCT_FIELD(udp, csum),
+ .offset_words = 1,
+ .offset_bits  = 16,
+ .size_bits= 16 }
+};
+
 static const struct ib_field grh_table[]  = {
{ STRUCT_FIELD(grh, ip_version),
  .offset_words = 0,
@@ -213,6 +276,26 @@ static const struct ib_field deth_table[] = {
  .size_bits= 24 }
 };
 
+__be16 ib_ud_ip4_csum(struct ib_ud_header *header)
+{
+   struct iphdr iph;
+
+   iph.ihl = 5;
+   iph.version = 4;
+	iph.tos		= header->ip4.tos;
+	iph.tot_len	= header->ip4.tot_len;
+	iph.id		= header->ip4.id;
+	iph.frag_off	= header->ip4.frag_off;
+	iph.ttl		= header->ip4.ttl;
+	iph.protocol	= header->ip4.protocol;
+	iph.check	= 0;
+	iph.saddr	= header->ip4.saddr;
+	iph.daddr	= header->ip4.daddr;
+
+	return ip_fast_csum((u8 *)&iph, iph.ihl);
+}
+EXPORT_SYMBOL(ib_ud_ip4_csum);
+
 /**
  * ib_ud_header_init - Initialize UD header structure
  * @payload_bytes:Length of packet payload
@@ -220,19 +303,35 @@ static const struct ib_field deth_table[] = {
  * @eth_present: specify if Eth header is present
  * @vlan_present: packet is tagged vlan
  * @grh_present:GRH flag (if non-zero, GRH will be included)
+ * @ip_version: if non-zero, IP header, V4 or V6, will be included
+ * @udp_present: if non-zero, UDP header will be included
  * @immediate_present: specify if immediate data is present
  * @header:Structure to initialize
  */
-void ib_ud_header_init(int payload_bytes,
-  int  lrh_present,
-  int  eth_present,
-  int  vlan_present,
-  int  grh_present,
-  int  immediate_present,
-  struct ib_ud_header *header)
+int ib_ud_header_init(int payload_bytes,
+ intlrh_present,
+ inteth_present,
+ intvlan_present,
+ intgrh_present

[PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache

2015-03-24 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

In order to manage multiple types, VLANs and MACs per GID, we
need to store them alongside the GID itself. We store the net device
as well, as sometimes GIDs should be handled according to the
net device they came from. Since populating the GID table should
be identical for every RoCE provider, the GID table should be
handled in ib_core.

Add a GID cache table that supports lockless find, add and
delete of GIDs. The lockless nature comes from using a unique
sequence number per table entry and detecting, while reading or
writing, that this sequence number has not changed.

To use this RoCE GID cache table, providers must implement a
modify_gid callback. The table is managed exclusively by
roce_gid_cache, and the provider just needs to write
the data to the hardware.
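
The sequence-number idea can be sketched in userspace C11 as follows. This is an assumption-laden illustration, not the code in roce_gid_cache.c: the names gid_entry, gid_entry_write and gid_entry_read are hypothetical, the GID is modelled as four 32-bit words, and writers are assumed to be serialized by some other means.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define GID_WORDS 4	/* a 128-bit GID modelled as four 32-bit words */

/* Hypothetical table entry: an even seq means stable, odd means mid-update. */
struct gid_entry {
	atomic_uint seq;
	_Atomic uint32_t gid[GID_WORDS];
};

/* Writer side; assumes a single writer at a time. */
void gid_entry_write(struct gid_entry *e, const uint32_t gid[GID_WORDS])
{
	unsigned int s;
	int i;

	assert(e && gid);
	s = atomic_load_explicit(&e->seq, memory_order_relaxed);
	/* Mark the entry as being updated (odd sequence number). */
	atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	for (i = 0; i < GID_WORDS; i++)
		atomic_store_explicit(&e->gid[i], gid[i], memory_order_relaxed);
	/* Publish the update (even sequence number again). */
	atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/* Reader side: takes no lock, retries if the sequence number moved. */
void gid_entry_read(struct gid_entry *e, uint32_t out[GID_WORDS])
{
	unsigned int s0, s1;
	int i;

	assert(e && out);
	do {
		s0 = atomic_load_explicit(&e->seq, memory_order_acquire);
		for (i = 0; i < GID_WORDS; i++)
			out[i] = atomic_load_explicit(&e->gid[i],
						      memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s1 = atomic_load_explicit(&e->seq, memory_order_relaxed);
	} while (s0 != s1 || (s0 & 1u));
}
```

A reader that observes the same even sequence number before and after copying the entry knows that no write overlapped its copy.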

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/Makefile |   3 +-
 drivers/infiniband/core/core_priv.h  |  24 ++
 drivers/infiniband/core/roce_gid_cache.c | 518 +++
 drivers/infiniband/hw/mlx4/main.c|   2 -
 include/rdma/ib_verbs.h  |  55 +++-
 5 files changed, 598 insertions(+), 4 deletions(-)
 create mode 100644 drivers/infiniband/core/roce_gid_cache.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index acf7367..9b63bdf 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
$(user_access-y)
 
 ib_core-y :=   packer.o ud_header.o verbs.o sysfs.o \
-   device.o fmr_pool.o cache.o netlink.o
+   device.o fmr_pool.o cache.o netlink.o \
+   roce_gid_cache.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
 
diff --git a/drivers/infiniband/core/core_priv.h 
b/drivers/infiniband/core/core_priv.h
index 87d1936..a502daa 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -35,6 +35,7 @@
 
 #include <linux/list.h>
 #include <linux/spinlock.h>
+#include <net/net_namespace.h>
 
 #include <rdma/ib_verbs.h>
 
@@ -51,4 +52,27 @@ void ib_cache_cleanup(void);
 
 int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
struct ib_qp_attr *qp_attr, int *qp_attr_mask);
+
+int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
+  union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_gid_cache_find_gid(struct ib_device *ib_dev, union ib_gid *gid,
+   enum ib_gid_type gid_type, struct net *net,
+   int if_index, u8 *port, u16 *index);
+
+int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid 
*gid,
+   enum ib_gid_type gid_type, u8 port,
+   struct net *net, int if_index, u16 *index);
+
+int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
+
+int roce_add_gid(struct ib_device *ib_dev, u8 port,
+union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_gid(struct ib_device *ib_dev, u8 port,
+union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
+struct net_device *ndev);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/roce_gid_cache.c 
b/drivers/infiniband/core/roce_gid_cache.c
new file mode 100644
index 000..80f364a
--- /dev/null
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -0,0 +1,518 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT

[PATCH v3 for-next 27/33] IB/mlx4: Configure device to work in RoCEv2

2015-03-24 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

Some mlx4 adapters are RoCEv2 capable. Enabling this feature requires
the following hardware configuration:

1. Set port general parameters
2. Configure the outgoing UDP destination port
3. Configure the QPs that work with RoCEv2

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/main.c | 10 +++-
 drivers/infiniband/hw/mlx4/qp.c   | 40 +++
 drivers/net/ethernet/mellanox/mlx4/fw.c   | 16 -
 drivers/net/ethernet/mellanox/mlx4/mlx4.h |  3 ++-
 drivers/net/ethernet/mellanox/mlx4/port.c |  9 ++-
 drivers/net/ethernet/mellanox/mlx4/qp.c   | 27 +
 include/linux/mlx4/device.h   |  1 +
 include/linux/mlx4/qp.h   | 15 ++--
 8 files changed, 111 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 96a6ec0..ee99f62 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2168,7 +2168,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	if (mlx4_ib_init_sriov(ibdev))
 		goto err_mad;
 
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE) {
+	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE ||
+	    dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) {
 		if (!iboe->nb.notifier_call) {
 			iboe->nb.notifier_call = mlx4_ib_netdev_event;
 			err = register_netdevice_notifier(&iboe->nb);
@@ -2177,6 +2178,13 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 				goto err_notif;
 			}
 		}
+		if (!mlx4_is_slave(dev) &&
+		    dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) {
+			err = mlx4_config_roce_v2_port(dev, ROCE_V2_UDP_DPORT);
+			if (err) {
+				goto err_notif;
+			}
+		}
 	}
 
 	for (j = 0; j < ARRAY_SIZE(mlx4_class_attributes); ++j) {
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 6f6d0db..847f9ec 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1408,6 +1408,24 @@ static int handle_eth_ud_smac_index(struct mlx4_ib_dev 
*dev,
return 0;
 }
 
+enum {
+   MLX4_QPC_ROCE_MODE_1 = 0,
+   MLX4_QPC_ROCE_MODE_2 = 2,
+   MLX4_QPC_ROCE_MODE_MAX = 0xff
+};
+
+static u8 gid_type_to_qpc(enum ib_gid_type gid_type)
+{
+   switch (gid_type) {
+   case IB_GID_TYPE_IB:
+   return MLX4_QPC_ROCE_MODE_1;
+   case IB_GID_TYPE_ROCE_V2:
+   return MLX4_QPC_ROCE_MODE_2;
+   default:
+   return MLX4_QPC_ROCE_MODE_MAX;
+   }
+}
+
 static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
   const struct ib_qp_attr *attr, int attr_mask,
   enum ib_qp_state cur_state, enum ib_qp_state 
new_state)
@@ -1531,12 +1549,14 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
u16 vlan = 0x;
u8 smac[ETH_ALEN];
int status = 0;
+   int is_eth = rdma_port_get_link_layer(dev-ib_dev, qp-port) ==
+   IB_LINK_LAYER_ETHERNET;
 
-   if (rdma_port_get_link_layer(dev-ib_dev, qp-port) ==
-   IB_LINK_LAYER_ETHERNET 
-   attr-ah_attr.ah_flags  IB_AH_GRH) {
+   if (is_eth  attr-ah_attr.ah_flags  IB_AH_GRH) {
int index = attr-ah_attr.grh.sgid_index;
 
+   if (mlx4_is_bonded(dev-dev))
+   port_num  = 1;
rcu_read_lock();
status = ib_get_cached_gid(ibqp-device, port_num,
   index, gid, gid_attr);
@@ -1555,8 +1575,20 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
  port_num, vlan, smac))
goto out;
 
+		if (is_eth && gid_attr.gid_type == IB_GID_TYPE_ROCE_V2)
+			context->pri_path.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+
 		optpar |= (MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH |
 			   MLX4_QP_OPTPAR_SCHED_QUEUE);
+
+		if (is_eth && (cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR)) {
+			u8 qpc_roce_mode = gid_type_to_qpc(gid_attr.gid_type);
+
+			if (qpc_roce_mode == MLX4_QPC_ROCE_MODE_MAX)
+				goto out;
+			context->rlkey_roce_mode |= (qpc_roce_mode << 6);
+		}
+
}
 
if (attr_mask  IB_QP_TIMEOUT) {
@@ -1728,7 +1760,7 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
sqd_event = 0;
 
if (!ibqp-uobject

RE: [PATCH v2 for-next 00/32] RoCE V1/v2 per GID

2015-03-19 Thread Somnath Kotur
Hi Roland,
Could you please chime in on this patch series? It's been more
than a week since we sent out V2.
Thanks 
Som


From: linux-rdma-ow...@vger.kernel.org [linux-rdma-ow...@vger.kernel.org] on 
behalf of Somnath Kotur [somnath.ko...@emulex.com]
Sent: Wednesday, March 11, 2015 10:25 AM
To: rol...@kernel.org
Cc: linux-rdma@vger.kernel.org; Somnath Kotur
Subject: [PATCH v2 for-next 00/32] RoCE V1/v2 per GID

Hi Roland,

This patch series was created out of collaboration between Emulex and Mellanox.
While Emulex sent out the RoCEV2 patch to the community first, Mellanox, which
was also working on core infrastructure changes from the ground up towards
RoCEV2, felt that the RoCEV2 patch would be better served if done on top of
their basic infrastructure changes, which associate entities like MAC, VLAN
and IP address with GIDs and thereby move GID table management from HW vendor
drivers to IB/core.
This patchset is the result of a joint development effort between the two teams.

The RoCE per GID patch set introduces the RoCE V2 GID type while
maintaining compatibility with RoCE V1. This is done by adding
a type attribute to every GID, in addition to the extra net device
attribute required for RoCE V2. Previously,
every vendor implemented its net device notifiers in its own
driver. This introduces a huge amount of code duplication, as figuring out
whether an event is related to the vendor's net device in the
various cases (bonding, vlan or any other upper device) is
similar for all vendors. Introducing multiple GID types and other
attributes would have made this code duplication even worse. Therefore,
we decided to move this into common core code. roce_gid_cache and
roce_gid_mgmt were created in order to store and manage
the new GID table, filling it when the related events arrive.
Vendors now only have to implement the modify_gid and get_netdev IB
device calls, which are truly unique to each vendor.

Patch 0001 creates a new infrastructure for storing GIDs and their attributes
in IB/core. This infrastructure supports lock-less reads of GIDs using a
sequence number. The data structure is initialized only for RoCE ports.
Every GID has meta information that describes its related net device and its
type.

Patch 0002 adds a reference count mechanism to IB devices. This mechanism
is similar to dev_hold and dev_put available for net devices. This is
mandatory for later patches, as an IB client might want to wait for its
work to complete in the device removal function, but that work might
traverse the device list. This might cause a deadlock, as the removal
function has grabbed the device lock and in turn waits for the client's work,
which wants to grab the device mutex as well.

Patches 0003, 0004 and 0006 add population of this table for various cases
based on net device events. We always enable default gids for an active
device (an active device is defined here as a device that doesn't have
a bonding master or is the current active slave). This is done in order
to allow loopback traffic. Patch 0005 adds proper bonding support -
only the active slaves retain their master's IP based gids and default gids.

This whole concept needs to fit the existing sysfs model, thus patch 0008
adds sysfs entries that represent the net device and gid type related to
each gid.

Patch 0009 adds a new API for RoCE gid cache lookup. Since users might
want to find a GID which matches a net device with specific attributes,
the new API allows them to pass a filter function. This function is a bit
slower than the regular find by gid, gid_type, if_index and namespace -
thus it should be used only when necessary.
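
A lookup that takes a filter function could look roughly like the following userspace sketch. All names here (gid_slot, gid_attr_sketch, gid_filter_fn, find_gid_by_filter, match_if_index) are hypothetical and chosen for illustration only; they are not the API added by patch 0009.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified per-GID attributes. */
struct gid_attr_sketch {
	int if_index;
	int gid_type;
};

/* Hypothetical table slot. */
struct gid_slot {
	unsigned char gid[16];
	struct gid_attr_sketch attr;
	bool valid;
};

/* Caller-supplied predicate; context is opaque to the lookup. */
typedef bool (*gid_filter_fn)(const unsigned char gid[16],
			      const struct gid_attr_sketch *attr,
			      void *context);

/* Return the index of the first valid slot that matches gid and passes filter, or -1. */
int find_gid_by_filter(const struct gid_slot *table, size_t n,
		       const unsigned char gid[16],
		       gid_filter_fn filter, void *context)
{
	size_t i;

	assert(table && gid);
	for (i = 0; i < n; i++) {
		if (!table[i].valid)
			continue;
		if (memcmp(table[i].gid, gid, 16) != 0)
			continue;
		if (filter && !filter(table[i].gid, &table[i].attr, context))
			continue;
		return (int)i;
	}
	return -1;
}

/* Example predicate: accept only entries bound to a given interface index. */
bool match_if_index(const unsigned char gid[16],
		    const struct gid_attr_sketch *attr, void *context)
{
	(void)gid;
	return attr->if_index == *(const int *)context;
}
```

The extra indirect call per candidate entry is one plausible reason such a lookup is slower than a plain match on fixed fields.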

Patches 0007, 0010, 0011 and 0012 change the rest of IB/core to fit the new
model. Instead of storing smac and vlan, we store either if_index, gid
and gid_type or sgid_index. Either set suffices in order to resolve all
the required Ethernet parameters. ib_init_ah_from_wc was changed such
that when a wc arrives, we search our RoCE gid cache in order to
find a suitable sgid_index that matches the net device. Matching is
done based on GID and VLAN.

Patch 0013 is used in order to configure the default mode of the cma.
In order to avoid changing existing rdma-cm applications, we add a
configfs entry that states, for each IB device, the default RoCE mode.

Patch 0014 is the refactored version of the original RoCE V2 patch from
Emulex, which now mainly corrects the hop limit value and adds a hint about
the RoCE type based on whether we have a gateway. This is the patch that
makes it possible for applications to seamlessly interoperate between RoCE V1
and V2 without undergoing any changes themselves.

Patch 0029 deals with serializing QP1 packets for software based
QP1 and the last patch handles joining and leaving IGMP groups
for RoCE V2 multicast functionality.

The rest of the patches add support for ocrdma and mlx4 devices.

This series depends on RoCE LAG series (already accepted in net-next

[PATCH v2 for-next 30/32] IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers

2015-03-10 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

RoCEv2 packets are sent over IP/UDP protocols.
The mlx4 driver uses a type of RAW QP to send packets for QP1 and
therefore needs to build the network headers below BTH in software.

This patch adds an option to build QP1 packets with IP and UDP headers if
RoCEv2 is requested.
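
Choosing between an IPv4 and an IPv6 header for a RoCEv2 packet comes down to whether the source GID is an IPv4-mapped IPv6 address (::ffff:a.b.c.d). A minimal userspace sketch of that test follows; roce_v2_ip_version is a hypothetical name, and the kernel code itself uses ipv6_addr_v4mapped().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return 4 if the 16-byte GID is an IPv4-mapped IPv6
 * address (80 zero bits, then 16 one bits, then the IPv4 address),
 * otherwise return 6.
 */
int roce_v2_ip_version(const uint8_t gid[16])
{
	static const uint8_t v4_mapped_prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	assert(gid != NULL);
	return memcmp(gid, v4_mapped_prefix, sizeof(v4_mapped_prefix)) == 0 ? 4 : 6;
}
```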

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/qp.c | 84 +
 1 file changed, 52 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 1141cf0..fb37415 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -32,6 +32,8 @@
  */
 
 #include <linux/log2.h>
+#include <linux/if_ether.h>
+#include <net/ip.h>
 #include <linux/slab.h>
 #include <linux/netdevice.h>
 
@@ -2169,16 +2171,7 @@ static int build_sriov_qp0_header(struct mlx4_ib_sqp 
*sqp,
return 0;
 }
 
-static void mlx4_u64_to_smac(u8 *dst_mac, u64 src_mac)
-{
-   int i;
-
-	for (i = ETH_ALEN; i; i--) {
-		dst_mac[i - 1] = src_mac & 0xff;
-		src_mac >>= 8;
-	}
-}
-
+#define MLX4_ROCEV2_QP1_SPORT 0xC000
 static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
void *wqe, unsigned *mlx_seg_len)
 {
@@ -2198,6 +2191,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct ib_send_wr *wr,
bool is_eth;
bool is_vlan = false;
bool is_grh;
+   bool is_udp = false;
+   int ip_version = 0;
 
send_size = 0;
for (i = 0; i  wr-num_sge; ++i)
@@ -2206,6 +2201,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct ib_send_wr *wr,
is_eth = rdma_port_get_link_layer(sqp-qp.ibqp.device, sqp-qp.port) == 
IB_LINK_LAYER_ETHERNET;
is_grh = mlx4_ib_ah_grh_present(ah);
if (is_eth) {
+   struct ib_gid_attr gid_attr;
+
if (mlx4_is_mfunc(to_mdev(ib_dev)-dev)) {
/* When multi-function is enabled, the ib_core gid
 * indexes don't necessarily match the hw ones, so
@@ -2216,23 +2213,31 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
 			if (err)
 				return err;
 		} else  {
-			err = ib_get_cached_gid(ib_dev,
+			err = ib_get_cached_gid(sqp->qp.ibqp.device,
 						be32_to_cpu(ah->av.ib.port_pd) >> 24,
-						ah->av.ib.gid_index, &sgid,
-						NULL);
+						ah->av.ib.gid_index, &sgid, &gid_attr);
 			if (!err && !memcmp(&sgid, &zgid, sizeof(sgid)))
 				err = -ENOENT;
-			if (err)
+			if (!err) {
+				is_udp = (gid_attr.gid_type == IB_GID_TYPE_ROCE_V2) ? true : false;
+				if (is_udp) {
+					if (ipv6_addr_v4mapped((struct in6_addr *)&sgid))
+						ip_version = 4;
+					else
+						ip_version = 6;
+					is_grh = false;
+				}
+			} else {
 				return err;
+			}
 		}
-
 		if (ah->av.eth.vlan != cpu_to_be16(0xffff)) {
 			vlan = be16_to_cpu(ah->av.eth.vlan) & 0x0fff;
 			is_vlan = 1;
 		}
 	}
 	err = ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh,
-				0, 0, 0, &sqp->ud_header);
+				ip_version, is_udp, 0, &sqp->ud_header);
 	if (err)
 		return err;
 
@@ -2243,12 +2248,14 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
 		sqp->ud_header.lrh.source_lid = cpu_to_be16(ah->av.ib.g_slid & 0x7f);
 	}
 
-	if (is_grh) {
+	if (is_grh || (ip_version == 6)) {
 		sqp->ud_header.grh.traffic_class =
 			(be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 20) & 0xff;
 		sqp->ud_header.grh.flow_label    =
 			ah->av.ib.sl_tclass_flowlabel & cpu_to_be32(0xfffff);
-		sqp->ud_header.grh.hop_limit     = ah->av.ib.hop_limit;
+
+		sqp->ud_header.grh.hop_limit     = (is_udp) ?
+			IPV6_DEFAULT_HOPLIMIT : ah->av.ib.hop_limit;
 		if (is_eth)
 			memcpy(sqp->ud_header.grh.source_gid.raw, sgid.raw, 16);
 		else {
@@ -2272,6 +2279,26 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct

[PATCH v2 for-next 31/32] IB/mlx4: Create and use another QP1 for RoCEv2

2015-03-10 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

The mlx4 driver uses a special QP to implement the GSI QP. This kind of
QP allows the InfiniBand headers to be built in SW and put before the
payload that comes in with the WR. The mlx4 HW builds the packet,
calculates the ICRC and puts it at the end of the payload. This ICRC
calculation, however, depends on the QP configuration, which is determined
when the QP is modified (roce_mode during INIT-RTR). On the other hand, ICRC
verification when a packet is received does not depend on this
configuration.
Therefore, two GSI QPs for send (one for each RoCE version) and one
GSI QP for receive are required.

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   7 ++
 drivers/infiniband/hw/mlx4/qp.c  | 155 +++
 2 files changed, 144 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h 
b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 018bda6..a853330 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -159,11 +159,18 @@ struct mlx4_ib_wq {
unsignedtail;
 };
 
+enum {
+   MLX4_IB_QP_CREATE_ROCE_V2_GSI = IB_QP_CREATE_RESERVED_START
+};
+
 enum mlx4_ib_qp_flags {
MLX4_IB_QP_LSO = IB_QP_CREATE_IPOIB_UD_LSO,
MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK = 
IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK,
MLX4_IB_QP_NETIF = IB_QP_CREATE_NETIF_QP,
MLX4_IB_QP_CREATE_USE_GFP_NOIO = IB_QP_CREATE_USE_GFP_NOIO,
+
+   /* Mellanox specific flags start from IB_QP_CREATE_RESERVED_START */
+   MLX4_IB_ROCE_V2_GSI_QP = MLX4_IB_QP_CREATE_ROCE_V2_GSI,
MLX4_IB_SRIOV_TUNNEL_QP = 1  30,
MLX4_IB_SRIOV_SQP = 1  31,
 };
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index fb37415..b54f315 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -81,6 +81,7 @@ struct mlx4_ib_sqp {
u32 send_psn;
struct ib_ud_header ud_header;
u8  header_buf[MLX4_IB_UD_HEADER_SIZE];
+   struct ib_qp*roce_v2_gsi;
 };
 
 enum {
@@ -150,7 +151,10 @@ static int is_sqp(struct mlx4_ib_dev *dev, struct 
mlx4_ib_qp *qp)
}
}
}
-   return proxy_sqp;
+   if (proxy_sqp)
+   return 1;
+
+   return !!(qp-flags  MLX4_IB_ROCE_V2_GSI_QP);
 }
 
 /* used for INIT/CLOSE port logic */
@@ -672,6 +676,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct 
ib_pd *pd,
qp = sqp-qp;
qp-pri.vid = 0x;
qp-alt.vid = 0x;
+   sqp-roce_v2_gsi = NULL;
} else {
qp = kzalloc(sizeof (struct mlx4_ib_qp), gfp);
if (!qp)
@@ -1029,9 +1034,17 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, 
struct mlx4_ib_qp *qp,
del_gid_entries(qp);
 }
 
-static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr)
+static int get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr)
 {
/* Native or PPF */
+   if ((!mlx4_is_mfunc(dev-dev) || mlx4_is_master(dev-dev)) 
+   attr-create_flags  MLX4_IB_QP_CREATE_ROCE_V2_GSI) {
+   int sqpn;
+   int res = mlx4_qp_reserve_range(dev-dev, 1, 1, sqpn, 0);
+
+   return res ? -abs(res) : sqpn;
+   }
+
if (!mlx4_is_mfunc(dev-dev) ||
(mlx4_is_master(dev-dev) 
 attr-create_flags  MLX4_IB_SRIOV_SQP)) {
@@ -1039,6 +1052,7 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct 
ib_qp_init_attr *attr)
(attr-qp_type == IB_QPT_SMI ? 0 : 2) +
attr-port_num - 1;
}
+
/* PF or VF -- creating proxies */
if (attr-qp_type == IB_QPT_SMI)
return dev-dev-caps.qp0_proxy[attr-port_num - 1];
@@ -1046,9 +1060,9 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct 
ib_qp_init_attr *attr)
return dev-dev-caps.qp1_proxy[attr-port_num - 1];
 }
 
-struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
-   struct ib_qp_init_attr *init_attr,
-   struct ib_udata *udata)
+static struct ib_qp *_mlx4_ib_create_qp(struct ib_pd *pd,
+   struct ib_qp_init_attr *init_attr,
+   struct ib_udata *udata)
 {
struct mlx4_ib_qp *qp = NULL;
int err;
@@ -1066,6 +1080,7 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
MLX4_IB_SRIOV_TUNNEL_QP |
MLX4_IB_SRIOV_SQP |
MLX4_IB_QP_NETIF |
+   MLX4_IB_QP_CREATE_ROCE_V2_GSI

[PATCH v2 for-next 22/32] net/mlx4: Postpone the registration of net_device

2015-03-10 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

The mlx4 network driver was registered in the context of the 'add'
function of the core driver (called when HW should be registered).
This causes the NETDEV_REGISTER netdev event to be sent in a context
where the get_protocol_dev() callback returns NULL. This may
be confusing to listeners of netdev events.
This patch is a preparation for the patch that implements the
get_netdev() callback in the IB/mlx4 driver.

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/net/ethernet/mellanox/mlx4/en_main.c | 36 
 drivers/net/ethernet/mellanox/mlx4/intf.c|  3 +++
 include/linux/mlx4/driver.h  |  1 +
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c 
b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index 2859ac6..64b4f8d2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -219,6 +219,26 @@ static void mlx4_en_remove(struct mlx4_dev *dev, void 
*endev_ptr)
kfree(mdev);
 }
 
+static void mlx4_en_activate(struct mlx4_dev *dev, void *ctx)
+{
+	int i;
+	struct mlx4_en_dev *mdev = ctx;
+
+	/* Create a netdev for each port */
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
+		mlx4_info(mdev, "Activating port:%d\n", i);
+		if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
+			mdev->pndev[i] = NULL;
+	}
+
+	/* register notifier */
+	mdev->nb.notifier_call = mlx4_en_netdev_event;
+	if (register_netdevice_notifier(&mdev->nb)) {
+		mdev->nb.notifier_call = NULL;
+		mlx4_err(mdev, "Failed to create notifier\n");
+	}
+}
+
 static void *mlx4_en_add(struct mlx4_dev *dev)
 {
struct mlx4_en_dev *mdev;
@@ -292,21 +312,6 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
 	mutex_init(&mdev->state_lock);
 	mdev->device_up = true;
 
-	/* Setup ports */
-
-	/* Create a netdev for each port */
-	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
-		mlx4_info(mdev, "Activating port:%d\n", i);
-		if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
-			mdev->pndev[i] = NULL;
-	}
-	/* register notifier */
-	mdev->nb.notifier_call = mlx4_en_netdev_event;
-	if (register_netdevice_notifier(&mdev->nb)) {
-		mdev->nb.notifier_call = NULL;
-		mlx4_err(mdev, "Failed to create notifier\n");
-	}
-
 	return mdev;
 
 err_mr:
@@ -330,6 +335,7 @@ static struct mlx4_interface mlx4_en_interface = {
.event  = mlx4_en_event,
.get_dev= mlx4_en_get_netdev,
.protocol   = MLX4_PROT_ETH,
+   .activate   = mlx4_en_activate,
 };
 
 static void mlx4_en_verify_params(void)
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c 
b/drivers/net/ethernet/mellanox/mlx4/intf.c
index a1a5985..ccd4030 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -63,8 +63,11 @@ static void mlx4_add_device(struct mlx4_interface *intf, struct mlx4_priv *priv)
 		spin_lock_irq(&priv->ctx_lock);
 		list_add_tail(&dev_ctx->list, &priv->ctx_list);
 		spin_unlock_irq(&priv->ctx_lock);
+		if (intf->activate)
+			intf->activate(&priv->dev, dev_ctx->context);
 	} else
 		kfree(dev_ctx);
+
 }
 
 static void mlx4_remove_device(struct mlx4_interface *intf, struct mlx4_priv 
*priv)
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 9553a73..5a06d96 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -59,6 +59,7 @@ struct mlx4_interface {
void(*event) (struct mlx4_dev *dev, void *context,
  enum mlx4_dev_event event, unsigned 
long param);
void *  (*get_dev)(struct mlx4_dev *dev, void *context, 
u8 port);
+   void(*activate)(struct mlx4_dev *dev, void 
*context);
struct list_headlist;
enum mlx4_protocol  protocol;
int flags;
-- 
2.1.0



[PATCH v2 for-next 06/32] IB/core: Add RoCE cache bonding support

2015-03-10 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

Bonding needs special handling: when working in
active-backup mode, only the currently selected slave
should occupy the default GIDs and the master's GID.
Listen to bonding events and add the
required GIDs only to the active slave in the RoCE cache
GID table.
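
The rule can be sketched as a small state classification plus a bitmask test. This is a userspace illustration under stated assumptions: devices are modelled as plain integer ids, the names slave_state, classify_slave and should_hold_gids are hypothetical, and the choice of which states keep GIDs follows the description above (a device without a bonding master, or the current active slave).

```c
#include <assert.h>
#include <stdbool.h>

/* Bit flags so that a caller can test against a mask of accepted states. */
enum slave_state {
	SLAVE_ACTIVE	= 1u << 0,
	SLAVE_INACTIVE	= 1u << 1,
	SLAVE_NA	= 1u << 2,	/* not enslaved, or bond has no active slave */
};

/*
 * dev and active_dev are hypothetical integer device ids; a negative
 * active_dev stands for "the bond currently has no active slave".
 */
enum slave_state classify_slave(int dev, bool has_bond_master, int active_dev)
{
	assert(dev >= 0);
	if (!has_bond_master || active_dev < 0)
		return SLAVE_NA;
	return dev == active_dev ? SLAVE_ACTIVE : SLAVE_INACTIVE;
}

/* Only active slaves and non-bonded devices keep the GIDs in this sketch. */
bool should_hold_gids(enum slave_state state)
{
	return (state & (SLAVE_ACTIVE | SLAVE_NA)) != 0;
}
```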

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/roce_gid_mgmt.c | 228 ++--
 drivers/net/bonding/bond_options.c  |  13 --
 include/net/bonding.h   |   7 +
 3 files changed, 227 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/roce_gid_mgmt.c 
b/drivers/infiniband/core/roce_gid_mgmt.c
index 3c11a64..bf7ef95 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -37,6 +37,7 @@
 
 /* For in6_dev_get/in6_dev_put */
 #include <net/addrconf.h>
+#include <net/bonding.h>
 
 #include <rdma/ib_cache.h>
 #include <rdma/ib_addr.h>
@@ -55,7 +56,7 @@ struct  update_gid_event_work {
enum gid_op_type gid_op;
 };
 
-#define ROCE_NETDEV_CALLBACK_SZ2
+#define ROCE_NETDEV_CALLBACK_SZ3
 struct netdev_event_work_cmd {
roce_netdev_callbackcb;
roce_netdev_filter  filter;
@@ -127,22 +128,96 @@ static void update_gid(enum gid_op_type gid_op, struct 
ib_device *ib_dev,
}
 }
 
+#define IS_NETDEV_BONDING_MASTER(ndev)	\
+	(((ndev)->priv_flags &		\
+	  (IFF_BONDING | IFF_MASTER)) == (IFF_BONDING | IFF_MASTER))
+
+enum bonding_slave_state {
+	BONDING_SLAVE_STATE_ACTIVE	= 1UL << 0,
+	BONDING_SLAVE_STATE_INACTIVE	= 1UL << 1,
+	BONDING_SLAVE_STATE_NA		= 1UL << 2,
+};
+
+static enum bonding_slave_state is_eth_active_slave_of_bonding(struct net_device *idev,
+							       struct net_device *upper)
+{
+	if (upper && IS_NETDEV_BONDING_MASTER(upper)) {
+		struct net_device *pdev;
+
+		rcu_read_lock();
+		pdev = bond_option_active_slave_get_rcu(netdev_priv(upper));
+		rcu_read_unlock();
+		if (pdev)
+			return idev == pdev ? BONDING_SLAVE_STATE_ACTIVE :
+				BONDING_SLAVE_STATE_INACTIVE;
+	}
+
+	return BONDING_SLAVE_STATE_NA;
+}
+
+static bool is_upper_dev_rcu(struct net_device *dev, struct net_device *upper)
+{
+   struct net_device *_upper = NULL;
+   struct list_head *iter;
+
+   rcu_read_lock();
+   netdev_for_each_all_upper_dev_rcu(dev, _upper, iter) {
+   if (_upper == upper)
+   break;
+   }
+
+   rcu_read_unlock();
+   return _upper == upper;
+}
+
+static int _is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie,
+ unsigned long bond_state)
+{
+   struct net_device *ndev = (struct net_device *)cookie;
+   struct net_device *rdev;
+   int res;
+
+   if (!idev)
+   return 0;
+
+   rcu_read_lock();
+   rdev = rdma_vlan_dev_real_dev(ndev);
+   if (!rdev)
+   rdev = ndev;
+
+	res = ((is_upper_dev_rcu(idev, ndev) &&
+	       (is_eth_active_slave_of_bonding(idev, rdev) &
+		bond_state)) ||
+	       rdev == idev);
+
+   rcu_read_unlock();
+   return res;
+}
+
 static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
 struct net_device *idev, void *cookie)
 {
-   struct net_device *rdev;
-   struct net_device *mdev;
-   struct net_device *ndev = (struct net_device *)cookie;
+   return _is_eth_port_of_netdev(ib_dev, port, idev, cookie,
+ BONDING_SLAVE_STATE_ACTIVE |
+ BONDING_SLAVE_STATE_NA);
+}
 
+static int is_eth_port_inactive_slave(struct ib_device *ib_dev, u8 port,
+ struct net_device *idev, void *cookie)
+{
+   struct net_device *mdev;
+   int res;
if (!idev)
return 0;
 
rcu_read_lock();
mdev = netdev_master_upper_dev_get_rcu(idev);
-   rdev = rdma_vlan_dev_real_dev(ndev);
+   res = is_eth_active_slave_of_bonding(idev, mdev) ==
+   BONDING_SLAVE_STATE_INACTIVE;
rcu_read_unlock();
 
-   return (rdev ? rdev : ndev) == (mdev ? mdev : idev);
+   return res;
 }
 
 static int pass_all_filter(struct ib_device *ib_dev, u8 port,
@@ -151,6 +226,26 @@ static int pass_all_filter(struct ib_device *ib_dev, u8 
port,
return 1;
 }
 
+static int bonding_slaves_filter(struct ib_device *ib_dev, u8 port,
+struct net_device *idev, void *cookie)
+{
+   struct net_device *rdev;
+   struct net_device *ndev = (struct net_device *)cookie

[PATCH v2 for-next 16/32] RDMA/ocrdma: changes to support RoCE-v2 in UD path

2015-03-10 Thread Somnath Kotur
From: Devesh Sharma devesh.sha...@emulex.com

To support the UD protocol, this patch makes the following
changes to the existing UD implementation:

1. AH creation resolves the GID type for a given index.
2. The protocol header is built based on the GID type.
3. Work completion reports the l3-type if f/w supports RoCE-v2
   and sets the IB_WC_WITH_NETWORK_HDR_TYPE flag in wc->wc_flags.
4. hop_limit is set to enable non-RDMA-CM applications for RoCE-v2.
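
Step 2 can be sketched in user-space C as follows. The EtherType and next-header values are the ones used by the diff below; the toy_* names and the enum's numeric values are hypothetical stand-ins for the driver's OCRDMA_L3_TYPE_* constants, whose values are not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical L3 type tags; the driver's real values are not shown here. */
enum toy_l3_type {
	TOY_L3_TYPE_IB_GRH,
	TOY_L3_TYPE_IPV4,
	TOY_L3_TYPE_IPV6,
};

/* Map the header type to the EtherType placed in the Ethernet header. */
static uint16_t toy_l3_to_ethertype(enum toy_l3_type type)
{
	switch (type) {
	case TOY_L3_TYPE_IB_GRH:
		return 0x8915;	/* RoCE v1 */
	case TOY_L3_TYPE_IPV4:
		return 0x0800;	/* IPv4 */
	case TOY_L3_TYPE_IPV6:
		return 0x86dd;	/* IPv6 */
	default:
		return 0;
	}
}

/* Next header: 0x1b after a GRH, 0x11 (UDP) for the IP variants. */
static uint8_t toy_next_header(uint16_t ethertype)
{
	return ethertype == 0x8915 ? 0x1b : 0x11;
}

/* First 16 bits of the IPv4 header: version 4, IHL 5, then the TOS byte. */
static uint16_t toy_ipv4_first_word(uint8_t traffic_class)
{
	return (uint16_t)((4u << 12) | (5u << 8) | traffic_class);
}
```

For a traffic class of 0 the first IPv4 word is the familiar 0x4500; the driver stores it in network byte order with htons().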

Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
Signed-off-by: Devesh Sharma devesh.sha...@emulex.com
---
 drivers/infiniband/hw/ocrdma/ocrdma.h   |1 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c|   70 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |5 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   23 +++--
 4 files changed, 82 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h 
b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 97f971a..302fd0e 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -341,6 +341,7 @@ struct ocrdma_ah {
struct ocrdma_av *av;
u16 sgid_index;
u32 id;
+   u8 hdr_type;
 };
 
 struct ocrdma_qp_hwq_info {
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 7ecd230..6f838f1 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -39,6 +39,20 @@
 
 #define OCRDMA_VID_PCP_SHIFT   0xD
 
+static u16 ocrdma_hdr_type_to_proto_num(u8 hdr_type)
+{
+   switch (hdr_type) {
+   case OCRDMA_L3_TYPE_IB_GRH:
+   return (u16)0x8915;
+   case OCRDMA_L3_TYPE_IPV4:
+   return (u16)0x0800;
+   case OCRDMA_L3_TYPE_IPV6:
+   return (u16)0x86dd;
+   default:
+   return 0;
+   }
+}
+
 static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
struct ib_ah_attr *attr, union ib_gid *sgid,
int pdid, bool *isvlan, u16 vlan_tag)
@@ -47,22 +61,33 @@ static inline int set_av_attr(struct ocrdma_dev *dev, 
struct ocrdma_ah *ah,
struct ocrdma_eth_vlan eth;
struct ocrdma_grh grh;
int eth_sz;
+   u16 proto_num = 0;
+   u8 nxthdr = 0x11;
+   struct iphdr ipv4;
+   union {
+   struct sockaddr _sockaddr;
+   struct sockaddr_in  _sockaddr_in;
+   struct sockaddr_in6 _sockaddr_in6;
+   } sgid_addr, dgid_addr;
 
 	memset(&eth, 0, sizeof(eth));
 	memset(&grh, 0, sizeof(grh));
+	/* Protocol Number */
+	proto_num = ocrdma_hdr_type_to_proto_num(ah->hdr_type);
+	nxthdr = (proto_num == 0x8915) ? 0x1b : 0x11;
 
 	/* VLAN */
 	if (!vlan_tag || (vlan_tag > 0xFFF))
 		vlan_tag = dev->pvid;
 	if (vlan_tag && (vlan_tag < 0x1000)) {
 		eth.eth_type = cpu_to_be16(0x8100);
-		eth.roce_eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE);
+		eth.roce_eth_type = cpu_to_be16(proto_num);
 		vlan_tag |= (dev->sl & 0x07) << OCRDMA_VID_PCP_SHIFT;
 		eth.vlan_tag = cpu_to_be16(vlan_tag);
 		eth_sz = sizeof(struct ocrdma_eth_vlan);
 		*isvlan = true;
 	} else {
-		eth.eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE);
+		eth.eth_type = cpu_to_be16(proto_num);
 		eth_sz = sizeof(struct ocrdma_eth_basic);
 	}
}
/* MAC */
@@ -71,18 +96,34 @@ static inline int set_av_attr(struct ocrdma_dev *dev, 
struct ocrdma_ah *ah,
if (status)
return status;
 	ah->sgid_index = attr->grh.sgid_index;
-	memcpy(&grh.sgid[0], sgid->raw, sizeof(union ib_gid));
-	memcpy(&grh.dgid[0], attr->grh.dgid.raw, sizeof(attr->grh.dgid.raw));
-
-	grh.tclass_flow = cpu_to_be32((6 << 28) |
-			(attr->grh.traffic_class << 24) |
-			attr->grh.flow_label);
-	/* 0x1b is next header value in GRH */
-	grh.pdid_hoplimit = cpu_to_be32((pdid << 16) |
-			(0x1b << 8) | attr->grh.hop_limit);
 	/* Eth HDR */
 	memcpy(&ah->av->eth_hdr, &eth, eth_sz);
-	memcpy((u8 *)ah->av + eth_sz, &grh, sizeof(struct ocrdma_grh));
+	if (ah->hdr_type == RDMA_NETWORK_IPV4) {
+		*((__be16 *)&ipv4) = htons((4 << 12) | (5 << 8) |
+					   attr->grh.traffic_class);
+		ipv4.id = cpu_to_be16(pdid);
+		ipv4.frag_off = htons(IP_DF);
+		ipv4.tot_len = htons(0);
+		ipv4.ttl = attr->grh.hop_limit;
+		ipv4.protocol = nxthdr;
+		rdma_gid2ip(&sgid_addr._sockaddr, sgid);
+		ipv4.saddr = sgid_addr._sockaddr_in.sin_addr.s_addr;
+		rdma_gid2ip(&dgid_addr._sockaddr, &attr->grh.dgid);
+		ipv4.daddr = dgid_addr._sockaddr_in.sin_addr.s_addr;
+		memcpy((u8 *)ah->av + eth_sz, &ipv4, sizeof(struct iphdr

[PATCH v2 for-next 17/32] RDMA/ocrdma: changes to support RoCE-v2 in RC path

2015-03-10 Thread Somnath Kotur
From: Devesh Sharma devesh.sha...@emulex.com

To support RoCE-v2, this patch implements the following changes:
1. Get the GID type for a given sgid.
2. Based on the GID type, get the IPv4 L3 addresses
   and give them to FW.
3. Provide the l3-type to FW.
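
Step 2 relies on the fact that an IPv4-based GID is an IPv4-mapped IPv6 address (::ffff:a.b.c.d), so the IPv4 address is simply its last four bytes. A minimal user-space sketch under that assumption follows; the toy_* helpers are hypothetical and only model what the patch obtains through rdma_gid2ip(). The sketch returns the address in host byte order, whereas the driver copies the network-order bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if the 16-byte GID has the IPv4-mapped form ::ffff:a.b.c.d. */
static bool toy_gid_is_v4mapped(const uint8_t gid[16])
{
	static const uint8_t prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	return memcmp(gid, prefix, sizeof(prefix)) == 0;
}

/*
 * Extract the embedded IPv4 address in host byte order.
 * The caller is expected to check toy_gid_is_v4mapped() first.
 */
static uint32_t toy_gid_to_ipv4(const uint8_t gid[16])
{
	return ((uint32_t)gid[12] << 24) | ((uint32_t)gid[13] << 16) |
	       ((uint32_t)gid[14] << 8) | (uint32_t)gid[15];
}
```

For the GID ::ffff:192.168.0.1 the helper yields 0xc0a80001; a link-local GID starting with fe80 is not treated as IPv4-mapped.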

Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
Signed-off-by: Devesh Sharma devesh.sha...@emulex.com
---
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 30 --
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index 20f9e8f..147fccf 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -2433,7 +2433,13 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
union ib_gid sgid, zgid;
struct ib_gid_attr sgid_attr;
u32 vlan_id = 0x;
-   u8 mac_addr[6];
+   u8 mac_addr[6], hdr_type;
+   union {
+   struct sockaddr _sockaddr;
+   struct sockaddr_in  _sockaddr_in;
+   struct sockaddr_in6 _sockaddr_in6;
+   } sgid_addr, dgid_addr;
+
 	struct ocrdma_dev *dev = get_ocrdma_dev(qp->ibqp.device);
 
 	if ((ah_attr->ah_flags & IB_AH_GRH) == 0)
@@ -2448,6 +2454,8 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
 	cmd->params.hop_lmt_rq_psn |=
 	    (ah_attr->grh.hop_limit << OCRDMA_QP_PARAMS_HOP_LMT_SHIFT);
 	cmd->flags |= OCRDMA_QP_PARA_FLOW_LBL_VALID;
+
+	/* GIDs */
 	memcpy(&cmd->params.dgid[0], &ah_attr->grh.dgid.raw[0],
 	       sizeof(cmd->params.dgid));
 
@@ -2471,17 +2479,35 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
return status;
 	cmd->params.dmac_b0_to_b3 = mac_addr[0] | (mac_addr[1] << 8) |
 				(mac_addr[2] << 16) | (mac_addr[3] << 24);
+	hdr_type = ib_gid_to_network_type(sgid_attr.gid_type, &sgid);
+	if (hdr_type == RDMA_NETWORK_IPV4) {
+		status = rdma_gid2ip(&sgid_addr._sockaddr, &sgid);
+		if (status)
+			return status;
+		status = rdma_gid2ip(&dgid_addr._sockaddr, &ah_attr->grh.dgid);
+		if (status)
+			return status;
+		memcpy(&cmd->params.dgid[0],
+		       &dgid_addr._sockaddr_in.sin_addr.s_addr, 4);
+		memcpy(&cmd->params.sgid[0],
+		       &sgid_addr._sockaddr_in.sin_addr.s_addr, 4);
+	}
 	/* convert them to LE format. */
 	ocrdma_cpu_to_le32(&cmd->params.dgid[0], sizeof(cmd->params.dgid));
 	ocrdma_cpu_to_le32(&cmd->params.sgid[0], sizeof(cmd->params.sgid));
 	cmd->params.vlan_dmac_b4_to_b5 = mac_addr[4] | (mac_addr[5] << 8);
-	if (attr_mask & IB_QP_VID) {
+	if (vlan_id < 0x1000) {
 		cmd->params.vlan_dmac_b4_to_b5 |=
 		    vlan_id << OCRDMA_QP_PARAMS_VLAN_SHIFT;
 		cmd->flags |= OCRDMA_QP_PARA_VLAN_EN_VALID;
 		cmd->params.rnt_rc_sl_fl |=
 			(dev->sl & 0x07) << OCRDMA_QP_PARAMS_SL_SHIFT;
 	}
+
+	cmd->params.max_sge_recv_flags |=
+		((hdr_type <<
+		OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_SHIFT) &
+		OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_MASK);
return 0;
 }
 
-- 
2.1.0



[PATCH v2 for-next 21/32] IB/mlx4: Lock with RCU instead of RTNL

2015-03-10 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

The function eth_link_query_port() used to take the RTNL lock whenever
a call to netdev_master_upper_dev_get() was necessary. This made it
impossible to call the function while the RTNL lock is already held.
Calling netdev_master_upper_dev_get_rcu() under the RCU read lock
instead solves this problem.

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/main.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index d8b227e..32cd009 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -367,14 +367,15 @@ static int eth_link_query_port(struct ib_device *ibdev, 
u8 port,
 	props->state		= IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
 	props->active_mtu	= IB_MTU_256;
-	if (is_bonded)
-		rtnl_lock(); /* required to get upper dev */
 	down_read(&iboe->sem);
 	ndev = iboe->netdevs[port - 1];
-	if (ndev && is_bonded)
-		ndev = netdev_master_upper_dev_get(ndev);
+	if (ndev && is_bonded) {
+		rcu_read_lock(); /* required to get upper dev */
+		ndev = netdev_master_upper_dev_get_rcu(ndev);
+		rcu_read_unlock();
+	}
 	if (!ndev)
-		goto out_unlock;
+		goto unlock;
 
 	tmp = iboe_get_mtu(ndev->mtu);
 	props->active_mtu = tmp ? min(props->max_mtu, tmp) : IB_MTU_256;
@@ -382,10 +383,8 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->state		= (netif_running(ndev) && netif_carrier_ok(ndev)) ?
 					IB_PORT_ACTIVE : IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
-out_unlock:
+unlock:
 	up_read(&iboe->sem);
-	if (is_bonded)
-		rtnl_unlock();
 out:
 	mlx4_free_cmd_mailbox(mdev->dev, mailbox);
 	return err;
return err;
-- 
2.1.0



[PATCH v2 for-next 18/32] RDMA/ocrdma: changes to support user AH creation

2015-03-10 Thread Somnath Kotur
From: Devesh Sharma devesh.sha...@emulex.com

To support user-space AH creation, this patch uses the ahid field to
convey the l3-type to the user-space library. The library is
responsible for decoding the l3-type out of the ahid.
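
The encoding can be sketched in user-space C as below. The mask and shift values are taken from the enum in this patch; the toy_* function names are hypothetical, and the decode side is only an assumption about what a user-space library would do, since that code is not part of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout, using the values from the ocrdma_ah.h enum in this patch. */
#define TOY_AH_ID_MASK		0x3FFu
#define TOY_AH_VLAN_VALID_MASK	0x01u
#define TOY_AH_VLAN_VALID_SHIFT	0x1Fu	/* bit 31 */
#define TOY_AH_L3_TYPE_MASK	0x03u
#define TOY_AH_L3_TYPE_SHIFT	0x1Du	/* bits 29-30 */

/* Kernel side: pack the AH id, the l3-type and the VLAN flag into one word. */
static uint32_t toy_encode_ahid(uint32_t id, uint8_t l3_type, bool isvlan)
{
	uint32_t ahid = id & TOY_AH_ID_MASK;

	ahid |= ((uint32_t)l3_type & TOY_AH_L3_TYPE_MASK) <<
		TOY_AH_L3_TYPE_SHIFT;
	if (isvlan)
		ahid |= TOY_AH_VLAN_VALID_MASK << TOY_AH_VLAN_VALID_SHIFT;
	return ahid;
}

/* Library side (assumed): recover the l3-type from the ahid word. */
static uint8_t toy_decode_l3_type(uint32_t ahid)
{
	return (uint8_t)((ahid >> TOY_AH_L3_TYPE_SHIFT) & TOY_AH_L3_TYPE_MASK);
}

/* Library side (assumed): recover the VLAN-valid flag. */
static bool toy_ahid_has_vlan(uint32_t ahid)
{
	return ((ahid >> TOY_AH_VLAN_VALID_SHIFT) & TOY_AH_VLAN_VALID_MASK) != 0;
}
```

Note that the real driver only sets the l3-type bits when ocrdma_is_rocev2_supported() returns true; the sketch leaves that check out.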

Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
Signed-off-by: Devesh Sharma devesh.sha...@emulex.com
---
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 5 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 1bb72a0..65a39cc 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -191,6 +191,11 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct 
ib_ah_attr *attr)
 		ahid_addr = pd->uctx->ah_tbl.va + attr->dlid;
 		*ahid_addr = 0;
 		*ahid_addr |= ah->id & OCRDMA_AH_ID_MASK;
+		if (ocrdma_is_rocev2_supported(dev)) {
+			*ahid_addr |= ((u32)ah->hdr_type &
+				       OCRDMA_AH_L3_TYPE_MASK) <<
+				       OCRDMA_AH_L3_TYPE_SHIFT;
+		}
 		if (isvlan)
 			*ahid_addr |= (OCRDMA_AH_VLAN_VALID_MASK <<
 				       OCRDMA_AH_VLAN_VALID_SHIFT);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h 
b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
index 726a87c..ed45ecd 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
@@ -31,9 +31,10 @@
 enum {
OCRDMA_AH_ID_MASK   = 0x3FF,
OCRDMA_AH_VLAN_VALID_MASK   = 0x01,
-   OCRDMA_AH_VLAN_VALID_SHIFT  = 0x1F
+   OCRDMA_AH_VLAN_VALID_SHIFT  = 0x1F,
+   OCRDMA_AH_L3_TYPE_MASK  = 0x03,
+   OCRDMA_AH_L3_TYPE_SHIFT = 0x1D /* 29 bits */
 };
-
 struct ib_ah *ocrdma_create_ah(struct ib_pd *, struct ib_ah_attr *);
 int ocrdma_destroy_ah(struct ib_ah *);
 int ocrdma_query_ah(struct ib_ah *, struct ib_ah_attr *);
-- 
2.1.0



[PATCH v2 for-next 20/32] IB/mlx4: Replace spin_lock with rw_semaphore

2015-03-10 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

Protection of iboe->netdevs no longer needs to work from an atomic context,
so replacing the spin_lock with a rw_semaphore is allowed and makes more sense.

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/main.c| 27 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  2 +-
 2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 91caffc..d8b227e 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -369,7 +369,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->active_mtu	= IB_MTU_256;
 	if (is_bonded)
 		rtnl_lock(); /* required to get upper dev */
-	spin_lock_bh(&iboe->lock);
+	down_read(&iboe->sem);
 	ndev = iboe->netdevs[port - 1];
 	if (ndev && is_bonded)
 		ndev = netdev_master_upper_dev_get(ndev);
@@ -383,7 +383,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 					IB_PORT_ACTIVE : IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
 out_unlock:
-	spin_unlock_bh(&iboe->lock);
+	up_read(&iboe->sem);
 	if (is_bonded)
 		rtnl_unlock();
 out:
@@ -825,11 +825,11 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 	if (!mqp->port)
 		return 0;
 
-	spin_lock_bh(&mdev->iboe.lock);
+	down_read(&mdev->iboe.sem);
 	ndev = mdev->iboe.netdevs[mqp->port - 1];
 	if (ndev)
 		dev_hold(ndev);
-	spin_unlock_bh(&mdev->iboe.lock);
+	up_read(&mdev->iboe.sem);
 
if (ndev) {
ret = 1;
@@ -1330,7 +1330,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
 	struct mlx4_dev *dev = mdev->dev;
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
-	struct net_device *ndev;
 	struct mlx4_ib_gid_entry *ge;
 	enum mlx4_protocol prot =  MLX4_PROT_IB_IPV6;
 	struct mlx4_flow_reg_id reg_id = {0, 0};
@@ -1370,13 +1369,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	mutex_lock(&mqp->mutex);
 	ge = find_gid_entry(mqp, gid->raw);
 	if (ge) {
-		spin_lock_bh(&mdev->iboe.lock);
-		ndev = ge->added ? mdev->iboe.netdevs[ge->port - 1] : NULL;
-		if (ndev)
-			dev_hold(ndev);
-		spin_unlock_bh(&mdev->iboe.lock);
-		if (ndev)
-			dev_put(ndev);
 		list_del(&ge->list);
 		kfree(ge);
 	} else
@@ -1543,7 +1535,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 
 	iboe = &ibdev->iboe;
 
-	spin_lock_bh(&iboe->lock);
+	down_write(&iboe->sem);
 	mlx4_foreach_ib_transport_port(port, ibdev->dev) {
 
 		iboe->netdevs[port - 1] =
@@ -1555,7 +1547,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 			update_qps_port = port;
 
 	}
-	spin_unlock_bh(&iboe->lock);
+	up_write(&iboe->sem);
 
 	if (update_qps_port > 0)
 		mlx4_ib_update_qps(ibdev, dev, update_qps_port);
@@ -1848,7 +1840,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 
mlx4_ib_alloc_eqs(dev, ibdev);
 
-	spin_lock_init(&iboe->lock);
+	init_rwsem(&iboe->sem);
 
if (init_node_data(ibdev))
goto err_map;
@@ -2153,7 +2145,8 @@ static void handle_bonded_port_state_event(struct work_struct *work)
 	struct ib_event ibev;
 
 	kfree(ew);
-	spin_lock_bh(&ibdev->iboe.lock);
+
+	down_read(&ibdev->iboe.sem);
 	for (i = 0; i < MLX4_MAX_PORTS; ++i) {
 		struct net_device *curr_netdev = ibdev->iboe.netdevs[i];
 
@@ -2165,7 +2158,7 @@ static void handle_bonded_port_state_event(struct work_struct *work)
 		bonded_port_state = (bonded_port_state != IB_PORT_ACTIVE) ?
 			curr_port_state : IB_PORT_ACTIVE;
 	}
-	spin_unlock_bh(&ibdev->iboe.lock);
+	up_read(&ibdev->iboe.sem);
 
 	ibev.device = &ibdev->ib_dev;
 	ibev.element.port_num = 1;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h 
b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index e3805a4..166ebf9 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -455,7 +455,7 @@ struct mlx4_ib_sriov {
 };
 
 struct mlx4_ib_iboe {
-   spinlock_t  lock;
+	struct rw_semaphore	sem; /* guard from concurrent access to data in this struct */
struct net_device  *netdevs[MLX4_MAX_PORTS];
atomic64_t  mac[MLX4_MAX_PORTS];
struct notifier_block   nb;
-- 
2.1.0


[PATCH v2 for-next 29/32] IB/core: Initialize UD header structure with IP and UDP headers

2015-03-10 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

ib_ud_header_init() is used to format InfiniBand headers
in a buffer up to (but not including) the BTH. RoCEv2 requires
that this function also be able to build IP and UDP headers.
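
Building the IPv4 header also means filling in its header checksum, which the patch does in ib_ud_ip4_csum() via ip_fast_csum(). The following user-space sketch shows the same idea for a 20-byte header without options: serialize the fields with the checksum set to zero, then take the ones'-complement sum. All toy_* names are hypothetical, fields are passed in host byte order, and the result is returned in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 16-bit value in network (big-endian) byte order. */
static void toy_put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Store a 32-bit value in network (big-endian) byte order. */
static void toy_put_be32(uint8_t *p, uint32_t v)
{
	toy_put_be16(p, (uint16_t)(v >> 16));
	toy_put_be16(p + 2, (uint16_t)(v & 0xffff));
}

/* Checksum of an option-less IPv4 header (version 4, IHL 5). */
static uint16_t toy_ud_ip4_csum(uint8_t tos, uint16_t tot_len, uint16_t id,
				uint16_t frag_off, uint8_t ttl,
				uint8_t protocol, uint32_t saddr,
				uint32_t daddr)
{
	uint8_t h[20];
	uint32_t sum = 0;
	int i;

	h[0] = 0x45;			/* version 4, header length 5 words */
	h[1] = tos;
	toy_put_be16(h + 2, tot_len);
	toy_put_be16(h + 4, id);
	toy_put_be16(h + 6, frag_off);
	h[8] = ttl;
	h[9] = protocol;
	toy_put_be16(h + 10, 0);	/* checksum field is zero while summing */
	toy_put_be32(h + 12, saddr);
	toy_put_be32(h + 16, daddr);

	/* Sum the ten 16-bit words, then fold the carries back in. */
	for (i = 0; i < 20; i += 2)
		sum += ((uint32_t)h[i] << 8) | h[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

As a sanity check, a UDP packet (protocol 0x11) of total length 0x73 with TTL 0x40, the DF flag set, source 192.168.0.1 and destination 192.168.0.199 gives a checksum of 0xb861.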

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/ud_header.c| 153 ++---
 drivers/infiniband/hw/mlx4/qp.c|   7 +-
 drivers/infiniband/hw/mthca/mthca_qp.c |   2 +-
 include/rdma/ib_pack.h |  44 --
 4 files changed, 186 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/core/ud_header.c 
b/drivers/infiniband/core/ud_header.c
index 72feee6..a4d4072 100644
--- a/drivers/infiniband/core/ud_header.c
+++ b/drivers/infiniband/core/ud_header.c
@@ -35,6 +35,7 @@
 #include <linux/string.h>
 #include <linux/export.h>
 #include <linux/if_ether.h>
+#include <linux/ip.h>
 
 #include <rdma/ib_pack.h>
 
@@ -116,6 +117,68 @@ static const struct ib_field vlan_table[]  = {
  .size_bits= 16 }
 };
 
+static const struct ib_field ip4_table[]  = {
+   { STRUCT_FIELD(ip4, ver_len),
+ .offset_words = 0,
+ .offset_bits  = 0,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, tos),
+ .offset_words = 0,
+ .offset_bits  = 8,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, tot_len),
+ .offset_words = 0,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, id),
+ .offset_words = 1,
+ .offset_bits  = 0,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, frag_off),
+ .offset_words = 1,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, ttl),
+ .offset_words = 2,
+ .offset_bits  = 0,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, protocol),
+ .offset_words = 2,
+ .offset_bits  = 8,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, check),
+ .offset_words = 2,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, saddr),
+ .offset_words = 3,
+ .offset_bits  = 0,
+ .size_bits= 32 },
+   { STRUCT_FIELD(ip4, daddr),
+ .offset_words = 4,
+ .offset_bits  = 0,
+ .size_bits= 32 }
+};
+
+static const struct ib_field udp_table[]  = {
+   { STRUCT_FIELD(udp, sport),
+ .offset_words = 0,
+ .offset_bits  = 0,
+ .size_bits= 16 },
+   { STRUCT_FIELD(udp, dport),
+ .offset_words = 0,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(udp, length),
+ .offset_words = 1,
+ .offset_bits  = 0,
+ .size_bits= 16 },
+   { STRUCT_FIELD(udp, csum),
+ .offset_words = 1,
+ .offset_bits  = 16,
+ .size_bits= 16 }
+};
+
 static const struct ib_field grh_table[]  = {
{ STRUCT_FIELD(grh, ip_version),
  .offset_words = 0,
@@ -213,6 +276,26 @@ static const struct ib_field deth_table[] = {
  .size_bits= 24 }
 };
 
+__be16 ib_ud_ip4_csum(struct ib_ud_header *header)
+{
+   struct iphdr iph;
+
+   iph.ihl = 5;
+   iph.version = 4;
+	iph.tos		= header->ip4.tos;
+	iph.tot_len	= header->ip4.tot_len;
+	iph.id		= header->ip4.id;
+	iph.frag_off	= header->ip4.frag_off;
+	iph.ttl		= header->ip4.ttl;
+	iph.protocol	= header->ip4.protocol;
+	iph.check	= 0;
+	iph.saddr	= header->ip4.saddr;
+	iph.daddr	= header->ip4.daddr;
+
+	return ip_fast_csum((u8 *)&iph, iph.ihl);
+}
+EXPORT_SYMBOL(ib_ud_ip4_csum);
+
 /**
  * ib_ud_header_init - Initialize UD header structure
  * @payload_bytes:Length of packet payload
@@ -220,19 +303,35 @@ static const struct ib_field deth_table[] = {
  * @eth_present: specify if Eth header is present
  * @vlan_present: packet is tagged vlan
  * @grh_present:GRH flag (if non-zero, GRH will be included)
+ * @ip_version:GRH flag (if non-zero, IP header, V4 or V6, will be included)
+ * @grh_present:GRH flag (if non-zero, UDP header will be included)
  * @immediate_present: specify if immediate data is present
  * @header:Structure to initialize
  */
-void ib_ud_header_init(int payload_bytes,
-  int  lrh_present,
-  int  eth_present,
-  int  vlan_present,
-  int  grh_present,
-  int  immediate_present,
-  struct ib_ud_header *header)
+int ib_ud_header_init(int payload_bytes,
+ intlrh_present,
+ inteth_present,
+ intvlan_present,
+ intgrh_present

[PATCH v2 for-next 03/32] IB/core: Add RoCE GID population

2015-03-10 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

In order to populate the GID table, we need to listen for
events:
(a) IB device has been added or removed - used in order
to allocate/deallocate the cache and populate
the GID table internally.
(b) inet events - add new GIDs (according to the IP addresses)
to the table.
(c) netdev up/down/change_addr - if a netdev is built on top of our
RoCE device, we need to add/delete its IPs.

When an event is received, multiple entries (each with a
different GID type) are added.

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/Makefile |   2 +-
 drivers/infiniband/core/core_priv.h  |  26 ++
 drivers/infiniband/core/device.c |  80 +
 drivers/infiniband/core/roce_gid_cache.c |  66 
 drivers/infiniband/core/roce_gid_mgmt.c  | 516 +++
 include/rdma/ib_addr.h   |   2 +-
 include/rdma/ib_verbs.h  |   9 +
 7 files changed, 699 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/roce_gid_mgmt.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 9b63bdf..2c94963 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=   ib_uverbs.o 
ib_ucm.o \
 
 ib_core-y :=   packer.o ud_header.o verbs.o sysfs.o \
device.o fmr_pool.o cache.o netlink.o \
-   roce_gid_cache.o
+   roce_gid_cache.o roce_gid_mgmt.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
 
diff --git a/drivers/infiniband/core/core_priv.h 
b/drivers/infiniband/core/core_priv.h
index a502daa..12797d9 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -39,6 +39,8 @@
 
 #include <rdma/ib_verbs.h>
 
+extern struct workqueue_struct *roce_gid_mgmt_wq;
+
 int  ib_device_register_sysfs(struct ib_device *device,
  int (*port_callback)(struct ib_device *,
   u8, struct kobject *));
@@ -53,6 +55,22 @@ void ib_cache_cleanup(void);
 int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
struct ib_qp_attr *qp_attr, int *qp_attr_mask);
 
+typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port,
+ struct net_device *idev, void *cookie);
+
+typedef int (*roce_netdev_filter)(struct ib_device *device, u8 port,
+struct net_device *idev, void *cookie);
+
+void ib_dev_roce_ports_of_netdev(struct ib_device *ib_dev,
+roce_netdev_filter filter,
+void *filter_cookie,
+roce_netdev_callback cb,
+void *cookie);
+void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
+ void *filter_cookie,
+ roce_netdev_callback cb,
+ void *cookie);
+
 int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
   union ib_gid *gid, struct ib_gid_attr *attr);
 
@@ -66,6 +84,9 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, 
union ib_gid *gid,
 
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
+int roce_gid_cache_setup(void);
+void roce_gid_cache_cleanup(void);
+
 int roce_add_gid(struct ib_device *ib_dev, u8 port,
 union ib_gid *gid, struct ib_gid_attr *attr);
 
@@ -75,4 +96,9 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
 int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
 struct net_device *ndev);
 
+int roce_gid_mgmt_init(void);
+void roce_gid_mgmt_cleanup(void);
+
+int roce_rescan_device(struct ib_device *ib_dev);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8616a95..5ce57bf 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -39,6 +39,7 @@
 #include <linux/init.h>
 #include <linux/mutex.h>
 #include <rdma/rdma_netlink.h>
+#include <rdma/ib_addr.h>
 
 #include "core_priv.h"
 
@@ -640,6 +641,82 @@ int ib_query_gid(struct ib_device *device,
 EXPORT_SYMBOL(ib_query_gid);
 
 /**
+ * ib_dev_roce_ports_of_netdev - enumerate RoCE ports of ibdev in
+ *  respect of netdev
+ * @ib_dev : IB device we want to query
+ * @filter: Should we call the callback?
+ * @filter_cookie: Cookie passed to filter
+ * @cb: Callback to call for each found RoCE ports
+ * @cookie: Cookie passed back to the callback
+ *
+ * Enumerates all of the physical RoCE ports of ib_dev RoCE ports
+ * which are relaying Ethernet packets to a specific
+ * (possibly

[PATCH v2 for-next 04/32] IB/core: Add default GID for RoCE GID Cache

2015-03-10 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

When RoCE is used, a default GID address should be generated
for every supported RoCE type. These default GID addresses are
generated based on the IPv6 link-local address, but in contrast
to the GID based on the regular IPv6 link-local (as we generate
GID per IP address), these GIDs are also available if the net
device is down (in order to support loopback).
Moreover, these default GID addresses can't be deleted.
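
To make the derivation concrete, here is a minimal user-space sketch of how such a default GID can be formed: the link-local prefix fe80::/64 followed by the modified EUI-64 interface ID derived from the 48-bit MAC address (insert ff:fe in the middle and flip the universal/local bit). The toy_* names and the struct are hypothetical; in the patch this work is done by make_default_gid() with addrconf_ifid_eui48(). Values are kept in host byte order for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-order view of a GID: prefix plus interface ID. */
struct toy_gid {
	uint64_t subnet_prefix;
	uint64_t interface_id;
};

/* Modified EUI-64 from a 48-bit MAC held in the low bits of mac48. */
static uint64_t toy_eui64_from_mac(uint64_t mac48)
{
	uint64_t hi = (mac48 >> 24) & 0xffffffULL;	/* first three octets */
	uint64_t lo = mac48 & 0xffffffULL;		/* last three octets */
	uint64_t eui = (hi << 40) | (0xfffeULL << 24) | lo;

	return eui ^ (0x02ULL << 56);	/* flip the universal/local bit */
}

/* Default GID: link-local prefix fe80::/64 plus the EUI-64 interface ID. */
static struct toy_gid toy_make_default_gid(uint64_t mac48)
{
	struct toy_gid gid;

	gid.subnet_prefix = 0xfe80000000000000ULL;
	gid.interface_id = toy_eui64_from_mac(mac48);
	return gid;
}

/* Mirrors the patch's check that refuses to add or delete default GIDs. */
static bool toy_is_default_gid(struct toy_gid gid, uint64_t mac48)
{
	struct toy_gid def = toy_make_default_gid(mac48);

	return gid.subnet_prefix == def.subnet_prefix &&
	       gid.interface_id == def.interface_id;
}
```

For the MAC address 00:11:22:33:44:55 this yields fe80::211:22ff:fe33:4455.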

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/core_priv.h  | 10 
 drivers/infiniband/core/roce_gid_cache.c | 86 
 drivers/infiniband/core/roce_gid_mgmt.c  | 43 +---
 include/net/addrconf.h   | 31 
 net/ipv6/addrconf.c  | 31 
 5 files changed, 163 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h 
b/drivers/infiniband/core/core_priv.h
index 12797d9..6ab40a9 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -84,6 +84,16 @@ int roce_gid_cache_find_gid_by_port(struct ib_device 
*ib_dev, union ib_gid *gid,
 
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
+enum roce_gid_cache_default_mode {
+   ROCE_GID_CACHE_DEFAULT_MODE_SET,
+   ROCE_GID_CACHE_DEFAULT_MODE_DELETE
+};
+
+void roce_gid_cache_set_default_gid(struct ib_device *ib_dev, u8 port,
+   struct net_device *ndev,
+   unsigned long gid_type_mask,
+   enum roce_gid_cache_default_mode mode);
+
 int roce_gid_cache_setup(void);
 void roce_gid_cache_cleanup(void);
 
diff --git a/drivers/infiniband/core/roce_gid_cache.c 
b/drivers/infiniband/core/roce_gid_cache.c
index 2b0a310..2bd663f 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -34,6 +34,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <rdma/ib_cache.h>
+#include <net/addrconf.h>
 
 #include "core_priv.h"
 
@@ -176,12 +177,19 @@ static int find_gid(struct ib_roce_gid_cache *cache, 
union ib_gid *gid,
return -1;
 }
 
+static void make_default_gid(struct  net_device *dev, union ib_gid *gid)
+{
+	gid->global.subnet_prefix = cpu_to_be64(0xfe80LL);
+	addrconf_ifid_eui48(&gid->raw[8], dev);
+}
+
 int roce_add_gid(struct ib_device *ib_dev, u8 port,
 union ib_gid *gid, struct ib_gid_attr *attr)
 {
struct ib_roce_gid_cache *cache;
int ix;
int ret = 0;
+   struct net_device *idev;
 
 	if (!ib_dev->cache.roce_gid_cache)
 		return -ENOSYS;
@@ -191,6 +199,22 @@ int roce_add_gid(struct ib_device *ib_dev, u8 port,
 	if (!cache || !cache->active)
 		return -ENOSYS;
 
+	if (ib_dev->get_netdev) {
+		rcu_read_lock();
+		idev = ib_dev->get_netdev(ib_dev, port);
+		if (idev && attr->ndev != idev) {
+			union ib_gid default_gid;
+
+			/* Adding default GIDs in not permitted */
+			make_default_gid(idev, &default_gid);
+			if (!memcmp(gid, &default_gid, sizeof(*gid))) {
+				rcu_read_unlock();
+				return -EPERM;
+			}
+		}
+		rcu_read_unlock();
+	}
+
 	mutex_lock(&cache->lock);
 
ix = find_gid(cache, gid, attr, GID_ATTR_FIND_MASK_GID_TYPE |
@@ -215,6 +239,7 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
 union ib_gid *gid, struct ib_gid_attr *attr)
 {
struct ib_roce_gid_cache *cache;
+   union ib_gid default_gid;
int ix;
 
 	if (!ib_dev->cache.roce_gid_cache)
@@ -225,6 +250,13 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
 	if (!cache || !cache->active)
 		return -ENOSYS;
 
+	if (attr->ndev) {
+		/* Deleting default GIDs in not permitted */
+		make_default_gid(attr->ndev, &default_gid);
+		if (!memcmp(gid, &default_gid, sizeof(*gid)))
+			return -EPERM;
+	}
+
 	mutex_lock(&cache->lock);
 
ix = find_gid(cache, gid, attr,
@@ -437,6 +469,60 @@ static void set_roce_gid_cache_active(struct ib_roce_gid_cache *cache,
 	cache->active = active;
 }
 
+void roce_gid_cache_set_default_gid(struct ib_device *ib_dev, u8 port,
+   struct net_device *ndev,
+   unsigned long gid_type_mask,
+   enum roce_gid_cache_default_mode mode)
+{
+   union ib_gid gid;
+   struct ib_gid_attr gid_attr;
+   struct ib_roce_gid_cache *cache;
+   unsigned int gid_type;
+   unsigned int gid_index = 0;
+
+   cache  = ib_dev-cache.roce_gid_cache

[PATCH v2 for-next 12/32] IB/core: Add rdma_network_type to wc

2015-03-10 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

Providers should tell IB core the wc's network type.
This is used in order to search for the proper GID in the
GID table. When using HCAs that can't provide this info,
IB core tries to examine the packet in depth and extract
the GID type itself.
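
The fallback detection can be sketched in user-space C as follows, following the logic of ib_get_grh_header_version() in the diff below: the 40-byte header area is read both as an IPv6 header and as an IPv4 header placed at offset 20, and the IPv4 header checksum breaks the tie. The toy_* names are hypothetical, and the sketch works on raw bytes rather than on struct iphdr and struct ipv6hdr.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ones'-complement checksum of a 20-byte IPv4 header, computed as if
 * the checksum field (bytes 10-11) were zero. Host byte order result.
 */
static uint16_t toy_ipv4_csum20(const uint8_t *hdr)
{
	uint32_t sum = 0;
	int i;

	for (i = 0; i < 20; i += 2) {
		if (i == 10)
			continue;	/* skip the stored checksum */
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	}
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Returns 6, 4, or 0 (unknown) for a 40-byte GRH-sized header area. */
static int toy_grh_header_version(const uint8_t buf[40])
{
	const uint8_t *ip4 = buf + 20;	/* IPv4 header sits in the last 20 bytes */
	unsigned int ip6_version = buf[0] >> 4;
	unsigned int ip4_version = ip4[0] >> 4;
	unsigned int ip4_ihl = ip4[0] & 0x0f;
	uint16_t stored;

	if (ip6_version != 6)
		return ip4_version == 4 ? 4 : 0;
	/* The version nibble says 6, but the bytes may still be IPv4. */
	if (ip4_ihl != 5)
		return 6;
	stored = (uint16_t)((ip4[10] << 8) | ip4[11]);
	/* If the IPv4 header checksum is valid, trust it. */
	return stored == toy_ipv4_csum20(ip4) ? 4 : 6;
}
```

The ambiguous case is a buffer whose first nibble is 6 while the bytes at offset 20 also look like an IPv4 header; there, only a matching checksum makes the sketch answer 4.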

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/verbs.c | 106 ++--
 include/rdma/ib_verbs.h |  30 
 2 files changed, 131 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2f5fd7a..2e7ccad 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -195,8 +195,84 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 }
 EXPORT_SYMBOL(ib_create_ah);
 
+static int ib_get_grh_header_version(const void *h)
+{
+   const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+   struct iphdr ip4h_checked;
+   const struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+
+   if (ip6h->version != 6)
+   return (ip4h->version == 4) ? 4 : 0;
+   /* version may be 6 or 4 */
+   if (ip4h->ihl != 5) /* IPv4 header length must be 5 for RR */
+   return 6;
+   /* Verify checksum.
+  We can't write on scattered buffers so we need to copy to
+  temp buffer.
+*/
+   memcpy(&ip4h_checked, ip4h, sizeof(ip4h_checked));
+   ip4h_checked.check = 0;
+   ip4h_checked.check = ip_fast_csum((u8 *)&ip4h_checked, 5);
+   /* if IPv4 header checksum is OK, believe it */
+   if (ip4h->check == ip4h_checked.check)
+   return 4;
+   return 6;
+}
+
+static int ib_get_dgid_sgid_by_grh(const void *h,
+  enum rdma_network_type net_type,
+  union ib_gid *dgid, union ib_gid *sgid)
+{
+   switch (net_type) {
+   case RDMA_NETWORK_IPV4: {
+   const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+
+   ipv6_addr_set_v4mapped(ip4h->daddr, (struct in6_addr *)dgid);
+   ipv6_addr_set_v4mapped(ip4h->saddr, (struct in6_addr *)sgid);
+   return 0;
+   }
+   case RDMA_NETWORK_IPV6: {
+   struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+
+   memcpy(dgid, &ip6h->daddr, sizeof(*dgid));
+   memcpy(sgid, &ip6h->saddr, sizeof(*sgid));
+   return 0;
+   }
+   case RDMA_NETWORK_IB: {
+   struct ib_grh *grh = (struct ib_grh *)h;
+
+   memcpy(dgid, &grh->dgid, sizeof(*dgid));
+   memcpy(sgid, &grh->sgid, sizeof(*sgid));
+   return 0;
+   }
+   }
+
+   return -EINVAL;
+}
+
+static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
+u8 port_num,
+const struct ib_grh *grh)
+{
+   int grh_version;
+
+   if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND)
+   return RDMA_NETWORK_IB;
+
+   grh_version = ib_get_grh_header_version(grh);
+
+   if (grh_version == 4)
+   return RDMA_NETWORK_IPV4;
+
+   if (grh->next_hdr == IPPROTO_UDP)
+   return RDMA_NETWORK_IPV6;
+
+   return RDMA_NETWORK_IB;
+}
+
 struct find_gid_index_context {
u16 vlan_id;
+   enum ib_gid_type gid_type;
 };
 
 static bool find_gid_index(const union ib_gid *gid,
@@ -206,6 +282,9 @@ static bool find_gid_index(const union ib_gid *gid,
struct find_gid_index_context *ctx =
(struct find_gid_index_context *)context;
 
+   if (ctx->gid_type != gid_attr->gid_type)
+   return false;
+
if ((!!(ctx->vlan_id != 0xffff) == !is_vlan_dev(gid_attr->ndev)) ||
(is_vlan_dev(gid_attr->ndev) &&
 vlan_dev_vlan_id(gid_attr->ndev) != ctx->vlan_id))
@@ -216,9 +295,11 @@ static bool find_gid_index(const union ib_gid *gid,
 
 static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num,
   u16 vlan_id, union ib_gid *sgid,
+  enum ib_gid_type gid_type,
   u16 *gid_index)
 {
-   struct find_gid_index_context context = {.vlan_id = vlan_id};
+   struct find_gid_index_context context = {.vlan_id = vlan_id,
+.gid_type = gid_type};
 
return ib_find_gid_by_filter(device, sgid, port_num, find_gid_index,
 &context, gid_index);
@@ -232,9 +313,24 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
int ret;
int is_eth = (rdma_port_get_link_layer(device, port_num) ==
IB_LINK_LAYER_ETHERNET);
+   enum rdma_network_type net_type = RDMA_NETWORK_IB

[PATCH v2 for-next 23/32] IB/mlx4: Advertise RoCE support in port capabilities

2015-03-10 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

The port capability flags should indicate which RoCE modes (V1 or V2)
the port supports. The mlx4 driver sets these flags according to the
capabilities reported by the HW.
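
As a rough sketch of that mapping (not the driver code itself), the following
stand-alone C function derives port flags from two device capability words.
Bit 22 for the RoCE V1/V2 capability matches the value added in this patch.
All other names and bit positions are hypothetical placeholders.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the device capability bits. */
#define DEV_CAP_IBOE		(1ULL << 0)	/* placeholder bit position */
#define DEV_CAP2_ROCE_V1_V2	(1ULL << 22)	/* bit 22, as in this patch */

/* Hypothetical stand-ins for the port capability flags. */
#define PORT_ROCE		(1u << 0)
#define PORT_ROCE_V2		(1u << 1)

/* Translate device capabilities into the RoCE bits of the port flags. */
static uint32_t roce_port_caps(uint64_t flags, uint64_t flags2)
{
	uint32_t caps = 0;

	if (flags & DEV_CAP_IBOE)
		caps |= PORT_ROCE;
	/* A device that can do V2 can also do V1, so both bits are set. */
	if (flags2 & DEV_CAP2_ROCE_V1_V2)
		caps |= PORT_ROCE_V2 | PORT_ROCE;
	return caps;
}
```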

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c   |  6 ++
 drivers/net/ethernet/mellanox/mlx4/fw.c |  5 -
 include/linux/mlx4/device.h | 13 ++---
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 32cd009..bf87a95 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -359,6 +359,12 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
IB_WIDTH_4X : IB_WIDTH_1X;
props->active_speed = IB_SPEED_QDR;
props->port_cap_flags   = IB_PORT_CM_SUP | IB_PORT_IP_BASED_GIDS;
+
+   if (mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE)
+   props->port_cap_flags   |= IB_PORT_ROCE;
+   if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2)
+   props->port_cap_flags   |= IB_PORT_ROCE_V2 | IB_PORT_ROCE;
+
props->gid_tbl_len  = mdev->dev->caps.gid_table_len[port];
props->max_msg_sz   = mdev->dev->caps.max_msg_sz;
props->pkey_tbl_len = 1;
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 3702fd1..d573e73 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -146,7 +146,8 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 flags)
[17] = "Asymmetric EQs support",
[18] = "More than 80 VFs support",
[19] = "Performance optimized for limited rule configuration flow steering support",
-   [21] = "Port Remap support"
+   [21] = "Port Remap support",
+   [22] = "RoCEv2 support"
};
int i;
 
@@ -852,6 +853,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_EQE_STRIDE;
MLX4_GET(dev_cap->bmme_flags, outbox,
 QUERY_DEV_CAP_BMME_FLAGS_OFFSET);
+   if (dev_cap->bmme_flags & MLX4_FLAG_ROCE_V1_V2)
+   dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_ROCE_V1_V2;
if (dev_cap->bmme_flags & MLX4_FLAG_PORT_REMAP)
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_PORT_REMAP;
MLX4_GET(field, outbox, QUERY_DEV_CAP_CONFIG_DEV_OFFSET);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 9a05e73..02dd6a0 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -202,7 +202,8 @@ enum {
MLX4_DEV_CAP_FLAG2_SYS_EQS  = 1LL <<  17,
MLX4_DEV_CAP_FLAG2_80_VFS   = 1LL <<  18,
MLX4_DEV_CAP_FLAG2_FS_A0    = 1LL <<  19,
-   MLX4_DEV_CAP_FLAG2_PORT_REMAP   = 1LL <<  21
+   MLX4_DEV_CAP_FLAG2_PORT_REMAP   = 1LL <<  21,
+   MLX4_DEV_CAP_FLAG2_ROCE_V1_V2   = 1LL <<  22
 };
 
 enum {
@@ -250,6 +251,7 @@ enum {
MLX4_BMME_FLAG_TYPE_2_WIN   = 1 <<  9,
MLX4_BMME_FLAG_RESERVED_LKEY= 1 << 10,
MLX4_BMME_FLAG_FAST_REG_WR  = 1 << 11,
+   MLX4_BMME_FLAG_ROCE_V1_V2   = 1 << 19,
MLX4_BMME_FLAG_PORT_REMAP   = 1 << 24,
MLX4_BMME_FLAG_VSD_INIT2RTR = 1 << 28,
 };
@@ -258,6 +260,10 @@ enum {
MLX4_FLAG_PORT_REMAP= MLX4_BMME_FLAG_PORT_REMAP
 };
 
+enum {
+   MLX4_FLAG_ROCE_V1_V2= MLX4_BMME_FLAG_ROCE_V1_V2
+};
+
 enum mlx4_event {
MLX4_EVENT_TYPE_COMP   = 0x00,
MLX4_EVENT_TYPE_PATH_MIG   = 0x01,
@@ -888,9 +894,10 @@ struct mlx4_mad_ifc {
if (((dev)->caps.port_mask[port] != MLX4_PORT_TYPE_IB))
 
 #define mlx4_foreach_ib_transport_port(port, dev) \
-   for ((port) = 1; (port) <= (dev)->caps.num_ports; (port)++)   \
+   for ((port) = 1; (port) <= (dev)->caps.num_ports; (port)++)   \
if (((dev)->caps.port_mask[port] == MLX4_PORT_TYPE_IB) || \
-   ((dev)->caps.flags & MLX4_DEV_CAP_FLAG_IBOE))
+   ((dev)->caps.flags & MLX4_DEV_CAP_FLAG_IBOE) || \
+   ((dev)->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2))
 
 #define MLX4_INVALID_SLAVE_ID  0xFF
 
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 for-next 14/32] IB/Core: Changes to the IB Core infrastructure for RoCEv2 support

2015-03-10 Thread Somnath Kotur
1. Choose the sgid_index and GID type from all the matching entries in
   RDMA-CM, based on a hint from the IP stack.
2. Set hop_limit for the IP packet based on the same hint from the IP stack.
3. Define an RDMA_NETWORK enum type.
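
The steps above can be sketched in stand-alone C as follows. This is only an
illustration of the idea: the type and function names are hypothetical, the
mapping of both IP network types to a RoCEv2 GID type is an assumption about
what ib_network_to_gid_type() does in this series, and DEFAULT_HOPLIMIT mirrors
the value of IPV6_DEFAULT_HOPLIMIT (64) in Linux.

```c
#include <stdbool.h>
#include <stdint.h>

enum net_type { NET_IB, NET_IPV4, NET_IPV6 };
enum gid_type { GID_TYPE_IB, GID_TYPE_ROCE_V2 };

#define DEFAULT_HOPLIMIT 64	/* value of IPV6_DEFAULT_HOPLIMIT in Linux */

struct path_hint {
	enum gid_type gid_type;
	uint8_t hop_limit;
};

/* The IP stack's hint: a route through a gateway needs a routable GID. */
static enum net_type net_type_from_route(bool is_ipv6, bool uses_gateway)
{
	if (!uses_gateway)
		return NET_IB;
	return is_ipv6 ? NET_IPV6 : NET_IPV4;
}

/* Pick the GID type and hop limit for a path from the network type. */
static struct path_hint path_from_net_type(enum net_type net)
{
	struct path_hint p = { GID_TYPE_IB, 1 };	/* link-local default */

	if (net != NET_IB) {
		p.gid_type = GID_TYPE_ROCE_V2;	/* assumed mapping */
		p.hop_limit = DEFAULT_HOPLIMIT;
	}
	return p;
}
```

Keeping hop_limit at 1 for the non-routed case preserves the previous
behaviour, where packets were never expected to cross a router.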

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
---
 drivers/infiniband/core/addr.c  |  8 +
 drivers/infiniband/core/cma.c   | 10 +-
 drivers/infiniband/core/verbs.c | 77 ++---
 include/rdma/ib_addr.h  |  1 +
 include/rdma/ib_verbs.h |  9 +
 5 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 43af7f5..da24c0e 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -257,6 +257,9 @@ static int addr4_resolve(struct sockaddr_in *src_in,
goto put;
}
 
+   if (rt->rt_uses_gateway)
+   addr->network = RDMA_NETWORK_IPV4;
+
ret = dst_fetch_ha(&rt->dst, addr, &fl4.daddr);
 put:
ip_rt_put(rt);
@@ -271,6 +274,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 {
struct flowi6 fl6;
struct dst_entry *dst;
+   struct rt6_info *rt;
int ret;
 
memset(&fl6, 0, sizeof fl6);
@@ -282,6 +286,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
if ((ret = dst->error))
goto put;
 
+   rt = (struct rt6_info *)dst;
if (ipv6_addr_any(&fl6.saddr)) {
ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
 &fl6.daddr, 0, &fl6.saddr);
@@ -305,6 +310,9 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
goto put;
}
 
+   if (rt->rt6i_flags & RTF_GATEWAY)
+   addr->network = RDMA_NETWORK_IPV6;
+
ret = dst_fetch_ha(dst, addr, &fl6.daddr);
 put:
dst_release(dst);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1705280..2bfe798 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1952,6 +1952,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 {
struct rdma_route *route = &id_priv->id.route;
struct rdma_addr *addr = &route->addr;
+   enum ib_gid_type network_gid_type;
struct cma_work *work;
int ret;
struct net_device *ndev = NULL;
@@ -1990,7 +1991,14 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
&route->path_rec->dgid);
 
-   route->path_rec->hop_limit = 1;
+   /* Use the hint from IP Stack to select GID Type */
+   network_gid_type = ib_network_to_gid_type(addr->dev_addr.network);
+   if (addr->dev_addr.network != RDMA_NETWORK_IB) {
+   route->path_rec->gid_type = network_gid_type;
+   route->path_rec->hop_limit = IPV6_DEFAULT_HOPLIMIT;
+   } else {
+   route->path_rec->hop_limit = 1;
+   }
route->path_rec->reversible = 1;
route->path_rec->pkey = cpu_to_be16(0xffff);
route->path_rec->mtu_selector = IB_SA_EQ;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2e7ccad..3586996 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -195,11 +195,11 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 }
 EXPORT_SYMBOL(ib_create_ah);
 
-static int ib_get_grh_header_version(const void *h)
+static int ib_get_grh_header_version(const union rdma_network_hdr *h)
 {
-   const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+   const struct iphdr *ip4h = (struct iphdr *)&h->roce4grh;
struct iphdr ip4h_checked;
-   const struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+   const struct ipv6hdr *ip6h = (struct ipv6hdr *)&h->ibgrh;
 
if (ip6h->version != 6)
return (ip4h->version == 4) ? 4 : 0;
@@ -219,37 +219,6 @@ static int ib_get_grh_header_version(const void *h)
return 6;
 }
 
-static int ib_get_dgid_sgid_by_grh(const void *h,
-  enum rdma_network_type net_type,
-  union ib_gid *dgid, union ib_gid *sgid)
-{
-   switch (net_type) {
-   case RDMA_NETWORK_IPV4: {
-   const struct iphdr *ip4h = (struct iphdr *)(h + 20);
-
-   ipv6_addr_set_v4mapped(ip4h->daddr, (struct in6_addr *)dgid);
-   ipv6_addr_set_v4mapped(ip4h->saddr, (struct in6_addr *)sgid);
-   return 0;
-   }
-   case RDMA_NETWORK_IPV6: {
-   struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
-
-   memcpy(dgid, &ip6h->daddr, sizeof(*dgid));
-   memcpy(sgid, &ip6h->saddr, sizeof(*sgid));
-   return 0;
-   }
-   case RDMA_NETWORK_IB: {
-   struct ib_grh *grh = (struct ib_grh *)h;
-
-   memcpy(dgid

[PATCH v2 for-next 26/32] IB/mlx4: Configure device to work in RoCEv2

2015-03-10 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

Some mlx4 adapters are RoCEv2 capable. Enabling this feature requires
the following hardware configuration:

1. Set the port general parameters
2. Configure the outgoing UDP destination port
3. Configure the QPs that work with RoCEv2

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c | 10 +++-
 drivers/infiniband/hw/mlx4/qp.c   | 40 +++
 drivers/net/ethernet/mellanox/mlx4/fw.c   | 16 -
 drivers/net/ethernet/mellanox/mlx4/mlx4.h |  3 ++-
 drivers/net/ethernet/mellanox/mlx4/port.c |  9 ++-
 drivers/net/ethernet/mellanox/mlx4/qp.c   | 27 +
 include/linux/mlx4/device.h   |  3 ++-
 include/linux/mlx4/qp.h   | 15 ++--
 8 files changed, 112 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 9d651cf..53c855b 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2166,7 +2166,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
if (mlx4_ib_init_sriov(ibdev))
goto err_mad;
 
-   if (dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE) {
+   if (dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE ||
+   dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) {
if (!iboe->nb.notifier_call) {
iboe->nb.notifier_call = mlx4_ib_netdev_event;
err = register_netdevice_notifier(&iboe->nb);
@@ -2175,6 +2176,13 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
goto err_notif;
}
}
+   if (!mlx4_is_slave(dev) &&
+   dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) {
+   err = mlx4_config_roce_v2_port(dev, ROCE_V2_UDP_DPORT);
+   if (err) {
+   goto err_notif;
+   }
+   }
}
 
for (j = 0; j < ARRAY_SIZE(mlx4_class_attributes); ++j) {
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 6f6d0db..847f9ec 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1408,6 +1408,24 @@ static int handle_eth_ud_smac_index(struct mlx4_ib_dev *dev,
return 0;
 }
 
+enum {
+   MLX4_QPC_ROCE_MODE_1 = 0,
+   MLX4_QPC_ROCE_MODE_2 = 2,
+   MLX4_QPC_ROCE_MODE_MAX = 0xff
+};
+
+static u8 gid_type_to_qpc(enum ib_gid_type gid_type)
+{
+   switch (gid_type) {
+   case IB_GID_TYPE_IB:
+   return MLX4_QPC_ROCE_MODE_1;
+   case IB_GID_TYPE_ROCE_V2:
+   return MLX4_QPC_ROCE_MODE_2;
+   default:
+   return MLX4_QPC_ROCE_MODE_MAX;
+   }
+}
+
 static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
   const struct ib_qp_attr *attr, int attr_mask,
   enum ib_qp_state cur_state, enum ib_qp_state new_state)
@@ -1531,12 +1549,14 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
u16 vlan = 0xffff;
u8 smac[ETH_ALEN];
int status = 0;
+   int is_eth = rdma_port_get_link_layer(&dev->ib_dev, qp->port) ==
+   IB_LINK_LAYER_ETHERNET;
 
-   if (rdma_port_get_link_layer(&dev->ib_dev, qp->port) ==
-   IB_LINK_LAYER_ETHERNET &&
-   attr->ah_attr.ah_flags & IB_AH_GRH) {
+   if (is_eth && attr->ah_attr.ah_flags & IB_AH_GRH) {
int index = attr->ah_attr.grh.sgid_index;
 
+   if (mlx4_is_bonded(dev->dev))
+   port_num  = 1;
rcu_read_lock();
status = ib_get_cached_gid(ibqp->device, port_num,
   index, &gid, &gid_attr);
@@ -1555,8 +1575,20 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
  port_num, vlan, smac))
goto out;
 
+   if (is_eth && gid_attr.gid_type == IB_GID_TYPE_ROCE_V2)
+   context->pri_path.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+
optpar |= (MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH |
   MLX4_QP_OPTPAR_SCHED_QUEUE);
+
+   if (is_eth && (cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR)) {
+   u8 qpc_roce_mode = gid_type_to_qpc(gid_attr.gid_type);
+
+   if (qpc_roce_mode == MLX4_QPC_ROCE_MODE_MAX)
+   goto out;
+   context->rlkey_roce_mode |= (qpc_roce_mode << 6);
+   }
+
}
 
if (attr_mask & IB_QP_TIMEOUT) {
@@ -1728,7 +1760,7 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
sqd_event = 0;
 
if (!ibqp->uobject

[PATCH v2 for-next 09/32] IB/core: Support find sgid index using a filter function

2015-03-10 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Sometimes an sgid index needs to be found based on variable parameters.
For example, when the CM gets a packet from the network, it needs to
find an sgid_index that matches the appropriate L2 attributes
of the packet. Extending the cache's API to include Ethernet L2
attributes is problematic, since they may be vastly extended
in the future. As a result, we add a find function that
takes a user filter function and searches the GID table
until a match is found.
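
To make the idea concrete, here is a small user-space sketch of a filter-based
table search. It is not the kernel implementation (which also has to deal with
locking and concurrent table updates). The struct layouts, find_gid_by_filter(),
match_vlan() and demo_lookup() are all hypothetical names invented for this
example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct gid { uint8_t raw[16]; };
struct gid_attr { uint16_t vlan_id; };	/* 0xffff means "no VLAN" here */
struct gid_entry { struct gid gid; struct gid_attr attr; };

typedef bool (*gid_filter)(const struct gid *gid,
			   const struct gid_attr *attr, void *context);

/* Return 0 and set *index for the first entry that matches both the GID
 * and the caller's filter; return -ENOENT when nothing matches. */
static int find_gid_by_filter(const struct gid_entry *table, size_t len,
			      const struct gid *gid, gid_filter filter,
			      void *context, uint16_t *index)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (memcmp(&table[i].gid, gid, sizeof(*gid)))
			continue;
		if (filter && !filter(gid, &table[i].attr, context))
			continue;
		*index = (uint16_t)i;
		return 0;
	}
	return -ENOENT;
}

/* Example filter: accept only entries on the VLAN passed via context. */
static bool match_vlan(const struct gid *gid, const struct gid_attr *attr,
		       void *context)
{
	(void)gid;
	return attr->vlan_id == *(const uint16_t *)context;
}

/* Look up one GID that appears twice in a demo table, once per VLAN.
 * Returns the matching index, or a negative errno. */
static int demo_lookup(uint16_t vlan_id)
{
	static const struct gid_entry table[] = {
		{ { { 0xfe, 0x80 } }, { 0xffff } },
		{ { { 0xfe, 0x80 } }, { 100 } },
	};
	const struct gid wanted = { { 0xfe, 0x80 } };
	uint16_t index;
	int ret;

	ret = find_gid_by_filter(table, sizeof(table) / sizeof(table[0]),
				 &wanted, match_vlan, &vlan_id, &index);
	return ret ? ret : index;
}
```

The point of the callback is that new L2 attributes can be matched by writing
a new filter, without changing the signature of the search function.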

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/cache.c  | 24 
 drivers/infiniband/core/core_priv.h  |  9 +
 drivers/infiniband/core/roce_gid_cache.c | 66 
 include/rdma/ib_cache.h  | 27 +
 4 files changed, 126 insertions(+)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 882d491..ae86fe8 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -273,6 +273,30 @@ int ib_find_cached_gid_by_port(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_find_cached_gid_by_port);
 
+int ib_find_gid_by_filter(struct ib_device *device,
+ union ib_gid *gid,
+ u8 port_num,
+ bool (*filter)(const union ib_gid *gid,
+const struct ib_gid_attr *,
+void *),
+ void *context, u16 *index)
+{
+   /* Look for a RoCE device with the specified GID. */
+   if (!ib_cache_use_roce_gid_cache(device, port_num))
+   return roce_gid_cache_find_gid_by_filter(device, gid,
+port_num, filter,
+context, index);
+
+   /* Only RoCE GID cache supports filter function */
+   if (filter)
+   return -ENOSYS;
+
+   /* If no RoCE devices with the specified GID, look for IB device. */
+   return __ib_find_cached_gid_by_port(device, port_num,
+   gid, index);
+}
+EXPORT_SYMBOL(ib_find_gid_by_filter);
+
 int ib_get_cached_pkey(struct ib_device *device,
   u8port_num,
   int   index,
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 411672f..d6e73f8 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -84,6 +84,15 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,
enum ib_gid_type gid_type, u8 port,
struct net *net, int if_index, u16 *index);
 
+int roce_gid_cache_find_gid_by_filter(struct ib_device *ib_dev,
+ union ib_gid *gid,
+ u8 port,
+ bool (*filter)(const union ib_gid *gid,
+const struct ib_gid_attr *,
+void *),
+ void *context,
+ u16 *index);
+
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
 enum roce_gid_cache_default_mode {
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
index 5c109f7..ee9ac4d 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -436,6 +436,72 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,
return -ENOENT;
 }
 
+int roce_gid_cache_find_gid_by_filter(struct ib_device *ib_dev,
+ union ib_gid *gid,
+ u8 port,
+ bool (*filter)(const union ib_gid *,
+const struct ib_gid_attr *,
+void *),
+ void *context,
+ u16 *index)
+{
+   struct ib_roce_gid_cache *cache;
+   unsigned int i;
+   bool found = false;
+
+   if (!ib_dev->cache.roce_gid_cache)
+   return -ENOSYS;
+
+   if (port < start_port(ib_dev) ||
+   port > start_port(ib_dev) + ib_dev->phys_port_cnt ||
+   rdma_port_get_link_layer(ib_dev, port) !=
+   IB_LINK_LAYER_ETHERNET)
+   return -ENOSYS;
+
+   cache = ib_dev->cache.roce_gid_cache[port - start_port(ib_dev)];
+
+   if (!cache || !cache->active)
+   return -ENOENT;
+
+   for (i = 0; i < cache->sz; i++) {
+   unsigned int orig_seq;
+   struct ib_gid_attr attr

[PATCH v2 for-next 27/32] IB/mlx4: Translate cache gid index to real index

2015-03-10 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

When a QP is modified with a path, the given sgid_index is not necessarily
the index that the HW knows. This is due to optimizations that can save
space in the HW table. Therefore, translation is required.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/qp.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 847f9ec..d7d7c5a 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1256,14 +1256,18 @@ static int _mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
path->static_rate = 0;
 
if (ah->ah_flags & IB_AH_GRH) {
-   if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len[port]) {
+   int real_sgid_index = mlx4_ib_gid_index_to_real_index(dev,
+ port,
+ ah->grh.sgid_index);
+
+   if (real_sgid_index >= dev->dev->caps.gid_table_len[port]) {
pr_err("sgid_index (%u) too large. max is %d\n",
-  ah->grh.sgid_index, dev->dev->caps.gid_table_len[port] - 1);
+  real_sgid_index, dev->dev->caps.gid_table_len[port] - 1);
return -1;
}
 
path->grh_mylmc |= 1 << 7;
-   path->mgid_index = ah->grh.sgid_index;
+   path->mgid_index = real_sgid_index;
path->hop_limit  = ah->grh.hop_limit;
path->tclass_flowlabel =
cpu_to_be32((ah->grh.traffic_class << 20) |
-- 
2.1.0



[PATCH v2 for-next 11/32] IB/core: Add gid_type to path and rdma_id_private

2015-03-10 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

When using the RDMA CM, we want to take the gid_type from
the rdma_id_private. This is a prerequisite for adding
a user-space/configfs API that sets the gid_type of
a CM connection.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/cm.c  | 19 ++-
 drivers/infiniband/core/cma.c |  2 ++
 drivers/infiniband/core/sa_query.c|  3 ++-
 drivers/infiniband/core/uverbs_marshall.c |  1 +
 include/rdma/ib_sa.h  |  1 +
 5 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 7974e74..22dac05 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -358,9 +358,8 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
read_lock_irqsave(&cm.device_lock, flags);
list_for_each_entry(cm_dev, &cm.device_list, list) {
if (!ib_find_cached_gid(cm_dev->ib_device, &path->sgid,
-   IB_GID_TYPE_IB, path->net,
-   path->ifindex,
-   &p, NULL)) {
+   path->gid_type, path->net,
+   path->ifindex, &p, NULL)) {
port = cm_dev->port[p-1];
break;
}
@@ -1521,6 +1520,8 @@ static int cm_req_handler(struct cm_work *work)
struct ib_cm_id *cm_id;
struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
struct cm_req_msg *req_msg;
+   union ib_gid gid;
+   struct ib_gid_attr gid_attr;
int ret;
 
req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
@@ -1560,11 +1561,19 @@ static int cm_req_handler(struct cm_work *work)
cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
 
memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, ETH_ALEN);
-   ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
+   ret = ib_get_cached_gid(work->port->cm_dev->ib_device,
+   work->port->port_num,
+   cm_id_priv->av.ah_attr.grh.sgid_index,
+   &gid, &gid_attr);
+   if (!ret) {
+   work->path[0].gid_type = gid_attr.gid_type;
+   ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
+   }
if (ret) {
ib_get_cached_gid(work->port->cm_dev->ib_device,
  work->port->port_num, 0, &work->path[0].sgid,
- NULL);
+ &gid_attr);
+   work->path[0].gid_type = gid_attr.gid_type;
ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
   &work->path[0].sgid, sizeof work->path[0].sgid,
   NULL, 0);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 659676c..9afa410 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -146,6 +146,7 @@ struct rdma_id_private {
u8  tos;
u8  reuseaddr;
u8  afonly;
+   enum ib_gid_typegid_type;
 };
 
 struct cma_multicast {
@@ -1936,6 +1937,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
ndev = dev_get_by_index(&init_net, addr->dev_addr.bound_dev_if);
route->path_rec->net = &init_net;
route->path_rec->ifindex = addr->dev_addr.bound_dev_if;
+   route->path_rec->gid_type = id_priv->gid_type;
}
if (!ndev) {
ret = -ENODEV;
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 705b6b8..f770049 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -546,7 +546,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
ah_attr->ah_flags = IB_AH_GRH;
ah_attr->grh.dgid = rec->dgid;
 
-   ret = ib_find_cached_gid(device, &rec->sgid, IB_GID_TYPE_IB,
+   ret = ib_find_cached_gid(device, &rec->sgid, rec->gid_type,
 rec->net, rec->ifindex, &port_num,
 &gid_index);
if (ret)
@@ -676,6 +676,7 @@ static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query,
  mad->data, &rec);
rec.net = NULL;
rec.ifindex = 0;
+   rec.gid_type = IB_GID_TYPE_IB;
memset(rec.dmac, 0, ETH_ALEN);
query->callback(status, &rec, query->context);
} else
diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c
index 7d2f14c..af020f8 100644

[PATCH v2 for-next 00/32] RoCE V1/v2 per GID

2015-03-10 Thread Somnath Kotur
 to a private header
(9) Support non-configfs configurations

Devesh Sharma (3):
  RDMA/ocrdma: changes to support RoCE-v2 in UD path
  RDMA/ocrdma: changes to support RoCE-v2 in RC path
  RDMA/ocrdma: changes to support user AH creation

Maor Gottlieb (1):
  net/mlx4_core: Add handling of R-RoCE over IPV4 in qp attach flow

Matan Barak (13):
  IB/core: Add RoCE GID cache
  IB/core: Add kref to IB devices
  IB/core: Add RoCE GID population
  IB/core: Add default GID for RoCE GID Cache
  net/bonding: make DRV macros private
  IB/core: Add RoCE cache bonding support
  IB/core: GID attribute should be returned from verbs API and cache API
  IB/core: Report gid_type and gid_ndev through sysfs
  IB/core: Support find sgid index using a filter function
  IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
  IB/core: Add gid_type to path and rdma_id_private
  IB/core: Add rdma_network_type to wc
  IB/cma: Add configfs for rdma_cm

Moni Shoua (13):
  IB/mlx4: Remove gid table management for RoCE
  IB/mlx4: Replace spin_lock with rw_semaphore
  IB/mlx4: Lock with RCU instead of RTNL
  net/mlx4: Postpone the registration of net_device
  IB/mlx4: Advertise RoCE support in port capabilities
  IB/mlx4: Implement ib_device callback - get_netdev
  IB/mlx4: Implement ib_device callback - modify_gid
  IB/mlx4: Configure device to work in RoCEv2
  IB/mlx4: Translate cache gid index to real index
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
  IB/mlx4: Create and use another QP1 for RoCEv2
  IB/cma: Join and leave multicast groups with IGMP

Somnath Kotur (2):
  IB/Core: Changes to the IB Core infrastructure for RoCEv2 support
  RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table
mgmt to IB/Core.

 drivers/infiniband/core/Makefile   |   5 +-
 drivers/infiniband/core/addr.c |  11 +-
 drivers/infiniband/core/cache.c| 249 ++--
 drivers/infiniband/core/cm.c   |  49 +-
 drivers/infiniband/core/cma.c  | 233 ++--
 drivers/infiniband/core/cma_configfs.c | 222 
 drivers/infiniband/core/core_priv.h|  88 ++-
 drivers/infiniband/core/device.c   | 150 -
 drivers/infiniband/core/mad.c  |   2 +-
 drivers/infiniband/core/multicast.c|  17 +-
 drivers/infiniband/core/roce_gid_cache.c   | 755 
 drivers/infiniband/core/roce_gid_mgmt.c| 757 +
 drivers/infiniband/core/sa_query.c |  12 +-
 drivers/infiniband/core/sysfs.c| 186 +-
 drivers/infiniband/core/ucma.c |   1 -
 drivers/infiniband/core/ud_header.c| 153 -
 drivers/infiniband/core/uverbs_cmd.c   |   3 +-
 drivers/infiniband/core/uverbs_marshall.c  |   5 +-
 drivers/infiniband/core/verbs.c| 266 ++---
 drivers/infiniband/hw/mlx4/ah.c|  15 +-
 drivers/infiniband/hw/mlx4/mad.c   |  12 +-
 drivers/infiniband/hw/mlx4/main.c  | 756 +---
 drivers/infiniband/hw/mlx4/mcg.c   |   2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |  33 +-
 drivers/infiniband/hw/mlx4/qp.c| 337 ---
 drivers/infiniband/hw/mthca/mthca_av.c |   2 +-
 drivers/infiniband/hw/mthca/mthca_qp.c |   2 +-
 drivers/infiniband/hw/ocrdma/ocrdma.h  |  12 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c   |  94 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h   |   5 +-
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c   |  50 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c | 233 +---
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h  |  18 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c|  55 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h|   4 +
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c|   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c  |   3 +-
 drivers/net/bonding/bond_main.c|   2 +
 drivers/net/bonding/bond_options.c |  13 -
 drivers/net/bonding/bond_procfs.c  |   1 +
 drivers/net/bonding/bonding_priv.h |  26 +
 drivers/net/ethernet/mellanox/mlx4/en_main.c   |  36 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c|  21 +-
 drivers/net/ethernet/mellanox/mlx4/intf.c  |   3 +
 drivers/net/ethernet/mellanox/mlx4/main.c  |  18 +
 drivers/net/ethernet/mellanox/mlx4/mcg.c   |  14 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |   3 +-
 drivers/net/ethernet/mellanox/mlx4/port.c  |   9 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c|  27 +
 include/linux/mlx4/cmd.h   |   3 +-
 include/linux/mlx4/device.h|  23 +-
 include/linux/mlx4/driver.h

[PATCH v2 for-next 07/32] IB/core: GID attribute should be returned from verbs API and cache API

2015-03-10 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Along with the GID itself, we now store the GID's attributes.
These attributes contain important meta information about
the GID, for example the netdevice. This information therefore
needs to be returned by the APIs. This patch changes the following APIs:
(a) ib_get_cached_gid
(b) ib_find_cached_gid
(c) ib_find_cached_gid_by_port
(d) ib_query_gid

It also changes the callers of those APIs and uses the RoCE GID cache
when needed.
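
A simplified, stand-alone sketch of the resulting lookup flow might look like
the following. It only models the dispatch between the two caches and the
default attribute returned for IB ports. The names use_roce_cache() and
cached_gid_attr() and the struct layout are hypothetical, and the actual table
lookups are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum gid_type { GID_TYPE_IB, GID_TYPE_ROCE_V2 };
enum link_layer { LINK_IB, LINK_ETH };

struct gid_attr {
	enum gid_type gid_type;
	void *ndev;		/* stand-in for the associated netdevice */
};

/* 0: use the RoCE cache; -EAGAIN: Ethernet port whose RoCE cache is not
 * set up yet; -EINVAL: not an Ethernet port, use the IB cache instead. */
static int use_roce_cache(enum link_layer ll, int roce_cache_ready)
{
	if (ll == LINK_ETH)
		return roce_cache_ready ? 0 : -EAGAIN;
	return -EINVAL;
}

/* Fill *attr for a cached GID. roce_attr models what the RoCE cache
 * would return for the requested entry. */
static int cached_gid_attr(enum link_layer ll, int roce_cache_ready,
			   const struct gid_attr *roce_attr,
			   struct gid_attr *attr)
{
	int ret = use_roce_cache(ll, roce_cache_ready);

	if (!ret) {
		*attr = *roce_attr;
		return 0;
	}
	if (ret == -EAGAIN)
		return ret;

	/* IB port: no per-GID metadata is stored, report defaults. */
	memset(attr, 0, sizeof(*attr));
	attr->gid_type = GID_TYPE_IB;
	return 0;
}
```

Callers that do not care about the attributes can keep ignoring them, while
RoCE-aware callers get the GID type and netdevice in the same call.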

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/cache.c| 225 +
 drivers/infiniband/core/cm.c   |   6 +-
 drivers/infiniband/core/cma.c  |  84 ++---
 drivers/infiniband/core/device.c   |  29 +++-
 drivers/infiniband/core/mad.c  |   2 +-
 drivers/infiniband/core/multicast.c|   3 +-
 drivers/infiniband/core/sa_query.c |   7 +-
 drivers/infiniband/core/sysfs.c|   2 +-
 drivers/infiniband/core/uverbs_marshall.c  |   4 +-
 drivers/infiniband/core/verbs.c|   7 +-
 drivers/infiniband/hw/mlx4/qp.c|   5 +-
 drivers/infiniband/hw/mthca/mthca_av.c |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c|   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c  |   3 +-
 include/rdma/ib_cache.h|  44 -
 include/rdma/ib_sa.h   |   4 +-
 include/rdma/ib_verbs.h|   7 +-
 19 files changed, 352 insertions(+), 88 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 80f6cf2..882d491 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -42,6 +42,8 @@
 
 #include "core_priv.h"
 
+#define __IB_ONLY
+
 struct ib_pkey_cache {
int table_len;
u16 table[0];
@@ -69,16 +71,16 @@ static inline int end_port(struct ib_device *device)
0 : device->phys_port_cnt;
 }
 
-int ib_get_cached_gid(struct ib_device *device,
- u8port_num,
- int   index,
- union ib_gid *gid)
+static int __IB_ONLY __ib_get_cached_gid(struct ib_device *device,
+u8 port_num,
+int   index,
+union ib_gid *gid)
 {
struct ib_gid_cache *cache;
unsigned long flags;
int ret = 0;
 
-   if (port_num < start_port(device) || port_num > end_port(device))
+   if (!device->cache.gid_cache)
return -EINVAL;
 
read_lock_irqsave(&device->cache.lock, flags);
@@ -94,43 +96,183 @@ int ib_get_cached_gid(struct ib_device *device,
 
return ret;
 }
+
+int ib_cache_use_roce_gid_cache(struct ib_device *device, u8 port_num)
+{
+   if (rdma_port_get_link_layer(device, port_num) ==
+   IB_LINK_LAYER_ETHERNET) {
+   if (device->cache.roce_gid_cache)
+   return 0;
+   else
+   return -EAGAIN;
+   }
+
+   return -EINVAL;
+}
+EXPORT_SYMBOL(ib_cache_use_roce_gid_cache);
+
+int ib_get_cached_gid(struct ib_device *device,
+ u8 port_num,
+ int   index,
+ union ib_gid *gid,
+ struct ib_gid_attr *attr)
+{
+   int ret;
+
+   if (port_num < start_port(device) || port_num > end_port(device))
+   return -EINVAL;
+
+   ret = ib_cache_use_roce_gid_cache(device, port_num);
+   if (!ret)
+   return roce_gid_cache_get_gid(device, port_num, index, gid,
+ attr);
+
+   if (ret == -EAGAIN)
+   return ret;
+
+   ret = __ib_get_cached_gid(device, port_num, index, gid);
+
+   if (!ret && attr) {
+   memset(attr, 0, sizeof(*attr));
+   attr->gid_type = IB_GID_TYPE_IB;
+   }
+
+   return ret;
+}
 EXPORT_SYMBOL(ib_get_cached_gid);
 
-int ib_find_cached_gid(struct ib_device *device,
-  union ib_gid *gid,
-  u8   *port_num,
-  u16  *index)
+static int __IB_ONLY ___ib_find_cached_gid_by_port(struct ib_device *device,
+  u8   port_num,
+  const union ib_gid *gid,
+  u16  *index)
 {
struct ib_gid_cache *cache;
+   u8 p = port_num - start_port(device);
+   int i;
+
+   if (!ib_cache_use_roce_gid_cache(device

[PATCH v2 for-next 05/32] net/bonding: make DRV macros private

2015-03-10 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

The bonding module currently defines four macros with
generic names that pollute the global namespace:
DRV_VERSION
DRV_RELDATE
DRV_NAME
DRV_DESCRIPTION

Fix that by moving those defines into a private
bonding_priv.h header file.
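
As a rough illustration of what the private header provides, the
sketch below shows how the four macros combine into the bond_version
banner through compile-time string concatenation. The macro values are
taken from the patch; bond_banner() is a hypothetical helper added
only so the result can be inspected in user space.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values as they appear in the patch. */
#define DRV_VERSION     "3.7.1"
#define DRV_RELDATE     "April 27, 2011"
#define DRV_NAME        "bonding"
#define DRV_DESCRIPTION "Ethernet Channel Bonding Driver"

/* Adjacent string literals are concatenated by the compiler. */
#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"

/* Hypothetical helper: copy the banner into buf, return its full length. */
static int bond_banner(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s", bond_version);
}
```

Because the names are this generic, any other driver that includes
net/bonding.h and defines its own DRV_NAME would collide with them,
which is the motivation for keeping them in a header private to the
bonding directory.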

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/net/bonding/bond_main.c|  2 ++
 drivers/net/bonding/bond_procfs.c  |  1 +
 drivers/net/bonding/bonding_priv.h | 26 ++
 include/net/bonding.h  |  7 ---
 4 files changed, 29 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/bonding/bonding_priv.h

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 468c70e..55f2d3e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -81,6 +81,8 @@
 #include <net/bond_3ad.h>
 #include <net/bond_alb.h>
 
+#include "bonding_priv.h"
+
 /* Module parameters */
 
 /* monitor all links that often (in milliseconds). <=0 disables monitoring */
diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
index 976f5ad..b50a002 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -4,6 +4,7 @@
 #include <net/netns/generic.h>
 #include <net/bonding.h>
 
+#include "bonding_priv.h"
 
 static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(RCU)
diff --git a/drivers/net/bonding/bonding_priv.h b/drivers/net/bonding/bonding_priv.h
new file mode 100644
index 0000000..c093e91
--- /dev/null
+++ b/drivers/net/bonding/bonding_priv.h
@@ -0,0 +1,26 @@
+/*
+ * Bond several ethernet interfaces into a Cisco, running 'Etherchannel'.
+ *
+ * Portions are (c) Copyright 1995 Simon "Guru Aleph-Null" Janes
+ * NCM: Network and Communications Management, Inc.
+ *
+ * BUT, I'm the one who modified it for ethernet, so:
+ * (c) Copyright 1999, Thomas Davis, tada...@lbl.gov
+ *
+ * This software may be used and distributed according to the terms
+ * of the GNU Public License, incorporated herein by reference.
+ *
+ */
+
+#ifndef _BONDING_PRIV_H
+#define _BONDING_PRIV_H
+
+#define DRV_VERSION        "3.7.1"
+#define DRV_RELDATE        "April 27, 2011"
+#define DRV_NAME           "bonding"
+#define DRV_DESCRIPTION    "Ethernet Channel Bonding Driver"
+
+#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"
+
+#endif
+
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 4c2b0f4..a124173 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -30,13 +30,6 @@
 #include <net/bond_alb.h>
 #include <net/bond_options.h>
 
-#define DRV_VERSION        "3.7.1"
-#define DRV_RELDATE        "April 27, 2011"
-#define DRV_NAME           "bonding"
-#define DRV_DESCRIPTION    "Ethernet Channel Bonding Driver"
-
-#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"
-
 #define BOND_MAX_ARP_TARGETS   16
 
 #define BOND_DEFAULT_MIIMON    100
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 for-next 10/32] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-03-10 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Previously, we resolved the dmac and took the smac and vlan
from the resolved address. Change that: find the net device
that matches the IP and vlan of the network packet, then query
the RoCE GID cache for this net device, GID and GID type.

ocrdma driver changes were done by Somnath Kotur <somnath.ko...@emulex.com>

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/addr.c   |   3 +-
 drivers/infiniband/core/cm.c |  30 --
 drivers/infiniband/core/cma.c|   9 --
 drivers/infiniband/core/core_priv.h  |   4 +-
 drivers/infiniband/core/sa_query.c   |   4 -
 drivers/infiniband/core/ucma.c   |   1 -
 drivers/infiniband/core/uverbs_cmd.c |   3 +-
 drivers/infiniband/core/verbs.c  | 162 ++-
 drivers/infiniband/hw/mlx4/ah.c  |  15 ++-
 drivers/infiniband/hw/mlx4/mad.c |  12 ++-
 drivers/infiniband/hw/mlx4/mcg.c |   2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   2 +-
 drivers/infiniband/hw/mlx4/qp.c  |  48 +++--
 drivers/infiniband/hw/ocrdma/ocrdma.h|   1 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |  20 ++--
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c |  17 ++--
 include/rdma/ib_addr.h   |   2 +-
 include/rdma/ib_sa.h |   2 -
 include/rdma/ib_verbs.h  |  11 +--
 19 files changed, 190 insertions(+), 158 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f80da50..43af7f5 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -458,7 +458,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr,
 }
 
 int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
-  u16 *vlan_id)
+  u16 *vlan_id, int if_index)
 {
int ret = 0;
struct rdma_dev_addr dev_addr;
@@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
return ret;
 
memset(&dev_addr, 0, sizeof(dev_addr));
+   dev_addr.bound_dev_if = if_index;
 
ctx.addr = &dev_addr;
init_completion(&ctx.comp);
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d88f2ae..7974e74 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -178,8 +178,6 @@ struct cm_av {
struct ib_ah_attr ah_attr;
u16 pkey_index;
u8 timeout;
-   u8  valid;
-   u8  smac[ETH_ALEN];
 };
 
 struct cm_work {
@@ -382,7 +380,6 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 &av->ah_attr);
av->timeout = path->packet_life_time + 1;
 
-   av->valid = 1;
return 0;
 }
 
@@ -1563,7 +1560,6 @@ static int cm_req_handler(struct cm_work *work)
cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
 
memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, ETH_ALEN);
-   work->path[0].vlan_id = cm_id_priv->av.ah_attr.vlan_id;
ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
if (ret) {
ib_get_cached_gid(work->port->cm_dev->ib_device,
@@ -3511,32 +3507,6 @@ static int cm_init_qp_rtr_attr(struct cm_id_private *cm_id_priv,
*qp_attr_mask = IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU |
IB_QP_DEST_QPN | IB_QP_RQ_PSN;
qp_attr->ah_attr = cm_id_priv->av.ah_attr;
-   if (!cm_id_priv->av.valid) {
-   spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-   return -EINVAL;
-   }
-   if (cm_id_priv->av.ah_attr.vlan_id != 0xffff) {
-   qp_attr->vlan_id = cm_id_priv->av.ah_attr.vlan_id;
-   *qp_attr_mask |= IB_QP_VID;
-   }
-   if (!is_zero_ether_addr(cm_id_priv->av.smac)) {
-   memcpy(qp_attr->smac, cm_id_priv->av.smac,
-  sizeof(qp_attr->smac));
-   *qp_attr_mask |= IB_QP_SMAC;
-   }
-   if (cm_id_priv->alt_av.valid) {
-   if (cm_id_priv->alt_av.ah_attr.vlan_id != 0xffff) {
-   qp_attr->alt_vlan_id =
-   cm_id_priv->alt_av.ah_attr.vlan_id;
-   *qp_attr_mask |= IB_QP_ALT_VID;
-   }
-   if (!is_zero_ether_addr(cm_id_priv->alt_av.smac)) {
-   memcpy(qp_attr->alt_smac,
-  cm_id_priv->alt_av.smac,
-  sizeof(qp_attr->alt_smac));
-   *qp_attr_mask |= IB_QP_ALT_SMAC

[PATCH v2 for-next 24/32] IB/mlx4: Implement ib_device callback - get_netdev

2015-03-10 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

This is a new callback that is required for RoCEv2 support.
In port aggregation mode, the callback must return the netdev of the
active port, so the mlx4 core driver needs support for determining
which port that is.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c | 29 +
 drivers/net/ethernet/mellanox/mlx4/main.c | 18 ++
 include/linux/mlx4/driver.h   |  1 +
 3 files changed, 48 insertions(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bf87a95..04e6603 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -47,6 +47,8 @@
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
 
+#include <net/bonding.h>
+
 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/cmd.h>
 #include <linux/mlx4/qp.h>
@@ -1527,6 +1529,32 @@ unlock:
mutex_unlock(&ibdev->qp1_proxy_lock[port - 1]);
 }
 
+static struct net_device *mlx4_ib_get_netdev(struct ib_device *device, u8 port_num)
+{
+   struct mlx4_ib_dev *ibdev = to_mdev(device);
+
+   if (mlx4_is_bonded(ibdev->dev)) {
+   struct net_device *dev;
+   struct net_device *upper = NULL;
+
+   rcu_read_lock();
+
+   dev = mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+   if (dev)
+   upper = netdev_master_upper_dev_get_rcu(dev);
+   else
+   goto unlock;
+   if (upper)
+   dev = bond_option_active_slave_get_rcu(netdev_priv(upper));
+unlock:
+   rcu_read_unlock();
+
+   return dev;
+   }
+
+   return mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+
 static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 struct net_device *dev,
 unsigned long event)
@@ -1806,6 +1834,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
ibdev->ib_dev.attach_mcast  = mlx4_ib_mcg_attach;
ibdev->ib_dev.detach_mcast  = mlx4_ib_mcg_detach;
ibdev->ib_dev.process_mad   = mlx4_ib_process_mad;
+   ibdev->ib_dev.get_netdev    = mlx4_ib_get_netdev;
 
if (!mlx4_is_slave(ibdev->dev)) {
ibdev->ib_dev.alloc_fmr = mlx4_ib_fmr_alloc;
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 1893a57..6311897 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1237,6 +1237,24 @@ int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p)
 }
 EXPORT_SYMBOL_GPL(mlx4_port_map_set);
 
+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport)
+{
+   struct mlx4_priv *priv = mlx4_priv(dev);
+
+   if (!pport)
+   return -EINVAL;
+   *pport = 0;
+
+   if (vport == 1)
+   *pport = priv->v2p.port1;
+   else if (vport == 2)
+   *pport = priv->v2p.port2;
+   if (!*pport)
+   return -EINVAL;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(mlx4_port_map_get);
+
 static int mlx4_load_fw(struct mlx4_dev *dev)
 {
struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 5a06d96..a992971 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -81,6 +81,7 @@ struct mlx4_port_map {
 };
 
 int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p);
+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport);
 
 void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, int port);
 
-- 
2.1.0



[PATCH v2 for-next 08/32] IB/core: Report gid_type and gid_ndev through sysfs

2015-03-10 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Since we've added GID attributes to the RoCE GID table,
users need a convenient way to query them.
Add the GID type and the related net device to IB's sysfs.

The new attributes are available in:
/sys/class/infiniband/<device>/ports/<port>/gid_attrs/ndevs/<index>
/sys/class/infiniband/<device>/ports/<port>/gid_attrs/types/<index>

The index corresponds to the index of the respective GID in:
/sys/class/infiniband/<device>/ports/<port>/gids/<index>
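
The string reported by a types/<index> attribute comes from a
bounds-checked table lookup. A minimal user-space sketch of that
lookup follows. It mirrors roce_gid_cache_type_str() from the diff,
but the enum is redeclared locally, the newline is appended by the
show helper rather than stored in the table, and gid_type_to_str()
and show_gid_type() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local redeclaration for the example; the kernel defines the real enum. */
enum ib_gid_type { IB_GID_TYPE_IB, IB_GID_TYPE_ROCE_V2 };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Designated initializers keep the table in sync with the enum values. */
static const char * const gid_type_str[] = {
	[IB_GID_TYPE_IB]      = "IB/RoCE V1",
	[IB_GID_TYPE_ROCE_V2] = "RoCE V2",
};

/* Out-of-range or unset entries fall back to a fixed error string. */
static const char *gid_type_to_str(unsigned int gid_type)
{
	if (gid_type < ARRAY_SIZE(gid_type_str) && gid_type_str[gid_type])
		return gid_type_str[gid_type];
	return "Invalid GID type";
}

/* Formats one sysfs-style line into buf; returns the number of characters. */
static int show_gid_type(unsigned int gid_type, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s\n", gid_type_to_str(gid_type));
}
```

The NULL check on the table entry matters if the enum ever gains a
value that has no string yet, since designated initializers leave the
gap as a NULL pointer.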

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/core_priv.h  |   2 +
 drivers/infiniband/core/roce_gid_cache.c |  13 +++
 drivers/infiniband/core/sysfs.c  | 184 ++-
 3 files changed, 197 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 6ab40a9..411672f 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -71,6 +71,8 @@ void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
  roce_netdev_callback cb,
  void *cookie);
 
+const char *roce_gid_cache_type_str(enum ib_gid_type gid_type);
+
 int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
   union ib_gid *gid, struct ib_gid_attr *attr);
 
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
index 2bd663f..5c109f7 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -48,6 +48,11 @@ enum gid_attr_find_mask {
GID_ATTR_FIND_MASK_NETDEV   = 1UL << 1,
 };
 
+static const char * const gid_type_str[] = {
+   [IB_GID_TYPE_IB]        = "IB/RoCE V1\n",
+   [IB_GID_TYPE_ROCE_V2]   = "RoCE V2\n",
+};
+
 static inline int start_port(struct ib_device *ib_dev)
 {
return (ib_dev->node_type == RDMA_NODE_IB_SWITCH) ? 0 : 1;
@@ -58,6 +63,14 @@ struct dev_put_rcu {
struct net_device   *ndev;
 };
 
+const char *roce_gid_cache_type_str(enum ib_gid_type gid_type)
+{
+   if (gid_type < ARRAY_SIZE(gid_type_str) && gid_type_str[gid_type])
+   return gid_type_str[gid_type];
+
+   return "Invalid GID type";
+}
+
 static void put_ndev(struct rcu_head *rcu)
 {
struct dev_put_rcu *put_rcu =
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 5cee246..887c2f8 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -37,12 +37,22 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/string.h>
+#include <linux/netdevice.h>
 
 #include <rdma/ib_mad.h>
 
+struct ib_port;
+
+struct gid_attr_group {
+   struct ib_port  *port;
+   struct kobject  kobj;
+   struct attribute_group  ndev;
+   struct attribute_group  type;
+};
 struct ib_port {
struct kobject kobj;
struct ib_device  *ibdev;
+   struct gid_attr_group *gid_attr_group;
struct attribute_group gid_group;
struct attribute_group pkey_group;
u8 port_num;
@@ -84,6 +94,24 @@ static const struct sysfs_ops port_sysfs_ops = {
.show = port_attr_show
 };
 
+static ssize_t gid_attr_show(struct kobject *kobj,
+struct attribute *attr, char *buf)
+{
+   struct port_attribute *port_attr =
+   container_of(attr, struct port_attribute, attr);
+   struct ib_port *p = container_of(kobj, struct gid_attr_group,
+kobj)->port;
+
+   if (!port_attr->show)
+   return -EIO;
+
+   return port_attr->show(p, port_attr, buf);
+}
+
+static const struct sysfs_ops gid_attr_sysfs_ops = {
+   .show = gid_attr_show
+};
+
 static ssize_t state_show(struct ib_port *p, struct port_attribute *unused,
  char *buf)
 {
@@ -281,6 +309,46 @@ static struct attribute *port_default_attrs[] = {
NULL
 };
 
+static size_t print_ndev(struct ib_gid_attr *gid_attr, char *buf)
+{
+   if (!gid_attr->ndev)
+   return -EINVAL;
+
+   return sprintf(buf, "%s\n", gid_attr->ndev->name);
+}
+
+static size_t print_gid_type(struct ib_gid_attr *gid_attr, char *buf)
+{
+   return sprintf(buf, "%s", roce_gid_cache_type_str(gid_attr->gid_type));
+}
+
+static ssize_t _show_port_gid_attr(struct ib_port *p,
+  struct port_attribute *attr,
+  char *buf,
+  size_t (*print)(struct ib_gid_attr *gid_attr,
+  char *buf))
+{
+   struct port_table_attribute *tab_attr =
+   container_of(attr, struct port_table_attribute, attr);
+   union ib_gid gid;
+   struct ib_gid_attr gid_attr;
+   ssize_t ret;
+   va_list args;
+
+   rcu_read_lock

[PATCH v2 for-next 13/32] IB/cma: Add configfs for rdma_cm

2015-03-10 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Users would like to control the behaviour of rdma_cm.
For example, old applications that don't set the
required RoCE GID type could be run on RoCE V2
networks. To support this configuration,
we implement a configfs for rdma_cm.

To use the configfs, one needs to mount it and
mkdir <IB device name> inside the rdma_cm directory.

The patch adds support for a single configuration file,
default_roce_mode. The mode can be either "IB & RoCEv1" or
"RoCEv2".
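
Writing to default_roce_mode amounts to mapping a user-supplied
string to a GID type. The sketch below shows one way such a store
handler could parse its input in user space. It is an assumption-laden
illustration: the accepted spellings in mode_names, the enum and
parse_roce_mode() are all hypothetical and are not taken from
cma_configfs.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum gid_type { GID_TYPE_IB_ROCE_V1, GID_TYPE_ROCE_V2 };

/* Assumed mode strings; the real file may accept different spellings. */
static const char * const mode_names[] = {
	[GID_TYPE_IB_ROCE_V1] = "IB/RoCE V1",
	[GID_TYPE_ROCE_V2]    = "RoCE V2",
};

/* Returns the matching gid_type value, or -EINVAL for unknown input. */
static int parse_roce_mode(const char *buf)
{
	size_t len, i;

	assert(buf != NULL);
	/* Ignore the trailing newline that "echo" appends. */
	len = strcspn(buf, "\n");
	for (i = 0; i < sizeof(mode_names) / sizeof(mode_names[0]); i++) {
		if (strlen(mode_names[i]) == len &&
		    strncmp(buf, mode_names[i], len) == 0)
			return (int)i;
	}
	return -EINVAL;
}
```

Comparing lengths before calling strncmp() avoids accepting a prefix
such as "RoCE" as a valid mode.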

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/Makefile |   2 +
 drivers/infiniband/core/cma.c|  54 +++-
 drivers/infiniband/core/cma_configfs.c   | 222 +++
 drivers/infiniband/core/core_priv.h  |  13 ++
 drivers/infiniband/core/roce_gid_cache.c |  13 ++
 5 files changed, 300 insertions(+), 4 deletions(-)
 create mode 100644 drivers/infiniband/core/cma_configfs.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 2c94963..e25a96c 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -24,6 +24,8 @@ iw_cm-y :=  iwcm.o iwpm_util.o iwpm_msg.o
 
 rdma_cm-y :=   cma.o
 
+rdma_cm-$(CONFIG_CONFIGFS_FS) += cma_configfs.o
+
 rdma_ucm-y :=  ucma.o
 
 ib_addr-y :=   addr.o
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9afa410..1705280 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -55,6 +55,7 @@
 #include <rdma/ib_cm.h>
 #include <rdma/ib_sa.h>
 #include <rdma/iw_cm.h>
+#include "core_priv.h"
 
 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("Generic RDMA CM Agent");
@@ -91,6 +92,7 @@ struct cma_device {
struct completion   comp;
atomic_t            refcount;
struct list_head    id_list;
+   enum ib_gid_type    default_gid_type;
 };
 
 struct rdma_bind_list {
@@ -103,6 +105,42 @@ enum {
CMA_OPTION_AFONLY,
 };
 
+void cma_ref_dev(struct cma_device *cma_dev)
+{
+   atomic_inc(&cma_dev->refcount);
+}
+
+struct cma_device *cma_enum_devices_by_ibdev(cma_device_filter filter,
+void   *cookie)
+{
+   struct cma_device *cma_dev;
+   struct cma_device *found_cma_dev = NULL;
+
+   mutex_lock(&lock);
+
+   list_for_each_entry(cma_dev, &dev_list, list)
+   if (filter(cma_dev->device, cookie)) {
+   found_cma_dev = cma_dev;
+   break;
+   }
+
+   if (found_cma_dev)
+   cma_ref_dev(found_cma_dev);
+   mutex_unlock(&lock);
+   return found_cma_dev;
+}
+
+enum ib_gid_type cma_get_default_gid_type(struct cma_device *cma_dev)
+{
+   return cma_dev->default_gid_type;
+}
+
+void cma_set_default_gid_type(struct cma_device *cma_dev,
+ enum ib_gid_type default_gid_type)
+{
+   cma_dev->default_gid_type = default_gid_type;
+}
+
 /*
  * Device removal can occur at anytime, so we need extra handling to
  * serialize notifying the user of device removal with other callbacks.
@@ -248,15 +286,16 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
  struct cma_device *cma_dev)
 {
-   atomic_inc(&cma_dev->refcount);
+   cma_ref_dev(cma_dev);
id_priv->cma_dev = cma_dev;
+   id_priv->gid_type = cma_dev->default_gid_type;
id_priv->id.device = cma_dev->device;
id_priv->id.route.addr.dev_addr.transport =
rdma_node_get_transport(cma_dev->device->node_type);
list_add_tail(&id_priv->list, &cma_dev->id_list);
 }
 
-static inline void cma_deref_dev(struct cma_device *cma_dev)
+void cma_deref_dev(struct cma_device *cma_dev)
 {
if (atomic_dec_and_test(&cma_dev->refcount))
complete(&cma_dev->comp);
@@ -385,7 +424,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 
ret = ib_find_cached_gid_by_port(cma_dev->device,
 &iboe_gid,
-IB_GID_TYPE_IB,
+cma_dev->default_gid_type,
 port,
 &init_net,
 if_index,
@@ -418,7 +457,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 
 ret = ib_find_cached_gid_by_port(cma_dev->device,
 &iboe_gid,
-IB_GID_TYPE_IB

[PATCH v2 for-next 25/32] IB/mlx4: Implement ib_device callback - modify_gid

2015-03-10 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

This is a new callback that is required for RoCEv2 support.
In RoCE, the GID table is managed in the IB core driver. The role of the
mlx4 driver is to synchronize the HW with the entries in the GID table.
Since the same GID value may appear more than once in the GID table
(though with different attributes), the mlx4 driver must maintain a
reference counting mechanism and populate the HW with a single value.
Since an index into the GID table is not necessarily the same as the index
of the matching entry in the HW GID table, a translation between indexes is
required.
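
The reference counting and index translation described above can be
sketched in user space as follows. This is a simplified model under
stated assumptions, not the mlx4 implementation: struct gid_map,
struct hw_entry, MAX_GIDS and the gid_map_*() functions are
hypothetical, and locking and the actual firmware command are left
out.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_GIDS 4

struct gid { unsigned char raw[16]; };

struct hw_entry {
	struct gid gid;
	int refs;                       /* 0 means the HW slot is free */
};

struct gid_map {
	struct hw_entry hw[MAX_GIDS];   /* mirror of the HW GID table */
	int core_to_hw[MAX_GIDS];       /* core table index -> HW index, -1 if unset */
};

static void gid_map_init(struct gid_map *m)
{
	int i;

	memset(m, 0, sizeof(*m));
	for (i = 0; i < MAX_GIDS; i++)
		m->core_to_hw[i] = -1;
}

/* Returns the HW index used for core_idx, or a negative errno. */
static int gid_map_add(struct gid_map *m, int core_idx, const struct gid *gid)
{
	int i, free_slot = -1;

	assert(core_idx >= 0 && core_idx < MAX_GIDS);
	if (m->core_to_hw[core_idx] != -1)
		return -EEXIST;

	for (i = 0; i < MAX_GIDS; i++) {
		if (m->hw[i].refs == 0) {
			if (free_slot < 0)
				free_slot = i;  /* remember the first free slot */
		} else if (!memcmp(&m->hw[i].gid, gid, sizeof(*gid))) {
			/* Same value already in HW: share the slot. */
			m->hw[i].refs++;
			m->core_to_hw[core_idx] = i;
			return i;
		}
	}
	if (free_slot < 0)
		return -ENOSPC;

	m->hw[free_slot].gid = *gid;
	m->hw[free_slot].refs = 1;
	m->core_to_hw[core_idx] = free_slot;
	return free_slot;
}

/* Drops the reference held by core_idx; clears the HW slot on the last put. */
static int gid_map_del(struct gid_map *m, int core_idx)
{
	int hw;

	assert(core_idx >= 0 && core_idx < MAX_GIDS);
	hw = m->core_to_hw[core_idx];
	if (hw < 0)
		return -ENOENT;

	m->core_to_hw[core_idx] = -1;
	if (--m->hw[hw].refs == 0)
		memset(&m->hw[hw].gid, 0, sizeof(m->hw[hw].gid));
	return 0;
}
```

In this model the HW table only needs to be rewritten when a slot goes
from zero to one reference or back to zero, which is the point of
keeping the count in the driver.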

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c| 224 +++
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  18 +++
 include/linux/mlx4/cmd.h |   3 +-
 include/linux/mlx4/device.h  |   3 +-
 4 files changed, 246 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 04e6603..9d651cf 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1555,6 +1555,228 @@ unlock:
return mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
 }
 
+static int mlx4_ib_update_gids_v1(struct gid_entry *gids,
+ struct mlx4_ib_dev *ibdev,
+ u8 port_num)
+{
+   struct mlx4_cmd_mailbox *mailbox;
+   int err;
+   struct mlx4_dev *dev = ibdev->dev;
+   int i;
+   union ib_gid *gid_tbl;
+
+   mailbox = mlx4_alloc_cmd_mailbox(dev);
+   if (IS_ERR(mailbox))
+   return -ENOMEM;
+
+   gid_tbl = mailbox->buf;
+
+   for (i = 0; i < MLX4_MAX_PORT_GIDS; ++i)
+   memcpy(&gid_tbl[i], &gids[i].gid, sizeof(union ib_gid));
+
+   err = mlx4_cmd(dev, mailbox->dma,
+  MLX4_SET_PORT_GID_TABLE << 8 | port_num,
+  1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+  MLX4_CMD_WRAPPED);
+   if (mlx4_is_bonded(dev))
+   err += mlx4_cmd(dev, mailbox->dma,
+   MLX4_SET_PORT_GID_TABLE << 8 | 2,
+   1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+   MLX4_CMD_WRAPPED);
+
+   mlx4_free_cmd_mailbox(dev, mailbox);
+   return err;
+}
+
+static int mlx4_ib_update_gids_v1_v2(struct gid_entry *gids,
+struct mlx4_ib_dev *ibdev,
+u8 port_num)
+{
+   struct mlx4_cmd_mailbox *mailbox;
+   int err;
+   struct mlx4_dev *dev = ibdev->dev;
+   int i;
+   struct {
+   union ib_gid    gid;
+   __be32  rsrvd1[2];
+   __be16  rsrvd2;
+   u8  type;
+   u8  version;
+   __be32  rsrvd3;
+   } *gid_tbl;
+
+   mailbox = mlx4_alloc_cmd_mailbox(dev);
+   if (IS_ERR(mailbox))
+   return -ENOMEM;
+
+   gid_tbl = mailbox->buf;
+   for (i = 0; i < MLX4_MAX_PORT_GIDS; ++i) {
+   memcpy(&gid_tbl[i].gid, &gids[i].gid, sizeof(union ib_gid));
+   if (gids[i].gid_type == IB_GID_TYPE_ROCE_V2) {
+   gid_tbl[i].version = 2;
+   if (!ipv6_addr_v4mapped((struct in6_addr *)&gids[i].gid))
+   gid_tbl[i].type = 1;
+   }
+   }
+
+   err = mlx4_cmd(dev, mailbox->dma,
+  MLX4_SET_PORT_ROCE_ADDR << 8 | port_num,
+  1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+  MLX4_CMD_WRAPPED);
+   if (mlx4_is_bonded(dev))
+   err += mlx4_cmd(dev, mailbox->dma,
+   MLX4_SET_PORT_ROCE_ADDR << 8 | 2,
+   1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+   MLX4_CMD_WRAPPED);
+
+   mlx4_free_cmd_mailbox(dev, mailbox);
+   return err;
+}
+
+static int mlx4_ib_update_gids(struct gid_entry *gids,
+  struct mlx4_ib_dev *ibdev,
+  u8 port_num)
+{
+   if (ibdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2)
+   return mlx4_ib_update_gids_v1_v2(gids, ibdev, port_num);
+
+   return mlx4_ib_update_gids_v1(gids, ibdev, port_num);
+}
+
+static int mlx4_ib_modify_gid(struct ib_device *device,
+ u8 port_num, unsigned int index,
+ const union ib_gid *gid,
+ const struct ib_gid_attr *attr,
+ void **context)
+{
+   struct mlx4_ib_dev *ibdev = to_mdev(device);
+   struct mlx4_ib_iboe *iboe = &ibdev->iboe;
+   struct mlx4_port_gid_table   *port_gid_table;
+   int free = -1, found = -1;
+   int ret

[PATCH v2 for-next 15/32] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.

2015-03-10 Thread Somnath Kotur
1. Check and set port capability flags to indicate RoCEv2 support.
2. Change the query_gid hook to return the value from the IB/Core GID
   management APIs.
3. Remove all the netdev notifier chain subscription code, as well as
   the maintenance of the SGID table in memory.
4. Implement the get_netdev hook in the driver.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma.h   |  10 ++
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c|   3 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  | 233 +---
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |  13 ++
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |  34 +++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |   4 +
 6 files changed, 65 insertions(+), 232 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 16ee36e..97f971a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -100,6 +100,7 @@ struct ocrdma_dev_attr {
u8 local_ca_ack_delay;
u8 ird;
u8 num_ird_pages;
+   u8 roce_flags;
 };
 
 struct ocrdma_dma_mem {
@@ -575,4 +576,13 @@ static inline u8 ocrdma_is_enabled_and_synced(u32 state)
(state & OCRDMA_STATE_FLAG_SYNC);
 }
 
+static inline bool ocrdma_is_rocev2_supported(struct ocrdma_dev *dev)
+{
+   return (dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV4 <<
+   OCRDMA_ROUDP_FLAGS_SHIFT) ||
+   dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV6 <<
+   OCRDMA_ROUDP_FLAGS_SHIFT)) ?
+   true : false;
+}
+
 #endif
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index e5f0244..20f9e8f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -1112,6 +1112,9 @@ static void ocrdma_get_attr(struct ocrdma_dev *dev,
attr->local_ca_ack_delay = (rsp->max_pd_ca_ack_delay &
OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_MASK) >>
OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_SHIFT;
+   attr->roce_flags = (rsp->max_pd_ca_ack_delay &
+   OCRDMA_MBX_QUERY_CFG_L3_TYPE_MASK) >>
+   OCRDMA_MBX_QUERY_CFG_L3_TYPE_SHIFT;
attr->max_mw = rsp->max_mw;
attr->max_mr = rsp->max_mr;
attr->max_mr_size = ((u64)rsp->max_mr_size_hi << 32) |
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 7a2b59a..a81492f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -51,8 +51,6 @@ static LIST_HEAD(ocrdma_dev_list);
 static DEFINE_SPINLOCK(ocrdma_devlist_lock);
 static DEFINE_IDR(ocrdma_dev_id);
 
-static union ib_gid ocrdma_zero_sgid;
-
 void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
 {
u8 mac_addr[6];
@@ -67,135 +65,6 @@ void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
guid[6] = mac_addr[4];
guid[7] = mac_addr[5];
 }
-
-static bool ocrdma_add_sgid(struct ocrdma_dev *dev, union ib_gid *new_sgid)
-{
-   int i;
-   unsigned long flags;
-
-   memset(&ocrdma_zero_sgid, 0, sizeof(union ib_gid));
-
-
-   spin_lock_irqsave(&dev->sgid_lock, flags);
-   for (i = 0; i < OCRDMA_MAX_SGID; i++) {
-   if (!memcmp(&dev->sgid_tbl[i], &ocrdma_zero_sgid,
-   sizeof(union ib_gid))) {
-   /* found free entry */
-   memcpy(&dev->sgid_tbl[i], new_sgid,
-  sizeof(union ib_gid));
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return true;
-   } else if (!memcmp(&dev->sgid_tbl[i], new_sgid,
-  sizeof(union ib_gid))) {
-   /* entry already present, no addition is required. */
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return false;
-   }
-   }
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return false;
-}
-
-static bool ocrdma_del_sgid(struct ocrdma_dev *dev, union ib_gid *sgid)
-{
-   int found = false;
-   int i;
-   unsigned long flags;
-
-
-   spin_lock_irqsave(&dev->sgid_lock, flags);
-   /* first is default sgid, which cannot be deleted. */
-   for (i = 1; i < OCRDMA_MAX_SGID; i++) {
-   if (!memcmp(&dev->sgid_tbl[i], sgid, sizeof(union ib_gid))) {
-   /* found matching entry */
-   memset(&dev->sgid_tbl[i], 0, sizeof(union ib_gid));
-   found = true;
-   break;
-   }
-   }
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return found;
-}
-
-static int ocrdma_addr_event(unsigned long event, struct

[PATCH v2 for-next 02/32] IB/core: Add kref to IB devices

2015-03-10 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Previously, we used the device_mutex lock to protect
the device list. That means that in order to guarantee a
device isn't freed while we use it, we had to lock all
devices.

Add a kref per IB device. Before an IB device
is unregistered, we wait until it is no longer held.
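
The hold/put scheme can be modelled in user space with a C11 atomic
counter. The sketch below is not the kernel code: struct device, the
device_*() functions and the free_signalled flag (which stands in for
the kernel completion that the unregister path waits on) are all
hypothetical, and a real implementation would block instead of
testing a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum reg_state { DEV_REGISTERED, DEV_UNREGISTERING };

struct device {
	atomic_int refcount;
	enum reg_state state;
	bool free_signalled;    /* stands in for the kernel's struct completion */
};

static void device_init(struct device *d)
{
	atomic_init(&d->refcount, 1);   /* the registration itself holds one reference */
	d->state = DEV_REGISTERED;
	d->free_signalled = false;
}

static void device_hold(struct device *d)
{
	int old = atomic_fetch_add(&d->refcount, 1);

	assert(old > 0);        /* holding an already released device is a bug */
	(void)old;
}

/* Returns true when this call dropped the last reference. */
static bool device_put(struct device *d)
{
	if (atomic_fetch_sub(&d->refcount, 1) != 1)
		return false;
	if (d->state >= DEV_UNREGISTERING)
		d->free_signalled = true;
	return true;
}

/* Unregister path: drop the registration reference; the caller would then wait. */
static void device_begin_unregister(struct device *d)
{
	d->state = DEV_UNREGISTERING;
	device_put(d);
}
```

A user that took a reference with device_hold() keeps the device
alive past device_begin_unregister(); the wait only ends once that
user calls device_put().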

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/device.c | 41 
 include/rdma/ib_verbs.h  |  6 ++
 2 files changed, 47 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..8616a95 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -261,6 +261,39 @@ out:
return ret;
 }
 
+static void ib_device_complete_cb(struct kref *kref)
+{
+   struct ib_device *device = container_of(kref, struct ib_device,
+   refcount);
+
+   if (device->reg_state >= IB_DEV_UNREGISTERING)
+   complete(&device->free);
+}
+
+/**
+ * ib_device_hold - increase the reference count of device
+ * @device: ib device to prevent from being free'd
+ *
+ * Prevent the device from being free'd.
+ */
+void ib_device_hold(struct ib_device *device)
+{
+   kref_get(&device->refcount);
+}
+EXPORT_SYMBOL(ib_device_hold);
+
+/**
+ * ib_device_put - decrease the reference count of device
+ * @device: allows this device to be free'd
+ *
+ * Puts the ib_device and allows it to be free'd.
+ */
+int ib_device_put(struct ib_device *device)
+{
+   return kref_put(&device->refcount, ib_device_complete_cb);
+}
+EXPORT_SYMBOL(ib_device_put);
+
 /**
  * ib_register_device - Register an IB device with IB core
  * @device:Device to register
@@ -312,6 +345,9 @@ int ib_register_device(struct ib_device *device,
 
list_add_tail(&device->core_list, &device_list);
 
+   kref_init(&device->refcount);
+   init_completion(&device->free);
+
device->reg_state = IB_DEV_REGISTERED;
 
{
@@ -342,6 +378,8 @@ void ib_unregister_device(struct ib_device *device)
 
mutex_lock(&device_mutex);
 
+   device->reg_state = IB_DEV_UNREGISTERING;
+
list_for_each_entry_reverse(client, &client_list, list)
if (client->remove)
client->remove(device);
@@ -355,6 +393,9 @@ void ib_unregister_device(struct ib_device *device)
 
ib_device_unregister_sysfs(device);
 
+   ib_device_put(device);
+   wait_for_completion(&device->free);
+
spin_lock_irqsave(&device->client_data_lock, flags);
list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
kfree(context);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1866595..a7593b0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1716,6 +1716,7 @@ struct ib_device {
enum {
IB_DEV_UNINITIALIZED,
IB_DEV_REGISTERED,
+   IB_DEV_UNREGISTERING,
IB_DEV_UNREGISTERED
}reg_state;
 
@@ -1728,6 +1729,8 @@ struct ib_device {
u32  local_dma_lkey;
u8   node_type;
u8   phys_port_cnt;
+   struct kref  refcount;
+   struct completion    free;
 };
 
 struct ib_client {
@@ -1741,6 +1744,9 @@ struct ib_client {
 struct ib_device *ib_alloc_device(size_t size);
 void ib_dealloc_device(struct ib_device *device);
 
+void ib_device_hold(struct ib_device *device);
+int ib_device_put(struct ib_device *device);
+
 int ib_register_device(struct ib_device *device,
   int (*port_callback)(struct ib_device *,
u8, struct kobject *));
-- 
2.1.0
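
The hold/put scheme added by this patch can be illustrated outside the kernel. What follows is only a userspace sketch under stated assumptions, not the kernel code: `struct dev`, `dev_init`, `dev_hold` and `dev_put` are hypothetical names, C11 atomics stand in for `struct kref`, and a boolean flag stands in for `struct completion` (a real waiter would block rather than poll a flag).

```c
#include <stdatomic.h>
#include <stdbool.h>

enum dev_state { DEV_REGISTERED, DEV_UNREGISTERING, DEV_UNREGISTERED };

struct dev {
	atomic_int refcount;   /* stands in for struct kref */
	enum dev_state state;
	atomic_bool released;  /* stands in for struct completion */
};

static void dev_init(struct dev *d)
{
	atomic_init(&d->refcount, 1);      /* like kref_init(): start at one */
	atomic_init(&d->released, false);
	d->state = DEV_REGISTERED;
}

/* Take an extra reference so the device cannot go away underneath us. */
static void dev_hold(struct dev *d)
{
	atomic_fetch_add(&d->refcount, 1);
}

/* Drop a reference; returns true when this call dropped the last one. */
static bool dev_put(struct dev *d)
{
	if (atomic_fetch_sub(&d->refcount, 1) != 1)
		return false;
	/* Last reference gone: signal the unregister path if it is waiting. */
	if (d->state >= DEV_UNREGISTERING)
		atomic_store(&d->released, true);
	return true;
}
```

In this sketch an unregister path would set the state, drop the initial reference with `dev_put()`, and then wait until `released` becomes true, which corresponds to the `ib_device_put()` followed by `wait_for_completion()` in the last device.c hunk.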

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 for-next 01/32] IB/core: Add RoCE GID cache

2015-03-10 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

In order to manage multiple types, VLANs and MACs per GID, we
need to store them alongside the GID itself. We store the net device
as well, since GIDs sometimes have to be handled according to the
net device they came from. Since populating the GID table should
be identical for every RoCE provider, the GID table should be
handled in ib_core.

Add a GID cache table that supports lockless find, add and
delete of GIDs. The lockless nature comes from using a unique
sequence number per table entry and detecting whether this
sequence number changed while reading or writing.

To use this RoCE GID cache table, providers must implement a
modify_gid callback. The table is managed exclusively by
roce_gid_cache, and the provider just needs to write
the data to the hardware.
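
The per-entry sequence number idea can be sketched in userspace C as follows. This is a minimal sketch and not the code from roce_gid_cache.c: the names `gid_entry`, `entry_write` and `entry_read` are hypothetical, C11 atomics replace the kernel primitives, and writers are assumed to be serialized by the caller.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define GID_WORDS 4  /* a 128-bit GID stored as four 32-bit words */

struct gid_entry {
	atomic_uint seq;                 /* odd while a write is in progress */
	_Atomic uint32_t gid[GID_WORDS]; /* payload guarded by seq */
};

/* Writer side: assumes only one writer touches the entry at a time. */
static void entry_write(struct gid_entry *e, const uint32_t gid[GID_WORDS])
{
	unsigned int s = atomic_load_explicit(&e->seq, memory_order_relaxed);

	/* Make the sequence odd so readers know an update is in flight. */
	atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	for (int i = 0; i < GID_WORDS; i++)
		atomic_store_explicit(&e->gid[i], gid[i], memory_order_relaxed);
	/* Publish the new data by making the sequence even again. */
	atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/* Reader side: returns false if a concurrent write was detected,
 * in which case the caller is expected to retry. */
static bool entry_read(struct gid_entry *e, uint32_t out[GID_WORDS])
{
	unsigned int before = atomic_load_explicit(&e->seq, memory_order_acquire);

	if (before & 1)
		return false;
	for (int i = 0; i < GID_WORDS; i++)
		out[i] = atomic_load_explicit(&e->gid[i], memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&e->seq, memory_order_relaxed) == before;
}
```

A reader never takes a lock; it only pays for a retry when the sequence number shows that the entry changed during the copy.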

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/Makefile |   3 +-
 drivers/infiniband/core/core_priv.h  |  24 ++
 drivers/infiniband/core/roce_gid_cache.c | 511 +++
 drivers/infiniband/hw/mlx4/main.c|   2 -
 include/rdma/ib_verbs.h  |  55 +++-
 5 files changed, 591 insertions(+), 4 deletions(-)
 create mode 100644 drivers/infiniband/core/roce_gid_cache.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index acf7367..9b63bdf 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
$(user_access-y)
 
 ib_core-y :=   packer.o ud_header.o verbs.o sysfs.o \
-   device.o fmr_pool.o cache.o netlink.o
+   device.o fmr_pool.o cache.o netlink.o \
+   roce_gid_cache.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
 
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 87d1936..a502daa 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -35,6 +35,7 @@
 
 #include <linux/list.h>
 #include <linux/spinlock.h>
+#include <net/net_namespace.h>
 
 #include <rdma/ib_verbs.h>
 
@@ -51,4 +52,27 @@ void ib_cache_cleanup(void);
 
 int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
struct ib_qp_attr *qp_attr, int *qp_attr_mask);
+
+int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
+  union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_gid_cache_find_gid(struct ib_device *ib_dev, union ib_gid *gid,
+   enum ib_gid_type gid_type, struct net *net,
+   int if_index, u8 *port, u16 *index);
+
+int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,
+   enum ib_gid_type gid_type, u8 port,
+   struct net *net, int if_index, u16 *index);
+
+int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
+
+int roce_add_gid(struct ib_device *ib_dev, u8 port,
+union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_gid(struct ib_device *ib_dev, u8 port,
+union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
+struct net_device *ndev);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
new file mode 100644
index 000..aa20371
--- /dev/null
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -0,0 +1,511 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT

[PATCH v2 for-next 32/32] IB/cma: Join and leave multicast groups with IGMP

2015-03-10 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

Since RoCEv2 runs over an IP header, IGMP join and leave requests
must be sent to the network when joining and leaving
multicast groups.
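
In the diff below, the IPv4 group address handed to the IGMP code is read from bytes 12..15 of the multicast GID. A minimal userspace sketch of that extraction follows; the helper names `gid_last4` and `ipv4_is_mcast` are hypothetical, and `memcpy` is used instead of a pointer cast so the sketch does not depend on alignment.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy the last four bytes of a 16-byte GID. For a GID built from an
 * IPv4 address these bytes hold the address in network byte order. */
static void gid_last4(const uint8_t gid[16], uint8_t ipv4[4])
{
	memcpy(ipv4, gid + 12, 4);
}

/* IPv4 multicast is 224.0.0.0/4: the top four bits of the first
 * byte are 1110. */
static bool ipv4_is_mcast(const uint8_t ipv4[4])
{
	return (ipv4[0] & 0xf0) == 0xe0;
}
```

Only a group address that passes such a check would be a sensible argument for an IGMP join or leave.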

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/cma.c   | 78 ++---
 drivers/infiniband/core/multicast.c | 18 -
 include/rdma/ib_sa.h|  3 ++
 3 files changed, 92 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2bfe798..bc30bc5 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -38,6 +38,7 @@
 #include <linux/in6.h>
 #include <linux/mutex.h>
 #include <linux/random.h>
+#include <linux/igmp.h>
 #include <linux/idr.h>
 #include <linux/inetdevice.h>
 #include <linux/slab.h>
@@ -196,6 +197,7 @@ struct cma_multicast {
void                    *context;
struct sockaddr_storage addr;
struct kref             mcref;
+   bool                    igmp_joined;
 };
 
 struct cma_work {
@@ -283,6 +285,26 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
 }
 
+static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join)
+{
+   struct in_device *in_dev = NULL;
+
+   if (ndev) {
+   rtnl_lock();
+   in_dev = __in_dev_get_rtnl(ndev);
+   if (in_dev) {
+   if (join)
+   ip_mc_inc_group(in_dev,
+   *(__be32 *)(mgid->raw+12));
+   else
+   ip_mc_dec_group(in_dev,
+   *(__be32 *)(mgid->raw+12));
+   }
+   rtnl_unlock();
+   }
+   return (in_dev) ? 0 : -ENODEV;
+}
+
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
  struct cma_device *cma_dev)
 {
@@ -1076,6 +1098,20 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
kfree(mc);
break;
case IB_LINK_LAYER_ETHERNET:
+   if (mc->igmp_joined) {
+   struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
+   struct net_device *ndev = NULL;
+
+   if (dev_addr->bound_dev_if)
+   ndev = dev_get_by_index(&init_net,
+   dev_addr->bound_dev_if);
+   if (ndev) {
+   cma_igmp_send(ndev,
+   &mc->multicast.ib->rec.mgid,
+   false);
+   dev_put(ndev);
+   }
+   }
kref_put(&mc->mcref, release_mc);
break;
default:
@@ -3356,7 +3392,7 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 {
struct iboe_mcast_work *work;
struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
-   int err;
+   int err = 0;
struct sockaddr *addr = (struct sockaddr *)&mc->addr;
struct net_device *ndev = NULL;
 
@@ -3388,13 +3424,30 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
mc->multicast.ib->rec.rate = iboe_get_rate(ndev);
mc->multicast.ib->rec.hop_limit = 1;
mc->multicast.ib->rec.mtu = iboe_get_mtu(ndev->mtu);
+   mc->multicast.ib->rec.ifindex = dev_addr->bound_dev_if;
+   mc->multicast.ib->rec.net = &init_net;
+   rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+   &mc->multicast.ib->rec.port_gid);
+
+   if (addr->sa_family == AF_INET) {
+   mc->multicast.ib->rec.gid_type =
+   id_priv->cma_dev->default_gid_type;
+   if (mc->multicast.ib->rec.gid_type == IB_GID_TYPE_ROCE_V2)
+   err = cma_igmp_send(ndev, &mc->multicast.ib->rec.mgid,
+   true);
+   if (!err) {
+   mc->igmp_joined = true;
+   mc->multicast.ib->rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+   }
+   } else {
+   mc->multicast.ib->rec.gid_type = IB_GID_TYPE_IB;
+   }
dev_put(ndev);
-   if (!mc->multicast.ib->rec.mtu) {
+   if (err || !mc->multicast.ib->rec.mtu) {
err = -EINVAL;
goto out2;
}
-   rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
-   &mc->multicast.ib->rec.port_gid);
+
work->id = id_priv;
work->mc = mc;
INIT_WORK(work

[PATCH v2 for-next 28/32] net/mlx4_core: Add handling of R-RoCE over IPV4 in qp attach flow

2015-03-10 Thread Somnath Kotur
From: Maor Gottlieb <ma...@mellanox.com>

When a QP is attached for RoCE over IPv4, the IPv4 bit should be
enabled in the IB flow spec.
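
The difference between the IPv4 and IPv6 cases in the diff below can be sketched in plain C. This is an illustration under stated assumptions rather than the driver code: `struct ib_spec_sketch`, `fill_ib_spec` and `l3_flag` are hypothetical names, and only the byte offsets and the `0x80` flag value are taken from the patch.

```c
#include <stdint.h>
#include <string.h>

enum roce_type { ROCE_TYPE_IPV6 = 0, ROCE_TYPE_IPV4 };

struct ib_spec_sketch {
	enum roce_type roce_type;
	uint8_t dst_gid[16];
	uint8_t dst_gid_msk[16];
};

/* For IPv4 only the last four GID bytes are matched; for IPv6 the
 * whole 16-byte GID is matched. */
static void fill_ib_spec(struct ib_spec_sketch *spec, const uint8_t gid[16],
			 enum roce_type type)
{
	memset(spec, 0, sizeof(*spec));
	spec->roce_type = type;
	if (type == ROCE_TYPE_IPV4) {
		memcpy(spec->dst_gid + 12, gid + 12, 4);
		memset(spec->dst_gid_msk + 12, 0xff, 4);
	} else {
		memcpy(spec->dst_gid, gid, 16);
		memset(spec->dst_gid_msk, 0xff, 16);
	}
}

/* Value OR-ed into the l3_qpn field of the rule, as in the patch. */
static uint32_t l3_flag(enum roce_type type)
{
	return type == ROCE_TYPE_IPV4 ? 0x80 : 0;
}
```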

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/net/ethernet/mellanox/mlx4/mcg.c | 14 --
 include/linux/mlx4/device.h  |  6 ++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index a3867e7..cdf07b9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -858,7 +858,9 @@ static int parse_trans_rule(struct mlx4_dev *dev, struct mlx4_spec_list *spec,
break;
 
case MLX4_NET_TRANS_RULE_ID_IB:
-   rule_hw->ib.l3_qpn = spec->ib.l3_qpn;
+   rule_hw->ib.l3_qpn = spec->ib.l3_qpn |
+   (spec->ib.roce_type == MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4 ?
+0x80 : 0);
rule_hw->ib.qpn_mask = spec->ib.qpn_msk;
memcpy(rule_hw->ib.dst_gid, spec->ib.dst_gid, 16);
memcpy(rule_hw->ib.dst_gid_msk, spec->ib.dst_gid_msk, 16);
@@ -1377,10 +1379,18 @@ int mlx4_trans_to_dmfs_attach(struct mlx4_dev *dev, struct mlx4_qp *qp,
memcpy(spec.eth.dst_mac_msk, mac_mask, ETH_ALEN);
break;
 
+   case MLX4_PROT_IB_IPV4:
+   spec.id = MLX4_NET_TRANS_RULE_ID_IB;
+   memcpy(spec.ib.dst_gid + 12, gid + 12, 4);
+   memset(spec.ib.dst_gid_msk + 12, 0xff, 4);
+   spec.ib.roce_type = MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4;
+
+   break;
case MLX4_PROT_IB_IPV6:
spec.id = MLX4_NET_TRANS_RULE_ID_IB;
memcpy(spec.ib.dst_gid, gid, 16);
-   memset(spec.ib.dst_gid_msk, 0xff, 16);
+   memset(spec.ib.dst_gid_msk, 0xff, 16);
+   spec.ib.roce_type = MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV6;
break;
default:
return -EINVAL;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index dd1488c..58b0b8c 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -369,6 +369,11 @@ enum mlx4_protocol {
MLX4_PROT_FCOE
 };
 
+enum mlx4_flow_roce_type {
+   MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV6 = 0,
+   MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4
+};
+
 enum {
MLX4_MTT_FLAG_PRESENT   = 1
 };
@@ -1096,6 +1101,7 @@ struct mlx4_spec_ipv4 {
 struct mlx4_spec_ib {
__be32  l3_qpn;
__be32  qpn_msk;
+   enum mlx4_flow_roce_type roce_type;
u8  dst_gid[16];
u8  dst_gid_msk[16];
 };
-- 
2.1.0



[PATCH v2 for-next 19/32] IB/mlx4: Remove gid table management for RoCE

2015-03-10 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

RoCE GID table management has moved to the InfiniBand core driver.
The core driver is now responsible for populating the GID table and for
supplying query and lookup functions for GIDs. HW drivers are responsible
only for modifying the GID table in the network adapter.
The query_gid hook should now return the answer from the cache when the
link layer is Ethernet.
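
The new query path in the diff below treats `-EAGAIN` from the cache as "entry not available right now" and reports a zero GID instead of an error. A minimal sketch of that fallback follows; `struct gid_cache`, `cache_get_gid` and `query_gid` are hypothetical stand-ins, not the ib_core API, and the `busy` flag is an assumption made only to produce `-EAGAIN` in the example.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

union gid { uint8_t raw[16]; };

/* Hypothetical cache: a small table plus a per-entry "being updated" flag. */
struct gid_cache {
	union gid table[4];
	int busy[4];
};

static int cache_get_gid(const struct gid_cache *c, int index, union gid *out)
{
	if (index < 0 || index >= 4)
		return -EINVAL;
	if (c->busy[index])
		return -EAGAIN;
	*out = c->table[index];
	return 0;
}

/* Query hook: a temporarily unavailable entry is reported as the zero GID. */
static int query_gid(const struct gid_cache *c, int index, union gid *gid)
{
	int ret = cache_get_gid(c, index, gid);

	if (ret == -EAGAIN) {
		memset(gid, 0, sizeof(*gid));
		return 0;
	}
	return ret;
}
```

Other errors, such as an out-of-range index, are still passed back to the caller unchanged.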

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c| 495 +--
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   4 -
 2 files changed, 14 insertions(+), 485 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 6fa5e49..91caffc 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -45,6 +45,7 @@
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
 #include <rdma/ib_addr.h>
+#include <rdma/ib_cache.h>
 
 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/cmd.h>
@@ -74,13 +75,6 @@ static const char mlx4_ib_version[] =
DRV_NAME ": Mellanox ConnectX InfiniBand driver v"
DRV_VERSION " (" DRV_RELDATE ")\n";
 
-struct update_gid_work {
-   struct work_struct  work;
-   union ib_gid        gids[128];
-   struct mlx4_ib_dev *dev;
-   int port;
-};
-
 static void do_slave_init(struct mlx4_ib_dev *ibdev, int slave, int do_init);
 
 static struct workqueue_struct *wq;
@@ -474,23 +468,21 @@ out:
return err;
 }
 
-static int iboe_query_gid(struct ib_device *ibdev, u8 port, int index,
- union ib_gid *gid)
-{
-   struct mlx4_ib_dev *dev = to_mdev(ibdev);
-
-   *gid = dev->iboe.gid_table[port - 1][index];
-
-   return 0;
-}
-
 static int mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
 union ib_gid *gid)
 {
-   if (rdma_port_get_link_layer(ibdev, port) == IB_LINK_LAYER_INFINIBAND)
+   int ret;
+
+   if (ib_cache_use_roce_gid_cache(ibdev, port))
return __mlx4_ib_query_gid(ibdev, port, index, gid, 0);
-   else
-   return iboe_query_gid(ibdev, port, index, gid);
+
+   ret = ib_get_cached_gid(ibdev, port, index, gid, NULL);
+   if (ret == -EAGAIN) {
+   memcpy(gid, &zgid, sizeof(*gid));
+   return 0;
+   }
+
+   return ret;
 }
 
 int __mlx4_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
@@ -1480,273 +1472,6 @@ static struct device_attribute *mlx4_class_attributes[] = {
&dev_attr_board_id
 };
 
-static void mlx4_addrconf_ifid_eui48(u8 *eui, u16 vlan_id,
-struct net_device *dev)
-{
-   memcpy(eui, dev->dev_addr, 3);
-   memcpy(eui + 5, dev->dev_addr + 3, 3);
-   if (vlan_id < 0x1000) {
-   eui[3] = vlan_id >> 8;
-   eui[4] = vlan_id & 0xff;
-   } else {
-   eui[3] = 0xff;
-   eui[4] = 0xfe;
-   }
-   eui[0] ^= 2;
-}
-
-static void update_gids_task(struct work_struct *work)
-{
-   struct update_gid_work *gw = container_of(work, struct update_gid_work, work);
-   struct mlx4_cmd_mailbox *mailbox;
-   union ib_gid *gids;
-   int err;
-   struct mlx4_dev *dev = gw->dev->dev;
-   int is_bonded = mlx4_is_bonded(dev);
-
-   if (!gw->dev->ib_active)
-   return;
-
-   mailbox = mlx4_alloc_cmd_mailbox(dev);
-   if (IS_ERR(mailbox)) {
-   pr_warn("update gid table failed %ld\n", PTR_ERR(mailbox));
-   return;
-   }
-
-   gids = mailbox->buf;
-   memcpy(gids, gw->gids, sizeof gw->gids);
-
-   err = mlx4_cmd(dev, mailbox->dma, MLX4_SET_PORT_GID_TABLE << 8 | gw->port,
-  1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
-  MLX4_CMD_WRAPPED);
-   if (err)
-   pr_warn("set port command failed\n");
-   else
-   if ((gw->port == 1) || !is_bonded)
-   mlx4_ib_dispatch_event(gw->dev,
-  is_bonded ? 1 : gw->port,
-  IB_EVENT_GID_CHANGE);
-
-   mlx4_free_cmd_mailbox(dev, mailbox);
-   kfree(gw);
-}
-
-static void reset_gids_task(struct work_struct *work)
-{
-   struct update_gid_work *gw =
-   container_of(work, struct update_gid_work, work);
-   struct mlx4_cmd_mailbox *mailbox;
-   union ib_gid *gids;
-   int err;
-   struct mlx4_dev *dev = gw->dev->dev;
-
-   if (!gw->dev->ib_active)
-   return;
-
-   mailbox = mlx4_alloc_cmd_mailbox(dev);
-   if (IS_ERR(mailbox)) {
-   pr_warn("reset gid table failed\n");
-   goto free;
-   }
-
-   gids = mailbox->buf;
-   memcpy(gids, gw->gids, sizeof(gw->gids));
-
-   if (mlx4_ib_port_link_layer(&gw->dev->ib_dev, gw->port

RE: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-23 Thread Somnath Kotur


 -Original Message-
 From: Matan Barak [mailto:mat...@mellanox.com]
 Sent: Monday, February 23, 2015 3:47 PM
 To: Devesh Sharma; Somnath Kotur; rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org
 Subject: Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use
 roce_gid_cache
 
 
 
 On 2/23/2015 7:25 AM, Devesh Sharma wrote:
  Hi Matan,
 
  Please find a comment inline below:
 
  -Regards
  Devesh
  -Original Message-
  From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
  ow...@vger.kernel.org] On Behalf Of Somnath Kotur
  Sent: Friday, February 20, 2015 3:32 AM
  To: rol...@kernel.org
  Cc: linux-rdma@vger.kernel.org; Matan Barak; Somnath Kotur
  Subject: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to
  use roce_gid_cache
 
  From: Matan Barak mat...@mellanox.com
 
  Previously, we resolved the dmac and took the smac and vlan from the
  resolved address. Changing that into finding a net device that
  matches the IP and vlan of the network packet and querying the RoCE
  GID cache for this net device, GID and GID type.
 
  ocrdma driver changes were done by Somnath Kotur
  somnath.ko...@emulex.com
 
  Signed-off-by: Matan Barak mat...@mellanox.com
  Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
  ---
drivers/infiniband/core/addr.c   |3 +-
drivers/infiniband/core/cm.c |   30 --
drivers/infiniband/core/cma.c|9 --
drivers/infiniband/core/core_priv.h  |4 +-
drivers/infiniband/core/sa_query.c   |4 -
drivers/infiniband/core/ucma.c   |1 -
drivers/infiniband/core/uverbs_cmd.c |6 +-
drivers/infiniband/core/verbs.c  |  159 
  +--
 --
drivers/infiniband/hw/mlx4/ah.c  |   15 +++-
drivers/infiniband/hw/mlx4/mad.c |   12 ++-
drivers/infiniband/hw/mlx4/mcg.c |2 +-
drivers/infiniband/hw/mlx4/mlx4_ib.h |2 +-
drivers/infiniband/hw/mlx4/qp.c  |   42 ++--
drivers/infiniband/hw/ocrdma/ocrdma.h|1 +
drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   20 +++--
drivers/infiniband/hw/ocrdma/ocrdma_hw.c |   17 ++-
include/rdma/ib_addr.h   |2 +-
include/rdma/ib_sa.h |2 -
include/rdma/ib_verbs.h  |7 +-
19 files changed, 183 insertions(+), 155 deletions(-)
 
  diff --git a/drivers/infiniband/core/addr.c
  b/drivers/infiniband/core/addr.c index f80da50..43af7f5 100644
  --- a/drivers/infiniband/core/addr.c
  +++ b/drivers/infiniband/core/addr.c
  @@ -458,7 +458,7 @@ static void resolve_cb(int status, struct
  sockaddr *src_addr,  }
 
int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid
  *dgid, u8 *dmac,
  - u16 *vlan_id)
  + u16 *vlan_id, int if_index)
{
 int ret = 0;
 struct rdma_dev_addr dev_addr;
  @@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid
  *sgid, union ib_gid *dgid, u8 *dmac,
 return ret;
 
 memset(&dev_addr, 0, sizeof(dev_addr));
  +  dev_addr.bound_dev_if = if_index;
 
 ctx.addr = &dev_addr;
 init_completion(&ctx.comp);
  diff --git a/drivers/infiniband/core/cm.c
  b/drivers/infiniband/core/cm.c index
  d88f2ae..7974e74 100644
  --- a/drivers/infiniband/core/cm.c
  +++ b/drivers/infiniband/core/cm.c
  @@ -178,8 +178,6 @@ struct cm_av {
 struct ib_ah_attr ah_attr;
 u16 pkey_index;
 u8 timeout;
  -  u8  valid;
  -  u8  smac[ETH_ALEN];
};
 
struct cm_work {
  @@ -382,7 +380,6 @@ static int cm_init_av_by_path(struct
  ib_sa_path_rec *path, struct cm_av *av)
  &av->ah_attr);
 av->timeout = path->packet_life_time + 1;
 
  -  av->valid = 1;
 return 0;
}
 
  @@ -1563,7 +1560,6 @@ static int cm_req_handler(struct cm_work
 *work)
 cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
 
 memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac,
 ETH_ALEN);
  -  work->path[0].vlan_id = cm_id_priv->av.ah_attr.vlan_id;
 ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
 if (ret) {
 ib_get_cached_gid(work->port->cm_dev->ib_device,
  @@ -3511,32 +3507,6 @@ static int cm_init_qp_rtr_attr(struct
  cm_id_private *cm_id_priv,
 *qp_attr_mask = IB_QP_STATE | IB_QP_AV |
 IB_QP_PATH_MTU |
 IB_QP_DEST_QPN | IB_QP_RQ_PSN;
 qp_attr->ah_attr = cm_id_priv->av.ah_attr;
  -  if (!cm_id_priv->av.valid) {
  -  spin_unlock_irqrestore(&cm_id_priv->lock, flags);
  -  return -EINVAL;
  -  }
  -  if (cm_id_priv->av.ah_attr.vlan_id != 0xffff) {
  -  qp_attr->vlan_id = cm_id_priv->av.ah_attr.vlan_id;
  -  *qp_attr_mask |= IB_QP_VID;
  -  }
  -  if (!is_zero_ether_addr(cm_id_priv->av.smac)) {
  -  memcpy(qp_attr->smac, cm_id_priv->av.smac

RE: [PATCH] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.

2015-02-19 Thread Somnath Kotur
Shachar,
Yes, that happened by mistake. I realized it and immediately sent out
the patch with the correct patch number.
 
Thanks
Som

 -Original Message-
 From: Shachar Raindel [mailto:rain...@mellanox.com]
 Sent: Thursday, February 19, 2015 2:31 PM
 To: Somnath Kotur; rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org; Devesh Sharma
 Subject: RE: [PATCH] RDMA/ocrdma: Changes in driver to incorporate the
 moving of GID Table mgmt to IB/Core.
 
 
 
  -Original Message-
  From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
  ow...@vger.kernel.org] On Behalf Of Somnath Kotur
  Sent: Friday, February 20, 2015 12:02 AM
  To: rol...@kernel.org
  Cc: linux-rdma@vger.kernel.org; Somnath Kotur; Devesh Sharma
  Subject: [PATCH] RDMA/ocrdma: Changes in driver to incorporate the
  moving of GID Table mgmt to IB/Core.
 
 
 Som, the patch number seems to be missing here.
 When sending the next iteration, please make sure:
 - That all patches include the proper numbers
 - That the version of the patchset is clearly indicated in the header. You can
 use --subject-prefix="PATCH V2" when using format-patch to make this
 happen.
 
 Thanks,
 --Shachar


RE: [PATCH v1 00/30] IB/Core: Adding support for RoCEV2 Specification

2015-02-19 Thread Somnath Kotur
Hi Bart,
Here's the link to the git tree with the patches

https://github.com/matanb10/linux.git 
branch name: rocev2_rc4

Thanks
Som

 -Original Message-
 From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
 Sent: Thursday, February 19, 2015 1:47 PM
 To: Somnath Kotur; rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org
 Subject: Re: [PATCH v1 00/30] IB/Core: Adding support for RoCEV2
 Specification
 
 On 02/19/15 23:02, Somnath Kotur wrote:
  This series depends on RoCE LAG series (already accepted in net-next
  tree)
 
 Hello Somnath,
 
 Can you make a git tree available with these patches ? These patches do not
 apply cleanly on Dave Miller's latest net-next branch (git commit ID
 fece13ca005a5f559147e9424321f4b5e01272b4; Feb 17, 2015).
 
 Thanks,
 
 Bart.



[PATCH 01/30] IB/core: Add RoCE GID cache

2015-02-18 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

In order to manage multiple types, VLANs and MACs per GID, we
need to store them alongside the GID itself. We store the net device
as well, since GIDs sometimes have to be handled according to the
net device they came from. Since populating the GID table should
be identical for every RoCE provider, the GID table should be
handled in ib_core.

Add a GID cache table that supports lockless find, add and
delete of GIDs. The lockless nature comes from using a unique
sequence number per table entry and detecting whether this
sequence number changed while reading or writing.

To use this RoCE GID cache table, providers must implement a
modify_gid callback. The table is managed exclusively by
roce_gid_cache, and the provider just needs to write
the data to the hardware.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/Makefile |3 +-
 drivers/infiniband/core/core_priv.h  |   24 ++
 drivers/infiniband/core/roce_gid_cache.c |  511 ++
 drivers/infiniband/hw/mlx4/main.c|2 -
 include/rdma/ib_verbs.h  |   55 -
 5 files changed, 591 insertions(+), 4 deletions(-)
 create mode 100644 drivers/infiniband/core/roce_gid_cache.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index acf7367..9b63bdf 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
$(user_access-y)
 
 ib_core-y :=   packer.o ud_header.o verbs.o sysfs.o \
-   device.o fmr_pool.o cache.o netlink.o
+   device.o fmr_pool.o cache.o netlink.o \
+   roce_gid_cache.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
 
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 87d1936..a502daa 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -35,6 +35,7 @@
 
 #include <linux/list.h>
 #include <linux/spinlock.h>
+#include <net/net_namespace.h>
 
 #include <rdma/ib_verbs.h>
 
@@ -51,4 +52,27 @@ void ib_cache_cleanup(void);
 
 int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
struct ib_qp_attr *qp_attr, int *qp_attr_mask);
+
+int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
+  union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_gid_cache_find_gid(struct ib_device *ib_dev, union ib_gid *gid,
+   enum ib_gid_type gid_type, struct net *net,
+   int if_index, u8 *port, u16 *index);
+
+int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,
+   enum ib_gid_type gid_type, u8 port,
+   struct net *net, int if_index, u16 *index);
+
+int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
+
+int roce_add_gid(struct ib_device *ib_dev, u8 port,
+union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_gid(struct ib_device *ib_dev, u8 port,
+union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
+struct net_device *ndev);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
new file mode 100644
index 000..8f6af4a
--- /dev/null
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -0,0 +1,511 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT

[PATCH 14/30] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.

2015-02-18 Thread Somnath Kotur
1. Check and set port capability flags to indicate RoCEv2 support.
2. Change the query_gid hook to return the value from the IB/Core GID Mgmt APIs.
3. Get rid of all the netdev notifier chain subscription code as well as
   the maintenance of the SGID table in memory.
4. Implement the get_netdev hook in the driver.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma.h   |   10 ++
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c|3 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |  233 +--
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |   13 ++
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   31 -
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |4 +
 6 files changed, 63 insertions(+), 231 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 16ee36e..97f971a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -100,6 +100,7 @@ struct ocrdma_dev_attr {
u8 local_ca_ack_delay;
u8 ird;
u8 num_ird_pages;
+   u8 roce_flags;
 };
 
 struct ocrdma_dma_mem {
@@ -575,4 +576,13 @@ static inline u8 ocrdma_is_enabled_and_synced(u32 state)
(state & OCRDMA_STATE_FLAG_SYNC);
 }
 
+static inline bool ocrdma_is_rocev2_supported(struct ocrdma_dev *dev)
+{
+   return (dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV4 <<
+   OCRDMA_ROUDP_FLAGS_SHIFT) ||
+   dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV6 <<
+   OCRDMA_ROUDP_FLAGS_SHIFT)) ?
+   true : false;
+}
+
 #endif
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index c0dda74..cb98911 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -1112,6 +1112,9 @@ static void ocrdma_get_attr(struct ocrdma_dev *dev,
attr->local_ca_ack_delay = (rsp->max_pd_ca_ack_delay &
OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_MASK) >>
OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_SHIFT;
+   attr->roce_flags = (rsp->max_pd_ca_ack_delay &
+   OCRDMA_MBX_QUERY_CFG_L3_TYPE_MASK) >>
+   OCRDMA_MBX_QUERY_CFG_L3_TYPE_SHIFT;
attr->max_mw = rsp->max_mw;
attr->max_mr = rsp->max_mr;
attr->max_mr_size = ((u64)rsp->max_mr_size_hi << 32) |
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 7a2b59a..a81492f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -51,8 +51,6 @@ static LIST_HEAD(ocrdma_dev_list);
 static DEFINE_SPINLOCK(ocrdma_devlist_lock);
 static DEFINE_IDR(ocrdma_dev_id);
 
-static union ib_gid ocrdma_zero_sgid;
-
 void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
 {
u8 mac_addr[6];
@@ -67,135 +65,6 @@ void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
guid[6] = mac_addr[4];
guid[7] = mac_addr[5];
 }
-
-static bool ocrdma_add_sgid(struct ocrdma_dev *dev, union ib_gid *new_sgid)
-{
-   int i;
-   unsigned long flags;
-
-   memset(&ocrdma_zero_sgid, 0, sizeof(union ib_gid));
-
-
-   spin_lock_irqsave(&dev->sgid_lock, flags);
-   for (i = 0; i < OCRDMA_MAX_SGID; i++) {
-   if (!memcmp(&dev->sgid_tbl[i], &ocrdma_zero_sgid,
-   sizeof(union ib_gid))) {
-   /* found free entry */
-   memcpy(&dev->sgid_tbl[i], new_sgid,
-  sizeof(union ib_gid));
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return true;
-   } else if (!memcmp(&dev->sgid_tbl[i], new_sgid,
-  sizeof(union ib_gid))) {
-   /* entry already present, no addition is required. */
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return false;
-   }
-   }
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return false;
-}
-
-static bool ocrdma_del_sgid(struct ocrdma_dev *dev, union ib_gid *sgid)
-{
-   int found = false;
-   int i;
-   unsigned long flags;
-
-
-   spin_lock_irqsave(&dev->sgid_lock, flags);
-   /* first is default sgid, which cannot be deleted. */
-   for (i = 1; i < OCRDMA_MAX_SGID; i++) {
-   if (!memcmp(&dev->sgid_tbl[i], sgid, sizeof(union ib_gid))) {
-   /* found matching entry */
-   memset(&dev->sgid_tbl[i], 0, sizeof(union ib_gid));
-   found = true;
-   break;
-   }
-   }
-   spin_unlock_irqrestore(&dev->sgid_lock, flags);
-   return found;
-}
-
-static int ocrdma_addr_event(unsigned long event

[PATCH 13/30] IB/Core: Changes to the IB Core infrastructure for RoCEv2 support

2015-02-18 Thread Somnath Kotur
1. Choose sgid_index and type from all the matching entries in RDMA-CM
   based on hint from the IP stack.
2. Set hop_limit for the IP Packet based on above hint from IP stack
3. Define a RDMA_NETWORK enum type.
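
The selection described in points 1 and 2 can be sketched in plain user-space C. This is only an illustration under stated assumptions: the enums below are hypothetical stand-ins for the kernel types, path_from_network() is not a kernel function, and DEFAULT_HOPLIMIT mirrors the kernel's IPV6_DEFAULT_HOPLIMIT (64).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel enums named in this patch. */
enum rdma_network_type { RDMA_NETWORK_IB, RDMA_NETWORK_IPV4, RDMA_NETWORK_IPV6 };
enum ib_gid_type { IB_GID_TYPE_IB, IB_GID_TYPE_ROCE_V2 };

#define DEFAULT_HOPLIMIT 64 /* same value as IPV6_DEFAULT_HOPLIMIT */

struct path_hint {
	enum ib_gid_type gid_type;
	uint8_t hop_limit;
};

/*
 * Map the hint left by address resolution to path parameters:
 * a routed (gateway) destination selects the RoCE v2 GID type and a
 * routable hop limit, anything else stays link-local with hop limit 1.
 */
static struct path_hint path_from_network(enum rdma_network_type net)
{
	struct path_hint p = { IB_GID_TYPE_IB, 1 };

	if (net != RDMA_NETWORK_IB) {
		p.gid_type = IB_GID_TYPE_ROCE_V2;
		p.hop_limit = DEFAULT_HOPLIMIT;
	}
	return p;
}
```

Calling path_from_network(RDMA_NETWORK_IPV4) yields the RoCE v2 type with hop limit 64, while RDMA_NETWORK_IB keeps hop limit 1.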

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
---
 drivers/infiniband/core/addr.c  |8 
 drivers/infiniband/core/cma.c   |   10 +-
 drivers/infiniband/core/verbs.c |   70 +--
 include/rdma/ib_addr.h  |1 +
 include/rdma/ib_verbs.h |6 +++
 5 files changed, 62 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 43af7f5..da24c0e 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -257,6 +257,9 @@ static int addr4_resolve(struct sockaddr_in *src_in,
goto put;
}
 
+   if (rt->rt_uses_gateway)
+   addr->network = RDMA_NETWORK_IPV4;
+
ret = dst_fetch_ha(&rt->dst, addr, &fl4.daddr);
 put:
ip_rt_put(rt);
@@ -271,6 +274,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 {
struct flowi6 fl6;
struct dst_entry *dst;
+   struct rt6_info *rt;
int ret;
 
memset(&fl6, 0, sizeof fl6);
@@ -282,6 +286,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
if ((ret = dst->error))
goto put;
 
+   rt = (struct rt6_info *)dst;
if (ipv6_addr_any(&fl6.saddr)) {
ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
 &fl6.daddr, 0, &fl6.saddr);
@@ -305,6 +310,9 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
goto put;
}
 
+   if (rt->rt6i_flags & RTF_GATEWAY)
+   addr->network = RDMA_NETWORK_IPV6;
+
ret = dst_fetch_ha(dst, addr, &fl6.daddr);
 put:
dst_release(dst);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 237f2dd..50635fe 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1952,6 +1952,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 {
struct rdma_route *route = &id_priv->id.route;
struct rdma_addr *addr = &route->addr;
+   enum ib_gid_type network_gid_type;
struct cma_work *work;
int ret;
struct net_device *ndev = NULL;
@@ -1990,7 +1991,14 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
&route->path_rec->dgid);
 
-   route->path_rec->hop_limit = 1;
+   /* Use the hint from IP Stack to select GID Type */
+   network_gid_type = ib_network_to_gid_type(addr->dev_addr.network);
+   if (addr->dev_addr.network != RDMA_NETWORK_IB) {
+   route->path_rec->gid_type = network_gid_type;
+   route->path_rec->hop_limit = IPV6_DEFAULT_HOPLIMIT;
+   } else {
+   route->path_rec->hop_limit = 1;
+   }
route->path_rec->reversible = 1;
route->path_rec->pkey = cpu_to_be16(0xffff);
route->path_rec->mtu_selector = IB_SA_EQ;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 0fdac14..5478c5d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -219,37 +219,6 @@ static int ib_get_grh_header_version(const void *h)
return 6;
 }
 
-static int ib_get_dgid_sgid_by_grh(const void *h,
-  enum rdma_network_type net_type,
-  union ib_gid *dgid, union ib_gid *sgid)
-{
-   switch (net_type) {
-   case RDMA_NETWORK_IPV4: {
-   const struct iphdr *ip4h = (struct iphdr *)(h + 20);
-
-   ipv6_addr_set_v4mapped(ip4h->daddr, (struct in6_addr *)dgid);
-   ipv6_addr_set_v4mapped(ip4h->saddr, (struct in6_addr *)sgid);
-   return 0;
-   }
-   case RDMA_NETWORK_IPV6: {
-   struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
-
-   memcpy(dgid, &ip6h->daddr, sizeof(*dgid));
-   memcpy(sgid, &ip6h->saddr, sizeof(*sgid));
-   return 0;
-   }
-   case RDMA_NETWORK_IB: {
-   struct ib_grh *grh = (struct ib_grh *)h;
-
-   memcpy(dgid, &grh->dgid, sizeof(*dgid));
-   memcpy(sgid, &grh->sgid, sizeof(*sgid));
-   return 0;
-   }
-   }
-
-   return -EINVAL;
-}
-
 static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
 u8 port_num,
 const struct ib_grh *grh)
@@ -305,6 +274,40 @@ static int get_sgid_index_from_eth(struct ib_device 
*device, u8 port_num,
 context, gid_index);
 }
 
+static int get_gids_from_grh(struct ib_grh *grh, enum rdma_network_type

[PATCH 00/30] IB/Core: Adding support for RoCEV2 Specification

2015-02-18 Thread Somnath Kotur
Hi Roland,

This patch series was created out of collaboration between Emulex and Mellanox.
While Emulex sent out the RoCEV2 patch first to the community, Mellanox which
was also working on some core infrastructure changes from the ground-up towards
RoCEV2 felt that the RoCEV2 patch would be better served if done on top of
their basic infrastructure changes to associate entities like MAC, VLAN,
IP Address with GIDs and thereby move GID Table Management from HW Vendor
drivers to IB/Core.
This patchset is the result of joint development effort between the two teams.

Patch 0001 creates a new infrastructure for storing GIDs and their attributes in
IB/core. This infrastructure supports lock-less reads of GIDs using a sequence
number. The data structure is initialized only for RoCE ports.
Every GID has meta information that describes its related net device and its type.
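
As a rough illustration of what a sequence-number based lock-less read looks like, here is a user-space sketch using C11 atomics. It is not the code from patch 0001: struct gid_entry, gid_write() and gid_read() are hypothetical names, a single writer is assumed, and the kernel would use its own primitives and barriers instead of sequentially consistent atomics.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical table entry: the sequence number is odd while a write is in flight. */
struct gid_entry {
	atomic_uint seq;
	_Atomic uint8_t gid[16];
};

/* Single writer: bump to odd, update the data, bump back to even. */
static void gid_write(struct gid_entry *e, const uint8_t gid[16])
{
	atomic_fetch_add(&e->seq, 1);
	for (int i = 0; i < 16; i++)
		atomic_store(&e->gid[i], gid[i]);
	atomic_fetch_add(&e->seq, 1);
}

/*
 * Reader takes no lock. Returns 0 with a consistent copy in out[],
 * or -1 if a write was in progress or completed meanwhile (caller retries).
 */
static int gid_read(struct gid_entry *e, uint8_t out[16])
{
	unsigned int before = atomic_load(&e->seq);

	if (before & 1)
		return -1;
	for (int i = 0; i < 16; i++)
		out[i] = atomic_load(&e->gid[i]);
	return atomic_load(&e->seq) == before ? 0 : -1;
}
```

A reader would typically loop on gid_read() until it returns 0; writers never wait for readers, which is the point of the scheme.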

Patches 0002, 0004 and 0005 add population of this table for various cases
based on net device events. We always enable default gids for an active device
(an active device is defined here as a device that doesn't have a bonding master
or is the current active slave). This is done in order to allow loopback traffic.

Patch 0005 adds proper bonding support - only the active slaves retain their
master's IP based gids and default gids.

This whole concept needs to fit the existing sysfs model, thus patch 0006 adds
sysfs entries that represent the net device and gid type related to each gid.

Patches 0002, 0007, 0008 and 0009 change the rest of IB/core to fit the new
model.
Instead of storing smac and vlan, we store either if_index, gid and gid_type
or sgid_index. Either set suffices in order to resolve all the required
Ethernet parameters. ib_init_ah_from_wc was changed so that when a wc
arrives, we query all the net devices in all namespaces trying to find a match.
This match is later used to find an appropriate sgid_index.

Patch 0010 is used in order to configure the default mode of the cma.
In order to avoid changing existing rdma-cm applications, we add a configfs
entry that states, for each ib device, the default RoCE mode.

Patch 0011 mainly corrects the hop limit value and adds a hint about RoCE type
according to whether we have a gateway. This is the patch that makes it possible
for applications to seamlessly interop between RoCE V1 and V2 without undergoing
any changes themselves.

The rest of the patches add support for ocrdma and mlx4 devices.

This series depends on RoCE LAG series (already accepted in net-next tree)

Thanks,
Somnath, Devesh, Moni and Matan

Devesh Sharma (3):
  RDMA/ocrdma: changes to support RoCE-v2 in UD path
  RDMA/ocrdma: changes to support RoCE-v2 in RC path
  RDMA/ocrdma: changes to support user AH creation

Matan Barak (12):
  IB/core: Add RoCE GID cache
  IB/core: Add kref to IB devices
  IB/core: Add RoCE GID population
  IB/core: Add default GID for RoCE GID Cache
  IB/core: Add RoCE cache bonding support
  IB/core: GID attribute should be returned from verbs API and cache
API
  IB/core: Report gid_type and gid_ndev through sysfs
  IB/core: Support find sgid index using a filter function
  IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
  IB/core: Add gid_type to path and rdma_id_private
  IB/core: Add rdma_network_type to wc
  IB/cma: Add configfs for rdma_cm

Moni Shoua (13):
  IB/mlx4: Remove gid table management for RoCE
  IB/mlx4: Replace spin_lock with rw_semaphore
  IB/mlx4: Lock with RCU instead of RTNL
  net/mlx4: Postpone the registration of net_device
  IB/mlx4: Advertise RoCE support in port capabilities
  IB/mlx4: Implement ib_device callback - get_netdev
  IB/mlx4: Implement ib_device callback - modify_gid
  IB/mlx4: Configure device to work in RoCEv2
  IB/mlx4: Translate cache gid index to real index
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
  IB/mlx4: Create and use another QP1 for RoCEv2
  IB/cma: Join and leave multicast groups with IGMP

Somnath Kotur (2):
  IB/Core: Changes to the IB Core infrastructure for RoCEv2 support
  RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table
mgmt to IB/Core.

 drivers/infiniband/core/Makefile   |5 +-
 drivers/infiniband/core/addr.c |   11 +-
 drivers/infiniband/core/cache.c|  249 +++--
 drivers/infiniband/core/cm.c   |   49 +--
 drivers/infiniband/core/cma.c  |  229 ++--
 drivers/infiniband/core/cma_configfs.c |  222 +++
 drivers/infiniband/core/core_priv.h|   88 +++-
 drivers/infiniband/core/device.c   |  150 +-
 drivers/infiniband/core/mad.c  |2 +-
 drivers/infiniband/core/multicast.c|3 +-
 drivers/infiniband/core/roce_gid_cache.c   |  755 
 drivers/infiniband/core/roce_gid_mgmt.c|  703 ++
 drivers

[PATCH 05/30] IB/core: Add RoCE cache bonding support

2015-02-18 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Bonding needs special handling: when working in
active-backup mode, only the currently selected slave
should occupy the default GIDs and the master's GID.
Listen to bonding events and add the required GIDs
only to the active slave in the RoCE cache GID table.
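
The rule above can be modelled outside the kernel roughly as follows. This is a simplified sketch, not the patch's code: struct netdev is a hypothetical stand-in for struct net_device, VLAN handling is left out, and port_serves_netdev() only mirrors the decision made by is_eth_port_of_netdev() in this patch.

```c
#include <stddef.h>

enum bonding_slave_state {
	BONDING_SLAVE_STATE_ACTIVE,
	BONDING_SLAVE_STATE_INACTIVE,
	BONDING_SLAVE_STATE_NA
};

/* Hypothetical, minimal model of a net device for this sketch. */
struct netdev {
	const struct netdev *master;       /* upper (bond) device, or NULL */
	const struct netdev *active_slave; /* set on bond masters only */
	int is_bond_master;
};

static enum bonding_slave_state slave_state(const struct netdev *idev,
					    const struct netdev *upper)
{
	if (upper && upper->is_bond_master && upper->active_slave)
		return idev == upper->active_slave ?
			BONDING_SLAVE_STATE_ACTIVE : BONDING_SLAVE_STATE_INACTIVE;
	return BONDING_SLAVE_STATE_NA;
}

/* Should the RoCE port backed by idev carry the GIDs of net device ndev? */
static int port_serves_netdev(const struct netdev *idev,
			      const struct netdev *ndev)
{
	if (!idev)
		return 0;
	return ndev == idev ||
	       (ndev == idev->master &&
		slave_state(idev, idev->master) != BONDING_SLAVE_STATE_INACTIVE);
}
```

With a bond whose active slave is eth0, only the port behind eth0 serves the bond's GIDs; the port behind an inactive eth1 still serves eth1's own GIDs in this model.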

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/roce_gid_mgmt.c |  137 ++-
 drivers/net/bonding/bond_options.c  |   13 ---
 include/net/bonding.h   |7 ++
 3 files changed, 140 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index b65eab8..e724295 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -37,6 +37,7 @@
 
 /* For in6_dev_get/in6_dev_put */
 #include <net/addrconf.h>
+#include <net/bonding.h>
 
 #include <rdma/ib_cache.h>
 #include <rdma/ib_addr.h>
@@ -127,12 +128,40 @@ static void update_gid(enum gid_op_type gid_op, struct 
ib_device *ib_dev,
}
 }
 
+#define IS_NETDEV_BONDING_MASTER(ndev) \
+   (((ndev)->priv_flags & \
+ (IFF_BONDING | IFF_MASTER)) == (IFF_BONDING | IFF_MASTER))
+
+enum bonding_slave_state {
+   BONDING_SLAVE_STATE_ACTIVE,
+   BONDING_SLAVE_STATE_INACTIVE,
+   BONDING_SLAVE_STATE_NA
+};
+
+static enum bonding_slave_state is_eth_active_slave_of_bonding(struct net_device *idev,
+  struct net_device *upper)
+{
+   if (upper && IS_NETDEV_BONDING_MASTER(upper)) {
+   struct net_device *pdev;
+
+   rcu_read_lock();
+   pdev = bond_option_active_slave_get_rcu(netdev_priv(upper));
+   rcu_read_unlock();
+   if (pdev)
+   return idev == pdev ? BONDING_SLAVE_STATE_ACTIVE :
+   BONDING_SLAVE_STATE_INACTIVE;
+   }
+
+   return BONDING_SLAVE_STATE_NA;
+}
+
 static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
 struct net_device *idev, void *cookie)
 {
struct net_device *rdev;
struct net_device *mdev;
struct net_device *ndev = (struct net_device *)cookie;
+   int res;
 
if (!idev)
return 0;
@@ -140,9 +169,16 @@ static int is_eth_port_of_netdev(struct ib_device *ib_dev, 
u8 port,
rcu_read_lock();
mdev = netdev_master_upper_dev_get_rcu(idev);
rdev = rdma_vlan_dev_real_dev(ndev);
-   rcu_read_unlock();
+   if (!rdev)
+   rdev = ndev;
 
-   return (rdev ? rdev : ndev) == (mdev ? mdev : idev);
+   res = (rdev == idev ||
+  (rdev == mdev &&
+   is_eth_active_slave_of_bonding(idev, mdev) !=
+   BONDING_SLAVE_STATE_INACTIVE));
+
+   rcu_read_unlock();
+   return res;
 }
 
 static int pass_all_filter(struct ib_device *ib_dev, u8 port,
@@ -151,6 +187,26 @@ static int pass_all_filter(struct ib_device *ib_dev, u8 
port,
return 1;
 }
 
+static int bonding_slaves_filter(struct ib_device *ib_dev, u8 port,
+struct net_device *idev, void *cookie)
+{
+   struct net_device *mdev;
+   struct net_device *rdev;
+   struct net_device *ndev = (struct net_device *)cookie;
+
+   rdev = rdma_vlan_dev_real_dev(ndev);
+
+   ndev = rdev ? rdev : ndev;
+   if (!idev || !IS_NETDEV_BONDING_MASTER(ndev))
+   return 0;
+
+   rcu_read_lock();
+   mdev = netdev_master_upper_dev_get_rcu(idev);
+   rcu_read_unlock();
+
+   return ndev == mdev;
+}
+
 static void netdevice_event_work_handler(struct work_struct *_work)
 {
struct netdev_event_work *work =
@@ -186,8 +242,16 @@ static void enum_netdev_default_gids(struct ib_device 
*ib_dev,
 {
unsigned long gid_type_mask;
 
-   if (idev != ndev)
+   rcu_read_lock();
+   if (!idev ||
+   ((idev != ndev && netdev_master_upper_dev_get_rcu(idev) != ndev) ||
+is_eth_active_slave_of_bonding(idev,
+   netdev_master_upper_dev_get_rcu(idev)) ==
+BONDING_SLAVE_STATE_INACTIVE)) {
+   rcu_read_unlock();
return;
+   }
+   rcu_read_unlock();
 
gid_type_mask = gid_type_mask_support(ib_dev, port);
 
@@ -195,6 +259,35 @@ static void enum_netdev_default_gids(struct ib_device 
*ib_dev,
   ROCE_GID_CACHE_DEFAULT_MODE_SET);
 }
 
+static void bond_delete_netdev_default_gids(struct ib_device *ib_dev,
+   u8 port, struct net_device *ndev,
+   struct net_device *idev)
+{
+   struct net_device *upper;
+
+   if (!idev)
+   return;
+
+   rcu_read_lock();
+   upper

[PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-18 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Previously, we resolved the dmac and took the smac and vlan
from the resolved address. Change that to finding a net
device that matches the IP and vlan of the network packet
and querying the RoCE GID cache for this net device,
GID and GID type.

ocrdma driver changes were done by Somnath Kotur <somnath.ko...@emulex.com>

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/addr.c   |3 +-
 drivers/infiniband/core/cm.c |   30 --
 drivers/infiniband/core/cma.c|9 --
 drivers/infiniband/core/core_priv.h  |4 +-
 drivers/infiniband/core/sa_query.c   |4 -
 drivers/infiniband/core/ucma.c   |1 -
 drivers/infiniband/core/uverbs_cmd.c |6 +-
 drivers/infiniband/core/verbs.c  |  159 +
 drivers/infiniband/hw/mlx4/ah.c  |   15 +++-
 drivers/infiniband/hw/mlx4/mad.c |   12 ++-
 drivers/infiniband/hw/mlx4/mcg.c |2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |2 +-
 drivers/infiniband/hw/mlx4/qp.c  |   42 ++--
 drivers/infiniband/hw/ocrdma/ocrdma.h|1 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   20 +++--
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c |   17 ++-
 include/rdma/ib_addr.h   |2 +-
 include/rdma/ib_sa.h |2 -
 include/rdma/ib_verbs.h  |7 +-
 19 files changed, 183 insertions(+), 155 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f80da50..43af7f5 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -458,7 +458,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr,
 }
 
 int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
-  u16 *vlan_id)
+  u16 *vlan_id, int if_index)
 {
int ret = 0;
struct rdma_dev_addr dev_addr;
@@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
return ret;
 
memset(&dev_addr, 0, sizeof(dev_addr));
+   dev_addr.bound_dev_if = if_index;
 
ctx.addr = &dev_addr;
init_completion(&ctx.comp);
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d88f2ae..7974e74 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -178,8 +178,6 @@ struct cm_av {
struct ib_ah_attr ah_attr;
u16 pkey_index;
u8 timeout;
-   u8  valid;
-   u8  smac[ETH_ALEN];
 };
 
 struct cm_work {
@@ -382,7 +380,6 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 &av->ah_attr);
av->timeout = path->packet_life_time + 1;
 
-   av->valid = 1;
return 0;
 }
 
@@ -1563,7 +1560,6 @@ static int cm_req_handler(struct cm_work *work)
cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
 
memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, ETH_ALEN);
-   work->path[0].vlan_id = cm_id_priv->av.ah_attr.vlan_id;
ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
if (ret) {
ib_get_cached_gid(work->port->cm_dev->ib_device,
@@ -3511,32 +3507,6 @@ static int cm_init_qp_rtr_attr(struct cm_id_private *cm_id_priv,
*qp_attr_mask = IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU |
IB_QP_DEST_QPN | IB_QP_RQ_PSN;
qp_attr->ah_attr = cm_id_priv->av.ah_attr;
-   if (!cm_id_priv->av.valid) {
-   spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-   return -EINVAL;
-   }
-   if (cm_id_priv->av.ah_attr.vlan_id != 0xffff) {
-   qp_attr->vlan_id = cm_id_priv->av.ah_attr.vlan_id;
-   *qp_attr_mask |= IB_QP_VID;
-   }
-   if (!is_zero_ether_addr(cm_id_priv->av.smac)) {
-   memcpy(qp_attr->smac, cm_id_priv->av.smac,
-  sizeof(qp_attr->smac));
-   *qp_attr_mask |= IB_QP_SMAC;
-   }
-   if (cm_id_priv->alt_av.valid) {
-   if (cm_id_priv->alt_av.ah_attr.vlan_id != 0xffff) {
-   qp_attr->alt_vlan_id =
-   cm_id_priv->alt_av.ah_attr.vlan_id;
-   *qp_attr_mask |= IB_QP_ALT_VID;
-   }
-   if (!is_zero_ether_addr(cm_id_priv->alt_av.smac)) {
-   memcpy(qp_attr->alt_smac,
-  cm_id_priv->alt_av.smac,
-  sizeof(qp_attr->alt_smac));
-   *qp_attr_mask |= IB_QP_ALT_SMAC

[PATCH 07/30] IB/core: Report gid_type and gid_ndev through sysfs

2015-02-18 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Since we've added GID attributes to the RoCE GID table,
users need a convenient way to query them.
Add the GID type and the related net device to IB's sysfs.

The new attributes are available in:
/sys/class/infiniband/<device>/ports/<port>/gid_attrs/ndevs/<index>
/sys/class/infiniband/<device>/ports/<port>/gid_attrs/types/<index>

The index corresponds to the index of the respective GID in:
/sys/class/infiniband/<device>/ports/<port>/gids/<index>
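
For user space, the layout above means a GID's attributes are found by substituting the device name, port number and GID index into the path. A small sketch of building such a path follows; gid_attr_path() is a hypothetical helper, and the device name used in the note after it is only an example.

```c
#include <stdio.h>

/*
 * Build the sysfs path of one GID attribute. 'attr' is expected to be
 * "ndevs" or "types", matching the two directories added by this patch.
 * Returns what snprintf() returns: the length the full path needs.
 */
static int gid_attr_path(char *buf, size_t len, const char *device,
			 unsigned int port, const char *attr,
			 unsigned int index)
{
	return snprintf(buf, len,
			"/sys/class/infiniband/%s/ports/%u/gid_attrs/%s/%u",
			device, port, attr, index);
}
```

For a device named mlx4_0, port 1, attribute "types" and index 0, this produces /sys/class/infiniband/mlx4_0/ports/1/gid_attrs/types/0, which pairs with gids/0 under the same port directory.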

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/core_priv.h  |2 +
 drivers/infiniband/core/roce_gid_cache.c |   13 ++
 drivers/infiniband/core/sysfs.c  |  185 +-
 3 files changed, 198 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 6ab40a9..411672f 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -71,6 +71,8 @@ void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
  roce_netdev_callback cb,
  void *cookie);
 
+const char *roce_gid_cache_type_str(enum ib_gid_type gid_type);
+
 int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
   union ib_gid *gid, struct ib_gid_attr *attr);
 
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
index fc6a4e6..895b9c1 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -48,6 +48,11 @@ enum gid_attr_find_mask {
GID_ATTR_FIND_MASK_NETDEV   = 1UL << 1,
 };
 
+static const char * const gid_type_str[] = {
+   [IB_GID_TYPE_IB]= "IB/RoCE V1\n",
+   [IB_GID_TYPE_ROCE_V2]   = "RoCE V2\n",
+};
+
 static inline int start_port(struct ib_device *ib_dev)
 {
return (ib_dev->node_type == RDMA_NODE_IB_SWITCH) ? 0 : 1;
@@ -58,6 +63,14 @@ struct dev_put_rcu {
struct net_device   *ndev;
 };
 
+const char *roce_gid_cache_type_str(enum ib_gid_type gid_type)
+{
+   if (gid_type < ARRAY_SIZE(gid_type_str) && gid_type_str[gid_type])
+   return gid_type_str[gid_type];
+
+   return "Invalid GID type";
+}
+
 static void put_ndev(struct rcu_head *rcu)
 {
struct dev_put_rcu *put_rcu =
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 5cee246..51f0e32 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -37,12 +37,22 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/string.h>
+#include <linux/netdevice.h>
 
 #include <rdma/ib_mad.h>
 
+struct ib_port;
+
+struct gid_attr_group {
+   struct ib_port  *port;
+   struct kobject  kobj;
+   struct attribute_group  ndev;
+   struct attribute_group  type;
+};
 struct ib_port {
struct kobject kobj;
struct ib_device  *ibdev;
+   struct gid_attr_group *gid_attr_group;
struct attribute_group gid_group;
struct attribute_group pkey_group;
u8 port_num;
@@ -84,6 +94,24 @@ static const struct sysfs_ops port_sysfs_ops = {
.show = port_attr_show
 };
 
+static ssize_t gid_attr_show(struct kobject *kobj,
+struct attribute *attr, char *buf)
+{
+   struct port_attribute *port_attr =
+   container_of(attr, struct port_attribute, attr);
+   struct ib_port *p = container_of(kobj, struct gid_attr_group,
+kobj)->port;
+
+   if (!port_attr->show)
+   return -EIO;
+
+   return port_attr->show(p, port_attr, buf);
+}
+
+static const struct sysfs_ops gid_attr_sysfs_ops = {
+   .show = gid_attr_show
+};
+
 static ssize_t state_show(struct ib_port *p, struct port_attribute *unused,
  char *buf)
 {
@@ -281,6 +309,46 @@ static struct attribute *port_default_attrs[] = {
NULL
 };
 
+static size_t print_ndev(struct ib_gid_attr *gid_attr, char *buf)
+{
+   if (!gid_attr->ndev)
+   return -EINVAL;
+
+   return sprintf(buf, "%s\n", gid_attr->ndev->name);
+}
+
+static size_t print_gid_type(struct ib_gid_attr *gid_attr, char *buf)
+{
+   return sprintf(buf, "%s", roce_gid_cache_type_str(gid_attr->gid_type));
+}
+
+static ssize_t _show_port_gid_attr(struct ib_port *p,
+  struct port_attribute *attr,
+  char *buf,
+  size_t (*print)(struct ib_gid_attr *gid_attr,
+  char *buf))
+{
+   struct port_table_attribute *tab_attr =
+   container_of(attr, struct port_table_attribute, attr);
+   union ib_gid gid;
+   struct ib_gid_attr gid_attr;
+   ssize_t ret;
+   va_list args;
+
+   rcu_read_lock

[PATCH 04/30] IB/core: Add default GID for RoCE GID Cache

2015-02-18 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

When RoCE is used, a default GID address should be generated
for every supported RoCE type. These default GID addresses are
generated based on the IPv6 link-local address, but in contrast
to the GID based on the regular IPv6 link-local (as we generate
GID per IP address), these GIDs are also available if the net
device is down (in order to support loopback).
Moreover, these default GID addresses can't be deleted.
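
To make the address derivation concrete, here is a user-space sketch of building such a default GID from a port MAC address: the fe80::/64 link-local prefix followed by the modified EUI-64 form of the MAC. default_gid_from_mac() is a hypothetical name; the sketch assumes a plain 48-bit Ethernet address and no dev_id handling.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fill gid[16] with fe80::/64 plus the modified EUI-64 interface ID
 * derived from a 48-bit MAC: insert ff:fe in the middle and flip the
 * universal/local bit of the first octet.
 */
static void default_gid_from_mac(const uint8_t mac[6], uint8_t gid[16])
{
	memset(gid, 0, 16);
	gid[0] = 0xfe;          /* link-local subnet prefix fe80::/64 */
	gid[1] = 0x80;
	gid[8] = mac[0] ^ 0x02; /* flip universal/local bit */
	gid[9] = mac[1];
	gid[10] = mac[2];
	gid[11] = 0xff;
	gid[12] = 0xfe;
	gid[13] = mac[3];
	gid[14] = mac[4];
	gid[15] = mac[5];
}
```

Because the result depends only on the MAC address, it can be computed while the net device is down, which fits the loopback requirement described above.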

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/core_priv.h  |   10 
 drivers/infiniband/core/roce_gid_cache.c |   86 ++
 drivers/infiniband/core/roce_gid_mgmt.c  |   43 ---
 include/net/addrconf.h   |   31 +++
 net/ipv6/addrconf.c  |   31 ---
 5 files changed, 163 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 12797d9..6ab40a9 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -84,6 +84,16 @@ int roce_gid_cache_find_gid_by_port(struct ib_device 
*ib_dev, union ib_gid *gid,
 
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
+enum roce_gid_cache_default_mode {
+   ROCE_GID_CACHE_DEFAULT_MODE_SET,
+   ROCE_GID_CACHE_DEFAULT_MODE_DELETE
+};
+
+void roce_gid_cache_set_default_gid(struct ib_device *ib_dev, u8 port,
+   struct net_device *ndev,
+   unsigned long gid_type_mask,
+   enum roce_gid_cache_default_mode mode);
+
 int roce_gid_cache_setup(void);
 void roce_gid_cache_cleanup(void);
 
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
index f072533..fc6a4e6 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -34,6 +34,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <rdma/ib_cache.h>
+#include <net/addrconf.h>
 
 #include "core_priv.h"
 
@@ -176,12 +177,19 @@ static int find_gid(struct ib_roce_gid_cache *cache, 
union ib_gid *gid,
return -1;
 }
 
+static void make_default_gid(struct  net_device *dev, union ib_gid *gid)
+{
+   gid->global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
+   addrconf_ifid_eui48(&gid->raw[8], dev);
+}
+
 int roce_add_gid(struct ib_device *ib_dev, u8 port,
 union ib_gid *gid, struct ib_gid_attr *attr)
 {
struct ib_roce_gid_cache *cache;
int ix;
int ret = 0;
+   struct net_device *idev;
 
if (!ib_dev->cache.roce_gid_cache)
return -ENOSYS;
@@ -191,6 +199,22 @@ int roce_add_gid(struct ib_device *ib_dev, u8 port,
if (!cache->active)
return -ENOSYS;
 
+   if (ib_dev->get_netdev) {
+   rcu_read_lock();
+   idev = ib_dev->get_netdev(ib_dev, port);
+   if (attr->ndev != idev) {
+   union ib_gid default_gid;
+
+   /* Adding default GIDs is not permitted */
+   make_default_gid(idev, &default_gid);
+   if (!memcmp(gid, &default_gid, sizeof(*gid))) {
+   rcu_read_unlock();
+   return -EPERM;
+   }
+   }
+   rcu_read_unlock();
+   }
+
mutex_lock(&cache->lock);
 
ix = find_gid(cache, gid, attr, GID_ATTR_FIND_MASK_GID_TYPE |
@@ -215,6 +239,7 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
 union ib_gid *gid, struct ib_gid_attr *attr)
 {
struct ib_roce_gid_cache *cache;
+   union ib_gid default_gid;
int ix;
 
if (!ib_dev->cache.roce_gid_cache)
@@ -225,6 +250,13 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
if (!cache->active)
return -ENOSYS;
 
+   if (attr->ndev) {
+   /* Deleting default GIDs is not permitted */
+   make_default_gid(attr->ndev, &default_gid);
+   if (!memcmp(gid, &default_gid, sizeof(*gid)))
+   return -EPERM;
+   }
+
mutex_lock(&cache->lock);
 
ix = find_gid(cache, gid, attr,
@@ -437,6 +469,60 @@ static void set_roce_gid_cache_active(struct ib_roce_gid_cache *cache,
cache->active = active;
 }
 
+void roce_gid_cache_set_default_gid(struct ib_device *ib_dev, u8 port,
+   struct net_device *ndev,
+   unsigned long gid_type_mask,
+   enum roce_gid_cache_default_mode mode)
+{
+   union ib_gid gid;
+   struct ib_gid_attr gid_attr;
+   struct ib_roce_gid_cache *cache;
+   unsigned int gid_type;
+   unsigned int gid_index = 0;
+
+   cache  = ib_dev->cache.roce_gid_cache[port - 1

[PATCH 23/30] IB/mlx4: Implement ib_device callback - get_netdev

2015-02-18 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

This is a new callback that is required for RoCEv2 support.
In port aggregation mode it must return the netdev of the
active port, so the mlx4 core driver needs support for
identifying that port.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c |   17 +
 drivers/net/ethernet/mellanox/mlx4/main.c |   18 ++
 include/linux/mlx4/driver.h   |1 +
 3 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bf87a95..38061a0 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1527,6 +1527,22 @@ unlock:
mutex_unlock(&ibdev->qp1_proxy_lock[port - 1]);
 }
 
+static struct net_device *mlx4_ib_get_netdev(struct ib_device *device, u8 port_num)
+{
+   struct mlx4_ib_dev *ibdev = to_mdev(device);
+
+   if (mlx4_is_bonded(ibdev->dev)) {
+   u8 true_port_num;
+
+   if (!mlx4_port_map_get(ibdev->dev, port_num, &true_port_num))
+   port_num = true_port_num;
+   else
+   return NULL;
+   }
+
+   return mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+}
+
 static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 struct net_device *dev,
 unsigned long event)
@@ -1806,6 +1822,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
ibdev->ib_dev.attach_mcast  = mlx4_ib_mcg_attach;
ibdev->ib_dev.detach_mcast  = mlx4_ib_mcg_detach;
ibdev->ib_dev.process_mad   = mlx4_ib_process_mad;
+   ibdev->ib_dev.get_netdev= mlx4_ib_get_netdev;
 
if (!mlx4_is_slave(ibdev->dev)) {
ibdev->ib_dev.alloc_fmr = mlx4_ib_fmr_alloc;
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 1893a57..6311897 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1237,6 +1237,24 @@ int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p)
 }
 EXPORT_SYMBOL_GPL(mlx4_port_map_set);
 
+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport)
+{
+   struct mlx4_priv *priv = mlx4_priv(dev);
+
+   if (!pport)
+   return -EINVAL;
+   *pport = 0;
+
+   if (vport == 1)
+   *pport = priv->v2p.port1;
+   else if (vport == 2)
+   *pport = priv->v2p.port2;
+   if (!*pport)
+   return -EINVAL;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(mlx4_port_map_get);
+
 static int mlx4_load_fw(struct mlx4_dev *dev)
 {
struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 5a06d96..a992971 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -81,6 +81,7 @@ struct mlx4_port_map {
 };
 
 int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p);
+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport);
 
 void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, int port);
 
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 26/30] IB/mlx4: Translate cache gid index to real index

2015-02-18 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

When a QP is modified with a path, the given sgid_index is not necessarily
the index that the HW knows. This is due to optimizations that can save
space in the HW table. Therefore, translation is required.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/qp.c |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 9731c07..b06e9fc 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1256,14 +1256,18 @@ static int _mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
path->static_rate = 0;
 
if (ah->ah_flags & IB_AH_GRH) {
-   if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len[port]) {
+   int real_sgid_index = mlx4_ib_gid_index_to_real_index(dev,
+ port,
+ ah->grh.sgid_index);
+
+   if (real_sgid_index >= dev->dev->caps.gid_table_len[port]) {
pr_err("sgid_index (%u) too large. max is %d\n",
-  ah->grh.sgid_index, dev->dev->caps.gid_table_len[port] - 1);
+  real_sgid_index, dev->dev->caps.gid_table_len[port] - 1);
return -1;
}
 
path->grh_mylmc |= 1 << 7;
-   path->mgid_index = ah->grh.sgid_index;
+   path->mgid_index = real_sgid_index;
path->hop_limit  = ah->grh.hop_limit;
path->tclass_flowlabel =
cpu_to_be32((ah->grh.traffic_class << 20) |
-- 
1.7.1



[PATCH 11/30] IB/core: Add rdma_network_type to wc

2015-02-18 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

Providers should tell IB core the wc's network type.
This is used in order to search for the proper GID in the
GID table. When using HCAs that can't provide this info,
IB core tries to examine the packet in depth and extract
the GID type by itself.
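
The packet examination can be pictured with a user-space sketch: the 40 bytes where a GRH would sit hold either an IPv6-style header or, in their last 20 bytes, an IPv4 header. The code below follows the same decision order as ib_get_grh_header_version() in this patch but works on raw bytes; the function names and the byte-wise checksum helper are assumptions made for this illustration.

```c
#include <stdint.h>

/* One's-complement IPv4 header checksum over 20 bytes, with the
 * checksum field (bytes 10-11) treated as zero. */
static uint16_t ipv4_header_csum(const uint8_t hdr[20])
{
	uint32_t sum = 0;

	for (int i = 0; i < 20; i += 2) {
		if (i == 10)
			continue;
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	}
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Returns 4 or 6 for the detected header version, 0 if neither fits. */
static int grh_header_version(const uint8_t h[40])
{
	const uint8_t *ip4 = h + 20; /* an IPv4 header fills the last 20 bytes */
	unsigned int v6 = h[0] >> 4;
	unsigned int v4 = ip4[0] >> 4;
	unsigned int ihl = ip4[0] & 0x0f;
	uint16_t stored = (uint16_t)((ip4[10] << 8) | ip4[11]);

	if (v6 != 6)
		return v4 == 4 ? 4 : 0;
	/* first nibble says 6, but the bytes may still hold IPv4 */
	if (ihl != 5)
		return 6;
	/* a matching IPv4 header checksum is taken as proof of IPv4 */
	return stored == ipv4_header_csum(ip4) ? 4 : 6;
}
```

The checksum comparison is what resolves the ambiguous case where the first nibble reads 6 but the trailing 20 bytes also look like a plausible IPv4 header.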

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/verbs.c |  106 +--
 include/rdma/ib_verbs.h |   30 +++
 2 files changed, 131 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2c54d31..0fdac14 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -195,8 +195,84 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 }
 EXPORT_SYMBOL(ib_create_ah);
 
+static int ib_get_grh_header_version(const void *h)
+{
+   const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+   struct iphdr ip4h_checked;
+   const struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+
+   if (ip6h->version != 6)
+       return (ip4h->version == 4) ? 4 : 0;
+   /* version may be 6 or 4 */
+   if (ip4h->ihl != 5) /* IPv4 header length must be 5 for RR */
+       return 6;
+   /* Verify checksum.
+    * We can't write on scattered buffers so we need to copy to
+    * temp buffer.
+    */
+   memcpy(&ip4h_checked, ip4h, sizeof(ip4h_checked));
+   ip4h_checked.check = 0;
+   ip4h_checked.check = ip_fast_csum((u8 *)&ip4h_checked, 5);
+   /* if IPv4 header checksum is OK, believe it */
+   if (ip4h->check == ip4h_checked.check)
+       return 4;
+   return 6;
+}
+
+static int ib_get_dgid_sgid_by_grh(const void *h,
+  enum rdma_network_type net_type,
+  union ib_gid *dgid, union ib_gid *sgid)
+{
+   switch (net_type) {
+   case RDMA_NETWORK_IPV4: {
+   const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+
+   ipv6_addr_set_v4mapped(ip4h-daddr, (struct in6_addr *)dgid);
+   ipv6_addr_set_v4mapped(ip4h-saddr, (struct in6_addr *)sgid);
+   return 0;
+   }
+   case RDMA_NETWORK_IPV6: {
+   struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+
+   memcpy(dgid, ip6h-daddr, sizeof(*dgid));
+   memcpy(sgid, ip6h-saddr, sizeof(*sgid));
+   return 0;
+   }
+   case RDMA_NETWORK_IB: {
+   struct ib_grh *grh = (struct ib_grh *)h;
+
+   memcpy(dgid, grh-dgid, sizeof(*dgid));
+   memcpy(sgid, grh-sgid, sizeof(*sgid));
+   return 0;
+   }
+   }
+
+   return -EINVAL;
+}
+
+static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
+u8 port_num,
+const struct ib_grh *grh)
+{
+   int grh_version;
+
+   if (rdma_port_get_link_layer(device, port_num) == 
IB_LINK_LAYER_INFINIBAND)
+   return RDMA_NETWORK_IB;
+
+   grh_version = ib_get_grh_header_version(grh);
+
+   if (grh_version == 4)
+   return RDMA_NETWORK_IPV4;
+
+   if (grh-next_hdr == IPPROTO_UDP)
+   return RDMA_NETWORK_IPV6;
+
+   return RDMA_NETWORK_IB;
+}
+
 struct find_gid_index_context {
u16 vlan_id;
+   enum ib_gid_type gid_type;
 };
 
 static bool find_gid_index(const union ib_gid *gid,
@@ -206,6 +282,9 @@ static bool find_gid_index(const union ib_gid *gid,
struct find_gid_index_context *ctx =
(struct find_gid_index_context *)context;
 
+   if (ctx-gid_type != gid_attr-gid_type)
+   return false;
+
if ((!!(ctx-vlan_id != 0x) == !is_vlan_dev(gid_attr-ndev)) ||
(is_vlan_dev(gid_attr-ndev) 
 vlan_dev_vlan_id(gid_attr-ndev) != ctx-vlan_id))
@@ -216,9 +295,11 @@ static bool find_gid_index(const union ib_gid *gid,
 
 static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num,
   u16 vlan_id, union ib_gid *sgid,
+  enum ib_gid_type gid_type,
   u16 *gid_index)
 {
-   struct find_gid_index_context context = {.vlan_id = vlan_id};
+   struct find_gid_index_context context = {.vlan_id = vlan_id,
+.gid_type = gid_type};
 
return ib_find_gid_by_filter(device, sgid, port_num, find_gid_index,
 context, gid_index);
@@ -232,9 +313,24 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 
port_num, struct ib_wc *wc,
int ret;
int is_eth = (rdma_port_get_link_layer(device, port_num) ==
IB_LINK_LAYER_ETHERNET);
+   enum rdma_network_type net_type = RDMA_NETWORK_IB

[PATCH 28/30] IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers

2015-02-18 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

RoCEv2 packets are sent over IP/UDP protocols.
The mlx4 driver uses a type of RAW QP to send packets for QP1 and
therefore needs to build the network headers below BTH in software.

This patch adds an option to build QP1 packets with IP and UDP headers if
RoCEv2 is requested.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
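For readers following the IP version selection in this patch: when the source GID is RoCEv2, the choice between an IPv4 and an IPv6 header comes down to whether the GID is an IPv4-mapped IPv6 address (::ffff:a.b.c.d). A minimal user-space sketch of that test follows. The helper names are hypothetical; the kernel uses ipv6_addr_v4mapped() for the same check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if the 16-byte GID is ::ffff:a.b.c.d. */
static int gid_is_v4mapped(const uint8_t gid[16])
{
    static const uint8_t prefix[12] = {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
    };

    assert(gid != NULL);
    return memcmp(gid, prefix, sizeof(prefix)) == 0;
}

/* Hypothetical helper: IP version to use when building a RoCEv2 header. */
static int rocev2_ip_version(const uint8_t gid[16])
{
    return gid_is_v4mapped(gid) ? 4 : 6;
}
```

A GID of ::ffff:192.168.0.1 would select an IPv4 header, while a link-local fe80:: GID would select IPv6.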
 drivers/infiniband/hw/mlx4/qp.c |   84 ---
 1 files changed, 52 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index f55f4d4..9996527 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -32,6 +32,8 @@
  */
 
 #include <linux/log2.h>
+#include <linux/if_ether.h>
+#include <net/ip.h>
 #include <linux/slab.h>
 #include <linux/netdevice.h>
 
@@ -2164,16 +2166,7 @@ static int build_sriov_qp0_header(struct mlx4_ib_sqp 
*sqp,
return 0;
 }
 
-static void mlx4_u64_to_smac(u8 *dst_mac, u64 src_mac)
-{
-   int i;
-
-   for (i = ETH_ALEN; i; i--) {
-   dst_mac[i - 1] = src_mac  0xff;
-   src_mac = 8;
-   }
-}
-
+#define MLX4_ROCEV2_QP1_SPORT 0xC000
 static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
void *wqe, unsigned *mlx_seg_len)
 {
@@ -2193,6 +2186,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct ib_send_wr *wr,
bool is_eth;
bool is_vlan = false;
bool is_grh;
+   bool is_udp = false;
+   int ip_version = 0;
 
send_size = 0;
for (i = 0; i  wr-num_sge; ++i)
@@ -2201,6 +2196,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct ib_send_wr *wr,
is_eth = rdma_port_get_link_layer(sqp-qp.ibqp.device, sqp-qp.port) == 
IB_LINK_LAYER_ETHERNET;
is_grh = mlx4_ib_ah_grh_present(ah);
if (is_eth) {
+   struct ib_gid_attr gid_attr;
+
if (mlx4_is_mfunc(to_mdev(ib_dev)-dev)) {
/* When multi-function is enabled, the ib_core gid
 * indexes don't necessarily match the hw ones, so
@@ -2211,21 +2208,29 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct ib_send_wr *wr,
if (err)
return err;
} else  {
-   err = ib_get_cached_gid(ib_dev,
+   err = ib_get_cached_gid(sqp-qp.ibqp.device,
be32_to_cpu(ah-av.ib.port_pd) 
 24,
-   ah-av.ib.gid_index, sgid,
-   NULL);
-   if (err)
+   ah-av.ib.gid_index, sgid, 
gid_attr);
+   if (!err) {
+   is_udp = (gid_attr.gid_type == 
IB_GID_TYPE_ROCE_V2) ? true : false;
+   if (is_udp) {
+   if (ipv6_addr_v4mapped((struct in6_addr 
*)sgid))
+   ip_version = 4;
+   else
+   ip_version = 6;
+   is_grh = false;
+   }
+   } else {
return err;
+   }
}
-
if (ah-av.eth.vlan != cpu_to_be16(0x)) {
vlan = be16_to_cpu(ah-av.eth.vlan)  0x0fff;
is_vlan = 1;
}
}
err = ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh,
-   0, 0, 0, sqp-ud_header);
+ ip_version, is_udp, 0, sqp-ud_header);
if (err)
return err;
 
@@ -2236,12 +2241,14 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct ib_send_wr *wr,
sqp-ud_header.lrh.source_lid = cpu_to_be16(ah-av.ib.g_slid  
0x7f);
}
 
-   if (is_grh) {
+   if (is_grh || (ip_version == 6)) {
sqp-ud_header.grh.traffic_class =
(be32_to_cpu(ah-av.ib.sl_tclass_flowlabel)  20)  
0xff;
sqp-ud_header.grh.flow_label=
ah-av.ib.sl_tclass_flowlabel  cpu_to_be32(0xf);
-   sqp-ud_header.grh.hop_limit = ah-av.ib.hop_limit;
+
+   sqp-ud_header.grh.hop_limit = (is_udp) ?
+   IPV6_DEFAULT_HOPLIMIT : ah-av.ib.hop_limit;
if (is_eth)
memcpy(sqp-ud_header.grh.source_gid.raw, sgid.raw, 16);
else {
@@ -2265,6 +2272,26 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, 
struct ib_send_wr *wr,
   ah-av.ib.dgid, 16);
}
 
+   if (ip_version == 4

[PATCH 27/30] IB/core: Initialize UD header structure with IP and UDP headers

2015-02-18 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

ib_ud_header_init() is used to format InfiniBand headers
in a buffer up to (but not including) BTH. For RoCEv2, this
function must also be able to build IP and UDP headers.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/ud_header.c|  153 +--
 drivers/infiniband/hw/mlx4/qp.c|7 +-
 drivers/infiniband/hw/mthca/mthca_qp.c |2 +-
 include/rdma/ib_pack.h |   44 --
 4 files changed, 186 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/core/ud_header.c 
b/drivers/infiniband/core/ud_header.c
index 72feee6..a7797a7 100644
--- a/drivers/infiniband/core/ud_header.c
+++ b/drivers/infiniband/core/ud_header.c
@@ -35,6 +35,7 @@
 #include <linux/string.h>
 #include <linux/export.h>
 #include <linux/if_ether.h>
+#include <linux/ip.h>
 
 #include <rdma/ib_pack.h>
 
@@ -116,6 +117,68 @@ static const struct ib_field vlan_table[]  = {
  .size_bits= 16 }
 };
 
+static const struct ib_field ip4_table[]  = {
+   { STRUCT_FIELD(ip4, ver_len),
+ .offset_words = 0,
+ .offset_bits  = 0,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, tos),
+ .offset_words = 0,
+ .offset_bits  = 8,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, tot_len),
+ .offset_words = 0,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, id),
+ .offset_words = 1,
+ .offset_bits  = 0,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, frag_off),
+ .offset_words = 1,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, ttl),
+ .offset_words = 2,
+ .offset_bits  = 0,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, protocol),
+ .offset_words = 2,
+ .offset_bits  = 8,
+ .size_bits= 8 },
+   { STRUCT_FIELD(ip4, check),
+ .offset_words = 2,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(ip4, saddr),
+ .offset_words = 3,
+ .offset_bits  = 0,
+ .size_bits= 32 },
+   { STRUCT_FIELD(ip4, daddr),
+ .offset_words = 4,
+ .offset_bits  = 0,
+ .size_bits= 32 }
+};
+
+static const struct ib_field udp_table[]  = {
+   { STRUCT_FIELD(udp, sport),
+ .offset_words = 0,
+ .offset_bits  = 0,
+ .size_bits= 16 },
+   { STRUCT_FIELD(udp, dport),
+ .offset_words = 0,
+ .offset_bits  = 16,
+ .size_bits= 16 },
+   { STRUCT_FIELD(udp, length),
+ .offset_words = 1,
+ .offset_bits  = 0,
+ .size_bits= 16 },
+   { STRUCT_FIELD(udp, csum),
+ .offset_words = 1,
+ .offset_bits  = 16,
+ .size_bits= 16 }
+};
+
 static const struct ib_field grh_table[]  = {
{ STRUCT_FIELD(grh, ip_version),
  .offset_words = 0,
@@ -213,6 +276,26 @@ static const struct ib_field deth_table[] = {
  .size_bits= 24 }
 };
 
+u16 ib_ud_ip4_csum(struct ib_ud_header *header)
+{
+   struct iphdr iph;
+
+   iph.ihl = 5;
+   iph.version = 4;
+   iph.tos = header-ip4.tos;
+   iph.tot_len = header-ip4.tot_len;
+   iph.id  = header-ip4.id;
+   iph.frag_off= header-ip4.frag_off;
+   iph.ttl = header-ip4.ttl;
+   iph.protocol= header-ip4.protocol;
+   iph.check   = 0;
+   iph.saddr   = header-ip4.saddr;
+   iph.daddr   = header-ip4.daddr;
+
+   return ip_fast_csum((u8 *)iph, iph.ihl);
+}
+EXPORT_SYMBOL(ib_ud_ip4_csum);
+
 /**
  * ib_ud_header_init - Initialize UD header structure
  * @payload_bytes:Length of packet payload
@@ -220,19 +303,35 @@ static const struct ib_field deth_table[] = {
  * @eth_present: specify if Eth header is present
  * @vlan_present: packet is tagged vlan
  * @grh_present:GRH flag (if non-zero, GRH will be included)
+ * @ip_version: IP flag (if non-zero, IP header, V4 or V6, will be included)
+ * @udp_present: UDP flag (if non-zero, UDP header will be included)
  * @immediate_present: specify if immediate data is present
  * @header:Structure to initialize
  */
-void ib_ud_header_init(int payload_bytes,
-  int  lrh_present,
-  int  eth_present,
-  int  vlan_present,
-  int  grh_present,
-  int  immediate_present,
-  struct ib_ud_header *header)
+int ib_ud_header_init(int payload_bytes,
+ intlrh_present,
+ inteth_present,
+ intvlan_present,
+ intgrh_present

[PATCH 25/30] IB/mlx4: Configure device to work in RoCEv2

2015-02-18 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

Some mlx4 adapters are RoCEv2 capable. Enabling this feature requires
some hardware configuration:

1. Set port general parameters
2. Configure the outgoing UDP destination port
3. Configure the QPs that work with RoCEv2

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c |   10 ++-
 drivers/infiniband/hw/mlx4/qp.c   |   39 ++--
 drivers/net/ethernet/mellanox/mlx4/fw.c   |   16 +++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h |3 +-
 drivers/net/ethernet/mellanox/mlx4/port.c |9 ++-
 drivers/net/ethernet/mellanox/mlx4/qp.c   |   27 
 include/linux/mlx4/device.h   |3 +-
 include/linux/mlx4/qp.h   |   15 +-
 8 files changed, 112 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index ca19d1d..50612b8 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2154,7 +2154,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
if (mlx4_ib_init_sriov(ibdev))
goto err_mad;
 
-   if (dev-caps.flags  MLX4_DEV_CAP_FLAG_IBOE) {
+   if (dev-caps.flags  MLX4_DEV_CAP_FLAG_IBOE ||
+   dev-caps.flags2  MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) {
if (!iboe-nb.notifier_call) {
iboe-nb.notifier_call = mlx4_ib_netdev_event;
err = register_netdevice_notifier(iboe-nb);
@@ -2163,6 +2164,13 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
goto err_notif;
}
}
+   if (!mlx4_is_slave(dev) 
+   dev-caps.flags2  MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) {
+   err = mlx4_config_roce_v2_port(dev, ROCE_V2_UDP_DPORT);
+   if (err) {
+   goto err_notif;
+   }
+   }
}
 
for (j = 0; j  ARRAY_SIZE(mlx4_class_attributes); ++j) {
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 9ab9156..9731c07 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1408,6 +1408,24 @@ static int handle_eth_ud_smac_index(struct mlx4_ib_dev 
*dev,
return 0;
 }
 
+enum {
+   MLX4_QPC_ROCE_MODE_1 = 0,
+   MLX4_QPC_ROCE_MODE_2 = 2,
+   MLX4_QPC_ROCE_MODE_MAX = 0xff
+};
+
+static u8 gid_type_to_qpc(enum ib_gid_type gid_type)
+{
+   switch (gid_type) {
+   case IB_GID_TYPE_IB:
+   return MLX4_QPC_ROCE_MODE_1;
+   case IB_GID_TYPE_ROCE_V2:
+   return MLX4_QPC_ROCE_MODE_2;
+   default:
+   return MLX4_QPC_ROCE_MODE_MAX;
+   }
+}
+
 static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
   const struct ib_qp_attr *attr, int attr_mask,
   enum ib_qp_state cur_state, enum ib_qp_state 
new_state)
@@ -1532,9 +1550,12 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
u16 vlan = 0x;
u8 smac[ETH_ALEN];
int status = 0;
+   int is_eth = rdma_port_get_link_layer(dev-ib_dev, qp-port) ==
+   IB_LINK_LAYER_ETHERNET;
 
-   if (rdma_port_get_link_layer(dev-ib_dev, qp-port) ==
-   IB_LINK_LAYER_ETHERNET) {
+   if (is_eth) {
+   if (mlx4_is_bonded(dev-dev))
+   port_num  = 1;
rcu_read_lock();
status = ib_get_cached_gid(ibqp-device, port_num,
   index, gid, gid_attr);
@@ -1551,8 +1572,20 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
  port_num, vlan, smac))
goto out;
 
+   if (is_eth  gid_attr.gid_type == IB_GID_TYPE_ROCE_V2)
+   context-pri_path.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+
optpar |= (MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH |
   MLX4_QP_OPTPAR_SCHED_QUEUE);
+
+   if (is_eth  (cur_state == IB_QPS_INIT  new_state == 
IB_QPS_RTR)) {
+   u8 qpc_roce_mode = gid_type_to_qpc(gid_attr.gid_type);
+
+   if (qpc_roce_mode == MLX4_QPC_ROCE_MODE_MAX)
+   goto out;
+   context-rlkey_roce_mode |= (qpc_roce_mode  6);
+   }
+
}
 
if (attr_mask  IB_QP_TIMEOUT) {
@@ -1722,7 +1755,7 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
sqd_event = 0;
 
if (!ibqp-uobject  cur_state == IB_QPS_RESET  new_state == 
IB_QPS_INIT)
-   context-rlkey |= (1  4);
+   context-rlkey_roce_mode

[PATCH 29/30] IB/mlx4: Create and use another QP1 for RoCEv2

2015-02-18 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

The mlx4 driver uses a special QP to implement the GSI QP. This kind of
QP allows the InfiniBand headers to be built in SW and placed before the
payload that comes in with the WR. The mlx4 HW builds the packet,
calculates the ICRC and puts it at the end of the payload. This ICRC
calculation, however, depends on the QP configuration, which is determined
when the QP is modified (roce_mode during INIT-RTR). On the other hand, ICRC
verification when a packet is received does not depend on this
configuration.
Therefore, two GSI QPs for send (one for each RoCE version) and one
GSI QP for receive are required.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |7 ++
 drivers/infiniband/hw/mlx4/qp.c  |  154 ++
 2 files changed, 143 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h 
b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 018bda6..a853330 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -159,11 +159,18 @@ struct mlx4_ib_wq {
unsignedtail;
 };
 
+enum {
+   MLX4_IB_QP_CREATE_ROCE_V2_GSI = IB_QP_CREATE_RESERVED_START
+};
+
 enum mlx4_ib_qp_flags {
MLX4_IB_QP_LSO = IB_QP_CREATE_IPOIB_UD_LSO,
MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK = 
IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK,
MLX4_IB_QP_NETIF = IB_QP_CREATE_NETIF_QP,
MLX4_IB_QP_CREATE_USE_GFP_NOIO = IB_QP_CREATE_USE_GFP_NOIO,
+
+   /* Mellanox specific flags start from IB_QP_CREATE_RESERVED_START */
+   MLX4_IB_ROCE_V2_GSI_QP = MLX4_IB_QP_CREATE_ROCE_V2_GSI,
MLX4_IB_SRIOV_TUNNEL_QP = 1  30,
MLX4_IB_SRIOV_SQP = 1  31,
 };
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 9996527..161b933 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -81,6 +81,7 @@ struct mlx4_ib_sqp {
u32 send_psn;
struct ib_ud_header ud_header;
u8  header_buf[MLX4_IB_UD_HEADER_SIZE];
+   struct ib_qp*roce_v2_gsi;
 };
 
 enum {
@@ -150,7 +151,10 @@ static int is_sqp(struct mlx4_ib_dev *dev, struct 
mlx4_ib_qp *qp)
}
}
}
-   return proxy_sqp;
+   if (proxy_sqp)
+   return 1;
+
+   return !!(qp-flags  MLX4_IB_ROCE_V2_GSI_QP);
 }
 
 /* used for INIT/CLOSE port logic */
@@ -672,6 +676,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct 
ib_pd *pd,
qp = sqp-qp;
qp-pri.vid = 0x;
qp-alt.vid = 0x;
+   sqp-roce_v2_gsi = NULL;
} else {
qp = kzalloc(sizeof (struct mlx4_ib_qp), gfp);
if (!qp)
@@ -1029,9 +1034,17 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, 
struct mlx4_ib_qp *qp,
del_gid_entries(qp);
 }
 
-static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr)
+static int get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr)
 {
/* Native or PPF */
+   if ((!mlx4_is_mfunc(dev-dev) || mlx4_is_master(dev-dev)) 
+   attr-create_flags  MLX4_IB_QP_CREATE_ROCE_V2_GSI) {
+   int sqpn;
+   int res = mlx4_qp_reserve_range(dev-dev, 1, 1, sqpn, 0);
+
+   return res ? -abs(res) : sqpn;
+   }
+
if (!mlx4_is_mfunc(dev-dev) ||
(mlx4_is_master(dev-dev) 
 attr-create_flags  MLX4_IB_SRIOV_SQP)) {
@@ -1039,6 +1052,7 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct 
ib_qp_init_attr *attr)
(attr-qp_type == IB_QPT_SMI ? 0 : 2) +
attr-port_num - 1;
}
+
/* PF or VF -- creating proxies */
if (attr-qp_type == IB_QPT_SMI)
return dev-dev-caps.qp0_proxy[attr-port_num - 1];
@@ -1046,9 +1060,9 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct 
ib_qp_init_attr *attr)
return dev-dev-caps.qp1_proxy[attr-port_num - 1];
 }
 
-struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
-   struct ib_qp_init_attr *init_attr,
-   struct ib_udata *udata)
+static struct ib_qp *_mlx4_ib_create_qp(struct ib_pd *pd,
+   struct ib_qp_init_attr *init_attr,
+   struct ib_udata *udata)
 {
struct mlx4_ib_qp *qp = NULL;
int err;
@@ -1066,6 +1080,7 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
MLX4_IB_SRIOV_TUNNEL_QP |
MLX4_IB_SRIOV_SQP |
MLX4_IB_QP_NETIF |
+   MLX4_IB_QP_CREATE_ROCE_V2_GSI

[PATCH 15/30] RDMA/ocrdma: changes to support RoCE-v2 in UD path

2015-02-18 Thread Somnath Kotur
From: Devesh Sharma <devesh.sha...@emulex.com>

To support the UD protocol, this patch makes the following
changes to the existing UD implementation:

1. AH creation resolves the GID type for a given index.
2. The protocol header is built based on the GID type.
3. Work completion reports the L3 type if f/w supports RoCE-v2
   and sets the IB_WC_WITH_NETWORK_HDR_TYPE flag in wc->wc_flags.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma.h   |1 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c|   68 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |5 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   23 +++--
 4 files changed, 80 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h 
b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 97f971a..302fd0e 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -341,6 +341,7 @@ struct ocrdma_ah {
struct ocrdma_av *av;
u16 sgid_index;
u32 id;
+   u8 hdr_type;
 };
 
 struct ocrdma_qp_hwq_info {
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 7ecd230..70a885b 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -39,6 +39,20 @@
 
 #define OCRDMA_VID_PCP_SHIFT   0xD
 
+static u16 ocrdma_hdr_type_to_proto_num(u8 hdr_type)
+{
+   switch (hdr_type) {
+   case OCRDMA_L3_TYPE_IB_GRH:
+   return (u16)0x8915;
+   case OCRDMA_L3_TYPE_IPV4:
+   return (u16)0x0800;
+   case OCRDMA_L3_TYPE_IPV6:
+   return (u16)0x86dd;
+   default:
+   return 0;
+   }
+}
+
 static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
struct ib_ah_attr *attr, union ib_gid *sgid,
int pdid, bool *isvlan, u16 vlan_tag)
@@ -47,22 +61,32 @@ static inline int set_av_attr(struct ocrdma_dev *dev, 
struct ocrdma_ah *ah,
struct ocrdma_eth_vlan eth;
struct ocrdma_grh grh;
int eth_sz;
+   u16 proto_num = 0;
+   struct iphdr ipv4;
+   union {
+   struct sockaddr _sockaddr;
+   struct sockaddr_in  _sockaddr_in;
+   struct sockaddr_in6 _sockaddr_in6;
+   } sgid_addr, dgid_addr;
 
memset(eth, 0, sizeof(eth));
memset(grh, 0, sizeof(grh));
+   /* Protocol Number */
+   proto_num = ocrdma_hdr_type_to_proto_num(ah-hdr_type);
+
 
/* VLAN */
if (!vlan_tag || (vlan_tag  0xFFF))
vlan_tag = dev-pvid;
if (vlan_tag  (vlan_tag  0x1000)) {
eth.eth_type = cpu_to_be16(0x8100);
-   eth.roce_eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE);
+   eth.roce_eth_type = cpu_to_be16(proto_num);
vlan_tag |= (dev-sl  0x07)  OCRDMA_VID_PCP_SHIFT;
eth.vlan_tag = cpu_to_be16(vlan_tag);
eth_sz = sizeof(struct ocrdma_eth_vlan);
*isvlan = true;
} else {
-   eth.eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE);
+   eth.eth_type = cpu_to_be16(proto_num);
eth_sz = sizeof(struct ocrdma_eth_basic);
}
/* MAC */
@@ -71,18 +95,34 @@ static inline int set_av_attr(struct ocrdma_dev *dev, 
struct ocrdma_ah *ah,
if (status)
return status;
ah-sgid_index = attr-grh.sgid_index;
-   memcpy(grh.sgid[0], sgid-raw, sizeof(union ib_gid));
-   memcpy(grh.dgid[0], attr-grh.dgid.raw, sizeof(attr-grh.dgid.raw));
-
-   grh.tclass_flow = cpu_to_be32((6  28) |
-   (attr-grh.traffic_class  24) |
-   attr-grh.flow_label);
-   /* 0x1b is next header value in GRH */
-   grh.pdid_hoplimit = cpu_to_be32((pdid  16) |
-   (0x1b  8) | attr-grh.hop_limit);
/* Eth HDR */
memcpy(ah-av-eth_hdr, eth, eth_sz);
-   memcpy((u8 *)ah-av + eth_sz, grh, sizeof(struct ocrdma_grh));
+   if (ah-hdr_type == RDMA_NETWORK_IPV4) {
+   *((__be16 *)ipv4) = htons((4  12) | (5  8) |
+  attr-grh.traffic_class);
+   ipv4.id = cpu_to_be16(pdid);
+   ipv4.frag_off = htons(IP_DF);
+   ipv4.tot_len = htons(0);
+   ipv4.ttl = attr-grh.hop_limit;
+   ipv4.protocol = 0x11;
+   rdma_gid2ip(sgid_addr._sockaddr, sgid);
+   ipv4.saddr = sgid_addr._sockaddr_in.sin_addr.s_addr;
+   rdma_gid2ip(dgid_addr._sockaddr, attr-grh.dgid);
+   ipv4.daddr = dgid_addr._sockaddr_in.sin_addr.s_addr;
+   memcpy((u8 *)ah-av + eth_sz, ipv4, sizeof(struct iphdr));
+   } else {
+   memcpy(grh.sgid[0], sgid-raw, sizeof(union ib_gid));
+   grh.tclass_flow = cpu_to_be32((6  28

[PATCH 30/30] IB/cma: Join and leave multicast groups with IGMP

2015-02-18 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

Since RoCEv2 runs over IP, IGMP join and leave requests must be
sent to the network when joining and leaving multicast groups.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/cma.c |   78 ++--
 1 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 50635fe..6e658e8 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -38,6 +38,7 @@
 #include <linux/in6.h>
 #include <linux/mutex.h>
 #include <linux/random.h>
+#include <linux/igmp.h>
 #include <linux/idr.h>
 #include <linux/inetdevice.h>
 #include <linux/slab.h>
@@ -185,6 +186,7 @@ struct rdma_id_private {
u8  reuseaddr;
u8  afonly;
enum ib_gid_typegid_type;
+   booligmp_joined;
 };
 
 struct cma_multicast {
@@ -283,6 +285,26 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 
ip_ver)
hdr-ip_version = (ip_ver  4) | (hdr-ip_version  0xF);
 }
 
+static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join)
+{
+   struct in_device *in_dev = NULL;
+
+   if (ndev) {
+       rtnl_lock();
+       in_dev = __in_dev_get_rtnl(ndev);
+       if (in_dev) {
+           if (join)
+               ip_mc_inc_group(in_dev,
+                               *(__be32 *)(mgid->raw + 12));
+           else
+               ip_mc_dec_group(in_dev,
+                               *(__be32 *)(mgid->raw + 12));
+       }
+       rtnl_unlock();
+   }
+   return (in_dev) ? 0 : -ENODEV;
+}
+
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
  struct cma_device *cma_dev)
 {
@@ -585,6 +607,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler 
event_handler,
INIT_LIST_HEAD(id_priv-listen_list);
INIT_LIST_HEAD(id_priv-mc_list);
get_random_bytes(id_priv-seq_num, sizeof id_priv-seq_num);
+   id_priv-igmp_joined = false;
 
return id_priv-id;
 }
@@ -1076,6 +1099,20 @@ static void cma_leave_mc_groups(struct rdma_id_private 
*id_priv)
kfree(mc);
break;
case IB_LINK_LAYER_ETHERNET:
+   if (id_priv-igmp_joined) {
+   struct rdma_dev_addr *dev_addr = 
id_priv-id.route.addr.dev_addr;
+   struct net_device *ndev = NULL;
+
+   if (dev_addr-bound_dev_if)
+   ndev = dev_get_by_index(init_net,
+   
dev_addr-bound_dev_if);
+   if (ndev) {
+   cma_igmp_send(ndev,
+ 
mc-multicast.ib-rec.mgid,
+ false);
+   dev_put(ndev);
+   }
+   }
kref_put(mc-mcref, release_mc);
break;
default:
@@ -3356,7 +3393,7 @@ static int cma_iboe_join_multicast(struct rdma_id_private 
*id_priv,
 {
struct iboe_mcast_work *work;
struct rdma_dev_addr *dev_addr = id_priv-id.route.addr.dev_addr;
-   int err;
+   int err = 0;
struct sockaddr *addr = (struct sockaddr *)mc-addr;
struct net_device *ndev = NULL;
 
@@ -3388,13 +3425,31 @@ static int cma_iboe_join_multicast(struct 
rdma_id_private *id_priv,
mc-multicast.ib-rec.rate = iboe_get_rate(ndev);
mc-multicast.ib-rec.hop_limit = 1;
mc-multicast.ib-rec.mtu = iboe_get_mtu(ndev-mtu);
+   rdma_ip2gid((struct sockaddr *)id_priv-id.route.addr.src_addr,
+   mc-multicast.ib-rec.port_gid);
+
+   if (addr-sa_family == AF_INET) {
+   u16 sgid_index;
+
+   err = ib_find_cached_gid_by_port(id_priv-cma_dev-device,
+
mc-multicast.ib-rec.port_gid,
+IB_GID_TYPE_ROCE_V2,
+id_priv-id.port_num,
+init_net, 
dev_addr-bound_dev_if,
+sgid_index);
+   if (!err)
+   err = cma_igmp_send(ndev, mc-multicast.ib-rec.mgid, 
true);
+   if (!err) {
+   id_priv-igmp_joined = true;
+   mc-multicast.ib-rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+   }
+   }
dev_put(ndev);
-   if (!mc

[PATCH 10/30] IB/core: Add gid_type to path and rdma_id_private

2015-02-18 Thread Somnath Kotur
From: Matan Barak <mat...@mellanox.com>

When using rdma cm, we want to take the gid_type from
the rdma_id_private. This is required before adding
a user-space/configfs API that sets
the gid_type of a CM connection.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/cm.c  |   19 ++-
 drivers/infiniband/core/cma.c |2 ++
 drivers/infiniband/core/sa_query.c|3 ++-
 drivers/infiniband/core/uverbs_marshall.c |1 +
 include/rdma/ib_sa.h  |1 +
 5 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 7974e74..22dac05 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -358,9 +358,8 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, 
struct cm_av *av)
read_lock_irqsave(cm.device_lock, flags);
list_for_each_entry(cm_dev, cm.device_list, list) {
if (!ib_find_cached_gid(cm_dev-ib_device, path-sgid,
-   IB_GID_TYPE_IB, path-net,
-   path-ifindex,
-   p, NULL)) {
+   path-gid_type, path-net,
+   path-ifindex, p, NULL)) {
port = cm_dev-port[p-1];
break;
}
@@ -1521,6 +1520,8 @@ static int cm_req_handler(struct cm_work *work)
struct ib_cm_id *cm_id;
struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
struct cm_req_msg *req_msg;
+   union ib_gid gid;
+   struct ib_gid_attr gid_attr;
int ret;
 
req_msg = (struct cm_req_msg *)work-mad_recv_wc-recv_buf.mad;
@@ -1560,11 +1561,19 @@ static int cm_req_handler(struct cm_work *work)
cm_format_paths_from_req(req_msg, work-path[0], work-path[1]);
 
memcpy(work-path[0].dmac, cm_id_priv-av.ah_attr.dmac, ETH_ALEN);
-   ret = cm_init_av_by_path(work-path[0], cm_id_priv-av);
+   ret = ib_get_cached_gid(work-port-cm_dev-ib_device,
+   work-port-port_num,
+   cm_id_priv-av.ah_attr.grh.sgid_index,
+   gid, gid_attr);
+   if (!ret) {
+   work-path[0].gid_type = gid_attr.gid_type;
+   ret = cm_init_av_by_path(work-path[0], cm_id_priv-av);
+   }
if (ret) {
ib_get_cached_gid(work-port-cm_dev-ib_device,
  work-port-port_num, 0, work-path[0].sgid,
- NULL);
+ gid_attr);
+   work-path[0].gid_type = gid_attr.gid_type;
ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
   work-path[0].sgid, sizeof work-path[0].sgid,
   NULL, 0);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 659676c..9afa410 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -146,6 +146,7 @@ struct rdma_id_private {
u8  tos;
u8  reuseaddr;
u8  afonly;
+   enum ib_gid_typegid_type;
 };
 
 struct cma_multicast {
@@ -1936,6 +1937,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private 
*id_priv)
ndev = dev_get_by_index(init_net, addr-dev_addr.bound_dev_if);
route-path_rec-net = init_net;
route-path_rec-ifindex = addr-dev_addr.bound_dev_if;
+   route-path_rec-gid_type = id_priv-gid_type;
}
if (!ndev) {
ret = -ENODEV;
diff --git a/drivers/infiniband/core/sa_query.c 
b/drivers/infiniband/core/sa_query.c
index 705b6b8..f770049 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -546,7 +546,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 
port_num,
ah_attr-ah_flags = IB_AH_GRH;
ah_attr-grh.dgid = rec-dgid;
 
-   ret = ib_find_cached_gid(device, rec-sgid, IB_GID_TYPE_IB,
+   ret = ib_find_cached_gid(device, rec-sgid, rec-gid_type,
 rec-net, rec-ifindex, port_num,
 gid_index);
if (ret)
@@ -676,6 +676,7 @@ static void ib_sa_path_rec_callback(struct ib_sa_query 
*sa_query,
  mad-data, rec);
rec.net = NULL;
rec.ifindex = 0;
+   rec.gid_type = IB_GID_TYPE_IB;
memset(rec.dmac, 0, ETH_ALEN);
query-callback(status, rec, query-context);
} else
diff --git a/drivers/infiniband/core/uverbs_marshall.c 
b/drivers/infiniband/core/uverbs_marshall.c
index 7d2f14c..af020f8

[PATCH 19/30] IB/mlx4: Replace spin_lock with rw_semaphore

2015-02-18 Thread Somnath Kotur
From: Moni Shoua <mo...@mellanox.com>

Protection of iboe->netdevs is no longer required from an atomic context.
Replacing the spin_lock with an rw_semaphore is allowed and makes more sense.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c|   27 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |2 +-
 2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 91caffc..d8b227e 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -369,7 +369,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 
port,
props-active_mtu   = IB_MTU_256;
if (is_bonded)
rtnl_lock(); /* required to get upper dev */
-   spin_lock_bh(iboe-lock);
+   down_read(iboe-sem);
ndev = iboe-netdevs[port - 1];
if (ndev  is_bonded)
ndev = netdev_master_upper_dev_get(ndev);
@@ -383,7 +383,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 
port,
IB_PORT_ACTIVE : IB_PORT_DOWN;
props-phys_state   = state_to_phys_state(props-state);
 out_unlock:
-   spin_unlock_bh(iboe-lock);
+   up_read(iboe-sem);
if (is_bonded)
rtnl_unlock();
 out:
@@ -825,11 +825,11 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct 
mlx4_ib_qp *mqp,
if (!mqp->port)
return 0;
 
-   spin_lock_bh(&mdev->iboe.lock);
+   down_read(&mdev->iboe.sem);
ndev = mdev->iboe.netdevs[mqp->port - 1];
if (ndev)
dev_hold(ndev);
-   spin_unlock_bh(&mdev->iboe.lock);
+   up_read(&mdev->iboe.sem);
 
if (ndev) {
ret = 1;
@@ -1330,7 +1330,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union 
ib_gid *gid, u16 lid)
struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
struct mlx4_dev *dev = mdev->dev;
struct mlx4_ib_qp *mqp = to_mqp(ibqp);
-   struct net_device *ndev;
struct mlx4_ib_gid_entry *ge;
enum mlx4_protocol prot =  MLX4_PROT_IB_IPV6;
struct mlx4_flow_reg_id reg_id = {0, 0};
@@ -1370,13 +1369,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union 
ib_gid *gid, u16 lid)
mutex_lock(&mqp->mutex);
ge = find_gid_entry(mqp, gid->raw);
if (ge) {
-   spin_lock_bh(&mdev->iboe.lock);
-   ndev = ge->added ? mdev->iboe.netdevs[ge->port - 1] : NULL;
-   if (ndev)
-   dev_hold(ndev);
-   spin_unlock_bh(&mdev->iboe.lock);
-   if (ndev)
-   dev_put(ndev);
list_del(&ge->list);
kfree(ge);
} else
@@ -1543,7 +1535,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev 
*ibdev,
 
iboe = &ibdev->iboe;
 
-   spin_lock_bh(&iboe->lock);
+   down_write(&iboe->sem);
mlx4_foreach_ib_transport_port(port, ibdev->dev) {
 
iboe->netdevs[port - 1] =
@@ -1555,7 +1547,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev 
*ibdev,
update_qps_port = port;
 
}
-   spin_unlock_bh(&iboe->lock);
+   up_write(&iboe->sem);
 
if (update_qps_port > 0)
mlx4_ib_update_qps(ibdev, dev, update_qps_port);
@@ -1848,7 +1840,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 
mlx4_ib_alloc_eqs(dev, ibdev);
 
-   spin_lock_init(&iboe->lock);
+   init_rwsem(&iboe->sem);
 
if (init_node_data(ibdev))
goto err_map;
@@ -2153,7 +2145,8 @@ static void handle_bonded_port_state_event(struct 
work_struct *work)
struct ib_event ibev;
 
kfree(ew);
-   spin_lock_bh(&ibdev->iboe.lock);
+
+   down_read(&ibdev->iboe.sem);
for (i = 0; i < MLX4_MAX_PORTS; ++i) {
struct net_device *curr_netdev = ibdev->iboe.netdevs[i];
 
@@ -2165,7 +2158,7 @@ static void handle_bonded_port_state_event(struct 
work_struct *work)
bonded_port_state = (bonded_port_state != IB_PORT_ACTIVE) ?
curr_port_state : IB_PORT_ACTIVE;
}
-   spin_unlock_bh(&ibdev->iboe.lock);
+   up_read(&ibdev->iboe.sem);
 
ibev.device = &ibdev->ib_dev;
ibev.element.port_num = 1;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h 
b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index e3805a4..166ebf9 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -455,7 +455,7 @@ struct mlx4_ib_sriov {
 };
 
 struct mlx4_ib_iboe {
-   spinlock_t  lock;
+   struct rw_semaphore sem; /* guard from concurrent access to data in 
this struct */
struct net_device  *netdevs[MLX4_MAX_PORTS];
atomic64_t  mac[MLX4_MAX_PORTS];
struct notifier_block   nb;
-- 
1.7.1

--
To unsubscribe from this list

[PATCH 18/30] IB/mlx4: Remove gid table management for RoCE

2015-02-18 Thread Somnath Kotur
From: Moni Shoua mo...@mellanox.com

RoCE GID table management has moved to the InfiniBand core driver.
The core driver is now responsible for populating the GID table and for supplying
query and lookup functions for GIDs. HW drivers are responsible only for modifying
the GID table in the network adapter.
The query_gid hook should now return the answer from the cache when the link layer
is Ethernet.

Signed-off-by: Moni Shoua mo...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/hw/mlx4/main.c|  495 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |4 -
 2 files changed, 14 insertions(+), 485 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 6fa5e49..91caffc 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -45,6 +45,7 @@
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
 #include <rdma/ib_addr.h>
+#include <rdma/ib_cache.h>
 
 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/cmd.h>
@@ -74,13 +75,6 @@ static const char mlx4_ib_version[] =
DRV_NAME ": Mellanox ConnectX InfiniBand driver v"
DRV_VERSION " (" DRV_RELDATE ")\n";
 
-struct update_gid_work {
-   struct work_struct  work;
-   union ib_gidgids[128];
-   struct mlx4_ib_dev *dev;
-   int port;
-};
-
 static void do_slave_init(struct mlx4_ib_dev *ibdev, int slave, int do_init);
 
 static struct workqueue_struct *wq;
@@ -474,23 +468,21 @@ out:
return err;
 }
 
-static int iboe_query_gid(struct ib_device *ibdev, u8 port, int index,
- union ib_gid *gid)
-{
-   struct mlx4_ib_dev *dev = to_mdev(ibdev);
-
-   *gid = dev->iboe.gid_table[port - 1][index];
-
-   return 0;
-}
-
 static int mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
 union ib_gid *gid)
 {
-   if (rdma_port_get_link_layer(ibdev, port) == IB_LINK_LAYER_INFINIBAND)
+   int ret;
+
+   if (ib_cache_use_roce_gid_cache(ibdev, port))
return __mlx4_ib_query_gid(ibdev, port, index, gid, 0);
-   else
-   return iboe_query_gid(ibdev, port, index, gid);
+
+   ret = ib_get_cached_gid(ibdev, port, index, gid, NULL);
+   if (ret == -EAGAIN) {
+   memcpy(gid, &zgid, sizeof(*gid));
+   return 0;
+   }
+
+   return ret;
 }
 
 int __mlx4_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
@@ -1480,273 +1472,6 @@ static struct device_attribute *mlx4_class_attributes[] 
= {
&dev_attr_board_id
 };
 
-static void mlx4_addrconf_ifid_eui48(u8 *eui, u16 vlan_id,
-struct net_device *dev)
-{
-   memcpy(eui, dev->dev_addr, 3);
-   memcpy(eui + 5, dev->dev_addr + 3, 3);
-   if (vlan_id < 0x1000) {
-   eui[3] = vlan_id >> 8;
-   eui[4] = vlan_id & 0xff;
-   } else {
-   eui[3] = 0xff;
-   eui[4] = 0xfe;
-   }
-   eui[0] ^= 2;
-}
-
-static void update_gids_task(struct work_struct *work)
-{
-   struct update_gid_work *gw = container_of(work, struct update_gid_work, 
work);
-   struct mlx4_cmd_mailbox *mailbox;
-   union ib_gid *gids;
-   int err;
-   struct mlx4_dev *dev = gw->dev->dev;
-   int is_bonded = mlx4_is_bonded(dev);
-
-   if (!gw->dev->ib_active)
-   return;
-
-   mailbox = mlx4_alloc_cmd_mailbox(dev);
-   if (IS_ERR(mailbox)) {
-   pr_warn("update gid table failed %ld\n", PTR_ERR(mailbox));
-   return;
-   }
-
-   gids = mailbox->buf;
-   memcpy(gids, gw->gids, sizeof gw->gids);
-
-   err = mlx4_cmd(dev, mailbox->dma, MLX4_SET_PORT_GID_TABLE << 8 | 
gw->port,
-  1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
-  MLX4_CMD_WRAPPED);
-   if (err)
-   pr_warn("set port command failed\n");
-   else
-   if ((gw->port == 1) || !is_bonded)
-   mlx4_ib_dispatch_event(gw->dev,
-  is_bonded ? 1 : gw->port,
-  IB_EVENT_GID_CHANGE);
-
-   mlx4_free_cmd_mailbox(dev, mailbox);
-   kfree(gw);
-}
-
-static void reset_gids_task(struct work_struct *work)
-{
-   struct update_gid_work *gw =
-   container_of(work, struct update_gid_work, work);
-   struct mlx4_cmd_mailbox *mailbox;
-   union ib_gid *gids;
-   int err;
-   struct mlx4_dev *dev = gw->dev->dev;
-
-   if (!gw->dev->ib_active)
-   return;
-
-   mailbox = mlx4_alloc_cmd_mailbox(dev);
-   if (IS_ERR(mailbox)) {
-   pr_warn("reset gid table failed\n");
-   goto free;
-   }
-
-   gids = mailbox->buf;
-   memcpy(gids, gw->gids, sizeof(gw->gids));
-
-   if (mlx4_ib_port_link_layer(gw-dev-ib_dev, gw-port

[PATCH 17/30] RDMA/ocrdma: changes to support user AH creation

2015-02-18 Thread Somnath Kotur
From: Devesh Sharma devesh.sha...@emulex.com

To support user space AH creation, this patch uses the ahid field to convey
the l3-type to the user space library. The library is responsible
for decoding the l3-type out of ahid.
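
A minimal sketch of the ahid bit layout, assuming the mask and shift values
from the ocrdma_ah.h hunk of this patch. The helper names (ahid_encode,
ahid_id, ahid_l3_type, ahid_vlan_valid) are hypothetical; they are not taken
from the driver or from any user space library.

```c
#include <stdint.h>

/* Values as they appear in the ocrdma_ah.h hunk of this patch. */
enum {
	OCRDMA_AH_ID_MASK          = 0x3FF,
	OCRDMA_AH_VLAN_VALID_MASK  = 0x01,
	OCRDMA_AH_VLAN_VALID_SHIFT = 0x1F,
	OCRDMA_AH_L3_TYPE_MASK     = 0x03,
	OCRDMA_AH_L3_TYPE_SHIFT    = 0x1D
};

/* Hypothetical helper: builds the 32-bit word the same way the
 * ocrdma_create_ah() hunk fills *ahid_addr. */
static uint32_t ahid_encode(uint32_t ah_id, unsigned int l3_type, int isvlan)
{
	uint32_t v = ah_id & OCRDMA_AH_ID_MASK;

	v |= ((uint32_t)l3_type & OCRDMA_AH_L3_TYPE_MASK)
		<< OCRDMA_AH_L3_TYPE_SHIFT;
	if (isvlan)
		v |= (uint32_t)OCRDMA_AH_VLAN_VALID_MASK
			<< OCRDMA_AH_VLAN_VALID_SHIFT;
	return v;
}

/* Hypothetical decoders of the kind a user space library would need. */
static uint32_t ahid_id(uint32_t v)
{
	return v & OCRDMA_AH_ID_MASK;
}

static unsigned int ahid_l3_type(uint32_t v)
{
	return (v >> OCRDMA_AH_L3_TYPE_SHIFT) & OCRDMA_AH_L3_TYPE_MASK;
}

static int ahid_vlan_valid(uint32_t v)
{
	return (v >> OCRDMA_AH_VLAN_VALID_SHIFT) & OCRDMA_AH_VLAN_VALID_MASK;
}
```

With this layout the AH id sits in bits 0-9, the l3-type in bits 29-30 and the
VLAN-valid flag in bit 31, so the three fields do not overlap.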

Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
Signed-off-by: Devesh Sharma devesh.sha...@emulex.com
---
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |5 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h |5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 70a885b..b42fa24 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -190,6 +190,11 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct 
ib_ah_attr *attr)
ahid_addr = pd->uctx->ah_tbl.va + attr->dlid;
*ahid_addr = 0;
*ahid_addr |= ah->id & OCRDMA_AH_ID_MASK;
+   if (ocrdma_is_rocev2_supported(dev)) {
+   *ahid_addr |= ((u32)ah->hdr_type &
+  OCRDMA_AH_L3_TYPE_MASK) <<
+  OCRDMA_AH_L3_TYPE_SHIFT;
+   }
if (isvlan)
*ahid_addr |= (OCRDMA_AH_VLAN_VALID_MASK <<
   OCRDMA_AH_VLAN_VALID_SHIFT);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h 
b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
index 726a87c..ed45ecd 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
@@ -31,9 +31,10 @@
 enum {
OCRDMA_AH_ID_MASK   = 0x3FF,
OCRDMA_AH_VLAN_VALID_MASK   = 0x01,
-   OCRDMA_AH_VLAN_VALID_SHIFT  = 0x1F
+   OCRDMA_AH_VLAN_VALID_SHIFT  = 0x1F,
+   OCRDMA_AH_L3_TYPE_MASK  = 0x03,
+   OCRDMA_AH_L3_TYPE_SHIFT = 0x1D /* 29 bits */
 };
-
 struct ib_ah *ocrdma_create_ah(struct ib_pd *, struct ib_ah_attr *);
 int ocrdma_destroy_ah(struct ib_ah *);
 int ocrdma_query_ah(struct ib_ah *, struct ib_ah_attr *);
-- 
1.7.1



[PATCH 16/30] RDMA/ocrdma: changes to support RoCE-v2 in RC path

2015-02-18 Thread Somnath Kotur
From: Devesh Sharma devesh.sha...@emulex.com

To support RoCE-v2, this patch implements the following changes:
1. Get the GID type for a given sgid.
2. Based on the GID type, get the IPv4 L3 addresses
   and give them to FW.
3. Provide the l3-type to FW.
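
Step 2 can be illustrated in plain userspace C. This is only a sketch under
the assumption that an IPv4-based GID uses the IPv4-mapped IPv6 form
::ffff:a.b.c.d, so that the IPv4 address is the last four bytes of the GID.
struct gid, gid_is_v4mapped and gid_to_ipv4 are hypothetical names used for
illustration; the patch itself goes through rdma_gid2ip().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte GID, standing in for union ib_gid. */
struct gid {
	uint8_t raw[16];
};

/* Returns 1 if the GID has the IPv4-mapped form ::ffff:a.b.c.d. */
static int gid_is_v4mapped(const struct gid *g)
{
	static const uint8_t prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	return memcmp(g->raw, prefix, sizeof(prefix)) == 0;
}

/* Copies the four IPv4 address bytes (network byte order) into out.
 * Returns 0 on success, -1 if the GID is not IPv4-mapped. */
static int gid_to_ipv4(const struct gid *g, uint8_t out[4])
{
	if (!gid_is_v4mapped(g))
		return -1;
	memcpy(out, g->raw + 12, 4);
	return 0;
}
```

In the patch, the resulting four bytes are then copied to the start of
cmd->params.sgid and cmd->params.dgid before the LE conversion.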

Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
Signed-off-by: Devesh Sharma devesh.sha...@emulex.com
---
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c |   28 +++-
 1 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index cb98911..237b62c 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -2433,7 +2433,13 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
union ib_gid sgid, zgid;
struct ib_gid_attr sgid_attr;
u32 vlan_id = 0x;
-   u8 mac_addr[6];
+   u8 mac_addr[6], hdr_type;
+   union {
+   struct sockaddr _sockaddr;
+   struct sockaddr_in  _sockaddr_in;
+   struct sockaddr_in6 _sockaddr_in6;
+   } sgid_addr, dgid_addr;
+
struct ocrdma_dev *dev = get_ocrdma_dev(qp->ibqp.device);
 
if ((ah_attr->ah_flags & IB_AH_GRH) == 0)
@@ -2448,6 +2454,8 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
cmd->params.hop_lmt_rq_psn |=
(ah_attr->grh.hop_limit << OCRDMA_QP_PARAMS_HOP_LMT_SHIFT);
cmd->flags |= OCRDMA_QP_PARA_FLOW_LBL_VALID;
+
+   /* GIDs */
memcpy(&cmd->params.dgid[0], &ah_attr->grh.dgid.raw[0],
   sizeof(cmd->params.dgid));
 
@@ -2471,6 +2479,19 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
return status;
cmd->params.dmac_b0_to_b3 = mac_addr[0] | (mac_addr[1] << 8) |
(mac_addr[2] << 16) | (mac_addr[3] << 24);
+   hdr_type = ib_gid_to_network_type(sgid_attr.gid_type, &sgid);
+   if (hdr_type == RDMA_NETWORK_IPV4) {
+   status = rdma_gid2ip(&sgid_addr._sockaddr, &sgid);
+   if (status)
+   return status;
+   status = rdma_gid2ip(&dgid_addr._sockaddr, &ah_attr->grh.dgid);
+   if (status)
+   return status;
+   memcpy(&cmd->params.dgid[0],
+  &dgid_addr._sockaddr_in.sin_addr.s_addr, 4);
+   memcpy(&cmd->params.sgid[0],
+  &sgid_addr._sockaddr_in.sin_addr.s_addr, 4);
+   }
/* convert them to LE format. */
ocrdma_cpu_to_le32(&cmd->params.dgid[0], sizeof(cmd->params.dgid));
ocrdma_cpu_to_le32(&cmd->params.sgid[0], sizeof(cmd->params.sgid));
@@ -2482,6 +2503,11 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
cmd->params.rnt_rc_sl_fl |=
(dev->sl & 0x07) << OCRDMA_QP_PARAMS_SL_SHIFT;
}
+
+   cmd->params.max_sge_recv_flags |=
+((hdr_type <<
+OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_SHIFT) &
+OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_MASK);
return 0;
 }
 
-- 
1.7.1



[PATCH 02/30] IB/core: Add kref to IB devices

2015-02-18 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

Previously, we used the device_mutex lock to protect
the device list. That meant that in order to guarantee a
device isn't freed while we use it, we had to lock all
devices.

Add a kref per IB device. Before an IB device
is unregistered, we wait until it is no longer held.
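
The hold/put scheme can be sketched in userspace C as follows. This is an
illustration only, assuming a C11 atomic counter in place of struct kref and a
plain flag in place of struct completion (a real waiter would block instead of
polling a flag). All names here (struct device, device_init, device_hold,
device_put) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum dev_state { DEV_REGISTERED, DEV_UNREGISTERING };

struct device {
	atomic_int refcount;    /* stands in for struct kref */
	enum dev_state state;   /* stands in for reg_state */
	bool free_done;         /* stands in for struct completion */
};

static void device_init(struct device *d)
{
	atomic_init(&d->refcount, 1);   /* like kref_init(): starts at 1 */
	d->state = DEV_REGISTERED;
	d->free_done = false;
}

/* Take a reference so the device cannot go away under the caller. */
static void device_hold(struct device *d)
{
	atomic_fetch_add(&d->refcount, 1);
}

/* Drop a reference. Returns true if this was the last one; in that case
 * the "completion" is signalled once unregistration has started. */
static bool device_put(struct device *d)
{
	if (atomic_fetch_sub(&d->refcount, 1) != 1)
		return false;
	if (d->state >= DEV_UNREGISTERING)
		d->free_done = true;    /* cf. complete() on the device */
	return true;
}
```

The unregister path drops the initial reference and then waits; the wait only
finishes once every other holder has called the put function as well.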

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/device.c |   41 ++
 include/rdma/ib_verbs.h  |6 +
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..8616a95 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -261,6 +261,39 @@ out:
return ret;
 }
 
+static void ib_device_complete_cb(struct kref *kref)
+{
+   struct ib_device *device = container_of(kref, struct ib_device,
+   refcount);
+
+   if (device->reg_state >= IB_DEV_UNREGISTERING)
+   complete(&device->free);
+}
+
+/**
+ * ib_device_hold - increase the reference count of device
+ * @device: ib device to prevent from being free'd
+ *
+ * Prevent the device from being free'd.
+ */
+void ib_device_hold(struct ib_device *device)
+{
+   kref_get(&device->refcount);
+}
+EXPORT_SYMBOL(ib_device_hold);
+
+/**
+ * ib_device_put - decrease the reference count of device
+ * @device: allows this device to be free'd
+ *
+ * Puts the ib_device and allows it to be free'd.
+ */
+int ib_device_put(struct ib_device *device)
+{
+   return kref_put(&device->refcount, ib_device_complete_cb);
+}
+EXPORT_SYMBOL(ib_device_put);
+
 /**
  * ib_register_device - Register an IB device with IB core
  * @device:Device to register
@@ -312,6 +345,9 @@ int ib_register_device(struct ib_device *device,
 
list_add_tail(&device->core_list, &device_list);
 
+   kref_init(&device->refcount);
+   init_completion(&device->free);
+
device->reg_state = IB_DEV_REGISTERED;
 
{
@@ -342,6 +378,8 @@ void ib_unregister_device(struct ib_device *device)
 
mutex_lock(&device_mutex);
 
+   device->reg_state = IB_DEV_UNREGISTERING;
+
list_for_each_entry_reverse(client, &client_list, list)
if (client->remove)
client->remove(device);
@@ -355,6 +393,9 @@ void ib_unregister_device(struct ib_device *device)
 
ib_device_unregister_sysfs(device);
 
+   ib_device_put(device);
+   wait_for_completion(&device->free);
+
spin_lock_irqsave(&device->client_data_lock, flags);
list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
kfree(context);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1866595..a7593b0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1716,6 +1716,7 @@ struct ib_device {
enum {
IB_DEV_UNINITIALIZED,
IB_DEV_REGISTERED,
+   IB_DEV_UNREGISTERING,
IB_DEV_UNREGISTERED
}reg_state;
 
@@ -1728,6 +1729,8 @@ struct ib_device {
u32  local_dma_lkey;
u8   node_type;
u8   phys_port_cnt;
+   struct kref  refcount;
+   struct completionfree;
 };
 
 struct ib_client {
@@ -1741,6 +1744,9 @@ struct ib_client {
 struct ib_device *ib_alloc_device(size_t size);
 void ib_dealloc_device(struct ib_device *device);
 
+void ib_device_hold(struct ib_device *device);
+int ib_device_put(struct ib_device *device);
+
 int ib_register_device(struct ib_device *device,
   int (*port_callback)(struct ib_device *,
u8, struct kobject *));
-- 
1.7.1



[PATCH 03/30] IB/core: Add RoCE GID population

2015-02-18 Thread Somnath Kotur
From: Matan Barak mat...@mellanox.com

In order to populate the GID table, we need to listen for
events:
(a) IB device has been added or removed - used in order
to allocate/deallocate the cache and populate
the GID table internally.
(b) inet events - add new GIDs (according to the IP addresses)
to the table.
(c) netdev up/down/change_addr - if a netdev is built onto our
RoCE device, we need to add/delete its IPs.

When an event is received, multiple entries (each with
different GID type) are added.

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
---
 drivers/infiniband/core/Makefile |2 +-
 drivers/infiniband/core/core_priv.h  |   26 ++
 drivers/infiniband/core/device.c |   80 +
 drivers/infiniband/core/roce_gid_cache.c |   66 
 drivers/infiniband/core/roce_gid_mgmt.c  |  545 ++
 include/rdma/ib_addr.h   |2 +-
 include/rdma/ib_verbs.h  |9 +
 7 files changed, 728 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/roce_gid_mgmt.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 9b63bdf..2c94963 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=   ib_uverbs.o 
ib_ucm.o \
 
 ib_core-y :=   packer.o ud_header.o verbs.o sysfs.o \
device.o fmr_pool.o cache.o netlink.o \
-   roce_gid_cache.o
+   roce_gid_cache.o roce_gid_mgmt.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
 
diff --git a/drivers/infiniband/core/core_priv.h 
b/drivers/infiniband/core/core_priv.h
index a502daa..12797d9 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -39,6 +39,8 @@
 
 #include <rdma/ib_verbs.h>
 
+extern struct workqueue_struct *roce_gid_mgmt_wq;
+
 int  ib_device_register_sysfs(struct ib_device *device,
  int (*port_callback)(struct ib_device *,
   u8, struct kobject *));
@@ -53,6 +55,22 @@ void ib_cache_cleanup(void);
 int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
struct ib_qp_attr *qp_attr, int *qp_attr_mask);
 
+typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port,
+ struct net_device *idev, void *cookie);
+
+typedef int (*roce_netdev_filter)(struct ib_device *device, u8 port,
+struct net_device *idev, void *cookie);
+
+void ib_dev_roce_ports_of_netdev(struct ib_device *ib_dev,
+roce_netdev_filter filter,
+void *filter_cookie,
+roce_netdev_callback cb,
+void *cookie);
+void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
+ void *filter_cookie,
+ roce_netdev_callback cb,
+ void *cookie);
+
 int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
   union ib_gid *gid, struct ib_gid_attr *attr);
 
@@ -66,6 +84,9 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, 
union ib_gid *gid,
 
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
+int roce_gid_cache_setup(void);
+void roce_gid_cache_cleanup(void);
+
 int roce_add_gid(struct ib_device *ib_dev, u8 port,
 union ib_gid *gid, struct ib_gid_attr *attr);
 
@@ -75,4 +96,9 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
 int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
 struct net_device *ndev);
 
+int roce_gid_mgmt_init(void);
+void roce_gid_mgmt_cleanup(void);
+
+int roce_rescan_device(struct ib_device *ib_dev);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8616a95..5ce57bf 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -39,6 +39,7 @@
 #include <linux/init.h>
 #include <linux/mutex.h>
 #include <rdma/rdma_netlink.h>
+#include <rdma/ib_addr.h>
 
 #include "core_priv.h"
 
@@ -640,6 +641,82 @@ int ib_query_gid(struct ib_device *device,
 EXPORT_SYMBOL(ib_query_gid);
 
 /**
+ * ib_dev_roce_ports_of_netdev - enumerate RoCE ports of ibdev in
+ *  respect of netdev
+ * @ib_dev : IB device we want to query
+ * @filter: Should we call the callback?
+ * @filter_cookie: Cookie passed to filter
+ * @cb: Callback to call for each found RoCE ports
+ * @cookie: Cookie passed back to the callback
+ *
+ * Enumerates all of the physical RoCE ports of ib_dev RoCE ports
+ * which are relaying Ethernet packets to a specific
