Re: IB/core: Use GID table in AH creation and dmac resolution
Thanks Dan and Matan.

We will take a look and revert on this

Thanks
Som

On Wed, Nov 4, 2015 at 9:31 AM, Somnath Kotur <somnath.ko...@avagotech.com> wrote:
> Thanks Dan and Matan.
>
> We will take a look and revert on this
>
> Thanks
> Som
>
> On Tue, Nov 3, 2015 at 7:14 PM, Matan Barak <mat...@mellanox.com> wrote:
>>
>> On 11/3/2015 3:11 PM, Dan Carpenter wrote:
>>>
>>> Hello Matan Barak,
>>>
>>> This is a semi-automatic email about new static checker warnings.
>>>
>>> The patch dbf727de7440: "IB/core: Use GID table in AH creation and
>>> dmac resolution" from Oct 15, 2015, leads to the following Smatch
>>> complaint:
>>>
>>> drivers/infiniband/hw/ocrdma/ocrdma_ah.c:157 ocrdma_create_ah()
>>>          error: we previously assumed 'sgid_attr.ndev' could be null
>>>          (see line 146)
>>>
>>> drivers/infiniband/hw/ocrdma/ocrdma_ah.c
>>>    145          }
>>>    146          if (sgid_attr.ndev) {
>>>                     ^^^^^^^^^^^^^^
>>> Patch introduces a NULL check.
>>>
>>>    147                  if (is_vlan_dev(sgid_attr.ndev))
>>>    148                          vlan_tag = vlan_dev_vlan_id(sgid_attr.ndev);
>>>    149                  dev_put(sgid_attr.ndev);
>>>    150          }
>>>    151
>>>    152          if ((pd->uctx) &&
>>>    153              (!rdma_is_multicast_addr((struct in6_addr *)attr->grh.dgid.raw)) &&
>>>    154              (!rdma_link_local_addr((struct in6_addr *)attr->grh.dgid.raw))) {
>>>    155                  status = rdma_addr_find_dmac_by_grh(&sgid, &attr->grh.dgid,
>>>    156                                                      attr->dmac, &vlan_tag,
>>>    157                                                      sgid_attr.ndev->ifindex);
>>>
>>> Patch introduces this new dereference. The warning might be a false
>>> positive if "pd->uctx" or rdma_is_multicast_addr() imply it's non-NULL
>>> but I don't know this code well enough to say for sure. Hence this
>>> email. :)
>>>
>>>    158                  if (status) {
>>>    159                          pr_err("%s(): Failed to resolve dmac from gid."
>>>
>>> regards,
>>> dan carpenter
>>>
>>
>> Thanks for the catch Dan.
>> As I wrote in the commit message - "ocrdma driver changes were done by
>> Somnath Kotur <somnath.ko...@avagotech.com>"
>> Somnath, RoCE implies non-NULL ndev, but dereferencing ifindex after
>> dev_put doesn't seem to be safe.
>> Could you please take a look?
>>
>> Thanks,
>> Matan
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
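Matan's point about reading ifindex after dev_put() can be illustrated outside the kernel. What follows is only a minimal userspace sketch, not the ocrdma fix: struct fake_netdev, fake_dev_put() and fake_get_ifindex() are hypothetical stand-ins for struct net_device, dev_put() and the driver code. The idea is to copy whatever is needed out of the device while the reference is still held, and only then drop the reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device: a refcount and an index. */
struct fake_netdev {
	int refcnt;
	int ifindex;
};

/* Hypothetical stand-in for dev_put(): drop one reference. */
static void fake_dev_put(struct fake_netdev *ndev)
{
	assert(ndev->refcnt > 0);
	ndev->refcnt--;
}

/*
 * Copy the index while the reference is still held, then drop the
 * reference. Returns 0 and stores the index in *ifindex, or -1 when
 * there is no device (the case the NULL check at line 146 guards).
 */
static int fake_get_ifindex(struct fake_netdev *ndev, int *ifindex)
{
	if (ndev == NULL)
		return -1;

	*ifindex = ndev->ifindex;	/* read before the put */
	fake_dev_put(ndev);		/* ndev must not be used after this */
	return 0;
}
```

With this ordering the later dmac resolution would use the saved integer rather than sgid_attr.ndev itself, so the NULL check and the lifetime question are both handled in one place.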
RE: [PATCH for-next V5 12/12] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.
Hi,

Yes, Matan and I need to work together and revisit this patch in light of the split patch series and remove any references to RoCE v2...
Thanks for the feedback Jason and apologies for the oversight, we should have worked this out internally before sending out V5

Regards
Som

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
> Sent: Thursday, June 11, 2015 9:41 AM
> To: Matan Barak
> Cc: Doug Ledford; Or Gerlitz; Moni Shoua; Sean Hefty; Somnath Kotur;
> linux-r...@vger.kernel.org; Somnath Kotur; Devesh Sharma
> Subject: Re: [PATCH for-next V5 12/12] RDMA/ocrdma: Changes in driver to
> incorporate the moving of GID Table mgmt to IB/Core.
>
> On Mon, Jun 08, 2015 at 05:12:15PM +0300, Matan Barak wrote:
> > From: Somnath Kotur <somnath.ko...@emulex.com>
> >
> > 1.Check and set port capability flags to indicate RoCEV2 support.
>
> ??? This series has nothing to do with rocev2 now, what is this about?
>
> >  	mutex_init(&dev->dev_lock);
> > -	dev->sgid_tbl = kzalloc(sizeof(union ib_gid) *
> > -				OCRDMA_MAX_SGID, GFP_KERNEL);
>
> Should sgid_tbl be dropped from the structure?
>
> > +int ocrdma_modify_gid(struct ib_device *ibdev, u8 port_num, unsigned int index,
> > +		      const union ib_gid *gid, const struct ib_gid_attr *attr,
> > +		      void **context)
> > +{
> > +	struct ocrdma_dev *dev;
> > +
> > +	dev = get_ocrdma_dev(ibdev);
> >  	return 0;
> >  }
>
> Empty modify gid? Shouldn't it be completely empty?
>
> This is correct? This HW sends the full SGID in the WQE?
>
> > +enum {
> > +	OCRDMA_L3_TYPE_IB_GRH = 0x00,
> > +	OCRDMA_L3_TYPE_IPV4 = 0x01,
> > +	OCRDMA_L3_TYPE_IPV6 = 0x02
> > +};
>
> These added constants are not used? Probably others as well?
>
> Jason
RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
> -----Original Message-----
> From: Hefty, Sean [mailto:sean.he...@intel.com]
> Sent: Tuesday, April 14, 2015 11:02 PM
> To: Matan Barak; Somnath Kotur; rol...@kernel.org
> Cc: linux-rdma@vger.kernel.org
> Subject: RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
>
> > This is a part of the GID meta info. The user should be able to choose
> > between RoCE V1 (which is represented here by IB_GID_TYPE_IB) and RoCE
> > V2 - just as a user could choose between IPv6 and IPv4.
>
> IPv4 and IPv6 are different protocols, not different formats for the same
> address. How does RoCE v2 not break every app?

It does not break every app, the choice of which GID type to use is made by the RDMA-CM based on network topology hint obtained from the IP stack.
Please refer to patch 15/33: IB/Core: Changes to the IB Core infrastructure for RoCEv2 support.
Of course, if the user does not want to go with this choice made by the RDMA-CM, then there is the option of overriding it using the configfs patch (PATCH 14/33)
Hope that clarifies?

Thanks
Som
RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
Hi Sean,

> -----Original Message-----
> From: Hefty, Sean [mailto:sean.he...@intel.com]
> Sent: Wednesday, April 08, 2015 6:00 AM
> To: Somnath Kotur; rol...@kernel.org
> Cc: linux-rdma@vger.kernel.org; Matan Barak
> Subject: RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
>
> > In order to manage multiple types, vlans and MACs per GID, we need to
> > store them along the GID itself. We store the net device as well, as
> > sometimes GIDs should be handled according to the net device they came
> > from. Since populating the GID table should be identical for every
> > RoCE provider, the GIDs table should be handled in ib_core.
> >
> > Adding a GID cache table that supports a lockless find, add and delete
> > gids. The lockless nature comes from using a unique sequence number
> > per table entry and detecting that while reading/writing this sequence
> > wasn't changed.
> >
> > By using this RoCE GID cache table, providers must implement a
> > modify_gid callback. The table is managed exclusively by this
> > roce_gid_cache and the provider just need to write the data to the
> > hardware.
> >
> > Signed-off-by: Matan Barak <mat...@mellanox.com>
> > Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
> > ---
> >  drivers/infiniband/core/Makefile         |   3 +-
> >  drivers/infiniband/core/core_priv.h      |  24 ++
> >  drivers/infiniband/core/roce_gid_cache.c | 518
>
> Why does RoCE need such a complex gid cache? If a gid cache is needed at
> all, why should it be restricted to RoCE only? And why is such a complex
> synchronization scheme needed? Seriously, how many times will GIDs change
> and how many readers at once do you expect to have?
>
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > index 65994a1..1866595 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -64,6 +64,36 @@ union ib_gid {
> >  	} global;
> >  };
> >
> > +extern union ib_gid zgid;
> > +
> > +enum ib_gid_type {
> > +	/* If link layer is Ethernet, this is RoCE V1 */
>
> I don't understand this comment. Does RoCE v2 not run on Ethernet?

Yes, this comment probably could use a reword..

> > +	IB_GID_TYPE_IB        = 0,
> > +	IB_GID_TYPE_ROCE_V2   = 1,
> > +	IB_GID_TYPE_SIZE
> > +};
>
> Can you explain the purpose of defining a 'GID type'. A GID is just a
> global address. Why does it matter to anyone using it how it was
> constructed?

This is part of RoCE V2 Specification. Please refer to Section A 17.8.
The GID Type determines the protocol for outbound packet generation i.e RoCE V1 (0x8915 Ether Type) or RoCEV2 (IPv4 or IPv6)

> > +
> > +struct ib_gid_attr {
> > +	enum ib_gid_type	gid_type;
> > +	struct net_device	*ndev;
> > +};
> > +
> > +struct ib_roce_gid_cache_entry {
> > +	/* seq number of 0 indicates entry being changed. */
> > +	unsigned int		seq;
> > +	union ib_gid		gid;
> > +	struct ib_gid_attr	attr;
> > +	void			*context;
> > +};
> > +
> > +struct ib_roce_gid_cache {
> > +	int			active;
> > +	int			sz;
> > +	/* locking against multiple writes in data_vec */
> > +	struct mutex		lock;
> > +	struct ib_roce_gid_cache_entry *data_vec;
> > +};
> > +
> >  enum rdma_node_type {
> >  	/* IB values map to NodeInfo:NodeType. */
> >  	RDMA_NODE_IB_CA 	= 1,
> > @@ -265,7 +295,9 @@ enum ib_port_cap_flags {
> >  	IB_PORT_BOOT_MGMT_SUP			= 1 << 23,
> >  	IB_PORT_LINK_LATENCY_SUP		= 1 << 24,
> >  	IB_PORT_CLIENT_REG_SUP			= 1 << 25,
> > -	IB_PORT_IP_BASED_GIDS			= 1 << 26
> > +	IB_PORT_IP_BASED_GIDS			= 1 << 26,
> > +	IB_PORT_ROCE				= 1 << 27,
> > +	IB_PORT_ROCE_V2				= 1 << 28,
>
> Why does RoCE suddenly require a port capability bit? RoCE runs today
> without setting any bit.

Again, this is part of RoCE V2 SPEC, please refer to Section A17.5.1 - Query HCA (pasting snippet below):
A new RoCE Supported capability bit shall be added to the Port Attributes list. This capability bit applies exclusively to ports of the new RoCEv2 type

Thanks
Som
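To make the "GID type determines the outbound protocol" answer concrete, here is a minimal sketch of that mapping. It is not taken from the patch set: the function name and the gid_is_ipv4 parameter are hypothetical, and the assumption that a RoCE v2 GID is sent over IPv4 or IPv6 depending on whether it carries an IPv4 address is mine. Only the 0x8915 ethertype for RoCE v1 comes from the thread; 0x0800 and 0x86dd are the standard IPv4 and IPv6 ethertypes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors IB_GID_TYPE_IB / IB_GID_TYPE_ROCE_V2 from the quoted patch. */
enum gid_type {
	GID_TYPE_IB = 0,	/* on Ethernet this means RoCE v1 */
	GID_TYPE_ROCE_V2 = 1
};

#define ETHERTYPE_ROCE_V1	0x8915
#define ETHERTYPE_IPV4		0x0800
#define ETHERTYPE_IPV6		0x86dd

/*
 * Pick the Ethernet type for an outbound packet from the GID type.
 * gid_is_ipv4 is a hypothetical flag saying the GID holds an IPv4
 * address; it is ignored for RoCE v1. Returns 0 for an unknown type.
 */
static uint16_t gid_type_to_ethertype(enum gid_type type, bool gid_is_ipv4)
{
	switch (type) {
	case GID_TYPE_IB:
		return ETHERTYPE_ROCE_V1;
	case GID_TYPE_ROCE_V2:
		return gid_is_ipv4 ? ETHERTYPE_IPV4 : ETHERTYPE_IPV6;
	default:
		return 0;
	}
}
```

Seen this way, the GID type is per-entry metadata that selects the wire encapsulation. The address stored in the GID itself is unchanged.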
RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
Hi Matan/Moni,

Could either of you please respond to both of Bart's queries?

Thanks
Somnath

> -----Original Message-----
> From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
> Sent: Thursday, March 26, 2015 5:13 AM
> To: Somnath Kotur; rol...@kernel.org
> Cc: linux-rdma@vger.kernel.org; Matan Barak
> Subject: Re: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
>
> On 03/25/2015 02:19 PM, Somnath Kotur wrote:
> > +	if (cache->data_vec[ix].attr.ndev &&
> > +	    cache->data_vec[ix].attr.ndev != old_net_dev)
>
> A few lines earlier the memory old_net_dev points at was freed. If two
> instances of this function run concurrently, what prevents that the
> old_net_dev memory has been reallocated and hence that attr.ndev ==
> old_net_dev although both pointers refer(red) to different network
> devices ?
>
> > +	ACCESS_ONCE(cache->data_vec[ix].seq) = orig_seq;
>
> Invoking write_gid() is only safe if the caller serializes write_gid()
> calls. Apparently the cache->lock mutex is used for that purpose. So why
> is it necessary to use ACCESS_ONCE() here ? Why is it needed to prevent
> that the compiler coalesces this write with another write into the same
> structure ?
>
> > +	/* Make sure the sequence number we remeber was read
>
> This looks like a typo - shouldn't the above read "remember" ?
>
> BTW, the style of that comment is recommended only for networking code
> and not for IB code. Have you verified this patch with checkpatch ?
>
> > +	mutex_lock(&cache->lock);
> > +
> > +	for (ix = 0; ix < cache->sz; ix++)
> > +		if (cache->data_vec[ix].attr.ndev == ndev)
> > +			write_gid(ib_dev, port, cache, ix, &zgid, &zattr);
> > +
> > +	mutex_unlock(&cache->lock);
> > +	return 0;
>
> The traditional Linux kernel coding style is one blank line before
> mutex_lock() and after mutex_unlock() but not after mutex_lock() nor
> before mutex_unlock().
>
> > +	orig_seq = ACCESS_ONCE(cache->data_vec[index].seq);
> > +	/* Make sure we read the sequence number before copying the
> > +	 * gid to local storage. */
> > +	smp_rmb();
>
> Please use READ_ONCE() instead of ACCESS_ONCE() as recommended in
> <linux/compiler.h>.
>
> > +static void free_roce_gid_cache(struct ib_device *ib_dev, u8 port)
> > +{
> > +	int i;
> > +	struct ib_roce_gid_cache *cache =
> > +		ib_dev->cache.roce_gid_cache[port - 1];
> > +
> > +	if (!cache)
> > +		return;
> > +
> > +	for (i = 0; i < cache->sz; ++i) {
> > +		if (memcmp(&cache->data_vec[i].gid, &zgid,
> > +			   sizeof(cache->data_vec[i].gid)))
> > +			write_gid(ib_dev, port, cache, i, &zgid, &zattr);
> > +	}
> > +	kfree(cache->data_vec);
> > +	kfree(cache);
> > +}
>
> Overwriting data just before it is freed is not useful. Please use
> CONFIG_SLUB_DEBUG=y to debug use-after-free issues instead of such code.
>
> Bart.
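The sequence-number scheme these review comments revolve around can be sketched in userspace roughly as follows. This is an illustration under stated assumptions, not the roce_gid_cache.c code: gid_entry, gid_entry_write() and gid_entry_read() are hypothetical names, C11 atomics and fences stand in for the kernel's READ_ONCE()/smp_rmb()/smp_wmb(), and writers are assumed to be serialized by a mutex that is not shown. The plain memcpy() of the payload would formally be a data race under real concurrency in the C11 model, so treat this as a picture of the protocol only.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define GID_LEN 16

/* Hypothetical cut-down cache entry: a sequence number plus the GID. */
struct gid_entry {
	atomic_uint seq;		/* 0 means "entry is being changed" */
	unsigned char gid[GID_LEN];
};

/* Writer side. Callers are assumed to hold the (not shown) table mutex. */
static void gid_entry_write(struct gid_entry *e,
			    const unsigned char gid[GID_LEN])
{
	unsigned int next =
		atomic_load_explicit(&e->seq, memory_order_relaxed) + 1;

	if (next == 0)			/* never publish the reserved value */
		next = 1;

	/* Mark the entry as in flux, then update, then publish a new seq. */
	atomic_store_explicit(&e->seq, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	memcpy(e->gid, gid, GID_LEN);
	atomic_store_explicit(&e->seq, next, memory_order_release);
}

/*
 * Reader side, lock free. Returns true and fills out[] only if the
 * sequence number was non-zero and unchanged across the copy.
 */
static bool gid_entry_read(struct gid_entry *e, unsigned char out[GID_LEN])
{
	unsigned int before =
		atomic_load_explicit(&e->seq, memory_order_acquire);

	if (before == 0)
		return false;		/* writer in progress or never set */

	memcpy(out, e->gid, GID_LEN);
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&e->seq, memory_order_relaxed) == before;
}
```

A reader that gets false would retry or fall back to taking the lock. Since only the reader side needs to be lock free, the question of why the writer's store needs ACCESS_ONCE() under the mutex is a fair one.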
[PATCH v3 for-next 22/33] IB/mlx4: Lock with RCU instead of RTNL
From: Moni Shoua <mo...@mellanox.com>

The function eth_link_query_port() used to take the RTNL lock when a call
to netdev_master_upper_dev_get() was necessary. This made it impossible to
call the function while the RTNL lock is held. Calling
netdev_master_upper_dev_get_rcu() under the RCU read lock instead solves
this problem.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index d8b227e..32cd009 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -367,14 +367,15 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->state		= IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
 	props->active_mtu	= IB_MTU_256;
-	if (is_bonded)
-		rtnl_lock(); /* required to get upper dev */
 	down_read(&iboe->sem);
 	ndev = iboe->netdevs[port - 1];
-	if (ndev && is_bonded)
-		ndev = netdev_master_upper_dev_get(ndev);
+	if (ndev && is_bonded) {
+		rcu_read_lock(); /* required to get upper dev */
+		ndev = netdev_master_upper_dev_get_rcu(ndev);
+		rcu_read_unlock();
+	}
 	if (!ndev)
-		goto out_unlock;
+		goto unlock;
 
 	tmp = iboe_get_mtu(ndev->mtu);
 	props->active_mtu = tmp ? min(props->max_mtu, tmp) : IB_MTU_256;
@@ -382,10 +383,8 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->state		= (netif_running(ndev) && netif_carrier_ok(ndev)) ?
 					IB_PORT_ACTIVE : IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
-out_unlock:
+unlock:
 	up_read(&iboe->sem);
-	if (is_bonded)
-		rtnl_unlock();
 out:
 	mlx4_free_cmd_mailbox(mdev->dev, mailbox);
 	return err;
-- 
2.1.0
[PATCH v3 for-next 17/33] RDMA/ocrdma: changes to support RoCE-v2 in UD path
From: Devesh Sharma devesh.sha...@emulex.com To support UD protocol this patch adds following changes to existing UD implementation. 1. AH creation resolves gid-type for a given index. 2. Based on GID-type protocol header is built. 3. Work completion reports l3-type if f/w supports RoCE-v2 and sets IB_WC_WITH_NETWORK_HDR_TYPE flag in wc-wc_flags. Signed-off-by: Somnath Kotur somnath.ko...@emulex.com Signed-off-by: Devesh Sharma devesh.sha...@emulex.com --- drivers/infiniband/hw/ocrdma/ocrdma.h | 1 + drivers/infiniband/hw/ocrdma/ocrdma_ah.c| 69 - drivers/infiniband/hw/ocrdma/ocrdma_sli.h | 5 ++- drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 23 -- 4 files changed, 81 insertions(+), 17 deletions(-) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h index 97f971a..302fd0e 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma.h +++ b/drivers/infiniband/hw/ocrdma/ocrdma.h @@ -341,6 +341,7 @@ struct ocrdma_ah { struct ocrdma_av *av; u16 sgid_index; u32 id; + u8 hdr_type; }; struct ocrdma_qp_hwq_info { diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c index 7ecd230..1bb72a0 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c @@ -39,6 +39,20 @@ #define OCRDMA_VID_PCP_SHIFT 0xD +static u16 ocrdma_hdr_type_to_proto_num(u8 hdr_type) +{ + switch (hdr_type) { + case OCRDMA_L3_TYPE_IB_GRH: + return (u16)0x8915; + case OCRDMA_L3_TYPE_IPV4: + return (u16)0x0800; + case OCRDMA_L3_TYPE_IPV6: + return (u16)0x86dd; + default: + return 0; + } +} + static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah, struct ib_ah_attr *attr, union ib_gid *sgid, int pdid, bool *isvlan, u16 vlan_tag) @@ -47,22 +61,33 @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah, struct ocrdma_eth_vlan eth; struct ocrdma_grh grh; int eth_sz; + u16 proto_num = 0; + u8 nxthdr = 0x11; + struct iphdr ipv4; + union { + struct sockaddr _sockaddr; 
+ struct sockaddr_in _sockaddr_in; + struct sockaddr_in6 _sockaddr_in6; + } sgid_addr, dgid_addr; memset(eth, 0, sizeof(eth)); memset(grh, 0, sizeof(grh)); + /* Protocol Number */ + proto_num = ocrdma_hdr_type_to_proto_num(ah-hdr_type); + nxthdr = (proto_num == 0x8915) ? 0x1b : 0x11; /* VLAN */ if (!vlan_tag || (vlan_tag 0xFFF)) vlan_tag = dev-pvid; if (vlan_tag (vlan_tag 0x1000)) { eth.eth_type = cpu_to_be16(0x8100); - eth.roce_eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE); + eth.roce_eth_type = cpu_to_be16(proto_num); vlan_tag |= (dev-sl 0x07) OCRDMA_VID_PCP_SHIFT; eth.vlan_tag = cpu_to_be16(vlan_tag); eth_sz = sizeof(struct ocrdma_eth_vlan); *isvlan = true; } else { - eth.eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE); + eth.eth_type = cpu_to_be16(proto_num); eth_sz = sizeof(struct ocrdma_eth_basic); } /* MAC */ @@ -71,18 +96,34 @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah, if (status) return status; ah-sgid_index = attr-grh.sgid_index; - memcpy(grh.sgid[0], sgid-raw, sizeof(union ib_gid)); - memcpy(grh.dgid[0], attr-grh.dgid.raw, sizeof(attr-grh.dgid.raw)); - - grh.tclass_flow = cpu_to_be32((6 28) | - (attr-grh.traffic_class 24) | - attr-grh.flow_label); - /* 0x1b is next header value in GRH */ - grh.pdid_hoplimit = cpu_to_be32((pdid 16) | - (0x1b 8) | attr-grh.hop_limit); /* Eth HDR */ memcpy(ah-av-eth_hdr, eth, eth_sz); - memcpy((u8 *)ah-av + eth_sz, grh, sizeof(struct ocrdma_grh)); + if (ah-hdr_type == RDMA_NETWORK_IPV4) { + *((__be16 *)ipv4) = htons((4 12) | (5 8) | + attr-grh.traffic_class); + ipv4.id = cpu_to_be16(pdid); + ipv4.frag_off = htons(IP_DF); + ipv4.tot_len = htons(0); + ipv4.ttl = attr-grh.hop_limit; + ipv4.protocol = nxthdr; + rdma_gid2ip(sgid_addr._sockaddr, sgid); + ipv4.saddr = sgid_addr._sockaddr_in.sin_addr.s_addr; + rdma_gid2ip(dgid_addr._sockaddr, attr-grh.dgid); + ipv4.daddr = dgid_addr._sockaddr_in.sin_addr.s_addr; + memcpy((u8 *)ah-av + eth_sz, ipv4, sizeof(struct iphdr)); + } else { + 
memcpy(grh.sgid[0], sgid-raw
[PATCH v3 for-next 24/33] IB/mlx4: Advertise RoCE support in port capabilities
From: Moni Shoua mo...@mellanox.com The port capability flags should indicate the support in RoCE modes (V1 or V2) of the port. The mlx4 driver sets these flags according to the capabilities reported by the HW. Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/hw/mlx4/main.c | 6 ++ drivers/net/ethernet/mellanox/mlx4/fw.c | 5 - drivers/net/ethernet/mellanox/mlx4/main.c | 6 +- include/linux/mlx4/device.h | 13 ++--- 4 files changed, 25 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 32cd009..bf87a95 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -359,6 +359,12 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port, IB_WIDTH_4X : IB_WIDTH_1X; props-active_speed = IB_SPEED_QDR; props-port_cap_flags = IB_PORT_CM_SUP | IB_PORT_IP_BASED_GIDS; + + if (mdev-dev-caps.flags MLX4_DEV_CAP_FLAG_IBOE) + props-port_cap_flags |= IB_PORT_ROCE; + if (mdev-dev-caps.flags2 MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) + props-port_cap_flags |= IB_PORT_ROCE_V2 | IB_PORT_ROCE; + props-gid_tbl_len = mdev-dev-caps.gid_table_len[port]; props-max_msg_sz = mdev-dev-caps.max_msg_sz; props-pkey_tbl_len = 1; diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c index 3702fd1..d573e73 100644 --- a/drivers/net/ethernet/mellanox/mlx4/fw.c +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c @@ -146,7 +146,8 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 flags) [17] = Asymmetric EQs support, [18] = More than 80 VFs support, [19] = Performance optimized for limited rule configuration flow steering support, - [21] = Port Remap support + [21] = Port Remap support, + [22] = RoCEv2 support }; int i; @@ -852,6 +853,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap-flags2 |= MLX4_DEV_CAP_FLAG2_EQE_STRIDE; MLX4_GET(dev_cap-bmme_flags, outbox, 
QUERY_DEV_CAP_BMME_FLAGS_OFFSET); + if (dev_cap-bmme_flags MLX4_FLAG_ROCE_V1_V2) + dev_cap-flags2 |= MLX4_DEV_CAP_FLAG2_ROCE_V1_V2; if (dev_cap-bmme_flags MLX4_FLAG_PORT_REMAP) dev_cap-flags2 |= MLX4_DEV_CAP_FLAG2_PORT_REMAP; MLX4_GET(field, outbox, QUERY_DEV_CAP_CONFIG_DEV_OFFSET); diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c index 1893a57..29c60fd 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -386,8 +386,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) if (mlx4_priv(dev)-pci_dev_data MLX4_PCI_DEV_FORCE_SENSE_PORT) dev-caps.flags |= MLX4_DEV_CAP_FLAG_SENSE_SUPPORT; /* Don't do sense port on multifunction devices (for now at least) */ - if (mlx4_is_mfunc(dev)) + /* Don't do enable RoCE V2 on multifunction devices */ + if (mlx4_is_mfunc(dev)) { dev-caps.flags = ~MLX4_DEV_CAP_FLAG_SENSE_SUPPORT; + dev_cap-flags2 = ~MLX4_DEV_CAP_FLAG2_ROCE_V1_V2; + mlx4_dbg(dev, RoCE V2 is not supported when SR-IOV is enabled\n); + } if (mlx4_low_memory_profile()) { dev-caps.log_num_macs = MLX4_MIN_LOG_NUM_MAC; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 9a05e73..9bdf157 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -202,7 +202,8 @@ enum { MLX4_DEV_CAP_FLAG2_SYS_EQS = 1LL 17, MLX4_DEV_CAP_FLAG2_80_VFS = 1LL 18, MLX4_DEV_CAP_FLAG2_FS_A0= 1LL 19, - MLX4_DEV_CAP_FLAG2_PORT_REMAP = 1LL 21 + MLX4_DEV_CAP_FLAG2_PORT_REMAP = 1LL 21, + MLX4_DEV_CAP_FLAG2_ROCE_V1_V2 = 1LL 22 }; enum { @@ -250,6 +251,7 @@ enum { MLX4_BMME_FLAG_TYPE_2_WIN = 1 9, MLX4_BMME_FLAG_RESERVED_LKEY= 1 10, MLX4_BMME_FLAG_FAST_REG_WR = 1 11, + MLX4_BMME_FLAG_ROCE_V1_V2 = 1 19, MLX4_BMME_FLAG_PORT_REMAP = 1 24, MLX4_BMME_FLAG_VSD_INIT2RTR = 1 28, }; @@ -258,6 +260,10 @@ enum { MLX4_FLAG_PORT_REMAP= MLX4_BMME_FLAG_PORT_REMAP }; +enum { + MLX4_FLAG_ROCE_V1_V2= MLX4_BMME_FLAG_ROCE_V1_V2 +}; + enum mlx4_event { 
MLX4_EVENT_TYPE_COMP = 0x00, MLX4_EVENT_TYPE_PATH_MIG = 0x01, @@ -888,9 +894,10 @@ struct mlx4_mad_ifc { if (((dev)-caps.port_mask[port
[PATCH v3 for-next 25/33] IB/mlx4: Implement ib_device callback - get_netdev
From: Moni Shoua <mo...@mellanox.com>

This is a new callback that is required for RoCEv2 support.
In port aggregation mode the callback must return the netdev of the active
port, so the mlx4 core driver needs support for identifying that port.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c         | 29 +
 drivers/net/ethernet/mellanox/mlx4/main.c | 18 ++
 include/linux/mlx4/driver.h               |  1 +
 3 files changed, 48 insertions(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bf87a95..04e6603 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -47,6 +47,8 @@
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
 
+#include <net/bonding.h>
+
 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/cmd.h>
 #include <linux/mlx4/qp.h>
@@ -1527,6 +1529,32 @@ unlock:
 	mutex_unlock(&ibdev->qp1_proxy_lock[port - 1]);
 }
 
+static struct net_device *mlx4_ib_get_netdev(struct ib_device *device, u8 port_num)
+{
+	struct mlx4_ib_dev *ibdev = to_mdev(device);
+
+	if (mlx4_is_bonded(ibdev->dev)) {
+		struct net_device *dev;
+		struct net_device *upper = NULL;
+
+		rcu_read_lock();
+
+		dev = mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+		if (dev)
+			upper = netdev_master_upper_dev_get_rcu(dev);
+		else
+			goto unlock;
+		if (upper)
+			dev = bond_option_active_slave_get_rcu(netdev_priv(upper));
+unlock:
+		rcu_read_unlock();
+
+		return dev;
+	}
+
+	return mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+}
+
 static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 				 struct net_device *dev,
 				 unsigned long event)
@@ -1806,6 +1834,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	ibdev->ib_dev.attach_mcast	= mlx4_ib_mcg_attach;
 	ibdev->ib_dev.detach_mcast	= mlx4_ib_mcg_detach;
 	ibdev->ib_dev.process_mad	= mlx4_ib_process_mad;
+	ibdev->ib_dev.get_netdev	= mlx4_ib_get_netdev;
 
 	if (!mlx4_is_slave(ibdev->dev)) {
 		ibdev->ib_dev.alloc_fmr		= mlx4_ib_fmr_alloc;
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 29c60fd..3f469d3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1241,6 +1241,24 @@ int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p)
 }
 EXPORT_SYMBOL_GPL(mlx4_port_map_set);
 
+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+
+	if (!pport)
+		return -EINVAL;
+	*pport = 0;
+
+	if (vport == 1)
+		*pport = priv->v2p.port1;
+	else if (vport == 2)
+		*pport = priv->v2p.port2;
+	if (!*pport)
+		return -EINVAL;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mlx4_port_map_get);
+
 static int mlx4_load_fw(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 5a06d96..a992971 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -81,6 +81,7 @@ struct mlx4_port_map {
 };
 
 int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p);
+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport);
 
 void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, int port);
 
-- 
2.1.0
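The virtual-to-physical port lookup added by mlx4_port_map_get() is small enough to try outside the kernel. The sketch below keeps the same control flow but is not the driver code: struct port_map and port_map_get() are hypothetical stand-ins for the v2p member of struct mlx4_priv and for the exported function, and the convention that 0 means "no mapping" is inferred from the !*pport check in the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the v2p member of struct mlx4_priv. */
struct port_map {
	uint8_t port1;	/* physical port behind virtual port 1, 0 if unset */
	uint8_t port2;	/* physical port behind virtual port 2, 0 if unset */
};

/*
 * Translate a virtual port number (1 or 2) to a physical port.
 * Returns 0 on success; -EINVAL for a NULL output pointer, an unknown
 * virtual port, or a virtual port with no mapping.
 */
static int port_map_get(const struct port_map *v2p, uint8_t vport,
			uint8_t *pport)
{
	if (!pport)
		return -EINVAL;
	*pport = 0;

	if (vport == 1)
		*pport = v2p->port1;
	else if (vport == 2)
		*pport = v2p->port2;
	if (!*pport)
		return -EINVAL;
	return 0;
}
```

Note that on every failure except a NULL pport, the output is left at 0, so a caller that ignores the return value still sees an invalid port number and not a stale one.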
[PATCH v3 for-next 32/33] IB/mlx4: Create and use another QP1 for RoCEv2
From: Moni Shoua mo...@mellanox.com The mlx4 driver uses a special QP to implement the GSI QP. This kind of QP allows to build the InfiniBand headers in SW to be put before the payload that comes in with the WR. The mlx4 HW builds the packet, calculates the ICRC and puts it at the end of the payload. This ICRC calculation however depends on the QP configuration which is determined when QP is modified (roce_mode during INIT-RTR). On the other hand, ICRC verification when packet is received does to depend on this configuration. Therefore, using 2 GSI QPs for send (one for each RoCE version) and 1 GSI QP for receive are required. Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/hw/mlx4/mlx4_ib.h | 7 ++ drivers/infiniband/hw/mlx4/qp.c | 155 +++ 2 files changed, 144 insertions(+), 18 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 018bda6..a853330 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -159,11 +159,18 @@ struct mlx4_ib_wq { unsignedtail; }; +enum { + MLX4_IB_QP_CREATE_ROCE_V2_GSI = IB_QP_CREATE_RESERVED_START +}; + enum mlx4_ib_qp_flags { MLX4_IB_QP_LSO = IB_QP_CREATE_IPOIB_UD_LSO, MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK = IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK, MLX4_IB_QP_NETIF = IB_QP_CREATE_NETIF_QP, MLX4_IB_QP_CREATE_USE_GFP_NOIO = IB_QP_CREATE_USE_GFP_NOIO, + + /* Mellanox specific flags start from IB_QP_CREATE_RESERVED_START */ + MLX4_IB_ROCE_V2_GSI_QP = MLX4_IB_QP_CREATE_ROCE_V2_GSI, MLX4_IB_SRIOV_TUNNEL_QP = 1 30, MLX4_IB_SRIOV_SQP = 1 31, }; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index fb37415..b54f315 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -81,6 +81,7 @@ struct mlx4_ib_sqp { u32 send_psn; struct ib_ud_header ud_header; u8 header_buf[MLX4_IB_UD_HEADER_SIZE]; + struct ib_qp*roce_v2_gsi; }; enum 
{ @@ -150,7 +151,10 @@ static int is_sqp(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp) } } } - return proxy_sqp; + if (proxy_sqp) + return 1; + + return !!(qp-flags MLX4_IB_ROCE_V2_GSI_QP); } /* used for INIT/CLOSE port logic */ @@ -672,6 +676,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, qp = sqp-qp; qp-pri.vid = 0x; qp-alt.vid = 0x; + sqp-roce_v2_gsi = NULL; } else { qp = kzalloc(sizeof (struct mlx4_ib_qp), gfp); if (!qp) @@ -1029,9 +1034,17 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, del_gid_entries(qp); } -static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr) +static int get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr) { /* Native or PPF */ + if ((!mlx4_is_mfunc(dev-dev) || mlx4_is_master(dev-dev)) + attr-create_flags MLX4_IB_QP_CREATE_ROCE_V2_GSI) { + int sqpn; + int res = mlx4_qp_reserve_range(dev-dev, 1, 1, sqpn, 0); + + return res ? -abs(res) : sqpn; + } + if (!mlx4_is_mfunc(dev-dev) || (mlx4_is_master(dev-dev) attr-create_flags MLX4_IB_SRIOV_SQP)) { @@ -1039,6 +1052,7 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr) (attr-qp_type == IB_QPT_SMI ? 0 : 2) + attr-port_num - 1; } + /* PF or VF -- creating proxies */ if (attr-qp_type == IB_QPT_SMI) return dev-dev-caps.qp0_proxy[attr-port_num - 1]; @@ -1046,9 +1060,9 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr) return dev-dev-caps.qp1_proxy[attr-port_num - 1]; } -struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *init_attr, - struct ib_udata *udata) +static struct ib_qp *_mlx4_ib_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *init_attr, + struct ib_udata *udata) { struct mlx4_ib_qp *qp = NULL; int err; @@ -1066,6 +1080,7 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd, MLX4_IB_SRIOV_TUNNEL_QP | MLX4_IB_SRIOV_SQP | MLX4_IB_QP_NETIF | + MLX4_IB_QP_CREATE_ROCE_V2_GSI
[PATCH v3 for-next 19/33] RDMA/ocrdma: changes to support user AH creation
From: Devesh Sharma <devesh.sha...@emulex.com>

To support user-space AH creation, this patch uses the ahid field to convey
the l3-type to the user-space library. The library is responsible for
decoding the l3-type out of ahid.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 5 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 1bb72a0..65a39cc 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -191,6 +191,11 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
 		ahid_addr = pd->uctx->ah_tbl.va + attr->dlid;
 		*ahid_addr = 0;
 		*ahid_addr |= ah->id & OCRDMA_AH_ID_MASK;
+		if (ocrdma_is_rocev2_supported(dev)) {
+			*ahid_addr |= ((u32)ah->hdr_type &
+					OCRDMA_AH_L3_TYPE_MASK) <<
+					OCRDMA_AH_L3_TYPE_SHIFT;
+		}
 		if (isvlan)
 			*ahid_addr |= (OCRDMA_AH_VLAN_VALID_MASK <<
 				       OCRDMA_AH_VLAN_VALID_SHIFT);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
index 726a87c..ed45ecd 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
@@ -31,9 +31,10 @@
 enum {
 	OCRDMA_AH_ID_MASK		= 0x3FF,
 	OCRDMA_AH_VLAN_VALID_MASK	= 0x01,
-	OCRDMA_AH_VLAN_VALID_SHIFT	= 0x1F
+	OCRDMA_AH_VLAN_VALID_SHIFT	= 0x1F,
+	OCRDMA_AH_L3_TYPE_MASK		= 0x03,
+	OCRDMA_AH_L3_TYPE_SHIFT		= 0x1D /* 29 bits */
 };
-
 struct ib_ah *ocrdma_create_ah(struct ib_pd *, struct ib_ah_attr *);
 int ocrdma_destroy_ah(struct ib_ah *);
 int ocrdma_query_ah(struct ib_ah *, struct ib_ah_attr *);
-- 
2.1.0
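The ahid word layout implied by these masks and shifts (AH id in the low 10 bits, l3-type in bits 29-30, VLAN-valid in bit 31) can be checked with a small standalone sketch. The mask and shift values are copied from the patch; the function names are hypothetical, and the decode helpers are only a guess at what the user-space library does, since that code is not part of this thread. Unlike the driver, the encoder here always stores the l3-type, where the patch does so only when ocrdma_is_rocev2_supported() is true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the ocrdma_ah.h hunk; names shortened. */
#define AH_ID_MASK		0x3FFu
#define AH_VLAN_VALID_MASK	0x01u
#define AH_VLAN_VALID_SHIFT	0x1F
#define AH_L3_TYPE_MASK		0x03u
#define AH_L3_TYPE_SHIFT	0x1D

/* Pack AH id, l3-type and the VLAN-valid flag into one 32-bit word. */
static uint32_t ahid_encode(uint32_t id, uint8_t l3_type, bool is_vlan)
{
	uint32_t ahid = id & AH_ID_MASK;

	ahid |= ((uint32_t)l3_type & AH_L3_TYPE_MASK) << AH_L3_TYPE_SHIFT;
	if (is_vlan)
		ahid |= AH_VLAN_VALID_MASK << AH_VLAN_VALID_SHIFT;
	return ahid;
}

/* Hypothetical library-side decoders for the three fields. */
static uint32_t ahid_id(uint32_t ahid)
{
	return ahid & AH_ID_MASK;
}

static uint8_t ahid_l3_type(uint32_t ahid)
{
	return (uint8_t)((ahid >> AH_L3_TYPE_SHIFT) & AH_L3_TYPE_MASK);
}

static bool ahid_is_vlan(uint32_t ahid)
{
	return ((ahid >> AH_VLAN_VALID_SHIFT) & AH_VLAN_VALID_MASK) != 0;
}
```

For example, id 5 with l3-type 2 and the VLAN flag set packs to 0xC0000005. The three fields occupy disjoint bit ranges, so each can be decoded without knowing the others.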
[PATCH v3 for-next 21/33] IB/mlx4: Replace spin_lock with rw_semaphore
From: Moni Shoua <mo...@mellanox.com>

Protection on iboe->netdevs is no longer required to be from an atomic
context. Replacing a spin_lock with a semaphore is allowed and makes
more sense.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c    | 27 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  2 +-
 2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 91caffc..d8b227e 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -369,7 +369,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->active_mtu	= IB_MTU_256;
 	if (is_bonded)
 		rtnl_lock(); /* required to get upper dev */
-	spin_lock_bh(&iboe->lock);
+	down_read(&iboe->sem);
 	ndev = iboe->netdevs[port - 1];
 	if (ndev && is_bonded)
 		ndev = netdev_master_upper_dev_get(ndev);
@@ -383,7 +383,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 					IB_PORT_ACTIVE : IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
 out_unlock:
-	spin_unlock_bh(&iboe->lock);
+	up_read(&iboe->sem);
 	if (is_bonded)
 		rtnl_unlock();
 out:
@@ -825,11 +825,11 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 	if (!mqp->port)
 		return 0;

-	spin_lock_bh(&mdev->iboe.lock);
+	down_read(&mdev->iboe.sem);
 	ndev = mdev->iboe.netdevs[mqp->port - 1];
 	if (ndev)
 		dev_hold(ndev);
-	spin_unlock_bh(&mdev->iboe.lock);
+	up_read(&mdev->iboe.sem);

 	if (ndev) {
 		ret = 1;
@@ -1330,7 +1330,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
 	struct mlx4_dev *dev = mdev->dev;
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
-	struct net_device *ndev;
 	struct mlx4_ib_gid_entry *ge;
 	enum mlx4_protocol prot = MLX4_PROT_IB_IPV6;
 	struct mlx4_flow_reg_id reg_id = {0, 0};
@@ -1370,13 +1369,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	mutex_lock(&mqp->mutex);
 	ge = find_gid_entry(mqp, gid->raw);
 	if (ge) {
-		spin_lock_bh(&mdev->iboe.lock);
-		ndev = ge->added ? mdev->iboe.netdevs[ge->port - 1] : NULL;
-		if (ndev)
-			dev_hold(ndev);
-		spin_unlock_bh(&mdev->iboe.lock);
-		if (ndev)
-			dev_put(ndev);
 		list_del(&ge->list);
 		kfree(ge);
 	} else
@@ -1543,7 +1535,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,

 	iboe = &ibdev->iboe;

-	spin_lock_bh(&iboe->lock);
+	down_write(&iboe->sem);
 	mlx4_foreach_ib_transport_port(port, ibdev->dev) {

 		iboe->netdevs[port - 1] =
@@ -1555,7 +1547,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 			update_qps_port = port;

 	}
-	spin_unlock_bh(&iboe->lock);
+	up_write(&iboe->sem);

 	if (update_qps_port > 0)
 		mlx4_ib_update_qps(ibdev, dev, update_qps_port);
@@ -1848,7 +1840,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)

 	mlx4_ib_alloc_eqs(dev, ibdev);

-	spin_lock_init(&iboe->lock);
+	init_rwsem(&iboe->sem);

 	if (init_node_data(ibdev))
 		goto err_map;
@@ -2153,7 +2145,8 @@ static void handle_bonded_port_state_event(struct work_struct *work)
 	struct ib_event ibev;

 	kfree(ew);
-	spin_lock_bh(&ibdev->iboe.lock);
+
+	down_read(&ibdev->iboe.sem);
 	for (i = 0; i < MLX4_MAX_PORTS; ++i) {
 		struct net_device *curr_netdev = ibdev->iboe.netdevs[i];

@@ -2165,7 +2158,7 @@ static void handle_bonded_port_state_event(struct work_struct *work)
 		bonded_port_state = (bonded_port_state != IB_PORT_ACTIVE) ?
 			curr_port_state : IB_PORT_ACTIVE;
 	}
-	spin_unlock_bh(&ibdev->iboe.lock);
+	up_read(&ibdev->iboe.sem);
 	ibev.device = &ibdev->ib_dev;
 	ibev.element.port_num = 1;

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index e3805a4..166ebf9 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -455,7 +455,7 @@ struct mlx4_ib_sriov {
 };

 struct mlx4_ib_iboe {
-	spinlock_t		lock;
+	struct rw_semaphore	sem; /* guard from concurrent access to data in this struct */
 	struct net_device      *netdevs[MLX4_MAX_PORTS];
 	atomic64_t		mac[MLX4_MAX_PORTS];
 	struct notifier_block	nb;
--
2.1.0
[PATCH v3 for-next 28/33] IB/mlx4: Translate cache gid index to real index
From: Moni Shoua <mo...@mellanox.com>

When a QP is modified with a path, the given sgid_index is not
necessarily the index that the HW knows. This is due to optimizations
that can save space in the HW table. Therefore, translation is required.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/qp.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 847f9ec..d7d7c5a 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1256,14 +1256,18 @@ static int _mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 	path->static_rate = 0;

 	if (ah->ah_flags & IB_AH_GRH) {
-		if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len[port]) {
+		int real_sgid_index = mlx4_ib_gid_index_to_real_index(dev,
+								       port,
+								       ah->grh.sgid_index);
+
+		if (real_sgid_index >= dev->dev->caps.gid_table_len[port]) {
 			pr_err("sgid_index (%u) too large. max is %d\n",
-			       ah->grh.sgid_index, dev->dev->caps.gid_table_len[port] - 1);
+			       real_sgid_index, dev->dev->caps.gid_table_len[port] - 1);
 			return -1;
 		}

 		path->grh_mylmc |= 1 << 7;
-		path->mgid_index = ah->grh.sgid_index;
+		path->mgid_index = real_sgid_index;
 		path->hop_limit  = ah->grh.hop_limit;
 		path->tclass_flowlabel =
 			cpu_to_be32((ah->grh.traffic_class << 20) |
--
2.1.0
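For readers following the series, here is a minimal userspace sketch of the idea in this patch: translate the cache index first, then range-check the translated value against the HW table. The map layout, the table sizes and the function names below are assumptions made for illustration; they are not the driver's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_GIDS 8	/* hypothetical size of the core GID cache */
#define HW_GIDS    4	/* hypothetical size of the HW GID table */

/* Hypothetical per-port map: cache index -> HW index, or -1 if unused. */
struct gid_index_map {
	int real_index[CACHE_GIDS];
};

/* Translate a cache index; returns -1 for out-of-range or unmapped slots. */
static int gid_index_to_real_index(const struct gid_index_map *map,
				   unsigned int cache_index)
{
	assert(map != NULL);
	if (cache_index >= CACHE_GIDS)
		return -1;
	return map->real_index[cache_index];
}

/* Validate the translated index (not the raw one) before using it. */
static int set_path_mgid_index(const struct gid_index_map *map,
			       unsigned int sgid_index, int *mgid_index)
{
	int real = gid_index_to_real_index(map, sgid_index);

	if (real < 0 || real >= HW_GIDS)
		return -1;
	*mgid_index = real;
	return 0;
}
```

Unlike the hunk above, this sketch also rejects a negative translation result; whether the driver helper can return one is not visible in this excerpt.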
[PATCH v3 for-next 33/33] IB/cma: Join and leave multicast groups with IGMP
From: Moni Shoua <mo...@mellanox.com>

Since RoCEv2 is a protocol over IP header it is required to send IGMP
join and leave requests to the network when joining and leaving
multicast groups.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/cma.c       | 78 ++---
 drivers/infiniband/core/multicast.c | 18 -
 include/rdma/ib_sa.h                |  3 ++
 3 files changed, 92 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 6f345e2..8f997d7 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -38,6 +38,7 @@
 #include <linux/in6.h>
 #include <linux/mutex.h>
 #include <linux/random.h>
+#include <linux/igmp.h>
 #include <linux/idr.h>
 #include <linux/inetdevice.h>
 #include <linux/slab.h>
@@ -196,6 +197,7 @@ struct cma_multicast {
 	void			*context;
 	struct sockaddr_storage	addr;
 	struct kref		mcref;
+	bool			igmp_joined;
 };

 struct cma_work {
@@ -283,6 +285,26 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
 	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
 }

+static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join)
+{
+	struct in_device *in_dev = NULL;
+
+	if (ndev) {
+		rtnl_lock();
+		in_dev = __in_dev_get_rtnl(ndev);
+		if (in_dev) {
+			if (join)
+				ip_mc_inc_group(in_dev,
+						*(__be32 *)(mgid->raw+12));
+			else
+				ip_mc_dec_group(in_dev,
+						*(__be32 *)(mgid->raw+12));
+		}
+		rtnl_unlock();
+	}
+	return (in_dev) ? 0 : -ENODEV;
+}
+
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 			      struct cma_device *cma_dev)
 {
@@ -1076,6 +1098,20 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
 			kfree(mc);
 			break;
 		case IB_LINK_LAYER_ETHERNET:
+			if (mc->igmp_joined) {
+				struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
+				struct net_device *ndev = NULL;
+
+				if (dev_addr->bound_dev_if)
+					ndev = dev_get_by_index(&init_net,
+								dev_addr->bound_dev_if);
+				if (ndev) {
+					cma_igmp_send(ndev,
+						      &mc->multicast.ib->rec.mgid,
+						      false);
+					dev_put(ndev);
+				}
+			}
 			kref_put(&mc->mcref, release_mc);
 			break;
 		default:
@@ -3356,7 +3392,7 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 {
 	struct iboe_mcast_work *work;
 	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
-	int err;
+	int err = 0;
 	struct sockaddr *addr = (struct sockaddr *)&mc->addr;
 	struct net_device *ndev = NULL;

@@ -3388,13 +3424,30 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 	mc->multicast.ib->rec.rate = iboe_get_rate(ndev);
 	mc->multicast.ib->rec.hop_limit = 1;
 	mc->multicast.ib->rec.mtu = iboe_get_mtu(ndev->mtu);
+	mc->multicast.ib->rec.ifindex = dev_addr->bound_dev_if;
+	mc->multicast.ib->rec.net = &init_net;
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &mc->multicast.ib->rec.port_gid);
+
+	if (addr->sa_family == AF_INET) {
+		mc->multicast.ib->rec.gid_type =
+			id_priv->cma_dev->default_gid_type;
+		if (mc->multicast.ib->rec.gid_type == IB_GID_TYPE_ROCE_V2)
+			err = cma_igmp_send(ndev, &mc->multicast.ib->rec.mgid,
+					    true);
+		if (!err) {
+			mc->igmp_joined = true;
+			mc->multicast.ib->rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+		}
+	} else {
+		mc->multicast.ib->rec.gid_type = IB_GID_TYPE_IB;
+	}
 	dev_put(ndev);
-	if (!mc->multicast.ib->rec.mtu) {
+	if (err || !mc->multicast.ib->rec.mtu) {
 		err = -EINVAL;
 		goto out2;
 	}
-	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
-		    &mc->multicast.ib->rec.port_gid);
+
 	work->id = id_priv;
 	work->mc = mc;
 	INIT_WORK(work
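The IGMP helper in this patch takes the IPv4 group address from the last four bytes of the multicast GID (the patch reads them at offset 12 of the raw GID). A small userspace sketch of that extraction follows, assuming a 16-byte GID; `union gid`, `mgid_to_ipv4` and `ipv4_is_multicast_bytes` are hypothetical names, and `memcpy` is used here instead of the pointer cast in the patch to sidestep alignment questions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's union ib_gid (16 raw bytes). */
union gid {
	uint8_t raw[16];
};

/* Copy the low 4 bytes of the GID (network byte order) into addr. */
static void mgid_to_ipv4(const union gid *mgid, uint8_t addr[4])
{
	assert(mgid != NULL);
	memcpy(addr, mgid->raw + 12, 4);
}

/* IPv4 multicast addresses live in 224.0.0.0/4. */
static int ipv4_is_multicast_bytes(const uint8_t addr[4])
{
	return (addr[0] & 0xf0) == 0xe0;
}
```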
[PATCH v3 for-next 26/33] IB/mlx4: Implement ib_device callback - modify_gid
From: Moni Shoua <mo...@mellanox.com>

This is a new callback that is required for RoCEv2 support. In RoCE,
the GID table is managed in the IB core driver. The role of the mlx4
driver is to synchronize the HW with the entries in the GID table.

Since it is possible that the same GID value will appear more than once
in the GID table (though with different attributes), the mlx4 driver is
required to maintain a reference counting mechanism and populate the HW
with a single value. Since an index into the GID table is not
necessarily the same as the index of the matching entry in the HW GID
table, a translation between indexes is required.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c    | 226 +++
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  18 +++
 include/linux/mlx4/cmd.h             |   3 +-
 include/linux/mlx4/device.h          |   3 +-
 4 files changed, 248 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 04e6603..96a6ec0 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1555,6 +1555,230 @@ unlock:
 	return mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
 }

+static int mlx4_ib_update_gids_v1(struct gid_entry *gids,
+				  struct mlx4_ib_dev *ibdev,
+				  u8 port_num)
+{
+	struct mlx4_cmd_mailbox *mailbox;
+	int err;
+	struct mlx4_dev *dev = ibdev->dev;
+	int i;
+	union ib_gid *gid_tbl;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox))
+		return -ENOMEM;
+
+	gid_tbl = mailbox->buf;
+
+	for (i = 0; i < MLX4_MAX_PORT_GIDS; ++i)
+		memcpy(&gid_tbl[i], &gids[i].gid, sizeof(union ib_gid));
+
+	err = mlx4_cmd(dev, mailbox->dma,
+		       MLX4_SET_PORT_GID_TABLE << 8 | port_num,
+		       1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+		       MLX4_CMD_WRAPPED);
+	if (mlx4_is_bonded(dev))
+		err += mlx4_cmd(dev, mailbox->dma,
+				MLX4_SET_PORT_GID_TABLE << 8 | 2,
+				1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+				MLX4_CMD_WRAPPED);
+
+	mlx4_free_cmd_mailbox(dev, mailbox);
+	return err;
+}
+
+static int mlx4_ib_update_gids_v1_v2(struct gid_entry *gids,
+				     struct mlx4_ib_dev *ibdev,
+				     u8 port_num)
+{
+	struct mlx4_cmd_mailbox *mailbox;
+	int err;
+	struct mlx4_dev *dev = ibdev->dev;
+	int i;
+	struct {
+		union ib_gid	gid;
+		__be32		rsrvd1[2];
+		__be16		rsrvd2;
+		u8		type;
+		u8		version;
+		__be32		rsrvd3;
+	} *gid_tbl;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox))
+		return -ENOMEM;
+
+	gid_tbl = mailbox->buf;
+	for (i = 0; i < MLX4_MAX_PORT_GIDS; ++i) {
+		memcpy(&gid_tbl[i].gid, &gids[i].gid, sizeof(union ib_gid));
+		if (gids[i].gid_type == IB_GID_TYPE_ROCE_V2) {
+			gid_tbl[i].version = 2;
+			if (!ipv6_addr_v4mapped((struct in6_addr *)&gids[i].gid))
+				gid_tbl[i].type = 1;
+		}
+	}
+
+	err = mlx4_cmd(dev, mailbox->dma,
+		       MLX4_SET_PORT_ROCE_ADDR << 8 | port_num,
+		       1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+		       MLX4_CMD_WRAPPED);
+	if (mlx4_is_bonded(dev))
+		err += mlx4_cmd(dev, mailbox->dma,
+				MLX4_SET_PORT_ROCE_ADDR << 8 | 2,
+				1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+				MLX4_CMD_WRAPPED);
+
+	mlx4_free_cmd_mailbox(dev, mailbox);
+	return err;
+}
+
+static int mlx4_ib_update_gids(struct gid_entry *gids,
+			       struct mlx4_ib_dev *ibdev,
+			       u8 port_num)
+{
+	if (ibdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2)
+		return mlx4_ib_update_gids_v1_v2(gids, ibdev, port_num);
+
+	return mlx4_ib_update_gids_v1(gids, ibdev, port_num);
+}
+
+static int mlx4_ib_modify_gid(struct ib_device *device,
+			      u8 port_num, unsigned int index,
+			      const union ib_gid *gid,
+			      const struct ib_gid_attr *attr,
+			      void **context)
+{
+	struct mlx4_ib_dev *ibdev = to_mdev(device);
+	struct mlx4_ib_iboe *iboe = &ibdev->iboe;
+	struct mlx4_port_gid_table *port_gid_table;
+	int free = -1, found = -1;
+	int ret
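The commit message describes keeping a single HW entry per GID value with a reference count. Since the body of mlx4_ib_modify_gid is cut off in this archive, the following is only a userspace sketch of that general technique under assumed names (`hw_gid_table`, `hw_gid_get`, `hw_gid_put`); it is not the driver's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HW_GIDS 4	/* hypothetical HW table size */

struct hw_gid_entry {
	uint8_t gid[16];
	int refcount;	/* 0 means the slot is free */
};

struct hw_gid_table {
	struct hw_gid_entry e[HW_GIDS];
};

/* Returns the HW index holding gid, adding it if needed; -1 if full. */
static int hw_gid_get(struct hw_gid_table *t, const uint8_t gid[16])
{
	int i, free_slot = -1;

	assert(t != NULL);
	for (i = 0; i < HW_GIDS; i++) {
		if (t->e[i].refcount == 0) {
			if (free_slot < 0)
				free_slot = i;	/* remember first free slot */
		} else if (memcmp(t->e[i].gid, gid, 16) == 0) {
			t->e[i].refcount++;	/* duplicate: share the slot */
			return i;
		}
	}
	if (free_slot < 0)
		return -1;
	memcpy(t->e[free_slot].gid, gid, 16);
	t->e[free_slot].refcount = 1;
	return free_slot;
}

/* Drops one reference; returns 1 when the slot became free, 0 otherwise. */
static int hw_gid_put(struct hw_gid_table *t, int idx)
{
	if (idx < 0 || idx >= HW_GIDS || t->e[idx].refcount == 0)
		return -1;
	if (--t->e[idx].refcount > 0)
		return 0;
	memset(t->e[idx].gid, 0, 16);
	return 1;
}
```

The index returned by `hw_gid_get` is the "real" index that a cache-index translation would hand back for every cache entry sharing that GID value.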
[PATCH v3 for-next 06/33] net: Add info for NETDEV_CHANGEUPPER event
From: Matan Barak <mat...@mellanox.com>

Consumers of NETDEV_CHANGEUPPER event sometimes want to know which
upper device was linked/unlinked and which operation was carried.
Adding extra information in the notifier info block.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 include/linux/netdevice.h | 14 ++
 net/core/dev.c            | 12 ++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f36f7d3..599d7c8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3466,6 +3466,20 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
 				    netdev_features_t features);

+enum netdev_changeupper_event {
+	NETDEV_CHANGEUPPER_LINK,
+	NETDEV_CHANGEUPPER_UNLINK,
+};
+
+struct netdev_changeupper_info {
+	struct netdev_notifier_info info; /* must be first */
+	enum netdev_changeupper_event event;
+	struct net_device *upper;
+};
+
+void netdev_changeupper_info_change(struct net_device *dev,
+				    struct netdev_changeupper_info *info);
+
 struct netdev_bonding_info {
 	ifslave	slave;
 	ifbond	master;
diff --git a/net/core/dev.c b/net/core/dev.c
index ea714fc..1ef1bd5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5118,6 +5118,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
 				   void *private)
 {
 	struct netdev_adjacent *i, *j, *to_i, *to_j;
+	struct netdev_changeupper_info changeupper_info;
 	int ret = 0;

 	ASSERT_RTNL();
@@ -5173,7 +5174,10 @@ static int __netdev_upper_dev_link(struct net_device *dev,
 			goto rollback_lower_mesh;
 	}

-	call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
+	changeupper_info.event = NETDEV_CHANGEUPPER_LINK;
+	changeupper_info.upper = upper_dev;
+	call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, dev,
+				      &changeupper_info.info);
 	return 0;

 rollback_lower_mesh:
@@ -5269,6 +5273,7 @@ void netdev_upper_dev_unlink(struct net_device *dev,
 			     struct net_device *upper_dev)
 {
 	struct netdev_adjacent *i, *j;
+	struct netdev_changeupper_info changeupper_info;
 	ASSERT_RTNL();

 	__netdev_adjacent_dev_unlink_neighbour(dev, upper_dev);
@@ -5290,7 +5295,10 @@ void netdev_upper_dev_unlink(struct net_device *dev,
 	list_for_each_entry(i, &upper_dev->all_adj_list.upper, list)
 		__netdev_adjacent_dev_unlink(dev, i->dev);

-	call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
+	changeupper_info.event = NETDEV_CHANGEUPPER_UNLINK;
+	changeupper_info.upper = upper_dev;
+	call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, dev,
+				      &changeupper_info.info);
 }
 EXPORT_SYMBOL(netdev_upper_dev_unlink);

--
2.1.0
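The "must be first" comment matters because notifier callbacks receive a pointer to the generic info block and recover the extended structure from it. A minimal userspace sketch of that pattern follows, with simplified stand-in types (`notifier_info`, `changeupper_info` and `upper_ifindex_if_linked` are hypothetical names, not the kernel API).

```c
#include <assert.h>
#include <stddef.h>

struct net_device { int ifindex; };			/* minimal stand-in */
struct notifier_info { struct net_device *dev; };	/* generic info block */

enum changeupper_event { CHANGEUPPER_LINK, CHANGEUPPER_UNLINK };

struct changeupper_info {
	struct notifier_info info;	/* must be first */
	enum changeupper_event event;
	struct net_device *upper;
};

/*
 * A handler only sees the generic pointer. Because info is the first
 * member, that pointer is also a valid pointer to the wrapper.
 */
static int upper_ifindex_if_linked(void *ptr)
{
	struct changeupper_info *cu = ptr;

	assert(offsetof(struct changeupper_info, info) == 0);
	if (cu->event != CHANGEUPPER_LINK || !cu->upper)
		return -1;
	return cu->upper->ifindex;
}
```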
[PATCH v3 for-next 13/33] IB/core: Add rdma_network_type to wc
From: Matan Barak <mat...@mellanox.com>

Providers should tell IB core the wc's network type. This is used in
order to search for the proper GID in the GID table. When using HCAs
that can't provide this info, IB core tries to deeply examine the
packet and extract the GID type by itself.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/verbs.c | 106 ++--
 include/rdma/ib_verbs.h         |  30
 2 files changed, 131 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2f5fd7a..2e7ccad 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -195,8 +195,84 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 }
 EXPORT_SYMBOL(ib_create_ah);

+static int ib_get_grh_header_version(const void *h)
+{
+	const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+	struct iphdr ip4h_checked;
+	const struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+
+	if (ip6h->version != 6)
+		return (ip4h->version == 4) ? 4 : 0;
+	/* version may be 6 or 4 */
+	if (ip4h->ihl != 5) /* IPv4 header length must be 5 for RR */
+		return 6;
+	/* Verify checksum.
+	   We can't write on scattered buffers so we need to copy to
+	   temp buffer.
+	 */
+	memcpy(&ip4h_checked, ip4h, sizeof(ip4h_checked));
+	ip4h_checked.check = 0;
+	ip4h_checked.check = ip_fast_csum((u8 *)&ip4h_checked, 5);
+	/* if IPv4 header checksum is OK, believe it */
+	if (ip4h->check == ip4h_checked.check)
+		return 4;
+	return 6;
+}
+
+static int ib_get_dgid_sgid_by_grh(const void *h,
+				   enum rdma_network_type net_type,
+				   union ib_gid *dgid, union ib_gid *sgid)
+{
+	switch (net_type) {
+	case RDMA_NETWORK_IPV4: {
+		const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+
+		ipv6_addr_set_v4mapped(ip4h->daddr, (struct in6_addr *)dgid);
+		ipv6_addr_set_v4mapped(ip4h->saddr, (struct in6_addr *)sgid);
+		return 0;
+	}
+	case RDMA_NETWORK_IPV6: {
+		struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+
+		memcpy(dgid, &ip6h->daddr, sizeof(*dgid));
+		memcpy(sgid, &ip6h->saddr, sizeof(*sgid));
+		return 0;
+	}
+	case RDMA_NETWORK_IB: {
+		struct ib_grh *grh = (struct ib_grh *)h;
+
+		memcpy(dgid, &grh->dgid, sizeof(*dgid));
+		memcpy(sgid, &grh->sgid, sizeof(*sgid));
+		return 0;
+	}
+	}
+
+	return -EINVAL;
+}
+
+static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
+						     u8 port_num,
+						     const struct ib_grh *grh)
+{
+	int grh_version;
+
+	if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND)
+		return RDMA_NETWORK_IB;
+
+	grh_version = ib_get_grh_header_version(grh);
+
+	if (grh_version == 4)
+		return RDMA_NETWORK_IPV4;
+
+	if (grh->next_hdr == IPPROTO_UDP)
+		return RDMA_NETWORK_IPV6;
+
+	return RDMA_NETWORK_IB;
+}
+
 struct find_gid_index_context {
 	u16 vlan_id;
+	enum ib_gid_type gid_type;
 };

 static bool find_gid_index(const union ib_gid *gid,
@@ -206,6 +282,9 @@ static bool find_gid_index(const union ib_gid *gid,
 	struct find_gid_index_context *ctx =
 		(struct find_gid_index_context *)context;

+	if (ctx->gid_type != gid_attr->gid_type)
+		return false;
+
 	if ((!!(ctx->vlan_id != 0xffff) == !is_vlan_dev(gid_attr->ndev)) ||
 	    (is_vlan_dev(gid_attr->ndev) &&
 	     vlan_dev_vlan_id(gid_attr->ndev) != ctx->vlan_id))
@@ -216,9 +295,11 @@ static bool find_gid_index(const union ib_gid *gid,
 static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num,
 				   u16 vlan_id, union ib_gid *sgid,
+				   enum ib_gid_type gid_type,
 				   u16 *gid_index)
 {
-	struct find_gid_index_context context = {.vlan_id = vlan_id};
+	struct find_gid_index_context context = {.vlan_id = vlan_id,
+						 .gid_type = gid_type};

 	return ib_find_gid_by_filter(device, sgid, port_num, find_gid_index,
 				     &context, gid_index);
@@ -232,9 +313,24 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 	int ret;
 	int is_eth = (rdma_port_get_link_layer(device, port_num) ==
 			IB_LINK_LAYER_ETHERNET);
+	enum rdma_network_type net_type = RDMA_NETWORK_IB
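The header-version heuristic in ib_get_grh_header_version can be tried out in userspace. The sketch below follows the same steps on a 40-byte buffer (look at the first nibble, then at the IPv4 header assumed to sit in the last 20 bytes, then recompute its checksum on a copy); `grh_header_version` and `ipv4_hdr_csum` are hypothetical names, and the checksum is a plain portable loop rather than the kernel's ip_fast_csum.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum over a 20-byte IPv4 header, as a host-order value. */
static uint16_t ipv4_hdr_csum(const uint8_t *hdr)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < 20; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Returns 4, 6, or 0 (unknown) for a 40-byte GRH-sized buffer. */
static int grh_header_version(const uint8_t *h)
{
	const uint8_t *ip4 = h + 20;	/* IPv4 header in the last 20 bytes */
	uint8_t copy[20];
	uint16_t stored;

	assert(h != NULL);
	if ((h[0] >> 4) != 6)
		return ((ip4[0] >> 4) == 4) ? 4 : 0;
	/* First nibble says 6, but it may still be an IPv4 packet. */
	if ((ip4[0] & 0x0f) != 5)
		return 6;
	/* Work on a copy: zero the checksum field and recompute it. */
	memcpy(copy, ip4, sizeof(copy));
	stored = (uint16_t)((copy[10] << 8) | copy[11]);
	copy[10] = 0;
	copy[11] = 0;
	return ipv4_hdr_csum(copy) == stored ? 4 : 6;
}
```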
[PATCH v3 for-next 07/33] IB/core: Add RoCE cache bonding support
From: Matan Barak <mat...@mellanox.com>

Bonding is a unique behavior since when working in active-backup mode,
only the current selected slave should occupy the default GIDs and the
master's GID. Listening to bonding events and only adding the required
GIDs to the active slave in the RoCE cache GID table.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/roce_gid_mgmt.c | 291 ++--
 drivers/net/bonding/bond_options.c      |  13 --
 include/net/bonding.h                   |   7 +
 3 files changed, 282 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index c0cbb23..362327f 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -37,6 +37,7 @@

 /* For in6_dev_get/in6_dev_put */
 #include <net/addrconf.h>
+#include <net/bonding.h>

 #include <rdma/ib_cache.h>
 #include <rdma/ib_addr.h>
@@ -55,16 +56,17 @@ struct update_gid_event_work {
 	enum gid_op_type gid_op;
 };

-#define ROCE_NETDEV_CALLBACK_SZ		2
+#define ROCE_NETDEV_CALLBACK_SZ		3
 struct netdev_event_work_cmd {
 	roce_netdev_callback	cb;
 	roce_netdev_filter	filter;
+	struct net_device	*ndev;
+	struct net_device	*f_ndev;
 };

 struct netdev_event_work {
 	struct work_struct		work;
 	struct netdev_event_work_cmd	cmds[ROCE_NETDEV_CALLBACK_SZ];
-	struct net_device		*ndev;
 };

 struct roce_rescan_work {
@@ -127,22 +129,96 @@ static void update_gid(enum gid_op_type gid_op, struct ib_device *ib_dev,
 	}
 }

+#define IS_NETDEV_BONDING_MASTER(ndev)	\
+	(((ndev)->priv_flags & \
+	  (IFF_BONDING | IFF_MASTER)) == (IFF_BONDING | IFF_MASTER))
+
+enum bonding_slave_state {
+	BONDING_SLAVE_STATE_ACTIVE	= 1UL << 0,
+	BONDING_SLAVE_STATE_INACTIVE	= 1UL << 1,
+	BONDING_SLAVE_STATE_NA		= 1UL << 2,
+};
+
+static enum bonding_slave_state is_eth_active_slave_of_bonding(struct net_device *idev,
+							       struct net_device *upper)
+{
+	if (upper && IS_NETDEV_BONDING_MASTER(upper)) {
+		struct net_device *pdev;
+
+		rcu_read_lock();
+		pdev = bond_option_active_slave_get_rcu(netdev_priv(upper));
+		rcu_read_unlock();
+		if (pdev)
+			return idev == pdev ? BONDING_SLAVE_STATE_ACTIVE :
+				BONDING_SLAVE_STATE_INACTIVE;
+	}
+
+	return BONDING_SLAVE_STATE_NA;
+}
+
+static bool is_upper_dev_rcu(struct net_device *dev, struct net_device *upper)
+{
+	struct net_device *_upper = NULL;
+	struct list_head *iter;
+
+	rcu_read_lock();
+	netdev_for_each_all_upper_dev_rcu(dev, _upper, iter) {
+		if (_upper == upper)
+			break;
+	}
+
+	rcu_read_unlock();
+	return _upper == upper;
+}
+
+static int _is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
+				  struct net_device *idev, void *cookie,
+				  unsigned long bond_state)
+{
+	struct net_device *ndev = (struct net_device *)cookie;
+	struct net_device *rdev;
+	int res;
+
+	if (!idev)
+		return 0;
+
+	rcu_read_lock();
+	rdev = rdma_vlan_dev_real_dev(ndev);
+	if (!rdev)
+		rdev = ndev;
+
+	res = ((is_upper_dev_rcu(idev, ndev) &&
+	       (is_eth_active_slave_of_bonding(idev, rdev) &
+		bond_state)) ||
+	       rdev == idev);
+
+	rcu_read_unlock();
+	return res;
+}
+
 static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
 				 struct net_device *idev, void *cookie)
 {
-	struct net_device *rdev;
-	struct net_device *mdev;
-	struct net_device *ndev = (struct net_device *)cookie;
+	return _is_eth_port_of_netdev(ib_dev, port, idev, cookie,
+				      BONDING_SLAVE_STATE_ACTIVE |
+				      BONDING_SLAVE_STATE_NA);
+}

+static int is_eth_port_inactive_slave(struct ib_device *ib_dev, u8 port,
+				      struct net_device *idev, void *cookie)
+{
+	struct net_device *mdev;
+	int res;
 	if (!idev)
 		return 0;

 	rcu_read_lock();
 	mdev = netdev_master_upper_dev_get_rcu(idev);
-	rdev = rdma_vlan_dev_real_dev(ndev);
+	res = is_eth_active_slave_of_bonding(idev, mdev) ==
+		BONDING_SLAVE_STATE_INACTIVE;
 	rcu_read_unlock();

-	return (rdev ? rdev : ndev) == (mdev ? mdev : idev);
+	return res;
 }

 static int pass_all_filter(struct ib_device *ib_dev, u8 port,
@@ -151,17 +227,49 @@ static int
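The slave states in this patch are defined as distinct bits so that a filter can accept a set of states with one mask test (ACTIVE | NA in is_eth_port_of_netdev). A userspace sketch of that idea follows; `struct bond`, `slave_state` and `state_matches` are hypothetical stand-ins, and the real code looks up the active slave through the bonding driver under RCU.

```c
#include <assert.h>
#include <stddef.h>

enum bonding_slave_state {
	BONDING_SLAVE_STATE_ACTIVE	= 1UL << 0,
	BONDING_SLAVE_STATE_INACTIVE	= 1UL << 1,
	BONDING_SLAVE_STATE_NA		= 1UL << 2,
};

/* Hypothetical stand-in for a bond: only tracks which slave is active. */
struct bond {
	const void *active_slave;
};

/* Classify idev relative to an optional bonding master. */
static enum bonding_slave_state slave_state(const void *idev,
					    const struct bond *upper)
{
	if (upper && upper->active_slave)
		return idev == upper->active_slave ?
			BONDING_SLAVE_STATE_ACTIVE :
			BONDING_SLAVE_STATE_INACTIVE;
	return BONDING_SLAVE_STATE_NA;
}

/* One bit per state lets a caller accept several states at once. */
static int state_matches(enum bonding_slave_state state, unsigned long mask)
{
	assert(mask != 0);
	return (state & mask) != 0;
}
```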
[PATCH v3 for-next 09/33] IB/core: Report gid_type and gid_ndev through sysfs
From: Matan Barak <mat...@mellanox.com>

Since we've added GID attributes to the RoCE GID table, the users need
a convenient way to query them. Adding the GID type and related net
device to IB's sysfs.

The new attributes are available in:
/sys/class/infiniband/<device>/ports/<port>/gid_attrs/ndevs/<index>
/sys/class/infiniband/<device>/ports/<port>/gid_attrs/types/<index>

The <index> corresponds to the index of the respective GID in:
/sys/class/infiniband/<device>/ports/<port>/gids/<index>

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/core_priv.h      |   2 +
 drivers/infiniband/core/roce_gid_cache.c |  13 +++
 drivers/infiniband/core/sysfs.c          | 184 ++-
 3 files changed, 197 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 128d2b3..b5bbbdf 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -71,6 +71,8 @@ void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
 				  roce_netdev_callback cb,
 				  void *cookie);

+const char *roce_gid_cache_type_str(enum ib_gid_type gid_type);
+
 int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
 			   union ib_gid *gid, struct ib_gid_attr *attr);

diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
index 1f30dad..b6180eb 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -50,6 +50,11 @@ enum gid_attr_find_mask {
 	GID_ATTR_FIND_MASK_DEFAULT	= 1UL << 3,
 };

+static const char * const gid_type_str[] = {
+	[IB_GID_TYPE_IB]	= "IB/RoCE V1\n",
+	[IB_GID_TYPE_ROCE_V2]	= "RoCE V2\n",
+};
+
 static inline int start_port(struct ib_device *ib_dev)
 {
 	return (ib_dev->node_type == RDMA_NODE_IB_SWITCH) ? 0 : 1;
@@ -60,6 +65,14 @@ struct dev_put_rcu {
 	struct net_device	*ndev;
 };

+const char *roce_gid_cache_type_str(enum ib_gid_type gid_type)
+{
+	if (gid_type < ARRAY_SIZE(gid_type_str) && gid_type_str[gid_type])
+		return gid_type_str[gid_type];
+
+	return "Invalid GID type";
+}
+
 static void put_ndev(struct rcu_head *rcu)
 {
 	struct dev_put_rcu *put_rcu =
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 5cee246..887c2f8 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -37,12 +37,22 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/string.h>
+#include <linux/netdevice.h>

 #include <rdma/ib_mad.h>

+struct ib_port;
+
+struct gid_attr_group {
+	struct ib_port		*port;
+	struct kobject		kobj;
+	struct attribute_group	ndev;
+	struct attribute_group	type;
+};
 struct ib_port {
 	struct kobject         kobj;
 	struct ib_device      *ibdev;
+	struct gid_attr_group *gid_attr_group;
 	struct attribute_group gid_group;
 	struct attribute_group pkey_group;
 	u8                     port_num;
@@ -84,6 +94,24 @@ static const struct sysfs_ops port_sysfs_ops = {
 	.show = port_attr_show
 };

+static ssize_t gid_attr_show(struct kobject *kobj,
+			     struct attribute *attr, char *buf)
+{
+	struct port_attribute *port_attr =
+		container_of(attr, struct port_attribute, attr);
+	struct ib_port *p = container_of(kobj, struct gid_attr_group,
+					 kobj)->port;
+
+	if (!port_attr->show)
+		return -EIO;
+
+	return port_attr->show(p, port_attr, buf);
+}
+
+static const struct sysfs_ops gid_attr_sysfs_ops = {
+	.show = gid_attr_show
+};
+
 static ssize_t state_show(struct ib_port *p, struct port_attribute *unused,
 			  char *buf)
 {
@@ -281,6 +309,46 @@ static struct attribute *port_default_attrs[] = {
 	NULL
 };

+static size_t print_ndev(struct ib_gid_attr *gid_attr, char *buf)
+{
+	if (!gid_attr->ndev)
+		return -EINVAL;
+
+	return sprintf(buf, "%s\n", gid_attr->ndev->name);
+}
+
+static size_t print_gid_type(struct ib_gid_attr *gid_attr, char *buf)
+{
+	return sprintf(buf, "%s", roce_gid_cache_type_str(gid_attr->gid_type));
+}
+
+static ssize_t _show_port_gid_attr(struct ib_port *p,
+				   struct port_attribute *attr,
+				   char *buf,
+				   size_t (*print)(struct ib_gid_attr *gid_attr,
+						   char *buf))
+{
+	struct port_table_attribute *tab_attr =
+		container_of(attr, struct port_table_attribute, attr);
+	union ib_gid gid;
+	struct ib_gid_attr gid_attr;
+	ssize_t ret;
+	va_list args;
+
+	rcu_read_lock
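The type-to-string helper in this patch is a bounds-checked lookup in a table built with designated initializers. A standalone sketch of the same pattern, with hypothetical enum and function names and without the trailing newlines the patch stores in its strings:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

enum gid_type { GID_TYPE_IB = 0, GID_TYPE_ROCE_V2 = 1 };

/* Designated initializers keep each string next to its enum value. */
static const char *const gid_type_str[] = {
	[GID_TYPE_IB]		= "IB/RoCE V1",
	[GID_TYPE_ROCE_V2]	= "RoCE V2",
};

/* Falls back to a fixed string for out-of-range or unset entries. */
static const char *gid_type_name(unsigned int type)
{
	if (type < ARRAY_SIZE(gid_type_str) && gid_type_str[type])
		return gid_type_str[type];
	return "Invalid GID type";
}
```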
[PATCH v3 for-next 03/33] IB/core: Add RoCE GID population
From: Matan Barak <mat...@mellanox.com>

In order to populate the GID table, we need to listen for events:
(a) IB device has been added or removed - used in order to
    allocate/deallocate the cache and populate the GID table
    internally.
(b) inet events - add new GIDs (according to the IP addresses) to
    the table.
(c) netdev up/down/change_addr - if a netdev is built onto our RoCE
    device, we need to add/delete its IPs.

When an event is received, multiple entries (each with different GID
type) are added.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/Makefile         |   2 +-
 drivers/infiniband/core/core_priv.h      |  26 ++
 drivers/infiniband/core/device.c         |  80 +
 drivers/infiniband/core/roce_gid_cache.c |  68
 drivers/infiniband/core/roce_gid_mgmt.c  | 516 +++
 include/rdma/ib_addr.h                   |   2 +-
 include/rdma/ib_verbs.h                  |   9 +
 7 files changed, 701 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/roce_gid_mgmt.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 9b63bdf..2c94963 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \

 ib_core-y :=			packer.o ud_header.o verbs.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
-				roce_gid_cache.o
+				roce_gid_cache.o roce_gid_mgmt.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index a502daa..12797d9 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -39,6 +39,8 @@

 #include <rdma/ib_verbs.h>

+extern struct workqueue_struct *roce_gid_mgmt_wq;
+
 int  ib_device_register_sysfs(struct ib_device *device,
 			      int (*port_callback)(struct ib_device *,
 						   u8, struct kobject *));
@@ -53,6 +55,22 @@ void ib_cache_cleanup(void);
 int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
 			    struct ib_qp_attr *qp_attr, int *qp_attr_mask);

+typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port,
+	      struct net_device *idev, void *cookie);
+
+typedef int (*roce_netdev_filter)(struct ib_device *device, u8 port,
+	     struct net_device *idev, void *cookie);
+
+void ib_dev_roce_ports_of_netdev(struct ib_device *ib_dev,
+				 roce_netdev_filter filter,
+				 void *filter_cookie,
+				 roce_netdev_callback cb,
+				 void *cookie);
+void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
+				  void *filter_cookie,
+				  roce_netdev_callback cb,
+				  void *cookie);
+
 int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
 			   union ib_gid *gid, struct ib_gid_attr *attr);

@@ -66,6 +84,9 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,

 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);

+int roce_gid_cache_setup(void);
+void roce_gid_cache_cleanup(void);
+
 int roce_add_gid(struct ib_device *ib_dev, u8 port,
 		 union ib_gid *gid, struct ib_gid_attr *attr);

@@ -75,4 +96,9 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
 int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
 			     struct net_device *ndev);

+int roce_gid_mgmt_init(void);
+void roce_gid_mgmt_cleanup(void);
+
+int roce_rescan_device(struct ib_device *ib_dev);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8616a95..5ce57bf 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -39,6 +39,7 @@
 #include <linux/init.h>
 #include <linux/mutex.h>
 #include <rdma/rdma_netlink.h>
+#include <rdma/ib_addr.h>

 #include "core_priv.h"

@@ -640,6 +641,82 @@ int ib_query_gid(struct ib_device *device,
 EXPORT_SYMBOL(ib_query_gid);

 /**
+ * ib_dev_roce_ports_of_netdev - enumerate RoCE ports of ibdev in
+ *				 respect of netdev
+ * @ib_dev : IB device we want to query
+ * @filter: Should we call the callback?
+ * @filter_cookie: Cookie passed to filter
+ * @cb: Callback to call for each found RoCE ports
+ * @cookie: Cookie passed back to the callback
+ *
+ * Enumerates all of the physical RoCE ports of ib_dev RoCE ports
+ * which are relaying Ethernet packets to a specific
+ * (possibly
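The enumeration helpers declared in this patch take a filter and a callback, each with its own cookie. Since the function body is cut off in this archive, the following is only a userspace sketch of that filter/callback shape; `struct port`, `enum_ports`, `match_ifindex` and `count_port` are hypothetical and much simpler than the kernel types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a RoCE port and its backing net device. */
struct port {
	int port_num;
	int ifindex;
};

typedef int  (*port_filter)(const struct port *p, void *cookie);
typedef void (*port_callback)(const struct port *p, void *cookie);

/* Call cb for every port accepted by filter; each gets its own cookie. */
static void enum_ports(const struct port *ports, size_t n,
		       port_filter filter, void *filter_cookie,
		       port_callback cb, void *cookie)
{
	size_t i;

	assert(filter && cb);
	for (i = 0; i < n; i++)
		if (filter(&ports[i], filter_cookie))
			cb(&ports[i], cookie);
}

/* Example filter: accept ports backed by a given interface index. */
static int match_ifindex(const struct port *p, void *cookie)
{
	return p->ifindex == *(const int *)cookie;
}

/* Example callback: count the accepted ports. */
static void count_port(const struct port *p, void *cookie)
{
	(void)p;
	(*(int *)cookie)++;
}
```

Keeping the two cookies separate lets one filter be reused with different callbacks, which appears to be the intent of the split between filter_cookie and cookie in the prototypes above.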
[PATCH v3 for-next 11/33] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
From: Matan Barak <mat...@mellanox.com>

Previously, we resolved the dmac and took the smac and vlan from the
resolved address. Changing that into finding a net device that matches
the IP and vlan of the network packet and querying the RoCE GID cache
for this net device, GID and GID type.

ocrdma driver changes were done by Somnath Kotur <somnath.ko...@emulex.com>

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/addr.c           |   3 +-
 drivers/infiniband/core/cm.c             |  30 --
 drivers/infiniband/core/cma.c            |   9 --
 drivers/infiniband/core/core_priv.h      |   4 +-
 drivers/infiniband/core/sa_query.c       |   4 -
 drivers/infiniband/core/ucma.c           |   1 -
 drivers/infiniband/core/uverbs_cmd.c     |   3 +-
 drivers/infiniband/core/verbs.c          | 162 ++-
 drivers/infiniband/hw/mlx4/ah.c          |  15 ++-
 drivers/infiniband/hw/mlx4/mad.c         |  12 ++-
 drivers/infiniband/hw/mlx4/mcg.c         |   2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h     |   2 +-
 drivers/infiniband/hw/mlx4/qp.c          |  48 +++--
 drivers/infiniband/hw/ocrdma/ocrdma.h    |   1 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |  20 ++--
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c |  17 ++--
 include/rdma/ib_addr.h                   |   2 +-
 include/rdma/ib_sa.h                     |   2 -
 include/rdma/ib_verbs.h                  |  11 +--
 19 files changed, 190 insertions(+), 158 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f80da50..43af7f5 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -458,7 +458,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr,
 }

 int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
-			       u16 *vlan_id)
+			       u16 *vlan_id, int if_index)
 {
 	int ret = 0;
 	struct rdma_dev_addr dev_addr;
@@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
 		return ret;

 	memset(&dev_addr, 0, sizeof(dev_addr));
+	dev_addr.bound_dev_if = if_index;

 	ctx.addr = &dev_addr;
 	init_completion(&ctx.comp);
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d88f2ae..7974e74 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -178,8 +178,6 @@ struct cm_av {
 	struct ib_ah_attr ah_attr;
 	u16 pkey_index;
 	u8 timeout;
-	u8  valid;
-	u8  smac[ETH_ALEN];
 };

 struct cm_work {
@@ -382,7 +380,6 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 			     &av->ah_attr);
 	av->timeout = path->packet_life_time + 1;

-	av->valid = 1;
 	return 0;
 }

@@ -1563,7 +1560,6 @@ static int cm_req_handler(struct cm_work *work)
 	cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);

 	memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, ETH_ALEN);
-	work->path[0].vlan_id = cm_id_priv->av.ah_attr.vlan_id;
 	ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
 	if (ret) {
 		ib_get_cached_gid(work->port->cm_dev->ib_device,
@@ -3511,32 +3507,6 @@ static int cm_init_qp_rtr_attr(struct cm_id_private *cm_id_priv,
 		*qp_attr_mask = IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU |
 				IB_QP_DEST_QPN | IB_QP_RQ_PSN;
 		qp_attr->ah_attr = cm_id_priv->av.ah_attr;
-		if (!cm_id_priv->av.valid) {
-			spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-			return -EINVAL;
-		}
-		if (cm_id_priv->av.ah_attr.vlan_id != 0xffff) {
-			qp_attr->vlan_id = cm_id_priv->av.ah_attr.vlan_id;
-			*qp_attr_mask |= IB_QP_VID;
-		}
-		if (!is_zero_ether_addr(cm_id_priv->av.smac)) {
-			memcpy(qp_attr->smac, cm_id_priv->av.smac,
-			       sizeof(qp_attr->smac));
-			*qp_attr_mask |= IB_QP_SMAC;
-		}
-		if (cm_id_priv->alt_av.valid) {
-			if (cm_id_priv->alt_av.ah_attr.vlan_id != 0xffff) {
-				qp_attr->alt_vlan_id =
-					cm_id_priv->alt_av.ah_attr.vlan_id;
-				*qp_attr_mask |= IB_QP_ALT_VID;
-			}
-			if (!is_zero_ether_addr(cm_id_priv->alt_av.smac)) {
-				memcpy(qp_attr->alt_smac,
-				       cm_id_priv->alt_av.smac,
-				       sizeof(qp_attr->alt_smac));
-				*qp_attr_mask |= IB_QP_ALT_SMAC
[PATCH v3 for-next 05/33] net/bonding: make DRV macros private
From: Matan Barak <mat...@mellanox.com>

The bonding module currently defines four macros with generic names
that pollute the global namespace:

DRV_VERSION
DRV_RELDATE
DRV_NAME
DRV_DESCRIPTION

Fix that by defining a private bonding_priv.h header file that contains
those defines.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/net/bonding/bond_main.c    |  2 ++
 drivers/net/bonding/bond_procfs.c  |  1 +
 drivers/net/bonding/bonding_priv.h | 26 ++
 include/net/bonding.h              |  7 ---
 4 files changed, 29 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/bonding/bonding_priv.h

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 468c70e..55f2d3e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -81,6 +81,8 @@
 #include <net/bond_3ad.h>
 #include <net/bond_alb.h>
 
+#include "bonding_priv.h"
+
 /* Module parameters */
 
 /* monitor all links that often (in milliseconds). <=0 disables monitoring */
diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
index 976f5ad..b50a002 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -4,6 +4,7 @@
 #include <net/netns/generic.h>
 #include <net/bonding.h>
 
+#include "bonding_priv.h"
 
 static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
 	__acquires(RCU)
diff --git a/drivers/net/bonding/bonding_priv.h b/drivers/net/bonding/bonding_priv.h
new file mode 100644
index 000..c093e91
--- /dev/null
+++ b/drivers/net/bonding/bonding_priv.h
@@ -0,0 +1,26 @@
+/*
+ * Bond several ethernet interfaces into a Cisco, running 'Etherchannel'.
+ *
+ * Portions are (c) Copyright 1995 Simon "Guru Aleph-Null" Janes
+ * NCM: Network and Communications Management, Inc.
+ *
+ * BUT, I'm the one who modified it for ethernet, so:
+ * (c) Copyright 1999, Thomas Davis, tada...@lbl.gov
+ *
+ * This software may be used and distributed according to the terms
+ * of the GNU Public License, incorporated herein by reference.
+ *
+ */
+
+#ifndef _BONDING_PRIV_H
+#define _BONDING_PRIV_H
+
+#define DRV_VERSION	"3.7.1"
+#define DRV_RELDATE	"April 27, 2011"
+#define DRV_NAME	"bonding"
+#define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"
+
+#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"
+
+#endif
+
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 4c2b0f4..a124173 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -30,13 +30,6 @@
 #include <net/bond_alb.h>
 #include <net/bond_options.h>
 
-#define DRV_VERSION	"3.7.1"
-#define DRV_RELDATE	"April 27, 2011"
-#define DRV_NAME	"bonding"
-#define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"
-
-#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"
-
 #define BOND_MAX_ARP_TARGETS	16
 
 #define BOND_DEFAULT_MIIMON	100
--
2.1.0
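For readers less familiar with the idiom: the DRV_* macros expand to string literals, and bond_version relies on the compiler concatenating adjacent literals. The following is a minimal user-space sketch, assuming the quoted values from the patch (the archive stripped the quotation marks). The helpers driver_banner() and banner_has_version() are hypothetical and not part of the bonding driver.

```c
#include <assert.h>
#include <string.h>

/* Driver identity macros as they might appear in a driver-private header
 * rather than a globally visible one. Values mirror the patch. */
#define DRV_VERSION     "3.7.1"
#define DRV_RELDATE     "April 27, 2011"
#define DRV_NAME        "bonding"
#define DRV_DESCRIPTION "Ethernet Channel Bonding Driver"

/* Adjacent string literals are concatenated at compile time. */
#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"

/* Hypothetical helper: returns the banner string built from the macros. */
static const char *driver_banner(void)
{
	return bond_version;
}

/* Hypothetical helper: nonzero if the banner embeds the given version. */
static int banner_has_version(const char *version)
{
	assert(version != NULL);
	return strstr(bond_version, version) != NULL;
}
```

Because names such as DRV_NAME are so generic, any other driver that includes a public header defining them risks a redefinition clash, which is presumably why the patch moves them out of include/net/bonding.h.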
[PATCH v3 for-next 02/33] IB/core: Add kref to IB devices
From: Matan Barak <mat...@mellanox.com>

Previously, we used the device_mutex lock to protect the device list.
That meant that, to guarantee a device isn't freed while we use it, we
had to lock all devices.

Add a kref per IB device. Before an IB device is unregistered, we wait
until it is no longer held.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/device.c | 41
 include/rdma/ib_verbs.h          |  6 ++
 2 files changed, 47 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..8616a95 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -261,6 +261,39 @@ out:
 	return ret;
 }
 
+static void ib_device_complete_cb(struct kref *kref)
+{
+	struct ib_device *device = container_of(kref, struct ib_device,
+						refcount);
+
+	if (device->reg_state >= IB_DEV_UNREGISTERING)
+		complete(&device->free);
+}
+
+/**
+ * ib_device_hold - increase the reference count of device
+ * @device: ib device to prevent from being free'd
+ *
+ * Prevent the device from being free'd.
+ */
+void ib_device_hold(struct ib_device *device)
+{
+	kref_get(&device->refcount);
+}
+EXPORT_SYMBOL(ib_device_hold);
+
+/**
+ * ib_device_put - decrease the reference count of device
+ * @device: allows this device to be free'd
+ *
+ * Puts the ib_device and allows it to be free'd.
+ */
+int ib_device_put(struct ib_device *device)
+{
+	return kref_put(&device->refcount, ib_device_complete_cb);
+}
+EXPORT_SYMBOL(ib_device_put);
+
 /**
  * ib_register_device - Register an IB device with IB core
  * @device:Device to register
@@ -312,6 +345,9 @@ int ib_register_device(struct ib_device *device,
 
 	list_add_tail(&device->core_list, &device_list);
 
+	kref_init(&device->refcount);
+	init_completion(&device->free);
+
 	device->reg_state = IB_DEV_REGISTERED;
 
 	{
@@ -342,6 +378,8 @@ void ib_unregister_device(struct ib_device *device)
 
 	mutex_lock(&device_mutex);
 
+	device->reg_state = IB_DEV_UNREGISTERING;
+
 	list_for_each_entry_reverse(client, &client_list, list)
 		if (client->remove)
 			client->remove(device);
@@ -355,6 +393,9 @@ void ib_unregister_device(struct ib_device *device)
 
 	ib_device_unregister_sysfs(device);
 
+	ib_device_put(device);
+	wait_for_completion(&device->free);
+
 	spin_lock_irqsave(&device->client_data_lock, flags);
 	list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
 		kfree(context);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1866595..a7593b0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1716,6 +1716,7 @@ struct ib_device {
 	enum {
 		IB_DEV_UNINITIALIZED,
 		IB_DEV_REGISTERED,
+		IB_DEV_UNREGISTERING,
 		IB_DEV_UNREGISTERED
 	} reg_state;
 
@@ -1728,6 +1729,8 @@ struct ib_device {
 	u32 local_dma_lkey;
 	u8 node_type;
 	u8 phys_port_cnt;
+	struct kref refcount;
+	struct completion free;
 };
 
 struct ib_client {
@@ -1741,6 +1744,9 @@ struct ib_client {
 struct ib_device *ib_alloc_device(size_t size);
 void ib_dealloc_device(struct ib_device *device);
 
+void ib_device_hold(struct ib_device *device);
+int ib_device_put(struct ib_device *device);
+
 int ib_register_device(struct ib_device *device,
 		       int (*port_callback)(struct ib_device *,
 					    u8, struct kobject *));
--
2.1.0
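The hold/put/wait scheme described in this patch can be illustrated outside the kernel. The sketch below is a single-threaded user-space model under stated assumptions: a plain int stands in for struct kref, a bool stands in for struct completion, and every toy_* name is hypothetical. It is not thread-safe and is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_reg_state {
	TOY_UNINITIALIZED,
	TOY_REGISTERED,
	TOY_UNREGISTERING,
	TOY_UNREGISTERED
};

struct toy_device {
	int refcount;            /* stands in for struct kref */
	bool free_completed;     /* stands in for struct completion */
	enum toy_reg_state state;
};

/* Registration takes the initial reference, as kref_init() does. */
static void toy_register(struct toy_device *dev)
{
	dev->refcount = 1;
	dev->free_completed = false;
	dev->state = TOY_REGISTERED;
}

/* A user pins the device so it cannot be torn down underneath it. */
static void toy_hold(struct toy_device *dev)
{
	assert(dev->refcount > 0);
	dev->refcount++;
}

/* Drops one reference. Returns 1 if it was the last one, 0 otherwise.
 * The "completion" fires only once unregistration has started. */
static int toy_put(struct toy_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount != 0)
		return 0;
	if (dev->state >= TOY_UNREGISTERING)
		dev->free_completed = true;
	return 1;
}

/* Marks the device as going away and drops the registration reference.
 * Returns true if teardown may proceed immediately; a real implementation
 * would block here until the last holder calls put. */
static bool toy_begin_unregister(struct toy_device *dev)
{
	dev->state = TOY_UNREGISTERING;
	toy_put(dev);
	return dev->free_completed;
}
```

The design point is that a user only needs to pin the one device it works with, instead of taking a lock that covers the whole device list.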
[PATCH v3 for-next 04/33] IB/core: Add default GID for RoCE GID Cache
From: Matan Barak <mat...@mellanox.com>

When RoCE is used, a default GID address should be generated for every
supported RoCE type. These default GID addresses are generated based on
the IPv6 link-local address. In contrast to the GIDs based on the
regular IPv6 link-local address (we generate a GID per IP address),
these GIDs are also available when the net device is down (in order to
support loopback). Moreover, these default GID addresses can't be
deleted.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/core_priv.h      |  12 +++
 drivers/infiniband/core/roce_gid_cache.c | 179 ---
 drivers/infiniband/core/roce_gid_mgmt.c  |  43 ++--
 include/net/addrconf.h                   |  31 ++
 include/rdma/ib_verbs.h                  |   1 +
 net/ipv6/addrconf.c                      |  31 --
 6 files changed, 243 insertions(+), 54 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 12797d9..128d2b3 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -84,6 +84,16 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,
 
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
+enum roce_gid_cache_default_mode {
+	ROCE_GID_CACHE_DEFAULT_MODE_SET,
+	ROCE_GID_CACHE_DEFAULT_MODE_DELETE
+};
+
+void roce_gid_cache_set_default_gid(struct ib_device *ib_dev, u8 port,
+				    struct net_device *ndev,
+				    unsigned long gid_type_mask,
+				    enum roce_gid_cache_default_mode mode);
+
 int roce_gid_cache_setup(void);
 
 void roce_gid_cache_cleanup(void);
@@ -100,5 +110,7 @@ int roce_gid_mgmt_init(void);
 void roce_gid_mgmt_cleanup(void);
 
 int roce_rescan_device(struct ib_device *ib_dev);
+unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port);
+
 
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
index 1d0f841..1f30dad 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -34,6 +34,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <rdma/ib_cache.h>
+#include <net/addrconf.h>
 
 #include "core_priv.h"
 
@@ -43,8 +44,10 @@ EXPORT_SYMBOL_GPL(zgid);
 static const struct ib_gid_attr zattr;
 
 enum gid_attr_find_mask {
-	GID_ATTR_FIND_MASK_GID_TYPE	= 1UL << 0,
-	GID_ATTR_FIND_MASK_NETDEV	= 1UL << 1,
+	GID_ATTR_FIND_MASK_GID		= 1UL << 0,
+	GID_ATTR_FIND_MASK_GID_TYPE	= 1UL << 1,
+	GID_ATTR_FIND_MASK_NETDEV	= 1UL << 2,
+	GID_ATTR_FIND_MASK_DEFAULT	= 1UL << 3,
 };
 
 static inline int start_port(struct ib_device *ib_dev)
@@ -69,7 +72,8 @@ static void put_ndev(struct rcu_head *rcu)
 static int write_gid(struct ib_device *ib_dev, u8 port,
 		     struct ib_roce_gid_cache *cache, int ix,
 		     const union ib_gid *gid,
-		     const struct ib_gid_attr *attr)
+		     const struct ib_gid_attr *attr,
+		     bool default_gid)
 {
 	unsigned int orig_seq;
 	int ret;
@@ -83,6 +87,7 @@ static int write_gid(struct ib_device *ib_dev, u8 port,
 	 */
 	smp_wmb();
 
+	cache->data_vec[ix].default_gid = default_gid;
 	ret = ib_dev->modify_gid(ib_dev, port, ix, gid, attr,
 				 &cache->data_vec[ix].context);
 
@@ -132,7 +137,8 @@ static int write_gid(struct ib_device *ib_dev, u8 port,
 }
 
 static int find_gid(struct ib_roce_gid_cache *cache, union ib_gid *gid,
-		    const struct ib_gid_attr *val, unsigned long mask)
+		    const struct ib_gid_attr *val, bool default_gid,
+		    unsigned long mask)
 {
 	int i;
 	unsigned int orig_seq;
@@ -152,13 +158,18 @@ static int find_gid(struct ib_roce_gid_cache *cache, union ib_gid *gid,
 		    attr->gid_type != val->gid_type)
 			continue;
 
-		if (memcmp(gid, &cache->data_vec[i].gid, sizeof(*gid)))
+		if (mask & GID_ATTR_FIND_MASK_GID &&
+		    memcmp(gid, &cache->data_vec[i].gid, sizeof(*gid)))
 			continue;
 
 		if (mask & GID_ATTR_FIND_MASK_NETDEV &&
 		    attr->ndev != val->ndev)
 			continue;
 
+		if (mask & GID_ATTR_FIND_MASK_DEFAULT &&
+		    cache->data_vec[i].default_gid != default_gid)
+			continue;
+
 		/* We have a match, verify that the data we
 		 * compared is valid. Make sure that the
 		 * sequence number we read is the last to be
@@ -176,12 +187,19 @@ static int find_gid(struct ib_roce_gid_cache *cache, union ib_gid *gid,
 	return -1;
 }
 
+static
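The find_gid() change above turns each comparison into an optional filter selected by a bit in the mask. A minimal user-space sketch of that lookup style follows. The toy_* types and TOY_FIND_* flags are hypothetical stand-ins, an int replaces the net_device pointer, and the sequence-counter validation done by the real code is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum toy_find_mask {
	TOY_FIND_GID      = 1u << 0,
	TOY_FIND_GID_TYPE = 1u << 1,
	TOY_FIND_NETDEV   = 1u << 2,
	TOY_FIND_DEFAULT  = 1u << 3,
};

struct toy_entry {
	unsigned char gid[16];
	int gid_type;
	int ndev;            /* stand-in for a net_device pointer */
	bool default_gid;
};

/* Returns the index of the first entry matching every field selected in
 * mask, or -1 if there is none. Fields not selected are ignored. */
static int toy_find(const struct toy_entry *tbl, int n,
		    const struct toy_entry *want, unsigned long mask)
{
	assert(tbl != NULL && want != NULL);

	for (int i = 0; i < n; i++) {
		if ((mask & TOY_FIND_GID_TYPE) &&
		    tbl[i].gid_type != want->gid_type)
			continue;
		if ((mask & TOY_FIND_GID) &&
		    memcmp(tbl[i].gid, want->gid, sizeof(want->gid)))
			continue;
		if ((mask & TOY_FIND_NETDEV) && tbl[i].ndev != want->ndev)
			continue;
		if ((mask & TOY_FIND_DEFAULT) &&
		    tbl[i].default_gid != want->default_gid)
			continue;
		return i;
	}
	return -1;
}
```

Making the GID comparison itself optional is what allows a caller to search for "the default entry of this net device" without knowing its GID in advance.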
[PATCH v3 for-next 08/33] IB/core: GID attribute should be returned from verbs API and cache API
From: Matan Barak <mat...@mellanox.com>

Along with the GID itself, we now store the GID's attributes. These
attributes contain important meta information regarding the GID, for
example the netdevice. Thus, this information needs to be returned by
the APIs.

This patch changes the following APIs:
(a) ib_get_cached_gid
(b) ib_find_cached_gid
(c) ib_find_cached_gid_by_port
(d) ib_query_gid

It also updates the callers of those APIs and uses the RoCE GID cache
when needed.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/cache.c                | 225 +
 drivers/infiniband/core/cm.c                   |   6 +-
 drivers/infiniband/core/cma.c                  |  84 ++---
 drivers/infiniband/core/device.c               |  29 +++-
 drivers/infiniband/core/mad.c                  |   2 +-
 drivers/infiniband/core/multicast.c            |   3 +-
 drivers/infiniband/core/sa_query.c             |   7 +-
 drivers/infiniband/core/sysfs.c                |   2 +-
 drivers/infiniband/core/uverbs_marshall.c      |   4 +-
 drivers/infiniband/core/verbs.c                |   7 +-
 drivers/infiniband/hw/mlx4/qp.c                |   5 +-
 drivers/infiniband/hw/mthca/mthca_av.c         |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c            |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c          |   3 +-
 include/rdma/ib_cache.h                        |  44 -
 include/rdma/ib_sa.h                           |   4 +-
 include/rdma/ib_verbs.h                        |   7 +-
 19 files changed, 352 insertions(+), 88 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 80f6cf2..882d491 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -42,6 +42,8 @@
 
 #include "core_priv.h"
 
+#define __IB_ONLY
+
 struct ib_pkey_cache {
 	int table_len;
 	u16 table[0];
@@ -69,16 +71,16 @@ static inline int end_port(struct ib_device *device)
 		0 : device->phys_port_cnt;
 }
 
-int ib_get_cached_gid(struct ib_device *device,
-		      u8 port_num,
-		      int index,
-		      union ib_gid *gid)
+static int __IB_ONLY __ib_get_cached_gid(struct ib_device *device,
+					 u8 port_num,
+					 int index,
+					 union ib_gid *gid)
 {
 	struct ib_gid_cache *cache;
 	unsigned long flags;
 	int ret = 0;
 
-	if (port_num < start_port(device) || port_num > end_port(device))
+	if (!device->cache.gid_cache)
 		return -EINVAL;
 
 	read_lock_irqsave(&device->cache.lock, flags);
@@ -94,43 +96,183 @@ int ib_get_cached_gid(struct ib_device *device,
 	return ret;
 }
 
+
+int ib_cache_use_roce_gid_cache(struct ib_device *device, u8 port_num)
+{
+	if (rdma_port_get_link_layer(device, port_num) ==
+	    IB_LINK_LAYER_ETHERNET) {
+		if (device->cache.roce_gid_cache)
+			return 0;
+		else
+			return -EAGAIN;
+	}
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(ib_cache_use_roce_gid_cache);
+
+int ib_get_cached_gid(struct ib_device *device,
+		      u8 port_num,
+		      int index,
+		      union ib_gid *gid,
+		      struct ib_gid_attr *attr)
+{
+	int ret;
+
+	if (port_num < start_port(device) || port_num > end_port(device))
+		return -EINVAL;
+
+	ret = ib_cache_use_roce_gid_cache(device, port_num);
+	if (!ret)
+		return roce_gid_cache_get_gid(device, port_num, index, gid,
+					      attr);
+
+	if (ret == -EAGAIN)
+		return ret;
+
+	ret = __ib_get_cached_gid(device, port_num, index, gid);
+
+	if (!ret && attr) {
+		memset(attr, 0, sizeof(*attr));
+		attr->gid_type = IB_GID_TYPE_IB;
+	}
+
+	return ret;
+}
 EXPORT_SYMBOL(ib_get_cached_gid);
 
-int ib_find_cached_gid(struct ib_device *device,
-		       union ib_gid *gid,
-		       u8 *port_num,
-		       u16 *index)
+static int __IB_ONLY ___ib_find_cached_gid_by_port(struct ib_device *device,
+						   u8 port_num,
+						   const union ib_gid *gid,
+						   u16 *index)
 {
 	struct ib_gid_cache *cache;
+	u8 p = port_num - start_port(device);
+	int i;
+
+	if (!ib_cache_use_roce_gid_cache(device
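The three-way return value of ib_cache_use_roce_gid_cache() drives which cache a lookup consults. The sketch below models only that decision in user space. The toy_* names are hypothetical, and a bool stands in for whether the port's RoCE GID cache has been allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_link_layer { TOY_LINK_INFINIBAND, TOY_LINK_ETHERNET };

struct toy_port {
	enum toy_link_layer link_layer;
	bool roce_cache_ready;   /* has the RoCE GID cache been set up? */
};

/* 0:       Ethernet port with a RoCE GID cache, use it.
 * -EAGAIN: Ethernet port whose cache does not exist yet, retry later.
 * -EINVAL: not an Ethernet port, the RoCE cache does not apply. */
static int toy_use_roce_cache(const struct toy_port *port)
{
	assert(port != NULL);
	if (port->link_layer == TOY_LINK_ETHERNET)
		return port->roce_cache_ready ? 0 : -EAGAIN;
	return -EINVAL;
}

enum toy_gid_source {
	TOY_FROM_ROCE_CACHE,
	TOY_FROM_IB_CACHE,
	TOY_RETRY_LATER
};

/* Mirrors the control flow of the new ib_get_cached_gid(): only a
 * non-Ethernet port falls through to the legacy IB GID cache. */
static enum toy_gid_source toy_pick_gid_source(const struct toy_port *port)
{
	int ret = toy_use_roce_cache(port);

	if (ret == 0)
		return TOY_FROM_ROCE_CACHE;
	if (ret == -EAGAIN)
		return TOY_RETRY_LATER;
	return TOY_FROM_IB_CACHE;
}
```

Note that -EINVAL is not an error for the caller here; it is the signal to fall back to the IB-only path, where the returned attribute is filled in with IB_GID_TYPE_IB.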
[PATCH v3 for-next 00/33] RoCE V1/V2 per GID
it.
(5) cma_configfs should depend on both address translation and configfs.
(6) ocrdma driver redefined zgid.
(7) Added event information for NETDEV_CHANGEUPPER event.

Changes from V1:
(1) Addressed Shachar and Haggai's comments
(2) Fixed multicast support
(3) Generalized bonding support
(4) Added default GID after the IB device's net device was removed from bonding
(5) Fixed bugs in mlx4 implementation regarding multicast
(6) Fixed bugs in mlx4 when using XRC QPs after this patchset was applied
(7) Fixed bug when the RoCE gid cache didn't exist
(8) Moved the bonding's DRV macros to a private header
(9) Support non-configfs configurations

Devesh Sharma (3):
  RDMA/ocrdma: changes to support RoCE-v2 in UD path
  RDMA/ocrdma: changes to support RoCE-v2 in RC path
  RDMA/ocrdma: changes to support user AH creation

Maor Gottlieb (1):
  net/mlx4_core: Add handlning of R-RoCE over IPV4 in qp attach flow

Matan Barak (14):
  IB/core: Add RoCE GID cache
  IB/core: Add kref to IB devices
  IB/core: Add RoCE GID population
  IB/core: Add default GID for RoCE GID Cache
  net/bonding: make DRV macros private
  net: Add info for NETDEV_CHANGEUPPER event
  IB/core: Add RoCE cache bonding support
  IB/core: GID attribute should be returned from verbs API and cache API
  IB/core: Report gid_type and gid_ndev through sysfs
  IB/core: Support find sgid index using a filter function
  IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
  IB/core: Add gid_type to path and rdma_id_private
  IB/core: Add rdma_network_type to wc
  IB/cma: Add configfs for rdma_cm

Moni Shoua (13):
  IB/mlx4: Remove gid table management for RoCE
  IB/mlx4: Replace spin_lock with rw_semaphore
  IB/mlx4: Lock with RCU instead of RTNL
  net/mlx4: Postpone the registration of net_device
  IB/mlx4: Advertise RoCE support in port capabilities
  IB/mlx4: Implement ib_device callback - get_netdev
  IB/mlx4: Implement ib_device callback - modify_gid
  IB/mlx4: Configure device to work in RoCEv2
  IB/mlx4: Translate cache gid index to real index
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
  IB/mlx4: Create and use another QP1 for RoCEv2
  IB/cma: Join and leave multicast groups with IGMP

Somnath Kotur (2):
  IB/Core: Changes to the IB Core infrastructure for RoCEv2 support
  RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.

 drivers/infiniband/Kconfig                     |   5 +
 drivers/infiniband/core/Makefile               |   5 +-
 drivers/infiniband/core/addr.c                 |  11 +-
 drivers/infiniband/core/cache.c                | 249 ++--
 drivers/infiniband/core/cm.c                   |  49 +-
 drivers/infiniband/core/cma.c                  | 233 +--
 drivers/infiniband/core/cma_configfs.c         | 222 +++
 drivers/infiniband/core/core_priv.h            |  92 ++-
 drivers/infiniband/core/device.c               | 150 -
 drivers/infiniband/core/mad.c                  |   2 +-
 drivers/infiniband/core/multicast.c            |  17 +-
 drivers/infiniband/core/roce_gid_cache.c       | 825 +
 drivers/infiniband/core/roce_gid_mgmt.c        | 804
 drivers/infiniband/core/sa_query.c             |  12 +-
 drivers/infiniband/core/sysfs.c                | 186 +-
 drivers/infiniband/core/ucma.c                 |   1 -
 drivers/infiniband/core/ud_header.c            | 153 -
 drivers/infiniband/core/uverbs_cmd.c           |   3 +-
 drivers/infiniband/core/uverbs_marshall.c      |   5 +-
 drivers/infiniband/core/verbs.c                | 266 ++--
 drivers/infiniband/hw/mlx4/ah.c                |  15 +-
 drivers/infiniband/hw/mlx4/mad.c               |  12 +-
 drivers/infiniband/hw/mlx4/main.c              | 758 +--
 drivers/infiniband/hw/mlx4/mcg.c               |   2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h           |  33 +-
 drivers/infiniband/hw/mlx4/qp.c                | 337 --
 drivers/infiniband/hw/mthca/mthca_av.c         |   2 +-
 drivers/infiniband/hw/mthca/mthca_qp.c         |   2 +-
 drivers/infiniband/hw/ocrdma/ocrdma.h          |  12 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c       |  94 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h       |   5 +-
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c       |  50 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c     | 233 +--
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h      |  18 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c    |  54 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h    |   4 +
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c            |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c          |   3 +-
 drivers/net/bonding/bond_main.c                |   2 +
 drivers/net/bonding/bond_options.c             |  13 -
 drivers/net/bonding/bond_procfs.c
[PATCH v3 for-next 15/33] IB/Core: Changes to the IB Core infrastructure for RoCEv2 support
1. Choose sgid_index and type from all the matching entries in RDMA-CM
   based on hint from the IP stack.
2. Set hop_limit for the IP Packet based on above hint from IP stack
3. Define a RDMA_NETWORK enum type.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/addr.c  |  8 +
 drivers/infiniband/core/cma.c   | 10 +-
 drivers/infiniband/core/verbs.c | 77 ++---
 include/rdma/ib_addr.h          |  1 +
 include/rdma/ib_verbs.h         |  9 +
 5 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 43af7f5..da24c0e 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -257,6 +257,9 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 		goto put;
 	}
 
+	if (rt->rt_uses_gateway)
+		addr->network = RDMA_NETWORK_IPV4;
+
 	ret = dst_fetch_ha(&rt->dst, addr, &fl4.daddr);
 put:
 	ip_rt_put(rt);
@@ -271,6 +274,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 {
 	struct flowi6 fl6;
 	struct dst_entry *dst;
+	struct rt6_info *rt;
 	int ret;
 
 	memset(&fl6, 0, sizeof fl6);
@@ -282,6 +286,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 	if ((ret = dst->error))
 		goto put;
 
+	rt = (struct rt6_info *)dst;
 	if (ipv6_addr_any(&fl6.saddr)) {
 		ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
 					 &fl6.daddr, 0, &fl6.saddr);
@@ -305,6 +310,9 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 		goto put;
 	}
 
+	if (rt->rt6i_flags & RTF_GATEWAY)
+		addr->network = RDMA_NETWORK_IPV6;
+
 	ret = dst_fetch_ha(dst, addr, &fl6.daddr);
 put:
 	dst_release(dst);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 8dec040..6f345e2 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1952,6 +1952,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 {
 	struct rdma_route *route = &id_priv->id.route;
 	struct rdma_addr *addr = &route->addr;
+	enum ib_gid_type network_gid_type;
 	struct cma_work *work;
 	int ret;
 	struct net_device *ndev = NULL;
@@ -1990,7 +1991,14 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
 		    &route->path_rec->dgid);
 
-	route->path_rec->hop_limit = 1;
+	/* Use the hint from IP Stack to select GID Type */
+	network_gid_type = ib_network_to_gid_type(addr->dev_addr.network);
+	if (addr->dev_addr.network != RDMA_NETWORK_IB) {
+		route->path_rec->gid_type = network_gid_type;
+		route->path_rec->hop_limit = IPV6_DEFAULT_HOPLIMIT;
+	} else {
+		route->path_rec->hop_limit = 1;
+	}
 	route->path_rec->reversible = 1;
 	route->path_rec->pkey = cpu_to_be16(0x);
 	route->path_rec->mtu_selector = IB_SA_EQ;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2e7ccad..3586996 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -195,11 +195,11 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 }
 EXPORT_SYMBOL(ib_create_ah);
 
-static int ib_get_grh_header_version(const void *h)
+static int ib_get_grh_header_version(const union rdma_network_hdr *h)
 {
-	const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+	const struct iphdr *ip4h = (struct iphdr *)&h->roce4grh;
 	struct iphdr ip4h_checked;
-	const struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+	const struct ipv6hdr *ip6h = (struct ipv6hdr *)&h->ibgrh;
 
 	if (ip6h->version != 6)
 		return (ip4h->version == 4) ? 4 : 0;
@@ -219,37 +219,6 @@ static int ib_get_grh_header_version(const void *h)
 	return 6;
 }
 
-static int ib_get_dgid_sgid_by_grh(const void *h,
-				   enum rdma_network_type net_type,
-				   union ib_gid *dgid, union ib_gid *sgid)
-{
-	switch (net_type) {
-	case RDMA_NETWORK_IPV4: {
-		const struct iphdr *ip4h = (struct iphdr *)(h + 20);
-
-		ipv6_addr_set_v4mapped(ip4h->daddr, (struct in6_addr *)dgid);
-		ipv6_addr_set_v4mapped(ip4h->saddr, (struct in6_addr *)sgid);
-		return 0;
-	}
-	case RDMA_NETWORK_IPV6: {
-		struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
-
-		memcpy(dgid, &ip6h->daddr, sizeof(*dgid));
-		memcpy(sgid, &ip6h->saddr, sizeof(*sgid));
-		return 0;
-	}
-	case RDMA_NETWORK_IB: {
-		struct ib_grh *grh = (struct ib_grh *)h;
-
-		memcpy(dgid, &grh->dgid, sizeof(*dgid
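The header-version check in ib_get_grh_header_version() works because a 40-byte GRH slot can hold either an IPv6-style header starting at offset 0 or an IPv4 header in its last 20 bytes (the `h + 20` in the removed lines). The sketch below shows that idea on a raw byte buffer. The excerpt is cut off before the function's middle section, so the header-length and checksum tie-break used here when both version nibbles look plausible is an assumption, suggested only by the `ip4h_checked` variable. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GRH_LEN   40
#define TOY_IPV4_OFF  20   /* an IPv4 header sits in the last 20 bytes */

/* One's-complement sum over a 20-byte IPv4 header. With a correct
 * checksum field in place, the folded sum is 0xffff. */
static int toy_ipv4_checksum_ok(const uint8_t *hdr)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < 20; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return sum == 0xffffu;
}

/* Returns 4, 6, or 0 (unknown) for a TOY_GRH_LEN-byte header buffer. */
static int toy_grh_header_version(const uint8_t *buf)
{
	assert(buf != NULL);

	const uint8_t *ip4 = buf + TOY_IPV4_OFF;
	unsigned int v6 = buf[0] >> 4;      /* version nibble at offset 0 */
	unsigned int v4 = ip4[0] >> 4;      /* version nibble at offset 20 */
	unsigned int ihl = ip4[0] & 0x0fu;  /* IPv4 header length in words */

	if (v6 != 6)
		return v4 == 4 ? 4 : 0;

	/* Assumed tie-break: the first 20 bytes may be garbage that happens
	 * to start with 6, so trust IPv4 only if it has no options and its
	 * checksum verifies. */
	if (ihl != 5)
		return 6;
	return toy_ipv4_checksum_ok(ip4) ? 4 : 6;
}
```

Switching the parameter from `const void *` to a union type, as the patch does, replaces the bare `h + 20` pointer arithmetic with named members, which makes the two layouts explicit.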
[PATCH v3 for-next 16/33] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.
1. Check and set port capability flags to indicate RoCEV2 support.
2. Change query_gid hook to return value from IB/Core GID Mgmt APIs.
3. Get rid of all the netdev notifier chain subscription code as well
   as maintenance of SGID Table in memory.
4. Implement get_netdev hook in driver.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma.h       |  10 ++
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c    |   3 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  | 233 +---
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |  13 ++
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |  33 +++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |   4 +
 6 files changed, 64 insertions(+), 232 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 16ee36e..97f971a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -100,6 +100,7 @@ struct ocrdma_dev_attr {
 	u8 local_ca_ack_delay;
 	u8 ird;
 	u8 num_ird_pages;
+	u8 roce_flags;
 };
 
 struct ocrdma_dma_mem {
@@ -575,4 +576,13 @@ static inline u8 ocrdma_is_enabled_and_synced(u32 state)
 		(state & OCRDMA_STATE_FLAG_SYNC);
 }
 
+static inline bool ocrdma_is_rocev2_supported(struct ocrdma_dev *dev)
+{
+	return (dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV4 <<
+					OCRDMA_ROUDP_FLAGS_SHIFT) ||
+		dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV6 <<
+					OCRDMA_ROUDP_FLAGS_SHIFT)) ?
+								true : false;
+}
+
 #endif
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index e5f0244..20f9e8f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -1112,6 +1112,9 @@ static void ocrdma_get_attr(struct ocrdma_dev *dev,
 	attr->local_ca_ack_delay = (rsp->max_pd_ca_ack_delay &
 				    OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_MASK) >>
 	    OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_SHIFT;
+	attr->roce_flags = (rsp->max_pd_ca_ack_delay &
+			    OCRDMA_MBX_QUERY_CFG_L3_TYPE_MASK) >>
+			    OCRDMA_MBX_QUERY_CFG_L3_TYPE_SHIFT;
 	attr->max_mw = rsp->max_mw;
 	attr->max_mr = rsp->max_mr;
 	attr->max_mr_size = ((u64)rsp->max_mr_size_hi << 32) |
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 7a2b59a..a81492f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -51,8 +51,6 @@ static LIST_HEAD(ocrdma_dev_list);
 static DEFINE_SPINLOCK(ocrdma_devlist_lock);
 static DEFINE_IDR(ocrdma_dev_id);
 
-static union ib_gid ocrdma_zero_sgid;
-
 void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
 {
 	u8 mac_addr[6];
@@ -67,135 +65,6 @@ void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
 	guid[6] = mac_addr[4];
 	guid[7] = mac_addr[5];
 }
-
-static bool ocrdma_add_sgid(struct ocrdma_dev *dev, union ib_gid *new_sgid)
-{
-	int i;
-	unsigned long flags;
-
-	memset(&ocrdma_zero_sgid, 0, sizeof(union ib_gid));
-
-
-	spin_lock_irqsave(&dev->sgid_lock, flags);
-	for (i = 0; i < OCRDMA_MAX_SGID; i++) {
-		if (!memcmp(&dev->sgid_tbl[i], &ocrdma_zero_sgid,
-			    sizeof(union ib_gid))) {
-			/* found free entry */
-			memcpy(&dev->sgid_tbl[i], new_sgid,
-			       sizeof(union ib_gid));
-			spin_unlock_irqrestore(&dev->sgid_lock, flags);
-			return true;
-		} else if (!memcmp(&dev->sgid_tbl[i], new_sgid,
-				   sizeof(union ib_gid))) {
-			/* entry already present, no addition is required. */
-			spin_unlock_irqrestore(&dev->sgid_lock, flags);
-			return false;
-		}
-	}
-	spin_unlock_irqrestore(&dev->sgid_lock, flags);
-	return false;
-}
-
-static bool ocrdma_del_sgid(struct ocrdma_dev *dev, union ib_gid *sgid)
-{
-	int found = false;
-	int i;
-	unsigned long flags;
-
-
-	spin_lock_irqsave(&dev->sgid_lock, flags);
-	/* first is default sgid, which cannot be deleted. */
-	for (i = 1; i < OCRDMA_MAX_SGID; i++) {
-		if (!memcmp(&dev->sgid_tbl[i], sgid, sizeof(union ib_gid))) {
-			/* found matching entry */
-			memset(&dev->sgid_tbl[i], 0, sizeof(union ib_gid));
-			found = true;
-			break;
-		}
-	}
-	spin_unlock_irqrestore(&dev->sgid_lock, flags);
-	return found;
-}
-
-static int ocrdma_addr_event(unsigned long event, struct
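The capability check in this patch follows a common two-step pattern: extract a bit field from a firmware reply word with a mask and a shift, then test individual bits of the extracted value. The sketch below shows the pattern with made-up constants. The TOY_* values are hypothetical and are not the ocrdma definitions, and the sketch omits the extra OCRDMA_ROUDP_FLAGS_SHIFT that the driver applies before testing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout, chosen only for illustration. */
#define TOY_L3_TYPE_MASK   0x18u   /* bits 3..4 of the config word */
#define TOY_L3_TYPE_SHIFT  3
#define TOY_L3_TYPE_IPV4   0x1u
#define TOY_L3_TYPE_IPV6   0x2u

/* Step 1: isolate the field, then shift it down to bit 0. */
static uint8_t toy_extract_roce_flags(uint32_t cfg_word)
{
	return (uint8_t)((cfg_word & TOY_L3_TYPE_MASK) >> TOY_L3_TYPE_SHIFT);
}

/* Step 2: the device counts as RoCEv2-capable if either L3 type is set. */
static bool toy_is_rocev2_supported(uint8_t roce_flags)
{
	return (roce_flags & (TOY_L3_TYPE_IPV4 | TOY_L3_TYPE_IPV6)) != 0;
}
```

A comparison such as `!= 0` already yields a boolean, so the trailing `? true : false` in the patch is redundant, although harmless.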
[PATCH v3 for-next 14/33] IB/cma: Add configfs for rdma_cm
From: Matan Barak mat...@mellanox.com Users would like to control the behaviour of rdma_cm. For example, old applications which doesn't set the required RoCE gid type could be executed on RoCE V2 network types. In order to support this configuration, we implement a configfs for rdma_cm. In order to use the configfs, one needs to mount it and mkdir IB device name inside rdma_cm directory. The patch adds support for a single configuration file, default_roce_mode. The mode can either be IB RoCEv1 or RoCEv2. Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/Kconfig | 5 + drivers/infiniband/core/Makefile | 2 + drivers/infiniband/core/cma.c| 54 +++- drivers/infiniband/core/cma_configfs.c | 222 +++ drivers/infiniband/core/core_priv.h | 15 +++ drivers/infiniband/core/roce_gid_cache.c | 13 ++ 6 files changed, 307 insertions(+), 4 deletions(-) create mode 100644 drivers/infiniband/core/cma_configfs.c diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index b899531..20bda60 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -54,6 +54,11 @@ config INFINIBAND_ADDR_TRANS depends on INFINIBAND default y +config CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS + bool + depends on INFINIBAND_ADDR_TRANS CONFIGFS_FS + default y + source drivers/infiniband/hw/mthca/Kconfig source drivers/infiniband/hw/ipath/Kconfig source drivers/infiniband/hw/qib/Kconfig diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index 2c94963..f6bc8c5 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -24,6 +24,8 @@ iw_cm-y :=iwcm.o iwpm_util.o iwpm_msg.o rdma_cm-y := cma.o +rdma_cm-$(CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS) += cma_configfs.o + rdma_ucm-y := ucma.o ib_addr-y := addr.o diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 9afa410..8dec040 100644 --- a/drivers/infiniband/core/cma.c +++ 
b/drivers/infiniband/core/cma.c @@ -55,6 +55,7 @@ #include rdma/ib_cm.h #include rdma/ib_sa.h #include rdma/iw_cm.h +#include core_priv.h MODULE_AUTHOR(Sean Hefty); MODULE_DESCRIPTION(Generic RDMA CM Agent); @@ -91,6 +92,7 @@ struct cma_device { struct completion comp; atomic_trefcount; struct list_headid_list; + enum ib_gid_typedefault_gid_type; }; struct rdma_bind_list { @@ -103,6 +105,42 @@ enum { CMA_OPTION_AFONLY, }; +void cma_ref_dev(struct cma_device *cma_dev) +{ + atomic_inc(cma_dev-refcount); +} + +struct cma_device *cma_enum_devices_by_ibdev(cma_device_filter filter, +void *cookie) +{ + struct cma_device *cma_dev; + struct cma_device *found_cma_dev = NULL; + + mutex_lock(lock); + + list_for_each_entry(cma_dev, dev_list, list) + if (filter(cma_dev-device, cookie)) { + found_cma_dev = cma_dev; + break; + } + + if (found_cma_dev) + cma_ref_dev(found_cma_dev); + mutex_unlock(lock); + return found_cma_dev; +} + +enum ib_gid_type cma_get_default_gid_type(struct cma_device *cma_dev) +{ + return cma_dev-default_gid_type; +} + +void cma_set_default_gid_type(struct cma_device *cma_dev, + enum ib_gid_type default_gid_type) +{ + cma_dev-default_gid_type = default_gid_type; +} + /* * Device removal can occur at anytime, so we need extra handling to * serialize notifying the user of device removal with other callbacks. 
@@ -248,15 +286,16 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver) static void cma_attach_to_dev(struct rdma_id_private *id_priv, struct cma_device *cma_dev) { - atomic_inc(cma_dev-refcount); + cma_ref_dev(cma_dev); id_priv-cma_dev = cma_dev; + id_priv-gid_type = cma_dev-default_gid_type; id_priv-id.device = cma_dev-device; id_priv-id.route.addr.dev_addr.transport = rdma_node_get_transport(cma_dev-device-node_type); list_add_tail(id_priv-list, cma_dev-id_list); } -static inline void cma_deref_dev(struct cma_device *cma_dev) +void cma_deref_dev(struct cma_device *cma_dev) { if (atomic_dec_and_test(cma_dev-refcount)) complete(cma_dev-comp); @@ -385,7 +424,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, ret = ib_find_cached_gid_by_port(cma_dev-device, iboe_gid, -IB_GID_TYPE_IB
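For readers following the cma_enum_devices_by_ibdev() part of the configfs patch: the pattern is "walk the device list, stop at the first entry the filter accepts, and take a reference before returning it". Below is a minimal userspace sketch of that idea. All demo_* names are hypothetical stand-ins, not kernel types, and the list lock that the patch takes around the walk is assumed to be held by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct cma_device (not the kernel type). */
struct demo_dev {
	const char *name;
	atomic_int refcount;
	struct demo_dev *next;
};

typedef bool (*demo_filter)(const struct demo_dev *dev, void *cookie);

/*
 * Return the first device accepted by the filter with its refcount raised,
 * or NULL if nothing matches. The caller is assumed to hold the list lock.
 */
static struct demo_dev *demo_find_dev(struct demo_dev *head,
				      demo_filter filter, void *cookie)
{
	assert(filter != NULL);
	for (struct demo_dev *d = head; d; d = d->next) {
		if (filter(d, cookie)) {
			atomic_fetch_add(&d->refcount, 1);
			return d;
		}
	}
	return NULL;
}

/* Drop the reference taken by demo_find_dev(). */
static void demo_put_dev(struct demo_dev *dev)
{
	atomic_fetch_sub(&dev->refcount, 1);
}

/* Example filter: match on the device name passed as the cookie. */
static bool demo_name_is(const struct demo_dev *dev, void *cookie)
{
	return strcmp(dev->name, (const char *)cookie) == 0;
}
```

A configfs mkdir handler would presumably call something like demo_find_dev() with the new directory name as the cookie and drop the reference when the directory is removed.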
[PATCH v3 for-next 12/33] IB/core: Add gid_type to path and rdma_id_private
From: Matan Barak mat...@mellanox.com When using rdma cm, we want to take the gid_type from the rdma_id_private. This is mandatory before adding an API from user-space/configfs that sets the gid_type of CM connection. Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/cm.c | 19 ++- drivers/infiniband/core/cma.c | 2 ++ drivers/infiniband/core/sa_query.c| 3 ++- drivers/infiniband/core/uverbs_marshall.c | 1 + include/rdma/ib_sa.h | 1 + 5 files changed, 20 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 7974e74..22dac05 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -358,9 +358,8 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av) read_lock_irqsave(cm.device_lock, flags); list_for_each_entry(cm_dev, cm.device_list, list) { if (!ib_find_cached_gid(cm_dev-ib_device, path-sgid, - IB_GID_TYPE_IB, path-net, - path-ifindex, - p, NULL)) { + path-gid_type, path-net, + path-ifindex, p, NULL)) { port = cm_dev-port[p-1]; break; } @@ -1521,6 +1520,8 @@ static int cm_req_handler(struct cm_work *work) struct ib_cm_id *cm_id; struct cm_id_private *cm_id_priv, *listen_cm_id_priv; struct cm_req_msg *req_msg; + union ib_gid gid; + struct ib_gid_attr gid_attr; int ret; req_msg = (struct cm_req_msg *)work-mad_recv_wc-recv_buf.mad; @@ -1560,11 +1561,19 @@ static int cm_req_handler(struct cm_work *work) cm_format_paths_from_req(req_msg, work-path[0], work-path[1]); memcpy(work-path[0].dmac, cm_id_priv-av.ah_attr.dmac, ETH_ALEN); - ret = cm_init_av_by_path(work-path[0], cm_id_priv-av); + ret = ib_get_cached_gid(work-port-cm_dev-ib_device, + work-port-port_num, + cm_id_priv-av.ah_attr.grh.sgid_index, + gid, gid_attr); + if (!ret) { + work-path[0].gid_type = gid_attr.gid_type; + ret = cm_init_av_by_path(work-path[0], cm_id_priv-av); + } if (ret) { ib_get_cached_gid(work-port-cm_dev-ib_device, 
work-port-port_num, 0, work-path[0].sgid, - NULL); + gid_attr); + work-path[0].gid_type = gid_attr.gid_type; ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, work-path[0].sgid, sizeof work-path[0].sgid, NULL, 0); diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 659676c..9afa410 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -146,6 +146,7 @@ struct rdma_id_private { u8 tos; u8 reuseaddr; u8 afonly; + enum ib_gid_typegid_type; }; struct cma_multicast { @@ -1936,6 +1937,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv) ndev = dev_get_by_index(init_net, addr-dev_addr.bound_dev_if); route-path_rec-net = init_net; route-path_rec-ifindex = addr-dev_addr.bound_dev_if; + route-path_rec-gid_type = id_priv-gid_type; } if (!ndev) { ret = -ENODEV; diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 705b6b8..f770049 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -546,7 +546,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num, ah_attr-ah_flags = IB_AH_GRH; ah_attr-grh.dgid = rec-dgid; - ret = ib_find_cached_gid(device, rec-sgid, IB_GID_TYPE_IB, + ret = ib_find_cached_gid(device, rec-sgid, rec-gid_type, rec-net, rec-ifindex, port_num, gid_index); if (ret) @@ -676,6 +676,7 @@ static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query, mad-data, rec); rec.net = NULL; rec.ifindex = 0; + rec.gid_type = IB_GID_TYPE_IB; memset(rec.dmac, 0, ETH_ALEN); query-callback(status, rec, query-context); } else diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c index 7d2f14c..af020f8 100644
[PATCH v3 for-next 10/33] IB/core: Support find sgid index using a filter function
From: Matan Barak mat...@mellanox.com Sometimes a sgid index need to be found based on variable parameters. For example, when the CM gets a packet from network, it needs to match a sgid_index that matches the appropriate L2 attributes of a packet. Extending the cache's API to include Ethernet L2 attribute is problematic, since they may be vastly extended in the future. As a result, we add a find function that gets a user filter function and searches the GID table until a match is found. Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/cache.c | 24 drivers/infiniband/core/core_priv.h | 9 + drivers/infiniband/core/roce_gid_cache.c | 66 include/rdma/ib_cache.h | 27 + 4 files changed, 126 insertions(+) diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index 882d491..ae86fe8 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -273,6 +273,30 @@ int ib_find_cached_gid_by_port(struct ib_device *device, } EXPORT_SYMBOL(ib_find_cached_gid_by_port); +int ib_find_gid_by_filter(struct ib_device *device, + union ib_gid *gid, + u8 port_num, + bool (*filter)(const union ib_gid *gid, +const struct ib_gid_attr *, +void *), + void *context, u16 *index) +{ + /* Look for a RoCE device with the specified GID. */ + if (!ib_cache_use_roce_gid_cache(device, port_num)) + return roce_gid_cache_find_gid_by_filter(device, gid, +port_num, filter, +context, index); + + /* Only RoCE GID cache supports filter function */ + if (filter) + return -ENOSYS; + + /* If no RoCE devices with the specified GID, look for IB device. 
*/ + return __ib_find_cached_gid_by_port(device, port_num, + gid, index); +} +EXPORT_SYMBOL(ib_find_gid_by_filter); + int ib_get_cached_pkey(struct ib_device *device, u8port_num, int index, diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h index b5bbbdf..949844c 100644 --- a/drivers/infiniband/core/core_priv.h +++ b/drivers/infiniband/core/core_priv.h @@ -84,6 +84,15 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid, enum ib_gid_type gid_type, u8 port, struct net *net, int if_index, u16 *index); +int roce_gid_cache_find_gid_by_filter(struct ib_device *ib_dev, + union ib_gid *gid, + u8 port, + bool (*filter)(const union ib_gid *gid, +const struct ib_gid_attr *, +void *), + void *context, + u16 *index); + int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port); enum roce_gid_cache_default_mode { diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c index b6180eb..bd51d97 100644 --- a/drivers/infiniband/core/roce_gid_cache.c +++ b/drivers/infiniband/core/roce_gid_cache.c @@ -455,6 +455,72 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid, return -ENOENT; } +int roce_gid_cache_find_gid_by_filter(struct ib_device *ib_dev, + union ib_gid *gid, + u8 port, + bool (*filter)(const union ib_gid *, +const struct ib_gid_attr *, +void *), + void *context, + u16 *index) +{ + struct ib_roce_gid_cache *cache; + unsigned int i; + bool found = false; + + if (!ib_dev-cache.roce_gid_cache) + return -ENOSYS; + + if (port start_port(ib_dev) || + port start_port(ib_dev) + ib_dev-phys_port_cnt || + rdma_port_get_link_layer(ib_dev, port) != + IB_LINK_LAYER_ETHERNET) + return -ENOSYS; + + cache = ib_dev-cache.roce_gid_cache[port - start_port(ib_dev)]; + + if (!cache || !cache-active) + return -ENOENT; + + for (i = 0; i cache-sz; i++) { + unsigned int orig_seq; + struct ib_gid_attr attr
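To illustrate the filter-based lookup described in this patch, here is a small userspace sketch: the table is scanned for entries whose GID matches, and a caller-supplied predicate decides which of those entries is acceptable. The demo_* types and the VLAN-matching filter are assumptions made for the example; they are not the structures from roce_gid_cache.c, and the sequence-number handling of the real table is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified table entry. */
struct demo_gid { uint8_t raw[16]; };
struct demo_gid_attr { int gid_type; uint16_t vlan_id; };
struct demo_entry {
	struct demo_gid gid;
	struct demo_gid_attr attr;
	bool valid;
};

typedef bool (*demo_gid_filter)(const struct demo_gid *gid,
				const struct demo_gid_attr *attr,
				void *context);

/*
 * Find the first valid entry with the given GID that the filter accepts.
 * A NULL filter accepts every entry. Returns 0 and stores the index on
 * success, -ENOENT when nothing matches.
 */
static int demo_find_gid_by_filter(const struct demo_entry *tbl, size_t n,
				   const struct demo_gid *gid,
				   demo_gid_filter filter, void *context,
				   uint16_t *index)
{
	assert(tbl != NULL && gid != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].valid)
			continue;
		if (memcmp(&tbl[i].gid, gid, sizeof(*gid)) != 0)
			continue;
		if (filter && !filter(gid, &tbl[i].attr, context))
			continue;
		if (index)
			*index = (uint16_t)i;
		return 0;
	}
	return -ENOENT;
}

/* Example filter: accept only entries on the VLAN passed via context. */
static bool demo_match_vlan(const struct demo_gid *gid,
			    const struct demo_gid_attr *attr, void *context)
{
	(void)gid;
	return attr->vlan_id == *(const uint16_t *)context;
}
```

Keeping the L2 matching in the callback means the lookup signature does not have to grow each time a new attribute becomes relevant, which is the motivation given in the commit message.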
[PATCH v3 for-next 18/33] RDMA/ocrdma: changes to support RoCE-v2 in RC path
From: Devesh Sharma devesh.sha...@emulex.com To support RoCE-V2 this patch implements following changes 1. Get the GID-type for a given sgid. 2. Based on the gid type get IPv4 L3 address and give those to FW. 3. Provide l3-type to FW. Signed-off-by: Somnath Kotur somnath.ko...@emulex.com Signed-off-by: Devesh Sharma devesh.sha...@emulex.com --- drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 30 -- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c index 20f9e8f..147fccf 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c @@ -2433,7 +2433,13 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp, union ib_gid sgid, zgid; struct ib_gid_attr sgid_attr; u32 vlan_id = 0x; - u8 mac_addr[6]; + u8 mac_addr[6], hdr_type; + union { + struct sockaddr _sockaddr; + struct sockaddr_in _sockaddr_in; + struct sockaddr_in6 _sockaddr_in6; + } sgid_addr, dgid_addr; + struct ocrdma_dev *dev = get_ocrdma_dev(qp-ibqp.device); if ((ah_attr-ah_flags IB_AH_GRH) == 0) @@ -2448,6 +2454,8 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp, cmd-params.hop_lmt_rq_psn |= (ah_attr-grh.hop_limit OCRDMA_QP_PARAMS_HOP_LMT_SHIFT); cmd-flags |= OCRDMA_QP_PARA_FLOW_LBL_VALID; + + /* GIDs */ memcpy(cmd-params.dgid[0], ah_attr-grh.dgid.raw[0], sizeof(cmd-params.dgid)); @@ -2471,17 +2479,35 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp, return status; cmd-params.dmac_b0_to_b3 = mac_addr[0] | (mac_addr[1] 8) | (mac_addr[2] 16) | (mac_addr[3] 24); + hdr_type = ib_gid_to_network_type(sgid_attr.gid_type, sgid); + if (hdr_type == RDMA_NETWORK_IPV4) { + status = rdma_gid2ip(sgid_addr._sockaddr, sgid); + if (status) + return status; + status = rdma_gid2ip(dgid_addr._sockaddr, ah_attr-grh.dgid); + if (status) + return status; + memcpy(cmd-params.dgid[0], + dgid_addr._sockaddr_in.sin_addr.s_addr, 4); + memcpy(cmd-params.sgid[0], + 
sgid_addr._sockaddr_in.sin_addr.s_addr, 4); + } /* convert them to LE format. */ ocrdma_cpu_to_le32(cmd-params.dgid[0], sizeof(cmd-params.dgid)); ocrdma_cpu_to_le32(cmd-params.sgid[0], sizeof(cmd-params.sgid)); cmd-params.vlan_dmac_b4_to_b5 = mac_addr[4] | (mac_addr[5] 8); - if (attr_mask IB_QP_VID) { + if (vlan_id 0x1000) { cmd-params.vlan_dmac_b4_to_b5 |= vlan_id OCRDMA_QP_PARAMS_VLAN_SHIFT; cmd-flags |= OCRDMA_QP_PARA_VLAN_EN_VALID; cmd-params.rnt_rc_sl_fl |= (dev-sl 0x07) OCRDMA_QP_PARAMS_SL_SHIFT; } + + cmd-params.max_sge_recv_flags |= +((hdr_type +OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_SHIFT) +OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_MASK); return 0; } -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
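As a side note on step 2 of this patch (getting the IPv4 L3 address out of the GID): for an IPv4-mapped GID the address sits in the last four bytes. A minimal userspace sketch follows, assuming the usual ::ffff:a.b.c.d layout. The demo_* helpers are made up for the example and are not rdma_gid2ip().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if the 16-byte GID has the ::ffff:a.b.c.d (IPv4-mapped) layout. */
static bool demo_gid_is_v4mapped(const uint8_t gid[16])
{
	static const uint8_t prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	return memcmp(gid, prefix, sizeof(prefix)) == 0;
}

/*
 * Copy the IPv4 address (network byte order) out of an IPv4-mapped GID.
 * Returns 0 on success, -1 if the GID is not IPv4-mapped.
 */
static int demo_gid_to_ipv4(const uint8_t gid[16], uint8_t out[4])
{
	assert(gid != NULL && out != NULL);
	if (!demo_gid_is_v4mapped(gid))
		return -1;
	memcpy(out, gid + 12, 4);
	return 0;
}
```

In the patch the four bytes are then copied to the start of cmd->params.dgid and cmd->params.sgid before the little-endian conversion; the sketch stops at the extraction.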
[PATCH v3 for-next 29/33] net/mlx4_core: Add handling of R-RoCE over IPV4 in qp attach flow
From: Maor Gottlieb <ma...@mellanox.com>

When a QP is attached for RoCE over IPv4, the IPv4 bit should be enabled
in the IB flow spec.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/net/ethernet/mellanox/mlx4/mcg.c | 14 ++++++++++++--
 include/linux/mlx4/device.h              |  6 ++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index a3867e7..cdf07b9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -858,7 +858,9 @@ static int parse_trans_rule(struct mlx4_dev *dev, struct mlx4_spec_list *spec,
 		break;
 
 	case MLX4_NET_TRANS_RULE_ID_IB:
-		rule_hw->ib.l3_qpn = spec->ib.l3_qpn;
+		rule_hw->ib.l3_qpn = spec->ib.l3_qpn |
+			(spec->ib.roce_type == MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4 ?
+			 0x80 : 0);
 		rule_hw->ib.qpn_mask = spec->ib.qpn_msk;
 		memcpy(&rule_hw->ib.dst_gid, &spec->ib.dst_gid, 16);
 		memcpy(&rule_hw->ib.dst_gid_msk, &spec->ib.dst_gid_msk, 16);
@@ -1377,10 +1379,18 @@ int mlx4_trans_to_dmfs_attach(struct mlx4_dev *dev, struct mlx4_qp *qp,
 		memcpy(spec.eth.dst_mac_msk, &mac_mask, ETH_ALEN);
 		break;
 
+	case MLX4_PROT_IB_IPV4:
+		spec.id = MLX4_NET_TRANS_RULE_ID_IB;
+		memcpy(spec.ib.dst_gid + 12, gid + 12, 4);
+		memset(spec.ib.dst_gid_msk + 12, 0xff, 4);
+		spec.ib.roce_type = MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4;
+
+		break;
 	case MLX4_PROT_IB_IPV6:
 		spec.id = MLX4_NET_TRANS_RULE_ID_IB;
 		memcpy(spec.ib.dst_gid, gid, 16);
-		memset(&spec.ib.dst_gid_msk, 0xff, 16);
+		memset(&spec.ib.dst_gid_msk, 0xff, 16);
+		spec.ib.roce_type = MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV6;
 		break;
 	default:
 		return -EINVAL;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index dd1488c..58b0b8c 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -369,6 +369,11 @@ enum mlx4_protocol {
 	MLX4_PROT_FCOE
 };
 
+enum mlx4_flow_roce_type {
+	MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV6 = 0,
+	MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4
+};
+
 enum {
 	MLX4_MTT_FLAG_PRESENT		= 1
 };
@@ -1096,6 +1101,7 @@ struct mlx4_spec_ipv4 {
 struct mlx4_spec_ib {
 	__be32	l3_qpn;
 	__be32	qpn_msk;
+	enum mlx4_flow_roce_type	roce_type;
 	u8	dst_gid[16];
 	u8	dst_gid_msk[16];
 };
-- 
2.1.0
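To make the difference between the two attach cases in this patch concrete: for IPv6 the whole 16-byte destination GID is matched, while for IPv4 only the last four bytes (the IPv4 address) are matched. The userspace sketch below reproduces just that mask handling. The demo_* names are hypothetical, and demo_spec_matches() is only an assumed model of masked comparison, not a description of what the hardware does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum demo_roce_type { DEMO_ROCE_IPV6 = 0, DEMO_ROCE_IPV4 };

/* Hypothetical, reduced version of an IB flow spec. */
struct demo_ib_spec {
	uint8_t dst_gid[16];
	uint8_t dst_gid_msk[16];
	enum demo_roce_type roce_type;
};

/* Fill value and mask the way the two switch cases in the patch do. */
static void demo_fill_ib_spec(struct demo_ib_spec *spec,
			      const uint8_t gid[16], enum demo_roce_type type)
{
	assert(spec != NULL && gid != NULL);
	memset(spec, 0, sizeof(*spec));
	spec->roce_type = type;
	if (type == DEMO_ROCE_IPV4) {
		/* Only the embedded IPv4 address takes part in matching. */
		memcpy(spec->dst_gid + 12, gid + 12, 4);
		memset(spec->dst_gid_msk + 12, 0xff, 4);
	} else {
		memcpy(spec->dst_gid, gid, 16);
		memset(spec->dst_gid_msk, 0xff, 16);
	}
}

/* Assumed matching model: compare only the bits selected by the mask. */
static bool demo_spec_matches(const struct demo_ib_spec *spec,
			      const uint8_t gid[16])
{
	for (int i = 0; i < 16; i++) {
		uint8_t m = spec->dst_gid_msk[i];

		if ((gid[i] & m) != (spec->dst_gid[i] & m))
			return false;
	}
	return true;
}
```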
[PATCH v3 for-next 20/33] IB/mlx4: Remove gid table management for RoCE
From: Moni Shoua mo...@mellanox.com RoCE GID table management moved to InfiniBand core driver. Core driver is now responsible to populate the GID table and supply query and lookup functions for GIDs. HW drivers are responsible only modify GID table in network adapters. The query_gid hook should now return the answer from the cache when link layer is Ethernet. Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/hw/mlx4/main.c| 495 +-- drivers/infiniband/hw/mlx4/mlx4_ib.h | 4 - 2 files changed, 14 insertions(+), 485 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 6fa5e49..91caffc 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -45,6 +45,7 @@ #include rdma/ib_smi.h #include rdma/ib_user_verbs.h #include rdma/ib_addr.h +#include rdma/ib_cache.h #include linux/mlx4/driver.h #include linux/mlx4/cmd.h @@ -74,13 +75,6 @@ static const char mlx4_ib_version[] = DRV_NAME : Mellanox ConnectX InfiniBand driver v DRV_VERSION ( DRV_RELDATE )\n; -struct update_gid_work { - struct work_struct work; - union ib_gidgids[128]; - struct mlx4_ib_dev *dev; - int port; -}; - static void do_slave_init(struct mlx4_ib_dev *ibdev, int slave, int do_init); static struct workqueue_struct *wq; @@ -474,23 +468,21 @@ out: return err; } -static int iboe_query_gid(struct ib_device *ibdev, u8 port, int index, - union ib_gid *gid) -{ - struct mlx4_ib_dev *dev = to_mdev(ibdev); - - *gid = dev-iboe.gid_table[port - 1][index]; - - return 0; -} - static int mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index, union ib_gid *gid) { - if (rdma_port_get_link_layer(ibdev, port) == IB_LINK_LAYER_INFINIBAND) + int ret; + + if (ib_cache_use_roce_gid_cache(ibdev, port)) return __mlx4_ib_query_gid(ibdev, port, index, gid, 0); - else - return iboe_query_gid(ibdev, port, index, gid); + + ret = ib_get_cached_gid(ibdev, port, index, gid, NULL); + 
if (ret == -EAGAIN) { + memcpy(gid, zgid, sizeof(*gid)); + return 0; + } + + return ret; } int __mlx4_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index, @@ -1480,273 +1472,6 @@ static struct device_attribute *mlx4_class_attributes[] = { dev_attr_board_id }; -static void mlx4_addrconf_ifid_eui48(u8 *eui, u16 vlan_id, -struct net_device *dev) -{ - memcpy(eui, dev-dev_addr, 3); - memcpy(eui + 5, dev-dev_addr + 3, 3); - if (vlan_id 0x1000) { - eui[3] = vlan_id 8; - eui[4] = vlan_id 0xff; - } else { - eui[3] = 0xff; - eui[4] = 0xfe; - } - eui[0] ^= 2; -} - -static void update_gids_task(struct work_struct *work) -{ - struct update_gid_work *gw = container_of(work, struct update_gid_work, work); - struct mlx4_cmd_mailbox *mailbox; - union ib_gid *gids; - int err; - struct mlx4_dev *dev = gw-dev-dev; - int is_bonded = mlx4_is_bonded(dev); - - if (!gw-dev-ib_active) - return; - - mailbox = mlx4_alloc_cmd_mailbox(dev); - if (IS_ERR(mailbox)) { - pr_warn(update gid table failed %ld\n, PTR_ERR(mailbox)); - return; - } - - gids = mailbox-buf; - memcpy(gids, gw-gids, sizeof gw-gids); - - err = mlx4_cmd(dev, mailbox-dma, MLX4_SET_PORT_GID_TABLE 8 | gw-port, - 1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B, - MLX4_CMD_WRAPPED); - if (err) - pr_warn(set port command failed\n); - else - if ((gw-port == 1) || !is_bonded) - mlx4_ib_dispatch_event(gw-dev, - is_bonded ? 1 : gw-port, - IB_EVENT_GID_CHANGE); - - mlx4_free_cmd_mailbox(dev, mailbox); - kfree(gw); -} - -static void reset_gids_task(struct work_struct *work) -{ - struct update_gid_work *gw = - container_of(work, struct update_gid_work, work); - struct mlx4_cmd_mailbox *mailbox; - union ib_gid *gids; - int err; - struct mlx4_dev *dev = gw-dev-dev; - - if (!gw-dev-ib_active) - return; - - mailbox = mlx4_alloc_cmd_mailbox(dev); - if (IS_ERR(mailbox)) { - pr_warn(reset gid table failed\n); - goto free; - } - - gids = mailbox-buf; - memcpy(gids, gw-gids, sizeof(gw-gids)); - - if (mlx4_ib_port_link_layer(gw-dev-ib_dev, gw-port
[PATCH v3 for-next 23/33] net/mlx4: Postpone the registration of net_device
From: Moni Shoua <mo...@mellanox.com>

The mlx4 network driver was registered in the context of the 'add'
function of the core driver (called when HW should be registered).
This causes the NETDEV_REGISTER netdev event to be sent in a context
where the get_protocol_dev() callback still returns NULL, which may
confuse listeners of netdev events.

This patch prepares for the patch that implements the get_netdev()
callback in the IB/mlx4 driver.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_main.c | 36 +++++++++++++++++++++---------------
 drivers/net/ethernet/mellanox/mlx4/intf.c    |  3 +++
 include/linux/mlx4/driver.h                  |  1 +
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index 2859ac6..64b4f8d2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -219,6 +219,26 @@ static void mlx4_en_remove(struct mlx4_dev *dev, void *endev_ptr)
 	kfree(mdev);
 }
 
+static void mlx4_en_activate(struct mlx4_dev *dev, void *ctx)
+{
+	int i;
+	struct mlx4_en_dev *mdev = ctx;
+
+	/* Create a netdev for each port */
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
+		mlx4_info(mdev, "Activating port:%d\n", i);
+		if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
+			mdev->pndev[i] = NULL;
+	}
+
+	/* register notifier */
+	mdev->nb.notifier_call = mlx4_en_netdev_event;
+	if (register_netdevice_notifier(&mdev->nb)) {
+		mdev->nb.notifier_call = NULL;
+		mlx4_err(mdev, "Failed to create notifier\n");
+	}
+}
+
 static void *mlx4_en_add(struct mlx4_dev *dev)
 {
 	struct mlx4_en_dev *mdev;
@@ -292,21 +312,6 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
 	mutex_init(&mdev->state_lock);
 	mdev->device_up = true;
 
-	/* Setup ports */
-
-	/* Create a netdev for each port */
-	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
-		mlx4_info(mdev, "Activating port:%d\n", i);
-		if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
-			mdev->pndev[i] = NULL;
-	}
-	/* register notifier */
-	mdev->nb.notifier_call = mlx4_en_netdev_event;
-	if (register_netdevice_notifier(&mdev->nb)) {
-		mdev->nb.notifier_call = NULL;
-		mlx4_err(mdev, "Failed to create notifier\n");
-	}
-
 	return mdev;
 
 err_mr:
@@ -330,6 +335,7 @@ static struct mlx4_interface mlx4_en_interface = {
 	.event		= mlx4_en_event,
 	.get_dev	= mlx4_en_get_netdev,
 	.protocol	= MLX4_PROT_ETH,
+	.activate	= mlx4_en_activate,
 };
 
 static void mlx4_en_verify_params(void)
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c b/drivers/net/ethernet/mellanox/mlx4/intf.c
index a1a5985..ccd4030 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -63,8 +63,11 @@ static void mlx4_add_device(struct mlx4_interface *intf, struct mlx4_priv *priv)
 		spin_lock_irq(&priv->ctx_lock);
 		list_add_tail(&dev_ctx->list, &priv->ctx_list);
 		spin_unlock_irq(&priv->ctx_lock);
+		if (intf->activate)
+			intf->activate(&priv->dev, dev_ctx->context);
 	} else
 		kfree(dev_ctx);
+
 }
 
 static void mlx4_remove_device(struct mlx4_interface *intf, struct mlx4_priv *priv)
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 9553a73..5a06d96 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -59,6 +59,7 @@ struct mlx4_interface {
 	void			(*event) (struct mlx4_dev *dev, void *context,
 					  enum mlx4_dev_event event, unsigned long param);
 	void *			(*get_dev)(struct mlx4_dev *dev, void *context, u8 port);
+	void			(*activate)(struct mlx4_dev *dev, void *context);
 	struct list_head	list;
 	enum mlx4_protocol	protocol;
 	int			flags;
-- 
2.1.0
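The ordering change in this patch can be summarized as "publish the interface context first, then run the part that generates events". A stripped-down userspace sketch of that two-phase scheme is below. Everything named demo_* is invented for illustration; in particular demo_get_dev() only stands in for the kind of lookup a netdev event listener might perform and is not the mlx4 API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface with an optional second-phase callback. */
struct demo_interface {
	void *(*add)(void *dev);
	void  (*activate)(void *dev, void *context);
};

/* What a lookup from an event listener would see. */
static void *demo_registered_ctx;

static void *demo_get_dev(void)
{
	return demo_registered_ctx;
}

/*
 * Phase 1: add() builds the context and it is published.
 * Phase 2: activate(), if present, runs only after publication, so
 * anything it triggers can already find the context.
 */
static int demo_add_device(const struct demo_interface *intf, void *dev)
{
	void *ctx;

	assert(intf != NULL && intf->add != NULL);
	ctx = intf->add(dev);
	if (!ctx)
		return -1;
	demo_registered_ctx = ctx;
	if (intf->activate)
		intf->activate(dev, ctx);
	return 0;
}

/* Example driver callbacks. */
static int demo_state;
static int demo_seen_ctx_in_activate;

static void *demo_en_add(void *dev)
{
	(void)dev;
	return &demo_state;
}

static void demo_en_activate(void *dev, void *ctx)
{
	(void)dev;
	/* Stands in for a listener reacting to an event raised here. */
	demo_seen_ctx_in_activate = (demo_get_dev() == ctx);
}
```

Had the netdev creation stayed inside add(), the equivalent of demo_get_dev() would still have returned NULL at the time the event fired, which is the situation the commit message describes.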
[PATCH v3 for-next 31/33] IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
From: Moni Shoua mo...@mellanox.com RoCEv2 packets are sent over IP/UDP protocols. The mlx4 driver uses a type of RAW QP to send packets for QP1 and therefore needs to build the network headers below BTH in software. This patche adds option to build QP1 packets with IP and UDP headers if RoCEv2 is requested. Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/hw/mlx4/qp.c | 84 + 1 file changed, 52 insertions(+), 32 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 1141cf0..fb37415 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -32,6 +32,8 @@ */ #include linux/log2.h +#include linux/if_ether.h +#include net/ip.h #include linux/slab.h #include linux/netdevice.h @@ -2169,16 +2171,7 @@ static int build_sriov_qp0_header(struct mlx4_ib_sqp *sqp, return 0; } -static void mlx4_u64_to_smac(u8 *dst_mac, u64 src_mac) -{ - int i; - - for (i = ETH_ALEN; i; i--) { - dst_mac[i - 1] = src_mac 0xff; - src_mac = 8; - } -} - +#define MLX4_ROCEV2_QP1_SPORT 0xC000 static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, void *wqe, unsigned *mlx_seg_len) { @@ -2198,6 +2191,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, bool is_eth; bool is_vlan = false; bool is_grh; + bool is_udp = false; + int ip_version = 0; send_size = 0; for (i = 0; i wr-num_sge; ++i) @@ -2206,6 +2201,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, is_eth = rdma_port_get_link_layer(sqp-qp.ibqp.device, sqp-qp.port) == IB_LINK_LAYER_ETHERNET; is_grh = mlx4_ib_ah_grh_present(ah); if (is_eth) { + struct ib_gid_attr gid_attr; + if (mlx4_is_mfunc(to_mdev(ib_dev)-dev)) { /* When multi-function is enabled, the ib_core gid * indexes don't necessarily match the hw ones, so @@ -2216,23 +2213,31 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, if (err) return 
err; } else { - err = ib_get_cached_gid(ib_dev, + err = ib_get_cached_gid(sqp-qp.ibqp.device, be32_to_cpu(ah-av.ib.port_pd) 24, - ah-av.ib.gid_index, sgid, - NULL); + ah-av.ib.gid_index, sgid, gid_attr); if (!err !memcmp(sgid, zgid, sizeof(sgid))) err = -ENOENT; - if (err) + if (!err) { + is_udp = (gid_attr.gid_type == IB_GID_TYPE_ROCE_V2) ? true : false; + if (is_udp) { + if (ipv6_addr_v4mapped((struct in6_addr *)sgid)) + ip_version = 4; + else + ip_version = 6; + is_grh = false; + } + } else { return err; + } } - if (ah-av.eth.vlan != cpu_to_be16(0x)) { vlan = be16_to_cpu(ah-av.eth.vlan) 0x0fff; is_vlan = 1; } } err = ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh, - 0, 0, 0, sqp-ud_header); + ip_version, is_udp, 0, sqp-ud_header); if (err) return err; @@ -2243,12 +2248,14 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, sqp-ud_header.lrh.source_lid = cpu_to_be16(ah-av.ib.g_slid 0x7f); } - if (is_grh) { + if (is_grh || (ip_version == 6)) { sqp-ud_header.grh.traffic_class = (be32_to_cpu(ah-av.ib.sl_tclass_flowlabel) 20) 0xff; sqp-ud_header.grh.flow_label= ah-av.ib.sl_tclass_flowlabel cpu_to_be32(0xf); - sqp-ud_header.grh.hop_limit = ah-av.ib.hop_limit; + + sqp-ud_header.grh.hop_limit = (is_udp) ? + IPV6_DEFAULT_HOPLIMIT : ah-av.ib.hop_limit; if (is_eth) memcpy(sqp-ud_header.grh.source_gid.raw, sgid.raw, 16); else { @@ -2272,6 +2279,26 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct
[PATCH v3 for-next 30/33] IB/core: Initialize UD header structure with IP and UDP headers
From: Moni Shoua mo...@mellanox.com ib_ud_header_init() is used to format InfiniBand headers in a buffer up to (but not with) BTH. For RoCEv2 it is required that this function would be able to build also IP and UDP headers. Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/ud_header.c| 153 ++--- drivers/infiniband/hw/mlx4/qp.c| 7 +- drivers/infiniband/hw/mthca/mthca_qp.c | 2 +- include/rdma/ib_pack.h | 44 -- 4 files changed, 186 insertions(+), 20 deletions(-) diff --git a/drivers/infiniband/core/ud_header.c b/drivers/infiniband/core/ud_header.c index 72feee6..a4d4072 100644 --- a/drivers/infiniband/core/ud_header.c +++ b/drivers/infiniband/core/ud_header.c @@ -35,6 +35,7 @@ #include linux/string.h #include linux/export.h #include linux/if_ether.h +#include linux/ip.h #include rdma/ib_pack.h @@ -116,6 +117,68 @@ static const struct ib_field vlan_table[] = { .size_bits= 16 } }; +static const struct ib_field ip4_table[] = { + { STRUCT_FIELD(ip4, ver_len), + .offset_words = 0, + .offset_bits = 0, + .size_bits= 8 }, + { STRUCT_FIELD(ip4, tos), + .offset_words = 0, + .offset_bits = 8, + .size_bits= 8 }, + { STRUCT_FIELD(ip4, tot_len), + .offset_words = 0, + .offset_bits = 16, + .size_bits= 16 }, + { STRUCT_FIELD(ip4, id), + .offset_words = 1, + .offset_bits = 0, + .size_bits= 16 }, + { STRUCT_FIELD(ip4, frag_off), + .offset_words = 1, + .offset_bits = 16, + .size_bits= 16 }, + { STRUCT_FIELD(ip4, ttl), + .offset_words = 2, + .offset_bits = 0, + .size_bits= 8 }, + { STRUCT_FIELD(ip4, protocol), + .offset_words = 2, + .offset_bits = 8, + .size_bits= 8 }, + { STRUCT_FIELD(ip4, check), + .offset_words = 2, + .offset_bits = 16, + .size_bits= 16 }, + { STRUCT_FIELD(ip4, saddr), + .offset_words = 3, + .offset_bits = 0, + .size_bits= 32 }, + { STRUCT_FIELD(ip4, daddr), + .offset_words = 4, + .offset_bits = 0, + .size_bits= 32 } +}; + +static const struct ib_field udp_table[] = { + { STRUCT_FIELD(udp, 
sport), + .offset_words = 0, + .offset_bits = 0, + .size_bits= 16 }, + { STRUCT_FIELD(udp, dport), + .offset_words = 0, + .offset_bits = 16, + .size_bits= 16 }, + { STRUCT_FIELD(udp, length), + .offset_words = 1, + .offset_bits = 0, + .size_bits= 16 }, + { STRUCT_FIELD(udp, csum), + .offset_words = 1, + .offset_bits = 16, + .size_bits= 16 } +}; + static const struct ib_field grh_table[] = { { STRUCT_FIELD(grh, ip_version), .offset_words = 0, @@ -213,6 +276,26 @@ static const struct ib_field deth_table[] = { .size_bits= 24 } }; +__be16 ib_ud_ip4_csum(struct ib_ud_header *header) +{ + struct iphdr iph; + + iph.ihl = 5; + iph.version = 4; + iph.tos = header-ip4.tos; + iph.tot_len = header-ip4.tot_len; + iph.id = header-ip4.id; + iph.frag_off= header-ip4.frag_off; + iph.ttl = header-ip4.ttl; + iph.protocol= header-ip4.protocol; + iph.check = 0; + iph.saddr = header-ip4.saddr; + iph.daddr = header-ip4.daddr; + + return ip_fast_csum((u8 *)iph, iph.ihl); +} +EXPORT_SYMBOL(ib_ud_ip4_csum); + /** * ib_ud_header_init - Initialize UD header structure * @payload_bytes:Length of packet payload @@ -220,19 +303,35 @@ static const struct ib_field deth_table[] = { * @eth_present: specify if Eth header is present * @vlan_present: packet is tagged vlan * @grh_present:GRH flag (if non-zero, GRH will be included) + * @ip_version:GRH flag (if non-zero, IP header, V4 or V6, will be included) + * @grh_present:GRH flag (if non-zero, UDP header will be included) * @immediate_present: specify if immediate data is present * @header:Structure to initialize */ -void ib_ud_header_init(int payload_bytes, - int lrh_present, - int eth_present, - int vlan_present, - int grh_present, - int immediate_present, - struct ib_ud_header *header) +int ib_ud_header_init(int payload_bytes, + intlrh_present, + inteth_present, + intvlan_present, + intgrh_present
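For anyone checking the ib_ud_ip4_csum() part: the patch delegates to ip_fast_csum() with ihl = 5, i.e. the standard Internet checksum over a 20-byte IPv4 header with the check field zeroed. A plain userspace version of that computation is sketched below for reference. demo_ip4_csum() is a made-up name, works on a raw byte buffer rather than struct iphdr, and returns the checksum as a host-order number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement checksum over an IPv4 header of 'words32' 32-bit words
 * (5 for a header without options). The checksum field in 'hdr' must be
 * zero when computing; running it over a header with a valid checksum
 * yields 0.
 */
static uint16_t demo_ip4_csum(const uint8_t *hdr, size_t words32)
{
	uint32_t sum = 0;

	assert(hdr != NULL && words32 >= 5);

	/* Sum the header as big-endian 16-bit words. */
	for (size_t i = 0; i < words32 * 4; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

With the commonly quoted sample header 45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7 this gives 0xb861.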
[PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
From: Matan Barak mat...@mellanox.com In order to manage multiple types, vlans and MACs per GID, we need to store them along the GID itself. We store the net device as well, as sometimes GIDs should be handled according to the net device they came from. Since populating the GID table should be identical for every RoCE provider, the GIDs table should be handled in ib_core. Adding a GID cache table that supports a lockless find, add and delete gids. The lockless nature comes from using a unique sequence number per table entry and detecting that while reading/ writing this sequence wasn't changed. By using this RoCE GID cache table, providers must implement a modify_gid callback. The table is managed exclusively by this roce_gid_cache and the provider just need to write the data to the hardware. Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/Makefile | 3 +- drivers/infiniband/core/core_priv.h | 24 ++ drivers/infiniband/core/roce_gid_cache.c | 518 +++ drivers/infiniband/hw/mlx4/main.c| 2 - include/rdma/ib_verbs.h | 55 +++- 5 files changed, 598 insertions(+), 4 deletions(-) create mode 100644 drivers/infiniband/core/roce_gid_cache.c diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index acf7367..9b63bdf 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -9,7 +9,8 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \ $(user_access-y) ib_core-y := packer.o ud_header.o verbs.o sysfs.o \ - device.o fmr_pool.o cache.o netlink.o + device.o fmr_pool.o cache.o netlink.o \ + roce_gid_cache.o ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h index 87d1936..a502daa 100644 --- a/drivers/infiniband/core/core_priv.h +++ b/drivers/infiniband/core/core_priv.h @@ -35,6 
+35,7 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
+#include <net/net_namespace.h>
 #include <rdma/ib_verbs.h>
@@ -51,4 +52,27 @@ void ib_cache_cleanup(void);
 int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
 			    struct ib_qp_attr *qp_attr, int *qp_attr_mask);
+
+int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
+			   union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_gid_cache_find_gid(struct ib_device *ib_dev, union ib_gid *gid,
+			    enum ib_gid_type gid_type, struct net *net,
+			    int if_index, u8 *port, u16 *index);
+
+int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,
+				    enum ib_gid_type gid_type, u8 port,
+				    struct net *net, int if_index, u16 *index);
+
+int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
+
+int roce_add_gid(struct ib_device *ib_dev, u8 port,
+		 union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_gid(struct ib_device *ib_dev, u8 port,
+		 union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
+			     struct net_device *ndev);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
new file mode 100644
index 000..80f364a
--- /dev/null
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -0,0 +1,518 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT
[PATCH v3 for-next 27/33] IB/mlx4: Configure device to work in RoCEv2
From: Moni Shoua <mo...@mellanox.com>

Some mlx4 adapters are RoCEv2 capable. To enable this feature, some
hardware configuration is required:
1. Set port general parameters
2. Configure the outgoing UDP destination port
3. Configure the QPs that work with RoCEv2

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c         | 10 +++-
 drivers/infiniband/hw/mlx4/qp.c           | 40 +++
 drivers/net/ethernet/mellanox/mlx4/fw.c   | 16 -
 drivers/net/ethernet/mellanox/mlx4/mlx4.h |  3 ++-
 drivers/net/ethernet/mellanox/mlx4/port.c |  9 ++-
 drivers/net/ethernet/mellanox/mlx4/qp.c   | 27 +
 include/linux/mlx4/device.h               |  1 +
 include/linux/mlx4/qp.h                   | 15 ++--
 8 files changed, 111 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 96a6ec0..ee99f62 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2168,7 +2168,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	if (mlx4_ib_init_sriov(ibdev))
 		goto err_mad;
 
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE) {
+	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE ||
+	    dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) {
 		if (!iboe->nb.notifier_call) {
 			iboe->nb.notifier_call = mlx4_ib_netdev_event;
 			err = register_netdevice_notifier(&iboe->nb);
@@ -2177,6 +2178,13 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 				goto err_notif;
 			}
 		}
+		if (!mlx4_is_slave(dev) &&
+		    dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) {
+			err = mlx4_config_roce_v2_port(dev, ROCE_V2_UDP_DPORT);
+			if (err) {
+				goto err_notif;
+			}
+		}
 	}
 
 	for (j = 0; j < ARRAY_SIZE(mlx4_class_attributes); ++j) {
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 6f6d0db..847f9ec 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1408,6 +1408,24 @@ static int handle_eth_ud_smac_index(struct mlx4_ib_dev *dev,
 	return 0;
 }
 
+enum {
+	MLX4_QPC_ROCE_MODE_1 = 0,
+	MLX4_QPC_ROCE_MODE_2 = 2,
+	MLX4_QPC_ROCE_MODE_MAX = 0xff
+};
+
+static u8 gid_type_to_qpc(enum ib_gid_type gid_type)
+{
+	switch (gid_type) {
+	case IB_GID_TYPE_IB:
+		return MLX4_QPC_ROCE_MODE_1;
+	case IB_GID_TYPE_ROCE_V2:
+		return MLX4_QPC_ROCE_MODE_2;
+	default:
+		return MLX4_QPC_ROCE_MODE_MAX;
+	}
+}
+
 static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 			       const struct ib_qp_attr *attr, int attr_mask,
 			       enum ib_qp_state cur_state, enum ib_qp_state new_state)
@@ -1531,12 +1549,14 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		u16 vlan = 0xffff;
 		u8 smac[ETH_ALEN];
 		int status = 0;
+		int is_eth = rdma_port_get_link_layer(&dev->ib_dev, qp->port) ==
+			IB_LINK_LAYER_ETHERNET;
 
-		if (rdma_port_get_link_layer(&dev->ib_dev, qp->port) ==
-		    IB_LINK_LAYER_ETHERNET &&
-		    attr->ah_attr.ah_flags & IB_AH_GRH) {
+		if (is_eth && attr->ah_attr.ah_flags & IB_AH_GRH) {
 			int index = attr->ah_attr.grh.sgid_index;
 
+			if (mlx4_is_bonded(dev->dev))
+				port_num = 1;
 			rcu_read_lock();
 			status = ib_get_cached_gid(ibqp->device, port_num,
 						   index, &gid, &gid_attr);
@@ -1555,8 +1575,20 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 				      port_num, vlan, smac))
 			goto out;
 
+		if (is_eth && gid_attr.gid_type == IB_GID_TYPE_ROCE_V2)
+			context->pri_path.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+
 		optpar |= (MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH |
 			   MLX4_QP_OPTPAR_SCHED_QUEUE);
+
+		if (is_eth && (cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR)) {
+			u8 qpc_roce_mode = gid_type_to_qpc(gid_attr.gid_type);
+
+			if (qpc_roce_mode == MLX4_QPC_ROCE_MODE_MAX)
+				goto out;
+			context->rlkey_roce_mode |= (qpc_roce_mode << 6);
+		}
+
 	}
 
 	if (attr_mask & IB_QP_TIMEOUT) {
@@ -1728,7 +1760,7 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		sqd_event = 0;
 
 	if (!ibqp->uobject
RE: [PATCH v2 for-next 00/32] RoCE V1/v2 per GID
Hi Roland,

Could you please chime in on this patch series? It's been more than a week since we sent out V2.

Thanks
Som

From: linux-rdma-ow...@vger.kernel.org [linux-rdma-ow...@vger.kernel.org] on behalf of Somnath Kotur [somnath.ko...@emulex.com]
Sent: Wednesday, March 11, 2015 10:25 AM
To: rol...@kernel.org
Cc: linux-rdma@vger.kernel.org; Somnath Kotur
Subject: [PATCH v2 for-next 00/32] RoCE V1/v2 per GID

Hi Roland,

This patch series was created out of collaboration between Emulex and Mellanox. While Emulex sent out the RoCEV2 patch first to the community, Mellanox, which was also working on some core infrastructure changes from the ground up towards RoCEV2, felt that the RoCEV2 patch would be better served if done on top of their basic infrastructure changes to associate entities like MAC, VLAN and IP address with GIDs, and thereby move GID table management from HW vendor drivers to IB/core. This patchset is the result of a joint development effort between the two teams.

The RoCE per GID patch-set aims to introduce the RoCE V2 GID type while maintaining compatibility with RoCE V1. This is done by adding a type attribute for every GID, in addition to the extra net device attribute required for RoCE V2.

Previously, every vendor implemented its net device notifiers in its own driver. This introduces a huge amount of code duplication, as figuring out whether an event is related to the vendor's net device in the various cases (bonding, vlan or any other upper device) is similar for all vendors. Introducing multiple GID types and other attributes would have made this code duplication even worse. Therefore, we decided to move this into common core code.

roce_gid_cache and roce_gid_mgmt were created in order to store and manage the new GID table, by filling it when getting the related events. Vendors now only have to implement the modify_gid and get_netdev IB device calls, which are truly unique for each vendor.

Patch 0001 creates a new infrastructure for storing GIDs and their attributes in IB/core. This infrastructure supports lock-less reads of GIDs using a sequence number. The data structure is initialized only for RoCE ports. Every gid has meta information that describes its related net device and its type.

Patch 0002 adds a reference count mechanism to IB devices. This mechanism is similar to dev_hold and dev_put available for net devices. This is mandatory for later patches, as an IB client might want to wait for its work to complete in the device removal function, but that work might traverse the device list. This might cause a deadlock, as the removal function has grabbed the device mutex and in turn waits for the client's work, which wants to grab the device mutex as well.

Patches 0003, 0004 and 0006 add population of this table for various cases based on net device events. We always enable default gids for an active device (an active device is defined here as a device that doesn't have a bonding master or is the current active slave). This is done in order to allow loopback traffic.

Patch 0005 adds proper bonding support - only the active slaves retain their master's IP based gids and default gids.

This whole concept needs to fit the existing sysfs model, thus patch 0008 adds sysfs entries that represent the net device and gid type related to each gid.

Patch 0009 adds a new API for RoCE gid cache lookup. Since users might want to find a GID which matches a net device with specific attributes, the new API allows them to pass a filter function. This function is a bit slower than the regular find by gid, gid_type, if_index and namespace - thus it should be used only when necessary.

Patches 0007, 0010, 0011 and 0012 change the rest of IB/core to fit the new model. Instead of storing smac and vlan, we store either if_index, gid and gid_type, or sgid_index. Either set suffices in order to resolve all the required Ethernet parameters. ib_init_ah_from_wc was changed such that when a wc arrives, we search our RoCE gid cache in order to find a suitable sgid_index that matches the net device. Matching is done based on GID and VLAN.

Patch 0013 is used in order to configure the default mode of the cma. In order to avoid changing existing rdma-cm applications, we add a configfs entry that states, for each ib device, what the default RoCE mode is.

Patch 0014 is the post-refactoring version of the original RoCE V2 patch from Emulex, which now mainly corrects the hop limit value and adds a hint about the RoCE type based on whether we have a gateway. This is the patch that makes it possible for applications to seamlessly interoperate between RoCE V1 and V2 without undergoing any changes themselves.

Patch 0029 deals with serializing QP1 packets for software based QP1, and the last patch handles joining and leaving IGMP groups for RoCE V2 multicast functionality.

The rest of the patches add support for ocrdma and mlx4 devices.

This series depends on RoCE LAG series (already accepted in net-next
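
[Archive note: the cover letter says the GID table supports lock-less reads "using a sequence number". The following is only a user-space sketch of that general idea (a seqcount-style retry loop), not the code from patch 0001. The struct and function names are hypothetical, a single writer is assumed, and a strictly conforming C11 version would also make the payload accesses atomic.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry; not the kernel's data structure. */
struct gid_entry {
	atomic_uint seq;	/* odd while the (single) writer is mid-update */
	uint8_t gid[16];
	int gid_type;
};

/* Writer: bump seq to odd, update the payload, bump seq back to even. */
static void gid_entry_write(struct gid_entry *e, const uint8_t gid[16],
			    int gid_type)
{
	unsigned int s = atomic_load_explicit(&e->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* assumption: writers are serialized */
	atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	memcpy(e->gid, gid, 16);
	e->gid_type = gid_type;
	atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/* Reader: takes no lock; retries if a writer was active during the copy. */
static void gid_entry_read(struct gid_entry *e, uint8_t gid[16],
			   int *gid_type)
{
	unsigned int start, end;

	do {
		start = atomic_load_explicit(&e->seq, memory_order_acquire);
		memcpy(gid, e->gid, 16);
		*gid_type = e->gid_type;
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&e->seq, memory_order_relaxed);
	} while ((start & 1) || start != end);
}
```

The reader never blocks the writer; the cost is that a reader may have to copy the entry more than once while an update is in flight.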
[PATCH v2 for-next 30/32] IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
From: Moni Shoua <mo...@mellanox.com>

RoCEv2 packets are sent over IP/UDP protocols.
The mlx4 driver uses a type of RAW QP to send packets for QP1 and
therefore needs to build the network headers below BTH in software.
This patch adds an option to build QP1 packets with IP and UDP headers
if RoCEv2 is requested.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/qp.c | 84 +
 1 file changed, 52 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 1141cf0..fb37415 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -32,6 +32,8 @@
  */
 
 #include <linux/log2.h>
+#include <linux/if_ether.h>
+#include <net/ip.h>
 #include <linux/slab.h>
 #include <linux/netdevice.h>
 
@@ -2169,16 +2171,7 @@ static int build_sriov_qp0_header(struct mlx4_ib_sqp *sqp,
 	return 0;
 }
 
-static void mlx4_u64_to_smac(u8 *dst_mac, u64 src_mac)
-{
-	int i;
-
-	for (i = ETH_ALEN; i; i--) {
-		dst_mac[i - 1] = src_mac & 0xff;
-		src_mac >>= 8;
-	}
-}
-
+#define MLX4_ROCEV2_QP1_SPORT 0xC000
 static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
 			    void *wqe, unsigned *mlx_seg_len)
 {
@@ -2198,6 +2191,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
 	bool is_eth;
 	bool is_vlan = false;
 	bool is_grh;
+	bool is_udp = false;
+	int ip_version = 0;
 
 	send_size = 0;
 	for (i = 0; i < wr->num_sge; ++i)
@@ -2206,6 +2201,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
 	is_eth = rdma_port_get_link_layer(sqp->qp.ibqp.device, sqp->qp.port) == IB_LINK_LAYER_ETHERNET;
 	is_grh = mlx4_ib_ah_grh_present(ah);
 	if (is_eth) {
+		struct ib_gid_attr gid_attr;
+
 		if (mlx4_is_mfunc(to_mdev(ib_dev)->dev)) {
 			/* When multi-function is enabled, the ib_core gid
 			 * indexes don't necessarily match the hw ones, so
@@ -2216,23 +2213,31 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
 			if (err)
 				return err;
 		} else {
-			err = ib_get_cached_gid(ib_dev,
+			err = ib_get_cached_gid(sqp->qp.ibqp.device,
 						be32_to_cpu(ah->av.ib.port_pd) >> 24,
-						ah->av.ib.gid_index, &sgid,
-						NULL);
+						ah->av.ib.gid_index, &sgid, &gid_attr);
 			if (!err && !memcmp(&sgid, &zgid, sizeof(sgid)))
 				err = -ENOENT;
-			if (err)
+			if (!err) {
+				is_udp = (gid_attr.gid_type == IB_GID_TYPE_ROCE_V2) ? true : false;
+				if (is_udp) {
+					if (ipv6_addr_v4mapped((struct in6_addr *)&sgid))
+						ip_version = 4;
+					else
+						ip_version = 6;
+					is_grh = false;
+				}
+			} else {
 				return err;
+			}
 		}
-
 		if (ah->av.eth.vlan != cpu_to_be16(0xffff)) {
 			vlan = be16_to_cpu(ah->av.eth.vlan) & 0x0fff;
 			is_vlan = 1;
 		}
 	}
 	err = ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh,
-				0, 0, 0, &sqp->ud_header);
+				ip_version, is_udp, 0, &sqp->ud_header);
 	if (err)
 		return err;
 
@@ -2243,12 +2248,14 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
 		sqp->ud_header.lrh.source_lid = cpu_to_be16(ah->av.ib.g_slid & 0x7f);
 	}
 
-	if (is_grh) {
+	if (is_grh || (ip_version == 6)) {
 		sqp->ud_header.grh.traffic_class =
 			(be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 20) & 0xff;
 		sqp->ud_header.grh.flow_label    =
 			ah->av.ib.sl_tclass_flowlabel & cpu_to_be32(0xfffff);
-		sqp->ud_header.grh.hop_limit     = ah->av.ib.hop_limit;
+
+		sqp->ud_header.grh.hop_limit     = (is_udp) ?
+			IPV6_DEFAULT_HOPLIMIT : ah->av.ib.hop_limit;
 		if (is_eth)
 			memcpy(sqp->ud_header.grh.source_gid.raw, sgid.raw, 16);
 		else {
@@ -2272,6 +2279,26 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct
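
[Archive note: the patch picks ip_version by asking whether the source GID is an IPv4-mapped IPv6 address. As a rough user-space illustration of that test (the helper names below are hypothetical, and the kernel helper ipv6_addr_v4mapped() is not reproduced here), the check is whether the first 80 bits are zero and the next 16 bits are all ones:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when the 16-byte GID has the ::ffff:a.b.c.d layout. */
static bool gid_is_v4mapped(const uint8_t gid[16])
{
	static const uint8_t prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	assert(gid != NULL);
	return memcmp(gid, prefix, sizeof(prefix)) == 0;
}

/* Mirrors the patch's choice: 4 for a v4-mapped GID, otherwise 6. */
static int gid_ip_version(const uint8_t gid[16])
{
	return gid_is_v4mapped(gid) ? 4 : 6;
}
```

With this layout the IPv4 address itself sits in the last four bytes of the GID.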
[PATCH v2 for-next 31/32] IB/mlx4: Create and use another QP1 for RoCEv2
From: Moni Shoua <mo...@mellanox.com>

The mlx4 driver uses a special QP to implement the GSI QP. This kind of
QP allows building the InfiniBand headers in SW, to be put before the
payload that comes in with the WR. The mlx4 HW builds the packet,
calculates the ICRC and puts it at the end of the payload. This ICRC
calculation, however, depends on the QP configuration, which is
determined when the QP is modified (roce_mode during INIT->RTR).
On the other hand, ICRC verification when a packet is received does not
depend on this configuration.
Therefore, 2 GSI QPs for send (one for each RoCE version) and 1 GSI QP
for receive are required.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   7 ++
 drivers/infiniband/hw/mlx4/qp.c      | 155 +++
 2 files changed, 144 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 018bda6..a853330 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -159,11 +159,18 @@ struct mlx4_ib_wq {
 	unsigned		tail;
 };
 
+enum {
+	MLX4_IB_QP_CREATE_ROCE_V2_GSI = IB_QP_CREATE_RESERVED_START
+};
+
 enum mlx4_ib_qp_flags {
 	MLX4_IB_QP_LSO = IB_QP_CREATE_IPOIB_UD_LSO,
 	MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK = IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK,
 	MLX4_IB_QP_NETIF = IB_QP_CREATE_NETIF_QP,
 	MLX4_IB_QP_CREATE_USE_GFP_NOIO = IB_QP_CREATE_USE_GFP_NOIO,
+
+	/* Mellanox specific flags start from IB_QP_CREATE_RESERVED_START */
+	MLX4_IB_ROCE_V2_GSI_QP = MLX4_IB_QP_CREATE_ROCE_V2_GSI,
 	MLX4_IB_SRIOV_TUNNEL_QP = 1 << 30,
 	MLX4_IB_SRIOV_SQP = 1 << 31,
 };
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index fb37415..b54f315 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -81,6 +81,7 @@ struct mlx4_ib_sqp {
 	u32			send_psn;
 	struct ib_ud_header	ud_header;
 	u8			header_buf[MLX4_IB_UD_HEADER_SIZE];
+	struct ib_qp		*roce_v2_gsi;
 };
 
 enum {
@@ -150,7 +151,10 @@ static int is_sqp(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp)
 			}
 		}
 	}
-	return proxy_sqp;
+	if (proxy_sqp)
+		return 1;
+
+	return !!(qp->flags & MLX4_IB_ROCE_V2_GSI_QP);
 }
 
 /* used for INIT/CLOSE port logic */
@@ -672,6 +676,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 		qp = &sqp->qp;
 		qp->pri.vid = 0xFFFF;
 		qp->alt.vid = 0xFFFF;
+		sqp->roce_v2_gsi = NULL;
 	} else {
 		qp = kzalloc(sizeof (struct mlx4_ib_qp), gfp);
 		if (!qp)
@@ -1029,9 +1034,17 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp,
 	del_gid_entries(qp);
 }
 
-static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr)
+static int get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr)
 {
 	/* Native or PPF */
+	if ((!mlx4_is_mfunc(dev->dev) || mlx4_is_master(dev->dev)) &&
+	    attr->create_flags & MLX4_IB_QP_CREATE_ROCE_V2_GSI) {
+		int sqpn;
+		int res = mlx4_qp_reserve_range(dev->dev, 1, 1, &sqpn, 0);
+
+		return res ? -abs(res) : sqpn;
+	}
+
 	if (!mlx4_is_mfunc(dev->dev) ||
 	    (mlx4_is_master(dev->dev) &&
 	     attr->create_flags & MLX4_IB_SRIOV_SQP)) {
@@ -1039,6 +1052,7 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr)
 			(attr->qp_type == IB_QPT_SMI ? 0 : 2) +
 			attr->port_num - 1;
 	}
+
 	/* PF or VF -- creating proxies */
 	if (attr->qp_type == IB_QPT_SMI)
 		return dev->dev->caps.qp0_proxy[attr->port_num - 1];
@@ -1046,9 +1060,9 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr)
 		return dev->dev->caps.qp1_proxy[attr->port_num - 1];
 }
 
-struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
-				struct ib_qp_init_attr *init_attr,
-				struct ib_udata *udata)
+static struct ib_qp *_mlx4_ib_create_qp(struct ib_pd *pd,
+					struct ib_qp_init_attr *init_attr,
+					struct ib_udata *udata)
 {
 	struct mlx4_ib_qp *qp = NULL;
 	int err;
@@ -1066,6 +1080,7 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
 					MLX4_IB_SRIOV_TUNNEL_QP |
 					MLX4_IB_SRIOV_SQP |
 					MLX4_IB_QP_NETIF |
+					MLX4_IB_QP_CREATE_ROCE_V2_GSI
[PATCH v2 for-next 22/32] net/mlx4: Postpone the registration of net_device
From: Moni Shoua <mo...@mellanox.com>

The mlx4 network driver was registered in the context of the 'add'
function of the core driver (called when HW should be registered).
This causes the NETDEV_REGISTER netdev event to be sent in a context
where the get_protocol_dev() callback returns NULL, which may confuse
listeners of netdev events.
This patch is a preparation for the patch that implements the
get_netdev() callback in the IB/mlx4 driver.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_main.c | 36 
 drivers/net/ethernet/mellanox/mlx4/intf.c    |  3 +++
 include/linux/mlx4/driver.h                  |  1 +
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index 2859ac6..64b4f8d2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -219,6 +219,26 @@ static void mlx4_en_remove(struct mlx4_dev *dev, void *endev_ptr)
 	kfree(mdev);
 }
 
+static void mlx4_en_activate(struct mlx4_dev *dev, void *ctx)
+{
+	int i;
+	struct mlx4_en_dev *mdev = ctx;
+
+	/* Create a netdev for each port */
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
+		mlx4_info(mdev, "Activating port:%d\n", i);
+		if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
+			mdev->pndev[i] = NULL;
+	}
+
+	/* register notifier */
+	mdev->nb.notifier_call = mlx4_en_netdev_event;
+	if (register_netdevice_notifier(&mdev->nb)) {
+		mdev->nb.notifier_call = NULL;
+		mlx4_err(mdev, "Failed to create notifier\n");
+	}
+}
+
 static void *mlx4_en_add(struct mlx4_dev *dev)
 {
 	struct mlx4_en_dev *mdev;
@@ -292,21 +312,6 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
 	mutex_init(&mdev->state_lock);
 	mdev->device_up = true;
 
-	/* Setup ports */
-
-	/* Create a netdev for each port */
-	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
-		mlx4_info(mdev, "Activating port:%d\n", i);
-		if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
-			mdev->pndev[i] = NULL;
-	}
-	/* register notifier */
-	mdev->nb.notifier_call = mlx4_en_netdev_event;
-	if (register_netdevice_notifier(&mdev->nb)) {
-		mdev->nb.notifier_call = NULL;
-		mlx4_err(mdev, "Failed to create notifier\n");
-	}
-
 	return mdev;
 
 err_mr:
@@ -330,6 +335,7 @@ static struct mlx4_interface mlx4_en_interface = {
 	.event		= mlx4_en_event,
 	.get_dev	= mlx4_en_get_netdev,
 	.protocol	= MLX4_PROT_ETH,
+	.activate	= mlx4_en_activate,
 };
 
 static void mlx4_en_verify_params(void)
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c b/drivers/net/ethernet/mellanox/mlx4/intf.c
index a1a5985..ccd4030 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -63,8 +63,11 @@ static void mlx4_add_device(struct mlx4_interface *intf, struct mlx4_priv *priv)
 		spin_lock_irq(&priv->ctx_lock);
 		list_add_tail(&dev_ctx->list, &priv->ctx_list);
 		spin_unlock_irq(&priv->ctx_lock);
+		if (intf->activate)
+			intf->activate(&priv->dev, dev_ctx->context);
 	} else
 		kfree(dev_ctx);
+
 }
 
 static void mlx4_remove_device(struct mlx4_interface *intf, struct mlx4_priv *priv)
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 9553a73..5a06d96 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -59,6 +59,7 @@ struct mlx4_interface {
 	void			(*event) (struct mlx4_dev *dev, void *context,
 					  enum mlx4_dev_event event, unsigned long param);
 	void *			(*get_dev)(struct mlx4_dev *dev, void *context, u8 port);
+	void			(*activate)(struct mlx4_dev *dev, void *context);
 	struct list_head	list;
 	enum mlx4_protocol	protocol;
 	int			flags;
-- 
2.1.0
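
[Archive note: a small user-space model of why the work is moved from add() to a separate activate() callback. This is only an illustration of the ordering argument in the commit message; every name below is hypothetical and none of it is mlx4 code. A lookup made while add() is still running sees nothing, while the same lookup made from activate() sees the registered context.]

```c
#include <assert.h>
#include <stddef.h>

struct iface;

/* Hypothetical stand-in for the core driver's per-device state. */
struct core {
	struct iface *registered;	/* set only after add() has returned */
	void *ctx;
};

struct iface {
	void *(*add)(struct core *c);
	void (*activate)(struct core *c, void *ctx);	/* optional */
};

/* What an event listener would call; NULL until registration completes. */
static void *core_get_ctx(struct core *c)
{
	return c->registered ? c->ctx : NULL;
}

static void core_add_device(struct core *c, struct iface *intf)
{
	void *ctx;

	assert(c && intf && intf->add);
	ctx = intf->add(c);
	if (!ctx)
		return;
	c->ctx = ctx;
	c->registered = intf;
	if (intf->activate)		/* runs once the context is visible */
		intf->activate(c, ctx);
}

/* Demo interface that records what a lookup returns at each stage. */
static int demo_state;
static void *seen_in_add, *seen_in_activate;

static void *demo_add(struct core *c)
{
	seen_in_add = core_get_ctx(c);
	return &demo_state;
}

static void demo_activate(struct core *c, void *ctx)
{
	(void)ctx;
	seen_in_activate = core_get_ctx(c);
}
```

In this model, anything that triggers notifications (such as creating net devices) belongs in the activate step, because only there can a listener resolve the device.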
[PATCH v2 for-next 06/32] IB/core: Add RoCE cache bonding support
From: Matan Barak <mat...@mellanox.com>

Bonding needs special handling: when working in active-backup mode,
only the currently selected slave should occupy the default GIDs and
the master's GIDs.

Listen to bonding events and add the required GIDs only to the active
slave in the RoCE cache GID table.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/roce_gid_mgmt.c | 228 ++--
 drivers/net/bonding/bond_options.c      |  13 --
 include/net/bonding.h                   |   7 +
 3 files changed, 227 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 3c11a64..bf7ef95 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -37,6 +37,7 @@
 
 /* For in6_dev_get/in6_dev_put */
 #include <net/addrconf.h>
+#include <net/bonding.h>
 
 #include <rdma/ib_cache.h>
 #include <rdma/ib_addr.h>
@@ -55,7 +56,7 @@ struct update_gid_event_work {
 	enum gid_op_type gid_op;
 };
 
-#define ROCE_NETDEV_CALLBACK_SZ		2
+#define ROCE_NETDEV_CALLBACK_SZ		3
 struct netdev_event_work_cmd {
 	roce_netdev_callback	cb;
 	roce_netdev_filter	filter;
@@ -127,22 +128,96 @@ static void update_gid(enum gid_op_type gid_op, struct ib_device *ib_dev,
 	}
 }
 
+#define IS_NETDEV_BONDING_MASTER(ndev)	\
+	(((ndev)->priv_flags & \
+	  (IFF_BONDING | IFF_MASTER)) == (IFF_BONDING | IFF_MASTER))
+
+enum bonding_slave_state {
+	BONDING_SLAVE_STATE_ACTIVE	= 1UL << 0,
+	BONDING_SLAVE_STATE_INACTIVE	= 1UL << 1,
+	BONDING_SLAVE_STATE_NA		= 1UL << 2,
+};
+
+static enum bonding_slave_state is_eth_active_slave_of_bonding(struct net_device *idev,
+								struct net_device *upper)
+{
+	if (upper && IS_NETDEV_BONDING_MASTER(upper)) {
+		struct net_device *pdev;
+
+		rcu_read_lock();
+		pdev = bond_option_active_slave_get_rcu(netdev_priv(upper));
+		rcu_read_unlock();
+		if (pdev)
+			return idev == pdev ? BONDING_SLAVE_STATE_ACTIVE :
+				BONDING_SLAVE_STATE_INACTIVE;
+	}
+
+	return BONDING_SLAVE_STATE_NA;
+}
+
+static bool is_upper_dev_rcu(struct net_device *dev, struct net_device *upper)
+{
+	struct net_device *_upper = NULL;
+	struct list_head *iter;
+
+	rcu_read_lock();
+	netdev_for_each_all_upper_dev_rcu(dev, _upper, iter) {
+		if (_upper == upper)
+			break;
+	}
+
+	rcu_read_unlock();
+	return _upper == upper;
+}
+
+static int _is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
+				  struct net_device *idev, void *cookie,
+				  unsigned long bond_state)
+{
+	struct net_device *ndev = (struct net_device *)cookie;
+	struct net_device *rdev;
+	int res;
+
+	if (!idev)
+		return 0;
+
+	rcu_read_lock();
+	rdev = rdma_vlan_dev_real_dev(ndev);
+	if (!rdev)
+		rdev = ndev;
+
+	res = ((is_upper_dev_rcu(idev, ndev) &&
+	       (is_eth_active_slave_of_bonding(idev, rdev) &
+		bond_state)) ||
+	       rdev == idev);
+
+	rcu_read_unlock();
+	return res;
+}
+
 static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
 				 struct net_device *idev, void *cookie)
 {
-	struct net_device *rdev;
-	struct net_device *mdev;
-	struct net_device *ndev = (struct net_device *)cookie;
+	return _is_eth_port_of_netdev(ib_dev, port, idev, cookie,
+				      BONDING_SLAVE_STATE_ACTIVE |
+				      BONDING_SLAVE_STATE_NA);
+}
 
+static int is_eth_port_inactive_slave(struct ib_device *ib_dev, u8 port,
+				      struct net_device *idev, void *cookie)
+{
+	struct net_device *mdev;
+	int res;
 	if (!idev)
 		return 0;
 
 	rcu_read_lock();
 	mdev = netdev_master_upper_dev_get_rcu(idev);
-	rdev = rdma_vlan_dev_real_dev(ndev);
+	res = is_eth_active_slave_of_bonding(idev, mdev) ==
+		BONDING_SLAVE_STATE_INACTIVE;
 	rcu_read_unlock();
 
-	return (rdev ? rdev : ndev) == (mdev ? mdev : idev);
+	return res;
 }
 
 static int pass_all_filter(struct ib_device *ib_dev, u8 port,
@@ -151,6 +226,26 @@ static int pass_all_filter(struct ib_device *ib_dev, u8 port,
 	return 1;
 }
 
+static int bonding_slaves_filter(struct ib_device *ib_dev, u8 port,
+				 struct net_device *idev, void *cookie)
+{
+	struct net_device *rdev;
+	struct net_device *ndev = (struct net_device *)cookie
[PATCH v2 for-next 16/32] RDMA/ocrdma: changes to support RoCE-v2 in UD path
From: Devesh Sharma <devesh.sha...@emulex.com>

To support the UD protocol, this patch makes the following changes to
the existing UD implementation:
1. AH creation resolves the gid-type for a given index.
2. The protocol header is built based on the GID-type.
3. Work completion reports the l3-type if f/w supports RoCE-v2 and
   sets the IB_WC_WITH_NETWORK_HDR_TYPE flag in wc->wc_flags.
4. Set hop_limit to enable non RDMA-CM applications for RoCEV2.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma.h       |  1 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c    | 70 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |  5 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 23 +++--
 4 files changed, 82 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 97f971a..302fd0e 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -341,6 +341,7 @@ struct ocrdma_ah {
 	struct ocrdma_av *av;
 	u16 sgid_index;
 	u32 id;
+	u8 hdr_type;
 };
 
 struct ocrdma_qp_hwq_info {
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 7ecd230..6f838f1 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -39,6 +39,20 @@
 
 #define OCRDMA_VID_PCP_SHIFT	0xD
 
+static u16 ocrdma_hdr_type_to_proto_num(u8 hdr_type)
+{
+	switch (hdr_type) {
+	case OCRDMA_L3_TYPE_IB_GRH:
+		return (u16)0x8915;
+	case OCRDMA_L3_TYPE_IPV4:
+		return (u16)0x0800;
+	case OCRDMA_L3_TYPE_IPV6:
+		return (u16)0x86dd;
+	default:
+		return 0;
+	}
+}
+
 static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
 			struct ib_ah_attr *attr, union ib_gid *sgid,
 			int pdid, bool *isvlan, u16 vlan_tag)
@@ -47,22 +61,33 @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
 	struct ocrdma_eth_vlan eth;
 	struct ocrdma_grh grh;
 	int eth_sz;
+	u16 proto_num = 0;
+	u8 nxthdr = 0x11;
+	struct iphdr ipv4;
+	union {
+		struct sockaddr     _sockaddr;
+		struct sockaddr_in  _sockaddr_in;
+		struct sockaddr_in6 _sockaddr_in6;
+	} sgid_addr, dgid_addr;
 
 	memset(&eth, 0, sizeof(eth));
 	memset(&grh, 0, sizeof(grh));
 
+	/* Protocol Number */
+	proto_num = ocrdma_hdr_type_to_proto_num(ah->hdr_type);
+	nxthdr = (proto_num == 0x8915) ? 0x1b : 0x11;
 	/* VLAN */
 	if (!vlan_tag || (vlan_tag > 0xFFF))
 		vlan_tag = dev->pvid;
 	if (vlan_tag && (vlan_tag < 0x1000)) {
 		eth.eth_type = cpu_to_be16(0x8100);
-		eth.roce_eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE);
+		eth.roce_eth_type = cpu_to_be16(proto_num);
 		vlan_tag |= (dev->sl & 0x07) << OCRDMA_VID_PCP_SHIFT;
 		eth.vlan_tag = cpu_to_be16(vlan_tag);
 		eth_sz = sizeof(struct ocrdma_eth_vlan);
 		*isvlan = true;
 	} else {
-		eth.eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE);
+		eth.eth_type = cpu_to_be16(proto_num);
 		eth_sz = sizeof(struct ocrdma_eth_basic);
 	}
 	/* MAC */
@@ -71,18 +96,34 @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
 	if (status)
 		return status;
 	ah->sgid_index = attr->grh.sgid_index;
-	memcpy(&grh.sgid[0], sgid->raw, sizeof(union ib_gid));
-	memcpy(&grh.dgid[0], attr->grh.dgid.raw, sizeof(attr->grh.dgid.raw));
-
-	grh.tclass_flow = cpu_to_be32((6 << 28) |
-			(attr->grh.traffic_class << 24) |
-			attr->grh.flow_label);
-	/* 0x1b is next header value in GRH */
-	grh.pdid_hoplimit = cpu_to_be32((pdid << 16) |
-			(0x1b << 8) | attr->grh.hop_limit);
 	/* Eth HDR */
 	memcpy(&ah->av->eth_hdr, &eth, eth_sz);
-	memcpy((u8 *)ah->av + eth_sz, &grh, sizeof(struct ocrdma_grh));
+	if (ah->hdr_type == RDMA_NETWORK_IPV4) {
+		*((__be16 *)&ipv4) = htons((4 << 12) | (5 << 8) |
+					   attr->grh.traffic_class);
+		ipv4.id = cpu_to_be16(pdid);
+		ipv4.frag_off = htons(IP_DF);
+		ipv4.tot_len = htons(0);
+		ipv4.ttl = attr->grh.hop_limit;
+		ipv4.protocol = nxthdr;
+		rdma_gid2ip(&sgid_addr._sockaddr, sgid);
+		ipv4.saddr = sgid_addr._sockaddr_in.sin_addr.s_addr;
+		rdma_gid2ip(&dgid_addr._sockaddr, &attr->grh.dgid);
+		ipv4.daddr = dgid_addr._sockaddr_in.sin_addr.s_addr;
+		memcpy((u8 *)ah->av + eth_sz, &ipv4, sizeof(struct iphdr
[PATCH v2 for-next 17/32] RDMA/ocrdma: changes to support RoCE-v2 in RC path
From: Devesh Sharma <devesh.sha...@emulex.com>

To support RoCE-V2, this patch implements the following changes:
1. Get the GID-type for a given sgid.
2. Based on the GID-type, get the IPv4 L3 addresses and give those
   to FW.
3. Provide the l3-type to FW.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index 20f9e8f..147fccf 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -2433,7 +2433,13 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
 	union ib_gid sgid, zgid;
 	struct ib_gid_attr sgid_attr;
 	u32 vlan_id = 0xFFFF;
-	u8 mac_addr[6];
+	u8 mac_addr[6], hdr_type;
+	union {
+		struct sockaddr     _sockaddr;
+		struct sockaddr_in  _sockaddr_in;
+		struct sockaddr_in6 _sockaddr_in6;
+	} sgid_addr, dgid_addr;
+
 	struct ocrdma_dev *dev = get_ocrdma_dev(qp->ibqp.device);
 
 	if ((ah_attr->ah_flags & IB_AH_GRH) == 0)
@@ -2448,6 +2454,8 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
 	cmd->params.hop_lmt_rq_psn |=
 	    (ah_attr->grh.hop_limit << OCRDMA_QP_PARAMS_HOP_LMT_SHIFT);
 	cmd->flags |= OCRDMA_QP_PARA_FLOW_LBL_VALID;
+
+	/* GIDs */
 	memcpy(&cmd->params.dgid[0], &ah_attr->grh.dgid.raw[0],
 	       sizeof(cmd->params.dgid));
 
@@ -2471,17 +2479,35 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
 		return status;
 	cmd->params.dmac_b0_to_b3 = mac_addr[0] | (mac_addr[1] << 8) |
 				(mac_addr[2] << 16) | (mac_addr[3] << 24);
+	hdr_type = ib_gid_to_network_type(sgid_attr.gid_type, &sgid);
+	if (hdr_type == RDMA_NETWORK_IPV4) {
+		status = rdma_gid2ip(&sgid_addr._sockaddr, &sgid);
+		if (status)
+			return status;
+		status = rdma_gid2ip(&dgid_addr._sockaddr, &ah_attr->grh.dgid);
+		if (status)
+			return status;
+		memcpy(&cmd->params.dgid[0],
+		       &dgid_addr._sockaddr_in.sin_addr.s_addr, 4);
+		memcpy(&cmd->params.sgid[0],
+		       &sgid_addr._sockaddr_in.sin_addr.s_addr, 4);
+	}
 	/* convert them to LE format. */
 	ocrdma_cpu_to_le32(&cmd->params.dgid[0], sizeof(cmd->params.dgid));
 	ocrdma_cpu_to_le32(&cmd->params.sgid[0], sizeof(cmd->params.sgid));
 	cmd->params.vlan_dmac_b4_to_b5 = mac_addr[4] | (mac_addr[5] << 8);
-	if (attr_mask & IB_QP_VID) {
+	if (vlan_id < 0x1000) {
 		cmd->params.vlan_dmac_b4_to_b5 |=
 		    vlan_id << OCRDMA_QP_PARAMS_VLAN_SHIFT;
 		cmd->flags |= OCRDMA_QP_PARA_VLAN_EN_VALID;
 		cmd->params.rnt_rc_sl_fl |=
 			(dev->sl & 0x07) << OCRDMA_QP_PARAMS_SL_SHIFT;
 	}
+
+	cmd->params.max_sge_recv_flags |=
+		((hdr_type <<
+		  OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_SHIFT) &
+		 OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_MASK);
 	return 0;
 }
 
-- 
2.1.0
[PATCH v2 for-next 21/32] IB/mlx4: Lock with RCU instead of RTNL
From: Moni Shoua <mo...@mellanox.com>

The function eth_link_query_port() used to take the RTNL lock when a
call to netdev_master_upper_dev_get() was necessary. This made it
impossible to call this function while the RTNL lock is held. Calling
netdev_master_upper_dev_get_rcu() and locking with RCU instead solves
this problem.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index d8b227e..32cd009 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -367,14 +367,15 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->state		= IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
 	props->active_mtu	= IB_MTU_256;
-	if (is_bonded)
-		rtnl_lock(); /* required to get upper dev */
 	down_read(&iboe->sem);
 	ndev = iboe->netdevs[port - 1];
-	if (ndev && is_bonded)
-		ndev = netdev_master_upper_dev_get(ndev);
+	if (ndev && is_bonded) {
+		rcu_read_lock(); /* required to get upper dev */
+		ndev = netdev_master_upper_dev_get_rcu(ndev);
+		rcu_read_unlock();
+	}
 	if (!ndev)
-		goto out_unlock;
+		goto unlock;
 
 	tmp = iboe_get_mtu(ndev->mtu);
 	props->active_mtu = tmp ? min(props->max_mtu, tmp) : IB_MTU_256;
@@ -382,10 +383,8 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->state		= (netif_running(ndev) && netif_carrier_ok(ndev)) ?
 					IB_PORT_ACTIVE : IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
-out_unlock:
+unlock:
 	up_read(&iboe->sem);
-	if (is_bonded)
-		rtnl_unlock();
 out:
 	mlx4_free_cmd_mailbox(mdev->dev, mailbox);
 	return err;
-- 
2.1.0
[PATCH v2 for-next 18/32] RDMA/ocrdma: changes to support user AH creation
From: Devesh Sharma <devesh.sha...@emulex.com>

To support user-space AH creation, this patch uses the ahid field to convey
the L3 type to the user-space library. The library is responsible for
decoding the L3 type out of ahid.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 5 +++++
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 1bb72a0..65a39cc 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -191,6 +191,11 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
 		ahid_addr = pd->uctx->ah_tbl.va + attr->dlid;
 		*ahid_addr = 0;
 		*ahid_addr |= ah->id & OCRDMA_AH_ID_MASK;
+		if (ocrdma_is_rocev2_supported(dev)) {
+			*ahid_addr |= ((u32)ah->hdr_type &
+				OCRDMA_AH_L3_TYPE_MASK) <<
+				OCRDMA_AH_L3_TYPE_SHIFT;
+		}
 		if (isvlan)
 			*ahid_addr |= (OCRDMA_AH_VLAN_VALID_MASK <<
 				       OCRDMA_AH_VLAN_VALID_SHIFT);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
index 726a87c..ed45ecd 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
@@ -31,9 +31,10 @@
 enum {
 	OCRDMA_AH_ID_MASK		= 0x3FF,
 	OCRDMA_AH_VLAN_VALID_MASK	= 0x01,
-	OCRDMA_AH_VLAN_VALID_SHIFT	= 0x1F
+	OCRDMA_AH_VLAN_VALID_SHIFT	= 0x1F,
+	OCRDMA_AH_L3_TYPE_MASK		= 0x03,
+	OCRDMA_AH_L3_TYPE_SHIFT		= 0x1D /* 29 bits */
 };
-
 struct ib_ah *ocrdma_create_ah(struct ib_pd *, struct ib_ah_attr *);
 int ocrdma_destroy_ah(struct ib_ah *);
 int ocrdma_query_ah(struct ib_ah *, struct ib_ah_attr *);
--
2.1.0
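The ahid layout implied by the masks and shifts in this patch can be sketched in user space as follows. This is only an illustration: the constants are copied from the ocrdma_ah.h hunk, but the helper names (ahid_encode(), ahid_id(), ahid_l3_type(), ahid_vlan_valid()) are hypothetical and are not part of the driver or of any user-space library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the ocrdma_ah.h hunk in the patch. */
enum {
	AH_ID_MASK          = 0x3FF,
	AH_VLAN_VALID_MASK  = 0x01,
	AH_VLAN_VALID_SHIFT = 0x1F,
	AH_L3_TYPE_MASK     = 0x03,
	AH_L3_TYPE_SHIFT    = 0x1D
};

/* Hypothetical helper: pack the fields the same way the driver hunk does. */
static inline uint32_t ahid_encode(uint32_t id, uint32_t l3_type, bool isvlan)
{
	uint32_t v = id & AH_ID_MASK;

	/* Sketch-only sanity check: the L3 type has to fit in two bits. */
	assert(l3_type <= AH_L3_TYPE_MASK);
	v |= (l3_type & AH_L3_TYPE_MASK) << AH_L3_TYPE_SHIFT;
	if (isvlan)
		v |= (uint32_t)AH_VLAN_VALID_MASK << AH_VLAN_VALID_SHIFT;
	return v;
}

/* Hypothetical decoders, i.e. what a user-space library would have to do. */
static inline uint32_t ahid_id(uint32_t v)
{
	return v & AH_ID_MASK;
}

static inline uint32_t ahid_l3_type(uint32_t v)
{
	return (v >> AH_L3_TYPE_SHIFT) & AH_L3_TYPE_MASK;
}

static inline bool ahid_vlan_valid(uint32_t v)
{
	return ((v >> AH_VLAN_VALID_SHIFT) & AH_VLAN_VALID_MASK) != 0;
}
```

With these constants the AH id occupies bits 0-9, the L3 type bits 29-30 and the VLAN-valid flag bit 31, so the three fields do not overlap.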
[PATCH v2 for-next 20/32] IB/mlx4: Replace spin_lock with rw_semaphore
From: Moni Shoua <mo...@mellanox.com>

Protection on iboe->netdevs is no longer required to be from an atomic
context. Replacing a spin_lock with a semaphore is allowed and makes more
sense.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c    | 27 ++++++++++-----------------
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  2 +-
 2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 91caffc..d8b227e 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -369,7 +369,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->active_mtu	= IB_MTU_256;
 	if (is_bonded)
 		rtnl_lock(); /* required to get upper dev */
-	spin_lock_bh(&iboe->lock);
+	down_read(&iboe->sem);
 	ndev = iboe->netdevs[port - 1];
 	if (ndev && is_bonded)
 		ndev = netdev_master_upper_dev_get(ndev);
@@ -383,7 +383,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 					IB_PORT_ACTIVE : IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
 out_unlock:
-	spin_unlock_bh(&iboe->lock);
+	up_read(&iboe->sem);
 	if (is_bonded)
 		rtnl_unlock();
 out:
@@ -825,11 +825,11 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 	if (!mqp->port)
 		return 0;
 
-	spin_lock_bh(&mdev->iboe.lock);
+	down_read(&mdev->iboe.sem);
 	ndev = mdev->iboe.netdevs[mqp->port - 1];
 	if (ndev)
 		dev_hold(ndev);
-	spin_unlock_bh(&mdev->iboe.lock);
+	up_read(&mdev->iboe.sem);
 
 	if (ndev) {
 		ret = 1;
@@ -1330,7 +1330,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
 	struct mlx4_dev *dev = mdev->dev;
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
-	struct net_device *ndev;
 	struct mlx4_ib_gid_entry *ge;
 	enum mlx4_protocol prot = MLX4_PROT_IB_IPV6;
 	struct mlx4_flow_reg_id reg_id = {0, 0};
@@ -1370,13 +1369,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	mutex_lock(&mqp->mutex);
 	ge = find_gid_entry(mqp, gid->raw);
 	if (ge) {
-		spin_lock_bh(&mdev->iboe.lock);
-		ndev = ge->added ? mdev->iboe.netdevs[ge->port - 1] : NULL;
-		if (ndev)
-			dev_hold(ndev);
-		spin_unlock_bh(&mdev->iboe.lock);
-		if (ndev)
-			dev_put(ndev);
 		list_del(&ge->list);
 		kfree(ge);
 	} else
@@ -1543,7 +1535,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 
 	iboe = &ibdev->iboe;
 
-	spin_lock_bh(&iboe->lock);
+	down_write(&iboe->sem);
 	mlx4_foreach_ib_transport_port(port, ibdev->dev) {
 
 		iboe->netdevs[port - 1] =
@@ -1555,7 +1547,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 			update_qps_port = port;
 
 	}
-	spin_unlock_bh(&iboe->lock);
+	up_write(&iboe->sem);
 
 	if (update_qps_port > 0)
 		mlx4_ib_update_qps(ibdev, dev, update_qps_port);
@@ -1848,7 +1840,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 
 	mlx4_ib_alloc_eqs(dev, ibdev);
 
-	spin_lock_init(&iboe->lock);
+	init_rwsem(&iboe->sem);
 
 	if (init_node_data(ibdev))
 		goto err_map;
@@ -2153,7 +2145,8 @@ static void handle_bonded_port_state_event(struct work_struct *work)
 	struct ib_event ibev;
 
 	kfree(ew);
-	spin_lock_bh(&ibdev->iboe.lock);
+
+	down_read(&ibdev->iboe.sem);
 	for (i = 0; i < MLX4_MAX_PORTS; ++i) {
 		struct net_device *curr_netdev = ibdev->iboe.netdevs[i];
 
@@ -2165,7 +2158,7 @@ static void handle_bonded_port_state_event(struct work_struct *work)
 		bonded_port_state = (bonded_port_state != IB_PORT_ACTIVE) ?
 			curr_port_state : IB_PORT_ACTIVE;
 	}
-	spin_unlock_bh(&ibdev->iboe.lock);
+	up_read(&ibdev->iboe.sem);
 	ibev.device = &ibdev->ib_dev;
 	ibev.element.port_num = 1;
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index e3805a4..166ebf9 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -455,7 +455,7 @@ struct mlx4_ib_sriov {
 };
 
 struct mlx4_ib_iboe {
-	spinlock_t		lock;
+	struct rw_semaphore	sem; /* guard from concurrent access to data in this struct */
 	struct net_device      *netdevs[MLX4_MAX_PORTS];
 	atomic64_t		mac[MLX4_MAX_PORTS];
 	struct notifier_block	nb;
--
2.1.0
[PATCH v2 for-next 29/32] IB/core: Initialize UD header structure with IP and UDP headers
From: Moni Shoua <mo...@mellanox.com>

ib_ud_header_init() is used to format InfiniBand headers in a buffer up to
(but not including) the BTH. For RoCEv2, this function must also be able to
build the IP and UDP headers.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/ud_header.c    | 153 ++---
 drivers/infiniband/hw/mlx4/qp.c        |   7 +-
 drivers/infiniband/hw/mthca/mthca_qp.c |   2 +-
 include/rdma/ib_pack.h                 |  44 --
 4 files changed, 186 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/core/ud_header.c b/drivers/infiniband/core/ud_header.c
index 72feee6..a4d4072 100644
--- a/drivers/infiniband/core/ud_header.c
+++ b/drivers/infiniband/core/ud_header.c
@@ -35,6 +35,7 @@
 #include <linux/string.h>
 #include <linux/export.h>
 #include <linux/if_ether.h>
+#include <linux/ip.h>
 
 #include <rdma/ib_pack.h>
 
@@ -116,6 +117,68 @@ static const struct ib_field vlan_table[] = {
 	  .size_bits    = 16 }
 };
 
+static const struct ib_field ip4_table[] = {
+	{ STRUCT_FIELD(ip4, ver_len),
+	  .offset_words = 0,
+	  .offset_bits  = 0,
+	  .size_bits    = 8 },
+	{ STRUCT_FIELD(ip4, tos),
+	  .offset_words = 0,
+	  .offset_bits  = 8,
+	  .size_bits    = 8 },
+	{ STRUCT_FIELD(ip4, tot_len),
+	  .offset_words = 0,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, id),
+	  .offset_words = 1,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, frag_off),
+	  .offset_words = 1,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, ttl),
+	  .offset_words = 2,
+	  .offset_bits  = 0,
+	  .size_bits    = 8 },
+	{ STRUCT_FIELD(ip4, protocol),
+	  .offset_words = 2,
+	  .offset_bits  = 8,
+	  .size_bits    = 8 },
+	{ STRUCT_FIELD(ip4, check),
+	  .offset_words = 2,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, saddr),
+	  .offset_words = 3,
+	  .offset_bits  = 0,
+	  .size_bits    = 32 },
+	{ STRUCT_FIELD(ip4, daddr),
+	  .offset_words = 4,
+	  .offset_bits  = 0,
+	  .size_bits    = 32 }
+};
+
+static const struct ib_field udp_table[] = {
+	{ STRUCT_FIELD(udp, sport),
+	  .offset_words = 0,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(udp, dport),
+	  .offset_words = 0,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(udp, length),
+	  .offset_words = 1,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(udp, csum),
+	  .offset_words = 1,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 }
+};
+
 static const struct ib_field grh_table[] = {
 	{ STRUCT_FIELD(grh, ip_version),
 	  .offset_words = 0,
@@ -213,6 +276,26 @@ static const struct ib_field deth_table[] = {
 	  .size_bits    = 24 }
 };
 
+__be16 ib_ud_ip4_csum(struct ib_ud_header *header)
+{
+	struct iphdr iph;
+
+	iph.ihl		= 5;
+	iph.version	= 4;
+	iph.tos		= header->ip4.tos;
+	iph.tot_len	= header->ip4.tot_len;
+	iph.id		= header->ip4.id;
+	iph.frag_off	= header->ip4.frag_off;
+	iph.ttl		= header->ip4.ttl;
+	iph.protocol	= header->ip4.protocol;
+	iph.check	= 0;
+	iph.saddr	= header->ip4.saddr;
+	iph.daddr	= header->ip4.daddr;
+
+	return ip_fast_csum((u8 *)&iph, iph.ihl);
+}
+EXPORT_SYMBOL(ib_ud_ip4_csum);
+
 /**
  * ib_ud_header_init - Initialize UD header structure
  * @payload_bytes:Length of packet payload
@@ -220,19 +303,35 @@ static const struct ib_field deth_table[] = {
  * @eth_present: specify if Eth header is present
  * @vlan_present: packet is tagged vlan
  * @grh_present:GRH flag (if non-zero, GRH will be included)
+ * @ip_version:GRH flag (if non-zero, IP header, V4 or V6, will be included)
+ * @grh_present:GRH flag (if non-zero, UDP header will be included)
  * @immediate_present: specify if immediate data is present
  * @header:Structure to initialize
  */
-void ib_ud_header_init(int		payload_bytes,
-		       int		lrh_present,
-		       int		eth_present,
-		       int		vlan_present,
-		       int		grh_present,
-		       int		immediate_present,
-		       struct ib_ud_header *header)
+int ib_ud_header_init(int		payload_bytes,
+		      int		lrh_present,
+		      int		eth_present,
+		      int		vlan_present,
+		      int		grh_present
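The ib_ud_ip4_csum() hunk above fills an option-less IPv4 header (ihl = 5) with a zeroed check field and hands it to ip_fast_csum(). As a rough user-space illustration of what that checksum is, the sketch below computes the standard Internet one's-complement header checksum over raw bytes. The function name ipv4_header_csum() is hypothetical, and the result is returned as the numeric value of the big-endian check field rather than as a __be16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Portable stand-in for the kernel's ip_fast_csum(): one's-complement sum
 * of the header read as big-endian 16-bit words, folded to 16 bits and
 * inverted. 'ihl' is the header length in 32-bit words (5 when the header
 * carries no options). The caller zeroes the check field beforehand.
 */
static inline uint16_t ipv4_header_csum(const uint8_t *hdr, unsigned int ihl)
{
	uint32_t sum = 0;
	size_t i;

	assert(hdr && ihl >= 5);
	for (i = 0; i < (size_t)ihl * 4; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Running the same function over a header whose check field already holds the correct value yields 0, which is how a receiver verifies the header.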
[PATCH v2 for-next 03/32] IB/core: Add RoCE GID population
From: Matan Barak <mat...@mellanox.com>

In order to populate the GID table, we need to listen for events:
(a) IB device has been added or removed - used in order to
    allocate/deallocate the cache and populate the GID table internally.
(b) inet events - add new GIDs (according to the IP addresses) to the
    table.
(c) netdev up/down/change_addr - if a netdev is built onto our RoCE
    device, we need to add/delete its IPs.

When an event is received, multiple entries (each with different GID type)
are added.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/Makefile         |   2 +-
 drivers/infiniband/core/core_priv.h      |  26 ++
 drivers/infiniband/core/device.c         |  80 +
 drivers/infiniband/core/roce_gid_cache.c |  66
 drivers/infiniband/core/roce_gid_mgmt.c  | 516 +++
 include/rdma/ib_addr.h                   |   2 +-
 include/rdma/ib_verbs.h                  |   9 +
 7 files changed, 699 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/roce_gid_mgmt.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 9b63bdf..2c94963 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
 
 ib_core-y :=			packer.o ud_header.o verbs.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
-				roce_gid_cache.o
+				roce_gid_cache.o roce_gid_mgmt.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
 
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index a502daa..12797d9 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -39,6 +39,8 @@
 
 #include <rdma/ib_verbs.h>
 
+extern struct workqueue_struct *roce_gid_mgmt_wq;
+
 int  ib_device_register_sysfs(struct ib_device *device,
 			      int (*port_callback)(struct ib_device *,
 						   u8, struct kobject *));
@@ -53,6 +55,22 @@ void ib_cache_cleanup(void);
 int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
 			    struct ib_qp_attr *qp_attr, int *qp_attr_mask);
 
+typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port,
+	      struct net_device *idev, void *cookie);
+
+typedef int (*roce_netdev_filter)(struct ib_device *device, u8 port,
+	     struct net_device *idev, void *cookie);
+
+void ib_dev_roce_ports_of_netdev(struct ib_device *ib_dev,
+				 roce_netdev_filter filter,
+				 void *filter_cookie,
+				 roce_netdev_callback cb,
+				 void *cookie);
+void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
+				  void *filter_cookie,
+				  roce_netdev_callback cb,
+				  void *cookie);
+
 int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
 			   union ib_gid *gid, struct ib_gid_attr *attr);
 
@@ -66,6 +84,9 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,
 
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
+int roce_gid_cache_setup(void);
+void roce_gid_cache_cleanup(void);
+
 int roce_add_gid(struct ib_device *ib_dev, u8 port,
 		 union ib_gid *gid, struct ib_gid_attr *attr);
 
@@ -75,4 +96,9 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
 int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
 			     struct net_device *ndev);
 
+int roce_gid_mgmt_init(void);
+void roce_gid_mgmt_cleanup(void);
+
+int roce_rescan_device(struct ib_device *ib_dev);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8616a95..5ce57bf 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -39,6 +39,7 @@
 #include <linux/init.h>
 #include <linux/mutex.h>
 #include <rdma/rdma_netlink.h>
+#include <rdma/ib_addr.h>
 
 #include "core_priv.h"
 
@@ -640,6 +641,82 @@ int ib_query_gid(struct ib_device *device,
 EXPORT_SYMBOL(ib_query_gid);
 
 /**
+ * ib_dev_roce_ports_of_netdev - enumerate RoCE ports of ibdev in
+ *				  respect of netdev
+ * @ib_dev : IB device we want to query
+ * @filter: Should we call the callback?
+ * @filter_cookie: Cookie passed to filter
+ * @cb: Callback to call for each found RoCE ports
+ * @cookie: Cookie passed back to the callback
+ *
+ * Enumerates all of the physical RoCE ports of ib_dev RoCE ports
+ * which are relaying Ethernet packets to a specific
+ * (possibly
[PATCH v2 for-next 04/32] IB/core: Add default GID for RoCE GID Cache
From: Matan Barak <mat...@mellanox.com>

When RoCE is used, a default GID address should be generated for every
supported RoCE type. These default GID addresses are generated based on the
IPv6 link-local address, but in contrast to the GID based on the regular
IPv6 link-local (as we generate GID per IP address), these GIDs are also
available if the net device is down (in order to support loopback).
Moreover, these default GID addresses can't be deleted.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/core_priv.h      | 10
 drivers/infiniband/core/roce_gid_cache.c | 86
 drivers/infiniband/core/roce_gid_mgmt.c  | 43 +---
 include/net/addrconf.h                   | 31
 net/ipv6/addrconf.c                      | 31
 5 files changed, 163 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 12797d9..6ab40a9 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -84,6 +84,16 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,
 
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
+enum roce_gid_cache_default_mode {
+	ROCE_GID_CACHE_DEFAULT_MODE_SET,
+	ROCE_GID_CACHE_DEFAULT_MODE_DELETE
+};
+
+void roce_gid_cache_set_default_gid(struct ib_device *ib_dev, u8 port,
+				    struct net_device *ndev,
+				    unsigned long gid_type_mask,
+				    enum roce_gid_cache_default_mode mode);
+
 int roce_gid_cache_setup(void);
 void roce_gid_cache_cleanup(void);
 
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
index 2b0a310..2bd663f 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -34,6 +34,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <rdma/ib_cache.h>
+#include <net/addrconf.h>
 
 #include "core_priv.h"
 
@@ -176,12 +177,19 @@ static int find_gid(struct ib_roce_gid_cache *cache, union ib_gid *gid,
 	return -1;
 }
 
+static void make_default_gid(struct net_device *dev, union ib_gid *gid)
+{
+	gid->global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
+	addrconf_ifid_eui48(&gid->raw[8], dev);
+}
+
 int roce_add_gid(struct ib_device *ib_dev, u8 port,
 		 union ib_gid *gid, struct ib_gid_attr *attr)
 {
 	struct ib_roce_gid_cache *cache;
 	int ix;
 	int ret = 0;
+	struct net_device *idev;
 
 	if (!ib_dev->cache.roce_gid_cache)
 		return -ENOSYS;
@@ -191,6 +199,22 @@ int roce_add_gid(struct ib_device *ib_dev, u8 port,
 	if (!cache || !cache->active)
 		return -ENOSYS;
 
+	if (ib_dev->get_netdev) {
+		rcu_read_lock();
+		idev = ib_dev->get_netdev(ib_dev, port);
+		if (idev && attr->ndev != idev) {
+			union ib_gid default_gid;
+
+			/* Adding default GIDs is not permitted */
+			make_default_gid(idev, &default_gid);
+			if (!memcmp(gid, &default_gid, sizeof(*gid))) {
+				rcu_read_unlock();
+				return -EPERM;
+			}
+		}
+		rcu_read_unlock();
+	}
+
 	mutex_lock(&cache->lock);
 
 	ix = find_gid(cache, gid, attr, GID_ATTR_FIND_MASK_GID_TYPE |
@@ -215,6 +239,7 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
 		 union ib_gid *gid, struct ib_gid_attr *attr)
 {
 	struct ib_roce_gid_cache *cache;
+	union ib_gid default_gid;
 	int ix;
 
 	if (!ib_dev->cache.roce_gid_cache)
@@ -225,6 +250,13 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
 	if (!cache || !cache->active)
 		return -ENOSYS;
 
+	if (attr->ndev) {
+		/* Deleting default GIDs is not permitted */
+		make_default_gid(attr->ndev, &default_gid);
+		if (!memcmp(gid, &default_gid, sizeof(*gid)))
+			return -EPERM;
+	}
+
 	mutex_lock(&cache->lock);
 
 	ix = find_gid(cache, gid, attr,
@@ -437,6 +469,60 @@ static void set_roce_gid_cache_active(struct ib_roce_gid_cache *cache,
 	cache->active = active;
 }
 
+void roce_gid_cache_set_default_gid(struct ib_device *ib_dev, u8 port,
+				    struct net_device *ndev,
+				    unsigned long gid_type_mask,
+				    enum roce_gid_cache_default_mode mode)
+{
+	union ib_gid gid;
+	struct ib_gid_attr gid_attr;
+	struct ib_roce_gid_cache *cache;
+	unsigned int gid_type;
+	unsigned int gid_index = 0;
+
+	cache = ib_dev->cache.roce_gid_cache
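make_default_gid() in the patch combines the fe80::/64 link-local prefix with an interface identifier derived from the net device's MAC address via addrconf_ifid_eui48(). The sketch below is a user-space approximation of that derivation for the common case only: it assumes a plain 48-bit MAC and ignores the dev_id special case that the kernel helper also handles. The name default_gid_from_mac() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical user-space analogue of make_default_gid(): build the
 * 16-byte link-local GID for a 48-bit MAC using the modified EUI-64
 * rule (insert ff:fe in the middle, flip the universal/local bit).
 */
static inline void default_gid_from_mac(const uint8_t mac[6], uint8_t gid[16])
{
	assert(mac && gid);

	memset(gid, 0, 16);
	gid[0] = 0xfe;			/* subnet prefix fe80::/64 */
	gid[1] = 0x80;

	gid[8]  = mac[0] ^ 0x02;	/* flip the universal/local bit */
	gid[9]  = mac[1];
	gid[10] = mac[2];
	gid[11] = 0xff;			/* ff:fe filler of modified EUI-64 */
	gid[12] = 0xfe;
	gid[13] = mac[3];
	gid[14] = mac[4];
	gid[15] = mac[5];
}
```

Under these assumptions the MAC 00:11:22:33:44:55 maps to the GID fe80::211:22ff:fe33:4455, which is the value roce_add_gid() and roce_del_gid() would refuse with -EPERM for that device.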
[PATCH v2 for-next 12/32] IB/core: Add rdma_network_type to wc
From: Matan Barak <mat...@mellanox.com>

Providers should tell IB core the wc's network type. This is used in order
to search for the proper GID in the GID table. When using HCAs that can't
provide this info, IB core examines the packet itself and extracts the GID
type from it.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/verbs.c | 106 ++--
 include/rdma/ib_verbs.h         |  30
 2 files changed, 131 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2f5fd7a..2e7ccad 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -195,8 +195,84 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 }
 EXPORT_SYMBOL(ib_create_ah);
 
+static int ib_get_grh_header_version(const void *h)
+{
+	const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+	struct iphdr ip4h_checked;
+	const struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+
+	if (ip6h->version != 6)
+		return (ip4h->version == 4) ? 4 : 0;
+	/* version may be 6 or 4 */
+	if (ip4h->ihl != 5) /* IPv4 header length must be 5 for RR */
+		return 6;
+	/* Verify checksum.
+	   We can't write on scattered buffers so we need to copy to
+	   temp buffer.
+	 */
+	memcpy(&ip4h_checked, ip4h, sizeof(ip4h_checked));
+	ip4h_checked.check = 0;
+	ip4h_checked.check = ip_fast_csum((u8 *)&ip4h_checked, 5);
+	/* if IPv4 header checksum is OK, believe it */
+	if (ip4h->check == ip4h_checked.check)
+		return 4;
+	return 6;
+}
+
+static int ib_get_dgid_sgid_by_grh(const void *h,
+				   enum rdma_network_type net_type,
+				   union ib_gid *dgid, union ib_gid *sgid)
+{
+	switch (net_type) {
+	case RDMA_NETWORK_IPV4: {
+		const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+
+		ipv6_addr_set_v4mapped(ip4h->daddr, (struct in6_addr *)dgid);
+		ipv6_addr_set_v4mapped(ip4h->saddr, (struct in6_addr *)sgid);
+		return 0;
+	}
+	case RDMA_NETWORK_IPV6: {
+		struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+
+		memcpy(dgid, &ip6h->daddr, sizeof(*dgid));
+		memcpy(sgid, &ip6h->saddr, sizeof(*sgid));
+		return 0;
+	}
+	case RDMA_NETWORK_IB: {
+		struct ib_grh *grh = (struct ib_grh *)h;
+
+		memcpy(dgid, &grh->dgid, sizeof(*dgid));
+		memcpy(sgid, &grh->sgid, sizeof(*sgid));
+		return 0;
+	}
+	}
+
+	return -EINVAL;
+}
+
+static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
+						     u8 port_num,
+						     const struct ib_grh *grh)
+{
+	int grh_version;
+
+	if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND)
+		return RDMA_NETWORK_IB;
+
+	grh_version = ib_get_grh_header_version(grh);
+
+	if (grh_version == 4)
+		return RDMA_NETWORK_IPV4;
+
+	if (grh->next_hdr == IPPROTO_UDP)
+		return RDMA_NETWORK_IPV6;
+
+	return RDMA_NETWORK_IB;
+}
+
 struct find_gid_index_context {
 	u16 vlan_id;
+	enum ib_gid_type gid_type;
 };
 
 static bool find_gid_index(const union ib_gid *gid,
@@ -206,6 +282,9 @@ static bool find_gid_index(const union ib_gid *gid,
 	struct find_gid_index_context *ctx =
 		(struct find_gid_index_context *)context;
 
+	if (ctx->gid_type != gid_attr->gid_type)
+		return false;
+
 	if ((!!(ctx->vlan_id != 0xffff) == !is_vlan_dev(gid_attr->ndev)) ||
 	    (is_vlan_dev(gid_attr->ndev) &&
 	     vlan_dev_vlan_id(gid_attr->ndev) != ctx->vlan_id))
@@ -216,9 +295,11 @@ static bool find_gid_index(const union ib_gid *gid,
 
 static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num,
 				   u16 vlan_id, union ib_gid *sgid,
+				   enum ib_gid_type gid_type,
 				   u16 *gid_index)
 {
-	struct find_gid_index_context context = {.vlan_id = vlan_id};
+	struct find_gid_index_context context = {.vlan_id = vlan_id,
+						 .gid_type = gid_type};
 
 	return ib_find_gid_by_filter(device, sgid, port_num, find_gid_index,
 				     &context, gid_index);
@@ -232,9 +313,24 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 	int ret;
 	int is_eth = (rdma_port_get_link_layer(device, port_num) ==
 			IB_LINK_LAYER_ETHERNET);
+	enum rdma_network_type net_type = RDMA_NETWORK_IB
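The heuristic in ib_get_grh_header_version() can be followed more easily on raw bytes. The sketch below mirrors its decision steps in user space, assuming a 40-byte GRH-sized buffer in which an IPv4 header, if present, occupies the last 20 bytes (the h + 20 offset used in the patch). The names grh_header_version() and ip4_csum() are hypothetical, and ip4_csum() is a plain one's-complement checksum standing in for ip_fast_csum().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IP4_OFF 20	/* IPv4 header offset inside the 40-byte GRH area */
#define IP4_LEN 20	/* option-less IPv4 header (ihl == 5) */

/* One's-complement checksum over a 20-byte header, check field zeroed. */
static inline uint16_t ip4_csum(const uint8_t *hdr)
{
	uint32_t sum = 0;
	int i;

	for (i = 0; i < IP4_LEN; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Returns 4, 6, or 0 (unknown), following the same steps as the patch. */
static inline int grh_header_version(const uint8_t *grh)
{
	const uint8_t *ip4 = grh + IP4_OFF;
	uint8_t copy[IP4_LEN];
	uint16_t stored;

	assert(grh);
	if ((grh[0] >> 4) != 6)			/* not an IPv6/GRH version nibble */
		return ((ip4[0] >> 4) == 4) ? 4 : 0;
	if ((ip4[0] & 0x0f) != 5)		/* ihl must be 5 for IPv4 */
		return 6;

	/* Recompute the checksum on a copy so the input stays untouched. */
	memcpy(copy, ip4, IP4_LEN);
	copy[10] = 0;
	copy[11] = 0;
	stored = (uint16_t)((ip4[10] << 8) | ip4[11]);
	return (stored == ip4_csum(copy)) ? 4 : 6;
}
```

The checksum test only runs in the ambiguous case where the first nibble reads as 6: a valid IPv4 checksum at offset 20 is taken as evidence of IPv4, and anything else falls back to IPv6.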
[PATCH v2 for-next 23/32] IB/mlx4: Advertise RoCE support in port capabilities
From: Moni Shoua <mo...@mellanox.com>

The port capability flags should indicate which RoCE modes (V1 or V2) the
port supports. The mlx4 driver sets these flags according to the
capabilities reported by the HW.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c       |  6 ++++++
 drivers/net/ethernet/mellanox/mlx4/fw.c |  5 ++++-
 include/linux/mlx4/device.h             | 13 ++++++++++---
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 32cd009..bf87a95 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -359,6 +359,12 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 						IB_WIDTH_4X : IB_WIDTH_1X;
 	props->active_speed	= IB_SPEED_QDR;
 	props->port_cap_flags	= IB_PORT_CM_SUP | IB_PORT_IP_BASED_GIDS;
+
+	if (mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE)
+		props->port_cap_flags |= IB_PORT_ROCE;
+	if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2)
+		props->port_cap_flags |= IB_PORT_ROCE_V2 | IB_PORT_ROCE;
+
 	props->gid_tbl_len	= mdev->dev->caps.gid_table_len[port];
 	props->max_msg_sz	= mdev->dev->caps.max_msg_sz;
 	props->pkey_tbl_len	= 1;
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 3702fd1..d573e73 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -146,7 +146,8 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 flags)
 		[17] = "Asymmetric EQs support",
 		[18] = "More than 80 VFs support",
 		[19] = "Performance optimized for limited rule configuration flow steering support",
-		[21] = "Port Remap support"
+		[21] = "Port Remap support",
+		[22] = "RoCEv2 support"
 	};
 	int i;
 
@@ -852,6 +853,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_EQE_STRIDE;
 	MLX4_GET(dev_cap->bmme_flags, outbox,
 		 QUERY_DEV_CAP_BMME_FLAGS_OFFSET);
+	if (dev_cap->bmme_flags & MLX4_FLAG_ROCE_V1_V2)
+		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_ROCE_V1_V2;
 	if (dev_cap->bmme_flags & MLX4_FLAG_PORT_REMAP)
 		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_PORT_REMAP;
 	MLX4_GET(field, outbox, QUERY_DEV_CAP_CONFIG_DEV_OFFSET);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 9a05e73..02dd6a0 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -202,7 +202,8 @@ enum {
 	MLX4_DEV_CAP_FLAG2_SYS_EQS		= 1LL <<  17,
 	MLX4_DEV_CAP_FLAG2_80_VFS		= 1LL <<  18,
 	MLX4_DEV_CAP_FLAG2_FS_A0		= 1LL <<  19,
-	MLX4_DEV_CAP_FLAG2_PORT_REMAP		= 1LL <<  21
+	MLX4_DEV_CAP_FLAG2_PORT_REMAP		= 1LL <<  21,
+	MLX4_DEV_CAP_FLAG2_ROCE_V1_V2		= 1LL <<  22
 };
 
 enum {
@@ -250,6 +251,7 @@ enum {
 	MLX4_BMME_FLAG_TYPE_2_WIN	= 1 <<  9,
 	MLX4_BMME_FLAG_RESERVED_LKEY	= 1 << 10,
 	MLX4_BMME_FLAG_FAST_REG_WR	= 1 << 11,
+	MLX4_BMME_FLAG_ROCE_V1_V2	= 1 << 19,
 	MLX4_BMME_FLAG_PORT_REMAP	= 1 << 24,
 	MLX4_BMME_FLAG_VSD_INIT2RTR	= 1 << 28,
 };
@@ -258,6 +260,10 @@ enum {
 	MLX4_FLAG_PORT_REMAP	= MLX4_BMME_FLAG_PORT_REMAP
 };
 
+enum {
+	MLX4_FLAG_ROCE_V1_V2	= MLX4_BMME_FLAG_ROCE_V1_V2
+};
+
 enum mlx4_event {
 	MLX4_EVENT_TYPE_COMP		   = 0x00,
 	MLX4_EVENT_TYPE_PATH_MIG	   = 0x01,
@@ -888,9 +894,10 @@ struct mlx4_mad_ifc {
 		if (((dev)->caps.port_mask[port] != MLX4_PORT_TYPE_IB))
 
 #define mlx4_foreach_ib_transport_port(port, dev)                         \
-	for ((port) = 1; (port) <= (dev)->caps.num_ports; (port)++)	  \
+	for ((port) = 1; (port) <= (dev)->caps.num_ports; (port)++)       \
 		if (((dev)->caps.port_mask[port] == MLX4_PORT_TYPE_IB) || \
-			((dev)->caps.flags & MLX4_DEV_CAP_FLAG_IBOE))
+		    ((dev)->caps.flags & MLX4_DEV_CAP_FLAG_IBOE) || \
+		    ((dev)->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2))
 
 #define MLX4_INVALID_SLAVE_ID	0xFF
 
--
2.1.0
[PATCH v2 for-next 14/32] IB/Core: Changes to the IB Core infrastructure for RoCEv2 support
1. Choose sgid_index and type from all the matching entries in RDMA-CM
   based on hint from the IP stack.
2. Set hop_limit for the IP Packet based on above hint from IP stack
3. Define a RDMA_NETWORK enum type.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
---
 drivers/infiniband/core/addr.c  |  8 +
 drivers/infiniband/core/cma.c   | 10 +-
 drivers/infiniband/core/verbs.c | 77 ++---
 include/rdma/ib_addr.h          |  1 +
 include/rdma/ib_verbs.h         |  9 +
 5 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 43af7f5..da24c0e 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -257,6 +257,9 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 		goto put;
 	}
 
+	if (rt->rt_uses_gateway)
+		addr->network = RDMA_NETWORK_IPV4;
+
 	ret = dst_fetch_ha(&rt->dst, addr, &fl4.daddr);
 put:
 	ip_rt_put(rt);
@@ -271,6 +274,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 {
 	struct flowi6 fl6;
 	struct dst_entry *dst;
+	struct rt6_info *rt;
 	int ret;
 
 	memset(&fl6, 0, sizeof fl6);
@@ -282,6 +286,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 	if ((ret = dst->error))
 		goto put;
 
+	rt = (struct rt6_info *)dst;
 	if (ipv6_addr_any(&fl6.saddr)) {
 		ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
 					 &fl6.daddr, 0, &fl6.saddr);
@@ -305,6 +310,9 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 		goto put;
 	}
 
+	if (rt->rt6i_flags & RTF_GATEWAY)
+		addr->network = RDMA_NETWORK_IPV6;
+
 	ret = dst_fetch_ha(dst, addr, &fl6.daddr);
 put:
 	dst_release(dst);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1705280..2bfe798 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1952,6 +1952,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 {
 	struct rdma_route *route = &id_priv->id.route;
 	struct rdma_addr *addr = &route->addr;
+	enum ib_gid_type network_gid_type;
 	struct cma_work *work;
 	int ret;
 	struct net_device *ndev = NULL;
@@ -1990,7 +1991,14 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
 		    &route->path_rec->dgid);
 
-	route->path_rec->hop_limit = 1;
+	/* Use the hint from IP Stack to select GID Type */
+	network_gid_type = ib_network_to_gid_type(addr->dev_addr.network);
+	if (addr->dev_addr.network != RDMA_NETWORK_IB) {
+		route->path_rec->gid_type = network_gid_type;
+		route->path_rec->hop_limit = IPV6_DEFAULT_HOPLIMIT;
+	} else {
+		route->path_rec->hop_limit = 1;
+	}
 	route->path_rec->reversible = 1;
 	route->path_rec->pkey = cpu_to_be16(0xffff);
 	route->path_rec->mtu_selector = IB_SA_EQ;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2e7ccad..3586996 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -195,11 +195,11 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 }
 EXPORT_SYMBOL(ib_create_ah);
 
-static int ib_get_grh_header_version(const void *h)
+static int ib_get_grh_header_version(const union rdma_network_hdr *h)
 {
-	const struct iphdr *ip4h = (struct iphdr *)(h + 20);
+	const struct iphdr *ip4h = (struct iphdr *)&h->roce4grh;
 	struct iphdr ip4h_checked;
-	const struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
+	const struct ipv6hdr *ip6h = (struct ipv6hdr *)&h->ibgrh;
 
 	if (ip6h->version != 6)
 		return (ip4h->version == 4) ? 4 : 0;
@@ -219,37 +219,6 @@ static int ib_get_grh_header_version(const void *h)
 	return 6;
 }
 
-static int ib_get_dgid_sgid_by_grh(const void *h,
-				   enum rdma_network_type net_type,
-				   union ib_gid *dgid, union ib_gid *sgid)
-{
-	switch (net_type) {
-	case RDMA_NETWORK_IPV4: {
-		const struct iphdr *ip4h = (struct iphdr *)(h + 20);
-
-		ipv6_addr_set_v4mapped(ip4h->daddr, (struct in6_addr *)dgid);
-		ipv6_addr_set_v4mapped(ip4h->saddr, (struct in6_addr *)sgid);
-		return 0;
-	}
-	case RDMA_NETWORK_IPV6: {
-		struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
-
-		memcpy(dgid, &ip6h->daddr, sizeof(*dgid));
-		memcpy(sgid, &ip6h->saddr, sizeof(*sgid));
-		return 0;
-	}
-	case RDMA_NETWORK_IB: {
-		struct ib_grh *grh = (struct ib_grh *)h;
-
-		memcpy(dgid
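The cma.c hunk in this patch picks the GID type and hop limit from the network type recorded during address resolution. A stripped-down sketch of that selection is shown below. The type and function names are hypothetical, the mapping of IPv4/IPv6 to a RoCEv2 GID type is an assumption (the body of ib_network_to_gid_type() is not shown in this patch), and 64 is the value of IPV6_DEFAULT_HOPLIMIT in the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

enum net_type { NET_IB, NET_IPV4, NET_IPV6 };	/* stand-in for rdma_network_type */
enum gid_type { GID_IB, GID_ROCE_V2 };		/* stand-in for ib_gid_type */

#define DEFAULT_HOPLIMIT 64	/* IPV6_DEFAULT_HOPLIMIT in the kernel */

struct path_choice {
	enum gid_type gid_type;
	uint8_t hop_limit;
};

/*
 * Hypothetical helper following the cma.c hunk: a routed (gateway)
 * destination gets a RoCEv2 GID type and a routable hop limit, anything
 * else stays link-local with hop_limit 1.
 */
static inline struct path_choice choose_path(enum net_type network)
{
	struct path_choice p = { GID_IB, 1 };

	assert(network == NET_IB || network == NET_IPV4 || network == NET_IPV6);
	if (network != NET_IB) {
		p.gid_type = GID_ROCE_V2;	/* assumed mapping, see lead-in */
		p.hop_limit = DEFAULT_HOPLIMIT;
	}
	return p;
}
```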
[PATCH v2 for-next 26/32] IB/mlx4: Configure device to work in RoCEv2
From: Moni Shoua mo...@mellanox.com Some mlx4 adapters are RoCEv2 capable. To enable this feature some hardware configuration is required. This is 1. Set port general parameters 2. Configure the outgoing UDP destination port 3. Configure the QP that work with RoCEv2 Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/hw/mlx4/main.c | 10 +++- drivers/infiniband/hw/mlx4/qp.c | 40 +++ drivers/net/ethernet/mellanox/mlx4/fw.c | 16 - drivers/net/ethernet/mellanox/mlx4/mlx4.h | 3 ++- drivers/net/ethernet/mellanox/mlx4/port.c | 9 ++- drivers/net/ethernet/mellanox/mlx4/qp.c | 27 + include/linux/mlx4/device.h | 3 ++- include/linux/mlx4/qp.h | 15 ++-- 8 files changed, 112 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 9d651cf..53c855b 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -2166,7 +2166,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) if (mlx4_ib_init_sriov(ibdev)) goto err_mad; - if (dev-caps.flags MLX4_DEV_CAP_FLAG_IBOE) { + if (dev-caps.flags MLX4_DEV_CAP_FLAG_IBOE || + dev-caps.flags2 MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) { if (!iboe-nb.notifier_call) { iboe-nb.notifier_call = mlx4_ib_netdev_event; err = register_netdevice_notifier(iboe-nb); @@ -2175,6 +2176,13 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) goto err_notif; } } + if (!mlx4_is_slave(dev) + dev-caps.flags2 MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) { + err = mlx4_config_roce_v2_port(dev, ROCE_V2_UDP_DPORT); + if (err) { + goto err_notif; + } + } } for (j = 0; j ARRAY_SIZE(mlx4_class_attributes); ++j) { diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 6f6d0db..847f9ec 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -1408,6 +1408,24 @@ static int handle_eth_ud_smac_index(struct mlx4_ib_dev *dev, return 0; } +enum { + MLX4_QPC_ROCE_MODE_1 = 0, + 
MLX4_QPC_ROCE_MODE_2 = 2, + MLX4_QPC_ROCE_MODE_MAX = 0xff +}; + +static u8 gid_type_to_qpc(enum ib_gid_type gid_type) +{ + switch (gid_type) { + case IB_GID_TYPE_IB: + return MLX4_QPC_ROCE_MODE_1; + case IB_GID_TYPE_ROCE_V2: + return MLX4_QPC_ROCE_MODE_2; + default: + return MLX4_QPC_ROCE_MODE_MAX; + } +} + static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, const struct ib_qp_attr *attr, int attr_mask, enum ib_qp_state cur_state, enum ib_qp_state new_state) @@ -1531,12 +1549,14 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, u16 vlan = 0x; u8 smac[ETH_ALEN]; int status = 0; + int is_eth = rdma_port_get_link_layer(dev-ib_dev, qp-port) == + IB_LINK_LAYER_ETHERNET; - if (rdma_port_get_link_layer(dev-ib_dev, qp-port) == - IB_LINK_LAYER_ETHERNET - attr-ah_attr.ah_flags IB_AH_GRH) { + if (is_eth attr-ah_attr.ah_flags IB_AH_GRH) { int index = attr-ah_attr.grh.sgid_index; + if (mlx4_is_bonded(dev-dev)) + port_num = 1; rcu_read_lock(); status = ib_get_cached_gid(ibqp-device, port_num, index, gid, gid_attr); @@ -1555,8 +1575,20 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, port_num, vlan, smac)) goto out; + if (is_eth gid_attr.gid_type == IB_GID_TYPE_ROCE_V2) + context-pri_path.hop_limit = IPV6_DEFAULT_HOPLIMIT; + optpar |= (MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH | MLX4_QP_OPTPAR_SCHED_QUEUE); + + if (is_eth (cur_state == IB_QPS_INIT new_state == IB_QPS_RTR)) { + u8 qpc_roce_mode = gid_type_to_qpc(gid_attr.gid_type); + + if (qpc_roce_mode == MLX4_QPC_ROCE_MODE_MAX) + goto out; + context-rlkey_roce_mode |= (qpc_roce_mode 6); + } + } if (attr_mask IB_QP_TIMEOUT) { @@ -1728,7 +1760,7 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, sqd_event = 0; if (!ibqp-uobject
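For readers following the QP-context part of this patch, here is a minimal userspace sketch of the GID-type-to-RoCE-mode mapping. It assumes the mode is OR-ed into the top two bits of a one-byte field (the shift operator was lost in the archived copy, so the 6-bit left shift is an assumption). The enum `gid_type`, its values, and the helper `set_roce_mode` are hypothetical stand-ins, not mlx4 or IB core definitions.

```c
#include <stdint.h>

/* Stand-ins for the kernel's enum ib_gid_type (values assumed for this sketch). */
enum gid_type { GID_TYPE_IB = 0, GID_TYPE_ROCE_V2 = 1, GID_TYPE_UNKNOWN = 2 };

enum {
	QPC_ROCE_MODE_1   = 0,
	QPC_ROCE_MODE_2   = 2,
	QPC_ROCE_MODE_MAX = 0xff	/* sentinel: no valid mode for this GID type */
};

/* Map a GID type to the 2-bit mode value, mirroring gid_type_to_qpc(). */
static uint8_t gid_type_to_qpc(enum gid_type t)
{
	switch (t) {
	case GID_TYPE_IB:
		return QPC_ROCE_MODE_1;
	case GID_TYPE_ROCE_V2:
		return QPC_ROCE_MODE_2;
	default:
		return QPC_ROCE_MODE_MAX;
	}
}

/*
 * OR the mode into bits 7:6 of *field. Returns 0 on success, or -1 when
 * the GID type has no mode, in which case *field is left untouched.
 */
int set_roce_mode(uint8_t *field, enum gid_type t)
{
	uint8_t mode = gid_type_to_qpc(t);

	if (mode == QPC_ROCE_MODE_MAX)
		return -1;
	*field |= (uint8_t)(mode << 6);
	return 0;
}
```

With a field that starts at 0x01, a RoCE v2 GID would set bit 7 and leave the low bits alone, while an IB/RoCE v1 GID leaves the field unchanged.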
[PATCH v2 for-next 09/32] IB/core: Support find sgid index using a filter function
From: Matan Barak <mat...@mellanox.com>

Sometimes an sgid index needs to be found based on variable parameters.
For example, when the CM gets a packet from the network, it needs to find
an sgid_index that matches the L2 attributes of that packet. Extending the
cache's API to include Ethernet L2 attributes is problematic, since the
set of attributes may grow considerably in the future.

As a result, we add a find function that takes a user-supplied filter
function and searches the GID table until a match is found.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/cache.c          | 24
 drivers/infiniband/core/core_priv.h      |  9 +
 drivers/infiniband/core/roce_gid_cache.c | 66
 include/rdma/ib_cache.h                  | 27 +
 4 files changed, 126 insertions(+)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 882d491..ae86fe8 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -273,6 +273,30 @@ int ib_find_cached_gid_by_port(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_find_cached_gid_by_port);
 
+int ib_find_gid_by_filter(struct ib_device *device,
+			  union ib_gid *gid,
+			  u8 port_num,
+			  bool (*filter)(const union ib_gid *gid,
+					 const struct ib_gid_attr *,
+					 void *),
+			  void *context, u16 *index)
+{
+	/* Look for a RoCE device with the specified GID. */
+	if (!ib_cache_use_roce_gid_cache(device, port_num))
+		return roce_gid_cache_find_gid_by_filter(device, gid,
+							  port_num, filter,
+							  context, index);
+
+	/* Only RoCE GID cache supports filter function */
+	if (filter)
+		return -ENOSYS;
+
+	/* If no RoCE devices with the specified GID, look for IB device. */
+	return __ib_find_cached_gid_by_port(device, port_num,
+					    gid, index);
+}
+EXPORT_SYMBOL(ib_find_gid_by_filter);
+
 int ib_get_cached_pkey(struct ib_device *device,
 		       u8                port_num,
 		       int               index,
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 411672f..d6e73f8 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -84,6 +84,15 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,
 				    enum ib_gid_type gid_type, u8 port,
 				    struct net *net, int if_index, u16 *index);
 
+int roce_gid_cache_find_gid_by_filter(struct ib_device *ib_dev,
+				      union ib_gid *gid,
+				      u8 port,
+				      bool (*filter)(const union ib_gid *gid,
+						     const struct ib_gid_attr *,
+						     void *),
+				      void *context,
+				      u16 *index);
+
 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
 
 enum roce_gid_cache_default_mode {
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
index 5c109f7..ee9ac4d 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -436,6 +436,72 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,
 	return -ENOENT;
 }
 
+int roce_gid_cache_find_gid_by_filter(struct ib_device *ib_dev,
+				      union ib_gid *gid,
+				      u8 port,
+				      bool (*filter)(const union ib_gid *,
+						     const struct ib_gid_attr *,
+						     void *),
+				      void *context,
+				      u16 *index)
+{
+	struct ib_roce_gid_cache *cache;
+	unsigned int i;
+	bool found = false;
+
+	if (!ib_dev->cache.roce_gid_cache)
+		return -ENOSYS;
+
+	if (port < start_port(ib_dev) ||
+	    port > start_port(ib_dev) + ib_dev->phys_port_cnt ||
+	    rdma_port_get_link_layer(ib_dev, port) !=
+		IB_LINK_LAYER_ETHERNET)
+		return -ENOSYS;
+
+	cache = ib_dev->cache.roce_gid_cache[port - start_port(ib_dev)];
+
+	if (!cache || !cache->active)
+		return -ENOENT;
+
+	for (i = 0; i < cache->sz; i++) {
+		unsigned int orig_seq;
+		struct ib_gid_attr attr
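The filter-based lookup described in this patch can be sketched in plain userspace C as follows. This is only an illustration of the callback pattern: the types `struct gid`, `struct gid_attr` and `struct gid_entry`, and the functions `find_gid_by_filter` and `match_ifindex`, are hypothetical simplifications, and the locking and sequence-count handling of the real cache are left out because the archived copy is truncated before that part.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for union ib_gid and struct ib_gid_attr. */
struct gid { uint8_t raw[16]; };
struct gid_attr { int gid_type; int ifindex; };
struct gid_entry { struct gid gid; struct gid_attr attr; };

typedef bool (*gid_filter_fn)(const struct gid *gid,
			      const struct gid_attr *attr,
			      void *context);

/*
 * Walk the table and return the first entry whose GID equals *gid and
 * which the filter accepts. A NULL filter accepts every matching GID.
 * Returns 0 and stores the position in *index, or -ENOENT if nothing fits.
 */
int find_gid_by_filter(const struct gid_entry *table, size_t len,
		       const struct gid *gid, gid_filter_fn filter,
		       void *context, uint16_t *index)
{
	for (size_t i = 0; i < len; i++) {
		if (memcmp(&table[i].gid, gid, sizeof(*gid)) != 0)
			continue;
		if (filter && !filter(gid, &table[i].attr, context))
			continue;
		if (index)
			*index = (uint16_t)i;
		return 0;
	}
	return -ENOENT;
}

/* Example filter: accept only entries bound to the ifindex in *context. */
bool match_ifindex(const struct gid *gid, const struct gid_attr *attr,
		   void *context)
{
	(void)gid;
	return attr->ifindex == *(const int *)context;
}
```

The point of passing an opaque `context` pointer is that new matching criteria (VLAN, MAC, and so on) can be added by callers without changing the lookup function's signature again.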
[PATCH v2 for-next 27/32] IB/mlx4: Translate cache gid index to real index
From: Moni Shoua <mo...@mellanox.com>

When a QP is modified with a path, the given sgid_index is not
necessarily the index that the HW knows. This is due to optimizations
that can save space in the HW table. Therefore, translation is required.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/qp.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 847f9ec..d7d7c5a 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1256,14 +1256,18 @@ static int _mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 		path->static_rate = 0;
 
 	if (ah->ah_flags & IB_AH_GRH) {
-		if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len[port]) {
+		int real_sgid_index = mlx4_ib_gid_index_to_real_index(dev,
+								      port,
+								      ah->grh.sgid_index);
+
+		if (real_sgid_index >= dev->dev->caps.gid_table_len[port]) {
 			pr_err("sgid_index (%u) too large. max is %d\n",
-			       ah->grh.sgid_index, dev->dev->caps.gid_table_len[port] - 1);
+			       real_sgid_index, dev->dev->caps.gid_table_len[port] - 1);
 			return -1;
 		}
 
 		path->grh_mylmc |= 1 << 7;
-		path->mgid_index = ah->grh.sgid_index;
+		path->mgid_index = real_sgid_index;
 		path->hop_limit = ah->grh.hop_limit;
 		path->tclass_flowlabel =
 			cpu_to_be32((ah->grh.traffic_class << 20) |
-- 
2.1.0
[PATCH v2 for-next 11/32] IB/core: Add gid_type to path and rdma_id_private
From: Matan Barak mat...@mellanox.com When using rdma cm, we want to take the gid_type from the rdma_id_private. This is mandatory before adding an API from user-space/configfs that sets the gid_type of CM connection. Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/cm.c | 19 ++- drivers/infiniband/core/cma.c | 2 ++ drivers/infiniband/core/sa_query.c| 3 ++- drivers/infiniband/core/uverbs_marshall.c | 1 + include/rdma/ib_sa.h | 1 + 5 files changed, 20 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 7974e74..22dac05 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -358,9 +358,8 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av) read_lock_irqsave(cm.device_lock, flags); list_for_each_entry(cm_dev, cm.device_list, list) { if (!ib_find_cached_gid(cm_dev-ib_device, path-sgid, - IB_GID_TYPE_IB, path-net, - path-ifindex, - p, NULL)) { + path-gid_type, path-net, + path-ifindex, p, NULL)) { port = cm_dev-port[p-1]; break; } @@ -1521,6 +1520,8 @@ static int cm_req_handler(struct cm_work *work) struct ib_cm_id *cm_id; struct cm_id_private *cm_id_priv, *listen_cm_id_priv; struct cm_req_msg *req_msg; + union ib_gid gid; + struct ib_gid_attr gid_attr; int ret; req_msg = (struct cm_req_msg *)work-mad_recv_wc-recv_buf.mad; @@ -1560,11 +1561,19 @@ static int cm_req_handler(struct cm_work *work) cm_format_paths_from_req(req_msg, work-path[0], work-path[1]); memcpy(work-path[0].dmac, cm_id_priv-av.ah_attr.dmac, ETH_ALEN); - ret = cm_init_av_by_path(work-path[0], cm_id_priv-av); + ret = ib_get_cached_gid(work-port-cm_dev-ib_device, + work-port-port_num, + cm_id_priv-av.ah_attr.grh.sgid_index, + gid, gid_attr); + if (!ret) { + work-path[0].gid_type = gid_attr.gid_type; + ret = cm_init_av_by_path(work-path[0], cm_id_priv-av); + } if (ret) { ib_get_cached_gid(work-port-cm_dev-ib_device, 
work-port-port_num, 0, work-path[0].sgid, - NULL); + gid_attr); + work-path[0].gid_type = gid_attr.gid_type; ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, work-path[0].sgid, sizeof work-path[0].sgid, NULL, 0); diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 659676c..9afa410 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -146,6 +146,7 @@ struct rdma_id_private { u8 tos; u8 reuseaddr; u8 afonly; + enum ib_gid_typegid_type; }; struct cma_multicast { @@ -1936,6 +1937,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv) ndev = dev_get_by_index(init_net, addr-dev_addr.bound_dev_if); route-path_rec-net = init_net; route-path_rec-ifindex = addr-dev_addr.bound_dev_if; + route-path_rec-gid_type = id_priv-gid_type; } if (!ndev) { ret = -ENODEV; diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 705b6b8..f770049 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -546,7 +546,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num, ah_attr-ah_flags = IB_AH_GRH; ah_attr-grh.dgid = rec-dgid; - ret = ib_find_cached_gid(device, rec-sgid, IB_GID_TYPE_IB, + ret = ib_find_cached_gid(device, rec-sgid, rec-gid_type, rec-net, rec-ifindex, port_num, gid_index); if (ret) @@ -676,6 +676,7 @@ static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query, mad-data, rec); rec.net = NULL; rec.ifindex = 0; + rec.gid_type = IB_GID_TYPE_IB; memset(rec.dmac, 0, ETH_ALEN); query-callback(status, rec, query-context); } else diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c index 7d2f14c..af020f8 100644
[PATCH v2 for-next 00/32] RoCE V1/v2 per GID
to a private header
(9) Support non-configfs configurations

Devesh Sharma (3):
  RDMA/ocrdma: changes to support RoCE-v2 in UD path
  RDMA/ocrdma: changes to support RoCE-v2 in RC path
  RDMA/ocrdma: changes to support user AH creation

Maor Gottlieb (1):
  net/mlx4_core: Add handlning of R-RoCE over IPV4 in qp attach flow

Matan Barak (13):
  IB/core: Add RoCE GID cache
  IB/core: Add kref to IB devices
  IB/core: Add RoCE GID population
  IB/core: Add default GID for RoCE GID Cache
  net/bonding: make DRV macros private
  IB/core: Add RoCE cache bonding support
  IB/core: GID attribute should be returned from verbs API and cache API
  IB/core: Report gid_type and gid_ndev through sysfs
  IB/core: Support find sgid index using a filter function
  IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
  IB/core: Add gid_type to path and rdma_id_private
  IB/core: Add rdma_network_type to wc
  IB/cma: Add configfs for rdma_cm

Moni Shoua (13):
  IB/mlx4: Remove gid table management for RoCE
  IB/mlx4: Replace spin_lock with rw_semaphore
  IB/mlx4: Lock with RCU instead of RTNL
  net/mlx4: Postpone the registration of net_device
  IB/mlx4: Advertise RoCE support in port capabilities
  IB/mlx4: Implement ib_device callback - get_netdev
  IB/mlx4: Implement ib_device callback - modify_gid
  IB/mlx4: Configure device to work in RoCEv2
  IB/mlx4: Translate cache gid index to real index
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
  IB/mlx4: Create and use another QP1 for RoCEv2
  IB/cma: Join and leave multicast groups with IGMP

Somnath Kotur (2):
  IB/Core: Changes to the IB Core infrastructure for RoCEv2 support
  RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table
    mgmt to IB/Core.
 drivers/infiniband/core/Makefile | 5 +-
 drivers/infiniband/core/addr.c | 11 +-
 drivers/infiniband/core/cache.c | 249 ++--
 drivers/infiniband/core/cm.c | 49 +-
 drivers/infiniband/core/cma.c | 233 ++--
 drivers/infiniband/core/cma_configfs.c | 222
 drivers/infiniband/core/core_priv.h | 88 ++-
 drivers/infiniband/core/device.c | 150 -
 drivers/infiniband/core/mad.c | 2 +-
 drivers/infiniband/core/multicast.c | 17 +-
 drivers/infiniband/core/roce_gid_cache.c | 755
 drivers/infiniband/core/roce_gid_mgmt.c | 757 +
 drivers/infiniband/core/sa_query.c | 12 +-
 drivers/infiniband/core/sysfs.c | 186 +-
 drivers/infiniband/core/ucma.c | 1 -
 drivers/infiniband/core/ud_header.c | 153 -
 drivers/infiniband/core/uverbs_cmd.c | 3 +-
 drivers/infiniband/core/uverbs_marshall.c | 5 +-
 drivers/infiniband/core/verbs.c | 266 ++---
 drivers/infiniband/hw/mlx4/ah.c | 15 +-
 drivers/infiniband/hw/mlx4/mad.c | 12 +-
 drivers/infiniband/hw/mlx4/main.c | 756 +---
 drivers/infiniband/hw/mlx4/mcg.c | 2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h | 33 +-
 drivers/infiniband/hw/mlx4/qp.c | 337 ---
 drivers/infiniband/hw/mthca/mthca_av.c | 2 +-
 drivers/infiniband/hw/mthca/mthca_qp.c | 2 +-
 drivers/infiniband/hw/ocrdma/ocrdma.h | 12 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 94 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h | 5 +-
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 50 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c | 233 +---
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h | 18 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 55 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 4 +
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 2 +-
 drivers/infiniband/ulp/srp/ib_srp.c | 2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c | 3 +-
 drivers/net/bonding/bond_main.c | 2 +
 drivers/net/bonding/bond_options.c | 13 -
 drivers/net/bonding/bond_procfs.c | 1 +
 drivers/net/bonding/bonding_priv.h | 26 +
 drivers/net/ethernet/mellanox/mlx4/en_main.c | 36 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c | 21 +-
 drivers/net/ethernet/mellanox/mlx4/intf.c | 3 +
 drivers/net/ethernet/mellanox/mlx4/main.c | 18 +
 drivers/net/ethernet/mellanox/mlx4/mcg.c | 14 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h | 3 +-
 drivers/net/ethernet/mellanox/mlx4/port.c | 9 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c | 27 +
 include/linux/mlx4/cmd.h | 3 +-
 include/linux/mlx4/device.h | 23 +-
 include/linux/mlx4/driver.h
[PATCH v2 for-next 07/32] IB/core: GID attribute should be returned from verbs API and cache API
From: Matan Barak mat...@mellanox.com Along with the GID itself, we now store GIDs attribute. This GID attribute contains important meta information regarding the GID itself, for example the netdevice. Thus, this information needs to be returned in APIs. This patch changes the following APIs: (a) ib_get_cached_gid (b) ib_find_cached_gid (c) ib_find_cached_gid_by_port (d) ib_query_gid It changes the usage of those APIs and use the RoCE GID cache when needed. Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/cache.c| 225 + drivers/infiniband/core/cm.c | 6 +- drivers/infiniband/core/cma.c | 84 ++--- drivers/infiniband/core/device.c | 29 +++- drivers/infiniband/core/mad.c | 2 +- drivers/infiniband/core/multicast.c| 3 +- drivers/infiniband/core/sa_query.c | 7 +- drivers/infiniband/core/sysfs.c| 2 +- drivers/infiniband/core/uverbs_marshall.c | 4 +- drivers/infiniband/core/verbs.c| 7 +- drivers/infiniband/hw/mlx4/qp.c| 5 +- drivers/infiniband/hw/mthca/mthca_av.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 2 +- drivers/infiniband/ulp/srp/ib_srp.c| 2 +- drivers/infiniband/ulp/srpt/ib_srpt.c | 3 +- include/rdma/ib_cache.h| 44 - include/rdma/ib_sa.h | 4 +- include/rdma/ib_verbs.h| 7 +- 19 files changed, 352 insertions(+), 88 deletions(-) diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index 80f6cf2..882d491 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -42,6 +42,8 @@ #include core_priv.h +#define __IB_ONLY + struct ib_pkey_cache { int table_len; u16 table[0]; @@ -69,16 +71,16 @@ static inline int end_port(struct ib_device *device) 0 : device-phys_port_cnt; } -int ib_get_cached_gid(struct ib_device *device, - u8port_num, - int index, - union ib_gid *gid) +static int __IB_ONLY __ib_get_cached_gid(struct ib_device *device, +u8port_num, +int index, +union ib_gid *gid) { 
struct ib_gid_cache *cache; unsigned long flags; int ret = 0; - if (port_num start_port(device) || port_num end_port(device)) + if (!device-cache.gid_cache) return -EINVAL; read_lock_irqsave(device-cache.lock, flags); @@ -94,43 +96,183 @@ int ib_get_cached_gid(struct ib_device *device, return ret; } + +int ib_cache_use_roce_gid_cache(struct ib_device *device, u8 port_num) +{ + if (rdma_port_get_link_layer(device, port_num) == + IB_LINK_LAYER_ETHERNET) { + if (device-cache.roce_gid_cache) + return 0; + else + return -EAGAIN; + } + + return -EINVAL; +} +EXPORT_SYMBOL(ib_cache_use_roce_gid_cache); + +int ib_get_cached_gid(struct ib_device *device, + u8port_num, + int index, + union ib_gid *gid, + struct ib_gid_attr *attr) +{ + int ret; + + if (port_num start_port(device) || port_num end_port(device)) + return -EINVAL; + + ret = ib_cache_use_roce_gid_cache(device, port_num); + if (!ret) + return roce_gid_cache_get_gid(device, port_num, index, gid, + attr); + + if (ret == -EAGAIN) + return ret; + + ret = __ib_get_cached_gid(device, port_num, index, gid); + + if (!ret attr) { + memset(attr, 0, sizeof(*attr)); + attr-gid_type = IB_GID_TYPE_IB; + } + + return ret; +} EXPORT_SYMBOL(ib_get_cached_gid); -int ib_find_cached_gid(struct ib_device *device, - union ib_gid *gid, - u8 *port_num, - u16 *index) +static int __IB_ONLY ___ib_find_cached_gid_by_port(struct ib_device *device, + u8 port_num, + const union ib_gid *gid, + u16 *index) { struct ib_gid_cache *cache; + u8 p = port_num - start_port(device); + int i; + + if (!ib_cache_use_roce_gid_cache(device
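The dispatch between the RoCE GID cache and the legacy IB cache in this patch hinges on a three-way return value. A minimal userspace sketch of that decision follows, assuming a simplified `struct port_info` (hypothetical, as are the enum names and `pick_gid_source`); it is meant to make the 0 / -EAGAIN / -EINVAL convention easier to follow, not to reproduce the kernel code.

```c
#include <errno.h>
#include <stdbool.h>

enum link_layer { LINK_LAYER_INFINIBAND, LINK_LAYER_ETHERNET };
enum gid_source { GID_FROM_ROCE_CACHE, GID_FROM_IB_CACHE, GID_RETRY_LATER };

/* Hypothetical per-port state; the kernel derives this from the ib_device. */
struct port_info {
	enum link_layer link_layer;
	bool roce_cache_ready;
};

/*
 * Mirrors the convention of ib_cache_use_roce_gid_cache():
 *   0        Ethernet port and the RoCE GID cache exists, use it
 *   -EAGAIN  Ethernet port but the cache is not set up yet
 *   -EINVAL  not an Ethernet port, so the RoCE cache does not apply
 */
int use_roce_gid_cache(const struct port_info *p)
{
	if (p->link_layer == LINK_LAYER_ETHERNET)
		return p->roce_cache_ready ? 0 : -EAGAIN;
	return -EINVAL;
}

/* Choose where a GID lookup should go, as ib_get_cached_gid() does. */
enum gid_source pick_gid_source(const struct port_info *p)
{
	int ret = use_roce_gid_cache(p);

	if (ret == 0)
		return GID_FROM_ROCE_CACHE;
	if (ret == -EAGAIN)
		return GID_RETRY_LATER;
	return GID_FROM_IB_CACHE;	/* -EINVAL here means "plain IB port" */
}
```

Note that -EINVAL is not treated as a failure by the caller in this scheme; it simply selects the IB-only path, where the attribute is zeroed and the type set to IB.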
[PATCH v2 for-next 05/32] net/bonding: make DRV macros private
From: Matan Barak <mat...@mellanox.com>

The bonding module currently defines four macros with generic names that
pollute the global namespace:
	DRV_VERSION
	DRV_RELDATE
	DRV_NAME
	DRV_DESCRIPTION

Fix that by defining a private bonding_priv.h header file which
contains those defines.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/net/bonding/bond_main.c    |  2 ++
 drivers/net/bonding/bond_procfs.c  |  1 +
 drivers/net/bonding/bonding_priv.h | 26 ++
 include/net/bonding.h              |  7 ---
 4 files changed, 29 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/bonding/bonding_priv.h

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 468c70e..55f2d3e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -81,6 +81,8 @@
 #include <net/bond_3ad.h>
 #include <net/bond_alb.h>
 
+#include "bonding_priv.h"
+
 /* Module parameters */
 
 /* monitor all links that often (in milliseconds). <=0 disables monitoring */
diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
index 976f5ad..b50a002 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -4,6 +4,7 @@
 #include <net/netns/generic.h>
 #include <net/bonding.h>
 
+#include "bonding_priv.h"
 
 static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
 	__acquires(RCU)
diff --git a/drivers/net/bonding/bonding_priv.h b/drivers/net/bonding/bonding_priv.h
new file mode 100644
index 000..c093e91
--- /dev/null
+++ b/drivers/net/bonding/bonding_priv.h
@@ -0,0 +1,26 @@
+/*
+ * Bond several ethernet interfaces into a Cisco, running 'Etherchannel'.
+ *
+ * Portions are (c) Copyright 1995 Simon "Guru Aleph-Null" Janes
+ * NCM: Network and Communications Management, Inc.
+ *
+ * BUT, I'm the one who modified it for ethernet, so:
+ * (c) Copyright 1999, Thomas Davis, tada...@lbl.gov
+ *
+ * This software may be used and distributed according to the terms
+ * of the GNU Public License, incorporated herein by reference.
+ *
+ */
+
+#ifndef _BONDING_PRIV_H
+#define _BONDING_PRIV_H
+
+#define DRV_VERSION	"3.7.1"
+#define DRV_RELDATE	"April 27, 2011"
+#define DRV_NAME	"bonding"
+#define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"
+
+#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"
+
+#endif
+
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 4c2b0f4..a124173 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -30,13 +30,6 @@
 #include <net/bond_alb.h>
 #include <net/bond_options.h>
 
-#define DRV_VERSION	"3.7.1"
-#define DRV_RELDATE	"April 27, 2011"
-#define DRV_NAME	"bonding"
-#define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"
-
-#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"
-
 #define BOND_MAX_ARP_TARGETS	16
 
 #define BOND_DEFAULT_MIIMON	100
-- 
2.1.0
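As a side note on why these macros are worth keeping out of a shared header: `bond_version` is built purely by the compiler's concatenation of adjacent string literals, so any other file that defines its own `DRV_VERSION` or `DRV_NAME` after including the public header would clash or silently change the result. A small userspace sketch, assuming the quoted string values shown in the patch (the accessor `bond_version_string` is hypothetical and exists only for this illustration):

```c
/* Contents modelled on the private header from the patch. */
#define DRV_VERSION     "3.7.1"
#define DRV_RELDATE     "April 27, 2011"
#define DRV_NAME        "bonding"
#define DRV_DESCRIPTION "Ethernet Channel Bonding Driver"

/* Adjacent string literals are merged into one literal at compile time. */
#define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"

/* Hypothetical accessor so the expanded string can be inspected. */
const char *bond_version_string(void)
{
	return bond_version;
}
```

Keeping the defines in a header that only the bonding sources include means the generic names never reach other drivers that pull in include/net/bonding.h.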
[PATCH v2 for-next 10/32] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
From: Matan Barak mat...@mellanox.com Previously, we resolved the dmac and took the smac and vlan from the resolved address. Changing that into finding a net device that matches the IP and vlan of the network packet and querying the RoCE GID cache for this net device, GID and GID type. ocrdma driver changes were done by Somnath Kotur somnath.ko...@emulex.com Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/addr.c | 3 +- drivers/infiniband/core/cm.c | 30 -- drivers/infiniband/core/cma.c| 9 -- drivers/infiniband/core/core_priv.h | 4 +- drivers/infiniband/core/sa_query.c | 4 - drivers/infiniband/core/ucma.c | 1 - drivers/infiniband/core/uverbs_cmd.c | 3 +- drivers/infiniband/core/verbs.c | 162 ++- drivers/infiniband/hw/mlx4/ah.c | 15 ++- drivers/infiniband/hw/mlx4/mad.c | 12 ++- drivers/infiniband/hw/mlx4/mcg.c | 2 +- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 +- drivers/infiniband/hw/mlx4/qp.c | 48 +++-- drivers/infiniband/hw/ocrdma/ocrdma.h| 1 + drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 20 ++-- drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 17 ++-- include/rdma/ib_addr.h | 2 +- include/rdma/ib_sa.h | 2 - include/rdma/ib_verbs.h | 11 +-- 19 files changed, 190 insertions(+), 158 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index f80da50..43af7f5 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -458,7 +458,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr, } int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac, - u16 *vlan_id) + u16 *vlan_id, int if_index) { int ret = 0; struct rdma_dev_addr dev_addr; @@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac, return ret; memset(dev_addr, 0, sizeof(dev_addr)); + dev_addr.bound_dev_if = if_index; ctx.addr = dev_addr; init_completion(ctx.comp); diff --git a/drivers/infiniband/core/cm.c 
b/drivers/infiniband/core/cm.c index d88f2ae..7974e74 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -178,8 +178,6 @@ struct cm_av { struct ib_ah_attr ah_attr; u16 pkey_index; u8 timeout; - u8 valid; - u8 smac[ETH_ALEN]; }; struct cm_work { @@ -382,7 +380,6 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av) av-ah_attr); av-timeout = path-packet_life_time + 1; - av-valid = 1; return 0; } @@ -1563,7 +1560,6 @@ static int cm_req_handler(struct cm_work *work) cm_format_paths_from_req(req_msg, work-path[0], work-path[1]); memcpy(work-path[0].dmac, cm_id_priv-av.ah_attr.dmac, ETH_ALEN); - work-path[0].vlan_id = cm_id_priv-av.ah_attr.vlan_id; ret = cm_init_av_by_path(work-path[0], cm_id_priv-av); if (ret) { ib_get_cached_gid(work-port-cm_dev-ib_device, @@ -3511,32 +3507,6 @@ static int cm_init_qp_rtr_attr(struct cm_id_private *cm_id_priv, *qp_attr_mask = IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU | IB_QP_DEST_QPN | IB_QP_RQ_PSN; qp_attr-ah_attr = cm_id_priv-av.ah_attr; - if (!cm_id_priv-av.valid) { - spin_unlock_irqrestore(cm_id_priv-lock, flags); - return -EINVAL; - } - if (cm_id_priv-av.ah_attr.vlan_id != 0x) { - qp_attr-vlan_id = cm_id_priv-av.ah_attr.vlan_id; - *qp_attr_mask |= IB_QP_VID; - } - if (!is_zero_ether_addr(cm_id_priv-av.smac)) { - memcpy(qp_attr-smac, cm_id_priv-av.smac, - sizeof(qp_attr-smac)); - *qp_attr_mask |= IB_QP_SMAC; - } - if (cm_id_priv-alt_av.valid) { - if (cm_id_priv-alt_av.ah_attr.vlan_id != 0x) { - qp_attr-alt_vlan_id = - cm_id_priv-alt_av.ah_attr.vlan_id; - *qp_attr_mask |= IB_QP_ALT_VID; - } - if (!is_zero_ether_addr(cm_id_priv-alt_av.smac)) { - memcpy(qp_attr-alt_smac, - cm_id_priv-alt_av.smac, - sizeof(qp_attr-alt_smac)); - *qp_attr_mask |= IB_QP_ALT_SMAC
[PATCH v2 for-next 24/32] IB/mlx4: Implement ib_device callback - get_netdev
From: Moni Shoua <mo...@mellanox.com>

This is a new callback that is required for RoCEv2 support.
In port aggregation mode the callback must return the netdev of the
active port, so the mlx4 core driver needs support for identifying
that port.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c         | 29 +
 drivers/net/ethernet/mellanox/mlx4/main.c | 18 ++
 include/linux/mlx4/driver.h               |  1 +
 3 files changed, 48 insertions(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bf87a95..04e6603 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -47,6 +47,8 @@
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
 
+#include <net/bonding.h>
+
 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/cmd.h>
 #include <linux/mlx4/qp.h>
@@ -1527,6 +1529,32 @@ unlock:
 	mutex_unlock(&ibdev->qp1_proxy_lock[port - 1]);
 }
 
+static struct net_device *mlx4_ib_get_netdev(struct ib_device *device, u8 port_num)
+{
+	struct mlx4_ib_dev *ibdev = to_mdev(device);
+
+	if (mlx4_is_bonded(ibdev->dev)) {
+		struct net_device *dev;
+		struct net_device *upper = NULL;
+
+		rcu_read_lock();
+
+		dev = mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+		if (dev)
+			upper = netdev_master_upper_dev_get_rcu(dev);
+		else
+			goto unlock;
+		if (upper)
+			dev = bond_option_active_slave_get_rcu(netdev_priv(upper));
+unlock:
+		rcu_read_unlock();
+
+		return dev;
+	}
+
+	return mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+}
+
 static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 				 struct net_device *dev,
 				 unsigned long event)
@@ -1806,6 +1834,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	ibdev->ib_dev.attach_mcast	= mlx4_ib_mcg_attach;
 	ibdev->ib_dev.detach_mcast	= mlx4_ib_mcg_detach;
 	ibdev->ib_dev.process_mad	= mlx4_ib_process_mad;
+	ibdev->ib_dev.get_netdev	= mlx4_ib_get_netdev;
 
 	if (!mlx4_is_slave(ibdev->dev)) {
 		ibdev->ib_dev.alloc_fmr		= mlx4_ib_fmr_alloc;
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 1893a57..6311897 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1237,6 +1237,24 @@ int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p)
 }
 EXPORT_SYMBOL_GPL(mlx4_port_map_set);
 
+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+
+	if (!pport)
+		return -EINVAL;
+	*pport = 0;
+
+	if (vport == 1)
+		*pport = priv->v2p.port1;
+	else if (vport == 2)
+		*pport = priv->v2p.port2;
+	if (!*pport)
+		return -EINVAL;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mlx4_port_map_get);
+
 static int mlx4_load_fw(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 5a06d96..a992971 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -81,6 +81,7 @@ struct mlx4_port_map {
 };
 
 int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p);
+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport);
 
 void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, int port);
 
-- 
2.1.0
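The virtual-to-physical port lookup added by mlx4_port_map_get() is easy to exercise outside the kernel. Below is a minimal sketch, assuming a simplified `struct port_map` with the two fields the patch reads (the struct and the function name `port_map_get` are stand-ins for this illustration, not mlx4 API); a physical port value of 0 is treated as "no mapping", as in the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the driver's virtual-to-physical map (two ports only). */
struct port_map {
	uint8_t port1;	/* physical port backing virtual port 1, 0 if unset */
	uint8_t port2;	/* physical port backing virtual port 2, 0 if unset */
};

/*
 * Translate a virtual port number into its physical port.
 * Returns 0 and fills *pport on success. Returns -EINVAL when pport is
 * NULL, when vport is neither 1 nor 2, or when the slot is unset; in the
 * latter two cases *pport is left at 0.
 */
int port_map_get(const struct port_map *v2p, uint8_t vport, uint8_t *pport)
{
	if (!pport)
		return -EINVAL;
	*pport = 0;

	if (vport == 1)
		*pport = v2p->port1;
	else if (vport == 2)
		*pport = v2p->port2;
	if (!*pport)
		return -EINVAL;
	return 0;
}
```

Zeroing the output before the lookup means a caller that ignores the return value still never sees a stale port number.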
[PATCH v2 for-next 08/32] IB/core: Report gid_type and gid_ndev through sysfs
From: Matan Barak <mat...@mellanox.com>

Since we've added GID attributes to the RoCE GID table, users need a
convenient way to query them. Add the GID type and the related net
device to IB's sysfs.

The new attributes are available in:
/sys/class/infiniband/<device>/ports/<port>/gid_attrs/ndevs/<index>
/sys/class/infiniband/<device>/ports/<port>/gid_attrs/types/<index>
The <index> corresponds to the index of the respective GID in:
/sys/class/infiniband/<device>/ports/<port>/gids/<index>

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/core_priv.h      |   2 +
 drivers/infiniband/core/roce_gid_cache.c |  13 +++
 drivers/infiniband/core/sysfs.c          | 184 ++-
 3 files changed, 197 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 6ab40a9..411672f 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -71,6 +71,8 @@ void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
 			      roce_netdev_callback cb,
 			      void *cookie);

+const char *roce_gid_cache_type_str(enum ib_gid_type gid_type);
+
 int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
 			   union ib_gid *gid, struct ib_gid_attr *attr);

diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
index 2bd663f..5c109f7 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -48,6 +48,11 @@ enum gid_attr_find_mask {
 	GID_ATTR_FIND_MASK_NETDEV	= 1UL << 1,
 };

+static const char * const gid_type_str[] = {
+	[IB_GID_TYPE_IB]	= "IB/RoCE V1\n",
+	[IB_GID_TYPE_ROCE_V2]	= "RoCE V2\n",
+};
+
 static inline int start_port(struct ib_device *ib_dev)
 {
 	return (ib_dev->node_type == RDMA_NODE_IB_SWITCH) ? 0 : 1;
@@ -58,6 +63,14 @@ struct dev_put_rcu {
 	struct net_device	*ndev;
 };

+const char *roce_gid_cache_type_str(enum ib_gid_type gid_type)
+{
+	if (gid_type < ARRAY_SIZE(gid_type_str) && gid_type_str[gid_type])
+		return gid_type_str[gid_type];
+
+	return "Invalid GID type";
+}
+
 static void put_ndev(struct rcu_head *rcu)
 {
 	struct dev_put_rcu *put_rcu =
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 5cee246..887c2f8 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -37,12 +37,22 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/string.h>
+#include <linux/netdevice.h>

 #include <rdma/ib_mad.h>

+struct ib_port;
+
+struct gid_attr_group {
+	struct ib_port		*port;
+	struct kobject		kobj;
+	struct attribute_group	ndev;
+	struct attribute_group	type;
+};
 struct ib_port {
 	struct kobject         kobj;
 	struct ib_device      *ibdev;
+	struct gid_attr_group *gid_attr_group;
 	struct attribute_group gid_group;
 	struct attribute_group pkey_group;
 	u8                     port_num;
@@ -84,6 +94,24 @@ static const struct sysfs_ops port_sysfs_ops = {
 	.show = port_attr_show
 };

+static ssize_t gid_attr_show(struct kobject *kobj,
+			     struct attribute *attr, char *buf)
+{
+	struct port_attribute *port_attr =
+		container_of(attr, struct port_attribute, attr);
+	struct ib_port *p = container_of(kobj, struct gid_attr_group,
+					 kobj)->port;
+
+	if (!port_attr->show)
+		return -EIO;
+
+	return port_attr->show(p, port_attr, buf);
+}
+
+static const struct sysfs_ops gid_attr_sysfs_ops = {
+	.show = gid_attr_show
+};
+
 static ssize_t state_show(struct ib_port *p, struct port_attribute *unused,
 			  char *buf)
 {
@@ -281,6 +309,46 @@ static struct attribute *port_default_attrs[] = {
 	NULL
 };

+static size_t print_ndev(struct ib_gid_attr *gid_attr, char *buf)
+{
+	if (!gid_attr->ndev)
+		return -EINVAL;
+
+	return sprintf(buf, "%s\n", gid_attr->ndev->name);
+}
+
+static size_t print_gid_type(struct ib_gid_attr *gid_attr, char *buf)
+{
+	return sprintf(buf, "%s", roce_gid_cache_type_str(gid_attr->gid_type));
+}
+
+static ssize_t _show_port_gid_attr(struct ib_port *p,
+				   struct port_attribute *attr,
+				   char *buf,
+				   size_t (*print)(struct ib_gid_attr *gid_attr,
+						   char *buf))
+{
+	struct port_table_attribute *tab_attr =
+		container_of(attr, struct port_table_attribute, attr);
+	union ib_gid gid;
+	struct ib_gid_attr gid_attr;
+	ssize_t ret;
+	va_list args;
+
+	rcu_read_lock
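As an aside for readers following the type-string helper in the patch above, here is a minimal userspace sketch of the same bounds-checked table lookup. The names demo_gid_type, demo_type_str and demo_type_to_str are illustrative stand-ins, not kernel symbols; only the two strings are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for enum ib_gid_type; not the kernel definition. */
enum demo_gid_type {
	DEMO_GID_TYPE_IB,
	DEMO_GID_TYPE_ROCE_V2,
};

/* Designated initializers leave any unlisted slot as NULL. */
static const char * const demo_type_str[] = {
	[DEMO_GID_TYPE_IB]      = "IB/RoCE V1\n",
	[DEMO_GID_TYPE_ROCE_V2] = "RoCE V2\n",
};

/* Returns the table entry, or a fallback for out-of-range or empty slots. */
static const char *demo_type_to_str(unsigned int type)
{
	size_t n = sizeof(demo_type_str) / sizeof(demo_type_str[0]);

	if (type < n && demo_type_str[type])
		return demo_type_str[type];

	return "Invalid GID type";
}
```

Checking both the index and the slot means a type added to the enum without a matching string still yields the fallback rather than a NULL pointer.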
[PATCH v2 for-next 13/32] IB/cma: Add configfs for rdma_cm
From: Matan Barak <mat...@mellanox.com>

Users would like to control the behaviour of rdma_cm. For example, old
applications that don't set the required RoCE gid type could be
executed on RoCE V2 network types. In order to support this
configuration, we implement a configfs for rdma_cm.

In order to use the configfs, one needs to mount it and mkdir
<IB device name> inside the rdma_cm directory.

The patch adds support for a single configuration file,
default_roce_mode. The mode can either be IB & RoCEv1 or RoCEv2.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/Makefile         |   2 +
 drivers/infiniband/core/cma.c            |  54 +++-
 drivers/infiniband/core/cma_configfs.c   | 222 +++
 drivers/infiniband/core/core_priv.h      |  13 ++
 drivers/infiniband/core/roce_gid_cache.c |  13 ++
 5 files changed, 300 insertions(+), 4 deletions(-)
 create mode 100644 drivers/infiniband/core/cma_configfs.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 2c94963..e25a96c 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -24,6 +24,8 @@ iw_cm-y :=			iwcm.o iwpm_util.o iwpm_msg.o

 rdma_cm-y :=			cma.o

+rdma_cm-$(CONFIG_CONFIGFS_FS) += cma_configfs.o
+
 rdma_ucm-y :=			ucma.o

 ib_addr-y :=			addr.o
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9afa410..1705280 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -55,6 +55,7 @@
 #include <rdma/ib_cm.h>
 #include <rdma/ib_sa.h>
 #include <rdma/iw_cm.h>
+#include "core_priv.h"

 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("Generic RDMA CM Agent");
@@ -91,6 +92,7 @@ struct cma_device {
 	struct completion	comp;
 	atomic_t		refcount;
 	struct list_head	id_list;
+	enum ib_gid_type	default_gid_type;
 };

 struct rdma_bind_list {
@@ -103,6 +105,42 @@ enum {
 	CMA_OPTION_AFONLY,
 };

+void cma_ref_dev(struct cma_device *cma_dev)
+{
+	atomic_inc(&cma_dev->refcount);
+}
+
+struct cma_device *cma_enum_devices_by_ibdev(cma_device_filter	filter,
+					     void		*cookie)
+{
+	struct cma_device *cma_dev;
+	struct cma_device *found_cma_dev = NULL;
+
+	mutex_lock(&lock);
+
+	list_for_each_entry(cma_dev, &dev_list, list)
+		if (filter(cma_dev->device, cookie)) {
+			found_cma_dev = cma_dev;
+			break;
+		}
+
+	if (found_cma_dev)
+		cma_ref_dev(found_cma_dev);
+	mutex_unlock(&lock);
+	return found_cma_dev;
+}
+
+enum ib_gid_type cma_get_default_gid_type(struct cma_device *cma_dev)
+{
+	return cma_dev->default_gid_type;
+}
+
+void cma_set_default_gid_type(struct cma_device *cma_dev,
+			      enum ib_gid_type default_gid_type)
+{
+	cma_dev->default_gid_type = default_gid_type;
+}
+
 /*
  * Device removal can occur at anytime, so we need extra handling to
  * serialize notifying the user of device removal with other callbacks.
@@ -248,15 +286,16 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 			      struct cma_device *cma_dev)
 {
-	atomic_inc(&cma_dev->refcount);
+	cma_ref_dev(cma_dev);
 	id_priv->cma_dev = cma_dev;
+	id_priv->gid_type = cma_dev->default_gid_type;
 	id_priv->id.device = cma_dev->device;
 	id_priv->id.route.addr.dev_addr.transport =
 		rdma_node_get_transport(cma_dev->device->node_type);
 	list_add_tail(&id_priv->list, &cma_dev->id_list);
 }

-static inline void cma_deref_dev(struct cma_device *cma_dev)
+void cma_deref_dev(struct cma_device *cma_dev)
 {
 	if (atomic_dec_and_test(&cma_dev->refcount))
 		complete(&cma_dev->comp);
@@ -385,7 +424,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,

 		ret = ib_find_cached_gid_by_port(cma_dev->device,
 						 &iboe_gid,
-						 IB_GID_TYPE_IB,
+						 cma_dev->default_gid_type,
 						 port,
 						 &init_net,
 						 if_index,
@@ -418,7 +457,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,

 			ret = ib_find_cached_gid_by_port(cma_dev->device,
 							 &iboe_gid,
-							 IB_GID_TYPE_IB
[PATCH v2 for-next 25/32] IB/mlx4: Implement ib_device callback - modify_gid
From: Moni Shoua <mo...@mellanox.com>

This is a new callback that is required for RoCEv2 support.
In RoCE, the GID table is managed in the IB core driver. The role of the
mlx4 driver is to synchronize the HW with the entries in the GID table.
Since it is possible that the same GID value will appear more than once
in the GID table (though with different attributes), the mlx4 driver is
required to maintain a reference counting mechanism and populate the HW
with a single value. Since an index to the GID table is not necessarily
the same as the index to the matching entry in the HW GID table, a
translation between indexes is required.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c    | 224 +++
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  18 +++
 include/linux/mlx4/cmd.h             |   3 +-
 include/linux/mlx4/device.h          |   3 +-
 4 files changed, 246 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 04e6603..9d651cf 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1555,6 +1555,228 @@ unlock:
 	return mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
 }

+static int mlx4_ib_update_gids_v1(struct gid_entry *gids,
+				  struct mlx4_ib_dev *ibdev,
+				  u8 port_num)
+{
+	struct mlx4_cmd_mailbox *mailbox;
+	int err;
+	struct mlx4_dev *dev = ibdev->dev;
+	int i;
+	union ib_gid *gid_tbl;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox))
+		return -ENOMEM;
+
+	gid_tbl = mailbox->buf;
+
+	for (i = 0; i < MLX4_MAX_PORT_GIDS; ++i)
+		memcpy(&gid_tbl[i], &gids[i].gid, sizeof(union ib_gid));
+
+	err = mlx4_cmd(dev, mailbox->dma,
+		       MLX4_SET_PORT_GID_TABLE << 8 | port_num,
+		       1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+		       MLX4_CMD_WRAPPED);
+	if (mlx4_is_bonded(dev))
+		err += mlx4_cmd(dev, mailbox->dma,
+				MLX4_SET_PORT_GID_TABLE << 8 | 2,
+				1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+				MLX4_CMD_WRAPPED);
+
+	mlx4_free_cmd_mailbox(dev, mailbox);
+	return err;
+}
+
+static int mlx4_ib_update_gids_v1_v2(struct gid_entry *gids,
+				     struct mlx4_ib_dev *ibdev,
+				     u8 port_num)
+{
+	struct mlx4_cmd_mailbox *mailbox;
+	int err;
+	struct mlx4_dev *dev = ibdev->dev;
+	int i;
+	struct {
+		union ib_gid	gid;
+		__be32		rsrvd1[2];
+		__be16		rsrvd2;
+		u8		type;
+		u8		version;
+		__be32		rsrvd3;
+	} *gid_tbl;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox))
+		return -ENOMEM;
+
+	gid_tbl = mailbox->buf;
+	for (i = 0; i < MLX4_MAX_PORT_GIDS; ++i) {
+		memcpy(&gid_tbl[i].gid, &gids[i].gid, sizeof(union ib_gid));
+		if (gids[i].gid_type == IB_GID_TYPE_ROCE_V2) {
+			gid_tbl[i].version = 2;
+			if (!ipv6_addr_v4mapped((struct in6_addr *)&gids[i].gid))
+				gid_tbl[i].type = 1;
+		}
+	}
+
+	err = mlx4_cmd(dev, mailbox->dma,
+		       MLX4_SET_PORT_ROCE_ADDR << 8 | port_num,
+		       1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+		       MLX4_CMD_WRAPPED);
+	if (mlx4_is_bonded(dev))
+		err += mlx4_cmd(dev, mailbox->dma,
+				MLX4_SET_PORT_ROCE_ADDR << 8 | 2,
+				1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+				MLX4_CMD_WRAPPED);
+
+	mlx4_free_cmd_mailbox(dev, mailbox);
+	return err;
+}
+
+static int mlx4_ib_update_gids(struct gid_entry *gids,
+			       struct mlx4_ib_dev *ibdev,
+			       u8 port_num)
+{
+	if (ibdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2)
+		return mlx4_ib_update_gids_v1_v2(gids, ibdev, port_num);
+
+	return mlx4_ib_update_gids_v1(gids, ibdev, port_num);
+}
+
+static int mlx4_ib_modify_gid(struct ib_device *device,
+			      u8 port_num, unsigned int index,
+			      const union ib_gid *gid,
+			      const struct ib_gid_attr *attr,
+			      void **context)
+{
+	struct mlx4_ib_dev *ibdev = to_mdev(device);
+	struct mlx4_ib_iboe *iboe = &ibdev->iboe;
+	struct mlx4_port_gid_table *port_gid_table;
+	int free = -1, found = -1;
+	int ret
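The reference counting and index translation described in the commit message above can be pictured with a small userspace sketch. This is only an illustration under assumptions: demo_hw_table, demo_hw_add and demo_hw_del are hypothetical names, the table size is arbitrary, and the rest of mlx4_ib_modify_gid is not shown in this message, so the sketch does not claim to mirror it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HW_GIDS 4	/* arbitrary size for the sketch */

/* One hardware slot: a GID value plus how many core-table entries use it. */
struct demo_hw_entry {
	uint8_t gid[16];
	int refs;
};

struct demo_hw_table {
	struct demo_hw_entry slot[DEMO_HW_GIDS];
};

/*
 * Returns the hardware index for 'gid'. An already-programmed value is
 * reused and its count bumped; otherwise the first free slot is taken.
 * Returns -1 when no slot is available.
 */
static int demo_hw_add(struct demo_hw_table *t, const uint8_t gid[16])
{
	int free_slot = -1;

	for (int i = 0; i < DEMO_HW_GIDS; i++) {
		if (t->slot[i].refs && !memcmp(t->slot[i].gid, gid, 16)) {
			t->slot[i].refs++;
			return i;
		}
		if (!t->slot[i].refs && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;

	memcpy(t->slot[free_slot].gid, gid, 16);
	t->slot[free_slot].refs = 1;
	return free_slot;
}

/* Drops one reference; returns 1 only when the slot became free. */
static int demo_hw_del(struct demo_hw_table *t, int hw_index)
{
	if (hw_index < 0 || hw_index >= DEMO_HW_GIDS || !t->slot[hw_index].refs)
		return 0;
	if (--t->slot[hw_index].refs)
		return 0;

	memset(t->slot[hw_index].gid, 0, 16);
	return 1;
}
```

A caller would remember the returned hardware index next to each core GID table index, which is the translation step, and would only need to rewrite the hardware table when demo_hw_add takes a new slot or demo_hw_del reports that a slot became free.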
[PATCH v2 for-next 15/32] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.
1. Check and set port capability flags to indicate RoCEV2 support.
2. Change query_gid hook to return value from IB/Core GID Mgmt APIs.
3. Get rid of all the netdev notifier chain subscription code as well
   as maintenance of SGID Table in memory.
4. Implement get_netdev hook in driver.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma.h       |  10 ++
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c    |   3 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  | 233 +---
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |  13 ++
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |  34 +++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |   4 +
 6 files changed, 65 insertions(+), 232 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 16ee36e..97f971a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -100,6 +100,7 @@ struct ocrdma_dev_attr {
 	u8 local_ca_ack_delay;
 	u8 ird;
 	u8 num_ird_pages;
+	u8 roce_flags;
 };

 struct ocrdma_dma_mem {
@@ -575,4 +576,13 @@ static inline u8 ocrdma_is_enabled_and_synced(u32 state)
 		(state & OCRDMA_STATE_FLAG_SYNC);
 }

+static inline bool ocrdma_is_rocev2_supported(struct ocrdma_dev *dev)
+{
+	return (dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV4 <<
+					OCRDMA_ROUDP_FLAGS_SHIFT) ||
+		dev->attr.roce_flags & (OCRDMA_L3_TYPE_IPV6 <<
+					OCRDMA_ROUDP_FLAGS_SHIFT)) ?
+		true : false;
+}
+
 #endif
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index e5f0244..20f9e8f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -1112,6 +1112,9 @@ static void ocrdma_get_attr(struct ocrdma_dev *dev,
 	attr->local_ca_ack_delay = (rsp->max_pd_ca_ack_delay &
 				    OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_MASK) >>
 	    OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_SHIFT;
+	attr->roce_flags = (rsp->max_pd_ca_ack_delay &
+			    OCRDMA_MBX_QUERY_CFG_L3_TYPE_MASK) >>
+			    OCRDMA_MBX_QUERY_CFG_L3_TYPE_SHIFT;
 	attr->max_mw = rsp->max_mw;
 	attr->max_mr = rsp->max_mr;
 	attr->max_mr_size = ((u64)rsp->max_mr_size_hi << 32) |
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 7a2b59a..a81492f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -51,8 +51,6 @@ static LIST_HEAD(ocrdma_dev_list);
 static DEFINE_SPINLOCK(ocrdma_devlist_lock);
 static DEFINE_IDR(ocrdma_dev_id);

-static union ib_gid ocrdma_zero_sgid;
-
 void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
 {
 	u8 mac_addr[6];
@@ -67,135 +65,6 @@ void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
 	guid[6] = mac_addr[4];
 	guid[7] = mac_addr[5];
 }
-
-static bool ocrdma_add_sgid(struct ocrdma_dev *dev, union ib_gid *new_sgid)
-{
-	int i;
-	unsigned long flags;
-
-	memset(&ocrdma_zero_sgid, 0, sizeof(union ib_gid));
-
-
-	spin_lock_irqsave(&dev->sgid_lock, flags);
-	for (i = 0; i < OCRDMA_MAX_SGID; i++) {
-		if (!memcmp(&dev->sgid_tbl[i], &ocrdma_zero_sgid,
-			    sizeof(union ib_gid))) {
-			/* found free entry */
-			memcpy(&dev->sgid_tbl[i], new_sgid,
-			       sizeof(union ib_gid));
-			spin_unlock_irqrestore(&dev->sgid_lock, flags);
-			return true;
-		} else if (!memcmp(&dev->sgid_tbl[i], new_sgid,
-				   sizeof(union ib_gid))) {
-			/* entry already present, no addition is required. */
-			spin_unlock_irqrestore(&dev->sgid_lock, flags);
-			return false;
-		}
-	}
-	spin_unlock_irqrestore(&dev->sgid_lock, flags);
-	return false;
-}
-
-static bool ocrdma_del_sgid(struct ocrdma_dev *dev, union ib_gid *sgid)
-{
-	int found = false;
-	int i;
-	unsigned long flags;
-
-
-	spin_lock_irqsave(&dev->sgid_lock, flags);
-	/* first is default sgid, which cannot be deleted. */
-	for (i = 1; i < OCRDMA_MAX_SGID; i++) {
-		if (!memcmp(&dev->sgid_tbl[i], sgid, sizeof(union ib_gid))) {
-			/* found matching entry */
-			memset(&dev->sgid_tbl[i], 0, sizeof(union ib_gid));
-			found = true;
-			break;
-		}
-	}
-	spin_unlock_irqrestore(&dev->sgid_lock, flags);
-	return found;
-}
-
-static int ocrdma_addr_event(unsigned long event, struct
[PATCH v2 for-next 02/32] IB/core: Add kref to IB devices
From: Matan Barak <mat...@mellanox.com>

Previously, we used the device_mutex lock in order to protect the
device list. That means that in order to guarantee a device isn't
freed while we use it, we had to lock all devices.

Add a kref per IB device. Before an IB device is unregistered, we wait
until it is no longer held.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/device.c | 41 
 include/rdma/ib_verbs.h          |  6 ++
 2 files changed, 47 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..8616a95 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -261,6 +261,39 @@ out:
 	return ret;
 }

+static void ib_device_complete_cb(struct kref *kref)
+{
+	struct ib_device *device = container_of(kref, struct ib_device,
+						refcount);
+
+	if (device->reg_state >= IB_DEV_UNREGISTERING)
+		complete(&device->free);
+}
+
+/**
+ * ib_device_hold - increase the reference count of device
+ * @device: ib device to prevent from being free'd
+ *
+ * Prevent the device from being free'd.
+ */
+void ib_device_hold(struct ib_device *device)
+{
+	kref_get(&device->refcount);
+}
+EXPORT_SYMBOL(ib_device_hold);
+
+/**
+ * ib_device_put - decrease the reference count of device
+ * @device: allows this device to be free'd
+ *
+ * Puts the ib_device and allows it to be free'd.
+ */
+int ib_device_put(struct ib_device *device)
+{
+	return kref_put(&device->refcount, ib_device_complete_cb);
+}
+EXPORT_SYMBOL(ib_device_put);
+
 /**
  * ib_register_device - Register an IB device with IB core
  * @device:Device to register
@@ -312,6 +345,9 @@ int ib_register_device(struct ib_device *device,

 	list_add_tail(&device->core_list, &device_list);

+	kref_init(&device->refcount);
+	init_completion(&device->free);
+
 	device->reg_state = IB_DEV_REGISTERED;

 	{
@@ -342,6 +378,8 @@ void ib_unregister_device(struct ib_device *device)

 	mutex_lock(&device_mutex);

+	device->reg_state = IB_DEV_UNREGISTERING;
+
 	list_for_each_entry_reverse(client, &client_list, list)
 		if (client->remove)
 			client->remove(device);
@@ -355,6 +393,9 @@ void ib_unregister_device(struct ib_device *device)

 	ib_device_unregister_sysfs(device);

+	ib_device_put(device);
+	wait_for_completion(&device->free);
+
 	spin_lock_irqsave(&device->client_data_lock, flags);
 	list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
 		kfree(context);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1866595..a7593b0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1716,6 +1716,7 @@ struct ib_device {
 	enum {
 		IB_DEV_UNINITIALIZED,
 		IB_DEV_REGISTERED,
+		IB_DEV_UNREGISTERING,
 		IB_DEV_UNREGISTERED
 	}                            reg_state;

@@ -1728,6 +1729,8 @@ struct ib_device {
 	u32			     local_dma_lkey;
 	u8                           node_type;
 	u8                           phys_port_cnt;
+	struct kref		     refcount;
+	struct completion	     free;
 };

 struct ib_client {
@@ -1741,6 +1744,9 @@ struct ib_client {
 struct ib_device *ib_alloc_device(size_t size);
 void ib_dealloc_device(struct ib_device *device);

+void ib_device_hold(struct ib_device *device);
+int ib_device_put(struct ib_device *device);
+
 int ib_register_device(struct ib_device *device,
 		       int (*port_callback)(struct ib_device *,
 					    u8, struct kobject *));
-- 
2.1.0
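To make the hold/put lifetime in the kref patch easier to follow, here is a rough single-threaded userspace sketch. It is not kernel code: demo_device and the demo_* functions are made-up names, an atomic_int stands in for struct kref, and a plain flag stands in for struct completion, so nothing here actually blocks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the registration states used by the patch. */
enum demo_state { DEMO_REGISTERED, DEMO_UNREGISTERING };

struct demo_device {
	atomic_int refcount;	/* plays the role of struct kref */
	enum demo_state state;
	bool free_signalled;	/* plays the role of struct completion */
};

/* Like kref_init(): the registration itself owns the first reference. */
static void demo_register(struct demo_device *dev)
{
	atomic_init(&dev->refcount, 1);
	dev->state = DEMO_REGISTERED;
	dev->free_signalled = false;
}

static void demo_hold(struct demo_device *dev)
{
	atomic_fetch_add(&dev->refcount, 1);
}

/* Returns 1 when this call dropped the last reference, like kref_put(). */
static int demo_put(struct demo_device *dev)
{
	if (atomic_fetch_sub(&dev->refcount, 1) != 1)
		return 0;
	if (dev->state >= DEMO_UNREGISTERING)
		dev->free_signalled = true;	/* complete(&device->free) */
	return 1;
}

/*
 * Marks the device as going away and drops the registration reference.
 * A real caller would then wait until free_signalled is set.
 */
static void demo_begin_unregister(struct demo_device *dev)
{
	dev->state = DEMO_UNREGISTERING;
	demo_put(dev);
}
```

In this model, unregistering while another user still holds the device leaves free_signalled false; it only becomes true once that last holder calls demo_put, which is the point at which the kernel version would let wait_for_completion return.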
[PATCH v2 for-next 01/32] IB/core: Add RoCE GID cache
From: Matan Barak <mat...@mellanox.com>

In order to manage multiple types, vlans and MACs per GID, we need to
store them alongside the GID itself. We store the net device as well,
as sometimes GIDs should be handled according to the net device they
came from. Since populating the GID table should be identical for every
RoCE provider, the GID table should be handled in ib_core.

Add a GID cache table that supports lockless find, add and delete of
GIDs. The lockless nature comes from using a unique sequence number per
table entry and detecting that this sequence wasn't changed while
reading/writing.

By using this RoCE GID cache table, providers must implement a
modify_gid callback. The table is managed exclusively by this
roce_gid_cache and the provider just needs to write the data to the
hardware.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/Makefile         |   3 +-
 drivers/infiniband/core/core_priv.h      |  24 ++
 drivers/infiniband/core/roce_gid_cache.c | 511 +++
 drivers/infiniband/hw/mlx4/main.c        |   2 -
 include/rdma/ib_verbs.h                  |  55 +++-
 5 files changed, 591 insertions(+), 4 deletions(-)
 create mode 100644 drivers/infiniband/core/roce_gid_cache.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index acf7367..9b63bdf 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o \
 					$(user_access-y)

 ib_core-y :=			packer.o ud_header.o verbs.o sysfs.o \
-				device.o fmr_pool.o cache.o netlink.o
+				device.o fmr_pool.o cache.o netlink.o \
+				roce_gid_cache.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 87d1936..a502daa 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -35,6 +35,7 @@

 #include <linux/list.h>
 #include <linux/spinlock.h>
+#include <net/net_namespace.h>

 #include <rdma/ib_verbs.h>

@@ -51,4 +52,27 @@ void ib_cache_cleanup(void);

 int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
 			    struct ib_qp_attr *qp_attr, int *qp_attr_mask);
+
+int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index,
+			   union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_gid_cache_find_gid(struct ib_device *ib_dev, union ib_gid *gid,
+			    enum ib_gid_type gid_type, struct net *net,
+			    int if_index, u8 *port, u16 *index);
+
+int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,
+				    enum ib_gid_type gid_type, u8 port,
+				    struct net *net, int if_index, u16 *index);
+
+int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);
+
+int roce_add_gid(struct ib_device *ib_dev, u8 port,
+		 union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_gid(struct ib_device *ib_dev, u8 port,
+		 union ib_gid *gid, struct ib_gid_attr *attr);
+
+int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
+			     struct net_device *ndev);
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
new file mode 100644
index 0000000..aa20371
--- /dev/null
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -0,0 +1,511 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT
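The commit message above describes a per-entry sequence number that lets a reader detect a concurrent update. The body of roce_gid_cache.c is cut off in this message, so the following is only a generic sketch of that idea in userspace C11, using the common convention (an assumption here, not something the patch text states) that an odd sequence value means a write is in progress. demo_entry and the demo_* functions are hypothetical names, and the plain memcpy is a simplification that a rigorous concurrent implementation would need to treat more carefully.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical single table entry; not the layout used by the patch. */
struct demo_entry {
	atomic_uint seq;	/* odd while a writer is mid-update */
	unsigned char gid[16];
};

static void demo_init(struct demo_entry *e)
{
	atomic_init(&e->seq, 0);
	memset(e->gid, 0, sizeof(e->gid));
}

/* Writer: bump to odd, update the data, bump back to even. */
static void demo_write(struct demo_entry *e, const unsigned char gid[16])
{
	atomic_fetch_add(&e->seq, 1);
	memcpy(e->gid, gid, 16);
	atomic_fetch_add(&e->seq, 1);
}

/*
 * Reader: copies the data without taking a lock and reports success
 * only if the sequence was even and unchanged across the copy.
 * On failure the caller is expected to retry.
 */
static bool demo_try_read(struct demo_entry *e, unsigned char out[16])
{
	unsigned int before = atomic_load(&e->seq);

	if (before & 1)
		return false;
	memcpy(out, e->gid, 16);
	return atomic_load(&e->seq) == before;
}
```

The reader never blocks a writer; it pays for that by occasionally having to discard a copy and try again.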
[PATCH v2 for-next 32/32] IB/cma: Join and leave multicast groups with IGMP
From: Moni Shoua <mo...@mellanox.com>

Since RoCEv2 is a protocol that runs over an IP header, IGMP join and
leave requests must be sent to the network when joining and leaving
multicast groups.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/cma.c       | 78 ++---
 drivers/infiniband/core/multicast.c | 18 -
 include/rdma/ib_sa.h                |  3 ++
 3 files changed, 92 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2bfe798..bc30bc5 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -38,6 +38,7 @@
 #include <linux/in6.h>
 #include <linux/mutex.h>
 #include <linux/random.h>
+#include <linux/igmp.h>
 #include <linux/idr.h>
 #include <linux/inetdevice.h>
 #include <linux/slab.h>
@@ -196,6 +197,7 @@ struct cma_multicast {
 	void			*context;
 	struct sockaddr_storage	addr;
 	struct kref		mcref;
+	bool			igmp_joined;
 };

 struct cma_work {
@@ -283,6 +285,26 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
 	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
 }

+static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join)
+{
+	struct in_device *in_dev = NULL;
+
+	if (ndev) {
+		rtnl_lock();
+		in_dev = __in_dev_get_rtnl(ndev);
+		if (in_dev) {
+			if (join)
+				ip_mc_inc_group(in_dev,
+						*(__be32 *)(mgid->raw+12));
+			else
+				ip_mc_dec_group(in_dev,
+						*(__be32 *)(mgid->raw+12));
+		}
+		rtnl_unlock();
+	}
+	return (in_dev) ? 0 : -ENODEV;
+}
+
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 			      struct cma_device *cma_dev)
 {
@@ -1076,6 +1098,20 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
 			kfree(mc);
 			break;
 		case IB_LINK_LAYER_ETHERNET:
+			if (mc->igmp_joined) {
+				struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
+				struct net_device *ndev = NULL;
+
+				if (dev_addr->bound_dev_if)
+					ndev = dev_get_by_index(&init_net,
+								dev_addr->bound_dev_if);
+				if (ndev) {
+					cma_igmp_send(ndev,
+						      &mc->multicast.ib->rec.mgid,
+						      false);
+					dev_put(ndev);
+				}
+			}
 			kref_put(&mc->mcref, release_mc);
 			break;
 		default:
@@ -3356,7 +3392,7 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 {
 	struct iboe_mcast_work *work;
 	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
-	int err;
+	int err = 0;
 	struct sockaddr *addr = (struct sockaddr *)&mc->addr;
 	struct net_device *ndev = NULL;

@@ -3388,13 +3424,30 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 	mc->multicast.ib->rec.rate = iboe_get_rate(ndev);
 	mc->multicast.ib->rec.hop_limit = 1;
 	mc->multicast.ib->rec.mtu = iboe_get_mtu(ndev->mtu);
+	mc->multicast.ib->rec.ifindex = dev_addr->bound_dev_if;
+	mc->multicast.ib->rec.net = &init_net;
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &mc->multicast.ib->rec.port_gid);
+
+	if (addr->sa_family == AF_INET) {
+		mc->multicast.ib->rec.gid_type =
+			id_priv->cma_dev->default_gid_type;
+		if (mc->multicast.ib->rec.gid_type == IB_GID_TYPE_ROCE_V2)
+			err = cma_igmp_send(ndev, &mc->multicast.ib->rec.mgid,
+					    true);
+		if (!err) {
+			mc->igmp_joined = true;
+			mc->multicast.ib->rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+		}
+	} else {
+		mc->multicast.ib->rec.gid_type = IB_GID_TYPE_IB;
+	}
 	dev_put(ndev);
-	if (!mc->multicast.ib->rec.mtu) {
+	if (err || !mc->multicast.ib->rec.mtu) {
 		err = -EINVAL;
 		goto out2;
 	}
-	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
-		    &mc->multicast.ib->rec.port_gid);
+
 	work->id = id_priv;
 	work->mc = mc;
 	INIT_WORK(&work
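The `*(__be32 *)(mgid->raw+12)` cast in cma_igmp_send reads the IPv4 group address from the last four bytes of the 16-byte MGID. A small userspace sketch of that extraction follows, assuming (as the cast implies) that bytes 12 to 15 hold the IPv4 address in network byte order. The helper names are hypothetical, and memcpy is used instead of a pointer cast to avoid alignment assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies bytes 12..15 of a 16-byte GID, kept in network byte order. */
static void demo_gid_to_ipv4(const uint8_t gid[16], uint8_t ipv4[4])
{
	memcpy(ipv4, gid + 12, 4);
}

/* True when the address lies in 224.0.0.0/4, the IPv4 multicast range. */
static int demo_is_ipv4_multicast(const uint8_t ipv4[4])
{
	return (ipv4[0] & 0xf0) == 0xe0;
}
```

For example, a GID whose tail is 239.1.2.3 yields an address that demo_is_ipv4_multicast accepts, while a tail of 10.0.0.1 does not.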
[PATCH v2 for-next 28/32] net/mlx4_core: Add handling of R-RoCE over IPV4 in qp attach flow
From: Maor Gottlieb <ma...@mellanox.com>

When a QP is attached for R-RoCE over IPv4, the IPv4 bit should be
enabled in the IB flow spec.

Signed-off-by: Maor Gottlieb <ma...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/net/ethernet/mellanox/mlx4/mcg.c | 14 --
 include/linux/mlx4/device.h              |  6 ++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index a3867e7..cdf07b9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -858,7 +858,9 @@ static int parse_trans_rule(struct mlx4_dev *dev, struct mlx4_spec_list *spec,
 		break;

 	case MLX4_NET_TRANS_RULE_ID_IB:
-		rule_hw->ib.l3_qpn = spec->ib.l3_qpn;
+		rule_hw->ib.l3_qpn = spec->ib.l3_qpn |
+			(spec->ib.roce_type == MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4 ?
+			 0x80 : 0);
 		rule_hw->ib.qpn_mask = spec->ib.qpn_msk;
 		memcpy(&rule_hw->ib.dst_gid, &spec->ib.dst_gid, 16);
 		memcpy(&rule_hw->ib.dst_gid_msk, &spec->ib.dst_gid_msk, 16);
@@ -1377,10 +1379,18 @@ int mlx4_trans_to_dmfs_attach(struct mlx4_dev *dev, struct mlx4_qp *qp,
 		memcpy(spec.eth.dst_mac_msk, &mac_mask, ETH_ALEN);
 		break;

+	case MLX4_PROT_IB_IPV4:
+		spec.id = MLX4_NET_TRANS_RULE_ID_IB;
+		memcpy(spec.ib.dst_gid + 12, gid + 12, 4);
+		memset(spec.ib.dst_gid_msk + 12, 0xff, 4);
+		spec.ib.roce_type = MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4;
+
+		break;
 	case MLX4_PROT_IB_IPV6:
 		spec.id = MLX4_NET_TRANS_RULE_ID_IB;
 		memcpy(spec.ib.dst_gid, gid, 16);
-		memset(&spec.ib.dst_gid_msk, 0xff, 16);
+		memset(spec.ib.dst_gid_msk, 0xff, 16);
+		spec.ib.roce_type = MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV6;
 		break;
 	default:
 		return -EINVAL;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index dd1488c..58b0b8c 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -369,6 +369,11 @@ enum mlx4_protocol {
 	MLX4_PROT_FCOE
 };

+enum mlx4_flow_roce_type {
+	MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV6 = 0,
+	MLX4_FLOW_SPEC_IB_ROCE_TYPE_IPV4
+};
+
 enum {
 	MLX4_MTT_FLAG_PRESENT		= 1
 };
@@ -1096,6 +1101,7 @@ struct mlx4_spec_ipv4 {
 struct mlx4_spec_ib {
 	__be32	l3_qpn;
 	__be32	qpn_msk;
+	enum	mlx4_flow_roce_type roce_type;
 	u8	dst_gid[16];
 	u8	dst_gid_msk[16];
 };
-- 
2.1.0
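The IPv4 and IPv6 cases in mlx4_trans_to_dmfs_attach differ in how much of the destination GID the rule matches: only the last four bytes for IPv4, all sixteen for IPv6. The userspace sketch below models that masking. The demo_* types and functions are illustrative assumptions, and demo_spec_matches is a guess at how a value/mask pair is typically applied rather than a description of the hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the two RoCE flow types added by the patch. */
enum demo_roce_type { DEMO_ROCE_IPV6 = 0, DEMO_ROCE_IPV4 };

struct demo_ib_spec {
	enum demo_roce_type roce_type;
	uint8_t dst_gid[16];
	uint8_t dst_gid_msk[16];
};

/* Fills value and mask the way the two switch cases in the patch do. */
static void demo_fill_spec(struct demo_ib_spec *spec, const uint8_t gid[16],
			   enum demo_roce_type type)
{
	memset(spec, 0, sizeof(*spec));
	spec->roce_type = type;
	if (type == DEMO_ROCE_IPV4) {
		memcpy(spec->dst_gid + 12, gid + 12, 4);
		memset(spec->dst_gid_msk + 12, 0xff, 4);
	} else {
		memcpy(spec->dst_gid, gid, 16);
		memset(spec->dst_gid_msk, 0xff, 16);
	}
}

/* A GID matches when it equals the value on every masked bit. */
static int demo_spec_matches(const struct demo_ib_spec *spec,
			     const uint8_t gid[16])
{
	for (int i = 0; i < 16; i++)
		if ((gid[i] ^ spec->dst_gid[i]) & spec->dst_gid_msk[i])
			return 0;
	return 1;
}
```

Under this model, two GIDs that share the same trailing IPv4 address but differ in their leading bytes both match an IPv4 spec, whereas an IPv6 spec distinguishes them.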
[PATCH v2 for-next 19/32] IB/mlx4: Remove gid table management for RoCE
From: Moni Shoua <mo...@mellanox.com>

RoCE GID table management moved to the InfiniBand core driver. The core
driver is now responsible for populating the GID table and supplying
query and lookup functions for GIDs. HW drivers are only responsible
for modifying the GID table in network adapters. The query_gid hook
should now return the answer from the cache when the link layer is
Ethernet.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c    | 495 +--
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   4 -
 2 files changed, 14 insertions(+), 485 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 6fa5e49..91caffc 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -45,6 +45,7 @@
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
 #include <rdma/ib_addr.h>
+#include <rdma/ib_cache.h>

 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/cmd.h>
@@ -74,13 +75,6 @@ static const char mlx4_ib_version[] =
 	DRV_NAME ": Mellanox ConnectX InfiniBand driver v"
 	DRV_VERSION " (" DRV_RELDATE ")\n";

-struct update_gid_work {
-	struct work_struct	work;
-	union ib_gid		gids[128];
-	struct mlx4_ib_dev     *dev;
-	int			port;
-};
-
 static void do_slave_init(struct mlx4_ib_dev *ibdev, int slave, int do_init);

 static struct workqueue_struct *wq;
@@ -474,23 +468,21 @@ out:
 	return err;
 }

-static int iboe_query_gid(struct ib_device *ibdev, u8 port, int index,
-			  union ib_gid *gid)
-{
-	struct mlx4_ib_dev *dev = to_mdev(ibdev);
-
-	*gid = dev->iboe.gid_table[port - 1][index];
-
-	return 0;
-}
-
 static int mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
 			     union ib_gid *gid)
 {
-	if (rdma_port_get_link_layer(ibdev, port) == IB_LINK_LAYER_INFINIBAND)
+	int ret;
+
+	if (ib_cache_use_roce_gid_cache(ibdev, port))
 		return __mlx4_ib_query_gid(ibdev, port, index, gid, 0);
-	else
-		return iboe_query_gid(ibdev, port, index, gid);
+
+	ret = ib_get_cached_gid(ibdev, port, index, gid, NULL);
+	if (ret == -EAGAIN) {
+		memcpy(gid, &zgid, sizeof(*gid));
+		return 0;
+	}
+
+	return ret;
 }

 int __mlx4_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
@@ -1480,273 +1472,6 @@ static struct device_attribute *mlx4_class_attributes[] = {
 	&dev_attr_board_id
 };

-static void mlx4_addrconf_ifid_eui48(u8 *eui, u16 vlan_id,
-				     struct net_device *dev)
-{
-	memcpy(eui, dev->dev_addr, 3);
-	memcpy(eui + 5, dev->dev_addr + 3, 3);
-	if (vlan_id < 0x1000) {
-		eui[3] = vlan_id >> 8;
-		eui[4] = vlan_id & 0xff;
-	} else {
-		eui[3] = 0xff;
-		eui[4] = 0xfe;
-	}
-	eui[0] ^= 2;
-}
-
-static void update_gids_task(struct work_struct *work)
-{
-	struct update_gid_work *gw = container_of(work, struct update_gid_work, work);
-	struct mlx4_cmd_mailbox *mailbox;
-	union ib_gid *gids;
-	int err;
-	struct mlx4_dev	*dev = gw->dev->dev;
-	int is_bonded = mlx4_is_bonded(dev);
-
-	if (!gw->dev->ib_active)
-		return;
-
-	mailbox = mlx4_alloc_cmd_mailbox(dev);
-	if (IS_ERR(mailbox)) {
-		pr_warn("update gid table failed %ld\n", PTR_ERR(mailbox));
-		return;
-	}
-
-	gids = mailbox->buf;
-	memcpy(gids, gw->gids, sizeof gw->gids);
-
-	err = mlx4_cmd(dev, mailbox->dma, MLX4_SET_PORT_GID_TABLE << 8 | gw->port,
-		       1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
-		       MLX4_CMD_WRAPPED);
-	if (err)
-		pr_warn("set port command failed\n");
-	else
-		if ((gw->port == 1) || !is_bonded)
-			mlx4_ib_dispatch_event(gw->dev,
-					       is_bonded ? 1 : gw->port,
-					       IB_EVENT_GID_CHANGE);
-
-	mlx4_free_cmd_mailbox(dev, mailbox);
-	kfree(gw);
-}
-
-static void reset_gids_task(struct work_struct *work)
-{
-	struct update_gid_work *gw =
-			container_of(work, struct update_gid_work, work);
-	struct mlx4_cmd_mailbox *mailbox;
-	union ib_gid *gids;
-	int err;
-	struct mlx4_dev	*dev = gw->dev->dev;
-
-	if (!gw->dev->ib_active)
-		return;
-
-	mailbox = mlx4_alloc_cmd_mailbox(dev);
-	if (IS_ERR(mailbox)) {
-		pr_warn("reset gid table failed\n");
-		goto free;
-	}
-
-	gids = mailbox->buf;
-	memcpy(gids, gw->gids, sizeof(gw->gids));
-
-	if (mlx4_ib_port_link_layer(&gw->dev->ib_dev, gw->port
RE: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
-----Original Message-----
From: Matan Barak [mailto:mat...@mellanox.com]
Sent: Monday, February 23, 2015 3:47 PM
To: Devesh Sharma; Somnath Kotur; rol...@kernel.org
Cc: linux-rdma@vger.kernel.org
Subject: Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

On 2/23/2015 7:25 AM, Devesh Sharma wrote:

Hi Matan,

Please find a comment inline below:

-Regards
Devesh

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Somnath Kotur
Sent: Friday, February 20, 2015 3:32 AM
To: rol...@kernel.org
Cc: linux-rdma@vger.kernel.org; Matan Barak; Somnath Kotur
Subject: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

From: Matan Barak <mat...@mellanox.com>

Previously, we resolved the dmac and took the smac and vlan from the
resolved address. Changing that into finding a net device that matches
the IP and vlan of the network packet and querying the RoCE GID cache
for this net device, GID and GID type.

ocrdma driver changes were done by Somnath Kotur <somnath.ko...@emulex.com>

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/addr.c           |    3 +-
 drivers/infiniband/core/cm.c             |   30 --
 drivers/infiniband/core/cma.c            |    9 --
 drivers/infiniband/core/core_priv.h      |    4 +-
 drivers/infiniband/core/sa_query.c       |    4 -
 drivers/infiniband/core/ucma.c           |    1 -
 drivers/infiniband/core/uverbs_cmd.c     |    6 +-
 drivers/infiniband/core/verbs.c          |  159 +----
 drivers/infiniband/hw/mlx4/ah.c          |   15 +++-
 drivers/infiniband/hw/mlx4/mad.c         |   12 ++-
 drivers/infiniband/hw/mlx4/mcg.c         |    2 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h     |    2 +-
 drivers/infiniband/hw/mlx4/qp.c          |   42 ++--
 drivers/infiniband/hw/ocrdma/ocrdma.h    |    1 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   20 +++--
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c |   17 ++-
 include/rdma/ib_addr.h                   |    2 +-
 include/rdma/ib_sa.h                     |    2 -
 include/rdma/ib_verbs.h                  |    7 +-
 19 files changed, 183 insertions(+), 155 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f80da50..43af7f5 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -458,7 +458,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr,
 }

 int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
-			       u16 *vlan_id)
+			       u16 *vlan_id, int if_index)
 {
 	int ret = 0;
 	struct rdma_dev_addr dev_addr;
@@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
 		return ret;

 	memset(&dev_addr, 0, sizeof(dev_addr));
+	dev_addr.bound_dev_if = if_index;

 	ctx.addr = &dev_addr;
 	init_completion(&ctx.comp);
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d88f2ae..7974e74 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -178,8 +178,6 @@ struct cm_av {
 	struct ib_ah_attr ah_attr;
 	u16 pkey_index;
 	u8 timeout;
-	u8  valid;
-	u8  smac[ETH_ALEN];
 };

 struct cm_work {
@@ -382,7 +380,6 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 			     &av->ah_attr);
 	av->timeout = path->packet_life_time + 1;

-	av->valid = 1;
 	return 0;
 }

@@ -1563,7 +1560,6 @@ static int cm_req_handler(struct cm_work *work)
 	cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);

 	memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, ETH_ALEN);
-	work->path[0].vlan_id = cm_id_priv->av.ah_attr.vlan_id;
 	ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
 	if (ret) {
 		ib_get_cached_gid(work->port->cm_dev->ib_device,
@@ -3511,32 +3507,6 @@ static int cm_init_qp_rtr_attr(struct cm_id_private *cm_id_priv,
 		*qp_attr_mask = IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU |
 				IB_QP_DEST_QPN | IB_QP_RQ_PSN;
 		qp_attr->ah_attr = cm_id_priv->av.ah_attr;
-		if (!cm_id_priv->av.valid) {
-			spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-			return -EINVAL;
-		}
-		if (cm_id_priv->av.ah_attr.vlan_id != 0xffff) {
-			qp_attr->vlan_id = cm_id_priv->av.ah_attr.vlan_id;
-			*qp_attr_mask |= IB_QP_VID;
-		}
-		if (!is_zero_ether_addr(cm_id_priv->av.smac)) {
-			memcpy(qp_attr->smac, cm_id_priv->av.smac
RE: [PATCH] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.
Shachar,

Yes, it happened by mistake, which I realized and immediately sent out the patch with the correct patch number.

Thanks
Som

-----Original Message-----
From: Shachar Raindel [mailto:rain...@mellanox.com]
Sent: Thursday, February 19, 2015 2:31 PM
To: Somnath Kotur; rol...@kernel.org
Cc: linux-rdma@vger.kernel.org; Devesh Sharma
Subject: RE: [PATCH] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Somnath Kotur
Sent: Friday, February 20, 2015 12:02 AM
To: rol...@kernel.org
Cc: linux-rdma@vger.kernel.org; Somnath Kotur; Devesh Sharma
Subject: [PATCH] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.

Som, the patch number seems to be missing here.

When sending next iteration, please make sure:
- That all patches include the proper numbers
- That the version of the patchset is cleanly indicated in the header.

You can use --subject-prefix="PATCH V2" when using format-patch to make this happen.

Thanks,
--Shachar
RE: [PATCH v1 00/30] IB/Core: Adding support for RoCEV2 Specification
Hi Bart,

Here's the link to the git tree with the patches:
https://github.com/matanb10/linux.git
branch name: rocev2_rc4

Thanks
Som

-----Original Message-----
From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
Sent: Thursday, February 19, 2015 1:47 PM
To: Somnath Kotur; rol...@kernel.org
Cc: linux-rdma@vger.kernel.org
Subject: Re: [PATCH v1 00/30] IB/Core: Adding support for RoCEV2 Specification

On 02/19/15 23:02, Somnath Kotur wrote:
> This series depends on RoCE LAG series (already accepted in net-next tree)

Hello Somnath,

Can you make a git tree available with these patches? These patches do not apply cleanly on Dave Miller's latest net-next branch (git commit ID fece13ca005a5f559147e9424321f4b5e01272b4; Feb 17, 2015).

Thanks,

Bart.
[PATCH 01/30] IB/core: Add RoCE GID cache
From: Matan Barak <mat...@mellanox.com>

In order to manage multiple types, VLANs and MACs per GID, we need to
store them alongside the GID itself. We store the net device as well, as
sometimes GIDs should be handled according to the net device they came
from. Since populating the GID table should be identical for every RoCE
provider, the GID table should be handled in ib_core.

Add a GID cache table that supports lockless find, add and delete of
GIDs. The lockless nature comes from using a unique sequence number per
table entry and detecting that this sequence number did not change while
reading/writing.

When using this RoCE GID cache table, providers must implement a
modify_gid callback. The table is managed exclusively by roce_gid_cache,
and the provider just needs to write the data to the hardware.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/Makefile         |    3 +-
 drivers/infiniband/core/core_priv.h      |   24 ++
 drivers/infiniband/core/roce_gid_cache.c |  511 ++
 drivers/infiniband/hw/mlx4/main.c        |    2 -
 include/rdma/ib_verbs.h                  |   55 -
 5 files changed, 591 insertions(+), 4 deletions(-)
 create mode 100644 drivers/infiniband/core/roce_gid_cache.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index acf7367..9b63bdf 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
 					$(user_access-y)
 
 ib_core-y :=			packer.o ud_header.o verbs.o sysfs.o \
-				device.o fmr_pool.o cache.o netlink.o
+				device.o fmr_pool.o cache.o netlink.o \
+				roce_gid_cache.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
 
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 87d1936..a502daa 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -35,6 +35,7
@@ #include linux/list.h #include linux/spinlock.h +#include net/net_namespace.h #include rdma/ib_verbs.h @@ -51,4 +52,27 @@ void ib_cache_cleanup(void); int ib_resolve_eth_l2_attrs(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int *qp_attr_mask); + +int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index, + union ib_gid *gid, struct ib_gid_attr *attr); + +int roce_gid_cache_find_gid(struct ib_device *ib_dev, union ib_gid *gid, + enum ib_gid_type gid_type, struct net *net, + int if_index, u8 *port, u16 *index); + +int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid, + enum ib_gid_type gid_type, u8 port, + struct net *net, int if_index, u16 *index); + +int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port); + +int roce_add_gid(struct ib_device *ib_dev, u8 port, +union ib_gid *gid, struct ib_gid_attr *attr); + +int roce_del_gid(struct ib_device *ib_dev, u8 port, +union ib_gid *gid, struct ib_gid_attr *attr); + +int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port, +struct net_device *ndev); + #endif /* _CORE_PRIV_H */ diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c new file mode 100644 index 000..8f6af4a --- /dev/null +++ b/drivers/infiniband/core/roce_gid_cache.c @@ -0,0 +1,511 @@ +/* + * Copyright (c) 2015, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT
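The sequence-number scheme described in the [PATCH 01/30] commit message can be sketched in user-space C roughly as follows. This is only an illustration of the general technique (a seqlock-style read with retry), not the code from roce_gid_cache.c: the names gid_entry, entry_write and entry_read are hypothetical, a single writer is assumed, and C11 atomics stand in for whatever primitives the kernel code actually uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define GID_LEN 16

/* Hypothetical table entry: seq is odd while a writer is mid-update. */
struct gid_entry {
	atomic_uint seq;
	_Atomic uint8_t gid[GID_LEN];
};

/* Single-writer update: make seq odd, store the data, make seq even again. */
static void entry_write(struct gid_entry *e, const uint8_t gid[GID_LEN])
{
	unsigned int s = atomic_load_explicit(&e->seq, memory_order_relaxed);

	atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	for (int i = 0; i < GID_LEN; i++)
		atomic_store_explicit(&e->gid[i], gid[i], memory_order_relaxed);
	atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/*
 * Lockless read: copy the data, then check that seq was even and did not
 * change in the meantime. Returns false if a writer interfered, in which
 * case the caller is expected to retry.
 */
static bool entry_read(struct gid_entry *e, uint8_t out[GID_LEN])
{
	unsigned int before = atomic_load_explicit(&e->seq, memory_order_acquire);

	if (before & 1)
		return false;
	for (int i = 0; i < GID_LEN; i++)
		out[i] = atomic_load_explicit(&e->gid[i], memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&e->seq, memory_order_relaxed) == before;
}
```

A reader would typically call entry_read() in a loop until it returns true; the reader never blocks the writer, which is the property the commit message refers to as lockless.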
[PATCH 14/30] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.
1.Check and set port capability flags to indicate RoCEV2 support. 2.Change query_gid hook to return value from IB/Core GID Mgmt APIs. 3.Get rid of all the netdev notifier chain subscription code as well as maintenance of SGID Table in memory. 4.Implement get_netdev hook in driver. Signed-off-by: Somnath Kotur somnath.ko...@emulex.com Signed-off-by: Devesh Sharma devesh.sha...@emulex.com --- drivers/infiniband/hw/ocrdma/ocrdma.h | 10 ++ drivers/infiniband/hw/ocrdma/ocrdma_hw.c|3 + drivers/infiniband/hw/ocrdma/ocrdma_main.c | 233 +-- drivers/infiniband/hw/ocrdma/ocrdma_sli.h | 13 ++ drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 31 - drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |4 + 6 files changed, 63 insertions(+), 231 deletions(-) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h index 16ee36e..97f971a 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma.h +++ b/drivers/infiniband/hw/ocrdma/ocrdma.h @@ -100,6 +100,7 @@ struct ocrdma_dev_attr { u8 local_ca_ack_delay; u8 ird; u8 num_ird_pages; + u8 roce_flags; }; struct ocrdma_dma_mem { @@ -575,4 +576,13 @@ static inline u8 ocrdma_is_enabled_and_synced(u32 state) (state OCRDMA_STATE_FLAG_SYNC); } +static inline bool ocrdma_is_rocev2_supported(struct ocrdma_dev *dev) +{ + return (dev-attr.roce_flags (OCRDMA_L3_TYPE_IPV4 + OCRDMA_ROUDP_FLAGS_SHIFT) || + dev-attr.roce_flags (OCRDMA_L3_TYPE_IPV6 + OCRDMA_ROUDP_FLAGS_SHIFT)) ? 
+ true : false; +} + #endif diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c index c0dda74..cb98911 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c @@ -1112,6 +1112,9 @@ static void ocrdma_get_attr(struct ocrdma_dev *dev, attr-local_ca_ack_delay = (rsp-max_pd_ca_ack_delay OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_MASK) OCRDMA_MBX_QUERY_CFG_CA_ACK_DELAY_SHIFT; + attr-roce_flags = (rsp-max_pd_ca_ack_delay + OCRDMA_MBX_QUERY_CFG_L3_TYPE_MASK) + OCRDMA_MBX_QUERY_CFG_L3_TYPE_SHIFT; attr-max_mw = rsp-max_mw; attr-max_mr = rsp-max_mr; attr-max_mr_size = ((u64)rsp-max_mr_size_hi 32) | diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c index 7a2b59a..a81492f 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c @@ -51,8 +51,6 @@ static LIST_HEAD(ocrdma_dev_list); static DEFINE_SPINLOCK(ocrdma_devlist_lock); static DEFINE_IDR(ocrdma_dev_id); -static union ib_gid ocrdma_zero_sgid; - void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid) { u8 mac_addr[6]; @@ -67,135 +65,6 @@ void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid) guid[6] = mac_addr[4]; guid[7] = mac_addr[5]; } - -static bool ocrdma_add_sgid(struct ocrdma_dev *dev, union ib_gid *new_sgid) -{ - int i; - unsigned long flags; - - memset(ocrdma_zero_sgid, 0, sizeof(union ib_gid)); - - - spin_lock_irqsave(dev-sgid_lock, flags); - for (i = 0; i OCRDMA_MAX_SGID; i++) { - if (!memcmp(dev-sgid_tbl[i], ocrdma_zero_sgid, - sizeof(union ib_gid))) { - /* found free entry */ - memcpy(dev-sgid_tbl[i], new_sgid, - sizeof(union ib_gid)); - spin_unlock_irqrestore(dev-sgid_lock, flags); - return true; - } else if (!memcmp(dev-sgid_tbl[i], new_sgid, - sizeof(union ib_gid))) { - /* entry already present, no addition is required. 
*/ - spin_unlock_irqrestore(dev-sgid_lock, flags); - return false; - } - } - spin_unlock_irqrestore(dev-sgid_lock, flags); - return false; -} - -static bool ocrdma_del_sgid(struct ocrdma_dev *dev, union ib_gid *sgid) -{ - int found = false; - int i; - unsigned long flags; - - - spin_lock_irqsave(dev-sgid_lock, flags); - /* first is default sgid, which cannot be deleted. */ - for (i = 1; i OCRDMA_MAX_SGID; i++) { - if (!memcmp(dev-sgid_tbl[i], sgid, sizeof(union ib_gid))) { - /* found matching entry */ - memset(dev-sgid_tbl[i], 0, sizeof(union ib_gid)); - found = true; - break; - } - } - spin_unlock_irqrestore(dev-sgid_lock, flags); - return found; -} - -static int ocrdma_addr_event(unsigned long event
[PATCH 13/30] IB/Core: Changes to the IB Core infrastructure for RoCEv2 support
1. Choose sgid_index and type from all the matching entries in RDMA-CM
   based on hint from the IP stack.
2. Set hop_limit for the IP Packet based on above hint from IP stack
3. Define a RDMA_NETWORK enum type.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Matan Barak <mat...@mellanox.com>
---
 drivers/infiniband/core/addr.c  |    8 ++++++++
 drivers/infiniband/core/cma.c   |   10 +-
 drivers/infiniband/core/verbs.c |   70 +--
 include/rdma/ib_addr.h          |    1 +
 include/rdma/ib_verbs.h         |    6 +++
 5 files changed, 62 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 43af7f5..da24c0e 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -257,6 +257,9 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 		goto put;
 	}
 
+	if (rt->rt_uses_gateway)
+		addr->network = RDMA_NETWORK_IPV4;
+
 	ret = dst_fetch_ha(&rt->dst, addr, &fl4.daddr);
 put:
 	ip_rt_put(rt);
@@ -271,6 +274,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 {
 	struct flowi6 fl6;
 	struct dst_entry *dst;
+	struct rt6_info *rt;
 	int ret;
 
 	memset(&fl6, 0, sizeof fl6);
@@ -282,6 +286,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 	if ((ret = dst->error))
 		goto put;
 
+	rt = (struct rt6_info *)dst;
 	if (ipv6_addr_any(&fl6.saddr)) {
 		ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
 					 &fl6.daddr, 0, &fl6.saddr);
@@ -305,6 +310,9 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 		goto put;
 	}
 
+	if (rt->rt6i_flags & RTF_GATEWAY)
+		addr->network = RDMA_NETWORK_IPV6;
+
 	ret = dst_fetch_ha(dst, addr, &fl6.daddr);
 put:
 	dst_release(dst);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 237f2dd..50635fe 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1952,6 +1952,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 {
 	struct rdma_route *route = &id_priv->id.route;
 	struct rdma_addr *addr = &route->addr;
+	enum ib_gid_type network_gid_type;
 	struct cma_work *work;
 	int ret;
 	struct net_device *ndev = NULL;
@@ -1990,7 +1991,14 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
 		    &route->path_rec->dgid);
 
-	route->path_rec->hop_limit = 1;
+	/* Use the hint from IP Stack to select GID Type */
+	network_gid_type = ib_network_to_gid_type(addr->dev_addr.network);
+	if (addr->dev_addr.network != RDMA_NETWORK_IB) {
+		route->path_rec->gid_type = network_gid_type;
+		route->path_rec->hop_limit = IPV6_DEFAULT_HOPLIMIT;
+	} else {
+		route->path_rec->hop_limit = 1;
+	}
 	route->path_rec->reversible = 1;
 	route->path_rec->pkey = cpu_to_be16(0xffff);
 	route->path_rec->mtu_selector = IB_SA_EQ;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 0fdac14..5478c5d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -219,37 +219,6 @@ static int ib_get_grh_header_version(const void *h)
 	return 6;
 }
 
-static int ib_get_dgid_sgid_by_grh(const void *h,
-				   enum rdma_network_type net_type,
-				   union ib_gid *dgid, union ib_gid *sgid)
-{
-	switch (net_type) {
-	case RDMA_NETWORK_IPV4: {
-		const struct iphdr *ip4h = (struct iphdr *)(h + 20);
-
-		ipv6_addr_set_v4mapped(ip4h->daddr, (struct in6_addr *)dgid);
-		ipv6_addr_set_v4mapped(ip4h->saddr, (struct in6_addr *)sgid);
-		return 0;
-	}
-	case RDMA_NETWORK_IPV6: {
-		struct ipv6hdr *ip6h = (struct ipv6hdr *)h;
-
-		memcpy(dgid, &ip6h->daddr, sizeof(*dgid));
-		memcpy(sgid, &ip6h->saddr, sizeof(*sgid));
-		return 0;
-	}
-	case RDMA_NETWORK_IB: {
-		struct ib_grh *grh = (struct ib_grh *)h;
-
-		memcpy(dgid, &grh->dgid, sizeof(*dgid));
-		memcpy(sgid, &grh->sgid, sizeof(*sgid));
-		return 0;
-	}
-	}
-
-	return -EINVAL;
-}
-
 static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
 						      u8 port_num,
 						      const struct ib_grh *grh)
@@ -305,6 +274,40 @@ static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num,
 				     context, gid_index);
 }
 
+static int get_gids_from_grh(struct ib_grh *grh, enum rdma_network_type
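The GID-type and hop-limit selection that [PATCH 13/30] adds to cma_resolve_iboe_route() can be restated as a small stand-alone C sketch. The enums below are local stand-ins for the kernel types named in the patch, struct path_hint and apply_network_hint() are hypothetical, and the body of network_to_gid_type() is an assumption: the patch calls ib_network_to_gid_type() but this excerpt does not show its definition.

```c
#include <stdint.h>

/* Local stand-ins for the kernel enums named in the patch. */
enum rdma_network_type {
	RDMA_NETWORK_IB,
	RDMA_NETWORK_IPV4,
	RDMA_NETWORK_IPV6
};

enum ib_gid_type {
	IB_GID_TYPE_IB,
	IB_GID_TYPE_ROCE_V2
};

#define IPV6_DEFAULT_HOPLIMIT 64	/* default hop limit of the Linux IPv6 stack */

/* Hypothetical subset of the path record fields touched by the patch. */
struct path_hint {
	enum ib_gid_type gid_type;
	uint8_t hop_limit;
};

/* Assumed mapping: a routed (IPv4/IPv6) destination implies RoCE V2. */
static enum ib_gid_type network_to_gid_type(enum rdma_network_type net)
{
	if (net == RDMA_NETWORK_IPV4 || net == RDMA_NETWORK_IPV6)
		return IB_GID_TYPE_ROCE_V2;
	return IB_GID_TYPE_IB;
}

/* Mirrors the if/else added to cma_resolve_iboe_route() in the diff. */
static void apply_network_hint(struct path_hint *p, enum rdma_network_type net)
{
	if (net != RDMA_NETWORK_IB) {
		p->gid_type = network_to_gid_type(net);
		p->hop_limit = IPV6_DEFAULT_HOPLIMIT;
	} else {
		/* Not routed: keep the GID type as is, stay link-local. */
		p->hop_limit = 1;
	}
}
```

The network hint itself is set in addr4_resolve()/addr6_resolve() only when the route goes through a gateway, so destinations on the local link keep a hop limit of 1.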
[PATCH 00/30] IB/Core: Adding support for RoCEV2 Specification
Hi Roland,

This patch series was created out of collaboration between Emulex and Mellanox. While Emulex sent out the RoCEV2 patch first to the community, Mellanox, which was also working on some core infrastructure changes from the ground up towards RoCEV2, felt that the RoCEV2 patch would be better served if done on top of their basic infrastructure changes to associate entities like MAC, VLAN and IP address with GIDs, and thereby move GID table management from HW vendor drivers to IB/Core. This patchset is the result of a joint development effort between the two teams.

Patch 0001 creates a new infrastructure for storing GIDs and their attributes in IB/core. This infrastructure supports lock-less reads of GIDs using a sequence number. The data structure is initialized only for RoCE ports. Every GID has meta information that describes its related net device and its type.

Patches 0002, 0004 and 0005 add population of this table for various cases based on net device events. We always enable default GIDs for an active device (an active device is defined here as a device that doesn't have a bonding master or is the current active slave). This is done in order to allow loopback traffic.

Patch 0005 adds proper bonding support: only the active slaves retain their master's IP-based GIDs and default GIDs.

This whole concept needs to fit the existing sysfs model, thus patch 0006 adds sysfs entries that represent the net device and GID type related to each GID.

Patches 0002, 0007, 0008 and 0009 change the rest of IB/core to fit the new model. Instead of storing smac and vlan, we store either if_index, gid and gid_type, or sgid_index. Either set suffices in order to resolve all the required Ethernet parameters. ib_init_ah_from_wc was changed so that when a wc arrives, we query all the net devices in all namespaces trying to find a match. This match is later used to find an appropriate sgid_index.

Patch 0010 is used in order to configure the default mode of the cma. In order to avoid changing existing rdma-cm applications, we add a configfs entry that states, for each ib device, what the default RoCE mode is.

Patch 0011 mainly corrects the hop limit value and adds a hint about RoCE type according to whether we have a gateway. This is the patch that makes it possible for applications to seamlessly interop between RoCE V1 and V2 without undergoing any changes themselves.

The rest of the patches add support for ocrdma and mlx4 devices.

This series depends on RoCE LAG series (already accepted in net-next tree)

Thanks,
Somnath, Devesh, Moni and Matan

Devesh Sharma (3):
  RDMA/ocrdma: changes to support RoCE-v2 in UD path
  RDMA/ocrdma: changes to support RoCE-v2 in RC path
  RDMA/ocrdma: changes to support user AH creation

Matan Barak (12):
  IB/core: Add RoCE GID cache
  IB/core: Add kref to IB devices
  IB/core: Add RoCE GID population
  IB/core: Add default GID for RoCE GID Cache
  IB/core: Add RoCE cache bonding support
  IB/core: GID attribute should be returned from verbs API and cache API
  IB/core: Report gid_type and gid_ndev through sysfs
  IB/core: Support find sgid index using a filter function
  IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
  IB/core: Add gid_type to path and rdma_id_private
  IB/core: Add rdma_network_type to wc
  IB/cma: Add configfs for rdma_cm

Moni Shoua (13):
  IB/mlx4: Remove gid table management for RoCE
  IB/mlx4: Replace spin_lock with rw_semaphore
  IB/mlx4: Lock with RCU instead of RTNL
  net/mlx4: Postpone the registration of net_device
  IB/mlx4: Advertise RoCE support in port capabilities
  IB/mlx4: Implement ib_device callback - get_netdev
  IB/mlx4: Implement ib_device callback - modify_gid
  IB/mlx4: Configure device to work in RoCEv2
  IB/mlx4: Translate cache gid index to real index
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
  IB/mlx4: Create and use another QP1 for RoCEv2
  IB/cma: Join and leave multicast groups with IGMP

Somnath Kotur (2):
IB/Core: Changes to the IB Core infrastructure for RoCEv2 support RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core. drivers/infiniband/core/Makefile |5 +- drivers/infiniband/core/addr.c | 11 +- drivers/infiniband/core/cache.c| 249 +++-- drivers/infiniband/core/cm.c | 49 +-- drivers/infiniband/core/cma.c | 229 ++-- drivers/infiniband/core/cma_configfs.c | 222 +++ drivers/infiniband/core/core_priv.h| 88 +++- drivers/infiniband/core/device.c | 150 +- drivers/infiniband/core/mad.c |2 +- drivers/infiniband/core/multicast.c|3 +- drivers/infiniband/core/roce_gid_cache.c | 755 drivers/infiniband/core/roce_gid_mgmt.c| 703 ++ drivers
[PATCH 05/30] IB/core: Add RoCE cache bonding support
From: Matan Barak mat...@mellanox.com Bonding is a unique behavior since when working in active-backup mode, only the current selected slave should occupy the default GIDs and the master's GID. Listening to bonding events and only adding the required GIDs to the active slave in the RoCE cache GID table. Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/roce_gid_mgmt.c | 137 ++- drivers/net/bonding/bond_options.c | 13 --- include/net/bonding.h |7 ++ 3 files changed, 140 insertions(+), 17 deletions(-) diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c index b65eab8..e724295 100644 --- a/drivers/infiniband/core/roce_gid_mgmt.c +++ b/drivers/infiniband/core/roce_gid_mgmt.c @@ -37,6 +37,7 @@ /* For in6_dev_get/in6_dev_put */ #include net/addrconf.h +#include net/bonding.h #include rdma/ib_cache.h #include rdma/ib_addr.h @@ -127,12 +128,40 @@ static void update_gid(enum gid_op_type gid_op, struct ib_device *ib_dev, } } +#define IS_NETDEV_BONDING_MASTER(ndev) \ + (((ndev)-priv_flags \ + (IFF_BONDING | IFF_MASTER)) == (IFF_BONDING | IFF_MASTER)) + +enum bonding_slave_state { + BONDING_SLAVE_STATE_ACTIVE, + BONDING_SLAVE_STATE_INACTIVE, + BONDING_SLAVE_STATE_NA +}; + +static enum bonding_slave_state is_eth_active_slave_of_bonding(struct net_device *idev, + struct net_device *upper) +{ + if (upper IS_NETDEV_BONDING_MASTER(upper)) { + struct net_device *pdev; + + rcu_read_lock(); + pdev = bond_option_active_slave_get_rcu(netdev_priv(upper)); + rcu_read_unlock(); + if (pdev) + return idev == pdev ? 
BONDING_SLAVE_STATE_ACTIVE : + BONDING_SLAVE_STATE_INACTIVE; + } + + return BONDING_SLAVE_STATE_NA; +} + static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port, struct net_device *idev, void *cookie) { struct net_device *rdev; struct net_device *mdev; struct net_device *ndev = (struct net_device *)cookie; + int res; if (!idev) return 0; @@ -140,9 +169,16 @@ static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port, rcu_read_lock(); mdev = netdev_master_upper_dev_get_rcu(idev); rdev = rdma_vlan_dev_real_dev(ndev); - rcu_read_unlock(); + if (!rdev) + rdev = ndev; - return (rdev ? rdev : ndev) == (mdev ? mdev : idev); + res = (rdev == idev || + (rdev == mdev + is_eth_active_slave_of_bonding(idev, mdev) != + BONDING_SLAVE_STATE_INACTIVE)); + + rcu_read_unlock(); + return res; } static int pass_all_filter(struct ib_device *ib_dev, u8 port, @@ -151,6 +187,26 @@ static int pass_all_filter(struct ib_device *ib_dev, u8 port, return 1; } +static int bonding_slaves_filter(struct ib_device *ib_dev, u8 port, +struct net_device *idev, void *cookie) +{ + struct net_device *mdev; + struct net_device *rdev; + struct net_device *ndev = (struct net_device *)cookie; + + rdev = rdma_vlan_dev_real_dev(ndev); + + ndev = rdev ? 
rdev : ndev; + if (!idev || !IS_NETDEV_BONDING_MASTER(ndev)) + return 0; + + rcu_read_lock(); + mdev = netdev_master_upper_dev_get_rcu(idev); + rcu_read_unlock(); + + return ndev == mdev; +} + static void netdevice_event_work_handler(struct work_struct *_work) { struct netdev_event_work *work = @@ -186,8 +242,16 @@ static void enum_netdev_default_gids(struct ib_device *ib_dev, { unsigned long gid_type_mask; - if (idev != ndev) + rcu_read_lock(); + if (!idev || + ((idev != ndev netdev_master_upper_dev_get_rcu(idev) != ndev) || +is_eth_active_slave_of_bonding(idev, + netdev_master_upper_dev_get_rcu(idev)) == +BONDING_SLAVE_STATE_INACTIVE)) { + rcu_read_unlock(); return; + } + rcu_read_unlock(); gid_type_mask = gid_type_mask_support(ib_dev, port); @@ -195,6 +259,35 @@ static void enum_netdev_default_gids(struct ib_device *ib_dev, ROCE_GID_CACHE_DEFAULT_MODE_SET); } +static void bond_delete_netdev_default_gids(struct ib_device *ib_dev, + u8 port, struct net_device *ndev, + struct net_device *idev) +{ + struct net_device *upper; + + if (!idev) + return; + + rcu_read_lock(); + upper
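The three-way result of is_eth_active_slave_of_bonding() in [PATCH 05/30] can be modeled outside the kernel as below. This is a simplified sketch under stated assumptions: struct netdev, its is_bond_master and active_slave fields, and slave_state() are hypothetical stand-ins for struct net_device, the IS_NETDEV_BONDING_MASTER() check and bond_option_active_slave_get_rcu(); RCU locking is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct netdev {
	bool is_bond_master;			/* models IS_NETDEV_BONDING_MASTER() */
	const struct netdev *active_slave;	/* meaningful on a bonding master only */
};

enum bonding_slave_state {
	BONDING_SLAVE_STATE_ACTIVE,
	BONDING_SLAVE_STATE_INACTIVE,
	BONDING_SLAVE_STATE_NA
};

/*
 * Classify idev relative to its upper device:
 *  - NA       if upper is missing, is not a bonding master, or has no
 *             active slave selected,
 *  - ACTIVE   if idev is the currently selected slave,
 *  - INACTIVE otherwise.
 */
static enum bonding_slave_state slave_state(const struct netdev *idev,
					    const struct netdev *upper)
{
	if (upper && upper->is_bond_master && upper->active_slave)
		return idev == upper->active_slave ?
			BONDING_SLAVE_STATE_ACTIVE :
			BONDING_SLAVE_STATE_INACTIVE;

	return BONDING_SLAVE_STATE_NA;
}
```

In the patch, callers such as is_eth_port_of_netdev() and enum_netdev_default_gids() only reject the INACTIVE case, so a device that is not part of a bond (NA) is treated the same way as an active slave.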
[PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
From: Matan Barak mat...@mellanox.com Previously, we resolved the dmac and took the smac and vlan from the resolved address. Changing that into finding a net device that matches the IP and vlan of the network packet and querying the RoCE GID cache for this net device, GID and GID type. ocrdma driver changes were done by Somnath Kotur somnath.ko...@emulex.com Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/addr.c |3 +- drivers/infiniband/core/cm.c | 30 -- drivers/infiniband/core/cma.c|9 -- drivers/infiniband/core/core_priv.h |4 +- drivers/infiniband/core/sa_query.c |4 - drivers/infiniband/core/ucma.c |1 - drivers/infiniband/core/uverbs_cmd.c |6 +- drivers/infiniband/core/verbs.c | 159 + drivers/infiniband/hw/mlx4/ah.c | 15 +++- drivers/infiniband/hw/mlx4/mad.c | 12 ++- drivers/infiniband/hw/mlx4/mcg.c |2 +- drivers/infiniband/hw/mlx4/mlx4_ib.h |2 +- drivers/infiniband/hw/mlx4/qp.c | 42 ++-- drivers/infiniband/hw/ocrdma/ocrdma.h|1 + drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 20 +++-- drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 17 ++- include/rdma/ib_addr.h |2 +- include/rdma/ib_sa.h |2 - include/rdma/ib_verbs.h |7 +- 19 files changed, 183 insertions(+), 155 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index f80da50..43af7f5 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -458,7 +458,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr, } int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac, - u16 *vlan_id) + u16 *vlan_id, int if_index) { int ret = 0; struct rdma_dev_addr dev_addr; @@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac, return ret; memset(dev_addr, 0, sizeof(dev_addr)); + dev_addr.bound_dev_if = if_index; ctx.addr = dev_addr; init_completion(ctx.comp); diff --git a/drivers/infiniband/core/cm.c 
b/drivers/infiniband/core/cm.c index d88f2ae..7974e74 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -178,8 +178,6 @@ struct cm_av { struct ib_ah_attr ah_attr; u16 pkey_index; u8 timeout; - u8 valid; - u8 smac[ETH_ALEN]; }; struct cm_work { @@ -382,7 +380,6 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av) av-ah_attr); av-timeout = path-packet_life_time + 1; - av-valid = 1; return 0; } @@ -1563,7 +1560,6 @@ static int cm_req_handler(struct cm_work *work) cm_format_paths_from_req(req_msg, work-path[0], work-path[1]); memcpy(work-path[0].dmac, cm_id_priv-av.ah_attr.dmac, ETH_ALEN); - work-path[0].vlan_id = cm_id_priv-av.ah_attr.vlan_id; ret = cm_init_av_by_path(work-path[0], cm_id_priv-av); if (ret) { ib_get_cached_gid(work-port-cm_dev-ib_device, @@ -3511,32 +3507,6 @@ static int cm_init_qp_rtr_attr(struct cm_id_private *cm_id_priv, *qp_attr_mask = IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU | IB_QP_DEST_QPN | IB_QP_RQ_PSN; qp_attr-ah_attr = cm_id_priv-av.ah_attr; - if (!cm_id_priv-av.valid) { - spin_unlock_irqrestore(cm_id_priv-lock, flags); - return -EINVAL; - } - if (cm_id_priv-av.ah_attr.vlan_id != 0x) { - qp_attr-vlan_id = cm_id_priv-av.ah_attr.vlan_id; - *qp_attr_mask |= IB_QP_VID; - } - if (!is_zero_ether_addr(cm_id_priv-av.smac)) { - memcpy(qp_attr-smac, cm_id_priv-av.smac, - sizeof(qp_attr-smac)); - *qp_attr_mask |= IB_QP_SMAC; - } - if (cm_id_priv-alt_av.valid) { - if (cm_id_priv-alt_av.ah_attr.vlan_id != 0x) { - qp_attr-alt_vlan_id = - cm_id_priv-alt_av.ah_attr.vlan_id; - *qp_attr_mask |= IB_QP_ALT_VID; - } - if (!is_zero_ether_addr(cm_id_priv-alt_av.smac)) { - memcpy(qp_attr-alt_smac, - cm_id_priv-alt_av.smac, - sizeof(qp_attr-alt_smac)); - *qp_attr_mask |= IB_QP_ALT_SMAC
[PATCH 07/30] IB/core: Report gid_type and gid_ndev through sysfs
From: Matan Barak mat...@mellanox.com Since we've added GID attributes to the RoCE GID table, the users need a convenient way to query them. Adding the GID type and relate net device to IB's sysfs. The new attributes are available in: /sys/class/infiniband/device/ports/port/gid_attrs/ndevs/index /sys/class/infiniband/device/ports/port/gid_attrs/types/index The index corresponds to the index of the respective GID in: /sys/class/infiniband/device/ports/port/gids/index Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/core_priv.h |2 + drivers/infiniband/core/roce_gid_cache.c | 13 ++ drivers/infiniband/core/sysfs.c | 185 +- 3 files changed, 198 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h index 6ab40a9..411672f 100644 --- a/drivers/infiniband/core/core_priv.h +++ b/drivers/infiniband/core/core_priv.h @@ -71,6 +71,8 @@ void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter, roce_netdev_callback cb, void *cookie); +const char *roce_gid_cache_type_str(enum ib_gid_type gid_type); + int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index, union ib_gid *gid, struct ib_gid_attr *attr); diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c index fc6a4e6..895b9c1 100644 --- a/drivers/infiniband/core/roce_gid_cache.c +++ b/drivers/infiniband/core/roce_gid_cache.c @@ -48,6 +48,11 @@ enum gid_attr_find_mask { GID_ATTR_FIND_MASK_NETDEV = 1UL 1, }; +static const char * const gid_type_str[] = { + [IB_GID_TYPE_IB]= IB/RoCE V1\n, + [IB_GID_TYPE_ROCE_V2] = RoCE V2\n, +}; + static inline int start_port(struct ib_device *ib_dev) { return (ib_dev-node_type == RDMA_NODE_IB_SWITCH) ? 
0 : 1; @@ -58,6 +63,14 @@ struct dev_put_rcu { struct net_device *ndev; }; +const char *roce_gid_cache_type_str(enum ib_gid_type gid_type) +{ + if (gid_type ARRAY_SIZE(gid_type_str) gid_type_str[gid_type]) + return gid_type_str[gid_type]; + + return Invalid GID type; +} + static void put_ndev(struct rcu_head *rcu) { struct dev_put_rcu *put_rcu = diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c index 5cee246..51f0e32 100644 --- a/drivers/infiniband/core/sysfs.c +++ b/drivers/infiniband/core/sysfs.c @@ -37,12 +37,22 @@ #include linux/slab.h #include linux/stat.h #include linux/string.h +#include linux/netdevice.h #include rdma/ib_mad.h +struct ib_port; + +struct gid_attr_group { + struct ib_port *port; + struct kobject kobj; + struct attribute_group ndev; + struct attribute_group type; +}; struct ib_port { struct kobject kobj; struct ib_device *ibdev; + struct gid_attr_group *gid_attr_group; struct attribute_group gid_group; struct attribute_group pkey_group; u8 port_num; @@ -84,6 +94,24 @@ static const struct sysfs_ops port_sysfs_ops = { .show = port_attr_show }; +static ssize_t gid_attr_show(struct kobject *kobj, +struct attribute *attr, char *buf) +{ + struct port_attribute *port_attr = + container_of(attr, struct port_attribute, attr); + struct ib_port *p = container_of(kobj, struct gid_attr_group, +kobj)-port; + + if (!port_attr-show) + return -EIO; + + return port_attr-show(p, port_attr, buf); +} + +static const struct sysfs_ops gid_attr_sysfs_ops = { + .show = gid_attr_show +}; + static ssize_t state_show(struct ib_port *p, struct port_attribute *unused, char *buf) { @@ -281,6 +309,46 @@ static struct attribute *port_default_attrs[] = { NULL }; +static size_t print_ndev(struct ib_gid_attr *gid_attr, char *buf) +{ + if (!gid_attr-ndev) + return -EINVAL; + + return sprintf(buf, %s\n, gid_attr-ndev-name); +} + +static size_t print_gid_type(struct ib_gid_attr *gid_attr, char *buf) +{ + return sprintf(buf, %s, 
roce_gid_cache_type_str(gid_attr-gid_type)); +} + +static ssize_t _show_port_gid_attr(struct ib_port *p, + struct port_attribute *attr, + char *buf, + size_t (*print)(struct ib_gid_attr *gid_attr, + char *buf)) +{ + struct port_table_attribute *tab_attr = + container_of(attr, struct port_table_attribute, attr); + union ib_gid gid; + struct ib_gid_attr gid_attr; + ssize_t ret; + va_list args; + + rcu_read_lock
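The type-string lookup in this patch guards against two things: an index past the end of the table, and a gap inside a sparsely initialized table. Below is a minimal user-space sketch of that pattern. The names `sketch_gid_type` and `sketch_gid_type_str` and the enum numbering are hypothetical, chosen to leave a gap for illustration, and are not taken from the kernel headers.

```c
#include <stddef.h>

/* Hypothetical GID type values; the gap at 1 is deliberate for the demo. */
enum sketch_gid_type {
	SKETCH_GID_TYPE_IB = 0,
	SKETCH_GID_TYPE_ROCE_V2 = 2,
};

/* Sparse table: slots not named here are implicitly NULL. */
static const char *const sketch_gid_type_names[] = {
	[SKETCH_GID_TYPE_IB]      = "IB/RoCE V1",
	[SKETCH_GID_TYPE_ROCE_V2] = "RoCE V2",
};

static const char *sketch_gid_type_str(unsigned int gid_type)
{
	size_t n = sizeof(sketch_gid_type_names) /
		   sizeof(sketch_gid_type_names[0]);

	/* Check the bound first, then the (possibly NULL) slot. */
	if (gid_type < n && sketch_gid_type_names[gid_type])
		return sketch_gid_type_names[gid_type];

	return "Invalid GID type";
}
```

Both conditions are needed: the bound check alone would hand back a NULL pointer for the unnamed slot.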
[PATCH 04/30] IB/core: Add default GID for RoCE GID Cache
From: Matan Barak <mat...@mellanox.com>

When RoCE is used, a default GID address should be generated for every
supported RoCE type. These default GID addresses are generated based on
the IPv6 link-local address. Unlike the GIDs based on the regular IPv6
link-local address (we generate one GID per IP address), these GIDs are
also available when the net device is down (in order to support
loopback). Moreover, these default GID addresses can't be deleted.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/core_priv.h      |   10
 drivers/infiniband/core/roce_gid_cache.c |   86 ++
 drivers/infiniband/core/roce_gid_mgmt.c  |   43 ---
 include/net/addrconf.h                   |   31 +++
 net/ipv6/addrconf.c                      |   31 ---
 5 files changed, 163 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 12797d9..6ab40a9 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -84,6 +84,16 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid,

 int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port);

+enum roce_gid_cache_default_mode {
+	ROCE_GID_CACHE_DEFAULT_MODE_SET,
+	ROCE_GID_CACHE_DEFAULT_MODE_DELETE
+};
+
+void roce_gid_cache_set_default_gid(struct ib_device *ib_dev, u8 port,
+				    struct net_device *ndev,
+				    unsigned long gid_type_mask,
+				    enum roce_gid_cache_default_mode mode);
+
 int roce_gid_cache_setup(void);
 void roce_gid_cache_cleanup(void);

diff --git a/drivers/infiniband/core/roce_gid_cache.c b/drivers/infiniband/core/roce_gid_cache.c
index f072533..fc6a4e6 100644
--- a/drivers/infiniband/core/roce_gid_cache.c
+++ b/drivers/infiniband/core/roce_gid_cache.c
@@ -34,6 +34,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <rdma/ib_cache.h>
+#include <net/addrconf.h>

 #include "core_priv.h"

@@ -176,12 +177,19 @@ static int find_gid(struct ib_roce_gid_cache *cache, union ib_gid *gid,
 	return -1;
 }

+static void make_default_gid(struct net_device *dev, union ib_gid *gid)
+{
+	gid->global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
+	addrconf_ifid_eui48(&gid->raw[8], dev);
+}
+
 int roce_add_gid(struct ib_device *ib_dev, u8 port,
 		 union ib_gid *gid, struct ib_gid_attr *attr)
 {
 	struct ib_roce_gid_cache *cache;
 	int ix;
 	int ret = 0;
+	struct net_device *idev;

 	if (!ib_dev->cache.roce_gid_cache)
 		return -ENOSYS;
@@ -191,6 +199,22 @@ int roce_add_gid(struct ib_device *ib_dev, u8 port,

 	if (!cache->active)
 		return -ENOSYS;

+	if (ib_dev->get_netdev) {
+		rcu_read_lock();
+		idev = ib_dev->get_netdev(ib_dev, port);
+		if (attr->ndev != idev) {
+			union ib_gid default_gid;
+
+			/* Adding default GIDs in not permitted */
+			make_default_gid(idev, &default_gid);
+			if (!memcmp(gid, &default_gid, sizeof(*gid))) {
+				rcu_read_unlock();
+				return -EPERM;
+			}
+		}
+		rcu_read_unlock();
+	}
+
 	mutex_lock(&cache->lock);
 	ix = find_gid(cache, gid, attr, GID_ATTR_FIND_MASK_GID_TYPE |
@@ -215,6 +239,7 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,
 		 union ib_gid *gid, struct ib_gid_attr *attr)
 {
 	struct ib_roce_gid_cache *cache;
+	union ib_gid default_gid;
 	int ix;

 	if (!ib_dev->cache.roce_gid_cache)
@@ -225,6 +250,13 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port,

 	if (!cache->active)
 		return -ENOSYS;

+	if (attr->ndev) {
+		/* Deleting default GIDs in not permitted */
+		make_default_gid(attr->ndev, &default_gid);
+		if (!memcmp(gid, &default_gid, sizeof(*gid)))
+			return -EPERM;
+	}
+
 	mutex_lock(&cache->lock);
 	ix = find_gid(cache, gid, attr,
@@ -437,6 +469,60 @@ static void set_roce_gid_cache_active(struct ib_roce_gid_cache *cache,
 	cache->active = active;
 }

+void roce_gid_cache_set_default_gid(struct ib_device *ib_dev, u8 port,
+				    struct net_device *ndev,
+				    unsigned long gid_type_mask,
+				    enum roce_gid_cache_default_mode mode)
+{
+	union ib_gid gid;
+	struct ib_gid_attr gid_attr;
+	struct ib_roce_gid_cache *cache;
+	unsigned int gid_type;
+	unsigned int gid_index = 0;
+
+	cache = ib_dev->cache.roce_gid_cache[port - 1
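The default GID in this patch is the link-local prefix fe80::/64 followed by an interface ID derived from the port's MAC address. Below is a minimal user-space sketch of that derivation. It assumes the usual modified EUI-64 rule (ff:fe inserted between the two MAC halves, universal/local bit flipped) with no device-specific adjustment. `sketch_default_gid` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a 16-byte link-local GID from a 6-byte MAC. */
static void sketch_default_gid(const uint8_t mac[6], uint8_t gid[16])
{
	assert(mac != NULL && gid != NULL);

	memset(gid, 0, 16);
	gid[0] = 0xfe;            /* subnet prefix fe80::/64 */
	gid[1] = 0x80;

	gid[8]  = mac[0] ^ 0x02;  /* flip the universal/local bit */
	gid[9]  = mac[1];
	gid[10] = mac[2];
	gid[11] = 0xff;           /* ff:fe filler between the MAC halves */
	gid[12] = 0xfe;
	gid[13] = mac[3];
	gid[14] = mac[4];
	gid[15] = mac[5];
}
```

Because the result depends only on the MAC address, it can be computed while the net device is down, which fits the loopback use case in the commit message.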
[PATCH 23/30] IB/mlx4: Implement ib_device callback - get_netdev
From: Moni Shoua <mo...@mellanox.com>

This is a new callback that is required for RoCEv2 support.
In port aggregation mode the callback must return the netdev of the
active port, so the mlx4 core driver needs support for identifying
that port.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c         |   17 +
 drivers/net/ethernet/mellanox/mlx4/main.c |   18 ++
 include/linux/mlx4/driver.h               |    1 +
 3 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bf87a95..38061a0 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1527,6 +1527,22 @@ unlock:
 	mutex_unlock(&ibdev->qp1_proxy_lock[port - 1]);
 }

+static struct net_device *mlx4_ib_get_netdev(struct ib_device *device, u8 port_num)
+{
+	struct mlx4_ib_dev *ibdev = to_mdev(device);
+
+	if (mlx4_is_bonded(ibdev->dev)) {
+		u8 true_port_num;
+
+		if (!mlx4_port_map_get(ibdev->dev, port_num, &true_port_num))
+			port_num = true_port_num;
+		else
+			return NULL;
+	}
+
+	return mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port_num);
+}
+
 static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 				 struct net_device *dev,
 				 unsigned long event)
@@ -1806,6 +1822,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	ibdev->ib_dev.attach_mcast	= mlx4_ib_mcg_attach;
 	ibdev->ib_dev.detach_mcast	= mlx4_ib_mcg_detach;
 	ibdev->ib_dev.process_mad	= mlx4_ib_process_mad;
+	ibdev->ib_dev.get_netdev	= mlx4_ib_get_netdev;

 	if (!mlx4_is_slave(ibdev->dev)) {
 		ibdev->ib_dev.alloc_fmr		= mlx4_ib_fmr_alloc;
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 1893a57..6311897 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1237,6 +1237,24 @@ int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p)
 }
 EXPORT_SYMBOL_GPL(mlx4_port_map_set);

+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+
+	if (!pport)
+		return -EINVAL;
+	*pport = 0;
+
+	if (vport == 1)
+		*pport = priv->v2p.port1;
+	else if (vport == 2)
+		*pport = priv->v2p.port2;
+	if (!*pport)
+		return -EINVAL;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mlx4_port_map_get);
+
 static int mlx4_load_fw(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 5a06d96..a992971 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -81,6 +81,7 @@ struct mlx4_port_map {
 };

 int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p);
+int mlx4_port_map_get(struct mlx4_dev *dev, u8 vport, u8 *pport);

 void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, int port);

--
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
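The virtual-to-physical port lookup in this patch can be exercised outside the kernel. The sketch below mirrors its control flow in user space. `struct sketch_port_map` and `sketch_port_map_get` are hypothetical stand-ins for the driver's private state, and the convention that 0 means "no mapping" is inferred from the patch rather than documented.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's virtual-to-physical port map. */
struct sketch_port_map {
	uint8_t port1;	/* physical port behind virtual port 1, 0 if unset */
	uint8_t port2;	/* physical port behind virtual port 2, 0 if unset */
};

static int sketch_port_map_get(const struct sketch_port_map *map,
			       uint8_t vport, uint8_t *pport)
{
	assert(map != NULL);	/* a missing map is a programming error */

	if (!pport)
		return -EINVAL;

	*pport = 0;
	if (vport == 1)
		*pport = map->port1;
	else if (vport == 2)
		*pport = map->port2;

	/* Unknown virtual port or unset mapping both leave *pport at 0. */
	return *pport ? 0 : -EINVAL;
}
```

A caller in the style of the get_netdev callback would fall back to "no netdev" whenever this returns an error, instead of using the unmapped port number.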
[PATCH 26/30] IB/mlx4: Translate cache gid index to real index
From: Moni Shoua <mo...@mellanox.com>

When a QP is modified with a path, the given sgid_index is not
necessarily the index that the HW knows. This is due to optimizations
that save space in the HW table. Therefore, translation is required.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/qp.c |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 9731c07..b06e9fc 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1256,14 +1256,18 @@ static int _mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 	path->static_rate = 0;

 	if (ah->ah_flags & IB_AH_GRH) {
-		if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len[port]) {
+		int real_sgid_index = mlx4_ib_gid_index_to_real_index(dev,
+								      port,
+								      ah->grh.sgid_index);
+
+		if (real_sgid_index >= dev->dev->caps.gid_table_len[port]) {
 			pr_err("sgid_index (%u) too large. max is %d\n",
-			       ah->grh.sgid_index, dev->dev->caps.gid_table_len[port] - 1);
+			       real_sgid_index, dev->dev->caps.gid_table_len[port] - 1);
 			return -1;
 		}

 		path->grh_mylmc |= 1 << 7;
-		path->mgid_index = ah->grh.sgid_index;
+		path->mgid_index = real_sgid_index;
 		path->hop_limit  = ah->grh.hop_limit;
 		path->tclass_flowlabel =
 			cpu_to_be32((ah->grh.traffic_class << 20) |
--
1.7.1
[PATCH 11/30] IB/core: Add rdma_network_type to wc
From: Matan Barak mat...@mellanox.com Providers should tell IB core the wc's network type. This is used in order to search for the proper GID in the GID table. When using HCAs that can't provide this info, IB core tries to deep examine the packet and extract the GID type by itself. Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/verbs.c | 106 +-- include/rdma/ib_verbs.h | 30 +++ 2 files changed, 131 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 2c54d31..0fdac14 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -195,8 +195,84 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr) } EXPORT_SYMBOL(ib_create_ah); +static int ib_get_grh_header_version(const void *h) +{ + const struct iphdr *ip4h = (struct iphdr *)(h + 20); + struct iphdr ip4h_checked; + const struct ipv6hdr *ip6h = (struct ipv6hdr *)h; + + if (ip6h-version != 6) + return (ip4h-version == 4) ? 4 : 0; + /* version may be 6 or 4 */ + if (ip4h-ihl != 5) /* IPv4 header length must be 5 for RR */ + return 6; + /* Verify checksum. + We can't write on scattered buffers so we need to copy to + temp buffer. 
+*/ + memcpy(ip4h_checked, ip4h, sizeof(ip4h_checked)); + ip4h_checked.check = 0; + ip4h_checked.check = ip_fast_csum((u8 *)ip4h_checked, 5); + /* if IPv4 header checksum is OK, bellive it */ + if (ip4h-check == ip4h_checked.check) + return 4; + return 6; +} + +static int ib_get_dgid_sgid_by_grh(const void *h, + enum rdma_network_type net_type, + union ib_gid *dgid, union ib_gid *sgid) +{ + switch (net_type) { + case RDMA_NETWORK_IPV4: { + const struct iphdr *ip4h = (struct iphdr *)(h + 20); + + ipv6_addr_set_v4mapped(ip4h-daddr, (struct in6_addr *)dgid); + ipv6_addr_set_v4mapped(ip4h-saddr, (struct in6_addr *)sgid); + return 0; + } + case RDMA_NETWORK_IPV6: { + struct ipv6hdr *ip6h = (struct ipv6hdr *)h; + + memcpy(dgid, ip6h-daddr, sizeof(*dgid)); + memcpy(sgid, ip6h-saddr, sizeof(*sgid)); + return 0; + } + case RDMA_NETWORK_IB: { + struct ib_grh *grh = (struct ib_grh *)h; + + memcpy(dgid, grh-dgid, sizeof(*dgid)); + memcpy(sgid, grh-sgid, sizeof(*sgid)); + return 0; + } + } + + return -EINVAL; +} + +static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device, +u8 port_num, +const struct ib_grh *grh) +{ + int grh_version; + + if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) + return RDMA_NETWORK_IB; + + grh_version = ib_get_grh_header_version(grh); + + if (grh_version == 4) + return RDMA_NETWORK_IPV4; + + if (grh-next_hdr == IPPROTO_UDP) + return RDMA_NETWORK_IPV6; + + return RDMA_NETWORK_IB; +} + struct find_gid_index_context { u16 vlan_id; + enum ib_gid_type gid_type; }; static bool find_gid_index(const union ib_gid *gid, @@ -206,6 +282,9 @@ static bool find_gid_index(const union ib_gid *gid, struct find_gid_index_context *ctx = (struct find_gid_index_context *)context; + if (ctx-gid_type != gid_attr-gid_type) + return false; + if ((!!(ctx-vlan_id != 0x) == !is_vlan_dev(gid_attr-ndev)) || (is_vlan_dev(gid_attr-ndev) vlan_dev_vlan_id(gid_attr-ndev) != ctx-vlan_id)) @@ -216,9 +295,11 @@ static bool find_gid_index(const 
union ib_gid *gid, static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num, u16 vlan_id, union ib_gid *sgid, + enum ib_gid_type gid_type, u16 *gid_index) { - struct find_gid_index_context context = {.vlan_id = vlan_id}; + struct find_gid_index_context context = {.vlan_id = vlan_id, +.gid_type = gid_type}; return ib_find_gid_by_filter(device, sgid, port_num, find_gid_index, context, gid_index); @@ -232,9 +313,24 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc, int ret; int is_eth = (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_ETHERNET); + enum rdma_network_type net_type = RDMA_NETWORK_IB
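The header-version heuristic in this patch treats the 40-byte GRH area as either an IPv6 header or, for RoCEv2 over IPv4, a 20-byte IPv4 header placed in the last 20 bytes. Below is a minimal user-space sketch of the same decision tree. The function names are hypothetical. It also tests "the checksum over the whole header folds to zero", which is equivalent to the patch's recompute-and-compare for a well-formed header but is not the same code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over a 20-byte IPv4 header (no options). */
static uint16_t sketch_ip4_csum(const uint8_t hdr[20])
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < 20; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Returns 4, 6, or 0 for a 40-byte GRH-sized buffer. */
static int sketch_grh_header_version(const uint8_t h[40])
{
	const uint8_t *ip4;
	int v6, v4, ihl;

	assert(h != NULL);
	ip4 = h + 20;
	v6 = h[0] >> 4;
	v4 = ip4[0] >> 4;
	ihl = ip4[0] & 0x0f;

	if (v6 != 6)
		return v4 == 4 ? 4 : 0;
	if (ihl != 5)		/* an IPv4 header with options is not expected */
		return 6;
	/* Summing a valid header, stored checksum included, yields 0. */
	return sketch_ip4_csum(ip4) == 0 ? 4 : 6;
}
```

The checksum step only matters in the ambiguous case, where the first nibble reads as 6 and the nibble at offset 20 also looks like a plausible IPv4 version/length byte.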
[PATCH 28/30] IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
From: Moni Shoua mo...@mellanox.com RoCEv2 packets are sent over IP/UDP protocols. The mlx4 driver uses a type of RAW QP to send packets for QP1 and therefore needs to build the network headers below BTH in software. This patche adds option to build QP1 packets with IP and UDP headers if RoCEv2 is requested. Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/hw/mlx4/qp.c | 84 --- 1 files changed, 52 insertions(+), 32 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index f55f4d4..9996527 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -32,6 +32,8 @@ */ #include linux/log2.h +#include linux/if_ether.h +#include net/ip.h #include linux/slab.h #include linux/netdevice.h @@ -2164,16 +2166,7 @@ static int build_sriov_qp0_header(struct mlx4_ib_sqp *sqp, return 0; } -static void mlx4_u64_to_smac(u8 *dst_mac, u64 src_mac) -{ - int i; - - for (i = ETH_ALEN; i; i--) { - dst_mac[i - 1] = src_mac 0xff; - src_mac = 8; - } -} - +#define MLX4_ROCEV2_QP1_SPORT 0xC000 static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, void *wqe, unsigned *mlx_seg_len) { @@ -2193,6 +2186,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, bool is_eth; bool is_vlan = false; bool is_grh; + bool is_udp = false; + int ip_version = 0; send_size = 0; for (i = 0; i wr-num_sge; ++i) @@ -2201,6 +2196,8 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, is_eth = rdma_port_get_link_layer(sqp-qp.ibqp.device, sqp-qp.port) == IB_LINK_LAYER_ETHERNET; is_grh = mlx4_ib_ah_grh_present(ah); if (is_eth) { + struct ib_gid_attr gid_attr; + if (mlx4_is_mfunc(to_mdev(ib_dev)-dev)) { /* When multi-function is enabled, the ib_core gid * indexes don't necessarily match the hw ones, so @@ -2211,21 +2208,29 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, if (err) 
return err; } else { - err = ib_get_cached_gid(ib_dev, + err = ib_get_cached_gid(sqp-qp.ibqp.device, be32_to_cpu(ah-av.ib.port_pd) 24, - ah-av.ib.gid_index, sgid, - NULL); - if (err) + ah-av.ib.gid_index, sgid, gid_attr); + if (!err) { + is_udp = (gid_attr.gid_type == IB_GID_TYPE_ROCE_V2) ? true : false; + if (is_udp) { + if (ipv6_addr_v4mapped((struct in6_addr *)sgid)) + ip_version = 4; + else + ip_version = 6; + is_grh = false; + } + } else { return err; + } } - if (ah-av.eth.vlan != cpu_to_be16(0x)) { vlan = be16_to_cpu(ah-av.eth.vlan) 0x0fff; is_vlan = 1; } } err = ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh, - 0, 0, 0, sqp-ud_header); + ip_version, is_udp, 0, sqp-ud_header); if (err) return err; @@ -2236,12 +2241,14 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, sqp-ud_header.lrh.source_lid = cpu_to_be16(ah-av.ib.g_slid 0x7f); } - if (is_grh) { + if (is_grh || (ip_version == 6)) { sqp-ud_header.grh.traffic_class = (be32_to_cpu(ah-av.ib.sl_tclass_flowlabel) 20) 0xff; sqp-ud_header.grh.flow_label= ah-av.ib.sl_tclass_flowlabel cpu_to_be32(0xf); - sqp-ud_header.grh.hop_limit = ah-av.ib.hop_limit; + + sqp-ud_header.grh.hop_limit = (is_udp) ? + IPV6_DEFAULT_HOPLIMIT : ah-av.ib.hop_limit; if (is_eth) memcpy(sqp-ud_header.grh.source_gid.raw, sgid.raw, 16); else { @@ -2265,6 +2272,26 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, ah-av.ib.dgid, 16); } + if (ip_version == 4
[PATCH 27/30] IB/core: Initialize UD header structure with IP and UDP headers
From: Moni Shoua mo...@mellanox.com ib_ud_header_init() is used to format InfiniBand headers in a buffer up to (but not with) BTH. For RoCEv2 it is required that this function would be able to build also IP and UDP headers. Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/ud_header.c| 153 +-- drivers/infiniband/hw/mlx4/qp.c|7 +- drivers/infiniband/hw/mthca/mthca_qp.c |2 +- include/rdma/ib_pack.h | 44 -- 4 files changed, 186 insertions(+), 20 deletions(-) diff --git a/drivers/infiniband/core/ud_header.c b/drivers/infiniband/core/ud_header.c index 72feee6..a7797a7 100644 --- a/drivers/infiniband/core/ud_header.c +++ b/drivers/infiniband/core/ud_header.c @@ -35,6 +35,7 @@ #include linux/string.h #include linux/export.h #include linux/if_ether.h +#include linux/ip.h #include rdma/ib_pack.h @@ -116,6 +117,68 @@ static const struct ib_field vlan_table[] = { .size_bits= 16 } }; +static const struct ib_field ip4_table[] = { + { STRUCT_FIELD(ip4, ver_len), + .offset_words = 0, + .offset_bits = 0, + .size_bits= 8 }, + { STRUCT_FIELD(ip4, tos), + .offset_words = 0, + .offset_bits = 8, + .size_bits= 8 }, + { STRUCT_FIELD(ip4, tot_len), + .offset_words = 0, + .offset_bits = 16, + .size_bits= 16 }, + { STRUCT_FIELD(ip4, id), + .offset_words = 1, + .offset_bits = 0, + .size_bits= 16 }, + { STRUCT_FIELD(ip4, frag_off), + .offset_words = 1, + .offset_bits = 16, + .size_bits= 16 }, + { STRUCT_FIELD(ip4, ttl), + .offset_words = 2, + .offset_bits = 0, + .size_bits= 8 }, + { STRUCT_FIELD(ip4, protocol), + .offset_words = 2, + .offset_bits = 8, + .size_bits= 8 }, + { STRUCT_FIELD(ip4, check), + .offset_words = 2, + .offset_bits = 16, + .size_bits= 16 }, + { STRUCT_FIELD(ip4, saddr), + .offset_words = 3, + .offset_bits = 0, + .size_bits= 32 }, + { STRUCT_FIELD(ip4, daddr), + .offset_words = 4, + .offset_bits = 0, + .size_bits= 32 } +}; + +static const struct ib_field udp_table[] = { + { STRUCT_FIELD(udp, 
sport), + .offset_words = 0, + .offset_bits = 0, + .size_bits= 16 }, + { STRUCT_FIELD(udp, dport), + .offset_words = 0, + .offset_bits = 16, + .size_bits= 16 }, + { STRUCT_FIELD(udp, length), + .offset_words = 1, + .offset_bits = 0, + .size_bits= 16 }, + { STRUCT_FIELD(udp, csum), + .offset_words = 1, + .offset_bits = 16, + .size_bits= 16 } +}; + static const struct ib_field grh_table[] = { { STRUCT_FIELD(grh, ip_version), .offset_words = 0, @@ -213,6 +276,26 @@ static const struct ib_field deth_table[] = { .size_bits= 24 } }; +u16 ib_ud_ip4_csum(struct ib_ud_header *header) +{ + struct iphdr iph; + + iph.ihl = 5; + iph.version = 4; + iph.tos = header-ip4.tos; + iph.tot_len = header-ip4.tot_len; + iph.id = header-ip4.id; + iph.frag_off= header-ip4.frag_off; + iph.ttl = header-ip4.ttl; + iph.protocol= header-ip4.protocol; + iph.check = 0; + iph.saddr = header-ip4.saddr; + iph.daddr = header-ip4.daddr; + + return ip_fast_csum((u8 *)iph, iph.ihl); +} +EXPORT_SYMBOL(ib_ud_ip4_csum); + /** * ib_ud_header_init - Initialize UD header structure * @payload_bytes:Length of packet payload @@ -220,19 +303,35 @@ static const struct ib_field deth_table[] = { * @eth_present: specify if Eth header is present * @vlan_present: packet is tagged vlan * @grh_present:GRH flag (if non-zero, GRH will be included) + * @ip_version:GRH flag (if non-zero, IP header, V4 or V6, will be included) + * @grh_present:GRH flag (if non-zero, UDP header will be included) * @immediate_present: specify if immediate data is present * @header:Structure to initialize */ -void ib_ud_header_init(int payload_bytes, - int lrh_present, - int eth_present, - int vlan_present, - int grh_present, - int immediate_present, - struct ib_ud_header *header) +int ib_ud_header_init(int payload_bytes, + intlrh_present, + inteth_present, + intvlan_present, + intgrh_present
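The ip4_table and udp_table entries in this patch describe each header field by a 32-bit word index, a bit offset and a width. Below is a small user-space sketch of how such a descriptor can drive packing into a wire buffer. It assumes that offset_bits counts from the most significant bit of a big-endian 32-bit word, which is how the table values read (for example tot_len at word 0, bit 16). The kernel's own packer may differ in detail, and `sketch_field` and `sketch_pack_field` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_field {
	unsigned int offset_words;	/* index of the 32-bit word */
	unsigned int offset_bits;	/* bit offset from the word's MSB */
	unsigned int size_bits;		/* field width, 1..32 */
};

/* Store value into buf at the position described by f (big-endian). */
static void sketch_pack_field(const struct sketch_field *f, uint32_t value,
			      uint8_t *buf)
{
	unsigned int shift;
	uint32_t mask, word;
	uint8_t *p = buf + 4 * f->offset_words;

	assert(f->size_bits >= 1 && f->offset_bits + f->size_bits <= 32);
	shift = 32 - f->offset_bits - f->size_bits;
	mask = (f->size_bits == 32) ? 0xffffffffu
				    : ((1u << f->size_bits) - 1) << shift;

	word = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | p[3];
	word = (word & ~mask) | ((value << shift) & mask);

	p[0] = (uint8_t)(word >> 24);
	p[1] = (uint8_t)(word >> 16);
	p[2] = (uint8_t)(word >> 8);
	p[3] = (uint8_t)word;
}
```

With the descriptors from the patch, ttl (word 2, bit 0, 8 bits) lands in byte 8 of the header and protocol (word 2, bit 8, 8 bits) in byte 9, matching the IPv4 layout.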
[PATCH 25/30] IB/mlx4: Configure device to work in RoCEv2
From: Moni Shoua mo...@mellanox.com Some mlx4 adapters are RoCEv2 capable. To enable this feature some hardware configuration is required. This is 1. Set port general parameters 2. Configure the outgoing UDP destination port 3. Configure the QP that work with RoCEv2 Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/hw/mlx4/main.c | 10 ++- drivers/infiniband/hw/mlx4/qp.c | 39 ++-- drivers/net/ethernet/mellanox/mlx4/fw.c | 16 +++- drivers/net/ethernet/mellanox/mlx4/mlx4.h |3 +- drivers/net/ethernet/mellanox/mlx4/port.c |9 ++- drivers/net/ethernet/mellanox/mlx4/qp.c | 27 include/linux/mlx4/device.h |3 +- include/linux/mlx4/qp.h | 15 +- 8 files changed, 112 insertions(+), 10 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index ca19d1d..50612b8 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -2154,7 +2154,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) if (mlx4_ib_init_sriov(ibdev)) goto err_mad; - if (dev-caps.flags MLX4_DEV_CAP_FLAG_IBOE) { + if (dev-caps.flags MLX4_DEV_CAP_FLAG_IBOE || + dev-caps.flags2 MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) { if (!iboe-nb.notifier_call) { iboe-nb.notifier_call = mlx4_ib_netdev_event; err = register_netdevice_notifier(iboe-nb); @@ -2163,6 +2164,13 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) goto err_notif; } } + if (!mlx4_is_slave(dev) + dev-caps.flags2 MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) { + err = mlx4_config_roce_v2_port(dev, ROCE_V2_UDP_DPORT); + if (err) { + goto err_notif; + } + } } for (j = 0; j ARRAY_SIZE(mlx4_class_attributes); ++j) { diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 9ab9156..9731c07 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -1408,6 +1408,24 @@ static int handle_eth_ud_smac_index(struct mlx4_ib_dev *dev, return 0; } +enum { + MLX4_QPC_ROCE_MODE_1 = 0, + MLX4_QPC_ROCE_MODE_2 
= 2, + MLX4_QPC_ROCE_MODE_MAX = 0xff +}; + +static u8 gid_type_to_qpc(enum ib_gid_type gid_type) +{ + switch (gid_type) { + case IB_GID_TYPE_IB: + return MLX4_QPC_ROCE_MODE_1; + case IB_GID_TYPE_ROCE_V2: + return MLX4_QPC_ROCE_MODE_2; + default: + return MLX4_QPC_ROCE_MODE_MAX; + } +} + static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, const struct ib_qp_attr *attr, int attr_mask, enum ib_qp_state cur_state, enum ib_qp_state new_state) @@ -1532,9 +1550,12 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, u16 vlan = 0x; u8 smac[ETH_ALEN]; int status = 0; + int is_eth = rdma_port_get_link_layer(dev-ib_dev, qp-port) == + IB_LINK_LAYER_ETHERNET; - if (rdma_port_get_link_layer(dev-ib_dev, qp-port) == - IB_LINK_LAYER_ETHERNET) { + if (is_eth) { + if (mlx4_is_bonded(dev-dev)) + port_num = 1; rcu_read_lock(); status = ib_get_cached_gid(ibqp-device, port_num, index, gid, gid_attr); @@ -1551,8 +1572,20 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, port_num, vlan, smac)) goto out; + if (is_eth gid_attr.gid_type == IB_GID_TYPE_ROCE_V2) + context-pri_path.hop_limit = IPV6_DEFAULT_HOPLIMIT; + optpar |= (MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH | MLX4_QP_OPTPAR_SCHED_QUEUE); + + if (is_eth (cur_state == IB_QPS_INIT new_state == IB_QPS_RTR)) { + u8 qpc_roce_mode = gid_type_to_qpc(gid_attr.gid_type); + + if (qpc_roce_mode == MLX4_QPC_ROCE_MODE_MAX) + goto out; + context-rlkey_roce_mode |= (qpc_roce_mode 6); + } + } if (attr_mask IB_QP_TIMEOUT) { @@ -1722,7 +1755,7 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, sqd_event = 0; if (!ibqp-uobject cur_state == IB_QPS_RESET new_state == IB_QPS_INIT) - context-rlkey |= (1 4); + context-rlkey_roce_mode
[PATCH 29/30] IB/mlx4: Create and use another QP1 for RoCEv2
From: Moni Shoua mo...@mellanox.com The mlx4 driver uses a special QP to implement the GSI QP. This kind of QP allows to build the InfiniBand headers in SW to be put before the payload that comes in with the WR. The mlx4 HW builds the packet, calculates the ICRC and puts it at the end of the payload. This ICRC calculation however depends on the QP configuration which is determined when QP is modified (roce_mode during INIT-RTR). On the other hand, ICRC verification when packet is received does to depend on this configuration. Therefore, using 2 GSI QPs for send (one for each RoCE version) and 1 GSI QP for receive are required. Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/hw/mlx4/mlx4_ib.h |7 ++ drivers/infiniband/hw/mlx4/qp.c | 154 ++ 2 files changed, 143 insertions(+), 18 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 018bda6..a853330 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -159,11 +159,18 @@ struct mlx4_ib_wq { unsignedtail; }; +enum { + MLX4_IB_QP_CREATE_ROCE_V2_GSI = IB_QP_CREATE_RESERVED_START +}; + enum mlx4_ib_qp_flags { MLX4_IB_QP_LSO = IB_QP_CREATE_IPOIB_UD_LSO, MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK = IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK, MLX4_IB_QP_NETIF = IB_QP_CREATE_NETIF_QP, MLX4_IB_QP_CREATE_USE_GFP_NOIO = IB_QP_CREATE_USE_GFP_NOIO, + + /* Mellanox specific flags start from IB_QP_CREATE_RESERVED_START */ + MLX4_IB_ROCE_V2_GSI_QP = MLX4_IB_QP_CREATE_ROCE_V2_GSI, MLX4_IB_SRIOV_TUNNEL_QP = 1 30, MLX4_IB_SRIOV_SQP = 1 31, }; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 9996527..161b933 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -81,6 +81,7 @@ struct mlx4_ib_sqp { u32 send_psn; struct ib_ud_header ud_header; u8 header_buf[MLX4_IB_UD_HEADER_SIZE]; + struct ib_qp*roce_v2_gsi; }; enum { 
@@ -150,7 +151,10 @@ static int is_sqp(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp) } } } - return proxy_sqp; + if (proxy_sqp) + return 1; + + return !!(qp-flags MLX4_IB_ROCE_V2_GSI_QP); } /* used for INIT/CLOSE port logic */ @@ -672,6 +676,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, qp = sqp-qp; qp-pri.vid = 0x; qp-alt.vid = 0x; + sqp-roce_v2_gsi = NULL; } else { qp = kzalloc(sizeof (struct mlx4_ib_qp), gfp); if (!qp) @@ -1029,9 +1034,17 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, del_gid_entries(qp); } -static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr) +static int get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr) { /* Native or PPF */ + if ((!mlx4_is_mfunc(dev-dev) || mlx4_is_master(dev-dev)) + attr-create_flags MLX4_IB_QP_CREATE_ROCE_V2_GSI) { + int sqpn; + int res = mlx4_qp_reserve_range(dev-dev, 1, 1, sqpn, 0); + + return res ? -abs(res) : sqpn; + } + if (!mlx4_is_mfunc(dev-dev) || (mlx4_is_master(dev-dev) attr-create_flags MLX4_IB_SRIOV_SQP)) { @@ -1039,6 +1052,7 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr) (attr-qp_type == IB_QPT_SMI ? 0 : 2) + attr-port_num - 1; } + /* PF or VF -- creating proxies */ if (attr-qp_type == IB_QPT_SMI) return dev-dev-caps.qp0_proxy[attr-port_num - 1]; @@ -1046,9 +1060,9 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr) return dev-dev-caps.qp1_proxy[attr-port_num - 1]; } -struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd, - struct ib_qp_init_attr *init_attr, - struct ib_udata *udata) +static struct ib_qp *_mlx4_ib_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *init_attr, + struct ib_udata *udata) { struct mlx4_ib_qp *qp = NULL; int err; @@ -1066,6 +1080,7 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd, MLX4_IB_SRIOV_TUNNEL_QP | MLX4_IB_SRIOV_SQP | MLX4_IB_QP_NETIF | + MLX4_IB_QP_CREATE_ROCE_V2_GSI
[PATCH 15/30] RDMA/ocrdma: changes to support RoCE-v2 in UD path
From: Devesh Sharma devesh.sha...@emulex.com To support UD protocol this patch adds following changes to existing UD implementation. 1. AH creation resolves gid-type for a given index. 2. Based on GID-type protocol header is built. 3. Work completion reports l3-type if f/w supports RoCE-v2 and sets IB_WC_WITH_NETWORK_HDR_TYPE flag in wc-wc_flags. Signed-off-by: Somnath Kotur somnath.ko...@emulex.com Signed-off-by: Devesh Sharma devesh.sha...@emulex.com --- drivers/infiniband/hw/ocrdma/ocrdma.h |1 + drivers/infiniband/hw/ocrdma/ocrdma_ah.c| 68 ++- drivers/infiniband/hw/ocrdma/ocrdma_sli.h |5 ++- drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 23 +++-- 4 files changed, 80 insertions(+), 17 deletions(-) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h index 97f971a..302fd0e 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma.h +++ b/drivers/infiniband/hw/ocrdma/ocrdma.h @@ -341,6 +341,7 @@ struct ocrdma_ah { struct ocrdma_av *av; u16 sgid_index; u32 id; + u8 hdr_type; }; struct ocrdma_qp_hwq_info { diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c index 7ecd230..70a885b 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c @@ -39,6 +39,20 @@ #define OCRDMA_VID_PCP_SHIFT 0xD +static u16 ocrdma_hdr_type_to_proto_num(u8 hdr_type) +{ + switch (hdr_type) { + case OCRDMA_L3_TYPE_IB_GRH: + return (u16)0x8915; + case OCRDMA_L3_TYPE_IPV4: + return (u16)0x0800; + case OCRDMA_L3_TYPE_IPV6: + return (u16)0x86dd; + default: + return 0; + } +} + static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah, struct ib_ah_attr *attr, union ib_gid *sgid, int pdid, bool *isvlan, u16 vlan_tag) @@ -47,22 +61,32 @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah, struct ocrdma_eth_vlan eth; struct ocrdma_grh grh; int eth_sz; + u16 proto_num = 0; + struct iphdr ipv4; + union { + struct sockaddr _sockaddr; + struct 
sockaddr_in _sockaddr_in; + struct sockaddr_in6 _sockaddr_in6; + } sgid_addr, dgid_addr; memset(eth, 0, sizeof(eth)); memset(grh, 0, sizeof(grh)); + /* Protocol Number */ + proto_num = ocrdma_hdr_type_to_proto_num(ah-hdr_type); + /* VLAN */ if (!vlan_tag || (vlan_tag 0xFFF)) vlan_tag = dev-pvid; if (vlan_tag (vlan_tag 0x1000)) { eth.eth_type = cpu_to_be16(0x8100); - eth.roce_eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE); + eth.roce_eth_type = cpu_to_be16(proto_num); vlan_tag |= (dev-sl 0x07) OCRDMA_VID_PCP_SHIFT; eth.vlan_tag = cpu_to_be16(vlan_tag); eth_sz = sizeof(struct ocrdma_eth_vlan); *isvlan = true; } else { - eth.eth_type = cpu_to_be16(OCRDMA_ROCE_ETH_TYPE); + eth.eth_type = cpu_to_be16(proto_num); eth_sz = sizeof(struct ocrdma_eth_basic); } /* MAC */ @@ -71,18 +95,34 @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah, if (status) return status; ah-sgid_index = attr-grh.sgid_index; - memcpy(grh.sgid[0], sgid-raw, sizeof(union ib_gid)); - memcpy(grh.dgid[0], attr-grh.dgid.raw, sizeof(attr-grh.dgid.raw)); - - grh.tclass_flow = cpu_to_be32((6 28) | - (attr-grh.traffic_class 24) | - attr-grh.flow_label); - /* 0x1b is next header value in GRH */ - grh.pdid_hoplimit = cpu_to_be32((pdid 16) | - (0x1b 8) | attr-grh.hop_limit); /* Eth HDR */ memcpy(ah-av-eth_hdr, eth, eth_sz); - memcpy((u8 *)ah-av + eth_sz, grh, sizeof(struct ocrdma_grh)); + if (ah-hdr_type == RDMA_NETWORK_IPV4) { + *((__be16 *)ipv4) = htons((4 12) | (5 8) | + attr-grh.traffic_class); + ipv4.id = cpu_to_be16(pdid); + ipv4.frag_off = htons(IP_DF); + ipv4.tot_len = htons(0); + ipv4.ttl = attr-grh.hop_limit; + ipv4.protocol = 0x11; + rdma_gid2ip(sgid_addr._sockaddr, sgid); + ipv4.saddr = sgid_addr._sockaddr_in.sin_addr.s_addr; + rdma_gid2ip(dgid_addr._sockaddr, attr-grh.dgid); + ipv4.daddr = dgid_addr._sockaddr_in.sin_addr.s_addr; + memcpy((u8 *)ah-av + eth_sz, ipv4, sizeof(struct iphdr)); + } else { + memcpy(grh.sgid[0], sgid-raw, sizeof(union ib_gid)); + grh.tclass_flow = 
cpu_to_be32((6 28
[PATCH 30/30] IB/cma: Join and leave multicast groups with IGMP
From: Moni Shoua mo...@mellanox.com Since RoCEv2 is a protocol over IP header it is required to send IGMP join and leave requests to the network when joining and leaving multicast groups. Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/cma.c | 78 ++-- 1 files changed, 74 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 50635fe..6e658e8 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -38,6 +38,7 @@ #include linux/in6.h #include linux/mutex.h #include linux/random.h +#include linux/igmp.h #include linux/idr.h #include linux/inetdevice.h #include linux/slab.h @@ -185,6 +186,7 @@ struct rdma_id_private { u8 reuseaddr; u8 afonly; enum ib_gid_typegid_type; + booligmp_joined; }; struct cma_multicast { @@ -283,6 +285,26 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver) hdr-ip_version = (ip_ver 4) | (hdr-ip_version 0xF); } +static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join) +{ + struct in_device *in_dev = NULL; + + if (ndev) { + rtnl_lock(); + in_dev = __in_dev_get_rtnl(ndev); + if (in_dev) { + if (join) + ip_mc_inc_group(in_dev, + *(__be32 *)(mgid-raw+12)); + else + ip_mc_dec_group(in_dev, + *(__be32 *)(mgid-raw+12)); + } + rtnl_unlock(); + } + return (in_dev) ? 
0 : -ENODEV; +} + static void cma_attach_to_dev(struct rdma_id_private *id_priv, struct cma_device *cma_dev) { @@ -585,6 +607,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, INIT_LIST_HEAD(id_priv-listen_list); INIT_LIST_HEAD(id_priv-mc_list); get_random_bytes(id_priv-seq_num, sizeof id_priv-seq_num); + id_priv-igmp_joined = false; return id_priv-id; } @@ -1076,6 +1099,20 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv) kfree(mc); break; case IB_LINK_LAYER_ETHERNET: + if (id_priv-igmp_joined) { + struct rdma_dev_addr *dev_addr = id_priv-id.route.addr.dev_addr; + struct net_device *ndev = NULL; + + if (dev_addr-bound_dev_if) + ndev = dev_get_by_index(init_net, + dev_addr-bound_dev_if); + if (ndev) { + cma_igmp_send(ndev, + mc-multicast.ib-rec.mgid, + false); + dev_put(ndev); + } + } kref_put(mc-mcref, release_mc); break; default: @@ -3356,7 +3393,7 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv, { struct iboe_mcast_work *work; struct rdma_dev_addr *dev_addr = id_priv-id.route.addr.dev_addr; - int err; + int err = 0; struct sockaddr *addr = (struct sockaddr *)mc-addr; struct net_device *ndev = NULL; @@ -3388,13 +3425,31 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv, mc-multicast.ib-rec.rate = iboe_get_rate(ndev); mc-multicast.ib-rec.hop_limit = 1; mc-multicast.ib-rec.mtu = iboe_get_mtu(ndev-mtu); + rdma_ip2gid((struct sockaddr *)id_priv-id.route.addr.src_addr, + mc-multicast.ib-rec.port_gid); + + if (addr-sa_family == AF_INET) { + u16 sgid_index; + + err = ib_find_cached_gid_by_port(id_priv-cma_dev-device, + mc-multicast.ib-rec.port_gid, +IB_GID_TYPE_ROCE_V2, +id_priv-id.port_num, +init_net, dev_addr-bound_dev_if, +sgid_index); + if (!err) + err = cma_igmp_send(ndev, mc-multicast.ib-rec.mgid, true); + if (!err) { + id_priv-igmp_joined = true; + mc-multicast.ib-rec.hop_limit = IPV6_DEFAULT_HOPLIMIT; + } + } dev_put(ndev); - if (!mc
[PATCH 10/30] IB/core: Add gid_type to path and rdma_id_private
From: Matan Barak mat...@mellanox.com When using rdma cm, we want to take the gid_type from the rdma_id_private. This is mandatory before adding an API from user-space/configfs that sets the gid_type of CM connection. Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/cm.c | 19 ++- drivers/infiniband/core/cma.c |2 ++ drivers/infiniband/core/sa_query.c|3 ++- drivers/infiniband/core/uverbs_marshall.c |1 + include/rdma/ib_sa.h |1 + 5 files changed, 20 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 7974e74..22dac05 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -358,9 +358,8 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av) read_lock_irqsave(cm.device_lock, flags); list_for_each_entry(cm_dev, cm.device_list, list) { if (!ib_find_cached_gid(cm_dev-ib_device, path-sgid, - IB_GID_TYPE_IB, path-net, - path-ifindex, - p, NULL)) { + path-gid_type, path-net, + path-ifindex, p, NULL)) { port = cm_dev-port[p-1]; break; } @@ -1521,6 +1520,8 @@ static int cm_req_handler(struct cm_work *work) struct ib_cm_id *cm_id; struct cm_id_private *cm_id_priv, *listen_cm_id_priv; struct cm_req_msg *req_msg; + union ib_gid gid; + struct ib_gid_attr gid_attr; int ret; req_msg = (struct cm_req_msg *)work-mad_recv_wc-recv_buf.mad; @@ -1560,11 +1561,19 @@ static int cm_req_handler(struct cm_work *work) cm_format_paths_from_req(req_msg, work-path[0], work-path[1]); memcpy(work-path[0].dmac, cm_id_priv-av.ah_attr.dmac, ETH_ALEN); - ret = cm_init_av_by_path(work-path[0], cm_id_priv-av); + ret = ib_get_cached_gid(work-port-cm_dev-ib_device, + work-port-port_num, + cm_id_priv-av.ah_attr.grh.sgid_index, + gid, gid_attr); + if (!ret) { + work-path[0].gid_type = gid_attr.gid_type; + ret = cm_init_av_by_path(work-path[0], cm_id_priv-av); + } if (ret) { ib_get_cached_gid(work-port-cm_dev-ib_device, 
work-port-port_num, 0, work-path[0].sgid, - NULL); + gid_attr); + work-path[0].gid_type = gid_attr.gid_type; ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID, work-path[0].sgid, sizeof work-path[0].sgid, NULL, 0); diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 659676c..9afa410 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -146,6 +146,7 @@ struct rdma_id_private { u8 tos; u8 reuseaddr; u8 afonly; + enum ib_gid_typegid_type; }; struct cma_multicast { @@ -1936,6 +1937,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv) ndev = dev_get_by_index(init_net, addr-dev_addr.bound_dev_if); route-path_rec-net = init_net; route-path_rec-ifindex = addr-dev_addr.bound_dev_if; + route-path_rec-gid_type = id_priv-gid_type; } if (!ndev) { ret = -ENODEV; diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 705b6b8..f770049 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -546,7 +546,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num, ah_attr-ah_flags = IB_AH_GRH; ah_attr-grh.dgid = rec-dgid; - ret = ib_find_cached_gid(device, rec-sgid, IB_GID_TYPE_IB, + ret = ib_find_cached_gid(device, rec-sgid, rec-gid_type, rec-net, rec-ifindex, port_num, gid_index); if (ret) @@ -676,6 +676,7 @@ static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query, mad-data, rec); rec.net = NULL; rec.ifindex = 0; + rec.gid_type = IB_GID_TYPE_IB; memset(rec.dmac, 0, ETH_ALEN); query-callback(status, rec, query-context); } else diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c index 7d2f14c..af020f8
[PATCH 19/30] IB/mlx4: Replace spin_lock with rw_semaphore
From: Moni Shoua <mo...@mellanox.com>

Protection of iboe->netdevs no longer has to work from an atomic
context. Replacing the spin_lock with a rw_semaphore is therefore
allowed and makes more sense.

Signed-off-by: Moni Shoua <mo...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/hw/mlx4/main.c    |   27 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |    2 +-
 2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 91caffc..d8b227e 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -369,7 +369,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 	props->active_mtu	= IB_MTU_256;
 	if (is_bonded)
 		rtnl_lock(); /* required to get upper dev */
-	spin_lock_bh(&iboe->lock);
+	down_read(&iboe->sem);
 	ndev = iboe->netdevs[port - 1];
 	if (ndev && is_bonded)
 		ndev = netdev_master_upper_dev_get(ndev);
@@ -383,7 +383,7 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
 					IB_PORT_ACTIVE : IB_PORT_DOWN;
 	props->phys_state	= state_to_phys_state(props->state);
 out_unlock:
-	spin_unlock_bh(&iboe->lock);
+	up_read(&iboe->sem);
 	if (is_bonded)
 		rtnl_unlock();
 out:
@@ -825,11 +825,11 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 	if (!mqp->port)
 		return 0;
 
-	spin_lock_bh(&mdev->iboe.lock);
+	down_read(&mdev->iboe.sem);
 	ndev = mdev->iboe.netdevs[mqp->port - 1];
 	if (ndev)
 		dev_hold(ndev);
-	spin_unlock_bh(&mdev->iboe.lock);
+	up_read(&mdev->iboe.sem);
 
 	if (ndev) {
 		ret = 1;
@@ -1330,7 +1330,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
 	struct mlx4_dev *dev = mdev->dev;
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
-	struct net_device *ndev;
 	struct mlx4_ib_gid_entry *ge;
 	enum mlx4_protocol prot = MLX4_PROT_IB_IPV6;
 	struct mlx4_flow_reg_id reg_id = {0, 0};
@@ -1370,13 +1369,6 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	mutex_lock(&mqp->mutex);
 	ge = find_gid_entry(mqp, gid->raw);
 	if (ge) {
-		spin_lock_bh(&mdev->iboe.lock);
-		ndev = ge->added ? mdev->iboe.netdevs[ge->port - 1] : NULL;
-		if (ndev)
-			dev_hold(ndev);
-		spin_unlock_bh(&mdev->iboe.lock);
-		if (ndev)
-			dev_put(ndev);
 		list_del(&ge->list);
 		kfree(ge);
 	} else
@@ -1543,7 +1535,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 
 	iboe = &ibdev->iboe;
 
-	spin_lock_bh(&iboe->lock);
+	down_write(&iboe->sem);
 	mlx4_foreach_ib_transport_port(port, ibdev->dev) {
 
 		iboe->netdevs[port - 1] =
@@ -1555,7 +1547,7 @@ static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev,
 			update_qps_port = port;
 
 	}
-	spin_unlock_bh(&iboe->lock);
+	up_write(&iboe->sem);
 
 	if (update_qps_port > 0)
 		mlx4_ib_update_qps(ibdev, dev, update_qps_port);
@@ -1848,7 +1840,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 
 	mlx4_ib_alloc_eqs(dev, ibdev);
 
-	spin_lock_init(&iboe->lock);
+	init_rwsem(&iboe->sem);
 
 	if (init_node_data(ibdev))
 		goto err_map;
@@ -2153,7 +2145,8 @@ static void handle_bonded_port_state_event(struct work_struct *work)
 	struct ib_event ibev;
 
 	kfree(ew);
-	spin_lock_bh(&ibdev->iboe.lock);
+
+	down_read(&ibdev->iboe.sem);
 	for (i = 0; i < MLX4_MAX_PORTS; ++i) {
 		struct net_device *curr_netdev = ibdev->iboe.netdevs[i];
 
@@ -2165,7 +2158,7 @@ static void handle_bonded_port_state_event(struct work_struct *work)
 		bonded_port_state = (bonded_port_state != IB_PORT_ACTIVE) ?
 			curr_port_state : IB_PORT_ACTIVE;
 	}
-	spin_unlock_bh(&ibdev->iboe.lock);
+	up_read(&ibdev->iboe.sem);
 	ibev.device = &ibdev->ib_dev;
 	ibev.element.port_num = 1;
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index e3805a4..166ebf9 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -455,7 +455,7 @@ struct mlx4_ib_sriov {
 };
 
 struct mlx4_ib_iboe {
-	spinlock_t		lock;
+	struct rw_semaphore	sem; /* guard from concurrent access to data in this struct */
 	struct net_device      *netdevs[MLX4_MAX_PORTS];
 	atomic64_t		mac[MLX4_MAX_PORTS];
 	struct notifier_block	nb;
-- 
1.7.1
[PATCH 18/30] IB/mlx4: Remove gid table management for RoCE
From: Moni Shoua mo...@mellanox.com RoCE GID table management moved to InfiniBand core driver. Core driver is now responsible to populate the GID table and supply query and lookup functions for GIDs. HW drivers are responsible only modify GID table in network adapters. The query_gid hook should now return the answer from the cache when link layer is Ethernet. Signed-off-by: Moni Shoua mo...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/hw/mlx4/main.c| 495 +- drivers/infiniband/hw/mlx4/mlx4_ib.h |4 - 2 files changed, 14 insertions(+), 485 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 6fa5e49..91caffc 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -45,6 +45,7 @@ #include rdma/ib_smi.h #include rdma/ib_user_verbs.h #include rdma/ib_addr.h +#include rdma/ib_cache.h #include linux/mlx4/driver.h #include linux/mlx4/cmd.h @@ -74,13 +75,6 @@ static const char mlx4_ib_version[] = DRV_NAME : Mellanox ConnectX InfiniBand driver v DRV_VERSION ( DRV_RELDATE )\n; -struct update_gid_work { - struct work_struct work; - union ib_gidgids[128]; - struct mlx4_ib_dev *dev; - int port; -}; - static void do_slave_init(struct mlx4_ib_dev *ibdev, int slave, int do_init); static struct workqueue_struct *wq; @@ -474,23 +468,21 @@ out: return err; } -static int iboe_query_gid(struct ib_device *ibdev, u8 port, int index, - union ib_gid *gid) -{ - struct mlx4_ib_dev *dev = to_mdev(ibdev); - - *gid = dev-iboe.gid_table[port - 1][index]; - - return 0; -} - static int mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index, union ib_gid *gid) { - if (rdma_port_get_link_layer(ibdev, port) == IB_LINK_LAYER_INFINIBAND) + int ret; + + if (ib_cache_use_roce_gid_cache(ibdev, port)) return __mlx4_ib_query_gid(ibdev, port, index, gid, 0); - else - return iboe_query_gid(ibdev, port, index, gid); + + ret = ib_get_cached_gid(ibdev, port, index, gid, NULL); + if 
(ret == -EAGAIN) { + memcpy(gid, zgid, sizeof(*gid)); + return 0; + } + + return ret; } int __mlx4_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index, @@ -1480,273 +1472,6 @@ static struct device_attribute *mlx4_class_attributes[] = { dev_attr_board_id }; -static void mlx4_addrconf_ifid_eui48(u8 *eui, u16 vlan_id, -struct net_device *dev) -{ - memcpy(eui, dev-dev_addr, 3); - memcpy(eui + 5, dev-dev_addr + 3, 3); - if (vlan_id 0x1000) { - eui[3] = vlan_id 8; - eui[4] = vlan_id 0xff; - } else { - eui[3] = 0xff; - eui[4] = 0xfe; - } - eui[0] ^= 2; -} - -static void update_gids_task(struct work_struct *work) -{ - struct update_gid_work *gw = container_of(work, struct update_gid_work, work); - struct mlx4_cmd_mailbox *mailbox; - union ib_gid *gids; - int err; - struct mlx4_dev *dev = gw-dev-dev; - int is_bonded = mlx4_is_bonded(dev); - - if (!gw-dev-ib_active) - return; - - mailbox = mlx4_alloc_cmd_mailbox(dev); - if (IS_ERR(mailbox)) { - pr_warn(update gid table failed %ld\n, PTR_ERR(mailbox)); - return; - } - - gids = mailbox-buf; - memcpy(gids, gw-gids, sizeof gw-gids); - - err = mlx4_cmd(dev, mailbox-dma, MLX4_SET_PORT_GID_TABLE 8 | gw-port, - 1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B, - MLX4_CMD_WRAPPED); - if (err) - pr_warn(set port command failed\n); - else - if ((gw-port == 1) || !is_bonded) - mlx4_ib_dispatch_event(gw-dev, - is_bonded ? 1 : gw-port, - IB_EVENT_GID_CHANGE); - - mlx4_free_cmd_mailbox(dev, mailbox); - kfree(gw); -} - -static void reset_gids_task(struct work_struct *work) -{ - struct update_gid_work *gw = - container_of(work, struct update_gid_work, work); - struct mlx4_cmd_mailbox *mailbox; - union ib_gid *gids; - int err; - struct mlx4_dev *dev = gw-dev-dev; - - if (!gw-dev-ib_active) - return; - - mailbox = mlx4_alloc_cmd_mailbox(dev); - if (IS_ERR(mailbox)) { - pr_warn(reset gid table failed\n); - goto free; - } - - gids = mailbox-buf; - memcpy(gids, gw-gids, sizeof(gw-gids)); - - if (mlx4_ib_port_link_layer(gw-dev-ib_dev, gw-port
[PATCH 17/30] RDMA/ocrdma: changes to support user AH creation
From: Devesh Sharma <devesh.sha...@emulex.com>

To support user-space AH creation, this patch uses the ahid field to
convey the l3-type to the user-space library. The library is responsible
for decoding the l3-type out of ahid.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |    5 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h |    5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 70a885b..b42fa24 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -190,6 +190,11 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
 		ahid_addr = pd->uctx->ah_tbl.va + attr->dlid;
 		*ahid_addr = 0;
 		*ahid_addr |= ah->id & OCRDMA_AH_ID_MASK;
+		if (ocrdma_is_rocev2_supported(dev)) {
+			*ahid_addr |= ((u32)ah->hdr_type &
+				       OCRDMA_AH_L3_TYPE_MASK) <<
+				       OCRDMA_AH_L3_TYPE_SHIFT;
+		}
 		if (isvlan)
 			*ahid_addr |= (OCRDMA_AH_VLAN_VALID_MASK <<
 				       OCRDMA_AH_VLAN_VALID_SHIFT);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
index 726a87c..ed45ecd 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
@@ -31,9 +31,10 @@
 enum {
 	OCRDMA_AH_ID_MASK		= 0x3FF,
 	OCRDMA_AH_VLAN_VALID_MASK	= 0x01,
-	OCRDMA_AH_VLAN_VALID_SHIFT	= 0x1F
+	OCRDMA_AH_VLAN_VALID_SHIFT	= 0x1F,
+	OCRDMA_AH_L3_TYPE_MASK		= 0x03,
+	OCRDMA_AH_L3_TYPE_SHIFT		= 0x1D /* 29 bits */
 };
-
 struct ib_ah *ocrdma_create_ah(struct ib_pd *, struct ib_ah_attr *);
 int ocrdma_destroy_ah(struct ib_ah *);
 int ocrdma_query_ah(struct ib_ah *, struct ib_ah_attr *);
-- 
1.7.1
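For readers following the ahid layout, here is a minimal user-space sketch of how the 32-bit word is packed and how a library might unpack it. The mask and shift values are copied from the enum in the patch; the ahid_pack() and ahid_*() helper names are hypothetical and are not part of the driver or of any user-space library.

```c
#include <stdint.h>

/* Values copied from the enum in the patch above. */
enum {
	AH_ID_MASK          = 0x3FF,
	AH_VLAN_VALID_MASK  = 0x01,
	AH_VLAN_VALID_SHIFT = 0x1F,
	AH_L3_TYPE_MASK     = 0x03,
	AH_L3_TYPE_SHIFT    = 0x1D
};

/* Hypothetical helper: build the word the way the patch does. */
static uint32_t ahid_pack(uint32_t id, uint8_t l3_type, int vlan_valid)
{
	uint32_t word = id & AH_ID_MASK;                     /* bits 0..9   */

	word |= ((uint32_t)l3_type & AH_L3_TYPE_MASK) << AH_L3_TYPE_SHIFT; /* bits 29..30 */
	if (vlan_valid)
		word |= (uint32_t)AH_VLAN_VALID_MASK << AH_VLAN_VALID_SHIFT; /* bit 31 */
	return word;
}

/* Hypothetical decoders, as a user-space library might write them. */
static uint32_t ahid_id(uint32_t word)
{
	return word & AH_ID_MASK;
}

static uint8_t ahid_l3_type(uint32_t word)
{
	return (uint8_t)((word >> AH_L3_TYPE_SHIFT) & AH_L3_TYPE_MASK);
}

static int ahid_vlan_valid(uint32_t word)
{
	return (int)((word >> AH_VLAN_VALID_SHIFT) & AH_VLAN_VALID_MASK);
}
```

With these values the AH id occupies bits 0..9, the l3-type bits 29..30 and the VLAN-valid flag bit 31, so the three fields do not overlap.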
[PATCH 16/30] RDMA/ocrdma: changes to support RoCE-v2 in RC path
From: Devesh Sharma <devesh.sha...@emulex.com>

To support RoCE-v2, this patch implements the following changes:
1. Get the GID type for a given sgid.
2. Based on the GID type, get the IPv4 L3 addresses and give them to the FW.
3. Provide the l3-type to the FW.

Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sha...@emulex.com>
---
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c |   28 +++-
 1 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index cb98911..237b62c 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -2433,7 +2433,13 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
 	union ib_gid sgid, zgid;
 	struct ib_gid_attr sgid_attr;
 	u32 vlan_id = 0x;
-	u8 mac_addr[6];
+	u8 mac_addr[6], hdr_type;
+	union {
+		struct sockaddr     _sockaddr;
+		struct sockaddr_in  _sockaddr_in;
+		struct sockaddr_in6 _sockaddr_in6;
+	} sgid_addr, dgid_addr;
+
 	struct ocrdma_dev *dev = get_ocrdma_dev(qp->ibqp.device);
 
 	if ((ah_attr->ah_flags & IB_AH_GRH) == 0)
@@ -2448,6 +2454,8 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
 	cmd->params.hop_lmt_rq_psn |=
 	    (ah_attr->grh.hop_limit << OCRDMA_QP_PARAMS_HOP_LMT_SHIFT);
 	cmd->flags |= OCRDMA_QP_PARA_FLOW_LBL_VALID;
+
+	/* GIDs */
 	memcpy(&cmd->params.dgid[0], &ah_attr->grh.dgid.raw[0],
 	       sizeof(cmd->params.dgid));
 
@@ -2471,6 +2479,19 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
 		return status;
 	cmd->params.dmac_b0_to_b3 = mac_addr[0] | (mac_addr[1] << 8) |
 				(mac_addr[2] << 16) | (mac_addr[3] << 24);
+	hdr_type = ib_gid_to_network_type(sgid_attr.gid_type, &sgid);
+	if (hdr_type == RDMA_NETWORK_IPV4) {
+		status = rdma_gid2ip(&sgid_addr._sockaddr, &sgid);
+		if (status)
+			return status;
+		status = rdma_gid2ip(&dgid_addr._sockaddr, &ah_attr->grh.dgid);
+		if (status)
+			return status;
+		memcpy(&cmd->params.dgid[0],
+		       &dgid_addr._sockaddr_in.sin_addr.s_addr, 4);
+		memcpy(&cmd->params.sgid[0],
+		       &sgid_addr._sockaddr_in.sin_addr.s_addr, 4);
+	}
 	/* convert them to LE format. */
 	ocrdma_cpu_to_le32(&cmd->params.dgid[0], sizeof(cmd->params.dgid));
 	ocrdma_cpu_to_le32(&cmd->params.sgid[0], sizeof(cmd->params.sgid));
@@ -2482,6 +2503,11 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
 		cmd->params.rnt_rc_sl_fl |=
 			(dev->sl & 0x07) << OCRDMA_QP_PARAMS_SL_SHIFT;
 	}
+
+	cmd->params.max_sge_recv_flags |=
+		((hdr_type <<
+			OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_SHIFT) &
+			OCRDMA_QP_PARAMS_FLAGS_L3_TYPE_MASK);
 	return 0;
 }
 
-- 
1.7.1
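The IPv4 branch in this patch relies on the GID carrying an IPv4 address. As a rough user-space illustration, assuming the GID is stored as an IPv4-mapped IPv6 address (::ffff:a.b.c.d), the four address bytes sit at offset 12 of the 16-byte GID. The struct and the toy_* function names below are hypothetical; this is only a sketch of the idea and not the kernel's rdma_gid2ip() implementation.

```c
#include <stdint.h>
#include <string.h>

/* A 16-byte GID, laid out like an IPv6 address. */
struct toy_gid {
	uint8_t raw[16];
};

/* Returns 1 if the GID has the ::ffff:a.b.c.d (IPv4-mapped) form. */
static int toy_gid_is_v4mapped(const struct toy_gid *gid)
{
	static const uint8_t prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	return memcmp(gid->raw, prefix, sizeof(prefix)) == 0;
}

/*
 * Copies the IPv4 address (network byte order) out of an IPv4-mapped GID.
 * Returns 0 on success, -1 if the GID is not IPv4-mapped.
 */
static int toy_gid_to_ipv4(const struct toy_gid *gid, uint8_t out[4])
{
	if (!toy_gid_is_v4mapped(gid))
		return -1;
	memcpy(out, gid->raw + 12, 4);
	return 0;
}
```

A GID whose last four bytes are 192, 168, 1, 10 behind the ::ffff: prefix would yield 192.168.1.10; any other prefix is rejected by this sketch.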
[PATCH 02/30] IB/core: Add kref to IB devices
From: Matan Barak <mat...@mellanox.com>

Previously, we used the device_mutex lock to protect the device list.
That meant that in order to guarantee a device isn't freed while we use
it, we had to lock all devices.

Add a kref per IB device. Before an IB device is unregistered, we wait
until it is no longer held.

Signed-off-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Somnath Kotur <somnath.ko...@emulex.com>
---
 drivers/infiniband/core/device.c |   41 ++
 include/rdma/ib_verbs.h          |    6 +
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..8616a95 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -261,6 +261,39 @@ out:
 	return ret;
 }
 
+static void ib_device_complete_cb(struct kref *kref)
+{
+	struct ib_device *device = container_of(kref, struct ib_device,
+						refcount);
+
+	if (device->reg_state >= IB_DEV_UNREGISTERING)
+		complete(&device->free);
+}
+
+/**
+ * ib_device_hold - increase the reference count of device
+ * @device: ib device to prevent from being free'd
+ *
+ * Prevent the device from being free'd.
+ */
+void ib_device_hold(struct ib_device *device)
+{
+	kref_get(&device->refcount);
+}
+EXPORT_SYMBOL(ib_device_hold);
+
+/**
+ * ib_device_put - decrease the reference count of device
+ * @device: allows this device to be free'd
+ *
+ * Puts the ib_device and allows it to be free'd.
+ */
+int ib_device_put(struct ib_device *device)
+{
+	return kref_put(&device->refcount, ib_device_complete_cb);
+}
+EXPORT_SYMBOL(ib_device_put);
+
 /**
  * ib_register_device - Register an IB device with IB core
  * @device:Device to register
@@ -312,6 +345,9 @@ int ib_register_device(struct ib_device *device,
 
 	list_add_tail(&device->core_list, &device_list);
 
+	kref_init(&device->refcount);
+	init_completion(&device->free);
+
 	device->reg_state = IB_DEV_REGISTERED;
 
 	{
@@ -342,6 +378,8 @@ void ib_unregister_device(struct ib_device *device)
 
 	mutex_lock(&device_mutex);
 
+	device->reg_state = IB_DEV_UNREGISTERING;
+
 	list_for_each_entry_reverse(client, &client_list, list)
 		if (client->remove)
 			client->remove(device);
@@ -355,6 +393,9 @@ void ib_unregister_device(struct ib_device *device)
 
 	ib_device_unregister_sysfs(device);
 
+	ib_device_put(device);
+	wait_for_completion(&device->free);
+
 	spin_lock_irqsave(&device->client_data_lock, flags);
 	list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
 		kfree(context);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1866595..a7593b0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1716,6 +1716,7 @@ struct ib_device {
 	enum {
 		IB_DEV_UNINITIALIZED,
 		IB_DEV_REGISTERED,
+		IB_DEV_UNREGISTERING,
 		IB_DEV_UNREGISTERED
 	}                            reg_state;
 
@@ -1728,6 +1729,8 @@ struct ib_device {
 	u32			     local_dma_lkey;
 	u8                           node_type;
 	u8                           phys_port_cnt;
+	struct kref		     refcount;
+	struct completion	     free;
 };
 
 struct ib_client {
@@ -1741,6 +1744,9 @@ struct ib_client {
 struct ib_device *ib_alloc_device(size_t size);
 void ib_dealloc_device(struct ib_device *device);
 
+void ib_device_hold(struct ib_device *device);
+int ib_device_put(struct ib_device *device);
+
 int ib_register_device(struct ib_device *device,
 		       int (*port_callback)(struct ib_device *,
 					    u8, struct kobject *));
-- 
1.7.1
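The hold/put scheme in this patch can be modelled in plain user-space C, which may help when reasoning about the unregister path. This is a single-threaded toy under stated assumptions: a C11 atomic counter stands in for struct kref, a boolean stands in for struct completion, and every toy_* name is hypothetical. It is not kernel code and does not block the way wait_for_completion() does.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_device {
	atomic_int refcount;  /* stands in for struct kref */
	bool unregistering;   /* stands in for reg_state >= IB_DEV_UNREGISTERING */
	bool free_signalled;  /* stands in for complete(&device->free) */
};

/* Like kref_init(): the registering code owns the first reference. */
static void toy_init(struct toy_device *d)
{
	atomic_init(&d->refcount, 1);
	d->unregistering = false;
	d->free_signalled = false;
}

/* Like ib_device_hold(): take one more reference. */
static void toy_hold(struct toy_device *d)
{
	atomic_fetch_add(&d->refcount, 1);
}

/*
 * Like ib_device_put(): drop a reference. Returns 1 when this call
 * dropped the last reference, 0 otherwise. The release step only
 * signals the waiter if unregistration has already started.
 */
static int toy_put(struct toy_device *d)
{
	if (atomic_fetch_sub(&d->refcount, 1) == 1) {
		if (d->unregistering)
			d->free_signalled = true;
		return 1;
	}
	return 0;
}
```

In this model, unregistering sets the flag and drops the initial reference; the "completion" fires only once every other holder has called toy_put(), which mirrors the ordering the patch sets up in ib_unregister_device().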
[PATCH 03/30] IB/core: Add RoCE GID population
From: Matan Barak mat...@mellanox.com In order to populate the GID table, we need to listen for events: (a) IB device has been added or removed - used in order to allocate/deallocate the cache and populate the GID table internally. (b) inet events - add new GIDs (according to the IP addresses) to the table. (c) netdev up/down/change_addr - if a netdev is built onto our RoCE device, we need to add/delete its IPs. When an event is received, multiple entries (each with different GID type) are added. Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Somnath Kotur somnath.ko...@emulex.com --- drivers/infiniband/core/Makefile |2 +- drivers/infiniband/core/core_priv.h | 26 ++ drivers/infiniband/core/device.c | 80 + drivers/infiniband/core/roce_gid_cache.c | 66 drivers/infiniband/core/roce_gid_mgmt.c | 545 ++ include/rdma/ib_addr.h |2 +- include/rdma/ib_verbs.h |9 + 7 files changed, 728 insertions(+), 2 deletions(-) create mode 100644 drivers/infiniband/core/roce_gid_mgmt.c diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index 9b63bdf..2c94963 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -10,7 +10,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \ ib_core-y := packer.o ud_header.o verbs.o sysfs.o \ device.o fmr_pool.o cache.o netlink.o \ - roce_gid_cache.o + roce_gid_cache.o roce_gid_mgmt.o ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h index a502daa..12797d9 100644 --- a/drivers/infiniband/core/core_priv.h +++ b/drivers/infiniband/core/core_priv.h @@ -39,6 +39,8 @@ #include rdma/ib_verbs.h +extern struct workqueue_struct *roce_gid_mgmt_wq; + int ib_device_register_sysfs(struct ib_device *device, int (*port_callback)(struct ib_device *, u8, struct kobject *)); @@ -53,6 +55,22 @@ void ib_cache_cleanup(void); int 
ib_resolve_eth_l2_attrs(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int *qp_attr_mask); +typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port, + struct net_device *idev, void *cookie); + +typedef int (*roce_netdev_filter)(struct ib_device *device, u8 port, +struct net_device *idev, void *cookie); + +void ib_dev_roce_ports_of_netdev(struct ib_device *ib_dev, +roce_netdev_filter filter, +void *filter_cookie, +roce_netdev_callback cb, +void *cookie); +void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter, + void *filter_cookie, + roce_netdev_callback cb, + void *cookie); + int roce_gid_cache_get_gid(struct ib_device *ib_dev, u8 port, int index, union ib_gid *gid, struct ib_gid_attr *attr); @@ -66,6 +84,9 @@ int roce_gid_cache_find_gid_by_port(struct ib_device *ib_dev, union ib_gid *gid, int roce_gid_cache_is_active(struct ib_device *ib_dev, u8 port); +int roce_gid_cache_setup(void); +void roce_gid_cache_cleanup(void); + int roce_add_gid(struct ib_device *ib_dev, u8 port, union ib_gid *gid, struct ib_gid_attr *attr); @@ -75,4 +96,9 @@ int roce_del_gid(struct ib_device *ib_dev, u8 port, int roce_del_all_netdev_gids(struct ib_device *ib_dev, u8 port, struct net_device *ndev); +int roce_gid_mgmt_init(void); +void roce_gid_mgmt_cleanup(void); + +int roce_rescan_device(struct ib_device *ib_dev); + #endif /* _CORE_PRIV_H */ diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 8616a95..5ce57bf 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -39,6 +39,7 @@ #include linux/init.h #include linux/mutex.h #include rdma/rdma_netlink.h +#include rdma/ib_addr.h #include core_priv.h @@ -640,6 +641,82 @@ int ib_query_gid(struct ib_device *device, EXPORT_SYMBOL(ib_query_gid); /** + * ib_dev_roce_ports_of_netdev - enumerate RoCE ports of ibdev in + * respect of netdev + * @ib_dev : IB device we want to query + * @filter: Should we call the callback? 
+ * @filter_cookie: Cookie passed to filter + * @cb: Callback to call for each found RoCE ports + * @cookie: Cookie passed back to the callback + * + * Enumerates all of the physical RoCE ports of ib_dev RoCE ports + * which are relaying Ethernet packets to a specific