Re: [PATCH for-next V2 00/11] Add RoCE v2 support
> The part that bothers me about this is that this statement makes sense > when just thinking about the spec, as you say. However, once you > consider namespaces, security implications make this statement spec > compliant, but still unacceptable. The spec itself is silent on > namespaces. But, you guys wanted, and you got, namespace support. > Since that's beyond spec, and carries security requirements, I think > it's fair to say that from now on, the Linux kernel RDMA stack can no > longer *just* be spec compliant. There are additional concerns that > must always be addressed with new changes, and those are the namespace > constraint preservation concerns. I can't object to that but I really would like to get an example of a security risk. So far, besides hearing that the way we choose to handle completions is wrong, I didn't get a convincing example of how where it doesn't work. Moreover, regarding security, all we wanted is for HW to report the L3 protocol (IB, IPv4, or IPv6) in the packet. This is data that with some extra CPU cycles can be obtained from the 40 bytes that are scattered to the receive bufs anyway. So, if there is a security hole it exists from day one of the IB stack and this is not the time we should insist on fixing it. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RESEND] infiniband:core:Add needed error path in cm_init_av_by_path
On Tue, Dec 15, 2015 at 10:30:22AM -0700, Jason Gunthorpe wrote: > On Tue, Dec 15, 2015 at 05:38:34PM +0100, Michael Wang wrote: > > The hop_limit is only suggest that the package allowed to be > > routed, not have to, correct? > > If the hop limit is >= 2 (?) then the GRH is mandatory. Yes >= 2. "Setting this value to 0 or 1 will ensure that the packet will not be forwarded beyond the local subnet." -- IBTA 8.3.6 Hop Limit Ira > The > SM will return this information in the PathRecord if it determines a > GRH is required. The whole stack follows this protocol. > > The GRH is optional for in-subnet communications. > > Jason > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: device attr cleanup
On 12/15/2015 9:03 PM, Doug Ledford wrote: Or, you specifically asked me to wait until this week. I made my initial impressions clear (I don't necessarily like the removal of the attr struct, but I like the removal of all of the query calls, and I'm inclined to take the patch in spite of not liking the removal of the struct). Do you have anything to add or have we beat this horse to death? Hi Doug, Lets stop beating, both horses and people. I do understand that 1. you don't link the removal of the attr 2. you do like the removal of all the query calls I am proposing to take the path of a patch that does exactly #2 while avoiding #1. What's wrong with that? I haven't heard any reasoning for why its so good to stash ~50 new fields on the IB device structure except for the author saying that other subsystems do that and other people saying they are in favor of this approach while not providing any reasoning, except for maybe something on bikes. Why you or anyone else has to be from now and ever the cache line police making sure that people don't add new attributes in random locations over the IB device structure? What's wrong with putting fifty attributesin a structure which is a field of the device struct and have people go there to see what are the d ifferentattrs and add news ones there? This will make the 4.5 merge window extremely complex or even totally threatened w.r.t to the RDMA subsystem and related drivers by 3.3K LOC patch. Sorry, but, I still don't get it. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for-next V2 00/11] Add RoCE v2 support
On Tue, Dec 15, 2015 at 04:45:21PM -0500, Doug Ledford wrote: > In particular, Liran piped up with this comment: > > "Also, I don't want to do any route resolution on the Rx path. A UD QP > completion just reports the details of the packet it received. > > Conceptually, an incoming packet may not even match an SGID index at > all. Maybe, responses should be sent from another device. This should > not be decided that the point that a packet was received." > > The part that bothers me about this is that this statement makes sense > when just thinking about the spec, as you say. However, once you > consider namespaces, security implications make this statement spec > compliant, but still unacceptable. This is where I got to and decided there is no desire to make a correct patch for this stuff. You are completely correct in your comments above. > Since you and Jason did not reach a consensus, I have to dig in and see > if these patches make it possible to break namespace confinement, either > accidentally or with intentionally tricky behavior. That's going to > take me some time. Everything to do with parsing a wc and constructing an AH is wrong in this series, and the fixes require the API changes I mentioned ( add 'wc to gid index' API call, add 'route to AH' API call) Every time you read 'route validation' - that is an error, the route should never just be validated, it is needed information to construct a rocev2 AH. All the places that roughly hand parse the rocev2 WC should not be open coded. Even if current HW is broken for namespaces we should not enshrine that in the kapi. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set
On 12/15/2015 04:42 PM, Hal Rosenstock wrote: > On 12/15/2015 4:20 PM, Jason Gunthorpe wrote: >>> The unicast/multicast extended counters are not always supported - depends on setting of PerfMgt ClassPortInfo CapabilityMask.IsExtendedWidthSupportedNoIETF (bit 10). > >> Yes.. certainly this proposed patch needs to account for that and >> continue to use the 32 bit ones in that case. > > There are no 32 bit equivalents of those 4 "IETF" counters ([uni > multi]cast [xmit rcv] pkts). > > When not supported, perhaps it is best not to populate these counters in > sysfs so one can discern between counter not supported and 0 value. > > I'm still working on definitive mthca answer but think the attribute is > not supported there. Does anyone out there have an mthca setup where > they can try this ? Yes. -- Doug Ledford GPG KeyID: 0E572FDD signature.asc Description: OpenPGP digital signature
Re: [PATCH for-next V2 00/11] Add RoCE v2 support
On 12/15/2015 02:15 AM, Moni Shoua wrote: > On Thu, Dec 3, 2015 at 3:47 PM, Matan Barak wrote: >> Hi Doug, >> >> This series adds the support for RoCE v2. In order to support RoCE v2, >> we add gid_type attribute to every GID. When the RoCE GID management >> populates the GID table, it duplicates each GID with all supported types. >> This gives the user the ability to communicate over each supported >> type. >> >> Patch 0001, 0002 and 0003 add support for multiple GID types to the >> cache and related APIs. The third patch exposes the GID attributes >> information is sysfs. >> >> Patch 0004 adds the RoCE v2 GID type and the capabilities required >> from the vendor in order to implement RoCE v2. These capabilities >> are grouped together as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP. >> >> RoCE v2 could work at IPv4 and IPv6 networks. When receiving ib_wc, this >> information should come from the vendor's driver. In case the vendor >> doesn't supply this information, we parse the packet headers and resolve >> its network type. Patch 0005 adds this information and required utilities. >> >> Patches 0006 and 0007 adds route validation. This is mandatory to ensure >> that we send packets using GIDS which corresponds to a net-device that >> can be routed to the destination. >> >> Patches 0008 and 0009 add configfs support (and the required >> infrastructure) for CMA. The administrator should be able to set the >> default RoCE type. This is done through a new per-port >> default_roce_mode configfs file. >> >> Patch 0010 formats a QP1 packet in order to support RoCE v2 CM >> packets. This is required for vendors which implement their >> QP1 as a Raw QP. >> >> Patch 0011 adds support for IPv4 multicast as an IPv4 network >> requires IGMP to be sent in order to join multicast groups. >> >> Vendors code aren't part of this patch-set. Soft-Roce will be >> sent soon and depends on these patches. Other vendors, like >> mlx4, ocrdma and mlx5 will follow. >> >> This patch is applied on "Change per-entry locks in GID cache to table lock" >> which was sent to the mailing list. >> >> Thanks, >> Matan >> >> Changed from V1: >> - Rebased against Linux 4.4-rc2 master branch. >> - Add route validation >> - ConfigFS - avoid compiling INFINIBAND=y and CONFIGFS_FS=m >> - Add documentation for configfs and sysfs ABI >> - Remove ifindex and gid_type from mcmember >> >> Changes from V0: >> - Rebased patches against Doug's latest k.o/for-4.4 tree. >> - Fixed a bug in configfs (rmdir caused an incorrect free). >> >> Matan Barak (8): >> IB/core: Add gid_type to gid attribute >> IB/cm: Use the source GID index type >> IB/core: Add gid attributes to sysfs >> IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type >> IB/core: Move rdma_is_upper_dev_rcu to header file >> IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path >> IB/rdma_cm: Add wrapper for cma reference count >> IB/cma: Add configfs for rdma_cm >> >> Moni Shoua (2): >> IB/core: Initialize UD header structure with IP and UDP headers >> IB/cma: Join and leave multicast groups with IGMP >> >> Somnath Kotur (1): >> IB/core: Add rdma_network_type to wc >> >> Documentation/ABI/testing/configfs-rdma_cm | 22 ++ >> Documentation/ABI/testing/sysfs-class-infiniband | 16 ++ >> drivers/infiniband/Kconfig | 9 + >> drivers/infiniband/core/Makefile | 2 + >> drivers/infiniband/core/addr.c | 185 + >> drivers/infiniband/core/cache.c | 169 >> drivers/infiniband/core/cm.c | 31 ++- >> drivers/infiniband/core/cma.c| 261 -- >> drivers/infiniband/core/cma_configfs.c | 321 >> +++ >> drivers/infiniband/core/core_priv.h | 45 >> drivers/infiniband/core/device.c | 10 +- >> drivers/infiniband/core/multicast.c | 17 +- >> drivers/infiniband/core/roce_gid_mgmt.c | 81 -- >> drivers/infiniband/core/sa_query.c | 76 +- >> drivers/infiniband/core/sysfs.c | 184 - >> drivers/infiniband/core/ud_header.c | 155 ++- >> drivers/infiniband/core/uverbs_marshall.c| 1 + >> drivers/infiniband/core/verbs.c | 170 ++-- >> drivers/infiniband/hw/mlx4/qp.c | 7 +- >> drivers/infiniband/hw/mthca/mthca_qp.c | 2 +- >> drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 2 +- >> include/rdma/ib_addr.h | 11 +- >> include/rdma/ib_cache.h | 4 + >> include/rdma/ib_pack.h | 45 +++- >> include/rdma/ib_sa.h | 3 + >> include/rdma/ib_verbs.h | 78 +- >> 26 files changed, 1704 insertions(+), 203 deletions(-) >> create mode 100644 Documentation/ABI/testing/configfs-rdma
Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set
On 12/15/2015 4:20 PM, Jason Gunthorpe wrote: >> The unicast/multicast extended counters are not always supported - >> > depends on setting of PerfMgt ClassPortInfo >> > CapabilityMask.IsExtendedWidthSupportedNoIETF (bit 10). > Yes.. certainly this proposed patch needs to account for that and > continue to use the 32 bit ones in that case. There are no 32 bit equivalents of those 4 "IETF" counters ([uni multi]cast [xmit rcv] pkts). When not supported, perhaps it is best not to populate these counters in sysfs so one can discern between counter not supported and 0 value. I'm still working on definitive mthca answer but think the attribute is not supported there. Does anyone out there have an mthca setup where they can try this ? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set
On Tue, Dec 15, 2015 at 03:01:03PM -0500, Hal Rosenstock wrote: > > I would agree, from my observation, that the two main byte counters > > are always available. > > The extended packet counts work but I thought there was a PMA with one > of the extended byte counts wired to 0. Can't remember > > But not necessarily the others. > > The unicast/multicast extended counters are not always supported - > depends on setting of PerfMgt ClassPortInfo > CapabilityMask.IsExtendedWidthSupportedNoIETF (bit 10). Yes.. certainly this proposed patch needs to account for that and continue to use the 32 bit ones in that case. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set
On 12/15/2015 2:55 PM, Jason Gunthorpe wrote: > On Tue, Dec 15, 2015 at 01:51:35PM -0600, Christoph Lameter wrote: >> On Mon, 14 Dec 2015, Hal Rosenstock wrote: >> Mellanox should really confirm this for their hardware matrix. >>> >>> I am trying to get definitive answer to this. >> >> I was told today on a conf call with a couple of Mellanox employees that >> extended counters are always available. > > .. and every field is valid? > > I would agree, from my observation, that the two main byte counters > are always available. The extended packet counts work but I thought there was a PMA with one of the extended byte counts wired to 0. > But not necessarily the others. The unicast/multicast extended counters are not always supported - depends on setting of PerfMgt ClassPortInfo CapabilityMask.IsExtendedWidthSupportedNoIETF (bit 10). -- Hal > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set
On Tue, Dec 15, 2015 at 01:51:35PM -0600, Christoph Lameter wrote: > On Mon, 14 Dec 2015, Hal Rosenstock wrote: > > > > Mellanox should really confirm this for their hardware matrix. > > > > I am trying to get definitive answer to this. > > I was told today on a conf call with a couple of Mellanox employees that > extended counters are always available. .. and every field is valid? I would agree, from my observation, that the two main byte counters are always available.. But not necessarily the others. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] IB core: Display 64 bit counters from the extended set
On Mon, 14 Dec 2015, Hal Rosenstock wrote: > > Mellanox should really confirm this for their hardware matrix. > > I am trying to get definitive answer to this. I was told today on a conf call with a couple of Mellanox employees that extended counters are always available. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 00/11] NFS/RDMA client patches for 4.5
Thanks, Chuck! Everything looks okay to me, so I'll apply these patches and send them to Trond before the holiday. On 12/14/2015 04:17 PM, Chuck Lever wrote: > For 4.5, I'd like to address the send queue accounting and > invalidation/unmap ordering issues Jason brought up a couple of > months ago. > > In preparation for Doug's final topic branch, Anna, I've rebased > these on Christoph's ib_device_attr branch, but there were no merge > conflicts or other changes needed. Could you begin preparing these > for linux-next and other final testing and review? No merge conflicts is nice, and we might not need to worry about ordering the pull request. Thanks, Anna > > Also available in the "nfs-rdma-for-4.5" topic branch of this git repo: > > git://git.linux-nfs.org/projects/cel/cel-2.6.git > > Or for browsing: > > http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfs-rdma-for-4.5 > > > Changes since v2: > - Rebased on Christoph's ib_device_attr branch > > > Changes since v1: > > - Rebased on v4.4-rc3 > - Receive buffer safety margin patch dropped > - Backchannel pr_err and pr_info converted to dprintk > - Backchannel spin locks converted to work queue-safe locks > - Fixed premature release of backchannel request buffer > - NFSv4.1 callbacks tested with for-4.5 server > > --- > > Chuck Lever (11): > xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock) > xprtrdma: xprt_rdma_free() must not release backchannel reqs > xprtrdma: Disable RPC/RDMA backchannel debugging messages > xprtrdma: Move struct ib_send_wr off the stack > xprtrdma: Introduce ro_unmap_sync method > xprtrdma: Add ro_unmap_sync method for FRWR > xprtrdma: Add ro_unmap_sync method for FMR > xprtrdma: Add ro_unmap_sync method for all-physical registration > SUNRPC: Introduce xprt_commit_rqst() > xprtrdma: Invalidate in the RPC reply handler > xprtrdma: Revert commit e7104a2a9606 ('xprtrdma: Cap req_cqinit'). > > > include/linux/sunrpc/xprt.h|1 > net/sunrpc/xprt.c | 14 +++ > net/sunrpc/xprtrdma/backchannel.c | 22 ++--- > net/sunrpc/xprtrdma/fmr_ops.c | 64 + > net/sunrpc/xprtrdma/frwr_ops.c | 175 > +++- > net/sunrpc/xprtrdma/physical_ops.c | 13 +++ > net/sunrpc/xprtrdma/rpc_rdma.c | 14 +++ > net/sunrpc/xprtrdma/transport.c|3 + > net/sunrpc/xprtrdma/verbs.c| 13 +-- > net/sunrpc/xprtrdma/xprt_rdma.h| 12 +- > 10 files changed, 283 insertions(+), 48 deletions(-) > > -- > Chuck Lever > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: device attr cleanup
On 12/08/2015 06:15 PM, Or Gerlitz wrote: > On Wed, Dec 9, 2015 at 1:13 AM, Or Gerlitz wrote: >> On Wed, Dec 9, 2015 at 12:59 AM, Jason Gunthorpe >> wrote: >>> On Wed, Dec 09, 2015 at 12:47:55AM +0200, Or Gerlitz wrote: >> The patch is three liner to add the cached attrs -- http://marc.info/?l=linux-rdma&m=142309296813985&w=2 -- if you are OK with that, I will add a 2nd patch that ports all ULPs to use the cached copy instead of their code which does the query. >> Actually, why not start with this approach and later decide if we need to go further of this is enough? >> >>> Or, can we please stop this bikeshedding. Christoph's patch is done, >>> well tested and does a lot more clean up that this hacky three liner. >> >> Christoph patch is here indeed, it does two things >> >> 1. remove all the ULP device attr alloc, device query, attr free hassle >> 2. adds tons of new fields to struct ib_device >> >> I think it just goes too much and needlessly adds tons of these new >> fields directly to struct ib_device where we can have them all well >> scoped into ib_device_attr member or pointer from struct ib_device >> >>> It is a good patch, >> >> I didn't say it's bad, I said I think we can achieve #1 above with way >> less drastic patch, and I'd like to hear the maintainer option, and >> have the chance to argu with the maintainer if needed, yours I heard, >> you like it, I know. > > and I will be OOO for the rest of this week, since this is in the air > for ***months***, it would be fair enough to ask the maintainer not to > cut it this way or another before next week, Doug, agree? Or, you specifically asked me to wait until this week. I made my initial impressions clear (I don't necessarily like the removal of the attr struct, but I like the removal of all of the query calls, and I'm inclined to take the patch in spite of not liking the removal of the struct). Do you have anything to add or have we beat this horse to death? -- Doug Ledford GPG KeyID: 0E572FDD signature.asc Description: OpenPGP digital signature
Re: [PATCH RESEND] infiniband:core:Add needed error path in cm_init_av_by_path
On Tue, Dec 15, 2015 at 05:38:34PM +0100, Michael Wang wrote: > The hop_limit is only suggest that the package allowed to be > routed, not have to, correct? If the hop limit is >= 2 (?) then the GRH is mandatory. The SM will return this information in the PathRecord if it determines a GRH is required. The whole stack follows this protocol. The GRH is optional for in-subnet communications. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/ipoib: Change sendq threshold size
On 11/17/2015 01:45 AM, Erez Shitrit wrote: > On Mon, Nov 16, 2015 at 8:54 PM, Yuval Shaia wrote: >> Expecting half of the queue to be empty before reopening netif_queue seems >> too high. >> With this fix threshold will be 90%. > > Is this backed by tests? why not 75% or 25%? Much like Erez, I suspect this is a bad change. Unless there are performance numbers to show that under heavy usage this is actually a positive change, I'm not inclined to take it. -- Doug Ledford GPG KeyID: 0E572FDD signature.asc Description: OpenPGP digital signature
[PATCH for-next V2 4/5] IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext
Pass hca_core_clock_offset to user-space is mandatory in order to let the user-space read the free-running clock register from the right offset in the memory mapped page. Passing this value is done by changing the vendor's command and response of init_ucontext to be in extensible form. Signed-off-by: Matan Barak Reviewed-by: Moshe Lazer --- drivers/infiniband/hw/mlx5/main.c| 37 drivers/infiniband/hw/mlx5/mlx5_ib.h | 3 +++ drivers/infiniband/hw/mlx5/user.h| 12 ++-- include/linux/mlx5/device.h | 7 +-- 4 files changed, 47 insertions(+), 12 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index c707c43..917363c 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -582,8 +582,8 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, struct ib_udata *udata) { struct mlx5_ib_dev *dev = to_mdev(ibdev); - struct mlx5_ib_alloc_ucontext_req_v2 req; - struct mlx5_ib_alloc_ucontext_resp resp; + struct mlx5_ib_alloc_ucontext_req_v2 req = {}; + struct mlx5_ib_alloc_ucontext_resp resp = {}; struct mlx5_ib_ucontext *context; struct mlx5_uuar_info *uuari; struct mlx5_uar *uars; @@ -598,20 +598,19 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, if (!dev->ib_active) return ERR_PTR(-EAGAIN); - memset(&req, 0, sizeof(req)); reqlen = udata->inlen - sizeof(struct ib_uverbs_cmd_hdr); if (reqlen == sizeof(struct mlx5_ib_alloc_ucontext_req)) ver = 0; - else if (reqlen == sizeof(struct mlx5_ib_alloc_ucontext_req_v2)) + else if (reqlen >= sizeof(struct mlx5_ib_alloc_ucontext_req_v2)) ver = 2; else return ERR_PTR(-EINVAL); - err = ib_copy_from_udata(&req, udata, reqlen); + err = ib_copy_from_udata(&req, udata, min(reqlen, sizeof(req))); if (err) return ERR_PTR(err); - if (req.flags || req.reserved) + if (req.flags) return ERR_PTR(-EINVAL); if (req.total_num_uuars > MLX5_MAX_UUARS) @@ -620,6 +619,14 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, if (req.total_num_uuars == 0) return ERR_PTR(-EINVAL); + if (req.comp_mask) + return ERR_PTR(-EOPNOTSUPP); + + if (reqlen > sizeof(req) && + !ib_is_udata_cleared(udata, sizeof(req), +udata->inlen - sizeof(req))) + return ERR_PTR(-EOPNOTSUPP); + req.total_num_uuars = ALIGN(req.total_num_uuars, MLX5_NON_FP_BF_REGS_PER_PAGE); if (req.num_low_latency_uuars > req.total_num_uuars - 1) @@ -635,6 +642,8 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, resp.max_send_wqebb = 1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz); resp.max_recv_wr = 1 << MLX5_CAP_GEN(dev->mdev, log_max_qp_sz); resp.max_srq_recv_wr = 1 << MLX5_CAP_GEN(dev->mdev, log_max_srq_sz); + resp.response_length = min(offsetof(typeof(resp), response_length) + + sizeof(resp.response_length), udata->outlen); context = kzalloc(sizeof(*context), GFP_KERNEL); if (!context) @@ -685,8 +694,20 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, resp.tot_uuars = req.total_num_uuars; resp.num_ports = MLX5_CAP_GEN(dev->mdev, num_ports); - err = ib_copy_to_udata(udata, &resp, - sizeof(resp) - sizeof(resp.reserved)); + + if (field_avail(typeof(resp), reserved2, udata->outlen)) + resp.response_length += sizeof(resp.reserved2); + + if (field_avail(typeof(resp), hca_core_clock_offset, udata->outlen)) { + resp.comp_mask |= + MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET; + resp.hca_core_clock_offset = + offsetof(struct mlx5_init_seg, internal_timer_h) % + PAGE_SIZE; + resp.response_length += sizeof(resp.hca_core_clock_offset); + } + + err = ib_copy_to_udata(udata, &resp, resp.response_length); if (err) goto out_uars; diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 6333472..43b3c58 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -55,6 +55,9 @@ pr_err("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,\ pr_warn("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,\ __LINE__, current->pid, ##arg) +#define field_avail(type, fld, sz) (offsetof(type, fld) + \ +
[PATCH for-next V2 0/5] User-space time-stamping support for mlx5_ib
Hi Eli, This patch-set adds user-space support for time-stamping in mlx5_ib. It implements the necessary API: (a) ib_create_cq_ex - Add support for CQ creation flags (b) ib_query_device - return timestamp_mask and hca_core_clock. We also add support for mmaping the HCA's free running clock. In order to do so, we use the response of the vendor's extended part in init_ucontext. This allows us to pass the page offset of the free running clock register to the user-space driver. In order to implement it in a future extensible manner, we use the same mechanism of verbs extensions to the mlx5 vendor part as well. Regards, Matan Changes from v1: * Change ib_is_udata_cleared to use memchr_inv. Changes from v0: * Limit mmap PAGE_SIZE to 4K (security wise). * Optimize ib_is_udata_cleared. * Pass hca_core_clock_offset in the vendor's response part of init_ucontext. Matan Barak (5): IB/mlx5: Add create_cq extended command IB/core: Add ib_is_udata_cleared IB/mlx5: Add support for hca_core_clock and timestamp_mask IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext IB/mlx5: Mmap the HCA's core clock register to user-space drivers/infiniband/hw/mlx5/cq.c | 7 drivers/infiniband/hw/mlx5/main.c| 67 +++- drivers/infiniband/hw/mlx5/mlx5_ib.h | 7 +++- drivers/infiniband/hw/mlx5/user.h| 12 +-- include/linux/mlx5/device.h | 7 ++-- include/linux/mlx5/mlx5_ifc.h| 9 +++-- include/rdma/ib_verbs.h | 27 +++ 7 files changed, 120 insertions(+), 16 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH for-next V2 1/5] IB/mlx5: Add create_cq extended command
In order to create a CQ that supports timestamp, mlx5 needs to support the extended create CQ command with the timestamp flag. Signed-off-by: Matan Barak Reviewed-by: Eli Cohen --- drivers/infiniband/hw/mlx5/cq.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c index 3dfd287..186debf 100644 --- a/drivers/infiniband/hw/mlx5/cq.c +++ b/drivers/infiniband/hw/mlx5/cq.c @@ -743,6 +743,10 @@ static void destroy_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq) mlx5_db_free(dev->mdev, &cq->db); } +enum { + CQ_CREATE_FLAGS_SUPPORTED = IB_CQ_FLAGS_TIMESTAMP_COMPLETION +}; + struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, const struct ib_cq_init_attr *attr, struct ib_ucontext *context, @@ -766,6 +770,9 @@ struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev, if (entries < 0) return ERR_PTR(-EINVAL); + if (attr->flags & ~CQ_CREATE_FLAGS_SUPPORTED) + return ERR_PTR(-EOPNOTSUPP); + entries = roundup_pow_of_two(entries + 1); if (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz))) return ERR_PTR(-EINVAL); -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH for-next V2 2/5] IB/core: Add ib_is_udata_cleared
Extending core and vendor verb commands require us to check that the unknown part of the user's given command is all zeros. Adding ib_is_udata_cleared in order to do so. Signed-off-by: Matan Barak --- include/rdma/ib_verbs.h | 27 +++ 1 file changed, 27 insertions(+) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 9a68a19..33ab4eb 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -50,6 +50,8 @@ #include #include #include +#include +#include #include #include @@ -1887,6 +1889,31 @@ static inline int ib_copy_to_udata(struct ib_udata *udata, void *src, size_t len return copy_to_user(udata->outbuf, src, len) ? -EFAULT : 0; } +static inline bool ib_is_udata_cleared(struct ib_udata *udata, + size_t offset, + size_t len) +{ + const void __user *p = udata->inbuf + offset; + bool ret = false; + u8 *buf; + + if (len > USHRT_MAX) + return false; + + buf = kmalloc(len, GFP_KERNEL); + if (!buf) + return false; + + if (copy_from_user(buf, p, len)) + goto free; + + ret = !memchr_inv(buf, 0, len); + +free: + kfree(buf); + return ret; +} + /** * ib_modify_qp_is_ok - Check that the supplied attribute mask * contains all required attributes and no attributes not allowed for -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)
On Thu, Dec 10, 2015 at 7:49 PM, Chuck Lever wrote: > >> On Dec 10, 2015, at 6:30 PM, Christoph Hellwig wrote: >> >> On Thu, Dec 10, 2015 at 11:07:03AM -0700, Jason Gunthorpe wrote: >>> The ARM folks do this sort of stuff on a regular basis.. Very early on >>> Doug prepares a topic branch with only the big change, NFS folks pull >>> it and then pull your work. Then Doug would send the topic branch to >>> Linus as soon as the merge window opens, then NFS would send theirs. >>> >>> This is alot less work overall than trying to sequence multiple >>> patches over multiple releases.. >> >> Agreed. Staging has alaways been a giant pain and things tend to never >> finish moving over that way if they are non-trivial enough. > > In that case: > > You need to make sure you have all the right Acks. I've added > Anna and Bruce to Ack the NFS-related portions. Santosh should > Ack the RDS part. The NFS client parts look okay to me: Acked-by: Anna Schumaker > > http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/ib_device_attr > > Given the proximity to the holidays and the next merge window, > Doug will need to get a properly-acked topic branch published > in the next day or two so the rest of us can rebase and start > testing before the relevant parties disappear for the holiday. > > > -- > Chuck Lever > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH for-next V2 5/5] IB/mlx5: Mmap the HCA's core clock register to user-space
In order to read the HCA's current cycles register, we need to map it to user-space. Add support to map this register via mmap command. Signed-off-by: Matan Barak Reviewed-by: Moshe Lazer --- drivers/infiniband/hw/mlx5/main.c| 28 drivers/infiniband/hw/mlx5/mlx5_ib.h | 4 +++- 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 917363c..d037a72 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -810,6 +810,34 @@ static int mlx5_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vm case MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES: return -ENOSYS; + case MLX5_IB_MMAP_CORE_CLOCK: + { + phys_addr_t pfn; + + if (vma->vm_end - vma->vm_start != PAGE_SIZE) + return -EINVAL; + + if (vma->vm_flags & (VM_WRITE | VM_EXEC)) + return -EPERM; + + /* Don't expose to user-space information it shouldn't have */ + if (PAGE_SIZE > 4096) + return -EOPNOTSUPP; + + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + pfn = (dev->mdev->iseg_base + + offsetof(struct mlx5_init_seg, internal_timer_h)) >> + PAGE_SHIFT; + if (io_remap_pfn_range(vma, vma->vm_start, pfn, + PAGE_SIZE, vma->vm_page_prot)) + return -EAGAIN; + + mlx5_ib_dbg(dev, "mapped internal timer at 0x%lx, PA 0x%llx\n", + vma->vm_start, + (unsigned long long)pfn << PAGE_SHIFT); + break; + } + default: return -EINVAL; } diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 43b3c58..a87f312 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -65,7 +65,9 @@ enum { enum mlx5_ib_mmap_cmd { MLX5_IB_MMAP_REGULAR_PAGE = 0, - MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES = 1, /* always last */ + MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES = 1, + /* 5 is chosen in order to be compatible with old versions of libmlx5 */ + MLX5_IB_MMAP_CORE_CLOCK = 5, }; enum { -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH for-next V2 3/5] IB/mlx5: Add support for hca_core_clock and timestamp_mask
Reporting the hca_core_clock (in kHZ) and the timestamp_mask in query_device extended verb. timestamp_mask is used by users in order to know what is the valid range of the raw timestamps, while hca_core_clock reports the clock frequency that is used for timestamps. Signed-off-by: Matan Barak Reviewed-by: Moshe Lazer --- drivers/infiniband/hw/mlx5/main.c | 2 ++ include/linux/mlx5/mlx5_ifc.h | 9 ++--- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 7e97cb5..c707c43 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -293,6 +293,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev, props->max_total_mcast_qp_attach = props->max_mcast_qp_attach * props->max_mcast_grp; props->max_map_per_fmr = INT_MAX; /* no limit in ConnectIB */ + props->hca_core_clock = MLX5_CAP_GEN(mdev, device_frequency_khz); + props->timestamp_mask = 0x7FFFULL; #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING if (MLX5_CAP_GEN(mdev, pg)) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 1565324..1a74114 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -794,15 +794,18 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 reserved_63[0x8]; u8 log_uar_page_sz[0x10]; - u8 reserved_64[0x100]; + u8 reserved_64[0x20]; + u8 device_frequency_mhz[0x20]; + u8 device_frequency_khz[0x20]; + u8 reserved_65[0xa0]; - u8 reserved_65[0x1f]; + u8 reserved_66[0x1f]; u8 cqe_zip[0x1]; u8 cqe_zip_timeout[0x10]; u8 cqe_zip_max_num[0x10]; - u8 reserved_66[0x220]; + u8 reserved_67[0x220]; }; enum { -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)
On 12/9/2015 10:42 AM, Christoph Hellwig wrote: On Tue, Dec 08, 2015 at 07:52:03PM -0500, ira.weiny wrote: Searching patchworks... I'm a bit worried about the size of the patch and I would like to see it split up for review. But I agree Christophs method is better long term. I'd be happy to split it up if I could see a way to split it. So if anyone has an idea you're welcome! Christoph do you have this on github somewhere? Perhaps it is split but I'm not finding in on patchworks? No need for github, we have much better (and older) git hosting sites :) http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/ib_device_attr net/rds/ib.c | 34 ++--- net/rds/iw.c | 23 +-- For RDS changes, Acked-by: Santosh Shilimlkar I will try and find some time to test them out. Thanks !! -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RESEND] infiniband:core:Add needed error path in cm_init_av_by_path
On 12/15/2015 04:52 PM, Nicholas Krause wrote: > This adds a needed error path in the function, cm_init_av_by_path > after the call to ib_init_ah_from_path in order to avoid incorrectly > accessing the path pointer of structure type ib_sa_path_rec if this > function call fails to complete its intended work successfully by > returning a error code. > > Signed-off-by: Nicholas Krause > --- > drivers/infiniband/core/cm.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c > index 0a26dd6..e9b36ea 100644 > --- a/drivers/infiniband/core/cm.c > +++ b/drivers/infiniband/core/cm.c > @@ -383,8 +383,11 @@ static int cm_init_av_by_path(struct ib_sa_path_rec > *path, struct cm_av *av) > return ret; > > av->port = port; > - ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path, > - &av->ah_attr); > + ret = ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path, > +&av->ah_attr); ..Just wondering what if the transport don't require GRH? eg inside the same subnet? The hop_limit is only suggest that the package allowed to be routed, not have to, correct? Regards, Michael Wang > + if (ret) > + return ret; > + > av->timeout = path->packet_life_time + 1; > > return 0; > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for-next 6/6] IB/mlx5: Add Raw Packet Queue Pair (QP) support
On 12/10/2015 10:09 PM, Majd Dibbiny wrote: This patchs adds support for Raw Packet QP for the mlx5 device. Raw Packet QP, unlike other QP types, has no matching mlx5_core_qp object but rather it is built of RQ/SQ/TIR/TIS/TD mlx5_core object. The Raw Packet QP state changes are implemented by changing the state of the sub-objects. Since the SQ and RQ work-queue (WQ) buffers are not contiguous like other QPs, we allocate separate buffers in the user-space and pass the address of each one of them separately to the kernel. Signed-off-by: Majd Dibbiny --- drivers/infiniband/hw/mlx5/main.c | 15 +- drivers/infiniband/hw/mlx5/mlx5_ib.h | 52 +- drivers/infiniband/hw/mlx5/qp.c| 894 ++--- drivers/infiniband/hw/mlx5/user.h | 1 + drivers/net/ethernet/mellanox/mlx5/core/qp.c | 48 +- drivers/net/ethernet/mellanox/mlx5/core/transobj.c | 39 + include/linux/mlx5/mlx5_ifc.h | 11 +- include/linux/mlx5/qp.h| 3 +- include/linux/mlx5/transobj.h | 4 + 9 files changed, 913 insertions(+), 154 deletions(-) Seems way too big for single patch, seek how to split to 2-3 patches Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for-next V1 2/5] IB/core: Add ib_is_udata_cleared
On 14/12/2015 18:31, Matan Barak wrote: > I'm not sure regarding the string.c location, as it deals with user > buffers, but in order not to > be dependent on this, I'll change this code to the following. > > static inline bool ib_is_udata_cleared(struct ib_udata *udata, >u8 cleared_char, >size_t offset, >size_t len) > { > const void __user *p = udata->inbuf + offset; > bool ret = false; > u8 *buf; > > if (len > USHRT_MAX) > return false; > > buf = kmalloc(len, GFP_KERNEL); > if (!buf) > return false; > > if (copy_from_user(buf, p, len)) > goto free; > > ret = !memchr_inv(buf, cleared_char, len); > > free: > kfree(buf); > return ret; > } Looks good to me. Thanks, Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 109391] New: Packet and byte counters only 32bit
https://bugzilla.kernel.org/show_bug.cgi?id=109391 Bug ID: 109391 Summary: Packet and byte counters only 32bit Product: Drivers Version: 2.5 Kernel Version: git HEAD Hardware: All OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Infiniband/RDMA Assignee: drivers_infiniband-r...@kernel-bugs.osdl.org Reporter: philipp.ma...@linbit.com Regression: No Trying to find packet and byte counters for the RDMA transports I stumbled upon this directory: /sys/class/infiniband/mlx5_1/ports/1/counters/ This seems to have all the requested data; but, looking again, I can see port_rcv_data: 4294967295 which looks awful. And indeed, in https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core/sysfs.c#n401 there's a 32bit width, with explicit code in https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/hw/qib/qib_mad.c#n1507 to cap on that limit... That is useless for the upcoming speeds. The MLX5 driver can drive Connect-X4 cards which go 100GBit/sec; that is 11GByte/sec, which means that the 32bit counter will overflow ~3 times a second. So it can't be monitored. Please provide 64bit counters (for packets and bytes, at least)! Thanks a lot. -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html