[PATCH for-3.10 0/3] 2nd batch of iSER patches
Hi Roland,

Here's a 2nd batch of iser patches for 3.10, with the highlight being a fix to the device removal flow from Roi Dayan. For some reason the race this patch fixes doesn't hit on the IB link layer because of different timings (e.g. more modules that register with the IB core, such as IPoIB), but it was there. Thanks to Sean we nailed down the problem and came up with a proper fix.

Also, with the kernel now having iser target support through LIO and the increased use cases for iser, I added a MAINTAINERS entry to help people figure out who's involved (and send bugs and flames...), hope you're OK with that.

Or.

Or Gerlitz (2):
  IB/iser: Add Mellanox copyright
  MAINTAINERS: Add entry for iSCSI Extensions for RDMA (iSER) initiator

Roi Dayan (1):
  IB/iser: Fix device removal flow

 MAINTAINERS                                  |   13 +++++++++++++
 drivers/infiniband/ulp/iser/iscsi_iser.c     |    1 +
 drivers/infiniband/ulp/iser/iscsi_iser.h     |    1 +
 drivers/infiniband/ulp/iser/iser_initiator.c |    1 +
 drivers/infiniband/ulp/iser/iser_memory.c    |    1 +
 drivers/infiniband/ulp/iser/iser_verbs.c     |   16 +++++++++-------
 6 files changed, 26 insertions(+), 7 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND for-3.10 3/3] MAINTAINERS: Add entry for iSCSI Extensions for RDMA (iSER) initiator
Add an entry for the iSER initiator driver, which is maintained by Or Gerlitz and Roi Dayan, below the kernel InfiniBand subsystem.

Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 MAINTAINERS |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8bdd7a7..cc5861c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4378,6 +4378,16 @@ S:	Maintained
 F:	drivers/scsi/*iscsi*
 F:	include/scsi/*iscsi*
 
+ISCSI EXTENSIONS FOR RDMA (ISER) INITIATOR
+M:	Or Gerlitz ogerl...@mellanox.com
+M:	Roi Dayan r...@mellanox.com
+L:	linux-rdma@vger.kernel.org
+S:	Supported
+W:	http://www.openfabrics.org
+W:	www.open-iscsi.org
+Q:	http://patchwork.kernel.org/project/linux-rdma/list/
+F:	drivers/infiniband/ulp/iser
+
 ISDN SUBSYSTEM
 M:	Karsten Keil i...@linux-pingi.de
 L:	isdn4li...@listserv.isdn4linux.de (subscribers-only)
-- 
1.7.1
[PATCH RESEND for-3.10 2/3] IB/iser: Add Mellanox copyright
Add Mellanox copyright to the iser initiator source code which I maintain.

Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/ulp/iser/iscsi_iser.c     |    1 +
 drivers/infiniband/ulp/iser/iscsi_iser.h     |    1 +
 drivers/infiniband/ulp/iser/iser_initiator.c |    1 +
 drivers/infiniband/ulp/iser/iser_memory.c    |    1 +
 drivers/infiniband/ulp/iser/iser_verbs.c     |    1 +
 5 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index f19b099..2e84ef8 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2004 Alex Aizman
  * Copyright (C) 2005 Mike Christie
  * Copyright (c) 2005, 2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2013 Mellanox Technologies. All rights reserved.
  * maintained by openib-gene...@openib.org
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index cae6084..e0afab4 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -8,6 +8,7 @@
  *
  * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved.
+ * Copyright (c) 2013 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index a00ccd1..b6d81a8 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2013 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 68ebb7f..7827baf 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2013 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index f13cc22..2c4941d 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved.
+ * Copyright (c) 2013 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
-- 
1.7.1
[PATCH RESEND for-3.10 1/3] IB/iser: Fix device removal flow
From: Roi Dayan r...@mellanox.com

Change the code to destroy the last opened rdma_cm id after making sure we released all other objects (QP, CQs, PD, etc.) associated with the IB device.

Since iser accesses the IB device using the rdma_cm id, we need to free any objects that are related to the device which is associated with the rdma_cm id prior to destroying that id.

When this isn't ensured, the low-level driver that created this device can be unloaded before iser has a chance to free all the objects, and such a call may invoke a code segment which isn't valid any more and crash.

Cc: Sean Hefty sean.he...@intel.com
Signed-off-by: Roi Dayan r...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/ulp/iser/iser_verbs.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 5278916..f13cc22 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -292,10 +292,10 @@ out_err:
 }
 
 /**
- * releases the FMR pool, QP and CMA ID objects, returns 0 on success,
+ * releases the FMR pool and QP objects, returns 0 on success,
  * -1 on failure
  */
-static int iser_free_ib_conn_res(struct iser_conn *ib_conn, int can_destroy_id)
+static int iser_free_ib_conn_res(struct iser_conn *ib_conn)
 {
 	int cq_index;
 	BUG_ON(ib_conn == NULL);
@@ -314,13 +314,9 @@ static int iser_free_ib_conn_res(struct iser_conn *ib_conn, int can_destroy_id)
 
 		rdma_destroy_qp(ib_conn->cma_id);
 	}
-	/* if cma handler context, the caller acts s.t the cma destroy the id */
-	if (ib_conn->cma_id != NULL && can_destroy_id)
-		rdma_destroy_id(ib_conn->cma_id);
 
 	ib_conn->fmr_pool = NULL;
 	ib_conn->qp = NULL;
-	ib_conn->cma_id = NULL;
 	kfree(ib_conn->page_vec);
 
 	if (ib_conn->login_buf) {
@@ -415,11 +411,16 @@ static void iser_conn_release(struct iser_conn *ib_conn, int can_destroy_id)
 	list_del(&ib_conn->conn_list);
 	mutex_unlock(&ig.connlist_mutex);
 	iser_free_rx_descriptors(ib_conn);
-	iser_free_ib_conn_res(ib_conn, can_destroy_id);
+	iser_free_ib_conn_res(ib_conn);
 	ib_conn->device = NULL;
 	/* on EVENT_ADDR_ERROR there's no device yet for this conn */
 	if (device != NULL)
 		iser_device_try_release(device);
+	/* if cma handler context, the caller actually destroy the id */
+	if (ib_conn->cma_id != NULL && can_destroy_id) {
+		rdma_destroy_id(ib_conn->cma_id);
+		ib_conn->cma_id = NULL;
+	}
 	iscsi_destroy_endpoint(ib_conn->ep);
 }
 
-- 
1.7.1
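The ordering rule in this patch (free everything that lives on the device first, destroy the rdma_cm id last) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the `fake_*` types and functions are hypothetical stand-ins invented for illustration, not the iser or rdma_cm API, and the "device goes away once the id is destroyed" behaviour is a simplification of the unload race described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the iser or rdma_cm types. */
struct fake_device { int open_objects; bool unloaded; };
struct fake_cm_id  { struct fake_device *dev; bool alive; };
struct fake_conn   { struct fake_cm_id *id; bool has_qp; };

/* Model assumption: once the id is gone, nothing keeps the provider
 * loaded, so the device may disappear right here. */
static void fake_destroy_id(struct fake_cm_id *id)
{
	id->alive = false;
	id->dev->unloaded = true;
}

/* Freeing a QP calls into the provider; -1 models a call into code
 * that is no longer there (a crash in the real kernel). */
static int fake_destroy_qp(struct fake_conn *c, struct fake_device *dev)
{
	if (dev->unloaded)
		return -1;
	c->has_qp = false;
	dev->open_objects--;
	return 0;
}

/* Order used after the fix: device objects first, the id last. */
static int release_fixed(struct fake_conn *c, bool can_destroy_id)
{
	struct fake_device *dev = c->id->dev;
	int ret = 0;

	if (c->has_qp)
		ret = fake_destroy_qp(c, dev);
	if (c->id != NULL && can_destroy_id) {
		fake_destroy_id(c->id);
		c->id = NULL;
	}
	return ret;
}

/* Reversed order, shown only to illustrate the failure mode. */
static int release_reversed(struct fake_conn *c)
{
	struct fake_device *dev = c->id->dev;

	fake_destroy_id(c->id);
	c->id = NULL;
	return c->has_qp ? fake_destroy_qp(c, dev) : 0;
}

/* Returns 0 when both orderings behave as the model predicts. */
static int teardown_demo(void)
{
	struct fake_device d1 = { 1, false }, d2 = { 1, false };
	struct fake_cm_id i1 = { &d1, true }, i2 = { &d2, true };
	struct fake_conn c1 = { &i1, true }, c2 = { &i2, true };

	if (release_fixed(&c1, true) != 0 || d1.open_objects != 0)
		return -1;
	if (release_reversed(&c2) != -1 || d2.open_objects != 1)
		return -1;
	return 0;
}
```

In the model, `release_fixed()` leaves no objects on the device before the id goes away, while `release_reversed()` reaches the provider after it has been marked unloaded.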
[PATCH RESEND for-3.10 0/3] 2nd batch of iSER patches
Resending the whole series because a wrong chunk got into patch #3, sorry for that.

Hi Roland,

Here's a 2nd batch of iser patches for 3.10, with the highlight being a fix to the device removal flow from Roi Dayan. For some reason the race this patch fixes doesn't hit on the IB link layer because of different timings (e.g. more modules that register with the IB core, such as IPoIB), but it was there. Thanks to Sean we nailed down the problem and came up with a proper fix.

Also, with the kernel now having iser target support through LIO and the increased use cases for iser, I added a MAINTAINERS entry to help people figure out who's involved (and send bugs and flames...), hope you're OK with that.

Or.

Or Gerlitz (2):
  IB/iser: Add Mellanox copyright
  MAINTAINERS: Add entry for iSCSI Extensions for RDMA (iSER) initiator

Roi Dayan (1):
  IB/iser: Fix device removal flow

 MAINTAINERS                                  |   10 ++++++++++
 drivers/infiniband/ulp/iser/iscsi_iser.c     |    1 +
 drivers/infiniband/ulp/iser/iscsi_iser.h     |    1 +
 drivers/infiniband/ulp/iser/iser_initiator.c |    1 +
 drivers/infiniband/ulp/iser/iser_memory.c    |    1 +
 drivers/infiniband/ulp/iser/iser_verbs.c     |   16 +++++++++-------
 6 files changed, 23 insertions(+), 7 deletions(-)
Re: [GIT PULL] please pull infiniband.git
On 09/05/2013 00:20, Roland Dreier wrote:
> Or Gerlitz (2):
>   IB/iser: Return error to upper layers on EAGAIN registration failures
>   IB/iser: Add support for iser CM REQ additional info
>
> Roi Dayan (2):
>   IB/iser: Add module version
>   IB/iser: Move informational messages from error to info level

Hi Roland, so Linus pulled these patches, but I can't find them on the for-next branch nor on other branches of your tree, I assume this will be fixed through some rebase. Note I sent three more iser patches for 3.10.

Or.
Re: list corruption in IPOIB
On Fri, May 17, 2013 at 10:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote:
> We've seen below neigh->list list corruption warning during testing,

So how about a little heads up on what kernel you are using? What's the way to trigger this warning?

> From Dongsu's and my opinion, several place also need netif_tx_lock(_bh)/netif_tx_unlock(_bh) pairs around neigh->list, I tried to add netif_tx_lock/netif_tx_unlock into ipoib_cm_destroy_tx, it improved the situation, there're some other places in ipoib_main.c and ipoib_mcast.c, but I don't know which lock should be added, if you can take some time to look into it, that will be great.

What do you mean by improved the situation? Is the warning gone? And if yes, what remains?

Or.
Re: list corruption in IPOIB
On 19/05/2013 00:36, Jack Wang wrote:
> I tried 3.4.23, and mainline kernel from Roland's rdma-for-linus, we added bug injection interface, run multithread iperf, and switched ib mode between connected and datagram in sync on each side as Shlomo suggested.

Can you be more specific re the bug injection interface, is that an existing kernel mechanism or something you added?

So the bug triggers when you run iperf in multi-threaded mode AND in parallel inject errors AND in parallel switch between datagram and connected mode?

bee --- I assume this isn't something you do just for the fun of it... so some problem X hits you in production and this problem Y you get with the above juggling, any known or empiric relation between the two?

Or.
Re: MLX4 Cq Question
On 18/05/2013 00:37, Roland Dreier wrote:
> you see that when freeing a CQ, we first do the HW2SW_CQ firmware command; once this command completes, no more events will be generated for that CQ. Then we do synchronize_irq for the CQ's interrupt vector. Once that completes, no more completion handlers will be running for the CQ, so we can safely delete the CQ from the radix tree (relying on the radix tree's safety of deleting one entry while possibly looking up other entries, so no lock is needed). We also use the lock to synchronize against the CQ event function, which as you noted does take the lock too.
>
> Basic idea is that we're tricky and careful so we can make the fast path (completion interrupt handling) lock-free, but then use locks and whatever else needed in the slow path (CQ async event handling, CQ destroy).

Jack, so do we finally agree to this analysis? Last time this was on the list, I was under the impression that there was no consensus, and I also see that on the stack we provide to customers there's a patch of yours in that area, or does it fix another bug?

Or.
Re: list corruption in IPOIB
On 20/05/2013 12:10, Jinpu Wang wrote:
> which list_del do you mean? in ipoib_cm_tx_start?

Yes, but not only. You can start with a 5KG hammer and convert all these hits to list_del_init:

linux-2.6]# grep list_del drivers/infiniband/ulp/ipoib/*.c | grep neigh
drivers/infiniband/ulp/ipoib/ipoib_cm.c:	list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_cm.c:	list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_cm.c:	list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c:	list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c:	list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c:	list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c:	list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c:	list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c:	list_del(&neigh->list);
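To see why converting list_del to list_del_init can silence a list corruption warning, here is a minimal userspace sketch. It is not the kernel's list.h: the `sk_*` names are made up for illustration, NULL stands in for the kernel's poison values, and `sk_links_valid()` only approximates the sanity check a list-debugging kernel performs before unlinking. The point it tries to show is that an entry removed with the plain variant fails that check on a second removal, while an entry removed with the init variant points at itself and passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel idea. */
struct sk_node { struct sk_node *next, *prev; };

static void sk_init(struct sk_node *n) { n->next = n; n->prev = n; }

static void sk_add(struct sk_node *n, struct sk_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Approximation of the debug check: both neighbours must point back. */
static bool sk_links_valid(const struct sk_node *n)
{
	return n->next && n->prev &&
	       n->prev->next == n && n->next->prev == n;
}

/* Like list_del(): unlink and leave the entry's pointers unusable. */
static bool sk_del(struct sk_node *n)
{
	if (!sk_links_valid(n))
		return false;          /* the kernel would warn here */
	n->next->prev = n->prev;
	n->prev->next = n->next;
	n->next = n->prev = NULL;  /* stand-in for poison values */
	return true;
}

/* Like list_del_init(): unlink and make the entry an empty list. */
static bool sk_del_init(struct sk_node *n)
{
	if (!sk_links_valid(n))
		return false;
	n->next->prev = n->prev;
	n->prev->next = n->next;
	sk_init(n);
	return true;
}

/* Returns 0 if a double delete behaves as described above. */
static int sk_list_demo(void)
{
	struct sk_node head, a, b;

	sk_init(&head);
	sk_add(&a, &head);
	sk_add(&b, &head);

	if (!sk_del(&a) || sk_del(&a))            /* second plain delete trips the check */
		return -1;
	if (!sk_del_init(&b) || !sk_del_init(&b)) /* second init delete is harmless */
		return -1;
	return (head.next == &head && head.prev == &head) ? 0 : -1;
}
```

Note that this only hides the symptom of a double removal; whether two paths should be removing the same neigh entry at all is the separate locking question raised earlier in the thread.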
libibverbs / libmlx4 release
Hi Roland,

Following what we discussed last week during the Linux Foundation EU summit, I think it would be good to follow what you said and have a point release for libibverbs and libmlx4 before we pull in the verbs extensions framework and the features that use it (XRC, Flow-Steering, etc., more fun).

I mentioned to you that we have some more libmlx4 patches, but it's totally OK for us to submit them after that release, makes sense?

Or.
Re: list corruption in IPOIB
On 20/05/2013 15:46, Jinpu Wang wrote:
> A quick test show the list_corruption warning is gone, after I convert all list_del(&neigh->list) to list_del_init(&neigh->list).

Yes, but this wasn't your original problem, or was it?
Re: list corruption in IPOIB
On Mon, May 20, 2013 at 5:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote:
> Sorry for confusion. Current list corruption is gone in my preliminary test, after I changed list_del to list_del_init as Or suggested. As Or asked for the original bug, so I just want to show him the whole story.

I am still not clear if the bug you saw in your production environment is gone with the list_del_init patch applied, please clarify.

Or.
rsockets feedback?
Hi Sean,

Do we have some public quoted usages/feedback for rsockets? I think you've mentioned something during the panel at the Linux EU summit last week but I am not sure...

Or.
Re: list corruption in IPOIB
On Mon, May 20, 2013 at 10:38 PM, Jack Wang jinpu.w...@profitbricks.com wrote:
> The bug in our production environment is introduced in our backport about ipoib fixes from mainline, and when we hit that bug we reverted back to old kernel without the backport patch, and the bug didn't happen for now.
>
> This list_del_init patch do fix list corruption warning, but it's not the one we hit in production, the list corruption is reproduced in our test setup with bug injection patch iperf -P 50 mode switch.
>
> Is this clear for you now?

NO. You say that the list_del_init patch eliminates the list corruption warning; is the list corruption still reproduced in your test setup even when the patch is applied?! What's the trace? And what is the trace you see in your production when using kernel X (which?) patched with commit Y (which?)

Or.
Re: rsockets feedback?
On Mon, May 20, 2013 at 10:52 PM, Hefty, Sean sean.he...@intel.com wrote:
>> Do we have some public quoted usages/feedback for rsockets? I think you've mentioned something during the panel at the Linux EU summit last week but I am not sure...
>
> Most feedback I can think of has come via private emails or personal interactions, especially specific details of various usage models.

So if you were pushing these private conversations to linux-rdma, more would have been known on rsockets for the benefit of all... oh well. I think you mentioned something re the Intel HPC group, or am I wrong?

Or.
Re: MLX4 Cq Question
On 20/05/2013 17:53, Jack Morgenstein wrote:
> ===
> net/mlx4_core: Fix racy flow in the driver CQ completion handler
>
> The mlx4 CQ completion handler, mlx4_cq_completion, doesn't bother to lock the radix tree which is used to manage the table of CQs, nor does it increase the reference count of the CQ before invoking the user provided callback (and decrease it afterwards).
>
> This is racy and can cause use-after-free, null pointer dereference, etc, which result in kernel crashes.
>
> To fix this, we must do the following in mlx4_cq_completion:
> - increase the ref count on the cq before invoking the user callback, and decrement it after the callback.
> - Place a lock around the radix tree lookup/ref-count-increase
>
> Using an irq spinlock will not fix this issue. The problem is that under VPI, the ETH interface uses multiple msix irq's, which can result in one cq completion event interrupting another in-progress cq completion event. A deadlock results when the handler for the first cq completion grabs the spinlock, and is interrupted by the second completion before it has a chance to release the spinlock. The handler for the second completion will deadlock waiting for the spinlock to be released.

I am not sure I follow on two pieces here:

1. Why do we say that only mlx4_en uses multiple msix irq's? mlx4_ib also exposes multiple vectors (-> EQs -> MSI-X -> IRQ) and the iser driver uses that, e.g. creates multiple CQs, each on a different EQ.

2. Is it possible in the Linux kernel for one hard irq callback to flash on CPU X while another hard irq callback is running on the same CPU?

Or.
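The two steps listed in the quoted commit message (look up the CQ and take a reference under a lock, then call the user callback and drop the reference afterwards) can be sketched in plain C. This is a single-threaded model under loud assumptions: every `sk_*` name is hypothetical, a fixed array stands in for the radix tree, and a counter stands in for the lock. It is not the mlx4 code, and it does not settle which lock type (or RCU) is appropriate, which is what the thread is debating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_TABLE_SIZE 8

struct sk_cq {
	int refcount;                    /* starts at 1: the table's reference */
	bool freed;                      /* set when the last reference drops */
	int events_seen;
	void (*comp)(struct sk_cq *cq);  /* user-provided completion callback */
};

struct sk_table {
	struct sk_cq *slot[SK_TABLE_SIZE];  /* stand-in for the radix tree */
	int lock_depth;                     /* stand-in for a real lock */
};

static void sk_lock(struct sk_table *t)   { t->lock_depth++; }
static void sk_unlock(struct sk_table *t) { t->lock_depth--; }

static void sk_put(struct sk_cq *cq)
{
	if (--cq->refcount == 0)
		cq->freed = true;  /* real code would wake the destroyer or free */
}

/* Completion path: lookup plus reference under the lock, callback outside. */
static void sk_completion(struct sk_table *t, unsigned int cqn)
{
	struct sk_cq *cq;

	sk_lock(t);
	cq = t->slot[cqn % SK_TABLE_SIZE];
	if (cq)
		cq->refcount++;
	sk_unlock(t);

	if (!cq)
		return;            /* unknown or already destroyed CQN */
	cq->events_seen++;
	cq->comp(cq);
	sk_put(cq);
}

/* Destroy path: unpublish under the lock, then drop the table's reference. */
static void sk_destroy(struct sk_table *t, unsigned int cqn)
{
	struct sk_cq *cq;

	sk_lock(t);
	cq = t->slot[cqn % SK_TABLE_SIZE];
	t->slot[cqn % SK_TABLE_SIZE] = NULL;
	sk_unlock(t);
	if (cq)
		sk_put(cq);
}

static struct sk_table sk_demo_table;

/* Callback that models a destroy racing with the handler (CQN 3). */
static void sk_demo_comp(struct sk_cq *cq)
{
	sk_destroy(&sk_demo_table, 3);
	if (cq->freed)             /* must not happen while we hold a reference */
		cq->events_seen = -1;
}

/* Returns 0 if the CQ outlives the callback and is freed right after it. */
static int sk_cq_demo(void)
{
	struct sk_cq cq = { 1, false, 0, sk_demo_comp };

	sk_demo_table.slot[3] = &cq;
	sk_demo_table.lock_depth = 0;

	sk_completion(&sk_demo_table, 3);
	if (cq.events_seen != 1 || !cq.freed)
		return -1;
	sk_completion(&sk_demo_table, 3);  /* lookup now fails, CQ untouched */
	return (cq.events_seen == 1 && sk_demo_table.lock_depth == 0) ? 0 : -1;
}
```

The demo callback destroys its own CQ to imitate a concurrent destroy; the reference taken in `sk_completion()` is what keeps the object alive until the callback returns.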
Re: MLX4 Cq Question
On 21/05/2013 13:42, Bart Van Assche wrote:
> On 05/21/13 11:40, Or Gerlitz wrote:
>> 2. is possible in the Linux kernel for one hard irq callback to flash on CPU X while another hard irq callback is running on the same CPU?
>
> I think that from kernel 2.6.35 on MSI IRQs are no longer nested. See also http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=753649dbc49345a73a2454c770a3f2d54d11aec6 or http://lwn.net/Articles/380931/

Thanks. So suppose we agree on that, the patch still makes sense as the race is there, but does the patch have to change?

Or.
better understanding rdma-cm UNREACHABLE/ETIMEDOUT scheme
Hi Sean,

We have a user space application which is made of an M (clients) x N (servers) RC connectivity pattern using librdmacm. Basically, there are N nodes, each running M client processes, and each client connects to all N servers.

Under some unknown conditions, many of the clients' connection attempts fail with an RDMA_CM_EVENT_UNREACHABLE event and the status is -ETIMEDOUT.

Looking at the rdma-cm kernel code, I see that the only location which generates this event is in cma_ib_handler when getting IB_CM_REQ_ERROR (or IB_CM_REP_ERROR). Digging down into the CM, I see that the only place where IB_CM_REQ_ERROR is delivered is cm_process_send_error, which is called when the status of the MAD send completion is neither success nor flush.

Digging down into the MAD code and the CM usage of it, I see that the MAD code will issue a MAD send completion handler with the IB_WC_RESP_TIMEOUT_ERR status, and that the CM code programs the number of retries set by its consumer (rdma-cm in this case) into the MAD send buffer.

Running this over an M=8 and N=4 setup, e.g. four nodes, each running one server process and eight client processes, sampling the IB CM counters before and after the job and adding the numbers from the four nodes, we see the following:

cm_tx_msgs.req    = 395
cm_tx_retries.req = 270
cm_rx_msgs.req    = 390

cm_tx_msgs.rep    = 375
cm_tx_retries.rep = 255
cm_rx_msgs.rep    = 380

cm_tx_msgs.rtu    = 108
cm_rx_msgs.rtu    = 103

cm_tx_msgs.mra    = 540
cm_rx_msgs.mra    = 270
cm_tx_retries.mra = 270

In cm_send_handler we see that the CM TX retry counter is incremented with the number of retries reported by the MAD layer. I also see that the RDMA-CM programs the CM to do 15 retries and the CM further programs this into the MAD send buffers.

From the RTU counters it's clear that at most ~100 connections got established out of 128.

One thing seen in the nodes' dmesg is a message from an old patch of yours which exists in ofed1.5.3 but didn't hit (or wasn't accepted?) upstream, saying

ib_cm: calculated mra timeout 67584 > 8192, decreasing used timeout_ms

Does this provide any insight into the problem?

One more piece of info is that these apps don't call rdma_disconnect at all. When they are done, or if something goes wrong (e.g. that unreachable event), they simply issue rdma_destroy_id, which, when I look at the rdma-cm/cm code, gets to a CM function which sends a DREQ (if the ID is in the established state) and puts the ID in the timewait zone.

So it seems we're not losing MADs. Also, on the stack they use (that 1.5.3) the ucma backlog size is 128 but each server process gets only 32 requests (8x4), so we don't think ucma dropping REQs for lack of backlog budget takes place.

Or.
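The connection counts quoted in this message follow from the job shape, and can be checked with a few lines of C. This is a worked-arithmetic sketch only: the struct and function names are made up for illustration, one server process per node is taken from the description of the M=8, N=4 run, and treating the summed `cm_tx_msgs.rtu` value as an upper bound on established connections is an assumption (each client sends an RTU once per connection that reaches that stage).

```c
#include <assert.h>

/* Job shape from the message: N nodes, one server and M clients per node. */
struct job_shape {
	int nodes;             /* N */
	int clients_per_node;  /* M */
};

static int total_clients(struct job_shape j)
{
	return j.nodes * j.clients_per_node;
}

/* Every client connects to every server; there is one server per node. */
static int total_connections(struct job_shape j)
{
	return total_clients(j) * j.nodes;
}

/* Each server sees one connection request from every client. */
static int requests_per_server(struct job_shape j)
{
	return total_clients(j);
}

/* Lower bound on failed connections, given the summed RTUs sent. */
static int min_failed(struct job_shape j, int rtu_sent)
{
	return total_connections(j) - rtu_sent;
}

/* Returns 0 when the numbers match the ones quoted in the message. */
static int cm_counts_demo(void)
{
	struct job_shape j = { 4, 8 };   /* N=4, M=8 */
	const int ucma_backlog = 128;    /* backlog size quoted for that stack */
	const int rtu_sent = 108;        /* cm_tx_msgs.rtu summed over nodes */

	if (total_connections(j) != 128)
		return -1;
	if (requests_per_server(j) != 32 || requests_per_server(j) > ucma_backlog)
		return -1;
	return min_failed(j, rtu_sent) == 20 ? 0 : -1;
}
```

Under these assumptions at least 20 of the 128 connection attempts did not complete, and the 32 requests each server receives stay well below the 128-entry backlog, which matches the reasoning in the message.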
Re: MLX4 Cq Question
On 21/05/2013 17:13, Jack Morgenstein wrote:
> I just need to verify that the patch can be applied correctly on the upstream kernel. The use of RCU (and not spinlock) makes sense from a performance standpoint in any case. We do NOT want to force mlx4_cq_completion to have a spinlock which is device-global, resulting in having completion event processing be single-threaded in effect).

Cool, let's do that and re-submit.
Re: better understanding rdma-cm UNREACHABLE/ETIMEDOUT scheme
On 21/05/2013 18:24, Hefty, Sean wrote:
> I don't remember this patch at all.

Alex, can you please send Sean this patch?
Re: better understanding rdma-cm UNREACHABLE/ETIMEDOUT scheme
On Tue, May 21, 2013 at 6:24 PM, Hefty, Sean sean.he...@intel.com wrote:
>> One thing seen in the nodes dmesg is a message from an old patch of yours which exists in ofed1.5.3 but didn't hit (or wasn't accepted?) upstream saying ib_cm: calculated mra timeout 67584 > 8192, decreasing used timeout_ms does this provides any insight into the problem?
>
> I don't remember this patch at all.

Alex sent it to you. Is that something which is missing upstream, or alternatively could it create trouble on that ofed stack where it's applied?

> My first guess is that the server isn't responding to new requests.

Yep, smells like this could be the root cause here. Dina and Alex will do some tweaking of the server code to make sure there's no starvation in servicing new connection requests.

Or.
Re: [PATCH 0/4] add RAW Packet QP type
On Tue, May 21, 2013 at 1:43 AM, Shawn Bohrer shawn.boh...@gmail.com wrote:
> I appologize if I missed it, but did any support for L3/L4 CSUM generation get added? Doesn't look like the upstream libibverbs has it, and I don't seem to see any patches floating around.

Roland commented that he will make a point release this week for libibverbs and libmlx4; I will post the patches for CSUM offload after that release.

Or.
Re: [PATCH for-next 0/9] Add receive Flow Steering support
On Tue, May 21, 2013 at 1:54 AM, Shawn Bohrer shawn.boh...@gmail.com wrote:
> Are there any patches for libibverbs to add ibv_create_flow/ibv_destroy_flow? And are there any needed patches for libmlx4? I'm building up a stack so we can begin testing this series.

YES, there are patches. NO, I didn't post them here yet, as I went the bottom-up way of first posting the kernel patches and, once they are accepted (which didn't happen yet), posting the user space patches.

Last week at the Linux EU summit, people commented that the flow-steering patches look OK, and I understand Roland is fine with accepting them for the 3.11 merge window.

I do want you or anyone else to start testing them right away. Please start with getting a system with 3.10 + the flow-steering patches to run, and in the coming days I will post here a pointer to a user space implementation you can use for testing the kernel patches.

Or.
Re: libibverbs / libmlx4 release
On Mon, May 20, 2013 at 7:49 PM, Roland Dreier rol...@kernel.org wrote:
> That's fine, I'll do the releases this week.

cool.
Re: ibv_reg_mr call failed
On 24/05/2013 20:43, Liu Ginhann wrote:
> Here is version information of our test chassis,

what driver / card are you using?
Re: ibv_reg_mr call failed
On 24/05/2013 20:43, Liu Ginhann wrote:
> Ibv_reg_mr() call return EFAULT - bad address

Basically, AFAIK, the IB stack should support what you are trying to do. If you tell us which code this value is returned from, e.g. libibverbs/libmlx4 or the kernel, it would help in assisting you further.

Or.
Re: ibv_reg_mr call failed
On Sun, May 26, 2013 at 4:52 PM, Liu Ginhann ginhann@grassvalley.com wrote:
> Will dig into it. Thanks.

Does this work if you use get_free_pages in the kernel instead of kmem_cache?

Or.
Re: [ANNOUNCE] libibverbs 1.1.7 is released
On 29/05/2013 02:10, Roland Dreier wrote:
> libibverbs is a library that allows programs to use RDMA verbs for direct access to RDMA (currently InfiniBand and iWARP) hardware from userspace.

Hey, so there's RoCE out there too...

> The new stable release, 1.1.7, is available from

libmlx4 releasing is coming too?

Or.
Re: Status of ummunot branch?
On Tue, May 28, 2013 at 8:51 PM, Jeff Squyres (jsquyres) jsquy...@cisco.com wrote:
> I ask because, as an MPI guy, I would *love* to see this stuff integrated into the kernel and libibverbs.

Hi Jeff,

Have you looked at ODP? See https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/568-on-demand-paging-for-user-space-networking.html

Or.
Re: Status of ummunot branch?
On 30/05/2013 01:56, Jeff Squyres (jsquyres) wrote:
> On May 29, 2013, at 4:53 AM, Or Gerlitz or.gerl...@gmail.com wrote:
>> Have you looked on ODP? see https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/568-on-demand-paging-for-user-space-networking.html
>
> Is this upstream?

No

> Has this been run by the MPI implementor community?

The team that works on this here isn't ready for submission, so community runs were not made yet.

> The limitation of a max of 2 concurrent page faults seems fairly significant.

let me check
Re: ibv_reg_mr call failed
On Fri, May 31, 2013 at 2:05 AM, Liu Ginhann ginhann@grassvalley.com wrote:
> Or,
> Tried __get_free_pages, kmalloc, all behave the same - fail in ibv_reg_mr call but the va mapped back is fine for peek and poke. Any other suggestion?

Yes, tell from where the EFAULT error originates.

> Hank
>
> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Or Gerlitz
> Sent: Monday, May 27, 2013 12:53 AM
> To: Liu Ginhann
> Cc: Or Gerlitz; linux-rdma@vger.kernel.org
> Subject: Re: ibv_reg_mr call failed
>
> On Sun, May 26, 2013 at 4:52 PM, Liu Ginhann ginhann@grassvalley.com wrote:
>> Will dig into it.Thanks.
>
> does this works if you use get_free_pages in the kernel instead of kmem_cache?
>
> Or.
Re: Status of ummunot branch?
On 04/06/2013 04:24, Jeff Squyres (jsquyres) wrote:
> On May 29, 2013, at 1:53 AM, Or Gerlitz or.gerl...@gmail.com wrote:
>> Have you looked at ODP? See https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/568-on-demand-paging-for-user-space-networking.html
>
> Is the idea behind ODP that, at the beginning of time, you register the entire memory space (i.e., NULL to 2^64) and then never worry about registered memory?

Adding Haggai from the team that works on ODP.

Haggai, Jeff also commented in this thread http://marc.info/?t=13697634766r=1w=2 that a limit of at most 2 concurrent page faults seems fairly significant, which you might want to address too.

Or.
Re: ibv_reg_mr call failed
On 08/06/2013 19:42, Liu Ginhann wrote:
>> Does this work if you use get_free_pages in the kernel instead of kmem_cache?
>
> I tried get_free_pages, kmalloc and kmem_cache_alloc. None of them works; each fails with the same error, EFAULT (bad pointer). After a code walk-through, I believe the failure is in the get_user_pages call in the ib_umem_get routine. Below is the code snippet; it seems the address passed in is expected to point to user-space memory. If the comment is true, that may explain why ibv_reg_mr is not happy with a remapped address from kernel-allocated memory but is perfectly fine with malloc from user space.

Guys, do you agree? Will ib_umem_get always fail when it is given memory that wasn't allocated in user space? Why?

Or.

> If this is true, do you think there is another way to accomplish this? There has got to be some way to do this. Your help is appreciated.
>
> Hank
>
> /**
>  * ib_umem_get - Pin and DMA map userspace memory.
>  * @context: userspace context to pin memory for
>  * @addr: userspace virtual address to start at
>  * @size: length of region to pin
>  * @access: IB_ACCESS_xxx flags for memory being pinned
>  * @dmasync: flush in-flight DMA when the memory region is written
>  */
> struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> 			    size_t size, int access, int dmasync)
> {
> 	ret = 0;
> 	while (npages) {
> 		ret = get_user_pages(current, current->mm, cur_base,
> 				     min_t(unsigned long, npages,
> 					   PAGE_SIZE / sizeof (struct page *)),
> 				     1, !umem->writable, page_list, vma_list);
>
> 		if (ret < 0)
> 			goto out;
[PATCH V1 for-next 4/4] IB/mlx4: Add receive Flow Steering support
From: Hadar Hen Zion had...@mellanox.com

Implement the ib_create_flow and ib_destroy_flow verbs. Translate the verbs structures provided by the user to HW structures and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands.

On the ATTACH command completion, the firmware provides 64 bit registration ID which is returned to the caller within struct ib_flow and used later for detaching that flow.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx4/main.c | 246 +
 1 files changed, 246 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 23d7343..0ac5023 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -54,6 +54,8 @@
 #define DRV_VERSION	"1.0"
 #define DRV_RELDATE	"April 4, 2008"
 
+#define MLX4_IB_FLOW_MAX_PRIO 0xFFF
+
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("Mellanox ConnectX HCA InfiniBand driver");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -88,6 +90,25 @@ static void init_query_mad(struct ib_smp *mad)
 
 static union ib_gid zgid;
 
+static int check_flow_steering_support(struct mlx4_dev *dev)
+{
+	int ib_num_ports = 0;
+	int i;
+
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+		ib_num_ports++;
+
+	if (dev->caps.steering_mode == MLX4_STEERING_MODE_DEVICE_MANAGED) {
+		if (ib_num_ports || mlx4_is_mfunc(dev)) {
+			pr_warn("Device managed flow steering is unavailable "
+				"for IB ports or in multifunction env.\n");
+			return 0;
+		}
+		return 1;
+	}
+	return 0;
+}
+
 static int mlx4_ib_query_device(struct ib_device *ibdev,
 				struct ib_device_attr *props)
 {
@@ -144,6 +165,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2B;
 		else
 			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2A;
+	if (check_flow_steering_support(dev->dev))
+		props->device_cap_flags |= IB_DEVICE_MANAGED_FLOW_STEERING;
 	}
 
 	props->vendor_id	   = be32_to_cpup((__be32 *) (out_mad->data + 36)) &
@@ -798,6 +821,220 @@ struct mlx4_ib_steering {
 	union ib_gid gid;
 };
 
+static int parse_flow_attr(struct mlx4_dev *dev,
+			   struct _ib_flow_spec *ib_spec,
+			   struct _rule_hw *mlx4_spec)
+{
+	enum mlx4_net_trans_rule_id type;
+
+	switch (ib_spec->type) {
+	case IB_FLOW_SPEC_ETH:
+		type = MLX4_NET_TRANS_RULE_ID_ETH;
+		memcpy(mlx4_spec->eth.dst_mac, ib_spec->eth.val.dst_mac,
+		       ETH_ALEN);
+		memcpy(mlx4_spec->eth.dst_mac_msk, ib_spec->eth.mask.dst_mac,
+		       ETH_ALEN);
+		mlx4_spec->eth.vlan_tag = ib_spec->eth.val.vlan_tag;
+		mlx4_spec->eth.vlan_tag_msk = ib_spec->eth.mask.vlan_tag;
+		break;
+
+	case IB_FLOW_SPEC_IB:
+		type = MLX4_NET_TRANS_RULE_ID_IB;
+		mlx4_spec->ib.l3_qpn = ib_spec->ib.val.l3_type_qpn;
+		mlx4_spec->ib.qpn_mask = ib_spec->ib.mask.l3_type_qpn;
+		memcpy(mlx4_spec->ib.dst_gid, ib_spec->ib.val.dst_gid, 16);
+		memcpy(mlx4_spec->ib.dst_gid_msk,
+		       ib_spec->ib.mask.dst_gid, 16);
+		break;
+
+	case IB_FLOW_SPEC_IPV4:
+		type = MLX4_NET_TRANS_RULE_ID_IPV4;
+		mlx4_spec->ipv4.src_ip = ib_spec->ipv4.val.src_ip;
+		mlx4_spec->ipv4.src_ip_msk = ib_spec->ipv4.mask.src_ip;
+		mlx4_spec->ipv4.dst_ip = ib_spec->ipv4.val.dst_ip;
+		mlx4_spec->ipv4.dst_ip_msk = ib_spec->ipv4.mask.dst_ip;
+		break;
+
+	case IB_FLOW_SPEC_TCP:
+	case IB_FLOW_SPEC_UDP:
+		type = ib_spec->type == IB_FLOW_SPEC_TCP ?
+					MLX4_NET_TRANS_RULE_ID_TCP :
+					MLX4_NET_TRANS_RULE_ID_UDP;
+		mlx4_spec->tcp_udp.dst_port = ib_spec->tcp_udp.val.dst_port;
+		mlx4_spec->tcp_udp.dst_port_msk = ib_spec->tcp_udp.mask.dst_port;
+		mlx4_spec->tcp_udp.src_port = ib_spec->tcp_udp.val.src_port;
+		mlx4_spec->tcp_udp.src_port_msk = ib_spec->tcp_udp.mask.src_port;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	if (mlx4_map_sw_to_hw_steering_id(dev, type) < 0 ||
+	    mlx4_hw_rule_sz(dev, type) < 0)
+		return -EINVAL;
+	mlx4_spec->id = cpu_to_be16(mlx4_map_sw_to_hw_steering_id(dev, type));
+	mlx4_spec->size = mlx4_hw_rule_sz(dev, type) >> 2;
+	return mlx4_hw_rule_sz(dev, type);
+}
+
+static int __mlx4_ib_create_flow(struct ib_qp *qp, struct
[PATCH V1 for-next 1/4] IB/core: Add receive Flow Steering support
From: Hadar Hen Zion had...@mellanox.com

The RDMA stack allows applications to create IB_QPT_RAW_PACKET QPs, which carry plain Ethernet packets, specifically packets that don't carry any QPN to be matched by the receiving side. Applications using these QPs must be provided with a method to program steering rules into the HW so that packets arriving at the local port can be routed to them.

This patch adds ib_create_flow, which allows providing a flow specification for a QP, such that when a received packet matches the specification it is forwarded to that QP, in a similar manner to how ib_attach_multicast is used for IB UD multicast handling.

Flow specifications are provided as instances of struct ib_flow_spec_yyy, which describe L2, L3 and L4 headers; currently specs for Ethernet, IPv4, TCP, UDP and IB are defined. Flow specs are made of values and masks.

The input to ib_create_flow is an instance of struct ib_flow_attr, which contains a few mandatory control elements and optional flow specs.

struct ib_flow_attr {
	enum ib_flow_attr_type type;
	u16 size;
	u16 priority;
	u8 num_of_specs;
	u8 port;
	u32 flags;
	/* Following are the optional layers according to user request
	 * struct ib_flow_spec_yyy
	 * struct ib_flow_spec_zzz
	 */
};

As these specs eventually come from user space, they are defined and used in a way which allows adding new spec types without a kernel/user ABI change, with only a small API enhancement which defines the newly added spec.

The flow spec structures are defined in a TLV (Type-Length-Value) manner, which allows calling ib_create_flow with a variable-length list of optional specs.

For the actual processing of ib_flow_attr, the driver uses the mandatory num_of_specs and size fields along with the TLV nature of the specs.

Steering rules are processed in order of rule priority. The user sets the 12 low-order bits of the priority field, and the remaining 4 high-order bits are set by the kernel according to the domain that the application or the layer that created the rule belongs to. A lower numerical priority value means a higher priority.

The value returned from ib_create_flow is an instance of struct ib_flow, which contains a database pointer (handle) provided by the HW driver, to be used when calling ib_destroy_flow.

Applications that offload TCP/IP traffic could also be written over IB UD QPs. As such, the ib_create_flow / ib_destroy_flow API is designed to support UD QPs too. The HW driver sets IB_DEVICE_MANAGED_FLOW_STEERING to denote support of flow steering.

The ib_flow_attr enum type relates to usage of flow steering for promiscuous and sniffer purposes:

IB_FLOW_ATTR_NORMAL - regular rule, steering according to rule specification

IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive all Ethernet traffic which isn't steered to any QP

IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast

IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

The ALL_DEFAULT and MC_DEFAULT rule options are valid only for the Ethernet link type.

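The priority split described above can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: the 12/4 bit split and the "lower value wins" rule come from the commit message, while the macro names, the function name and the choice to reject out-of-range input (rather than mask it) are hypothetical and not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical names; only the 12-bit/4-bit split is from the commit message. */
#define FLOW_USER_PRIO_BITS	12
#define FLOW_USER_PRIO_MAX	((1u << FLOW_USER_PRIO_BITS) - 1)	/* 0xFFF */
#define FLOW_DOMAIN_MAX		0xFu					/* 4 bits */

/*
 * Combine the kernel-chosen domain (4 high-order bits) with the
 * user-chosen priority (12 low-order bits) into one 16-bit value.
 * Returns the combined priority, or -1 if either input is out of range.
 * A numerically lower result means the rule is considered earlier.
 */
static int flow_effective_priority(unsigned int domain, unsigned int user_prio)
{
	if (domain > FLOW_DOMAIN_MAX || user_prio > FLOW_USER_PRIO_MAX)
		return -1;

	return (int)((domain << FLOW_USER_PRIO_BITS) | user_prio);
}
```

With this layout, any rule in a numerically lower domain sorts ahead of every rule in a higher domain, whatever user priority the latter asks for.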
Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/verbs.c |  30 +
 include/rdma/ib_verbs.h         | 136 ++-
 2 files changed, 164 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..932f4a7 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1254,3 +1254,33 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+struct ib_flow *ib_create_flow(struct ib_qp *qp,
+			       struct ib_flow_attr *flow_attr,
+			       int domain)
+{
+	struct ib_flow *flow_id;
+	if (!qp->device->create_flow)
+		return ERR_PTR(-ENOSYS);
+
+	flow_id = qp->device->create_flow(qp, flow_attr, domain);
+	if (!IS_ERR(flow_id))
+		atomic_inc(&qp->usecnt);
+	return flow_id;
+}
+EXPORT_SYMBOL(ib_create_flow);
+
+int ib_destroy_flow(struct ib_flow *flow_id)
+{
+	int err;
+	struct ib_qp *qp = flow_id->qp;
+
+	if (!flow_id->qp->device->destroy_flow)
+		return -ENOSYS;
+
+	err = qp->device->destroy_flow(flow_id);
+	if (!err)
+		atomic_dec(&qp->usecnt);
+	return err;
+}
+EXPORT_SYMBOL(ib_destroy_flow);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..6f76d62 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,8 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<21),
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23
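The commit message for this patch says the driver walks the optional specs using num_of_specs, the total size, and the TLV nature of each spec. A minimal user-space sketch of such a walk follows. It is not the patch's code: struct spec_hdr, append_spec and count_specs are hypothetical, and the sketch assumes that every spec starts with a type field and a size field covering the whole spec, header included.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common header assumed to start every spec. */
struct spec_hdr {
	uint32_t type;
	uint16_t size;		/* bytes in this spec, header included */
	uint16_t reserved;
};

/* Test helper: write a zero-filled spec at offset off, return the new offset. */
static size_t append_spec(unsigned char *buf, size_t off,
			  uint32_t type, uint16_t size)
{
	struct spec_hdr hdr = { type, size, 0 };

	memset(buf + off, 0, size);
	memcpy(buf + off, &hdr, sizeof(hdr));
	return off + size;
}

/*
 * Walk num_of_specs variable-length specs packed in buf[0..len).
 * Returns how many of them have type want_type, or -1 if the buffer is
 * malformed: a spec shorter than its header, a spec running past the
 * end, or bytes left over after the last spec.
 */
static int count_specs(const unsigned char *buf, size_t len,
		       unsigned int num_of_specs, uint32_t want_type)
{
	size_t off = 0;
	int found = 0;
	unsigned int i;

	for (i = 0; i < num_of_specs; i++) {
		struct spec_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		if (hdr.type == want_type)
			found++;
		off += hdr.size;	/* the length field, not the type, drives the walk */
	}
	return off == len ? found : -1;
}
```

Because the walk advances by the size field alone, a consumer can step over spec types it does not know, which is the property that lets new spec types be added without changing the container layout.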
[PATCH V1 for-next 0/4] Add receive Flow Steering support
Hi Roland, all

These patches add Flow Steering support to the kernel IB core, to uverbs and to the mlx4 IB (verbs) driver, along with one patch to uverbs which adds some code to support extensions.

IB/core: Add receive Flow Steering support
IB/core: Infra-structure to support verbs extensions through uverbs
IB/core: Export ib_create/destroy_flow through uverbs
IB/mlx4: Add receive Flow Steering support

The main patch, which introduces the Flow-Steering API, is "IB/core: Add receive Flow Steering support"; see its change log.

The Network Adapter Flow Steering slides which Tzahi Oved presented at the annual OFA 2012 meeting could also be helpful: https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/518-network-adapter-flow-steering.html

V0 has been acknowledged by Steve and Christoph, and also got positive feedback from Sean and Jason in f2f talks we had during the Linux Foundation EU summit last month.

V1 changes:
- dropped the five pre-patches which were accepted into 3.10
- rebased the patches against Roland's for-next / 3.10-rc4
- in patch #3, ib_uverbs_destroy_flow was returning too quickly when the driver returned failure for ib_destroy_flow; it needs to free some uverbs resources first
- in patch #4, check the index before accessing the array in mlx4_ib_create/destroy_flow

Or.

Hadar Hen Zion (3):
  IB/core: Add receive Flow Steering support
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

Igor Ivanov (1):
  IB/core: Infra-structure to support verbs extensions through uverbs

 drivers/infiniband/core/uverbs.h      |   3 +
 drivers/infiniband/core/uverbs_cmd.c  | 206 +++
 drivers/infiniband/core/uverbs_main.c |  42 +-
 drivers/infiniband/core/verbs.c       |  30
 drivers/infiniband/hw/mlx4/main.c     | 246 +
 include/rdma/ib_verbs.h               | 137 ++-
 include/uapi/rdma/ib_user_verbs.h     | 118 -
 7 files changed, 773 insertions(+), 9 deletions(-)
[PATCH V1 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
From: Hadar Hen Zion had...@mellanox.com

Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to support flow steering for user space applications.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs.h      |   3 +
 drivers/infiniband/core/uverbs_cmd.c  | 206 +
 drivers/infiniband/core/uverbs_main.c |  13 ++-
 include/rdma/ib_verbs.h               |   1 +
 include/uapi/rdma/ib_user_verbs.h     | 108 +-
 5 files changed, 329 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 0fcd7aa..ad9d102 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -155,6 +155,7 @@ extern struct idr ib_uverbs_cq_idr;
 extern struct idr ib_uverbs_qp_idr;
 extern struct idr ib_uverbs_srq_idr;
 extern struct idr ib_uverbs_xrcd_idr;
+extern struct idr ib_uverbs_rule_idr;
 
 void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
 
@@ -215,5 +216,7 @@ IB_UVERBS_DECLARE_CMD(destroy_srq);
 IB_UVERBS_DECLARE_CMD(create_xsrq);
 IB_UVERBS_DECLARE_CMD(open_xrcd);
 IB_UVERBS_DECLARE_CMD(close_xrcd);
+IB_UVERBS_DECLARE_CMD(create_flow);
+IB_UVERBS_DECLARE_CMD(destroy_flow);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a7d00f6..956782b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -54,6 +54,7 @@ static struct uverbs_lock_class qp_lock_class	= { .name = "QP-uobj" };
 static struct uverbs_lock_class ah_lock_class	= { .name = "AH-uobj" };
 static struct uverbs_lock_class srq_lock_class	= { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
+static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
 
 #define INIT_UDATA(udata, ibuf, obuf, ilen, olen)			\
 	do {								\
@@ -330,6 +331,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&ucontext->srq_list);
 	INIT_LIST_HEAD(&ucontext->ah_list);
 	INIT_LIST_HEAD(&ucontext->xrcd_list);
+	INIT_LIST_HEAD(&ucontext->rule_list);
 	ucontext->closing = 0;
 
 	resp.num_comp_vectors = file->device->num_comp_vectors;
@@ -2587,6 +2589,210 @@ out_put:
 	return ret ? ret : in_len;
 }
 
+static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec,
+				struct _ib_flow_spec *ib_spec)
+{
+	ib_spec->type = kern_spec->type;
+
+	switch (ib_spec->type) {
+	case IB_FLOW_SPEC_ETH:
+		ib_spec->eth.size = sizeof(struct ib_flow_spec_eth);
+		memcpy(&ib_spec->eth.val, &kern_spec->eth.val,
+		       sizeof(struct ib_flow_eth_filter));
+		memcpy(&ib_spec->eth.mask, &kern_spec->eth.mask,
+		       sizeof(struct ib_flow_eth_filter));
+		break;
+	case IB_FLOW_SPEC_IB:
+		ib_spec->ib.size = sizeof(struct ib_flow_spec_ib);
+		memcpy(&ib_spec->ib.val, &kern_spec->ib.val,
+		       sizeof(struct ib_flow_ib_filter));
+		memcpy(&ib_spec->ib.mask, &kern_spec->ib.mask,
+		       sizeof(struct ib_flow_ib_filter));
+		break;
+	case IB_FLOW_SPEC_IPV4:
+		ib_spec->ipv4.size = sizeof(struct ib_flow_spec_ipv4);
+		memcpy(&ib_spec->ipv4.val, &kern_spec->ipv4.val,
+		       sizeof(struct ib_flow_ipv4_filter));
+		memcpy(&ib_spec->ipv4.mask, &kern_spec->ipv4.mask,
+		       sizeof(struct ib_flow_ipv4_filter));
+		break;
+	case IB_FLOW_SPEC_TCP:
+	case IB_FLOW_SPEC_UDP:
+		ib_spec->tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp);
+		memcpy(&ib_spec->tcp_udp.val, &kern_spec->tcp_udp.val,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		memcpy(&ib_spec->tcp_udp.mask, &kern_spec->tcp_udp.mask,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file,
+			      const char __user *buf, int in_len,
+			      int out_len)
+{
+	struct ib_uverbs_create_flow	  cmd;
+	struct ib_uverbs_create_flow_resp resp;
+	struct ib_uobject		  *uobj;
+	struct ib_flow			  *flow_id;
+	struct ib_kern_flow_attr	  *kern_flow_attr;
+	struct ib_flow_attr		  *flow_attr;
+	struct ib_qp			  *qp;
+	int err = 0;
+	void *kern_spec;
+	void *ib_spec;
+	int i;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, buf
[PATCH V1 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs
From: Igor Ivanov igor.iva...@itseez.com

Add infrastructure to support extended uverbs capabilities in a forward/backward compatible manner. Uverbs command opcodes which are based on the verbs extensions approach should be greater than or equal to IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are processed a bit differently.

Signed-off-by: Igor Ivanov igor.iva...@itseez.com
Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs_main.c | 29 -
 include/uapi/rdma/ib_user_verbs.h     | 10 ++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD    50
 
 enum {
 	IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
 	__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+	__u16 provider_in_words;
+	__u16 provider_out_words;
+	__u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
 	__u64 response;
 	__u64 driver_data[0];
-- 
1.7.1
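The length check that this patch moves into the two branches of ib_uverbs_write can be tried out in user space. The sketch below is an illustration only: the struct layouts and the threshold value 50 mirror the patch, but the type names (cmd_hdr, cmd_hdr_ex) and the helper functions are hypothetical, and the code stands in for, rather than reproduces, the kernel logic.

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_THRESHOLD 50	/* mirrors IB_USER_VERBS_CMD_THRESHOLD in the patch */

/* Legacy header layout, as in struct ib_uverbs_cmd_hdr. */
struct cmd_hdr {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
};

/* Extended header layout, as in struct ib_uverbs_cmd_hdr_ex. */
struct cmd_hdr_ex {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t cmd_hdr_reserved;
};

/* Opcodes at or above the threshold use the extended header. */
static int is_extended(uint32_t command)
{
	return command >= CMD_THRESHOLD;
}

/* Legacy rule: in_words, counted in 32-bit words, must equal the write size. */
static int check_legacy_len(const struct cmd_hdr *hdr, size_t count)
{
	return (size_t)hdr->in_words * 4 == count ? 0 : -1;
}

/* Extended rule: core words plus provider words must equal the write size. */
static int check_ex_len(const struct cmd_hdr_ex *hdr, size_t count)
{
	size_t in_bytes = ((size_t)hdr->in_words + hdr->provider_in_words) * 4;

	return in_bytes == count ? 0 : -1;
}
```

The first 8 bytes of both layouts hold the same three fields, which is what allows the dispatcher to read the legacy header first and re-read the extended one only once it has seen an opcode at or above the threshold.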
Re: [PATCH 04/14] IB/srp: Skip host settle delay
On 13/06/2013 12:53, Sebastian Riemer wrote:
> On 12.06.2013 15:24, Bart Van Assche wrote:
>> The SRP initiator implements host reset by reconnecting to the SRP target. That means that communication with the target is possible as soon as host reset finished. Hence skip the host settle delay.
>>
>> Signed-off-by: Bart Van Assche bvanass...@acm.org
>> Cc: Roland Dreier rol...@purestorage.com
>> Cc: David Dillow dillo...@ornl.gov
>> Cc: Vu Pham v...@mellanox.com
>> Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
>> ---
>>  drivers/infiniband/ulp/srp/ib_srp.c |    1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
>> index fb37b47..be12780 100644
>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -1949,6 +1949,7 @@ static struct scsi_host_template srp_template = {
>>  	.eh_abort_handler		= srp_abort,
>>  	.eh_device_reset_handler	= srp_reset_device,
>>  	.eh_host_reset_handler		= srp_reset_host,
>> +	.skip_settle_delay		= true,
>>  	.sg_tablesize			= SRP_DEF_SG_TABLESIZE,
>>  	.can_queue			= SRP_CMD_SQ_SIZE,
>>  	.this_id			= -1,
>
> Signed-off-by: Sebastian Riemer sebastian.rie...@profitbricks.com
> Acked-by: Sebastian Riemer sebastian.rie...@profitbricks.com
> Tested-by: Sebastian Riemer sebastian.rie...@profitbricks.com
> Reviewed-by: Sebastian Riemer sebastian.rie...@profitbricks.com
> Reviewed-by: Christoph Hellwig h...@infradead.org

Choose something, yes, but this is too many things... otherwise we will end up with a one-liner patch that has ten yyy-by: credit lines...

Or.
[PATCH for-next 2/4] IB/mlx4: RoCE IP based GID addressing
From: Moni Shoua mo...@mellanox.co.il

Currently, the mlx4 driver sets RoCE (IBoE) gids to encode the related Ethernet netdevice interface MAC address and possibly VLAN id.

Change this scheme such that gids encode interface IP addresses (both IPv4 and IPv6).

Signed-off-by: Moni Shoua mo...@mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx4/ah.c      |  21 +-
 drivers/infiniband/hw/mlx4/cq.c      |   5 +
 drivers/infiniband/hw/mlx4/main.c    | 461 +++---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   3 +
 drivers/infiniband/hw/mlx4/qp.c      |  19 +-
 include/linux/mlx4/cq.h              |  14 +-
 6 files changed, 354 insertions(+), 169 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index a251bec..3941700 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -92,21 +92,18 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
 {
 	struct mlx4_ib_dev *ibdev = to_mdev(pd->device);
 	struct mlx4_dev *dev = ibdev->dev;
-	union ib_gid sgid;
-	u8 mac[6];
-	int err;
 	int is_mcast;
+	struct in6_addr in6;
 	u16 vlan_tag;
 
-	err = mlx4_ib_resolve_grh(ibdev, ah_attr, mac, &is_mcast, ah_attr->port_num);
-	if (err)
-		return ERR_PTR(err);
-
-	memcpy(ah->av.eth.mac, mac, 6);
-	err = ib_get_cached_gid(pd->device, ah_attr->port_num, ah_attr->grh.sgid_index, &sgid);
-	if (err)
-		return ERR_PTR(err);
-	vlan_tag = rdma_get_vlan_id(&sgid);
+	memcpy(&in6, ah_attr->grh.dgid.raw, sizeof(in6));
+	if (rdma_is_multicast_addr(&in6)) {
+		is_mcast = 1;
+		rdma_get_mcast_mac(&in6, ah->av.eth.mac);
+	} else {
+		memcpy(ah->av.eth.mac, ah_attr->dmac, 6);
+	}
+	vlan_tag = ah_attr->vlan;
 	if (vlan_tag < 0x1000)
 		vlan_tag |= (ah_attr->sl & 7) << 13;
 	ah->av.eth.port_pd = cpu_to_be32(to_mpd(pd)->pdn | (ah_attr->port_num << 24));
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index d5e60f4..ba3f85b 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -793,6 +793,11 @@ repoll:
 			wc->sl  = be16_to_cpu(cqe->sl_vid) >> 13;
 		else
 			wc->sl  = be16_to_cpu(cqe->sl_vid) >> 12;
+		if (be32_to_cpu(cqe->vlan_my_qpn) & MLX4_CQE_VLAN_PRESENT_MASK)
+			wc->vlan = be16_to_cpu(cqe->sl_vid) & MLX4_CQE_VID_MASK;
+		else
+			wc->vlan = 0x;
+		memcpy(wc->smac, cqe->smac, 6);
 	}
 
 	return 0;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 23d7343..8879b41 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -39,6 +39,8 @@
 #include <linux/inetdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/if_vlan.h>
+#include <net/ipv6.h>
+#include <net/addrconf.h>
 
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
@@ -767,7 +769,6 @@ static int add_gid_entry(struct ib_qp *ibqp, union ib_gid *gid)
 int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 		   union ib_gid *gid)
 {
-	u8 mac[6];
 	struct net_device *ndev;
 	int ret = 0;
 
@@ -781,11 +782,7 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 	spin_unlock(&mdev->iboe.lock);
 
 	if (ndev) {
-		rdma_get_mcast_mac((struct in6_addr *)gid, mac);
-		rtnl_lock();
-		dev_mc_add(mdev->iboe.netdevs[mqp->port - 1], mac);
 		ret = 1;
-		rtnl_unlock();
 		dev_put(ndev);
 	}
 
@@ -805,6 +802,8 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
 	u64 reg_id;
 	struct mlx4_ib_steering *ib_steering = NULL;
+	enum mlx4_protocol prot = (gid->raw[1] == 0x0e) ?
+		MLX4_PROT_IB_IPV4 : MLX4_PROT_IB_IPV6;
 
 	if (mdev->dev->caps.steering_mode ==
 	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
@@ -816,7 +815,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw, mqp->port,
 				    !!(mqp->flags &
 				       MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
-				    MLX4_PROT_IB_IPV6, &reg_id);
+				    prot, &reg_id);
 	if (err)
 		goto err_malloc;
 
@@ -835,7 +834,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 
 err_add:
 	mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
-			      MLX4_PROT_IB_IPV6, reg_id);
+			      prot, reg_id);
 err_malloc
[PATCH for-next 0/4] IP based RoCE GID Addressing
Currently, the IB stack (core + drivers) handles RoCE (IBoE) gids as if they encode the related Ethernet net-device interface MAC address and possibly VLAN id.

This series changes RoCE GIDs to encode the IP addresses (IPv4 + IPv6) of that Ethernet interface, under the following reasoning:

1. There are environments where the compute entity that runs the RoCE stack is not aware that its traffic is vlan-tagged. As a result, that node creates/assumes GIDs that are wrong from the viewpoint of a peer node which is aware of VLANs.

Note that "node" here can be a physical node connected to an Ethernet switch acting in access mode, talking to another node which does vlan insertion/stripping by itself. Another example is an SRIOV Virtual Function configured to work in VST mode (Virtual-Switch-Tagging), such that the hypervisor configures the HW eSwitch to do vlan insertion for the vPORT representing that function.

2. RoCE traffic is inspected (mirrored/trapped) in Ethernet switches for monitoring and security purposes. It is much more natural for both humans and automated utilities (...) to observe IP addresses at a certain offset into the RoCE frame's L3 header than MAC/VLANs (which are there anyway in the L2 header of that frame, so they are not lost by this change).

3. Some advanced bonding/teaming modes, such as balance-alb and balance-tlb, use multiple underlying devices in parallel, and hence packets always carry the bond IP address but different streams have different source MACs. The approach brought by this series is part of what would allow supporting that for RoCE traffic too.

The 1st patch modifies the IB core to cope with the new scheme, and the 2nd does that for the mlx4_ib driver. The 3rd patch sets the foundation for extending uverbs in the extension scheme which was introduced lately, and the 4th patch adds two extended uCMA commands and two extended uVERBS commands which are now exported to user space. These extended verbs will allow enhancing user space libraries such that they work OK over the modified scheme. RC applications using librdmacm will not need to be modified at all, since the change will be encapsulated in that library.

The ocrdma driver needs to go through a patch similar to the mlx4_ib one. We can surely do that patch, we just need to dig there a little further.

Or.

Igor Ivanov (1):
  IB/core: Infra-structure to support verbs extensions through uverbs

Matan Barak (1):
  IB/core: Add RoCE IP based addressing extensions towards user space

Moni Shoua (2):
  IB/core: RoCE IP based GID addressing
  IB/mlx4: RoCE IP based GID addressing

 drivers/infiniband/core/cm.c              |   3 +
 drivers/infiniband/core/cma.c             |  39 ++-
 drivers/infiniband/core/sa_query.c        |   5 +
 drivers/infiniband/core/ucma.c            | 190 +++--
 drivers/infiniband/core/uverbs.h          |   2 +
 drivers/infiniband/core/uverbs_cmd.c      | 330 -
 drivers/infiniband/core/uverbs_main.c     |  33 ++-
 drivers/infiniband/core/uverbs_marshall.c |  94 ++-
 drivers/infiniband/core/verbs.c           |   7 +
 drivers/infiniband/hw/mlx4/ah.c           |  21 +-
 drivers/infiniband/hw/mlx4/cq.c           |   5 +
 drivers/infiniband/hw/mlx4/main.c         | 461 -
 drivers/infiniband/hw/mlx4/mlx4_ib.h      |   3 +
 drivers/infiniband/hw/mlx4/qp.c           |  19 +-
 include/linux/mlx4/cq.h                   |  14 +-
 include/rdma/ib_addr.h                    |  45 ++--
 include/rdma/ib_marshall.h                |  12 +
 include/rdma/ib_sa.h                      |   3 +
 include/rdma/ib_verbs.h                   |   4 +
 include/uapi/rdma/ib_user_sa.h            |  34 ++-
 include/uapi/rdma/ib_user_verbs.h         | 130 -
 include/uapi/rdma/rdma_user_cm.h          |  21 ++-
 22 files changed, 1157 insertions(+), 318 deletions(-)
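To make the "GIDs encode IP addresses" idea above concrete, here is a small user-space sketch of one way a 16-byte GID could be derived from an interface address. The cover letter does not spell out the encoding, so two things here are assumptions: IPv4 addresses are carried in IPv4-mapped form (::ffff:a.b.c.d), and IPv6 addresses are copied verbatim. The helper name ip_str_to_gid is hypothetical and is not part of this series.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build a 16-byte GID from a textual IP address.
 * Assumption: IPv4 is stored as an IPv4-mapped IPv6 address, IPv6 as-is.
 * Returns 0 on success, -1 if the string is not a valid IPv4/IPv6 address.
 */
static int ip_str_to_gid(const char *ip, uint8_t gid[16])
{
	struct in_addr v4;
	struct in6_addr v6;

	if (inet_pton(AF_INET, ip, &v4) == 1) {
		memset(gid, 0, 10);			/* 80 zero bits */
		gid[10] = 0xff;				/* 16 one bits */
		gid[11] = 0xff;
		memcpy(gid + 12, &v4.s_addr, 4);	/* already network byte order */
		return 0;
	}
	if (inet_pton(AF_INET6, ip, &v6) == 1) {
		memcpy(gid, v6.s6_addr, 16);
		return 0;
	}
	return -1;
}
```

Under these assumptions, 192.168.1.10 becomes the GID ::ffff:192.168.1.10 regardless of the interface MAC or any VLAN tag applied further down the path, which is the decoupling that reasons 1 and 3 in the cover letter rely on.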
[PATCH for-next 3/4] IB/core: Infra-structure to support verbs extensions through uverbs
From: Igor Ivanov igor.iva...@itseez.com

Add infrastructure to support extended uverbs capabilities in a forward/backward compatible manner. Uverbs command opcodes which are based on the verbs extensions approach should be greater than or equal to IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are processed a bit differently.

Signed-off-by: Igor Ivanov igor.iva...@itseez.com
Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs_main.c |   29 -
 include/uapi/rdma/ib_user_verbs.h     |   10 ++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD	50
 
 enum {
 	IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
 	__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+	__u16 provider_in_words;
+	__u16 provider_out_words;
+	__u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
 	__u64 response;
 	__u64 driver_data[0];
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
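As a reading aid, the length check this patch applies to extended commands can be sketched in plain user-space C. The field layout and the threshold value 50 come from the patch; the names `cmd_hdr_ex`, `uverbs_cmd_is_extended` and `uverbs_ex_len_ok`, and the use of `uint16_t`/`uint32_t` in place of `__u16`/`__u32`, are assumptions made for illustration and are not part of the posted code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct ib_uverbs_cmd_hdr_ex from the patch (fixed-width types assumed). */
struct cmd_hdr_ex {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t cmd_hdr_reserved;
};

/* The extended header occupies four 32-bit words. */
static_assert(sizeof(struct cmd_hdr_ex) == 16, "unexpected header padding");

#define CMD_THRESHOLD 50	/* IB_USER_VERBS_CMD_THRESHOLD in the patch */

/* Hypothetical helper: nonzero when the opcode selects the extended header. */
static int uverbs_cmd_is_extended(uint32_t command)
{
	return command >= CMD_THRESHOLD;
}

/*
 * Hypothetical helper: the same test the patch performs on the kernel side,
 * (in_words + provider_in_words) * 4 must equal the write() size.
 */
static int uverbs_ex_len_ok(const struct cmd_hdr_ex *hdr, size_t count)
{
	size_t words = (size_t)hdr->in_words + hdr->provider_in_words;

	return words * 4 == count;
}
```

With in_words = 6 and provider_in_words = 2, the only write() size this check accepts is 32 bytes; any other size would take the -EINVAL path in the patch.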
[PATCH for-next 1/4] IB/core: RoCE IP based GID addressing
From: Moni Shoua mo...@mellanox.co.il

Currently, the IB core assumes RoCE (IBoE) gids encode the related Ethernet netdevice interface MAC address and possibly VLAN id.

Change gids to be treated as if they encode the interface IP address.

Since Ethernet layer 2 address parameters are no longer encoded within gids, we had to extend the Infiniband address structures (e.g. ib_ah_attr) with layer 2 address parameters, namely mac and vlan.

Signed-off-by: Moni Shoua mo...@mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/cm.c       |    3 ++
 drivers/infiniband/core/cma.c      |   39 ++
 drivers/infiniband/core/sa_query.c |    5
 drivers/infiniband/core/ucma.c     |   18 +++---
 drivers/infiniband/core/verbs.c    |    7 +
 include/rdma/ib_addr.h             |   45
 include/rdma/ib_sa.h               |    3 ++
 include/rdma/ib_verbs.h            |    4 +++
 8 files changed, 79 insertions(+), 45 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 784b97c..7af618f 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1557,6 +1557,9 @@ static int cm_req_handler(struct cm_work *work)
 
 	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
 	cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
+
+	memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, 6);
+	work->path[0].vlan = cm_id_priv->av.ah_attr.vlan;
 	ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
 	if (ret) {
 		ib_get_cached_gid(work->port->cm_dev->ib_device,
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 71c2c71..ba217c9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -373,7 +373,9 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
 		return -EINVAL;
 
 	mutex_lock(&lock);
-	iboe_addr_get_sgid(dev_addr, &iboe_gid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &iboe_gid);
+
 	memcpy(&gid, dev_addr->src_dev_addr +
 	       rdma_addr_gid_offset(dev_addr), sizeof gid);
 	list_for_each_entry(cma_dev, &dev_list, list) {
@@ -1803,7 +1805,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	struct sockaddr_in *src_addr = (struct sockaddr_in *)&route->addr.src_addr;
 	struct sockaddr_in *dst_addr = (struct sockaddr_in *)&route->addr.dst_addr;
 	struct net_device *ndev = NULL;
-	u16 vid;
+
 
 	if (src_addr->sin_family != dst_addr->sin_family)
 		return -EINVAL;
@@ -1830,10 +1832,13 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 		goto err2;
 	}
 
-	vid = rdma_vlan_dev_vlan_id(ndev);
+	route->path_rec->vlan = rdma_vlan_dev_vlan_id(ndev);
+	memcpy(route->path_rec->dmac, addr->dev_addr.dst_dev_addr, 6);
 
-	iboe_mac_vlan_to_ll(&route->path_rec->sgid, addr->dev_addr.src_dev_addr, vid);
-	iboe_mac_vlan_to_ll(&route->path_rec->dgid, addr->dev_addr.dst_dev_addr, vid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &route->path_rec->sgid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
+		    &route->path_rec->dgid);
 
 	route->path_rec->hop_limit = 1;
 	route->path_rec->reversible = 1;
@@ -1970,6 +1975,8 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 			   RDMA_CM_ADDR_RESOLVED))
 		goto out;
 
+	memcpy(&id_priv->id.route.addr.src_addr, src_addr,
+	       ip_addr_size(src_addr));
 	if (!status && !id_priv->cma_dev)
 		status = cma_acquire_dev(id_priv);
 
@@ -1979,11 +1986,8 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 			goto out;
 		event.event = RDMA_CM_EVENT_ADDR_ERROR;
 		event.status = status;
-	} else {
-		memcpy(&id_priv->id.route.addr.src_addr, src_addr,
-		       ip_addr_size(src_addr));
+	} else
 		event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
-	}
 
 	if (id_priv->id.event_handler(&id_priv->id, &event)) {
 		cma_exch(id_priv, RDMA_CM_DESTROYING);
@@ -2381,6 +2385,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (ret)
 		goto err1;
 
+	memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr));
 	if (!cma_any_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
 		if (ret)
@@ -2391,7 +2396,6 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 			goto err1;
 	}
 
-	memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr));
 	if (!(id_priv->options & (1 << CMA_OPTION_AFONLY))) {
 		if (addr
[PATCH for-next 4/4] IB/core: Add RoCE IP based addressing extensions towards user space
From: Matan Barak mat...@mellanox.com

Add support for RoCE (IBoE) IP based addressing extensions towards user space.

Extend INIT_QP_ATTR and QUERY_ROUTE ucma commands.

Extend MODIFY_QP and CREATE_AH uverbs commands.

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/ucma.c            |  172 +++-
 drivers/infiniband/core/uverbs.h          |    2 +
 drivers/infiniband/core/uverbs_cmd.c      |  330 ++---
 drivers/infiniband/core/uverbs_main.c     |    4 +-
 drivers/infiniband/core/uverbs_marshall.c |   94 -
 include/rdma/ib_marshall.h                |   12 +
 include/uapi/rdma/ib_user_sa.h            |   34 +++-
 include/uapi/rdma/ib_user_verbs.h         |  120 +++-
 include/uapi/rdma/rdma_user_cm.h          |   21 ++-
 9 files changed, 690 insertions(+), 99 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index bc2cb5d..c7dfd99 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -599,6 +599,35 @@ static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp,
 	}
 }
 
+static void ucma_copy_ib_route_ex(struct rdma_ucm_query_route_resp_ex *resp,
+				  struct rdma_route *route)
+{
+	struct rdma_dev_addr *dev_addr;
+
+	resp->num_paths = route->num_paths;
+	switch (route->num_paths) {
+	case 0:
+		dev_addr = &route->addr.dev_addr;
+		rdma_addr_get_dgid(dev_addr,
+				   (union ib_gid *)&resp->ib_route[0].dgid);
+		rdma_addr_get_sgid(dev_addr,
+				   (union ib_gid *)&resp->ib_route[0].sgid);
+		resp->ib_route[0].pkey =
+			cpu_to_be16(ib_addr_get_pkey(dev_addr));
+		break;
+	case 2:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[1],
+					    &route->path_rec[1]);
+		/* fall through */
+	case 1:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[0],
+					    &route->path_rec[0]);
+		break;
+	default:
+		break;
+	}
+}
+
 static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp,
 				 struct rdma_route *route)
 {
@@ -625,14 +654,39 @@ static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp,
 	}
 }
 
-static void ucma_copy_iw_route(struct rdma_ucm_query_route_resp *resp,
+static void ucma_copy_iboe_route_ex(struct rdma_ucm_query_route_resp_ex *resp,
+				    struct rdma_route *route)
+{
+	resp->num_paths = route->num_paths;
+	switch (route->num_paths) {
+	case 0:
+		rdma_ip2gid((struct sockaddr *)&route->addr.dst_addr,
+			    (union ib_gid *)&resp->ib_route[0].dgid);
+		rdma_ip2gid((struct sockaddr *)&route->addr.src_addr,
+			    (union ib_gid *)&resp->ib_route[0].sgid);
+		resp->ib_route[0].pkey = cpu_to_be16(0xffff);
+		break;
+	case 2:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[1],
+					    &route->path_rec[1]);
+		/* fall through */
+	case 1:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[0],
+					    &route->path_rec[0]);
+		break;
+	default:
+		break;
+	}
+}
+
+static void ucma_copy_iw_route(struct ib_user_path_rec *resp_path,
 			       struct rdma_route *route)
 {
 	struct rdma_dev_addr *dev_addr;
 
 	dev_addr = &route->addr.dev_addr;
-	rdma_addr_get_dgid(dev_addr, (union ib_gid *) &resp->ib_route[0].dgid);
-	rdma_addr_get_sgid(dev_addr, (union ib_gid *) &resp->ib_route[0].sgid);
+	rdma_addr_get_dgid(dev_addr, (union ib_gid *)&resp_path->dgid);
+	rdma_addr_get_sgid(dev_addr, (union ib_gid *)&resp_path->sgid);
 }
 
 static ssize_t ucma_query_route(struct ucma_file *file,
@@ -684,7 +738,74 @@ static ssize_t ucma_query_route(struct ucma_file *file,
 		}
 		break;
 	case RDMA_TRANSPORT_IWARP:
-		ucma_copy_iw_route(&resp, &ctx->cm_id->route);
+		ucma_copy_iw_route(&resp.ib_route[0], &ctx->cm_id->route);
+		break;
+	default:
+		break;
+	}
+
+out:
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &resp, sizeof(resp)))
+		ret = -EFAULT;
+
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t ucma_query_route_ex(struct ucma_file *file,
+				   const char __user *inbuf,
+				   int in_len, int out_len)
+{
+	struct rdma_ucm_query_route_ex cmd;
+	struct
Re: [PATCH for-next 0/4] IP based RoCE GID Addressing
Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

> Can you talk a bit about compatibility please? What happens when nodes with this patch are on the same network as nodes without it?

The CM on the passive side would send a reject with the reason being invalid gid, so this will not go unnoticed.

> Does this patch remove the encoding of the VLAN from the GID?

YES, and I explained in argument #1 why the vlan being there doesn't work in many environments. In other words, it's something that needs to be fixed, and this series addresses that.

> How is the destination MAC derived now?

As it was before, using address resolution, e.g. ARPs sent by the RDMA-CM.

> There is a RoCE standard, it doesn't say much, but how the MAC and GRH GID are related/derived really should be specified... Not sure about copying the IP/IPv6 address from the interface into the HW, there has always been pressure to keep verbs separate from the net stack.. At the very least patch #2 should have its change log updated to actually reflect what is in the patch.

Sure, I'll see what needs to be better explained in the change-log. Note that the inbox RoCE implementation is tightly coupled to net-devices, e.g. the GID table population is based on netevents of related netdevices.

Or.
Re: [PATCH for-next 4/4] IB/core: Add RoCE IP based addressing extensions towards user space
On Thu, Jun 13, 2013 at 8:09 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote:
> On Thu, Jun 13, 2013 at 06:01:44PM +0300, Or Gerlitz wrote:
>> From: Matan Barak mat...@mellanox.com
>>
>> Add support for RoCE (IBoE) IP based addressing extensions towards user space.
>> Extend INIT_QP_ATTR and QUERY_ROUTE ucma commands.
>> Extend MODIFY_QP and CREATE_AH uverbs commands.
>
> This is a really big patch Or, there is lots going on here, hard to review :( The rdma cm stuff should probably be split out of this, and Sean should look at it of course.

Sure, will do that: one patch for uverbs and one patch for rdma_ucm.

> In fact, since the user ABI is so important, every ABI change should be a distinct patch, with a good change log, stating the intended goals of the change and ABI visible changes it makes.

Point taken, will do that, thanks for bringing this up.

> The changelog above is terrible for a huge patch that makes changes to the userspace API.
>
>> diff --git a/include/uapi/rdma/ib_user_sa.h b/include/uapi/rdma/ib_user_sa.h
>> index cfc7c9b..367d66a 100644
>> +++ b/include/uapi/rdma/ib_user_sa.h
>> @@ -48,7 +48,13 @@ enum {
>>  struct ib_path_rec_data {
>>  	__u32	flags;
>>  	__u32	reserved;
>> -	__u32	path_rec[16];
>> +	__u32	path_rec[20];
>> +};
>> +
>> +enum ibv_kern_path_rec_attr_mask {
>> +	IB_USER_PATH_REC_ATTR_DMAC = 1ULL << 0,
>> +	IB_USER_PATH_REC_ATTR_SMAC = 1ULL << 1,
>> +	IB_USER_PATH_REC_ATTR_VID  = 1ULL << 2
>>  };
>
> So, how is userspace supposed to know what these values are?

It's part of the verbs extensions deal.

> The current system where the MAC address is in the GID seemed understandable, assuming you discover the MAC out of band some how...

The MAC is an Ethernet layer 2 address. I don't see why putting the MAC in the L3 header (GRH) is easier to understand than putting an L3 address (IP) there.

>> +struct ib_uverbs_modify_qp_ex {
>> +	__u32 comp_mask;
>> +	struct ib_uverbs_qp_dest dest;
>> +	struct ib_uverbs_qp_dest alt_dest;
>> [...]
>> +	struct ib_uverbs_qp_dest_ex dest_ex;
>> +	struct ib_uverbs_qp_dest_ex alt_dest_ex;
>
> Yuk.. The 'ex' structures don't have to be byte compatible, they just have to have a known transform, dest should be the full extended dest, not split into two..
>
>> +struct rdma_ucm_query_route_resp_ex {
>> +	__u64 node_guid;
>> +	struct ib_user_path_rec_ex ib_route[2];
>> +	struct sockaddr_in6 src_addr;
>> +	struct sockaddr_in6 dst_addr;
>> +	__u32 num_paths;
>> +	__u8 port_num;
>> +	__u8 reserved[3];
>> +};
>
> Should these be sockaddr_storage? How does this intersect with Sean's AF_GID work?

sockaddr_in6 is OK for extending rdma_ucm_query_route_resp, as it's OK for the non-extended version of that command. I don't see any intersection with the AF_IB work.
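The exchange above about how user space learns which L2 fields are meaningful can be illustrated with a validity-mask check. This is only a sketch: the three bit positions are the ones quoted from the patch, while `struct l2_path_info`, its field layout and the helper `l2_get_dmac` are hypothetical stand-ins, since the real `ib_user_path_rec_ex` layout is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit positions as quoted from the patch (1ULL << n). */
#define PATH_REC_ATTR_DMAC (1ULL << 0)
#define PATH_REC_ATTR_SMAC (1ULL << 1)
#define PATH_REC_ATTR_VID  (1ULL << 2)

static_assert((PATH_REC_ATTR_DMAC & PATH_REC_ATTR_SMAC) == 0 &&
	      (PATH_REC_ATTR_SMAC & PATH_REC_ATTR_VID) == 0,
	      "attribute bits must not overlap");

/* Hypothetical record carrying L2 parameters plus a mask saying which are set. */
struct l2_path_info {
	uint64_t attr_mask;
	uint8_t  dmac[6];
	uint8_t  smac[6];
	uint16_t vid;
};

/* Copy the destination MAC out only if the producer flagged it as valid. */
static int l2_get_dmac(const struct l2_path_info *p, uint8_t out[6])
{
	if (!(p->attr_mask & PATH_REC_ATTR_DMAC))
		return 0;
	memcpy(out, p->dmac, 6);
	return 1;
}
```

A consumer written this way ignores fields whose bit is clear, so a producer that does not fill in, say, the source MAC does not hand stale bytes to the caller.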
Re: [PATCH V1 for-next 0/4] Add receive Flow Steering support
On Tue, Jun 11, 2013 at 2:42 PM, Or Gerlitz ogerl...@mellanox.com wrote:
[...]
> V0 has been acknowledged by Steve and Christoph, and also got positive feedback from Sean and Jason over f2f talks we had during the Linux Foundation EU summit last month.

Hi Roland,

So we're @ -rc6 and there are also other goodies on the plate for the coming merge window ... any comment here, is this safe for 3.11? taking this? I am asking here, b/c it doesn't seem you update your patchwork, so no other choice.

Or.

> V1 changes:
> - dropped the five pre-patches which were accepted into 3.10
> - rebased the patches against Roland's for-next / 3.10-rc4
> - in patch #3, ib_uverbs_destroy_flow was returning too quickly when the driver returned failure for ib_destroy_flow, need to free some uverbs resources 1st.
> - in patch #4, check index before accessing the array at mlx4_ib_create/destroy_flow
>
> Or.
>
> Hadar Hen Zion (3):
>   IB/core: Add receive Flow Steering support
>   IB/core: Export ib_create/destroy_flow through uverbs
>   IB/mlx4: Add receive Flow Steering support
>
> Igor Ivanov (1):
>   IB/core: Infra-structure to support verbs extensions through uverbs
>
>  drivers/infiniband/core/uverbs.h      |    3 +
>  drivers/infiniband/core/uverbs_cmd.c  |  206 +++
>  drivers/infiniband/core/uverbs_main.c |   42 +-
>  drivers/infiniband/core/verbs.c       |   30
>  drivers/infiniband/hw/mlx4/main.c     |  246 +
>  include/rdma/ib_verbs.h               |  137 ++-
>  include/uapi/rdma/ib_user_verbs.h     |  118 -
>  7 files changed, 773 insertions(+), 9 deletions(-)
Re: config file lost
On 17/06/2013 21:44, Hal Rosenstock wrote:
> I'm not 100% sure about the origin of those RPMs but I think the 3.3.15 one is RedHat packaged and the 3.3.16 appears to be PLD packaged and the processes are a little different. I suspect the 3.3.16 one is packaged with the spec file in the tree whereas RedHat uses their own spec file. FWIW it's simple to generate an up to date config file: opensm -c opensm.conf

Hal,

YES to your observations: the 3.3.15 was the RHEL package and the 3.3.16 was built from the upstream spec.

I know that I can generate the config file using the method you suggested. However, do the upstream service scripts use the location to which this is generated, so things are plug-and-play, or do I need to hack that somehow?

Or.
Re: config file lost
On 18/06/2013 14:19, Hal Rosenstock wrote:
> Is /etc/rdma a standard location in Linux ? Is it used by other RDMA upstream components ?

It's used by RHEL packages, not upstream.

> Also, opensm doesn't by default use this location for the config file. I expect that's dealt with by other scripts RedHat supplies.

Yes, this is part of their specs, I think.

>> so things are plug-and-play, or I need to hack that somehow?
>
> You would currently need to hack that.

How exactly? I'd like to build an rpm from upstream opensm, generate a config file and have the opensm service script read this config and apply it on successive restarts or SIGHUPs I send. Maybe you can come up with some patch, it will help.

Or.
[PATCH V1 for-next 2/6] IB/mlx4: Use RoCE IP based GIDs in the port GID table
From: Moni Shoua mo...@mellanox.co.il

Currently, the mlx4 driver sets RoCE (IBoE) gids to encode the related Ethernet netdevice interface MAC address and possibly VLAN id.

Change this scheme such that gids encode interface IP addresses (both IPv4 and IPv6).

This requires learning which IP addresses are in use by a netdevice associated with the HCA port, formatting them as gids and adding them to the port gid table. Further, address add and delete events are caught to maintain the gid table accordingly.

Associated IP addresses may belong to a master of an Ethernet netdevice on top of that port, so this should be considered when building and maintaining the gid table.

Signed-off-by: Moni Shoua mo...@mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx4/main.c    |  461 +++---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |    3 +
 2 files changed, 320 insertions(+), 144 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 23d7343..8879b41 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -39,6 +39,8 @@
 #include <linux/inetdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/if_vlan.h>
+#include <net/ipv6.h>
+#include <net/addrconf.h>
 
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
@@ -767,7 +769,6 @@ static int add_gid_entry(struct ib_qp *ibqp, union ib_gid *gid)
 int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 		   union ib_gid *gid)
 {
-	u8 mac[6];
 	struct net_device *ndev;
 	int ret = 0;
 
@@ -781,11 +782,7 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 	spin_unlock(&mdev->iboe.lock);
 
 	if (ndev) {
-		rdma_get_mcast_mac((struct in6_addr *)gid, mac);
-		rtnl_lock();
-		dev_mc_add(mdev->iboe.netdevs[mqp->port - 1], mac);
 		ret = 1;
-		rtnl_unlock();
 		dev_put(ndev);
 	}
 
@@ -805,6 +802,8 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
 	u64 reg_id;
 	struct mlx4_ib_steering *ib_steering = NULL;
+	enum mlx4_protocol prot = (gid->raw[1] == 0x0e) ?
+		MLX4_PROT_IB_IPV4 : MLX4_PROT_IB_IPV6;
 
 	if (mdev->dev->caps.steering_mode ==
 	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
@@ -816,7 +815,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw, mqp->port,
 				    !!(mqp->flags &
 				       MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
-				    MLX4_PROT_IB_IPV6, &reg_id);
+				    prot, &reg_id);
 	if (err)
 		goto err_malloc;
 
@@ -835,7 +834,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 
 err_add:
 	mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
-			      MLX4_PROT_IB_IPV6, reg_id);
+			      prot, reg_id);
 err_malloc:
 	kfree(ib_steering);
 
@@ -863,10 +862,11 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	int err;
 	struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
-	u8 mac[6];
 	struct net_device *ndev;
 	struct mlx4_ib_gid_entry *ge;
 	u64 reg_id = 0;
+	enum mlx4_protocol prot = (gid->raw[1] == 0x0e) ?
+		MLX4_PROT_IB_IPV4 : MLX4_PROT_IB_IPV6;
 
 	if (mdev->dev->caps.steering_mode ==
 	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
@@ -889,7 +889,7 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	}
 
 	err = mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
-				    MLX4_PROT_IB_IPV6, reg_id);
+				    prot, reg_id);
 	if (err)
 		return err;
 
@@ -901,13 +901,8 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 		if (ndev)
 			dev_hold(ndev);
 		spin_unlock(&mdev->iboe.lock);
-		rdma_get_mcast_mac((struct in6_addr *)gid, mac);
-		if (ndev) {
-			rtnl_lock();
-			dev_mc_del(mdev->iboe.netdevs[ge->port - 1], mac);
-			rtnl_unlock();
+		if (ndev)
 			dev_put(ndev);
-		}
 		list_del(&ge->list);
 		kfree(ge);
 	} else
@@ -1003,20 +998,6 @@ static struct device_attribute *mlx4_class_attributes[] = {
 	&dev_attr_board_id
 };
 
-static void mlx4_addrconf_ifid_eui48(u8 *eui, u16 vlan_id, struct net_device *dev)
-{
-	memcpy(eui, dev->dev_addr, 3
[PATCH V1 for-next 4/6] IB/core: Infra-structure to support verbs extensions through uverbs
From: Igor Ivanov igor.iva...@itseez.com

Add infrastructure to support extended uverbs capabilities in a forward/backward compatible manner. Uverbs command opcodes which are based on the verbs extensions approach should be greater than or equal to IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are processed a bit differently.

Signed-off-by: Igor Ivanov igor.iva...@itseez.com
Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs_main.c |   29 -
 include/uapi/rdma/ib_user_verbs.h     |   10 ++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD	50
 
 enum {
 	IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
 	__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+	__u16 provider_in_words;
+	__u16 provider_out_words;
+	__u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
 	__u64 response;
 	__u64 driver_data[0];
-- 
1.7.1
[PATCH V1 for-next 0/6] IP based RoCE GID Addressing
Currently, the IB stack (core + drivers) handles RoCE (IBoE) gids as if they encode the related Ethernet net-device interface MAC address and possibly VLAN id.

This series changes RoCE GIDs to encode the IP addresses (IPv4 + IPv6) of that Ethernet interface, under the following reasoning:

1. There are environments where the compute entity that runs the RoCE stack is not aware that its traffic is vlan-tagged. This results in that node creating/assuming wrong GIDs from the viewpoint of a peer node which is aware of vlans. Note that "node" here can be a physical node connected to an Ethernet switch acting in access mode, talking to another node which does vlan insertion/stripping by itself. Another example is an SRIOV Virtual Function which is configured to work in VST mode (Virtual-Switch-Tagging), such that the hypervisor configures the HW eSwitch to do vlan insertion for the vPORT representing that function.

2. When RoCE traffic is inspected (mirrored/trapped) in Ethernet switches for monitoring and security purposes, it is much more natural for both humans and automated utilities (...) to observe IP addresses at a certain offset into the RoCE frame's L3 header vs. MAC/VLANs (which are there anyway in the L2 header of that frame, so they are not lost by this change).

3. Some advanced bonding/teaming modes such as balance-alb and balance-tlb use multiple underlying devices in parallel, and hence packets always carry the bond IP address but different streams have different source MACs. The approach brought by this series is part of what would allow supporting that for RoCE traffic too.

The 1st patch modifies the IB core to cope with the new scheme, and the 2nd/3rd ones do that for the mlx4_ib driver. The 4th patch sets the foundation for extending uverbs to the new scheme which was introduced lately, and the 5th/6th patches add two extended uCMA commands and two extended uVERBS commands which are now exported to user space. These extended verbs will allow enhancing user space libraries such that they work OK over the modified scheme. RC applications using librdmacm will not need to be modified at all, since the change will be encapsulated in that library.

The ocrdma driver needs to go through a similar patch as the mlx4_ib one. We can surely do that patch, we just need to dig there a little further.

Or.

changes from V0:
- enhanced documentation of the mlx4_ib, uverbs and ucma patches
- broke the mlx4_ib patch into two
- broke the extended user space commands patch into two

Igor Ivanov (1):
  IB/core: Infra-structure to support verbs extensions through uverbs

Matan Barak (2):
  IB/core: Add RoCE IP based addressing extensions for uverbs
  IB/core: Add RoCE IP based addressing extensions for rdma_ucm

Moni Shoua (3):
  IB/core: RoCE IP based GID addressing
  IB/mlx4: Use RoCE IP based GIDs in the port GID table
  IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing

 drivers/infiniband/core/cm.c              |    3 +
 drivers/infiniband/core/cma.c             |   39 ++-
 drivers/infiniband/core/sa_query.c        |    5 +
 drivers/infiniband/core/ucma.c            |  190 +++--
 drivers/infiniband/core/uverbs.h          |    2 +
 drivers/infiniband/core/uverbs_cmd.c      |  330 -
 drivers/infiniband/core/uverbs_main.c     |   33 ++-
 drivers/infiniband/core/uverbs_marshall.c |   94 ++-
 drivers/infiniband/core/verbs.c           |    7 +
 drivers/infiniband/hw/mlx4/ah.c           |   21 +-
 drivers/infiniband/hw/mlx4/cq.c           |    5 +
 drivers/infiniband/hw/mlx4/main.c         |  461 -
 drivers/infiniband/hw/mlx4/mlx4_ib.h      |    3 +
 drivers/infiniband/hw/mlx4/qp.c           |   19 +-
 include/linux/mlx4/cq.h                   |   14 +-
 include/rdma/ib_addr.h                    |   45 ++--
 include/rdma/ib_marshall.h                |   12 +
 include/rdma/ib_sa.h                      |    3 +
 include/rdma/ib_verbs.h                   |    4 +
 include/uapi/rdma/ib_user_sa.h            |   34 ++-
 include/uapi/rdma/ib_user_verbs.h         |  130 -
 include/uapi/rdma/rdma_user_cm.h          |   21 ++-
 22 files changed, 1157 insertions(+), 318 deletions(-)
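To make the "GID encodes the IP address" idea in this cover letter concrete, here is a minimal user-space sketch. The cover letter does not spell out the byte layout, so the mapping used here is an assumption: an IPv4 address is stored as an IPv4-mapped IPv6 address (::ffff:a.b.c.d) and an IPv6 address is copied verbatim into the 16-byte GID. `struct gid`, `ip_to_gid` and `gid_from_ipv4_text` are hypothetical names standing in for `union ib_gid` and the kernel's `rdma_ip2gid()`; they are not the code from the series.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* 16-byte GID, standing in for the kernel's union ib_gid. */
struct gid {
	unsigned char raw[16];
};

static_assert(sizeof(struct gid) == 16, "a GID is 128 bits");

/*
 * Hypothetical analogue of rdma_ip2gid(). Assumes IPv4 is encoded as an
 * IPv4-mapped IPv6 address and IPv6 is copied as-is. Returns 0 on success,
 * -1 for an unsupported address family.
 */
static int ip_to_gid(const struct sockaddr *sa, struct gid *gid)
{
	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;

		memset(gid->raw, 0, 10);		/* 80 zero bits */
		gid->raw[10] = 0xff;			/* 16 one bits */
		gid->raw[11] = 0xff;
		memcpy(&gid->raw[12], &in4->sin_addr, 4); /* network byte order */
		return 0;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;

		memcpy(gid->raw, &in6->sin6_addr, 16);
		return 0;
	}
	default:
		return -1;
	}
}

/* Hypothetical convenience wrapper: parse dotted-quad text, then convert. */
static int gid_from_ipv4_text(const char *text, struct gid *gid)
{
	struct sockaddr_in in4;

	memset(&in4, 0, sizeof(in4));
	in4.sin_family = AF_INET;
	if (inet_pton(AF_INET, text, &in4.sin_addr) != 1)
		return -1;
	return ip_to_gid((const struct sockaddr *)&in4, gid);
}
```

Under this assumption, 192.168.1.10 yields a GID whose last six bytes are ff ff c0 a8 01 0a and whose first ten bytes are zero. Neither MAC nor VLAN appears in it, which is why the series carries those L2 parameters separately in structures such as ib_ah_attr.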
[PATCH V1 for-next 1/6] IB/core: RoCE IP based GID addressing
From: Moni Shoua mo...@mellanox.co.il Currently, the IB core assume RoCE (IBoE) gids encode related Ethernet netdevice interface MAC address and possibly VLAN id. Change gids to be treated as they encode interface IP address. Since Ethernet layer 2 address parameters are not longer encoded within gids, had to extend the Infiniband address structures (e.g. ib_ah_attr) with layer 2 address parameters, namely mac and vlan. Signed-off-by: Moni Shoua mo...@mellanox.co.il Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/core/cm.c |3 ++ drivers/infiniband/core/cma.c | 39 ++ drivers/infiniband/core/sa_query.c |5 drivers/infiniband/core/ucma.c | 18 +++--- drivers/infiniband/core/verbs.c|7 + include/rdma/ib_addr.h | 45 include/rdma/ib_sa.h |3 ++ include/rdma/ib_verbs.h|4 +++ 8 files changed, 79 insertions(+), 45 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 784b97c..7af618f 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -1557,6 +1557,9 @@ static int cm_req_handler(struct cm_work *work) cm_process_routed_req(req_msg, work-mad_recv_wc-wc); cm_format_paths_from_req(req_msg, work-path[0], work-path[1]); + + memcpy(work-path[0].dmac, cm_id_priv-av.ah_attr.dmac, 6); + work-path[0].vlan = cm_id_priv-av.ah_attr.vlan; ret = cm_init_av_by_path(work-path[0], cm_id_priv-av); if (ret) { ib_get_cached_gid(work-port-cm_dev-ib_device, diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 71c2c71..ba217c9 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -373,7 +373,9 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv) return -EINVAL; mutex_lock(lock); - iboe_addr_get_sgid(dev_addr, iboe_gid); + rdma_ip2gid((struct sockaddr *)id_priv-id.route.addr.src_addr, + iboe_gid); + memcpy(gid, dev_addr-src_dev_addr + rdma_addr_gid_offset(dev_addr), sizeof gid); list_for_each_entry(cma_dev, dev_list, list) { @@ -1803,7 +1805,7 @@ 
static int cma_resolve_iboe_route(struct rdma_id_private *id_priv) struct sockaddr_in *src_addr = (struct sockaddr_in *)route-addr.src_addr; struct sockaddr_in *dst_addr = (struct sockaddr_in *)route-addr.dst_addr; struct net_device *ndev = NULL; - u16 vid; + if (src_addr-sin_family != dst_addr-sin_family) return -EINVAL; @@ -1830,10 +1832,13 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv) goto err2; } - vid = rdma_vlan_dev_vlan_id(ndev); + route-path_rec-vlan = rdma_vlan_dev_vlan_id(ndev); + memcpy(route-path_rec-dmac, addr-dev_addr.dst_dev_addr, 6); - iboe_mac_vlan_to_ll(route-path_rec-sgid, addr-dev_addr.src_dev_addr, vid); - iboe_mac_vlan_to_ll(route-path_rec-dgid, addr-dev_addr.dst_dev_addr, vid); + rdma_ip2gid((struct sockaddr *)id_priv-id.route.addr.src_addr, + route-path_rec-sgid); + rdma_ip2gid((struct sockaddr *)id_priv-id.route.addr.dst_addr, + route-path_rec-dgid); route-path_rec-hop_limit = 1; route-path_rec-reversible = 1; @@ -1970,6 +1975,8 @@ static void addr_handler(int status, struct sockaddr *src_addr, RDMA_CM_ADDR_RESOLVED)) goto out; + memcpy(id_priv-id.route.addr.src_addr, src_addr, + ip_addr_size(src_addr)); if (!status !id_priv-cma_dev) status = cma_acquire_dev(id_priv); @@ -1979,11 +1986,8 @@ static void addr_handler(int status, struct sockaddr *src_addr, goto out; event.event = RDMA_CM_EVENT_ADDR_ERROR; event.status = status; - } else { - memcpy(id_priv-id.route.addr.src_addr, src_addr, - ip_addr_size(src_addr)); + } else event.event = RDMA_CM_EVENT_ADDR_RESOLVED; - } if (id_priv-id.event_handler(id_priv-id, event)) { cma_exch(id_priv, RDMA_CM_DESTROYING); @@ -2381,6 +2385,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) if (ret) goto err1; + memcpy(id-route.addr.src_addr, addr, ip_addr_size(addr)); if (!cma_any_addr(addr)) { ret = rdma_translate_ip(addr, id-route.addr.dev_addr); if (ret) @@ -2391,7 +2396,6 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) goto err1; } - 
memcpy(id-route.addr.src_addr, addr, ip_addr_size(addr)); if (!(id_priv-options (1 CMA_OPTION_AFONLY))) { if (addr
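A note on the GID change in the patch above: the diff replaces the MAC/VLAN based helpers with rdma_ip2gid(), whose body is not shown here. The following is only a minimal userland sketch of what encoding an IP address into a 16-byte GID could look like. It assumes the IPv4-mapped IPv6 layout (::ffff:a.b.c.d) for IPv4 sources; struct gid16, ipv4_to_gid() and ipv6_to_gid() are hypothetical names, not kernel API.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's union ib_gid (16 raw bytes). */
struct gid16 {
	uint8_t raw[16];
};

/*
 * Encode an IPv4 address (network byte order) as an IPv4-mapped IPv6
 * address, ::ffff:a.b.c.d. This layout is an assumption of the sketch.
 */
static void ipv4_to_gid(const struct in_addr *ip, struct gid16 *gid)
{
	memset(gid->raw, 0, sizeof(gid->raw));
	gid->raw[10] = 0xff;
	gid->raw[11] = 0xff;
	memcpy(&gid->raw[12], &ip->s_addr, 4);
}

/* An IPv6 address is already 16 bytes wide and is copied as-is. */
static void ipv6_to_gid(const struct in6_addr *ip6, struct gid16 *gid)
{
	memcpy(gid->raw, ip6->s6_addr, sizeof(gid->raw));
}
```

Since such a GID no longer carries the MAC or the VLAN id, those have to travel separately, which is why the patch adds dmac and vlan fields to the address structures.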
[PATCH V1 for-next 5/6] IB/core: Add RoCE IP based addressing extensions for uverbs
From: Matan Barak mat...@mellanox.com

Add uverbs support for RoCE (IBoE) IP based addressing extensions towards user space libraries.

Under IP based GID addressing, for RC QPs, the QP attributes should contain the Ethernet L2 destination. Until now, indicating the GID was sufficient. When IP addresses are encoded in GIDs, the QP attributes should contain an extended destination that indicates the VLAN and destination MAC (dmac) as well. This is done via a new struct ib_uverbs_qp_dest_ex. This new structure is contained in a new struct ib_uverbs_modify_qp_ex, which is used by the MODIFY_QP_EX command. To make those changes seamless, the extended fields were added at the bottom of the current structures.

Also, when the GID encodes an IP address, the AH attributes should contain the VLAN and dmac too. Therefore, ib_uverbs_create_ah was extended to contain those fields. When creating an AH, the user indicates the exact L2 Ethernet destination parameters. This is done by a new CREATE_AH_EX command that uses a new struct ib_uverbs_create_ah_ex.

struct ib_user_path_rec was extended too, to contain the source and destination MAC and the VLAN ID. This structure is used by the rdma_ucm driver.
Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/core/uverbs.h |2 + drivers/infiniband/core/uverbs_cmd.c | 330 ++--- drivers/infiniband/core/uverbs_main.c |4 +- drivers/infiniband/core/uverbs_marshall.c | 94 - include/rdma/ib_marshall.h| 12 + include/uapi/rdma/ib_user_sa.h| 34 +++- include/uapi/rdma/ib_user_verbs.h | 120 +++- 7 files changed, 503 insertions(+), 93 deletions(-) diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h index 0fcd7aa..1ec4850 100644 --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -200,11 +200,13 @@ IB_UVERBS_DECLARE_CMD(create_qp); IB_UVERBS_DECLARE_CMD(open_qp); IB_UVERBS_DECLARE_CMD(query_qp); IB_UVERBS_DECLARE_CMD(modify_qp); +IB_UVERBS_DECLARE_CMD(modify_qp_ex); IB_UVERBS_DECLARE_CMD(destroy_qp); IB_UVERBS_DECLARE_CMD(post_send); IB_UVERBS_DECLARE_CMD(post_recv); IB_UVERBS_DECLARE_CMD(post_srq_recv); IB_UVERBS_DECLARE_CMD(create_ah); +IB_UVERBS_DECLARE_CMD(create_ah_ex); IB_UVERBS_DECLARE_CMD(destroy_ah); IB_UVERBS_DECLARE_CMD(attach_mcast); IB_UVERBS_DECLARE_CMD(detach_mcast); diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index a7d00f6..eb3e7e6 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -1891,6 +1891,58 @@ static int modify_qp_mask(enum ib_qp_type qp_type, int mask) } } +static void ib_uverbs_modify_qp_assign(struct ib_uverbs_modify_qp *cmd, + struct ib_qp_attr *attr) { + attr-qp_state= cmd-qp_state; + attr-cur_qp_state= cmd-cur_qp_state; + attr-path_mtu= cmd-path_mtu; + attr-path_mig_state = cmd-path_mig_state; + attr-qkey= cmd-qkey; + attr-rq_psn = cmd-rq_psn; + attr-sq_psn = cmd-sq_psn; + attr-dest_qp_num = cmd-dest_qp_num; + attr-qp_access_flags = cmd-qp_access_flags; + attr-pkey_index = cmd-pkey_index; + attr-alt_pkey_index = cmd-alt_pkey_index; + attr-en_sqd_async_notify = cmd-en_sqd_async_notify; + 
attr-max_rd_atomic = cmd-max_rd_atomic; + attr-max_dest_rd_atomic = cmd-max_dest_rd_atomic; + attr-min_rnr_timer = cmd-min_rnr_timer; + attr-port_num= cmd-port_num; + attr-timeout = cmd-timeout; + attr-retry_cnt = cmd-retry_cnt; + attr-rnr_retry = cmd-rnr_retry; + attr-alt_port_num= cmd-alt_port_num; + attr-alt_timeout = cmd-alt_timeout; + + memcpy(attr-ah_attr.grh.dgid.raw, cmd-dest.dgid, 16); + attr-ah_attr.grh.flow_label= cmd-dest.flow_label; + attr-ah_attr.grh.sgid_index= cmd-dest.sgid_index; + attr-ah_attr.grh.hop_limit = cmd-dest.hop_limit; + attr-ah_attr.grh.traffic_class = cmd-dest.traffic_class; + attr-ah_attr.dlid = cmd-dest.dlid; + attr-ah_attr.sl= cmd-dest.sl; + attr-ah_attr.src_path_bits = cmd-dest.src_path_bits; + attr-ah_attr.static_rate = cmd-dest.static_rate; + attr-ah_attr.ah_flags = cmd-dest.is_global ? + IB_AH_GRH : 0; + attr-ah_attr.port_num = cmd-dest.port_num; + + memcpy(attr-alt_ah_attr.grh.dgid.raw, cmd-alt_dest.dgid, 16); + attr-alt_ah_attr.grh.flow_label= cmd-alt_dest.flow_label; + attr-alt_ah_attr.grh.sgid_index= cmd-alt_dest.sgid_index; + attr
[PATCH V1 for-next 6/6] IB/core: Add RoCE IP based addressing extensions for rdma_ucm
From: Matan Barak mat...@mellanox.com Add rdma_ucm support for RoCE (IBoE) IP based addressing extensions towards librdmacm Extend INIT_QP_ATTR and QUERY_ROUTE ucma commands. INIT_QP_ATTR_EX uses struct ib_uverbs_qp_attr_ex QUERY_ROUTE_EX uses struct rdma_ucm_query_route_resp_ex which in turn uses ib_user_path_rec_ex Signed-off-by: Matan Barak mat...@mellanox.com Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/core/ucma.c | 172 - include/uapi/rdma/rdma_user_cm.h | 21 +- 2 files changed, 187 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index bc2cb5d..c7dfd99 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -599,6 +599,35 @@ static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp, } } +static void ucma_copy_ib_route_ex(struct rdma_ucm_query_route_resp_ex *resp, + struct rdma_route *route) +{ + struct rdma_dev_addr *dev_addr; + + resp-num_paths = route-num_paths; + switch (route-num_paths) { + case 0: + dev_addr = route-addr.dev_addr; + rdma_addr_get_dgid(dev_addr, + (union ib_gid *)resp-ib_route[0].dgid); + rdma_addr_get_sgid(dev_addr, + (union ib_gid *)resp-ib_route[0].sgid); + resp-ib_route[0].pkey = + cpu_to_be16(ib_addr_get_pkey(dev_addr)); + break; + case 2: + ib_copy_path_rec_to_user_ex(resp-ib_route[1], + route-path_rec[1]); + /* fall through */ + case 1: + ib_copy_path_rec_to_user_ex(resp-ib_route[0], + route-path_rec[0]); + break; + default: + break; + } +} + static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp, struct rdma_route *route) { @@ -625,14 +654,39 @@ static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp, } } -static void ucma_copy_iw_route(struct rdma_ucm_query_route_resp *resp, +static void ucma_copy_iboe_route_ex(struct rdma_ucm_query_route_resp_ex *resp, + struct rdma_route *route) +{ + resp-num_paths = route-num_paths; + switch (route-num_paths) { + case 0: + 
rdma_ip2gid((struct sockaddr *)route-addr.dst_addr, + (union ib_gid *)resp-ib_route[0].dgid); + rdma_ip2gid((struct sockaddr *)route-addr.src_addr, + (union ib_gid *)resp-ib_route[0].sgid); + resp-ib_route[0].pkey = cpu_to_be16(0x); + break; + case 2: + ib_copy_path_rec_to_user_ex(resp-ib_route[1], + route-path_rec[1]); + /* fall through */ + case 1: + ib_copy_path_rec_to_user_ex(resp-ib_route[0], + route-path_rec[0]); + break; + default: + break; + } +} + +static void ucma_copy_iw_route(struct ib_user_path_rec *resp_path, struct rdma_route *route) { struct rdma_dev_addr *dev_addr; dev_addr = route-addr.dev_addr; - rdma_addr_get_dgid(dev_addr, (union ib_gid *) resp-ib_route[0].dgid); - rdma_addr_get_sgid(dev_addr, (union ib_gid *) resp-ib_route[0].sgid); + rdma_addr_get_dgid(dev_addr, (union ib_gid *)resp_path-dgid); + rdma_addr_get_sgid(dev_addr, (union ib_gid *)resp_path-sgid); } static ssize_t ucma_query_route(struct ucma_file *file, @@ -684,7 +738,74 @@ static ssize_t ucma_query_route(struct ucma_file *file, } break; case RDMA_TRANSPORT_IWARP: - ucma_copy_iw_route(resp, ctx-cm_id-route); + ucma_copy_iw_route(resp.ib_route[0], ctx-cm_id-route); + break; + default: + break; + } + +out: + if (copy_to_user((void __user *)(unsigned long)cmd.response, +resp, sizeof(resp))) + ret = -EFAULT; + + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_query_route_ex(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_query_route_ex cmd; + struct rdma_ucm_query_route_resp_ex resp; + struct ucma_context *ctx; + struct sockaddr *addr; + int ret = 0; + + if (out_len sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id
[PATCH V1 for-next 3/6] IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing
From: Moni Shoua mo...@mellanox.co.il IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN. Hence, we need to extract them now from the CQE and place in struct ib_wc (to be used for cases were they were taken from the gid). Also, when modifying a QP or building address handle, instead of parsing the dgid to get the MAC and VLAN, take them from the address handle attributes. Signed-off-by: Moni Shoua mo...@mellanox.co.il Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/hw/mlx4/ah.c | 21 + drivers/infiniband/hw/mlx4/cq.c |5 + drivers/infiniband/hw/mlx4/qp.c | 19 ++- include/linux/mlx4/cq.h | 14 ++ 4 files changed, 34 insertions(+), 25 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c index a251bec..3941700 100644 --- a/drivers/infiniband/hw/mlx4/ah.c +++ b/drivers/infiniband/hw/mlx4/ah.c @@ -92,21 +92,18 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr { struct mlx4_ib_dev *ibdev = to_mdev(pd-device); struct mlx4_dev *dev = ibdev-dev; - union ib_gid sgid; - u8 mac[6]; - int err; int is_mcast; + struct in6_addr in6; u16 vlan_tag; - err = mlx4_ib_resolve_grh(ibdev, ah_attr, mac, is_mcast, ah_attr-port_num); - if (err) - return ERR_PTR(err); - - memcpy(ah-av.eth.mac, mac, 6); - err = ib_get_cached_gid(pd-device, ah_attr-port_num, ah_attr-grh.sgid_index, sgid); - if (err) - return ERR_PTR(err); - vlan_tag = rdma_get_vlan_id(sgid); + memcpy(in6, ah_attr-grh.dgid.raw, sizeof(in6)); + if (rdma_is_multicast_addr(in6)) { + is_mcast = 1; + rdma_get_mcast_mac(in6, ah-av.eth.mac); + } else { + memcpy(ah-av.eth.mac, ah_attr-dmac, 6); + } + vlan_tag = ah_attr-vlan; if (vlan_tag 0x1000) vlan_tag |= (ah_attr-sl 7) 13; ah-av.eth.port_pd = cpu_to_be32(to_mpd(pd)-pdn | (ah_attr-port_num 24)); diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index d5e60f4..ba3f85b 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c 
@@ -793,6 +793,11 @@ repoll: wc-sl = be16_to_cpu(cqe-sl_vid) 13; else wc-sl = be16_to_cpu(cqe-sl_vid) 12; + if (be32_to_cpu(cqe-vlan_my_qpn) MLX4_CQE_VLAN_PRESENT_MASK) + wc-vlan = be16_to_cpu(cqe-sl_vid) MLX4_CQE_VID_MASK; + else + wc-vlan = 0x; + memcpy(wc-smac, cqe-smac, 6); } return 0; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 4f10af2..ddf5a1a 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -1147,11 +1147,8 @@ static void mlx4_set_sched(struct mlx4_qp_path *path, u8 port) static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah, struct mlx4_qp_path *path, u8 port) { - int err; int is_eth = rdma_port_get_link_layer(dev-ib_dev, port) == IB_LINK_LAYER_ETHERNET; - u8 mac[6]; - int is_mcast; u16 vlan_tag; int vidx; @@ -1188,16 +1185,12 @@ static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah, if (!(ah-ah_flags IB_AH_GRH)) return -1; - err = mlx4_ib_resolve_grh(dev, ah, mac, is_mcast, port); - if (err) - return err; - - memcpy(path-dmac, mac, 6); + memcpy(path-dmac, ah-dmac, 6); path-ackto = MLX4_IB_LINK_TYPE_ETH; /* use index 0 into MAC table for IBoE */ path-grh_mylmc = 0x80; - vlan_tag = rdma_get_vlan_id(dev-iboe.gid_table[port - 1][ah-grh.sgid_index]); + vlan_tag = ah-vlan; if (vlan_tag 0x1000) { if (mlx4_find_cached_vlan(dev-dev, port, vlan_tag, vidx)) return -ENOENT; @@ -1236,6 +1229,7 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, enum mlx4_qp_optpar optpar = 0; int sqd_event; int err = -EINVAL; + int is_eth; context = kzalloc(sizeof *context, GFP_KERNEL); if (!context) @@ -1464,6 +1458,13 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, context-pri_path.ackto = (context-pri_path.ackto 0xf8) | MLX4_IB_LINK_TYPE_ETH; + if (ibqp-qp_type == IB_QPT_UD) + if (is_eth (new_state == IB_QPS_RTR)) { + context-pri_path.ackto = MLX4_IB_LINK_TYPE_ETH; + optpar |= MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH
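A note on the VLAN handling in the patch above: the hunks combine a VLAN id with the service level on the send side and pull the VLAN id out of the CQE on the receive side, but the operators were lost in this archive copy. The sketch below shows the arithmetic as it is usually done for an 802.1Q style tag. The shift by 13, the 0x0fff mask and the 0xffff "no VLAN" value are assumptions read from the surrounding context, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed marker for "no VLAN present"; any value >= 0x1000 is not a valid id. */
#define NO_VLAN 0xffff

/* Combine a 12-bit VLAN id with a 3-bit priority taken from the service level. */
static uint16_t vlan_tag_with_prio(uint16_t vlan, uint8_t sl)
{
	if (vlan < 0x1000)
		vlan |= (uint16_t)((sl & 7) << 13);
	return vlan;
}

/* Split a received tag back into its VLAN id and priority. */
static uint16_t tag_vid(uint16_t tag)
{
	return tag & 0x0fff;
}

static uint8_t tag_prio(uint16_t tag)
{
	return (uint8_t)(tag >> 13);
}
```

With this layout, a tag built from VLAN 100 and SL 5 decodes back to the same two values, while NO_VLAN passes through unchanged.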
Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
On 16/06/2013 15:02, Eli Cohen wrote: From: Eli Cohen e...@mellanox.com The patches that follow constitute the driver for Mellanox's 5th generation of HCAs named Connect-IB. The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This partitioning resembles what we have for mlx4 with the substantial difference that mlx5_ib is the pci device driver and not mlx5_core. mlx5_core provides general functionality that is intended to be used by other Mellanox devices that will be introduced in the future. In this sense, it can be perceived as a library. mlx5_ib has a similar role as any hardware device under drivers/infiniband/hw. Hi Dave, So we skipped netdev in V0, in an attempt to reduce cross postings... anyway, the mlx5_core driver is similar story as of mlx4_core. So, if looking forward, for the initial merge to be simpler, are you OK for both the core and IB driver to go through Roland's tree? Or. The patches are partitioned to avoid exceeding the 100KB vger.kernel.org limitation. Only the last patch adds the Makefiles and Kconfigs, to make things robust for future bisections. PPC is not yet supported but support will be included in the near future. 
Eli Eli Cohen (8): mlx5: Mellanox Connect-IB driver part 1/8 mlx5: Mellanox Connect-IB driver part 2/8 mlx5: Mellanox Connect-IB driver part 3/8 mlx5: Mellanox Connect-IB driver part 4/8 mlx5: Mellanox Connect-IB driver part 5/8 mlx5: Mellanox Connect-IB driver part 6/8 mlx5: Mellanox Connect-IB driver part 7/8 mlx5: Mellanox Connect-IB driver part 8/8 MAINTAINERS| 22 + drivers/infiniband/Kconfig |1 + drivers/infiniband/Makefile|1 + drivers/infiniband/hw/mlx5/Kconfig | 10 + drivers/infiniband/hw/mlx5/Makefile|4 + drivers/infiniband/hw/mlx5/ah.c| 95 + drivers/infiniband/hw/mlx5/cq.c| 851 +++ drivers/infiniband/hw/mlx5/doorbell.c | 100 + drivers/infiniband/hw/mlx5/mad.c | 143 ++ drivers/infiniband/hw/mlx5/main.c | 1512 drivers/infiniband/hw/mlx5/mem.c | 194 ++ drivers/infiniband/hw/mlx5/mlx5_ib.h | 593 + drivers/infiniband/hw/mlx5/mr.c| 1025 drivers/infiniband/hw/mlx5/qp.c| 2549 drivers/infiniband/hw/mlx5/srq.c | 481 drivers/infiniband/hw/mlx5/user.h | 123 + drivers/net/ethernet/mellanox/Kconfig |1 + drivers/net/ethernet/mellanox/Makefile |1 + drivers/net/ethernet/mellanox/mlx5/core/Kconfig| 18 + drivers/net/ethernet/mellanox/mlx5/core/Makefile |6 + drivers/net/ethernet/mellanox/mlx5/core/alloc.c| 244 ++ drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 1497 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 226 ++ drivers/net/ethernet/mellanox/mlx5/core/debugfs.c | 600 + drivers/net/ethernet/mellanox/mlx5/core/eq.c | 523 drivers/net/ethernet/mellanox/mlx5/core/fw.c | 187 ++ drivers/net/ethernet/mellanox/mlx5/core/health.c | 216 ++ drivers/net/ethernet/mellanox/mlx5/core/mad.c | 80 + drivers/net/ethernet/mellanox/mlx5/core/main.c | 483 drivers/net/ethernet/mellanox/mlx5/core/mcg.c | 108 + .../net/ethernet/mellanox/mlx5/core/mlx5_core.h| 96 + drivers/net/ethernet/mellanox/mlx5/core/mr.c | 138 ++ .../net/ethernet/mellanox/mlx5/core/pagealloc.c| 438 drivers/net/ethernet/mellanox/mlx5/core/pd.c | 103 + drivers/net/ethernet/mellanox/mlx5/core/port.c | 106 + 
drivers/net/ethernet/mellanox/mlx5/core/qp.c | 303 +++ drivers/net/ethernet/mellanox/mlx5/core/srq.c | 225 ++ drivers/net/ethernet/mellanox/mlx5/core/uar.c | 225 ++ include/linux/mlx5/cmd.h | 51 + include/linux/mlx5/cq.h| 166 ++ include/linux/mlx5/device.h| 886 +++ include/linux/mlx5/doorbell.h | 81 + include/linux/mlx5/driver.h| 763 ++ include/linux/mlx5/qp.h| 467 include/linux/mlx5/srq.h | 41 + 45 files changed, 15983 insertions(+) create mode 100644 drivers/infiniband/hw/mlx5/Kconfig create mode 100644 drivers/infiniband/hw/mlx5/Makefile create mode 100644 drivers/infiniband/hw/mlx5/ah.c create mode 100644 drivers/infiniband/hw/mlx5/cq.c create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c create mode 100644 drivers/infiniband/hw/mlx5/mad.c create mode 100644 drivers/infiniband/hw/mlx5/main.c create mode 100644 drivers/infiniband/hw/mlx5/mem.c create mode 100644
Re: NFS over RDMA benchmark
On 19/06/2013 18:47, Wendy Cheng wrote: what kind of HW I would need to run it ? The mlx4 driver supports memory windows as of kernel 3.9 Or.
Re: [PATCH] IB/qib: add optional numa affinity
Mike Marciniszyn mike.marcinis...@intel.com wrote: From: Ramkrishna Vepa ramkrishna.v...@intel.com This patch adds context relative numa affinity conditioned on the module parameter numa_aware. The qib_ctxtdata has an additional node_id member and qib_create_ctxtdata() has an additional node_id parameter. Could you elaborate on why NUMA awareness is conditioned on a module parameter? Or.
Re: [PATCH V1 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
On 25/06/2013 00:10, Roland Dreier wrote: On Tue, Jun 11, 2013 at 4:42 AM, Or Gerlitz ogerl...@mellanox.com wrote: +struct ib_kern_flow { + struct ib_device *device; + struct ib_uobject *uobject; + void *flow_context; +}; I don't think it makes sense to put a structure with kernel pointers in it into an include file under include/uapi. For one thing the size of pointers depends on whether userspace is 32-bit or 64-bit (but of course there are many other reasons why this will break). Good catch, will fix it up. Or.
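To illustrate the point raised in the review above: a structure with pointer members changes size between 32-bit and 64-bit builds, while a structure built only from fixed-width integers does not. The sketch below is not the fix that was eventually posted. struct flow_uapi_sketch and its field names are hypothetical and only show the general shape of a user-visible structure that carries handles instead of kernel pointers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Kernel-internal shape: three pointers, so sizeof() depends on the ABI
 * (typically 12 bytes on a 32-bit build and 24 bytes on a 64-bit one).
 */
struct flow_kernel_only {
	void *device;
	void *uobject;
	void *flow_context;
};

/* Hypothetical user-visible shape: fixed-width handles, no pointers. */
struct flow_uapi_sketch {
	uint32_t qp_handle;
	uint32_t flow_handle;
	uint64_t driver_data;
};

/* The layout is the same for 32-bit and 64-bit user space. */
_Static_assert(sizeof(struct flow_uapi_sketch) == 16,
	       "layout must not depend on pointer size");
_Static_assert(offsetof(struct flow_uapi_sketch, driver_data) == 8,
	       "no implicit padding before the 64-bit member");
```

The two static assertions hold on both kinds of builds, which is the property a header under include/uapi needs.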
[PATCH V2 for-next 1/4] IB/core: Add receive Flow Steering support
From: Hadar Hen Zion had...@mellanox.com

The RDMA stack allows applications to create IB_QPT_RAW_PACKET QPs, which carry plain Ethernet packets, specifically packets that don't carry any QPN to be matched by the receiving side. Applications using these QPs must be given a way to program steering rules into the HW so that packets arriving at the local port can be routed to them.

This patch adds ib_create_flow, which attaches a flow specification to a QP: when a received packet matches the specification, it is forwarded to that QP, in the same way ib_attach_multicast is used for IB UD multicast handling.

Flow specifications are provided as instances of struct ib_flow_spec_yyy, which describe L2, L3 and L4 headers. Specs are currently defined for Ethernet, IPv4, TCP, UDP and IB. Flow specs are made of values and masks.

The input to ib_create_flow is an instance of struct ib_flow_attr, which contains a few mandatory control elements followed by optional flow specs.

    struct ib_flow_attr {
            enum ib_flow_attr_type type;
            u16 size;
            u16 priority;
            u8 num_of_specs;
            u8 port;
            u32 flags;
            /* Following are the optional layers according to user request
             * struct ib_flow_spec_yyy
             * struct ib_flow_spec_zzz
             */
    };

As these specs eventually come from user space, they are defined and used in a way that allows adding new spec types without a kernel/user ABI change, with only a small API enhancement that defines the newly added spec.

The flow spec structures are defined in a TLV (Type-Length-Value) manner, which allows calling ib_create_flow with a variable-length list of optional specs.

To process ib_flow_attr, the driver uses the mandatory num_of_specs and size fields along with the TLV nature of the specs.

Steering rules are processed in order of rule priority. The user sets the 12 low-order bits of the priority field, and the kernel sets the remaining 4 high-order bits according to the domain to which the application or layer that created the rule belongs. A numerically lower priority value means higher priority.

ib_create_flow returns an instance of struct ib_flow, which contains a database pointer (handle) provided by the HW driver, to be used when calling ib_destroy_flow.

Applications that offload TCP/IP traffic could also be written over IB UD QPs. The ib_create_flow / ib_destroy_flow API is therefore designed to support UD QPs too. The HW driver sets IB_DEVICE_MANAGED_FLOW_STEERING to denote support for flow steering.

The ib_flow_attr type enum relates to the use of flow steering for promiscuous and sniffer purposes:

IB_FLOW_ATTR_NORMAL - regular rule, steering according to rule specification
IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive all Ethernet traffic which isn't steered to any QP
IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast
IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

The ALL_DEFAULT and MC_DEFAULT rule options are valid only for the Ethernet link type.
Signed-off-by: Hadar Hen Zion had...@mellanox.com Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/core/verbs.c | 30 + include/rdma/ib_verbs.h | 135 ++- 2 files changed, 163 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 22192de..932f4a7 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1254,3 +1254,33 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd) return xrcd-device-dealloc_xrcd(xrcd); } EXPORT_SYMBOL(ib_dealloc_xrcd); + +struct ib_flow *ib_create_flow(struct ib_qp *qp, + struct ib_flow_attr *flow_attr, + int domain) +{ + struct ib_flow *flow_id; + if (!qp-device-create_flow) + return ERR_PTR(-ENOSYS); + + flow_id = qp-device-create_flow(qp, flow_attr, domain); + if (!IS_ERR(flow_id)) + atomic_inc(qp-usecnt); + return flow_id; +} +EXPORT_SYMBOL(ib_create_flow); + +int ib_destroy_flow(struct ib_flow *flow_id) +{ + int err; + struct ib_qp *qp = flow_id-qp; + + if (!flow_id-qp-device-destroy_flow) + return -ENOSYS; + + err = qp-device-destroy_flow(flow_id); + if (!err) + atomic_dec(qp-usecnt); + return err; +} +EXPORT_SYMBOL(ib_destroy_flow); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 98cc4b2..8e18d17 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -116,7 +116,8 @@ enum ib_device_cap_flags { IB_DEVICE_MEM_MGT_EXTENSIONS= (121), IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (122), IB_DEVICE_MEM_WINDOW_TYPE_2A= (123
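A note on the TLV layout described in the commit message above: because every spec carries its own type and size, a consumer can step through a variable-length list without knowing every spec type in advance. The sketch below shows one way such a walk could be written, with bounds checks. struct spec_hdr and walk_specs() are hypothetical; the real ib_flow_spec_yyy layouts are only partly visible in this copy of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common prefix of every spec: a type tag and the total size in bytes. */
struct spec_hdr {
	uint32_t type;
	uint16_t size;
	uint16_t reserved;
};

/*
 * Walk num_of_specs TLV entries packed back to back in buf.
 * Returns the number of bytes consumed, or -1 if an entry is malformed
 * or would run past the end of the buffer.
 */
static long walk_specs(const uint8_t *buf, size_t len, unsigned num_of_specs)
{
	size_t off = 0;

	for (unsigned i = 0; i < num_of_specs; i++) {
		struct spec_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr)); /* avoid unaligned access */
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		/* a real consumer would switch on hdr.type here */
		off += hdr.size;
	}
	return (long)off;
}
```

An unknown type can be skipped or rejected at the marked point without changing how the walk itself advances, which is what makes it possible to add spec types later.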
[PATCH V2 for-next 4/4] IB/mlx4: Add receive Flow Steering support
From: Hadar Hen Zion had...@mellanox.com Implement ib_create_flow and ib_destroy_flow. Translate the verbs structures provided by the user to HW structures and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands. On the ATTACH command completion, the firmware provides 64 bit registration ID which is placed into struct mlx4_ib_flow that wraps the instance of struct ib_flow which is retuned to caller. Later, this reg ID is used for detaching that flow from the firmware. Signed-off-by: Hadar Hen Zion had...@mellanox.com Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/hw/mlx4/main.c| 244 ++ drivers/infiniband/hw/mlx4/mlx4_ib.h | 12 ++ include/linux/mlx4/device.h |5 - 3 files changed, 256 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index a188d31..752c958 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -54,6 +54,8 @@ #define DRV_VERSION1.0 #define DRV_RELDATEApril 4, 2008 +#define MLX4_IB_FLOW_MAX_PRIO 0xFFF + MODULE_AUTHOR(Roland Dreier); MODULE_DESCRIPTION(Mellanox ConnectX HCA InfiniBand driver); MODULE_LICENSE(Dual BSD/GPL); @@ -88,6 +90,25 @@ static void init_query_mad(struct ib_smp *mad) static union ib_gid zgid; +static int check_flow_steering_support(struct mlx4_dev *dev) +{ + int ib_num_ports = 0; + int i; + + mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB) + ib_num_ports++; + + if (dev-caps.steering_mode == MLX4_STEERING_MODE_DEVICE_MANAGED) { + if (ib_num_ports || mlx4_is_mfunc(dev)) { + pr_warn(Device managed flow steering is unavailable + for IB ports or in multifunction env.\n); + return 0; + } + return 1; + } + return 0; +} + static int mlx4_ib_query_device(struct ib_device *ibdev, struct ib_device_attr *props) { @@ -144,6 +165,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, props-device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2B; else props-device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2A; + if 
(check_flow_steering_support(dev-dev)) + props-device_cap_flags |= IB_DEVICE_MANAGED_FLOW_STEERING; } props-vendor_id = be32_to_cpup((__be32 *) (out_mad-data + 36)) @@ -798,6 +821,218 @@ struct mlx4_ib_steering { union ib_gid gid; }; +static int parse_flow_attr(struct mlx4_dev *dev, + struct _ib_flow_spec *ib_spec, + struct _rule_hw *mlx4_spec) +{ + enum mlx4_net_trans_rule_id type; + + switch (ib_spec-type) { + case IB_FLOW_SPEC_ETH: + type = MLX4_NET_TRANS_RULE_ID_ETH; + memcpy(mlx4_spec-eth.dst_mac, ib_spec-eth.val.dst_mac, + ETH_ALEN); + memcpy(mlx4_spec-eth.dst_mac_msk, ib_spec-eth.mask.dst_mac, + ETH_ALEN); + mlx4_spec-eth.vlan_tag = ib_spec-eth.val.vlan_tag; + mlx4_spec-eth.vlan_tag_msk = ib_spec-eth.mask.vlan_tag; + break; + + case IB_FLOW_SPEC_IB: + type = MLX4_NET_TRANS_RULE_ID_IB; + mlx4_spec-ib.l3_qpn = ib_spec-ib.val.l3_type_qpn; + mlx4_spec-ib.qpn_mask = ib_spec-ib.mask.l3_type_qpn; + memcpy(mlx4_spec-ib.dst_gid, ib_spec-ib.val.dst_gid, 16); + memcpy(mlx4_spec-ib.dst_gid_msk, + ib_spec-ib.mask.dst_gid, 16); + break; + + case IB_FLOW_SPEC_IPV4: + type = MLX4_NET_TRANS_RULE_ID_IPV4; + mlx4_spec-ipv4.src_ip = ib_spec-ipv4.val.src_ip; + mlx4_spec-ipv4.src_ip_msk = ib_spec-ipv4.mask.src_ip; + mlx4_spec-ipv4.dst_ip = ib_spec-ipv4.val.dst_ip; + mlx4_spec-ipv4.dst_ip_msk = ib_spec-ipv4.mask.dst_ip; + break; + + case IB_FLOW_SPEC_TCP: + case IB_FLOW_SPEC_UDP: + type = ib_spec-type == IB_FLOW_SPEC_TCP ? + MLX4_NET_TRANS_RULE_ID_TCP : + MLX4_NET_TRANS_RULE_ID_UDP; + mlx4_spec-tcp_udp.dst_port = ib_spec-tcp_udp.val.dst_port; + mlx4_spec-tcp_udp.dst_port_msk = ib_spec-tcp_udp.mask.dst_port; + mlx4_spec-tcp_udp.src_port = ib_spec-tcp_udp.val.src_port; + mlx4_spec-tcp_udp.src_port_msk = ib_spec-tcp_udp.mask.src_port; + break; + + default: + return -EINVAL; + } + if (mlx4_map_sw_to_hw_steering_id(dev, type) 0 || + mlx4_hw_rule_sz(dev, type) 0) + return -EINVAL; + mlx4_spec-id = cpu_to_be16(mlx4_map_sw_to_hw_steering_id(dev, type
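A note on rule priorities: patch 1/4 of this series says the user supplies the 12 low-order bits of the priority and the kernel fills the 4 high-order bits from the domain, and the patch above defines MLX4_IB_FLOW_MAX_PRIO as 0xFFF. The sketch below shows one plausible way to combine the two parts. make_rule_priority() is a hypothetical helper, and the exact composition in the driver is not visible in this truncated copy.

```c
#include <stdint.h>

/* Largest priority a user may request; mirrors MLX4_IB_FLOW_MAX_PRIO in the patch. */
#define FLOW_MAX_USER_PRIO 0xFFF

/*
 * Build a 16-bit rule priority from a 4-bit domain (high bits) and a
 * 12-bit user priority (low bits).
 * Returns -1 if either part is out of range.
 */
static int32_t make_rule_priority(uint16_t user_prio, uint8_t domain)
{
	if (user_prio > FLOW_MAX_USER_PRIO || domain > 0xF)
		return -1;
	return (int32_t)(((uint32_t)domain << 12) | user_prio);
}
```

Because a numerically lower value means higher priority, every rule in domain 0 sorts ahead of every rule in domain 1, whatever user priority was requested.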
[PATCH V2 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs
From: Igor Ivanov igor.iva...@itseez.com

Add infrastructure to support extended uverbs capabilities in a forward/backward compatible manner. Uverbs command opcodes that are based on the verbs extensions approach should be greater than or equal to IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are processed a bit differently.

Signed-off-by: Igor Ivanov igor.iva...@itseez.com
Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs_main.c | 29 -
 include/uapi/rdma/ib_user_verbs.h | 10 ++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD	50
 
 enum {
 	IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
 	__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+	__u16 provider_in_words;
+	__u16 provider_out_words;
+	__u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
 	__u64 response;
 	__u64 driver_data[0];
-- 
1.7.1
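A note on the extended header in the patch above: commands numbered at or above IB_USER_VERBS_CMD_THRESHOLD use a larger header in which the core and provider parts of the input are counted separately, in 4-byte words, and their sum must match the number of bytes written. The sketch below restates that check in userland C. struct cmd_hdr_ex mirrors the fields of struct ib_uverbs_cmd_hdr_ex using stdint types, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Userland mirror of struct ib_uverbs_cmd_hdr_ex from the patch. */
struct cmd_hdr_ex {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t cmd_hdr_reserved;
};

#define CMD_THRESHOLD 50 /* IB_USER_VERBS_CMD_THRESHOLD in the patch */

/* Nonzero when the command number selects the extended header format. */
static int is_extended_cmd(uint32_t command)
{
	return command >= CMD_THRESHOLD;
}

/* Total input length in bytes announced by the header; counts are in 4-byte words. */
static size_t ext_in_bytes(const struct cmd_hdr_ex *h)
{
	return ((size_t)h->in_words + h->provider_in_words) * 4;
}

/* The write is accepted only if the announced length equals the byte count written. */
static int ext_len_ok(const struct cmd_hdr_ex *h, size_t count)
{
	return ext_in_bytes(h) == count;
}
```

A caller that fills in_words = 6 and provider_in_words = 2 therefore has to write exactly 32 bytes, header included, for the command to pass this check.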
[PATCH V2 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
From: Hadar Hen Zion had...@mellanox.com Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to support flow steering for user space applications. Signed-off-by: Hadar Hen Zion had...@mellanox.com Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/core/uverbs.h |3 + drivers/infiniband/core/uverbs_cmd.c | 206 + drivers/infiniband/core/uverbs_main.c | 13 ++- include/rdma/ib_verbs.h |1 + include/uapi/rdma/ib_user_verbs.h | 102 - 5 files changed, 323 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h index 0fcd7aa..ad9d102 100644 --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -155,6 +155,7 @@ extern struct idr ib_uverbs_cq_idr; extern struct idr ib_uverbs_qp_idr; extern struct idr ib_uverbs_srq_idr; extern struct idr ib_uverbs_xrcd_idr; +extern struct idr ib_uverbs_rule_idr; void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj); @@ -215,5 +216,7 @@ IB_UVERBS_DECLARE_CMD(destroy_srq); IB_UVERBS_DECLARE_CMD(create_xsrq); IB_UVERBS_DECLARE_CMD(open_xrcd); IB_UVERBS_DECLARE_CMD(close_xrcd); +IB_UVERBS_DECLARE_CMD(create_flow); +IB_UVERBS_DECLARE_CMD(destroy_flow); #endif /* UVERBS_H */ diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index a7d00f6..956782b 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -54,6 +54,7 @@ static struct uverbs_lock_class qp_lock_class = { .name = QP-uobj }; static struct uverbs_lock_class ah_lock_class = { .name = AH-uobj }; static struct uverbs_lock_class srq_lock_class = { .name = SRQ-uobj }; static struct uverbs_lock_class xrcd_lock_class = { .name = XRCD-uobj }; +static struct uverbs_lock_class rule_lock_class = { .name = RULE-uobj }; #define INIT_UDATA(udata, ibuf, obuf, ilen, olen) \ do {\ @@ -330,6 +331,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file, INIT_LIST_HEAD(ucontext-srq_list); 
 	INIT_LIST_HEAD(&ucontext->ah_list);
 	INIT_LIST_HEAD(&ucontext->xrcd_list);
+	INIT_LIST_HEAD(&ucontext->rule_list);
 	ucontext->closing = 0;
 
 	resp.num_comp_vectors = file->device->num_comp_vectors;
@@ -2587,6 +2589,210 @@ out_put:
 	return ret ? ret : in_len;
 }
 
+static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec,
+				struct _ib_flow_spec *ib_spec)
+{
+	ib_spec->type = kern_spec->type;
+
+	switch (ib_spec->type) {
+	case IB_FLOW_SPEC_ETH:
+		ib_spec->eth.size = sizeof(struct ib_flow_spec_eth);
+		memcpy(&ib_spec->eth.val, &kern_spec->eth.val,
+		       sizeof(struct ib_flow_eth_filter));
+		memcpy(&ib_spec->eth.mask, &kern_spec->eth.mask,
+		       sizeof(struct ib_flow_eth_filter));
+		break;
+	case IB_FLOW_SPEC_IB:
+		ib_spec->ib.size = sizeof(struct ib_flow_spec_ib);
+		memcpy(&ib_spec->ib.val, &kern_spec->ib.val,
+		       sizeof(struct ib_flow_ib_filter));
+		memcpy(&ib_spec->ib.mask, &kern_spec->ib.mask,
+		       sizeof(struct ib_flow_ib_filter));
+		break;
+	case IB_FLOW_SPEC_IPV4:
+		ib_spec->ipv4.size = sizeof(struct ib_flow_spec_ipv4);
+		memcpy(&ib_spec->ipv4.val, &kern_spec->ipv4.val,
+		       sizeof(struct ib_flow_ipv4_filter));
+		memcpy(&ib_spec->ipv4.mask, &kern_spec->ipv4.mask,
+		       sizeof(struct ib_flow_ipv4_filter));
+		break;
+	case IB_FLOW_SPEC_TCP:
+	case IB_FLOW_SPEC_UDP:
+		ib_spec->tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp);
+		memcpy(&ib_spec->tcp_udp.val, &kern_spec->tcp_udp.val,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		memcpy(&ib_spec->tcp_udp.mask, &kern_spec->tcp_udp.mask,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file,
+			      const char __user *buf, int in_len,
+			      int out_len)
+{
+	struct ib_uverbs_create_flow	  cmd;
+	struct ib_uverbs_create_flow_resp resp;
+	struct ib_uobject		  *uobj;
+	struct ib_flow			  *flow_id;
+	struct ib_kern_flow_attr	  *kern_flow_attr;
+	struct ib_flow_attr		  *flow_attr;
+	struct ib_qp			  *qp;
+	int err = 0;
+	void *kern_spec;
+	void *ib_spec;
+	int i;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, buf
[PATCH for/net-next 8/8] IB/mlx5: Mellanox Connect-IB, IB driver part 5/5
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- MAINTAINERS | 10 ++ drivers/infiniband/Kconfig |1 + drivers/infiniband/Makefile |1 + drivers/infiniband/hw/mlx5/Kconfig | 10 ++ drivers/infiniband/hw/mlx5/Makefile |4 5 files changed, 26 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/Kconfig create mode 100644 drivers/infiniband/hw/mlx5/Makefile diff --git a/MAINTAINERS b/MAINTAINERS index 6e82fb5..b426536 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5377,6 +5377,16 @@ S: Supported F: drivers/net/ethernet/mellanox/mlx5/core/ F: include/linux/mlx5/ +Mellanox MLX5 IB driver +M: Eli Cohen e...@mellanox.com +L: linux-rdma@vger.kernel.org +W: http://www.mellanox.com +Q: http://patchwork.kernel.org/project/linux-rdma/list/ +T: git://openfabrics.org/~eli/connect-ib.git +S: Supported +F: include/linux/mlx5/ +F: drivers/infiniband/hw/mlx5/ + MODULE SUPPORT M: Rusty Russell ru...@rustcorp.com.au S: Maintained diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index c85b56c..5ceda71 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -50,6 +50,7 @@ source drivers/infiniband/hw/amso1100/Kconfig source drivers/infiniband/hw/cxgb3/Kconfig source drivers/infiniband/hw/cxgb4/Kconfig source drivers/infiniband/hw/mlx4/Kconfig +source drivers/infiniband/hw/mlx5/Kconfig source drivers/infiniband/hw/nes/Kconfig source drivers/infiniband/hw/ocrdma/Kconfig diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile index b126fef..1fe6988 100644 --- a/drivers/infiniband/Makefile +++ b/drivers/infiniband/Makefile @@ -7,6 +7,7 @@ obj-$(CONFIG_INFINIBAND_AMSO1100) += hw/amso1100/ obj-$(CONFIG_INFINIBAND_CXGB3) += hw/cxgb3/ obj-$(CONFIG_INFINIBAND_CXGB4) += hw/cxgb4/ obj-$(CONFIG_MLX4_INFINIBAND) += hw/mlx4/ +obj-$(CONFIG_MLX5_INFINIBAND) += hw/mlx5/ obj-$(CONFIG_INFINIBAND_NES) += hw/nes/ obj-$(CONFIG_INFINIBAND_OCRDMA)+= hw/ocrdma/ obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/ 
diff --git a/drivers/infiniband/hw/mlx5/Kconfig b/drivers/infiniband/hw/mlx5/Kconfig
new file mode 100644
index 000..8e6aebf
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Kconfig
@@ -0,0 +1,10 @@
+config MLX5_INFINIBAND
+	tristate "Mellanox Connect-IB HCA support"
+	depends on NETDEVICES && ETHERNET && PCI && X86
+	select NET_VENDOR_MELLANOX
+	select MLX5_CORE
+	---help---
+	  This driver provides low-level InfiniBand support for
+	  Mellanox Connect-IB PCI Express host channel adapters (HCAs).
+	  This is required to use InfiniBand protocols such as
+	  IP-over-IB or SRP with these devices.
diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile
new file mode 100644
index 000..0f492da
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_MLX5_INFINIBAND)	+= mlx5_ib.o
+ccflags-y += -Wall -Werror -DDEBUG
+
+mlx5_ib-y :=	main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o
--
1.7.1
[PATCH for/net-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
From: Eli Cohen e...@mellanox.com

The patches that follow constitute the driver for Mellanox's 5th generation of HCAs named Connect-IB. The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This partitioning resembles what we have for mlx4 with the substantial difference that mlx5_ib is the pci device driver and not mlx5_core.

mlx5_core provides general functionality that is intended to be used by other Mellanox devices that will be introduced in the future. In this sense, it can be perceived as a library. mlx5_ib has a similar role as any hardware device under drivers/infiniband/hw.

The patches are partitioned to avoid exceeding the 100KB vger.kernel.org limitation. They are divided such that the first three have the code of the mlx5_core driver, and the last five the code of the mlx5_ib driver. Only the last patch per driver adds the Makefiles and Kconfigs, to make things robust for future bisections.

PPC is not yet supported but support will be included in the near future.

changes from V0:
- Per Dave's request, cross posting to both netdev and linux-rdma, to see if there are comments from netdev on the core driver.

Eli Cohen (8): net/mlx5: Mellanox Connect-IB, core driver part 1/3 net/mlx5: Mellanox Connect-IB, core driver part 2/3 net/mlx5: Mellanox Connect-IB, core driver part 3/3 IB/mlx5: Mellanox Connect-IB, IB driver part 1/5 IB/mlx5: Mellanox Connect-IB, IB driver part 2/5 IB/mlx5: Mellanox Connect-IB, IB driver part 3/5 IB/mlx5: Mellanox Connect-IB, IB driver part 4/5 IB/mlx5: Mellanox Connect-IB, IB driver part 5/5 MAINTAINERS| 22 + drivers/infiniband/Kconfig |1 + drivers/infiniband/Makefile|1 + drivers/infiniband/hw/mlx5/Kconfig | 10 + drivers/infiniband/hw/mlx5/Makefile|4 + drivers/infiniband/hw/mlx5/ah.c| 95 + drivers/infiniband/hw/mlx5/cq.c| 851 +++ drivers/infiniband/hw/mlx5/doorbell.c | 100 + drivers/infiniband/hw/mlx5/mad.c | 143 ++ drivers/infiniband/hw/mlx5/main.c | 1512 drivers/infiniband/hw/mlx5/mem.c | 194 ++ drivers/infiniband/hw/mlx5/mlx5_ib.h | 593 + drivers/infiniband/hw/mlx5/mr.c| 1025 drivers/infiniband/hw/mlx5/qp.c| 2549 drivers/infiniband/hw/mlx5/srq.c | 481 drivers/infiniband/hw/mlx5/user.h | 123 + drivers/net/ethernet/mellanox/Kconfig |1 + drivers/net/ethernet/mellanox/Makefile |1 + drivers/net/ethernet/mellanox/mlx5/core/Kconfig| 18 + drivers/net/ethernet/mellanox/mlx5/core/Makefile |6 + drivers/net/ethernet/mellanox/mlx5/core/alloc.c| 244 ++ drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 1497 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 226 ++ drivers/net/ethernet/mellanox/mlx5/core/debugfs.c | 600 + drivers/net/ethernet/mellanox/mlx5/core/eq.c | 523 drivers/net/ethernet/mellanox/mlx5/core/fw.c | 187 ++ drivers/net/ethernet/mellanox/mlx5/core/health.c | 216 ++ drivers/net/ethernet/mellanox/mlx5/core/mad.c | 80 + drivers/net/ethernet/mellanox/mlx5/core/main.c | 483 drivers/net/ethernet/mellanox/mlx5/core/mcg.c | 108 + .../net/ethernet/mellanox/mlx5/core/mlx5_core.h| 96 + drivers/net/ethernet/mellanox/mlx5/core/mr.c | 138 ++ .../net/ethernet/mellanox/mlx5/core/pagealloc.c| 438 drivers/net/ethernet/mellanox/mlx5/core/pd.c | 103 + 
drivers/net/ethernet/mellanox/mlx5/core/port.c | 106 + drivers/net/ethernet/mellanox/mlx5/core/qp.c | 303 +++ drivers/net/ethernet/mellanox/mlx5/core/srq.c | 225 ++ drivers/net/ethernet/mellanox/mlx5/core/uar.c | 225 ++ include/linux/mlx5/cmd.h | 51 + include/linux/mlx5/cq.h| 166 ++ include/linux/mlx5/device.h| 886 +++ include/linux/mlx5/doorbell.h | 81 + include/linux/mlx5/driver.h| 763 ++ include/linux/mlx5/qp.h| 467 include/linux/mlx5/srq.h | 41 + 45 files changed, 15983 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/Kconfig create mode 100644 drivers/infiniband/hw/mlx5/Makefile create mode 100644 drivers/infiniband/hw/mlx5/ah.c create mode 100644 drivers/infiniband/hw/mlx5/cq.c create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c create mode 100644 drivers/infiniband/hw/mlx5/mad.c create mode 100644 drivers/infiniband/hw/mlx5/main.c create mode 100644 drivers/infiniband/hw/mlx5/mem.c create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ib.h
[PATCH for/net-next 4/8] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/infiniband/hw/mlx5/ah.c | 95 drivers/infiniband/hw/mlx5/cq.c | 851 + drivers/infiniband/hw/mlx5/doorbell.c | 100 drivers/infiniband/hw/mlx5/mad.c | 143 ++ 4 files changed, 1189 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/ah.c create mode 100644 drivers/infiniband/hw/mlx5/cq.c create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c create mode 100644 drivers/infiniband/hw/mlx5/mad.c diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c new file mode 100644 index 000..ff8f1cb --- /dev/null +++ b/drivers/infiniband/hw/mlx5/ah.c @@ -0,0 +1,95 @@ +/* + * Copyright (c) 2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "mlx5_ib.h"
+
+struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr,
+			   struct mlx5_ib_ah *ah)
+{
+	u32 sgi;
+
+	if (ah_attr->ah_flags & IB_AH_GRH) {
+		sgi = ah_attr->grh.sgid_index << 20;
+
+		memcpy(ah->av.rgid, &ah_attr->grh.dgid, 16);
+		ah->av.grh_gid_fl = cpu_to_be32(ah_attr->grh.flow_label |
+						(1 << 30) | sgi);
+		ah->av.hop_limit = ah_attr->grh.hop_limit;
+		ah->av.tclass = ah_attr->grh.traffic_class;
+	}
+
+	ah->av.rlid = cpu_to_be16(ah_attr->dlid);
+	ah->av.fl_mlid = ah_attr->src_path_bits & 0x7f;
+	ah->av.stat_rate_sl = (ah_attr->static_rate << 4) | (ah_attr->sl & 0xf);
+
+	return &ah->ibah;
+}
+
+struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+{
+	struct mlx5_ib_ah *ah;
+
+	ah = kzalloc(sizeof(*ah), GFP_ATOMIC);
+	if (!ah)
+		return ERR_PTR(-ENOMEM);
+
+	return create_ib_ah(ah_attr, ah); /* never fails */
+}
+
+int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
+{
+	struct mlx5_ib_ah *ah = to_mah(ibah);
+	u32 tmp;
+
+	memset(ah_attr, 0, sizeof(*ah_attr));
+
+	tmp = be32_to_cpu(ah->av.grh_gid_fl);
+	if (tmp & (1 << 30)) {
+		ah_attr->ah_flags = IB_AH_GRH;
+		ah_attr->grh.sgid_index = (tmp >> 20) & 0xff;
+		ah_attr->grh.flow_label = tmp & 0xf;
+		memcpy(&ah_attr->grh.dgid, ah->av.rgid, 16);
+		ah_attr->grh.hop_limit = ah->av.hop_limit;
+		ah_attr->grh.traffic_class = ah->av.tclass;
+	}
+	ah_attr->dlid = be16_to_cpu(ah->av.rlid);
+	ah_attr->static_rate = ah->av.stat_rate_sl >> 4;
+	ah_attr->sl = ah->av.stat_rate_sl & 0xf;
+
+	return 0;
+}
+
+int mlx5_ib_destroy_ah(struct ib_ah *ah)
+{
+	kfree(to_mah(ah));
+	return 0;
+}
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
new file mode 100644
index 000..001e182
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -0,0 +1,851
@@ +/* + * Copyright (c) 2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *
[PATCH for/net-next 6/8] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/infiniband/hw/mlx5/mlx5_ib.h | 593 drivers/infiniband/hw/mlx5/mr.c | 1025 ++ 2 files changed, 1618 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ib.h create mode 100644 drivers/infiniband/hw/mlx5/mr.c diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h new file mode 100644 index 000..f197972 --- /dev/null +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -0,0 +1,593 @@ +/* + * Copyright (c) 2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */
+
+#ifndef MLX5_IB_H
+#define MLX5_IB_H
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_smi.h>
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/cq.h>
+#include <linux/mlx5/qp.h>
+#include <linux/mlx5/srq.h>
+#include <linux/types.h>
+
+extern int mlx5_ib_debug_mask;
+
+#define mlx5_ib_dbg(dev, format, arg...)				\
+do {									\
+	if (debug_mask & mlx5_ib_debug_mask)				\
+		pr_debug("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, \
+			 __func__, __LINE__, current->pid, ##arg);	\
+} while (0)
+
+#define mlx5_ib_err(dev, format, arg...)				\
+pr_err("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,	\
+	__LINE__, current->pid, ##arg)
+
+#define mlx5_ib_warn(dev, format, arg...)				\
+pr_warn("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,	\
+	__LINE__, current->pid, ##arg)
+
+#define MLX5_IB_MOD_DBG_MASK(mod_id)\
+static const u32 debug_mask = 1 << (mod_id)
+
+enum {
+	MLX5_IB_MMAP_CMD_SHIFT	= 8,
+	MLX5_IB_MMAP_CMD_MASK	= 0xff,
+};
+
+enum mlx5_ib_mmap_cmd {
+	MLX5_IB_MMAP_REGULAR_PAGE		= 0,
+	MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES	= 1, /* always last */
+};
+
+enum {
+	MLX5_RES_SCAT_DATA32_CQE	= 0x1,
+	MLX5_RES_SCAT_DATA64_CQE	= 0x2,
+	MLX5_REQ_SCAT_DATA32_CQE	= 0x11,
+	MLX5_REQ_SCAT_DATA64_CQE	= 0x22,
+};
+
+enum {
+	MLX5_IB_MOD_MR,
+	MLX5_IB_MOD_CQ,
+	MLX5_IB_MOD_QP,
+	MLX5_IB_MOD_MEM,
+	MLX5_IB_MOD_MAIN,
+	MLX5_IB_MOD_MAD,
+	MLX5_IB_MOD_SRQ,
+};
+
+/*
+ * we do not expose this yet so we use a value out of range */
+enum {
+	IB_QPT_REG_UMR	= IB_QPT_MAX + 0x1234,
+};
+
+/* === this should be passed to the vergbs layer */
+enum {
+	IB_WR_SET_PSV = IB_WR_BIND_MW + 10,
+	IB_WR_GET_PSV,
+	IB_WR_CHECK_PSV,
+	IB_WR_RGET_PSV,
+	IB_WR_RCHECK_PSV,
+	IB_WR_UMR,
+};
+
+enum {
+	IB_SEND_UMR_UNREG = IB_SEND_IP_CSUM << 1,
+};
+
+enum ib_latency_class {
+	IB_LATENCY_CLASS_LOW,
+	IB_LATENCY_CLASS_MEDIUM,
+	IB_LATENCY_CLASS_HIGH,
+	IB_LATENCY_CLASS_FAST_PATH
+};
+/* === this should be passed to the vergbs layer */
+
+
+enum mlx5_ib_mad_ifc_flags {
+	MLX5_MAD_IFC_IGNORE_MKEY	= 1,
+	MLX5_MAD_IFC_IGNORE_BKEY	= 2,
+	MLX5_MAD_IFC_NET_VIEW		= 4,
+};
+
+struct mlx5_ib_ucontext {
+	struct ib_ucontext	ibucontext;
+	struct list_head	db_page_list;
+
+	/*
+	 * protect doorbell record alloc/free
+	 */
+	struct mutex		db_page_mutex;
+	struct mlx5_uuar_info	uuari;
+};
+
+static inline struct mlx5_ib_ucontext *to_mucontext(struct
Re: [PATCH V2 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs
On 26/06/2013 16:05, Roland Dreier wrote:
> On Wed, Jun 26, 2013 at 5:57 AM, Or Gerlitz ogerl...@mellanox.com wrote:
>> Add Infra-structure to support extended uverbs capabilities in a
>> forward/backward manner. Uverbs command opcodes which are based on
>> the verbs extensions approach should be greater or equal to
>> IB_USER_VERBS_CMD_THRESHOLD. They have new header format and
>> processed a bit differently.
>
> I think you missed the feedback I gave to the previous version of this
> patch:
>
>   This patch at least doesn't have a sufficient changelog. I don't
>   understand what extended capabilities are or why we need to change
>   the header format. What is the verbs extensions approach? Why does
>   the kernel need to know about it? What is different about the
>   processing? The only difference I see is that userspace now has a
>   more complicated way to pass the size in, which the kernel seems to
>   nearly ignore -- it just adds the sizes together and proceeds as
>   before.

Roland, you did comment on this patch, but on the other series where it was posted, the RoCE IP based addressing one. I posted it twice since it's an infrastructure (...) patch used by both series. I wanted to post V2 of the flow steering patches to make sure I addressed your comment on the void pointer OK, and take things from there. Never mind.

To the point, the uverbs extensions construct is basically made of two building blocks:

1. an extended header which explicitly specifies the in/out verbs data size and the in/out provider data size
2. a bit mask (comp mask) which specifies which fields in the uverbs command structure are used.

The combination of 1 + 2 allows extending commands which are built along these building blocks without a need to bump the uverbs ABI.

Today, the kernel uverbs layer assumes a given size for each command, so for example, the provider udata IN size is in_words - size_of_cmd. For commands added along this framework, the kernel could support all the previous versions towards user space in parallel. Say we added a new command cmdX to both user and kernel, where v0 is the initial version, later we added a few fields and have cmdX_v1, and later on more fields and have cmdX_v2.

> +struct ib_uverbs_cmd_hdr_ex {
> +	__u32 command;
> +	__u16 in_words;
> +	__u16 out_words;
> +	__u16 provider_in_words;
> +	__u16 provider_out_words;
> +	__u32 cmd_hdr_reserved;
> +};
> +

Based on the bits set in the comp mask and the in_words field value, a kernel which has cmdX_v2 can work towards older user space libraries/applications, e.g. cmdX_v1 and cmdX_v0.

The comp mask is not part of the header, but rather the 1st field of every uverbs command and response. Here, in this series, it was added in patch 3/4 for the uverbs flow-steering structures, which are cmdX_v0 in this context.

If we only used (in_words - size_of_cmd) we couldn't achieve that support.

Or.
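For readers following the cmdX_v0/cmdX_v1 discussion, here is a minimal user-space sketch of how a kernel-side parser could accept both an older and a newer layout of the same command by combining the input length with the comp mask. It is not code from this series: `struct cmdx_v0`, `struct cmdx_v1`, `CMDX_HAS_TIMEOUT`, `cmdx_parse()` and the choice of `-EINVAL` are all hypothetical names and choices made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command layouts: v1 only appends a field to v0. */
struct cmdx_v0 {
	uint32_t comp_mask;	/* 1st field of every extended command */
	uint32_t qp_handle;
};

struct cmdx_v1 {
	uint32_t comp_mask;
	uint32_t qp_handle;
	uint32_t timeout;	/* meaningful only if CMDX_HAS_TIMEOUT is set */
};

enum {
	CMDX_HAS_TIMEOUT = 1u << 0,
	CMDX_KNOWN_BITS  = CMDX_HAS_TIMEOUT,
};

/*
 * Parse a command sent by user space of any supported version into the
 * newest layout. Returns 0 on success or -EINVAL on a malformed request.
 */
static int cmdx_parse(const void *in, size_t in_len, struct cmdx_v1 *cmd)
{
	size_t n = in_len < sizeof(*cmd) ? in_len : sizeof(*cmd);

	/* Anything shorter than the initial version cannot be valid. */
	if (in_len < sizeof(struct cmdx_v0))
		return -EINVAL;

	/* Fields the caller did not send stay zeroed. */
	memset(cmd, 0, sizeof(*cmd));
	memcpy(cmd, in, n);

	/* Bits this parser does not understand come from a newer user space. */
	if (cmd->comp_mask & ~(uint32_t)CMDX_KNOWN_BITS)
		return -EINVAL;

	if (cmd->comp_mask & CMDX_HAS_TIMEOUT) {
		/* The mask claims a field that the buffer is too short to hold. */
		if (in_len < offsetof(struct cmdx_v1, timeout) + sizeof(cmd->timeout))
			return -EINVAL;
	} else {
		/* Field not flagged as used: ignore whatever bytes were there. */
		cmd->timeout = 0;
	}
	return 0;
}
```

A v0 caller passes `sizeof(struct cmdx_v0)` bytes with a zero comp mask and gets `timeout == 0`; a v1 caller sets `CMDX_HAS_TIMEOUT` and passes the longer structure. Subtracting a fixed `size_of_cmd` from `in_words` cannot tell these two callers apart, which is the point made above.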
Re: [PATCH V2 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs
On 26/06/2013 18:17, Or Gerlitz wrote:
> Based on the bits set in the comp mask and the in_words field value,
> the kernel which has cmdX_v2 can work towards older user space
> libraries/applications e.g cmdX_v1 and cmdX_v0
>
> The comp mask is not part of the header, but rather the 1st field of
> every uverbs command and response, here, in this series, it was added
> in patch 3/4 for the uverbs flow-steering structures which are cmdX_v0
> in this context.

The comp mask logic is also explained in Tzahi's OFA 2013 talk on verbs extensions. He refers there to extending the libibverbs API in user space towards applications, but the concept is the same. Slides here:

https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/549-extending-verbs-api.html
Re: [PATCH V2 for-next 1/4] IB/core: Add receive Flow Steering support
On Wed, Jun 26, 2013 at 10:56 PM, Hefty, Sean sean.he...@intel.com wrote:
>> The input to ib_create_flow is instance of struct ib_flow_attr which
>> contain few mandatory control elements and optional flow specs.
>>
>> struct ib_flow_attr {
>>         enum ib_flow_attr_type type;
>>         u16 size;
>>         u16 priority;
>>         u8 num_of_specs;
>>         u8 port;
>>         u32 flags;
>
> This structure could be aligned better.

OK, I assume you mean arranging the fields by decreasing size, correct? So here we need to put the flags field before the size field.

>>         /* Following are the optional layers according to user request
>>          * struct ib_flow_spec_yyy
>>          * struct ib_flow_spec_zzz
>>          */
>> };
>>
>> As these specs are eventually coming from user space, they are defined
>> and used in a way which allows adding new spec types without
>> kernel/user ABI change, and with a little API enhancement which
>> defines the newly added spec.
>>
>> The flow spec structures are defined in a TLV (Type-Length-Value)
>> manner, which allows to call ib_create_flow with a list of variable
>> length of optional specs.
>>
>> For the actual processing of ib_flow_attr the driver uses the number
>> of specs and the size mandatory fields along with the TLV nature of
>> the specs.
>>
>> Steering rules processing order is according to rules priority. The
>> user sets the 12 low-order bits from the priority field and the
>> remaining 4 high-order bits are set by the kernel according to a
>> domain the application or the layer that created the rule belongs to.
>> Lower priority numerical value means higher priority.
>
> Why are bit fields being exposed to the user in this way?

Yes, this is probably not general enough. So what would you suggest, a more integral division? E.g. 16 bits for priority and 16 bits for location?

>> +struct ib_flow *ib_create_flow(struct ib_qp *qp,
>> +			       struct ib_flow_attr *flow_attr,
>> +			       int domain)
>> +{
>> +	struct ib_flow *flow_id;
>> +	if (!qp->device->create_flow)
>> +		return ERR_PTR(-ENOSYS);
>> +
>> +	flow_id = qp->device->create_flow(qp, flow_attr, domain);
>> +	if (!IS_ERR(flow_id))
>> +		atomic_inc(&qp->usecnt);
>> +	return flow_id;
>> +}
>> +EXPORT_SYMBOL(ib_create_flow);
>> +
>> +int ib_destroy_flow(struct ib_flow *flow_id)
>> +{
>> +	int err;
>> +	struct ib_qp *qp = flow_id->qp;
>> +
>> +	if (!flow_id->qp->device->destroy_flow)
>> +		return -ENOSYS;
>
> We can assume destroy_flow exists if create_flow does.

OK, will fix.

>> +struct ib_flow_ib_filter {
>> +	__be32	l3_type_qpn;
>> +	u8	dst_gid[16];
>> +};
>
> Maybe this is just a naming issue, but why wouldn't an IB filter have
> SLID/DLID instead of just DGID? What does l3_type_qpn mean? Is this
> just the QPN?

Yes, it's just the QPN, will fix the name to better match.

> The TCP/IP filters are broken into separate filters based in L4/L3. It
> would seem to make sense if the IB filters were similarly divided into
> L2/L3/L4 filters. IB and IPv6 could probably share the same filter
> definition.

IPv6 filters weren't defined in this submission, but as I wrote, the scheme provided allows for adding more filters and flow specs.

>> +struct ib_flow_spec_ib {
>> +	enum ib_flow_spec_type	 type;
>> +	u16			 size;
>> +	struct ib_flow_ib_filter val;
>> +	struct ib_flow_ib_filter mask;
>> +};
>> +
>> +struct ib_flow_ipv4_filter {
>> +	__be32	src_ip;
>> +	__be32	dst_ip;
>> +};
>> +
>> +struct ib_flow_spec_ipv4 {
>> +	enum ib_flow_spec_type	   type;
>> +	u16			   size;
>> +	struct ib_flow_ipv4_filter val;
>> +	struct ib_flow_ipv4_filter mask;
>> +};
>> +
>> +struct ib_flow_tcp_udp_filter {
>> +	__be16	dst_port;
>> +	__be16	src_port;
>> +};
>> +
>> +struct ib_flow_spec_tcp_udp {
>> +	enum ib_flow_spec_type	      type;
>> +	u16			      size;
>> +	struct ib_flow_tcp_udp_filter val;
>> +	struct ib_flow_tcp_udp_filter mask;
>> +};
>
> - Sean
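To make the TLV point in the quoted changelog concrete, the following is a small stand-alone sketch of how a consumer could walk a variable-length list of specs using only the spec count and each spec's own size field. It is an illustration under assumptions, not the driver code: `struct spec_hdr`, `count_specs()` and the packed-buffer layout are hypothetical stand-ins for the `type`/`size` pair that opens each `ib_flow_spec_*` structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common prefix of every spec: type, then total size in bytes. */
struct spec_hdr {
	uint32_t type;
	uint16_t size;	/* size of the whole spec, header included */
};

/*
 * Walk num_specs TLV records packed back to back in buf[0..len).
 * Returns how many records have the wanted type, or -1 if a record
 * is truncated or declares an impossible size.
 */
static int count_specs(const unsigned char *buf, size_t len,
		       unsigned int num_specs, uint32_t wanted)
{
	size_t off = 0;
	int found = 0;
	unsigned int i;

	for (i = 0; i < num_specs; i++) {
		struct spec_hdr hdr;

		/* There must be room for at least a header. */
		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));

		/* The declared size must cover the header and fit in the buffer. */
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;

		if (hdr.type == wanted)
			found++;

		/* The length field alone tells us where the next spec starts. */
		off += hdr.size;
	}
	return found;
}
```

Because the walker only relies on `size` to step forward, a spec type it has never heard of is skipped rather than misparsed, which is what lets new spec types be added without changing the layout of existing ones.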
Re: [PATCH V2 for-next 1/4] IB/core: Add receive Flow Steering support
On Thu, Jun 27, 2013 at 12:33 AM, Steve Wise sw...@opengridcomputing.com wrote:
> On 6/26/2013 4:13 PM, Or Gerlitz wrote:
>> On Wed, Jun 26, 2013 at 10:56 PM, Hefty, Sean sean.he...@intel.com wrote:
>>>> Steering rules processing order is according to rules priority. The
>>>> user sets the 12 low-order bits from the priority field and the
>>>> remaining 4 high-order bits are set by the kernel according to a
>>>> domain the application or the layer that created the rule belongs
>>>> to. Lower priority numerical value means higher priority.
>>>
>>> Why are bit fields being exposed to the user in this way?
>>
>> Yes, this is probably not general enough. So what would you suggest,
>> use a more integral division? e.g 16 bits for priority and 16 bits
>> for location?
>
> If the kernel driver is setting the location, whatever that is, why
> would the application need access to it? IE isn't a priority field
> enough to allow the application provide an ordering/prioritization to
> the rules?

I wasn't accurate. The idea is that per domain we allow the app to set the rule priority, but the actual priority programmed into the HW is made of the provided priority x domain, where different domains have different priorities following the order set in the verbs header file; see enum ib_flow_domain.
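One way to read "provided priority x domain" together with the 12/4 bit split from the changelog is sketched below. This is only an interpretation for illustration: the `enum flow_domain` values and `effective_priority()` are hypothetical, and the real ordering is whatever enum ib_flow_domain defines in the verbs header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical domains, listed from most to least preferred. */
enum flow_domain {
	FLOW_DOMAIN_FIRST,
	FLOW_DOMAIN_SECOND,
	FLOW_DOMAIN_NUM,
};

#define USER_PRIO_BITS	12
#define USER_PRIO_MASK	((1u << USER_PRIO_BITS) - 1)

/*
 * Combine the kernel-chosen domain (4 high-order bits) with the
 * user-chosen priority (12 low-order bits). A lower result means the
 * rule is processed earlier. Returns 0 on success, -1 on bad input.
 */
static int effective_priority(enum flow_domain domain, uint16_t user_prio,
			      uint16_t *out)
{
	if ((unsigned int)domain >= FLOW_DOMAIN_NUM || user_prio > USER_PRIO_MASK)
		return -1;

	*out = (uint16_t)(((unsigned int)domain << USER_PRIO_BITS) | user_prio);
	return 0;
}
```

With this packing, every rule of an earlier domain sorts ahead of every rule of a later one, and the application only ever orders rules inside its own domain, which matches the clarification given to Steve.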
Re: [PATCH V2 for-next 1/4] IB/core: Add receive Flow Steering support
On Thu, Jun 27, 2013 at 11:55 PM, Hefty, Sean sean.he...@intel.com wrote:
> My point was that the IPv6 filter should be defined and used here.
> The following basic filters were defined:
>
> ethernet - src/dst mac ...
> ip - src/dst ip
> tcp/udp - src/dst port
>
> These are at least somewhat intuitive to me. The IB filter is
>
> ib - (src/dst?) qpn, dgid
>
> This is equivalent to creating a filter that's:
>
> tcpip - port, dst ip
>
> IMO, it would be better to define IB filters using the same structure
> that you used for tcp/ip/ethernet. For example
>
> ibqp - src/dst qpn (pkey?)
> ipv6 - src/dst ipv6/gids (flowlabel?)
> iblink - src/dst lids, (sl?)
>
> If the hardware can only support matching on the qpn and dgid, then it
> can simply fail any requests which specify a non-zero mask on the
> unsupported components.

Sean, I agree that the provided filter on dest qpn / dgid doesn't make sense and will fix that. Still, for the initial set of patches that goes in, I tend to just remove the IB filter structure and define the different IB filters along your proposal in follow-up patch(es), OK?
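Sean's suggestion, that a device able to match only some components simply rejects any request with a non-zero mask on the others, could look roughly like the sketch below. The structure and function names (`struct flow_ib_filter`, `check_ib_spec()`) and the `-EINVAL` return are hypothetical; the example pretends the hardware can match on destination QPN and destination GID only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IB filter with more components than the device supports. */
struct flow_ib_filter {
	uint32_t dst_qpn;
	uint8_t  src_gid[16];
	uint8_t  dst_gid[16];
};

struct flow_ib_spec {
	struct flow_ib_filter val;
	struct flow_ib_filter mask;
};

/* Returns 1 if all n bytes at p are zero, 0 otherwise. */
static int all_zero(const uint8_t *p, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (p[i])
			return 0;
	return 1;
}

/*
 * Accept the spec only if the caller asks to match on components this
 * (imaginary) device supports: dst_qpn and dst_gid. Any mask bit set on
 * src_gid means "match on source GID", which the device cannot do.
 */
static int check_ib_spec(const struct flow_ib_spec *spec)
{
	if (!all_zero(spec->mask.src_gid, sizeof(spec->mask.src_gid)))
		return -EINVAL;
	return 0;
}
```

The generic filter definition can then stay symmetric (src and dst everywhere), and each provider decides at rule-creation time which masks it is able to honor.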
Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
On Sun, Jun 16, 2013 at 3:02 PM, Eli Cohen e...@dev.mellanox.co.il wrote:
> From: Eli Cohen e...@mellanox.com
>
> The patches that follow constitute the driver for Mellanox's 5th
> generation of HCAs named Connect-IB. The driver is comprised of two
> kernel modules: mlx5_ib and mlx5_core. This partitioning resembles
> what we have for mlx4 with the substantial difference that mlx5_ib is
> the pci device driver and not mlx5_core.

Hi Roland,

We're on 3.10-rc7, soon on -rc8, so things warm up for 3.11. Today will mark two working weeks since the mlx5 driver was posted here with no comments; it's marked as new in your patchwork. Is this safe for 3.11? Any comments or fixes we have to apply?

As you probably saw, I posted V1, which is almost the same as V0 but with netdev copied, to see if there are rejections/comments from there; so far nothing. Dave said he wants to see this posted to netdev in order to decide if he's OK with the core driver being pushed through your tree too.

Or.

> mlx5_core provides general functionality that is intended to be used
> by other Mellanox devices that will be introduced in the future. In
> this sense, it can be perceived as a library. mlx5_ib has a similar
> role as any hardware device under drivers/infiniband/hw.
>
> The patches are partitioned to avoid exceeding the 100KB
> vger.kernel.org limitation. Only the last patch adds the Makefiles and
> Kconfigs, to make things robust for future bisections.
>
> PPC is not yet supported but support will be included in the near
> future.
Eli Eli Cohen (8): mlx5: Mellanox Connect-IB driver part 1/8 mlx5: Mellanox Connect-IB driver part 2/8 mlx5: Mellanox Connect-IB driver part 3/8 mlx5: Mellanox Connect-IB driver part 4/8 mlx5: Mellanox Connect-IB driver part 5/8 mlx5: Mellanox Connect-IB driver part 6/8 mlx5: Mellanox Connect-IB driver part 7/8 mlx5: Mellanox Connect-IB driver part 8/8 MAINTAINERS| 22 + drivers/infiniband/Kconfig |1 + drivers/infiniband/Makefile|1 + drivers/infiniband/hw/mlx5/Kconfig | 10 + drivers/infiniband/hw/mlx5/Makefile|4 + drivers/infiniband/hw/mlx5/ah.c| 95 + drivers/infiniband/hw/mlx5/cq.c| 851 +++ drivers/infiniband/hw/mlx5/doorbell.c | 100 + drivers/infiniband/hw/mlx5/mad.c | 143 ++ drivers/infiniband/hw/mlx5/main.c | 1512 drivers/infiniband/hw/mlx5/mem.c | 194 ++ drivers/infiniband/hw/mlx5/mlx5_ib.h | 593 + drivers/infiniband/hw/mlx5/mr.c| 1025 drivers/infiniband/hw/mlx5/qp.c| 2549 drivers/infiniband/hw/mlx5/srq.c | 481 drivers/infiniband/hw/mlx5/user.h | 123 + drivers/net/ethernet/mellanox/Kconfig |1 + drivers/net/ethernet/mellanox/Makefile |1 + drivers/net/ethernet/mellanox/mlx5/core/Kconfig| 18 + drivers/net/ethernet/mellanox/mlx5/core/Makefile |6 + drivers/net/ethernet/mellanox/mlx5/core/alloc.c| 244 ++ drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 1497 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 226 ++ drivers/net/ethernet/mellanox/mlx5/core/debugfs.c | 600 + drivers/net/ethernet/mellanox/mlx5/core/eq.c | 523 drivers/net/ethernet/mellanox/mlx5/core/fw.c | 187 ++ drivers/net/ethernet/mellanox/mlx5/core/health.c | 216 ++ drivers/net/ethernet/mellanox/mlx5/core/mad.c | 80 + drivers/net/ethernet/mellanox/mlx5/core/main.c | 483 drivers/net/ethernet/mellanox/mlx5/core/mcg.c | 108 + .../net/ethernet/mellanox/mlx5/core/mlx5_core.h| 96 + drivers/net/ethernet/mellanox/mlx5/core/mr.c | 138 ++ .../net/ethernet/mellanox/mlx5/core/pagealloc.c| 438 drivers/net/ethernet/mellanox/mlx5/core/pd.c | 103 + drivers/net/ethernet/mellanox/mlx5/core/port.c | 106 + 
drivers/net/ethernet/mellanox/mlx5/core/qp.c | 303 +++ drivers/net/ethernet/mellanox/mlx5/core/srq.c | 225 ++ drivers/net/ethernet/mellanox/mlx5/core/uar.c | 225 ++ include/linux/mlx5/cmd.h | 51 + include/linux/mlx5/cq.h| 166 ++ include/linux/mlx5/device.h| 886 +++ include/linux/mlx5/doorbell.h | 81 + include/linux/mlx5/driver.h| 763 ++ include/linux/mlx5/qp.h| 467 include/linux/mlx5/srq.h | 41 + 45 files changed, 15983 insertions(+) create mode 100644 drivers/infiniband/hw/mlx5/Kconfig create mode 100644 drivers/infiniband/hw/mlx5/Makefile create mode 100644
Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
On Fri, Jun 28, 2013 at 4:20 PM, Or Gerlitz or.gerl...@gmail.com wrote:
> On Sun, Jun 16, 2013 at 3:02 PM, Eli Cohen e...@dev.mellanox.co.il wrote:
>> From: Eli Cohen e...@mellanox.com
>>
>> The patches that follow constitute the driver for Mellanox's 5th
>> generation of HCAs named Connect-IB. The driver is comprised of two
>> kernel modules: mlx5_ib and mlx5_core. This partitioning resembles
>> what we have for mlx4 with the substantial difference that mlx5_ib is
>> the pci device driver and not mlx5_core.
>
> Hi Roland,
>
> We're on 3.10-rc7, soon on -rc8, so things warm up for 3.11. Today
> will mark two working weeks since the mlx5 driver was posted here with
> no comments; it's marked as new in your patchwork. Is this safe for
> 3.11? Any comments or fixes we have to apply?
>
> As you probably saw, I posted V1, which is almost the same as V0 but
> with netdev copied, to see if there are rejections/comments from
> there; so far nothing. Dave said he wants to see this posted to netdev
> in order to decide if he's OK with the core driver being pushed
> through your tree too.

If this helps, the patches are here:

git://beany.openfabrics.org/~eli/connect-ib.git branch mlx5-v1-int
Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
On Tue, Jul 2, 2013 at 12:22 AM, Roland Dreier rol...@kernel.org wrote:

> Also, sparse warns about [...] in mlx5_ib.h. Nor does it have any
> callers, so it's a bit hard to tell if it's really and truly a bug.

removing this function for V2
Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
On Mon, Jul 1, 2013 at 9:03 PM, Roland Dreier rol...@kernel.org wrote:

> In general I don't think overriding the CFLAGS (as you do in the mlx5
> Makefiles) is a good idea, and in particular here your -Wall -Werror
> break the build, at least for my gcc 4.7.3:
>
>   CC drivers/infiniband/hw/mlx5/qp.o
>   /home/roland/Src/linux-merge.git/drivers/infiniband/hw/mlx5/qp.c: In function ‘sq_overhead’:
>   /home/roland/Src/linux-merge.git/drivers/infiniband/hw/mlx5/qp.c:234:2: error: case value ‘4671’ not in enumerated type ‘enum ib_qp_type’ [-Werror=switch]

Will do both (A) remove the flags added on the driver makefile and (B)
fix the issues pointed by these flags...

[...]

> What is this IB_QPT_REG_UMR stuff anyway? Shouldn't we strip out all
> that from the mlx5 driver until it's available in the core code?

IB_QPT_REG_UMR is the type of QP used internally by the driver, to do
plain memory registration by verbs consumers. Will apply here a similar
practice to the one done by mlx4 driver to create the proxy and tunnel
QP types for SRIOV, e.g will define MLX5_IB_QPT_REG_UMR and use that
under driver specific QP creation flags for which we have the
foundations in the IB verbs header file to go and use.

[...]

> /* === this should be passed to the vergbs layer */
> enum {
>         IB_WR_SET_PSV = IB_WR_BIND_MW + 10,
>         IB_WR_GET_PSV,
>         IB_WR_CHECK_PSV,
>         IB_WR_RGET_PSV,
>         IB_WR_RCHECK_PSV,
>         IB_WR_UMR,
> };
>
> enum {
>         IB_SEND_UMR_UNREG = IB_SEND_IP_CSUM << 1,
> };
>
> enum ib_latency_class {
>         IB_LATENCY_CLASS_LOW,
>         IB_LATENCY_CLASS_MEDIUM,
>         IB_LATENCY_CLASS_HIGH,
>         IB_LATENCY_CLASS_FAST_PATH
> };
> /* === this should be passed to the vergbs layer */
>
> looks like it shouldn't be in your submission. (What are vergbs anyway? :)

Will fix that, basically, will remove things we can get along for now,
e.g unused, even not internally such as IB_WR_YYY_PSV, and internalize
what we do need internally e.g use MLX5_IB_XXX where IB_XXX was used and
vergbs is a typo whose fix missed the version submitted...
Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
On Mon, Jul 1, 2013 at 8:49 PM, Roland Dreier rol...@kernel.org wrote:

> So I'm inclined to apply the mlx5 driver for 3.11, since it's a
> completely new driver. However, reading through it so far I had the
> following comments, and I'd like these cleanups addressed along with
> Dave Miller's:

Roland,

Working to have all Dave Miller's comments addressed along with yours
and post V2 later this week, so we will be still on track for a 3.11
merge of the core and IB driver through your tree.

> - The debug mask complexity seems unnecessary now that pr_debug() is
>   controllable at runtime with the DYNAMIC_DEBUG stuff. We should get
>   rid of the extra level of indirection.

OK

> - I think the active flag for the health check timer is unnecessary.
>   It can just be stopped with del_timer_sync().

Jack was looking on this today and we're not sure, he will send his
reading of the matter tomorrow.

> - Many places use foo_spl as a name, and in the Linux kernel foo_lock
>   would be much more idiomatic and easier to read.

sure, done.

> - In:
>
>   +struct mlx5_cmd {
>   ...
>   +	struct mlx5_cmd_stats stats[0x80a];
>
>   the 0x80a magic number really needs to have a name.

done.
Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11
On 03/07/2013 15:41, Bart Van Assche wrote:
[...]

Bart,

> The individual patches in this series are as follows:
> 0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch
> 0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch
> 0003-IB-srp-Avoid-that-srp_reset_host-is-skipped-after-a-.patch
> 0004-IB-srp-Fail-I-O-fast-if-target-offline.patch
> 0005-IB-srp-Skip-host-settle-delay.patch
> 0006-IB-srp-Maintain-a-single-connection-per-I_T-nexus.patch
> 0007-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
> 0008-scsi_transport_srp-Add-transport-layer-error-handlin.patch
> 0009-IB-srp-Add-srp_terminate_io.patch
> 0010-IB-srp-Use-SRP-transport-layer-error-recovery.patch
> 0011-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
> 0012-IB-srp-Fail-SCSI-commands-silently.patch
> 0013-IB-srp-Make-HCA-completion-vector-configurable.patch
> 0014-IB-srp-Make-transport-layer-retry-count-configurable.patch
> 0015-IB-srp-Bump-driver-version-and-release-date.patch

Some of these patches were already picked by Roland (SB), I would
suggest that you post V4 and drop the ones which were accepted.

e8ca413 IB/srp: Bump driver version and release date
4b5e5f4 IB/srp: Make HCA completion vector configurable
96fc248 IB/srp: Maintain a single connection per I_T nexus
99e1c13 IB/srp: Fail I/O fast if target offline
2742c1d IB/srp: Skip host settle delay
086f44f IB/srp: Avoid skipping srp_reset_host() after a transport error
1fe0cb8 IB/srp: Fix remove_one crash due to resource exhaustion

Also, it would help if you used the --cover-letter option of git
format-patch and the resulting cover letter (patch 0/N), as it has
standard content which you can enhance with your additions.
Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
On 01/07/2013 20:49, Roland Dreier wrote:

> - I think the active flag for the health check timer is unnecessary.
>   It can just be stopped with del_timer_sync().

Hi Roland,

Jack looked at this comment/code and he says that the active flag is
used to prevent re-scheduling the timer from inside the timer handling
routine. In the kernel, the comment header in the source file for
del_timer_sync explicitly states that re-scheduling the timer must be
prevented, or the sync is useless:

  Callers must prevent restarting of the timer, otherwise this
  function is meaningless

So we believe that code should remain.

Or.
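
For readers following the argument, here is a minimal single-threaded
userspace sketch of the pattern Or describes: a handler that re-arms its
own timer only while an active flag is set, and a stop path that clears
the flag before cancelling. All sketch_* names are hypothetical, the
"timer" is just a boolean, and nothing here reproduces the kernel's real
timer or del_timer_sync() semantics; it only models the ordering under
discussion.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a kernel timer: only tracks whether it is armed. */
struct sketch_timer {
    bool armed;
};

struct sketch_health {
    struct sketch_timer timer;
    bool active;   /* cleared by the stop path before the timer is cancelled */
    int polls;     /* how many times the handler has run */
};

/* Arm the timer and allow the handler to re-arm it. */
static void sketch_start(struct sketch_health *h)
{
    h->active = true;
    h->polls = 0;
    h->timer.armed = true;
}

/* Timer handler: does its work, then re-arms itself only while active. */
static void sketch_poll_health(struct sketch_health *h)
{
    h->timer.armed = false;      /* the timer has fired */
    h->polls++;
    if (h->active)
        h->timer.armed = true;   /* models re-arming from inside the handler */
}

/* Stop path: forbid re-arming first, then cancel the timer. */
static void sketch_stop(struct sketch_health *h)
{
    h->active = false;
    h->timer.armed = false;      /* models the synchronous delete */
}
```

In this model, a handler invocation that runs after sketch_stop() leaves
the timer disarmed, which is the property the flag is meant to provide.
Whether the real del_timer_sync() already guarantees this for a
self-rearming handler is exactly the point under discussion in the
thread, and the sketch does not settle it.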
[PATCH V2 4/9] IB/core: Add reserved values to enums for low-level drivers use
From: Jack Morgenstein ja...@dev.mellanox.co.il

Continue the approach taken by commit d2b57063e4a ("IB/core: Reserve
bits in enum ib_qp_create_flags for low-level driver use") and add
reserved entries to the ib_qp_type and ib_wr_opcode enums. The low-level
drivers will then define macros to use these reserved values, giving
proper names to the macros for readability.

Also add a range of reserved flags to enum ib_send_flags.

The mlx5 IB driver uses the new additions.

Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
---
 include/rdma/ib_verbs.h | 35 +--
 1 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..645c3ce 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -610,7 +610,21 @@ enum ib_qp_type {
 	IB_QPT_RAW_PACKET = 8,
 	IB_QPT_XRC_INI = 9,
 	IB_QPT_XRC_TGT,
-	IB_QPT_MAX
+	IB_QPT_MAX,
+	/* Reserve a range for qp types internal to the low level driver.
+	 * These qp types will not be visible at the IB core layer, so the
+	 * IB_QPT_MAX usages should not be affected in the core layer
+	 */
+	IB_QPT_RESERVED1 = 0x1000,
+	IB_QPT_RESERVED2,
+	IB_QPT_RESERVED3,
+	IB_QPT_RESERVED4,
+	IB_QPT_RESERVED5,
+	IB_QPT_RESERVED6,
+	IB_QPT_RESERVED7,
+	IB_QPT_RESERVED8,
+	IB_QPT_RESERVED9,
+	IB_QPT_RESERVED10,
 };
 
 enum ib_qp_create_flags {
@@ -766,6 +780,19 @@ enum ib_wr_opcode {
 	IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
 	IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
 	IB_WR_BIND_MW,
+	/* reserve values for low level drivers' internal use.
+	 * These values will not be used at all in the ib core layer.
+	 */
+	IB_WR_RESERVED1 = 0xf0,
+	IB_WR_RESERVED2,
+	IB_WR_RESERVED3,
+	IB_WR_RESERVED4,
+	IB_WR_RESERVED5,
+	IB_WR_RESERVED6,
+	IB_WR_RESERVED7,
+	IB_WR_RESERVED8,
+	IB_WR_RESERVED9,
+	IB_WR_RESERVED10,
 };
 
 enum ib_send_flags {
@@ -773,7 +800,11 @@ enum ib_send_flags {
 	IB_SEND_SIGNALED	= (1<<1),
 	IB_SEND_SOLICITED	= (1<<2),
 	IB_SEND_INLINE		= (1<<3),
-	IB_SEND_IP_CSUM		= (1<<4)
+	IB_SEND_IP_CSUM		= (1<<4),
+
+	/* reserve bits 26-31 for low level drivers' internal use */
+	IB_SEND_RESERVED_START	= (1 << 26),
+	IB_SEND_RESERVED_END	= (1 << 31),
 };
 
 struct ib_sge {
--
1.7.1
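
The commit message says low-level drivers will define macros on top of
the reserved values. A rough standalone sketch of that idea follows; it
uses cut-down, hypothetical SKETCH_* copies of the enums rather than the
real ib_verbs.h, and the return values of sketch_sq_overhead() are
arbitrary placeholders, not anything taken from the mlx5 driver.

```c
#include <stdbool.h>

/* Cut-down stand-in for enum ib_qp_type; numeric values follow the patch. */
enum sketch_ib_qp_type {
    SKETCH_IB_QPT_XRC_TGT = 10,
    SKETCH_IB_QPT_MAX,
    SKETCH_IB_QPT_RESERVED1 = 0x1000,
    SKETCH_IB_QPT_RESERVED2,
};

/* Cut-down stand-in for enum ib_send_flags. */
enum sketch_ib_send_flags {
    SKETCH_IB_SEND_IP_CSUM        = 1 << 4,
    SKETCH_IB_SEND_RESERVED_START = 1 << 26,
};

/* Driver-private names layered on the reserved values (hypothetical names). */
#define SKETCH_DRV_QPT_REG_UMR    SKETCH_IB_QPT_RESERVED1
#define SKETCH_DRV_SEND_UMR_UNREG SKETCH_IB_SEND_RESERVED_START

/* Core code that only iterates below the MAX marker never sees reserved types. */
static bool sketch_core_visible_qp_type(int t)
{
    return t >= 0 && t < SKETCH_IB_QPT_MAX;
}

/*
 * Because the private type is now a real member of the enum, a switch over
 * the enum can name it without the "case value not in enumerated type"
 * diagnostic. The numbers returned are placeholders.
 */
static int sketch_sq_overhead(enum sketch_ib_qp_type t)
{
    switch (t) {
    case SKETCH_IB_QPT_XRC_TGT:
        return 0;
    case SKETCH_DRV_QPT_REG_UMR:
        return 1;
    default:
        return -1;
    }
}
```

The design choice here is that the core header owns the numeric space
while each driver owns only the names, so two drivers can reuse the same
reserved slot for unrelated private meanings.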
[PATCH V2 5/9] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/infiniband/hw/mlx5/ah.c | 95 drivers/infiniband/hw/mlx5/cq.c | 844 + drivers/infiniband/hw/mlx5/doorbell.c | 100 drivers/infiniband/hw/mlx5/mad.c | 139 ++ 4 files changed, 1178 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/ah.c create mode 100644 drivers/infiniband/hw/mlx5/cq.c create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c create mode 100644 drivers/infiniband/hw/mlx5/mad.c diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c new file mode 100644 index 000..ff8f1cb --- /dev/null +++ b/drivers/infiniband/hw/mlx5/ah.c @@ -0,0 +1,95 @@ +/* + * Copyright (c) 2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include mlx5_ib.h + +struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr, + struct mlx5_ib_ah *ah) +{ + u32 sgi; + + if (ah_attr-ah_flags IB_AH_GRH) { + sgi = ah_attr-grh.sgid_index 20; + + memcpy(ah-av.rgid, ah_attr-grh.dgid, 16); + ah-av.grh_gid_fl = cpu_to_be32(ah_attr-grh.flow_label | + (1 30) | sgi); + ah-av.hop_limit = ah_attr-grh.hop_limit; + ah-av.tclass = ah_attr-grh.traffic_class; + } + + ah-av.rlid = cpu_to_be16(ah_attr-dlid); + ah-av.fl_mlid = ah_attr-src_path_bits 0x7f; + ah-av.stat_rate_sl = (ah_attr-static_rate 4) | (ah_attr-sl 0xf); + + return ah-ibah; +} + +struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr) +{ + struct mlx5_ib_ah *ah; + + ah = kzalloc(sizeof(*ah), GFP_ATOMIC); + if (!ah) + return ERR_PTR(-ENOMEM); + + return create_ib_ah(ah_attr, ah); /* never fails */ +} + +int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr) +{ + struct mlx5_ib_ah *ah = to_mah(ibah); + u32 tmp; + + memset(ah_attr, 0, sizeof(*ah_attr)); + + tmp = be32_to_cpu(ah-av.grh_gid_fl); + if (tmp (1 30)) { + ah_attr-ah_flags = IB_AH_GRH; + ah_attr-grh.sgid_index = (tmp 20) 0xff; + ah_attr-grh.flow_label = tmp 0xf; + memcpy(ah_attr-grh.dgid, ah-av.rgid, 16); + ah_attr-grh.hop_limit = ah-av.hop_limit; + ah_attr-grh.traffic_class = ah-av.tclass; + } + ah_attr-dlid = be16_to_cpu(ah-av.rlid); + ah_attr-static_rate = ah-av.stat_rate_sl 4; + ah_attr-sl = ah-av.stat_rate_sl 0xf; + + return 0; +} + +int mlx5_ib_destroy_ah(struct ib_ah *ah) +{ + kfree(to_mah(ah)); + return 0; +} diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c new file mode 100644 index 000..c05868e --- /dev/null +++ b/drivers/infiniband/hw/mlx5/cq.c @@ -0,0 +1,844 
@@ +/* + * Copyright (c) 2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *
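
The address-handle code in ah.c above packs several fields into single
words, but the archive has dropped the shift and mask operators from the
text. The sketch below is one reading of that packing, in host byte
order only (the cpu_to_be32()/cpu_to_be16() conversions are left out).
The 20-bit flow-label mask, the 8-bit SGID index mask, and every
sketch_* name are assumptions made for illustration, not recovered from
the posted patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_GRH_PRESENT (1u << 30)  /* bit 30 marks "GRH present" */
#define SKETCH_SGID_SHIFT  20          /* SGID index assumed to start at bit 20 */
#define SKETCH_SGID_MASK   0xffu       /* assumed 8-bit SGID index */
#define SKETCH_FLOW_MASK   0xfffffu    /* assumed 20-bit flow label */

/* Pack flow label, GRH-present bit and SGID index into one 32-bit word. */
static uint32_t sketch_pack_grh(uint32_t flow_label, uint8_t sgid_index)
{
    return (flow_label & SKETCH_FLOW_MASK) | SKETCH_GRH_PRESENT |
           ((uint32_t)sgid_index << SKETCH_SGID_SHIFT);
}

static bool sketch_has_grh(uint32_t word)
{
    return (word & SKETCH_GRH_PRESENT) != 0;
}

static uint8_t sketch_sgid_index(uint32_t word)
{
    return (uint8_t)((word >> SKETCH_SGID_SHIFT) & SKETCH_SGID_MASK);
}

static uint32_t sketch_flow_label(uint32_t word)
{
    return word & SKETCH_FLOW_MASK;
}

/* Static rate in the high nibble, service level in the low nibble. */
static uint8_t sketch_pack_rate_sl(uint8_t static_rate, uint8_t sl)
{
    return (uint8_t)((static_rate << 4) | (sl & 0xf));
}
```

Unlike the posted create_ib_ah(), this sketch masks the flow label
before packing, so an oversized label cannot spill into the SGID index
bits.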
[PATCH V2 9/9] IB/mlx5: Mellanox Connect-IB, IB driver part 5/5
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- MAINTAINERS | 10 ++ drivers/infiniband/Kconfig |1 + drivers/infiniband/Makefile |1 + drivers/infiniband/hw/mlx5/Kconfig | 10 ++ drivers/infiniband/hw/mlx5/Makefile |3 +++ 5 files changed, 25 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/Kconfig create mode 100644 drivers/infiniband/hw/mlx5/Makefile diff --git a/MAINTAINERS b/MAINTAINERS index 6e82fb5..b426536 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5377,6 +5377,16 @@ S: Supported F: drivers/net/ethernet/mellanox/mlx5/core/ F: include/linux/mlx5/ +Mellanox MLX5 IB driver +M: Eli Cohen e...@mellanox.com +L: linux-rdma@vger.kernel.org +W: http://www.mellanox.com +Q: http://patchwork.kernel.org/project/linux-rdma/list/ +T: git://openfabrics.org/~eli/connect-ib.git +S: Supported +F: include/linux/mlx5/ +F: drivers/infiniband/hw/mlx5/ + MODULE SUPPORT M: Rusty Russell ru...@rustcorp.com.au S: Maintained diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index c85b56c..5ceda71 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -50,6 +50,7 @@ source drivers/infiniband/hw/amso1100/Kconfig source drivers/infiniband/hw/cxgb3/Kconfig source drivers/infiniband/hw/cxgb4/Kconfig source drivers/infiniband/hw/mlx4/Kconfig +source drivers/infiniband/hw/mlx5/Kconfig source drivers/infiniband/hw/nes/Kconfig source drivers/infiniband/hw/ocrdma/Kconfig diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile index b126fef..1fe6988 100644 --- a/drivers/infiniband/Makefile +++ b/drivers/infiniband/Makefile @@ -7,6 +7,7 @@ obj-$(CONFIG_INFINIBAND_AMSO1100) += hw/amso1100/ obj-$(CONFIG_INFINIBAND_CXGB3) += hw/cxgb3/ obj-$(CONFIG_INFINIBAND_CXGB4) += hw/cxgb4/ obj-$(CONFIG_MLX4_INFINIBAND) += hw/mlx4/ +obj-$(CONFIG_MLX5_INFINIBAND) += hw/mlx5/ obj-$(CONFIG_INFINIBAND_NES) += hw/nes/ obj-$(CONFIG_INFINIBAND_OCRDMA)+= hw/ocrdma/ obj-$(CONFIG_INFINIBAND_IPOIB) += 
ulp/ipoib/ diff --git a/drivers/infiniband/hw/mlx5/Kconfig b/drivers/infiniband/hw/mlx5/Kconfig new file mode 100644 index 000..8e6aebf --- /dev/null +++ b/drivers/infiniband/hw/mlx5/Kconfig @@ -0,0 +1,10 @@ +config MLX5_INFINIBAND + tristate Mellanox Connect-IB HCA support + depends on NETDEVICES ETHERNET PCI X86 + select NET_VENDOR_MELLANOX + select MLX5_CORE + ---help--- + This driver provides low-level InfiniBand support for + Mellanox Connect-IB PCI Express host channel adapters (HCAs). + This is required to use InfiniBand protocols such as + IP-over-IB or SRP with these devices. diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile new file mode 100644 index 000..4ea0135 --- /dev/null +++ b/drivers/infiniband/hw/mlx5/Makefile @@ -0,0 +1,3 @@ +obj-$(CONFIG_MLX5_INFINIBAND) += mlx5_ib.o + +mlx5_ib-y := main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V2 0/9] Add Mellanox mlx5 driver for Connect-IB devices
Hi Roland, all

Here's V2 of the driver, with Dave's and Roland's comments addressed,
looking forward to see if we have OK from Roland to merge that into 3.11

Jack, Moshe and Or.

changes from V1:

- Addressed Dave Miller's comments:
  * Local variables in functions listed from longest to shortest
  * --i/++i changed to i--/i++ in all for-loops
  * Removed leading /* empty line from all comments
  * magic constants given names
  * endianness code moved to driver.h, and defined an
    endianness-dependent macro for use in assignment.
  * destroy_msg_cache() duplicated code removed

- Addressed Roland's comments:
  * Renamed foo_spl to foo_lock for spinlocks.
  * Eliminated magic number from mlx5_cmd_stats field declaration in
    struct mlx5_cmd.
  * Eliminated unused procedure mlx5_ib_umem_populate_pas()
  * Cleaned up mlx5_ib.h:
    * Added new patch for ib_verbs.h, adding reserved values to several
      enums
    * For several ib-core enums, added reserved values for use by
      low-level drivers. By defining macros at the low level (i.e.,
      renaming the reserved values, in effect), the ll drivers may use
      these enums without needing to duplicate the ib-core enums while
      adding extra values. This fixes compilation problems such as:
      /home/roland/Src/linux-merge.git/drivers/infiniband/hw/mlx5/qp.c:975:2:
      error: case value 4671 not in enumerated type enum ib_qp_type
    * Changed ib_latency_class to mlx5_ib_latency_class, visible only in
      low-level driver
    * Eliminated the unused IB_WR_xxx_PSV enums
    * Defined macros MLX5_IB_SEND_UMR_UNREG, MLX5_IB_QPT_REG_UMR, and
      MLX5_IB_WR_UMR, taking advantage of the reserved values added to
      the ib_core enums.
  * debug-mask removed from mlx5_ib
  * Regarding mlx5_core, still have a debug mask to enable printouts of
    command data and command execution times, but all file-name-based
    mask bits removed.
  * Removed forced -Wall -Werror -DDEBUG settings in the mlx5 core/ib
    makefiles

changes from V0:

- Per Dave's request, cross posting to both netdev and linux-rdma, to
  see if there are comments from netdev on the core driver.

From: Eli Cohen e...@mellanox.com

The patches that follow constitute the driver for Mellanox's 5th
generation of HCAs named Connect-IB. The driver is comprised of two
kernel modules: mlx5_ib and mlx5_core. This partitioning resembles what
we have for mlx4 with the substantial difference that mlx5_ib is the pci
device driver and not mlx5_core.

mlx5_core provides general functionality that is intended to be used by
other Mellanox devices that will be introduced in the future. In this
sense, it can be perceived as a library. mlx5_ib has a similar role as
any hardware device under drivers/infiniband/hw.

The patches are partitioned to avoid exceeding the 100KB vger.kernel.org
limitation. They are divided such that the first three ones have the
code of the mlx5_core driver, and the last five the code of the mlx5_ib
driver. Only the last patch per driver adds the Makefiles and Kconfigs,
to make things robust for future bisections.

PPC is not yet supported but support will be included in the near
future.
Eli Cohen (8):
  net/mlx5: Mellanox Connect-IB, core driver part 1/3
  net/mlx5: Mellanox Connect-IB, core driver part 2/3
  net/mlx5: Mellanox Connect-IB, core driver part 3/3
  IB/mlx5: Mellanox Connect-IB, IB driver part 1/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 2/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 3/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 4/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 5/5

Jack Morgenstein (1):
  IB/core: Add reserved values to enums for low-level drivers use

 MAINTAINERS                                       |    22 +
 drivers/infiniband/Kconfig                        |     1 +
 drivers/infiniband/Makefile                       |     1 +
 drivers/infiniband/hw/mlx5/Kconfig                |    10 +
 drivers/infiniband/hw/mlx5/Makefile               |     3 +
 drivers/infiniband/hw/mlx5/ah.c                   |    95 +
 drivers/infiniband/hw/mlx5/cq.c                   |   844 +++
 drivers/infiniband/hw/mlx5/doorbell.c             |   100 +
 drivers/infiniband/hw/mlx5/mad.c                  |   139 ++
 drivers/infiniband/hw/mlx5/main.c                 |  1504
 drivers/infiniband/hw/mlx5/mem.c                  |   162 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h              |   547 +
 drivers/infiniband/hw/mlx5/mr.c                   |  1021
 drivers/infiniband/hw/mlx5/qp.c                   |  2537
 drivers/infiniband/hw/mlx5/srq.c                  |   478
 drivers/infiniband/hw/mlx5/user.h                 |   121 +
 drivers/net/ethernet/mellanox/Kconfig             |     1 +
 drivers/net/ethernet/mellanox/Makefile            |     1 +
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |    18 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |     5 +
[PATCH V2 7/9] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/infiniband/hw/mlx5/mlx5_ib.h | 547 ++ drivers/infiniband/hw/mlx5/mr.c | 1021 ++ 2 files changed, 1568 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ib.h create mode 100644 drivers/infiniband/hw/mlx5/mr.c diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h new file mode 100644 index 000..d2067c3 --- /dev/null +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -0,0 +1,547 @@ +/* + * Copyright (c) 2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef MLX5_IB_H +#define MLX5_IB_H + +#include linux/kernel.h +#include linux/sched.h +#include rdma/ib_verbs.h +#include rdma/ib_smi.h +#include linux/mlx5/driver.h +#include linux/mlx5/cq.h +#include linux/mlx5/qp.h +#include linux/mlx5/srq.h +#include linux/types.h + +#define mlx5_ib_dbg(dev, format, arg...) \ +do { \ + pr_debug(%s:%s:%d:(pid %d): format, (dev)-ib_dev.name, \ +__func__, __LINE__, current-pid, ##arg); \ +} while (0) + +#define mlx5_ib_err(dev, format, arg...) \ +pr_err(%s:%s:%d:(pid %d): format, (dev)-ib_dev.name, __func__, \ + __LINE__, current-pid, ##arg) + +#define mlx5_ib_warn(dev, format, arg...) \ +pr_warn(%s:%s:%d:(pid %d): format, (dev)-ib_dev.name, __func__,\ + __LINE__, current-pid, ##arg) + +enum { + MLX5_IB_MMAP_CMD_SHIFT = 8, + MLX5_IB_MMAP_CMD_MASK = 0xff, +}; + +enum mlx5_ib_mmap_cmd { + MLX5_IB_MMAP_REGULAR_PAGE = 0, + MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES = 1, /* always last */ +}; + +enum { + MLX5_RES_SCAT_DATA32_CQE= 0x1, + MLX5_RES_SCAT_DATA64_CQE= 0x2, + MLX5_REQ_SCAT_DATA32_CQE= 0x11, + MLX5_REQ_SCAT_DATA64_CQE= 0x22, +}; + +enum mlx5_ib_latency_class { + MLX5_IB_LATENCY_CLASS_LOW, + MLX5_IB_LATENCY_CLASS_MEDIUM, + MLX5_IB_LATENCY_CLASS_HIGH, + MLX5_IB_LATENCY_CLASS_FAST_PATH +}; + +enum mlx5_ib_mad_ifc_flags { + MLX5_MAD_IFC_IGNORE_MKEY= 1, + MLX5_MAD_IFC_IGNORE_BKEY= 2, + MLX5_MAD_IFC_NET_VIEW = 4, +}; + +struct mlx5_ib_ucontext { + struct ib_ucontext ibucontext; + struct list_headdb_page_list; + + /* protect doorbell record alloc/free +*/ + struct mutexdb_page_mutex; + struct mlx5_uuar_info uuari; +}; + +static inline struct mlx5_ib_ucontext *to_mucontext(struct ib_ucontext *ibucontext) +{ + return container_of(ibucontext, struct mlx5_ib_ucontext, ibucontext); +} + +struct mlx5_ib_pd { + struct ib_pdibpd; + u32 pdn; + u32 pa_lkey; +}; + +/* Use macros here so that don't have to duplicate + * enum ib_send_flags and enum ib_qp_type for low-level driver + */ + +#define MLX5_IB_SEND_UMR_UNREG IB_SEND_RESERVED_START 
+#define MLX5_IB_QPT_REG_UMRIB_QPT_RESERVED1 +#define MLX5_IB_WR_UMR IB_WR_RESERVED1 + +struct wr_list { + u16 opcode; + u16 next; +}; + +struct mlx5_ib_wq { + u64*wrid; + u32*wr_data; + struct wr_list *w_list; + unsigned *wqe_head; + u16 unsig_count; + + /*
Re: rtnl_lock deadlock on 3.10
On 03/07/2013 20:22, Shawn Bohrer wrote:
> On Wed, Jul 03, 2013 at 07:33:07AM +0200, Hannes Frederic Sowa wrote:
>> On Wed, Jul 03, 2013 at 07:11:52AM +0200, Hannes Frederic Sowa wrote:
>>> On Tue, Jul 02, 2013 at 01:38:26PM +, Cong Wang wrote:
>>>> On Tue, 02 Jul 2013 at 08:28 GMT, Hannes Frederic Sowa han...@stressinduktion.org wrote:
>>>>> On Mon, Jul 01, 2013 at 09:54:56AM -0500, Shawn Bohrer wrote:
>>>>>> I've managed to hit a deadlock at boot a couple times while
>>>>>> testing the 3.10 rc kernels. It seems to always happen when my
>>>>>> network devices are initializing. This morning I updated to v3.10
>>>>>> and made a few config tweaks and so far I've hit it 4 out of 5
>>>>>> reboots. It looks like most processes are getting stuck on
>>>>>> rtnl_lock. Below is a boot log with the soft lockup prints.
>>>>>> Please let know if there is any other information I can provide:
>>>>>
>>>>> Could you try a build with CONFIG_LOCKDEP enabled?
>>>>
>>>> The problem is clear: ib_register_device() is called with
>>>> rtnl_lock, but itself needs device_mutex, however,
>>>> ib_register_client() first acquires device_mutex, then indirectly
>>>> calls register_netdev() which takes rtnl_lock. Deadlock!
>>>>
>>>> One possible fix is always taking rtnl_lock before taking
>>>> device_mutex, something like below:
>>>>
>>>> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
>>>> index 18c1ece..890870b 100644
>>>> --- a/drivers/infiniband/core/device.c
>>>> +++ b/drivers/infiniband/core/device.c
>>>> @@ -381,6 +381,7 @@ int ib_register_client(struct ib_client *client)
>>>>  {
>>>>  	struct ib_device *device;
>>>>
>>>> +	rtnl_lock();
>>>>  	mutex_lock(&device_mutex);
>>>>
>>>>  	list_add_tail(&client->list, &client_list);
>>>> @@ -389,6 +390,7 @@ int ib_register_client(struct ib_client *client)
>>>>  			client->add(device);
>>>>
>>>>  	mutex_unlock(&device_mutex);
>>>> +	rtnl_unlock();
>>>>
>>>>  	return 0;
>>>>  }
>>>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
>>>> index b6e049a..5a7a048 100644
>>>> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
>>>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
>>>> @@ -1609,7 +1609,7 @@ static struct net_device *ipoib_add_port(const char *format,
>>>>  		goto event_failed;
>>>>  	}
>>>>
>>>> -	result = register_netdev(priv->dev);
>>>> +	result = register_netdevice(priv->dev);
>>>>  	if (result) {
>>>>  		printk(KERN_WARNING "%s: couldn't register ipoib port %d; error %d\n",
>>>>  		       hca->name, port, result);
>>>
>>> Looks good to me. Shawn, could you test this patch?
>>
>> ib_unregister_device/ib_unregister_client would need the same change,
>> too. I have not checked the other ->add() and ->remove() functions.
>> Also cc'ed linux-rdma@vger.kernel.org, Roland Dreier.
>
> Cong's patch is missing the #include <linux/rtnetlink.h> but otherwise
> I've had 34 successful reboots with no deadlocks which is a good sign.
> It sounds like there are more paths that need to be audited and a
> proper patch submitted. I can do more testing later if needed.
>
> Thanks,
> Shawn

Guys,

I was a bit busy today looking into that, but I don't think we want the
IB core layer (core/device.c) to use rtnl locking which is something
that belongs to the network stack.

Or.
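
The lock-order inversion Cong describes can be modelled with a tiny
order tracker, loosely in the spirit of what lockdep reports. This is a
userspace sketch with hypothetical sketch_* names; it takes no real
locks and is not how the kernel detects the problem. It only records
which lock was held when another was taken and flags the first
contradiction.

```c
#include <stdbool.h>

/* Hypothetical identifiers standing in for rtnl_lock and device_mutex. */
enum sketch_lock {
    SKETCH_RTNL = 0,
    SKETCH_DEVICE_MUTEX = 1,
    SKETCH_NLOCKS
};

/* seen[a][b] is true once some path has taken b while holding a. */
struct sketch_order {
    bool seen[SKETCH_NLOCKS][SKETCH_NLOCKS];
};

/*
 * Record "outer held, then inner taken". Returns false when the reverse
 * order was recorded earlier, i.e. an ABBA inversion is possible.
 */
static bool sketch_record(struct sketch_order *o,
                          enum sketch_lock outer, enum sketch_lock inner)
{
    o->seen[outer][inner] = true;
    return !o->seen[inner][outer];
}
```

Feeding it the two paths from the report (rtnl then device_mutex via
ib_register_device(), device_mutex then rtnl via ib_register_client()
calling register_netdev()) makes the second call return false. With a
single agreed order on every path, it never does.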
[PATCH V3 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
From: Hadar Hen Zion had...@mellanox.com Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to support flow steering for user space applications. Signed-off-by: Hadar Hen Zion had...@mellanox.com Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/core/uverbs.h |3 + drivers/infiniband/core/uverbs_cmd.c | 199 + drivers/infiniband/core/uverbs_main.c | 13 ++- include/rdma/ib_verbs.h |1 + include/uapi/rdma/ib_user_verbs.h | 88 ++- 5 files changed, 302 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h index 0fcd7aa..ad9d102 100644 --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -155,6 +155,7 @@ extern struct idr ib_uverbs_cq_idr; extern struct idr ib_uverbs_qp_idr; extern struct idr ib_uverbs_srq_idr; extern struct idr ib_uverbs_xrcd_idr; +extern struct idr ib_uverbs_rule_idr; void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj); @@ -215,5 +216,7 @@ IB_UVERBS_DECLARE_CMD(destroy_srq); IB_UVERBS_DECLARE_CMD(create_xsrq); IB_UVERBS_DECLARE_CMD(open_xrcd); IB_UVERBS_DECLARE_CMD(close_xrcd); +IB_UVERBS_DECLARE_CMD(create_flow); +IB_UVERBS_DECLARE_CMD(destroy_flow); #endif /* UVERBS_H */ diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index a7d00f6..bfc53f7 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -54,6 +54,7 @@ static struct uverbs_lock_class qp_lock_class = { .name = QP-uobj }; static struct uverbs_lock_class ah_lock_class = { .name = AH-uobj }; static struct uverbs_lock_class srq_lock_class = { .name = SRQ-uobj }; static struct uverbs_lock_class xrcd_lock_class = { .name = XRCD-uobj }; +static struct uverbs_lock_class rule_lock_class = { .name = RULE-uobj }; #define INIT_UDATA(udata, ibuf, obuf, ilen, olen) \ do {\ @@ -330,6 +331,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file, INIT_LIST_HEAD(ucontext-srq_list); 
INIT_LIST_HEAD(ucontext-ah_list); INIT_LIST_HEAD(ucontext-xrcd_list); + INIT_LIST_HEAD(ucontext-rule_list); ucontext-closing = 0; resp.num_comp_vectors = file-device-num_comp_vectors; @@ -2587,6 +2589,203 @@ out_put: return ret ? ret : in_len; } +static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec, + struct _ib_flow_spec *ib_spec) +{ + ib_spec-type = kern_spec-type; + + switch (ib_spec-type) { + case IB_FLOW_SPEC_ETH: + ib_spec-eth.size = sizeof(struct ib_flow_spec_eth); + memcpy(ib_spec-eth.val, kern_spec-eth.val, + sizeof(struct ib_flow_eth_filter)); + memcpy(ib_spec-eth.mask, kern_spec-eth.mask, + sizeof(struct ib_flow_eth_filter)); + break; + case IB_FLOW_SPEC_IPV4: + ib_spec-ipv4.size = sizeof(struct ib_flow_spec_ipv4); + memcpy(ib_spec-ipv4.val, kern_spec-ipv4.val, + sizeof(struct ib_flow_ipv4_filter)); + memcpy(ib_spec-ipv4.mask, kern_spec-ipv4.mask, + sizeof(struct ib_flow_ipv4_filter)); + break; + case IB_FLOW_SPEC_TCP: + case IB_FLOW_SPEC_UDP: + ib_spec-tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp); + memcpy(ib_spec-tcp_udp.val, kern_spec-tcp_udp.val, + sizeof(struct ib_flow_tcp_udp_filter)); + memcpy(ib_spec-tcp_udp.mask, kern_spec-tcp_udp.mask, + sizeof(struct ib_flow_tcp_udp_filter)); + break; + default: + return -EINVAL; + } + return 0; +} + +ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file, + const char __user *buf, int in_len, + int out_len) +{ + struct ib_uverbs_create_flow cmd; + struct ib_uverbs_create_flow_resp resp; + struct ib_uobject *uobj; + struct ib_flow*flow_id; + struct ib_kern_flow_attr *kern_flow_attr; + struct ib_flow_attr *flow_attr; + struct ib_qp *qp; + int err = 0; + void *kern_spec; + void *ib_spec; + int i; + + if (out_len sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(cmd, buf, sizeof(cmd))) + return -EFAULT; + + if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER +!capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW)) + return -EPERM; + + if (cmd.flow_attr.num_of_specs) { + kern_flow_attr = 
kmalloc(cmd.flow_attr.size, GFP_KERNEL); + if (!kern_flow_attr
[PATCH V3 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs
From: Igor Ivanov <igor.iva...@itseez.com>

Add infra-structure to support extended uverbs capabilities in a
forward/backward compatible manner. Uverbs command opcodes which are
based on the verbs extensions approach should be greater than or equal
to IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are
processed a bit differently.

Signed-off-by: Igor Ivanov <igor.iva...@itseez.com>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
---
 drivers/infiniband/core/uverbs_main.c |   29 ++++++++++++++++++++++++-----
 include/uapi/rdma/ib_user_verbs.h     |   10 ++++++++++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD	50
 
 enum {
 	IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
 	__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+	__u16 provider_in_words;
+	__u16 provider_out_words;
+	__u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
 	__u64 response;
 	__u64 driver_data[0];
-- 
1.7.1
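The length check that this patch applies to extended commands can be illustrated with a small user-space sketch. This is not part of the patch: the struct below only mirrors struct ib_uverbs_cmd_hdr_ex with <stdint.h> types, and the helper names ex_in_bytes() and ex_len_ok() are hypothetical, introduced purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of struct ib_uverbs_cmd_hdr_ex from the patch. */
struct cmd_hdr_ex {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t cmd_hdr_reserved;
};

/* Same value as IB_USER_VERBS_CMD_THRESHOLD in the patch. */
#define CMD_THRESHOLD 50

/* Hypothetical helper: input length in bytes, counted in 4-byte words
 * and summed over the core and provider parts, as ib_uverbs_write()
 * does for extended commands.
 */
static size_t ex_in_bytes(const struct cmd_hdr_ex *h)
{
	return ((size_t)h->in_words + h->provider_in_words) * 4;
}

/* Hypothetical helper: returns 1 if a write() of 'count' bytes would
 * pass the extended-header length check, 0 otherwise. Commands below
 * the threshold take the legacy header path, which is not modelled.
 */
static int ex_len_ok(const struct cmd_hdr_ex *h, size_t count)
{
	if (h->command < CMD_THRESHOLD)
		return 0;
	return ex_in_bytes(h) == count;
}
```

With in_words = 6 and provider_in_words = 2, the caller has to pass exactly 32 bytes to write(); any other count is rejected, which corresponds to the -EINVAL return in the patch.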
[PATCH V3 for-next 0/4] Add receive Flow Steering support
Hi Roland, all

V3 addresses the comments made by Sean. There are still some
concerns/questions posed by Roland on the uverbs extensions element of
the series. I have posted replies to them, but so far no further
comments were made.

V3 changes:
 - Addressed comments from Sean:
   - modified the change-log of patch #1 to be clearer on the priority
     and domain semantics and usage
   - re-arranged the fields of struct ib_flow_attr
   - removed check from ib_destroy_flow
   - removed the IB flow spec which wasn't in line with the L2/L3/L4
     approach done for Ethernet/IP/TCP|UDP, will use proper IB flow
     specs when adding the support for IPoIB flow steering

V2 changes:
 - dropped struct ib_kern_flow from patch #3, this structure wasn't
   used and was left there by mistake (bug, thanks Roland)
 - removed the void *flow_context field from struct ib_flow, this was
   pointing to driver private data for that flow, but doesn't belong
   here, i.e. need not be seen by the verbs consumer but rather hidden.
 - renamed struct mlx4_flow_handle to mlx4_ib_flow, a structure that
   contains the verbs level struct ib_flow and the mlx4 registration
   ID for that flow

V1 changes:
 - dropped the five pre-patches which were accepted into 3.10
 - rebased the patches against Roland's for-next / 3.10-rc4
 - in patch #3, ib_uverbs_destroy_flow was returning too quickly when
   the driver returned failure for ib_destroy_flow, need to free some
   uverbs resources 1st.
 - in patch #4, check index before accessing the array at
   mlx4_ib_create/destroy_flow

These patches add Flow Steering support to the kernel IB core, to
uverbs and to the mlx4 IB (verbs) driver along with one patch to uverbs
which adds some code to support extensions.

  IB/core: Add receive Flow Steering support
  IB/core: Infra-structure to support verbs extensions through uverbs
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

The main patch which introduces the Flow-Steering API is "IB/core: Add
receive Flow Steering support", see its change log. Looking at the
"Network Adapter Flow Steering" slides that Tzahi Oved presented at
the annual OFA 2012 meeting could be helpful:

https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/518-network-adapter-flow-steering.html

Or.

Hadar Hen Zion (3):
  IB/core: Add receive Flow Steering support
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

Igor Ivanov (1):
  IB/core: Infra-structure to support verbs extensions through uverbs

 drivers/infiniband/core/uverbs.h      |    3 +
 drivers/infiniband/core/uverbs_cmd.c  |  199
 drivers/infiniband/core/uverbs_main.c |   42 +-
 drivers/infiniband/core/verbs.c       |   27
 drivers/infiniband/hw/mlx4/main.c     |  235 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h  |   12 ++
 include/linux/mlx4/device.h           |    5 -
 include/rdma/ib_verbs.h               |  122 +-
 include/uapi/rdma/ib_user_verbs.h     |   98 ++-
 9 files changed, 729 insertions(+), 14 deletions(-)
[PATCH V3 for-next 4/4] IB/mlx4: Add receive Flow Steering support
From: Hadar Hen Zion had...@mellanox.com Implement ib_create_flow and ib_destroy_flow. Translate the verbs structures provided by the user to HW structures and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands. On the ATTACH command completion, the firmware provides 64 bit registration ID which is placed into struct mlx4_ib_flow that wraps the instance of struct ib_flow which is retuned to caller. Later, this reg ID is used for detaching that flow from the firmware. Signed-off-by: Hadar Hen Zion had...@mellanox.com Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/hw/mlx4/main.c| 235 ++ drivers/infiniband/hw/mlx4/mlx4_ib.h | 12 ++ include/linux/mlx4/device.h |5 - 3 files changed, 247 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index a188d31..5b5518f 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -54,6 +54,8 @@ #define DRV_VERSION1.0 #define DRV_RELDATEApril 4, 2008 +#define MLX4_IB_FLOW_MAX_PRIO 0xFFF + MODULE_AUTHOR(Roland Dreier); MODULE_DESCRIPTION(Mellanox ConnectX HCA InfiniBand driver); MODULE_LICENSE(Dual BSD/GPL); @@ -88,6 +90,25 @@ static void init_query_mad(struct ib_smp *mad) static union ib_gid zgid; +static int check_flow_steering_support(struct mlx4_dev *dev) +{ + int ib_num_ports = 0; + int i; + + mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB) + ib_num_ports++; + + if (dev-caps.steering_mode == MLX4_STEERING_MODE_DEVICE_MANAGED) { + if (ib_num_ports || mlx4_is_mfunc(dev)) { + pr_warn(Device managed flow steering is unavailable + for IB ports or in multifunction env.\n); + return 0; + } + return 1; + } + return 0; +} + static int mlx4_ib_query_device(struct ib_device *ibdev, struct ib_device_attr *props) { @@ -144,6 +165,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, props-device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2B; else props-device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2A; + if 
(check_flow_steering_support(dev-dev)) + props-device_cap_flags |= IB_DEVICE_MANAGED_FLOW_STEERING; } props-vendor_id = be32_to_cpup((__be32 *) (out_mad-data + 36)) @@ -798,6 +821,209 @@ struct mlx4_ib_steering { union ib_gid gid; }; +static int parse_flow_attr(struct mlx4_dev *dev, + struct _ib_flow_spec *ib_spec, + struct _rule_hw *mlx4_spec) +{ + enum mlx4_net_trans_rule_id type; + + switch (ib_spec-type) { + case IB_FLOW_SPEC_ETH: + type = MLX4_NET_TRANS_RULE_ID_ETH; + memcpy(mlx4_spec-eth.dst_mac, ib_spec-eth.val.dst_mac, + ETH_ALEN); + memcpy(mlx4_spec-eth.dst_mac_msk, ib_spec-eth.mask.dst_mac, + ETH_ALEN); + mlx4_spec-eth.vlan_tag = ib_spec-eth.val.vlan_tag; + mlx4_spec-eth.vlan_tag_msk = ib_spec-eth.mask.vlan_tag; + break; + + case IB_FLOW_SPEC_IPV4: + type = MLX4_NET_TRANS_RULE_ID_IPV4; + mlx4_spec-ipv4.src_ip = ib_spec-ipv4.val.src_ip; + mlx4_spec-ipv4.src_ip_msk = ib_spec-ipv4.mask.src_ip; + mlx4_spec-ipv4.dst_ip = ib_spec-ipv4.val.dst_ip; + mlx4_spec-ipv4.dst_ip_msk = ib_spec-ipv4.mask.dst_ip; + break; + + case IB_FLOW_SPEC_TCP: + case IB_FLOW_SPEC_UDP: + type = ib_spec-type == IB_FLOW_SPEC_TCP ? + MLX4_NET_TRANS_RULE_ID_TCP : + MLX4_NET_TRANS_RULE_ID_UDP; + mlx4_spec-tcp_udp.dst_port = ib_spec-tcp_udp.val.dst_port; + mlx4_spec-tcp_udp.dst_port_msk = ib_spec-tcp_udp.mask.dst_port; + mlx4_spec-tcp_udp.src_port = ib_spec-tcp_udp.val.src_port; + mlx4_spec-tcp_udp.src_port_msk = ib_spec-tcp_udp.mask.src_port; + break; + + default: + return -EINVAL; + } + if (mlx4_map_sw_to_hw_steering_id(dev, type) 0 || + mlx4_hw_rule_sz(dev, type) 0) + return -EINVAL; + mlx4_spec-id = cpu_to_be16(mlx4_map_sw_to_hw_steering_id(dev, type)); + mlx4_spec-size = mlx4_hw_rule_sz(dev, type) 2; + return mlx4_hw_rule_sz(dev, type); +} + +static int __mlx4_ib_create_flow(struct ib_qp *qp, struct ib_flow_attr *flow_attr, + int domain, + enum mlx4_net_trans_promisc_mode flow_type, + u64 *reg_id) +{ + int ret, i; + int size = 0; + void *ib_flow; + struct
[PATCH V3 for-next 1/4] IB/core: Add receive Flow Steering support
From: Hadar Hen Zion <had...@mellanox.com>

The RDMA stack allows for applications to create IB_QPT_RAW_PACKET QPs,
for which plain Ethernet packets are used, specifically packets which
don't carry any QPN to be matched by the receiving side. Applications
using these QPs must be provided with a method to program some steering
rule with the HW so packets arriving at the local port can be routed to
them.

This patch adds ib_create_flow, which allows providing a flow
specification for a QP, such that when there's a match between the
specification and the received packet, it can be forwarded to that QP,
in a similar manner to how one needs to use ib_attach_multicast for IB
UD multicast handling.

Flow specifications are provided as instances of struct ib_flow_spec_yyy
which describe L2, L3 and L4 headers. Currently specs for Ethernet,
IPv4, TCP and UDP are defined. Flow specs are made of values and masks.

The input to ib_create_flow is an instance of struct ib_flow_attr which
contains a few mandatory control elements and optional flow specs.

	struct ib_flow_attr {
		enum ib_flow_attr_type type;
		u16      size;
		u16      priority;
		u32      flags;
		u8       num_of_specs;
		u8       port;
		/* Following are the optional layers according to user request
		 * struct ib_flow_spec_yyy
		 * struct ib_flow_spec_zzz
		 */
	};

As these specs are eventually coming from user space, they are defined
and used in a way which allows adding new spec types without kernel/user
ABI change, and with a little API enhancement which defines the newly
added spec.

The flow spec structures are defined in a TLV (Type-Length-Value)
manner, which allows calling ib_create_flow with a variable-length list
of optional specs.

For the actual processing of ib_flow_attr the driver uses the number of
specs and the size mandatory fields along with the TLV nature of the
specs.

Steering rules processing order is according to the domain over which
the rule is set and the rule priority. All rules set by user space
applications fall into the IB_FLOW_DOMAIN_USER domain, other domains
could be used by future IPoIB RFS and Ethtool flow-steering interface
implementation. Lower priority numerical value means higher priority.

The returned value from ib_create_flow is an instance of struct ib_flow
which contains a database pointer (handle) provided by the HW driver to
be used when calling ib_destroy_flow.

Applications that offload TCP/IP traffic could be written also over IB
UD QPs. As such, the ib_create_flow / ib_destroy_flow API is designed to
support UD QPs too. The HW driver sets IB_DEVICE_MANAGED_FLOW_STEERING
to denote support of flow steering.

The ib_flow_attr enum type relates to usage of flow steering for
promiscuous and sniffer purposes:

	IB_FLOW_ATTR_NORMAL - regular rule, steering according to rule
	specification

	IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule,
	receive all Ethernet traffic which isn't steered to any QP

	IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but
	only for multicast

	IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

ALL_DEFAULT and MC_DEFAULT rule options are valid only for the Ethernet
link type.
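The TLV layout described in this change log can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the kernel code: struct spec_hdr is a simplified stand-in for the common type and size prefix of each flow spec (its field widths are assumed), and walk_specs() is a hypothetical helper that shows how the number of specs and the per-spec size are enough to step through a variable-length list.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the prefix shared by every flow spec.
 * Field widths are an assumption made for this sketch.
 */
struct spec_hdr {
	uint32_t type;
	uint16_t size;	/* size of the whole spec in bytes, header included */
	uint16_t pad;
};

/* Hypothetical helper: walk 'num_specs' TLV records packed back to
 * back in 'buf' ('len' bytes long). Stores the type of each record in
 * types[] and returns the number of bytes consumed, or -1 if a record
 * is malformed or would overrun the buffer.
 */
static long walk_specs(const unsigned char *buf, size_t len,
		       unsigned int num_specs, uint32_t *types)
{
	size_t off = 0;
	unsigned int i;

	for (i = 0; i < num_specs; i++) {
		struct spec_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		/* copy out the header to avoid unaligned access */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		types[i] = hdr.type;
		/* the size field tells us where the next spec starts */
		off += hdr.size;
	}
	return (long)off;
}
```

Because the walker never interprets the value part of a record, a new spec type only needs a new type constant and a structure definition, which is the extensibility property the change log refers to.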
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
---
 drivers/infiniband/core/verbs.c |   27 +
 include/rdma/ib_verbs.h         |  121 ++-
 2 files changed, 146 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..87a8102 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1254,3 +1254,30 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+struct ib_flow *ib_create_flow(struct ib_qp *qp,
+			       struct ib_flow_attr *flow_attr,
+			       int domain)
+{
+	struct ib_flow *flow_id;
+	if (!qp->device->create_flow)
+		return ERR_PTR(-ENOSYS);
+
+	flow_id = qp->device->create_flow(qp, flow_attr, domain);
+	if (!IS_ERR(flow_id))
+		atomic_inc(&qp->usecnt);
+	return flow_id;
+}
+EXPORT_SYMBOL(ib_create_flow);
+
+int ib_destroy_flow(struct ib_flow *flow_id)
+{
+	int err;
+	struct ib_qp *qp = flow_id->qp;
+
+	err = qp->device->destroy_flow(flow_id);
+	if (!err)
+		atomic_dec(&qp->usecnt);
+	return err;
+}
+EXPORT_SYMBOL(ib_destroy_flow);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..1390a0f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,8 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<21),
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
-	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24
Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices
On Wed, Jul 3, 2013 at 10:26 PM, Roland Dreier <rol...@kernel.org> wrote:
> On Wed, Jul 3, 2013 at 9:41 AM, Or Gerlitz <ogerl...@mellanox.com> wrote:
>> Jack looked on this comment/code and he says that the active flag is
>> used to prevent re-scheduling the timer from inside the timer
>> handling routine. In the kernel, the comment header in the source
>> file for del_timer_sync explicitly states that re-scheduling the
>> timer must be prevented, or the sync is useless: "Callers must
>> prevent restarting of the timer, otherwise this function is
>> meaningless". So we believe that code should remain.
>
> Look at the actual timer code. del_timer_sync() won't work if
> something unrelated re-adds the timer, but it will work if the timer
> itself is what re-adds itself.
[...]

OK, we will re-look into that tomorrow.

So how V2 looks?

Or.
[PATCH V3 4/9] IB/core: Add reserved values to enums for low-level drivers use
From: Jack Morgenstein <ja...@dev.mellanox.co.il>

Continue the approach taken by commit d2b57063e4a ("IB/core: Reserve
bits in enum ib_qp_create_flags for low-level driver use") and add
reserved entries to the ib_qp_type and ib_wr_opcode enums.

The low-level drivers will then define macros to use these reserved
values, giving proper names to the macros for readability.

Also add a range of reserved flags to enum ib_send_flags.

The mlx5 IB driver uses the new additions.

Signed-off-by: Jack Morgenstein <ja...@dev.mellanox.co.il>
---
 include/rdma/ib_verbs.h |   35 +++++++++++++++++++++++++++++++++--
 1 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..645c3ce 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -610,7 +610,21 @@ enum ib_qp_type {
 	IB_QPT_RAW_PACKET = 8,
 	IB_QPT_XRC_INI = 9,
 	IB_QPT_XRC_TGT,
-	IB_QPT_MAX
+	IB_QPT_MAX,
+	/* Reserve a range for qp types internal to the low level driver.
+	 * These qp types will not be visible at the IB core layer, so the
+	 * IB_QPT_MAX usages should not be affected in the core layer
+	 */
+	IB_QPT_RESERVED1 = 0x1000,
+	IB_QPT_RESERVED2,
+	IB_QPT_RESERVED3,
+	IB_QPT_RESERVED4,
+	IB_QPT_RESERVED5,
+	IB_QPT_RESERVED6,
+	IB_QPT_RESERVED7,
+	IB_QPT_RESERVED8,
+	IB_QPT_RESERVED9,
+	IB_QPT_RESERVED10,
 };
 
 enum ib_qp_create_flags {
@@ -766,6 +780,19 @@ enum ib_wr_opcode {
 	IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
 	IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
 	IB_WR_BIND_MW,
+	/* reserve values for low level drivers' internal use.
+	 * These values will not be used at all in the ib core layer.
+	 */
+	IB_WR_RESERVED1 = 0xf0,
+	IB_WR_RESERVED2,
+	IB_WR_RESERVED3,
+	IB_WR_RESERVED4,
+	IB_WR_RESERVED5,
+	IB_WR_RESERVED6,
+	IB_WR_RESERVED7,
+	IB_WR_RESERVED8,
+	IB_WR_RESERVED9,
+	IB_WR_RESERVED10,
 };
 
 enum ib_send_flags {
@@ -773,7 +800,11 @@ enum ib_send_flags {
 	IB_SEND_SIGNALED	= (1<<1),
 	IB_SEND_SOLICITED	= (1<<2),
 	IB_SEND_INLINE		= (1<<3),
-	IB_SEND_IP_CSUM		= (1<<4)
+	IB_SEND_IP_CSUM		= (1<<4),
+
+	/* reserve bits 26-31 for low level drivers' internal use */
+	IB_SEND_RESERVED_START	= (1 << 26),
+	IB_SEND_RESERVED_END	= (1 << 31),
 };
 
 struct ib_sge {
-- 
1.7.1
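A rough user-space sketch of how a low-level driver might consume these reserved values follows. It is an illustration only: the constants are copied from the patch as unsigned macros rather than enum members, the MYDRV_ names are hypothetical, and the two helpers are not part of any kernel API.

```c
#include <stdint.h>

/* Core-layer values, copied from the patch as unsigned constants. */
#define SEND_RESERVED_START	(1u << 26)
#define SEND_RESERVED_END	(1u << 31)
#define QPT_RESERVED1		0x1000
#define WR_RESERVED1		0xf0

/* Driver-private aliases, in the style the change log describes.
 * The MYDRV_ prefix is hypothetical.
 */
#define MYDRV_SEND_UMR_UNREG	SEND_RESERVED_START
#define MYDRV_QPT_REG_UMR	QPT_RESERVED1
#define MYDRV_WR_UMR		WR_RESERVED1

/* Mask covering the reserved send-flag bits 26..31 inclusive. */
static uint32_t reserved_send_mask(void)
{
	return ~(SEND_RESERVED_START - 1u);
}

/* Returns 1 if any driver-private send flag is set in 'flags'. */
static int has_driver_send_flags(uint32_t flags)
{
	return (flags & reserved_send_mask()) != 0;
}
```

Aliasing the reserved values lets a driver use its private QP types and opcodes in switch statements over the core enums without the compiler complaining about case values outside the enumerated type.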
[PATCH V3 0/9] Add Mellanox mlx5 driver for Connect-IB devices
Hi Roland,

Here's V3 of the mlx5 driver, with Dave's, Joe's and your comments
addressed. Hoping that would be all for getting this into 3.11.

Jack, Moshe and Or.

changes from V2:
 - Addressed feedback from Joe Perches:
   * Added parentheses around sizeof
   * Removed unnecessary do-while for driver pr_debug envelope
     (done for mlx5_core.h as well)
   * Removed unneeded log output on memory allocation failures
   * Fixed some typos
   * Used snprintf instead of strcpy/strcat (safer and shorter)
   * Removed unnecessary local variable sgi from ib_create_ah()
   * Reduced vzalloc usage by trying to do kzalloc first and vzalloc
     only if kzalloc fails
 - Addressed Roland's feedback:
   * Removed unneeded active flag from health polling -- no need for
     active flag for re-scheduling from within timer handler when
     using del_timer_sync.
 - Also removed some calls to mlx5_ib_dbg() which had newline char
   only, and therefore only did execution tracing.

changes from V1:
 - Addressed Dave Miller's comments:
   * Local variables in functions listed from longest to shortest
   * --i/++i changed to i--/i++ in all for-loops
   * Removed leading /* empty line from all comments
   * magic constants given names
   * endianness code moved to driver.h, and defined an
     endianness-dependent macro for use in assignment.
   * destroy_msg_cache() duplicated code removed
 - Addressed Roland's comments:
   * Renamed foo_spl to foo_lock for spinlocks.
   * Eliminated magic number from mlx5_cmd_stats field declaration in
     struct mlx5_cmd.
   * Eliminated unused procedure mlx5_ib_umem_populate_pas()
   * Cleaned up mlx5_ib.h:
   * Added new patch for ib_verbs.h, adding reserved values to several
     enums
   * For several ib-core enums, added reserved values for use by
     low-level drivers. By defining macros at the low level (i.e.,
     renaming the reserved values, in effect), the ll drivers may use
     these enums without needing to duplicate the ib-core enums while
     adding extra values. This fixes compilation problems such as:
       /home/roland/Src/linux-merge.git/drivers/infiniband/hw/mlx5/qp.c:975:2:
       error: case value 4671 not in enumerated type enum ib_qp_type
   * Changed ib_latency_class to mlx5_ib_latency_class, visible only in
     low-level driver
   * Eliminated the unused IB_WR_xxx_PSV enums
   * Defined macros MLX5_IB_SEND_UMR_UNREG, MLX5_IB_QPT_REG_UMR, and
     MLX5_IB_WR_UMR, taking advantage of the reserved values added to
     the ib_core enums.
   * debug-mask removed from mlx5_ib
   * Regarding mlx5_core, still have a debug mask to enable printouts
     of command data and command execution times, but all
     file-name-based mask bits removed.
   * Removed forced -Wall -Werror -DDEBUG settings in the mlx5 core/ib
     makefiles

changes from V0:
 - Per Dave's request, cross posting to both netdev and linux-rdma, to
   see if there are comments from netdev on the core driver.

The patches that follow constitute the driver for Mellanox's 5th
generation of HCAs named Connect-IB. The driver comprises two kernel
modules: mlx5_ib and mlx5_core. This partitioning resembles what we
have for mlx4 with the substantial difference that mlx5_ib is the pci
device driver and not mlx5_core.

mlx5_core provides general functionality that is intended to be used by
other Mellanox devices that will be introduced in the future. In this
sense, it can be perceived as a library. mlx5_ib has a similar role as
any hardware device under drivers/infiniband/hw.

The patches are partitioned to avoid exceeding the 100KB
vger.kernel.org limitation. They are divided such that the first three
have the code of the mlx5_core driver, and the last five the code of
the mlx5_ib driver. Only the last patch per driver adds the Makefiles
and Kconfigs, to make things robust for future bisections.

PPC is not yet supported but support will be included in the near
future.
Eli Cohen (8): net/mlx5: Mellanox Connect-IB, core driver part 1/3 net/mlx5: Mellanox Connect-IB, core driver part 2/3 net/mlx5: Mellanox Connect-IB, core driver part 3/3 IB/mlx5: Mellanox Connect-IB, IB driver part 1/5 IB/mlx5: Mellanox Connect-IB, IB driver part 2/5 IB/mlx5: Mellanox Connect-IB, IB driver part 3/5 IB/mlx5: Mellanox Connect-IB, IB driver part 4/5 IB/mlx5: Mellanox Connect-IB, IB driver part 5/5 Jack Morgenstein (1): IB/core: Add reserved values to enums for low-level drivers use MAINTAINERS| 22 + drivers/infiniband/Kconfig |1 + drivers/infiniband/Makefile|1 + drivers/infiniband/hw/mlx5/Kconfig | 10 + drivers/infiniband/hw/mlx5/Makefile|3 + drivers/infiniband/hw/mlx5/ah.c| 92 + drivers/infiniband/hw/mlx5/cq.c| 843 +++ drivers/infiniband/hw/mlx5/doorbell.c | 100 + drivers/infiniband/hw/mlx5/mad.c | 139 ++
[PATCH V3 7/9] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/hw/mlx5/mlx5_ib.h | 545 ++ drivers/infiniband/hw/mlx5/mr.c | 1014 ++ 2 files changed, 1559 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ib.h create mode 100644 drivers/infiniband/hw/mlx5/mr.c diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h new file mode 100644 index 000..836be91 --- /dev/null +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -0,0 +1,545 @@ +/* + * Copyright (c) 2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef MLX5_IB_H +#define MLX5_IB_H + +#include linux/kernel.h +#include linux/sched.h +#include rdma/ib_verbs.h +#include rdma/ib_smi.h +#include linux/mlx5/driver.h +#include linux/mlx5/cq.h +#include linux/mlx5/qp.h +#include linux/mlx5/srq.h +#include linux/types.h + +#define mlx5_ib_dbg(dev, format, arg...) \ +pr_debug(%s:%s:%d:(pid %d): format, (dev)-ib_dev.name, __func__, \ +__LINE__, current-pid, ##arg) + +#define mlx5_ib_err(dev, format, arg...) \ +pr_err(%s:%s:%d:(pid %d): format, (dev)-ib_dev.name, __func__, \ + __LINE__, current-pid, ##arg) + +#define mlx5_ib_warn(dev, format, arg...) \ +pr_warn(%s:%s:%d:(pid %d): format, (dev)-ib_dev.name, __func__,\ + __LINE__, current-pid, ##arg) + +enum { + MLX5_IB_MMAP_CMD_SHIFT = 8, + MLX5_IB_MMAP_CMD_MASK = 0xff, +}; + +enum mlx5_ib_mmap_cmd { + MLX5_IB_MMAP_REGULAR_PAGE = 0, + MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES = 1, /* always last */ +}; + +enum { + MLX5_RES_SCAT_DATA32_CQE= 0x1, + MLX5_RES_SCAT_DATA64_CQE= 0x2, + MLX5_REQ_SCAT_DATA32_CQE= 0x11, + MLX5_REQ_SCAT_DATA64_CQE= 0x22, +}; + +enum mlx5_ib_latency_class { + MLX5_IB_LATENCY_CLASS_LOW, + MLX5_IB_LATENCY_CLASS_MEDIUM, + MLX5_IB_LATENCY_CLASS_HIGH, + MLX5_IB_LATENCY_CLASS_FAST_PATH +}; + +enum mlx5_ib_mad_ifc_flags { + MLX5_MAD_IFC_IGNORE_MKEY= 1, + MLX5_MAD_IFC_IGNORE_BKEY= 2, + MLX5_MAD_IFC_NET_VIEW = 4, +}; + +struct mlx5_ib_ucontext { + struct ib_ucontext ibucontext; + struct list_headdb_page_list; + + /* protect doorbell record alloc/free +*/ + struct mutexdb_page_mutex; + struct mlx5_uuar_info uuari; +}; + +static inline struct mlx5_ib_ucontext *to_mucontext(struct ib_ucontext *ibucontext) +{ + return container_of(ibucontext, struct mlx5_ib_ucontext, ibucontext); +} + 
+struct mlx5_ib_pd { + struct ib_pdibpd; + u32 pdn; + u32 pa_lkey; +}; + +/* Use macros here so that don't have to duplicate + * enum ib_send_flags and enum ib_qp_type for low-level driver + */ + +#define MLX5_IB_SEND_UMR_UNREG IB_SEND_RESERVED_START +#define MLX5_IB_QPT_REG_UMRIB_QPT_RESERVED1 +#define MLX5_IB_WR_UMR IB_WR_RESERVED1 + +struct wr_list { + u16 opcode; + u16 next; +}; + +struct mlx5_ib_wq { + u64*wrid; + u32*wr_data; + struct wr_list *w_list; + unsigned *wqe_head; + u16 unsig_count; + + /* serialize post
[PATCH V3 5/9] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/hw/mlx5/ah.c | 92 drivers/infiniband/hw/mlx5/cq.c | 843 + drivers/infiniband/hw/mlx5/doorbell.c | 100 drivers/infiniband/hw/mlx5/mad.c | 139 ++ 4 files changed, 1174 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/mlx5/ah.c create mode 100644 drivers/infiniband/hw/mlx5/cq.c create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c create mode 100644 drivers/infiniband/hw/mlx5/mad.c diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c new file mode 100644 index 000..39ab0ca --- /dev/null +++ b/drivers/infiniband/hw/mlx5/ah.c @@ -0,0 +1,92 @@ +/* + * Copyright (c) 2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include mlx5_ib.h + +struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr, + struct mlx5_ib_ah *ah) +{ + if (ah_attr-ah_flags IB_AH_GRH) { + memcpy(ah-av.rgid, ah_attr-grh.dgid, 16); + ah-av.grh_gid_fl = cpu_to_be32(ah_attr-grh.flow_label | + (1 30) | + ah_attr-grh.sgid_index 20); + ah-av.hop_limit = ah_attr-grh.hop_limit; + ah-av.tclass = ah_attr-grh.traffic_class; + } + + ah-av.rlid = cpu_to_be16(ah_attr-dlid); + ah-av.fl_mlid = ah_attr-src_path_bits 0x7f; + ah-av.stat_rate_sl = (ah_attr-static_rate 4) | (ah_attr-sl 0xf); + + return ah-ibah; +} + +struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr) +{ + struct mlx5_ib_ah *ah; + + ah = kzalloc(sizeof(*ah), GFP_ATOMIC); + if (!ah) + return ERR_PTR(-ENOMEM); + + return create_ib_ah(ah_attr, ah); /* never fails */ +} + +int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr) +{ + struct mlx5_ib_ah *ah = to_mah(ibah); + u32 tmp; + + memset(ah_attr, 0, sizeof(*ah_attr)); + + tmp = be32_to_cpu(ah-av.grh_gid_fl); + if (tmp (1 30)) { + ah_attr-ah_flags = IB_AH_GRH; + ah_attr-grh.sgid_index = (tmp 20) 0xff; + ah_attr-grh.flow_label = tmp 0xf; + memcpy(ah_attr-grh.dgid, ah-av.rgid, 16); + ah_attr-grh.hop_limit = ah-av.hop_limit; + ah_attr-grh.traffic_class = ah-av.tclass; + } + ah_attr-dlid = be16_to_cpu(ah-av.rlid); + ah_attr-static_rate = ah-av.stat_rate_sl 4; + ah_attr-sl = ah-av.stat_rate_sl 0xf; + + return 0; +} + +int mlx5_ib_destroy_ah(struct ib_ah *ah) +{ + kfree(to_mah(ah)); + return 0; +} diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c new file mode 100644 index 000..344ab03 --- /dev/null +++ b/drivers/infiniband/hw/mlx5/cq.c @@ -0,0 +1,843 @@ +/* + * Copyright (c) 
2013, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following
[PATCH V3 9/9] IB/mlx5: Mellanox Connect-IB, IB driver part 5/5
From: Eli Cohen <e...@mellanox.com>

Signed-off-by: Eli Cohen <e...@mellanox.com>
Signed-off-by: Jack Morgenstein <ja...@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
---
 MAINTAINERS                         |   10 ++++++++++
 drivers/infiniband/Kconfig          |    1 +
 drivers/infiniband/Makefile         |    1 +
 drivers/infiniband/hw/mlx5/Kconfig  |   10 ++++++++++
 drivers/infiniband/hw/mlx5/Makefile |    3 +++
 5 files changed, 25 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/Kconfig
 create mode 100644 drivers/infiniband/hw/mlx5/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 6e82fb5..b426536 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5377,6 +5377,16 @@ S:	Supported
 F:	drivers/net/ethernet/mellanox/mlx5/core/
 F:	include/linux/mlx5/
 
+Mellanox MLX5 IB driver
+M:	Eli Cohen <e...@mellanox.com>
+L:	linux-rdma@vger.kernel.org
+W:	http://www.mellanox.com
+Q:	http://patchwork.kernel.org/project/linux-rdma/list/
+T:	git://openfabrics.org/~eli/connect-ib.git
+S:	Supported
+F:	include/linux/mlx5/
+F:	drivers/infiniband/hw/mlx5/
+
 MODULE SUPPORT
 M:	Rusty Russell <ru...@rustcorp.com.au>
 S:	Maintained
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index c85b56c..5ceda71 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -50,6 +50,7 @@ source "drivers/infiniband/hw/amso1100/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
 source "drivers/infiniband/hw/cxgb4/Kconfig"
 source "drivers/infiniband/hw/mlx4/Kconfig"
+source "drivers/infiniband/hw/mlx5/Kconfig"
 source "drivers/infiniband/hw/nes/Kconfig"
 source "drivers/infiniband/hw/ocrdma/Kconfig"
 
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index b126fef..1fe6988 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_INFINIBAND_AMSO1100)	+= hw/amso1100/
 obj-$(CONFIG_INFINIBAND_CXGB3)		+= hw/cxgb3/
 obj-$(CONFIG_INFINIBAND_CXGB4)		+= hw/cxgb4/
 obj-$(CONFIG_MLX4_INFINIBAND)		+= hw/mlx4/
+obj-$(CONFIG_MLX5_INFINIBAND)		+= hw/mlx5/
 obj-$(CONFIG_INFINIBAND_NES)		+= hw/nes/
 obj-$(CONFIG_INFINIBAND_OCRDMA)		+= hw/ocrdma/
 obj-$(CONFIG_INFINIBAND_IPOIB)		+= ulp/ipoib/
diff --git a/drivers/infiniband/hw/mlx5/Kconfig b/drivers/infiniband/hw/mlx5/Kconfig
new file mode 100644
index 0000000..8e6aebf
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Kconfig
@@ -0,0 +1,10 @@
+config MLX5_INFINIBAND
+	tristate "Mellanox Connect-IB HCA support"
+	depends on NETDEVICES && ETHERNET && PCI && X86
+	select NET_VENDOR_MELLANOX
+	select MLX5_CORE
+	---help---
+	  This driver provides low-level InfiniBand support for
+	  Mellanox Connect-IB PCI Express host channel adapters (HCAs).
+	  This is required to use InfiniBand protocols such as
+	  IP-over-IB or SRP with these devices.
diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile
new file mode 100644
index 0000000..4ea0135
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_MLX5_INFINIBAND)	+= mlx5_ib.o
+
+mlx5_ib-y :=	main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o
-- 
1.7.1
[PATCH V3 6/9] IB/mlx5: Mellanox Connect-IB, IB driver part 2/5
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c | 1504 +
 drivers/infiniband/hw/mlx5/mem.c  |  162
 2 files changed, 1666 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/main.c
 create mode 100644 drivers/infiniband/hw/mlx5/mem.c

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
new file mode 100644
index 000..6b1007f
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -0,0 +1,1504 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <asm-generic/kmap_types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+#include <linux/dma-mapping.h>
+#include <linux/slab.h>
+#include <linux/io-mapping.h>
+#include <linux/sched.h>
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_smi.h>
+#include <rdma/ib_umem.h>
+#include "user.h"
+#include "mlx5_ib.h"
+
+#define DRIVER_NAME "mlx5_ib"
+#define DRIVER_VERSION "1.0"
+#define DRIVER_RELDATE	"June 2013"
+
+MODULE_AUTHOR("Eli Cohen <e...@mellanox.com>");
+MODULE_DESCRIPTION("Mellanox Connect-IB HCA IB driver");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION(DRIVER_VERSION);
+
+static int prof_sel = 2;
+module_param_named(prof_sel, prof_sel, int, 0444);
+MODULE_PARM_DESC(prof_sel, "profile selector. Valid range 0 - 2");
+
+static char mlx5_version[] =
+	DRIVER_NAME ": Mellanox Connect-IB Infiniband driver v"
+	DRIVER_VERSION " (" DRIVER_RELDATE ")\n";
+
+struct mlx5_profile profile[] = {
+	[0] = {
+		.mask		= 0,
+	},
+	[1] = {
+		.mask		= MLX5_PROF_MASK_QP_SIZE,
+		.log_max_qp	= 12,
+	},
+	[2] = {
+		.mask		= MLX5_PROF_MASK_QP_SIZE |
+				  MLX5_PROF_MASK_MR_CACHE,
+		.log_max_qp	= 17,
+		.mr_cache[0]	= {
+			.size	= 500,
+			.limit	= 250
+		},
+		.mr_cache[1]	= {
+			.size	= 500,
+			.limit	= 250
+		},
+		.mr_cache[2]	= {
+			.size	= 500,
+			.limit	= 250
+		},
+		.mr_cache[3]	= {
+			.size	= 500,
+			.limit	= 250
+		},
+		.mr_cache[4]	= {
+			.size	= 500,
+			.limit	= 250
+		},
+		.mr_cache[5]	= {
+			.size	= 500,
+			.limit	= 250
+		},
+		.mr_cache[6]	= {
+			.size	= 500,
+			.limit	= 250
+		},
+		.mr_cache[7]	= {
+			.size	= 500,
+			.limit	= 250
+		},
+		.mr_cache[8]	= {
+			.size	= 500,
+			.limit	= 250
+		},
+		.mr_cache[9]	= {
+			.size	= 500,
+			.limit	= 250
+		},
+		.mr_cache[10]	= {
+			.size	= 500,
+			.limit	= 250
+		},
+		.mr_cache[11]	= {
+			.size	= 500,
+			.limit
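
[The archived copy of this patch is cut off inside the profile table.] For readers following the prof_sel parameter, here is a rough user-space sketch of how a selector with "Valid range 0 - 2" might index such a table. It is not taken from the driver: the flag values, MR_CACHE_ENTRIES, DEFAULT_PROF, select_profile() and profile_max_qps() are all hypothetical, and the fallback to the default profile is an assumption, since the visible part of the patch does not show how prof_sel is validated. Only the field names and the numbers (log_max_qp 12 and 17, size 500, limit 250) come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: flag values and array length are assumptions. */
#define PROF_MASK_QP_SIZE	(1u << 0)
#define PROF_MASK_MR_CACHE	(1u << 1)
#define MR_CACHE_ENTRIES	16
#define DEFAULT_PROF		2	/* mirrors "static int prof_sel = 2" */

struct cache_ent {
	int size;
	int limit;
};

struct profile {
	unsigned int		mask;
	int			log_max_qp;
	struct cache_ent	mr_cache[MR_CACHE_ENTRIES];
};

/* Abbreviated copy of the table in the patch (only two cache entries). */
static const struct profile profiles[] = {
	[0] = { .mask = 0 },
	[1] = { .mask = PROF_MASK_QP_SIZE, .log_max_qp = 12 },
	[2] = {
		.mask		= PROF_MASK_QP_SIZE | PROF_MASK_MR_CACHE,
		.log_max_qp	= 17,
		.mr_cache[0]	= { .size = 500, .limit = 250 },
		.mr_cache[1]	= { .size = 500, .limit = 250 },
	},
};

#define NUM_PROFILES (sizeof(profiles) / sizeof(profiles[0]))

/* Assumed policy: an out-of-range selector falls back to the default. */
static const struct profile *select_profile(int sel)
{
	if (sel < 0 || (size_t)sel >= NUM_PROFILES)
		sel = DEFAULT_PROF;
	return &profiles[sel];
}

/* QPs requested by a profile, or 0 if it does not set the QP size. */
static unsigned long profile_max_qps(const struct profile *p)
{
	assert(p != NULL);
	if (!(p->mask & PROF_MASK_QP_SIZE))
		return 0;
	return 1UL << p->log_max_qp;
}
```

With this sketch, select_profile(1) yields the 2^12 QP profile, select_profile(2) the 2^17 one, and any value outside 0 - 2 falls back to the default rather than reading past the end of the array.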
wrong email address in mlx5 patch signature
Hi Roland,

There's a typo in Jack's email address in V3 9/9, which is our mistake. Please fix it to be

Jack Morgenstein ja...@dev.mellanox.co.il

(the error is a missing "l" in "mellanox").

Thanks,
Or.