Re: CMA handler status code
Eldad Zinger wrote:
> event.status = ib_event->param.sidr_rep_rcvd.status
> event.status = ib_event->param.rej_rcvd.reason
>
> event.status should be 0 for success, or a negative generic error code.
> In that code, the error code is positive and does not comply with the
> generic error codes.

Basically, I believe that the fact that the status equals the reject reason for the rdma-cm reject event is known to the kernel developers who deal with the rdma-cm. Personally, I'm fine with it. We could document that, but currently there's no rdma-cm document under Documentation/infiniband which could hold this. For user space, I would add a comment in the man pages.

> In order to make the status field available for other modules (like SDP),
> that field should be format-consistent.

With SDP being out of tree for about four to six years (and counting), it is somewhat hard to take claims related to it into account.

Or.
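To make the semantics concrete: a minimal sketch (mine, not from the thread) of a kernel-side CMA event handler that treats status as the raw IB CM reject reason rather than a negative errno. IB_CM_REJ_CONSUMER_DEFINED comes from rdma/ib_cm.h; the pr_warn() is purely illustrative.

	#include <rdma/rdma_cm.h>
	#include <rdma/ib_cm.h>

	static int my_cma_handler(struct rdma_cm_id *id,
				  struct rdma_cm_event *event)
	{
		switch (event->event) {
		case RDMA_CM_EVENT_REJECTED:
			/* Under IB, status carries the IB CM reject reason
			 * (e.g. IB_CM_REJ_CONSUMER_DEFINED), not a -errno, so
			 * don't feed it to code expecting generic error codes. */
			pr_warn("connect rejected, IB CM reason %d\n",
				event->status);
			break;
		default:
			break;
		}
		return 0;
	}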
Re: CMA handler status code
For user space, I would add a comment in the man pages:

[PATCH] librdmacm/man: document status field semantics for rejected event

Document status being the IB reject reason for the RDMA_CM_EVENT_REJECTED event.

Signed-off-by: Or Gerlitz ogerl...@voltaire.com

diff --git a/man/rdma_get_cm_event.3 b/man/rdma_get_cm_event.3
index 79bf606..91317c4 100644
--- a/man/rdma_get_cm_event.3
+++ b/man/rdma_get_cm_event.3
@@ -126,7 +126,8 @@ Generated on the active side to notify the user that the remote server is
 not reachable or unable to respond to a connection request.
 .IP RDMA_CM_EVENT_REJECTED
 Indicates that a connection request or response was rejected by the remote
-end point.
+end point. Under InfiniBand, the event status field contains the reject reason
+as provided by the IB CM.
 .IP RDMA_CM_EVENT_ESTABLISHED
 Indicates that a connection has been established with the remote end point.
 .IP RDMA_CM_EVENT_DISCONNECTED
RE: CMA handler status code
> drivers/infiniband/core/cma.c:2204
> drivers/infiniband/core/cma.c:976
>
> event.status = ib_event->param.sidr_rep_rcvd.status
> event.status = ib_event->param.rej_rcvd.reason

The original intent was to expose the transport-specific status values to the user, rather than trying to map them.

- Sean
[PATCH] ib/mlx4: add IB_CQ_REPORT_MISSED_EVENTS support
Enhance the CQ arming code to support IB_CQ_REPORT_MISSED_EVENTS.

Signed-off-by: Or Gerlitz ogerl...@voltaire.com

I noted that the IB_CQ_REPORT_MISSED_EVENTS flag was added in the same cycle as mlx4, and maybe because of this mlx4 didn't implement the flag, which is used by IPoIB. The patch is compile-tested only; if it seems okay, I can conduct further testing.

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 5a219a2..4366811 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -755,6 +755,13 @@ int mlx4_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
 		    to_mdev(ibcq->device)->uar_map,
 		    MLX4_GET_DOORBELL_LOCK(&to_mdev(ibcq->device)->uar_lock));
 
+	if (flags & IB_CQ_REPORT_MISSED_EVENTS) {
+		struct mlx4_cqe *cqe;
+		cqe = next_cqe_sw(to_mcq(ibcq));
+		if (cqe)
+			return 1;
+	}
+
 	return 0;
 }
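For reference, this is the consumer pattern the flag enables: a hedged sketch of the race-free drain/re-arm loop, along the lines of what IPoIB does (process_wc() is a hypothetical placeholder for the consumer's completion handling):

	static void drain_and_rearm(struct ib_cq *cq)
	{
		struct ib_wc wc;

		do {
			/* Drain everything currently in the CQ. */
			while (ib_poll_cq(cq, 1, &wc) > 0)
				process_wc(&wc);	/* hypothetical */
			/* Re-arm; a return value > 0 means a completion
			 * slipped in between the last poll and the re-arm,
			 * so poll again instead of sleeping and possibly
			 * missing an event. */
		} while (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
					  IB_CQ_REPORT_MISSED_EVENTS) > 0);
	}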
Re: About a shortcoming of the verbs API
On Mon, Jul 26, 2010 at 9:22 PM, Roland Dreier rdre...@cisco.com wrote:
[ ... ]
> Another approach is to just always run the completion processing for a
> given CQ on a single CPU and avoid locking entirely. If you want more
> CPUs to spread the work, just use multiple CQs and multiple event vectors.

In the applications I'm familiar with, InfiniBand is being used not only because of its low latency but also because of its high throughput. In order to handle such loads efficiently, interrupts have to be spread over multiple CPUs. Switching from a single receive queue to multiple receive queues is an interesting alternative, but it is not possible without changing the communication protocol between client and server. Changing the communication protocol is not always possible, especially when it has been defined by a standards organization; see e.g. VipCQNotify() in the Virtual Interface Architecture Specification.

> I don't know of an efficient way to implement this type of atomic
> dequeue completion or enable completions with any existing hardware.
> Do you have an idea how this could be done?

I am not an expert with regard to HCA programming, but I assume the above should refer to reprogrammable firmware instead of hardware?

Bart.
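As an aside, Roland's multiple CQs / multiple event vectors suggestion maps onto the comp_vector argument of ib_create_cq(). A rough sketch, assuming the device exposes enough completion vectors; my_comp_handler() and the ctx array are hypothetical:

	static int create_per_vector_cqs(struct ib_device *device,
					 struct ib_cq **cq, void **ctx,
					 int nr_cqs, int cq_size)
	{
		int i;

		for (i = 0; i < nr_cqs; i++) {
			/* comp_vector picks the MSI-X vector (and so, with
			 * suitable IRQ affinity, the CPU) servicing this CQ. */
			cq[i] = ib_create_cq(device, my_comp_handler, NULL,
					     ctx[i], cq_size,
					     i % device->num_comp_vectors);
			if (IS_ERR(cq[i]))
				return PTR_ERR(cq[i]);
		}
		return 0;
	}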
Re: [PATCH] ib/mlx4: add IB_CQ_REPORT_MISSED_EVENTS support
Can I conclude from this that the polling loop (2) from http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04907.html won't trigger a race on a multiprocessor when using mlx4 hardware?

Bart.

On Tue, Jul 27, 2010 at 11:14 AM, Eli Cohen e...@dev.mellanox.co.il wrote:
> I don't think this patch is required for mlx4. From the documentation of
> ib_req_notify_cq():
>
>    < 0 means an error occurred while requesting notification
>   == 0 means notification was requested successfully, and if
>        IB_CQ_REPORT_MISSED_EVENTS was passed in, then no events were
>        missed and it is safe to wait for another event.
>    > 0 is only returned if IB_CQ_REPORT_MISSED_EVENTS was passed in.
>        It means that the consumer must poll the CQ again to make sure
>        it is empty to avoid the race described above.
>
> Returning 1 means that you must poll the CQ to avoid a race condition,
> which is not true for mlx4. For example, if you always return 0 then you
> don't violate what the changelog says.
>
> On Tue, Jul 27, 2010 at 11:21 AM, Or Gerlitz ogerl...@voltaire.com wrote:
>> enhance the cq arming code to support IB_CQ_REPORT_MISSED_EVENTS
>> [ ... ]
Re: [PATCH] ib/mlx4: add IB_CQ_REPORT_MISSED_EVENTS support
Eli Cohen wrote:
> Returning 1 means that you must poll the CQ to avoid a race condition,
> which is not true for mlx4.

Makes sense, thanks for clarifying that.

Or.
Re: About a shortcoming of the verbs API
> In the applications I'm familiar with, InfiniBand is being used not only
> because of its low latency but also because of its high throughput.

Yes, I seem to recall hearing that people care about throughput as well.

> In order to handle such loads efficiently, interrupts have to be spread
> over multiple CPUs.

Let's look at what you say you want here:

 - strict in-order processing of completions
 - work spread across multiple CPUs

Do you see that the two goals are contradictory? If you are running work on multiple CPUs in parallel, then there can't be an order assumed between CPUs -- otherwise you serialize the processing and lose all the benefit of parallelism.

> Switching from a single receive queue to multiple receive queues is an
> interesting alternative, but is not possible without changing the
> communication protocol between client and server. Changing the
> communication protocol is not always possible, especially when the
> communication protocol has been defined by a standards organization.

If you only have a single client talking to a single server over a single connection, then yes, the opportunities for parallelism are limited.

By the way, looking at VipCQNotify() further, I'm not sure I follow exactly the race you're worried about. If you're willing to do your processing from the completion notification callback (which seems to be the approach that VipCQNotify forces), then doesn't the following (from Documentation/infiniband/core_locking.txt):

  The low-level driver is responsible for ensuring that multiple
  completion event handlers for the same CQ are not called
  simultaneously. The driver must guarantee that only one CQ event
  handler for a given CQ is running at a time. In other words, the
  following situation is not allowed:

        CPU1                                    CPU2

  low-level driver ->
    consumer CQ event callback:
      /* ... */
      ib_req_notify_cq(cq, ...);
                                        low-level driver ->
      /* ... */                           consumer CQ event callback:
                                            /* ... */
      return from CQ event handler

mean that the problem you are complaining about doesn't actually exist?

 - R.

-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html
Re: About a shortcoming of the verbs API
On Tue, Jul 27, 2010 at 6:50 PM, Roland Dreier rdre...@cisco.com wrote:
[ ... ]
> ... then doesn't the following (from Documentation/infiniband/core_locking.txt):
>
>   The low-level driver is responsible for ensuring that multiple
>   completion event handlers for the same CQ are not called
>   simultaneously. The driver must guarantee that only one CQ event
>   handler for a given CQ is running at a time. In other words, the
>   following situation is not allowed:
>
> [ ... ]
>
> mean that the problem you are complaining about doesn't actually exist?

As far as I know it is not possible for an HCA to tell whether or not a CPU has finished executing the interrupt it triggered. So it is not possible for the HCA to implement the above requirement by delaying the generation of a new interrupt -- implementing the above requirement is only possible in the low-level driver. A low-level driver could e.g. postpone notification re-enabling until the end of the interrupt handler, or it could use a spinlock to prevent simultaneous execution of notification handlers. I have inspected the source code of one particular low-level driver but could not find any such provisions. Did I overlook something?

Bart.
Re: About a shortcoming of the verbs API
On Tue, Jul 27, 2010 at 08:03:25PM +0200, Bart Van Assche wrote:
> As far as I know it is not possible for an HCA to tell whether or not a
> CPU has finished executing the interrupt it triggered. So it is not
> possible for the HCA to implement the above requirement by delaying the
> generation of a new interrupt -- implementing the above

Linux does not allow interrupts to re-enter. Read through handle_edge_irq() in kernel/irq/chip.c to get a sense of how that is done for MSI. It looked to me like all the CQ callbacks flowed from the interrupt handler in mlx4?

Jason
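To make the pointer concrete, the replay logic in handle_edge_irq() looks roughly like this; a simplified paraphrase, not verbatim kernel code (masking and disabled-IRQ handling omitted):

	/* An edge arriving while the handler runs is latched in
	 * IRQ_PENDING and replayed, so it cannot be lost. */
	raw_spin_lock(&desc->lock);
	if (desc->status & IRQ_INPROGRESS) {
		/* CPU 2 lands here while CPU 1 runs the handler. */
		desc->status |= IRQ_PENDING;
		goto out_unlock;
	}
	desc->status |= IRQ_INPROGRESS;
	do {
		desc->status &= ~IRQ_PENDING;
		raw_spin_unlock(&desc->lock);
		handle_IRQ_event(irq, desc->action);	/* runs unlocked */
		raw_spin_lock(&desc->lock);
		/* Replay any edge latched while the handler ran. */
	} while (desc->status & IRQ_PENDING);
	desc->status &= ~IRQ_INPROGRESS;
out_unlock:
	raw_spin_unlock(&desc->lock);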
Re: About a shortcoming of the verbs API
On Tue, Jul 27, 2010 at 09:28:54PM +0200, Bart Van Assche wrote:
> I have two more questions:
>
> - Some time ago I observed that the kernel reported soft lockups because
>   of spin_lock() calls inside a completion handler. These spinlocks were
>   not locked in any other context than the completion handler itself, and
>   the lockups disappeared after replacing the spin_lock() calls with
>   spin_lock_irqsave(). Can it be concluded from this observation that
>   completion handlers are not always invoked from interrupt context?

I don't know. It wouldn't surprise me if there were some error paths that called completion handlers outside an IRQ context, but as Roland pointed out, the API guarantee is that this never happens in parallel with the interrupt-called cases.

> - The function handle_edge_irq() in kernel/irq/chip.c invokes the actual
>   interrupt handler while the spinlock desc->lock is not locked. Does that
>   mean that a completion interrupt can get lost due to the

It holds desc->lock while manipulating the flags, so IRQ_PENDING will be set by CPU 2, and CPU 1 will notice and re-invoke the handler once it re-locks desc->lock.

Jason
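On the first question, the defensive pattern is to write the handler so it is correct in either context. A sketch with hypothetical names (struct my_ctx, process_wc()):

	static void my_comp_handler(struct ib_cq *cq, void *cq_context)
	{
		struct my_ctx *ctx = cq_context;  /* hypothetical state */
		struct ib_wc wc;
		unsigned long flags;

		/* spin_lock_irqsave() is safe whether this runs in hard-IRQ
		 * context or from some non-IRQ path; plain spin_lock() can
		 * deadlock if the lock is ever taken with interrupts enabled
		 * and the handler then fires on the same CPU. */
		spin_lock_irqsave(&ctx->lock, flags);
		while (ib_poll_cq(cq, 1, &wc) > 0)
			process_wc(ctx, &wc);	/* hypothetical */
		spin_unlock_irqrestore(&ctx->lock, flags);
	}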
rds-tools issue starting with OFED-1.5.2-20100718-0600.tgz
Hi,

It looks like starting on July 18, the daily snapshot for OFED-1.5.2 changed which version of rds-tools it was based on:

diff -urN OFED-1.5.2-20100717-0600/BUILD_ID OFED-1.5.2-20100718-0600/BUILD_ID
--- OFED-1.5.2-20100717-0600/BUILD_ID	2010-07-17 07:13:29.0 -0600
+++ OFED-1.5.2-20100718-0600/BUILD_ID	2010-07-18 07:13:29.0 -0600
@@ -1,4 +1,4 @@
-OFED-1.5.2-20100717-0600:
+OFED-1.5.2-20100718-0600:
 
 compat-dapl:
 http://www.openfabrics.org/downloads/dapl/compat-dapl-1.2.18.tar.gz
@@ -77,7 +77,7 @@
 ofa_kernel:
 git://git.openfabrics.org/ofed_1_5/linux-2.6.git ofed_kernel_1_5
-commit 0c842405cd3d204b23125836a8749fe7cd40b566
+commit 6bcc8f2eb4f005f430ee8f1d6962ba6778d6bbd8
 
 ofed-docs:
 git://git.openfabrics.org/~tziporet/docs.git ofed_1_5
@@ -105,7 +105,7 @@
 http://www.openfabrics.org/downloads/qperf/qperf-0.4.6-0.1.gb81434e.tar.gz
 
 rds-tools:
-http://www.openfabrics.org/~vlad/ofed_1_5/rds-tools/rds-tools-1.5-1.src.rpm
+http://www.openfabrics.org/downloads/rds-tools/rds-tools-2.0.3.tar.gz

It also looks like, from http://oss.oracle.com/git/?p=agrover/rds-tools.git;a=summary, that rds-tools now builds into two RPMs, rds-tools and rds-devel, but the OFED build scripts don't seem to know about that change.

I'd like to learn how to write apps that use RDS, so I thought I needed rds.h to compile against, in hopes of running against the latest upstream kernel RDS. But I can't seem to get it from the 1.5.2 daily snapshot, as an rds-devel rpm isn't getting installed. Is there somewhere else to get the appropriate rds.h from?

Thanks -- Jim