Re: CMA handler status code

2010-07-27 Thread Or Gerlitz
Eldad Zinger wrote:
 event.status = ib_event->param.sidr_rep_rcvd.status
 event.status = ib_event->param.rej_rcvd.reason
 event.status should be 0 for success, or a negative generic error code.
 In that code, the error code is positive and does not comply with the
 generic error code convention.

Basically, I believe the fact that the status equals the reject reason
for the rdma-cm reject event is known to the kernel developers who deal
with the rdma-cm. Personally, I'm fine with it; we could document that,
but currently there's no rdma-cm document under
Documentation/infiniband which could hold this.

For user space, I would add a comment in the man pages

 In order to make the status field available for other modules (like
 SDP), that field should be format-consistent.

With SDP being out of tree for about four to six years (and counting),
it's somehow hard to take into account claims related to it.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CMA handler status code

2010-07-27 Thread Or Gerlitz
 For user space, I would add a comment in the man pages

[PATCH] librdmacm/man: document status field semantics for rejected event

Document that the status field carries the IB reject reason for the
RDMA_CM_EVENT_REJECTED event.

Signed-off-by: Or Gerlitz ogerl...@voltaire.com

diff --git a/man/rdma_get_cm_event.3 b/man/rdma_get_cm_event.3
index 79bf606..91317c4 100644
--- a/man/rdma_get_cm_event.3
+++ b/man/rdma_get_cm_event.3
@@ -126,7 +126,8 @@ Generated on the active side to notify the user that the remote server is
 not reachable or unable to respond to a connection request.
 .IP RDMA_CM_EVENT_REJECTED
 Indicates that a connection request or response was rejected by the remote
-end point.
+end point. Under InfiniBand, the event status field contains the reject reason
+as provided by the IB CM.
 .IP RDMA_CM_EVENT_ESTABLISHED
 Indicates that a connection has been established with the remote end point.
 .IP RDMA_CM_EVENT_DISCONNECTED


RE: CMA handler status code

2010-07-27 Thread Hefty, Sean
 drivers/infiniband/core/cma.c : 2204
 drivers/infiniband/core/cma.c : 976
 
 event.status = ib_event->param.sidr_rep_rcvd.status
 event.status = ib_event->param.rej_rcvd.reason

The original intent was to expose the transport specific status values to the 
user, rather than trying to map them.

- Sean



[PATCH] ib/mlx4: add IB_CQ_REPORT_MISSED_EVENTS support

2010-07-27 Thread Or Gerlitz
enhance the cq arming code to support IB_CQ_REPORT_MISSED_EVENTS

Signed-off-by: Or Gerlitz ogerl...@voltaire.com



I noted that the IB_CQ_REPORT_MISSED_EVENTS flag was added in the same
cycle as mlx4, and maybe because of this, mlx4 didn't implement the flag,
which is used by IPoIB.

The patch is compile-tested only; if it seems okay, I can conduct
further testing.

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 5a219a2..4366811 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -755,6 +755,13 @@ int mlx4_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
 		    to_mdev(ibcq->device)->uar_map,
 		    MLX4_GET_DOORBELL_LOCK(&to_mdev(ibcq->device)->uar_lock));
 
+	if (flags & IB_CQ_REPORT_MISSED_EVENTS) {
+		struct mlx4_cqe *cqe;
+		cqe = next_cqe_sw(to_mcq(ibcq));
+		if (cqe)
+			return 1;
+	}
+
 	return 0;
 }
 


Re: About a shortcoming of the verbs API

2010-07-27 Thread Bart Van Assche
On Mon, Jul 26, 2010 at 9:22 PM, Roland Dreier rdre...@cisco.com wrote:
 [ ... ]

 Another approach is to just always run the completion processing for a
 given CQ on a single CPU and avoid locking entirely.  If you want more
 CPUs to spread the work, just use multiple CQs and multiple event vectors.

In the applications I'm familiar with, InfiniBand is being used not
only because of its low latency but also because of its high
throughput. In order to handle such loads efficiently, interrupts have
to be spread over multiple CPUs.

Switching from a single receive queue to multiple receive queues is an
interesting alternative, but is not possible without changing the
communication protocol between client and server. Changing the
communication protocol is not always possible, especially when the
communication protocol has been defined by a standards organization.

   see e.g. VipCQNotify() in the Virtual Interface Architecture
   Specification.

 I don't know of an efficient way to implement this type of atomic
 dequeue completion or enable completions with any existing hardware.
 Do you have an idea how this could be done?

I am not an expert with regard to HCA programming, but I assume the
above should refer to reprogrammable firmware instead of hardware?

Bart.


Re: [PATCH] ib/mlx4: add IB_CQ_REPORT_MISSED_EVENTS support

2010-07-27 Thread Bart Van Assche
Can I conclude from this that the polling loop (2) from
http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04907.html
won't trigger a race on a multiprocessor when using mlx4 hardware?

Bart.

On Tue, Jul 27, 2010 at 11:14 AM, Eli Cohen e...@dev.mellanox.co.il wrote:

 I don't think this patch is required for mlx4.

   < 0    means an error occurred while requesting notification
  == 0    means notification was requested successfully, and if
          IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
          events were missed and it is safe to wait for another
          event.
   > 0    is only returned if IB_CQ_REPORT_MISSED_EVENTS was
          passed in.  It means that the consumer must poll the
          CQ again to make sure it is empty to avoid the race
          described above.

 Returning 1 means that you must poll the CQ to avoid a race condition,
 which is not true for mlx4. For example, if you always return 0 then you
 don't violate what the changelog says.

 On Tue, Jul 27, 2010 at 11:21 AM, Or Gerlitz ogerl...@voltaire.com wrote:
  enhance the cq arming code to support IB_CQ_REPORT_MISSED_EVENTS
 


Re: [PATCH] ib/mlx4: add IB_CQ_REPORT_MISSED_EVENTS support

2010-07-27 Thread Or Gerlitz
Eli Cohen wrote:
 returning 1 means that you must poll the CQ to avoid a race condition,
 which is not true for mlx4.

makes sense, thanks for clarifying that.

Or.


Re: About a shortcoming of the verbs API

2010-07-27 Thread Roland Dreier
  In the applications I'm familiar with InfiniBand is being used not
  only because of its low latency but also because of its high
  throughput.

Yes, I seem to recall hearing that people care about throughput as well.

  In order to handle such loads efficiently, interrupts have to be
  spread over multiple CPUs.

Let's look at what you say you want here:

 - strict in-order processing of completions
 - work spread across multiple CPUs

Do you see that the two goals are contradictory?  If you are running
work on multiple CPUs in parallel, then there can't be an order assumed
between CPUs -- otherwise you serialize the processing and lose all the
benefit of parallelism.

  Switching from a single receive queue to multiple receive queues is an
  interesting alternative, but is not possible without changing the
  communication protocol between client and server. Changing the
  communication protocol is not always possible, especially when the
  communication protocol has been defined by a standards organization.

If you only have a single client talking to a single server over a
single connection, then yes the opportunities for parallelism are
limited.

By the way, looking at VipCQNotify further, I'm not sure I follow
exactly the race you're worried about.  If you're willing to do your
processing from the completion notification callback (which seems to be
the approach that VipCQNotify forces), then doesn't the following (from
Documentation/infiniband/core_locking.txt):

  The low-level driver is responsible for ensuring that multiple
  completion event handlers for the same CQ are not called
  simultaneously.  The driver must guarantee that only one CQ event
  handler for a given CQ is running at a time.  In other words, the
  following situation is not allowed:

        CPU1                                    CPU2

  low-level driver ->
    consumer CQ event callback:
      /* ... */
      ib_req_notify_cq(cq, ...);
                                        low-level driver ->
      /* ... */                           consumer CQ event callback:
                                            /* ... */
      return from CQ event handler

mean that the problem you are complaining about doesn't actually exist?

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: About a shortcoming of the verbs API

2010-07-27 Thread Bart Van Assche
On Tue, Jul 27, 2010 at 6:50 PM, Roland Dreier rdre...@cisco.com wrote:
 [ ... ]

 From Documentation/infiniband/core_locking.txt:

  The low-level driver is responsible for ensuring that multiple
  completion event handlers for the same CQ are not called
  simultaneously.  The driver must guarantee that only one CQ event
  handler for a given CQ is running at a time.  In other words, the
  following situation is not allowed:

        CPU1                                    CPU2

  low-level driver ->
    consumer CQ event callback:
      /* ... */
      ib_req_notify_cq(cq, ...);
                                        low-level driver ->
      /* ... */                           consumer CQ event callback:
                                            /* ... */
      return from CQ event handler

 mean that the problem you are complaining about doesn't actually exist?

As far as I know it is not possible for an HCA to tell whether or not a
CPU has finished executing the interrupt it triggered. So it is not
possible for the HCA to implement the above requirement by delaying
the generation of a new interrupt -- implementing the above
requirement is only possible in the low-level driver. A low-level
driver could e.g. postpone notification reenabling until the end of
the interrupt handler, or it could use a spinlock to prevent
simultaneous execution of notification handlers. I have inspected the
source code of one particular low-level driver but could not find any
such provisions. Did I overlook something?

Bart.


Re: About a shortcoming of the verbs API

2010-07-27 Thread Jason Gunthorpe
On Tue, Jul 27, 2010 at 08:03:25PM +0200, Bart Van Assche wrote:

 As far as I know it is not possible for an HCA to tell whether or not a
 CPU has finished executing the interrupt it triggered. So it is not
 possible for the HCA to implement the above requirement by delaying
 the generation of a new interrupt -- implementing the above

Linux does not allow interrupts to re-enter. Read through
kernel/irq/chip.c handle_edge_irq() to get a sense of how that is done
for MSI. It looked to me like all the CQ callbacks flowed from the
interrupt handler in mlx4?

Jason


Re: About a shortcoming of the verbs API

2010-07-27 Thread Jason Gunthorpe
On Tue, Jul 27, 2010 at 09:28:54PM +0200, Bart Van Assche wrote:

 I have two more questions:

 - Some time ago I observed that the kernel reported soft lockups
 because of spin_lock() calls inside a completion handler. These
 spinlocks were not locked in any other context than the completion
 handler itself, and the lockups disappeared after having replaced the
 spin_lock() calls by spin_lock_irqsave(). Can it be concluded from this
 observation that completion handlers are not always invoked from
 interrupt context?

I don't know. It wouldn't surprise me if there were some error paths
that called completion handlers outside an IRQ context, but as Roland
pointed out the API guarantee is that this never happens in parallel
with interrupt called cases.

 - The function handle_edge_irq() in kernel/irq/chip.c invokes the
 actual interrupt handler while the spinlock desc->lock is not
 locked.  Does that mean that a completion interrupt can get lost due
 to the

It holds desc->lock while manipulating the flags, so IRQ_PENDING will
be set by CPU 2, and CPU 1 will notice it and re-invoke the handler
once it re-locks desc->lock.

Jason


rds-tools issue starting with OFED-1.5.2-20100718-0600.tgz

2010-07-27 Thread Jim Schutt
Hi,

It looks like starting on July 18, the daily snapshot for OFED-1.5.2
changed what version of rds-tools it was based on:

diff -urN OFED-1.5.2-20100717-0600/BUILD_ID OFED-1.5.2-20100718-0600/BUILD_ID
--- OFED-1.5.2-20100717-0600/BUILD_ID   2010-07-17 07:13:29.0 -0600
+++ OFED-1.5.2-20100718-0600/BUILD_ID   2010-07-18 07:13:29.0 -0600
@@ -1,4 +1,4 @@
-OFED-1.5.2-20100717-0600:
+OFED-1.5.2-20100718-0600:
 
 compat-dapl:
 http://www.openfabrics.org/downloads/dapl/compat-dapl-1.2.18.tar.gz
@@ -77,7 +77,7 @@
 
 ofa_kernel:
 git://git.openfabrics.org/ofed_1_5/linux-2.6.git ofed_kernel_1_5
-commit 0c842405cd3d204b23125836a8749fe7cd40b566
+commit 6bcc8f2eb4f005f430ee8f1d6962ba6778d6bbd8
 
 ofed-docs:
 git://git.openfabrics.org/~tziporet/docs.git ofed_1_5
@@ -105,7 +105,7 @@
 http://www.openfabrics.org/downloads/qperf/qperf-0.4.6-0.1.gb81434e.tar.gz
 
 rds-tools:
-http://www.openfabrics.org/~vlad/ofed_1_5/rds-tools/rds-tools-1.5-1.src.rpm
+http://www.openfabrics.org/downloads/rds-tools/rds-tools-2.0.3.tar.gz
 


It also looks like, from 
  http://oss.oracle.com/git/?p=agrover/rds-tools.git;a=summary

that rds-tools now builds into two RPMs, rds-tools and rds-devel,
but the OFED build scripts don't seem to know about that change.

I'd like to learn how to write apps that use RDS, so I thought
I needed rds.h to compile against, in hopes of running against the
latest upstream kernel RDS.  But I can't seem to get it from
the 1.5.2 daily snapshot, as an rds-devel rpm isn't getting installed.

Is there somewhere else to get the appropriate rds.h from?

Thanks -- Jim


