Re: [PATCH] rdma/ib_cm: check LAP state before sending an MRA

2010-07-28 Thread Roland Dreier
thanks, applied.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rdma/ib_cm: check LAP state before sending an MRA

2010-07-22 Thread Arthur Kepner
On Wed, Jul 21, 2010 at 04:36:52PM -0700, Hefty, Sean wrote:
> ...
> Josh or Arthur, can either of you confirm if this patch fixes the 
> crashes that you've seen?
> 

I can't. It's been practically impossible for us to reproduce.
(Only our customer seems to have the magic recipe.) 

-- 
Arthur


[PATCH] rdma/ib_cm: check LAP state before sending an MRA

2010-07-21 Thread Hefty, Sean
This problem was originally reported by Arthur Kepner:

We have a customer who has repeatedly had system panics with 
the following signature:

Unable to handle kernel NULL pointer dereference at 0010 RIP:
{:ib_cm:ib_cm_init_qp_attr+580}
PGD 3a2db6067 PUD 0
Oops:  [1] SMP
last sysfs file: /class/infiniband/mlx4_0/node_guid
CPU 4
Modules linked in: i2c_dev sg sd_mod crc32c libcrc32c iscsi_tcp libiscsi
scsi_transport_iscsi rdma_ucm rdma_cm
iw_cm ib_addr ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad iw_cxgb3 cxgb3
firmware_class mlx4_ib ib_mthca ib_mad
 ib_core loop numatools xpmem worm mlx4_core libata i2c_i801 scsi_mod i2c_core
shpchp pci_hotplug nfs lockd nfs
_acl af_packet sunrpc e1000
Pid: 3256, comm: star Tainted: G U 2.6.16.60-0.34-smp #1
RIP: 0010:[]
{:ib_cm:ib_cm_init_qp_attr+580}
RSP: 0018:810369d09d38  EFLAGS: 00010046
RAX:  RBX: 810419678c00 RCX: 0008
RDX: 0246 RSI: 810419678d18 RDI: 810369d09e70
RBP: 810369d09e18 R08: 0003003d R09: 
R10: 810369d09e18 R11: 0088 R12: 810369d09d88
R13:  R14: 810419678c80 R15: 403500b0
FS:  40354940(0063) GS:810420ffbbc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0010 CR3: 00039f0c4000 CR4: 06e0
Process star (pid: 3256, threadinfo 810369d08000, task 8103b81b5830)
Stack: 810419678a00 810369d09d88 810369d09e18 810369d09e18
   40143430 882fb6d5 810376261540 81040bea4740
   810376261540 88309285
Call Trace: {:rdma_cm:rdma_init_qp_attr+209}
   {:rdma_ucm:ucma_init_qp_attr+160}
   {thread_return+0}
{:rdma_ucm:ucma_write+115}
   {vfs_write+215} {sys_write+69}
  {system_call+126}

Code: 8a 40 10 88 85 85 00 00 00 8b 83 38 01 00 00 66 89 45 7a 8a
RIP {:ib_cm:ib_cm_init_qp_attr+580} RSP 


From a crash dump, I determined that we died in cm_init_qp_rts_attr() 
(it's inline, so it doesn't show up in the traceback) on the line 
labeled below:

static int cm_init_qp_rts_attr(struct cm_id_private *cm_id_priv,
			       struct ib_qp_attr *qp_attr,
			       int *qp_attr_mask)
{

	if (cm_id_priv->id.lap_state == IB_CM_LAP_UNINIT) {
		.
	} else {
		*qp_attr_mask = IB_QP_ALT_PATH | IB_QP_PATH_MIG_STATE;
		qp_attr->alt_port_num = cm_id_priv->alt_av.port->port_num; <-die


A similar problem was reported by Josh England.

The problem is that the rdma_cm can call ib_send_cm_mra() after a
connection has been established.  The ib_cm incorrectly assumes that the
MRA is in response to a LAP (load alternate path) message, even though no
LAP message has been received.  The ib_cm needs to check the lap_state
before sending an MRA if the cm_id state is established.

Signed-off-by: Sean Hefty 
---
Josh or Arthur, can either of you confirm if this patch fixes the crashes that
you've seen?

 drivers/infiniband/core/cm.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index ad63b79..64e0903 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -2409,10 +2409,12 @@ int ib_send_cm_mra(struct ib_cm_id *cm_id,
 		msg_response = CM_MSG_RESPONSE_REP;
 		break;
 	case IB_CM_ESTABLISHED:
-		cm_state = cm_id->state;
-		lap_state = IB_CM_MRA_LAP_SENT;
-		msg_response = CM_MSG_RESPONSE_OTHER;
-		break;
+		if (cm_id->lap_state == IB_CM_LAP_RCVD) {
+			cm_state = cm_id->state;
+			lap_state = IB_CM_MRA_LAP_SENT;
+			msg_response = CM_MSG_RESPONSE_OTHER;
+			break;
+		}
 	default:
 		ret = -EINVAL;
 		goto error1;

