Issue with RDMA_CM on systems with multiple IB HCA's.
Hi All,

I'm trying to run the 'ib_rdma_bw' test (part of the perftest suite) on a cluster with two IB ConnectX DDR HCAs. The OFED version I'm using is 1.5.1. OpenSM is running on the network and the ports are up and active. Whenever I use RDMA_CM to establish connections, the program quits with the error given below (the test runs fine if we don't use RDMA_CM).

I recall seeing a post mentioning some issues with RDMA_CM on systems with multiple HCAs. I was wondering whether this has been resolved with the latest OFED. Any input on this issue would be greatly appreciated. Also, it would be great if anyone could point me to any open bugs on this issue so that I can track its status.

[subra...@amd6 perftest]$ ./ib_rdma_bw -c 172.16.1.5
11928: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 |
11928: Local address: LID , QPN 00, PSN 0x5bfbba RKey 0x90042602 VAddr 0x002b27feabe000
11928: Remote address: LID , QPN 00, PSN 0x392fe6, RKey 0xf8042605 VAddr 0x002b9d5c93b000
11928:pp_send_start: bad wc status 12
11928:main: Completion with error at client:
11928:main: Failed status 5: wr_id 3
11928:main: scnt=100, ccnt=0

Thanks in advance,
Hari.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
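For anyone decoding the numeric statuses in the ib_rdma_bw output above, here is a minimal sketch. It assumes the numbers are `enum ibv_wc_status` values from libibverbs' <infiniband/verbs.h>; the table is hand-copied for illustration and the helper name wc_status_name is made up. A real program would call ibv_wc_status_str() from libibverbs instead.

```c
#include <stddef.h>

/* Hand-copied from the first entries of enum ibv_wc_status in libibverbs'
 * <infiniband/verbs.h>; array index == numeric status. Illustration only. */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",            /*  0 */
    "IBV_WC_LOC_LEN_ERR",        /*  1 */
    "IBV_WC_LOC_QP_OP_ERR",      /*  2 */
    "IBV_WC_LOC_EEC_OP_ERR",     /*  3 */
    "IBV_WC_LOC_PROT_ERR",       /*  4 */
    "IBV_WC_WR_FLUSH_ERR",       /*  5 */
    "IBV_WC_MW_BIND_ERR",        /*  6 */
    "IBV_WC_BAD_RESP_ERR",       /*  7 */
    "IBV_WC_LOC_ACCESS_ERR",     /*  8 */
    "IBV_WC_REM_INV_REQ_ERR",    /*  9 */
    "IBV_WC_REM_ACCESS_ERR",     /* 10 */
    "IBV_WC_REM_OP_ERR",         /* 11 */
    "IBV_WC_RETRY_EXC_ERR",      /* 12 */
    "IBV_WC_RNR_RETRY_EXC_ERR",  /* 13 */
};

/* Hypothetical helper: map a numeric completion status to its enum name. */
static const char *wc_status_name(int status)
{
    size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

    if (status < 0 || (size_t)status >= n)
        return "unknown (not in this excerpt)";
    return wc_status_names[status];
}
```

Under that assumption, "bad wc status 12" is a transport retry-exceeded error and "Failed status 5" is a flushed work request, which is what one would expect to follow once the QP has gone into the error state.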
[PATCH] rdma/ib_cm: check LAP state before sending an MRA
This problem was originally reported by Arthur Kepner:

We have a customer who has repeatedly had system panics with the following signature:

Unable to handle kernel NULL pointer dereference at 0010 RIP: {:ib_cm:ib_cm_init_qp_attr+580}
PGD 3a2db6067 PUD 0
Oops: [1] SMP
last sysfs file: /class/infiniband/mlx4_0/node_guid
CPU 4
Modules linked in: i2c_dev sg sd_mod crc32c libcrc32c iscsi_tcp libiscsi scsi_transport_iscsi rdma_ucm rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad iw_cxgb3 cxgb3 firmware_class mlx4_ib ib_mthca ib_mad ib_core loop numatools xpmem worm mlx4_core libata i2c_i801 scsi_mod i2c_core shpchp pci_hotplug nfs lockd nfs_acl af_packet sunrpc e1000
Pid: 3256, comm: star Tainted: G U 2.6.16.60-0.34-smp #1
RIP: 0010:[] {:ib_cm:ib_cm_init_qp_attr+580}
RSP: 0018:810369d09d38 EFLAGS: 00010046
RAX: RBX: 810419678c00 RCX: 0008
RDX: 0246 RSI: 810419678d18 RDI: 810369d09e70
RBP: 810369d09e18 R08: 0003003d R09:
R10: 810369d09e18 R11: 0088 R12: 810369d09d88
R13: R14: 810419678c80 R15: 403500b0
FS: 40354940(0063) GS:810420ffbbc0() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 0010 CR3: 00039f0c4000 CR4: 06e0
Process star (pid: 3256, threadinfo 810369d08000, task 8103b81b5830)
Stack: 810419678a00 810369d09d88 810369d09e18 810369d09e18 40143430 882fb6d5 810376261540 81040bea4740 810376261540 88309285
Call Trace:
 {:rdma_cm:rdma_init_qp_attr+209}
 {:rdma_ucm:ucma_init_qp_attr+160}
 {thread_return+0}
 {:rdma_ucm:ucma_write+115}
 {vfs_write+215}
 {sys_write+69}
 {system_call+126}
Code: 8a 40 10 88 85 85 00 00 00 8b 83 38 01 00 00 66 89 45 7a 8a
RIP {:ib_cm:ib_cm_init_qp_attr+580} RSP

From a crash dump, I determined that we died in cm_init_qp_rts_attr() (it's inline, so it doesn't show up in the traceback) on the line labeled below:

static int cm_init_qp_rts_attr(struct cm_id_private *cm_id_priv,
			       struct ib_qp_attr *qp_attr, int *qp_attr_mask)
{
	if (cm_id_priv->id.lap_state == IB_CM_LAP_UNINIT) {
		.
	} else {
		*qp_attr_mask = IB_QP_ALT_PATH | IB_QP_PATH_MIG_STATE;
		qp_attr->alt_port_num = cm_id_priv->alt_av.port->port_num;	<-die

A similar problem was reported by Josh England.

The problem is that the rdma_cm can call ib_send_cm_mra() after a connection has been established. The ib_cm incorrectly assumes that the MRA is in response to a LAP (load alternate path) message, even though no LAP message has been received. The ib_cm needs to check the lap_state before sending an MRA if the cm_id state is established.

Signed-off-by: Sean Hefty
---
Josh or Arthur, can either of you confirm if this patch fixes the crashes that you've seen?

 drivers/infiniband/core/cm.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index ad63b79..64e0903 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -2409,10 +2409,12 @@ int ib_send_cm_mra(struct ib_cm_id *cm_id,
 		msg_response = CM_MSG_RESPONSE_REP;
 		break;
 	case IB_CM_ESTABLISHED:
-		cm_state = cm_id->state;
-		lap_state = IB_CM_MRA_LAP_SENT;
-		msg_response = CM_MSG_RESPONSE_OTHER;
-		break;
+		if (cm_id->lap_state == IB_CM_LAP_RCVD) {
+			cm_state = cm_id->state;
+			lap_state = IB_CM_MRA_LAP_SENT;
+			msg_response = CM_MSG_RESPONSE_OTHER;
+			break;
+		}
 	default:
 		ret = -EINVAL;
 		goto error1;
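The rule the patch introduces can be sketched outside the kernel as a small toy model. The TOY_* enums and toy_send_mra() below are hypothetical stand-ins for the identifiers named in the patch, with arbitrary values; this is not the ib_cm code, and the real function also accepts other connection states that the model leaves out.

```c
#include <errno.h>

/* Toy stand-ins for the kernel enums named in the patch; values are arbitrary. */
enum toy_cm_state  { TOY_CM_IDLE, TOY_CM_ESTABLISHED };
enum toy_lap_state { TOY_LAP_UNINIT, TOY_LAP_RCVD, TOY_MRA_LAP_SENT };

struct toy_cm_id {
    enum toy_cm_state  state;
    enum toy_lap_state lap_state;
};

/*
 * Post-patch rule for the established case: an MRA is only accepted as a
 * reply to a LAP that was actually received. Anything else is rejected and
 * lap_state is left untouched.
 */
static int toy_send_mra(struct toy_cm_id *id)
{
    if (id->state == TOY_CM_ESTABLISHED && id->lap_state == TOY_LAP_RCVD) {
        id->lap_state = TOY_MRA_LAP_SENT;
        return 0;
    }
    return -EINVAL;
}
```

In this model an established connection with lap_state still at TOY_LAP_UNINIT gets -EINVAL and keeps its state, so a later RTS attribute setup would not be steered into the alternate-path branch.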
[PATCH] IB/qib: set cfgctxts to number of CPUs by default
Up to now, we have set the number of available user contexts based on the number of hardware contexts, which is set according to the number of available CPUs. This was fine since most CPUs had a power of two number of cores and the chip supported 4, 8, or 16 user contexts. Now that some systems have 12 cores, the default isn't optimal and should be set to 12 even though 16 hardware contexts need to be enabled.

Signed-off-by: Ralph Campbell
---
 drivers/infiniband/hw/qib/qib_iba7322.c |    2 +-
 drivers/infiniband/hw/qib/qib_init.c    |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 5eedf83..9031cd8 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -5864,7 +5864,7 @@ static void write_7322_initregs(struct qib_devdata *dd)
 	 * Doesn't clear any of the error bits that might be set.
 	 */
 	val = TIDFLOW_ERRBITS; /* these are W1C */
-	for (i = 0; i < dd->ctxtcnt; i++) {
+	for (i = 0; i < dd->cfgctxts; i++) {
 		int flow;
 		for (flow = 0; flow < NUM_TIDFLOWS_CTXT; flow++)
 			qib_write_ureg(dd, ur_rcvflowtable+flow, val, i);
diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index a873dd5..f1d16d3 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -93,7 +93,7 @@ unsigned long *qib_cpulist;
 void qib_set_ctxtcnt(struct qib_devdata *dd)
 {
 	if (!qib_cfgctxts)
-		dd->cfgctxts = dd->ctxtcnt;
+		dd->cfgctxts = dd->first_user_ctxt + num_online_cpus();
 	else if (qib_cfgctxts < dd->num_pports)
 		dd->cfgctxts = dd->ctxtcnt;
 	else if (qib_cfgctxts <= dd->ctxtcnt)
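The arithmetic behind the new default can be sketched as follows. The struct and function names are hypothetical stand-ins for the fields the patch touches, and only the "module parameter not set" branch is modelled; the value 2 for first_user_ctxt used in the note below is an assumption for illustration, not something stated in the patch.

```c
/* Hypothetical stand-in for the qib_devdata fields used by the patch. */
struct demo_qib {
    unsigned int first_user_ctxt; /* contexts reserved ahead of the user ones */
    unsigned int ctxtcnt;         /* hardware contexts enabled on the chip */
    unsigned int cfgctxts;        /* contexts actually configured for use */
};

/* New default when the qib_cfgctxts module parameter is left at 0. */
static void demo_set_default_cfgctxts(struct demo_qib *dd,
                                      unsigned int online_cpus)
{
    dd->cfgctxts = dd->first_user_ctxt + online_cpus;
}
```

With 12 online CPUs, first_user_ctxt assumed to be 2 and 18 hardware contexts enabled, this yields cfgctxts = 14 where the old default would have been 18, which matches the "12 user contexts on a 16-context chip" case described in the commit message.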
Re: NULL pointer dereference in rdma_ucm
Upgrading to 1.5.1 is the way to go for me. I have other dependencies tying me down to the CentOS kernel for the time being. Hopefully any patch to mainstream should apply fairly cleanly to 1.5.1.

-JE

On Wed, Jul 21, 2010 at 1:51 PM, Hefty, Sean wrote:
>> Timed out connections might be something that can be compensated for
>> in the app. It is definitely preferable to a kernel panic. Still,
>> I'll work on making OFED-1.5.1 happy before playing around with
>> removing the ib_send_cm_mra() call.
>
> FYI - the possible issue I'm describing is in the upstream kernel, so it
> would also be in 1.5.1.
Re: [PATCH V3] ib_qib: Allow writes to the diag_counters to be able to clear them
Acked-by: Ralph Campbell

On Tue, 2010-07-13 at 18:53 -0700, Ira Weiny wrote:
> From: Ira Weiny
> Date: Wed, 7 Jul 2010 17:35:34 -0700
> Subject: [PATCH] ib_qib: Allow writes to the diag_counters to be able to clear them
>
> Changes in V3:
>	Add non-number error check
>	Return "proper" proper length
>
> Changes in V2:
>	Add check for negative values
>	Return proper length
>
> Signed-off-by: Ira Weiny
> ---
>  drivers/infiniband/hw/qib/qib_sysfs.c |   21 ++++++++++++++++++++-
>  1 files changed, 20 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/infiniband/hw/qib/qib_sysfs.c b/drivers/infiniband/hw/qib/qib_sysfs.c
> index dab4d9f..b214eff 100644
> --- a/drivers/infiniband/hw/qib/qib_sysfs.c
> +++ b/drivers/infiniband/hw/qib/qib_sysfs.c
> @@ -347,7 +347,7 @@ static struct kobj_type qib_sl2vl_ktype = {
>
>  #define QIB_DIAGC_ATTR(N) \
>  	static struct qib_diagc_attr qib_diagc_attr_##N = { \
> -		.attr = { .name = __stringify(N), .mode = 0444 }, \
> +		.attr = { .name = __stringify(N), .mode = 0664 }, \
>  		.counter = offsetof(struct qib_ibport, n_##N) \
>  	}
>
> @@ -403,8 +403,27 @@ static ssize_t diagc_attr_show(struct kobject *kobj, struct attribute *attr,
>  	return sprintf(buf, "%u\n", *(u32 *)((char *)qibp + dattr->counter));
>  }
>
> +static ssize_t diagc_attr_store(struct kobject *kobj, struct attribute *attr,
> +				const char *buf, size_t size)
> +{
> +	struct qib_diagc_attr *dattr =
> +		container_of(attr, struct qib_diagc_attr, attr);
> +	struct qib_pportdata *ppd =
> +		container_of(kobj, struct qib_pportdata, diagc_kobj);
> +	struct qib_ibport *qibp = &ppd->ibport_data;
> +	char *endp;
> +	long val = simple_strtol(buf, &endp, 0);
> +
> +	if (val < 0 || endp == buf)
> +		return -EINVAL;
> +
> +	*(u32 *)((char *)qibp + dattr->counter) = (u32)val;
> +	return size;
> +}
> +
>  static const struct sysfs_ops qib_diagc_ops = {
>  	.show = diagc_attr_show,
> +	.store = diagc_attr_store,
>  };
>
>  static struct kobj_type qib_diagc_ktype = {
RE: NULL pointer dereference in rdma_ucm
> Timed out connections might be something that can be compensated for
> in the app. It is definitely preferable to a kernel panic. Still,
> I'll work on making OFED-1.5.1 happy before playing around with
> removing the ib_send_cm_mra() call.

FYI - the possible issue I'm describing is in the upstream kernel, so it would also be in 1.5.1.
Re: more partition questions
Hi Tom,

On 7/19/10, Tom Ammon wrote:
> I'm trying to set up partitions in a little test environment, and I'm
> having trouble.
>
> I have opensm running on a machine attached to the fabric, and sminfo on
> the other machines confirm that this is indeed the master SM. Here's my
> /etc/opensm/partitions.conf:
>
> Default=0x , ipoib : ALL, SELF=full ;
> PartitionBlue=0x8004, ipoib : 0x0002c9030009cb3f=full,
> 0x0002c90200252841=full, 0x0002c90200243471=full ;
> PartitionRed=0x8005, ipoib : 0x0002c90200252841=full,
> 0x0002c90200243591=full, 0x0002c9030009cb2b=full ;

You don't really need the 0x8000 bit on in the pkeys but I don't think it does any harm.

> But when I go to the machine with port GUID 0x0002c90200243471, it
> doesn't appear that it's getting the pkey I wanted:
>
> [r...@stagnate ~]# ibstat
> CA 'mthca0'
>         CA type: MT23108
>         Number of ports: 2
>         Firmware version: 3.3.5
>         Hardware version: a1
>         Node GUID: 0x0002c90200243470
>         System image GUID: 0x0002c90200243473
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 10
>                 Base lid: 10
>                 LMC: 0
>                 SM lid: 4
>                 Capability mask: 0x02510a68
>                 Port GUID: 0x0002c90200243471
>         Port 2:
>                 State: Down
>                 Physical state: Polling
>                 Rate: 2
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x02510a68
>                 Port GUID: 0x0002c90200243472
>
> [r...@stagnate ~]# cat /sys/class/net/ib0/pkey
> 0x

What does:
smpquery pkeys 10 1
say? Do you see the other pkey(s) on that port?

The pkey you are seeing is the only one for the ib0 interface. If you want to have IPoIB interfaces on the other partitions too, you need to set this up by creating a child interface on those nodes; you had asked about that in a previous email (http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04728.html).

-- Hal

> I'm trying to run one ipoib subnet in each partition, and then
> eventually the goal is to have a different server that has 2 child
> interfaces, one on each subnet. But it doesn't appear that my partition
> configuration is even correct. Is there a syntax error, or something
> else I am missing?
>
> Thanks,
>
> Tom
>
> --
> Tom Ammon
> Network Engineer
> Office: 801.587.0976
> Mobile: 801.674.9273
>
> Center for High Performance Computing
> University of Utah
> http://www.chpc.utah.edu
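As background to the remark about the 0x8000 bit: in an InfiniBand P_Key the top bit encodes the membership type (set for full, clear for limited) and the low 15 bits identify the partition. A minimal sketch, with made-up helper and macro names, of why 0x8004 and 0x0004 name the same partition:

```c
#include <stdint.h>

#define PKEY_FULL_MEMBER_BIT 0x8000u /* bit 15: 1 = full, 0 = limited member */
#define PKEY_BASE_MASK       0x7fffu /* low 15 bits: the partition itself */

/* Partition number with the membership bit stripped. */
static uint16_t pkey_base(uint16_t pkey)
{
    return (uint16_t)(pkey & PKEY_BASE_MASK);
}

/* Non-zero when the P_Key carries full membership. */
static int pkey_is_full_member(uint16_t pkey)
{
    return (pkey & PKEY_FULL_MEMBER_BIT) != 0;
}

/* Two P_Keys refer to the same partition when their base values match. */
static int pkey_same_partition(uint16_t a, uint16_t b)
{
    return pkey_base(a) == pkey_base(b);
}
```

So for the PartitionBlue entry above, pkey_base(0x8004) is 0x0004, and membership is expressed separately through the =full suffix on each port GUID.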
Re: [PATCH] ib/ehca: init irq tasklet before irq can happen
thanks, applied.
--
Roland Dreier || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
Re: [PATCH] ib/ehca: Catch failing ioremap()
thanks, applied.
--
Roland Dreier
Re: [PATCH 2/7] IB/qib: allow PSM to select from multiple port assignment algorithms
thanks, applied
--
Roland Dreier
Re: [PATCH 04/18] ib/qib: use generic_file_llseek
thanks, applied
--
Roland Dreier
Re: [PATCH V2] ib_qib: Allow writes to the diag_counters to be able to clear them
So Ralph, I'm relying on you to decide if this makes sense...
--
Roland Dreier
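For readers weighing the diag_counters store handler from this series, its input checks (reject negative values and non-numeric input, accept any base) can be tried in userspace with a rough analogue. parse_counter_value() is a hypothetical name, strtol() stands in for the kernel's simple_strtol(), and the upper-bound and ERANGE checks are additions that the posted patch does not contain.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a counter value the way the store handler does: base auto-detected
 * (0x.. hex, 0.. octal, otherwise decimal), negative or non-numeric input
 * rejected. Returns 0 and fills *out on success, -EINVAL otherwise.
 */
static int parse_counter_value(const char *buf, uint32_t *out)
{
    char *endp;
    long val;

    errno = 0;
    val = strtol(buf, &endp, 0);
    if (endp == buf || val < 0 || errno == ERANGE)
        return -EINVAL;
    /* Not in the patch: refuse values that would be truncated to 32 bits. */
    if ((unsigned long)val > UINT32_MAX)
        return -EINVAL;

    *out = (uint32_t)val;
    return 0;
}
```

Like the original, this accepts trailing characters after the number (for example the newline that echo appends), since only the "no digits consumed" case is treated as an error.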
Re: NULL pointer dereference in rdma_ucm
On Wed, Jul 21, 2010 at 11:13 AM, Hefty, Sean wrote:
> If this is what is happening, then removing the call to ib_send_cm_mra() from
> cma_req_handler() should eliminate the crash. The drawback is that
> connections may begin timing out under load, but it may be worth trying.
> I'll see if I can come up with a better fix.

Timed out connections might be something that can be compensated for in the app. It is definitely preferable to a kernel panic. Still, I'll work on making OFED-1.5.1 happy before playing around with removing the ib_send_cm_mra() call.

-JE
Re: [PATCH] change thread-unsafe readdir to thread-safe readdir_r calls
On 07/21/2010 11:06 AM, Roland Dreier wrote:
> > +	buf = alloca(offsetof(struct dirent, d_name) + NAME_MAX + 1);
> > +	while (readdir_r(class_dir, buf, &dent) == 0 && dent) {
>
> So after thinking this over, I don't think I'm going to apply this
> patch. I think the right fix is for corosync to allow readdir() -- in
> general pushing people to safer APIs is probably a good thing, but I
> think in this particular case it doesn't make sense. In fact it seems
> that readdir_r() is actually worse since the "buf" parameter does not
> have a well-defined required size, while readdir() is actually safe in
> most uses.
>
>  - R.

thanks for considering

regards
-steve
RE: NULL pointer dereference in rdma_ucm
> [] :rdma_cm:rdma_init_qp_attr+0xed/0x13f
> [] :rdma_ucm:ucma_init_qp_attr+0x97/0xe4
> [] default_wake_function+0x0/0xe
> [] default_wake_function+0x0/0xe
> [] shmem_file_write+0x23f/0x251
> [] :rdma_ucm:ucma_write+0x73/0x91
> [] vfs_write+0xce/0x174
> [] sys_write+0x45/0x6e
> [] system_call+0x7e/0x83

Here's a guess at what may be happening:

The rdma_cm receives a REQ, in cma_req_handler() we have:

	ret = conn_id->id.event_handler(&conn_id->id, &event);
	if (!ret) {
		/*
		 * Acquire mutex to prevent user executing rdma_destroy_id()
		 * while we're accessing the cm_id.
		 */
		mutex_lock(&lock);
		if (cma_comp(conn_id, CMA_CONNECT) &&
		    !cma_is_ud_ps(conn_id->id.ps))
			ib_send_cm_mra(cm_id, CMA_CM_MRA_SETTING, NULL, 0);

Note that the call to ib_send_cm_mra() is after the event has been reported to the user. I'm guessing that the connection is getting established before the call to ib_send_cm_mra() is invoked. ib_send_cm_mra() does this:

	case IB_CM_ESTABLISHED:
		cm_state = cm_id->state;
		lap_state = IB_CM_MRA_LAP_SENT;
		msg_response = CM_MSG_RESPONSE_OTHER;
		break;
	...
	cm_id->state = cm_state;
	cm_id->lap_state = lap_state;
	cm_id_priv->service_timeout = service_timeout;

If the cm_id state is established when ib_send_cm_mra() is called, the lap_state is changed. This would result in cm_init_qp_rts_attr() incorrectly falling into:

	if (cm_id_priv->id.lap_state == IB_CM_LAP_UNINIT) {
		...
	} else {
		---> here

where the alt_av.port would not be set.

If this is what is happening, then removing the call to ib_send_cm_mra() from cma_req_handler() should eliminate the crash. The drawback is that connections may begin timing out under load, but it may be worth trying. I'll see if I can come up with a better fix.

- Sean
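The sequence guessed at above can be replayed with a toy model. Everything prefixed demo_ is hypothetical and only mirrors the two branches quoted in the message; it is not the ib_cm code, and it leaves out the locking and the other connection states.

```c
#include <stddef.h>

enum demo_lap_state { DEMO_LAP_UNINIT, DEMO_MRA_LAP_SENT };

struct demo_port { int port_num; };

struct demo_cm_id {
    int established;                  /* non-zero once the connection is up */
    enum demo_lap_state lap_state;
    const struct demo_port *alt_port; /* stays NULL unless a LAP set it up */
};

/* Behaviour described above: an MRA on an established connection marks the
 * LAP state even though no LAP has arrived. */
static void demo_send_mra_unguarded(struct demo_cm_id *id)
{
    if (id->established)
        id->lap_state = DEMO_MRA_LAP_SENT;
}

/* Mirrors the branch choice in cm_init_qp_rts_attr(): non-zero when the
 * else branch, which dereferences the alternate port, would run. */
static int demo_rts_uses_alt_path(const struct demo_cm_id *id)
{
    return id->lap_state != DEMO_LAP_UNINIT;
}

/* The suspected crash condition: alternate-path branch with no alternate port. */
static int demo_would_oops(const struct demo_cm_id *id)
{
    return demo_rts_uses_alt_path(id) && id->alt_port == NULL;
}
```

Starting from an established connection with lap_state at DEMO_LAP_UNINIT, demo_would_oops() is false; after one call to demo_send_mra_unguarded() it becomes true, which is the ordering problem the message describes.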
Re: [PATCH] change thread-unsafe readdir to thread-safe readdir_r calls
 > +buf = alloca(offsetof(struct dirent, d_name) + NAME_MAX + 1);
 > +while (readdir_r(class_dir, buf, &dent) == 0 && dent) {

So after thinking this over, I don't think I'm going to apply this patch. I think the right fix is for corosync to allow readdir() -- in general pushing people to safer APIs is probably a good thing, but I think in this particular case it doesn't make sense. In fact it seems that readdir_r() is actually worse since the "buf" parameter does not have a well-defined required size, while readdir() is actually safe in most uses.

 - R.
--
Roland Dreier
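For comparison with the readdir_r() loop in the rejected patch, here is a minimal sketch of the plain readdir() pattern, where each caller opens its own DIR stream and no caller-sized buffer is needed. count_dir_entries() is a hypothetical helper written for this note, not code from libibverbs.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/*
 * Count the entries of a directory using readdir() on a DIR stream that is
 * private to this call. Returns the number of entries (including "." and
 * ".." where the filesystem reports them), or -1 on error.
 */
static int count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    int count = 0;

    if (!dir)
        return -1;

    for (;;) {
        errno = 0;                      /* distinguish end-of-dir from error */
        if (readdir(dir) == NULL)
            break;
        count++;
    }
    if (errno != 0)
        count = -1;

    closedir(dir);
    return count;
}
```

Because the returned struct dirent belongs to the stream, this stays safe as long as the same DIR pointer is not shared between threads, which is presumably the "most uses" case referred to above.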
Re: [PATCH 2.6.35 2/3] RDMA/cxgb4: Support variable sized work requests.
thanks, applied.
--
Roland Dreier
Re: [patch v2] infiniband: cxgb3: clean up signed check
thanks, applied
--
Roland Dreier
Re: [PATCH v2 3/3] RDMA/cxgb4: Obtain RDMA QID ranges from LLD/FW.
> This one is dependent on a cxgb3 change merged into net-next.

I'll hold on to this until that change gets merged upstream.
--
Roland Dreier
Re: [PATCH v2 2/3] RDMA/cxgb4: Add module option to tweak delayed ack.
thanks, applied
--
Roland Dreier
Re: [patch 2/6] infiniband: remove dependency on __GFP_NOFAIL
thanks guys, applied
--
Roland Dreier
RE: NULL pointer dereference in rdma_ucm
> I do have the sysfs counters in
> /sys/class/infiniband_cm///cm_tx_msgs/. Could you
> point me to a reference for what they all mean?

These are counting the number of CM messages sent for each type. You would need to refer to the InfiniBand specification to understand the CM protocol and messages. In short, IB uses a 3 way handshake to connect:

REQ (request) ->
                    <- REP (reply)
RTU (ready to use) ->

MRA (message received) can also appear to indicate that a message has been received, but its processing will be delayed.

> Now, I'm not sure how relevant it is, but under heavy load I also see
> a lot of these:
> kernel: ib_cm: calculated mra timeout 67584 > 8192, decreasing use
> timeout_ms

This is likely not an issue. The CM has received an MRA, but is using a shorter timeout than what was specified in the MRA message.
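One way to read the quoted log line is as a simple cap, sketched below. capped_timeout_ms() is a made-up helper, and the sketch assumes the CM just substitutes the limit printed in the message when the calculated value exceeds it; the kernel source is the authority on what actually happens.

```c
/*
 * Assumed behaviour behind "calculated mra timeout X > Y, decreasing":
 * use the calculated timeout unless it exceeds the limit, in which case
 * fall back to the limit.
 */
static int capped_timeout_ms(int calculated_ms, int max_ms)
{
    return calculated_ms > max_ms ? max_ms : calculated_ms;
}
```

With the numbers from the log, capped_timeout_ms(67584, 8192) gives 8192, i.e. the sender waits for less time than the MRA asked for.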
Re: [PATCH 1/2] ib_ipath: Fix probe failure path
thanks, applied.
--
Roland Dreier
Re: knockdown voltaire switch with ARP multicast
Yes. Good progress - most likely identify workaround.

On Wed, Jul 21, 2010 at 06:38:10AM -0500, Or Gerlitz wrote:
> Bob Ciotti wrote:
> > Maybe someone on the voltaire side can help.
> > I'm working the issue now Wed Jul 21 00:34:14 PDT 2010
>
> Hi Bob,
>
> I understand that some folks from Voltaire are working with you directly.
>
> Or.
Re: NULL pointer dereference in rdma_ucm
I do have the sysfs counters in /sys/class/infiniband_cm///cm_tx_msgs/. Could you point me to a reference for what they all mean? There are a few patches I've had to throw into 1.4.2 so I'll need to check whether they are still needed in 1.5.1, but I'll work on that today.

Now, I'm not sure how relevant it is, but under heavy load I also see a lot of these:
kernel: ib_cm: calculated mra timeout 67584 > 8192, decreasing use timeout_ms

-JE

On Wed, Jul 21, 2010 at 6:09 AM, Or Gerlitz wrote:
> Josh England wrote:
>> Do you think upgrading to OFED-1.5.1 would help at all?
>
> it might help you to diagnose the problem better, if you read through the
> thread I pointed on (it's very short, four messages, less than two minutes),
> you would see that Arthur is reporting on the lap_state and Sean is
> suggesting to use the IB CM sysfs counter to further debug this. I don't know
> if these counters exist on the IB stack used for the ofed drop you're using,
> but they should be in 1.5.x
>
> Or.
Re: sense remote hardware address change by rdma-cm applications
Or Gerlitz wrote:
> Steve Wise wrote:
>> The cxgb3/4 drivers do not set IFF_NOARP and rely on ND being done as
>> part of connection setup. The driver will initiate ND if there isn't a
>> neigh entry available at the time the iwarp driver tries to send a SYN or
>> SYN/ACK.
>
> okay, understood, thanks for clarifying this out.
>
>> The cxgb* drivers actually reference the neigh and dst structs until the
>> offload connection is gone. Also if the offloaded connection has
>> problems transmitting (due to a L2 address change, for example), then
>> the driver will initiate ND again by calling neigh_event_send(). See
>> t4_l2t_send_event() in l2t.c which is called by the iwarp driver in
>> peer_abort() from iwch_cm.c when the HW tells us its retransmitting too much.
>
> In the general case of rdma-cm consumer, e.g IB RC based and/or UD unicast
> based, we don't have such feedback mechanism from the HW. As such, I would
> draw the line here around adopting into the rdma-cm the behavior of
> referencing the neigh and dst structures until the connection is gone (could
> you point on the func/path in drivers/net/cxgb3/l2t.c which does this? i
> wasn't sure).

Actually the dst entry ref/deref is really done in iw_cxgb3. The dst/neigh entries are referenced in iwch_connect() and pass_accept_req() by calling ip_route_output() via find_route(). They are released in __free_ep() when the endpoint is finally freed after connection shutdown.

The L2T code deals with maintaining the HW L2 entries and dealing with neighbour change events from the kernel.

Steve.
Re: sense remote hardware address change by rdma-cm applications
Jason Gunthorpe wrote:
> I'm thinking something like this..
> - The RDMA CM gets the dst from its route lookup locks it and stores it.
> - Instead of doing a route lookup cxgb gets the dst from RDMA CM,
>   locks it and stores it
> - RDMA CM traps all notifications/etc and generates callback to cxgb
>   to say the dst has changed.
> - cxgb releases the old dst and grabs the new one, updates the HW, etc.

Jason,

I'm up for extending the rdma-cm event of address change, on which an app can decide whether to react or not. For example, the in-tree iser and rds code treat this event the same as a disconnection request arriving, which means the higher layer (e.g. the user space iscsi daemon in the iser case) would try to re-connect. This has the advantage of simplifying the ULP state-machine, so there's no need for special handling for address-change, just treat it as a hint that re-connection is needed.

The cxgb* code takes this deeper, as it handles L2 changes at the driver level and not as an event delivered to the ULP, which can optionally address or ignore it.

Or.
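The "treat address change like a disconnect" policy described for iser and rds can be sketched as a tiny event handler. The demo_ enums below are hypothetical; in the real rdma-cm API the corresponding events are RDMA_CM_EVENT_ADDR_CHANGE and RDMA_CM_EVENT_DISCONNECTED, and an actual ULP handler does considerably more than return a verdict.

```c
/* Hypothetical event and action types standing in for the rdma-cm ones. */
enum demo_cm_event {
    DEMO_EVENT_ESTABLISHED,
    DEMO_EVENT_DISCONNECTED,
    DEMO_EVENT_ADDR_CHANGE
};

enum demo_action {
    DEMO_KEEP_GOING,
    DEMO_TEAR_DOWN_AND_RECONNECT
};

/* ULP policy: an address change is handled on the same path as a
 * disconnect, so the state machine needs no separate case for it. */
static enum demo_action demo_handle_event(enum demo_cm_event ev)
{
    switch (ev) {
    case DEMO_EVENT_DISCONNECTED:
    case DEMO_EVENT_ADDR_CHANGE:   /* hint that re-connection is needed */
        return DEMO_TEAR_DOWN_AND_RECONNECT;
    default:
        return DEMO_KEEP_GOING;
    }
}
```

The design trade-off is the one named in the message: sharing the disconnect path keeps the ULP simple, at the cost of a full reconnect where a driver-level L2 update (the cxgb* approach) could have kept the connection alive.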
Re: sense remote hardware address change by rdma-cm applications
Steve Wise wrote:
> The cxgb3/4 drivers do not set IFF_NOARP and rely on ND being done as
> part of connection setup. The driver will initiate ND if there isn't a
> neigh entry available at the time the iwarp driver tries to send a SYN or
> SYN/ACK.

okay, understood, thanks for clarifying this out.

> The cxgb* drivers actually reference the neigh and dst structs until the
> offload connection is gone. Also if the offloaded connection has
> problems transmitting (due to a L2 address change, for example), then
> the driver will initiate ND again by calling neigh_event_send(). See
> t4_l2t_send_event() in l2t.c which is called by the iwarp driver in
> peer_abort() from iwch_cm.c when the HW tells us its retransmitting too much.

In the general case of rdma-cm consumer, e.g IB RC based and/or UD unicast based, we don't have such feedback mechanism from the HW. As such, I would draw the line here around adopting into the rdma-cm the behavior of referencing the neigh and dst structures until the connection is gone (could you point on the func/path in drivers/net/cxgb3/l2t.c which does this? i wasn't sure).

> What doesn't happen is active positive feedback during the connection to
> avoid NUD. IE once the connection is setup, nobody calls dst_confirm()
> It is only called during connection setup/teardown.

I think we can live with that, this is similar to the case of an app using UDP in uni-directional manner between host A --> B so the NUD part of the network stack @ host A has to issue timely probes to validate the L2 address of host B. The only difference is that we have the A --> B comm offloaded and eventually without keeping the ref the neighbour and dst are deleted, the proposed patch eliminates this deletion.

Or.
Re: NULL pointer dereference in rdma_ucm
Josh England wrote:
> Do you think upgrading to OFED-1.5.1 would help at all?

it might help you to diagnose the problem better, if you read through the thread I pointed on (it's very short, four messages, less than two minutes), you would see that Arthur is reporting on the lap_state and Sean is suggesting to use the IB CM sysfs counter to further debug this. I don't know if these counters exist on the IB stack used for the ofed drop you're using, but they should be in 1.5.x

Or.
Re: knockdown voltaire switch with ARP multicast
Bob Ciotti wrote:
> Maybe someone on the voltaire side can help.
> I'm working the issue now Wed Jul 21 00:34:14 PDT 2010

Hi Bob,

I understand that some folks from Voltaire are working with you directly.

Or.
knockdown voltaire switch with ARP multicast
I've been able to duplicate a nasty problem that took down our system last week. I believe it's caused by IPoIB ARP multicast traffic.

The 4036 switch subscribes to the IPoIB multicast group whether the SM is running or disabled.

When the switch goes belly up, it can't be programmed by the SM:

ERR 3113: MAD completed in error (IB_TIMEOUT): SubnSet(SwitchInfo), attr_mod 0x0, TID 0x22322

smpquery works (nodeinfo/switchinfo) but perfquery fails with:

ibwarn: [26748] _do_madrpc: recv failed: Connection timed out

I can log into the switch, but wouldn't know what to look for there.

Voltaire support case number
US Case 00018085: Re: NASA Issue [[ ref:00D38IO.5008BPc6T:ref]]

Maybe someone on the voltaire side can help.
I'm working the issue now Wed Jul 21 00:34:14 PDT 2010