I placed a 1.0.14.1 package on the OFA server in the downloads/rdmacm section. Can you verify that it works? If so, I'll ask to have it pulled into 1.5.3.

> -----Original Message-----
> From: Steve Wise [mailto:sw...@opengridcomputing.com]
> Sent: Tuesday, February 15, 2011 10:37 AM
> To: Hefty, Sean
> Cc: OpenFabrics EWG; Tziporet Koren
> Subject: Re: rping/cxgb3 regression
>
>
> On 02/15/2011 12:18 PM, Hefty, Sean wrote:
> >> I'm wondering if pulling the rping changes for ofed-1.5.3 would be ok? I
> >> guess to do this you would have to push a 1-off librdmacm without those
> >> changes? Or maybe back up what is in OFED-1.5.3 to the previous release
> >> without this rping change?
> >>
> >> Thoughts?
> > Is the commit (93635fa33b41d356fa096242fec4ce788194b42f) below the issue?
> > (Btw, the author listed in my git tree is wrong.)
> >
>
> Yes.
>
> > I don't think I want to drop back to 1.0.13 for 1.5.3, so maybe reverting
> > this change and pushing out 1.0.14.1 would work. There's just one other
> > change after 1.0.14 at the moment, and it's to the build, so I'd skip a
> > full release for now.
> >
> > Let me know if you think this would work.
> >
>
> I just tested that removing this from 1.0.14 will resolve the issue for
> 1.5.3.
>
> > - Sean
> >
> > ---
> >
> > librdmacm/rping: Make sure CQ event thread exits before destroying the CQ
> >
> > It is possible for the CQ event thread to poll the CQ after it has been
> > destroyed which can result in a seg fault on T3 interfaces. This patch
> > waits for the thread to exit before destroying the CQ.
> >
> > Signed-off-by: Steve Wise<sw...@opengridcomputing.com>
> > Signed-off-by: Sean Hefty<sean.he...@intel.com>
> >
> > diff --git a/examples/rping.c b/examples/rping.c
> > index 2d4c2de..ee292ec 100644
> > --- a/examples/rping.c
> > +++ b/examples/rping.c
> > @@ -280,12 +280,11 @@ static int rping_cq_event_handler(struct rping_cb *cb)
> >          ret = 0;
> >
> >          if (wc.status) {
> > -            if (wc.status != IBV_WC_WR_FLUSH_ERR) {
> > +            if (wc.status != IBV_WC_WR_FLUSH_ERR)
> >                  fprintf(stderr,
> >                      "cq completion failed status %d\n",
> >                      wc.status);
> > -                ret = -1;
> > -            }
> > +            ret = -1;
> >              goto error;
> >          }
> >
> > @@ -802,10 +801,9 @@ static void *rping_persistent_server_thread(void *arg)
> >
> >      rping_test_server(cb);
> >      rdma_disconnect(cb->child_cm_id);
> > +    pthread_join(cb->cqthread, NULL);
> >      rping_free_buffers(cb);
> >      rping_free_qp(cb);
> > -    pthread_cancel(cb->cqthread);
> > -    pthread_join(cb->cqthread, NULL);
> >      rdma_destroy_id(cb->child_cm_id);
> >      free_cb(cb);
> >      return NULL;
> > @@ -890,6 +888,7 @@ static int rping_run_server(struct rping_cb *cb)
> >
> >      rping_test_server(cb);
> >      rdma_disconnect(cb->child_cm_id);
> > +    pthread_join(cb->cqthread, NULL);
> >      rdma_destroy_id(cb->child_cm_id);
> >  err2:
> >      rping_free_buffers(cb);
> > @@ -1057,6 +1056,7 @@ static int rping_run_client(struct rping_cb *cb)
> >
> >      rping_test_client(cb);
> >      rdma_disconnect(cb->cm_id);
> > +    pthread_join(cb->cqthread, NULL);
> >  err2:
> >      rping_free_buffers(cb);
> >  err1: