I'm not at all familiar with this IB code, but the supplied patch seems to post the recv WR only once, when the UD QP is created. get_pathrecord_info() appears to have logic to retry the path record query, and if it does retry, no recv will be posted once the first recv has completed. So if a recv WR has been polled from the CQ and we are going to loop and call ibv_post_send() again, then we'd better post another recv WR first...
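
To make the concern concrete, here is a minimal sketch in plain C. It is a toy model, not the libibverbs API and not the actual code in btl_openib_connect_sl.c: struct toy_qp, toy_post_recv(), toy_poll_recv() and toy_query() are hypothetical names, and the receive queue is reduced to a counter. The assumption it illustrates is that a receive should be re-posted only after a completion has consumed the previous one, so the queue neither overflows (a post on every retry) nor runs dry (a single post at QP creation).

```c
/* Toy stand-in for a UD QP receive queue. NOT the libibverbs API. */
struct toy_qp {
    int rq_depth;   /* maximum number of receive WRs the queue can hold */
    int rq_posted;  /* receive WRs currently posted and not yet consumed */
};

/* Mimics posting one receive WR: fails when the receive queue is full. */
static int toy_post_recv(struct toy_qp *qp)
{
    if (qp->rq_posted >= qp->rq_depth)
        return -1;              /* the overflow case from the bug report */
    qp->rq_posted++;
    return 0;
}

/* Mimics polling the CQ: an arriving reply consumes one posted receive WR.
 * Returns 1 if a receive completion was polled, 0 otherwise. */
static int toy_poll_recv(struct toy_qp *qp, int reply_arrived)
{
    if (!reply_arrived || qp->rq_posted == 0)
        return 0;
    qp->rq_posted--;
    return 1;
}

/* Hypothetical retry loop: arrived[i] says whether a reply shows up on
 * attempt i, usable[i] whether that reply is the one we wanted.
 * Returns the index of the successful attempt, -1 if posting a receive
 * failed, or -2 if the retries ran out. */
static int toy_query(struct toy_qp *qp, const int *arrived,
                     const int *usable, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        /* Re-post only if the previous receive was consumed. */
        if (qp->rq_posted == 0 && toy_post_recv(qp) != 0)
            return -1;

        /* The real code would call ibv_post_send() at this point. */

        if (toy_poll_recv(qp, arrived[i]) && usable[i])
            return i;
    }
    return -2;
}
```

With rq_depth = 1, a run where the first attempt gets no reply, the second gets an unusable reply and the third gets a good one succeeds on the third attempt. In this model, posting on every attempt would fail on the second attempt, and posting only once would leave nothing posted for the third.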

> -----Original Message-----
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> Sent: Monday, June 29, 2015 1:50 PM
> To: Open MPI Developers List
> Cc: Nathan Hjelm; Steve Wise
> Subject: Re: [OMPI devel] the bug in btl_openib_connect_sl.c
>
> Nathan / Steve --
>
> Can you comment?
>
>
> > On Jun 26, 2015, at 5:13 AM, Алексей Рыжих <avryzh...@compcenter.org> wrote:
> >
> > Hi everybody,
> > I tried the functionality for 3D-torus cluster topology support and
> > encountered a bug with an error message like the one below:
> >
> > [srvmpisnb02][[9011,1],3][ompi/mca/btl/openib/connect/btl_openib_connect_sl.c:239:get_pathrecord_info]
> > error posting receive on QP [0x4f] errno says: Success [0]
> >
> > The cause of this bug is a receive queue overflow on the UD QP associated
> > with the handle cache->qp.
> >
> > The attached file is my proposed fix, based on the Open MPI 1.8 branch.
> >
> > I also have a question about 3D-torus topology support for UD QPs. For
> > example, you use UD transport in the UDCM connection manager. Are any
> > changes required to query the service level for a UD QP?
> >
> > Maybe we need to add a call to btl_openib_connect_get_pathrecord_sl(…)
> > before ibv_create_ah(), like below:
> >
> > ah_attr.is_global = 0;
> > ah_attr.dlid = remote_lid;
> > ah_attr.sl = btl_openib_connect_get_pathrecord_sl(…);
> > ah_attr.src_path_bits = mca_btl_openib_component.ib_src_path_bits;
> > ah_attr.port_num = openib_btl->ib_port_num;
> >
> > ah = ibv_create_ah(openib_btl->ib_pd, &ah_attr);
> >
> >
> > Regards,
> > Alexey Ryzhikh
> > <btl_openib_connect_sl.c.diff>
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2015/06/17551.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com