I'm not at all familiar with this IB code, but the supplied patch seems to post 
the recv WR only once, when the UD QP is created.  get_pathrecord_info() 
seems to have logic to retry querying path records, and if it does retry, there 
won't be a recv posted after the first recv completes. So if a recv WR 
is polled from the CQ and we're going to iterate and call ibv_post_send() 
again, then we had better post another recv WR first.
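
To make the concern concrete, here is a minimal toy model in C. It is only a sketch and does not use libibverbs: toy_rq, toy_post_recv(), toy_deliver_reply() and toy_query() are hypothetical names made up for illustration, and a reply that finds no posted recv is simply reported as dropped (a real query would presumably time out instead). It assumes each completed recv consumes exactly one posted WR.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a UD QP receive queue. NOT the libibverbs API. */
typedef struct {
    int capacity; /* max outstanding receive WRs */
    int posted;   /* receive WRs currently posted */
} toy_rq;

enum { TOY_EXHAUSTED = -1, TOY_DROPPED = -2 };

/* Post one receive WR; fails if the receive queue is already full. */
static bool toy_post_recv(toy_rq *rq)
{
    if (rq->posted >= rq->capacity)
        return false;
    rq->posted++;
    return true;
}

/* A reply arrives: it consumes one posted WR, or is dropped if none is posted. */
static bool toy_deliver_reply(toy_rq *rq)
{
    assert(rq->posted >= 0);
    if (rq->posted == 0)
        return false;
    rq->posted--;
    return true;
}

/*
 * Query with retries. reply_usable[i] says whether the reply to attempt i
 * is acceptable (a stand-in for validating the returned path record).
 * Returns the index of the successful attempt, TOY_DROPPED if a reply
 * found no posted recv, or TOY_EXHAUSTED otherwise.
 */
static int toy_query(toy_rq *rq, const bool *reply_usable,
                     int max_attempts, bool repost_on_retry)
{
    if (!toy_post_recv(rq)) /* the recv posted up front */
        return TOY_EXHAUSTED;

    for (int i = 0; i < max_attempts; i++) {
        /* The previous recv was consumed, so post a new one before resending. */
        if (i > 0 && repost_on_retry && !toy_post_recv(rq))
            return TOY_EXHAUSTED;

        /* "Send" the query; the reply needs a posted recv to land in. */
        if (!toy_deliver_reply(rq))
            return TOY_DROPPED;

        if (reply_usable[i])
            return i;
    }
    return TOY_EXHAUSTED;
}
```

With replies of {unusable, unusable, usable} and a queue depth of 1, the post-once variant loses the second reply, while the repost variant succeeds on the third attempt and leaves no stale recv behind to fill the queue on later calls.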



> -----Original Message-----
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> Sent: Monday, June 29, 2015 1:50 PM
> To: Open MPI Developers List
> Cc: Nathan Hjelm; Steve Wise
> Subject: Re: [OMPI devel] the bug in btl_openib_connect_sl.c
> 
> Nathan / Steve --
> 
> Can you comment?
> 
> 
> > On Jun 26, 2015, at 5:13 AM, Алексей Рыжих <avryzh...@compcenter.org> wrote:
> >
> > Hi everybody,
> > I tried the 3D-torus cluster topology support and hit a bug with an
> > error message like the one below:
> >
> > srvmpisnb02][[9011,1],3][ompi/mca/btl/openib/connect/btl_openib_connect_sl.c:239:get_pathrecord_info]
> >  error posting receive on QP [0x4f] errno says: Success [0]
> >
> > The cause of this bug is a receive queue overflow on the UD QP associated
> > with the handle cache->qp.
> >
> > The attached file is my proposed fix, based on the Open MPI 1.8 branch.
> >
> > I also have a question about 3D-torus topology support for UD QPs.  For
> > example, you use the UD transport in the UDCM connection manager.
> > Are any changes required to query the service level for a UD QP?
> >
> > Maybe we need to add a call to btl_openib_connect_get_pathrecord_sl(…)
> > before ibv_create_ah(), like below:
> > ah_attr.is_global     = 0;
> > ah_attr.dlid          = remote_lid;
> > ah_attr.sl            = btl_openib_connect_get_pathrecord_sl(…);
> > ah_attr.src_path_bits = mca_btl_openib_component.ib_src_path_bits;
> > ah_attr.port_num      = openib_btl->ib_port_num;
> >
> > ah = ibv_create_ah(openib_btl->ib_pd, &ah_attr);
> >
> >
> > Regards,
> > Alexey Ryzhikh
> > <btl_openib_connect_sl.c.diff>
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com

