Hi everybody,

I tried the 3D-torus cluster topology support and hit a bug with an
error message like the one below:



[srvmpisnb02][[9011,1],3][ompi/mca/btl/openib/connect/btl_openib_connect_sl.c:239:get_pathrecord_info]
error posting receive on QP [0x4f] errno says: Success [0]



The cause of this bug is a receive queue overflow on the UD QP associated
with the handle cache->qp.
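
To illustrate the overflow, here is a minimal sketch in C. It is a simplified model and not the content of the attached diff: struct rq_model, rq_post_recv and rq_complete_recv are hypothetical names, and capacity stands in for the max_recv_wr value given when the QP is created. The idea is to post a new receive only while fewer than capacity receives are outstanding, and to free a slot each time a receive completion is consumed. Note also that ibv_post_recv() reports failure through its return value, which may be why the log prints errno as Success.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of receive-queue accounting for one QP. */
struct rq_model {
    int capacity;  /* stands in for max_recv_wr set at QP creation */
    int posted;    /* receive work requests currently outstanding */
};

/* Post one receive only if the queue has room.
 * Returns true if the receive was accounted for, false if the queue is
 * full (the point where a real post would overflow the receive queue). */
static bool rq_post_recv(struct rq_model *rq)
{
    if (rq->posted >= rq->capacity) {
        return false;
    }
    rq->posted++;
    return true;
}

/* Call once for every receive completion taken from the CQ. */
static void rq_complete_recv(struct rq_model *rq)
{
    assert(rq->posted > 0);  /* a completion implies an outstanding receive */
    rq->posted--;
}
```

With capacity set to 2, a third rq_post_recv() call returns false until rq_complete_recv() has released a slot.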



The attached file is my proposed fix, based on the Open MPI 1.8 branch.



I also have a question about 3D-torus topology support for UD QPs. For
example, UD transport is used in the UDCM connection manager. Are any
changes required to query the service level for a UD QP?



Maybe we need to add a call to btl_openib_connect_get_pathrecord_sl(…)
before ibv_create_ah(), like below:

ah_attr.is_global     = 0;
ah_attr.dlid          = remote_lid;
ah_attr.sl            = btl_openib_connect_get_pathrecord_sl(…);
ah_attr.src_path_bits = mca_btl_openib_component.ib_src_path_bits;
ah_attr.port_num      = openib_btl->ib_port_num;

ah = ibv_create_ah(openib_btl->ib_pd, &ah_attr);
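
One open point in the idea above is what to do when the SL query fails. The following C sketch shows one possible approach, falling back to a default SL. It rests on assumptions: sl_query_fn is a hypothetical callback standing in for the path-record query (the real arguments of btl_openib_connect_get_pathrecord_sl are not shown here), stub_sl_query exists only for demonstration, and a negative return value is assumed to mean failure. The 0..15 range check reflects the 4-bit SL field in InfiniBand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns an SL in 0..15, or a negative value
 * if the path-record query failed. */
typedef int (*sl_query_fn)(uint16_t local_lid, uint16_t remote_lid, void *ctx);

/* Demonstration stub: returns whatever integer ctx points to. */
static int stub_sl_query(uint16_t local_lid, uint16_t remote_lid, void *ctx)
{
    (void)local_lid;
    (void)remote_lid;
    return *(int *)ctx;
}

/* Pick the SL for an address handle: use the queried value when it is
 * valid, otherwise fall back to default_sl. */
static uint8_t choose_sl(sl_query_fn query, void *ctx,
                         uint16_t local_lid, uint16_t remote_lid,
                         uint8_t default_sl)
{
    int sl;

    assert(default_sl <= 15);
    if (query == NULL) {
        return default_sl;
    }
    sl = query(local_lid, remote_lid, ctx);
    if (sl < 0 || sl > 15) {
        return default_sl;  /* query failed or returned an invalid SL */
    }
    return (uint8_t)sl;
}
```

The result of choose_sl() would then be assigned to ah_attr.sl before the ibv_create_ah() call.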





Regards,

Alexey Ryzhikh

Attachment: btl_openib_connect_sl.c.diff
Description: Binary data
