I’m not seeing any problem inside the OOB - the problem appears to be in the 
info being given to it:

[host1:16244] 1 more process has sent help message help-mpi-btl-openib.txt / default subnet prefix
[host1:16244] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[[46697,1],0][btl_openib_component.c:3501:handle_wc] from host1 to: 192.168.2.22 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 112db80 opcode 32767  vendor error 129 qp_idx 0

I’ve been searching, and I don’t see that help message anywhere in your output 
- not sure what happened to it. I do see this in your output - don’t know what 
it means:

[host1][[46697,1],0][connect/btl_openib_connect_oob.c:935:rml_recv_cb] !!!!!!!!!!!!!!!!!!!!!!!!!


> On Apr 20, 2017, at 8:36 AM, Shiqing Fan <shiqing....@huawei.com> wrote:
> 
> Forgot to enable oob verbose in my last test. Here is the updated output file.
>  
> Thanks,
> Shiqing
>  
> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of 
> r...@open-mpi.org
> Sent: Thursday, April 20, 2017 4:29 PM
> To: OpenMPI Devel
> Subject: Re: [OMPI devel] openib oob module
>  
> Yeah, I forgot that the 1.10 series still had the BTLs in OMPI. Should be 
> able to restore it. I honestly don’t recall the bug, though :-(
>  
> If you want to try reviving it, you can add some debug in there (plus turn on 
> the OOB verbosity) and I’m happy to help you figure it out.
> Ralph
>  
> On Apr 20, 2017, at 7:13 AM, Shiqing Fan <shiqing....@huawei.com> wrote:
>  
> Hi Ralph,
>  
> Yes, it’s been a long time. Hope you all are doing well (I believe so :-) ).
>  
> I’m working on a virtualization project, and need to run Open MPI on a 
> unikernel OS (most of OFED is missing/unsupported).
>  
> Actually I’m only focusing on 1.10.2, which still has oob in ompi. Might it 
> be possible to make oob work there? Or even for the 1.10 branch (as 
> Gilles mentioned)?
> Do you have any clue about the bug in oob back then?
>  
> Regards,
> Shiqing
>  
>  
> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of 
> r...@open-mpi.org
> Sent: Thursday, April 20, 2017 3:49 PM
> To: OpenMPI Devel
> Subject: Re: [OMPI devel] openib oob module
>  
> Hi Shiqing!
>  
> Been a long time - hope you are doing well.
>  
> I see no way to bring the oob module back now that the BTLs are in the OPAL 
> layer. That is why it was removed: the oob is in ORTE, and thus not 
> accessible from OPAL.
> Ralph
>  
> On Apr 20, 2017, at 6:02 AM, Shiqing Fan <shiqing....@huawei.com> wrote:
>  
> Dear all,
>  
> I noticed that the openib oob module was removed a long time ago, 
> because it wasn’t working anymore and nobody seemed to need it.
> But for some special operating systems, where rdmacm, udcm or ibcm kernel 
> support is missing, oob may still be necessary.
>  
> I’m curious whether it’s possible to bring this module back. How difficult 
> would it be to fix the bug and make it work again in the 1.10 branch or later? 
> Thanks a lot.
>  
> Best Regards,
> Shiqing
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>  
> <output.txt>