You should also check your paths for non-interactive remote logins and ensure
that you are not accidentally mixing versions of Open MPI (e.g., the new
version on your local machine and some other version on the remote machines).
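A quick way to sanity-check this (just a sketch; "node2" is a placeholder for
one of your remote hosts) is to see what a non-interactive remote shell
actually picks up:

    # "node2" is a placeholder for one of your remote hosts
    ssh node2 'which mpirun orted; echo $PATH; echo $LD_LIBRARY_PATH'

and compare the result with the prefix you built into (in your case
/home/bordage/modules/openmpi/openmpi-debug).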

Sent from my phone. No type good. 

> On Feb 13, 2017, at 8:14 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> 
> Cyril,
> 
> Are you running your jobs via a batch manager?
> If yes, was support for it built correctly?
> 
> If you were able to get a core dump, can you post the gdb stack trace?
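> One way to do that, as a sketch (assuming core dumps are enabled via
> "ulimit -c unlimited", and that the crashing process is the orted from
> your build prefix; the core-file path is a placeholder):
> 
>     gdb /home/bordage/modules/openmpi/openmpi-debug/bin/orted /path/to/core
>     (gdb) bt
>     (gdb) thread apply all bt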
> 
> I guess your nodes have several IP interfaces; you might want to try
> mpirun --mca oob_tcp_if_include eth0 ...
> (replace eth0 with something appropriate if needed)
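> For example (a sketch only; eth0 and the host names are placeholders,
> use whatever "ip addr" or "ifconfig" reports as the interface shared by
> your nodes):
> 
>     # eth0 and the node names below are placeholders
>     ip addr                  # list the IP interfaces on each node
>     mpirun --mca oob_tcp_if_include eth0 -np 3 --host node1,node2,node3 ./a.out
> 
> The analogous btl_tcp_if_include parameter restricts the TCP BTL in the
> same way.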
> 
> Cheers,
> 
> Gilles
> 
> Cyril Bordage <cyril.bord...@inria.fr> wrote:
>> Unfortunately this does not complete the thread: the problem is not
>> solved! It is not an installation problem; I have no previous
>> installation, since I use separate directories.
>> There is nothing MPI-specific in my environment paths; I just use the
>> full path to mpicc and mpirun.
>> 
>> The error depends on which nodes I run on. For example, I can run on
>> node1 and node2, or node1 and node3, or node2 and node3, but not on
>> node1, node2, and node3. With the platform's official version (1.8.1)
>> it works like a charm.
>> 
>> George, maybe you could see it for yourself by connecting to our
>> platform (plafrim), since you have an account. That should make it
>> easier to understand and see our problem.
>> 
>> 
>> Cyril.
>> 
>>> On 10/02/2017 at 18:15, George Bosilca wrote:
>>> To complete this thread, the problem is now solved. Some .so files were
>>> lingering around from a previous installation, causing startup problems.
>>> 
>>>  George.
>>> 
>>> 
>>>> On Feb 10, 2017, at 05:38 , Cyril Bordage <cyril.bord...@inria.fr> wrote:
>>>> 
>>>> Thank you for your answer.
>>>> I am running the git master version (last tested was cad4c03).
>>>> 
>>>> FYI, Clément Foyer is talking with George Bosilca about this problem.
>>>> 
>>>> 
>>>> Cyril.
>>>> 
>>>>> On 08/02/2017 at 16:46, Jeff Squyres (jsquyres) wrote:
>>>>> What version of Open MPI are you running?
>>>>> 
>>>>> The error indicates that Open MPI is trying to start a user-level
>>>>> helper daemon on the remote node, and that daemon is segfaulting
>>>>> (which is unusual).
>>>>> 
>>>>> One thing to be aware of:
>>>>> 
>>>>>    https://www.open-mpi.org/faq/?category=building#install-overwrite
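>>>>> If an older build ended up in the same prefix, one conservative fix (a
>>>>> sketch; the prefix is taken from your backtrace, and the source path
>>>>> and configure flags are only illustrative) is to wipe it and reinstall
>>>>> from a clean tree:
>>>>> 
>>>>>    # prefix taken from the backtrace; source path and flags are illustrative
>>>>>    rm -rf /home/bordage/modules/openmpi/openmpi-debug
>>>>>    cd /path/to/openmpi-source && make distclean
>>>>>    ./configure --prefix=/home/bordage/modules/openmpi/openmpi-debug --enable-debug
>>>>>    make -j 8 all install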
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Feb 6, 2017, at 8:14 AM, Cyril Bordage <cyril.bord...@inria.fr> wrote:
>>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> I cannot run a program with MPI when I compile Open MPI myself.
>>>>>> On some nodes I have the following error:
>>>>>> ================================================================================
>>>>>> [mimi012:17730] *** Process received signal ***
>>>>>> [mimi012:17730] Signal: Segmentation fault (11)
>>>>>> [mimi012:17730] Signal code: Address not mapped (1)
>>>>>> [mimi012:17730] Failing at address: 0xf8
>>>>>> [mimi012:17730] [ 0] /lib64/libpthread.so.0(+0xf500)[0x7ffff66c0500]
>>>>>> [mimi012:17730] [ 1]
>>>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_priority_set+0xa9)[0x7ffff781fcb9]
>>>>>> [mimi012:17730] [ 2]
>>>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xebcd)[0x7ffff197fbcd]
>>>>>> [mimi012:17730] [ 3]
>>>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_peer_accept+0xa1)[0x7ffff1981e34]
>>>>>> [mimi012:17730] [ 4]
>>>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xab1d)[0x7ffff197bb1d]
>>>>>> [mimi012:17730] [ 5]
>>>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)[0x7ffff782323c]
>>>>>> [mimi012:17730] [ 6]
>>>>>> /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(+0x3d34c)[0x7ffff77c534c]
>>>>>> [mimi012:17730] [ 7] /lib64/libpthread.so.0(+0x7851)[0x7ffff66b8851]
>>>>>> [mimi012:17730] [ 8] /lib64/libc.so.6(clone+0x6d)[0x7ffff640694d]
>>>>>> [mimi012:17730] *** End of error message ***
>>>>>> --------------------------------------------------------------------------
>>>>>> ORTE has lost communication with its daemon located on node:
>>>>>> 
>>>>>> hostname:  mimi012
>>>>>> 
>>>>>> This is usually due to either a failure of the TCP network
>>>>>> connection to the node, or possibly an internal failure of
>>>>>> the daemon itself. We cannot recover from this failure, and
>>>>>> therefore will terminate the job.
>>>>>> --------------------------------------------------------------------------
>>>>>> ================================================================================
>>>>>> 
>>>>>> The error does not appear with the official MPI installed on the
>>>>>> platform. I asked the admins about their compilation options, but
>>>>>> there is nothing unusual about them.
>>>>>> 
>>>>>> Moreover, it appears only for some node lists. Still, the nodes seem
>>>>>> to be fine, since they work with the platform's official version of MPI.
>>>>>> 
>>>>>> To make sure it is not a network problem, I tried using "-mca btl
>>>>>> tcp,sm,self" or "-mca btl openib,sm,self", with no change.
>>>>>> 
>>>>>> Do you have any idea where this error may come from?
>>>>>> 
>>>>>> Thank you.
>>>>>> 
>>>>>> 
>>>>>> Cyril Bordage.
>>>>> 
>>>>> 
>>> 
>>> 
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
