Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-12-02 Thread Gilles Gouaillardet
t; <mailto:al...@cs.ucla.edu> > wrote:
>> Thanks Paul! Unfortunately '/boot' is not available in my embedded linux, 
>> and I do not have the configuration file for the old kernel since it is 
>> provided as is. However, I have the new kernel configuration since I 
>> compiled it myself. Would it be helpful if I provide you the .config file 
>> when I compile the kernel? It may be quite painful to look through that file 
>> though. Is there any other way that I can obtain the configuration?
>>
>> I checked my config for the new kernel, and UNIX-domain sockets and Sys V 
>> IPC are both enabled in the build. Are there any other possibilities I can 
>> check?
>>
>> Thanks,
>> Di
>>
>> --
>> Di Wu (Allan)
>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>> Department of Computer Science, UC Los Angeles
>> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu> 
>>
>> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove <mailto:phhargr...@lbl.gov> wrote:
>> Allan,
>>
>> A likely possibility is that some important kernel feature (that Open MPI 
>> assumes is present) is missing.
>> That includes not only "kernel modules" as you mention, but also features 
>> configured in (or out) of the base kernel.
>> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC 
>> support.
>>
>> If you can send me (preferably off-list) the kernel config files for the old 
>> and new kernels I may be able to spot something.
>> If present, you are looking for /boot/config-[VERSION]
>>
>> -Paul
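
A minimal sketch of the check suggested above, under stated assumptions: the kernel config is available as text, either from the build tree's .config or from /proc/config.gz (which exists only if the kernel was built with CONFIG_IKCONFIG_PROC). CONFIG_UNIX and CONFIG_SYSVIPC are the Kconfig symbols for UNIX-domain sockets and System V IPC; the function names are hypothetical and not part of Open MPI.

```python
import gzip


def parse_kernel_config(text):
    """Map each CONFIG_ symbol in a kernel .config to its value ('n' if unset)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Disabled options appear as comments: "# CONFIG_FOO is not set"
            options[line[2:-len(" is not set")]] = "n"
    return options


def missing_options(options, required=("CONFIG_UNIX", "CONFIG_SYSVIPC")):
    """Return the required symbols that are neither built in (y) nor modules (m)."""
    return [name for name in required if options.get(name, "n") not in ("y", "m")]


def load_running_kernel_config(path="/proc/config.gz"):
    """Read the running kernel's config; assumes CONFIG_IKCONFIG_PROC is enabled."""
    with gzip.open(path, "rt") as handle:
        return parse_kernel_config(handle.read())
```

Running missing_options on the parsed config of both the old and the new kernel gives a short list to compare, instead of reading the whole file by eye.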
>>
>> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>> I'm sorry I forgot to change the subject when I replied to the digest issue. 
>> Please find my original email below.
>>
>> Regards,
>> Di
>>
>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>> Thanks Ralph for the reply. Sorry about the log file; I think I forgot to 
>> add an extension to the file name. Please find a new one attached to this email.
>>
>> I'm sorry there is not enough debugging information, but 'ompi_info' and 
>> '--debug-devel' are the only ways I know of to collect information. Are 
>> there any other things I can try to provide more info?
>>
>> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output is 
>> the logging information in my last email. It got stuck at  "[fpga1:00718] 
>> tmp: /tmp", and nothing from my helloworld program is printed out to the 
>> screen. So I think it is mpirun failing to start my executable, not failing 
>> to terminate.
>>
>> I was wondering if this has anything to do with my newer kernel version, 
>> since it works well in the old case.
>>
>> Thanks,
>> --
>> Di Wu (Allan)
>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>> Department of Computer Science, UC Los Angeles
>> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
>>
>>
>> Date: Tue, 25 Nov 2014 07:29:51 -0800
>> From: Ralph Castain <mailto:r...@open-mpi.org>
>> To: Open MPI Developers <mailto:de...@open-mpi.org>
>> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
>> execution on an embedded ARM Linux kernel version 3.15.0
>> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I don't know what you put in that log file, but it was an executable and I'm 
>> not feeling that trusting :-)
>>
>> I'm afraid there isn't enough debug output there to really tell anything. 
>> From what little I can see, I'm guessing that the application ran fine and 
>> you got the usual "hello" output and the helloworld process exited safely - 
>> is that correct? And so it is solely mpirun that is failing to cleanly 
>> terminate?
>>
>> On Nov 24, 2014, at 11:24 PM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>>
>> Hello everyone,
>>
>> I have cross-compiled OpenMPI for an embedded ARM Linux. Everything works 
>> fine for my system based on Linux 3.8.0. I have previously submitted a post 
>> related to my com

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-12-02 Thread Paul Hargrove
 That includes not only "kernel modules" as you mention, but also features 
> configured in (or out) of the base kernel.
> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC 
> support.
>
> If you can send me (preferably off-list) the kernel config files for the old 
> and new kernels I may be able to spot something.
> If present, you are looking for /boot/config-[VERSION]
>
> -Paul
>
> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
> I'm sorry I forgot to change the subject when I replied to the digest issue. 
> Please find my original email below.
>
> Regards,
> Di
>
> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
> Thanks Ralph for the reply. Sorry about the log file; I think I forgot to add 
> an extension to the file name. Please find a new one attached to this email.
>
> I'm sorry there is not enough debugging information, but 'ompi_info' and 
> '--debug-devel' are the only ways I know of to collect information. Are 
> there any other things I can try to provide more info?
>
> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output is 
> the logging information in my last email. It got stuck at  "[fpga1:00718] 
> tmp: /tmp", and nothing from my helloworld program is printed out to the 
> screen. So I think it is mpirun failing to start my executable, not failing 
> to terminate.
>
> I was wondering if this has anything to do with my newer kernel version, 
> since it works well in the old case.
>
> Thanks,
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
>
>
> Date: Tue, 25 Nov 2014 07:29:51 -0800
> From: Ralph Castain <mailto:r...@open-mpi.org>
> To: Open MPI Developers <mailto:de...@open-mpi.org>
> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
> execution on an embedded ARM Linux kernel version 3.15.0
> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
> Content-Type: text/plain; charset="utf-8"
>
> I don't know what you put in that log file, but it was an executable and I'm 
> not feeling that trusting :-)
>
> I'm afraid there isn't enough debug output there to really tell anything. 
> From what little I can see, I'm guessing that the application ran fine and 
> you got the usual "hello" output and the helloworld process exited safely - 
> is that correct? And so it is solely mpirun that is failing to cleanly 
> terminate?
>
> On Nov 24, 2014, at 11:24 PM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>
> Hello everyone,
>
> I have cross-compiled OpenMPI for an embedded ARM Linux. Everything works 
> fine for my system based on Linux 3.8.0. I have previously submitted a post 
> related to my compilation, which can be found here: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. When I 
> recently upgraded my Linux kernel to 3.15.0, mpirun began to get stuck even 
> on the helloworld program. The program uses only simple APIs: MPI_Init, 
> MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem occurs even with 
> 'mpirun -np 1 ./helloworld', and below is the output with --debug-devel 
> (before it got stuck):
> [fpga1:00716] sess_dir_finalize: job session dir not empty - leaving
> [fpga1:00716] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0/0
> [fpga1:00716] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0
> [fpga1:00716] top: openmpi-sessions-root@fpga1_0
> [fpga1:00716] tmp: /tmp
> [fpga1:00718] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1/0
> [fpga1:00718] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1
> [fpga1:00718] top: openmpi-sessions-root@fpga1_0
> [fpga1:00718] tmp: /tmp
>
> I suspect maybe it is due to incompatible kernel version or some missing 
> kernel modules. I tried also with the latest version 1.8.3, and had the same 
> problem. Does anyone have any thoughts? I have attached the output of 
> 'ompi-

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-12-02 Thread Ralph Castain
..@open-mpi.org> <mailto:r...@open-mpi.org> 
>>>>> <mailto:r...@open-mpi.org>> wrote:
>>>>> This is all running on a single node, correct? If so, did you configure 
>>>>> OMPI with --enable-debug? 
>>>>> If you can do that, or already have, then let’s add the following to 
>>>>> the mpirun cmd line: 
>>>>> 
>>>>> -mca state_base_verbose 10 -mca odls_base_verbose 10 -mca 
>>>>> oob_base_verbose 10 
>>>>> 
>>>>> You’ll get a bunch of output, but hopefully it will tell us where 
>>>>> mpirun is encountering a problem. 
>>>>> Ralph 
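
As an illustration only, the suggested command line can be assembled programmatically. This sketch uses just the three MCA parameters named above (state_base_verbose, odls_base_verbose, oob_base_verbose); the helper name and its defaults are hypothetical, and ./helloworld is the test program from this thread.

```python
import shlex


def build_mpirun_command(executable, nprocs=1, verbosity=10,
                         frameworks=("state", "odls", "oob")):
    """Build an mpirun argv with <framework>_base_verbose set for each framework."""
    argv = ["mpirun"]
    for framework in frameworks:
        argv += ["-mca", f"{framework}_base_verbose", str(verbosity)]
    argv += ["-np", str(nprocs), executable]
    return argv


# Print the command as a single shell-quoted line for copy and paste.
print(shlex.join(build_mpirun_command("./helloworld")))
```

The resulting list can be passed directly to subprocess.run on the target board, which avoids shell quoting issues when more parameters are added later.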
>>>>> 
>>>>> On Tue, Nov 25, 2014 at 11:20 AM, Paul Hargrove <mailto:phhargr...@lbl.gov> wrote:
>>>>> Allan,
>>>>> 
>>>>> If you send me the .config from your build of the kernel I can compare it 
>>>>> against, for instance, my .config for a Raspberry Pi.
>>>>> There will certainly be many differences, but I am hoping my own 
>>>>> experience configuring linux kernels will help me filter the "noise" from 
>>>>> any differences that might be significant.
>>>>> 
>>>>> -Paul
>>>>> 
>>>>> On Tue, Nov 25, 2014 at 11:11 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>>>>> Thanks Paul! Unfortunately '/boot' is not available in my embedded linux, 
>>>>> and I do not have the configuration file for the old kernel since it is 
>>>>> provided as is. However, I have the new kernel configuration since I 
>>>>> compiled it myself. Would it be helpful if I provide you the .config file 
>>>>> when I compile the kernel? It may be quite painful to look through that 
>>>>> file though. Is there any other way that I can obtain the configuration? 
>>>>> 
>>>>> I checked my config for the new kernel, and UNIX-domain sockets and Sys V 
>>>>> IPC are both enabled in the build. Are there any other possibilities I 
>>>>> can check?
>>>>> 
>>>>> Thanks,
>>>>> Di
>>>>> 
>>>>> --
>>>>> Di Wu (Allan)
>>>>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>>>> Department of Computer Science, UC Los Angeles
>>>>> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
>>>>> 
>>>>> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove <mailto:phhargr...@lbl.gov> wrote:
>>>>> Allan,
>>>>> 
>>>>> A likely possibility is that some important kernel feature (that Open MPI 
>>>>> assumes is present) is missing.
>>>>> That includes not only "kernel modules" as you mention, but also features 
>>>>> configured in (or out) of the base kernel.
>>>>> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC 
>>>>> support.
>>>>> 
>>>>> If you can send me (preferably off-list) the kernel config files for the 
>>>>> old and new kernels I may be able to spot something.
>>>>> If present, you are looking for /boot/config-[VERSION]
>>>>> 
>>>>> -Paul
>>>>> 
>>>>> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>>>>> I'm sorry I forgot to change the subject when I replied to the digest 
>>>>> issue. Please find my original email below. 
>>>>> 
>>>>> Regards,
>>>>> Di
>>>>> 
>>>>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>>>>> Thanks Ralph for the reply. Sorry about the log file, I think I forgot to 
>>>>> put an extension to the file. Pleas

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
 
>>>> any differences that might be significant.
>>>> 
>>>> -Paul
>>>> 
>>>> On Tue, Nov 25, 2014 at 11:11 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>>>> Thanks Paul! Unfortunately '/boot' is not available in my embedded linux, 
>>>> and I do not have the configuration file for the old kernel since it is 
>>>> provided as is. However, I have the new kernel configuration since I 
>>>> compiled it myself. Would it be helpful if I provide you the .config file 
>>>> when I compile the kernel? It may be quite painful to look through that 
>>>> file though. Is there any other way that I can obtain the configuration? 
>>>> 
>>>> I checked my config for the new kernel, and UNIX-domain sockets and Sys V 
>>>> IPC are both enabled in the build. Are there any other possibilities I can 
>>>> check?
>>>> 
>>>> Thanks,
>>>> Di
>>>> 
>>>> --
>>>> Di Wu (Allan)
>>>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>>> Department of Computer Science, UC Los Angeles
>>>> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
>>>> 
>>>> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove <mailto:phhargr...@lbl.gov> wrote:
>>>> Allan,
>>>> 
>>>> A likely possibility is that some important kernel feature (that Open MPI 
>>>> assumes is present) is missing.
>>>> That includes not only "kernel modules" as you mention, but also features 
>>>> configured in (or out) of the base kernel.
>>>> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC 
>>>> support.
>>>> 
>>>> If you can send me (preferably off-list) the kernel config files for the 
>>>> old and new kernels I may be able to spot something.
>>>> If present, you are looking for /boot/config-[VERSION]
>>>> 
>>>> -Paul
>>>> 
>>>> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>>>> I'm sorry I forgot to change the subject when I replied to the digest issue. 
>>>> Please find my original email below. 
>>>> 
>>>> Regards,
>>>> Di
>>>> 
>>>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>>>> Thanks Ralph for the reply. Sorry about the log file; I think I forgot to 
>>>> add an extension to the file name. Please find a new one attached to this 
>>>> email. 
>>>> 
>>>> I'm sorry there is not enough debugging information, but 'ompi_info' and 
>>>> '--debug-devel' are the only ways I know of to collect information. Are 
>>>> there any other things I can try to provide more info?
>>>> 
>>>> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output 
>>>> is the logging information in my last email. It got stuck at  
>>>> "[fpga1:00718] tmp: /tmp", and nothing from my helloworld program is 
>>>> printed out to the screen. So I think it is mpirun failing to start my 
>>>> executable, not failing to terminate.
>>>> 
>>>> I was wondering if this has anything to do with my newer kernel version, 
>>>> since it works well in the old case. 
>>>> 
>>>> Thanks,
>>>> --
>>>> Di Wu (Allan)
>>>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>>> Department of Computer Science, UC Los Angeles
>>>> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
>>>> 
>>>> 
>>>> Date: Tue, 25 Nov 2014 07:29:51 -0800
>>>> From: Ralph Castain mailto:r...@open-mpi.org> 
>>>> <mailto:r...@open-mpi.org> <mailto

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Gilles Gouaillardet
he base kernel.
>>> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC 
>>> support.
>>>
>>> If you can send me (preferably off-list) the kernel config files for the 
>>> old and new kernels I may be able to spot something.
>>> If present, you are looking for /boot/config-[VERSION]
>>>
>>> -Paul
>>>
>>> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>>> I'm sorry I forgot to change the subject when I replied to the digest issue. 
>>> Please find my original email below. 
>>>
>>> Regards,
>>> Di
>>>
>>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>>> Thanks Ralph for the reply. Sorry about the log file; I think I forgot to 
>>> add an extension to the file name. Please find a new one attached to this 
>>> email. 
>>>
>>> I'm sorry there is not enough debugging information, but 'ompi_info' and 
>>> '--debug-devel' are the only ways I know of to collect information. Are 
>>> there any other things I can try to provide more info?
>>>
>>> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output is 
>>> the logging information in my last email. It got stuck at  "[fpga1:00718] 
>>> tmp: /tmp", and nothing from my helloworld program is printed out to the 
>>> screen. So I think it is mpirun failing to start my executable, not failing 
>>> to terminate.
>>>
>>> I was wondering if this has anything to do with my newer kernel version, 
>>> since it works well in the old case. 
>>>
>>> Thanks,
>>> --
>>> Di Wu (Allan)
>>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>> Department of Computer Science, UC Los Angeles
>>> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
>>>
>>>
>>> Date: Tue, 25 Nov 2014 07:29:51 -0800
>>> From: Ralph Castain <mailto:r...@open-mpi.org>
>>> To: Open MPI Developers <mailto:de...@open-mpi.org>
>>> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
>>> execution on an embedded ARM Linux kernel version 3.15.0
>>> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
>>> Content-Type: text/plain; charset="utf-8"
>>>
>>> I don't know what you put in that log file, but it was an executable and 
>>> I'm not feeling that trusting :-)
>>>
>>> I'm afraid there isn't enough debug output there to really tell anything. 
>>> From what little I can see, I'm guessing that the application ran fine and 
>>> you got the usual "hello" output and the helloworld process exited safely - 
>>> is that correct? And so it is solely mpirun that is failing to cleanly 
>>> terminate?
>>>
>>>
>>>> On Nov 24, 2014, at 11:24 PM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>>>>
>>>> Hello everyone,
>>>>
>>>> I have cross-compiled OpenMPI for an embedded ARM Linux. Everything works 
>>>> fine for my system based on Linux 3.8.0. I have previously submitted a 
>>>> post related to my compilation, which can be found here: 
>>>> http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. When I 
>>>> recently upgraded my Linux kernel to 3.15.0, mpirun began to get stuck 
>>>> even on the helloworld program. The program uses only simple APIs: 
>>>> MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem occurs 
>>>> even with 'mpirun -np 1 ./helloworld', and below is the output with 
>>>> --debug-devel (before it got stuck):
>>>> [fpga1:00716] sess_dir_finalize: job session dir not empty - leaving
>>>> [fpga1:00716] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0/0
>>>> [fpga1:00716] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0
>>>> [fpga1:00716] top: openmpi-sessions-root@fpga1_0
>>>> [fpga1:00716] tmp: /tmp
>>>> [fpga1:00718] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1/0

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
ience, UC Los Angeles
>> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
>> 
>> On Tue, Nov 25, 2014 at 11:55 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>> Thanks Ralph!
>> 
>> I did not compile my openmpi with --enable-debug, and I am compiling it now. 
>> But your suggested command already provided some output, which I attached 
>> with this email. 
>> 
>> It seems the process was stuck on the line:
>> "[fpga2:00962] [[44848,1],0] waiting for connect completion to [[44848,0],0] 
>> - activating send event"
>> 
>> Then it got stuck and I CTRL+C'ed it. Previous to that line, it said 
>> something about 'orte_tcp_peer_try_connect: attempting to connect to proc 
>> [[44848,0],0] via interface eth0'.
>> 
>> Regards,
>> Di
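
The "waiting for connect completion" symptom can be probed outside of Open MPI with a plain TCP self-connection. The following is a minimal sketch, not Open MPI's own code: it listens on an ephemeral port of a local address and tries to connect to it with a timeout. The function name is hypothetical, and the address to test (for example the one assigned to eth0) is an assumption to be supplied by the reader.

```python
import socket


def can_connect_to_self(address="127.0.0.1", timeout=2.0):
    """Return True if a TCP connection to a listener on `address` completes."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind((address, 0))      # port 0: let the OS pick a free port
            server.listen(1)
            port = server.getsockname()[1]
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
                client.settimeout(timeout)
                client.connect((address, port))
                return True
    except OSError:
        # Covers timeouts, refused connections, and bind failures alike.
        return False


if __name__ == "__main__":
    print(can_connect_to_self())
```

If this returns False for the board's own eth0 address while the same address is reachable from another machine, the problem is likely in the local network stack rather than in mpirun.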
>> 
>> On Tue, Nov 25, 2014 at 2:25 PM, Ralph Castain <mailto:r...@open-mpi.org> wrote:
>> This is all running on a single node, correct? If so, did you configure OMPI 
>> with --enable-debug? 
>> If you can do that, or already have, then let’s add the following to the 
>> mpirun cmd line: 
>> 
>> -mca state_base_verbose 10 -mca odls_base_verbose 10 -mca oob_base_verbose 
>> 10 
>> 
>> You’ll get a bunch of output, but hopefully it will tell us where mpirun 
>> is encountering a problem. 
>> Ralph 
>> 
>> On Tue, Nov 25, 2014 at 11:20 AM, Paul Hargrove <mailto:phhargr...@lbl.gov> wrote:
>> Allan,
>> 
>> If you send me the .config from your build of the kernel I can compare it 
>> against, for instance, my .config for a Raspberry Pi.
>> There will certainly be many differences, but I am hoping my own experience 
>> configuring linux kernels will help me filter the "noise" from any 
>> differences that might be significant.
>> 
>> -Paul
>> 
>> On Tue, Nov 25, 2014 at 11:11 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>> Thanks Paul! Unfortunately '/boot' is not available in my embedded linux, 
>> and I do not have the configuration file for the old kernel since it is 
>> provided as is. However, I have the new kernel configuration since I 
>> compiled it myself. Would it be helpful if I provide you the .config file 
>> when I compile the kernel? It may be quite painful to look through that file 
>> though. Is there any other way that I can obtain the configuration? 
>> 
>> I checked my config for the new kernel, and UNIX-domain sockets and Sys V 
>> IPC are both enabled in the build. Are there any other possibilities I can 
>> check?
>> 
>> Thanks,
>> Di
>> 
>> --
>> Di Wu (Allan)
>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>> Department of Computer Science, UC Los Angeles
>> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
>> 
>> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove <mailto:phhargr...@lbl.gov> wrote:
>> Allan,
>> 
>> A likely possibility is that some important kernel feature (that Open MPI 
>> assumes is present) is missing.
>> That includes not only "kernel modules" as you mention, but also features 
>> configured in (or out) of the base kernel.
>> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC 
>> support.
>> 
>> If you can send me (preferably off-list) the kernel config files for the old 
>> and new kernels I may be able to spot something.
>> If present, you are looking for /boot/config-[VERSION]
>> 
>> -Paul
>> 
>> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>> I'm sorry I forgot to change the subject when I replied to the digest issue. 
>> Please find my original email below. 
>> 
>> Regards,
>> Di
>> 
>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>> Thanks Ralph for the reply. Sorry about the log file; I think I forgot to 
>> add an extension to the file name. Please find a new one attached to this 
>> email. 
>> 
>> I'm sorry there is not enough debugging information, but 'ompi_info' and 
>> '--debug-devel' are the only ways I know of to collect information. Are 
>> there any other things I can try to provide more info?
>> 
>> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output is 
>> the logging information in my last email. It got stuck at  "[fpga1:00718] 
>> tmp: /tmp", and nothing from my helloworld program is printed out to 

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
On Tue, Nov 25, 2014 at 5:37 PM, Ralph Castain  wrote:

> So it looks like the issue isn't so much with our code as it is with the
> OS stack, yes? We aren't requiring that the loopback be "up", but the stack
> does require it in order to establish the connection, even when we are
> trying a non-lo interface.



Correct, as far as I can tell.
It looks to me as if the stack says "Hey, that is my own address" and uses
the loopback interface instead of the one associated with the address.

I have checked Mac OS X and Solaris and neither one exhibits this behavior.
I can, if requested, check {Net,Open,Free}BSD as well.

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
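
Since the behavior described above depends on the loopback interface being up, a quick check on a Linux target is to read the interface flags from sysfs. This is a minimal sketch assuming a Linux system with /sys mounted; IFF_UP (0x1) is the flag bit defined in <linux/if.h>, and the helper names are hypothetical.

```python
from pathlib import Path

IFF_UP = 0x1  # interface is administratively up (value from <linux/if.h>)


def parse_flags(text):
    """Parse the hexadecimal flags string found in /sys/class/net/<if>/flags."""
    return int(text.strip(), 16)


def is_up(flags):
    """Return True if the IFF_UP bit is set in the given flags value."""
    return bool(flags & IFF_UP)


def interface_is_up(name, sysfs_root="/sys/class/net"):
    """Return True/False for an existing interface, or None if it cannot be read."""
    flags_path = Path(sysfs_root) / name / "flags"
    try:
        return is_up(parse_flags(flags_path.read_text()))
    except OSError:
        return None


if __name__ == "__main__":
    print("lo up:", interface_is_up("lo"))
```

On a board where this reports False for lo, bringing the interface up (for example with "ifconfig lo up" or "ip link set lo up") before running mpirun is the obvious thing to try.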


Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
hink of checking the lo 
>> interface. Anyway, thanks everyone for all of your kind help. Let me know if 
>> you want me to provide any more information for future references. 
>> 
>> Regards,
>> Allan
>> 
>> --
>> Di Wu (Allan)
>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>> Department of Computer Science, UC Los Angeles
>> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
>> 
>> On Tue, Nov 25, 2014 at 11:55 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>> Thanks Ralph!
>> 
>> I did not compile my openmpi with --enable-debug, and I am compiling it now. 
>> But your suggested command already provided some output, which I attached 
>> with this email. 
>> 
>> It seems the process was stuck on the line:
>> "[fpga2:00962] [[44848,1],0] waiting for connect completion to [[44848,0],0] 
>> - activating send event"
>> 
>> Then it got stuck and I CTRL+C'ed it. Previous to that line, it said 
>> something about 'orte_tcp_peer_try_connect: attempting to connect to proc 
>> [[44848,0],0] via interface eth0'.
>> 
>> Regards,
>> Di
>> 
>> On Tue, Nov 25, 2014 at 2:25 PM, Ralph Castain <mailto:r...@open-mpi.org> wrote:
>> This is all running on a single node, correct? If so, did you configure OMPI 
>> with --enable-debug? 
>> If you can do that, or already have, then let’s add the following to the 
>> mpirun cmd line: 
>> 
>> -mca state_base_verbose 10 -mca odls_base_verbose 10 -mca oob_base_verbose 
>> 10 
>> 
>> You’ll get a bunch of output, but hopefully it will tell us where mpirun 
>> is encountering a problem. 
>> Ralph 
>> 
>> On Tue, Nov 25, 2014 at 11:20 AM, Paul Hargrove <mailto:phhargr...@lbl.gov> wrote:
>> Allan,
>> 
>> If you send me the .config from your build of the kernel I can compare it 
>> against, for instance, my .config for a Raspberry Pi.
>> There will certainly be many differences, but I am hoping my own experience 
>> configuring linux kernels will help me filter the "noise" from any 
>> differences that might be significant.
>> 
>> -Paul
>> 
>> On Tue, Nov 25, 2014 at 11:11 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>> Thanks Paul! Unfortunately '/boot' is not available in my embedded linux, 
>> and I do not have the configuration file for the old kernel since it is 
>> provided as is. However, I have the new kernel configuration since I 
>> compiled it myself. Would it be helpful if I provide you the .config file 
>> when I compile the kernel? It may be quite painful to look through that file 
>> though. Is there any other way that I can obtain the configuration? 
>> 
>> I checked my config for the new kernel, and UNIX-domain sockets and Sys V 
>> IPC are both enabled in the build. Are there any other possibilities I can 
>> check?
>> 
>> Thanks,
>> Di
>> 
>> --
>> Di Wu (Allan)
>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>> Department of Computer Science, UC Los Angeles
>> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
>> 
>> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove <mailto:phhargr...@lbl.gov> wrote:
>> Allan,
>> 
>> A likely possibility is that some important kernel feature (that Open MPI 
>> assumes is present) is missing.
>> That includes not only "kernel modules" as you mention, but also features 
>> configured in (or out) of the base kernel.
>> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC 
>> support.
>> 
>> If you can send me (preferably off-list) the kernel config files for the old 
>> and new kernels I may be able to spot something.
>> If present, you are looking for /boot/config-[VERSION]
>> 
>> -Paul
>> 
>> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>> I'm sorry I forgot to change the subject when I replied to the digest issue. 
>> Please find my original email below. 
>> 
>> Regards,
>> Di
>> 
>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu <mailto:al...@cs.ucla.edu> wrote:
>> Thanks Ralph for the reply. Sorry about the log file; I think I forgot to 
>> add an extension to the file name. Please find a new one attached to this 
>> email. 
>> 
>> I'm sorry there is not enough debugging information, but 'ompi_info' and 
>> '--debug-devel' are the only w

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
848,0],0] - activating send event"
>>>
>>> Then it got stuck and I CTRL+C'ed it. Previous to that line, it said
>>> something about 'orte_tcp_peer_try_connect: attempting to connect to proc
>>> [[44848,0],0] via interface eth0'.
>>>
>>>
>>> Regards,
>>> Di
>>>
>>> On Tue, Nov 25, 2014 at 2:25 PM, Ralph Castain  wrote:
>>>
>>>> This is all running on a single node, correct? If so, did you configure
>>>> OMPI with --enable-debug?
>>>>
>>>> If you can do that, or already have, then let's add the following to
>>>> the mpirun cmd line:
>>>>
>>>> -mca state_base_verbose 10 -mca odls_base_verbose 10 -mca
>>>> oob_base_verbose 10
>>>>
>>>> You'll get a bunch of output, but hopefully it will tell us where
>>>> mpirun is encountering a problem.
>>>> Ralph
>>>> On Tue, Nov 25, 2014 at 11:20 AM, Paul Hargrove 
>>>> wrote:
>>>>
>>>>> Allan,
>>>>>
>>>>> If you send me the .config from your build of the kernel I can compare
>>>>> it against, for instance, my .config for a Raspberry Pi.
>>>>> There will certainly be many differences, but I am hoping my own
>>>>> experience configuring linux kernels will help me filter the "noise" from
>>>>> any differences that might be significant.
>>>>>
>>>>> -Paul
>>>>>
>>>>> On Tue, Nov 25, 2014 at 11:11 AM, Allan Wu  wrote:
>>>>>
>>>>>> Thanks Paul! Unfortunately '/boot' is not available in my embedded
>>>>>> linux, and I do not have the configuration file for the old kernel since
>>>>>> it is provided as is. However, I have the new kernel configuration since I
>>>>>> compiled it myself. Would it be helpful if I provide you the .config file
>>>>>> when I compile the kernel? It may be quite painful to look through that
>>>>>> file though. Is there any other way that I can obtain the configuration?
>>>>>>
>>>>>> I checked my config for the new kernel, and UNIX-domain sockets and
>>>>>> Sys V IPC are both enabled in the build. Are there any other
>>>>>> possibilities I can check?
>>>>>>
>>>>>> Thanks,
>>>>>> Di
>>>>>>
>>>>>> --
>>>>>> Di Wu (Allan)
>>>>>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>>>>> Department of Computer Science, UC Los Angeles
>>>>>> Email: al...@cs.ucla.edu
>>>>>>
>>>>>> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove 
>>>>>> wrote:
>>>>>>
>>>>>>> Allan,
>>>>>>>
>>>>>>> A likely possibility is that some important kernel feature (that
>>>>>>> Open MPI assumes is present) is missing.
>>>>>>> That includes not only "kernel modules" as you mention, but also
>>>>>>> features configured in (or out) of the base kernel.
>>>>>>> For instance, some embedded kernels omit UNIX-domain sockets and
>>>>>>> SysV IPC support.
>>>>>>>
>>>>>>> If you can send me (preferably off-list) the kernel config files for
>>>>>>> the old and new kernels I may be able to spot something.
>>>>>>> If present, you are looking for /boot/config-[VERSION]
>>>>>>>
>>>>>>> -Paul
>>>>>>>
>>>>>>> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm sorry I forgot to change the subject when I reply to the digest
>>>>>>>> issue. Please find my original email below.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Di
>>>>>>>>
>>>>>>>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Ralph for the reply. Sorry about the log file, I think I
>>>>>>>>> forgot to put a

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
 25, 2014 at 10:45 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Allan,
> 
> A likely possibility is that some important kernel feature (that Open MPI 
> assumes is present) is missing.
> That includes not only "kernel modules" as you mention, but also features
> configured in (or out) of the base kernel.
> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC
> support.
>
> If you can send me (preferably off-list) the kernel config files for the old
> and new kernels I may be able to spot something.
> If present, you are looking for /boot/config-[VERSION]
> 
> -Paul
> 
> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu <al...@cs.ucla.edu> wrote:
> I'm sorry I forgot to change the subject when I reply to the digest issue. 
> Please find my original email below. 
> 
> Regards,
> Di
> 
> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu <al...@cs.ucla.edu> wrote:
> Thanks Ralph for the reply. Sorry about the log file, I think I forgot to put
> an extension to the file. Please find a new one attached with this email.
>
> I'm sorry for not enough debugging information, but 'ompi_info' and
> '--debug-devel' are the only ways I know for collecting information. Are
> there any other things I can try to provide more info?
> 
> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output is 
> the logging information in my last email. It got stuck at  "[fpga1:00718] 
> tmp: /tmp", and nothing from my helloworld program is printed out to the 
> screen. So I think it is mpirun failing to start my executable, not failing 
> to terminate.
> 
> I was wondering if this has anything to do with my newer kernel version, 
> since it works well in the old case. 
> 
> Thanks,
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu
>
>
> Date: Tue, 25 Nov 2014 07:29:51 -0800
> From: Ralph Castain <r...@open-mpi.org>
> To: Open MPI Developers <de...@open-mpi.org>
> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
> execution on an embedded ARM Linux kernel version 3.15.0
> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
> Content-Type: text/plain; charset="utf-8"
>
> I don't know what you put in that log file, but it was an executable and I'm
> not feeling that trusting :-)
>
> I'm afraid there isn't enough debug output there to really tell anything.
> From what little I can see, I'm guessing that the application ran fine and
> you got the usual "hello" output and the helloworld process exited safely -
> is that correct? And so it is solely mpirun that is failing to cleanly
> terminate?
>
>
> > On Nov 24, 2014, at 11:24 PM, Allan Wu <al...@cs.ucla.edu> wrote:
> >
> > Hello everyone,
> >
> > I have cross-compiled OpenMPI for an embedded ARM Linux. Everything works
> > fine for my system based on Linux 3.8.0. I have previously submitted a post
> > related to my compilation, which can be found here:
> > http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. When I
> > recently upgraded my Linux kernel to 3.15.0, mpirun began to hang on even
> > the helloworld program. The program uses only simple APIs: MPI_Init,
> > MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem occurs even with
> > 'mpirun -np 1 ./helloworld', and below is the output with --debug-devel
> > (before it got stuck):
> > [fpga1:00716] sess_dir_finalize: job session dir not empty - leaving
> > [fpga1:00716] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0/0
> > [fpga1:00716] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0
> > [fpga1:00716] top: openmpi-sessions-root@fpga1_0
> > [fpga1:00716] tmp: /tmp
> > [fpga1:00718] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1/0
> > [fpga1:00718] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1
> > [fpga1:00718] top: openmpi-sessions-root@fpga1_0
> > [fpga1:00718] tmp: /tmp
> >
> > I suspect maybe it is due to incompatible kernel version or some missing 
> > kernel modules. I tried also with the latest version 1.8.3, and had the 

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
>>>>> Di
>>>>>
>>>>> --
>>>>> Di Wu (Allan)
>>>>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>>>> Department of Computer Science, UC Los Angeles
>>>>> Email: al...@cs.ucla.edu
>>>>>
>>>>> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove 
>>>>> wrote:
>>>>>
>>>>>> Allan,
>>>>>>
>>>>>> A likely possibility is that some important kernel feature (that Open
>>>>>> MPI assumes is present) is missing.
>>>>>> That includes not only "kernel modules" as you mention, but also
>>>>>> features configured in (or out) of the base kernel.
>>>>>> For instance, some embedded kernels omit UNIX-domain sockets and SysV
>>>>>> IPC support.
>>>>>>
>>>>>> If you can send me (preferably off-list) the kernel config files for
>>>>>> the old and new kernels I may be able to spot something.
>>>>>> If present, you are looking for /boot/config-[VERSION]
>>>>>>
>>>>>> -Paul
>>>>>>
>>>>>> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu  wrote:
>>>>>>
>>>>>>> I'm sorry I forgot to change the subject when I reply to the digest
>>>>>>> issue. Please find my original email below.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Di
>>>>>>>
>>>>>>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Ralph for the reply. Sorry about the log file, I think I
>>>>>>>> forgot to put an extension to the file. Please find a new one
>>>>>>>> attached with this email.
>>>>>>>>
>>>>>>>> I'm sorry for not enough debugging information, but 'ompi_info' and
>>>>>>>> '--debug-devel' are the only ways I know for collecting information.
>>>>>>>> Are there any other things I can try to provide more info?
>>>>>>>>
>>>>>>>> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the
>>>>>>>> output is the logging information in my last email. It got stuck at
>>>>>>>> "[fpga1:00718] tmp: /tmp", and nothing from my helloworld program
>>>>>>>> is printed out to the screen. So I think it is mpirun failing to
>>>>>>>> start my executable, not failing to terminate.
>>>>>>>>
>>>>>>>> I was wondering if this has anything to do with my newer kernel
>>>>>>>> version, since it works well in the old case.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> --
>>>>>>>> Di Wu (Allan)
>>>>>>>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>>>>>>> Department of Computer Science, UC Los Angeles
>>>>>>>> Email: al...@cs.ucla.edu
>>>>>>>>
>>>>>>>>
>>>>>>>> Date: Tue, 25 Nov 2014 07:29:51 -0800
>>>>>>>> From: Ralph Castain
>>>>>>>> To: Open MPI Developers
>>>>>>>> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
>>>>>>>> execution on an embedded ARM Linux kernel version 3.15.0
>>>>>>>> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
>>>>>>>> Content-Type: text/plain; charset="utf-8"
>>>>>>>>
>>>>>>>> I don't know what you put in that log file, but it was an
>>>>>>>> executable and I'm not feeling that trusting :-)
>>>>>>>>
>>>>>>>> I'm afraid there isn't enough debug output there to really tell
>>>>>>>> anything. From what little I can see, I'm guessing that the
>>>>>>>> application ran fine and you got the usu

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Allan Wu
>>>>> -Paul
>>>>>
>>>>> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu  wrote:
>>>>>
>>>>>> I'm sorry I forgot to change the subject when I reply to the digest
>>>>>> issue. Please find my original email below.
>>>>>>
>>>>>> Regards,
>>>>>> Di
>>>>>>
>>>>>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu  wrote:
>>>>>>
>>>>>>> Thanks Ralph for the reply. Sorry about the log file, I think I
>>>>>>> forgot to put an extension to the file. Please find a new one
>>>>>>> attached with this email.
>>>>>>>
>>>>>>> I'm sorry for not enough debugging information, but 'ompi_info' and
>>>>>>> '--debug-devel' are the only ways I know for collecting information.
>>>>>>> Are there any other things I can try to provide more info?
>>>>>>>
>>>>>>> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the
>>>>>>> output is the logging information in my last email. It got stuck at
>>>>>>> "[fpga1:00718] tmp: /tmp", and nothing from my helloworld program
>>>>>>> is printed out to the screen. So I think it is mpirun failing to
>>>>>>> start my executable, not failing to terminate.
>>>>>>>
>>>>>>> I was wondering if this has anything to do with my newer kernel
>>>>>>> version, since it works well in the old case.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Di Wu (Allan)
>>>>>>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>>>>>> Department of Computer Science, UC Los Angeles
>>>>>>> Email: al...@cs.ucla.edu
>>>>>>>
>>>>>>>
>>>>>>> Date: Tue, 25 Nov 2014 07:29:51 -0800
>>>>>>> From: Ralph Castain
>>>>>>> To: Open MPI Developers
>>>>>>> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
>>>>>>> execution on an embedded ARM Linux kernel version 3.15.0
>>>>>>> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
>>>>>>> Content-Type: text/plain; charset="utf-8"
>>>>>>>
>>>>>>> I don't know what you put in that log file, but it was an executable
>>>>>>> and I'm not feeling that trusting :-)
>>>>>>>
>>>>>>> I'm afraid there isn't enough debug output there to really tell
>>>>>>> anything. From what little I can see, I'm guessing that the
>>>>>>> application ran fine and you got the usual "hello" output and the
>>>>>>> helloworld process exited safely - is that correct? And so it is
>>>>>>> solely mpirun that is failing to cleanly terminate?
>>>>>>>
>>>>>>>
>>>>>>> > On Nov 24, 2014, at 11:24 PM, Allan Wu  wrote:
>>>>>>> >
>>>>>>> > Hello everyone,
>>>>>>> >
>>>>>>> > I have cross-compiled OpenMPI for an embedded ARM Linux.
>>>>>>> > Everything works fine for my system based on Linux 3.8.0. I have
>>>>>>> > previously submitted a post related to my compilation, which can be
>>>>>>> > found here:
>>>>>>> > http://www.open-mpi.org/community/lists/devel/2014/04/14440.php.
>>>>>>> > When I recently upgraded my Linux kernel to 3.15.0, mpirun began to
>>>>>>> > hang on even the helloworld program. The program uses only simple
>>>>>>> > APIs: MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The
>>>>>>> > problem occurs even with 'mpirun -np 1 ./helloworld', and below is
>>>>>>> > the output with --debug-devel (before it got stuck):
>>>>>>> > [fpga1:00716] sess_dir_finalize: job session dir not empty - leaving
>>>>>>> > [fpga1:00716] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0/0
>>>>>>> > [fpga1:00716] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0
>>>>>>> > [fpga1:00716] top: openmpi-sessions-root@fpga1_0
>>>>>>> > [fpga1:00716] tmp: /tmp
>>>>>>> > [fpga1:00718] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1/0
>>>>>>> > [fpga1:00718] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1
>>>>>>> > [fpga1:00718] top: openmpi-sessions-root@fpga1_0
>>>>>>> > [fpga1:00718] tmp: /tmp
>>>>>>> >
>>>>>>> > I suspect maybe it is due to incompatible kernel version or some
>>>>>>> > missing kernel modules. I tried also with the latest version 1.8.3,
>>>>>>> > and had the same problem. Does anyone have any thoughts? I have
>>>>>>> > attached the output of 'ompi_info --all' with this email.
>>>>>>> >
>>>>>>> > Please let me know if I need to provide more information. Thanks
>>>>>>> > in advance!
>>>>>>> >
>>>>>>> > Regards,
>>>>>>> > --
>>>>>>> > Di Wu (Allan)
>>>>>>> > PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>>>>>> > Department of Computer Science, UC Los Angeles
>>>>>>> > Email: al...@cs.ucla.edu
>>>>>>> > ___
>>>>>>> > devel mailing list
>>>>>>> > de...@open-mpi.org
>>>>>>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>> > Link to this post:
>>>>>>> http://www.open-mpi.org/community/lists/devel/2014/11/16330.php
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> ___
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/devel/2014/11/16341.php
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Paul H. Hargrove  phhargr...@lbl.gov
>>>>> Computer Languages & Systems Software (CLaSS) Group
>>>>> Computer Science Department   Tel: +1-510-495-2352
>>>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Paul H. Hargrove  phhargr...@lbl.gov
>>> Computer Languages & Systems Software (CLaSS) Group
>>> Computer Science Department   Tel: +1-510-495-2352
>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>
>>
>>
>


Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
Following Larry's suggestion to use /proc/config.gz, Allan sent me kernel
configs for the old (3.8) and new (3.15) kernels.
While there were more changes than I expected, none relates to removing an
API/feature that Open MPI is likely to be using.

-Paul
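
For anyone repeating this comparison, here is a minimal sketch of how two kernel configs could be diffed so that only the options whose value changed are listed. It is not the method Paul used (the thread does not say how he compared them). The function names are made up for illustration, and treating an option that is absent from one file as "n" is an assumption made to keep the output short.

```python
def parse_kernel_config(text):
    """Map each CONFIG_ option to its value; '# CONFIG_X is not set' becomes 'n'."""
    suffix = " is not set"
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Strip the leading "# " and the trailing " is not set".
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kernel_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that changed."""
    old = parse_kernel_config(old_text)
    new = parse_kernel_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        # Assumption: an option missing from one file is treated as disabled.
        before = old.get(name, "n")
        after = new.get(name, "n")
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    # Hypothetical two-line configs, not the ones exchanged in this thread.
    old_cfg = "CONFIG_UNIX=y\nCONFIG_SYSVIPC=y\n"
    new_cfg = "CONFIG_UNIX=y\n# CONFIG_SYSVIPC is not set\n"
    for option, (before, after) in diff_kernel_configs(old_cfg, new_cfg).items():
        print(f"{option}: {before} -> {after}")
```

Sorting the option names keeps the report stable between runs, which helps when the list of changes is long, as it was here.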

On Tue, Nov 25, 2014 at 11:28 AM, Larry Baker  wrote:

> Allan,
>
> If you can still boot the old embedded system, a lot of times the config
> parameters are saved as /proc/config.gz.  You can at least then compare the
> two configs.
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
>
>
>
> On 25 Nov 2014, at 11:11 AM, Allan Wu wrote:
>
> Thanks Paul! Unfortunately '/boot' is not available in my embedded linux,
> and I do not have the configuration file for the old kernel since it is
> provided as is. However, I have the new kernel configuration since I
> compiled it myself. Would it be helpful if I provide you the .config file
> when I compile the kernel? It may be quite painful to look through that file
> though. Is there any other way that I can obtain the configuration?
>
> I checked my config for the new kernel, and UNIX-domain sockets and Sys V
> IPC are both enabled in the build. Are there any other possibilities I can
> check?
>
> Thanks,
> Di
>
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu
>
> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove 
> wrote:
>
>> Allan,
>>
>> A likely possibility is that some important kernel feature (that Open MPI
>> assumes is present) is missing.
>> That includes not only "kernel modules" as you mention, but also features
>> configured in (or out) of the base kernel.
>> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC
>> support.
>>
>> If you can send me (preferably off-list) the kernel config files for the
>> old and new kernels I may be able to spot something.
>> If present, you are looking for /boot/config-[VERSION]
>>
>> -Paul
>>
>> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu  wrote:
>>
>>> I'm sorry I forgot to change the subject when I reply to the digest
>>> issue. Please find my original email below.
>>>
>>> Regards,
>>> Di
>>>
>>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu  wrote:
>>>
>>>> Thanks Ralph for the reply. Sorry about the log file, I think I forgot
>>>> to put an extension to the file. Please find a new one attached with this
>>>> email.
>>>>
>>>> I'm sorry for not enough debugging information, but 'ompi_info' and
>>>> '--debug-devel' are the only ways I know for collecting information. Are
>>>> there any other things I can try to provide more info?
>>>>
>>>> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the
>>>> output is the logging information in my last email. It got stuck at
>>>>  "[fpga1:00718] tmp: /tmp", and nothing from my helloworld program is
>>>> printed out to the screen. So I think it is mpirun failing to start my
>>>> executable, not failing to terminate.
>>>>
>>>> I was wondering if this has anything to do with my newer kernel
>>>> version, since it works well in the old case.
>>>>
>>>> Thanks,
>>>> --
>>>> Di Wu (Allan)
>>>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>>> Department of Computer Science, UC Los Angeles
>>>> Email: al...@cs.ucla.edu
>>>>
>>>>
>>>> Date: Tue, 25 Nov 2014 07:29:51 -0800
>>>> From: Ralph Castain
>>>> To: Open MPI Developers
>>>> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
>>>> execution on an embedded ARM Linux kernel version 3.15.0
>>>> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
>>>> Content-Type: text/plain; charset="utf-8"
>>>>
>>>> I don't know what you put in that log file, but it was an executable
>>>> and I'm not feeling that trusting :-)
>>>>
>>>> I'm afraid there isn't enough debug output there to really tell
>>>> anything. From what little I can see, I'm guessing that the application ran
>>>> fine and you got the usual "hello" output and the helloworld process exited
>>>> safely - is that correct? And so it is solely mpirun that is 

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Allan Wu
Thanks Ralph!

I did not compile my openmpi with --enable-debug, and I am compiling it
now. But your suggested command already provided some output, which I
attached with this email.

It seems the process was stuck on the line:
"[fpga2:00962] [[44848,1],0] waiting for connect completion to
[[44848,0],0] - activating send event"

After that nothing more was printed, so I CTRL+C'ed it. Before that line,
it said something about 'orte_tcp_peer_try_connect: attempting to connect
to proc [[44848,0],0] via interface eth0'.


Regards,
Di
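
Since the last log line is a TCP connect that never completes, one way to separate a general networking problem from an Open MPI one is to try a plain TCP connection on the same node, outside of MPI. The sketch below is only an assumption about what might be worth testing. It does not reproduce how ORTE's OOB layer connects, and `can_connect` is a made-up helper name. The address to test (the loopback address, or whatever address eth0 has on the board) has to be supplied by hand.

```python
import socket


def can_connect(host, timeout=3.0):
    """Listen on an ephemeral TCP port bound to `host`, then connect to it.

    Returns True if the connection is established and accepted within
    `timeout` seconds, False on any socket error or timeout.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.settimeout(timeout)
        try:
            server.bind((host, 0))  # port 0: let the kernel pick a free port
            server.listen(1)
            port = server.getsockname()[1]
            with socket.create_connection((host, port), timeout=timeout):
                conn, _ = server.accept()
                conn.close()
            return True
        except OSError:
            # Covers refused connections, unreachable addresses and timeouts.
            return False


if __name__ == "__main__":
    # 127.0.0.1 is only a starting point; repeat with the eth0 address.
    print("loopback connect ok:", can_connect("127.0.0.1"))
```

If this succeeds on loopback but fails or times out on the eth0 address, the interface configuration or firewall rules on the new kernel would be the next thing to look at.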

On Tue, Nov 25, 2014 at 2:25 PM, Ralph Castain wrote:

> This is all running on a single node, correct? If so, did you configure
> OMPI with --enable-debug?
>
> If you can do that, or already have, then let’s add the following to the
> mpirun cmd line:
>
> -mca state_base_verbose 10 -mca odls_base_verbose 10 -mca oob_base_verbose 10
>
> You’ll get a bunch of output, but hopefully it will tell us where mpirun
> is encountering a problem.
> Ralph
> On Tue, Nov 25, 2014 at 11:20 AM, Paul Hargrove 
> wrote:
>
>> Allan,
>>
>> If you send me the .config from your build of the kernel I can compare it
>> against, for instance, my .config for a Raspberry Pi.
>> There will certainly be many differences, but I am hoping my own
>> experience configuring linux kernels will help me filter the "noise" from
>> any differences that might be significant.
>>
>> -Paul
>>
>> On Tue, Nov 25, 2014 at 11:11 AM, Allan Wu  wrote:
>>
>>> Thanks Paul! Unfortunately '/boot' is not available in my embedded
>>> linux, and I do not have the configuration file for the old kernel since it
>>> is provided as is. However, I have the new kernel configuration since I
>>> compiled it myself. Would it be helpful if I provide you the .config file
>>> when I compile the kernel? It may be quite painful to look through that file
>>> though. Is there any other way that I can obtain the configuration?
>>>
>>> I checked my config for the new kernel, and UNIX-domain sockets and Sys
>>> V IPC are both enabled in the build. Are there any other possibilities I
>>> can check?
>>>
>>> Thanks,
>>> Di
>>>
>>> --
>>> Di Wu (Allan)
>>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>> Department of Computer Science, UC Los Angeles
>>> Email: al...@cs.ucla.edu
>>>
>>> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove 
>>> wrote:
>>>
>>>> Allan,
>>>>
>>>> A likely possibility is that some important kernel feature (that Open
>>>> MPI assumes is present) is missing.
>>>> That includes not only "kernel modules" as you mention, but also
>>>> features configured in (or out) of the base kernel.
>>>> For instance, some embedded kernels omit UNIX-domain sockets and SysV
>>>> IPC support.
>>>>
>>>> If you can send me (preferably off-list) the kernel config files for
>>>> the old and new kernels I may be able to spot something.
>>>> If present, you are looking for /boot/config-[VERSION]
>>>>
>>>> -Paul
>>>>
>>>> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu  wrote:
>>>>
>>>>> I'm sorry I forgot to change the subject when I reply to the digest
>>>>> issue. Please find my original email below.
>>>>>
>>>>> Regards,
>>>>> Di
>>>>>
>>>>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu  wrote:
>>>>>
>>>>>> Thanks Ralph for the reply. Sorry about the log file, I think I
>>>>>> forgot to put an extension to the file. Please find a new one
>>>>>> attached with this email.
>>>>>>
>>>>>> I'm sorry for not enough debugging information, but 'ompi_info' and
>>>>>> '--debug-devel' are the only ways I know for collecting information.
>>>>>> Are there any other things I can try to provide more info?
>>>>>>
>>>>>> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the
>>>>>> output is the logging information in my last email. It got stuck at
>>>>>>  "[fpga1:00718] tmp: /tmp", and nothing from my helloworld program is
>>>>>> printed out to the screen. So I think it is mpirun failing to start my
>>>>>> executable, not failing to terminate.

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Larry Baker
Allan,

If you can still boot the old embedded system, a lot of times the config 
parameters are saved as /proc/config.gz.  You can at least then compare the two 
configs.
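
A minimal sketch of reading that file from a script, in case zcat is not available on the target. As far as I know /proc/config.gz only exists when the kernel was built with CONFIG_IKCONFIG_PROC, so its absence is not an error in itself. The function name is made up for illustration.

```python
import gzip


def read_enabled_options(path="/proc/config.gz"):
    """Return {option: value} for every 'CONFIG_X=value' line in a gzipped config."""
    options = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            # Lines like '# CONFIG_X is not set' and plain comments are skipped.
            if line.startswith("CONFIG_") and "=" in line:
                name, value = line.split("=", 1)
                options[name] = value
    return options


if __name__ == "__main__":
    try:
        config = read_enabled_options()
    except OSError as err:
        print("could not read /proc/config.gz:", err)
    else:
        print(len(config), "options set")
```

Running this once on the old system and once on the new one, and saving the two results, gives the pair of files needed for a comparison.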

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 25 Nov 2014, at 11:11 AM, Allan Wu wrote:

> Thanks Paul! Unfortunately '/boot' is not available in my embedded linux, and 
> I do not have the configuration file for the old kernel since it is provided 
> as is. However, I have the new kernel configuration since I compiled it 
> myself. Would it be helpful if I provide you the .config file when I compile 
> the kernel? It may be quite painful to look through that file though. Is there 
> any other way that I can obtain the configuration? 
> 
> I checked my config for the new kernel, and UNIX-domain sockets and Sys V IPC 
> are both enabled in the build. Are there any other possibilities I can check?
> 
> Thanks,
> Di
> 
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu
> 
> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove  wrote:
> Allan,
> 
> A likely possibility is that some important kernel feature (that Open MPI 
> assumes is present) is missing.
> That includes not only "kernel modules" as you mention, but also features
> configured in (or out) of the base kernel.
> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC
> support.
>
> If you can send me (preferably off-list) the kernel config files for the old
> and new kernels I may be able to spot something.
> If present, you are looking for /boot/config-[VERSION]
> 
> -Paul
> 
> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu  wrote:
> I'm sorry I forgot to change the subject when I reply to the digest issue. 
> Please find my original email below. 
> 
> Regards,
> Di
> 
> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu  wrote:
> Thanks Ralph for the reply. Sorry about the log file, I think I forgot to put 
> an extension to the file. Please find a new one attached with this email. 
> 
> I'm sorry for not enough debugging information, but 'ompi_info' and
> '--debug-devel' are the only ways I know for collecting information. Are
> there any other things I can try to provide more info?
> 
> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output is 
> the logging information in my last email. It got stuck at  "[fpga1:00718] 
> tmp: /tmp", and nothing from my helloworld program is printed out to the 
> screen. So I think it is mpirun failing to start my executable, not failing 
> to terminate.
> 
> I was wondering if this has anything to do with my newer kernel version, 
> since it works well in the old case. 
> 
> Thanks,
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu
> 
> 
> Date: Tue, 25 Nov 2014 07:29:51 -0800
> From: Ralph Castain 
> To: Open MPI Developers 
> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
> execution   on an embedded ARM Linux kernel version 3.15.0
> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
> Content-Type: text/plain; charset="utf-8"
> 
> I don't know what you put in that log file, but it was an executable and I'm
> not feeling that trusting :-)
>
> I'm afraid there isn't enough debug output there to really tell anything.
> From what little I can see, I'm guessing that the application ran fine and
> you got the usual "hello" output and the helloworld process exited safely -
> is that correct? And so it is solely mpirun that is failing to cleanly
> terminate?
> 
> 
> > On Nov 24, 2014, at 11:24 PM, Allan Wu  wrote:
> >
> > Hello everyone,
> >
> > I have cross-compiled OpenMPI for an embedded ARM Linux. Everything works
> > fine for my system based on Linux 3.8.0. I have previously submitted a post
> > related to my compilation, which can be found here:
> > http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. When I
> > recently upgraded my Linux kernel to 3.15.0, mpirun began to hang on even
> > the helloworld program. The program uses only simple APIs: MPI_Init,
> > MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem occurs even with
> > 'mpirun -np 1 ./helloworld', and below is the output with --debug-devel
> > (before it got stuck):
> > [fpga1:00716] sess_dir_finalize: job session dir not empty - leaving
> > [fpga1:00716] procdir: /tmp/openmpi-session

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
This is all running on a single node, correct? If so, did you configure OMPI 
with --enable-debug?

If you can do that, or already have, then let’s add the following to the mpirun 
cmd line:

-mca state_base_verbose 10 -mca odls_base_verbose 10 -mca oob_base_verbose 10

You’ll get a bunch of output, but hopefully it will tell us where mpirun is 
encountering a problem.
Ralph
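
When the run has to be repeated many times on the board, it can help to wrap it in a timeout so that a hang produces a saved log instead of a stuck terminal. The following is a sketch only. It builds the command line quoted above (the three verbose settings at level 10) and assumes `mpirun` is on the PATH. The helper names and the 30-second limit are arbitrary choices, not anything Open MPI prescribes.

```python
import subprocess

# The three frameworks named in the command line above.
VERBOSE_FRAMEWORKS = ("state", "odls", "oob")


def build_mpirun_command(program, np=1, level=10):
    """Build the mpirun argument list with the verbose MCA settings added."""
    cmd = ["mpirun"]
    for framework in VERBOSE_FRAMEWORKS:
        cmd += ["-mca", f"{framework}_base_verbose", str(level)]
    cmd += ["-np", str(np), program]
    return cmd


def run_with_timeout(cmd, timeout):
    """Run `cmd`; return (finished, output). finished is False if it hung."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep stdout and stderr interleaved
            text=True,
            timeout=timeout,
        )
        return True, done.stdout
    except subprocess.TimeoutExpired as exc:
        partial = exc.output or ""
        if isinstance(partial, bytes):
            # Partial output can arrive as bytes even when text=True was given.
            partial = partial.decode(errors="replace")
        return False, partial


if __name__ == "__main__":
    command = build_mpirun_command("./helloworld")
    print(" ".join(command))
    try:
        finished, log = run_with_timeout(command, timeout=30)
    except FileNotFoundError:
        print("mpirun not found on PATH")
    else:
        print("finished" if finished else "hung; partial output follows")
        print(log)
```

On a timeout the child is killed and whatever it printed so far is returned, which is the part that shows where mpirun stopped.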


> On Nov 25, 2014, at 11:11 AM, Allan Wu  wrote:
> 
> Thanks Paul! Unfortunately '/boot' is not available in my embedded linux, and 
> I do not have the configuration file for the old kernel since it is provided 
> as is. However, I have the new kernel configuration since I compiled it 
> myself. Would it be helpful if I provide you the .config file when I compile 
> the kernel? It may be quite painful to look through that file though. Is there 
> any other way that I can obtain the configuration? 
> 
> I checked my config for the new kernel, and UNIX-domain sockets and Sys V IPC 
> are both enabled in the build. Are there any other possibilities I can check?
> 
> Thanks,
> Di
> 
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu
> 
> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Allan,
> 
> A likely possibility is that some important kernel feature (that Open MPI 
> assumes is present) is missing.
> That includes not only "kernel modules" as you mention, but also features
> configured in (or out) of the base kernel.
> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC
> support.
>
> If you can send me (preferably off-list) the kernel config files for the old
> and new kernels I may be able to spot something.
> If present, you are looking for /boot/config-[VERSION]
> 
> -Paul
> 
> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu <al...@cs.ucla.edu> wrote:
> I'm sorry I forgot to change the subject when I reply to the digest issue. 
> Please find my original email below. 
> 
> Regards,
> Di
> 
> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu <al...@cs.ucla.edu> wrote:
> Thanks Ralph for the reply. Sorry about the log file, I think I forgot to put
> an extension to the file. Please find a new one attached with this email.
>
> I'm sorry for not enough debugging information, but 'ompi_info' and
> '--debug-devel' are the only ways I know for collecting information. Are
> there any other things I can try to provide more info?
> 
> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output is 
> the logging information in my last email. It got stuck at  "[fpga1:00718] 
> tmp: /tmp", and nothing from my helloworld program is printed out to the 
> screen. So I think it is mpirun failing to start my executable, not failing 
> to terminate.
> 
> I was wondering if this has anything to do with my newer kernel version, 
> since it works well in the old case. 
> 
> Thanks,
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
> 
> 
> Date: Tue, 25 Nov 2014 07:29:51 -0800
> From: Ralph Castain mailto:r...@open-mpi.org>>
> To: Open MPI Developers mailto:de...@open-mpi.org>>
> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
> execution   on an embedded ARM Linux kernel version 3.15.0
> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
> Content-Type: text/plain; charset="utf-8"
> 
> I don't know what you put in that log file, but it was an executable and I'm
> not feeling that trusting :-)
>
> I'm afraid there isn't enough debug output there to really tell anything.
> From what little I can see, I'm guessing that the application ran fine and
> you got the usual "hello" output and the helloworld process exited safely -
> is that correct? And so it is solely mpirun that is failing to cleanly
> terminate?
> 
> 
> > On Nov 24, 2014, at 11:24 PM, Allan Wu <al...@cs.ucla.edu> wrote:
> >
> > Hello everyone,
> >
> > I have cross-compiled OpenMPI for an embedded ARM Linux. Everything works 
> > fine for my system based on Linux 3.8.0. I have previously submitted a post 
> > related to my compilation, which can be found here: 
> > http://www.open-mpi.org/community/lists/devel/2014/04/14440.php 
> > <http://www.op

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Allan Wu
Thanks Paul! Unfortunately '/boot' is not available in my embedded linux,
and I do not have the configuration file for the old kernel since it is
provided as is. However, I have the new kernel configuration since I
compiled it myself. Would it be helpful if I provide you the .config file
when I compile the kernel? It may be quite painful to look through that file
though. Is there any other way that I can obtain the configuration?

I checked my config for the new kernel, and UNIX-domain sockets and Sys V
IPC are both enabled in the build. Are there any other possibilities I can
check?

Thanks,
Di

--
Di Wu (Allan)
PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
Department of Computer Science, UC Los Angeles
Email: al...@cs.ucla.edu
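
On the question of other ways to obtain the configuration, here is a sketch rather than a definitive answer. Kernels built with CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC expose their own configuration as /proc/config.gz. Whether the vendor-supplied 3.8.0 kernel was built that way is an assumption that has to be checked on the board. The Python below also assumes an interpreter is available on the target (otherwise the file can be copied to a host). It parses config text and reports on the two options mentioned in this thread, CONFIG_UNIX and CONFIG_SYSVIPC. The helper names are illustrative only.

```python
import gzip
import os

# Options named in this thread; extend the tuple as needed.
OPTIONS = ("CONFIG_UNIX", "CONFIG_SYSVIPC")


def parse_kernel_config(text):
    """Map each option in a kernel .config to its value ('y', 'm', 'n', ...)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Disabled options are recorded as comments in .config files.
            config[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            config[name] = value
    return config


def report(config, options=OPTIONS):
    """Return {option: value}, using 'missing' for options not mentioned."""
    return {name: config.get(name, "missing") for name in options}


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config, or return None if it is not exposed."""
    if not os.path.exists(path):
        return None
    with gzip.open(path, "rt") as handle:
        return handle.read()


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("/proc/config.gz not present (CONFIG_IKCONFIG_PROC not enabled?)")
    else:
        for name, value in report(parse_kernel_config(text)).items():
            print(name, value)
```

The same parse_kernel_config helper can be pointed at the saved .config from the 3.15.0 build, so the old and new kernels can be compared option by option if both files turn out to be obtainable.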

On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove  wrote:

> Allan,
>
> A likely possibility is that some important kernel feature (that Open MPI
> assumes is present) is missing.
> That includes not only "kernel modules" as you mention, but also features
> configured in (or out) of the base kernel.
> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC
> support.
>
> If you can send me (preferably off-list) the kernel config files for the
> old and new kernels I may be able to spot something.
> If present, you are looking for /boot/config-[VERSION]
>
> -Paul
>
> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu  wrote:
>
>> I'm sorry I forgot to change the subject when I replied to the digest
>> issue. Please find my original email below.
>>
>> Regards,
>> Di
>>
>> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu  wrote:
>>
>>> Thanks Ralph for the reply. Sorry about the log file, I think I forgot
>>> to put an extension to the file. Please find a new one attached with this
>>> email.
>>>
>>> I'm sorry for not providing enough debugging information, but 'ompi_info' and
>>> '--debug-devel' are the only ways I know of to collect information. Are
>>> there any other things I can try to provide more info?
>>>
>>> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output
>>> is the logging information in my last email. It got stuck at
>>>  "[fpga1:00718] tmp: /tmp", and nothing from my helloworld program is
>>> printed out to the screen. So I think it is mpirun failing to start my
>>> executable, not failing to terminate.
>>>
>>> I was wondering if this has anything to do with my newer kernel version,
>>> since it works well in the old case.
>>>
>>> Thanks,
>>> --
>>> Di Wu (Allan)
>>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>>> Department of Computer Science, UC Los Angeles
>>> Email: al...@cs.ucla.edu
>>>
>>>
>>> Date: Tue, 25 Nov 2014 07:29:51 -0800
>>> From: Ralph Castain
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
>>> execution   on an embedded ARM Linux kernel version 3.15.0
>>> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
>>> Content-Type: text/plain; charset="utf-8"
>>>
>>> I don't know what you put in that log file, but it was an executable and
>>> I'm not feeling that trusting :-)
>>>
>>> I'm afraid there isn't enough debug output there to really tell
>>> anything. From what little I can see, I'm guessing that the application ran
>>> fine and you got the usual "hello" output and the helloworld process exited
>>> safely - is that correct? And so it is solely mpirun that is failing to
>>> cleanly terminate?
>>>
>>>
>>> > On Nov 24, 2014, at 11:24 PM, Allan Wu  wrote:
>>> >
>>> > Hello everyone,
>>> >
>>> > I have cross-compiled OpenMPI for an embedded ARM Linux. Everything
>>> > works fine for my system based on Linux 3.8.0. I have previously submitted
>>> > a post related to my compilation, which can be found here:
>>> > http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. When
>>> > I recently upgraded my Linux kernel to 3.15.0, mpirun started to hang
>>> > even on the helloworld program. The program consists only of simple API
>>> > calls: MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem
>>> > occurs even with 'mpirun -np 1 ./helloworld', and below is the output with
>>> > --debug-devel (before it got stuck):
>>> > [fpga1:00716] 

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
Allan,

A likely possibility is that some important kernel feature (that Open MPI
assumes is present) is missing.
That includes not only "kernel modules" as you mention, but also features
configured in (or out) of the base kernel.
For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC
support.

If you can send me (preferably off-list) the kernel config files for the
old and new kernels I may be able to spot something.
If present, you are looking for /boot/config-[VERSION]

-Paul
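
As a complement to reading the config files, the two features named above can also be probed at run time. The following is a minimal sketch, assuming a Python interpreter is available on the target and that procfs is mounted at /proc. It tries to create a pair of UNIX-domain sockets, and it looks for the /proc/sysvipc entries that kernels with SysV IPC support normally expose. The function names are illustrative, and a passing result does not rule out other missing features.

```python
import os
import socket


def unix_sockets_work():
    """Try to create a connected pair of UNIX-domain sockets."""
    try:
        left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    except (OSError, AttributeError):
        # OSError: the kernel refused; AttributeError: platform lacks AF_UNIX.
        return False
    try:
        left.sendall(b"ping")
        return right.recv(4) == b"ping"
    finally:
        left.close()
        right.close()


def sysv_ipc_visible(proc_dir="/proc/sysvipc"):
    """Report whether the kernel exposes SysV IPC state under procfs."""
    return all(
        os.path.exists(os.path.join(proc_dir, name))
        for name in ("msg", "sem", "shm")
    )


if __name__ == "__main__":
    print("UNIX-domain sockets:", "ok" if unix_sockets_work() else "unavailable")
    print("SysV IPC (procfs):", "ok" if sysv_ipc_visible() else "not visible")
```

Running the script on both the 3.8.0 and the 3.15.0 systems and comparing the two lines of output gives a quick first check before digging through the full kernel configuration.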

On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu  wrote:

> I'm sorry I forgot to change the subject when I replied to the digest
> issue. Please find my original email below.
>
> Regards,
> Di
>
> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu  wrote:
>
>> Thanks Ralph for the reply. Sorry about the log file, I think I forgot to
>> put an extension to the file. Please find a new one attached with this
>> email.
>>
>> I'm sorry for not providing enough debugging information, but 'ompi_info' and
>> '--debug-devel' are the only ways I know of to collect information. Are
>> there any other things I can try to provide more info?
>>
>> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output
>> is the logging information in my last email. It got stuck at
>> "[fpga1:00718] tmp: /tmp", and nothing from my helloworld program is
>> printed out to the screen. So I think it is mpirun failing to start my
>> executable, not failing to terminate.
>>
>> I was wondering if this has anything to do with my newer kernel version,
>> since it works well in the old case.
>>
>> Thanks,
>> --
>> Di Wu (Allan)
>> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
>> Department of Computer Science, UC Los Angeles
>> Email: al...@cs.ucla.edu
>>
>>
>> Date: Tue, 25 Nov 2014 07:29:51 -0800
>> From: Ralph Castain
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
>> execution   on an embedded ARM Linux kernel version 3.15.0
>> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I don't know what you put in that log file, but it was an executable and
>> I'm not feeling that trusting :-)
>>
>> I'm afraid there isn't enough debug output there to really tell anything.
>> From what little I can see, I'm guessing that the application ran fine and
>> you got the usual "hello" output and the helloworld process exited safely -
>> is that correct? And so it is solely mpirun that is failing to cleanly
>> terminate?
>>
>>
>> > On Nov 24, 2014, at 11:24 PM, Allan Wu  wrote:
>> >
>> > Hello everyone,
>> >
>> > I have cross-compiled OpenMPI for an embedded ARM Linux. Everything
>> > works fine for my system based on Linux 3.8.0. I have previously submitted
>> > a post related to my compilation, which can be found here:
>> > http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. When I
>> > recently upgraded my Linux kernel to 3.15.0, mpirun started to hang even
>> > on the helloworld program. The program consists only of simple API calls:
>> > MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem occurs
>> > even with 'mpirun -np 1 ./helloworld', and below is the output with
>> > --debug-devel (before it got stuck):
>> > [fpga1:00716] sess_dir_finalize: job session dir not empty - leaving
>> > [fpga1:00716] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0/0
>> > [fpga1:00716] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0
>> > [fpga1:00716] top: openmpi-sessions-root@fpga1_0
>> > [fpga1:00716] tmp: /tmp
>> > [fpga1:00718] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1/0
>> > [fpga1:00718] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1
>> > [fpga1:00718] top: openmpi-sessions-root@fpga1_0
>> > [fpga1:00718] tmp: /tmp
>> >
>> > I suspect it may be due to an incompatible kernel version or some
>> > missing kernel modules. I also tried the latest version, 1.8.3, and had
>> > the same problem. Does anyone have any thoughts? I have attached the output
>> > of 'ompi_info --all' with this email.
>> >
>> > Please let me know if I need to provide more information. Thanks in
>> > advance!
>> >
>> > Regards,
>> > --
>> > Di Wu (Allan)
>> &

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Allan Wu
I'm sorry I forgot to change the subject when I replied to the digest
issue. Please find my original email below.

Regards,
Di

On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu  wrote:

> Thanks Ralph for the reply. Sorry about the log file, I think I forgot to
> put an extension to the file. Please find a new one attached with this
> email.
>
> I'm sorry for not providing enough debugging information, but 'ompi_info' and
> '--debug-devel' are the only ways I know of to collect information. Are
> there any other things I can try to provide more info?
>
> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output
> is the logging information in my last email. It got stuck at
> "[fpga1:00718] tmp: /tmp", and nothing from my helloworld program is
> printed out to the screen. So I think it is mpirun failing to start my
> executable, not failing to terminate.
>
> I was wondering if this has anything to do with my newer kernel version,
> since it works well in the old case.
>
> Thanks,
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu
>
>
> Date: Tue, 25 Nov 2014 07:29:51 -0800
> From: Ralph Castain
> To: Open MPI Developers
> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
> execution   on an embedded ARM Linux kernel version 3.15.0
> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
> Content-Type: text/plain; charset="utf-8"
>
> I don't know what you put in that log file, but it was an executable and
> I'm not feeling that trusting :-)
>
> I'm afraid there isn't enough debug output there to really tell anything.
> From what little I can see, I'm guessing that the application ran fine and
> you got the usual "hello" output and the helloworld process exited safely -
> is that correct? And so it is solely mpirun that is failing to cleanly
> terminate?
>
>
> > On Nov 24, 2014, at 11:24 PM, Allan Wu  wrote:
> >
> > Hello everyone,
> >
> > I have cross-compiled OpenMPI for an embedded ARM Linux. Everything
> > works fine for my system based on Linux 3.8.0. I have previously submitted
> > a post related to my compilation, which can be found here:
> > http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. When I
> > recently upgraded my Linux kernel to 3.15.0, mpirun started to hang even
> > on the helloworld program. The program consists only of simple API calls:
> > MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem occurs
> > even with 'mpirun -np 1 ./helloworld', and below is the output with
> > --debug-devel (before it got stuck):
> > [fpga1:00716] sess_dir_finalize: job session dir not empty - leaving
> > [fpga1:00716] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0/0
> > [fpga1:00716] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0
> > [fpga1:00716] top: openmpi-sessions-root@fpga1_0
> > [fpga1:00716] tmp: /tmp
> > [fpga1:00718] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1/0
> > [fpga1:00718] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1
> > [fpga1:00718] top: openmpi-sessions-root@fpga1_0
> > [fpga1:00718] tmp: /tmp
> >
> > I suspect it may be due to an incompatible kernel version or some missing
> > kernel modules. I also tried the latest version, 1.8.3, and had the
> > same problem. Does anyone have any thoughts? I have attached the output of
> > 'ompi_info --all' with this email.
> >
> > Please let me know if I need to provide more information. Thanks in
> > advance!
> >
> > Regards,
> > --
> > Di Wu (Allan)
> > PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
> > Department of Computer Science, UC Los Angeles
> > Email: al...@cs.ucla.edu <mailto:al...@cs.ucla.edu>
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/11/16330.php
>
>


Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
Thanks - no idea why it was trying to execute on my machine, but I’ve learned 
to be far less trusting.

Looks like it was just a complete output of ompi_info, which doesn’t really 
help here anyway. Will need to hear the answers to my questions before 
suggesting a next step.


> On Nov 25, 2014, at 9:09 AM, Paul Hargrove  wrote:
> 
> Ralph,
> 
> I downloaded the attachment and found it to be a gzipped tar file containing 
> a single text file "log".
> I have attached the bzipped (not tarred) log file.
> 
> -Paul
> 
> On Tue, Nov 25, 2014 at 7:29 AM, Ralph Castain wrote:
> I don’t know what you put in that log file, but it was an executable and I’m 
> not feeling that trusting :-)
> 
> I’m afraid there isn’t enough debug output there to really tell anything. 
> From what little I can see, I’m guessing that the application ran fine and 
> you got the usual “hello” output and the helloworld process exited safely - 
> is that correct? And so it is solely mpirun that is failing to cleanly 
> terminate?
> 
> 
>> On Nov 24, 2014, at 11:24 PM, Allan Wu wrote:
>> 
>> Hello everyone,
>> 
>> I have cross-compiled OpenMPI for an embedded ARM Linux. Everything works 
>> fine for my system based on Linux 3.8.0. I have previously submitted a post 
>> related to my compilation, which can be found here:
>> http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. When I
>> recently upgraded my Linux kernel to 3.15.0, mpirun started to hang even on
>> the helloworld program. The program consists only of simple API calls:
>> MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem occurs even
>> with 'mpirun -np 1 ./helloworld', and below is the output with --debug-devel
>> (before it got stuck):
>> [fpga1:00716] sess_dir_finalize: job session dir not empty - leaving
>> [fpga1:00716] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0/0
>> [fpga1:00716] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0
>> [fpga1:00716] top: openmpi-sessions-root@fpga1_0
>> [fpga1:00716] tmp: /tmp
>> [fpga1:00718] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1/0
>> [fpga1:00718] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1
>> [fpga1:00718] top: openmpi-sessions-root@fpga1_0
>> [fpga1:00718] tmp: /tmp
>> 
>> I suspect it may be due to an incompatible kernel version or some missing
>> kernel modules. I also tried the latest version, 1.8.3, and had the same
>> problem. Does anyone have any thoughts? I have attached the output of
>> 'ompi_info --all' with this email.
>> 
>> Please let me know if I need to provide more information. Thanks in advance!
>> 
>> Regards,
>> --
>> Di Wu (Allan)
>> PhD student, VAST Laboratory,
>> Department of Computer Science, UC Los Angeles
>> Email: al...@cs.ucla.edu 
>> ___
>> devel mailing list
>> de...@open-mpi.org 
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel 
>> 
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/11/16330.php 
>> 
> 
> ___
> devel mailing list
> de...@open-mpi.org 
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel 
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/11/16331.php 
> 
> 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov 
> 
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/11/16335.php



Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
Ralph,

I downloaded the attachment and found it to be a gzipped tar file
containing a single text file "log".
I have attached the bzipped (not tarred) log file.

-Paul

On Tue, Nov 25, 2014 at 7:29 AM, Ralph Castain  wrote:

> I don't know what you put in that log file, but it was an executable and
> I'm not feeling that trusting :-)
>
> I'm afraid there isn't enough debug output there to really tell anything.
> From what little I can see, I'm guessing that the application ran fine and
> you got the usual "hello" output and the helloworld process exited safely -
> is that correct? And so it is solely mpirun that is failing to cleanly
> terminate?
>
>
> On Nov 24, 2014, at 11:24 PM, Allan Wu  wrote:
>
> Hello everyone,
>
> I have cross-compiled OpenMPI for an embedded ARM Linux. Everything works
> fine for my system based on Linux 3.8.0. I have previously submitted a post
> related to my compilation, which can be found here:
> http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. When I
> recently upgraded my Linux kernel to 3.15.0, mpirun started to hang even on
> the helloworld program. The program consists only of simple API calls:
> MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem occurs even
> with 'mpirun -np 1 ./helloworld', and below is the output with --debug-devel
> (before it got stuck):
> [fpga1:00716] sess_dir_finalize: job session dir not empty - leaving
> [fpga1:00716] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0/0
> [fpga1:00716] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0
> [fpga1:00716] top: openmpi-sessions-root@fpga1_0
> [fpga1:00716] tmp: /tmp
> [fpga1:00718] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1/0
> [fpga1:00718] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1
> [fpga1:00718] top: openmpi-sessions-root@fpga1_0
> [fpga1:00718] tmp: /tmp
>
> I suspect it may be due to an incompatible kernel version or some missing
> kernel modules. I also tried the latest version, 1.8.3, and had the
> same problem. Does anyone have any thoughts? I have attached the output of
> 'ompi_info --all' with this email.
>
> Please let me know if I need to provide more information. Thanks in
> advance!
>
> Regards,
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu
>  ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/11/16330.php
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/11/16331.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


log.bz2
Description: BZip2 compressed data


Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
I don’t know what you put in that log file, but it was an executable and I’m 
not feeling that trusting :-)

I’m afraid there isn’t enough debug output there to really tell anything. From 
what little I can see, I’m guessing that the application ran fine and you got 
the usual “hello” output and the helloworld process exited safely - is that 
correct? And so it is solely mpirun that is failing to cleanly terminate?
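
One way to answer that question reproducibly is to run the job under a timeout and record both whether it finished and what it printed. The Python below is only a sketch under assumptions: it presumes a Python interpreter on the target, the 30-second limit is arbitrary, and the function name is made up for illustration. It is not an Open MPI facility.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds=30.0):
    """Run command; report whether it finished and what it printed."""
    try:
        done = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
        )
        output, finished, code = done.stdout, True, done.returncode
    except subprocess.TimeoutExpired as expired:
        # subprocess.run kills the direct child on timeout; keep what it printed.
        output, finished, code = expired.stdout, False, None
    text = (output or b"").decode("utf-8", errors="replace")
    return {"finished": finished, "returncode": code, "output": text}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    result = run_with_timeout(sys.argv[1:])
    print("finished:", result["finished"], "exit code:", result["returncode"])
    print(result["output"])
```

Saved under a hypothetical name such as hangcheck.py, it could be invoked as 'python hangcheck.py mpirun --debug-devel -np 1 ./helloworld'. If "finished" is False and the captured output contains the hello line, the application ran and mpirun failed to terminate; if the hello line is absent, the hang happened before the program printed anything.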


> On Nov 24, 2014, at 11:24 PM, Allan Wu  wrote:
> 
> Hello everyone,
> 
> I have cross-compiled OpenMPI for an embedded ARM Linux. Everything works 
> fine for my system based on Linux 3.8.0. I have previously submitted a post 
> related to my compilation, which can be found here:
> http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. When I
> recently upgraded my Linux kernel to 3.15.0, mpirun started to hang even on
> the helloworld program. The program consists only of simple API calls:
> MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem occurs even
> with 'mpirun -np 1 ./helloworld', and below is the output with --debug-devel
> (before it got stuck):
> [fpga1:00716] sess_dir_finalize: job session dir not empty - leaving
> [fpga1:00716] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0/0
> [fpga1:00716] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0
> [fpga1:00716] top: openmpi-sessions-root@fpga1_0
> [fpga1:00716] tmp: /tmp
> [fpga1:00718] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1/0
> [fpga1:00718] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1
> [fpga1:00718] top: openmpi-sessions-root@fpga1_0
> [fpga1:00718] tmp: /tmp
> 
> I suspect it may be due to an incompatible kernel version or some missing
> kernel modules. I also tried the latest version, 1.8.3, and had the same
> problem. Does anyone have any thoughts? I have attached the output of
> 'ompi_info --all' with this email.
> 
> Please let me know if I need to provide more information. Thanks in advance!
> 
> Regards,
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/11/16330.php



[OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Allan Wu
Hello everyone,

I have cross-compiled OpenMPI for an embedded ARM Linux. Everything works
fine for my system based on Linux 3.8.0. I have previously submitted a post
related to my compilation, which can be found here:
http://www.open-mpi.org/community/lists/devel/2014/04/14440.php. When I
recently upgraded my Linux kernel to 3.15.0, mpirun started to hang even on
the helloworld program. The program consists only of simple API calls:
MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem occurs even
with 'mpirun -np 1 ./helloworld', and below is the output with --debug-devel
(before it got stuck):
[fpga1:00716] sess_dir_finalize: job session dir not empty - leaving
[fpga1:00716] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0/0
[fpga1:00716] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/0
[fpga1:00716] top: openmpi-sessions-root@fpga1_0
[fpga1:00716] tmp: /tmp
[fpga1:00718] procdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1/0
[fpga1:00718] jobdir: /tmp/openmpi-sessions-root@fpga1_0/63813/1
[fpga1:00718] top: openmpi-sessions-root@fpga1_0
[fpga1:00718] tmp: /tmp

I suspect it may be due to an incompatible kernel version or some missing
kernel modules. I also tried the latest version, 1.8.3, and had the
same problem. Does anyone have any thoughts? I have attached the output of
'ompi_info --all' with this email.

Please let me know if I need to provide more information. Thanks in advance!

Regards,
--
Di Wu (Allan)
PhD student, VAST Laboratory,
Department of Computer Science, UC Los Angeles
Email: al...@cs.ucla.edu


log.tar.gz
Description: GNU Zip compressed data