Re: [OMPI users] Open MPI exited on signal 11 (Segmentation fault). Trying to run a script that uses Open MPI

2013-07-05 Thread Ed Blosch
Compile with -traceback and -check all if using Intel.  Otherwise, find the 
right compiler options to check array-bounds accesses and to dump a stack 
trace. Then compile a debug build and run that way. Assuming it fails, you 
probably will get good information on the source of the problem. If it doesn't 
fail, then the compiler has a bug (not as rare as you might think).
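
For example, a debug build with the Intel compilers might look something like 
this (the application and file names are only illustrative):

mpif90 -g -O0 -traceback -check all -o myapp myapp.f90
mpirun -np 2 ./myapp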

You need to look at the application output. Not the output from mpirun.

Ed


Sent via the Samsung Galaxy S™ III, an AT&T 4G LTE smartphone

 Original message 
From: Ralph Castain  
Date:  
To: Open MPI Users  
Subject: Re: [OMPI users] Open MPI exited on signal 11 (Segmentation fault).
Trying to run a script that uses Open MPI 
 
Well, it's telling you that your program segfaulted - so I'd start with that, 
perhaps looking for any core it might have dropped.
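
If core dumps are enabled (e.g., 'ulimit -c unlimited' in the shell that 
launches the job), a first look might be something like this (binary and 
core-file names are illustrative):

ulimit -c unlimited
mpirun -np 2 ./myapp
gdb ./myapp core     # then 'bt' to get a backtrace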

On Jul 4, 2013, at 8:36 PM, Rick White  wrote:

Hello, 

I have this error:
mpiexec noticed that process rank 1 with PID 16087 on node server exited on 
signal 11 (Segmentation fault)

Wondering how to fix it?

Cheers and many thanks
Rick

-- 
Richard Allen White III M.S.
PhD Candidate - Suttle Lab
Department of Microbiology & Immunology
The University of British Columbia 
Vancouver, BC, Canada
cell.  604-440-5150 
http://www.ocgy.ubc.ca/~suttle/ 


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Application hangs on mpi_waitall

2013-06-27 Thread Ed Blosch
It ran a bit longer but still deadlocked.  All matching sends are posted 
1:1 with posted recvs, so it is a delivery issue of some kind.  I'm running a 
debug-compiled version tonight to see what that might turn up.  I may try to 
rewrite with blocking sends and see if that works.  I can also try adding a 
barrier (irecvs, barrier, isends, waitall) to make sure sends are not buffering 
while waiting for recvs to be posted.
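
Roughly what I have in mind (a sketch only; names and counts are illustrative):

  do i = 1, nreq
    call mpi_irecv(rbuf(1,i), n, MPI_REAL8, nbr(i), tag, MPI_COMM_WORLD, reqs(i), ierr)
  end do
  call mpi_barrier(MPI_COMM_WORLD, ierr)   ! all recvs posted before any send starts
  do i = 1, nreq
    call mpi_isend(sbuf(1,i), n, MPI_REAL8, nbr(i), tag, MPI_COMM_WORLD, reqs(nreq+i), ierr)
  end do
  call mpi_waitall(2*nreq, reqs, MPI_STATUSES_IGNORE, ierr)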


Sent via the Samsung Galaxy S™ III, an AT&T 4G LTE smartphone

 Original message 
From: George Bosilca  
Date:  
To: Open MPI Users  
Subject: Re: [OMPI users] Application hangs on mpi_waitall 
 
Ed,

I'm not sure, but there might be a case where the BTL is getting overwhelmed by 
the non-blocking operations while trying to set up the connection. There is a 
simple test for this: add an MPI_Alltoall with a reasonable size (100k) before 
you start posting the non-blocking receives, and let's see if this solves your 
issue.
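
Something along these lines (the buffer names and count are only indicative; 
12500 REAL8 values is roughly 100k bytes per rank):

  call mpi_alltoall(sendbuf, 12500, MPI_REAL8, recvbuf, 12500, MPI_REAL8, &
                    MPI_COMM_WORLD, ierr)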

  George.


On Jun 26, 2013, at 04:02 , eblo...@1scom.net wrote:

> An update: I recoded the mpi_waitall as a loop over the requests with
> mpi_test and a 30 second timeout.  The timeout happens unpredictably,
> sometimes after 10 minutes of run time, other times after 15 minutes, for
> the exact same case.
> 
> After 30 seconds, I print out the status of all outstanding receive
> requests.  The message tags that are outstanding have definitely been
> sent, so I am wondering why they are not getting received?
> 
> As I said before, everybody posts non-blocking standard receives, then
> non-blocking standard sends, then calls mpi_waitall. Each process is
> typically waiting on 200 to 300 requests. Is deadlock possible via this
> implementation approach under some kind of unusual conditions?
> 
> Thanks again,
> 
> Ed
> 
>> I'm running OpenMPI 1.6.4 and seeing a problem where mpi_waitall never
>> returns.  The case runs fine with MVAPICH.  The logic associated with the
>> communications has been extensively debugged in the past; we don't think
>> it has errors.   Each process posts non-blocking receives, non-blocking
>> sends, and then does waitall on all the outstanding requests.
>> 
>> The work is broken down into 960 chunks. If I run with 960 processes (60
>> nodes of 16 cores each), things seem to work.  If I use 160 processes
>> (each process handling 6 chunks of work), then each process is handling 6
>> times as much communication, and that is the case that hangs with OpenMPI
>> 1.6.4; again, seems to work with MVAPICH.  Is there an obvious place to
>> start, diagnostically?  We're using the openib btl.
>> 
>> Thanks,
>> 
>> Ed
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
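
For reference, a minimal sketch of the MPI_Test/timeout polling loop described 
above (illustrative only, not the actual code):

  t0 = mpi_wtime()
  done(:) = .false.
  ndone = 0
  do while (ndone < nreq)
    do i = 1, nreq
      if (.not. done(i)) then
        call mpi_test(reqs(i), flag, status, ierr)
        if (flag) then
          done(i) = .true.
          ndone = ndone + 1
        end if
      end if
    end do
    if (mpi_wtime() - t0 > 30.0d0) then
      ! timed out: report which requests are still outstanding, then bail out
      exit
    end if
  end do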


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] EXTERNAL: Re: basic questions about compiling OpenMPI

2013-05-25 Thread Ed Blosch
Much appreciated, guys.  I am a middleman in a discussion over whether MPI 
should be handled by apps people or systems people, and there was some confusion 
when we saw that RHEL6 ships an RPM for Open MPI. Your comments make it clear 
that there is a pretty strong preference to build Open MPI on the system where 
it will be used, with compilers that match your application's compiler. That is 
just a prerequisite for supporting a development environment.

 Original message 
From: Tim Prince  
Date:  
To: us...@open-mpi.org 
Subject: Re: [OMPI users] EXTERNAL: Re: basic questions about compiling OpenMPI 
 
On 5/25/2013 8:26 AM, Jeff Squyres (jsquyres) wrote:
> On May 23, 2013, at 9:50 AM, "Blosch, Edwin L"  
> wrote:
>
>> Excellent.  Now I've read the FAQ and noticed that it doesn't mention the 
>> issue with the Fortran 90 .mod signatures.  Our applications are Fortran.  
>> So your replies are very helpful -- now I know it really isn't practical for 
>> us to use the default OpenMPI shipped with RHEL6 since we use both Intel and 
>> PGI compilers and have several applications to accommodate.  Presumably if 
>> all the applications did INCLUDE 'mpif.h'  instead of 'USE MPI' then we 
>> could get things working, but it's not a great workaround.
> No, not even if they use mpif.h.  Here's a chunk of text from the v1.6 README:
>
> - While it is possible -- on some platforms -- to configure and build
>    Open MPI with one Fortran compiler and then build MPI applications
>    with a different Fortran compiler, this is not recommended.  Subtle
>    problems can arise at run time, even if the MPI application
>    compiled and linked successfully.
>
>    Specifically, the following two cases may not be portable between
>    different Fortran compilers:
>
>    1. The C constants MPI_F_STATUS_IGNORE and MPI_F_STATUSES_IGNORE
>   will only compare properly to Fortran applications that were
>   created with Fortran compilers that use the same
>   name-mangling scheme as the Fortran compiler with which Open MPI
>   was configured.
>
>    2. Fortran compilers may have different values for the logical
>   .TRUE. constant.  As such, any MPI function that uses the Fortran
>   LOGICAL type may only get .TRUE. values back that correspond to
>   the .TRUE. value of the Fortran compiler with which Open MPI
>   was configured.  Note that some Fortran compilers allow forcing
>   .TRUE. to be 1 and .FALSE. to be 0.  For example, the Portland
>   Group compilers provide the "-Munixlogical" option, and Intel
>   compilers (version >= 8.0) provide the "-fpscomp logicals" option.
>
>    You can use the ompi_info command to see the Fortran compiler with
>    which Open MPI was configured.
>
>
Even when the name mangling obstacle doesn't arise (it shouldn't for the 
cited case of gfortran vs. ifort), run-time library function usage is 
likely to conflict between the compiler used to build the MPI Fortran 
library and the compiler used to build the application. So there really 
isn't a good incentive to retrogress away from the USE files simply to 
avoid one aspect of mixing incompatible compilers.
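
As a quick practical check, 'ompi_info' reports which Fortran compilers a given 
Open MPI installation was configured with; something like 
'ompi_info | grep -i fortran' (illustrative) makes it easy to see whether they 
match the application's compiler.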

-- 
Tim Prince

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] Confused on simple MPI/OpenMP program

2013-04-09 Thread Ed Blosch
I figured it out. 

In the real application, I also did not have the 'use' statement, and there
was an IMPLICIT statement that caused the omp_get_max_threads() function to be
implicitly typed as REAL instead of INTEGER, with integers also promoted to
8 bytes by -i8.

Once I added the 'use omp_lib' statement, the compiler caught the mismatch.


Just to verify, I did add the 'use omp_lib' statement and ran the test
program by itself.  I do get '4' as expected regardless of whether or not I
run the program under MPI.
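
For reference, the corrected snippet looks roughly like this (a sketch; 'use mpi'
is shown here in place of the mpif.h include, and 'implicit none' is what exposes
the original mistyping):

program test

  use omp_lib
  use mpi
  implicit none
  integer :: ierr, irank, nthreads

  ! everybody except rank=0 exits.
  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, irank, ierr)
  if (irank /= 0) then
    call mpi_finalize(ierr)
    stop
  endif

  ! rank 0 tries to set number of OpenMP threads to 4
  call omp_set_num_threads(4)
  nthreads = omp_get_max_threads()
  print*, "nthreads = ", nthreads

  call mpi_finalize(ierr)

end program test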

So there is no OpenMPI-related issue.  I thought it was OpenMPI-related
because, after commenting out the MPI calls, I got the right answer.  But
this was probably just a coincidence. 

Thanks,

Ed 



I did not have the 'use' statement.

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Reuti
Sent: Thursday, April 04, 2013 7:13 AM
To: Open MPI Users
Subject: Re: [OMPI users] Confused on simple MPI/OpenMP program

Hi,

Am 04.04.2013 um 04:35 schrieb Ed Blosch:

> Consider this Fortran program snippet:
> 
> program test

use omp_lib
include 'mpif.h'

might be missing.


>  ! everybody except rank=0 exits.
>  call mpi_init(ierr)
>  call mpi_comm_rank(MPI_COMM_WORLD,irank,ierr)
>  if (irank /= 0) then
>call mpi_finalize(ierr)
>stop
>  endif
> 
>  ! rank 0 tries to set number of OpenMP threads to 4
>  call omp_set_num_threads(4)
>  nthreads = omp_get_max_threads()
>  print*, "nthreads = ", nthreads
> 
>  call mpi_finalize(ierr)
> 
> end program test
> 
> It is compiled like this: 'mpif90 -o test -O2 -openmp test.f90'  
> (Intel
> 11.x)
> 
> When I run it like this:  mpirun -np 2 ./test
> 
> The output is:  "nthreads = 0"
> 
> Does that make sense?  I was expecting 4.
> 
> If I comment out the MPI lines and run the program serially (but still 
> compiled with mpif90), then I get the expected output value 4.

Nope, for me it's still 0 then.

-- Reuti


> I'm sure I must be overlooking something basic here.  Please enlighten me.
> Does this have anything to do with how I've configured OpenMPI?
> 
> Thanks,
> 
> Ed
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] Confused on simple MPI/OpenMP program

2013-04-03 Thread Ed Blosch
Consider this Fortran program snippet:

program test

  ! everybody except rank=0 exits.
  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD,irank,ierr)
  if (irank /= 0) then
call mpi_finalize(ierr)
stop
  endif

  ! rank 0 tries to set number of OpenMP threads to 4
  call omp_set_num_threads(4)
  nthreads = omp_get_max_threads()
  print*, "nthreads = ", nthreads

  call mpi_finalize(ierr)

end program test

It is compiled like this: 'mpif90 -o test -O2 -openmp test.f90'  (Intel
11.x)

When I run it like this:  mpirun -np 2 ./test

The output is:  "nthreads = 0"

Does that make sense?  I was expecting 4.

If I comment out the MPI lines and run the program serially (but still
compiled with mpif90), then I get the expected output value 4.

I'm sure I must be overlooking something basic here.  Please enlighten me.
Does this have anything to do with how I've configured OpenMPI?

Thanks,

Ed




Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread Ed Blosch
Thanks very much, exactly what I wanted to hear. How big is /tmp?

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of David Turner
Sent: Thursday, November 03, 2011 6:36 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp
for OpenMPI usage

I'm not a systems guy, but I'll pitch in anyway.  On our cluster,
all the compute nodes are completely diskless.  The root file system,
including /tmp, resides in memory (ramdisk).  OpenMPI puts these
session directories therein.  All our jobs run through a batch
system (torque).  At the conclusion of each batch job, an epilogue
process runs that removes all files belonging to the owner of the
current batch job from /tmp (and also looks for and kills orphan
processes belonging to the user).  This epilogue had to be written
by our systems staff.

I believe this is a fairly common configuration for diskless
clusters.
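
For what it's worth, Open MPI's session directories can also be pointed 
somewhere other than /tmp; if I remember right, something like the following 
works (the path is illustrative), or TMPDIR can be set in the environment:

mpirun --mca orte_tmpdir_base /local/scratch -np 16 ./myapp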

On 11/3/11 4:09 PM, Blosch, Edwin L wrote:
> Thanks for the help.  A couple follow-up-questions, maybe this starts to
go outside OpenMPI:
>
> What's wrong with using /dev/shm?  I think you said earlier in this thread
that this was not a safe place.
>
> If the NFS-mount point is moved from /tmp to /work, would a /tmp magically
appear in the filesystem for a stateless node?  How big would it be, given
that there is no local disk, right?  That may be something I have to ask the
vendor, which I've tried, but they don't quite seem to get the question.
>
> Thanks
>
>
>
>
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Ralph Castain
> Sent: Thursday, November 03, 2011 5:22 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp
for OpenMPI usage
>
>
> On Nov 3, 2011, at 2:55 PM, Blosch, Edwin L wrote:
>
>> I might be missing something here. Is there a side-effect or performance
loss if you don't use the sm btl?  Why would it exist if there is a wholly
equivalent alternative?  What happens to traffic that is intended for
another process on the same node?
>
> There is a definite performance impact, and we wouldn't recommend doing
what Eugene suggested if you care about performance.
>
> The correct solution here is to get your sys admin to make /tmp local. Making
/tmp NFS-mounted across multiple nodes is a major "faux pas" in the Linux
world - it should never be done, for the reasons stated by Jeff.
>
>
>>
>> Thanks
>>
>>
>> -Original Message-
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Eugene Loh
>> Sent: Thursday, November 03, 2011 1:23 PM
>> To: us...@open-mpi.org
>> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node
/tmp for OpenMPI usage
>>
>> Right.  Actually "--mca btl ^sm".  (Was missing "btl".)
>>
>> On 11/3/2011 11:19 AM, Blosch, Edwin L wrote:
>>> I don't tell OpenMPI what BTLs to use. The default uses sm and puts a
session file on /tmp, which is NFS-mounted and thus not a good choice.
>>>
>>> Are you suggesting something like --mca ^sm?
>>>
>>>
>>> -Original Message-
>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Eugene Loh
>>> Sent: Thursday, November 03, 2011 12:54 PM
>>> To: us...@open-mpi.org
>>> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node
/tmp for OpenMPI usage
>>>
>>> I've not been following closely.  Why must one use shared-memory
>>> communications?  How about using other BTLs in a "loopback" fashion?
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Best regards,

David Turner
User Services Group      email: dptur...@lbl.gov
NERSC Division           phone: (510) 486-4027
Lawrence Berkeley Lab    fax:   (510) 486-4316
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] EXTERNAL: Re: Unresolved reference 'mbind' and 'get_mempolicy'

2011-09-30 Thread Ed Blosch
Thank you for all this information.

Your diagnosis is totally right.  I actually sent e-mail yesterday but
apparently it never got through :<

It IS the MPI application that is failing to link, not OpenMPI itself; my
e-mail was not well written; sorry Brice.

The situation is this:  I am trying to compile using an OpenMPI 1.5.4 that
was built to be rooted in /release, but it is not placed there yet
(testing); it is currently under /builds/release.  I have set OPAL_PREFIX in
the environment, with the intention of helping the compiler wrappers work
right. Under /release, I currently have OpenMPI 1.4.3, whereas the OpenMPI
under /builds/release is 1.5.4.

What I am getting is this:  The mpif90 wrapper (under
/builds/release/openmpi/bin) puts -I/release instead of -I/builds/release.
But it includes -L/builds/release.

So I'm getting headers from 1.4.3 when compiling, but the libmpi from 1.5.4
when linking.

I did a quick "move 1.4.3 out of the way and put 1.5.4 over to /release
where it belongs" test, and my application did link without errors, so I
think that confirms the nature of the problem.

Is it a bug that mpif90 didn't pay attention to OPAL_PREFIX in the -I but
did use it in the -L ?
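
For reference, a quick way to see exactly what the wrapper intends to pass 
(paths as in my setup; the OPAL_PREFIX value here is my guess at what it should 
point to):

export OPAL_PREFIX=/builds/release/openmpi
/builds/release/openmpi/bin/mpif90 --showme:compile
/builds/release/openmpi/bin/mpif90 --showme:link

which prints the -I and -L flags without compiling anything.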


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Friday, September 30, 2011 7:04 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Unresolved reference 'mbind' and
'get_mempolicy'

I think the issue here is that it's linking the *MPI application* that is
causing the problem.  Is that right?

If so, can you send your exact application compile line, and the output
of that compile line with "--showme" at the end?
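
For example (application and file names hypothetical), that would be something
like:

mpif90 -o myapp myapp.f90 --showme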


On Sep 29, 2011, at 4:24 PM, Brice Goglin wrote:

> Le 28/09/2011 23:02, Blosch, Edwin L a écrit :
>> Jeff,  
>> 
>> I've tried it now adding --without-libnuma.  Actually that did NOT fix
the problem, so I can send you the full output from configure if you want,
to understand why this "hwloc" function is trying to use a function which
appears to be unavailable.
> 
> This function is likely available... in the dynamic version of libnuma
> (that's why configure is happy), but make is probably trying to link
> with the static version which isn't available on your machine. That's my
> guess, at least.
> 
>> I don't understand about make V=1. What tree? Somewhere in the OpenMPI
build, or in the application compilation itself? Is "V=1" something in the
OpenMPI makefile structure?
> 
> Instead of doing
>  ./configure ...
>  make
> do
>  ./configure ...
>  make V=1
> 
> It will make the output more verbose. Once you get the failure, please
> send the last 15 lines or so. We will look at these verbose lines to
> understand how things are being compiled (which linker flags, which
> libraries, ...)
> 
> Brice
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-16 Thread Ed Blosch
In my case the directories are actually the "tmp" directories created by the
job-scheduling system, but I think a wrapper script could chgrp and setgid them
appropriately, so that a process running with group 1040 would effectively write
files with group ownership 650. Thanks for the clever idea.
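
Roughly what I have in mind for the wrapper (directory and group names are
illustrative):

chgrp 650 $TMPDIR     # the job's scratch directory
chmod g+s $TMPDIR     # setgid: files created inside inherit the directory's group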

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Reuti
Sent: Thursday, September 15, 2011 12:23 PM
To: Open MPI Users
Subject: Re: [OMPI users] Can you set the gid of the processes created by
mpirun?

Edwin, going back to this:

>> The mpirun command is invoked when the user's group is 'set group' to
group 650.  When the rank 0 process creates files, they have group ownership
650.  But the user's login group is group 1040. The child processes that get
started on other nodes run with group 1040, and the files they create have
group ownership 1040.

What about setting the setgid flag on the directory? Created files
therein will then inherit the group from the folder (which has to be set to
the group in question, of course).

-- Reuti
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Ed Blosch
Typically it is something like 'qsub -W group_list=groupB 
myjob.sh'. Ultimately myjob.sh runs with gid groupB on some host in the
cluster.  When that script reaches the mpirun command, then mpirun and the
processes started on the same host all run with gid groupB, but any of the
spawned processes that start on other hosts run with the user's default
group, say groupA.

It did occur to me that the launching technique might have some ability to
influence this behavior, as you indicated. I don't know what launcher is
being used in our case; I guess it's rsh/ssh.

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Reuti
Sent: Wednesday, September 07, 2011 12:24 PM
To: Open MPI Users
Subject: Re: [OMPI users] Can you set the gid of the processes created by
mpirun?

Hi,

you mean you change the group id of the user before you submit the job? In
GridEngine you can specify whether the actual group id should be used for
the job, or the default login id.

Having a tight integration, also the slave processes will run with the same
group id.

-- Reuti


>  Ed
>  
> From: Ralph Castain [mailto:r...@open-mpi.org] 
> Sent: Wednesday, September 07, 2011 8:53 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Can you set the gid of the processes created by
mpirun?
>  
> On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote:
> 
> 
> The mpirun command is invoked when the user's group is 'set group' to
group 650.  When the rank 0 process creates files, they have group ownership
650.  But the user's login group is group 1040. The child processes that get
started on other nodes run with group 1040, and the files they create have
group ownership 1040.
>  
> Is there a way to tell mpirun to start the child processes with the same
uid and gid as the rank 0 process?
>  
> I'm afraid not - never came up before. Could be done, but probably not
right away. What version are you using?
> 
> 
>  
> Thanks
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>  
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users