[OMPI devel] Open MPI, ssh and limits

2017-03-03 Thread Gilles Gouaillardet

Folks,


This is a follow-up on
https://www.mail-archive.com/users@lists.open-mpi.org//msg30715.html



On my cluster, the core file size limit is 0 by default, but any user can
raise it to unlimited.


I think this is a pretty common default.


$ ulimit -c
0
$ bash -c 'ulimit -c'
0
$ mpirun -np 1 bash -c 'ulimit -c'
0

$ mpirun -np 1 --host n1 bash -c 'ulimit -c'
0

$ ssh n1
[n1 ~]$ ulimit -c
0
[n1 ~]$ bash -c 'ulimit -c'
0

*but*

$ ssh motomachi-n1 bash -c 'ulimit -c'
unlimited


Now, if I manually set the core file size limit to unlimited:

$ ulimit -c unlimited
$ ulimit -c
unlimited
$ bash -c 'ulimit -c'
unlimited
$ mpirun -np 1 bash -c 'ulimit -c'
unlimited


*but*

$ mpirun -np 1 --host n1 bash -c 'ulimit -c'
0


Fun fact:

$ ssh n1 bash -c 'ulimit -c; bash -c "ulimit -c"'
unlimited
0


Bottom line: MPI tasks that run on the same node mpirun was invoked on
inherit the core file size limit from mpirun, whereas tasks that run on the
other node use the default core file size limit.


A manual workaround is:

mpirun --mca opal_set_max_sys_limits core:unlimited ...
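
For example, combining the workaround with the earlier test (a sketch; the
"unlimited" output assumes the MCA parameter takes effect before the remote
task starts):

$ mpirun -np 1 --host n1 --mca opal_set_max_sys_limits core:unlimited \
      bash -c 'ulimit -c'
unlimited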


I guess we should do something about that, but what?

- just document it

- mpirun forwards all/some limits to all the spawned tasks regardless of
where they run (a user-level sketch of this is below)

- mpirun forwards all/some limits to all the spawned tasks regardless of
where they run, but only if they are 0 or unlimited

- something else
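
As a user-level approximation of the forwarding options, one could combine
mpirun's -x environment forwarding with a wrapper script (a hypothetical,
untested sketch; forward_core.sh and FWD_CORE_LIMIT are made-up names):

#!/bin/sh
# forward_core.sh (hypothetical): apply the core file size limit forwarded
# from mpirun's environment, then exec the real task
ulimit -c "${FWD_CORE_LIMIT:-0}"
exec "$@"

which would be launched as:

$ FWD_CORE_LIMIT=$(ulimit -c) mpirun -x FWD_CORE_LIMIT ./forward_core.sh ./a.out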



Thoughts, anyone?


Gilles


___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


[OMPI devel] weird error message (you'll be puzzled!)

2017-03-03 Thread Paul Kapinos

Dear Open MPI developer,
please take a look at the attached 'hello MPI world' file.
We know that it contains an error (you should never pass '1476395012' to the
MPI_Init_thread() call! It was a typo, initially...) BUT, see what happens if
you compile and run it:


$ mpif90 -g mpihelloworld.f90
$ ./a.out
  1476395012   3
*** The MPI_Init_thread() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[cluster-hpc.rz.RWTH-Aachen.DE:25739] Local abort before MPI_INIT completed 
successfully; not able to aggregate error messages, and not able to guarantee 
that all other processes were killed


For me, reading this:
> MPI_Init_thread() function was called before MPI_INIT was invoked.
> This is disallowed by the MPI standard
...produced some cognitive dissonance, as the MPI calls MPI_Init_thread and
MPI_Init are well known to be *mutually exclusive*. Maybe by
'MPI_Init_thread() function' something Open MPI-internal is meant rather than
MPI's MPI_Init_thread, but the error message remains thoroughly unbelievable
(2 + 2 = 6!)
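
For reference, a minimal sketch of a correct invocation, passing one of the
MPI_THREAD_* constants as the requested level (variable names as declared in
the attached program):

required = MPI_THREAD_MULTIPLE
CALL MPI_Init_thread(required, provid, ierr)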


Maybe you can craft a better error message? :o)

Have a nice day,

Paul Kapinos

P.S. Tested versions: 1.10.6 and 2.0.1, with support for MPI_THREAD_MULTIPLE


> MPI_Init_thread(3) Open MPI MPI_Init_thread(3)
> NAME
>MPI_Init_thread - Initializes the MPI execution environment
> ..
> DESCRIPTION
>This  routine, or MPI_Init, must be called before any other MPI routine
>(apart from MPI_Initialized) is called. MPI can be initialized at  most
>once; subsequent calls to MPI_Init or MPI_Init_thread are erroneous.
>
>MPI_Init_thread,  as compared to MPI_Init, has a provision to request a
>certain level of thread support in required:


> MPI_Init(3) Open MPI MPI_Init(3)
> NAME
>MPI_Init - Initializes the MPI execution environment
> .
> DESCRIPTION
>This  routine,  or MPI_Init_thread, must be called before any other MPI
>routine (apart from MPI_Initialized) is called. MPI can be  initialized
>at most once; subsequent calls to MPI_Init or MPI_Init_thread are erro-
>neous.





--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
! Paul Kapinos 22.09.2009 - 
! RZ RWTH Aachen, www.rz.rwth-aachen.de
!
! MPI-Hello-World
!
PROGRAM PK_MPI_Test

USE MPI
IMPLICIT NONE
!include "mpif.h"


!
INTEGER :: my_MPI_Rank, laenge, ierr
INTEGER :: requ, provid, required
! INTEGER :: PROVIDED, REQUIRED

CHARACTER*(MPI_MAX_PROCESSOR_NAME) my_Host
!
!WRITE (*,*) "Jetz penn ich mal 30"
!CALL Sleep(30)
!WRITE (*,*) "Starten"

!CALL MPI_INIT (ierr)
required = MPI_THREAD_MULTIPLE
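! NB: intentional garbage value on the next line; it should be one of the
! MPI_THREAD_* constants (it was a typo, initially)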
requ = 1476395012
WRITE (*,*) requ, required
CALL MPI_Init_thread (requ, provid, ierr)
WRITE (*,*) "MPI_Init_thread (", requ, provid, ierr, ")"

! REQUIRED = MPI_THREAD_MULTIPLE ! MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED, MPI_THREAD_MULTIPLE is evil
! CALL MPI_INIT_THREAD(REQUIRED, PROVIDED, ierr)
! WRITE(*,*) "Threading levels: ", MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED, MPI_THREAD_MULTIPLE
! WRITE(*,*) "Requesting multithreading:", MPI_THREAD_MULTIPLE, REQUIRED, PROVIDED



!
!WRITE (*,*) "Nach MPI_INIT"
!CALL Sleep(30)
CALL MPI_COMM_RANK( MPI_COMM_WORLD, my_MPI_Rank, ierr )
!WRITE (*,*) "Nach MPI_COMM_RANK"
CALL MPI_GET_PROCESSOR_NAME(my_Host, laenge, ierr)
WRITE (*,*) "Prozessor ", my_MPI_Rank, "on Host: ", my_Host(1:laenge)

! sleeping or spinnig - the same behaviour
!CALL Sleep(2)
!DO WHILE (.TRUE.)
!ENDDO

CALL Sleep(1)

!IF (my_MPI_Rank == 1) STOP

CALL MPI_FINALIZE(ierr)
!
!WRITE (*,*) "Daswars"
!
END PROGRAM PK_MPI_Test


___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] weird error message (you'll be puzzled!)

2017-03-03 Thread Gilles Gouaillardet
Thanks Paul,

It looks like we (indirectly) call MPI_Abort() when the argument is invalid.
That would explain the counter-intuitive error message.

Cheers,

Gilles 

Paul Kapinos  wrote:
>Dear Open MPI developer,
>please take a look at the attached 'hello MPI world' file.
>[...]
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


Re: [OMPI devel] Open MPI, ssh and limits

2017-03-03 Thread George Bosilca
Isn't this supposed to be part of cluster 101?

I would rather add it to our FAQ, maybe in a slightly more generic way (not
only focused on 'ulimit -c'). Otherwise we will be bound to define what is
forwarded and what is not, and potentially create chaos for knowledgeable
users (who know how to deal with these issues).

George


On Mar 3, 2017 3:05 AM, "Gilles Gouaillardet"  wrote:

Folks,

this is a follow-up on
https://www.mail-archive.com/users@lists.open-mpi.org//msg30715.html

[...]

Gilles


___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] Open MPI, ssh and limits

2017-03-03 Thread Jeff Squyres (jsquyres)
On Mar 3, 2017, at 3:05 AM, Gilles Gouaillardet  wrote:
> 
> bottom line, MPI tasks that run on the same node mpirun was invoked on inherit
> the core file size limit from mpirun, whereas tasks that run on the other node
> use the default core file size limit.

I am not sure that this is inconsistent.  This is quite similar to how
environment variables can propagate (or not), at least with ssh.

With SSH:
- environment variables:
  - propagate from mpirun's environment if the user specifies -x on the
mpirun command line (see the sketch after this list)
  - are set per the user's shell startup files on each node
- process limits (e.g., corefile size):
  - are set per ssh's defaults and the user's shell startup files on each
node

With a resource manager (e.g., SLURM):
- environment variables:
  - (typically) propagate from mpirun's environment via the resource manager
- process limits (e.g., corefile size):
  - may or may not propagate from mpirun's environment, but the resource
manager may impose its own limits
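
For example (a sketch; MY_VAR and n1 are placeholder names, and the output
assumes ssh-launched ranks that do not inherit mpirun's environment):

$ export MY_VAR=hello
$ mpirun -np 1 --host n1 printenv MY_VAR
$ mpirun -np 1 --host n1 -x MY_VAR printenv MY_VAR
hello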

It sounds like we already have an MCA param that allows propagating those 
limits to override ssh / resource manager / shell startup file settings.  I 
think that's probably enough.

> 
> a manual workaround is
> 
> mpirun --mca opal_set_max_sys_limits core:unlimited ...
> 
> 
> I guess we should do something about that, but what?
> 
> - just document it
> 
> - mpirun forwards all/some limits to all the spawned tasks regardless of
> where they run
> 
> - mpirun forwards all/some limits to all the spawned tasks regardless of
> where they run, but only if they are 0 or unlimited
> 
> - something else

I think the first option (documenting it) is probably a good idea.  The FAQ 
would likely be a good place for this (and maybe also the README?).

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


[OMPI devel] Git commit messages: a request

2017-03-03 Thread Jeff Squyres (jsquyres)
Developers --

A request: when you have a commit that fixes a user-reported issue, please 
thank the user in the commit message (e.g., "Thanks to Joe Shmoe for reporting 
the issue.").  It's good for the community, and it also *really* helps us write 
the NEWS file when it comes time for release.
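
For instance (a hypothetical commit message; the subject line and name are
made up):

    Fix typo in README launch example

    Thanks to Joe Shmoe for reporting the issue.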

Some of us do this already, but I don't think we're always consistent about
it.

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel