[OMPI users] Code failing when requesting all "processors"

2020-10-13 Thread Diego Zuccato via users
Hello all.

I have a problem on a server: launching a job with mpirun fails if I
request all 32 CPUs (threads, since HT is enabled) but succeeds if I
only request 30.
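
For reference, the invocations were essentially (assuming a plain mpirun
launch; exact options may differ):
-8<--
mpirun -np 32 ./mpitest-debug   # fails with the segfault below
mpirun -np 30 ./mpitest-debug   # runs fine
-8<--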

The test code is really minimal:
-8<--
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define  MASTER 0

int main (int argc, char *argv[])
{
  int   numtasks, taskid, len;
  char hostname[MPI_MAX_PROCESSOR_NAME];
  MPI_Init(&argc, &argv);
//  int provided=0;
//  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
//printf("MPI provided threads: %d\n", provided);
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
  MPI_Comm_rank(MPI_COMM_WORLD,&taskid);

  if (taskid == MASTER)
printf("This is an MPI parallel code for Hello World with no
communication\n");
  //MPI_Barrier(MPI_COMM_WORLD);


  MPI_Get_processor_name(hostname, &len);

  printf ("Hello from task %d on %s!\n", taskid, hostname);

  if (taskid == MASTER)
printf("MASTER: Number of MPI tasks is: %d\n",numtasks);

  MPI_Finalize();

  printf("END OF CODE from task %d\n", taskid);
}
-8<--
(The commented-out section is a leftover from one of the tests.)
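
The binary was built with the usual wrapper, along the lines of (source
file name assumed; the -g inferred from the "-debug" binary name in the
trace below):
-8<--
mpicc -g mpitest.c -o mpitest-debug
-8<--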

The error is:
-8<--
[str957-bl0-03:19637] *** Process received signal ***
[str957-bl0-03:19637] Signal: Segmentation fault (11)
[str957-bl0-03:19637] Signal code: Address not mapped (1)
[str957-bl0-03:19637] Failing at address: 0x77fac008
[str957-bl0-03:19637] [ 0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x77e92730]
[str957-bl0-03:19637] [ 1]
/usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x2936)[0x7646d936]
[str957-bl0-03:19637] [ 2]
/usr/lib/x86_64-linux-gnu/libmca_common_dstore.so.1(pmix_common_dstor_init+0x9d3)[0x76444733]
[str957-bl0-03:19637] [ 3]
/usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x25b4)[0x7646d5b4]
[str957-bl0-03:19637] [ 4]
/usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_gds_base_select+0x12e)[0x7659346e]
[str957-bl0-03:19637] [ 5]
/usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_rte_init+0x8cd)[0x7654b88d]
[str957-bl0-03:19637] [ 6]
/usr/lib/x86_64-linux-gnu/libpmix.so.2(PMIx_Init+0xdc)[0x76507d7c]
[str957-bl0-03:19637] [ 7]
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext2x.so(ext2x_client_init+0xc4)[0x76603fe4]
[str957-bl0-03:19637] [ 8]
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so(+0x2656)[0x77fb1656]
[str957-bl0-03:19637] [ 9]
/usr/lib/x86_64-linux-gnu/libopen-rte.so.40(orte_init+0x29a)[0x77c1c11a]
[str957-bl0-03:19637] [10]
/usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_init+0x252)[0x77eece62]
[str957-bl0-03:19637] [11]
/usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Init+0x6e)[0x77f1b17e]
[str957-bl0-03:19637] [12] ./mpitest-debug(+0x11c6)[0x51c6]
[str957-bl0-03:19637] [13]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x77ce309b]
[str957-bl0-03:19637] [14] ./mpitest-debug(+0x10da)[0x50da]
[str957-bl0-03:19637] *** End of error message ***
-8<--

I'm using the Debian stable packages. Other servers show no problem
(though they did in the past, and there the issue got "solved" simply by
installing gdb).

Any hints?

TIA

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786


Re: [OMPI users] Code failing when requesting all "processors"

2020-10-13 Thread Jeff Squyres (jsquyres) via users
That's odd.  What version of Open MPI are you using?
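
You can check with ompi_info, e.g.:
-8<--
ompi_info | grep "Open MPI:"
-8<--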


> On Oct 13, 2020, at 6:34 AM, Diego Zuccato via users 
>  wrote:
> 
> Hello all.
> 
> I have a problem on a server: launching a job with mpirun fails if I
> request all 32 CPUs (threads, since HT is enabled) but succeeds if I
> only request 30.
> [...]


-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Code failing when requesting all "processors"

2020-10-13 Thread Gus Correa via users
Can you use taskid after MPI_Finalize?
Isn't it undefined/deallocated at that point?
Just a question (... or two) ...

Gus Correa

>  MPI_Finalize();
>
>  printf("END OF CODE from task %d\n", taskid);

On Tue, Oct 13, 2020 at 10:34 AM Jeff Squyres (jsquyres) via users <
users@lists.open-mpi.org> wrote:

> That's odd.  What version of Open MPI are you using?
> [...]


Re: [OMPI users] Code failing when requesting all "processors"

2020-10-13 Thread Jeff Squyres (jsquyres) via users
On Oct 13, 2020, at 10:43 AM, Gus Correa via users wrote:
> 
> Can you use taskid after MPI_Finalize?

Yes.  It's a variable, just like any other.

> Isn't it undefined/deallocated at that point?

No.  MPI filled it in during MPI_Comm_rank() and then never touched it again.

So even though MPI may have shut down, the value that it loaded into taskid is 
still valid/initialized.
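
In other words (a hypothetical standalone sketch, not the original test
code):
-8<--
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
  int taskid = -1;                         /* ordinary automatic int      */
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &taskid);  /* MPI stores a value into it  */
  MPI_Finalize();                          /* shuts MPI down, but leaves
                                              taskid's memory untouched   */
  printf("taskid is still %d\n", taskid);  /* perfectly valid plain C     */
  return 0;
}
-8<--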

-- 
Jeff Squyres
jsquy...@cisco.com