Re: [OMPI users] Problem with mpi_comm_spawn_multiple
Dear Ralph,

Thanks for that. I have done much the same (as I indicated in my original post). In this case my C program correctly spawned the slaves, and the slaves printed the correctly passed argument lists. Running it with my Fortran slave, I get:

  nsize, mytid: iargs 2 0 : 1
  spray: 0 1:1 2 3 4
  nsize, mytid: iargs 2 1 : 1
  spray: 1 1:5 6 7 8

which is what I expect. I still think the error may well be mine rather than OMPI's, but I am at a loss to see what is going on!

Thanks for the help so far,

Fred Marquis.

C program:

  #include "mpi.h"
  #include <stdio.h>
  #include <stdlib.h>

  int main( int argc, char *argv[] )
  {
      int np[2] = { 1, 1 };              /* one process per command */
      int errcodes[2];
      char *cmds[2] = { "./spray", "./spray" };
      char *args[2] = { "1 2 3 4", "5 6 7 8" };
      char **array_of_argv[2];
      /* in C each argv list is NULL-terminated */
      char *argv0[] = {"1 2 3 4", (char *)0};
      char *argv1[] = {"5 6 7 8", (char *)0};
      MPI_Comm parentcomm, intercomm;
      MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };

      array_of_argv[0] = argv0;
      array_of_argv[1] = argv1;

      MPI_Init( &argc, &argv );
      /* spawn both copies of ./spray, rooted at rank 0 of MPI_COMM_WORLD */
      MPI_Comm_spawn_multiple( 2, cmds, array_of_argv, np, infos, 0,
                               MPI_COMM_WORLD, &intercomm, errcodes );
      MPI_Finalize();
      return 0;
  }

On Wed, May 05, 2010 at 07:47:20PM +0100, Ralph Castain wrote:
> I think OMPI is okay - here is a C sample program and the associated output:
>
> $ mpirun -np 3 ./spawn_multiple
> Parent [pid 98895] about to spawn!
> Parent [pid 98896] about to spawn!
> Parent [pid 98897] about to spawn!
> Parent done with spawn
> Parent sending message to children
> Parent done with spawn
> Parent done with spawn
> Hello from the child 0 of 2 on host Ralph pid 98898: argv[1] = foo
> Child 0 received msg: 38
> Hello from the child 1 of 2 on host Ralph pid 98899: argv[1] = bar
> Parent disconnected
> Parent disconnected
> Child 1 disconnected
> Child 0 disconnected
> Parent disconnected
>
> On May 5, 2010, at 12:08 PM, Fred Marquis wrote:
>
> > Hi,
> >
> > I am using mpi_comm_spawn_multiple to spawn multiple commands with
> > argument lists.
> > I am trying to do this in Fortran (77) using
> > openmpi-1.4.1 and the ifort compiler v9.0. The operating system is SuSE
> > Linux 10.1 (x86-64).
> >
> > I have put together a simple controlling example program (test_pbload.F)
> > and an example slave program (spray.F) to try and explain my problem.
> >
> > In the controlling program mpi_comm_spawn_multiple is used to start 2 copies
> > of the slave. The first is started with the argument list "1 2 3 4"
> > and the second with "5 6 7 8".
> >
> > The slaves start OK, print out their argument lists and exit. In addition
> > the slaves print out their rank numbers so I can see which argument list
> > belongs to which slave.
> >
> > What I am finding is that the argument lists are not being sent to the
> > slaves correctly; indeed both slaves seem to be getting both argument
> > lists!
> >
> > To compile and run the programs I follow the steps below.
> >
> > Controlling program "test_pbload.F":
> >
> >   mpif77 -o test_pbload test_pbload.F
> >
> > Slave program "spray.F":
> >
> >   mpif77 -o spray spray.F
> >
> > Run the controller:
> >
> >   mpirun -np 1 test_pbload
> >
> > The output from the first slave is:
> >
> >   nsize, mytid: iargs 2 0 : 2
> >   spray: 0 1:1 2 3 4      < FIRST ARGUMENT
> >   spray: 0 2:4 5 6 7      < SECOND ARGUMENT
> >
> > and from the second slave:
> >
> >   nsize, mytid: iargs 2 1 : 2
> >   spray: 1 1:1 2 3 4      < FIRST ARGUMENT
> >   spray: 1 2:4 5 6 7      < SECOND ARGUMENT
> >
> > In each case the arguments (2 in both cases) are the same.
> >
> > I have written a C version of the controlling program and everything works
> > as expected, so I presume that I have either got the specification of the
> > argument list wrong or I have discovered an error/bug. At the moment I am
> > working on the former, but am at a loss to see what is wrong!
> >
> > Any help, pointers etc. really appreciated.
> >
> > Controlling program (that uses MPI_COMM_SPAWN_MULTIPLE) test_pbload.F
> >
> >       program main
> > c
> >       implicit none
> > #include "mpif.h"
> >
> >       integer error
> >       integer intercomm
> >       CHARACTER*25 commands(2), argvs(2, 2)
> >       integer nprocs(2),info(2),ncpus
> > c
> >       call mpi_init(error)
> > c
> >       ncpus = 2
> > c
> >       commands(1) = ' ./spray '
> >       nprocs(1) = 1
> >       info(1) = MPI_INFO_NULL
> >       argvs(1, 1) = ' 1 2 3 4 '
> >       argvs(1, 2) = ' '
> > c
> >       commands(2) = ' ./spray '
> >       nprocs(2) = 1
> >       info(2) = MPI_INFO_NULL
> >       argvs(2, 1) = ' 4 5 6 7 '
> >       argvs(2, 2) = ' '
> > c
> >       call mpi_comm_
Re: [OMPI users] Problem with mpi_comm_spawn_multiple
Dear Jeff,

I am afraid not. As I said in my original post, I am using the Intel ifort
compiler version 9.0, i.e.

  fred@prandtl:~> mpif77 -V
  Intel(R) Fortran Compiler for Intel(R) EM64T-based applications, Version 9.0 Build 20060222 Package ID:
  Copyright (C) 1985-2006 Intel Corporation. All rights reserved.
  FOR NON-COMMERCIAL USE ONLY

I have been looking at this myself and have noted a few things. Some of
these need cross-checking (I am using different computers, setups,
compilers and Open MPI releases!), but my thoughts at the moment are as
follows (point (4) is possibly the most important so far):

1) If I allocate the string array using an ALLOCATE statement, ALL of the
   string locations are initialised to "\0" (character 0).

2) If I set part of a location in the string array, all the OTHER
   characters in the same location are set to " " (character 32).

3) If the character array is declared via a DIMENSION statement, the
   locations in the array seem to be initialised at random.

4) Looking at the output from my test program I noticed an odd pattern in
   the arguments being sent to the slaves (yes, I do need to quantify this
   better!). This led me to look at the OMPI source, in particular:

     openmpi-1.4.1/ompi/mpi/f77/base/strings.c

   Near the bottom (line 156), in the function
   "ompi_fortran_multiple_argvs_f2c", at the end of the for loop there is
   the line:

     current_array += len * i;

   The "* i" looks wrong to me; I think it should just be:

     current_array += len;

   Making this change improves things, BUT, as you suggest in your email,
   there seems to be a problem locating the end of the 2d-array elements.

I will try to look at this more over the weekend.

Fred Marquis.

On Fri, May 07, 2010 at 10:02:48PM +0100, Jeff Squyres wrote:
> Greetings Fred.
>
> After looking at this for more hours than I'd care to admit, I'm wondering if
> this is a bug in gfortran.
> I can replicate your problem with a simple
> program on gfortran 4.1 on RHEL 5.4, but it doesn't happen with the Intel
> Fortran compiler (11.1) or the PGI Fortran compiler (10.0).
>
> One of the issues appears to be how to determine how Fortran 2d CHARACTER
> arrays are terminated. I can't figure out how gfortran is terminating them,
> but Intel and PGI both terminate them by having an empty string at the end.
>
> Are you using gfortran 4.1, perchance?
>
> On May 5, 2010, at 2:08 PM, Fred Marquis wrote:
> > [...]
Re: [OMPI users] Problem with mpi_comm_spawn_multiple
Dear Jeff,

That's odd!

  fred@prandtl:~/test/fortran-c-2d-char> make CC=icc FC=ifort
  ifort -g -c -o main.o main.f
  icc -g -c -o c_func.o c_func.c
  Error: A license for CComp is not available (-5,357).

I will look into this tomorrow; time for bed, I am afraid!

Fred Marquis.

On Fri, May 07, 2010 at 10:49:40PM +0100, Jeff Squyres wrote:
> Yoinks; I missed that -- sorry!
>
> Here's a simple tarball; can you try this with your compiler? Just untar it
> and run
>
>   make CC=icc FC=ifort
>   ./main
>
> Do you see only 6 entries in the array?
>
> (I have icc 9.0, but I'm now running RHEL 5.4, and the gcc version with it is
> too new for icc 9.0 -- so I can't run it)
>
> On May 7, 2010, at 5:44 PM, Andrew J Marquis wrote:
> > [...]
Re: [OMPI users] Problem with mpi_comm_spawn_multiple
Dear Jeff,

Following the failure I just reported, I changed CC=icc to CC=cc, reran, and
got this:

  fred@prandtl:~/test/fortran-c-2d-char> make CC=cc FC=ifort
  cc -g -c -o c_func.o c_func.c
  ifort -g main.o c_func.o -g -o main
  fred@prandtl:~/test/fortran-c-2d-char> ./main
  Got leading dimension: 2
  Got string len: 14
  Found string: 1 2 3 4
  Found string: 4 5 6 7
  Found string: hello
  Found string: goodbye
  Found string: helloagain
  Found string: goodbyeagain
  End of the array -- found 6 entries

Fred Marquis.

On Fri, May 07, 2010 at 10:49:40PM +0100, Jeff Squyres wrote:
> [...]