Ok - the code runs fine locally, but not under Sun Grid Engine:

> % qsub -pe kmbmpi 4 my_mpirun_job.sh

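For reference, a job script for a tightly integrated OpenMPI parallel
environment usually boils down to something like the following. This is
only a minimal sketch - the SGE directives and the use of $NSLOTS are
assumptions about your setup, and your actual my_mpirun_job.sh may well
look different:

  #!/bin/sh
  #$ -cwd        # run in the directory the job was submitted from
  #$ -V          # pass the submission environment through to the job
  #$ -j y        # merge stderr into the .o<jobid> output file

  # With a tightly integrated OpenMPI, mpirun picks up the host/slot
  # allocation that SGE made for the job, so no machinefile is needed;
  # SGE sets $NSLOTS to the slot count requested with '-pe'.
  mpirun -n $NSLOTS ./ex19 -dmmg_nlevels 4
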
Wrt SGE - what does it require from MPI? Is it MPI-agnostic, or does it
need a particular MPI to be used? What if you use PETSc installed with
--download-mpich instead? How will it know how to schedule these MPI jobs?
Or does it require a particular MPI, installed in a particular way, on all
the nodes on the grid?

BTW: what do you have for 'ldd ex19'?

Satish

On Thu, 17 Dec 2009, Kevin.Buckley at ecs.vuw.ac.nz wrote:

> > Let's ignore the 'Sun Grid Engine environment' initially and just
> > figure out your PETSc install.
> >
> > - What MPI is it built with? Send us the output for the compile of ex19.
> >
> > - You claim 'make test' worked fine - i.e. this example ran fine in
> > parallel. Can you confirm this with a manual run?
> >
> > [If that's the case, then PETSc would be working correctly with the
> > MPI specified.]
> >
> > From the info below, the example crashes happen only in the 'Sun Grid
> > Engine environment'. What is that? And why should binaries compiled
> > with this default 'MPI' work in that grid environment without
> > recompiling with a different 'sun-grid-mpi'?
> >
> > Satish
>
> Someone else using the PISM software, over in Alaska as it happens,
> which sits on top of PETSc here, has seen similar errors, so I am
> thinking that it may not be just my environment, which I doubt
> matches theirs.
>
> For your consideration though:
>
> ===========
>
> My PETSc was built against OpenMPI 1.4
>
> ===========
>
> Compilation of the example in question shows:
>
> $ export PETSC_DIR=/vol/grid/pkg/petsc-3.0.0-p7
>
> $ gmake ex19
> mpicc -o ex19.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -g3
> -I/vol/grid/pkg/petsc-3.0.0-p7/include
> -I/vol/grid/pkg/petsc-3.0.0-p7/include -I/usr/pkg/include
> -D__SDIR__="src/snes/examples/tutorials/" ex19.c
> mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -g3 -o ex19 ex19.o
> -Wl,-rpath,/vol/grid/pkg/petsc-3.0.0-p7/lib
> -L/vol/grid/pkg/petsc-3.0.0-p7/lib -lpetscsnes -lpetscksp -lpetscdm
> -lpetscmat -lpetscvec -lpetsc -L/usr/pkg/lib -lX11 -llapack -lblas
> -L/usr/pkg/lib -lmpi -lopen-rte -lopen-pal -lutil -lpthread -lgcc_eh
> -Wl,-rpath,/usr/pkg/lib -lmpi_f77 -lf95 -lm -lm
> -L/usr/pkg/lib/gcc-lib/i386--netbsdelf/4.0.3 -L/lib -lm -lm -lmpi_cxx
> -lstdc++ -lgcc_s -lmpi_cxx -lstdc++ -lgcc_s -lmpi -lopen-rte -lopen-pal
> -lutil -lpthread -lgcc_eh
> /bin/rm -f ex19.o
>
> $ ./ex19
> lid velocity = 0.0204082, prandtl # = 1, grashof # = 1
> Number of Newton iterations = 2
> lid velocity = 0.0204082, prandtl # = 1, grashof # = 1
> Number of Newton iterations = 2
>
> ==========
>
> Running a parallel invocation local to one machine:
>
> $ mpirun -n 2 ./ex19 -dmmg_nlevels 4
> lid velocity = 0.0016, prandtl # = 1, grashof # = 1
> Number of Newton iterations = 2
> lid velocity = 0.0016, prandtl # = 1, grashof # = 1
> Number of Newton iterations = 2
>
> $ mpirun -n 4 ./ex19 -dmmg_nlevels 4
> lid velocity = 0.0016, prandtl # = 1, grashof # = 1
> Number of Newton iterations = 2
> lid velocity = 0.0016, prandtl # = 1, grashof # = 1
> Number of Newton iterations = 2
>
> However, when submitting within the SGE environment, we see a
> similar story to that seen with the PISM package:
>
> % qsub -pe kmbmpi 2 my_mpirun_job.sh
> % cat my_mpirun_job.sh.o425710
> lid velocity = 0.0016, prandtl # = 1, grashof # = 1
> Number of Newton iterations = 2
> lid velocity = 0.0016, prandtl # = 1, grashof # = 1
> Number of Newton iterations = 2
>
> % qsub -pe kmbmpi 4 my_mpirun_job.sh
>
> A swathe of PETSc errors.
>
> ==========

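Two checks that would help narrow this down - just sketches, and worth
running on one of the grid nodes as well as on the machine where the
local runs work:

  # Which MPI shared libraries does the binary actually resolve at run time?
  $ ldd ./ex19 | grep -i mpi

  # How is the 'kmbmpi' parallel environment defined on the cluster?
  # (look in particular at start_proc_args, control_slaves and
  # job_is_first_task, which govern tight integration with the MPI)
  $ qconf -sp kmbmpi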

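And if you do want to try the --download-mpich route, the rebuild for
petsc-3.0.0 is roughly along these lines - a sketch only; the PETSC_ARCH
name and compiler options here are assumptions and should match whatever
your current build uses:

  $ cd /vol/grid/pkg/petsc-3.0.0-p7
  $ ./config/configure.py PETSC_ARCH=netbsd-mpich \
      --with-cc=gcc --with-fc=f95 --download-mpich=1
  $ make all
  $ make test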