Hi again. I thought I had got things working, but maybe not, not completely, anyway.
I did this and stuff worked:

  PETSC_DIR=$PWD; export PETSC_DIR
  ./configure --with-c++-support --with-hdf5=/usr/pkg --prefix=/vol/grid/pkg/petsc-3.0.0-p7
  PETSC_ARCH=netbsdelf5.0.-c-debug; export PETSC_ARCH
  make all
  make install
  make test
  cd src/snes/examples/tutorials/
  make ex19
  ./ex19 -contours

Nice pictures!

I then moved the example ex19 source and the makefile out of the distribution tree to somewhere else, built it against the installed stuff, and ran it: that worked too.

  export PETSC_DIR=/vol/grid/pkg/petsc-3.0.0-p7
  make ex19
  ./ex19 -dmmg_nlevels 4 -snes_monitor_draw
  ./ex19 -contours

I then built the package that needs PETSc, PISM, from Univ Alaska at Fairbanks, and ran that. What I found is that the PISM stuff would fail if we launched it into a Sun Grid Engine environment with more than TWO processors. It also ran if simply mpiexec-ed onto a four-processor machine, but not onto a four-machine grid.
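In case the submission mechanics matter, the grid runs were going in via an SGE job script broadly along the lines of the sketch below. The parallel environment name ("mpi") and the script itself are placeholders rather than a copy of our real one, and I have left the actual pismv arguments out; the mpirun line is the relevant bit:

  #!/bin/sh
  # Hypothetical SGE job script: roughly how the failing grid runs were launched.
  # "mpi" stands in for whatever parallel environment the site actually defines.
  #$ -cwd
  #$ -V
  #$ -pe mpi 4

  # Open MPI's mpirun picks up the SGE-allocated slot count ($NSLOTS);
  # two slots run fine, more than two dies as shown below.
  mpirun -np $NSLOTS /vol/grid/pkg/pism-0.2.1/bin/pismv

The direct runs that did work were just a plain mpiexec -np 4 of the same binary on a single four-processor machine.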
I saw this block of error messages from a 4-node submission:

  [2]PETSC ERROR: ------------------------------------------------------------------------
  [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
  [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
  [2]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
  [2]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
  [2]PETSC ERROR: likely location of problem given in stack below
  [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------
  [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
  [2]PETSC ERROR:       INSTEAD the line number of the start of the function
  [2]PETSC ERROR:       is given.
  [2]PETSC ERROR: [2] VecScatterCreateCommon_PtoS line 1699 src/vec/vec/utils/vpscat.c
  [2]PETSC ERROR: [2] VecScatterCreate_PtoS line 1508 src/vec/vec/utils/vpscat.c
  [2]PETSC ERROR: [2] VecScatterCreate line 833 src/vec/vec/utils/vscat.c
  [2]PETSC ERROR: [2] DACreate2d line 338 src/dm/da/src/da2.c
  [2]PETSC ERROR: --------------------- Error Message ------------------------------------
  [2]PETSC ERROR: Signal received!
  [2]PETSC ERROR: ------------------------------------------------------------------------
  [2]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul 6 11:33:34 CDT 2009
  [2]PETSC ERROR: See docs/changes/index.html for recent updates.
  [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
  [2]PETSC ERROR: See docs/index.html for manual pages.
  [2]PETSC ERROR: --------------------------------------------------------------------------
  MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
  with errorcode 59.

  NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
  You may or may not see output from other processes, depending on
  exactly when Open MPI kills them.
  --------------------------------------------------------------------------
  ------------------------------------------------------------------------
  [2]PETSC ERROR: /vol/grid/pkg/pism-0.2.1/bin/pismv on a netbsdelf named citron.ecs.vuw.ac.nz by golledni Wed Dec 16 15:49:09 2009
  [2]PETSC ERROR: Libraries linked from /vol/grid/pkg/petsc-3.0.0-p7/lib
  [2]PETSC ERROR: Configure run at Mon Dec 14 17:02:49 2009
  [2]PETSC ERROR: Configure options --with-c++-support --with-hdf5=/usr/pkg --prefix=/vol/grid/pkg/petsc-3.0.0-p7 --with-shared=0
  [2]PETSC ERROR: ------------------------------------------------------------------------
  [2]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
  --------------------------------------------------------------------------
  mpirun has exited due to process rank 2 with PID 4365 on
  node citron.ecs.vuw.ac.nz exiting without calling "finalize". This may
  have caused other processes in the application to be
  terminated by signals sent by mpirun (as reported here).
  --------------------------------------------------------------------------

and this block of messages from an 8-node submission:

  [3]PETSC ERROR: ------------------------------------------------------------------------
  [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
  [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
  [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
  [3]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
  [3]PETSC ERROR: likely location of problem given in stack below
  [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------
  [2]PETSC ERROR: ------------------------------------------------------------------------
  [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
  [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
  [2]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
  [2]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
  [2]PETSC ERROR: likely location of problem given in stack below
  [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------

I then went back and tried to run the PETSc example and found similar happenings: things run when submitted to a two-node "grid" but not a four-node one.
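For what it is worth, the ex19 test submissions were just the same sort of job script with the slot count varied; run_ex19.sh and the PE name here are placeholders rather than the real names:

  qsub -pe mpi 2 run_ex19.sh    # two slots: runs to completion
  qsub -pe mpi 4 run_ex19.sh    # four slots: dies with the block below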
The error message block from the failing four-node run was:

  [0]PETSC ERROR: --------------------- Error Message ------------------------------------
  [0]PETSC ERROR: Out of memory. This could be due to allocating
  [0]PETSC ERROR: too large an object or bleeding by not properly
  [0]PETSC ERROR: destroying unneeded objects.
  --------------------------------------------------------------------------
  MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
  with errorcode 1.

  NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
  You may or may not see output from other processes, depending on
  exactly when Open MPI kills them.
  --------------------------------------------------------------------------
  [0]PETSC ERROR: Memory allocated 90628 Memory used by process 0
  [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
  [0]PETSC ERROR: Memory requested 320!
  [0]PETSC ERROR: ------------------------------------------------------------------------
  [0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul 6 11:33:34 CDT 2009
  [0]PETSC ERROR: See docs/changes/index.html for recent updates.
  [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
  [0]PETSC ERROR: See docs/index.html for manual pages.
  [0]PETSC ERROR: ------------------------------------------------------------------------
  [0]PETSC ERROR: /home/rialto1/kingstlind/kevin/PETSc/ex19 on a netbsdelf named petit-lyon.ecs.vuw.ac.nz by kingstlind Wed Dec 16 16:45:39 2009
  [0]PETSC ERROR: Libraries linked from /vol/grid/pkg/petsc-3.0.0-p7/lib
  [0]PETSC ERROR: Configure run at Mon Dec 14 17:02:49 2009
  [0]PETSC ERROR: Configure options --with-c++-support --with-hdf5=/usr/pkg --prefix=/vol/grid/pkg/petsc-3.0.0-p7 --with-shared=0
  [0]PETSC ERROR: ------------------------------------------------------------------------
  [0]PETSC ERROR: PetscMallocAlign() line 61 in src/sys/memory/mal.c
  [0]PETSC ERROR: PetscTrMallocDefault() line 194 in src/sys/memory/mtr.c
  [0]PETSC ERROR: PetscFListAdd() line 235 in src/sys/dll/reg.c
  [0]PETSC ERROR: MatRegister() line 140 in src/mat/interface/matreg.c
  [0]PETSC ERROR: MatRegisterAll() line 106 in src/mat/interface/matregis.c
  [0]PETSC ERROR: MatInitializePackage() line 54 in src/mat/interface/dlregismat.c
  [0]PETSC ERROR: MatCreate() line 74 in src/mat/utils/gcreate.c
  [0]PETSC ERROR: DAGetInterpolation_2D_Q1() line 308 in src/dm/da/src/dainterp.c
  [0]PETSC ERROR: DAGetInterpolation() line 879 in src/dm/da/src/dainterp.c
  [0]PETSC ERROR: DMGetInterpolation() line 144 in src/dm/da/utils/dm.c
  [0]PETSC ERROR: DMMGSetDM() line 309 in src/snes/utils/damg.c
  [0]PETSC ERROR: main() line 108 in src/snes/examples/tutorials/ex19.c
  --------------------------------------------------------------------------
  mpirun has exited due to process rank 0 with PID 9757 on
  node petit-lyon.ecs.vuw.ac.nz exiting without calling "finalize". This may
  have caused other processes in the application to be
  terminated by signals sent by mpirun (as reported here).
  --------------------------------------------------------------------------
  [1]PETSC ERROR: ------------------------------------------------------------------------
  [1]PETSC ERROR: Caught signal number 15 Terminate: Somet process (or the batch system) has told this process to end
  [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
  [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
  [2]PETSC ERROR: ------------------------------------------------------------------------
  [pulcinella.ecs.vuw.ac.nz:24936] opal_sockaddr2str failed:Unknown error (return code 4)
  [3]PETSC ERROR: ------------------------------------------------------------------------
  [3]PETSC ERROR: Caught signal number 15 Terminate: Somet process (or the batch system) has told this process to end
  [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
  [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
  [3]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
  [3]PETSC ERROR:

Do the PETSc error messages suggest anything wrong with my PETSc build, or do they point to underlying problems with the Open MPI?

Any suggestions/insight welcome,

Kevin

-- 
Kevin M. Buckley                                  Room:  CO327
School of Engineering and                         Phone: +64 4 463 5971
 Computer Science
Victoria University of Wellington
New Zealand
