Dear all,

I hope I am on the correct mailing list with my problem. I am trying to run Open MPI with gridengine (6.0u10, 6.1). I therefore compiled Open MPI (1.2.2), which includes the gridengine support; I have checked this with ompi_info. In principle, Open MPI runs well.

The gridengine is configured such that the user has to specify the memory consumption via the h_vmem option. I noticed that with a larger number of processes the job is killed by the gridengine for taking too much memory. To take a closer look at this, I wrote a small and simple (Fortran) MPI program which does nothing but call MPI_Init and allocate a (static) array, in my case of 50 MB; the program then goes into an infinite loop, because it takes some time until the gridengine reports the maxvmem.

I found that if the processes all run on different nodes, there is only a per-process offset, i.e. linear scaling. But it becomes worse when the processes run on one node: there the offset seems to scale quadratically, in my case with about 30 MB. I made a list of the virtual memory reported by the gridengine, running on a 16-processor node:
  N proc   virt. mem [MB]
  ------   --------------
     1           182
     2           468
     3           825
     4          1065
     5          1001
     6          1378
     7          1817
     8          2303
    12          4927
    16          8559

The pure program should need N*50 MB; for 16 processes that is only 800 MB, but it actually takes more than ten times as much, >7 GB! Of course the gridengine will kill the job, because of too much virtual memory consumption, if this overhead is not taken into account. The memory consumption is not related to the gridengine; it is the same if I run from the command line. I guess it might be related to the 'sm' component of the btl.

Is it possible to avoid the quadratic scaling? Of course I could use only the mvapi/tcp components, like

  mpirun --mca btl mvapi -np 16 ./my_test_program

In this case the virtual memory is fine, but it is not what one wants on an SMP node.

Then it gets even worse: Open MPI nicely reports the (max./actual) used virtual memory to the gridengine as the sum over all processes. This value is then compared with the one the user specified via the h_vmem option, but the gridengine applies that value per process when allocating the job (which works) and does not multiply it by the number of processes. Maybe one should report this to the gridengine mailing list, but it could just as well concern the Open MPI interface.

The last thing I noticed: if the h_vmem option for gridengine jobs is specified as '2.0G', my test job was immediately killed; but when I specify '2000M' (which is obviously less) it works. The gridengine always puts the job on the correct node as requested, but I think there might be a problem in the Open MPI interface.

It would be nice if someone could give some hints on how to avoid the quadratic scaling, or consider whether it is really necessary in Open MPI. Thanks.
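Incidentally, the numbers above follow an almost exact quadratic law: after subtracting the N*50 MB the program itself needs, the remaining overhead works out to roughly 30 MB * N^2 once N >= 5. A short Python check of this (the data is copied from the table; the 30 MB/N^2 reading is my own fit, not anything Open MPI reports):

```python
# Virtual memory reported by gridengine (MB) for N processes of the
# 50 MB test program, all running on one 16-processor node.
vmem = {1: 182, 2: 468, 3: 825, 4: 1065, 5: 1001,
        6: 1378, 7: 1817, 8: 2303, 12: 4927, 16: 8559}

for n, v in sorted(vmem.items()):
    overhead = v - 50 * n  # subtract what the program itself needs
    print(f"N={n:2d}  overhead={overhead:5d} MB  "
          f"overhead/N^2 = {overhead / n**2:5.1f} MB")

# From N=5 upwards the last column sits at ~30 MB, i.e. the total
# footprint grows like 50*N + 30*N^2 MB instead of linearly in N.
```

So each additional process on the node does not just add its own 50 MB plus a fixed offset; the per-process overhead itself grows with the number of processes, which is what one would expect from pairwise shared-memory buffers.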
Markus Daene

My configure options:

./configure --prefix=/not_important --enable-static --with-f90-size=medium \
            --with-f90-max-array-dim=7 --with-mpi-param-check=always \
            --enable-cxx-exceptions --with-mvapi --enable-mca-no-build=btl-tcp

ompi_info output:

                Open MPI: 1.2.2
   Open MPI SVN revision: r14613
                Open RTE: 1.2.2
   Open RTE SVN revision: r14613
                    OPAL: 1.2.2
       OPAL SVN revision: r14613
                  Prefix: /usrurz/openmpi/1.2.2/pathscale_3.0
 Configured architecture: x86_64-unknown-linux-gnu
           Configured by: root
           Configured on: Mon Jun 4 16:04:38 CEST 2007
          Configure host: GE1N01
                Built by: root
                Built on: Mon Jun 4 16:09:37 CEST 2007
              Built host: GE1N01
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: pathcc
     C compiler absolute: /usrurz/pathscale/bin/pathcc
            C++ compiler: pathCC
   C++ compiler absolute: /usrurz/pathscale/bin/pathCC
      Fortran77 compiler: pathf90
  Fortran77 compiler abs: /usrurz/pathscale/bin/pathf90
      Fortran90 compiler: pathf90
  Fortran90 compiler abs: /usrurz/pathscale/bin/pathf90
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: yes
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: always
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no
           MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.2)
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.2)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.2)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.2)
           MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.2)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.2)
         MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.2)
         MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.2)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.2)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.2)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.2)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.2)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.2)
               MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.2)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.2)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.2)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.2)
                 MCA btl: mvapi (MCA v1.0, API v1.0.1, Component v1.2.2)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.2)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.2)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.2)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.2)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.2)
                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.2)
                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.2)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.2)
                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.2)
                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.2)
                 MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.2)
                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.2)
                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.2)
                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.2)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.2)
                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.2)
                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.2)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.2)
                 MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.2)
                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.2)
                 MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.2)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.2)
                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.2)

----------------------------------------------------------
Markus Daene
Martin Luther University Halle-Wittenberg
Naturwissenschaftliche Fakultaet II
Institute of Physics
Von Seckendorff-Platz 1 (room 1.28)
06120 Halle
Germany