On Mon, 2006-11-27 at 16:29 -0700, Brian W Barrett wrote:
> On Nov 27, 2006, at 4:19 PM, Matt Leininger wrote:
>
> > I've been running more tests of OpenMPI v1.2b. I've run into several
> > cases where the app+MPI use too much memory and the OOM handler kills
> > off tasks. Sometimes the ompi mpirun shuts down gracefully, but other
> > times the OOM handler may kill off 1 to 4 MPI tasks per node (when I'm
> > using 8 MPI tasks per node). The remaining MPI tasks keep
> > running/polling and have to be killed off by hand. Has anyone seen
> > this behavior before?
>
> Are the orteds also getting killed?

Not sure. I'll check the next time I see this.

> It's a known problem that if the
> orted is killed by outside forces, everything kind of hangs. We're
> working on this one, and hope to have it fixed by the time 1.2 ships.

That could be the problem.

>
> I'm not really familiar with the OOM killer -- does it cause the
> parent of the killed process to get a SIGCHLD? If not, that could be
> a fairly serious problem for us, as we rely on SIGCHLDs being
> received by the orteds when things die...

Mark Grondona could answer this. His reply to devel-core bounced, so
I'm including [email protected] on this thread.

 - Matt

>
> Brian
> _______________________________________________
> devel-core mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel-core
>
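Brian's SIGCHLD question can be checked in isolation with a small POSIX program. The sketch below is not ORTE code. It assumes the OOM killer terminates its victim with SIGKILL and simulates that with kill(pid, SIGKILL) on a forked child; the names child_kill_raises_sigchld and on_sigchld are hypothetical. The parent blocks SIGCHLD before forking so the notification cannot slip in before sigsuspend(), then reaps the child with waitpid() to confirm it died from SIGKILL.

```c
/* Hypothetical sketch: does a parent get SIGCHLD when its child is SIGKILLed?
 * Assumption: the OOM killer uses SIGKILL, simulated here with kill(). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1; /* async-signal-safe: only set a flag */
}

/* Returns 1 if SIGCHLD arrived and waitpid() reports death by SIGKILL,
 * 0 if the child died some other way, -1 on a system-call error. */
int child_kill_raises_sigchld(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    int status;

    got_sigchld = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0)
        return -1;

    /* Block SIGCHLD so it cannot be lost between kill() and sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause(); /* child: stand-in for a running MPI task */
    }

    kill(pid, SIGKILL); /* stand-in for the OOM killer */

    /* Sleep until SIGCHLD is delivered, with it temporarily unblocked. */
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);
    while (!got_sigchld)
        sigsuspend(&wait_mask);

    /* Restore the caller's signal mask and SIGCHLD disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    sigaction(SIGCHLD, &old_sa, NULL);
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

Calling child_kill_raises_sigchld() from main() should return 1 on a POSIX system, because the kernel notifies the parent whenever a child terminates, whatever the cause. Note that this only covers processes the orted forked itself; it says nothing about what happens when the orted is itself the victim, which is the hang Brian describes.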
