On Mon, 2006-11-27 at 16:29 -0700, Brian W Barrett wrote:
> On Nov 27, 2006, at 4:19 PM, Matt Leininger wrote:
> 
> >  I've been running more tests of OpenMPI v1.2b.  I've run into several
> > cases where the app+MPI use too much memory and the OOM handler kills
> > off tasks.  Sometimes the ompi mpirun shuts down gracefully, but other
> > times the OOM handler may kill off 1 to 4 MPI tasks per node (when I'm
> > using 8 MPI tasks per node).  The remaining MPI tasks keep
> > running/polling and have to be killed off by hand.  Has anyone seen
> > this behavior before?
> 
> Are the orteds also getting killed? 

  Not sure.  I'll check the next time I see this.

>  It's a known problem that if the  
> orted is killed by outside forces, everything kind of hangs.  We're  
> working on this one, and hope to have it fixed by the time 1.2 ships.

  That could be the problem.  

> 
> I'm not really familiar with the OOM killer -- does it cause the  
> parent of the killed process to get a SIGCHLD?  If not, that could be  
> a fairly serious problem for us, as we rely on SIGCHLDs being  
> received by the orteds when things die...

  Mark Grondona could answer this.  His reply to devel-core bounced so
I'm including de...@open-mpi.org on this thread.

  - Matt

> 
> Brian
> _______________________________________________
> devel-core mailing list
> devel-c...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel-core
> 

