I would tend to agree with Paul.

It's uncommon (i.e., no one has run into this before now), and I would say that
this is a bad application.  But then again, hanging is bad -- so it would be
better to abort/terminate the whole job in this scenario.

I don't know how I would rate the priority of this, but it would be nice to 
have someday.
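
For concreteness, the abort behavior discussed in this thread could be written
as a launcher-side rule. The Python below is only a sketch under stated
assumptions: `ProcState` and `should_abort_job` are hypothetical names, not
part of Open MPI or its runtime, and it assumes the launcher can somehow learn
whether each process has entered MPI_Init or completed MPI_Finalize.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcState:
    """Hypothetical launcher-side view of one process in the job."""
    called_init: bool = False        # process has entered MPI_Init
    called_finalize: bool = False    # process has completed MPI_Finalize
    exit_code: Optional[int] = None  # None while the process is still running

    @property
    def exited(self) -> bool:
        return self.exit_code is not None


def should_abort_job(procs: List[ProcState]) -> bool:
    """Return True if the launcher should terminate the whole job."""
    any_in_mpi = any(p.called_init for p in procs)
    for p in procs:
        if not p.exited:
            continue
        # Conventional abnormal termination: non-zero exit status.
        if p.exit_code != 0:
            return True
        # Exited after MPI_Init without reaching MPI_Finalize.
        if p.called_init and not p.called_finalize:
            return True
        # The case from this thread: a clean exit before MPI_Init while
        # at least one peer has already entered MPI_Init.
        if not p.called_init and any_in_mpi:
            return True
    return False
```

Under this rule, a job in which every process exits cleanly before MPI_Init is
left alone, while a mix of early exits and processes inside MPI_Init is
terminated. The rule would need to be re-evaluated on every state change, since
a peer may enter MPI_Init only after the early exit has been observed.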


On Dec 15, 2009, at 11:17 PM, Ralph Castain wrote:

> Understandable - and we can count on your patch in the near future, then? :-)
> 
> On Dec 15, 2009, at 9:12 PM, Paul H. Hargrove wrote:
> 
> > My 0.02USD says that for pragmatic reasons one should attempt to terminate 
> > the job in this case, regardless of one's opinion of this unusual
> > application behavior.
> >
> > -Paul
> >
> > Ralph Castain wrote:
> >> Hi folks
> >>
> >> In case you didn't follow this on the user list, we had a question come up 
> >> about proper OMPI behavior. Basically, the user has an application where 
> >> one process decides it should cleanly terminate prior to calling MPI_Init, 
> >> but all the others go ahead and enter MPI_Init. The application hangs 
> >> since we don't detect the one proc's exit as an abnormal termination (no 
> >> segfault, and it didn't call MPI_Init so it isn't required to call 
> >> MPI_Finalize prior to termination).
> >>
> >> I can probably come up with a way to detect this scenario and abort it. 
> >> But before I spend the effort chasing this down, my question to you MPI 
> >> folks is:
> >>
> >> What -should- OMPI do in this situation? We have never previously detected 
> >> such behavior - was this an oversight, or is this simply a "bad" 
> >> application?
> >>
> >> Thanks
> >> Ralph
> >>
> >>
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> 
> >
> >
> > --
> > Paul H. Hargrove                          phhargr...@lbl.gov
> > Future Technologies Group                 Tel: +1-510-495-2352
> > HPC Research Department                   Fax: +1-510-486-6900
> > Lawrence Berkeley National Laboratory    
> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com

