Thanks - that helps clarify a great deal! I'll keep you posted, pending any further input on the initial question.
FWIW: I'm also using OMPI/ORTE in an embedded environment, so I suspect some of our issues are common.

On Oct 12, 2010, at 9:59 AM, Kenneth Lloyd wrote:

> Ralph,
>
> I think I understand the problem very well. My point is that it is easier
> for us researchers to "bit-twiddle" than to ask accommodation from a more
> "orthodox" implementation. If you believe that an OS threading approach
> better addresses your concerns, then by all means drop the single-threading
> concern. It truly doesn't inconvenience us much at all. Perhaps some
> logical bifurcation point has been reached.
>
> Our work involves revisiting the hwloc and carto modules in new and
> interesting ways. You have touched on a major performance issue: the
> asynchronous nature not only of message passing and certain RDMA
> operations, but of MPP computation generally, across myriad hardware
> platforms (FPGAs, CPUs of various stripes, GPUs, memories, IO hubs, HCAs,
> and bridges thereof), not to mention different software and middleware. We
> discovered we were playing "whack-a-mole" (a la the Theory of Constraints)
> in optimizing the efficiency and effectiveness of the many configurations,
> given the different software stacks (esp. with hard-coded task rollouts)
> and various data partitioning schemes. IOW, trust me, we KNOW about
> hanging.
>
> There are probably several ways of addressing this issue. Ours is not
> yours. When we get some reliable data, we'll be happy to push out a
> whitepaper describing some of the experiments that led us to our
> conclusions. That way, others can experiment to see which solutions work
> best for them.
>
> Ken
>
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
> Behalf Of Ralph Castain
> Sent: Tuesday, October 12, 2010 9:28 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] Threading
>
> I honestly wasn't casting aspersions - it just sounds like a very strange
> operational mode. I've never heard of anything like it before.
>
> The problem is that we continue to have issues with clean termination and
> "hangs", largely because the program counter gets "hung" as we try to work
> with an event-driven system constrained to a single thread. We also have
> performance problems because we cannot progress communications
> asynchronously.
>
> So the movement is toward threading mpirun and the ORTE daemons to solve
> these problems. Maintaining both threaded and unthreaded operation inside
> a single code base becomes a study in spaghetti, and so may prove
> intractable. In that case, I'll "freeze" an unthreaded version at the
> current level, and we'll focus further development on the threaded
> version.
>
> If we go that route (and that isn't a given yet), then I'll rig the build
> system so that configuring without threads generates the unthreaded
> version, with the correct accompanying man page.
>
> HTH
> Ralph
>
> On Oct 12, 2010, at 9:15 AM, Kenneth Lloyd wrote:
>
>> Ralph,
>>
>> There is really no need to do anything different to accommodate us
>> "oddball" cases. Continue to "do what you do".
>>
>> Ken
>>
>> -----Original Message-----
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
>> Behalf Of Ralph Castain
>> Sent: Tuesday, October 12, 2010 9:01 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Threading
>>
>> Hmmm...I don't understand what you just said, but it definitely sounds
>> -ugly-! :-)
>>
>> I'll take your word for it - we may have to provide a lower-performance
>> version for such oddball purposes, and offer a higher-capability version
>> for everyone else. I'll see if I can keep a single version, though,
>> assuming the code doesn't get so convoluted as to become unmaintainable.
>>
>> Otherwise, I'll branch it and "freeze" a non-threaded version for the
>> unusual case.
>>
>> Thanks!
>>
>> On Oct 12, 2010, at 8:51 AM, Kenneth Lloyd wrote:
>>
>>> In certain hybrid, heterogeneous HPC configurations, mpirun often
>>> cannot or should not be threaded through the OS under which Open MPI
>>> runs. The primary OS and MPI can configure management nodes and
>>> topologies (even other MPI layers) that subsequently spawn various
>>> OSes and other lightweight kernels. These share memory spaces and
>>> indirectly access the program stacks in various devices.
>>>
>>> In short, yes, there are environments where this would cause a problem.
>>>
>>> ==================
>>> Kenneth A. Lloyd
>>> Watt Systems Technologies Inc.
>>>
>>> -----Original Message-----
>>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
>>> Behalf Of Barrett, Brian W
>>> Sent: Tuesday, October 12, 2010 8:24 AM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] Threading
>>>
>>> On Oct 11, 2010, at 11:41 PM, Ralph Castain wrote:
>>>
>>>> Does anyone know of a reason why mpirun can -not- be threaded,
>>>> assuming that all threads block and do not continuously chew CPU? Is
>>>> there an environment where this would cause a problem?
>>>
>>> We don't have any machines at Sandia where I could see this being a
>>> problem.
>>>
>>> Brian
>>>
>>> --
>>> Brian W. Barrett
>>> Dept. 1423: Scalable System Software
>>> Sandia National Laboratories
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
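
To make the proposal concrete, here is a minimal sketch of the blocking-progress model under discussion: an event loop runs on a dedicated thread and sleeps in poll() until work arrives, so it never chews CPU while idle, and a compile-time switch stands in for the configure-time threads option Ralph mentions. All names here (ENABLE_PROGRESS_THREAD, progress_loop, and so on) are hypothetical illustrations, not Open MPI internals.

/*
 * Sketch of an "all threads block" progress model. The progress
 * thread sleeps in poll() until an event arrives, so it consumes no
 * CPU while idle. ENABLE_PROGRESS_THREAD imitates what a configure
 * option toggling threaded/unthreaded builds might generate.
 */
#include <pthread.h>
#include <poll.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define ENABLE_PROGRESS_THREAD 1   /* imagine configure setting this */

static atomic_bool running = 1;

/* Service one round of events, blocking up to 1 s so shutdown
 * requests are noticed promptly. */
static void progress_once(struct pollfd *fds, nfds_t nfds)
{
    int rc = poll(fds, nfds, 1000 /* ms */);
    if (rc > 0 && (fds[0].revents & POLLIN)) {
        char buf[64];
        (void)read(fds[0].fd, buf, sizeof(buf)); /* drain; advance state machines here */
        printf("progress: event serviced\n");
    }
}

#if ENABLE_PROGRESS_THREAD
/* Threaded build: communications progress asynchronously while the
 * main thread does other work. */
static void *progress_loop(void *arg)
{
    struct pollfd *fds = (struct pollfd *)arg;
    while (atomic_load(&running)) {
        progress_once(fds, 1);
    }
    return NULL;
}
#endif

int main(void)
{
    int pipefd[2];
    if (pipe(pipefd) != 0) return 1;
    struct pollfd fds[1] = { { .fd = pipefd[0], .events = POLLIN } };

#if ENABLE_PROGRESS_THREAD
    pthread_t tid;
    pthread_create(&tid, NULL, progress_loop, fds);
    (void)write(pipefd[1], "x", 1);  /* generate one event to service */
    sleep(1);                        /* main thread "does other work" */
    atomic_store(&running, 0);
    pthread_join(tid, NULL);
#else
    /* Unthreaded build: the caller must drive the event loop by hand,
     * which is where the single-thread "hung program counter" problem
     * described above comes from. */
    (void)write(pipefd[1], "x", 1);
    progress_once(fds, 1);
#endif
    close(pipefd[0]);
    close(pipefd[1]);
    return 0;
}

Build with "cc -pthread sketch.c". Flipping ENABLE_PROGRESS_THREAD to 0 yields the caller-driven variant, which illustrates why single-threaded progress is prone to the "hung" termination behavior described in the thread.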