Ralph, I think I understand the problem very well. My point is that it is easier for us researchers to "bit-twiddle" than to ask for accommodation from a more "orthodox" implementation. If you believe that an OS threading approach better addresses your concerns, then by all means, drop the single threading concern. It truly doesn't inconvenience us much at all. Perhaps some logical bifurcation point has been reached.

Our work involves a re-visitation of the hwloc and carto modules in new and interesting ways. You have touched on a major performance issue - the asynchronous nature, not only of message passing and certain RDMA, but of the generally asynchronous nature we face in MPP computation across myriad hardware platforms (FPGAs, CPUs of various stripes, GPUs, memories, IO hubs, HCAs and bridges thereof), not to mention different software and middleware. We discovered we were playing "whack-a-mole" or Theory of Constraints in optimizing efficiency and effectiveness of the many configurations, given the different software stacks (esp. w/ hard-coded task rollouts) and various data partitioning schemes.

IOW, trust me, we KNOW about hanging. There are probably several ways of addressing this issue. Ours is not yours. When we get some reliable data, we'll be happy to push out a whitepaper describing some of the experiments that led us to our conclusions. That way, others can experiment to see which solutions work best for them.

Ken

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, October 12, 2010 9:28 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Threading

I honestly wasn't casting aspersions - just sounds like a very strange operational mode. Never heard of something like that before.

The problem is that we continue to have issues with clean termination and "hangs", largely because the program counter gets "hung" as we try to work with an event-driven system constrained to a single thread. We also have performance problems because we cannot progress communications asynchronously. So the movement is to threading mpirun and the orte daemons to solve the problems.

Maintaining both threaded and unthreaded operations inside a single code becomes a study in spaghetti, and so it may prove intractable.
In that case, I'll "freeze" an unthreaded version at the current level, and we'll focus further development on the threaded version. If we go that route (and that isn't a given yet), then I'll rig the build system so configuring without threads generates the unthreaded version, with the correct accompanying man page.

HTH
Ralph

On Oct 12, 2010, at 9:15 AM, Kenneth Lloyd wrote:

> Ralph,
>
> There is really no need to do anything different to accommodate us "oddball" cases. Continue to "do what you do".
>
> Ken
>
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Tuesday, October 12, 2010 9:01 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] Threading
>
> Hmmm...I don't understand what you just said, but it definitely sounds -ugly-! :-)
>
> I'll take your word for it - we may have to provide a lower performance version for such oddball purposes, and offer a higher capability version for everyone else. I'll see if I can keep a single version, though, assuming the code doesn't get too convoluted so as to become unmaintainable.
>
> Otherwise, I'll branch it and "freeze" a non-threaded version for the unusual case.
>
> Thanks!
>
> On Oct 12, 2010, at 8:51 AM, Kenneth Lloyd wrote:
>
>> In certain hybrid, heterogeneous HPC configurations, mpirun often cannot or should not be threaded through the OS under which OpenMPI runs. The primary OS and MPI can configure management nodes and topologies (even other MPI layers) that subsequently spawn various OSes and other lightweight kernels. These share memory spaces and indirectly access the program stacks in various devices.
>>
>> In short, yes, there are environments where this would cause a problem.
>>
>> ==================
>> Kenneth A. Lloyd
>> Watt Systems Technologies Inc.
>>
>>
>> -----Original Message-----
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Barrett, Brian W
>> Sent: Tuesday, October 12, 2010 8:24 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Threading
>>
>> On Oct 11, 2010, at 11:41 PM, Ralph Castain wrote:
>>
>>> Does anyone know of a reason why mpirun can -not- be threaded, assuming that all threads block and do not continuously chew cpu? Is there an environment where this would cause a problem?
>>
>> We don't have any machines at Sandia where I could see this being a problem.
>>
>> Brian
>>
>> --
>> Brian W. Barrett
>> Dept. 1423: Scalable System Software
>> Sandia National Laboratories
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel