Thanks - that helps clarify a great deal!

I'll keep you posted, pending any further input on the initial question.

FWIW: I'm also using OMPI/ORTE in an embedded environment, so I suspect some of 
our issues are common.


On Oct 12, 2010, at 9:59 AM, Kenneth Lloyd wrote:

> Ralph,
> 
> I think I understand the problem very well.  My point is that it is easier
> for us researchers to "bit-twiddle" than to ask accommodation from a more
> "orthodox" implementation.  If you believe that an OS threading approach
> better addresses your concerns, then by all means, drop the single threading
> concern.  It truly doesn't inconvenience us much at all.  Perhaps some
> logical bifurcation point has been reached.
> 
> Our work involves a re-visitation of the hwloc and carto modules in new and
> interesting ways.  You have touched on a major performance issue - the
> asynchronous nature not only of message passing and certain RDMA, but of
> MPP computation generally, across myriad hardware platforms (FPGAs, CPUs
> of various stripes, GPUs, memories, IO hubs, HCAs and bridges thereof),
> not to mention different software and middleware.
> We discovered we were playing "whack-a-mole" (or Theory of Constraints) in
> optimizing efficiency and effectiveness of the many configurations, given
> the different software stacks (esp. w/ hard-coded task rollouts) and various
> data partitioning schemes.  IOW, trust me, we KNOW about hanging.
> 
> There are probably several ways of addressing this issue. Ours is not yours.
> When we get some reliable data, we'll be happy to push out a whitepaper
> describing some of the experiments that led us to our conclusions.  That
> way, others can experiment to see which solutions work best for them.
> 
> Ken
> 
> 
> 
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
> Behalf Of Ralph Castain
> Sent: Tuesday, October 12, 2010 9:28 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] Threading
> 
> I honestly wasn't casting aspersions - just sounds like a very strange
> operational mode. Never heard of something like that before.
> 
> The problem is that we continue to have issues with clean termination and
> "hangs", largely because the program counter gets "hung" as we try to work
> with an event-driven system constrained to a single thread. We also have
> performance problems because we cannot progress communications
> asynchronously.
> 
> So the move is to thread mpirun and the orte daemons to solve these
> problems. Maintaining both threaded and unthreaded operation inside a
> single code base becomes a study in spaghetti, and so may prove
> intractable. In that case, I'll "freeze" an unthreaded version at the
> current level, and we'll focus further development on the threaded version.
> 
> If we go that route (and that isn't a given yet), then I'll rig the build
> system so configuring without threads generates the unthreaded version, with
> the correct accompanying man page.
> 
> HTH
> Ralph
> 
> 
> On Oct 12, 2010, at 9:15 AM, Kenneth Lloyd wrote:
> 
>> Ralph,
>> 
>> There is really no need to do anything different to accommodate us
>> "oddball" cases.  Continue to "do what you do".
>> 
>> Ken
>> 
>> -----Original Message-----
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
>> Behalf Of Ralph Castain
>> Sent: Tuesday, October 12, 2010 9:01 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Threading
>> 
>> Hmmm...I don't understand what you just said, but it definitely sounds
>> -ugly-! :-)
>> 
>> I'll take your word for it - we may have to provide a lower performance
>> version for such oddball purposes, and offer a higher capability version
>> for everyone else. I'll see if I can keep a single version, though,
>> assuming the code doesn't get so convoluted as to become unmaintainable.
>> 
>> Otherwise, I'll branch it and "freeze" a non-threaded version for the
>> unusual case.
>> 
>> Thanks!
>> 
>> On Oct 12, 2010, at 8:51 AM, Kenneth Lloyd wrote:
>> 
>>> In certain hybrid, heterogeneous HPC configurations, mpirun often cannot
>>> or should not be threaded through the OS under which Open MPI runs. The
>>> primary OS and MPI can configure management nodes and topologies (even
>>> other MPI layers) that subsequently spawn various OSes and other
>>> lightweight kernels.  These share memory spaces and indirectly access
>>> the program stacks in various devices.
>>> 
>>> In short, yes, there are environments where this would cause a problem.
>>> 
>>> ==================
>>> Kenneth A. Lloyd
>>> Watt Systems Technologies Inc.
>>> 
>>> 
>>> -----Original Message-----
>>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
>>> Behalf Of Barrett, Brian W
>>> Sent: Tuesday, October 12, 2010 8:24 AM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] Threading
>>> 
>>> On Oct 11, 2010, at 11:41 PM, Ralph Castain wrote:
>>> 
>>>> Does anyone know of a reason why mpirun can -not- be threaded, assuming
>>> that all threads block and do not continuously chew cpu? Is there an
>>> environment where this would cause a problem?
>>> 
>>> We don't have any machines at Sandia where I could see this being a
>>> problem.
>>> 
>>> Brian
>>> 
>>> --
>>> Brian W. Barrett
>>> Dept. 1423: Scalable System Software
>>> Sandia National Laboratories
>>> 
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
