On Tue, 27 Apr 2010, Derek Gaston wrote:

> Right... this whole idea of ONE global spinlock isn't right. With TBB each
> place where a spinlock is needed should have its own spinlock defined just
> in the local compilation unit (often they do this at the top of the .C).
I hate global variables sitting at the top of a .C, and for larger .C files you
could still be "overloading" a lock. To be really safe we want the lock to be
local to the data whose concurrent access it guards - the per-GetPot lock
should be a good example of that. There are probably instances where that
idiom won't work, but I'll bet that so far we can naturally put each lock
target into one of our classes.

> We could just use a compilation unit local spinlock in the Parallel
> namespace

As an aside: in hindsight, the Parallel:: namespace should have been a
Parallel:: class encapsulating Parallel::Communicator. Every function there
needs a communicator argument anyway, and calling libMesh::Comm_World.sum(a)
would have been a more natural expression of that than tacking on the argument
with a default value of Comm_World. Probably too late to change now without
too much inconvenience.

> where the MPI functions are and lock / unlock it as we go through the
> parallel calls. The only issue is speed... I don't know how much overhead
> that will add when we're calling those functions and we don't need to create
> / destroy the spinlock.

That's not the only issue, unfortunately. That would be a decent fix for
letting people run with non-thread-safe MPI implementations, and we could let
configure turn off the Parallel::thread_lock when a thread-safe implementation
is detected.

The big issue is that we currently depend on each rank hitting each Parallel::
call in the same order. If e.g. thread A on each rank is doing
MeshTools::n_p_levels() and calls Parallel::max(nl) there, while thread B on
each rank is doing MeshTools::n_active_levels() and calls Parallel::max(nl)
there, then MPI is free to match rank 0's call from thread A with rank 1's
call from thread B, and the results of both calls will basically be garbage.
I can think of other cases that will result in deadlocks instead.
Even asynchronous Parallel:: functions aren't safe, because the same caller
function (and thus the same tag) can be invoked by multiple threads in
different contexts. I'm happy we don't actually have examples of such bugs
yet, but they'd be too easy to write, and I haven't thought of a good way to
assert() such bugs out of existence yet. "We want to break up elements for
threading, but when we hit an element in set A we'll need to call some
parallel code to instantiate a resource, and when we hit an element in set B
we'll need to call different parallel code to instantiate a different
resource" is a devastating counterexample to every attempt so far.

Even if we create a unique MPI communicator for each thread, how do we know
*which* thread gets which communicator? In the PointLocator example, it's
pretty much random which thread hits the "create a PointLocator now" lock
first.

I'm actually starting to lean back towards enforcing "no Parallel:: calls
from within Threads:: calls". In the periodic constraints PointLocator
example again, on a ParallelMesh it's possible for an interior partition to
never hit the PointLocator creation call; at that point, in opt mode the
whole run would stall waiting for that interior partition's rank to join in
on bounding_box calculations, and in debug mode we'd die in a parallel_only()
assertion failure.

Maybe the bugfix here is for create_periodic_constraints to check whether it
will need a PointLocator and, if necessary, create one *before* it ever gets
into its own Threads:: loop. And maybe something like that is the most
practical bugfix for *every* Parallel-within-Threads situation.

---
Roy
