Much appreciated! Per some of my other comments on this thread and on the referenced ticket, can you tell me what kernel you have on that machine? I assume you have NUMA support enabled, given that chipset?
Thanks!
Ralph

On Wed, Jun 10, 2009 at 10:29 AM, Sylvain Jeaugey <sylvain.jeau...@bull.net> wrote:

> Hum, very glad that padb works with Open MPI; I couldn't live without it.
> In my opinion it is the best debug tool for parallel applications and, more
> importantly, the only one that scales.
>
> About the issue, I couldn't reproduce it on my platform (tried 2 nodes with
> 2 to 8 processes each; nodes are twin 2.93 GHz Nehalem, IB is Mellanox QDR).
>
> So my feeling is that it may be very hardware related. Especially if you
> use the hierarch component, some transactions will be done through RDMA on
> one side and read directly through shared memory on the other side, which
> can, depending on the hardware, produce very different timings and bugs.
> Did you try with a different collective component (i.e. not hierarch)? Or
> with another interconnect? [Yes, of course, if it is a race condition, we
> might well avoid the bug because the timings will be different, but that's
> still information.]
>
> Perhaps everything I'm saying makes no sense or you have already thought
> about this; anyway, if you want me to try different things, just let me
> know.
>
> Sylvain
>
>
> On Wed, 10 Jun 2009, Ralph Castain wrote:
>
>> Hi Ashley
>>
>> Thanks! I would definitely be interested and will look at the tool.
>> Meantime, I have filed a bunch of data on this in ticket #1944, so
>> perhaps you might take a glance at that and offer some thoughts?
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/1944
>>
>> Will be back after I look at the tool.
>>
>> Thanks again
>> Ralph
>>
>>
>> On Wed, Jun 10, 2009 at 8:51 AM, Ashley Pittman <ash...@pittman.co.uk>
>> wrote:
>>
>> Ralph,
>>
>> If I may say, this is exactly the type of problem the tool I have been
>> working on recently aims to help with, and I'd be happy to help you
>> through it.
>>
>> Firstly, of the three collectives you mention, MPI_Allgather exhibits a
>> many-to-many, MPI_Reduce a many-to-one and MPI_Bcast a one-to-many
>> communication pattern. The scenario of a root process falling behind and
>> getting swamped in comms is a plausible one for MPI_Reduce only, but it
>> doesn't hold water for the other two. You also don't mention whether the
>> loop is over a single collective or whether you have a loop calling a
>> number of different collectives each iteration.
>>
>> padb, the tool I've been working on, has the ability to look at parallel
>> jobs and report on the state of collective comms, and should help narrow
>> you down on erroneous processes and those simply blocked waiting for
>> comms. I'd recommend using it to look at maybe four or five instances
>> where the application has hung and look for any common features between
>> them.
>>
>> Let me know if you are willing to try this route and I'll talk you
>> through it. The code is downloadable from http://padb.pittman.org.uk and
>> if you want the full collective functionality you'll need to patch
>> Open MPI with the patch from http://padb.pittman.org.uk/extensions.html
>>
>> Ashley.
>>
>> --
>>
>> Ashley Pittman, Bath, UK.
>>
>> Padb - A parallel job inspection tool for cluster computing
>> http://padb.pittman.org.uk
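P.S. On Sylvain's suggestion of ruling out the hierarch component: one way to do that, assuming the hang reproduces with your normal command line, is to exclude hierarch from the coll framework via the usual MCA "^" negation syntax (the application name and process count below are just placeholders):

    mpirun --mca coll ^hierarch -np 16 ./my_app

If the hang goes away with hierarch excluded, that would point fairly strongly at the RDMA/shared-memory interaction Sylvain describes.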
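P.P.S. To make Ashley's question concrete: the second pattern he asks about would look roughly like the hypothetical sketch below, where each iteration mixes all three collectives. The function name, buffer sizes and iteration counts are invented for illustration and are not taken from the actual application.

    #include <stdlib.h>
    #include <mpi.h>

    /* Hypothetical inner loop mixing the three collectives discussed above.
     * Names and sizes are invented for illustration only. */
    static void solver_loop(int n, int niters, MPI_Comm comm)
    {
        int size;
        MPI_Comm_size(comm, &size);

        double *local    = calloc((size_t)n, sizeof(double));
        double *gathered = malloc((size_t)n * size * sizeof(double)); /* n doubles per rank */
        double *sum      = malloc((size_t)n * sizeof(double));

        for (int i = 0; i < niters; i++) {
            /* many-to-many: every rank receives every rank's block */
            MPI_Allgather(local, n, MPI_DOUBLE, gathered, n, MPI_DOUBLE, comm);

            /* many-to-one: only the root (rank 0) can get swamped here */
            MPI_Reduce(local, sum, n, MPI_DOUBLE, MPI_SUM, 0, comm);

            /* one-to-many: the root pushes the reduced result back out */
            MPI_Bcast(sum, n, MPI_DOUBLE, 0, comm);
        }

        free(local);
        free(gathered);
        free(sum);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        solver_loop(1024, 1000, MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }

Whether the real code looks like this or loops over a single collective makes a difference to which of the hang scenarios above are even plausible, which is why the distinction matters.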