Hi Ralph, Chris,

You guys are both correct:
(1) The output that I passed along /is/ representative of only 32 processors running (provided htop reports things correctly). The job I submitted is the exact same process called 48 times (well, np times), so all procs should take about the same time, ~1 minute. The execution is notably slower than with 1.6.5 (I will time it shortly, but offhand I'd say it's ~5x slower). For a fraction of the time, 32 processors do all the work, and then 1 processor finishes the remaining work -- i.e. htop shows 32 procs working and 16 idling, then the 32 drop to 1 and stay that way for a while, then it drops to 0 and the job finishes. This behavior is apparent in /all/ MPI jobs, not just this particular test case.

(2) I suspected that hwloc might be a culprit; before I posted here, I reported it on the hwloc mailing list, where I was told that it seems to be a cache reporting problem and that I should be fine ignoring it, or that I should load the topology from XML. I figured I'd mention the buggy BIOS in my first post just in case it rang any bells.

Is there a way to add timestamps to the debug output? That may demonstrate better what I'm trying to say in (1) above.

If it helps, I'd be more than happy to provide access to the affected machine so that you can see what's going on first-hand, or to capture a small movie of htop while the process is running.

Thanks,
Andrej

> Ah, that sheds some light. There is indeed a significant change
> between earlier releases and the 1.8.1 and above that might explain
> what he is seeing. Specifically, we no longer hammer the cpu while in
> MPI_Finalize. So if 16 of the procs are finishing early (which the
> output would suggest), then they will go into a "lazy" finalize state
> while they wait for the rest of the procs to complete their work.
>
> In contrast, prior releases would continue at 100% cpu while they
> polled to see if the other procs were done.
> We did this to help save power/energy, and because users had asked
> why the cpu utilization remained at 100% even though procs were
> waiting in finalize.
>
> HTH
> Ralph
>
> On Aug 21, 2014, at 5:55 PM, Christopher Samuel <sam...@unimelb.edu.au> wrote:
>
>> On 22/08/14 10:43, Ralph Castain wrote:
>>
>>> From your earlier concerns, I would have expected only to find 32
>>> of them running. Was that not the case in this run?
>>
>> As I understand it, in his original email he mentioned that with
>> 1.6.5 all 48 processes were running at 100% CPU, and he was wondering
>> if the buggy BIOS that caused hwloc the issues he reported on the
>> hwloc-users list might be the cause for this regression in
>> performance.
>>
>> All the best,
>> Chris
>>
>> --
>> Christopher Samuel        Senior Systems Administrator
>> VLSCI - Victorian Life Sciences Computation Initiative
>> Email: sam...@unimelb.edu.au    Phone: +61 (0)3 903 55545
>> http://www.vlsci.org.au/      http://twitter.com/vlsci
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15686.php
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15687.php
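[Editor's note] For readers following the thread: the behavioral difference Ralph describes -- pre-1.8.1 finalize spinning at 100% CPU versus the newer "lazy" finalize that sleeps until the stragglers finish -- is essentially busy-polling versus blocking. The sketch below is a toy illustration in plain Python threads, not Open MPI source; the function names `busy_finalize` and `lazy_finalize` are invented for illustration only.

```python
import threading
import time

def busy_finalize(done: threading.Event, poll_counts: list) -> None:
    """Old-style behavior: spin at ~100% CPU, repeatedly polling
    whether the other procs have finished."""
    polls = 0
    while not done.is_set():
        polls += 1          # burns CPU on every iteration
    poll_counts.append(polls)

def lazy_finalize(done: threading.Event) -> None:
    """New-style 'lazy' behavior: block until the last proc signals
    completion, consuming essentially no CPU while waiting."""
    done.wait()             # sleeps inside the OS until signalled

done = threading.Event()
poll_counts = []
spinner = threading.Thread(target=busy_finalize, args=(done, poll_counts))
waiter = threading.Thread(target=lazy_finalize, args=(done,))
spinner.start()
waiter.start()

time.sleep(0.2)             # the other "procs" are still working
done.set()                  # last proc finishes; everyone leaves finalize
spinner.join()
waiter.join()
print(f"busy waiter polled {poll_counts[0]} times; lazy waiter slept instead")
```

In htop the two look identical in terms of when the process exits, but the spinning thread shows up as a core pegged at 100% while the blocking one shows up as idle -- which would match 16 procs appearing to "drop out" early under 1.8.1.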