Hi Ralph, Chris,

You guys are both correct:

(1) The output that I passed along /does/ show only 32 processors
    running (provided htop reports things correctly). The job I
    submitted is the exact same process called 48 times (well, np
    times), so all procs should take about the same time, ~1 minute;
    a minimal sketch of that kind of test case is below, after this
    list. The execution is notably slower than with 1.6.5 (I will
    time it properly shortly, but offhand I'd say it's ~5x slower),
    and it seems that, for part of the run, 32 processors do all the
    work and then 1 processor finishes the remainder -- i.e. htop
    shows 32 procs working and 16 idling, then the count drops from
    32 to 1 and stays there for a while, then it drops to 0 and the
    job finishes. This behavior is apparent in /all/ MPI jobs, not
    just this particular test case.

(2) I suspected that hwloc might be a culprit; before I posted here, I
    reported it on the hwloc mailing list, where I was told that it
    seems to be a cache reporting problem and that I should be fine
    ignoring it, or that I could load the topology from XML instead
    (a sketch of that workaround is also below). I figured I'd
    mention the buggy BIOS in my first post just in case it rang any
    bells.
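
For reference, here is a minimal sketch of the kind of test case I
mean -- every rank does the same fixed amount of busy work and then
calls MPI_Finalize. This is only an illustration of the pattern, not
the actual code I'm running:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double t0;
        volatile double x = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* identical busy work on every rank, roughly one minute */
        t0 = MPI_Wtime();
        while (MPI_Wtime() - t0 < 60.0)
            x += 1.0;

        printf("rank %d done after %.1f s\n", rank, MPI_Wtime() - t0);

        MPI_Finalize();  /* ranks that finish early end up waiting here */
        return 0;
    }

Launched with something like "mpirun -np 48 ./spin_test" (the binary
name is just a placeholder).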
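
And, for completeness, the XML workaround suggested on the hwloc list
as I understand it: export the topology once (e.g. with something like
"lstopo topo.xml") and have consumers read that file instead of
re-probing the hardware. Roughly, in hwloc's C API (the path and the
sanity check are just placeholders):

    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topo;

        hwloc_topology_init(&topo);

        /* read the topology from the exported XML instead of probing
           the hardware (and its buggy BIOS tables) directly */
        if (hwloc_topology_set_xml(topo, "/path/to/topo.xml") < 0) {
            fprintf(stderr, "failed to set XML topology source\n");
            return 1;
        }
        hwloc_topology_load(topo);

        /* quick sanity check: how many cores/PUs does hwloc see now? */
        printf("cores: %d, PUs: %d\n",
               hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE),
               hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));

        hwloc_topology_destroy(topo);
        return 0;
    }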

Is there a way to add timestamps to the debug output? That would
demonstrate more clearly what I'm describing in (1) above.

If it helps, I'd be more than happy to provide access to the affected
machine so that you can see what's going on firsthand, or to record a
short screen capture of htop while the job is running.

Thanks,
Andrej

> Ah, that sheds some light. There is indeed a significant change
> between earlier releases and 1.8.1 and above that might explain
> what he is seeing. Specifically, we no longer hammer the CPU while in
> MPI_Finalize. So if 16 of the procs are finishing early (which the
> output would suggest), then they will go into a "lazy" finalize state
> while they wait for the rest of the procs to complete their work.
> 
> In contrast, prior releases would continue at 100% CPU while they
> polled to see if the other procs were done.
> 
> We did this to help save power/energy, and because users had asked
> why the CPU utilization remained at 100% even though procs were
> waiting in finalize.
> 
> HTH
> Ralph
> 
> On Aug 21, 2014, at 5:55 PM, Christopher Samuel
> <sam...@unimelb.edu.au> wrote:
> 
> > On 22/08/14 10:43, Ralph Castain wrote:
> > 
> >> From your earlier concerns, I would have expected only to find 32
> >> of them running. Was that not the case in this run?
> > 
> > As I understand it, in his original email he mentioned that with
> > 1.6.5 all 48 processes were running at 100% CPU, and he was
> > wondering whether the buggy BIOS that caused the hwloc issues he
> > reported on the hwloc-users list might be behind this regression
> > in performance.
> > 
> > All the best,
> > Chris
> > -- 
> > Christopher Samuel        Senior Systems Administrator
> > VLSCI - Victorian Life Sciences Computation Initiative
> > Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> > http://www.vlsci.org.au/      http://twitter.com/vlsci
