Hi again,

I recorded a video that demonstrates the problem; for brevity I did
not run the job to completion, but I'm providing the timings below.
If you'd like me to record a complete run, just let me know -- but as
I said in my previous email, the 32 active procs drop to 1 after
about a minute, and the computation is then left to a single
processor to finish the job.
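
If a standalone reproducer would help, the behavior should show up
with a toy program along these lines (a sketch, not my actual code;
the sleep on rank 0 just stands in for the imbalanced tail of work):

        /* finalize_wait.c -- toy reproducer: every rank except 0
         * reaches MPI_Finalize almost immediately, while rank 0
         * keeps working; the early ranks then wait in finalize.
         * Build and run, e.g.:
         *   mpicc finalize_wait.c -o finalize_wait
         *   time mpirun -np 32 ./finalize_wait */
        #include <mpi.h>
        #include <unistd.h>

        int main(int argc, char **argv)
        {
            int rank;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0)
                sleep(60);   /* stand-in for the long tail of work */

            MPI_Finalize();  /* the other ranks wait here for rank 0 */
            return 0;
        }

The timings below are from the actual job, not this toy.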

With openmpi-1.6.5:

        real    1m13.186s
        user    0m0.044s
        sys     0m0.059s

With openmpi-1.8.2rc4:

        real    13m42.998s
        user    0m0.070s
        sys     0m0.066s

Exact same invocation both times, exact same job submitted. Here's a
link to the video:

        http://clusty.ast.villanova.edu/aprsa/files/test.ogv

Please let me know if I can provide you with anything further.
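
For what it's worth, my reading of Ralph's explanation below is that
the waiting ranks went from a hard busy-poll to a sleeping poll,
schematically something like this (my sketch of the two strategies,
not Open MPI's actual code; peers_done() is a toy stand-in for "all
peers have reached finalize"):

        #include <unistd.h>

        static int polls;
        static int peers_done(void) { return ++polls > 5; }  /* toy */

        int main(void)
        {
            /* pre-1.8 style: hard busy-poll, pegs a core at 100%
             * CPU until the condition becomes true. */
            polls = 0;
            while (!peers_done())
                ;  /* spin */

            /* 1.8.x style: "lazy" wait -- sleep between polls, so
             * the CPU stays near 0% while waiting. */
            polls = 0;
            while (!peers_done())
                usleep(1000);

            return 0;
        }

The sleep interval is the tradeoff Ralph mentions: the waiting procs
stop burning CPU (saving power), at the cost of a little extra
latency before they notice that the others are done.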

Thanks,
Andrej

> Ah, that sheds some light. There is indeed a significant change
> between earlier releases and 1.8.1 and above that might explain
> what he is seeing. Specifically, we no longer hammer the cpu while in
> MPI_Finalize. So if 16 of the procs are finishing early (which the
> output would suggest), then they will go into a "lazy" finalize state
> while they wait for the rest of the procs to complete their work.
> 
> In contrast, prior releases would continue at 100% cpu while they
> polled to see if the other procs were done.
> 
> We did this to help save power/energy, and because users had asked
> why the cpu utilization remained at 100% even though procs were
> waiting in finalize.
> 
> HTH
> Ralph
> 
> On Aug 21, 2014, at 5:55 PM, Christopher Samuel
> <sam...@unimelb.edu.au> wrote:
> 
> > On 22/08/14 10:43, Ralph Castain wrote:
> > 
> >> From your earlier concerns, I would have expected to find only
> >> 32 of them running. Was that not the case in this run?
> > 
> > As I understand it, in his original email he mentioned that with
> > 1.6.5 all 48 processes were running at 100% CPU, and he was
> > wondering whether the buggy BIOS that caused the hwloc issues he
> > reported on the hwloc-users list might be the cause of this
> > performance regression.
> > 
> > All the best,
> > Chris
> > -- 
> > Christopher Samuel        Senior Systems Administrator
> > VLSCI - Victorian Life Sciences Computation Initiative
> > Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> > http://www.vlsci.org.au/      http://twitter.com/vlsci
> > 
> 
