Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-25 Thread Andrej Prsa
Hi Jeff,

My apologies for the delay in replying, I was flying back from the UK
to the States, but now I'm here and I can provide a more timely
response.

> I confirm that the hwloc message you sent (and your posts to the
> hwloc-users list) indicate that hwloc is getting confused by a buggy
> BIOS, but it's only dealing with the L3 cache, and that shouldn't
> affect the binding that OMPI is doing.

Great, good to know. I'd still be interested in learning how to build a
hwloc-parsable XML file as a workaround, especially if it fixes the
bindings (see below).
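
For what it's worth, here is what I imagine the XML route would look like --
untested on my end, the file path is just a placeholder, and I'm not certain
that Open MPI's embedded hwloc honors these environment variables:

  # dump the (buggy) topology to XML once; hand-fix the L3 entries if needed
  lstopo /tmp/node-topology.xml

  # then have hwloc load the fixed file instead of re-discovering the topology
  export HWLOC_XMLFILE=/tmp/node-topology.xml
  export HWLOC_THISSYSTEM=1   # treat the XML as describing this very machine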

> 1. Run with "--report-bindings" and send the output.  It'll
> prettyprint-render where OMPI thinks it is binding each process.

Please find it attached.

> 2. Run with "--bind-to none" and see if that helps.  I.e., if, per
> #1, OMPI thinks it is binding correctly (i.e., each of the 48
> processes is being bound to a unique core), then perhaps hwloc is
> doing something wrong in the actual binding (i.e., binding the 48
> processes only among the lower 32 cores).

BINGO! As soon as I did this, indeed all the cores went to 100%! Here's
the updated timing (compared to 13 minutes from before):

real    1m8.442s
user    0m0.077s
sys     0m0.071s
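
For completeness, this was the same invocation as before with binding
disabled -- roughly:

  /usr/local/openmpi-1.8.2rc4/bin/mpirun -np 48 --hostfile hosts \
      --bind-to none ./test.py > test.std 2> test.ste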

So I guess the conclusion is that hwloc is somehow messing things up on
this chipset?

Thanks,
Andrej


test_report_bindings.stderr
Description: Binary data


Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-23 Thread Jeff Squyres (jsquyres)
The video you provided was most helpful -- thank you!

I confirm that the hwloc message you sent (and your posts to the hwloc-users 
list) indicate that hwloc is getting confused by a buggy BIOS, but it's only 
dealing with the L3 cache, and that shouldn't affect the binding that OMPI is 
doing.

Can I ask you to do two more tests:

1. Run with "--report-bindings" and send the output.  It'll prettyprint-render 
where OMPI thinks it is binding each process.  Ralph asked you to run a few 
tests already, and the new output may simply confirm what you sent 
previously, but he's more of an ORTE expert than I am -- the 
--report-bindings output is easily parseable for the rest of us.  :-)

2. Run with "--bind-to none" and see if that helps.  I.e., if, per #1, OMPI 
thinks it is binding correctly (i.e., each of the 48 processes is being bound 
to a unique core), then perhaps hwloc is doing something wrong in the actual 
binding (i.e., binding the 48 processes only among the lower 32 cores).
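
In concrete terms, adapting the invocation from your earlier mails -- treat 
this as a sketch rather than the exact commands:

  # test 1: prettyprint where each rank gets bound
  /usr/local/openmpi-1.8.2rc4/bin/mpirun -np 48 --hostfile hosts \
      --report-bindings ./test.py

  # test 2: disable OMPI's binding altogether
  /usr/local/openmpi-1.8.2rc4/bin/mpirun -np 48 --hostfile hosts \
      --bind-to none ./test.py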



On Aug 22, 2014, at 6:49 AM, Andrej Prsa  wrote:

> Hi again,
> 
> I generated a video that demonstrates the problem; for brevity I did
> not run a full process, but I'm providing the timing below. If you'd
> like me to record a full process, just let me know -- but as I said in
> my previous email, 32 procs drop to 1 after about a minute and the
> computation then rests on a single processor to complete the job.
> 
> With openmpi-1.6.5:
> 
>   real    1m13.186s
>   user    0m0.044s
>   sys     0m0.059s
> 
> With openmpi-1.8.2rc4:
> 
>   real    13m42.998s
>   user    0m0.070s
>   sys     0m0.066s
> 
> Exact invocation both times, exact same job submitted. Here's a link to
> the video:
> 
>   http://clusty.ast.villanova.edu/aprsa/files/test.ogv
> 
> Please let me know if I can provide you with anything further.
> 
> Thanks,
> Andrej
> 
>> Ah, that sheds some light. There is indeed a significant change
>> between earlier releases and the 1.8.1 and above that might explain
>> what he is seeing. Specifically, we no longer hammer the cpu while in
>> MPI_Finalize. So if 16 of the procs are finishing early (which the
>> output would suggest), then they will go into a "lazy" finalize state
>> while they wait for the rest of the procs to complete their work.
>> 
>> In contrast, prior releases would continue at 100% cpu while they
>> polled to see if the other procs were done.
>> 
>> We did this to help save power/energy, and because users had asked
>> why the cpu utilization remained at 100% even though procs were
>> waiting in finalize
>> 
>> HTH
>> Ralph
>> 
>> On Aug 21, 2014, at 5:55 PM, Christopher Samuel
>>  wrote:
>> 
>>> On 22/08/14 10:43, Ralph Castain wrote:
>>> 
>>>> From your earlier concerns, I would have expected only to find 32
>>>> of them running. Was that not the case in this run?
>>> 
>>> As I understand it in his original email he mentioned that with
>>> 1.6.5 all 48 processes were running at 100% CPU and was wondering
>>> if the buggy BIOS that caused hwloc the issues he reported on the
>>> hwloc-users list might be the cause for this regression in
>>> performance.
>>> 
>>> All the best,
>>> Chris
>>> -- 
>>> Christopher Samuel    Senior Systems Administrator
>>> VLSCI - Victorian Life Sciences Computation Initiative
>>> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>>> http://www.vlsci.org.au/  http://twitter.com/vlsci
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/08/15686.php
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15687.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15690.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-22 Thread Andrej Prsa
Hi again,

I generated a video that demonstrates the problem; for brevity I did
not run a full process, but I'm providing the timing below. If you'd
like me to record a full process, just let me know -- but as I said in
my previous email, 32 procs drop to 1 after about a minute and the
computation then rests on a single processor to complete the job.

With openmpi-1.6.5:

real    1m13.186s
user    0m0.044s
sys     0m0.059s

With openmpi-1.8.2rc4:

real    13m42.998s
user    0m0.070s
sys     0m0.066s

Exact same invocation both times, exact same job submitted. Here's a link to
the video:

http://clusty.ast.villanova.edu/aprsa/files/test.ogv

Please let me know if I can provide you with anything further.

Thanks,
Andrej

> Ah, that sheds some light. There is indeed a significant change
> between earlier releases and the 1.8.1 and above that might explain
> what he is seeing. Specifically, we no longer hammer the cpu while in
> MPI_Finalize. So if 16 of the procs are finishing early (which the
> output would suggest), then they will go into a "lazy" finalize state
> while they wait for the rest of the procs to complete their work.
> 
> In contrast, prior releases would continue at 100% cpu while they
> polled to see if the other procs were done.
> 
> We did this to help save power/energy, and because users had asked
> why the cpu utilization remained at 100% even though procs were
> waiting in finalize
> 
> HTH
> Ralph
> 
> On Aug 21, 2014, at 5:55 PM, Christopher Samuel
>  wrote:
> 
> > On 22/08/14 10:43, Ralph Castain wrote:
> > 
> >> From your earlier concerns, I would have expected only to find 32
> >> of them running. Was that not the case in this run?
> > 
> > As I understand it in his original email he mentioned that with
> > 1.6.5 all 48 processes were running at 100% CPU and was wondering
> > if the buggy BIOS that caused hwloc the issues he reported on the
> > hwloc-users list might be the cause for this regression in
> > performance.
> > 
> > All the best,
> > Chris
> > -- 
> > Christopher Samuel    Senior Systems Administrator
> > VLSCI - Victorian Life Sciences Computation Initiative
> > Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> > http://www.vlsci.org.au/  http://twitter.com/vlsci
> > 
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/08/15686.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15687.php


Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-22 Thread Andrej Prsa
Hi Ralph, Chris,

You guys are both correct:

(1) The output that I passed along /is/ indicative of only 32 processors
running (provided htop reports things correctly). The job I
submitted is the exact same process called 48 times (well, np
times), so all procs should take about the same time, ~1 minute.
The execution is notably slower than with 1.6.5 (I will time it
shortly, but offhand I'd say it's ~5x slower), and it seems that,
for a fraction of the time, 32 processors do all the work, and
then 1 processor finishes the remaining work -- i.e. htop shows 32
procs working and 16 idling, then the count drops from 32 to 1 and
stays there for a while, then drops to 0 and the job finishes.
This behavior is apparent in /all/ mpi jobs, not just this
particular test case.

(2) I suspected that hwloc might be the culprit; before I posted here, I
reported it on the hwloc mailing list, where I was told that it seems
to be a cache reporting problem and that I should be fine ignoring
it, or that I could load the topology from XML instead. I figured I'd
mention the buggy BIOS in my first post just in case it rang any
bells.

Is there a way to add timestamps to the debug output? That might better
demonstrate what I am describing in (1) above.
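
If there is no built-in option, I suppose I could fake it by piping the
output through something that prepends wall-clock time -- a rough, untested
sketch, assuming gawk is on the node:

  /usr/local/openmpi-1.8.2rc4/bin/mpirun -np 48 --hostfile hosts \
      -mca odls_base_verbose 5 --leave-session-attached ./test.py 2>&1 | \
      gawk '{ print strftime("%H:%M:%S"), $0; fflush() }'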

If it helps, I'd be more than happy to provide access to the affected
machine so that you can see what's going on first hand, or capture a
small movie of htop while the process is running.

Thanks,
Andrej

> Ah, that sheds some light. There is indeed a significant change
> between earlier releases and the 1.8.1 and above that might explain
> what he is seeing. Specifically, we no longer hammer the cpu while in
> MPI_Finalize. So if 16 of the procs are finishing early (which the
> output would suggest), then they will go into a "lazy" finalize state
> while they wait for the rest of the procs to complete their work.
> 
> In contrast, prior releases would continue at 100% cpu while they
> polled to see if the other procs were done.
> 
> We did this to help save power/energy, and because users had asked
> why the cpu utilization remained at 100% even though procs were
> waiting in finalize
> 
> HTH
> Ralph
> 
> On Aug 21, 2014, at 5:55 PM, Christopher Samuel
>  wrote:
> 
> > On 22/08/14 10:43, Ralph Castain wrote:
> > 
> >> From your earlier concerns, I would have expected only to find 32
> >> of them running. Was that not the case in this run?
> > 
> > As I understand it in his original email he mentioned that with
> > 1.6.5 all 48 processes were running at 100% CPU and was wondering
> > if the buggy BIOS that caused hwloc the issues he reported on the
> > hwloc-users list might be the cause for this regression in
> > performance.
> > 
> > All the best,
> > Chris
> > -- 
> > Christopher Samuel    Senior Systems Administrator
> > VLSCI - Victorian Life Sciences Computation Initiative
> > Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> > http://www.vlsci.org.au/  http://twitter.com/vlsci
> > 
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/08/15686.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15687.php


Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-21 Thread Ralph Castain
Ah, that sheds some light. There is indeed a significant change between earlier 
releases and 1.8.1 and above that might explain what he is seeing. 
Specifically, we no longer hammer the cpu while in MPI_Finalize. So if 16 of 
the procs are finishing early (which the output would suggest), then they will 
go into a "lazy" finalize state while they wait for the rest of the procs to 
complete their work.

In contrast, prior releases would continue at 100% cpu while they polled to see 
if the other procs were done.

We did this to help save power/energy, and because users had asked why the cpu 
utilization remained at 100% even though procs were waiting in finalize

HTH
Ralph

On Aug 21, 2014, at 5:55 PM, Christopher Samuel  wrote:

> On 22/08/14 10:43, Ralph Castain wrote:
> 
>> From your earlier concerns, I would have expected only to find 32 of
>> them running. Was that not the case in this run?
> 
> As I understand it, in his original email he mentioned that with 1.6.5
> all 48 processes were running at 100% CPU, and he was wondering if the
> buggy BIOS that caused the hwloc issues he reported on the hwloc-users
> list might be the cause of this regression in performance.
> 
> All the best,
> Chris
> -- 
> Christopher Samuel    Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15686.php



Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-21 Thread Ralph Castain
I think maybe I'm misunderstanding something. This shows that all 48 procs ran 
and terminated normally.

From your earlier concerns, I would have expected only to find 32 of them 
running. Was that not the case in this run?


On Aug 21, 2014, at 4:57 PM, Andrej Prsa  wrote:

> Whoops, jumped the gun there before the process finished. I'm attaching
> the new stderr output.
> 
>> Hmmm...that's even weirder. It thinks it is going to start 48 procs,
>> and the binding pattern even looks right.
>> 
>> Hate to keep bothering you, but could you ensure this is a debug
>> build (i.e., was configured with --enable-debug), and then set -mca
>> odls_base_verbose 5 --leave-session-attached on the cmd line?
>> 
>> It'll be a little noisy, but should tell us why the other 16 procs
>> aren't getting launched
>> 
>> 
>> On Aug 21, 2014, at 3:27 PM, Andrej Prsa  wrote:
>> 
>>>> How odd - can you run it with --display-devel-map and send that
>>>> along? It will give us a detailed statement of where it thinks
>>>> everything should run.
>>> 
>>> Sure thing -- please find it attached.
>>> 
>>> Cheers,
>>> Andrej
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/08/15681.php
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15682.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15684.php



Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-21 Thread Andrej Prsa
Whoops, jumped the gun there before the process finished. I'm attaching
the new stderr output.

> Hmmm...that's even weirder. It thinks it is going to start 48 procs,
> and the binding pattern even looks right.
> 
> Hate to keep bothering you, but could you ensure this is a debug
> build (i.e., was configured with --enable-debug), and then set -mca
> odls_base_verbose 5 --leave-session-attached on the cmd line?
> 
> It'll be a little noisy, but should tell us why the other 16 procs
> aren't getting launched
> 
> 
> On Aug 21, 2014, at 3:27 PM, Andrej Prsa  wrote:
> 
> >> How odd - can you run it with --display-devel-map and send that
> >> along? It will give us a detailed statement of where it thinks
> >> everything should run.
> > 
> > Sure thing -- please find it attached.
> > 
> > Cheers,
> > Andrej
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/08/15681.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15682.php


test.ste.bz2
Description: application/bzip


Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-21 Thread Andrej Prsa
> Hate to keep bothering you, but could you ensure this is a debug
> build (i.e., was configured with --enable-debug), and then set -mca
> odls_base_verbose 5 --leave-session-attached on the cmd line?

No bother at all -- would love to help. I recompiled 1.8.2rc4 with
debug and issued:

/usr/local/openmpi-1.8.2rc4/bin/mpirun -np 48 --hostfile hosts \
    -mca odls_base_verbose 5 --leave-session-attached \
    just_phb.py > test.std 2> test.ste

I'm attaching test.ste.
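
For reference, the rebuild itself was just the usual sequence with debugging
switched on -- roughly:

  ./configure --prefix=/usr/local/openmpi-1.8.2rc4 --enable-debug
  make && make install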

Cheers,
Andrej


test.ste.bz2
Description: application/bzip


Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-21 Thread Andrej Prsa
> How odd - can you run it with --display-devel-map and send that
> along? It will give us a detailed statement of where it thinks
> everything should run.

Sure thing -- please find it attached.

Cheers,
Andrej


test.std.bz2
Description: application/bzip


Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-21 Thread Ralph Castain
How odd - can you run it with --display-devel-map and send that along? It will 
give us a detailed statement of where it thinks everything should run.
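
In other words, keep your usual invocation and just add the flag -- something 
along the lines of:

  /usr/local/openmpi-1.8.2rc4/bin/mpirun -np 48 --hostfile hosts \
      --display-devel-map ./test.py > test.std 2> test.ste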


On Aug 21, 2014, at 2:49 PM, Andrej Prsa  wrote:

> Hi Ralph,
> 
> Thanks for your reply!
> 
>> One thing you might want to try: add this to your mpirun cmd line:
>> 
>> --display-allocation
>> 
>> This will tell you how many slots we think we've been given on your
>> cluster.
> 
> I tried that using 1.8.2rc4, this is what I get:
> 
> ==   ALLOCATED NODES   ==
>node2: slots=48 max_slots=48 slots_inuse=0 state=UNKNOWN
> =
> 
> I forgot to mention previously that mpirun runs all cores on localhost,
> it is only when running on another host (--hostfile hosts) that the 32
> proc cap is observed. I'm attaching a snapshot of the most recent run.
> The job was invoked by:
> 
> /usr/local/openmpi-1.8.2rc4/bin/mpirun -np 48 --hostfile hosts
>  --display-allocation ./test.py > test.std 2> test.ste
> 
> test.ste contains the hwloc error I mentioned in my previous post:
> 
> 
> * hwloc has encountered what looks like an error from the operating system.
> *
> * object (L3 cpuset 0x03f0) intersection without inclusion!
> * Error occurred in topology.c line 760
> *
> * Please report this error message to the hwloc user's mailing list,
> * along with the output from the hwloc-gather-topology.sh script.
> 
> 
> Hope this helps,
> Andrej
> 
> 
>> On Aug 21, 2014, at 12:50 PM, Ralph Castain  wrote:
>> 
>>> Starting early in the 1.7 series, we began to bind procs by default
>>> to cores when -np <= 2, and to sockets if np > 2. Is it possible
>>> this is what you are seeing?
>>> 
>>> 
>>> On Aug 21, 2014, at 12:45 PM, Andrej Prsa  wrote:
>>> 
>>>> Dear devels,
>>>> 
>>>> I have been trying out 1.8.2rcs recently and found a show-stopping
>>>> problem on our cluster. Running any job with any number of
>>>> processors larger than 32 will always employ only 32 cores per
>>>> node (our nodes have 48 cores). We are seeing identical behavior
>>>> with 1.8.2rc4, 1.8.2rc2, and 1.8.1. Running identical programs
>>>> shows no such issues with version 1.6.5, where all 48 cores per
>>>> node are working. While our system is running torque/maui, the
>>>> problem is evident by running mpirun directly.
>>>> 
>>>> I am attaching hwloc topology in case that helps -- I am aware of a
>>>> buggy bios code that trips hwloc, but I don't know if that might
>>>> be an issue or not. I am happy to help debugging if you can
>>>> provide me with guidance.
>>>> 
>>>> Thanks,
>>>> Andrej
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2014/08/15676.php
>>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15678.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15679.php



Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-21 Thread Andrej Prsa
Hi Ralph,

Thanks for your reply!

> One thing you might want to try: add this to your mpirun cmd line:
> 
> --display-allocation
> 
> This will tell you how many slots we think we've been given on your
> cluster.

I tried that using 1.8.2rc4; this is what I get:

==   ALLOCATED NODES   ==
node2: slots=48 max_slots=48 slots_inuse=0 state=UNKNOWN
=

I forgot to mention previously that mpirun uses all 48 cores when running
on localhost; it is only when running on another host (--hostfile hosts)
that the 32-proc cap is observed. I'm attaching a snapshot of the most
recent run.
The job was invoked by:

/usr/local/openmpi-1.8.2rc4/bin/mpirun -np 48 --hostfile hosts \
    --display-allocation ./test.py > test.std 2> test.ste

test.ste contains the hwloc error I mentioned in my previous post:


* hwloc has encountered what looks like an error from the operating system.
*
* object (L3 cpuset 0x03f0) intersection without inclusion!
* Error occurred in topology.c line 760
*
* Please report this error message to the hwloc user's mailing list,
* along with the output from the hwloc-gather-topology.sh script.


Hope this helps,
Andrej


> On Aug 21, 2014, at 12:50 PM, Ralph Castain  wrote:
> 
> > Starting early in the 1.7 series, we began to bind procs by default
> > to cores when -np <= 2, and to sockets if np > 2. Is it possible
> > this is what you are seeing?
> > 
> > 
> > On Aug 21, 2014, at 12:45 PM, Andrej Prsa  wrote:
> > 
> >> Dear devels,
> >> 
> >> I have been trying out 1.8.2rcs recently and found a show-stopping
> >> problem on our cluster. Running any job with any number of
> >> processors larger than 32 will always employ only 32 cores per
> >> node (our nodes have 48 cores). We are seeing identical behavior
> >> with 1.8.2rc4, 1.8.2rc2, and 1.8.1. Running identical programs
> >> shows no such issues with version 1.6.5, where all 48 cores per
> >> node are working. While our system is running torque/maui, the
> >> problem is evident by running mpirun directly.
> >> 
> >> I am attaching hwloc topology in case that helps -- I am aware of a
> >> buggy bios code that trips hwloc, but I don't know if that might
> >> be an issue or not. I am happy to help debugging if you can
> >> provide me with guidance.
> >> 
> >> Thanks,
> >> Andrej
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2014/08/15676.php
> > 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15678.php


Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-21 Thread Ralph Castain
One thing you might want to try: add this to your mpirun cmd line:

--display-allocation

This will tell you how many slots we think we've been given on your cluster.
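
For example, something like this (./your_app is just a placeholder for 
whatever you normally launch):

  mpirun -np 48 --display-allocation ./your_app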

On Aug 21, 2014, at 12:50 PM, Ralph Castain  wrote:

> Starting early in the 1.7 series, we began to bind procs by default to cores 
> when -np <= 2, and to sockets if np > 2. Is it possible this is what you are 
> seeing?
> 
> 
> On Aug 21, 2014, at 12:45 PM, Andrej Prsa  wrote:
> 
>> Dear devels,
>> 
>> I have been trying out 1.8.2rcs recently and found a show-stopping
>> problem on our cluster. Running any job with any number of processors
>> larger than 32 will always employ only 32 cores per node (our nodes
>> have 48 cores). We are seeing identical behavior with 1.8.2rc4,
>> 1.8.2rc2, and 1.8.1. Running identical programs shows no such issues
>> with version 1.6.5, where all 48 cores per node are working. While our
>> system is running torque/maui, the problem is evident by running mpirun
>> directly.
>> 
>> I am attaching hwloc topology in case that helps -- I am aware of a
>> buggy bios code that trips hwloc, but I don't know if that might be an
>> issue or not. I am happy to help debugging if you can provide me with
>> guidance.
>> 
>> Thanks,
>> Andrej
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15676.php
> 



Re: [OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-21 Thread Ralph Castain
Starting early in the 1.7 series, we began to bind procs by default to cores 
when -np <= 2, and to sockets if np > 2. Is it possible this is what you are 
seeing?
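
If that is what's happening, it is easy to check and override from the cmd 
line -- a rough sketch, with ./your_app standing in for whatever you are 
launching:

  # show exactly where each rank gets bound
  mpirun -np 48 --report-bindings ./your_app

  # force one rank per core instead of the per-socket default
  mpirun -np 48 --map-by core --bind-to core ./your_app

  # or disable binding entirely
  mpirun -np 48 --bind-to none ./your_app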


On Aug 21, 2014, at 12:45 PM, Andrej Prsa  wrote:

> Dear devels,
> 
> I have been trying out 1.8.2rcs recently and found a show-stopping
> problem on our cluster. Running any job with any number of processors
> larger than 32 will always employ only 32 cores per node (our nodes
> have 48 cores). We are seeing identical behavior with 1.8.2rc4,
> 1.8.2rc2, and 1.8.1. Running identical programs shows no such issues
> with version 1.6.5, where all 48 cores per node are working. While our
> system is running torque/maui, the problem is evident by running mpirun
> directly.
> 
> I am attaching hwloc topology in case that helps -- I am aware of a
> buggy bios code that trips hwloc, but I don't know if that might be an
> issue or not. I am happy to help debugging if you can provide me with
> guidance.
> 
> Thanks,
> Andrej
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15676.php



[OMPI devel] 1.8.2rc4 problem: only 32 out of 48 cores are working

2014-08-21 Thread Andrej Prsa
Dear devels,

I have been trying out 1.8.2rcs recently and found a show-stopping
problem on our cluster. Running any job with any number of processors
larger than 32 will always employ only 32 cores per node (our nodes
have 48 cores). We are seeing identical behavior with 1.8.2rc4,
1.8.2rc2, and 1.8.1. Running identical programs shows no such issues
with version 1.6.5, where all 48 cores per node are working. While our
system runs torque/maui, the problem is evident even when running mpirun
directly.
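
Roughly speaking, a run as simple as the following reproduces it (watching
htop on the target node shows only 32 busy cores):

  /usr/local/openmpi-1.8.2rc4/bin/mpirun -np 48 --hostfile hosts ./test.py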

I am attaching the hwloc topology in case that helps -- I am aware of
buggy BIOS code that trips up hwloc, but I don't know whether that might
be an issue here or not. I am happy to help with debugging if you can
provide me with guidance.

Thanks,
Andrej


cluster.output
Description: Binary data


cluster.tar.bz2
Description: application/bzip