Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?

2009-05-01 Thread Gus Correa

Hi Brian

Thank you very much for the instant help!

I just tried "-mca btl openib,sm,self" and
"-mca mpi_leave_pinned 0" together (still with OpenMPI 1.3.1).

So far so good, it passed through two NB cases/linear system solutions,
it is running the third NB, and the memory use hasn't increased.
On the failed runs the second NB already used more memory than the 
first, and the third would blow up memory use.


If the run were bound to fail it would be swapping memory at this point,
and it is not.

This is a good sign, I hope I am not speaking too early,
but it looks like your suggestion fixed the problem.
Thanks!

It was interesting to observe using Ganglia
that on the failed runs the memory use "jumps"
happened whenever HPL switched from one NB to another.
At every NB transition (i.e., each time HPL started to solve a
new linear system, and probably generated a new random matrix)
the memory use would jump to a (significantly) higher value.
Anyway, this is just in case the info tells you something about what
might be going on.

I will certainly follow your advice and upgrade to OpenMPI 1.3.2,
which I just downloaded.
You guys are prolific, a new edition per month! :)

Many thanks!
Gus Correa

Brian W. Barrett wrote:

Gus -

Open MPI 1.3.0 & 1.3.1 attempted to use some controls in the glibc 
malloc implementation to handle memory registration caching for 
InfiniBand. Unfortunately, it was not only buggy in that it didn't
work, but it also had the side effect that certain memory usage patterns
can cause the memory allocator to use much more memory than it normally 
would.  The configuration options were set any time the openib module 
was loaded, even if it wasn't used in communication.  Can you try 
running with the extra option:


  -mca mpi_leave_pinned 0

I'm guessing that will fix the problem.  If you're using InfiniBand, you 
probably want to upgrade to 1.3.2, as there are known data corruption 
issues in 1.3.0 and 1.3.1 with openib.


Brian

On Fri, 1 May 2009, Gus Correa wrote:


Hi Ralph

Thank you very much for the prompt answer.
Sorry for being so confusing on my original message.

Yes, I am saying that the inclusion of openib is causing the difference
in behavior.
It runs with "sm,self", it fails with "openib,sm,self".
I am as puzzled as you are, because I thought the "openib" parameter
was simply ignored when running on a single node, exactly like you said.
After your message arrived, I ran HPL once more with "openib",
just in case.
Sure enough it failed just as I described.

And yes, all the procs run on a single node in both cases.
It doesn't seem to be a problem caused by a particular
node hardware either, as I already
tried three different nodes with similar results.

BTW, I successfully ran HPL across the whole cluster two days ago,
with IB ("openib,sm,self"),
but using a modest (for the cluster) problem size: N=50,000.
The total cluster memory is 24*16=384GB,
which gives a max HPL problem size N=195,000.
I have yet to try the large problem on the whole cluster,
but I am afraid I will stumble on the same memory problem.

Finally, on your email you use the syntax "btl=openib,sm,self",
with an "=" sign between the btl key and its values.
However, the mpiexec man page uses the syntax "btl openib,sm,self",
with a blank space between the btl key and its values.
I've been following the man page syntax.
The "=" sign doesn't seem to work, and aborts with the error:
"No executable was specified on the mpiexec command line.".
Could this possibly be the issue (say, wrong parsing of mca options)?
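A plausible explanation (an assumption on my part, not something confirmed in
this thread): with the "key value" form, "-mca btl=openib,sm,self" is parsed
as a parameter literally named "btl=openib,sm,self", and the next token (the
executable) gets consumed as its value, hence the "No executable" error.
Both of the forms below should work; OMPI_MCA_btl is the standard
environment-variable spelling of the same parameter:

mpiexec -np 8 -mca btl openib,sm,self xhpl    # key and value as separate tokens

export OMPI_MCA_btl=openib,sm,self            # or set the parameter in the environment
mpiexec -np 8 xhpl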

Many thanks!
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Ralph Castain wrote:
If you are running on a single node, then btl=openib,sm,self would be 
equivalent to btl=sm,self. OMPI is smart enough to know not to use IB 
if you are on a single node, and instead uses the shared memory 
subsystem.


Are you saying that the inclusion of openib is causing a difference 
in behavior, even though all procs are on the same node??


Just want to ensure I understand the problem.

Thanks
Ralph


On Fri, May 1, 2009 at 11:16 AM, Gus Correa wrote:


Hi OpenMPI and HPC experts

This may or may not be the right forum to post this,
and I am sorry to bother those that think it is not.

I am trying to run the HPL benchmark on our cluster,
compiling it with Gnu and linking to
GotoBLAS (1.26) and OpenMPI (1.3.1),
both also Gnu-compiled.

I have got failures that suggest a memory leak when the
problem size is large, but still within the memory limits
recommended by HPL.
The problem only happens when "openib" is among the OpenMPI
MCA parameters (and the problem size is large).
Any help is appreciated.

Here is a 

Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?

2009-05-01 Thread Gus Correa

Hi Jacob

Thank you very much for the suggestions and insight.

On an idle node MemFree is about 15599152 kB (14.8GB).
Applying the "80%" rule to it, I get a problem size N=38,440.
However, the HPL run fails with the memory leak problem
even if I use N=35,000,
with openib among the MCA btl parameters.
You may have seen another message by Brian Barrett explaining a possible
reason for the problem, and suggesting a workaround.
I haven't tried it yet, but I will.

I read about the HPL preference for "square" PxQ processor grids.
On a single node the fastest runs use a 2x4 grid,
but 1x8 is often competitive as well, coming in second or third,
although it is not "square" at all.
I would guess this has much to do with
the physical 2-socket, 4-core layout, doesn't it?
I would also guess that the best processor grid is likely to
be quite different when the whole cluster is used, right?
How can one use the 2x4 fastest processor grid layout on a single node
to infer the fastest processor grid for the cluster?

The best I got so far was 80% efficiency, less than your "at least 85%".
So, I certainly have more work to do.

GotoBLAS was compiled with Gnu, no special optimization flags,
other than what the distribution Makefiles already have.
OpenMPI was also compiled with Gnu, but with CFLAGS and FFLAGS set to:

-march=amdfam10 -O3 -finline-functions -funroll-loops -mfpmath=sse

As I used mpicc and mpif77 to compile HPL, I presume it inherited these
flags also, right?

However, I already read comments on other mailing lists
that "-march=adfam10" is not really the best choice for
Barcelona (and I wonder if it is for Shanghai),
although gcc says it tailored for that architecture.
What "-march" is really the fastest?

Any suggestions in this area of compilers and optimization?

Many thanks,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

jacob_liber...@dell.com wrote:

Hi Gus,

For single node runs, don't bother specifying the btl.  Openmpi should select 
the best option.

Beyond that, the "80% total RAM" recommendation is misleading. Base your N off the memfree rather than memtotal. IB can reserve quite a bit.  Verify your /etc/security/limits.conf limits allow sufficient locking.  (Try unlimited) 


Finally, P should be smaller than Q, and squarer values are recommended.

With Shanghai, OpenMPI, and GotoBLAS, expect single-node efficiency of at least 85%
given decent tuning.  If the distribution continues to look strange, there are 
more things to check.

Thanks, Jacob


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Gus Correa
Sent: Friday, May 01, 2009 12:17 PM
To: Open MPI Users
Subject: [OMPI users] HPL with OpenMPI: Do I have a memory leak?

Hi OpenMPI and HPC experts

This may or may not be the right forum to post this,
and I am sorry to bother those that think it is not.

I am trying to run the HPL benchmark on our cluster,
compiling it with Gnu and linking to
GotoBLAS (1.26) and OpenMPI (1.3.1),
both also Gnu-compiled.

I have got failures that suggest a memory leak when the
problem size is large, but still within the memory limits
recommended by HPL.
The problem only happens when "openib" is among the OpenMPI
MCA parameters (and the problem size is large).
Any help is appreciated.

Here is a description of what happens.

For starters I am trying HPL on a single node, to get a feeling for
the right parameters (N & NB, P & Q, etc.) on a dual-socket quad-core
AMD Opteron 2376 "Shanghai"

The HPL recommendation is to use close to 80% of your physical memory,
to reach top Gigaflop performance.
Our physical memory on a node is 16GB, and this gives a problem size
N=40,000 to keep the 80% memory use.
I tried several block sizes, somewhat correlated to the size of the
processor cache:  NB=64 80 96 128 ...

When I run HPL with N=20,000 or smaller all works fine,
and the HPL run completes, regardless of whether "openib"
is present or not on my MCA parameters.

However, when I move to N=40,000, or even N=35,000,
the run starts OK with NB=64,
but as NB is switched to larger values
the total memory use increases in jumps (as shown by Ganglia),
and becomes uneven across the processors (as shown by "top").
The problem happens if "openib" is among the MCA parameters,
but doesn't happen if I remove "openib" from the MCA list and use
only "sm,self".

For N=35,000, when NB reaches 96 memory use is already above the
physical limit
(16GB), having increased from 12.5GB to over 17GB.
For N=40,000 the problem happens even earlier, with NB=80.
At this point memory swapping kicks in,
and eventually the run dies with memory allocation errors:

================================================================================
T/V                N    NB     P     Q               Time              Gflops

Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?

2009-05-01 Thread Brian W. Barrett

Gus -

Open MPI 1.3.0 & 1.3.1 attempted to use some controls in the glibc malloc 
implementation to handle memory registration caching for InfiniBand. 
Unfortunately, it was not only buggy in that it didn't work, but it also
had the side effect that certain memory usage patterns can cause the
memory allocator to use much more memory than it normally would.  The 
configuration options were set any time the openib module was loaded, even 
if it wasn't used in communication.  Can you try running with the extra 
option:


  -mca mpi_leave_pinned 0

I'm guessing that will fix the problem.  If you're using InfiniBand, you 
probably want to upgrade to 1.3.2, as there are known data corruption 
issues in 1.3.0 and 1.3.1 with openib.


Brian

On Fri, 1 May 2009, Gus Correa wrote:


Hi Ralph

Thank you very much for the prompt answer.
Sorry for being so confusing on my original message.

Yes, I am saying that the inclusion of openib is causing the difference
in behavior.
It runs with "sm,self", it fails with "openib,sm,self".
I am as puzzled as you are, because I thought the "openib" parameter
was simply ignored when running on a single node, exactly like you said.
After your message arrived, I ran HPL once more with "openib",
just in case.
Sure enough it failed just as I described.

And yes, all the procs run on a single node in both cases.
It doesn't seem to be a problem caused by a particular
node hardware either, as I already
tried three different nodes with similar results.

BTW, I successfully ran HPL across the whole cluster two days ago,
with IB ("openib,sm,self"),
but using a modest (for the cluster) problem size: N=50,000.
The total cluster memory is 24*16=384GB,
which gives a max HPL problem size N=195,000.
I have yet to try the large problem on the whole cluster,
but I am afraid I will stumble on the same memory problem.

Finally, on your email you use the syntax "btl=openib,sm,self",
with an "=" sign between the btl key and its values.
However, the mpiexec man page uses the syntax "btl openib,sm,self",
with a blank space between the btl key and its values.
I've been following the man page syntax.
The "=" sign doesn't seem to work, and aborts with the error:
"No executable was specified on the mpiexec command line.".
Could this possibly be the issue (say, wrong parsing of mca options)?

Many thanks!
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Ralph Castain wrote:
If you are running on a single node, then btl=openib,sm,self would be 
equivalent to btl=sm,self. OMPI is smart enough to know not to use IB if 
you are on a single node, and instead uses the shared memory subsystem.


Are you saying that the inclusion of openib is causing a difference in 
behavior, even though all procs are on the same node??


Just want to ensure I understand the problem.

Thanks
Ralph


On Fri, May 1, 2009 at 11:16 AM, Gus Correa wrote:


Hi OpenMPI and HPC experts

This may or may not be the right forum to post this,
and I am sorry to bother those that think it is not.

I am trying to run the HPL benchmark on our cluster,
compiling it with Gnu and linking to
GotoBLAS (1.26) and OpenMPI (1.3.1),
both also Gnu-compiled.

I have got failures that suggest a memory leak when the
problem size is large, but still within the memory limits
recommended by HPL.
The problem only happens when "openib" is among the OpenMPI
MCA parameters (and the problem size is large).
Any help is appreciated.

Here is a description of what happens.

For starters I am trying HPL on a single node, to get a feeling for
the right parameters (N & NB, P & Q, etc.) on a dual-socket quad-core
AMD Opteron 2376 "Shanghai"

The HPL recommendation is to use close to 80% of your physical memory,
to reach top Gigaflop performance.
Our physical memory on a node is 16GB, and this gives a problem size
N=40,000 to keep the 80% memory use.
I tried several block sizes, somewhat correlated to the size of the
processor cache:  NB=64 80 96 128 ...

When I run HPL with N=20,000 or smaller all works fine,
and the HPL run completes, regardless of whether "openib"
is present or not on my MCA parameters.

However, when I move to N=40,000, or even N=35,000,
the run starts OK with NB=64,
but as NB is switched to larger values
the total memory use increases in jumps (as shown by Ganglia),
and becomes uneven across the processors (as shown by "top").
The problem happens if "openib" is among the MCA parameters,
but doesn't happen if I remove "openib" from the MCA list and use
only "sm,self".

For N=35,000, when NB reaches 96 memory use is already above the
physical limit

Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?

2009-05-01 Thread Gus Correa

Hi Ralph

Thank you very much for the prompt answer.
Sorry for being so confusing on my original message.

Yes, I am saying that the inclusion of openib is causing the difference
in behavior.
It runs with "sm,self", it fails with "openib,sm,self".
I am as puzzled as you are, because I thought the "openib" parameter
was simply ignored when running on a single node, exactly like you said.
After your message arrived, I ran HPL once more with "openib",
just in case.
Sure enough it failed just as I described.

And yes, all the procs run on a single node in both cases.
It doesn't seem to be a problem caused by a particular
node hardware either, as I already
tried three different nodes with similar results.

BTW, I successfully ran HPL across the whole cluster two days ago,
with IB ("openib,sm,self"),
but using a modest (for the cluster) problem size: N=50,000.
The total cluster memory is 24*16=384GB,
which gives a max HPL problem size N=195,000.
I have yet to try the large problem on the whole cluster,
but I am afraid I will stumble on the same memory problem.

Finally, on your email you use the syntax "btl=openib,sm,self",
with an "=" sign between the btl key and its values.
However, the mpiexec man page uses the syntax "btl openib,sm,self",
with a blank space between the btl key and its values.
I've been following the man page syntax.
The "=" sign doesn't seem to work, and aborts with the error:
"No executable was specified on the mpiexec command line.".
Could this possibly be the issue (say, wrong parsing of mca options)?

Many thanks!
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Ralph Castain wrote:
If you are running on a single node, then btl=openib,sm,self would be 
equivalent to btl=sm,self. OMPI is smart enough to know not to use IB if 
you are on a single node, and instead uses the shared memory subsystem.


Are you saying that the inclusion of openib is causing a difference in 
behavior, even though all procs are on the same node??


Just want to ensure I understand the problem.

Thanks
Ralph


On Fri, May 1, 2009 at 11:16 AM, Gus Correa wrote:


Hi OpenMPI and HPC experts

This may or may not be the right forum to post this,
and I am sorry to bother those that think it is not.

I am trying to run the HPL benchmark on our cluster,
compiling it with Gnu and linking to
GotoBLAS (1.26) and OpenMPI (1.3.1),
both also Gnu-compiled.

I have got failures that suggest a memory leak when the
problem size is large, but still within the memory limits
recommended by HPL.
The problem only happens when "openib" is among the OpenMPI
MCA parameters (and the problem size is large).
Any help is appreciated.

Here is a description of what happens.

For starters I am trying HPL on a single node, to get a feeling for
the right parameters (N & NB, P & Q, etc.) on a dual-socket quad-core
AMD Opteron 2376 "Shanghai"

The HPL recommendation is to use close to 80% of your physical memory,
to reach top Gigaflop performance.
Our physical memory on a node is 16GB, and this gives a problem size
N=40,000 to keep the 80% memory use.
I tried several block sizes, somewhat correlated to the size of the
processor cache:  NB=64 80 96 128 ...

When I run HPL with N=20,000 or smaller all works fine,
and the HPL run completes, regardless of whether "openib"
is present or not on my MCA parameters.

However, when I move to N=40,000, or even N=35,000,
the run starts OK with NB=64,
but as NB is switched to larger values
the total memory use increases in jumps (as shown by Ganglia),
and becomes uneven across the processors (as shown by "top").
The problem happens if "openib" is among the MCA parameters,
but doesn't happen if I remove "openib" from the MCA list and use
only "sm,self".

For N=35,000, when NB reaches 96 memory use is already above the
physical limit
(16GB), having increased from 12.5GB to over 17GB.
For N=40,000 the problem happens even earlier, with NB=80.
At this point memory swapping kicks in,
and eventually the run dies with memory allocation errors:



================================================================================
T/V                N    NB     P     Q               Time              Gflops
--------------------------------------------------------------------------------
WR01L2L4       35000   128     8     1             539.66           5.297e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=0.0043992 ...... PASSED
HPL ERROR from process # 0, on line 172 of function HPL_pdtest:
>>> [7,0] Memory allocation failed for A, x and b. Skip. <<<

Re: [OMPI users] Problem with Filem

2009-05-01 Thread Josh Hursey
This typically means that one or more of the rcp/scp or rsh/ssh
commands failed. FileM should print an error message when one
of the copy commands fails. Try turning the verbosity level up to 10 to
see if it indicates any problems:

 -mca filem_rsh_verbose 10
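For example (the application name and process count here are placeholders;
the "-am ft-enable-cr" part follows the usual Open MPI 1.3 checkpoint/restart
launch style):

mpirun -np 4 -am ft-enable-cr -mca filem_rsh_verbose 10 ./my_app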

Can you send me the MCA parameters that you are setting? That may  
help narrow down the problem as well. Also I cleaned up some of the  
filem (and snapc) error reporting in the development trunk if you  
want to give that a try.


Let me know what you find out.

Best,
Josh

On Apr 30, 2009, at 6:40 AM, Bouguerra mohamed slim wrote:


Hello,
I have a problem with the FileM module when I checkpoint to a
remote host without a shared file system.
I use the new Open MPI 1.3.2 and it is the same problem as in
version 1.3.1. Indeed, when I use an NFS file system it works.
Thus I guess it is a problem with FileM.


[azur-6.fr:23223] filem:rsh: wait_all(): Wait failed (-1)
[azur-6.fr:23223] [[48784,0],0] ORTE_ERROR_LOG: Error in file /home/ 
grenoble/msbouguerra/openmpi-1.3.2/orte/mca/snapc/full/ 
snapc_full_global.c at line 1054


--
Cordialement,
Mohamed-Slim BOUGUERRA, PhD student, INRIA-Grenoble / Projet MOAIS
ENSIMAG - antenne de Montbonnot
ZIRST 51, avenue Jean Kuntzmann
38330 MONTBONNOT SAINT MARTIN France
Tel :+33 (0)4 76 61 20 79
Fax :+33 (0)4 76 61 20 99

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Checkpointing configuration problem

2009-05-01 Thread Yaakoub El Khamra
You might want to consider --enable-mpi-threads=yes



Regards
Yaakoub El Khamra




On Fri, May 1, 2009 at 3:17 PM, Kritiraj Sajadah  wrote:
>
> Dear all,
>            I am trying to install openmpi 1.3 on my laptop. I successfully 
> installed BLCR in /usr/local.
>
> When installing openmpi using the following options:
>
>  ./configure --prefix=/usr/local --with-ft=cr --enable-ft-thread 
> --enable-MPI-thread --with-blcr=/usr/local
>
> I got the following error:
>
> 
> == System-specific tests
> 
> ...
>
> checking if want fault tolerance thread... Must enable progress or MPI 
> threads to use this option
> configure: error: Cannot continue
>
> Help please.
>
> regards,
>
> Raj
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



Re: [OMPI users] Checkpointing configuration problem

2009-05-01 Thread Josh Hursey
Try replacing "--enable-MPI-thread" with "--enable-mpi-threads". That  
should fix it.
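In other words, the corrected configure line from the original report would
read:

./configure --prefix=/usr/local --with-ft=cr --enable-ft-thread \
            --enable-mpi-threads --with-blcr=/usr/local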


-- Josh


On May 1, 2009, at 4:17 PM, Kritiraj Sajadah wrote:



Dear all,
I am trying to install openmpi 1.3 on my laptop. I  
successfully installed BLCR in /usr/local.


When installing openmpi using the following options:

 ./configure --prefix=/usr/local --with-ft=cr --enable-ft-thread -- 
enable-MPI-thread --with-blcr=/usr/local


I got the following error:

============================================================================
== System-specific tests
============================================================================

...

checking if want fault tolerance thread... Must enable progress or  
MPI threads to use this option

configure: error: Cannot continue

Help please.

regards,

Raj



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] Checkpointing configuration problem

2009-05-01 Thread Kritiraj Sajadah

Dear all, 
I am trying to install openmpi 1.3 on my laptop. I successfully 
installed BLCR in /usr/local.

When installing openmpi using the following options:

 ./configure --prefix=/usr/local --with-ft=cr --enable-ft-thread 
--enable-MPI-thread --with-blcr=/usr/local

I got the following error:


== System-specific tests

...

checking if want fault tolerance thread... Must enable progress or MPI threads 
to use this option
configure: error: Cannot continue

Help please.

regards,

Raj





Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?

2009-05-01 Thread JACOB_LIBERMAN
Hi Gus,

For single node runs, don't bother specifying the btl.  Openmpi should select 
the best option.

Beyond that, the "80% total RAM" recommendation is misleading. Base your N off 
the memfree rather than memtotal. IB can reserve quite a bit.  Verify your 
/etc/security/limits.conf limits allow sufficient locking.  (Try unlimited) 

Finally, P should be smaller than Q, and squarer values are recommended.

With Shanghai, OpenMPI, and GotoBLAS, expect single-node efficiency of at least 85%
given decent tuning.  If the distribution continues to look strange, there are 
more things to check.

Thanks, Jacob

> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Gus Correa
> Sent: Friday, May 01, 2009 12:17 PM
> To: Open MPI Users
> Subject: [OMPI users] HPL with OpenMPI: Do I have a memory leak?
> 
> Hi OpenMPI and HPC experts
> 
> This may or may not be the right forum to post this,
> and I am sorry to bother those that think it is not.
> 
> I am trying to run the HPL benchmark on our cluster,
> compiling it with Gnu and linking to
> GotoBLAS (1.26) and OpenMPI (1.3.1),
> both also Gnu-compiled.
> 
> I have got failures that suggest a memory leak when the
> problem size is large, but still within the memory limits
> recommended by HPL.
> The problem only happens when "openib" is among the OpenMPI
> MCA parameters (and the problem size is large).
> Any help is appreciated.
> 
> Here is a description of what happens.
> 
> For starters I am trying HPL on a single node, to get a feeling for
> the right parameters (N & NB, P & Q, etc.) on a dual-socket quad-core
> AMD Opteron 2376 "Shanghai"
> 
> The HPL recommendation is to use close to 80% of your physical memory,
> to reach top Gigaflop performance.
> Our physical memory on a node is 16GB, and this gives a problem size
> N=40,000 to keep the 80% memory use.
> I tried several block sizes, somewhat correlated to the size of the
> processor cache:  NB=64 80 96 128 ...
> 
> When I run HPL with N=20,000 or smaller all works fine,
> and the HPL run completes, regardless of whether "openib"
> is present or not on my MCA parameters.
> 
> However, when I move to N=40,000, or even N=35,000,
> the run starts OK with NB=64,
> but as NB is switched to larger values
> the total memory use increases in jumps (as shown by Ganglia),
> and becomes uneven across the processors (as shown by "top").
> The problem happens if "openib" is among the MCA parameters,
> but doesn't happen if I remove "openib" from the MCA list and use
> only "sm,self".
> 
> For N=35,000, when NB reaches 96 memory use is already above the
> physical limit
> (16GB), having increased from 12.5GB to over 17GB.
> For N=40,000 the problem happens even earlier, with NB=80.
> At this point memory swapping kicks in,
> and eventually the run dies with memory allocation errors:
> 
> ================================================================================
> T/V                N    NB     P     Q               Time              Gflops
> --------------------------------------------------------------------------------
> WR01L2L4       35000   128     8     1             539.66           5.297e+01
> --------------------------------------------------------------------------------
> ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=0.0043992 ...... PASSED
> HPL ERROR from process # 0, on line 172 of function HPL_pdtest:
>  >>> [7,0] Memory allocation failed for A, x and b. Skip. <<<
> ...
> 
> ***
> 
> The code snippet that corresponds to HPL_pdtest.c is this,
> although the leak is probably somewhere else:
> 
> /*
>   * Allocate dynamic memory
>   */
> vptr = (void*)malloc( ( (size_t)(ALGO->align) +
> (size_t)(mat.ld+1) * (size_t)(mat.nq) ) *
>   sizeof(double) );
> info[0] = (vptr == NULL); info[1] = myrow; info[2] = mycol;
> (void) HPL_all_reduce( (void *)(info), 3, HPL_INT, HPL_max,
>GRID->all_comm );
> if( info[0] != 0 )
> {
>if( ( myrow == 0 ) && ( mycol == 0 ) )
>   HPL_pwarn( TEST->outfp, __LINE__, "HPL_pdtest",
>  "[%d,%d] %s", info[1], info[2],
>  "Memory allocation failed for A, x and b. Skip." );
>(TEST->kskip)++;
>return;
> }
> 
> ***
> 
> I found this continued increase in memory use rather strange,
> and suggestive of a memory leak in one of the codes being used.
> 
> Everything (OpenMPI, GotoBLAS, and HPL)
> was compiled using Gnu only (gcc, gfortran, g++).
> 
> I haven't changed anything on the compiler's memory model,
> i.e., I haven't used or changed the "-mcmodel" flag of gcc
> (I don't know if the Makefiles on HPL, GotoBLAS, and OpenMPI use it.)
> 
> No additional load is present on the node,
> other than the OS (Linux CentOS 5.2), HPL is running alone.
> 
> The cluster has Infiniband.
> However, I am running on a single node.
> 
> The surprising thing is that 

Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?

2009-05-01 Thread Ralph Castain
If you are running on a single node, then btl=openib,sm,self would be
equivalent to btl=sm,self. OMPI is smart enough to know not to use IB if you
are on a single node, and instead uses the shared memory subsystem.

Are you saying that the inclusion of openib is causing a difference in
behavior, even though all procs are on the same node??

Just want to ensure I understand the problem.

Thanks
Ralph


On Fri, May 1, 2009 at 11:16 AM, Gus Correa  wrote:

> Hi OpenMPI and HPC experts
>
> This may or may not be the right forum to post this,
> and I am sorry to bother those that think it is not.
>
> I am trying to run the HPL benchmark on our cluster,
> compiling it with Gnu and linking to
> GotoBLAS (1.26) and OpenMPI (1.3.1),
> both also Gnu-compiled.
>
> I have got failures that suggest a memory leak when the
> problem size is large, but still within the memory limits
> recommended by HPL.
> The problem only happens when "openib" is among the OpenMPI
> MCA parameters (and the problem size is large).
> Any help is appreciated.
>
> Here is a description of what happens.
>
> For starters I am trying HPL on a single node, to get a feeling for
> the right parameters (N & NB, P & Q, etc.) on a dual-socket quad-core
> AMD Opteron 2376 "Shanghai"
>
> The HPL recommendation is to use close to 80% of your physical memory,
> to reach top Gigaflop performance.
> Our physical memory on a node is 16GB, and this gives a problem size
> N=40,000 to keep the 80% memory use.
> I tried several block sizes, somewhat correlated to the size of the
> processor cache:  NB=64 80 96 128 ...
>
> When I run HPL with N=20,000 or smaller all works fine,
> and the HPL run completes, regardless of whether "openib"
> is present or not on my MCA parameters.
>
> However, when I move to N=40,000, or even N=35,000,
> the run starts OK with NB=64,
> but as NB is switched to larger values
> the total memory use increases in jumps (as shown by Ganglia),
> and becomes uneven across the processors (as shown by "top").
> The problem happens if "openib" is among the MCA parameters,
> but doesn't happen if I remove "openib" from the MCA list and use
> only "sm,self".
>
> For N=35,000, when NB reaches 96 memory use is already above the physical
> limit
> (16GB), having increased from 12.5GB to over 17GB.
> For N=40,000 the problem happens even earlier, with NB=80.
> At this point memory swapping kicks in,
> and eventually the run dies with memory allocation errors:
>
>
> ================================================================================
> T/V                N    NB     P     Q               Time              Gflops
> --------------------------------------------------------------------------------
> WR01L2L4       35000   128     8     1             539.66           5.297e+01
> --------------------------------------------------------------------------------
> ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=0.0043992 ...... PASSED
> HPL ERROR from process # 0, on line 172 of function HPL_pdtest:
> >>> [7,0] Memory allocation failed for A, x and b. Skip. <<<
> ...
>
> ***
>
> The code snippet that corresponds to HPL_pdtest.c is this,
> although the leak is probably somewhere else:
>
> /*
>  * Allocate dynamic memory
>  */
>   vptr = (void*)malloc( ( (size_t)(ALGO->align) +
>   (size_t)(mat.ld+1) * (size_t)(mat.nq) ) *
> sizeof(double) );
>   info[0] = (vptr == NULL); info[1] = myrow; info[2] = mycol;
>   (void) HPL_all_reduce( (void *)(info), 3, HPL_INT, HPL_max,
>  GRID->all_comm );
>   if( info[0] != 0 )
>   {
>  if( ( myrow == 0 ) && ( mycol == 0 ) )
> HPL_pwarn( TEST->outfp, __LINE__, "HPL_pdtest",
>"[%d,%d] %s", info[1], info[2],
>"Memory allocation failed for A, x and b. Skip." );
>  (TEST->kskip)++;
>  return;
>   }
>
> ***
>
> I found this continued increase in memory use rather strange,
> and suggestive of a memory leak in one of the codes being used.
>
> Everything (OpenMPI, GotoBLAS, and HPL)
> was compiled using Gnu only (gcc, gfortran, g++).
>
> I haven't changed anything on the compiler's memory model,
> i.e., I haven't used or changed the "-mcmodel" flag of gcc
> (I don't know if the Makefiles on HPL, GotoBLAS, and OpenMPI use it.)
>
> No additional load is present on the node,
> other than the OS (Linux CentOS 5.2), HPL is running alone.
>
> The cluster has Infiniband.
> However, I am running on a single node.
>
> The surprising thing is that if I run on shared memory only
> (-mca btl sm,self) there is no memory problem,
> the memory use is stable at about 13.9GB,
> and the run completes.
> So, there is a way around to run on a single node.
> (Actually shared memory is presumably the way to go on a single node.)
>
> However, if I introduce IB (-mca btl openib,sm,self)
> among the MCA btl parameters, then memory use blows up.
>
> This is bad news for me, because I want to extend 

[OMPI users] HPL with OpenMPI: Do I have a memory leak?

2009-05-01 Thread Gus Correa

Hi OpenMPI and HPC experts

This may or may not be the right forum to post this,
and I am sorry to bother those that think it is not.

I am trying to run the HPL benchmark on our cluster,
compiling it with Gnu and linking to
GotoBLAS (1.26) and OpenMPI (1.3.1),
both also Gnu-compiled.

I have got failures that suggest a memory leak when the
problem size is large, but still within the memory limits
recommended by HPL.
The problem only happens when "openib" is among the OpenMPI
MCA parameters (and the problem size is large).
Any help is appreciated.

Here is a description of what happens.

For starters I am trying HPL on a single node, to get a feeling for
the right parameters (N & NB, P & Q, etc.) on a dual-socket quad-core
AMD Opteron 2376 "Shanghai"

The HPL recommendation is to use close to 80% of your physical memory,
to reach top Gigaflop performance.
Our physical memory on a node is 16GB, and this gives a problem size
N=40,000 to keep the 80% memory use.
I tried several block sizes, somewhat correlated to the size of the
processor cache:  NB=64 80 96 128 ...
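As an aside, the sizing rule of thumb above is easy to script. The following
is a minimal C sketch (not from the thread): the only ingredients are the 80%
fraction, 8 bytes per double-precision matrix element, and rounding N down to
a multiple of NB. As noted elsewhere in this thread, the memory figure should
really be MemFree rather than total RAM.

#include <math.h>
#include <stdio.h>

/* Estimate an HPL problem size N from available memory: the N x N matrix of
 * doubles should use about `fraction` of `mem_bytes`, with N rounded down to
 * a multiple of the block size NB. */
static long hpl_problem_size(double mem_bytes, double fraction, long nb)
{
    long n = (long)floor(sqrt(fraction * mem_bytes / sizeof(double)));
    return (n / nb) * nb;
}

int main(void)
{
    /* 16 GB node, 80% rule, NB = 128 (figures taken from this thread) */
    printf("N ~ %ld\n", hpl_problem_size(16e9, 0.80, 128));
    return 0;
}

This prints roughly N = 39,936, consistent with the N = 40,000 used here, and
the 384 GB cluster-wide figure quoted later gives about N = 196,000, close to
the N = 195,000 mentioned in this thread.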

When I run HPL with N=20,000 or smaller all works fine,
and the HPL run completes, regardless of whether "openib"
is present or not on my MCA parameters.

However, when I move to N=40,000, or even N=35,000,
the run starts OK with NB=64,
but as NB is switched to larger values
the total memory use increases in jumps (as shown by Ganglia),
and becomes uneven across the processors (as shown by "top").
The problem happens if "openib" is among the MCA parameters,
but doesn't happen if I remove "openib" from the MCA list and use
only "sm,self".

For N=35,000, when NB reaches 96 memory use is already above the 
physical limit

(16GB), having increased from 12.5GB to over 17GB.
For N=40,000 the problem happens even earlier, with NB=80.
At this point memory swapping kicks in,
and eventually the run dies with memory allocation errors:


================================================================================
T/V                N    NB     P     Q               Time              Gflops
--------------------------------------------------------------------------------
WR01L2L4       35000   128     8     1             539.66           5.297e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=0.0043992 ...... PASSED

HPL ERROR from process # 0, on line 172 of function HPL_pdtest:
>>> [7,0] Memory allocation failed for A, x and b. Skip. <<<
...

***

The code snippet that corresponds to HPL_pdtest.c is this,
although the leak is probably somewhere else:

/*
 * Allocate dynamic memory
 */
   vptr = (void*)malloc( ( (size_t)(ALGO->align) +
                           (size_t)(mat.ld+1) * (size_t)(mat.nq) ) *
                         sizeof(double) );
   info[0] = (vptr == NULL); info[1] = myrow; info[2] = mycol;
   (void) HPL_all_reduce( (void *)(info), 3, HPL_INT, HPL_max,
                          GRID->all_comm );
   if( info[0] != 0 )
   {
      if( ( myrow == 0 ) && ( mycol == 0 ) )
         HPL_pwarn( TEST->outfp, __LINE__, "HPL_pdtest",
                    "[%d,%d] %s", info[1], info[2],
                    "Memory allocation failed for A, x and b. Skip." );
      (TEST->kskip)++;
      return;
   }

***

I found this continued increase in memory use rather strange,
and suggestive of a memory leak in one of the codes being used.

Everything (OpenMPI, GotoBLAS, and HPL)
was compiled using Gnu only (gcc, gfortran, g++).

I haven't changed anything on the compiler's memory model,
i.e., I haven't used or changed the "-mcmodel" flag of gcc
(I don't know if the Makefiles on HPL, GotoBLAS, and OpenMPI use it.)

No additional load is present on the node,
other than the OS (Linux CentOS 5.2), HPL is running alone.

The cluster has Infiniband.
However, I am running on a single node.

The surprising thing is that if I run on shared memory only
(-mca btl sm,self) there is no memory problem,
the memory use is stable at about 13.9GB,
and the run completes.
So, there is a way around to run on a single node.
(Actually shared memory is presumably the way to go on a single node.)

However, if I introduce IB (-mca btl openib,sm,self)
among the MCA btl parameters, then memory use blows up.

This is bad news for me, because I want to extend the experiment
to run HPL also across the whole cluster using IB,
which is actually the ultimate goal of HPL, of course!
It also suggests that the problem is somehow related to Infiniband,
maybe hidden under OpenMPI.

Here is the mpiexec command I use (with and without openib):

/path/to/openmpi/bin/mpiexec \
-prefix /the/run/directory \
-np 8 \
-mca btl [openib,]sm,self \
xhpl


Any help, insights, suggestions, reports of previous experiences,
are much appreciated.

Thank you,
Gus Correa


Re: [OMPI users] MPI processes hang when using OpenMPI 1.3.2 and Gcc-4.4.0

2009-05-01 Thread Eugene Loh
So far, I'm unable to reproduce this problem.  I haven't exactly 
reproduced your test conditions, but then I can't.  At a minimum, I 
don't have exactly the code you ran (and I'm not convinced I want to!).  So:


*) Can you reproduce the problem with the stand-alone test case I sent out?
*) Does the problem correlate with OMPI version?  (I.e., 1.3.1 versus 
1.3.2.)

*) Does the problem occur at lower np?
*) Does the problem correlate with the compiler version?  (I.e., GCC 4.4 
versus 4.3.3.)
*) What is the failure rate?  How many times should I expect to run to 
see failures?

*) How large is N?

Eugene Loh wrote:


Simone Pellegrini wrote:


Dear all,
I have successfully compiled and installed openmpi 1.3.2 on an
8-socket quad-core machine from Sun.


I have used both Gcc-4.4 and Gcc-4.3.3 during the compilation phase,
but when I try to run simple MPI programs the processes hang. Actually,
this is the kernel of the application I am trying to run:


MPI_Barrier(MPI_COMM_WORLD);
total = MPI_Wtime();
for(i=0; i0)
MPI_Sendrecv(A[i-1], N, MPI_FLOAT, top, 0, row, N, 
MPI_FLOAT, bottom, 0, MPI_COMM_WORLD, );

for(k=0; k
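The kernel above was truncated in the archive. Purely as a rough illustration
of this kind of timed boundary exchange, here is a self-contained sketch; the
matrix size, iteration count, neighbor ranks, and the local update are all
assumptions, not a reconstruction of the actual program:

#include <mpi.h>
#include <stdio.h>

#define N     1024   /* columns per row (assumed)         */
#define ROWS    64   /* rows owned by each rank (assumed) */
#define ITERS  100   /* outer iterations (assumed)        */

int main(int argc, char **argv)
{
    int rank, size, i, k, it;
    static float A[ROWS][N];
    float row[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int top    = (rank + 1) % size;          /* neighbor ranks: assumed ring mapping */
    int bottom = (rank - 1 + size) % size;

    for (i = 0; i < ROWS; i++)               /* fill the local block with something */
        for (k = 0; k < N; k++)
            A[i][k] = (float)rank;

    MPI_Barrier(MPI_COMM_WORLD);
    double total = MPI_Wtime();

    for (it = 0; it < ITERS; it++) {
        /* send our last row up, receive our neighbor's row from below */
        MPI_Sendrecv(A[ROWS - 1], N, MPI_FLOAT, top, 0,
                     row, N, MPI_FLOAT, bottom, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (k = 0; k < N; k++)              /* some local work on the received row */
            A[0][k] += 0.5f * row[k];
    }

    total = MPI_Wtime() - total;
    if (rank == 0)
        printf("elapsed: %f s\n", total);

    MPI_Finalize();
    return 0;
}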

Re: [OMPI users] compilation application with openmpi question

2009-05-01 Thread Ralph Castain
Hmmm... those appear to be VampirTrace functions. I suspect they will have
to fix it.

For now, you can work around the problem by configuring with this:

--enable-contrib-no-build=vt

That will turn the offending code off.

Ralph
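A side note not taken from the thread: the undefined symbols (inflate,
deflate, inflateInit_ and friends) are zlib entry points, so an alternative
workaround, if you do want the VampirTrace/OTF support, is to add the zlib
library to the link line. With a hypothetical application and compiler
wrapper:

mpif90 my_app.f90 -o my_app -lz   # -lz supplies the zlib symbols that libotf references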


On Fri, May 1, 2009 at 9:07 AM, David Wong  wrote:

> Hi,
>
>  I have installed openmpi on my machine and tested with some simple
> programs such as ring and fpi. Everything works. When I tried to compile my
> application, I got the following:
>
> /work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In function
> `OTF_File_open_zlevel':
> OTF_File.c:(.text+0x5a2): undefined reference to `inflateInit_'
> OTF_File.c:(.text+0x762): undefined reference to `deflateInit_'
> /work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In function
> `OTF_File_seek':
> OTF_File.c:(.text+0x1172): undefined reference to `inflateEnd'
> OTF_File.c:(.text+0x11a2): undefined reference to `inflateInit_'
> OTF_File.c:(.text+0x11c2): undefined reference to `inflateSync'
> /work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In function
> `OTF_File_read':
> OTF_File.c:(.text+0x1322): undefined reference to `inflate'
> /work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In function
> `OTF_File_write':
> OTF_File.c:(.text+0x1622): undefined reference to `deflate'
> OTF_File.c:(.text+0x1772): undefined reference to `deflate'
> /work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In function
> `OTF_File_close':
> OTF_File.c:(.text+0x19d2): undefined reference to `inflateEnd'
> OTF_File.c:(.text+0x1bc2): undefined reference to `deflate'
> OTF_File.c:(.text+0x1c82): undefined reference to `deflateEnd'
> make: *** [CCTM_e1a_Linux2_i686intel] Error 1
>
> Am I missing something in the openmpi building process? Please advise. Your
> help is greatly appreciated.
>
> Thanks,
> David
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


[OMPI users] compilation application with openmpi question

2009-05-01 Thread David Wong

Hi,

  I have installed openmpi on my machine and tested with some simple 
programs such as ring and fpi. Everything works. When I tried to compile 
my application, I got the following:


/work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In 
function `OTF_File_open_zlevel':

OTF_File.c:(.text+0x5a2): undefined reference to `inflateInit_'
OTF_File.c:(.text+0x762): undefined reference to `deflateInit_'
/work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In 
function `OTF_File_seek':

OTF_File.c:(.text+0x1172): undefined reference to `inflateEnd'
OTF_File.c:(.text+0x11a2): undefined reference to `inflateInit_'
OTF_File.c:(.text+0x11c2): undefined reference to `inflateSync'
/work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In 
function `OTF_File_read':

OTF_File.c:(.text+0x1322): undefined reference to `inflate'
/work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In 
function `OTF_File_write':

OTF_File.c:(.text+0x1622): undefined reference to `deflate'
OTF_File.c:(.text+0x1772): undefined reference to `deflate'
/work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In 
function `OTF_File_close':

OTF_File.c:(.text+0x19d2): undefined reference to `inflateEnd'
OTF_File.c:(.text+0x1bc2): undefined reference to `deflate'
OTF_File.c:(.text+0x1c82): undefined reference to `deflateEnd'
make: *** [CCTM_e1a_Linux2_i686intel] Error 1

Am I missing something in the openmpi building process? Please advise. 
Your help is greatly appreciated.


Thanks,
David



Re: [MTT users] Splitting build and run phases

2009-05-01 Thread Matney Sr, Kenneth D.
At ORNL, I do this (when I have time to run MTT and time
to check the results).  What I do is set up my script to
check whether it is running inside a batch job.  If so, it runs
the tests, like so:

  mtt --verbose\
  --print-time \
  --no-mpi-phases --no-test-get --no-test-build\
  --scratch   ${SW_BLDDIR} \
  --file  ${HOME}/mtt-jaguarpf/ornl-pgi.ini

But, if not in a batch job, it builds OMPI and the tests, by:

  mtt --verbose\
  --print-time \
  --no-test-run\
  --scratch   ${SW_BLDDIR} \
  --file  ${HOME}/mtt-jaguarpf/ornl-pgi.ini

In addition, when it is not in a batch job, I have the script submit
itself as a batch job once it finishes building.  So, basically, I
can fire off the build script and go work on other things.
-- Ken
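A minimal sketch of the wrapper logic described above; the batch-system
detection variable and the resubmission command are assumptions, since the
message does not say which scheduler is in use:

#!/bin/sh
# Build outside the batch system, run the tests inside it.
if [ -n "$PBS_JOBID" ]; then
    # inside a batch job: run the tests only
    mtt --verbose --print-time \
        --no-mpi-phases --no-test-get --no-test-build \
        --scratch "$SW_BLDDIR" --file "$HOME/mtt-jaguarpf/ornl-pgi.ini"
else
    # interactive: build OMPI and the tests, then resubmit this script
    mtt --verbose --print-time --no-test-run \
        --scratch "$SW_BLDDIR" --file "$HOME/mtt-jaguarpf/ornl-pgi.ini"
    qsub "$0"
fi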



-Original Message-
From: mtt-users-boun...@open-mpi.org
[mailto:mtt-users-boun...@open-mpi.org] On Behalf Of Barrett, Brian W
Sent: Thursday, April 30, 2009 5:17 PM
To: user list for the MPI Testing Tool
Subject: [MTT users] Splitting build and run phases

Hi all -

I have what's probably a stupid question, but I couldn't find the answer
on
the wiki.

I've currently been building OMPI and the tests then running the tests
all
in the same MTT run, all in a batch job.  The problem is, that means
I've
got a bunch of nodes reserved while building OMPI, which I can't
actually
use.

Is there any way to split the two phases (build and run) so that I can
build
outside of the batch job, get the reservation, and run the tests?

Thanks,

Brian

--
   Brian W. Barrett
   Dept. 1423: Scalable System Software
   Sandia National Laboratories



___
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users



Re: [MTT users] Splitting build and run phases

2009-05-01 Thread Jeff Squyres

On Apr 30, 2009, at 5:17 PM, Barrett, Brian W wrote:

I have what's probably a stupid question, but I couldn't find the  
answer on

the wiki.



The wiki has a lot of info, but it is probably incomplete.  :-\

I've currently been building OMPI and the tests then running the  
tests all
in the same MTT run, all in a batch job.  The problem is, that means  
I've
got a bunch of nodes reserved while building OMPI, which I can't  
actually

use.

Is there any way to split the two phases (build and run) so that I  
can build

outside of the batch job, get the reservation, and run the tests?




Yes.  I actually have quite a sophisticated (if I do say so  
myself ;-) ) system at Cisco -- I split all my gets/installs/builds  
into separate slurm jobs from the corresponding test runs, for  
example.  In that way, I can submit a whole pile of 1-node SLURM jobs  
to do all the gets/installs/builds, and then N-node SLURM jobs for the  
test runs.  Even better, I make the N-node SLURM jobs depend on the 1- 
node SLURM get/install/build jobs.  That way, if the 1-node job fails  
(e.g., someone commits a build error to the tree and the MPI install  
phase fails), then SLURM will automatically dequeue any dependent jobs  
without even running them.  MTT would recognize this and simply not  
run the test run phases, but it's nice that SLURM just kills them  
without even running them.  :-)
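As an illustration of the dependency trick described above (the script names
and node count are hypothetical; sbatch's --dependency=afterok flag is the
standard SLURM mechanism for this):

# submit the 1-node get/install/build job and capture its job id
BUILD_ID=$(sbatch build_mtt.sh | awk '{print $NF}')

# the N-node test-run job is dequeued automatically if the build job fails
sbatch --dependency=afterok:${BUILD_ID} -N 16 run_mtt_tests.sh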


Anyhoo...  The client is quite flexible; you can limit what you run by  
phase and/or section.  Check out the output of "./client/mtt --help".   
This part in particular:


--[no-]mpi-get  Do the "MPI get" phase
--[no-]mpi-install  Do the "MPI install" phase
--[no-]mpi-phases   Alias for --mpi-get --mpi-install
--[no-]test-get Do the "Test get" phase
--[no-]test-build   Do the "Test build" phase
--[no-]test-run Do the "Test run" phase
--[no-]test-phases  Alias for --test-get --test-build --test-run

--[no-]section  Do a specific section(s)

By default, the client runs everything in finds in the ini file.  But  
you can tell it exactly what phases to run (or not to run).  For  
example, say I had 2 MPI get phases:


[MPI get: ompi-nightly-trunk]
[MPI get: ompi-nightly-v1.3]

You can tell the client to run just the MPI Get phases:

   ./client/mtt --file ... --scratch ... --mpi-get

Or you can tell the client to run just the "trunk" MPI Get phase:

   ./client/mtt --file ... --scratch ... --mpi-get --section trunk

--section matching is case-insensitive.

BEWARE: the --section matching applies to *all* sections.   
Specifically, if you're running a reportable phase (MPI Install, Test  
Build, Test Install), you must *also* be able to match your reporter  
section or that section won't be included.  For example:


   ./client/mtt --file ... --scratch ... --mpi-install --section gnu-standard --section reporter


In my cisco-ompi-core-testing.ini file (see ompi-tests/trunk/cisco/ 
mtt), this will run the following sections:


[MPI install: GNU-standard]
[Reporter: IU database]

I have a "nightly.pl" script (same SVN dir, see above) that launches a  
set of very specific SLURM jobs to do Cisco's runs.  It reads the  
sections from the Cisco INI file and launches a whole series of 1-node  
SLURM jobs, each with a unique scratch tree, each doing a single MPI  
install section corresponding to a single MPI get section, and then  
doing all corresponding Test Builds.  It essentially runs "run-mtt-compile.pl  ".
This script essentially does the following:


   # Run a single MPI Get phase
   ./client/mtt -p --file ... --scratch ... --mpi-get --section reporter --section ...

   # if ^^ succeeds, run a single MPI install phase
   ./client/mtt -p --file ... --scratch ... --mpi-install --section reporter --section ...

   # if ^^ succeeds, run all corresponding Test Get and Test Build phases
   ./client/mtt -p --file ... --scratch ... --test-get --test-build

I also sbatch a whole pile of corresponding N-node Test Run SLURM jobs  
that are dependent upon the above SLURM job that essentially run the  
following:


   ./client/mtt -p --file ... --scratch ... --test-run --section reporter --section ...


Hope that helps.

--
Jeff Squyres
Cisco Systems