Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?
Hi Brian

Thank you very much for the instant help! I just tried "-mca btl openib,sm,self" and "-mca mpi_leave_pinned 0" together (still with OpenMPI 1.3.1). So far so good: it passed through two NB cases/linear system solutions, it is running the third NB, and the memory use hasn't increased. On the failed runs the second NB already used more memory than the first, and the third would blow up memory use. If the run were bound to fail it would be swapping memory at this point, and it is not. This is a good sign. I hope I am not speaking too early, but it looks like your suggestion fixed the problem. Thanks!

It was interesting to observe using Ganglia that on the failed runs the memory use "jumps" happened whenever HPL switched from one NB to another. At every NB transition (i.e., every time HPL started to solve a new linear system, and probably generated a new random matrix) the memory use would jump to a significantly higher value. Anyway, this is just in case the info tells you something about what might be going on.

I will certainly follow your advice and upgrade to OpenMPI 1.3.2, which I just downloaded. You guys are prolific, a new edition per month! :)

Many thanks!
Gus Correa
Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?
Hi Jacob

Thank you very much for the suggestions and insight.

On an idle node MemFree is about 15599152 kB (14.8GB). Applying the "80%" rule to it, I get a problem size N=38,440. However, the HPL run fails with the memory leak problem even if I use N=35,000, with openib among the MCA btl parameters. You may have seen another message by Brian Barrett explaining a possible reason for the problem, and suggesting a workaround. I haven't tried it yet, but I will.

I read about the HPL preference for "square" PxQ processor grids. On a single node the fastest runs are 2x4, but 1x8 is oftentimes competitive also, coming second or third, although it is not "square" at all. I would guess this has much to do with the physical 2-socket-4-core layout, or not? I would also guess that the best processor grid is likely to be quite different when the whole cluster is used, right? How can one use the 2x4 fastest processor grid layout on a single node to infer the fastest processor grid for the cluster?

The best I got so far was 80% efficiency, less than your "at least 85%". So, I certainly have more work to do. GotoBLAS was compiled with Gnu, with no special optimization flags other than what the distribution Makefiles already have. OpenMPI was also compiled with Gnu, but I used these CFLAGS/FFLAGS: -march=amdfam10 -O3 -finline-functions -funroll-loops -mfpmath=sse. As I used mpicc and mpif77 to compile HPL, I presume it inherited these flags also, right? However, I have already read comments on other mailing lists that "-march=amdfam10" is not really the best choice for Barcelona (and I wonder if it is for Shanghai), although gcc says it is tailored for that architecture. What "-march" is really the fastest? Any suggestions in this area of compilers and optimization?

Many thanks,
Gus Correa

Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
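[Editorial note: the sizing arithmetic that keeps coming up in this thread — the "80%" rule, N=40,000 on a 16GB node, N around 195,000 for the 24*16=384GB cluster — follows from HPL factoring a single N x N matrix of 8-byte doubles. A minimal Python sketch of that arithmetic (not part of the original exchange; the function name is an invention for illustration):]

```python
import math

def hpl_problem_size(mem_bytes, fraction=0.80):
    """Largest N whose N x N matrix of 8-byte doubles fits in
    `fraction` of mem_bytes: solve 8*N^2 = fraction * mem_bytes."""
    return int(math.sqrt(fraction * mem_bytes / 8))

# One 16 GB node -> N = 40,000, the value used in this thread.
print(hpl_problem_size(16e9))           # 40000

# Whole cluster, 24 nodes x 16 GB = 384 GB -> N ~ 195,000.
print(hpl_problem_size(24 * 16e9))      # 195959

# The N=35,000 matrix alone needs 8*N^2 bytes ~ 9.8 GB, so the
# 12.5 GB -> 17+ GB growth seen with openib cannot be the matrix itself.
print(8 * 35000**2 / 1e9)               # 9.8
```

Note the last line: since the N=35,000 matrix is only about 9.8 GB, the several extra gigabytes observed with openib must come from somewhere other than HPL's own allocation, consistent with Brian's glibc-malloc explanation.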
Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?
Gus -

Open MPI 1.3.0 & 1.3.1 attempted to use some controls in the glibc malloc implementation to handle memory registration caching for InfiniBand. Unfortunately, it was not only buggy in that it didn't work, but it also had the side effect that certain memory usage patterns can cause the memory allocator to use much more memory than it normally would. The configuration options were set any time the openib module was loaded, even if it wasn't used in communication.

Can you try running with the extra option:

  -mca mpi_leave_pinned 0

I'm guessing that will fix the problem. If you're using InfiniBand, you probably want to upgrade to 1.3.2, as there are known data corruption issues in 1.3.0 and 1.3.1 with openib.

Brian
Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?
Hi Ralph

Thank you very much for the prompt answer. Sorry for being so confusing in my original message.

Yes, I am saying that the inclusion of openib is causing the difference in behavior. It runs with "sm,self"; it fails with "openib,sm,self". I am as puzzled as you are, because I thought the "openib" parameter was simply ignored when running on a single node, exactly like you said. After your message arrived, I ran HPL once more with "openib", just in case. Sure enough, it failed just as I described. And yes, all the procs run on a single node in both cases. It doesn't seem to be a problem caused by a particular node's hardware either, as I have already tried three different nodes with similar results.

BTW, I successfully ran HPL across the whole cluster two days ago, with IB ("openib,sm,self"), but using a modest (for the cluster) problem size: N=50,000. The total cluster memory is 24*16=384GB, which gives a max HPL problem size N=195,000. I have yet to try the large problem on the whole cluster, but I am afraid I will stumble on the same memory problem.

Finally, in your email you use the syntax "btl=openib,sm,self", with an "=" sign between the btl key and its values. However, the mpiexec man page uses the syntax "btl openib,sm,self", with a blank space between the btl key and its values. I've been following the man page syntax. The "=" sign doesn't seem to work, and aborts with the error "No executable was specified on the mpiexec command line." Could this possibly be the issue (say, wrong parsing of MCA options)?

Many thanks!
Gus Correa

Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
Re: [OMPI users] Problem with Filem
This typically means that one or more of the rcp/scp or rsh/ssh commands failed. FileM should be printing an error message when one of the copy commands fails. Try turning up the verbose level to 10 to see if it indicates any problems:

  -mca filem_rsh_verbose 10

Can you send me the MCA parameters that you are setting? That may help narrow down the problem as well. Also, I cleaned up some of the filem (and snapc) error reporting in the development trunk, if you want to give that a try. Let me know what you find out.

Best,
Josh

On Apr 30, 2009, at 6:40 AM, Bouguerra mohamed slim wrote:

Hello,

I have a problem with the FileM module when I checkpoint on a remote host without a shared file system. I use the new open-mpi 1.3.2, and it is the same problem as in version 1.3.1. Indeed, when I use the NFS file system it works. Thus I guess that it is a problem with FileM.

[azur-6.fr:23223] filem:rsh: wait_all(): Wait failed (-1)
[azur-6.fr:23223] [[48784,0],0] ORTE_ERROR_LOG: Error in file /home/grenoble/msbouguerra/openmpi-1.3.2/orte/mca/snapc/full/snapc_full_global.c at line 1054

--
Cordialement,
Mohamed-Slim BOUGUERRA
PhD student, INRIA-Grenoble / Projet MOAIS
ENSIMAG - antenne de Montbonnot
ZIRST 51, avenue Jean Kuntzmann
38330 MONTBONNOT SAINT MARTIN France
Tel: +33 (0)4 76 61 20 79
Fax: +33 (0)4 76 61 20 99

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Checkpointing configuration problem
You might want to consider --enable-mpi-threads=yes

Regards,
Yaakoub El Khamra
Re: [OMPI users] Checkpointing configuration problem
Try replacing "--enable-MPI-thread" with "--enable-mpi-threads". That should fix it.

-- Josh
[OMPI users] Checkpointing configuration problem
Dear all,

I am trying to install openmpi 1.3 on my laptop. I successfully installed BLCR in /usr/local. When installing openmpi using the following options:

./configure --prefix=/usr/local --with-ft=cr --enable-ft-thread --enable-MPI-thread --with-blcr=/usr/local

I got the following error:

== System-specific tests
...
checking if want fault tolerance thread... Must enable progress or MPI threads to use this option
configure: error: Cannot continue

Help please.

regards,
Raj
Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?
Hi Gus,

For single node runs, don't bother specifying the btl. Openmpi should select the best option. Beyond that, the "80% total RAM" recommendation is misleading. Base your N off MemFree rather than MemTotal; IB can reserve quite a bit. Verify your /etc/security/limits.conf limits allow sufficient locking. (Try unlimited.) Finally, P should be smaller than Q, and squarer values are recommended. With Shanghai, OpenMPI, and GotoBLAS, expect single node efficiency of at least 85% given decent tuning. If the distribution continues to look strange, there are more things to check.

Thanks,
Jacob
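[Editorial note: Jacob's grid heuristic above — P no larger than Q, with squarer grids preferred — can be sketched in a few lines of Python. This illustrates the heuristic only; the function name and squarest-first ordering are an invention for illustration, not anything from HPL itself:]

```python
def process_grids(nprocs):
    """All P x Q factorizations of nprocs with P <= Q,
    ordered squarest-first (the heuristic Jacob describes)."""
    grids = [(p, nprocs // p)
             for p in range(1, int(nprocs ** 0.5) + 1)
             if nprocs % p == 0]
    return sorted(grids, key=lambda pq: pq[1] - pq[0])

# 8 cores on one node: (2, 4) is the squarest choice, matching
# the fastest single-node grid Gus reports in this thread.
print(process_grids(8))        # [(2, 4), (1, 8)]

# 24 nodes x 8 cores = 192 processes: the squarest grid is (12, 16).
print(process_grids(192)[0])   # (12, 16)
```

As Gus suspects, squareness is only a starting point; the best grid in practice also depends on the interconnect and the physical socket/core layout, so candidates near the top of this ordering still need to be benchmarked.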
Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?
If you are running on a single node, then btl=openib,sm,self would be equivalent to btl=sm,self. OMPI is smart enough to know not to use IB if you are on a single node, and instead uses the shared memory subsystem.

Are you saying that the inclusion of openib is causing a difference in behavior, even though all procs are on the same node? Just want to ensure I understand the problem.

Thanks,
Ralph
[OMPI users] HPL with OpenMPI: Do I have a memory leak?
Hi OpenMPI and HPC experts

This may or may not be the right forum to post this, and I am sorry to bother those who think it is not.

I am trying to run the HPL benchmark on our cluster, compiling it with Gnu and linking to GotoBLAS (1.26) and OpenMPI (1.3.1), both also Gnu-compiled. I have got failures that suggest a memory leak when the problem size is large, but still within the memory limits recommended by HPL. The problem only happens when "openib" is among the OpenMPI MCA parameters (and the problem size is large). Any help is appreciated.

Here is a description of what happens.

For starters I am trying HPL on a single node, to get a feeling for the right parameters (N & NB, P & Q, etc.) on a dual-socket quad-core AMD Opteron 2376 "Shanghai".

The HPL recommendation is to use close to 80% of your physical memory, to reach top Gigaflop performance. Our physical memory on a node is 16GB, and this gives a problem size N=40,000 to keep the 80% memory use. I tried several block sizes, somewhat correlated to the size of the processor cache: NB=64 80 96 128 ...

When I run HPL with N=20,000 or smaller all works fine, and the HPL run completes, regardless of whether "openib" is present or not in my MCA parameters.

However, when I move to N=40,000, or even N=35,000, the run starts OK with NB=64, but as NB is switched to larger values the total memory use increases in jumps (as shown by Ganglia), and becomes uneven across the processors (as shown by "top"). The problem happens if "openib" is among the MCA parameters, but doesn't happen if I remove "openib" from the MCA list and use only "sm,self". For N=35,000, when NB reaches 96, memory use is already above the physical limit (16GB), having increased from 12.5GB to over 17GB. For N=40,000 the problem happens even earlier, with NB=80.
At this point memory swapping kicks in, and eventually the run dies with memory allocation errors:

T/V                N    NB     P     Q         Time       Gflops
WR01L2L4       35000   128     8     1       539.66    5.297e+01
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N) = 0.0043992 ... PASSED
HPL ERROR from process # 0, on line 172 of function HPL_pdtest:
>>> [7,0] Memory allocation failed for A, x and b. Skip. <<<
...

***

The code snippet that corresponds to HPL_pdtest.c is this, although the leak is probably somewhere else:

/*
 * Allocate dynamic memory
 */
vptr = (void*)malloc( ( (size_t)(ALGO->align) +
                        (size_t)(mat.ld+1) * (size_t)(mat.nq) ) *
                      sizeof(double) );
info[0] = (vptr == NULL); info[1] = myrow; info[2] = mycol;
(void) HPL_all_reduce( (void *)(info), 3, HPL_INT, HPL_max,
                       GRID->all_comm );
if( info[0] != 0 )
{
   if( ( myrow == 0 ) && ( mycol == 0 ) )
      HPL_pwarn( TEST->outfp, __LINE__, "HPL_pdtest",
                 "[%d,%d] %s", info[1], info[2],
                 "Memory allocation failed for A, x and b. Skip." );
   (TEST->kskip)++;
   return;
}

***

I found this continued increase in memory use rather strange, and suggestive of a memory leak in one of the codes being used. Everything (OpenMPI, GotoBLAS, and HPL) was compiled using Gnu only (gcc, gfortran, g++). I haven't changed anything in the compiler's memory model, i.e., I haven't used or changed the "-mcmodel" flag of gcc (I don't know if the Makefiles of HPL, GotoBLAS, and OpenMPI use it).

No additional load is present on the node other than the OS (Linux CentOS 5.2); HPL is running alone.

The cluster has Infiniband. However, I am running on a single node.

The surprising thing is that if I run on shared memory only (-mca btl sm,self) there is no memory problem: the memory use is stable at about 13.9GB, and the run completes. So, there is a workaround for running on a single node. (Actually, shared memory is presumably the way to go on a single node.)

However, if I introduce IB (-mca btl openib,sm,self) among the MCA btl parameters, then memory use blows up.
This is bad news for me, because I want to extend the experiment to run HPL across the whole cluster using IB, which is actually the ultimate goal of HPL, of course! It also suggests that the problem is somehow related to Infiniband, maybe hidden under OpenMPI.

Here is the mpiexec command I use (with and without openib):

/path/to/openmpi/bin/mpiexec \
    -prefix /the/run/directory \
    -np 8 \
    -mca btl [openib,]sm,self \
    xhpl

Any help, insights, suggestions, or reports of previous experiences are much appreciated.

Thank you,
Gus Correa
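As a sanity check on the 80%-of-memory sizing rule mentioned above, the problem size can be estimated from the node's RAM. This is a back-of-envelope sketch: the 16GB figure and 8-byte double-precision elements come from the message, but the script itself is illustrative, not part of HPL.

```shell
# Back-of-envelope HPL problem size: N = sqrt(0.80 * RAM / 8 bytes per double).
# For the 16GB node described above this lands close to the N=40,000 used.
mem_gib=16
awk -v m="$mem_gib" 'BEGIN {
    n = sqrt(0.80 * m * 1024^3 / 8)     # matrix elements per dimension
    printf "N ~ %d (about 80%% of %d GiB)\n", n, m
}'
```

For N=40,000 the matrix alone takes 40,000^2 x 8 bytes = 12.8GB, so any extra per-process growth on top of that quickly pushes a 16GB node into swap.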
Re: [OMPI users] MPI processes hang when using OpenMPI 1.3.2 and Gcc-4.4.0
So far, I'm unable to reproduce this problem. I haven't exactly reproduced your test conditions, but then I can't: at a minimum, I don't have exactly the code you ran (and I'm not convinced I want to!). So:

*) Can you reproduce the problem with the stand-alone test case I sent out?
*) Does the problem correlate with the OMPI version? (I.e., 1.3.1 versus 1.3.2.)
*) Does the problem occur at lower np?
*) Does the problem correlate with the compiler version? (I.e., GCC 4.4 versus 4.3.3.)
*) What is the failure rate? How many times should I expect to run to see failures?
*) How large is N?

Eugene Loh wrote:

Simone Pellegrini wrote:

Dear all, I have successfully compiled and installed openmpi 1.3.2 on an 8-socket quad-core machine from Sun. I have used both Gcc-4.4 and Gcc-4.3.3 during the compilation phase, but when I try to run simple MPI programs the processes hang. Actually this is the kernel of the application I am trying to run:

MPI_Barrier(MPI_COMM_WORLD);
total = MPI_Wtime();
for(i=0; i0)
    MPI_Sendrecv(A[i-1], N, MPI_FLOAT, top, 0, row, N, MPI_FLOAT, bottom, 0, MPI_COMM_WORLD, );
for(k=0; k
Re: [OMPI users] compilation application with openmpi question
Hmmm... those appear to be VampirTrace functions. I suspect they will have to fix it. For now, you can work around the problem by configuring with this:

    --enable-contrib-no-build=vt

That will turn the offending code off.

Ralph

On Fri, May 1, 2009 at 9:07 AM, David Wong wrote:
> Hi,
>
> I have installed openmpi on my machine and tested with some simple
> programs such as ring and fpi. Everything works. When I tried to compile my
> application, I got the following:
>
> [... undefined references to inflateInit_, deflateInit_, inflate, deflate,
> inflateEnd, inflateSync, and deflateEnd in libotf.a -- full output in the
> original message below ...]
>
> make: *** [CCTM_e1a_Linux2_i686intel] Error 1
>
> Am I missing something in the openmpi building process? Please advise. Your
> help is greatly appreciated.
>
> Thanks,
> David
[OMPI users] compilation application with openmpi question
Hi,

I have installed openmpi on my machine and tested with some simple programs such as ring and fpi. Everything works. When I tried to compile my application, I got the following:

/work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In function `OTF_File_open_zlevel':
OTF_File.c:(.text+0x5a2): undefined reference to `inflateInit_'
OTF_File.c:(.text+0x762): undefined reference to `deflateInit_'
/work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In function `OTF_File_seek':
OTF_File.c:(.text+0x1172): undefined reference to `inflateEnd'
OTF_File.c:(.text+0x11a2): undefined reference to `inflateInit_'
OTF_File.c:(.text+0x11c2): undefined reference to `inflateSync'
/work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In function `OTF_File_read':
OTF_File.c:(.text+0x1322): undefined reference to `inflate'
/work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In function `OTF_File_write':
OTF_File.c:(.text+0x1622): undefined reference to `deflate'
OTF_File.c:(.text+0x1772): undefined reference to `deflate'
/work/wdx/ptmp/openmpi/openmpi-1.3.2/lib/libotf.a(OTF_File.o): In function `OTF_File_close':
OTF_File.c:(.text+0x19d2): undefined reference to `inflateEnd'
OTF_File.c:(.text+0x1bc2): undefined reference to `deflate'
OTF_File.c:(.text+0x1c82): undefined reference to `deflateEnd'
make: *** [CCTM_e1a_Linux2_i686intel] Error 1

Am I missing something in the openmpi building process? Please advise. Your help is greatly appreciated.

Thanks,
David
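The undefined symbols above (inflateInit_, deflate, inflateEnd, ...) are all zlib routines, which libotf uses for compressed trace files. The fix given in this thread is to rebuild Open MPI without VampirTrace; an alternative workaround, which is an assumption on my part and not confirmed in the thread, is to append -lz after -lotf on the application's link line. A dry-run sketch (the compiler name, object list, and target name are illustrative placeholders):

```shell
# Dry run only: print a hypothetical link line with -lz placed after -lotf,
# so the zlib symbols that libotf references get resolved.
# (This is an assumed workaround; the thread's fix is to disable VT.)
OMPI_LIB=/work/wdx/ptmp/openmpi/openmpi-1.3.2/lib
echo "mpif90 \$(OBJS) -L${OMPI_LIB} -lotf -lz -o CCTM_e1a_Linux2_i686intel"
```

On a static link line, library order matters: -lz must come after -lotf so the linker can resolve the references libotf.a makes into zlib.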
Re: [MTT users] Splitting build and run phases
At ORNL, I do this (when I have time to run MTT and time to check the results). What I do is set up my script to check whether it is in a batch job. If so, it runs the tests, like so:

mtt --verbose \
    --print-time \
    --no-mpi-phases --no-test-get --no-test-build \
    --scratch ${SW_BLDDIR} \
    --file ${HOME}/mtt-jaguarpf/ornl-pgi.ini

But, if not in a batch job, it builds OMPI and the tests, by:

mtt --verbose \
    --print-time \
    --no-test-run \
    --scratch ${SW_BLDDIR} \
    --file ${HOME}/mtt-jaguarpf/ornl-pgi.ini

In addition, when the script is not in a batch job, it submits itself as a batch job once it finishes building. So, basically, I can fire off the build script and go work on other things.

-- Ken

-----Original Message-----
From: mtt-users-boun...@open-mpi.org [mailto:mtt-users-boun...@open-mpi.org] On Behalf Of Barrett, Brian W
Sent: Thursday, April 30, 2009 5:17 PM
To: user list for the MPI Testing Tool
Subject: [MTT users] Splitting build and run phases

Hi all -

I have what's probably a stupid question, but I couldn't find the answer on the wiki. I've currently been building OMPI and the tests then running the tests all in the same MTT run, all in a batch job. The problem is, that means I've got a bunch of nodes reserved while building OMPI, which I can't actually use. Is there any way to split the two phases (build and run) so that I can build outside of the batch job, get the reservation, and run the tests?

Thanks,
Brian

--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories
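The pattern Ken describes -- one wrapper script that runs different mtt phases inside and outside the batch system, and resubmits itself -- can be sketched like this. Note the hedges: Ken's message does not name the environment variable used to detect a batch job, so the PBS_JOBID/SLURM_JOB_ID check and the qsub resubmission are assumptions; only the mtt flag sets come from his message.

```shell
#!/bin/sh
# Hypothetical wrapper for the build/run split Ken describes.
# Assumption: "in a batch job" shows up as PBS_JOBID or SLURM_JOB_ID.
in_batch_job() {
    [ -n "${PBS_JOBID:-}${SLURM_JOB_ID:-}" ]
}

if in_batch_job; then
    echo "batch job detected: run test phases only"
    # mtt --verbose --print-time --no-mpi-phases --no-test-get --no-test-build \
    #     --scratch "${SW_BLDDIR}" --file "${HOME}/mtt-jaguarpf/ornl-pgi.ini"
else
    echo "login node: build phases only, then resubmit self"
    # mtt --verbose --print-time --no-test-run \
    #     --scratch "${SW_BLDDIR}" --file "${HOME}/mtt-jaguarpf/ornl-pgi.ini"
    # qsub "$0"   # so the run phases happen inside a batch allocation
fi
```

The mtt invocations are left commented out so the sketch is runnable anywhere; uncomment them (and adjust paths) on a system where mtt is installed.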
Re: [MTT users] Splitting build and run phases
On Apr 30, 2009, at 5:17 PM, Barrett, Brian W wrote:

> I have what's probably a stupid question, but I couldn't find the answer on the wiki.

The wiki has a lot of info, but it is probably incomplete. :-\

> I've currently been building OMPI and the tests then running the tests all in the same MTT run, all in a batch job. The problem is, that means I've got a bunch of nodes reserved while building OMPI, which I can't actually use. Is there any way to split the two phases (build and run) so that I can build outside of the batch job, get the reservation, and run the tests?

Yes. I actually have quite a sophisticated (if I do say so myself ;-) ) system at Cisco -- I split all my gets/installs/builds into separate SLURM jobs from the corresponding test runs, for example. In that way, I can submit a whole pile of 1-node SLURM jobs to do all the gets/installs/builds, and then N-node SLURM jobs for the test runs. Even better, I make the N-node SLURM jobs depend on the 1-node SLURM get/install/build jobs. That way, if the 1-node job fails (e.g., someone commits a build error to the tree and the MPI install phase fails), then SLURM will automatically dequeue any dependent jobs without even running them. MTT would recognize this and simply not run the test run phases, but it's nice that SLURM just kills them without even running them. :-)

Anyhoo... The client is quite flexible; you can limit what you run by phase and/or section. Check out the output of "./client/mtt --help". This part in particular:

--[no-]mpi-get       Do the "MPI get" phase
--[no-]mpi-install   Do the "MPI install" phase
--[no-]mpi-phases    Alias for --mpi-get --mpi-install
--[no-]test-get      Do the "Test get" phase
--[no-]test-build    Do the "Test build" phase
--[no-]test-run      Do the "Test run" phase
--[no-]test-phases   Alias for --test-get --test-build --test-run
--[no-]section       Do a specific section(s)

By default, the client runs everything it finds in the ini file.
But you can tell it exactly what phases to run (or not to run). For example, say I had 2 MPI get phases:

[MPI get: ompi-nightly-trunk]
[MPI get: ompi-nightly-v1.3]

You can tell the client to run just the MPI Get phases:

./client/mtt --file ... --scratch ... --mpi-get

Or you can tell the client to run just the "trunk" MPI Get phase:

./client/mtt --file ... --scratch ... --mpi-get --section trunk

--section matching is case-insensitive. BEWARE: the --section matching applies to *all* sections. Specifically, if you're running a reportable phase (MPI Install, Test Build, Test Install), you must *also* be able to match your reporter section or that section won't be included. For example:

./client/mtt --file ... --scratch ... --mpi-install --section gnu-standard --section reporter

In my cisco-ompi-core-testing.ini file (see ompi-tests/trunk/cisco/mtt), this will run the following sections:

[MPI install: GNU-standard]
[Reporter: IU database]

I have a "nightly.pl" script (same SVN dir, see above) that launches a set of very specific SLURM jobs to do Cisco's runs. It reads the sections from the Cisco INI file and launches a whole series of 1-node SLURM jobs, each with a unique scratch tree, each doing a single MPI install section corresponding to a single MPI get section, and then doing all corresponding Test Builds. It essentially runs "run-mtt-compile.pl ". This script essentially does the following:

# Run a single MPI Get phase
./client/mtt -p --file ... --scratch --mpi-get --section reporter --section 

# if ^^ succeeds, run a single MPI install phase
./client/mtt -p --file ... --scratch --mpi-install --section reporter --section 

# if ^^ succeeds, run all corresponding Test Get and Test Build phases
./client/mtt -p --file ... --scratch --test-get --test-build

I also sbatch a whole pile of corresponding N-node Test Run SLURM jobs that are dependent upon the above SLURM job, each of which essentially runs the following:

./client/mtt -p --file ... --scratch --test-run --section reporter --section 

Hope that helps.

--
Jeff Squyres
Cisco Systems
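The dependent-job setup Jeff describes (N-node test-run jobs that are automatically dequeued if the 1-node build job fails) maps onto SLURM's job dependencies. A dry-run sketch follows: the script names are placeholders, and the --parsable/--dependency=afterok flags are standard SLURM sbatch options rather than anything quoted from this thread.

```shell
# Dry-run sketch: a real `sbatch --parsable` prints the new job id;
# here a stand-in fakes one so the dependency line can be shown without SLURM.
submit() { echo "sbatch $*" >&2; echo 12345; }   # stand-in for: sbatch --parsable "$@"

# 1-node job: MPI get/install and test builds (script name is a placeholder).
build_id=$(submit -N 1 mtt-build.sh)

# N-node test-run job; afterok means SLURM dequeues it automatically
# if the build job fails, which is the behavior Jeff describes.
echo "sbatch --dependency=afterok:${build_id} -N 8 mtt-run.sh"
```

Replacing the `submit` stand-in with `sbatch --parsable` (and the echo with a real `sbatch`) turns the dry run into actual submissions.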