Re: [OMPI users] Problems in OpenMPI
On Fri, 2009-07-10 at 14:35 -0500, Yin Feng wrote:
> I have my code run on a supercomputer. First, I request an allocation
> and then just run my code using mpirun. The supercomputer will assign
> 4 nodes, but they are different each time. So I don't know which
> machines I will use before the job runs. Do you know how to handle
> this situation?

The answer depends on which scheduler the computer is using. If it's SGE, then I believe it's enough to compile Open MPI with the --with-sge flag and it figures it out for itself. You'll probably need to check with the local admins for a definitive answer.

Ashley.

-- 
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
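The SGE route mentioned above can be sketched roughly as follows. The install prefix, parallel-environment name ("orte"), and job script are placeholders, not details from the thread; the PE name in particular is site-specific, so check with your admins:

```shell
# Build Open MPI with SGE (tight integration) support:
./configure --prefix=$HOME/openmpi --with-sge
make all install

# Request slots through an SGE parallel environment; "orte" and the
# slot count here are illustrative only:
qsub -pe orte 32 job.sh

# Inside job.sh, mpirun picks up the SGE-provided host list and slot
# count automatically, so no hostfile or -np is needed:
mpirun ./my_app
```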
Re: [OMPI users] Problems in OpenMPI
I have my code run on a supercomputer. First, I request an allocation and then just run my code using mpirun. The supercomputer will assign 4 nodes, but they are different each time. So I don't know which machines I will use before the job runs. Do you know how to handle this situation?

On Fri, Jul 10, 2009 at 4:20 AM, Ashley Pittman wrote:
> On Thu, 2009-07-09 at 23:40 -0500, Yin Feng wrote:
>> I am a beginner in MPI.
>>
>> I ran an example code using OpenMPI and it seems to work.
>> And then I tried a parallel example in the PETSc tutorials folder (ex5):
>>
>>   mpirun -np 4 ex5
>>
>> It runs, but the results are not as accurate as just running ex5.
>> Is that normal?
>
> Not as accurate or just different? Different is normal, and in light of
> that "accurate" is itself a vague concept.
>
>> After that, I sent this job to a supercomputer which allocates me 4
>> nodes, each with 8 processors. When I checked the load on each node,
>> I found:
>
>> Does anyone have any idea about this?
>
> I'd say it's obvious all 32 processes have been located on the same
> node. What was the mpirun command you issued, and the contents of the
> machinefile you used?
>
> Running "orte-ps" on the machine where the mpirun command is running
> will tell you the hostname where every rank is running, or if you want
> more information (load, CPU usage, etc.) you can use padb, the link
> for which is in my signature.
>
> Ashley.
>
> -- 
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
[OMPI users] How to improve non-blocking point-to-point communication scaling
Dear Open MPI experts,

We are seeing bad scaling of a certain code that uses Open MPI non-blocking point-to-point routines, and would love to hear any suggestions on how to improve the situation.

Details:

We have a small 24-node cluster (Monk) with Infiniband and dual AMD Opteron quad-core processors, and we are using OpenMPI 1.3.2. One of the codes we run here is the MITgcm. The code is written in Fortran 77, uses a standard domain decomposition technique, and (Open)MPI. Some of the heavy lifting is done by a routine that solves the so-called barotropic pressure equation (an elliptic PDE) using a conjugate gradient technique, which typically takes 300 iterations at each time step. The pressure solver conjugate gradient routine uses MPI point-to-point non-blocking communication to exchange arrays across the subdomain boundaries. There are calls to MPI_ISend, MPI_Recv, and MPI_Waitall only. (There are a few MPI_Barrier calls also, but they seem to be inactive, knocked out by suitable preprocessor directives.)

Problem:

One user noted that when he increases the number of processors, the pressure solver takes a progressively larger share of the total walltime, and this percentage is much larger than on other (public) clusters. Here is a typical result on our cluster (Monk):

Nodes   Cores   % time in pressure solver
  1       8      5%   (Note: IB not used, single-node run)
  2      16     14%
  4      32     45%
 12      96     80%

(Note the fast increase of pressure solver %time with the number of cores used.)

However, according to the same user, when he runs the same code on the TACC Ranger and Lonestar clusters, the percent runtime taken by the pressure solver is a significantly smaller fraction of the total runtime, even when the number of cores used is large. Here are his results at TACC:

On Lonestar (dual Xeon dual core, Infiniband (?), MVAPICH2 (?)):

Nodes   Cores   % time in pressure solver
 16      64     22%

On Ranger (dual Opteron quad core, Infiniband, MVAPICH2):

Nodes   Cores   % time in pressure solver
  8      64     19%
 24     192     35%

(Note: a much smaller % than on our machine for the same number of cores.)

I wonder if there is any parameter I can tweak in Open MPI which may reduce the percent time taken by the pressure solver. Any suggestions are appreciated.

Many thanks,
Gus Correa

-- 
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
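For readers unfamiliar with the communication pattern described above, here is a minimal, hypothetical sketch of a halo exchange using MPI_Isend, blocking MPI_Recv, and MPI_Waitall. It is not the actual MITgcm code; the buffer size, 1-D periodic neighbour topology, and message tag are assumptions made for illustration:

```cpp
// Hypothetical halo-exchange sketch: MPI_Isend to each neighbour,
// blocking MPI_Recv from each, then MPI_Waitall on the sends.
// One such exchange runs in every CG iteration, so its latency is
// paid ~300 times per model time step. Build with mpic++; run with
// mpirun -np <N>.
#include <mpi.h>

enum { HALO = 128 };  // halo buffer length: illustrative only

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;  // periodic 1-D neighbours
    int right = (rank + 1) % size;
    double sendbuf[2][HALO] = {{0}}, recvbuf[2][HALO];
    MPI_Request req[2];

    // Post the sends without blocking...
    MPI_Isend(sendbuf[0], HALO, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(sendbuf[1], HALO, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);

    // ...receive from both neighbours with blocking Recv...
    MPI_Recv(recvbuf[0], HALO, MPI_DOUBLE, right, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    MPI_Recv(recvbuf[1], HALO, MPI_DOUBLE, left,  0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    // ...then ensure the sends have completed before reusing buffers.
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}
```

Because the exchange involves only small messages between fixed neighbours, its cost is dominated by latency rather than bandwidth, which is consistent with the solver's share of walltime growing as the subdomains shrink.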
Re: [OMPI users] Problems in OpenMPI
On Thu, 2009-07-09 at 23:40 -0500, Yin Feng wrote:
> I am a beginner in MPI.
>
> I ran an example code using OpenMPI and it seems to work.
> And then I tried a parallel example in the PETSc tutorials folder (ex5):
>
>   mpirun -np 4 ex5
>
> It runs, but the results are not as accurate as just running ex5.
> Is that normal?

Not as accurate or just different? Different is normal, and in light of that "accurate" is itself a vague concept.

> After that, I sent this job to a supercomputer which allocates me 4
> nodes, each with 8 processors. When I checked the load on each node,
> I found:

> Does anyone have any idea about this?

I'd say it's obvious all 32 processes have been located on the same node. What was the mpirun command you issued, and the contents of the machinefile you used?

Running "orte-ps" on the machine where the mpirun command is running will tell you the hostname where every rank is running, or if you want more information (load, CPU usage, etc.) you can use padb, the link for which is in my signature.

Ashley.

-- 
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
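For reference, this is roughly how a machinefile and Open MPI's placement flags interact; the hostnames, file contents, and application name are placeholders, not taken from the original report:

```shell
# Hypothetical machinefile -- "slots" tells Open MPI how many ranks
# each node should hold (schedulers normally generate this file):
#   node0 slots=8
#   node1 slots=8
#   node2 slots=8
#   node3 slots=8

# Default "by slot" placement: fill node0's 8 slots, then node1's, etc.
mpirun -np 32 --hostfile machinefile ./ex5

# Round-robin ranks across nodes instead:
mpirun -np 32 --hostfile machinefile --bynode ./ex5
```

If the machinefile lists only one host, or mpirun is launched without the scheduler's host list, all 32 ranks land on one node, which matches the load pattern reported in the original question.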
Re: [OMPI users] MPI and C++ (Boost)
Luis Vitorio Cargnini wrote:
> Ok, after all the considerations, I'll try Boost today, make some
> experiments, and see whether I'll use it or avoid it. But as Raimond
> said, I think, the problem is being dependent on a rich, incredible,
> amazing toolset that still implements only MPI-1 and does not provide
> all the MPI functions -- the main drawbacks of Boost -- although the
> set of functions it does implement does not compromise the
> functionality. I don't know how the MPI-1, MPI-2, and future MPI-3
> specifications will affect Boost and developers using Boost (with
> Open MPI, of course). Furthermore, if something changes in Boost, how
> can I guarantee it won't affect my code in the future? It is
> impossible. Anyway, I'll test today with it and without it and choose
> my direction. Thanks for all the replies, suggestions, and solutions
> that you all pointed out to me; I really appreciate all your help and
> comments about whether or not to use Boost in my code.
>
> Thanks and Regards.
> Vitorio.

Vitorio,

If there is some MPI capability that is not currently provided in Boost.MPI, then just call it the normal MPI way. Using Boost.MPI doesn't interfere with any use of the C bindings, even in the same function.

As for future changes, if something happens to a Boost library that you don't like, just keep using the older version. Past releases of Boost remain available after new releases arrive.

John
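The point about mixing the two layers can be sketched as follows. This is a hypothetical example, not code from the thread; it relies on the fact that boost::mpi::communicator converts implicitly to MPI_Comm, so the C bindings can operate on the same communicator:

```cpp
// Hypothetical sketch: Boost.MPI and the plain C bindings in the same
// function. Build with mpic++ and link -lboost_mpi -lboost_serialization.
#include <boost/mpi.hpp>
#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[]) {
    boost::mpi::environment env(argc, argv);  // calls MPI_Init
    boost::mpi::communicator world;

    double local = world.rank(), sum = 0.0;

    // Suppose the capability you need is missing from Boost.MPI:
    // drop down to the C API. The communicator converts implicitly
    // to MPI_Comm, so both layers share it.
    MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, world);

    if (world.rank() == 0)
        std::cout << "sum of ranks = " << sum << "\n";
    return 0;  // env's destructor calls MPI_Finalize
}
```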
[OMPI users] Problems in OpenMPI
I am a beginner in MPI.

I ran an example code using OpenMPI and it seems to work. And then I tried a parallel example in the PETSc tutorials folder (ex5):

mpirun -np 4 ex5

It runs, but the results are not as accurate as just running ex5. Is that normal?

After that, I sent this job to a supercomputer which allocates me 4 nodes, each with 8 processors. When I check the load on each node, I found:

Node   LOAD   CPU
 0      32    800
 1       0      0
 2       0      0
 3       0      0

But for others' jobs, they got:

Node   LOAD
 0       8
 1       8
 2       8
 3       8

It seems the master node takes all the load, and the speed is even lower than when it runs on a single processor. Does anyone have any idea about this?

Thank you in advance!

Sincerely,
YIN
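One quick way to see where the ranks actually land (a generic diagnostic, not something from the thread; the machinefile name is a placeholder) is to launch `hostname` with the same mpirun arguments as the real job and count ranks per node:

```shell
# Each rank prints its host; sort and count to get ranks per node.
mpirun -np 32 --hostfile machinefile hostname | sort | uniq -c
```

A healthy spread for this allocation would show 8 ranks on each of the 4 nodes; 32 lines naming a single host reproduces the load pattern above.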