Re: [OMPI users] WRF run on multiple Nodes
look into -machinefile

On Fri, Apr 1, 2011 at 8:16 PM, Ahsan Ali wrote:
> Hello,
>
> I want to run WRF on multiple nodes in a Linux cluster using Open MPI. Giving the command *mpirun -np 4 ./wrf.exe* just submits it to a single node. I don't know how to run it on the other nodes as well. Help needed.
>
> Regards,
>
> --
> Syed Ahsan Ali Bokhari
> Electronic Engineer (EE)
>
> Research & Development Division
> Pakistan Meteorological Department, H-8/4, Islamabad.
> Phone # off +92518358714
> Cell # +923155145014
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
David Zhang
University of California, San Diego
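A minimal sketch of what the -machinefile suggestion looks like in practice. The hostnames, slot counts, and rank count below are placeholders, not values from the original post:

```shell
# Create a machinefile listing each node and how many ranks it may host.
# node01/node02 are hypothetical hostnames; set slots to your core counts.
cat > hosts.txt <<'EOF'
node01 slots=4
node02 slots=4
EOF

# Then launch across both nodes (run this where Open MPI is installed
# and the nodes are reachable without a password):
#   mpirun -np 8 -machinefile hosts.txt ./wrf.exe
```

Without -machinefile (or its synonym -hostfile), mpirun places all ranks on the local node, which matches the behavior the poster describes.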
[OMPI users] unable to run program
Sir, I am a student of MCA (final year). I have to build a Beowulf Linux cluster as part of my final project. Please help me by telling me how to build the cluster and how to run programs on it.
Re: [OMPI users] unable to run program
Mohd, the ClusterMonkey site is a good resource for you: http://www.clustermonkey.net/
[OMPI users] MPI-2 I/O functions (Open MPI 1.5.x on Windows)
Dear Developers and Users,

Thank you for your development of Open MPI. I want to use Open MPI 1.5.3 on Windows 7 32-bit, on one PC, but something is wrong with the part of my program that uses MPI-2 I/O functions. The same code worked correctly with Open MPI on Linux. I would very much appreciate any information you could send me; I could not find anything about this in the Open MPI users mailing list archives.

FYI: I downloaded Open MPI 1.5.3 for Windows 32-bit from http://www.open-mpi.org/software/ompi/v1.5/downloads/OpenMPI_v1.5.3-2_win32.exe and found that libmpi_f77.lib is missing from this package.

Sincerely yours,
Satoi Ogawa
[OMPI users] OMPI not calling finalize error
Hi,

When I run a parallel program, I get this error:

--
[n333:129522] *** Process received signal ***
[n333:129522] Signal: Segmentation fault (11)
[n333:129522] Signal code: Address not mapped (1)
[n333:129522] Failing at address: 0x40
[n333:129522] [ 0] /lib64/libpthread.so.0 [0x3c50e0e4c0]
[n333:129522] [ 1] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0 [0x4cd19b1]
[n333:129522] [ 2] /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0(opal_progress+0x75) [0x52e5165]
[n333:129522] [ 3] /opt/openmpi-1.3.4-gnu/lib/libopen-rte.so.0 [0x508565c]
[n333:129522] [ 4] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0 [0x4c653eb]
[n333:129522] [ 5] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0(MPI_Init+0x120) [0x4c84b90]
[n333:129522] [ 6] /lustre/jxding/netplan49/nsga2b [0x4497f6]
[n333:129522] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c5021d974]
[n333:129522] [ 8] /lustre/jxding/netplan49/nsga2b(__gxx_personality_v0+0x499) [0x4436e9]
[n333:129522] *** End of error message ***
--
mpirun has exited due to process rank 24 with PID 129522 on
node n333 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--

But the program ran for no more than a few minutes, when it should take hours to finish. How can it reach "finalize" so fast?

Any help is appreciated.

Jack
Re: [OMPI users] OMPI not calling finalize error
From the error message, there is a segfault in the program, which crashes one of the processes. MPI notices that one of the processes has died and terminates the other processes as well. Because those processes were not shut down by calling MPI_Finalize, you get the message at the bottom: the crashed rank did not "reach finalize" at all; it exited abnormally, and mpirun is reporting exactly that. In fact, the backtrace shows the segfault occurred inside MPI_Init itself, so the run failed at startup rather than finishing early.

--
David Zhang
University of California, San Diego
Re: [OMPI users] Deadlock with mpi_init_thread + mpi_file_set_view
> Even inside MPICH2, I have given little attention to thread safety and
> the MPI-IO routines. In MPICH2, each MPI_File* function grabs the big
> critical section lock -- not pretty, but it gets the job done.
> When ported to Open MPI, I don't know how the locking works.
> Furthermore, the MPI-IO library inside OpenMPI-1.4.3 is pretty old. I
> wonder if the locking we added over the years will help? Can you try
> openmpi-1.5.3 and report what happens?

In Open MPI 1.5.3 with threading support enabled, the MPI-IO routines work without any problems. However, a deadlock now occurs when calling mpi_finalize, with the backtrace given below. This deadlock is independent of the number of MPI tasks. The deadlock in mpi_finalize does not occur when no MPI-IO routines were called beforehand; unfortunately, in that case the program instead terminates with a segfault after returning from mpi_finalize (at the end of the program).

Fabian

opal_mutex_lock(): Resource deadlock avoided
#0 0x0012e416 in __kernel_vsyscall ()
#1 0x01035941 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x01038e42 in abort () at abort.c:92
#3 0x00d9da68 in ompi_attr_free_keyval (type=COMM_ATTR, key=0xbffda0e4, predefined=0 '\000') at attribute/attribute.c:656
#4 0x00dd8aa2 in PMPI_Keyval_free (keyval=0xbffda0e4) at pkeyval_free.c:52
#5 0x01bf3e6a in ADIOI_End_call (comm=0xf1c0c0, keyval=10, attribute_val=0x0, extra_state=0x0) at ad_end.c:82
#6 0x00da01bb in ompi_attr_delete (type=UNUSED_ATTR, object=0x6, attr_hash=0x2c64, key=14285602, predefined=232 '\350', need_lock=128 '\200') at attribute/attribute.c:726
#7 0x00d9fb22 in ompi_attr_delete_all (type=COMM_ATTR, object=0xf1c0c0, attr_hash=0x8d0fee8) at attribute/attribute.c:1043
#8 0x00dbda65 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:133
#9 0x00dd12c2 in PMPI_Finalize () at pfinalize.c:46
#10 0x00d6b515 in mpi_finalize_f (ierr=0xbffda2b8) at pfinalize_f.c:62
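The thread level at issue here is negotiated at startup. As a minimal illustrative sketch (standard MPI calls only; building it requires mpicc and an MPI installation), a program can check what the library actually granted before mixing threads and MPI-IO:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Request full thread support; the library reports what it can grant. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE)
        fprintf(stderr, "only thread level %d granted\n", provided);

    /* Calling MPI_File_* (or any MPI) routines from several threads
       concurrently is only defined when provided == MPI_THREAD_MULTIPLE. */

    MPI_Finalize();
    return 0;
}
```

Open MPI 1.5.x must also be configured with thread support (e.g. --enable-mpi-thread-multiple) for MPI_THREAD_MULTIPLE to be granted at all.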
[OMPI users] openmpi/pbsdsh/Torque problem
I have a problem which may or may not be an Open MPI issue, but since this list was helpful before with a race condition, I am posting here.

I am trying to use pbsdsh as an ssh replacement, pushed by the sysadmins because Torque does not know about ssh tasks launched from within a task. In a simple case, a script launches three MPI tasks in parallel:

Task1: NodeA
Task2: NodeB and NodeC
Task3: NodeD

(some cores on each, all handled correctly). Reproducibly (but with different nodes and numbers of cores), Task1 and Task3 work fine; the MPI task starts on NodeB but nothing starts on NodeC, and it appears that NodeC does not communicate. The layout does not have to be exactly this; it could be

Task1: NodeA NodeB
Task2: NodeC NodeD

Here NodeC will start and it looks as if NodeD never starts anything. I've also run it with 4 tasks (1, 3, and 4 work), and if Task2 only uses one node (the number of cores does not matter) it is fine.

--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Research is to see what everybody else has seen, and to think what nobody else has thought
Albert Szent-Györgi
Re: [OMPI users] openmpi/pbsdsh/Torque problem
I'm afraid I have no idea what you are talking about. Are you saying you are launching OMPI processes via mpirun, but with "pbsdsh" as the plm_rsh_agent? That would be a very bad idea. If you are running under Torque, then let mpirun "do the right thing" and use its Torque-based launcher.

On the other hand, if you are trying to launch MPI processes directly using pbsdsh, then that simply won't work. The procs will have no idea how to wire up or communicate.
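For reference, a sketch of the intended pattern under Torque: a job script that lets mpirun discover the allocation itself (the script name, resource counts, and rank count are hypothetical). When Open MPI is built with Torque/TM support, mpirun launches its remote daemons through the TM API, so neither ssh, pbsdsh, nor a machinefile is needed:

```shell
#!/bin/bash
#PBS -l nodes=4:ppn=8
#PBS -N wrf_job

cd "$PBS_O_WORKDIR"

# mpirun reads the Torque allocation automatically when built with
# TM support; no host list needs to be passed on the command line.
mpirun -np 32 ./wrf.exe
```

Whether TM support is compiled in can be checked with `ompi_info | grep tm`; if the tm components are listed, mpirun inside a Torque job will use them by default.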