Re: [OMPI users] WRF run on multiple Nodes

2011-04-02 Thread David Zhang
Look into mpirun's -machinefile option.
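
For example (a minimal sketch; the hostnames, slot counts, and executable
path below are placeholders for your own cluster), list the nodes in a
hostfile and point mpirun at it:

  $ cat hosts
  node01 slots=4
  node02 slots=4
  $ mpirun --hostfile hosts -np 8 ./wrf.exe

In Open MPI, -machinefile is accepted as a synonym for --hostfile; without
one (or an allocation from a resource manager), mpirun only uses the local
node.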

On Fri, Apr 1, 2011 at 8:16 PM, Ahsan Ali  wrote:

> Hello,
>
>  I want to run WRF on multiple nodes in a Linux cluster using Open MPI, but
> the command *mpirun -np 4 ./wrf.exe* only submits it to a single node. I
> don't know how to run it on the other nodes as well. Help needed.
>
> Regards,
>
> --
> Syed Ahsan Ali Bokhari
> Electronic Engineer (EE)
>
> Research & Development Division
> Pakistan Meteorological Department H-8/4, Islamabad.
> Phone # off  +92518358714
> Cell # +923155145014
>
>



-- 
David Zhang
University of California, San Diego


[OMPI users] unable to run program

2011-04-02 Thread mohd naseem
Sir,
  I am a final-year MCA student. I have to build a Beowulf Linux cluster as
part of my final project. Please help me by explaining how to build the
cluster and how to run programs on it.


Re: [OMPI users] unable to run program

2011-04-02 Thread John Hearns
Mohd,
  the Clustermonkey site is a good resource for you:
http://www.clustermonkey.net/


[OMPI users] MPI-2 I/O functions (Open MPI 1.5.x on Windows)

2011-04-02 Thread Satoi Ogawa
Dear Developers and Users,

Thank you for your development of Open MPI.

I want to use Open MPI 1.5.3 on Windows 7 32-bit, on a single PC, but
something is wrong with the part of my program that uses MPI-2 I/O
functions. The same code works correctly with Open MPI on Linux.
I would very much appreciate any information you could send me;
I could not find anything about this in the Open MPI users mailing
list archives.

FYI:
I downloaded Open MPI 1.5.3 for Windows 32-bit from
http://www.open-mpi.org/software/ompi/v1.5/downloads/OpenMPI_v1.5.3-2_win32.exe
and found that libmpi_f77.lib is missing from this package.

Sincerely yours,
Satoi

Satoi Ogawa 


[OMPI users] OMPI not calling finalize error

2011-04-02 Thread Jack Bryan

Hi,

When I run a parallel program, I get the following error:
--------------------------------------------------------------------------
[n333:129522] *** Process received signal ***
[n333:129522] Signal: Segmentation fault (11)
[n333:129522] Signal code: Address not mapped (1)
[n333:129522] Failing at address: 0x40
[n333:129522] [ 0] /lib64/libpthread.so.0 [0x3c50e0e4c0]
[n333:129522] [ 1] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0 [0x4cd19b1]
[n333:129522] [ 2] /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0(opal_progress+0x75) [0x52e5165]
[n333:129522] [ 3] /opt/openmpi-1.3.4-gnu/lib/libopen-rte.so.0 [0x508565c]
[n333:129522] [ 4] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0 [0x4c653eb]
[n333:129522] [ 5] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0(MPI_Init+0x120) [0x4c84b90]
[n333:129522] [ 6] /lustre/jxding/netplan49/nsga2b [0x4497f6]
[n333:129522] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c5021d974]
[n333:129522] [ 8] /lustre/jxding/netplan49/nsga2b(__gxx_personality_v0+0x499) [0x4436e9]
[n333:129522] *** End of error message ***
--------------------------------------------------------------------------
mpirun has exited due to process rank 24 with PID 129522 on
node n333 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

But the program ran for no more than a few minutes; it should take hours
to finish.

How can it reach "finalize" so fast?

Any help is appreciated.

Jack

Re: [OMPI users] OMPI not calling finalize error

2011-04-02 Thread David Zhang
From the error message, there is a segfault in the program, which crashes
one of the processes. MPI notices that one of the processes has died and
terminates the other processes as well. Because those processes were not
terminated by calling MPI_Finalize, you get the error message at the bottom.
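
If it helps to track down where the segfault happens, one generic approach
(a sketch only; the core file name, paths, and process count below are
placeholders and depend on your system) is to enable core dumps and open
the resulting core file in gdb:

  $ ulimit -c unlimited
  $ mpirun -np <N> ./nsga2b
  $ gdb ./nsga2b core.<pid>     # then "bt" prints the crash backtrace

The backtrace from the core usually points at the line in your code (or
the MPI call) that touched the bad address.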


On Sat, Apr 2, 2011 at 8:05 AM, Jack Bryan  wrote:

>  Hi,
>
> When I run a parallel program, I got an error :
> --
> [n333:129522] *** Process received signal ***
> [n333:129522] Signal: Segmentation fault (11)
> [n333:129522] Signal code: Address not mapped (1)
> [n333:129522] Failing at address: 0x40
> [n333:129522] [ 0] /lib64/libpthread.so.0 [0x3c50e0e4c0]
> [n333:129522] [ 1] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0 [0x4cd19b1]
> [n333:129522] [ 2]
> /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0(opal_progress+0x75) [0x52e5165]
> [n333:129522] [ 3] /opt/openmpi-1.3.4-gnu/lib/libopen-rte.so.0 [0x508565c]
> [n333:129522] [ 4] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0 [0x4c653eb]
> [n333:129522] [ 5] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0(MPI_Init+0x120)
> [0x4c84b90]
> [n333:129522] [ 6] /lustre/jxding/netplan49/nsga2b [0x4497f6]
> [n333:129522] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c5021d974]
> [n333:129522] [ 8]
> /lustre/jxding/netplan49/nsga2b(__gxx_personality_v0+0x499) [0x4436e9]
> [n333:129522] *** End of error message ***
> --
> mpirun has exited due to process rank 24 with PID 129522 on
> node n333 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --
>
> ---
> But the program ran for no more than a few minutes. It should
> take hours to finish.
>
> How can it reach "finalize" so fast?
>
> Any help is appreciated.
>
> Jack
>



-- 
David Zhang
University of California, San Diego


Re: [OMPI users] Deadlock with mpi_init_thread + mpi_file_set_view

2011-04-02 Thread fah10
> Even inside MPICH2, I have given little attention to thread safety and
> the MPI-IO routines.
> critical section lock -- not pretty but it gets the job done.  
> When ported to OpenMPI, I don't know how the locking works.
> Furthermore, the MPI-IO library inside OpenMPI-1.4.3 is pretty old.  I
> wonder if the locking we added over the years will help?  Can you try
> openmpi-1.5.3 and report what happens?

In Open MPI 1.5.3 with threading support enabled, the MPI-IO routines work
without any problems. However, a deadlock now occurs when calling
mpi_finalize, with the backtrace given below. This deadlock is independent
of the number of MPI tasks.
The deadlock during mpi_finalize does not occur when no MPI-IO routines
were called beforehand; unfortunately, in that case the program terminates
with a segfault after returning from mpi_finalize (at the end of the
program).

Fabian


opal_mutex_lock(): Resource deadlock avoided
#0  0x0012e416 in __kernel_vsyscall ()
#1  0x01035941 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x01038e42 in abort () at abort.c:92
#3  0x00d9da68 in ompi_attr_free_keyval (type=COMM_ATTR, key=0xbffda0e4, 
predefined=0 '\000') at attribute/attribute.c:656
#4  0x00dd8aa2 in PMPI_Keyval_free (keyval=0xbffda0e4) at pkeyval_free.c:52
#5  0x01bf3e6a in ADIOI_End_call (comm=0xf1c0c0, keyval=10, attribute_val=0x0, 
extra_state=0x0) at ad_end.c:82
#6  0x00da01bb in ompi_attr_delete. (type=UNUSED_ATTR, object=0x6, 
attr_hash=0x2c64, key=14285602, predefined=232 '\350', need_lock=128 '\200')
at attribute/attribute.c:726
#7  0x00d9fb22 in ompi_attr_delete_all (type=COMM_ATTR, object=0xf1c0c0, 
attr_hash=0x8d0fee8) at attribute/attribute.c:1043
#8  0x00dbda65 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:133
#9  0x00dd12c2 in PMPI_Finalize () at pfinalize.c:46
#10 0x00d6b515 in mpi_finalize_f (ierr=0xbffda2b8) at pfinalize_f.c:62





[OMPI users] openmpi/pbsdsh/Torque problem

2011-04-02 Thread Laurence Marks
I have a problem which may or may not be an Open MPI issue, but since this
list was helpful before with a race condition, I am posting here.

I am trying to use pbsdsh as an ssh replacement, pushed by our sysadmins
because Torque does not know about ssh tasks launched from within a task.
In a simple case, a script launches three MPI tasks in parallel:

Task1: NodeA
Task2: NodeB and NodeC
Task3: NodeD

(some cores on each, all handled correctly). Reproducibly (though with
different nodes and numbers of cores), Task1 and Task3 work fine and the
MPI task starts on NodeB, but nothing starts on NodeC; it appears that
NodeC does not communicate. It does not have to be this exact layout; it
could also be

Task1: NodeA NodeB
Task2: NodeC NodeD

Here NodeC will start, but it looks as if NodeD never starts anything.
I've also run it with 4 tasks (Tasks 1, 3, and 4 work), and if Task2 only
uses one node (the number of cores does not matter) it is fine.

-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Research is to see what everybody else has seen, and to think what
nobody else has thought
Albert Szent-Györgi



Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-02 Thread Ralph Castain
I'm afraid I have no idea what you are talking about. Are you saying you are 
launching OMPI processes via mpirun, but with "pbsdsh" as the plm_rsh_agent???

That would be a very bad idea. If you are running under Torque, then let mpirun 
"do the right thing" and use its Torque-based launcher.

On the other hand, if you are trying to launch MPI processes directly using 
pbsdsh, then that simply won't work. The procs will have no idea how to wire up 
or communicate.
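
For reference, a minimal sketch of a Torque submission script (the resource
request, walltime, and program name are placeholders): when Open MPI is
built with TM support, mpirun reads the node allocation directly from
Torque and launches across all allocated slots, so no hostfile, ssh, or
pbsdsh is needed:

  #!/bin/bash
  #PBS -l nodes=2:ppn=8
  #PBS -l walltime=01:00:00
  cd $PBS_O_WORKDIR
  mpirun ./my_mpi_app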


On Apr 2, 2011, at 8:36 PM, Laurence Marks wrote:

> I have a problem which may or may not be an Open MPI issue, but since this
> list was helpful before with a race condition, I am posting here.
> 
> I am trying to use pbsdsh as an ssh replacement, pushed by our sysadmins
> because Torque does not know about ssh tasks launched from within a task.
> In a simple case, a script launches three MPI tasks in parallel:
> 
> Task1: NodeA
> Task2: NodeB and NodeC
> Task3: NodeD
> 
> (some cores on each, all handled correctly). Reproducibly (though with
> different nodes and numbers of cores), Task1 and Task3 work fine and the
> MPI task starts on NodeB, but nothing starts on NodeC; it appears that
> NodeC does not communicate. It does not have to be this exact layout; it
> could also be
> 
> Task1: NodeA NodeB
> Task2: NodeC NodeD
> 
> Here NodeC will start, but it looks as if NodeD never starts anything.
> I've also run it with 4 tasks (Tasks 1, 3, and 4 work), and if Task2 only
> uses one node (the number of cores does not matter) it is fine.
> 
> -- 
> Laurence Marks
> Department of Materials Science and Engineering
> MSE Rm 2036 Cook Hall
> 2220 N Campus Drive
> Northwestern University
> Evanston, IL 60208, USA
> Tel: (847) 491-3996 Fax: (847) 491-7820
> email: L-marks at northwestern dot edu
> Web: www.numis.northwestern.edu
> Chair, Commission on Electron Crystallography of IUCR
> www.numis.northwestern.edu/
> Research is to see what everybody else has seen, and to think what
> nobody else has thought
> Albert Szent-Györgi