Re: [OMPI users] trouble_MPI

2012-09-19 Thread David Warren
Segfaults in FORTRAN generally mean either an array is out of bounds, or 
you can't get the memory you are requesting. Check your array sizes 
(particularly the ones in subroutines). You can compile with -C, but 
that only tells you if you exceed an array declaration, not the actual 
size. It is possible to pass a smaller array to a subroutine than it 
declares it to be and -C won't catch that. I have seen lots of code that 
does that. Some that even relied on the fact that VAXen used to stack 
arrays in order, so you could wander into the next and previous ones, 
and everything worked as expected.
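A minimal made-up sketch of that last trap: the subroutine is told the array
is bigger than it really is, so -C checks indices against the declared size
and never complains, even though the loop walks off the end of the caller's
array.

   subroutine fill(a, n)
      integer :: n
      integer :: a(n)          ! the caller claims there are n elements
      integer :: i
      do i = 1, n              ! bounds checking compares i against a(n)=a(100),
         a(i) = i              ! so the overrun is never reported
      end do
   end subroutine fill

   program main
      integer :: small(10)
      call fill(small, 100)    ! lies about the size: writes past small(10)
      print *, small
   end program main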


I doubt you are exceeding any memory limitation, as you are asking for 40 
processors, so each one's share is pretty small. It is more likely that there 
is some temporary array in there that is the wrong size.


On 09/18/12 14:42, Brian Budge wrote:

On Tue, Sep 18, 2012 at 2:14 PM, Alidoust  wrote:

Dear Madam/Sir,


I have a serial Fortran code (f90), dealing with matrix-diagonalizing
subroutines, and recently wrote a parallel version of it to speed up parts
that are not feasible with the serial program.
I have been using the following commands for initializing MPI in the code
---
 call MPI_INIT(ierr)
 call MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)
 call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)

CPU requirement>>  pmem=1500mb,nodes=5:ppn=8<<
---
Everything looks OK when the matrix dimensions are less than 1000x1000. When I
increase the matrix dimensions to larger values, the parallel code fails with
the following error
--
mpirun noticed that process rank 6 with PID 1566 on node node1082 exited on
signal 11 (Segmentation fault)
--
There is no such error with the serial version, even for matrix dimensions
larger than 2400x2400. I then thought it might be caused by the number of
nodes and the amount of memory I was requesting, so I changed it as follows

pmem=10gb,nodes=20:ppn=2

which is more or less similar to what I'm using for serial jobs
(mem=10gb,nodes=1:ppn=1). But the problem still persists. Is there any
limitation on how much data the MPI subroutines can transfer, or could the
issue have some other cause?

Best of Regards,
Mohammad


I believe the send/recv/bcast calls are all limited to counts of about
2^31 elements per call, since they use a signed 32-bit integer to denote the
count.  If your matrices are large enough, I suppose this limit
could be reached.
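
If that limit does turn out to be the issue, one workaround is to keep each
individual call's count under the 32-bit limit by moving the data in chunks.
A hedged sketch (the array, its size and the chunk size are made up for
illustration, and an array this big obviously needs the memory to match):

   program chunked_bcast
      use mpi
      implicit none
      integer, parameter :: ik8 = selected_int_kind(15)
      integer(ik8), parameter :: n = 3000000000_ik8   ! more elements than huge(1) = 2**31-1
      integer, parameter :: chunk = 2**28             ! per-call count, well under the limit
      double precision, allocatable :: a(:)
      integer(ik8) :: pos, cnt
      integer :: ierr, rank

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      allocate(a(n))
      if (rank == 0) a = 1.0d0

      pos = 1
      do while (pos <= n)
         cnt = min(int(chunk, ik8), n - pos + 1_ik8)
         ! each call's count fits comfortably in a default (32-bit) integer
         call MPI_BCAST(a(pos), int(cnt), MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
         pos = pos + cnt
      end do

      call MPI_FINALIZE(ierr)
   end program chunked_bcast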

   Brian
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] some mpi processes "disappear" on a cluster of servers

2012-09-04 Thread David Warren
Which FORTRAN compiler are you using? I believe that most of them allow 
you to compile with -g and optimization and then force a stack dump on a 
crash. I have found this to work on code that seems to vanish on random 
processors. Also, you might look at the FORTRAN options and see if the compiler 
lets you allocate memory for temporary arrays from the heap instead of 
the stack. Sometimes this will help, even if you have stacksize set to 
unlimited. One last thing to check is whether you are running with a 32-bit 
or 64-bit memory model.


If you are using ifort, the options you want are
 -O[2,3] -g -traceback -heap-arrays -mcmodel=medium
You only need -mcmodel=medium if a single process might exceed 2 GB.

For gfortran try
-O -g -fbacktrace -mcmodel=medium
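
A made-up sketch of the kind of code those options are aimed at: an automatic
(temporary) array sized from a dummy argument typically lives on the stack, so
a large call can blow a 10 MB stack limit (as in the ulimit output quoted
below) even though the same amount of heap memory would be fine.

   program stack_demo
      implicit none
      integer, parameter :: n = 2000000
      double precision, allocatable :: field(:)
      allocate(field(n))
      field = 1.0d0
      call smooth(field, n)     ! can segfault here if work(n) lands on a 10 MB stack
      print *, field(1)
   contains
      subroutine smooth(field, n)
         integer, intent(in) :: n
         double precision, intent(inout) :: field(n)
         double precision :: work(n)    ! automatic array: typically placed on the stack
         work = field                   ! unless the compiler is told otherwise
         field = 0.5d0*(work + field)   ! (e.g. ifort's -heap-arrays); ~16 MB here
      end subroutine smooth
   end program stack_demo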

On 09/03/12 13:39, Andrea Negri wrote:

I have asked my admin and he said that no log messages were present
in /var/log, apart from my login on the compute node.
No killed processes, no full stack errors, and the memory is OK: 1GB is
used and 2GB are free.
Actually I don't know what kind of problem it could be; does someone have
any ideas? Or at least a suspicion?

Please, don't leave me alone with this!

Sorry for the trouble with the mail

2012/9/1:

Send users mailing list submissions to
 us...@open-mpi.org

To subscribe or unsubscribe via the World Wide Web, visit
 http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
 users-requ...@open-mpi.org

You can reach the person managing the list at
 users-ow...@open-mpi.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."


Today's Topics:

1. Re: some mpi processes "disappear" on a cluster of servers
   (John Hearns)
2. Re: users Digest, Vol 2339, Issue 5 (Andrea Negri)


--

Message: 1
Date: Sat, 1 Sep 2012 08:48:56 +0100
From: John Hearns
Subject: Re: [OMPI users] some mpi processes "disappear" on a cluster
 of  servers
To: Open MPI Users
Message-ID:
 
Content-Type: text/plain; charset=ISO-8859-1

Apologies, I have not taken the time to read your comprehensive diagnostics!

As Gus says, this sounds like a memory problem.
My suspicion would be the kernel Out Of Memory (OOM) killer.
Log into those nodes (or ask your systems manager to do this). Look
closely at /var/log/messages where there will be notifications when
the OOM Killer kicks in and - well - kills large memory processes!
Grep for "killed process" in /var/log/messages

http://linux-mm.org/OOM_Killer


--

Message: 2
Date: Sat, 1 Sep 2012 11:50:59 +0200
From: Andrea Negri
Subject: Re: [OMPI users] users Digest, Vol 2339, Issue 5
To: us...@open-mpi.org
Message-ID:
 
Content-Type: text/plain; charset=ISO-8859-1

Hi, Gus and John,

my code (zeusmp2) is an F77 code ported to F95, and the very first time
I launched it, the processes disappeared almost immediately.
I checked the code with valgrind and some unallocated arrays were
being passed wrongly to some subroutines.
After correcting this bug, the code runs for a while and then all the
stuff described in my first post occurs.
However, the code still gets through a lot of iterations of the main time
loop before it "dies" (I don't know if this piece of information is useful).

Now I'm going to check the memory usage (I also have a lot of unused
variables in this pretty large code; maybe I should comment them out).

uname -a returns
Linux cloud 2.6.9-42.0.3.ELsmp #1 SMP Thu Oct 5 16:29:37 CDT 2006
x86_64 x86_64 x86_64 GNU/Linux

ulimit -a returns:
core file size(blocks, -c) 0
data seg size   (kbytes, -d) unlimited
file size   (blocks, -f) unlimited
pending signals(-i) 1024
max locked memory (kbytes, -l) 32
max memory size(kbytes, -m) unlimited
open files   (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size   (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 36864
virtual memory   (kbytes, -v) unlimited
file locks(-x) unlimited


I can log on to the login nodes, but unfortunately the command ls /var/log
returns
acpid   cron.4  messages.3 secure.4
anaconda.logcupsmessages.4 spooler
anaconda.syslog dmesg   mpi_uninstall.log  spooler.1
anaconda.xlog   gdm pppspooler.2
audit   httpd   prelink.logspooler.3
boot.logitac_uninstall.log  rpmpkgsspooler.4
boot.log.1  lastlog rpmpkgs.1  vbox
boot.log.2  mailrpmpkgs.2 

Re: [OMPI users] MPI/FORTRAN on a cluster system

2012-08-20 Thread David Warren
The biggest issue you may have is that gnu fortran does not support all the
fortran constructs that the others do. Most Fortran compilers have supported the
standard plus the DEC extensions. Gnu fortran does not quite cover all of the
standard. Intel fortran does support them all, and I believe that portland
group and absoft may also.

On Sun, Aug 19, 2012 at 9:11 AM, Bill Mulberry wrote:

>
> Hi
>
> I have a large program written in FORTRAN 77 with a couple of routines
> written in C++.  It has MPI commands built into it to run on large-scale
> multiprocessor IBM systems.  I am now having the task of transferring this
> program over to a cluster system.  Both the multiprocessor and the cluster
> system have Linux hosted on them.  The cluster system has the GNU FORTRAN and
> GNU C compilers on it.  I am told the cluster has openmpi.  I am wondering if
> anybody out there has had to do the same task and if so what I can expect
> from this.  Will I be expected to make some big changes, etc.?  Any advice
> will be appreciated.
>
> Thanks.
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
David Warren
University of Washington
206 543-0954


Re: [OMPI users] compiling openMPI 1.6 with Intel compilers on Ubuntu, getting error

2012-07-24 Thread David Warren
Instead of sudo make install do
sudo bash
source /opt/intel/bin/compilervars.sh intel64
make install

Once you sudo you are starting a new shell as root, not a subshell. So,
your environment does not go with it. You need to become root, then set the
environment.

On Tue, Jul 24, 2012 at 7:47 AM, Stephen J. Barr wrote:

> Greetings,
>
> I am working on building openmpi-1.6 on ubuntu 12.04 using the intel
> compiler suite. My configure command was:
>
> ./configure --prefix=/usr/local/lib CC=icc CXX=icpc F77=ifort FC=ifort
>
> which completed successfully, as did 'make all'
>
> I am having trouble with the 'sudo make install' step. Specifically,
>
> make[2]: Entering directory `/home/stevejb/apps/openmpi-1.6/ompi/mpi/cxx'
> make[3]: Entering directory `/home/stevejb/apps/openmpi-1.6/ompi/mpi/cxx'
> test -z "/usr/local/lib/lib" || /bin/mkdir -p "/usr/local/lib/lib"
>  /bin/bash ../../../libtool   --mode=install /usr/bin/install -c
> libmpi_cxx.la '/usr/local/lib/lib'
> libtool: install: warning: relinking `libmpi_cxx.la'
> libtool: install: (cd /home/stevejb/apps/openmpi-1.6/ompi/mpi/cxx;
> /bin/bash /home/stevejb/apps/openmpi-1.6/libtool  --silent --tag CXX
> --mode=relink icpc -O3 -DNDEBUG -finline-functions -pthread -version-info
> 1:1:0 -export-dynamic -o libmpi_cxx.la -rpath /usr/local/lib/lib
> mpicxx.lo intercepts.lo comm.lo datatype.lo win.lo file.lo ../../../ompi/
> libmpi.la -lrt -lnsl -lutil )
> /home/stevejb/apps/openmpi-1.6/libtool: line 8979: icpc: command not found
> libtool: install: error: relink `libmpi_cxx.la' with the above command
> before installing it
> make[3]: *** [install-libLTLIBRARIES] Error 1
> make[3]: Leaving directory `/home/stevejb/apps/openmpi-1.6/ompi/mpi/cxx'
> make[2]: *** [install-am] Error 2
> make[2]: Leaving directory `/home/stevejb/apps/openmpi-1.6/ompi/mpi/cxx'
> make[1]: *** [install-recursive] Error 1
> make[1]: Leaving directory `/home/stevejb/apps/openmpi-1.6/ompi'
> make: *** [install-recursive] Error 1
>
>
> It seems to be a similar problem to this thread:
> http://www.open-mpi.org/community/lists/users/2010/11/14913.php but I
> cannot seem to get it resolved. From what I can tell, libtool cannot figure
> out where icpc is. From what I know, that location is set in my .bashrc
> script with the line:
>
> source /opt/intel/bin/compilervars.sh intel64
>
> In addition, I explicitly set it as:
>
> export PATH=$PATH:/opt/intel/composer_xe_2011_sp1.11.339/bin/intel64/
>
> What am I missing so that I can get libtool to see where icpc is?
>
> Thanks and best regards,
> Stephen
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
David Warren
University of Washington
206 543-0954


Re: [OMPI users] Bad parallel scaling using Code Saturne with openmpi

2012-07-10 Thread David Warren
Your problem may not be related to bandwidth. It may be latency or 
division of the problem. We found significant improvements running wrf 
and other atmospheric code (CFD) over IB. The problem was not so much 
the amount of data communicated, but how long it takes to send it. Also, 
is your model big enough to split up as much as you are trying? If there 
is not enough work for each core to do between edge exchanges, you will 
spend all of your time spinning waiting for the network. If you are 
running a demo benchmark it is likely too small for the number of 
processors. At least that is what we find with most weather model demo 
problems. One other thing to look at is how it is being split up. 
Depending on what the algorithm does, you may want more x points, more y 
points or completely even divisions. We found that we can significantly 
speed up wrf for our particular domain by not lett


On 07/10/12 08:48, Dugenoux Albert wrote:

Thanks for your answer. You are right.
I've tried 4 nodes with 6 processes each and things are worse.
So do you suggest that the only thing to do is to order an InfiniBand
switch, or is there a possibility to improve
something by tuning mca parameters?

*De :* Ralph Castain 
*À :* Dugenoux Albert ; Open MPI Users 


*Envoyé le :* Mardi 10 juillet 2012 16h47
*Objet :* Re: [OMPI users] Bad parallel scaling using Code Saturne 
with openmpi


I suspect it mostly reflects communication patterns. I don't know 
anything about Saturne, but shared memory is a great deal faster than 
TCP, so the more processes sharing a node the better. You may also be 
hitting some natural boundary in your model - perhaps with 8 
processes/node you wind up with more processes that cross the node 
boundary, further increasing the communication requirement.


Do things continue to get worse if you use all 4 nodes with 6 
processes/node?



On Jul 10, 2012, at 7:31 AM, Dugenoux Albert wrote:


Hi.
I have recently built a cluster upon a Dell PowerEdge Server with a 
Debian 6.0 OS. This server is composed of
4 system boards, each with 2 hexacore processors. So it gives 12 cores per 
system board.

The boards are linked with a local Gbit switch.
In order to parallelize the software Code Saturne, which is a CFD 
solver, I have configured the cluster
such that there is a pbs server/mom on 1 system board and a mom on each of 
the 3 other boards. So this leads to
48 cores dispatched on 4 nodes of 12 CPUs. Code Saturne is compiled 
with the openmpi 1.6 version.
When I launch a simulation using 2 nodes with 12 cores, elapsed time 
is good and network traffic is not full.
But when I launch the same simulation using 3 nodes with 8 cores, 
elapsed time is 5 times the previous one.

In both cases I use 24 cores and the network does not seem to be saturated.
I have tested several configurations: binaries in the local file system 
or on NFS. But the results are the same.
I have visited several forums (in particular 
http://www.open-mpi.org/community/lists/users/2009/08/10394.php)
and read lots of threads, but as I am not an expert at clusters, I 
presently do not see what is wrong!
Is it a problem in the configuration of PBS (I have installed it from 
the deb packages), a subtle compilation option
of openMPI, or a bad network configuration?
Regards.
B. S.
___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users





___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] fortran program with integer kind=8 using openmpi

2012-06-28 Thread David Warren
You should not have to recompile openmpi, but you do have to use the 
correct type. You can check the size of integers in your Fortran and use 
MPI_INTEGER4 or MPI_INTEGER8 depending on what you get.

In gfortran use

   integer :: i, mpi_int_type
   ! sizeof() is a GNU extension; it returns the size of i in bytes
   if (sizeof(i) .eq. 8) then
      mpi_int_type = MPI_INTEGER8
   else
      mpi_int_type = MPI_INTEGER4
   endif

then use mpi_int_type for the type in the calls.
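
To make that concrete, a small hedged sketch; it sidesteps the
-fdefault-integer-8 question by giving the data an explicit 8-byte kind, so
only the payload's MPI datatype changes and the bookkeeping integers stay at
whatever default kind Open MPI was built with:

   program int8_bcast
      use mpi
      implicit none
      integer, parameter :: i8 = selected_int_kind(15)  ! an 8-byte integer kind
      integer(i8) :: big(4)                             ! the data we actually want to move
      integer :: ierr, rank                             ! MPI bookkeeping stays default kind

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

      big = int(rank, i8)
      ! MPI_INTEGER8 matches the 8-byte elements of big; plain MPI_INTEGER would not
      call MPI_BCAST(big, 4, MPI_INTEGER8, 0, MPI_COMM_WORLD, ierr)
      if (rank /= 0) print *, 'rank', rank, 'got', big

      call MPI_FINALIZE(ierr)
   end program int8_bcast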



On 06/28/12 16:00, William Au wrote:

Hi,

I am trying to compile my fortran program on linux with gfortran44 using the 
option -fdefault-integer-8,
so all my integers will be of kind=8.

My question is: what should I do with openmpi? I am using 1.6; should I 
compile openmpi
with the same option? Will it get the correct size of MPI_INTEGER and 
MPI_INTEGER2?




Thanks.

Regards,

William


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] MPI_BCAST and fortran subarrays

2011-12-14 Thread David Warren
Actually, sub array passing is part of the F90 standard (at least 
according to every document I can find), and not an Intel extension. So 
if it doesn't work you should complain to the compiler company. One of 
the reasons for using it is that the compiler should be optimized for 
whatever method they chose to use. As there are multiple options in the 
F90 standard for how arrays get passed, it is not really a good idea to 
circumvent the official method. Using user defined data types is great 
as long as the compiler chooses to do a simple pointer pass, however if 
they use the copy in/out option you will be making much larger temporary 
arrays than if you just pass the correct subarray. Anyway, this is not 
really an MPI issue as much as an F90 bug in your compiler.


On 12/14/11 08:57, Gustavo Correa wrote:

Hi Patrick

From my mere MPI and Fortran-90 user point of view,
I think that the solution offered by the MPI standard [at least up to MPI-2]
to address the problem of non-contiguous memory layouts is to use MPI 
user-defined types,
as I pointed out in my previous email.
I like this solution because it is portable and doesn't require the allocation 
of
temporary arrays, and the additional programming effort is not that big.

As far as I know, MPI doesn't parse or comply with the Fortran-90
array-section notation and syntax.  All buffers in the MPI calls are 
pointers/addresses to the
first element on the buffer, which will  be tracked according to the number of 
elements passed
to the MPI call, and according to the MPI type passed to the MPI routine [which 
should be
a user-defined type, if you need to implement a fancy memory layout].

That MPI doesn't understand Fortran-90 array-sections doesn't surprise me so 
much.
I think Lapack doesn't do it either, and many other legitimate Fortran 
libraries don't
'understand' array-sections either.
FFTW, for instance, goes a long way to define its own mechanism to
specify fancy memory layouts independently of the Fortran-90 array-section 
notation.
Amongst the libraries with Fortran interfaces that I've used, MPI probably 
provides the most
flexible and complete mechanism to describe memory layout, through user-defined 
types.
In your case I think the work required to declare a MPI_TYPE_VECTOR to handle 
your
table 'tab' is not really big or complicated.
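
To make that concrete, a hedged sketch of how an MPI_TYPE_VECTOR could be
wired into the 'bide' program quoted further down this thread. In Fortran's
column-major layout the four entries of row i of tab(0:nbcpus-1,4) sit nbcpus
elements apart, which is exactly what the vector type describes:

   PROGRAM bide_vector
   USE mpi
   IMPLICIT NONE
   INTEGER :: nbcpus
   INTEGER :: my_rank
   INTEGER :: ierr, i, rowtype
   INTEGER, ALLOCATABLE :: tab(:,:)

    CALL MPI_INIT(ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nbcpus, ierr)

    ALLOCATE (tab(0:nbcpus-1,4))
    tab(:,:) = -1
    tab(my_rank,:) = my_rank

    ! one row of tab: 4 blocks of 1 integer, successive blocks nbcpus elements apart
    CALL MPI_TYPE_VECTOR(4, 1, nbcpus, MPI_INTEGER, rowtype, ierr)
    CALL MPI_TYPE_COMMIT(rowtype, ierr)

    DO i = 0, nbcpus-1
       ! tab(i,1) is a single element (a plain address), so no array-section copy
       ! is involved; the vector type tells MPI where the other three entries live
       CALL MPI_BCAST(tab(i,1), 1, rowtype, i, MPI_COMM_WORLD, ierr)
    ENDDO

    CALL MPI_TYPE_FREE(rowtype, ierr)
    IF (my_rank .EQ. 0) print*, tab
    CALL MPI_FINALIZE(ierr)

   END PROGRAM bide_vector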

As two other list subscribers mentioned, and you already tried,
the Intel compiler seems to offer an extension
to deal with this, and shortcut the use of MPI user-defined types.
This Intel compiler extension apparently uses under the hood the same idea of a
temporary array that you used programmatically in one of the 'bide' program 
versions
that you sent in your original message.
The temporary array is used to ship data to/from contiguous/non-contiguous 
memory before/after the MPI call is invoked.
I presume this Intel compiler extension would work with libraries other than 
MPI,
whenever the library doesn't understand the Fortran-90 array-section notation.
I never used this extension, though.
For one thing, this solution may not be portable to other compilers.
Another aspect to consider is how much 'under the hood memory allocation' this 
solution
would require if the array you pass to MPI_BCAST is really big,
and how much this may impact performance.

I hope this helps,
Gus Correa

On Dec 14, 2011, at 11:03 AM, Patrick Begou wrote:

   

Thanks all for your answers. Yes, I understand that it is a non-contiguous 
memory access problem, as MPI_BCAST expects a pointer to a valid contiguous 
memory zone. But I'm surprised that, with the MPI module in use, Fortran does not 
hide this discontinuity in a contiguous temporary copy of the array. I've spent 
some time building openMPI with g++/gcc/ifort (to create the right mpi module) 
and ran some additional tests:


Default OpenMPI is openmpi-1.2.8-17.4.x86_64

# module load openmpi
# mpif90 ess.F90 && mpirun -np 4 ./a.out
0   1   2   3   0   1   
2   3   0   1   2   3   0   
1   2   3
# module unload openmpi
The result is OK, but sometimes it hangs (when I request a lot of processes)

With OpenMPI 1.4.4 and gfortran from gcc-fortran-4.5-19.1.x86_64

# module load openmpi-1.4.4-gcc-gfortran
# mpif90 ess.F90 && mpirun -np 4 ./a.out
0  -1  -1  -1   0  -1   
   -1  -1   0  -1  -1  -1   0   
   -1  -1  -1
# module unload openmpi-1.4.4-gcc-gfortran
Only node 0 updates the global array with its subarray. (I only print node 0's 
result.)


With OpenMPI 1.4.4 and ifort 10.1.018 (yes, it's quite old, I have the latest 
one but it isn't installed!)

# module load openmpi-1.4.4-gcc-intel
# mpif90 ess.F90 && mpirun -np 4 ./a.out
ess.F90(15): (col. 5) remark: LOOP WAS VECTORIZED.
0  -1  -1   

Re: [OMPI users] MPI_BCAST and fortran subarrays

2011-12-12 Thread David Warren
What FORTRAN compiler are you using? This should not really be an issue 
with the MPI implementation, but with the FORTRAN. This is legitimate 
usage in FORTRAN 90 and the compiler should deal with it. I do similar 
things using ifort and it creates temporary arrays when necessary and it 
all works.


On 12/12/11 09:38, Gustavo Correa wrote:

Hi Patrick

I think tab(i,:) is not contiguous in memory, but has a stride of nbcpus.
Since the MPI type you are passing is just the barebones MPI_INTEGER,
MPI_BCAST expects the four integers to be contiguous in memory, I guess.
The MPI calls don't have any idea of the Fortran90 memory layout,
and the tab(i,:) that you pass to MPI_BCAST means only the address for the 
*first*
MPI_INTEGER to be broadcast (sent and received).

My impression is that you could either:
1) Declare your table transposed, i.e. tab(4,0:nbcpus-1),
and make a few adjustments in the code
to adapt to this change, which would make tab(:,i) contiguous in memory
(a sketch of this option follows below).
or
2) Keep your current declaration of 'tab', but declare an MPI_TYPE_VECTOR with
the appropriate stride (nbcpus) and use it in your MPI_BCAST call.

For MPI user defined types see Ch. 3 of "MPI, The Complete Reference, Vol.1, The MPI 
Core, 2nd Ed." by M. Snir et. al.
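
A hedged sketch of what option 1 might look like when applied to the 'bide'
program quoted below; only the declaration, the initialization and the buffer
argument change, and with the table transposed the four integers for rank i
are contiguous, so the plain MPI_INTEGER broadcast is enough:

   PROGRAM bide_transposed
   USE mpi
   IMPLICIT NONE
   INTEGER :: nbcpus
   INTEGER :: my_rank
   INTEGER :: ierr, i
   INTEGER, ALLOCATABLE :: tab(:,:)

    CALL MPI_INIT(ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nbcpus, ierr)

    ALLOCATE (tab(4,0:nbcpus-1))   ! transposed relative to the original program
    tab(:,:) = -1
    tab(:,my_rank) = my_rank

    DO i = 0, nbcpus-1
       ! tab(1,i) is the address of 4 contiguous integers
       CALL MPI_BCAST(tab(1,i), 4, MPI_INTEGER, i, MPI_COMM_WORLD, ierr)
    ENDDO

    IF (my_rank .EQ. 0) print*, tab
    CALL MPI_FINALIZE(ierr)

   END PROGRAM bide_transposed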

I hope this helps,
Gus Correa

On Dec 12, 2011, at 10:35 AM, Patrick Begou wrote:

   

I've got a strange problem with Fortran90 and an MPI_BCAST call in a large 
application. I've isolated the problem in these short program samples.
With Fortran we can use subarrays in procedure calls. For example, passing a subarray to 
the "change" procedure:

MODULE mymod
IMPLICIT NONE
CONTAINS
   SUBROUTINE change(tab,i)
 IMPLICIT NONE
 INTEGER, INTENT(INOUT),DIMENSION(:)::tab
 INTEGER, INTENT(IN) :: i
 tab(:)=i
   END SUBROUTINE change
END MODULE mymod

PROGRAM toto
   USE mymod
   IMPLICIT NONE
   INTEGER, PARAMETER::nx=6, ny=4
   INTEGER, DIMENSION(nx,ny):: tab
   INTEGER::i

   tab=-1
   DO i=1,nx
 CALL change(tab(i,:),i)
   ENDDO
   PRINT*,tab
END PROGRAM toto

But If I use subarrays with MPI_BCAST() like in this example:

PROGRAM bide
USE mpi
IMPLICIT NONE
INTEGER :: nbcpus
INTEGER :: my_rank
INTEGER :: ierr,i,buf
INTEGER, ALLOCATABLE:: tab(:,:)

 CALL MPI_INIT(ierr)
 CALL MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)
 CALL MPI_COMM_SIZE(MPI_COMM_WORLD,nbcpus,ierr)

 ALLOCATE (tab(0:nbcpus-1,4))

 tab(:,:)=-1
 tab(my_rank,:)=my_rank
 DO i=0,nbcpus-1
CALL MPI_BCAST(tab(i,:),4,MPI_INTEGER,i,MPI_COMM_WORLD,ierr)
 ENDDO
 IF (my_rank .EQ. 0) print*,tab
 CALL MPI_FINALIZE(ierr)

END PROGRAM bide

It doesn't work! With openMPI 1.2.8 (OpenSuse 11.4 X86_64) I get random 
segfaults: it works sometimes, with a few cpus (2, 4, 8...), and sometimes doesn't work 
with a larger number of cpus (32, 48, 64...). With OpenMPI 1.4.4 
(built from sources) it hangs (most of the array tab remains at the -1 
initialization value).
Such procedure calls are allowed in Fortran 90, so I do not understand why they 
fail here. I have to use a buffer array (called tab1 in the following program) 
to solve the problem.

PROGRAM bide
USE mpi
IMPLICIT NONE
INTEGER :: nbcpus
INTEGER :: my_rank
INTEGER :: ierr,i,buf
INTEGER, ALLOCATABLE:: tab(:,:)
INTEGER::tab1(4)

 CALL MPI_INIT(ierr)
 CALL MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)
 CALL MPI_COMM_SIZE(MPI_COMM_WORLD,nbcpus,ierr)

 ALLOCATE (tab(0:nbcpus-1,4))

 tab=-1
 tab1=-1
 DO i=0,nbcpus-1
IF(my_rank.EQ.i) tab1=my_rank
CALL MPI_BCAST(tab1,4,MPI_INTEGER,i,MPI_COMM_WORLD,ierr)
tab(i,:)=tab1
 ENDDO
 IF (my_rank .EQ. 0) print*,tab
 CALL MPI_FINALIZE(ierr)

END PROGRAM bide

Any idea about this behavior ?

Patrick
--
===
|  Equipe M.O.S.T.         | http://most.hmg.inpg.fr          |
|  Patrick BEGOU           |                                  |
|  LEGI                    | mailto:patrick.be...@hmg.inpg.fr |
|  BP 53 X                 | Tel 04 76 82 51 35               |
|  38041 GRENOBLE CEDEX    | Fax 04 76 82 52 71               |
===

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
 


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
   


Re: [OMPI users] Open MPI via SSH noob issue

2011-08-09 Thread David Warren
I don't know if this is it, but if you use the name localhost, won't 
processes on both machines try to talk to 127.0.0.1? I believe you need 
to use the real hostname in your host file. I think that your two tests 
work because there is no interprocess communication, just stdout.
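
For illustration, a hostfile along these lines would avoid the localhost
ambiguity; the names are taken from the hostname output quoted below, and the
slots counts are only a guess for these two machines:

   quadcore.mikrob.slu.se slots=4
   allana-welshs-mac-pro.local slots=4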


On 08/08/11 23:46, Christopher Jones wrote:

Hi again,

I changed the subject of my previous posting to reflect a new problem 
encountered when I changed my strategy to using SSH instead of Xgrid on two mac 
pros. I've set up passwordless ssh communication between the two macs 
(connected via direct ethernet, both running openmpi 1.2.8 on OSX 10.6.8) per 
the instructions on the FAQ. I can type in 'ssh computer-name.local' on either 
computer and connect without a password prompt. From what I can see, the 
ssh-agent is up and running - the following is listed in my ENV:

SSH_AUTH_SOCK=/tmp/launch-5FoCc1/Listeners
SSH_AGENT_PID=61058

My host file simply lists 'localhost' and 
'chrisjones2@allana-welshs-mac-pro.local'. When I run a simple hello_world 
test, I get what seems like a reasonable output:

chris-joness-mac-pro:~ chrisjones$ mpirun -np 8 -hostfile hostfile ./test_hello
Hello world from process 0 of 8
Hello world from process 1 of 8
Hello world from process 2 of 8
Hello world from process 3 of 8
Hello world from process 4 of 8
Hello world from process 7 of 8
Hello world from process 5 of 8
Hello world from process 6 of 8

I can also run hostname and get what seems to be an ok response (unless I'm 
wrong about this):

chris-joness-mac-pro:~ chrisjones$ mpirun -np 8 -hostfile hostfile hostname
allana-welshs-mac-pro.local
allana-welshs-mac-pro.local
allana-welshs-mac-pro.local
allana-welshs-mac-pro.local
quadcore.mikrob.slu.se
quadcore.mikrob.slu.se
quadcore.mikrob.slu.se
quadcore.mikrob.slu.se


However, when I run the ring_c test, it freezes:

chris-joness-mac-pro:~ chrisjones$ mpirun -np 8 -hostfile hostfile ./ring_c
Process 0 sending 10 to 1, tag 201 (8 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9

(I noted that processors on both computers are active).

ring_c was compiled separately on each computer; however, both have the same 
version of openmpi and OSX. I've gone through the FAQ and searched the user 
forum, but I can't quite seem to get this problem unstuck.

Many thanks for your time,
Chris

On Aug 5, 2011, at 6:00 PM,  
  wrote:

   

Send users mailing list submissions to
us...@open-mpi.org

To subscribe or unsubscribe via the World Wide Web, visit
http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
users-requ...@open-mpi.org

You can reach the person managing the list at
users-ow...@open-mpi.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."


Today's Topics:

   1. Re: OpenMPI causing WRF to crash (Jeff Squyres)
   2. Re: OpenMPI causing WRF to crash (Anthony Chan)
   3. Re: Program hangs on send when run with nodes on  remote
  machine (Jeff Squyres)
   4. Re: openmpi 1.2.8 on Xgrid noob issue (Jeff Squyres)
   5. Re: parallel I/O on 64-bit indexed arays (Rob Latham)


--

Message: 1
Date: Thu, 4 Aug 2011 19:18:36 -0400
From: Jeff Squyres
Subject: Re: [OMPI users] OpenMPI causing WRF to crash
To: Open MPI Users
Message-ID:<3f0e661f-a74f-4e51-86c0-1f84feb07...@cisco.com>
Content-Type: text/plain; charset=windows-1252

Signal 15 is usually SIGTERM on Linux, meaning that some external entity 
probably killed the job.

The OMPI error message you describe is also typical for that kind of scenario 
-- i.e., a process exited without calling MPI_Finalize could mean that it 
called exit() or some external process killed it.


On Aug 3, 2011, at 7:24 AM, BasitAli Khan wrote:

 

I am trying to run a rather heavy wrf simulation with spectral nudging, but the 
simulation crashes after 1.8 minutes of integration.
The simulation has two domains, with d01 = 601x601 and d02 = 721x721, and 51 
vertical levels. I tried this simulation on two different systems but the result 
was more or less the same. For example

On our Bluegene/P  with SUSE Linux Enterprise Server 10 ppc and XLF compiler I 
tried to run wrf on 2048 shared memory nodes (1 compute node = 4 cores , 32 
bit, 850 Mhz). For the parallel run I used mpixlc, mpixlcxx and mpixlf90.  I 
got the following error message in the wrf.err file

  BE_MPI (ERROR): The error message in the job
record is as follows:
  BE_MPI (ERROR):   "killed with signal 15"

I also tried to run the same simulation on our linux cluster (Linux Red Hat 
Enterprise 5.4m  x86_64 and Intel compiler) with 8, 16 and 64 nodes (1 compute 
node=8 cores). For the parallel run I used mpi/openmpi/1.4.2-intel-11. I got 
the following error message in the error log after couple of minutes of 
integration.

"mpirun has exited due to process rank 45 with PID 19540 

Re: [OMPI users] OpenMPI causing WRF to crash

2011-08-05 Thread David Warren
That error is from one of the processes that was working when another 
one died. It is not an indication that MPI had problems, but that you 
had one of the wrf processes (#45) crash. You need to look at what 
happened to process 45. What do the rsl.out and rsl.error files for #45 
say?


On 08/04/11 16:18, Jeff Squyres wrote:

Signal 15 is usually SIGTERM on Linux, meaning that some external entity 
probably killed the job.

The OMPI error message you describe is also typical for that kind of scenario 
-- i.e., a process exited without calling MPI_Finalize could mean that it 
called exit() or some external process killed it.


On Aug 3, 2011, at 7:24 AM, BasitAli Khan wrote:

   

I am trying to run a rather heavy wrf simulation with spectral nudging, but the 
simulation crashes after 1.8 minutes of integration.
The simulation has two domains, with d01 = 601x601 and d02 = 721x721, and 51 
vertical levels. I tried this simulation on two different systems but the result 
was more or less the same. For example

On our Bluegene/P  with SUSE Linux Enterprise Server 10 ppc and XLF compiler I 
tried to run wrf on 2048 shared memory nodes (1 compute node = 4 cores , 32 
bit, 850 Mhz). For the parallel run I used mpixlc, mpixlcxx and mpixlf90.  I 
got the following error message in the wrf.err file

  BE_MPI (ERROR): The error message in the job
record is as follows:
  BE_MPI (ERROR):   "killed with signal 15"

I also tried to run the same simulation on our linux cluster (Linux Red Hat 
Enterprise 5.4m  x86_64 and Intel compiler) with 8, 16 and 64 nodes (1 compute 
node=8 cores). For the parallel run I used mpi/openmpi/1.4.2-intel-11. I got 
the following error message in the error log after couple of minutes of 
integration.

"mpirun has exited due to process rank 45 with PID 19540 on
node ci118 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here)."

I tried many things but nothing seems to be working. However, if I reduce the grid 
points below 200, the simulation goes fine. It appears that Open MPI probably has a 
problem with a large number of grid points, but I have no idea how to fix it. I 
would greatly appreciate it if you could suggest some solution.

Best regards,
---
Basit A. Khan, Ph.D.
Postdoctoral Fellow
Division of Physical Sciences & Engineering
Office# 3204, Level 3, Building 1,
King Abdullah University of Science & Technology
4700 King Abdullah Blvd, Box 2753, Thuwal 23955 –6900,
Kingdom of Saudi Arabia.

Office: +966(0)2 808 0276,  Mobile: +966(0)5 9538 7592
E-mail: basitali.k...@kaust.edu.sa
Skype name: basit.a.khan
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
 


   

Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-27 Thread David Warren
Ok, I finally was able to get on and run some ofed tests - it looks to 
me like I must have something configured wrong with the qlogic cards, 
but I have no idea what???


Mellanox to Qlogic:
 ibv_rc_pingpong n15
  local address:  LID 0x0006, QPN 0x240049, PSN 0x87f83a, GID ::
  remote address: LID 0x000d, QPN 0x00b7cb, PSN 0xcc9dee, GID ::
8192000 bytes in 0.01 seconds = 4565.38 Mbit/sec
1000 iters in 0.01 seconds = 14.35 usec/iter

ibv_srq_pingpong n15
  local address:  LID 0x0006, QPN 0x280049, PSN 0xf83e06, GID ::
 ...
8192000 bytes in 0.01 seconds = 9829.91 Mbit/sec
1000 iters in 0.01 seconds = 6.67 usec/iter

ibv_uc_pingpong n15
  local address:  LID 0x0006, QPN 0x680049, PSN 0x7b33d2, GID ::
  remote address: LID 0x000d, QPN 0x00b7ed, PSN 0x7fafaa, GID ::
8192000 bytes in 0.02 seconds = 4080.19 Mbit/sec
1000 iters in 0.02 seconds = 16.06 usec/iter

Qlogic to Qlogic

ibv_rc_pingpong n15
  local address:  LID 0x000b, QPN 0x00afb7, PSN 0x3f08df, GID ::
  remote address: LID 0x000d, QPN 0x00b7ef, PSN 0xd15096, GID ::
8192000 bytes in 0.02 seconds = 3223.13 Mbit/sec
1000 iters in 0.02 seconds = 20.33 usec/iter

ibv_srq_pingpong n15
  local address:  LID 0x000b, QPN 0x00afb9, PSN 0x9cdde3, GID ::
 ...
8192000 bytes in 0.01 seconds = 9018.30 Mbit/sec
1000 iters in 0.01 seconds = 7.27 usec/iter

ibv_uc_pingpong n15
  local address:  LID 0x000b, QPN 0x00afd9, PSN 0x98cfa0, GID ::
  remote address: LID 0x000d, QPN 0x00b811, PSN 0x0a0d6e, GID ::
8192000 bytes in 0.02 seconds = 3318.28 Mbit/sec
1000 iters in 0.02 seconds = 19.75 usec/iter

Mellanox to Mellanox

ibv_rc_pingpong n5
  local address:  LID 0x0009, QPN 0x240049, PSN 0xd72119, GID ::
  remote address: LID 0x0006, QPN 0x6c0049, PSN 0xc1909e, GID ::
8192000 bytes in 0.01 seconds = 7121.93 Mbit/sec
1000 iters in 0.01 seconds = 9.20 usec/iter

ibv_srq_pingpong n5
  local address:  LID 0x0009, QPN 0x280049, PSN 0x78f4f7, GID ::
...
8192000 bytes in 0.00 seconds = 24619.08 Mbit/sec
1000 iters in 0.00 seconds = 2.66 usec/iter

ibv_uc_pingpong n5
  local address:  LID 0x0009, QPN 0x680049, PSN 0x4002ea, GID ::
  remote address: LID 0x0006, QPN 0x300049, PSN 0x29abf0, GID ::
8192000 bytes in 0.01 seconds = 7176.52 Mbit/sec
1000 iters in 0.01 seconds = 9.13 usec/iter


On 07/17/11 05:49, Jeff Squyres wrote:

Interesting.

Try with the native OFED benchmarks -- i.e., get MPI out of the way and see if 
the raw/native performance of the network between the devices reflects the same 
dichotomy.

(e.g., ibv_rc_pingpong)


On Jul 15, 2011, at 7:58 PM, David Warren wrote:

   

All OFED 1.4 and 2.6.32 (that's what I can get to today)
qib to qib:

# OSU MPI Latency Test v3.3
# SizeLatency (us)
0 0.29
1 0.32
2 0.31
4 0.32
8 0.32
160.35
320.35
640.47
128   0.47
256   0.50
512   0.53
1024  0.66
2048  0.88
4096  1.24
8192  1.89
16384 3.94
32768 5.94
65536 9.79
131072   18.93
262144   37.36
524288   71.90
1048576 189.62
2097152 478.55
4194304 1148.80

# OSU MPI Bandwidth Test v3.3
# SizeBandwidth (MB/s)
1 2.48
2 5.00
4 10.04
8 20.02
16   33.22
32   67.32
64  134.65
128 260.30
256 486.44
512 860.77
1024   1385.54
2048   1940.68
4096   2231.20
8192   2343.30
16384  2944.99
32768  3213.77
65536  3174.85
131072 3220.07
262144 3259.48
524288 3277.05
1048576 3283.97
2097152 3288.91
4194304 3291.84

# OSU MPI Bi-Directional Bandwidth Test v3.3
# Size Bi-Bandwidth (MB/s)
1 3.10
2 6.21
4 13.08
8 26.91
16   41.00
32   78.17
64  161.13
128 312.08
256 588.18
512 968.32
1024   1683.42
2048   2513.86
4096   2948.11
8192   2918.39
16384  3370.28
32768  3543.99
65536  4159.99
131072 4709.73
262144 4733.31
524288 4795.44
1048576 4753.69
2097152 4786.11
41

Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-15 Thread David Warren
.86
4 3.85
8 3.92
16 3.93
32 3.93
64 4.02
128   4.60
256   4.80
512   5.14
1024  5.94
2048  7.26
4096  8.50
8192 10.98
16384 19.92
32768 26.35
65536 39.93
131072   64.45
262144  106.93
524288  191.89
1048576 358.31
2097152 694.25
4194304 1429.56

# OSU MPI Bandwidth Test v3.3
# SizeBandwidth (MB/s)
1 0.64
2 1.39
4 2.76
8 5.58
16   11.03
32   22.17
64   43.70
128 100.49
256 179.83
512 305.87
1024 544.68
2048 838.22
4096   1187.74
8192   1542.07
16384  1260.93
32768  1708.54
65536  2180.45
131072 2482.28
262144 2624.89
524288 2680.55
1048576 2728.58
never gets past here

# OSU MPI Bi-Directional Bandwidth Test v3.3
# Size Bi-Bandwidth (MB/s)
1 0.41
2 0.83
4 1.68
8 3.37
16 6.71
32   13.37
64   26.64
128  63.47
256 113.23
512 202.92
1024 362.48
2048 578.53
4096 830.31
8192   1143.16
16384  1303.02
32768  1913.07
65536  2463.83
131072 2793.83
262144 2918.32
524288 2987.92
1048576 3033.31
never gets past here



On 07/15/11 09:03, Jeff Squyres wrote:

I don't think too many people have done combined QLogic + Mellanox runs, so 
this probably isn't a well-explored space.

Can you run some microbenchmarks to see what kind of latency / bandwidth you're 
getting between nodes of the same type and nodes of different types?

On Jul 14, 2011, at 8:21 PM, David Warren wrote:

   

On my test runs (a wrf run just long enough to go beyond the spinup influence):
On just 6 of the old mlx4 machines I get about a 00:05:30 runtime.
On 3 mlx4 and 3 qib nodes I get an average of 00:06:20.
So the slowdown is about 11+%.
When this is a full run, 11% becomes a very long time.  This has held for some 
longer tests as well, before I went to ofed 1.6.

On 07/14/11 05:55, Jeff Squyres wrote:
 

On Jul 13, 2011, at 7:46 PM, David Warren wrote:


   

I finally got access to the systems again (the original ones are part of our 
real time system). I thought I would try one other test I had set up first.  I 
went to OFED 1.6 and it started running with no errors. It must have been an 
OFED bug. Now I just have the speed problem. Anyone have a way to make the 
mixture of mlx4 and qlogic work together without slowing down?

 

What do you mean by "slowing down"?


   


 


   

Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-14 Thread David Warren

On my test runs (a wrf run just long enough to go beyond the spinup influence):
On just 6 of the old mlx4 machines I get about a 00:05:30 runtime.
On 3 mlx4 and 3 qib nodes I get an average of 00:06:20.
So the slowdown is about 11+%.
When this is a full run, 11% becomes a very long time.  This has held for 
some longer tests as well, before I went to ofed 1.6.


On 07/14/11 05:55, Jeff Squyres wrote:

On Jul 13, 2011, at 7:46 PM, David Warren wrote:

   

I finally got access to the systems again (the original ones are part of our 
real time system). I thought I would try one other test I had set up first.  I 
went to OFED 1.6 and it started running with no errors. It must have been an 
OFED bug. Now I just have the speed problem. Anyone have a way to make the 
mixture of mlx4 and qlogic work together without slowing down?
 

What do you mean by "slowing down"?

   

Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-13 Thread David Warren
I finally got access to the systems again (the original ones are part of 
our real time system). I thought I would try one other test I had set up 
first.  I went to OFED 1.6 and it started running with no errors. It 
must have been an OFED bug. Now I just have the speed problem. Anyone 
have a way to make the mixture of mlx4 and qlogic work together without 
slowing down?


On 07/07/11 17:19, Jeff Squyres wrote:

Huh; wonky.

Can you set the MCA parameter "mpi_abort_delay" to -1 and run your job again? 
This will prevent all the processes from dying when MPI_ABORT is invoked.  Then attach a 
debugger to one of the still-live processes after the error message is printed.  Can you 
send the stack trace?  It would be interesting to know what is going on here -- I can't 
think of a reason that would happen offhand.


On Jun 30, 2011, at 5:03 PM, David Warren wrote:

   

I have a cluster with mostly Mellanox ConnectX hardware and a few with Qlogic 
QLE7340's. After looking through the web, FAQs etc. I built openmpi-1.5.3 with 
psm and openib. If I run within the same hardware it is fast and works fine. If 
I run across both types without specifying an MTL (e.g. mpirun -np 24 -machinefile 
dwhosts --byslot --bind-to-core --mca btl ^tcp ...) it dies with
*** The MPI_Init() function was called before MPI_INIT was invoked.
 

*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[n16:9438] Abort before MPI_INIT completed successfully; not able to
   

guarantee that all other processes were killed!
 

*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
   

...
I can make it run by giving a bad mtl, e.g. -mca mtl psm,none. All the 
processes run after complaining that mtl none does not exist. However, they run 
just as slowly (about 10% slower than either set alone).

Pertinent info:
On the Qlogic Nodes:
OFED: QLogic-OFED.SLES11-x86_64.1.5.3.0.22
On the Mellanox Nodes:
OFED-1.5.2.1-20101105-0600

All:
debian lenny kernel 2.6.32.41
OpenSM
limit | grep memorylocked gives unlimited on all nodes.

Configure line:
./configure --with-libnuma --with-openib --prefix=/usr/local/openmpi-1.5.3 
--with-psm=/usr --enable-btl-openib-failover --enable-openib-connectx-xrc 
--enable-openib-rdmacm

I thought that with 1.5.3 I am supposed to be able to do this. Am I just wrong? 
Does anyone see what I am doing wrong?

Thanks
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
 


   

[OMPI users] Mixed Mellanox and Qlogic problems

2011-06-30 Thread David Warren
I have a cluster with mostly Mellanox ConnectX hardware and a few with 
Qlogic QLE7340's. After looking through the web, FAQs etc. I built 
openmpi-1.5.3 with psm and openib. If I run within the same hardware it 
is fast and works fine. If I run across both types without specifying an MTL (e.g. 
mpirun -np 24 -machinefile dwhosts --byslot --bind-to-core --mca btl 
^tcp ...) it dies with

*** The MPI_Init() function was called before MPI_INIT was invoked.

 *** This is disallowed by the MPI standard.
 *** Your MPI job will now abort.
 [n16:9438] Abort before MPI_INIT completed successfully; not able to 

guarantee that all other processes were killed!

 *** The MPI_Init() function was called before MPI_INIT was invoked.
 *** This is disallowed by the MPI standard.
 *** Your MPI job will now abort.

...
I can make it run by giving a bad mtl, e.g. -mca mtl psm,none. All the 
processes run after complaining that mtl none does not exist. However, 
they run just as slowly (about 10% slower than either set alone).


Pertinent info:
On the Qlogic Nodes:
OFED: QLogic-OFED.SLES11-x86_64.1.5.3.0.22
On the Mellanox Nodes:
OFED-1.5.2.1-20101105-0600

All:
debian lenny kernel 2.6.32.41
OpenSM
limit | grep memorylocked gives unlimited on all nodes.

Configure line:
./configure --with-libnuma --with-openib 
--prefix=/usr/local/openmpi-1.5.3 --with-psm=/usr 
--enable-btl-openib-failover --enable-openib-connectx-xrc 
--enable-openib-rdmacm


I thought that with 1.5.3 I am supposed to be able to do this. Am I just 
wrong? Does anyone see what I am doing wrong?


Thanks


mellanox_devinfo.gz
Description: GNU Zip compressed data


mellanox_ifconfig.gz
Description: GNU Zip compressed data


ompi_info_output.gz
Description: GNU Zip compressed data


qlogic_devinfo.gz
Description: GNU Zip compressed data


qlogic_ifconfig.gz
Description: GNU Zip compressed data