Dear Rainer,

Thanks for your (and George's) very quick answer and the fast solution! The patch works great.

Best regards
Daniel Spångberg


On Wed, 13 Jun 2007 01:38:26 +0200, Rainer Keller <kel...@hlrs.de> wrote:

Dear Daniel,
well, this definitely would be an issue for the us...@open-mpi.org list.
Your report is definitely important and valid: all functions that allow
MPI_BOTTOM in the Fortran interface should use OMPI_ADDR to convert it for
the C interface....
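To illustrate what that conversion looks like, here is a minimal sketch in the
spirit of the f77 bindings. The names mpi_fortran_bottom_ and F2C_BOTTOM are
made up for this example; they are not the actual Open MPI definitions:

#include <mpi.h>

/* Sentinel object whose address the Fortran side passes for MPI_BOTTOM.
   The symbol name here is an assumption for illustration only. */
extern int mpi_fortran_bottom_;

/* If the Fortran caller passed its MPI_BOTTOM, substitute the C sentinel. */
#define F2C_BOTTOM(addr) \
    ((void *) (addr) == (void *) &mpi_fortran_bottom_ \
        ? MPI_BOTTOM : (void *) (addr))

void mpi_bcast_(char *buffer, MPI_Fint *count, MPI_Fint *datatype,
                MPI_Fint *root, MPI_Fint *comm, MPI_Fint *ierr)
{
    /* Without this conversion the raw Fortran address of MPI_BOTTOM is
       handed to the C layer as the base pointer, and the absolute
       displacements stored in the datatype then point to unmapped memory. */
    *ierr = (MPI_Fint) MPI_Bcast(F2C_BOTTOM(buffer), (int) *count,
                                 MPI_Type_f2c(*datatype), (int) *root,
                                 MPI_Comm_f2c(*comm));
}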

Aaah, I see George has a similar commit -- possibly we need to sync.
Please see the attached code; compared to r15030, it adapts the macros to use
OMPI_F2C_...

Thanks,
Rainer

On Tuesday 12 June 2007 23:15, Daniel Spångberg wrote:
One of the users of our computer cluster has reported a problem with his
simulation code. We have been able to trace this down to a problem with
MPI struct datatypes when absolute addresses are used for the members of the
struct, necessitating the use of MPI_BOTTOM. The problem occurs with
MPI_BCAST, but everything works fine with MPI_SEND and MPI_RECV. I dug into
the Open MPI code and found that the problem occurs only in Fortran. The
reason seems to be that when MPI_BCAST is called, the OMPI_ADDR macro in
ompi/mpi/f77/constants.h is never evaluated, so the Fortran MPI_BOTTOM is
never turned into the C MPI_BOTTOM. When MPI_SEND and MPI_RECV are used,
the OMPI_ADDR macro is evaluated and no problem occurs.

An example of problematic code (tested on two processes):

       program testme
       implicit none
       include 'mpif.h'
       integer ierr,rank,size
       integer btype(1),blen(1),param_batch
       real param
       integer(kind=mpi_address_kind) :: disp(1)
       integer status(MPI_STATUS_SIZE)

       param=0.

       call MPI_INIT(ierr)
c      Store the absolute address of param; it is used below as the
c      displacement in the struct type, used with MPI_BOTTOM.
       call MPI_GET_ADDRESS(param,disp(1),ierr)
       if (ierr.ne.MPI_SUCCESS) write(*,*) 'MPI_GET_ADDRESS FAILED'
       btype(1)=MPI_REAL
       blen(1)=1
       call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
       call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
       call MPI_TYPE_CREATE_STRUCT(1,blen,disp,btype,param_batch,ierr)
       if (ierr.ne.MPI_SUCCESS) then
         write(*,*) 'MPI_TYPE_CREATE_STRUCT FAILED'
       endif
       call MPI_TYPE_COMMIT(param_batch,ierr)
       if (ierr.ne.MPI_SUCCESS) write(*,*) 'MPI_TYPE_COMMIT FAILED'
       if (rank.eq.0) param=100.
c      The commented-out MPI_SEND/MPI_RECV pair works correctly with
c      MPI_BOTTOM; only the MPI_BCAST call below fails.
c      if (rank.eq.0) then
c        call MPI_SEND(MPI_BOTTOM,1,
c     x       param_batch,1,0,MPI_COMM_WORLD,ierr)
c      else
c        call MPI_RECV(MPI_BOTTOM,1,
c     x       param_batch,0,0,MPI_COMM_WORLD,status,ierr)
c      endif
       call MPI_BCAST(MPI_BOTTOM,1,param_batch,0,MPI_COMM_WORLD,ierr)
       if (ierr.ne.MPI_SUCCESS) write(*,*) 'MPI_BCAST FAILED'
       write(*,*) 'Rank:',rank,'Size:',size,'Param=',param
       call MPI_Finalize(ierr)

       end program testme

mpirun -np 2 testme_submitted_to_devel
[auchentoshan:21021] *** Process received signal ***
[auchentoshan:21021] Signal: Segmentation fault (11)
[auchentoshan:21021] Signal code: Address not mapped (1)
[auchentoshan:21021] Failing at address: 0x7fc04ffd7c
[auchentoshan:21021] [ 0] /lib64/tls/libpthread.so.0 [0x3d9660c430]
[auchentoshan:21021] [ 1] /lib64/tls/libc.so.6(memcpy+0x60) [0x3d95571ec0]
[auchentoshan:21021] [ 2] /opt/openmpi-1.2.1-gcc4/lib/libmpi.so.0(ompi_convertor_pack+0x164) [0x2a957cf5b4]
[auchentoshan:21021] [ 3] /opt/openmpi-1.2.1-gcc4/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_start_copy+0x24c) [0x2a982851dc]
[auchentoshan:21021] [ 4] /opt/openmpi-1.2.1-gcc4/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_isend+0x217) [0x2a9827fe77]
[auchentoshan:21021] [ 5] /opt/openmpi-1.2.1-gcc4/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_generic+0x354) [0x2a98ac08b4]
[auchentoshan:21021] [ 6] /opt/openmpi-1.2.1-gcc4/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_binomial+0xc8) [0x2a98ac0bd8]
[auchentoshan:21021] [ 7] /opt/openmpi-1.2.1-gcc4/lib/libmpi.so.0(PMPI_Bcast+0x15c) [0x2a957d62ac]
[auchentoshan:21021] [ 8] /opt/openmpi-1.2.1-gcc4/lib/libmpi_f77.so.0(pmpi_bcast_+0x5a) [0x2a9567e99a]
[auchentoshan:21021] [ 9] testme_submitted_to_devel(MAIN__+0x1f8) [0x401080]
[auchentoshan:21021] [10] testme_submitted_to_devel(main+0xe) [0x4011be]
[auchentoshan:21021] [11] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3d9551c3fb]
[auchentoshan:21021] [12] testme_submitted_to_devel [0x400dfa]
[auchentoshan:21021] *** End of error message ***
mpirun noticed that job rank 0 with PID 21021 on node auchentoshan exited
on signal 11 (Segmentation fault).
1 additional process aborted (not shown)

The Open MPI versions we have tested include the latest, 1.2.2, although the
log above is from version 1.2.1. Distribution: Scientific Linux 4.4 (RHEL4
clone). The problem occurs on both AMD64 and i386, and with both gcc/gfortran
and the Portland Group compilers, version 7.0-4. Interconnect: GbE.

daniels@auchentoshan:~ > gcc4 --version
gcc4 (GCC) 4.1.0 20060515 (Red Hat 4.1.0-18)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

daniels@auchentoshan:~ > gfortran --version
GNU Fortran 95 (GCC) 4.1.0 20060515 (Red Hat 4.1.0-18)
Copyright (C) 2006 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

Working C code which does the same thing:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int rank,size;
   float param=0.;
   MPI_Datatype btype[1],param_batch;
   MPI_Aint disp[1];
   int blen[1];

   MPI_Init(&argc,&argv);
   /* Absolute address of param, used as the displacement in the struct type */
   if (MPI_Get_address(&param,&disp[0])!=MPI_SUCCESS)
     fprintf(stderr,"MPI_Get_address failed.\n");
   btype[0]=MPI_FLOAT;
   blen[0]=1;
   MPI_Comm_rank(MPI_COMM_WORLD,&rank);
   MPI_Comm_size(MPI_COMM_WORLD,&size);
   if (MPI_Type_create_struct(1,blen,disp,btype,&param_batch)!=MPI_SUCCESS)
     fprintf(stderr,"MPI_Type_create_struct failed.\n");
   if (MPI_Type_commit(&param_batch)!=MPI_SUCCESS)
     fprintf(stderr,"MPI_Type_commit failed.\n");
   if (rank==0)
     param=100.;
   /* Broadcast relative to MPI_BOTTOM; this works as expected in C */
   if (MPI_Bcast(MPI_BOTTOM,1,param_batch,0,MPI_COMM_WORLD)!=MPI_SUCCESS)
     fprintf(stderr,"MPI_Bcast failed.\n");
   printf("Rank:%d, Size:%d, Param=%f\n",rank,size,param);
   MPI_Finalize();
   return 0;
}
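
For reference, assuming the C source is saved as testme.c, it can be built
and run along these lines (an assumed invocation); with two ranks, both
processes should print Param=100.000000:

   mpicc testme.c -o testme
   mpirun -np 2 ./testme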


Best regards
Daniel Spångberg

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
