General information:
------------------------------------
3-node Opteron cluster, 24 CPUs, Mellanox InfiniBand 10 Gb interconnect
Debian Lenny 5.0
self-built kernel from kernel.org: 2.6.32.12, all NFS functions enabled on the kernel side
self-built nfs-utils 1.2.2 from the Debian sid sources: nfs-kernel-server, nfs-common

NFS server with working lockd
fcntl() and file locking are available on all NFS clients, tested with a Perl script (attached)

openMPI 1.4.2 (built with GCC 4.3.2)
configure options:
./configure --prefix=/opt/openMPI_gnu_4.3.2 --sysconfdir=/etc --localstatedir=/var --with-libnuma=/usr --with-libnuma-libdir=/usr/lib --enable-mpirun-prefix-by-default --enable-sparse-groups --enable-static --enable-cxx-exceptions --with-wrapper-cflags='-O3 -march=opteron' --with-wrapper-cxxflags='-O3 -march=opteron' --with-wrapper-fflags='-O3 -march=opteron' --with-wrapper-fcflags='-O3 -march=opteron' --with-openib --with-gnu-ld CFLAGS='-O3 -march=opteron' CXXFLAGS='-O3 -march=opteron' FFLAGS='-O3 -march=opteron' FCFLAGS='-O3 -march=opteron'

=======================================================================================

Dear openMPI developers,

I've found a bug in the current stable release of openMPI 1.4.2 which is related to MPI_FILE_WRITE in combination with execution on an NFS-v3 crossmount. I've attached a small Fortran code snippet (testmpi.f) which uses MPI_FILE_WRITE to create a file "test.dat" containing the binary MPI_REAL values {1,2,3,4,5,6}, written by every MPI rank the program is executed on, each at the correct displacement for that rank.
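
To make the expected result concrete (this is just my reading of the attached code): every rank announces a displacement of 6 reals to all other ranks, so rank r writes its 6 values at byte offset r*6*4. For a 2-rank run, test.dat should therefore end up as 12 consecutive REAL*4 values, 1..6 from rank 0 followed by 1..6 from rank 1, 48 bytes in total.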

When I execute this code on a GlusterFS share, everything works like a charm, no problems at all.

The problem is: when I try to compile and execute this program for two nodes on an NFS crossmount with openMPI, I get the following MPI error:
[ppclus02:23440] *** An error occurred in MPI_Bcast
[ppclus02:23440] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
[ppclus02:23440] *** MPI_ERR_TRUNCATE: message truncated
[ppclus02:23440] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpiexec has exited due to process rank 1 with PID 23440 on
node 192.168.11.2 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------
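
For completeness, I compile and launch the test roughly like this (the hostnames are placeholders for my two cluster nodes), with the working directory on the NFS crossmount:

mpif77 testmpi.f -o testmpi
mpirun -np 2 --host node01,node02 ./testmpi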

My first educated guess was that my NFS crossmounts aren't able to use fcntl() to lock the file, as needed by MPI_FILE_WRITE. So I gave the Perl script below (lock.pl) a try. The result: fcntl() and NFS file locking work...

For comparison, I also tried the recent unstable version of MPICH2 (v1.3a2) on the same NFS crossmount. With MPICH2 it also works without any problems on NFS-v3.

Thanks for your help.

Best regards,
Oliver Deppert


lock.pl (to test NFS fcntl() file locking)
-----------------------------------------------------------------------------------------------------------------------------------------------------

#!/usr/bin/perl
 use Fcntl;
 $fn = "locktest.lock";
 open FH, ">$fn" or die "Cannot open $fn: $!";
 print "Testing fcntl...\n";
 @list = (F_WRLCK,0,0,0,0);     # exclusive write lock, entire file
 $struct = pack("SSLLL",@list); # struct flock layout on this platform
 fcntl(FH,&F_SETLKW,$struct) or die("cannot lock because: $!\n");
 print "fcntl() F_SETLKW succeeded\n";

------------------------------------------------------------------------------------------------------------------------------------------------------
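
(Run it from a directory on the NFS crossmount; if NFS locking were broken there, I'd expect the F_SETLKW call to die, typically with "No locks available". On my crossmounts it completes without an error.)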

testmpi.f (Fortran code snippet to test MPI_FILE_WRITE on NFS-v3)
-----------------------------------------------------------------------------------------------------------------------------------------------------
      program WRITE_FILE

      implicit none
      include 'mpif.h'

      integer info,pec
      integer npe,mpe,mtag

      integer :: realsize,file,displace,displaceloc
      integer(kind=MPI_OFFSET_KIND) :: disp
      integer :: status(MPI_STATUS_SIZE)
      real(kind=4) :: locidx(6)


c INITIALIZATION

      call MPI_INIT(info)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,npe,info)
      call MPI_COMM_RANK(MPI_COMM_WORLD,mpe,info)


c routine

      mtag=123
      displace=6

      !send data offset
      do pec=0,mpe-1
         CALL MPI_SEND(displace,1,MPI_INTEGER,
     &           pec,mtag,MPI_COMM_WORLD,info)
      enddo
      do pec=mpe+1,npe-1
         CALL MPI_SEND(displace,1,MPI_INTEGER,
     &           pec,mtag,MPI_COMM_WORLD,info)
      enddo

      displaceloc=0
      !get data offset
      do pec=0,mpe-1
         CALL MPI_RECV(displace,1,MPI_INTEGER,pec,mtag,
     &                    MPI_COMM_WORLD,status,info)

         displaceloc=displaceloc+displace
      enddo

      CALL MPI_TYPE_EXTENT(MPI_REAL,realsize,info)
      disp=displaceloc*realsize

      !open file
      CALL MPI_FILE_OPEN(MPI_COMM_WORLD,'test.dat',
     &   MPI_MODE_WRONLY+MPI_MODE_CREATE,MPI_INFO_NULL,file,info)

      !set file view (displacement in bytes)
      CALL MPI_FILE_SET_VIEW(file,disp,MPI_REAL,
     &   MPI_REAL,'native',MPI_INFO_NULL,info)

      !write out data
      locidx(1)=1
      locidx(2)=2
      locidx(3)=3
      locidx(4)=4
      locidx(5)=5
      locidx(6)=6

      CALL MPI_FILE_WRITE(file,locidx,6,MPI_REAL,
     &        status,info)

      !wait until all processes are done
      !sync-barrier-sync recommended by mpi-consortium to guarantee
      !file consistency
      !http://www.mpi-forum.org/docs/mpi-20-html/node215.htm (2010)
      call MPI_FILE_SYNC(file,info)
      call MPI_BARRIER(MPI_COMM_WORLD,info)
      CALL MPI_FILE_SYNC(file,info)
      !close file
      call MPI_FILE_CLOSE(file,info)

      call MPI_FINALIZE(info)
      stop

      end

------------------------------------------------------------------------------------------------------------------------------------------------------
