Hi,

we're facing an NFS race condition when MPI::File::Open is called for
a nonexistent file:

#include <mpi.h>
int main(int argc, char *argv[])
{
    MPI::Init(argc, argv);
    MPI::File _outputFile;
    double dummy = 42;

    _outputFile = MPI::File::Open(MPI::COMM_WORLD,
            "foo",
            MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI::INFO_NULL);
    _outputFile.Set_errhandler(MPI::ERRORS_ARE_FATAL);
    _outputFile.Write(&dummy, 1, MPI::DOUBLE);
    _outputFile.Close();
    MPI::Finalize();
}

If run on two or more nodes with shared NFS, it usually fails:

ADIOI_NFS_OPEN (line 55): **filenoexist fooADIOI_NFS_OPEN (line 55): 
**filenoexist fooMPI_FILE_CLOSE (line 51): **iobadfh
ADIO_OPEN (line 273): **oremote_fail
ADIOI_NFS_OPEN (line 55): **filenoexist fooADIOI_NFS_OPEN (line 55): 
**filenoexist fooADIOI_NFS_OPEN (line 55): **filenoexist fooADIOI_NFS_OPEN 
(line 55): **filenoexist foo[amun2:12137] *** An error occurred in 
MPI_File_write
[amun2:12137] *** on a NULL file
MPI_FILE_CLOSE (line 51): **iobadfh
MPI_FILE_CLOSE (line 51): **iobadfh
MPI_FILE_CLOSE (line 51): **iobadfh
[amun2:12137] *** MPI_ERR_FILE: invalid file
[amun2:12137] *** MPI_ERRORS_ARE_FATAL (goodbye)
[inge:19493] *** An error occurred in MPI_File_write
[inge:19493] *** on a NULL file
[amun4:10186] *** An error occurred in MPI_File_write
[amun4:10186] *** on a NULL file
[amun3:11146] *** An error occurred in MPI_File_write
[amun3:11146] *** on a NULL file


(The code may succeed if it runs on only two nodes, with rank 0 on the
 NFS client and rank 1 on the NFS server.)

Rank 0 creates the file and closes it; all N processes then reopen it,
as described in ad_open.c around line 163. Unfortunately, the other NFS
clients don't see the new file quickly enough, so their open fails
(the **filenoexist errors above). Mounting the share with the sync
option doesn't solve this either.

A well-placed system("ls") in the code remedies the problem.
To avoid this noisy call, I've reimplemented the ls with
open(".") and stat("."); the stat() turns out to be unnecessary.
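
The open-and-close idea can be sketched on its own as follows. This is
only an illustration: refresh_parent_dir is a hypothetical name, not a
ROMIO function, and that opening the directory makes the NFS client
revalidate its cached directory contents is an assumption based on the
behaviour observed above.

```c
#include <assert.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of ROMIO): open and close the directory
 * that contains `path`. Returns 0 on success, -1 if the directory could
 * not be opened or memory ran out. */
static int refresh_parent_dir(const char *path)
{
    assert(path != NULL);

    /* dirname() may modify its argument, so work on a private copy. */
    size_t len = strlen(path) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, path, len);

    /* The result may point into `copy` or into static storage, so it
     * must never be passed to free(); only `copy` is freed. */
    char *dir = dirname(copy);

    int dir_fd = open(dir, O_RDONLY);
    free(copy);
    if (dir_fd < 0)
        return -1;

    close(dir_fd);
    return 0;
}
```

Calling refresh_parent_dir("foo") opens and closes ".", which matches
the open(".") described above.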

The attached patch fixes the problem for us, but perhaps there is
a better way to do it. What about upstream (MPICH)?
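
One conceivable alternative, given here as an untested sketch and not
as a verified fix, is to retry the open for a short while when it
fails with ENOENT. open_retry_enoent, attempts and delay_ms are
hypothetical names chosen for illustration. Whether a retry succeeds
within a reasonable time depends on how long the NFS client caches the
directory, which is an assumption here and has not been measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of ROMIO): retry open() as long as it
 * fails with ENOENT. `flags` must not contain O_CREAT, because no mode
 * argument is passed. Returns a file descriptor, or -1 with errno set. */
static int open_retry_enoent(const char *path, int flags,
                             int attempts, long delay_ms)
{
    assert(path != NULL && attempts > 0 && delay_ms >= 0);

    struct timespec delay = { delay_ms / 1000,
                              (delay_ms % 1000) * 1000000L };

    for (int i = 0; i < attempts; ++i) {
        int fd = open(path, flags);
        if (fd >= 0)
            return fd;
        if (errno != ENOENT)
            return -1;               /* a different error: give up */
        if (i + 1 < attempts)
            nanosleep(&delay, NULL); /* wait before the next attempt */
    }

    errno = ENOENT;                  /* nanosleep() may have changed it */
    return -1;
}
```

Unlike opening the directory, this changes no caching behaviour; it
only gives the client time to notice the new file, so it trades a
bounded delay for not touching the directory at all.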

(I guess NFS is widely used, so there should be more people
 facing this issue).


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de
Index: ompi/mca/io/romio/romio/adio/common/ad_open.c
===================================================================
--- ompi/mca/io/romio/romio/adio/common/ad_open.c       (revision 1913)
+++ ompi/mca/io/romio/romio/adio/common/ad_open.c       (working copy)
@@ -5,6 +5,9 @@
  *   See COPYRIGHT notice in top-level directory.
  */

+#include <libgen.h>
+#include <unistd.h>
+
 #include "adio.h"
 #include "adio_extern.h"
 #include "adio_cb_config_list.h"
@@ -226,8 +229,19 @@
     }
     fd->access_mode = access_mode;

-    (*(fd->fns->ADIOI_xxx_Open))(fd, error_code);
+    if (ADIO_NFS == fd->file_system) {
+        char *dirc = ADIOI_Strdup(filename);
+        char *dname = dirname (dirc);  /* may point into dirc: never free it */
+        int my_fd;

+        my_fd = open (dname, O_RDONLY);
+        /* stat (my_fd, NULL); */
+        if (my_fd >= 0) close (my_fd);
+        ADIOI_Free(dirc);
+    }
+    /* the actual open must happen on every file system, not only NFS */
+    (*(fd->fns->ADIOI_xxx_Open))(fd, error_code);
+
     /* if error, may be it was due to the change in amode above. 
        therefore, reopen with access mode provided by the user.*/ 
     fd->access_mode = orig_amode_wronly;  
