In our MTT testing, we occasionally see the ibm/io/file_status_get_count
test fail with:

File locking failed in ADIOI_Set_lock(fd A,cmd F_SETLKW/7,type F_RDLCK/0,whence 0) with return value FFFFFFFF and errno 5.
- If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
- If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.
ADIOI_Set_lock:: Input/output error
ADIOI_Set_lock:offset 0, length 1

One of the curious things (to us) about this test is that no one else
appears to run it. Looking back through a lot of MTT results, we find
that essentially the only results reported are from Oracle. Almost no
non-Oracle results for this test have been reported in the last few
months. Is there something special about this test we should know about?

P.S. We're also interested in understanding the error message better.
I suppose that's more appropriately taken up with the ROMIO folks, which
I will do, but if anyone on this list has useful information I'd love to
hear it. The error apparently occurs when MPI_File_get_size sets a
lock. Each process has its own file and the test usually passes, so
it's unclear to me what the problem is. Further, the advice about NFS
and Lustre in the error message strikes me as rather speculative. We
tend to run these tests repeatedly on the same file systems from the
same test nodes, so I'd expect a mount-option problem to show up
consistently rather than occasionally. Does anyone have any idea how
sound the NFSv3/lockd/noac advice is, or what the real issue is here?
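
For anyone who wants to poke at this outside of MPI, here is a minimal sketch of a standalone probe. It assumes the failing call is an ordinary POSIX fcntl() lock with the parameters printed in the message above (F_SETLKW, F_RDLCK, whence 0, which I take to be SEEK_SET, offset 0, length 1). The helper name try_read_lock is made up for illustration and is not part of ROMIO, and the real ADIOI_Set_lock may do more than this.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of ROMIO): take and then release a
 * blocking one-byte read lock at offset 0, mirroring the parameters in
 * the error message. Returns 0 on success, or the errno value of the
 * first call that failed. */
int try_read_lock(const char *path)
{
    /* A read lock needs the descriptor to be open for reading. */
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        int err = errno;
        fprintf(stderr, "open(%s): %s\n", path, strerror(err));
        return err;
    }

    struct flock lock;
    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_RDLCK;     /* "type F_RDLCK" in the message */
    lock.l_whence = SEEK_SET;  /* assumed meaning of "whence 0" */
    lock.l_start = 0;          /* "offset 0" */
    lock.l_len = 1;            /* "length 1" */

    int err = 0;
    if (fcntl(fd, F_SETLKW, &lock) == -1) {
        err = errno;
        fprintf(stderr, "fcntl(F_SETLKW, F_RDLCK): %s (errno %d)\n",
                strerror(err), err);
    } else {
        /* Release the lock again so repeated runs start clean. */
        lock.l_type = F_UNLCK;
        if (fcntl(fd, F_SETLKW, &lock) == -1) {
            err = errno;
            fprintf(stderr, "fcntl(F_SETLKW, F_UNLCK): %s (errno %d)\n",
                    strerror(err), err);
        }
    }

    close(fd);
    return err;
}
```

Calling try_read_lock() in a loop on a file in the same directory the test uses, from the same test nodes, might show whether the intermittent errno 5 (EIO, "Input/output error") can be reproduced without MPI in the picture at all. If it can, that would point at the file system or lock daemon rather than at ROMIO or the test itself.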