On Oct 28, 2011, at 1:59 AM, Eugene Loh wrote:

> In our MTT testing, we see ibm/io/file_status_get_count fail occasionally 
> with:
> 
> File locking failed in ADIOI_Set_lock(fd A,cmd F_SETLKW/7,type 
> F_RDLCK/0,whence 0) with return value
> FFFFFFFF and errno 5.
> - If the file system is NFS, you need to use NFS version 3, ensure that the 
> lockd daemon is running
> on all the machines, and mount the directory with the 'noac' option (no 
> attribute caching).
> - If the file system is LUSTRE, ensure that the directory is mounted with the 
> 'flock' option.
> ADIOI_Set_lock:: Input/output error
> ADIOI_Set_lock:offset 0, length 1
> 
> One of the curious things (to us) about this test is that no one else appears 
> to run it.  Looking back through a lot of MTT results, essentially the only 
> results reported are from Oracle.  Almost no non-Oracle results for this test have 
> been reported in the last few months.  Is there something special about this 
> test we should know about?

Not that I'm aware of.

I see why Cisco skipped it -- I didn't have the "io" directory listed in my 
list of IBM directories to traverse.  Doh!  That's been fixed.

(Cisco's MTT runs look like they need a bit of TLC -- I'm guessing IB is down 
on a node or two, resulting in a lot of false failures, but I likely won't have 
time to look at them until after SC :-( )

> P.S.  We're also interested in understanding the error message better.  I 
> suppose that's more appropriately taken up with ROMIO folks, which I will do, 
> but if anyone on this list has useful information I'd love to hear it.  The 
> error apparently comes when MPI_File_get_size sets a lock.  Each process has 
> its own file and the test usually passes, so it's unclear to me what the 
> problem is.  Further, the error message discussing NFS and Lustre strikes me 
> as rather speculative.  We tend to run these tests repeatedly on the same 
> file systems from the same test nodes.  Anyone have any idea how sound the 
> NFSv3/lockd/noac advice is or what the real issue is here?

No.  You'll need to ask Rob Latham.
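
One way to check the locking behaviour outside of MPI: the message above reports a blocking read lock (F_SETLKW, F_RDLCK, whence 0, offset 0, length 1) failing with errno 5, which matches the "Input/output error" text (EIO). Below is a minimal Python sketch, not ROMIO's code, that requests the same kind of lock on a single file. The function name try_read_lock is made up for illustration, and the sketch assumes that fcntl.lockf with LOCK_SH and no LOCK_NB results in the equivalent fcntl() call on the platform in question.

```python
import errno
import fcntl
import os
import tempfile


def try_read_lock(path):
    """Take and release a blocking shared lock on byte 0 of path.

    Returns 0 on success, or the errno reported by the failed lock call.
    (Hypothetical helper for illustration; not part of ROMIO or Open MPI.)
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # LOCK_SH without LOCK_NB requests a blocking shared (read) lock.
        # Remaining arguments: length=1, start=0, whence=SEEK_SET (0),
        # mirroring "offset 0, length 1" in the error message.
        fcntl.lockf(fd, fcntl.LOCK_SH, 1, 0, os.SEEK_SET)
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)


if __name__ == "__main__":
    # A temporary file stands in for the test's per-process file.
    with tempfile.NamedTemporaryFile() as tmp:
        result = try_read_lock(tmp.name)
        if result == 0:
            print("lock ok")
        else:
            print("lock failed:", errno.errorcode.get(result, result))
```

Pointing this at a file in the directory the MTT runs use, and running it repeatedly, might show whether the intermittent failure reproduces without MPI in the picture. That would only narrow things down; it says nothing about what ROMIO does around the lock.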

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

