I think so. When one node deletes a file, it does not send out messages to invalidate the caches on the other clients, so those clients still hold a cached (no longer valid) entry.

If those other clients then look up the file, the lookup will succeed (as if another client had won the race to create it), but when they try to access the file they will get an error because the handle in the cache is stale.

There really isn't much way around this with the local ncache approach. Maybe the stock release should ship with the ncache disabled if this workload (one client deleting a particular file and a different client immediately recreating a file with the same name) will be common, or at least disable it by default for system-interface usage, since MPI programs are probably more likely to trigger this than VFS programs.

-Phil

Robert Latham wrote:
On Mon, Aug 28, 2006 at 04:28:32PM -0400, Pete Wyckoff wrote:

So yeah, the file gets deleted by just one task.  Then they all
simultaneously try to create it again.


That's also what happened when noncontig_coll2 was failing.  We did ok
until a different process tried to open the file that another process
had just deleted.  By turning the ncache timeout way down (not disabled,
but set to a very short interval), the test would pass.  I guess the
delete from one process wasn't visible (is that the right word?) to the
other processes.

==rob


_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
