I think so. When one node deletes a file, it does not send out messages to invalidate the caches on the other clients, so those clients still hold a cached (no longer valid) entry.

If those other clients then look up the file, the lookup will succeed (as if another client had won the race to create it), but when they try to access the file they will get an error because the handle in the cache is stale.

There really isn't much way around this with the local ncache approach. Maybe the stock release should ship with the ncache disabled if this workload (one client deleting a particular file and a different client immediately recreating a file with the same name) will be common, or at least disable it by default for system-interface usage, since MPI programs are probably more likely to trigger this than VFS programs.

-Phil

Robert Latham wrote:
On Mon, Aug 28, 2006 at 04:28:32PM -0400, Pete Wyckoff wrote:

So yeah, the file gets deleted by just one task.  Then they all
simultaneously try to create it again.


That's also what happened when noncontig_coll2 was failing.  We did ok
until a different process tried to open the file that another process
had just deleted.  By turning the ncache timeout way down (not disabled,
but set to a very short interval), the test would pass.  I guess the
delete from one process wasn't visible (is that the right word?) to the
other processes.

==rob


_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
