On Thu, Dec 18, 2008 at 7:59 PM, Roman Shaposhnik <r...@sun.com> wrote:
> On Dec 18, 2008, at 7:26 PM, ron minnich wrote:
>>
>> On Thu, Dec 18, 2008 at 7:06 PM, Roman Shaposhnik <r...@sun.com> wrote:
>>>
>>> Its fun, yes. But I believe this is more of a testament to the
>>> statelessness of the NFS plus the fact that the "end of file" is not
>>> a well defined offset (unlike beginning of the file).
>>
>> no, it's even worse with stateful systems.
>
You want to write at EOF. Where is EOF?

On Plan 9, for an append-only file, the server by definition always knows: it's where the last write was. So writes go at EOF.

What about writing append files in a stateful FS where it's up to the client to figure out where the end is? The client by definition knows more than the server. So the client has to do this:

1. Get the metadata in a way that guarantees nobody else gets to write. The client calls the server to get exclusive access to the metadata/file. This can result in server-to-client callbacks to all the other clients. This is fun to watch on 1000s of nodes. Before the right hacks went in, it could take 30 minutes. I am not making this up. Why? Well, what if *every* one of the thousands of clients is trying to write at EOF and they're all fighting for the metadata? Congestive collapse, that's what.

2. The client writes at EOF. Since the client has exclusive access at this point, it's pretty fast.

3. The client releases the metadata lock to the server, and hence to the other thousands of clients.

The "client writes at EOF" approach is bad for precisely the same reason that you don't want to use shared memory for locks in a CC-NUMA machine: you want to send the operation to the data, not move the data to the operation. Lots of great papers on this over the years ...

ron
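To make the contrast concrete, here is a toy, single-process Python model of the two append strategies. It is a sketch under stated assumptions, not how Plan 9 or any real stateful file system is implemented. All class and function names (`AppendServer`, `LockingServer`, `client_append`, `run`) are hypothetical. The model counts only client-to-server request messages. It omits the server-to-client callbacks, so it understates the cost of the lock-then-write approach.

```python
import threading

class AppendServer:
    """Hypothetical server-side append: the server owns EOF."""
    def __init__(self):
        self.data = bytearray()
        self._mu = threading.Lock()  # serializes requests inside the server only
        self.messages = 0

    def append(self, payload):
        with self._mu:
            self.messages += 1       # one request per append
            offset = len(self.data)  # the server knows where EOF is
            self.data += payload
            return offset

class LockingServer:
    """Hypothetical stateful FS where the client must find EOF itself."""
    def __init__(self):
        self.data = bytearray()
        self._meta = threading.Lock()  # exclusive metadata/file lock
        self._mu = threading.Lock()    # protects the message counter
        self.messages = 0

    def _count(self):
        with self._mu:
            self.messages += 1

    def lock_metadata(self):
        self._count()
        self._meta.acquire()           # every other client blocks here
        return len(self.data)          # client learns the current EOF

    def write_at(self, offset, payload):
        self._count()
        self.data[offset:offset + len(payload)] = payload

    def unlock_metadata(self):
        self._count()
        self._meta.release()

def client_append(server, payload):
    eof = server.lock_metadata()         # step 1: exclusive access, fetch EOF
    try:
        server.write_at(eof, payload)    # step 2: write at EOF
    finally:
        server.unlock_metadata()         # step 3: hand the lock back
    return eof

def run(n_clients, append_fn):
    """Start n_clients threads, each appending one 8-byte record."""
    offsets = []
    lock = threading.Lock()
    def worker(i):
        off = append_fn(b"rec%04d\n" % i)
        with lock:
            offsets.append(off)
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return offsets

plan9 = AppendServer()
offs = run(100, plan9.append)

locking = LockingServer()
offs2 = run(100, lambda p: client_append(locking, p))

print(plan9.messages, locking.messages)  # 100 300
```

Both strategies produce the same file in this model: 100 non-overlapping 8-byte records. The difference is in the traffic. Server-side append needs one message per write. The client-side version needs three messages, and it serializes every client on a single lock. With real callbacks and thousands of nodes, that serialization is where the congestive collapse described above comes from.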