Re: [9fans] 9pfuse and O_APPEND

ron minnich Fri, 19 Dec 2008 08:55:56 -0800

On Thu, Dec 18, 2008 at 7:59 PM, Roman Shaposhnik <r...@sun.com> wrote:
> On Dec 18, 2008, at 7:26 PM, ron minnich wrote:
>>
>> On Thu, Dec 18, 2008 at 7:06 PM, Roman Shaposhnik <r...@sun.com> wrote:
>>>
>>> It's fun, yes. But I believe this is more of a testament to the
>>> statelessness of NFS plus the fact that the "end of file" is not a
>>> well-defined offset (unlike the beginning of the file).
>>
>> no, it's even worse with stateful systems.
>



You want to write at EOF. Where is EOF? On Plan 9, for an append-only
file, the server by definition always knows: it's where the last write
ended. So writes go at EOF.
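
A minimal sketch of that server-side behaviour, in Python rather than
anything Plan 9 actually runs. The class and method names are made up
for illustration; the only thing it is meant to show is that the server
ignores whatever offset the client sends and picks EOF itself.

```python
import threading

class AppendOnlyFile:
    """Toy server-side model of an append-only file (hypothetical names)."""

    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()  # server-local, no network round trips

    def write(self, client_offset, payload):
        # The client-supplied offset is ignored: the server alone decides
        # where EOF is, so concurrent writers cannot overwrite each other.
        with self._lock:
            start = len(self._data)
            self._data += payload
            return start  # offset the data actually landed at

    def contents(self):
        with self._lock:
            return bytes(self._data)

f = AppendOnlyFile()
f.write(0, b"first\n")
f.write(0, b"second\n")  # stale offset 0, still lands at EOF
```

Each append is one request, and the client never needs to know the file
size.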

What about writing append files in a stateful FS where it's up to the
client to figure out where the end is?

The client by definition knows more than the server. So the client has
to do this:
1. Get the metadata in a way that guarantees nobody else gets to
write. The client calls the server to get exclusive access to the
metadata/file. This can result in server-to-client callbacks to all
other clients. This is fun to watch on 1000s of nodes. Before the right
hacks went in it could take 30 minutes. I am not making this up. Why?
Well, what if *every* one of the thousands of clients is trying to write
at EOF and they're all fighting for the metadata? Congestive collapse,
that's what.
2. The client writes at EOF. Since the client has exclusive access at
this point, it's pretty fast.
3. The client releases the metadata lock to the server and hence to the
other thousands of clients.
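
The three steps above can be sketched roughly as follows. This is a toy
model, not any real file system's protocol: LockingServer, client_append
and the message counter are all hypothetical, and the callbacks to other
clients in step 1 are left out entirely (they only make things worse).

```python
import threading

class LockingServer:
    """Toy server that hands out an exclusive metadata lock (hypothetical)."""

    def __init__(self):
        self.data = bytearray()
        self.meta_lock = threading.Lock()
        self.messages = 0                  # client -> server requests seen
        self._count_lock = threading.Lock()

    def _count(self):
        with self._count_lock:
            self.messages += 1

    def acquire_metadata(self):
        # Step 1: block until we hold the lock, then report the current EOF.
        self._count()
        self.meta_lock.acquire()
        return len(self.data)

    def write_at(self, offset, payload):
        # Step 2: the client tells the server where to write.
        self._count()
        self.data[offset:offset + len(payload)] = payload

    def release_metadata(self):
        # Step 3: let the next client in.
        self._count()
        self.meta_lock.release()

def client_append(server, payload):
    eof = server.acquire_metadata()
    try:
        server.write_at(eof, payload)
    finally:
        server.release_metadata()

record = b"line\n"
server = LockingServer()
threads = [threading.Thread(target=client_append, args=(server, record))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(server.data), server.messages)
```

Every append here costs three requests and all writers are serialized
on the metadata lock, versus a single request when the server picks the
offset.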

The 'client write at EOF' is bad for precisely the same reason that
you don't want to use shared memory for locks in a CC-NUMA machine;
you want to send the operation to the data, not move the data to the
operation. Lots of great papers on this over the years ...

ron
