Post your document on the reiserfs mailing list when you finish it; the ReiserFS
team will enjoy reading it.
Hans
Daniel Phillips wrote:
>
> Alexander Viro wrote:
> > On Wed, 26 Jul 2000, Stephen C. Tweedie wrote:
> > > On Wed, Jul 26, 2000 at 03:19:46PM -0400, Alexander Viro wrote:
> > >
> > > > Erm? Consider that: huge lseek() + write past the end of file. Woops - got
> > > > to unmerge the tail (it's an internal block now) and we've got no
> > > > knowledge of IO going on the page. Again, IO may be asynchronous - no
> > > > protection from i_sem for us. After that page becomes a regular one,
> > > > right? Looks like a change of state to me...
> > >
> > > Naturally, and that change of state must be made atomically by the
> > > filesystem.
> >
> > Yep. Which is the point - there _are_ dragons. I believe that it's doable,
> > but I really want to repeat: Daniel, watch out for races at the moments
> > when page state changes; it needs a more accurate approach than the usual
> > pagecache-using fs. It can be done, but it will take some reading (and
> > yes, Stephen, I know that _you_ know it ;-)
>
> That's apparent, and I feel that Stephen could probably implement the entire
> tail merge as described so far in a few days. But that wouldn't be as useful as
> having me and perhaps some other interested observers go all the way through
> the exercise of figuring out the so-far unwritten rules of the
> buffercache/pagecache duo.
>
> The exact same accurate work is required for Tux2, which makes massive use of
> copy-on-write. Right now, buffer issues are the main thing standing in the way
> of making a development code release for Tux2. So there is no question in my
> mind about whether such issues have to be dealt with: they do.
>
> I dove into the 2.4.0 cache code for the first time last night (using lxr - try
> it, you'll like it) and I'm almost at the point where I have some relevant
> questions to ask. I notice that buffer.c has increased in size by almost 50%
> and is far and away the largest module in the VFS. Worse, buffer.c is massively
> cross-coupled to the mm subsystem and the page cache, as we know too well.
> Buffer.c is right at the core of the issues we're talking about.
>
> Bearing that in mind, instead of just jumping in and starting to code I'll try
> the methodical approach :-) My immediate objective is to try to clarify a few
> things that aren't immediately obvious from the source, in the following areas:
>
> - States and transitions for the main objects:
>     - Buffer heads
>     - Buffer data
>     - Page heads
>     - Page data
>     - Other?
>
> - Existing concurrency controls:
>     - Semaphores/Spinlocks
>     - Big kernel lock
>     - Filesystem locks
>     - Posix locks?
>     - Other?
>
> - Planned additions/deletions of concurrency controls
>
> I will also try to make a list of the main internal functions in the VFS (and
> some related ones from the mm and drivers modules) and examine
> function-by-function what the intended usage is, what the issues/caveats are,
> and maybe even how we can expect them to evolve in the future.
>
> I think we need even more than this in terms of documentation in order to work
> effectively, but this at least will be a good start. It will be more than what
> we have now. If it gets to the point where we can actually answer questions
> about race conditions by consulting the docs then we really will have
> accomplished something. Yes, I know that the code is going to keep evolving and
> sometimes will break the docs, but I also have confidence that the docs can keep
> up with such evolution given some interested volunteer doc maintainers willing
> to hang out on the devel list and keep asking questions.
>
> Even in 2.2.x I felt that there is a lot of understated elegance in Linux's
> buffer cache design. In 2.4.0 it seems to be getting more elegant, although
> it's hard to say exactly, because of the sparse (read: nonexistent)
> documentation. This is a problem that can be easily fixed.
>
> To get through this I will have to ask a lot of naive-sounding questions.
> Hopefully I'll have the first batch ready this afternoon (morning, your time).
>
> --
> Daniel