>This is fairly easy, at least the way we implement stacking right now.
>Since each filesystem layer uses the page cache, each one has a page
>allocated for a piece of data. Each of those pages would have a pointer
>to a method belonging to the layer above (who allocated the page in the
>first place and then passed it down), so the notification can propagate
>up without any problems.
Don't you need more than just the method pointer? Shouldn't there be some
way to specify the argument to be passed to it as well, e.g. a pointer to
the upper layer's corresponding page/inode in this case? Or do you expect
there to be some other way to get to that directly from the page?
[I noticed that the note I'd sent to Ben on this aspect had bounced, so
I've resent it today.]
There's some similarity here with that aspect of IRP completion stacking
on Windows NT (where the I/O manager provides stacking).
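
To make the question concrete, here is a minimal user-space sketch of a
completion callback stored together with its argument, much as an NT
completion routine is registered along with a context pointer. Everything
in it (struct sim_page, the done/done_arg fields, the function names) is
made up for illustration; none of it is an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel's definition. */
struct sim_page;
typedef void (*page_done_fn)(struct sim_page *lower, void *arg);

struct sim_page {
    int uptodate;
    page_done_fn done;  /* method belonging to the layer above */
    void *done_arg;     /* what that method needs, e.g. the upper page */
};

/* The upper layer registers its callback and argument on the lower page. */
void sim_page_set_completion(struct sim_page *p, page_done_fn fn, void *arg)
{
    assert(p != NULL);
    p->done = fn;
    p->done_arg = arg;
}

/* Called when I/O on a page finishes; notifies the layer above, if any. */
void sim_page_io_done(struct sim_page *p)
{
    assert(p != NULL);
    p->uptodate = 1;
    if (p->done)
        p->done(p, p->done_arg);
}

/* Upper-layer callback: the argument is the upper layer's own page. */
void upper_read_done(struct sim_page *lower, void *arg)
{
    struct sim_page *upper = arg;

    assert(lower->uptodate);
    sim_page_io_done(upper);  /* propagate one level further up */
}
```

Without done_arg the upper layer would have to find its own page starting
from the lower one by some other route, which is exactly the question
above.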
>I know this multiple-caching is suboptimal, but I see no clean way around
>it.
>A good optimization would be to mark all the lower layer pages as
>"discard", but I don't think the page cache currently supports such a
>feature. Please correct me if I'm wrong -- I'd love to be proven wrong
>here. :-)
>BTW, I now realize that I was in fact right in my first reply. :-) You
>can't use an address_space op for the callback, because it must call back
>into the upper layer, whereas the address_space ops reachable from struct
>page belong to the current layer. So an addition to struct page is
>inevitable.
>>One option that I first thought of was that the upper layer could change
>>the lower layer's completion op to point to the upper layer's
>>readpagecomplete. The upper layer readpagecomplete would invoke the lower
>>layer's completion op, and then perform its own completion processing.
>>The only complication is that the upper layer's readpagecomplete would
>>still be passed in the lower layer's inode/page and would hence need to
>>devise some means of locating the corresponding upper layer inode/page
>>(and from that the original lower layer's completion op too).
>Ah, but that is too ugly to contemplate. No, modifying the inode's op
>table is completely out of the question, especially since it is
>(implementation-wise) shared across all the filesystem inodes. No way.
>A separate callback mechanism added to struct page is a lot cleaner.
It isn't as bad as you think, because it is a copy of the op table that
is modified and assigned to the inode concerned. There's more to it,
but I think going further into the details and the merits and demerits
of this kind of approach to stacking may divert us from the question
at hand right now.
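
For reference, a rough user-space sketch of the copied-op-table idea. The
types, the per-inode private_ops copy and the stacked_above back-pointer
are all assumptions chosen for illustration (the back-pointer is just one
possible "means of locating" the upper inode), not a description of any
existing stacking implementation.

```c
#include <assert.h>
#include <stddef.h>

struct sim_inode;

struct sim_ops {
    void (*readpagecomplete)(struct sim_inode *inode, int page_index);
};

struct sim_inode {
    const struct sim_ops *ops;        /* normally shared by all inodes of a fs */
    struct sim_ops private_ops;       /* per-inode copy, used once stacked upon */
    const struct sim_ops *saved_ops;  /* the lower layer's original table */
    struct sim_inode *stacked_above;  /* hypothetical back-pointer to upper inode */
    int completions;
};

/* The lower filesystem's own completion op. */
void lower_readpagecomplete(struct sim_inode *inode, int page_index)
{
    (void)page_index;
    inode->completions++;
}

/* The shared table; it is never written to. */
const struct sim_ops lower_fs_ops = { lower_readpagecomplete };

/* Installed by the upper layer; it is still handed the LOWER inode. */
void upper_readpagecomplete_wrapper(struct sim_inode *lower, int page_index)
{
    assert(lower->saved_ops != NULL && lower->stacked_above != NULL);
    lower->saved_ops->readpagecomplete(lower, page_index); /* lower work first */
    lower->stacked_above->completions++;                   /* then the upper's */
}

/* Stack 'upper' on 'lower' by modifying a copy of the op table. */
void sim_stack_on(struct sim_inode *upper, struct sim_inode *lower)
{
    assert(upper != NULL && lower != NULL && lower->ops != NULL);
    lower->saved_ops = lower->ops;
    lower->private_ops = *lower->ops;
    lower->private_ops.readpagecomplete = upper_readpagecomplete_wrapper;
    lower->ops = &lower->private_ops;
    lower->stacked_above = upper;
}
```

Only the inode that is stacked upon sees the replaced op; other inodes of
the same filesystem keep pointing at the shared table.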
The question I was looking into was whether we could have a generic
interface for enabling filesystem-specific formatting/modification/
post-processing of data read in via an asynchronous I/O mechanism (and
similarly some post-processing action on async write completion), without
each filesystem implementation having to do this explicitly on its
own (say, by using just block_read_full_page and then having its own
readpagecompletion routine do the non-standard work or, in the case of
a stackable filesystem, by using the underlying readpage to do most of the
work except for the post-processing needed).
We can think of it as an interface for re-entry into the filesystem
logic after page i/o is done.
So, it'd be nice if this were not restricted for use by stacked
filesystems and even the lowest level filesystem could use this
interface.
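
As a minimal sketch of such a re-entry interface: a generic read helper
does the I/O and then hands the page back to a filesystem-supplied hook.
The helper, the hook type and the XOR "decoding" are all hypothetical,
and the I/O is simulated synchronously with a memcpy; nothing stacked is
needed to use it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 16

/* Hypothetical stand-in for a page cache page. */
struct sim_page {
    unsigned char data[SIM_PAGE_SIZE];
    int uptodate;
};

/* Hook through which a filesystem regains control once page I/O is done. */
typedef int (*read_post_fn)(struct sim_page *page, void *fs_private);

/* Generic helper: performs the "I/O", then re-enters the filesystem. */
int sim_generic_read_page(struct sim_page *page, const unsigned char *disk,
                          read_post_fn post, void *fs_private)
{
    assert(page != NULL && disk != NULL);
    memcpy(page->data, disk, SIM_PAGE_SIZE); /* stands in for async I/O completing */
    if (post) {
        int err = post(page, fs_private);
        if (err)
            return err;                      /* page is not marked up to date */
    }
    page->uptodate = 1;
    return 0;
}

/* Example hook: a filesystem that keeps its data XOR-ed with a one-byte key. */
int xor_decode_post(struct sim_page *page, void *fs_private)
{
    unsigned char key = *(const unsigned char *)fs_private;
    size_t i;

    for (i = 0; i < SIM_PAGE_SIZE; i++)
        page->data[i] ^= key;
    return 0;
}
```

A filesystem with no special needs passes a NULL hook and gets the plain
generic behaviour.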
Secondly, where we store the func ptr is really dependent on the
granularity of such specialization that we want to support, and which
components are expected to require it. Putting it in the page structure
does provide more flexibility, since we can have different post-processing
routines for different pages of the same file. [A similar effect may
be attempted with some work even with a_ops by using a clone of the address
space for the page with some further manipulation, but this would be kind
of kludgy.]
Some thought needs to be given to how well this fits with the existing
address space ops scheme, though.
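
One way the two granularities might coexist, sketched in user-space C: a
per-page pointer, when set, overrides a file-wide default found through
the mapping's ops. The readpage_done fields and all type names are
assumptions for illustration only and do not exist in the current
structures.

```c
#include <assert.h>
#include <stddef.h>

struct sim_page;
typedef void (*post_fn)(struct sim_page *page);

struct sim_aops {
    post_fn readpage_done;          /* file-wide default */
};

struct sim_mapping {
    const struct sim_aops *a_ops;
};

struct sim_page {
    struct sim_mapping *mapping;
    post_fn readpage_done;          /* hypothetical per-page override, may be NULL */
    int tag;                        /* records which routine ran */
};

/* Prefer the per-page routine; fall back to the one in the mapping's ops. */
post_fn sim_resolve_done(const struct sim_page *page)
{
    assert(page != NULL);
    if (page->readpage_done)
        return page->readpage_done;
    if (page->mapping && page->mapping->a_ops)
        return page->mapping->a_ops->readpage_done;
    return NULL;
}

void sim_complete(struct sim_page *page)
{
    post_fn fn = sim_resolve_done(page);

    if (fn)
        fn(page);
}

void file_wide_done(struct sim_page *page)    { page->tag = 1; }
void special_page_done(struct sim_page *page) { page->tag = 2; }
```

Two pages of the same file can then complete through different routines
without cloning the address space.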