Hi,

Roland Mainz wrote:
serious cycles on this though we'd like to, and the folks who were
involved in 64K simply don't have any interest in working on this in the open.

Why? What do they fear? Being swamped and overburdened with too many emails,
or what?

This is a community -- it's up to them whether they want to participate.

The VOP_GETPAGE() and VOP_PUTPAGE() interfaces were added in the page
cache unification which I believe (without looking) dates back to SunOS
4.x. The original vnode interface specification from srk was based on the
buffer cache.

Who or what is "srk" ?

Steve Kleiman. Looking at the notes at the end of the paper it says that the architecture was designed by Bill Joy but (I'm told) Steve was the original implementor of the vnode interfaces.

The major problem with these interfaces is that they expose PAGESIZE to
filesystems. UFS in particular is problematic because (as I've said before)
it assumes that PAGESIZE <= MAXBSIZE.

Yes, but in the x86 case we have PAGESIZE != MAXBSIZE so these locations
are likely well-known and we do not have to search anymore...

The filesystems appear to be prepared to handle multiple pages per block but not multiple blocks per page.
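To make the "pages per block" versus "blocks per page" distinction concrete, here is a minimal C sketch. EXAMPLE_MAXBSIZE and both helper functions are illustrative names made up for this example, not kernel interfaces, and 8192 is only an assumed value for MAXBSIZE.

```c
#include <assert.h>

/* Assumed value for illustration only; check sys/param.h on the real system. */
#define EXAMPLE_MAXBSIZE 8192u

/*
 * Number of pages needed to cover one maximum-size block.
 * Returns 0 when a page is larger than the block, i.e. the case
 * the filesystems are NOT prepared for.
 */
static unsigned
pages_per_block(unsigned pagesize)
{
	assert(pagesize != 0);
	return (pagesize <= EXAMPLE_MAXBSIZE ? EXAMPLE_MAXBSIZE / pagesize : 0);
}

/*
 * Number of maximum-size blocks contained in one page.
 * Returns 0 when the page fits inside a single block.
 */
static unsigned
blocks_per_page(unsigned pagesize)
{
	assert(pagesize != 0);
	return (pagesize > EXAMPLE_MAXBSIZE ? pagesize / EXAMPLE_MAXBSIZE : 0);
}
```

Under these assumptions a 4K page gives two pages per block, which is the case the code paths expect, while a 64K page gives eight blocks per page, which is the case they do not handle.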

The right way to fix this is to refactor the VM/FS interfaces to remove PAGESIZE from them. That work is getting underway now.

What about skipping UFS in the initial pass and only concentrate - as
Holger Berger suggested - on NFS booting ?

That sounds like a fine idea.

BTW: Are the ZFS/zfsboot people aware of the problem ? IMO at least the
person(s) working on zfsboot should be warned that relying on pagesize
for whatever reason may be bad...

ZFS doesn't have any of these issues; when Jeff Bonwick started ZFS he knew more about the VM system than most of us did back then, and specifically he realized how broken the VM/FS interfaces are long before we did.

<weeds> One "for instance" is to go look at the ZFS implementation of mmap() and compare it to UFS. UFS has all sorts of horrible deadlocks between mmap() fault-driven file I/O and its other data paths due to the page cache, whereas ZFS does not. ZFS doesn't exhibit similar problems partly because it requires transactional interfaces -- from the time the data is checksummed to the time the transaction group is committed, the data buffers have to be immutable. Pulling this off with a unified page cache is simply impractical, since every write would require a writer's lock plus a global TLB shootdown to prevent subsequent writes. Modern architectures are capable of copying the data in about 1/10th of the time it takes to make a user-mapped buffer immutable on an MP. Logging UFS also has transactional issues which have been band-aided over, leading to locking hell and codepaths so twisted they make my head spin. VOP_GETPAGE() and VOP_PUTPAGE(), and their reliance on PAGESIZE, lead to a complete lack of transactional semantics, which more and more modern filesystems are starting to require. </weeds>
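The copy-instead-of-lock idea can be sketched in user-level C as follows. This is not ZFS code: snap_t, snap_take(), snap_verify(), snap_free() and simple_sum() are hypothetical names, and the checksum is a toy running sum chosen only to keep the example short. The point is that once the data has been copied into a private buffer, the checksum stays valid no matter what the caller later does to its own buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical private snapshot of caller data plus its checksum. */
typedef struct snap {
	uint8_t *data;   /* private copy, never exposed to the caller */
	size_t len;      /* length of the copy in bytes */
	uint32_t cksum;  /* checksum computed over the private copy */
} snap_t;

/* Toy two-accumulator running sum; illustrative only. */
static uint32_t
simple_sum(const uint8_t *p, size_t n)
{
	uint32_t a = 0, b = 0;
	for (size_t i = 0; i < n; i++) {
		a += p[i];
		b += a;
	}
	return ((b << 16) | (a & 0xffffu));
}

/* Copy the caller's bytes, then checksum the copy. Returns 0 on success, -1 on failure. */
static int
snap_take(const void *src, size_t len, snap_t *out)
{
	if (src == NULL || out == NULL || len == 0)
		return (-1);
	out->data = malloc(len);
	if (out->data == NULL)
		return (-1);
	memcpy(out->data, src, len);
	out->len = len;
	out->cksum = simple_sum(out->data, len);
	return (0);
}

/* Nonzero if the private copy still matches its recorded checksum. */
static int
snap_verify(const snap_t *s)
{
	return (simple_sum(s->data, s->len) == s->cksum);
}

static void
snap_free(snap_t *s)
{
	free(s->data);
	s->data = NULL;
	s->len = 0;
}
```

A caller can keep scribbling on its own buffer after snap_take() returns; snap_verify() still succeeds because only the private copy was checksummed.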

Since extensive file system changes are necessary one way or the other to
pull this off, others and I would prefer to take the tack (since we
have to go there eventually ANYWAY) of blowing up VOP_*PAGE() and
introducing a new VOP_IO() interface which does not depend on PAGESIZE at
all. Instead of a page_t, it would use a base/bounds pair attached to a
different data structure which is associated with the I/O transaction.
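As a rough illustration of what a base/bounds description of an I/O might look like, here is a small C sketch. Nothing here is a real or proposed interface: io_range_t, io_txn_t and the helper functions are hypothetical names invented for this example, and the actual VOP_IO() design may look quite different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte range for one I/O; no page_t and no PAGESIZE involved. */
typedef struct io_range {
	uint64_t base;   /* first byte offset in the file */
	uint64_t bound;  /* one past the last byte (exclusive) */
} io_range_t;

/* Hypothetical descriptor tying a byte range to an I/O transaction. */
typedef struct io_txn {
	io_range_t range;  /* which bytes of the file are involved */
	void *buf;         /* data buffer owned by the transaction */
	int is_write;      /* nonzero for a write, zero for a read */
} io_txn_t;

/* Length of the range in bytes. */
static uint64_t
io_range_len(const io_range_t *r)
{
	assert(r->bound >= r->base);
	return (r->bound - r->base);
}

/* Number of filesystem blocks of size bsize that the range touches. */
static uint64_t
io_range_blocks(const io_range_t *r, uint64_t bsize)
{
	assert(bsize != 0);
	assert(r->bound >= r->base);
	if (r->bound == r->base)
		return (0);
	return ((r->bound - 1) / bsize - r->base / bsize + 1);
}
```

With a description like this, the filesystem decides how a byte range maps onto its own blocks, so the same request works whether the VM page is smaller or larger than the block size.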

What about committing the original solution first (excluding UFS (e.g.
make UFS module unloadable for now in a 64k kernel) and concentrating on
a prototype which can only boot via NFS) ?

You're welcome to try to implement partial-page reads and writes in the VOP layer if you want. Myself, I wouldn't waste my time on it.

As you can see from the discussion thread I think there is a lot more than
throwing the code over the wall,

Yes... but at least the hungry wolves can chew on it for some time until
complaints come back... :-)

:)

because it implicitly assumes that their
approach was the best approach to solving the problem or was even tractable.

OK... I'll try to sync with Holger Berger then on how to proceed...

Sounds good.

- Eric

_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
