Re: [osol-discuss] Re: RFE: /etc/system tuneable to set the default pagesize

Eric Lowe Mon, 10 Apr 2006 07:13:24 -0700

Hi,

Roland Mainz wrote:

serious cycles on this though we'd like to, and the folks who were
involved in 64K simply don't have any interest in working on this in the open.


Why ? What do they fear ? Being swamped & overburdened with too many emails
or what ?


This is a community -- it's up to them whether they want to participate.

The VOP_GETPAGE() and VOP_PUTPAGE() interfaces were added in the page
cache unification which I believe (without looking) dates back to SunOS
4.x. The original vnode interface specification from srk was based on the
buffer cache.


Who or what is "srk" ?

Steve Kleiman. Looking at the notes at the end of the paper it says that
the architecture was designed by Bill Joy but (I'm told) Steve was the
original implementor of the vnode interfaces.

The major problem with these interfaces is that they expose PAGESIZE to
filesystems. UFS in particular is problematic because (as I've said before)
it assumes that PAGESIZE <= MAXBSIZE.


Yes, but in the x86 case we have PAGESIZE != MAXBSIZE so these locations
are likely well-known and we do not have to search anymore...

The filesystems appear to be prepared to handle multiple pages per block
but not multiple blocks per page.
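
To make the pages-per-block versus blocks-per-page distinction concrete, here is a minimal arithmetic sketch. The helper names and the 8K block size are assumptions chosen for illustration, not anything taken from the kernel sources.

```c
#include <stddef.h>

/* Illustrative value only: an assumed largest filesystem block size. */
#define EX_MAXBSIZE 8192u

/* Number of pages needed to cover one filesystem block (rounds up). */
unsigned pages_per_block(unsigned pagesize, unsigned bsize)
{
    return (bsize + pagesize - 1) / pagesize;
}

/* Number of whole filesystem blocks that fit inside a single page. */
unsigned blocks_per_page(unsigned pagesize, unsigned bsize)
{
    return pagesize / bsize;
}

/*
 * Nonzero when a page is no larger than a block, i.e. the
 * PAGESIZE <= MAXBSIZE case that the filesystem code is written for.
 */
int page_fits_in_block(unsigned pagesize, unsigned bsize)
{
    return pagesize <= bsize;
}
```

With a 4K page and an 8K block, pages_per_block() gives 2 and the assumption holds. With a 64K page, blocks_per_page() gives 8 and page_fits_in_block() returns 0, which is the case the filesystems are not prepared for.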

The right way to fix this is to refactor the VM/FS interfaces to remove
PAGESIZE from them. That work is getting underway now.

What about skipping UFS in the initial pass and only concentrate - as
Holger Berger suggested - on NFS booting ?


That sounds like a fine idea.

BTW: Are the ZFS/zfsboot people aware of the problem ? IMO at least the
person(s) working on zfsboot should be warned that relying on pagesize
for whatever reason may be bad...

ZFS doesn't have any of these issues; when Jeff Bonwick started ZFS he
knew more about the VM system than most of us did back then, and
specifically he realized how broken the VM/FS interfaces are long before
we did.

<weeds> One "for instance" is to go look at the ZFS implementation of
mmap() and compare it to UFS. UFS has all sorts of horrible deadlocks
between mmap() fault-driven file I/O and its other data paths due to the
page cache, whereas ZFS does not. ZFS doesn't exhibit similar problems
partly because it requires transactional interfaces -- from the time the
data is checksummed to when the transaction group is committed the data
buffers have to be immutable. Pulling this off with a unified page cache
is simply impractical, since every write would require a writers-lock plus
a global TLB shootdown to prevent subsequent writes. Modern architectures
are capable of copying the data in about 1/10th of the time it takes to
make a user-mapped buffer immutable on an MP. Logging UFS also has
transactional issues which have been band-aided over, leading to locking
hell and codepaths so twisted they make my head spin. VOP_GETPAGE() and
VOP_PUTPAGE() and their reliance on PAGESIZE lead to a complete lack of
transactional semantics which more and more modern filesystems are
starting to require. </weeds>
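
The "copy instead of making the buffer immutable" idea can be sketched in user-level C. This is only an illustration of the principle: every name here is hypothetical, and the checksum is a simple Fletcher-style running sum standing in for whatever a real filesystem would use.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative checksum (two running sums); not the ZFS implementation. */
uint64_t ex_checksum(const uint8_t *p, size_t n)
{
    uint64_t a = 0, b = 0;
    for (size_t i = 0; i < n; i++) {
        a += p[i];
        b += a;
    }
    return (b << 32) ^ a;
}

/* Hypothetical private buffer owned by one transaction. */
typedef struct ex_txbuf {
    uint8_t *data;   /* private copy of the caller's bytes */
    size_t len;      /* number of bytes captured */
    uint64_t cksum;  /* checksum taken at capture time */
} ex_txbuf_t;

/*
 * Copy the caller's bytes and checksum the copy. The caller's buffer
 * stays writable; only the private copy has to remain unchanged.
 * Returns 0 on success, -1 if the allocation fails.
 */
int ex_txbuf_capture(ex_txbuf_t *tb, const uint8_t *src, size_t len)
{
    tb->data = malloc(len ? len : 1);
    if (tb->data == NULL)
        return -1;
    memcpy(tb->data, src, len);
    tb->len = len;
    tb->cksum = ex_checksum(tb->data, len);
    return 0;
}

/* Nonzero if the private copy still matches its recorded checksum. */
int ex_txbuf_verify(const ex_txbuf_t *tb)
{
    return ex_checksum(tb->data, tb->len) == tb->cksum;
}

/* Free the private copy once the transaction is done with it. */
void ex_txbuf_release(ex_txbuf_t *tb)
{
    free(tb->data);
    tb->data = NULL;
    tb->len = 0;
}
```

After ex_txbuf_capture() the caller can keep scribbling on its own buffer and ex_txbuf_verify() still succeeds at commit time. No lock or TLB shootdown is needed to keep the checksummed data stable; the price is one memcpy().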

Since extensive file system changes are necessary one way or the other to
pull this off, others and I would prefer to take the tack (since we
have to go there eventually ANYWAY) of blowing up VOP_*PAGE(), and
introducing a new VOP_IO() interface which does not depend on PAGESIZE at
all. Instead of a page_t, it would use a base/bounds pair attached to a
different data structure which is associated with the I/O transaction.
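
Purely as a sketch of what "a base/bounds pair attached to a different data structure" might look like: none of these types or functions exist anywhere, and treating bounds as an exclusive end offset is my assumption for the example.

```c
#include <stdint.h>

/* Hypothetical: a byte range [base, bounds) within a file. */
typedef struct io_extent {
    uint64_t base;    /* first byte offset covered */
    uint64_t bounds;  /* one past the last byte covered (assumed) */
} io_extent_t;

/* Hypothetical per-transaction descriptor, used where a page_t is today. */
typedef struct io_txn {
    io_extent_t extent;  /* which bytes of the file this I/O covers */
    void *buf;           /* data buffer for the transfer */
    int is_write;        /* nonzero for a write, zero for a read */
} io_txn_t;

/* Length of the extent in bytes; no page size involved. */
uint64_t io_extent_len(const io_extent_t *e)
{
    return e->bounds - e->base;
}

/*
 * Number of filesystem blocks the extent touches, for a block size
 * chosen by the filesystem itself. bsize must be nonzero.
 */
uint64_t io_extent_blocks(const io_extent_t *e, uint64_t bsize)
{
    if (e->bounds <= e->base)
        return 0;
    return (e->bounds - 1) / bsize - e->base / bsize + 1;
}
```

The point of the shape is that the filesystem sees only byte ranges and its own block size. An extent of [8000, 8200) touches two 8K blocks whether the VM system underneath uses 4K or 64K pages.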


What about committing the original solution first (excluding UFS (e.g.
make UFS module unloadable for now in a 64k kernel) and concentrating on
a prototype which can only boot via NFS) ?

You're welcome to try to implement partial-page reads and writes in the
VOP layer if you want. Myself, I wouldn't waste my time on it.

As you can see from the discussion thread I think there is a lot more than
throwing the code over the wall,


Yes... but at least the hungry wolves can chew on it for some time until
complaints come back... :-)

:)

because it implicitly assumes that their
approach was the best approach to solving the problem or was even tractable.


Ok... I'll try to sync with Holger Berger then on how to proceed...


Sounds good.

- Eric

_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org

Re: [osol-discuss] Re: RFE: /etc/system tuneable to set the default pagesize
