Roland Mainz wrote:
Aside from the apps that broke, the next largest drawback seen with the 64K
prototype kernels was that when you have only 64K pages, if you touch 8K
in the middle of an mmap() region you end up writing back 64K to the
backing store. If you do this enough (which some apps do) you see a
performance degradation. The same factor comes into play with paging,
since you lose track of your working set.

Erm, no. You still have device buffers and max_*xfer* tuneables which
counteract this. Additionally, it may not be as bad as you think since [...]

The issue is that a 64K page_t has a "mod" bit of 64K granularity. Once I set the mod bit, I've lost the information about which part of the page is really modified, so I have to write back the whole 64K wad to the backing store.

Since some apps (stupidly) do a lot of writes with mmap(), this is a bad thing; as I recall, there were 20% or larger regressions on some file workloads due to this single issue.
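
To make the granularity point concrete, here's a minimal sketch (my own illustration, nothing from the prototype tree; the /tmp/scratch path is arbitrary and error checks are omitted). Dirty 8 bytes in the middle of a shared file mapping and flush: the kernel only tracks "modified" per page_t, so msync() pushes at least one whole page -- 8K today, the full 64K wad on a 64K-page kernel.

#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int
main(void)
{
	size_t len = 16 * (size_t)sysconf(_SC_PAGESIZE);
	int fd = open("/tmp/scratch", O_RDWR | O_CREAT, 0644);
	char *base;

	ftruncate(fd, (off_t)len);
	base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	memset(base + len / 2, 0xff, 8);	/* dirty just 8 bytes... */
	msync(base, len, MS_SYNC);		/* ...write back whole page(s) */

	munmap(base, len);
	close(fd);
	return (0);
}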

Don't get me wrong; if I had my way, we'd deprecate mmap() and push for as many apps as possible to migrate away from it. It is a nightmare for the VM system -- it exposes sysconf(_SC_PAGESIZE), and it throws away too much useful information. It forces us to run a scanner in the background to push the modified pages out to disk. It doesn't tell us which parts of a big page are really modified when a write is issued. It has horrible race conditions with the page cache, including one UFS bug that has been open, unresolved, for ten years. It's... my evil nemesis. :)

read() and write() are a much better model for doing any volume of file access, and they are much easier to make large-page friendly, since they carry a base and bounds instead of being wired to the granularity of the MMU translations mapping a segment...
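
For contrast, the same 8-byte update through pwrite() (again just a sketch of mine, against the same hypothetical /tmp/scratch file): the kernel sees an exact (offset, length) pair, so the amount of I/O is independent of whatever page size happens to back the page cache.

#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int
main(void)
{
	char buf[8];
	int fd = open("/tmp/scratch", O_RDWR);

	memset(buf, 0xff, sizeof (buf));
	pwrite(fd, buf, sizeof (buf), 64 * 1024);	/* base and bounds */
	fsync(fd);
	close(fd);
	return (0);
}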

And overall, the
performance gains weren't always stellar due to the reduction in page
coloring on systems with direct-mapped or 2-way associative external caches.

Which was on SF68k-SF15k and not Niagara. But I think SF68k-SF15k would
still benefit from such a change - if you measured a performance gain,
it's still worth working on - remember that the kernel and all
tuneables were optimized for 8K pages, not 64K pages. Further
fine-tuning could deliver huge improvements in performance (the 8K
kernel is simply maxed out at this point).

Yep yep yep.

Since the beginning I've been a champion of going 64K all the way on Niagara, and for sun4v across the board. Upcoming Niagara revs and later planned sun4v processors all appear to have hardware capabilities in their caches that address the coloring issues. The low thread:TLB entry ratio makes big pages a clear winner. If we can overcome (or ignore) the compatibility hurdles, it's a no-brainer on these machines.

As a result of the lessons learned from the 64K project, I believe the
ability to do 8K protection-granular mappings of files at the user level
is probably not a bad thing to keep in our pocket going forward, although
in fairness it is an undue constraint. I believe that if we depart from
the conventional wisdom of physical memory management in the kernel and
correctly decouple the physical and virtual page sizes in the VM system
(a problem I've been looking at for a while now -- it ain't easy),
this is not a difficult constraint to retain.
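
For the record, by "8K protection granularity" I mean the kind of thing below (a sketch only, assuming 8K-aligned offsets and a file at least 64K long): mprotect() over a single 8K slice of a file mapping. If the VM system has backed that segment with one 64K physical page, honoring the call today means demoting the translation back to 8K -- that's the constraint the decoupling work would have to preserve.

#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
	size_t len = 64 * 1024;
	int fd = open("/tmp/scratch", O_RDWR);
	char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);

	/* Write-protect one 8K slice of the (potentially 64K) mapping. */
	mprotect(base + 3 * 8192, 8192, PROT_READ);

	munmap(base, len);
	close(fd);
	return (0);
}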

Do you have time to explain what you are planning here?

Not right now, but the plan is to bring a lot of this out into the open.

The disadvantage of doing this is that it would bring back from the dead
the 32-bit/64-bit dual-testing scenarios that caused us nightmares before we
EOF'ed 32-bit SPARC kernel support.

Agreed. But for now only an OpenSolaris.org project has been proposed (with a
pretty big alliance behind the scenes) - to bring the 64K kernel project
back to life, fix it, make it bootable (via NFS or UFS), tune it and
then try to measure where its strengths and weaknesses lie.

May I suggest that you propose a 64K kernel project. You'll get a +1 from me, and you'll be off and running.

Neither I nor anyone else within Sun can promise to hurl the work done over the wall, as we still have our day jobs. But clearly there's interest, and there are some good leads to follow.

Yes, but MPSS inflicts serious pain elsewhere - and unfortunately it
doesn't solve many problems. Even with aggressive MPSS and kernel tuneables
(autoMPSS etc.) you still have so many 8K pages around that it hurts.
And the comparison "MPSS + kernel tuning" vs. "64K kernel" is a little bit
unfair, since the first option was heavily fine-tuned - and the second
option AFAIK was not.

Intentionally so.
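
Background for anyone following along: the reason even aggressive MPSS leaves so many 8K pages around is that large pages are opt-in, one mapping at a time - the app, or mpss.so.1 / ppgsz(1) on its behalf, advises each range through memcntl(2). A sketch of the call; advise_64k is just my name for the wrapper, and 64K must be one of the sizes getpagesizes(3C) reports on the platform.

#include <sys/types.h>
#include <sys/mman.h>

int
advise_64k(caddr_t addr, size_t len)
{
	struct memcntl_mha mha;

	mha.mha_cmd = MHA_MAPSIZE_VA;	/* preferred pagesize for a VA range */
	mha.mha_flags = 0;
	mha.mha_pagesize = 64 * 1024;
	return (memcntl(addr, len, MC_HAT_ADVISE, (caddr_t)&mha, 0, 0));
}

Everything not advised this way stays at 8K - which is exactly Roland's point.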

I think a fixed page size per architecture approach would
be better received.

I think the Linux people may disagree here. And there is always Niagara,
which could benefit from this far more than (auto)MPSS ever will be
able to.

What exactly is Linux doing here? It sounds like there is a lot of research and investigation that has been done, which can be leveraged.

- Eric