Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-10 Thread Masao Uebayashi
On Mon, Nov 08, 2010 at 08:53:12AM -0800, Matt Thomas wrote:
 
 On Nov 8, 2010, at 8:07 AM, Masao Uebayashi wrote:
 
  On Mon, Nov 08, 2010 at 10:48:45AM -0500, Thor Lancelot Simon wrote:
  On Mon, Nov 08, 2010 at 11:32:34PM +0900, Masao Uebayashi wrote:
  
  I don't like the "it's MD, period" attitude.  That solves nothing.
  
  We've had pmaps which have tried to pretend they were pmaps for some
  other architecture (that is, that some parts of the pmap weren't
  best left MD).  For example, we used to have a lot of pmaps in our
  tree that sort of treated the whole world like a 68K MMU.
  
  Performance has not been so great.  And besides, what -are- you going
  to do, in an MI way, about synchronization against hardware lookup?
  
  Do you mean synchronization among processors?
 
 No.  For instance, on PPC OEA processors the CPU will write back to
 the reverse page table entries to update the REF/MOD bits.  This
 requires the pmap to use the PPC equivalent of LL/SC to update PTEs.
 
 For normal page tables with hardware lookup, like ARM's, the MMU will 
 read the L1 page table to find the address of the L2 page tables 
 and then read the actual PTE.  All of this happens without any sort
 of locking, so updates need to be done in a lockless manner to keep
 a coherent view of the page tables.
 
 On a TLB-based MMU, the TLB miss handler will run without locking, 
 which requires an always-coherent page lookup (typically a page table)
 where entries (either PTEs or page table pointers) are updated
 using lockless primitives (CAS).  This is even more critical as we
 deal with more MP platforms where lookups on one CPU may be happening
 in parallel with updates on another.

So, in either design, we have to carefully update page tables by
atomic operations.
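The kind of lockless PTE update described above can be sketched in portable C11 atomics (a toy user-space model, not NetBSD pmap code; the PTE layout and all names are invented):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PTE: a hardware table walker (or another CPU) may read or update
 * it at any time, so every change must be a single atomic operation. */
typedef _Atomic uint64_t pte_t;

#define PTE_VALID 0x1ULL
#define PTE_MOD   0x2ULL	/* modified bit, set by "hardware" */

/* Install a translation only if the slot is still invalid - the way a
 * lockless fault handler must race other CPUs: compare-and-swap. */
static bool
pte_install(pte_t *pte, uint64_t pa)
{
	uint64_t expected = 0;		/* expect an invalid entry */
	return atomic_compare_exchange_strong(pte, &expected, pa | PTE_VALID);
}

/* Clear the modified bit without losing bits set concurrently. */
static uint64_t
pte_clear_mod(pte_t *pte)
{
	return atomic_fetch_and(pte, ~PTE_MOD);
}
```

A real pmap does the same with CAS or LL/SC on the architecture's actual PTE format; the point is that no lock protects the table against the hardware walker.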

But even with that done, the whole fault resolution can be done
in one shot in slow paths - like paging (I/O) or COW.  There are
consistency issues between VAs sharing one PA, or CPUs sharing one VA,
and we resolve these dirty details one by one.  My concern is more
about the order of those operations.

I think what's going wrong in fault handling is that UVM doesn't
pass enough information to pmap during fault handling; it calls
pmap_enter() with only a few clues.  Thus pmap has lots of problems
to solve at once.

I guess if UVM told pmap the right information at the right time, and
solved one thing at a time, pmap_enter() would become a pretty simple
operation - place the new PTE.  All the needed information is in
MI UVM structures.  Why not use them?

 
 This doesn't mean that the pmap can't be made more MI (for instance
 I have the mips and ppc85xx pmaps sharing a lot of code but still
 have MD bits to handle the various machine dependent bits).  But
 going completely MI is just not possible.
 


Re: XIP (Rev. 2)

2010-11-10 Thread Masao Uebayashi
On Tue, Nov 09, 2010 at 09:24:06PM +0200, Antti Kantee wrote:
 On Tue Nov 09 2010 at 12:47:11 -0600, David Young wrote:
  On Tue, Nov 09, 2010 at 04:31:22PM +0200, Antti Kantee wrote:
   A big problem with the XIP thread is that it is simply not palatable.
   It takes a lot of commitment just to read the thread, not to mention
   putting out sensible review comments like e.g. Chuq and Matt have done.
   The issue is complex and the code involved is even more so.  However,
   that is no excuse for a confusing presentation.  It seems like hardly
   anyone can follow what is going on, and usually that signals that the
   audience is not the root of the problem.
  
  If the conversation's leading participants adopt the rule that they may
  not introduce a new term (pager ops) or symbol (pgo_fault) to the
  discussion until a manual page describes it, then we will gain some
  useful kernel-internals documentation, and the conversation will be more
  accessible. :-)
 
 Those concepts are carefully documented, if nowhere else, at least in
 the uvm dissertation.  Basically a pager is involved in moving things
 between memory and whatever the va is backed with (swap, a file system,
 ubc, ...).  There's pgo_get which pages data from the backing storage
 to memory (*) and pgo_put which does the opposite.  Additionally there's
 pgo_fault which is like pgo_get except the interface allows the method
 a little more freedom in how it handles the operation.  ... but i don't
 know if that's a helpful explanation unless you are familiar with pagers,
 which is why it is very difficult to produce succinct documentation on
 the subject -- everyone learns to understand it a little differently.
 
 *) obviously in the case of XIP it is a matter of mapping instead
 of transferring
 
 But, the problem was not so much the use of terminology as it was the
 lack of any clear focus on the direction.  I can't form a clear mental
 image of the project, although admittedly I didn't even finish reading
 the earlier thread yet.
 
 Like gimpy said, the diff is a big piece to swallow since it's so full
 of unrelated parts:
 
 1) man pages
 2) new drivers
 3) vm
 4) vnode pager
 5) MD collateral

My XIP honors abstraction.  It has little impact on upper
layers (vnode, filesystem, etc.).

XIP is built on top of the physical segment layer (physical RAM pages
or device pages).  That is where things are heavily abstracted.  This
is what you call the unrelated changes, in uvm_page.c.

 
 Then again, it's missing pieces (what's pmap_common.c?  and isn't that
 a slight oxymoron ?)
 
 The diff would be much more browsable if it was separated into pieces
 and the man pages attached as rendered versions.  Although reading the
 diff is quicker than reading the previous thread ;)
 
 A radically different implementation at this stage seems feasible only
 if there is a strong reason for it, based on another actually existing
 implementation (in another OS, of course).
 
 Beauty issues aside, can we have a summary of the current implementation
 of XIP from a functional perspective, i.e. what works and what doesn't.
 That's what users care about ...

Read-only XIP memory disk works
Read-only FlashROM memory disk works
Write doesn't work

That's all.  This is documented in man pages.  I'll put rendered ones next.


re: XIP (Rev. 2)

2010-11-09 Thread matthew green

 I'll merge this in a few days.  I believe I've given enough reasoning
 to back this design in various places.

do not do this.

this code has currently seen review that was less than favourable
and you have not given much consideration to the flaws.  unless
this actual patch is given review and go-ahead from others, it
is not acceptable for you to merge this into -current.


.mrg.


Re: XIP (Rev. 2)

2010-11-09 Thread Masao Uebayashi
On Tue, Nov 09, 2010 at 07:28:34PM +1100, matthew green wrote:
 
  I'll merge this in a few days.  I believe I've given enough reasoning
  to back this design in various places.
 
 do not do this.
 
 this code has currently seen review that was less than favourable
 and you have not given much consideration to the flaws.  unless

What are the flaws?

 this actual patch is given review and go-ahead from others, it
 is not acceptable for you to merge this into -current.
 
 
 .mrg.

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: XIP (Rev. 2)

2010-11-09 Thread Masao Uebayashi
On Tue, Nov 09, 2010 at 10:41:23PM +0900, Izumi Tsutsui wrote:
  What are the flaws?
 
 You have not answered all questions and no one says go ahead.

I have answered all questions from Chuck.

Tsutsui-san doesn't need to understand XIP.

 ---
 Izumi Tsutsui

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: XIP (Rev. 2)

2010-11-09 Thread Matt Thomas

On Nov 9, 2010, at 12:32 AM, Masao Uebayashi wrote:

 On Tue, Nov 09, 2010 at 07:28:34PM +1100, matthew green wrote:
 
 I'll merge this in a few days.  I believe I've given enough reasoning
 to back this design in various places.
 
 do not do this.
 
 this code has currently seen review that was less than favourable
 and you have not given much consideration to the flaws.  unless
 
 What are the flaws?

I'm still trying to understand the code as I assume others are trying
to do.  That takes time.

 this actual patch is given review and go-ahead from others, it
 is not acceptable for you to merge this into -current.

I concur with mrg.  We are not ready to integrate this.



Re: XIP (Rev. 2)

2010-11-09 Thread Antti Kantee
A big problem with the XIP thread is that it is simply not palatable.
It takes a lot of commitment just to read the thread, not to mention
putting out sensible review comments like e.g. Chuq and Matt have done.
The issue is complex and the code involved is even more so.  However,
that is no excuse for a confusing presentation.  It seems like hardly
anyone can follow what is going on, and usually that signals that the
audience is not the root of the problem.

A while back chuq promised to send a mail classifying his points
into clear showstoppers and issues which can be handled post-merge.
Let's start with that list (hopefully we'll get it soon) and see what
exactly are the relevant issues remaining and solve *only* those issues.

What needs to stop is threading to other areas because $subsystem is
broken beyond repair.  We know, but let's just handle the problems
relevant to XIP for now.


Re: XIP (Rev. 2)

2010-11-09 Thread Masao Uebayashi
On Tue, Nov 09, 2010 at 03:18:37PM +, Eduardo Horvath wrote:
 On Tue, 9 Nov 2010, Masao Uebayashi wrote:
 
  On Tue, Nov 09, 2010 at 07:28:34PM +1100, matthew green wrote:
   
I'll merge this in a few days.  I believe I've given enough reasoning
to back this design in various places.
   
   do not do this.
   
   this code has currently seen review that was less than favourable
   and you have not given much consideration to the flaws.  unless
  
  What are the flaws?
 
 There are two issues I see with the design and I don't understand how 
 they are addressed:
 
 1) On machines where the cache is responsible for handling ECC, how do you 
 prevent a user from trying to mount a device XIP, causing a data error and 
 a system crash?

Sorry, I don't understand this situation...  How does this differ
from user mapped RAM pages with ECC?

 
 2) How will this work with mfs and memory disks where you really want to 
 use XIP always but the pages are standard, managed RAM?

This is a good question.  What you need to do is:

- Provide a block device interface (mount)

- Provide a vnode pager interface (page fault)

You'll allocate managed RAM pages in the memory disk driver and
keep them.  When a file is accessed, the fault handler asks the vnode
pager to give the relevant pages back to it.

My current code assumes the XIP backend is always a contiguous MMIO
device.  Both the physical pages and the metadata (vm_page) are
contiguous, so we can look up the matching vm_pages (genfs_getpages_xip).

If you want to use managed RAM pages, you need to manage a collection
of vm_pages, presented as a range.  This is exactly what uvm_object
is for.  I think it's natural that device drivers own a uvm_object and
return their pages to other subsystems, or loan pages to
other uvm_objects like vnodes.  The problem is that the current I/O
subsystem and UVM are not integrated very well.

So, the answer is, you can't do that now, but it's a known problem.

(Extending uvm_object and using it everywhere is the way to go.)
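The idea of a driver-owned object handing its pages back by offset can be modeled in a few lines (a toy sketch; xobj/xpage are invented names, not NetBSD's real uvm_object/vm_page API):

```c
#include <stddef.h>
#include <stdint.h>

/* Toy per-page metadata, standing in for struct vm_page. */
struct xpage {
	uintptr_t pa;		/* physical address of this page */
};

/* Toy driver-owned object: a contiguous range of pages it can lend out. */
struct xobj {
	struct xpage *pages;	/* contiguous per-page metadata */
	size_t npages;
	size_t pagesize;
};

/* What a pager "get" would do for a contiguous XIP segment: offset to
 * page metadata is a simple index, as in genfs_getpages_xip. */
static struct xpage *
xobj_get(struct xobj *obj, size_t off)
{
	size_t idx = off / obj->pagesize;
	if (idx >= obj->npages)
		return NULL;
	return &obj->pages[idx];
}
```

With managed RAM pages the lookup would consult a real page collection instead of indexing a contiguous array, but the owner of the object stays the driver.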


Re: XIP (Rev. 2)

2010-11-09 Thread Masao Uebayashi
On Tue, Nov 09, 2010 at 08:39:16AM -0800, Matt Thomas wrote:
 
 On Nov 8, 2010, at 11:25 PM, Masao Uebayashi wrote:
 
  http://uebayasi.dyndns.org/~uebayasi/tmp/uebayasi-xip-20101109.txt
 
 Besides the churn (and there is a lot of it), I think my fundamental
 problem with this incarnation of XIP is that it took a wrong approach.
 It has tried to fit itself under uvm_vnodeops and I think that's a
 fatal flaw by requiring invasive changes to contort to that decision.
 
 Instead, XIP should have its own pager ops uvm_xipops and vnodes should
 be set to use that in vnalloc/insmntque which is easily done since you 
 can just check for MNT_XIP in the passed mount point.
 
 xipops would have a pgo_fault routine to handle entering the corresponding
 PA for the faulting VA, using logic similar to the genfs_getpages_xip in
 your patch.
 
 This avoids the issue of vm_pages entirely, since pgo_fault will
 be calling pmap_enter directly with the correct paddr_t.  We will need
 to add a PMAP_CACHE flag as well.
 
 This leaves the issue of how to deal with copy-on-write (COW) page faults.
 I think the best way to deal with it is for pgo_fault to return a specific
 error (EROFS seems appropriate) and let UVM deal with it.  However UVM
 doesn't know where the source data exists, so we will need to add an
 int (*pgo_copy_page)(struct uvm_object *uobj, voff_t offset, paddr_t pa) 
 op to pagerops which copies one page of data from the uvm_object to the
 specified pa.  This would do what we would normally use pmap_copy_page
 for, but we can't, since we don't know the source pa and it's not a
 managed page.
 
 I think with this approach most of the churn goes away and there are
 minimal changes to the rest of NetBSD.

I understand having a separate pager would work too.  If you go
that route, you have to give up COW.  The two layered amap/uobj is
the fundamental design of UVM.

I'd also point out that pgo_fault() is prepared only for *special*
purposes.  My plan is to rewrite those backends to use pgo_get(),
then use a single pmap_enter().

Both your approach and mine are possible, and there are pros and cons.
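For reference, the pgo_copy_page operation Matt proposes could be modeled in user space like this (a toy sketch: the signature follows his mail, but the stand-in types and the direct use of pa as a pointer are inventions):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins; not the real NetBSD types. */
struct uvm_object {
	const uint8_t *backing;	/* XIP backing store (e.g. a ROM window) */
	size_t size;
};
typedef int64_t voff_t;
typedef uintptr_t paddr_t;

#define PAGE_SIZE 4096

/* Copy one page of data from the object's backing store to the RAM page
 * at "pa" - what a COW fault would need, since the source is not a
 * managed page and pmap_copy_page() cannot be used. */
static int
pgo_copy_page(struct uvm_object *uobj, voff_t off, paddr_t pa)
{
	if (off < 0 || (size_t)off + PAGE_SIZE > uobj->size)
		return EROFS;	/* out of range: nothing to copy */
	/* In a kernel, "pa" would be temporarily mapped first; in this
	 * toy model we treat it directly as a pointer. */
	memcpy((void *)pa, uobj->backing + off, PAGE_SIZE);
	return 0;
}
```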


Re: XIP (Rev. 2)

2010-11-09 Thread Matt Thomas

On Nov 9, 2010, at 9:06 AM, Masao Uebayashi wrote:

 On Tue, Nov 09, 2010 at 08:39:16AM -0800, Matt Thomas wrote:
 
 On Nov 8, 2010, at 11:25 PM, Masao Uebayashi wrote:
 
 http://uebayasi.dyndns.org/~uebayasi/tmp/uebayasi-xip-20101109.txt
 
 Besides the churn (and there is a lot of it), I think my fundamental
 problem with this incarnation of XIP is that it took a wrong approach.
 It has tried to fit itself under uvm_vnodeops and I think that's a
 fatal flaw by requiring invasive changes to contort to that decision.
 
 Instead, XIP should have its own pager ops uvm_xipops and vnodes should
 be set to use that in vnalloc/insmntque which is easily done since you 
 can just check for MNT_XIP in the passed mount point.
 
 I understand having a separate pager would work too.  If you go
 that route, you have to give up COW.  The two layered amap/uobj is
 the fundamental design of UVM.

You don't have to give up COW, you have to deal with it.  And those
changes will be far less pervasive than the changes you've had to make.

 I'd also point out that pgo_fault() is prepared only for *special*
 purposes.  My plan is to rewrite those backends to use pgo_get(),
 then use a single pmap_enter().

And XIP isn't special?  But I disagree that pgo_fault is only for
those purposes.  It could be used for more but that hasn't been
needed.

 Both your approach and mine are possible, and there are pros and cons.

That's true.  


Re: XIP (Rev. 2)

2010-11-09 Thread Eduardo Horvath
On Wed, 10 Nov 2010, Masao Uebayashi wrote:

 On Tue, Nov 09, 2010 at 03:18:37PM +, Eduardo Horvath wrote:

  There are two issues I see with the design and I don't understand how 
  they are addressed:
  
  1) On machines where the cache is responsible for handling ECC, how do you 
  prevent a user from trying to mount a device XIP, causing a data error and 
  a system crash?
 
 Sorry, I don't understand this situation...  How does this differ
 from user mapped RAM pages with ECC?

Ok, I'll try to explain the hardware.

In an ECC setup you have extra RAM bits to store the ECC data.  That data 
is generated when data is written to RAM and checked when it's read back 
from RAM.  This is usually done in the memory controller, so the extra data 
is not stored in the cache.  The ECC domain is RAM.

If your machine uses ECC in the cache, then the ECC information is 
generated and checked when the data is inserted into and removed from the 
cache.  The ECC domain is not RAM but the cache.  In this case, if you try 
to set the bit in the PTE to enable caching for an address that does not 
provide ECC bits, such as a FLASH PROM, then when the data enters the cache 
it has no ECC information and the cache generates a fault.

On these machines the cache can only be enabled for RAM.
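Eduardo's constraint amounts to a check at mapping time: a cached mapping is only legal for ECC-capable RAM.  A toy model (entirely invented code and address ranges; real pmaps encode this per-architecture):

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of "cache may only be enabled for RAM" on a machine whose
 * cache implements ECC.  The address range is invented. */
#define RAM_START	0x00000000u
#define RAM_END		0x10000000u	/* 256MB of ECC-capable RAM */

static bool
pa_is_ram(uint32_t pa)
{
	return pa >= RAM_START && pa < RAM_END;
}

/* Refuse a cached mapping of anything that cannot supply ECC bits
 * (e.g. flash PROM) - caching it would fault when lines are filled. */
static bool
cacheable_mapping_ok(uint32_t pa, bool want_cached)
{
	return !want_cached || pa_is_ram(pa);
}
```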


  2) How will this work with mfs and memory disks where you really want to 
  use XIP always but the pages are standard, managed RAM?
 
 This is a good question.  What you need to do is:
 
 - Provide a block device interface (mount)
 
 - Provide a vnode pager interface (page fault)
 
 You'll allocate managed RAM pages in the memory disk driver and
 keep them.  When a file is accessed, the fault handler asks the vnode
 pager to give the relevant pages back to it.
 
 My current code assumes the XIP backend is always a contiguous MMIO
 device.  Both the physical pages and the metadata (vm_page) are
 contiguous, so we can look up the matching vm_pages (genfs_getpages_xip).

 If you want to use managed RAM pages, you need to manage a collection
 of vm_pages, presented as a range.  This is exactly what uvm_object
 is for.  I think it's natural that device drivers own a uvm_object and
 return their pages to other subsystems, or loan pages to
 other uvm_objects like vnodes.  The problem is that the current I/O
 subsystem and UVM are not integrated very well.
 
 So, the answer is, you can't do that now, but it's a known problem.
 
 (Extending uvm_object and using it everywhere is the way to go.)

Hm.  Does this mean two separate XIP implementations are needed for I/O 
devices and managed RAM?

Eduardo


Re: XIP (Rev. 2)

2010-11-09 Thread Masao Uebayashi
I think your approach described here is something intermediate
between my generic one and a dedicated one - which we talked about
privately a long time ago - designing a dedicated ROM format,
hooking XIP execution into the exec handler, etc.

I chose the generic approach, because I wanted to:

- Avoid unexpected problems (I didn't know what I didn't know)
- Avoid creating a dedicated filesystem and its tools
- Honor abstraction
- Avoid code duplication
- Optimize later

So mine reuses all the existing code paths of the filesystem and vnode
pager (except for genfs_getpages_xip).

Pros:
- Fully COW'ed
- No code duplication
- Existing resources are reusable

Cons:
- Slow
- Inefficient (uses normal-sized TLB entries)
- Too generic; redefines some fundamental assumptions of Unix

I admit this is hard to understand.  But it's not solely my fault.
XIP is an odd concept by nature...

Masao

On Tue, Nov 09, 2010 at 09:19:15AM -0800, Matt Thomas wrote:
 
 On Nov 9, 2010, at 9:06 AM, Masao Uebayashi wrote:
 
  On Tue, Nov 09, 2010 at 08:39:16AM -0800, Matt Thomas wrote:
  
  On Nov 8, 2010, at 11:25 PM, Masao Uebayashi wrote:
  
http://uebayasi.dyndns.org/~uebayasi/tmp/uebayasi-xip-20101109.txt
  
  Besides the churn (and there is a lot of it), I think my fundamental
  problem with this incarnation of XIP is that it took a wrong approach.
  It has tried to fit itself under uvm_vnodeops and I think that's a
  fatal flaw by requiring invasive changes to contort to that decision.
  
  Instead, XIP should have its own pager ops uvm_xipops and vnodes should
  be set to use that in vnalloc/insmntque which is easily done since you 
  can just check for MNT_XIP in the passed mount point.
  
  I understand having a separate pager would work too.  If you go
  that route, you have to give up COW.  The two layered amap/uobj is
  the fundamental design of UVM.
 
 You don't have to give up COW, you have to deal with it.  And those
 changes will be far less pervasive than the changes you've had to make.
 
  I'd also point out that pgo_fault() is prepared only for *special*
  purposes.  My plan is to rewrite those backends to use pgo_get(),
  then use a single pmap_enter().
 
 And XIP isn't special?  But I disagree that pgo_fault is only for
 those purposes.  It could be used for more but that hasn't been
 needed.
 
  Both your approach and mine are possible, and there are pros and cons.
 
 That's true.  

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: XIP (Rev. 2)

2010-11-09 Thread David Young
On Tue, Nov 09, 2010 at 04:31:22PM +0200, Antti Kantee wrote:
 A big problem with the XIP thread is that it is simply not palatable.
 It takes a lot of commitment just to read the thread, not to mention
 putting out sensible review comments like e.g. Chuq and Matt have done.
 The issue is complex and the code involved is even more so.  However,
 that is no excuse for a confusing presentation.  It seems like hardly
 anyone can follow what is going on, and usually that signals that the
 audience is not the root of the problem.

If the conversation's leading participants adopt the rule that they may
not introduce a new term (pager ops) or symbol (pgo_fault) to the
discussion until a manual page describes it, then we will gain some
useful kernel-internals documentation, and the conversation will be more
accessible. :-)

Dave

-- 
David Young OJC Technologies
dyo...@ojctech.com  Urbana, IL * (217) 278-3933


Re: XIP (Rev. 2)

2010-11-09 Thread Antti Kantee
On Tue Nov 09 2010 at 12:47:11 -0600, David Young wrote:
 On Tue, Nov 09, 2010 at 04:31:22PM +0200, Antti Kantee wrote:
  A big problem with the XIP thread is that it is simply not palatable.
  It takes a lot of commitment just to read the thread, not to mention
  putting out sensible review comments like e.g. Chuq and Matt have done.
  The issue is complex and the code involved is even more so.  However,
  that is no excuse for a confusing presentation.  It seems like hardly
  anyone can follow what is going on, and usually that signals that the
  audience is not the root of the problem.
 
 If the conversation's leading participants adopt the rule that they may
 not introduce a new term (pager ops) or symbol (pgo_fault) to the
 discussion until a manual page describes it, then we will gain some
 useful kernel-internals documentation, and the conversation will be more
 accessible. :-)

Those concepts are carefully documented, if nowhere else, at least in
the uvm dissertation.  Basically a pager is involved in moving things
between memory and whatever the va is backed with (swap, a file system,
ubc, ...).  There's pgo_get which pages data from the backing storage
to memory (*) and pgo_put which does the opposite.  Additionally there's
pgo_fault which is like pgo_get except the interface allows the method
a little more freedom in how it handles the operation.  ... but i don't
know if that's a helpful explanation unless you are familiar with pagers,
which is why it is very difficult to produce succinct documentation on
the subject -- everyone learns to understand it a little differently.

*) obviously in the case of XIP it is a matter of mapping instead
of transferring
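The pager interface Antti describes can be summarized as a vector of operations (a simplified model; the real struct uvm_pagerops in sys/uvm/uvm_pager.h has more members and different prototypes):

```c
#include <stddef.h>

/* Simplified pager-operations vector.  The signatures are toy ones. */
struct toy_pagerops {
	/* page data in from the backing store (swap, fs, ubc, ...) */
	int (*pgo_get)(void *uobj, long off, void **pages, int npages);
	/* page data back out to the backing store */
	int (*pgo_put)(void *uobj, long off, void **pages, int npages);
	/* like pgo_get, but handles the whole fault itself - an XIP
	 * pager could map the backing store in place instead of copying */
	int (*pgo_fault)(void *ufi, long off, int npages);
};

/* A do-nothing pager for a read-only object: "gets" npages without
 * touching any backing store. */
static int
toy_get(void *uobj, long off, void **pages, int npages)
{
	(void)uobj; (void)off; (void)pages;
	return npages;
}

static const struct toy_pagerops toy_ops = {
	.pgo_get = toy_get,
	.pgo_put = NULL,	/* read-only: nothing to put */
	.pgo_fault = NULL,	/* no special fault handling */
};
```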

But, the problem was not so much the use of terminology as it was the
lack of any clear focus on the direction.  I can't form a clear mental
image of the project, although admittedly I didn't even finish reading
the earlier thread yet.

Like gimpy said, the diff is a big piece to swallow since it's so full
of unrelated parts:

1) man pages
2) new drivers
3) vm
4) vnode pager
5) MD collateral

Then again, it's missing pieces (what's pmap_common.c?  and isn't that
a slight oxymoron ?)

The diff would be much more browsable if it was separated into pieces
and the man pages attached as rendered versions.  Although reading the
diff is quicker than reading the previous thread ;)

A radically different implementation at this stage seems feasible only
if there is a strong reason for it, based on another actually existing
implementation (in another OS, of course).

Beauty issues aside, can we have a summary of the current implementation
of XIP from a functional perspective, i.e. what works and what doesn't.
That's what users care about ...


Ownership of uvm_object and vm_page (was Re: XIP)

2010-11-08 Thread Masao Uebayashi
On Sat, Oct 30, 2010 at 06:55:42PM -0700, Chuck Silvers wrote:
 On Wed, Oct 27, 2010 at 06:38:11PM +0900, Masao Uebayashi wrote:
  On Tue, Oct 26, 2010 at 02:06:38PM -0700, Chuck Silvers wrote:
   On Mon, Oct 25, 2010 at 02:09:43AM +0900, Masao Uebayashi wrote:
I think the uebayasi-xip branch is ready to be merged.
   
   hi,
   
   here's what I found looking at the current code in the branch:
   
   
- the biggest issue I had with the version that I reviewed earlier
  was that it muddled the notion of a managed page.  you wanted
  to create a new kind of partially-managed page for XIP devices
  which would not be managed by the UVM in the sense that it could
  contain different things depending on what was needed but only that
  the page's mappings would be tracked so that pmap_page_protect()
  could be called on it.  this is what led to all the pmap changes:
  the pmaps needed to be able to handle being called with a vm_page
  pointer that didn't actually point to a struct vm_page.
   
  it looks like you've gotten rid of that, which I like, but you've
  replaced it with allocating a full vm_page structure for every page in
  an XIP device, which seems like a waste of memory.  as we discussed
  earlier, I think it would be better to treat XIP pages as unmanaged
  and change a few other parts of UVM to avoid needing to track the
  mappings of XIP page mappings.  I have thoughts on how to do all that,
  which I'll list at the end of this mail.  however, if XIP devices
  are small enough that the memory overhead of treating device pages
  as managed is reasonable, then I'll go along with it.
  so how big do these XIP devices get?
  
  It's a waste of memory, yes.  With 64M of ROM on arm (4K pages, 80-byte
  vm_page), the array is 1.25M.  If vm_page were made a single pointer
  (sizeof(void *) == 4), the array size would become 64K.  Not a small
  difference.
  Typical XIP'ed products would be mobile devices with FlashROM, or small
  servers with memory disk (md or xmd).  About 16M~1G RAM/ROM?
  
  I went back to having vm_page to simplify the code.  We can reduce it
  to vm_page_md or whatever is minimal, once we figure out the
  new design of MI vm_page_md.
  
  either way, the changes to the various pmaps to handle the fake vm_page
  pointers aren't necessary anymore, so all those changes should be 
   reverted.
  
  Those mechanical vm_page -> vm_page_md changes done in pmaps have
  a valid point by themselves.  Passing a vm_page pointer across
  pmap functions is unnecessary.  I'd rather say wrong.  All pmap
  needs is vm_page_md.
  
  I'd propose to do this vm_page -> vm_page_md change in pmaps first
  in HEAD and sync the branch with it, rather than revert it.
 
 that seems a bit premature, don't you think?
 since you're already talking about redesigning it?
 
 ... apparently not, you already checked it in.
 
 
  it doesn't look like the PMAP_UNMANAGED flag that you added before
  is necessary anymore either, is that right?  if so, it should also
  be reverted.  if not, what's it for now?
  
  pmap_enter() passes paddr_t directly to pmap.  pmap has no clue whether
  the given paddr_t is to be cached or uncached.  So, a separate flag,
  PMAP_UNMANAGED.  Without it, a FlashROM which is registered as
  (potentially) managed (using bus_space_physload_device) is always
  mapped into user address space as cached, even if it's not XIP.  This
  made the userland flash-writer program not work.
  
  The point is that whether to cache or not is decided by virtual
  address.  Thus such information should be stored in the vm_map_entry
  explicitly, in theory.
  
  (This is related to framebuffers too.)
 
 there is already an MI mechanism for specifying cachability of mappings,
 see PMAP_NOCACHE and related flags.
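The mechanism Chuck points at passes cacheability through the flags argument of pmap_enter(); a toy model of the idea (the flag and PTE bit values here are invented, only the concept follows PMAP_NOCACHE):

```c
#include <stdint.h>

/* Invented values - NetBSD's real PMAP_NOCACHE is MI and is passed in
 * pmap_enter()'s flags argument; the PTE encoding is per-architecture. */
#define TOY_PMAP_NOCACHE	0x1u
#define TOY_PTE_CACHED		0x100u

/* Build a toy PTE: cached by default, uncached when the caller asked
 * for no caching (e.g. a flash window that must not be cached). */
static uint32_t
toy_pte_make(uint32_t pa, unsigned flags)
{
	uint32_t pte = pa;
	if (!(flags & TOY_PMAP_NOCACHE))
		pte |= TOY_PTE_CACHED;
	return pte;
}
```

This is why a separate PMAP_UNMANAGED flag is arguably redundant for cacheability: the caller already expresses the cached/uncached decision per mapping.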
 
 
- I mentioned before that I think the driver for testing this using
  normal memory to simulate an XIP device should just be the existing
  md driver, rather than a whole new driver whose only purpose
  would be testing the XIP code.  however, you wrote a separate
  xmd driver anyway.  so I'll say again: I think the xmd should be
  merged into the md driver.
  
  It has turned out that the new xmd(4) is not really md(4); it's not
  md(4) modified to support XIP
  but rather
  rom(4) emulated using RAM
  
  I plan to merge xmd(4) into rom(4) or flash(4) when it's ready.
  If you don't like xmd(4) to reserve dev namespace, I'd rather back
  out xmd(4) totally.
  
  If you don't like the name, suggest a better one. ;)
 
 I'll respond to this in the other thread you started about it.
 but creating a temporary device driver does seem pretty silly.
 
 
  you also have an xmd_machdep.c for various platforms, but nothing
  in that file is really machine-specific.  rather than use the
  machine-specific macros for converting between bytes and pages,
  it would be better to either use the MI macros (ptoa / 

Re: XIP

2010-11-08 Thread Masao Uebayashi
On Fri, Nov 05, 2010 at 05:44:55PM +, Mindaugas Rasiukevicius wrote:
 Masao Uebayashi uebay...@tombi.co.jp wrote:
Those mechanical vm_page -> vm_page_md changes done in pmaps have
a valid point by themselves.  Passing a vm_page pointer across
pmap functions is unnecessary.  I'd rather say wrong.  All pmap
needs is vm_page_md.

I'd propose to do this vm_page -> vm_page_md change in pmaps first
in HEAD and sync the branch with it, rather than revert it.
   
   that seems a bit premature, don't you think?
   since you're already talking about redesigning it?
   
   ... apparently not, you already checked it in.
  
  It's not premature.  It clarifies that passing struct vm_page * to
  pmap is completely unnecessary.  We'll need to move those MD PV data
  structures to MI anyway.
 
 There are ideas to move P-V tracking to MI, right.  However, you are
 starting to re-design the pmap abstraction here - that is really out of
 XIP project scope, and it definitely needs rmind-uvmplock merge first.
 
   is your desire to control whether the mount accesses the device via
   mappings, or just to be able to see whether or not the device is being
   accessed via mappings?
  
  Both.
  
  My understanding is that mount options change only its internal behavior.
  I don't see how MNT_LOG and MNT_XIP differ.  Mount is done only by
  admins who know internals.  There may be cases where an XIP device
  is much slower than RAM, and page cache is wanted.
  
  I also think that mount option is the only way to configure per-mount
  behavior.
 
 Mount option seems quite useful to me.  Speaking of them.. I would also
 like to add MNT_DIRECTIO. :)
 
   I've thought about loaning and XIP again quite a bit, but in the
   interest of actually sending this mail today, I'll leave this part for
   later too.
  
  I spent a little time for this...
  
  Now uvm_loan() returns to the loaner an array of pointers to struct
  vm_page.  Which means that those struct vm_page's are shared between
  loaner and loanee.  The owner of those pages is recorded in
  vm_page::uanon and vm_page::uobject.
  
  I'm thinking this design is broken.  At least A-A loan is impossible
  for no good reason.  I think the loanee should allocate (by
  kmem_alloc(9)) a new *alias* vm_page to keep the loaning state.  An
  alias vm_page has a pointer to its new owner, and a pointer to the
  *real* vm_page too.
  
  uanon/uobject is also bad in that it obscures the 1:1 relationship
  of pages and owners.  These two members should be merged into a single
  void *pg_owner.  vm_page::loan_count is only for loaner.  Loaners
  of read-only, wired pages don't need to remember anything.
 
 I do not think loaning is broken, but I see where you are coming from.
 Owner-less state i.e. when locker needs to resolve orphaned page brings
 some obscurity.  Also, O-A loaning where we do trylocks (since we cannot
 hold a reference on the object) is very messy.
 
 Your idea about allocating extra meta-data in order to do the loaning
 seems counter-productive, given the whole concept of loaning.

I believe I understand loaning *now*.  I thought loaning has
direction.  It's not; someone who writes to it is penalized.

I also understand why O->O doesn't exist.

 
 So again... I would say it is something to revisit after the XIP merge.

I'm only trying to show that my XIP work is not going in a totally wrong
direction in the big picture.  Unfortunately Chuck still seems unconvinced...


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-08 Thread Masao Uebayashi
On Fri, Nov 05, 2010 at 04:54:33PM +, Eduardo Horvath wrote:
 On Fri, 5 Nov 2010, Masao Uebayashi wrote:
 
  On Mon, Nov 01, 2010 at 03:55:01PM +, Eduardo Horvath wrote:
   On Mon, 1 Nov 2010, Masao Uebayashi wrote:
   
I think pmap_extract(9) is a bad API.

After the MD bootstrap code detects all physical memory, it hands
all the information to UVM, including the available KVA.  At this
point UVM knows all the available virtual and physical address
resources.  UVM is responsible for managing all of these.
   
   This is managed RAM.  What about I/O pages?
  
  To access MMIO device pages, you need a physical address.  Physical
  address space is a single, linear resource on all platforms.  I wonder
  why we can't manage it in an MI way.
 
 I suppose that depends on your definition of linear.  But that's beside 
 the point.
 
 I/O pages have no KVA until a mapping is done.  UVM knows nothing about 
 those mappings since they are managed solely by pmap.  I still don't see 
 how what you're proposing here will work.

UVM knows nothing about those mappings, because it is never told about them.

UVM knows about managed RAM pages, because it is told about them.

 
  
   
Calling pmap_extract(9) means that some kernel code asks pmap(9)
to look up a physical address.  pmap(9) is only responsible for
handling the CPU and MMU.  Using it as a lookup database is an abuse.
The only reasonable use of pmap_extract(9) is for debugging purposes.
I think that pmap_extract(9) should be changed to something like:

bool pmap_mapped_p(struct pmap *, vaddr_t);

and allowed to be used for KASSERT()s.

The only right way to retrieve the P->V translation is to look it up
from the vm_map (== the fault handler).  If we honour this principle,
the VM and I/O code will be much more consistent.
   
   pmap(9) has always needed a database to keep track of V-P mappings(*) as 
   well as P->V mappings so pmap_page_protect() can be implemented.
  
  pmap_extract() accesses the page table (per address space).
  pmap_page_protect() accesses the PV list (per page).  I think they're
  totally different...
 
 The purpose of pmap(9) is to manage MMU hardware.  Page tables are one 
 possible implementation of MMU hardware.  Not all machines have page 
 tables.  Some processors use reverse page tables.  Some just have TLBs.  
 And if you read section 5.13 of
 _The_Design_and_Implementation_of_the_4.4BSD_Operating_System_,
 it says that pmap is allowed to forget any mappings that are not wired.  
 So, in theory, all you need to do is keep a linked list of wired mappings 
 to insert in the TLB on fault and forget everything else.  Of course, that 
 doesn't seem to work so well with UVM.

Ancient designs haven't helped me so far.

 
 Anyway, please keep in mind that not all machines are PCs.  I'd really
 hate to see a repeat of the Linux VM subsystem, which directly manipulated
 x86 page tables even on architectures that don't have page tables, let
 alone anything compatible with x86.  pmap(9) is an abstraction layer for
 good reason.

Huh?  When did I say I like x86?

I was talking only about PV.  IIRC Linux didn't have PV before 2.6.


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-08 Thread Masao Uebayashi
On Fri, Nov 05, 2010 at 05:36:53PM +, Eduardo Horvath wrote:
 On Fri, 5 Nov 2010, Masao Uebayashi wrote:
 
  On Mon, Nov 01, 2010 at 03:52:11PM -0700, Matt Thomas wrote:
 
   Indeed.  Also consider that pmaps are designed to have
   fast V->P translations; using that instead of UVM makes a lot of
   sense.
  
  How does locking work?
  
  My understanding is that page tables (per-process) are protected by
  struct vm_map (per-process).  (Or we are moving toward that.)
 
 No, once again this is MD.  For instance, sparc64 uses compare-and-swap
 instructions to manipulate page tables for lockless synchronization.

I don't like the "it's MD, period" attitude.  That solves nothing.


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-08 Thread Masao Uebayashi
On Fri, Nov 05, 2010 at 10:04:46AM -0700, Matt Thomas wrote:
 
 On Nov 5, 2010, at 4:59 AM, Masao Uebayashi wrote:
 
  On Mon, Nov 01, 2010 at 03:52:11PM -0700, Matt Thomas wrote:
  
  On Nov 1, 2010, at 8:55 AM, Eduardo Horvath wrote:
  
  On Mon, 1 Nov 2010, Masao Uebayashi wrote:
  
  I think pmap_extract(9) is a bad API.
  
  After the MD bootstrap code detects all physical memory, it hands
  all the information to UVM, including the available KVA.  At this
  point UVM knows all the available virtual and physical address
  resources.  UVM is responsible for managing all of these.
  
  This is managed RAM.  What about I/O pages?
  
  Indeed.  Also consider that pmaps are designed to have
  fast V->P translations; using that instead of UVM makes a lot of
  sense.
  
  How does locking work?
  
  My understanding is that page tables (per-process) are protected by
  struct vm_map (per-process).  (Or we are moving toward that.)
 
 Unfortunately, that doesn't completely solve the problem, since
 lookups will be done either by exception handlers or by hardware,
 bypassing any locks.  This means that the page tables must be
 updated in an MP-safe manner.

I spent some time thinking about this.  I'm pretty sure I have a good
understanding of pmap vs. MP now.

I'll reply after doing a little more research.


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-08 Thread Izumi Tsutsui
  No, once again this is MD.  For instance sparc64 uses compare and swap 
  instructions to manipulate page tables for lockless synchronization.
 
 I don't like the "it's MD, period" attitude.  That solves nothing.

What do you want to solve, as yamt asked you first?

He said pmap_extract() could be used to get a PA from a VA.
You just answered that pmap_extract() was a bad API.
What were you trying to solve?

If the existing API can solve it without bad side effects,
I don't think it's so bad for your purpose, and
its design should be another discussion.

---
Izumi Tsutsui


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-08 Thread Eduardo Horvath
On Mon, 8 Nov 2010, Masao Uebayashi wrote:

 On Fri, Nov 05, 2010 at 05:36:53PM +, Eduardo Horvath wrote:
  On Fri, 5 Nov 2010, Masao Uebayashi wrote:
  
   On Mon, Nov 01, 2010 at 03:52:11PM -0700, Matt Thomas wrote:
  
Indeed.  Also consider that pmaps are designed to have
fast V->P translations; using that instead of UVM makes a lot of
sense.
   
   How does locking work?
   
   My understanding is that page tables (per-process) are protected by
   struct vm_map (per-process).  (Or we are moving toward that.)
  
  No, once again this is MD.  For instance, sparc64 uses compare-and-swap
  instructions to manipulate page tables for lockless synchronization.
 
 I don't like the "it's MD, period" attitude.  That solves nothing.

Yes it does.  If you have bleed-through between the different abstraction
layers, it makes implementing a pmap for a new processor much more
difficult, and it makes the code inefficient, since you end up implementing
a whole bunch of goo just to keep the side effects compatible.  You should
not be making any implicit assumptions beyond what is explicitly documented
in the interface descriptions; otherwise the code becomes unmaintainable
across the dozens of different processors and MMU architectures we're trying
to support.

Eduardo


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-08 Thread Thor Lancelot Simon
On Mon, Nov 08, 2010 at 11:32:34PM +0900, Masao Uebayashi wrote:
 
 I don't like the "it's MD, period" attitude.  That solves nothing.

We've had pmaps which have tried to pretend they were pmaps for some
other architecture (that is, that some parts of the pmap weren't
best left MD).  For example, we used to have a lot of pmaps in our
tree that sort of treated the whole world like a 68K MMU.

Performance has not been so great.  And besides, what -are- you going
to do, in an MI way, about synchronization against hardware lookup?

-- 
Thor Lancelot Simon  t...@rek.tjls.com

   If the World Wide Web were more than a pale shadow of what Usenet was,
   every single blog entry would be http://preview.tinyurl.com/34zahyx .


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-08 Thread Masao Uebayashi
On Mon, Nov 08, 2010 at 03:22:42PM +, Eduardo Horvath wrote:
 On Mon, 8 Nov 2010, Masao Uebayashi wrote:
 
  On Fri, Nov 05, 2010 at 05:36:53PM +, Eduardo Horvath wrote:
   On Fri, 5 Nov 2010, Masao Uebayashi wrote:
   
On Mon, Nov 01, 2010 at 03:52:11PM -0700, Matt Thomas wrote:
   
 Indeed.  Also consider that pmaps are designed to have
 fast V->P translations; using that instead of UVM makes a lot of
 sense.

How does locking work?

My understanding is that page tables (per-process) are protected by
struct vm_map (per-process).  (Or we are moving toward that.)
   
   No, once again this is MD.  For instance, sparc64 uses compare-and-swap
   instructions to manipulate page tables for lockless synchronization.
  
  I don't like the "it's MD, period" attitude.  That solves nothing.
 
 Yes it does.  If you have bleed-through between the different abstraction
 layers, it makes implementing a pmap for a new processor much more
 difficult, and it makes the code inefficient, since you end up implementing
 a whole bunch of goo just to keep the side effects compatible.  You should
 not be making any implicit assumptions beyond what is explicitly documented
 in the interface descriptions; otherwise the code becomes unmaintainable
 across the dozens of different processors and MMU architectures we're
 trying to support.

Most pmaps are already almost unmaintainable, IMO. ;)


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-08 Thread Masao Uebayashi
On Mon, Nov 08, 2010 at 10:48:45AM -0500, Thor Lancelot Simon wrote:
 On Mon, Nov 08, 2010 at 11:32:34PM +0900, Masao Uebayashi wrote:
  
  I don't like the "it's MD, period" attitude.  That solves nothing.
 
 We've had pmaps which have tried to pretend they were pmaps for some
 other architecture (that is, that some parts of the pmap weren't
 best left MD).  For example, we used to have a lot of pmaps in our
 tree that sort of treated the whole world like a 68K MMU.
 
 Performance has not been so great.  And besides, what -are- you going
 to do, in an MI way, about synchronization against hardware lookup?

Do you mean synchronization among processors?


Re: XIP

2010-11-05 Thread Masao Uebayashi
Hi,

On Tue, Nov 02, 2010 at 08:58:14AM +, Mindaugas Rasiukevicius wrote:
 Chuck Silvers c...@chuq.com wrote:
 - in getpages, rather than allocating a page of zeros for each file,
   it would be better to use the same page of zeroes that mem.c uses.
   if multiple subsystems need a page of zeroes, then I think that
   UVM should provide this functionality rather than duplicating
   that all over the place.  if you don't want to update all the
copies of mem.c, that's fine, but please put this in UVM instead of
genfs.
   
   I made it per-vnode because, if you share a single physical zeroed
   page, then when one vnode is putpage'd it invalidates, in the pmap,
   all the mappings of the zero page established through the other
   vnodes.  Doesn't this sound strange?
  
  I think you're misunderstanding something.  for operations like
  msync(), putpage is called on the object in the UVM map entry,
  which would never be the object owning the global page of zeroes.
  from the pagedaemon, putpage is called on a page's object,
  but if the page replacement policy decides to reclaim the page
  of zeroes then invalidating all the mappings is necessary.
  if what you describe would actually happen, then yes,
  I would find it strange.
 
 FYI: In rmind-uvmplock branch, all MD mem.c are replaced with a single MI
 driver, which provides zero page.  So that can be easily moved to UVM (and
 some day.. in the longer term, we should have a page for each NUMA node).

The difference is that mem.c and the others use their zero page from the
kernel; in XIP it's only for vnodes.  I'm not sure how much a per-CPU
zero page helps XIP.  Anyway, I think my per-vnode approach is not
terribly bad.  Let's improve this later.

BTW, can't you merge your mem.c changes first?

Masao

 
   Another reason is that by having a vm_page with a uvm_object as
   its parent, the fault handler doesn't need to know anything about the
   special zeroed page.
  
  the fault handler wouldn't need to know about the page of zeroes,
  why do you think it would?  it does need to be aware that the pages
  returned by getpages may not be owned by the object that getpages
  is called with, but it needs to handle that regardless, for layered
  file systems.
 
 Lock sharing among UVM objects (for layered file systems and tmpfs) should
 bring some simplifications here, I think.
 
 -- 
 Mindaugas

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-05 Thread Masao Uebayashi
On Mon, Nov 01, 2010 at 03:55:01PM +, Eduardo Horvath wrote:
 On Mon, 1 Nov 2010, Masao Uebayashi wrote:
 
  I think pmap_extract(9) is a bad API.
  
  After the MD bootstrap code detects all physical memory, it hands
  all the information to UVM, including the available KVA.  At this
  point UVM knows all the available virtual and physical address
  resources.  UVM is responsible for managing all of these.
 
 This is managed RAM.  What about I/O pages?

To access MMIO device pages, you need a physical address.  Physical
address space is a single, linear resource on all platforms.  I wonder
why we can't manage it in an MI way.

 
  Calling pmap_extract(9) means that some kernel code asks pmap(9)
  to look up a physical address.  pmap(9) is only responsible for
  handling the CPU and MMU.  Using it as a lookup database is an abuse.
  The only reasonable use of pmap_extract(9) is for debugging purposes.
  I think that pmap_extract(9) should be changed to something like:
  
  bool pmap_mapped_p(struct pmap *, vaddr_t);
  
  and allowed to be used for KASSERT()s.
  
  The only right way to retrieve the P->V translation is to look it up
  from the vm_map (== the fault handler).  If we honour this principle,
  the VM and I/O code will be much more consistent.
 
 pmap(9) has always needed a database to keep track of V-P mappings(*) as 
 well as P->V mappings so pmap_page_protect() can be implemented.

pmap_extract() accesses the page table (per address space).
pmap_page_protect() accesses the PV list (per page).  I think they're
totally different...

 
 Are you planning on moving the responsibility of tracking P-V mappings to 
 UVM?
 
 * While you can claim that keeping track of P->V mappings is the primary
 function of pmap(9) and a side effect of page tables, that posits that the
 machine in question uses page tables.  In a machine with a software-managed
 TLB you could implement pmap(9) by walking the UVM structures on a page
 fault and generating TLB entries from the vm_page structure.  This would
 reduce the amount of duplicate information maintained by the VM subsystems.
 However, UVM currently assumes pmap() remembers all forward and reverse
 mappings.  If pmap() forgets them, bad things happen.
 
 Eduardo

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-05 Thread Masao Uebayashi
On Mon, Nov 01, 2010 at 03:52:11PM -0700, Matt Thomas wrote:
 
 On Nov 1, 2010, at 8:55 AM, Eduardo Horvath wrote:
 
  On Mon, 1 Nov 2010, Masao Uebayashi wrote:
  
  I think pmap_extract(9) is a bad API.
  
  After the MD bootstrap code detects all physical memory, it hands
  all the information to UVM, including the available KVA.  At this
  point UVM knows all the available virtual and physical address
  resources.  UVM is responsible for managing all of these.
  
  This is managed RAM.  What about I/O pages?
 
 Indeed.  Also consider that pmaps are designed to have
 fast V->P translations; using that instead of UVM makes a lot of
 sense.

How does locking work?

My understanding is that page tables (per-process) are protected by
struct vm_map (per-process).  (Or we are moving toward that.)

Masao

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-05 Thread Eduardo Horvath
On Fri, 5 Nov 2010, Masao Uebayashi wrote:

 On Mon, Nov 01, 2010 at 03:55:01PM +, Eduardo Horvath wrote:
  On Mon, 1 Nov 2010, Masao Uebayashi wrote:
  
   I think pmap_extract(9) is a bad API.
   
   After the MD bootstrap code detects all physical memory, it hands
   all the information to UVM, including the available KVA.  At this
   point UVM knows all the available virtual and physical address
   resources.  UVM is responsible for managing all of these.
  
  This is managed RAM.  What about I/O pages?
 
 To access MMIO device pages, you need a physical address.  Physical
 address space is a single, linear resource on all platforms.  I wonder
 why we can't manage it in an MI way.

I suppose that depends on your definition of linear.  But that's beside 
the point.

I/O pages have no KVA until a mapping is done.  UVM knows nothing about 
those mappings since they are managed solely by pmap.  I still don't see 
how what you're proposing here will work.

 
  
   Calling pmap_extract(9) means that some kernel code asks pmap(9)
   to look up a physical address.  pmap(9) is only responsible for
   handling the CPU and MMU.  Using it as a lookup database is an abuse.
   The only reasonable use of pmap_extract(9) is for debugging purposes.
   I think that pmap_extract(9) should be changed to something like:
   
 bool pmap_mapped_p(struct pmap *, vaddr_t);
   
   and allowed to be used for KASSERT()s.
   
   The only right way to retrieve the P->V translation is to look it up
   from the vm_map (== the fault handler).  If we honour this principle,
   the VM and I/O code will be much more consistent.
  
  pmap(9) has always needed a database to keep track of V-P mappings(*) as 
  well as P->V mappings so pmap_page_protect() can be implemented.
 
 pmap_extract() accesses the page table (per address space).
 pmap_page_protect() accesses the PV list (per page).  I think they're
 totally different...

The purpose of pmap(9) is to manage MMU hardware.  Page tables are one 
possible implementation of MMU hardware.  Not all machines have page 
tables.  Some processors use reverse page tables.  Some just have TLBs.  
And if you read section 5.13 of
_The_Design_and_Implementation_of_the_4.4BSD_Operating_System_,
it says that pmap is allowed to forget any mappings that are not wired.  
So, in theory, all you need to do is keep a linked list of wired mappings 
to insert in the TLB on fault and forget everything else.  Of course, that 
doesn't seem to work so well with UVM.

Anyway, please keep in mind that not all machines are PCs.  I'd really
hate to see a repeat of the Linux VM subsystem, which directly manipulated
x86 page tables even on architectures that don't have page tables, let
alone anything compatible with x86.  pmap(9) is an abstraction layer for
good reason.

Eduardo


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-05 Thread Eduardo Horvath
On Fri, 5 Nov 2010, Masao Uebayashi wrote:

 On Mon, Nov 01, 2010 at 03:52:11PM -0700, Matt Thomas wrote:

  Indeed.  Also consider that pmaps are designed to have
  fast V->P translations; using that instead of UVM makes a lot of
  sense.
 
 How does locking work?
 
 My understanding is that page tables (per-process) are protected by
 struct vm_map (per-process).  (Or we are moving toward that.)

No, once again this is MD.  For instance, sparc64 uses compare-and-swap
instructions to manipulate page tables for lockless synchronization.

Eduardo


Re: XIP

2010-11-05 Thread Mindaugas Rasiukevicius
Masao Uebayashi uebay...@tombi.co.jp wrote:
   Those mechanical vm_page -> vm_page_md changes done in the pmaps have
   a valid point by themselves.  Passing a vm_page pointer around across
   pmap functions is unnecessary.  I'd rather say wrong.  All pmap
   needs is vm_page_md.
   
   I'd propose to do this vm_page -> vm_page_md change in the pmaps first
   in HEAD and sync the branch with it, rather than revert it.
  
  that seems a bit premature, don't you think?
  since you're already talking about redesigning it?
  
  ... apparently not, you already checked it in.
 
 It's not premature.  It clarifies that passing struct vm_page * to
 pmap is entirely unnecessary.  We'll need to move those MD PV data
 structures to MI anyway.

There are ideas to move P-V tracking to MI, right.  However, you are
starting to re-design the pmap abstraction here - that is really out of
XIP project scope, and it definitely needs rmind-uvmplock merge first.

  is your desire to control whether the mount accesses the device via
  mappings, or just to be able to see whether or not the device is being
  accessed via mappings?
 
 Both.
 
 My understanding is that mount options change only a mount's internal
 behavior; I don't see how MNT_LOG and MNT_XIP differ in that respect.
 Mounting is done only by admins who know the internals.  There may be
 cases where an XIP device is much slower than RAM, and the page cache
 is wanted.
 
 I also think that a mount option is the only way to configure per-mount
 behavior.

Mount options seem quite useful to me.  Speaking of which.. I would also
like to add MNT_DIRECTIO. :)

  I've thought about loaning and XIP again quite a bit, but in the
  interest of actually sending this mail today, I'll leave this part for
  later too.
 
 I spent a little time on this...
 
 Currently, uvm_loan() returns to the loaner an array of pointers to
 struct vm_page, which means that those struct vm_page's are shared
 between loaner and loanee.  The owner of each of those pages is recorded
 in vm_page::uanon and vm_page::uobject.
 
 I think this design is broken.  At the least, A->A loan is impossible
 for no good reason.  I think the loanee should allocate (by kmem_alloc(9))
 a new *alias* vm_page to keep the loaning state.  The alias vm_page has
 a pointer to its new owner, and a pointer to the *real* vm_page
 too.
 
 uanon/uobject is also bad in that it obscures the 1:1 relationship
 between pages and owners.  These two members should be merged into a
 single void *pg_owner.  vm_page::loan_count is only for the loaner.
 Loaners of read-only, wired pages don't need to remember anything.

I do not think loaning is broken, but I see where you are coming from.
The owner-less state, i.e. when the locker needs to resolve an orphaned
page, brings some obscurity.  Also, O->A loaning, where we do trylocks
(since we cannot hold a reference on the object), is very messy.

Your idea about allocating extra meta-data in order to do the loaning
seems counter-productive, given the whole concept of loaning.

So again... I would say it is something to revisit after the XIP merge.

-- 
Mindaugas


Re: AMAP_SHARED (was Re: XIP)

2010-11-05 Thread Mindaugas Rasiukevicius
y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
 iirc, you need minherit to create shared-cow mappings.
 (imo, the introduction of minherit was a mistake because it
 merely complicated vm.)

I think we should look for a way to remove it.

 
 YAMAMOTO Takashi

-- 
Mindaugas


Re: AMAP_SHARED (was Re: XIP)

2010-11-03 Thread Masao Uebayashi
Correction:

On Thu, Nov 04, 2010 at 01:36:15AM +0900, Masao Uebayashi wrote:
 I have found that XIP is very similar to AMAP_SHARED, at the point
 where XIP supports write operation.
 
 Imagine a FlashROM page is XIP'ed and shared as a vnode by processes.
 The user rewrites the firmware written onto the FlashROM.  To keep
 processes running, we need to copy the XIP pages into RAM (page
 cache), put those pages into the vnode, then notify processes to
 update their VA.  This is another kind of COW, done in another
   ^^
s/VA/PV/.  VA stays same.

 layer.
 
 Of course, we need PV tracking here.


Re: AMAP_SHARED (was Re: XIP)

2010-11-03 Thread Masao Uebayashi
On Tue, Nov 02, 2010 at 02:08:13PM +0100, Joerg Sonnenberger wrote:
 On Tue, Nov 02, 2010 at 01:35:10PM +0900, Masao Uebayashi wrote:
  If a small program has both .data and .bss, and if .data is small,
  I'd use .rodata and copy it to .bss explicitly, so that the resulting
  process allocates anons only for .bss instead of .data + .bss.
 
 Why don't you do the reverse and drop .bss explicitly by linker scripts?

Never thought of that.  I just like the read-only concept.

ISTR that some Linux XIP-optimized filesystem supports compression
of .data.  That'd be good, but it's not on my TODO list.

 
 Joerg

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: AMAP_SHARED (was Re: XIP)

2010-11-02 Thread YAMAMOTO Takashi
hi,

 A few more comments:
 
 - AMAP_SHARED itself is a fine concept; it's used by shared memory.
 
   sys/kern/sysv_shm.c:
   452         error = uvm_map(&vm->vm_map, &attach_va, size, uobj, 0, 0,
   453             UVM_MAPFLAG(prot, prot, UVM_INH_SHARE, UVM_ADV_RANDOM, flags));
 
   (Note UVM_INH_SHARE == MAP_INHERIT_SHARE).

sysv shm is backed by an aobj and does not involve amap.

iirc, you need minherit to create shared-cow mappings.
(imo, the introduction of minherit was a mistake because it
merely complicated vm.)

YAMAMOTO Takashi

 
   I guess MAP_INHERIT_SHARE was added because adding it was easy
   after shared amap was implemented for shared memory?
 
 - For highly tuned, XIP'ed systems, programs should be designed to
   avoid .data, because they're COW'ed to page cache sooner or later.


Re: AMAP_SHARED (was Re: XIP)

2010-11-02 Thread Masao Uebayashi
On Tue, Nov 02, 2010 at 06:31:05AM +, YAMAMOTO Takashi wrote:
 hi,
 
  A few more comments:
  
  - AMAP_SHARED itself is a fine concept; it's used by shared memory.
  
  sys/kern/sysv_shm.c:
  452         error = uvm_map(&vm->vm_map, &attach_va, size, uobj, 0, 0,
  453             UVM_MAPFLAG(prot, prot, UVM_INH_SHARE, UVM_ADV_RANDOM, flags));
  
(Note UVM_INH_SHARE == MAP_INHERIT_SHARE).
 
 sysv shm is backed by an aobj and does not involve amap.

Hmm, you're right.

 
 iirc, you need minherit to create shared-cow mappings.
 (imo, the introduction of minherit was a mistake because it
 merely complicated vm.)

Now I'm 100% fine with removing the AMAP_SHARED functionality.  Even
without it, UVM is very good at avoiding unnecessary memory copies etc.

Masao

 
 YAMAMOTO Takashi
 
  
I guess MAP_INHERIT_SHARE was added because adding it was easy
after shared amap was implemented for shared memory?
  
  - For highly tuned, XIP'ed systems, programs should be designed to
avoid .data, because they're COW'ed to page cache sooner or later.


Re: XIP

2010-11-02 Thread Mindaugas Rasiukevicius
Chuck Silvers c...@chuq.com wrote:
- in getpages, rather than allocating a page of zeros for each file,
  it would be better to use the same page of zeroes that mem.c uses.
  if multiple subsystems need a page of zeroes, then I think that
  UVM should provide this functionality rather than duplicating
  that all over the place.  if you don't want to update all the
   copies of mem.c, that's fine, but please put this in UVM instead of
   genfs.
  
  I made it per-vnode because, if you share a single physical zeroed
  page, then when one vnode is putpage'd it invalidates, in the pmap,
  all the mappings of the zero page established through the other
  vnodes.  Doesn't this sound strange?
 
 I think you're misunderstanding something.  for operations like
 msync(), putpage is called on the object in the UVM map entry,
 which would never be the object owning the global page of zeroes.
 from the pagedaemon, putpage is called on a page's object,
 but if the page replacement policy decides to reclaim the page
 of zeroes then invalidating all the mappings is necessary.
 if what you describe would actually happen, then yes,
 I would find it strange.

FYI: In rmind-uvmplock branch, all MD mem.c are replaced with a single MI
driver, which provides zero page.  So that can be easily moved to UVM (and
some day.. in the longer term, we should have a page for each NUMA node).

  Another reason is that by having a vm_page with a uvm_object as
  its parent, the fault handler doesn't need to know anything about the
  special zeroed page.
 
 the fault handler wouldn't need to know about the page of zeroes,
 why do you think it would?  it does need to be aware that the pages
 returned by getpages may not be owned by the object that getpages
 is called with, but it needs to handle that regardless, for layered
 file systems.

Lock sharing among UVM objects (for layered file systems and tmpfs) should
bring some simplifications here, I think.

-- 
Mindaugas


Re: AMAP_SHARED (was Re: XIP)

2010-11-02 Thread Joerg Sonnenberger
On Tue, Nov 02, 2010 at 01:35:10PM +0900, Masao Uebayashi wrote:
 If a small program has both .data and .bss, and if .data is small,
 I'd use .rodata and copy it to .bss explicitly, so that the resulting
 process allocates anons only for .bss instead of .data + .bss.

Why don't you do the reverse and drop .bss explicitly by linker scripts?

Joerg


Re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread Masao Uebayashi
On Mon, Nov 01, 2010 at 02:32:23AM +0900, Masao Uebayashi wrote:
 Hm.  I may be missing something seriously.  I think I've
 understood the following description you wrote, and I got hit by this
 in the very early development stage ... 1 year ago.  It was
 uvm_fault() -> uvmfault_promote() -> amap_add() -> pmap_page_protect().
 Since then I've assumed that a shared amap is a pretty common
 thing...  Now I realize it should not be, as you describe.  Worse,
 I can't reproduce that code path...
 
 Now I have to *really* understand how this works...

I still have no idea how I was hit by this.  Strange.

Considering how AMAP_SHARED is used, especially when it's backed by a
vnode, this is only for highly tuned server/client programs which
share initialized data (.data).  While such a use case may have
some value, I'd say this is a rare feature.

I append a test program.

 
 (I'll respond about this topic again later.)
 
 Masao
 
 On Tue, Oct 26, 2010 at 02:06:38PM -0700, Chuck Silvers wrote:
 (snip)
  now here's the explanation I promised for how to treat XIP pages
  as unmanaged instead of managed.
  
  first, some background for other people who don't know all this:
  the only reason that treating XIP pages as managed pages is
  relevant at all is because of the AMAP_SHARED flag in UVM,
  which allows anonymous memory to be shared between processes
  such that the changes made by one process are seen by the other.
  this impacts XIP pages (which are not anonymous) because a
  PROT_WRITE, MAP_PRIVATE mapping of an XIP vnode should point to
  the XIP pages as long as all access to the mapping is for reads,
  but when the mapping is written to then the XIP page should be
  copied to an anonymous page (the normal COW operation) but that
  new anonymous page should still be shared between all processes
  that are sharing the AMAP_SHARED mapping.  to force those other
  processes to take another page fault the next time they access
  their copy of the mapping (which we need to do so that they will
  start accessing the new anonymous page instead of the XIP page),
  we must invalidate all the other pmap entries for the XIP page,
  which we do by calling pmap_page_protect() on it.  the pmap layer
  tracks all the mappings of the page and thus it can find them all.
  
  there are several ways that the AMAP_SHARED functionality is used,
  and unfortunately they would need to changed in different ways to
  make this work for XIP pages without needing to track mappings:
  
   - uvm_io(), which copies data between the kernel or current process
 and an arbitrary other process address space.
 currently this works by sharing the other address space with
 the kernel via uvm_map_extract() and then just using uiomove()
 to transfer the data.  this could be done instead by using part
 of the uvm_fault() code to find the physical page in the other
 address space that we want to access and lock it (ie. set PG_BUSY),
 map the page into the kernel (perhaps using uvm_pager_mapin()),
 transfer the data, then unmap the page and unlock it.
  
   - uvm_mremap(), which resizes an existing mapping.
 this uses uvm_map_extract() internally, which uses AMAP_SHARED,
 but the mremap operation doesn't actually need the semantics of
  AMAP_SHARED, since mremap doesn't create any additional mappings
 as far as applications are concerned.  the usage of AMAP_SHARED
 is just a side-effect of the current implementation, which bends
 over backward to call a bunch of existing higher-level functions
 rather than doing something more direct (which would be simpler
 and more efficient as well).
  
   - MAP_INHERIT_SHARE, used to implement minherit().
 this is the one that is the most trouble, since it's what AMAP_SHARED
 was invented for.  however, it's also of least importance since
 some searching with google finds absolutely no evidence of
 any application actually using it, just lots of copies of
 the implementations and manpages for various operating systems.
  
 with that in mind, there are several ways this could be handled.
  (1) just drop support for minherit() entirely.
  (2) reject attempts to set MAP_INHERIT_SHARE on XIP mappings.
  (3) copy XIP pages into normal anon pages when setting
  MAP_INHERIT_SHARE.
  (4) copy XIP pages into normal vnode pages when setting
  MAP_INHERIT_SHARE.  this would mean that the getpages
  code would need to look in the normal page cache
  before using XIP pages.  I think this option would also
  need getpages to know about the inherit flag to
  correctly handle later faults on XIP mappings,
   and there are probably other subtle complications.
  
 of these choices, (2) sounds like the best compromise to me.
  
  
   this approach would also bring back some issues where our previous
   discussion went around in circles, such as callers of VOP_GETPAGES()
   wanting vm_page pointers but XIP pages not having them, but
   I'll leave that additional discussion for future email if necessary.

pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-01 Thread Masao Uebayashi
I think pmap_extract(9) is a bad API.

After the MD bootstrap code detects all physical memory, it gives
all the information to UVM, including the available KVA.  At this
point UVM knows all the available virtual/physical address
resources.  UVM is responsible for managing all of these.

Calling pmap_extract(9) means that some kernel code asks pmap(9)
to look up a physical address.  pmap(9) is only responsible for
handling the CPU and MMU.  Using it as a lookup database is an abuse.
The only reasonable use of pmap_extract(9) is for debugging purposes.
I think that pmap_extract(9) should be changed like:

bool pmap_mapped_p(struct pmap *, vaddr_t);

and allow it to be used for KASSERT()s.

The only right way to retrieve a P-V translation is to look it up
from the vm_map (== the fault handler).  If we honour this principle,
the VM and I/O code will be much more consistent.

Masao

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-01 Thread der Mouse
 The only right way to retrieve P-V translation is to lookup from
 vm_map (== the fault handler).

What about setting up DMA on machines whose DMA uses physical
addresses?  Or does the DMA code get an exception to this rule?

I also suspect debugging may well be a non-ignorable use case, though I
could also be wrong about that.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-01 Thread Eduardo Horvath
On Mon, 1 Nov 2010, Masao Uebayashi wrote:

 I think pmap_extract(9) is a bad API.
 
 After MD bootstrap code detects all physical memories, it gives
 all the information to UVM, including available KVA.  At this
 point UVM knows all the available resources of virtual/physical
 addresses.  UVM is responsible to manage all of these.

This is managed RAM.  What about I/O pages?

 Calling pmap_extract(9) means that some kernel code asks pmap(9)
 to look up a physical address.  pmap(9) is only responsible to
 handle CPU and MMU.  Using it as a lookup database is an abuse.
 The only reasonable use of pmap_extract(9) is for debugging purpose.
 I think that pmap_extract(9) should be changed like:
 
   bool pmap_mapped_p(struct pmap *, vaddr_t);
 
 and allow it to be used for KASSERT()s.
 
 The only right way to retrieve P-V translation is to lookup from
 vm_map (== the fault handler).  If we honour this principle, VM
 and I/O code will be much more consistent.

pmap(9) has always needed a database to keep track of V-P mappings(*) as 
well as P-V mappings so pmap_page_protect() can be implemented.

Are you planning on moving the responsibility of tracking P-V mappings to 
UVM?

* While you can claim that keeping track of P-V mappings is the primary 
function of pmap(9) and a side effect of page tables, that posits that the 
machine in question uses page tables.  On a machine with a software-managed 
TLB you could implement pmap(9) by walking the UVM structures on a page 
fault and generating TLB entries from the vm_page structure.  This would 
reduce the amount of duplicate information maintained by the VM subsystems.  
However, UVM currently assumes pmap() remembers all forward and reverse 
mappings.  If pmap() forgets them, bad things happen.

Eduardo


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-01 Thread Matt Thomas

On Nov 1, 2010, at 8:55 AM, Eduardo Horvath wrote:

 On Mon, 1 Nov 2010, Masao Uebayashi wrote:
 
 I think pmap_extract(9) is a bad API.
 
 After MD bootstrap code detects all physical memories, it gives
  all the information to UVM, including available KVA.  At this
 point UVM knows all the available resources of virtual/physical
 addresses.  UVM is responsible to manage all of these.
 
 This is managed RAM.  What about I/O pages?

Indeed.  Also consider that pmaps are designed to have fast V-P
translations; using them instead of UVM makes a lot of sense.


re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread matthew green

 - AMAP_SHARED itself is a fine concept; it's used by shared memory.
 
   sys/kern/sysv_shm.c:
    452 error = uvm_map(&vm->vm_map, &attach_va, size, uobj, 0, 0,
    453 UVM_MAPFLAG(prot, prot, UVM_INH_SHARE, UVM_ADV_RANDOM, flags));
 
   (Note UVM_INH_SHARE == MAP_INHERIT_SHARE).
 
   I guess MAP_INHERIT_SHARE was added because adding it was easy
   after shared amap was implemented for shared memory?

MAP_INHERIT_SHARE was originally MAP_INHERIT, and came from
machvm.

 - For highly tuned, XIP'ed systems, programs should be designed to
   avoid .data, because they're COW'ed to page cache sooner or later.

why is this a problem?

if the data is needed, and it will be written to, then these pages
will be allocated (COW'd) eventually, and the same space will be used.


.mrg.


Re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread Masao Uebayashi
On Tue, Nov 02, 2010 at 02:36:30PM +1100, matthew green wrote:
 
  - AMAP_SHARED itself is a fine concept; it's used by shared memory.
  
  sys/kern/sysv_shm.c:
   452 error = uvm_map(&vm->vm_map, &attach_va, size, uobj, 0, 0,
   453 UVM_MAPFLAG(prot, prot, UVM_INH_SHARE, UVM_ADV_RANDOM, flags));
  
(Note UVM_INH_SHARE == MAP_INHERIT_SHARE).
  
I guess MAP_INHERIT_SHARE was added because adding it was easy
after shared amap was implemented for shared memory?
 
 MAP_INHERIT_SHARE was originally MAP_INHERIT, and came from
 machvm.

This was a reply to:

http://mail-index.netbsd.org/tech-kern/2010/10/26/msg009085.html

 - MAP_INHERIT_SHARE, used to implement minherit().
   this is the one that is the most trouble, since it's what 
AMAP_SHARED
   was invented for.  however, it's also of least importance since

Even if MAP_INHERIT_SHARE preceded SYSV SHM, we need AMAP_SHARED anyway.

(I don't know if Mach had shared memory.)

 
  - For highly tuned, XIP'ed systems, programs should be designed to
avoid .data, because they're COW'ed to page cache sooner or later.
 
 why is this a problem?
 
 if the data is needed, and it will be written to, then these pages
 will be allocated (COW'd) eventually, and the same space will be used.

Not a problem, as in it works.

As already explained, we allocate PV for XIP segments, only for
vnode-backed AMAP_SHARED == shared .data.  Careful users may design
the whole system to not allocate PV at all, by giving up that
feature.  To help users' design decisions, I stated the obvious -
.data is XIP-unfriendly.

 
 
 .mrg.


re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread matthew green

   - For highly tuned, XIP'ed systems, programs should be designed to
 avoid .data, because they're COW'ed to page cache sooner or later.
  
  why is this a problem?
  
  if the data is needed, and it will be written to, then these pages
  will be allocated (COW'd) eventually, and the same space will be used.
 
 Not a problem, as in it works.
 
 As already explained, we allocate PV for XIP segments, only for
 vnode-backed AMAP_SHARED == shared .data.  Careful users may design
 the whole system to not allocate PV at all, by giving up that
 feature.  To help user's design decision, I stated the obvious -
 .data is XIP-unfriendly.

but why is it unfriendly?  i don't see why.  there's going to
be the same number of pages allocated for writeable data in
both cases, so the same amount of resources will be consumed.


.mrg.


Re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread Masao Uebayashi
On Tue, Nov 02, 2010 at 03:09:51PM +1100, matthew green wrote:
 
- For highly tuned, XIP'ed systems, programs should be designed to
  avoid .data, because they're COW'ed to page cache sooner or later.
   
   why is this a problem?
   
   if the data is needed, and it will be written to, then these pages
   will be allocated (COW'd) eventually, and the same space will be used.
  
  Not a problem, as in it works.
  
  As already explained, we allocate PV for XIP segments, only for
  vnode-backed AMAP_SHARED == shared .data.  Careful users may design
  the whole system to not allocate PV at all, by giving up that
  feature.  To help user's design decision, I stated the obvious -
  .data is XIP-unfriendly.
 
 but why is it unfriendly?  i don't see why.  there's going to
 be the same number of pages allocated for writeable data in
 both cases, so the same amount of resources will be consumed.

What do you mean by both cases here?

If a small program has both .data and .bss, and if .data is small,
I'd use .rodata and copy it to .bss explicitly, so that the resulting
process allocates only the .bss anon instead of .data + .bss.

 
 
 .mrg.

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread matthew green

 - For highly tuned, XIP'ed systems, programs should be designed to
   avoid .data, because they're COW'ed to page cache sooner or later.

why is this a problem?

if the data is needed, and it will be written to, then these pages
will be allocated (COW'd) eventually, and the same space will be used.
   
   Not a problem, as in it works.
   
   As already explained, we allocate PV for XIP segments, only for
   vnode-backed AMAP_SHARED == shared .data.  Careful users may design
   the whole system to not allocate PV at all, by giving up that
   feature.  To help user's design decision, I stated the obvious -
   .data is XIP-unfriendly.
  
  but why is it unfriendly?  i don't see why.  there's going to
  be the same number of pages allocated for writeable data in
  both cases, so the same amount of resources will be consumed.
 
 What do you mean by both cases here?

i mean moving stuff from .data to elsewhere, compared to the
normal method.

 If a small program has both .data and .bss, and if .data is small,
 I'd use .rodata and copy it to .bss explicitly, so that resulting
 process allocates only .bss anon instead .data + .bss.

why is this useful?  what's the saving?  maybe one page if
roundup(data + bss) is smaller than roundup(data) +
roundup(bss).

my point is that if a program needs data, whether it is from
the .data or .bss, the same amount of resources will be
consumed when pages are written to (or not.)  (possibly there
is a one-page saving...)


.mrg.


Re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread Masao Uebayashi
On Tue, Nov 02, 2010 at 03:47:55PM +1100, matthew green wrote:
 
  - For highly tuned, XIP'ed systems, programs should be designed to
avoid .data, because they're COW'ed to page cache sooner or later.
 
 why is this a problem?
 
 if the data is needed, and it will be written to, then these pages
 will be allocated (COW'd) eventually, and the same space will be used.

Not a problem, as in it works.

As already explained, we allocate PV for XIP segments, only for
vnode-backed AMAP_SHARED == shared .data.  Careful users may design
the whole system to not allocate PV at all, by giving up that
feature.  To help user's design decision, I stated the obvious -
.data is XIP-unfriendly.
   
   but why is it unfriendly?  i don't see why.  there's going to
   be the same number of pages allocated for writeable data in
   both cases, so the same amount of resources will be consumed.
  
  What do you mean by both cases here?
 
 i mean moving stuff from .data to elsewhere, compared to the
 normal method.
 
  If a small program has both .data and .bss, and if .data is small,
  I'd use .rodata and copy it to .bss explicitly, so that resulting
  process allocates only .bss anon instead .data + .bss.
 
 why is this useful?  what's the saving?  maybe one page if
 roundup(data + bss) is smaller than roundup(data) +
 roundup(bss).
 
 my point is that if a program needs data, whether it is from
 the .data or .bss, the same amount of resources will be
 consumed when pages are written to (or not.)  (possibly there
 is a one-page saving...)

You're right.

One page saving is a saving too.

 
 
 .mrg.


AMAP_SHARED (was Re: XIP)

2010-10-31 Thread Masao Uebayashi
Hm.  I may be seriously missing something.  I think I've understood
the following description you wrote, and I got hit by this in the
very early development stage ... 1 year ago.  It was
uvm_fault() -> uvmfault_promote() -> amap_add() -> pmap_page_protect().
Since then I've assumed that a shared amap is a pretty common
thing...  Now I realize it should not be, as you describe.  Worse,
I can't reproduce that code path...

Now I have to *really* understand how this works...

(I'll respond about this topic again later.)

Masao

On Tue, Oct 26, 2010 at 02:06:38PM -0700, Chuck Silvers wrote:
(snip)
 now here's the explanation I promised for how to treat XIP pages
 as unmanaged instead of managed.
 
 first, some background for other people who don't know all this:
 the only reason that treating XIP pages as managed pages is
 relevant at all is because of the AMAP_SHARED flag in UVM,
 which allows anonymous memory to be shared between processes
 such that the changes made by one process are seen by the other.
 this impacts XIP pages (which are not anonymous) because a
 PROT_WRITE, MAP_PRIVATE mapping of an XIP vnode should point to
 the XIP pages as long as all access to the mapping is for reads,
 but when the mapping is written to then the XIP page should be
 copied to an anonymous page (the normal COW operation) but that
 new anonymous page should still be shared between all processes
 that are sharing the AMAP_SHARED mapping.  to force those other
 processes to take another page fault the next time they access
 their copy of the mapping (which we need to do so that they will
 start accessing the new anonymous page instead of the XIP page),
 we must invalidate all the other pmap entries for the XIP page,
 which we do by calling pmap_page_protect() on it.  the pmap layer
 tracks all the mappings of the page and thus it can find them all.
 
 there are several ways that the AMAP_SHARED functionality is used,
 and unfortunately they would need to be changed in different ways to
 make this work for XIP pages without needing to track mappings:
 
  - uvm_io(), which copies data between the kernel or current process
and an arbitrary other process address space.
currently this works by sharing the other address space with
the kernel via uvm_map_extract() and then just using uiomove()
to transfer the data.  this could be done instead by using part
of the uvm_fault() code to find the physical page in the other
address space that we want to access and lock it (ie. set PG_BUSY),
map the page into the kernel (perhaps using uvm_pager_mapin()),
transfer the data, then unmap the page and unlock it.
 
  - uvm_mremap(), which resizes an existing mapping.
this uses uvm_map_extract() internally, which uses AMAP_SHARED,
but the mremap operation doesn't actually need the semantics of
AMAP_SHARED, since mremap doesn't create any additional mappings
as far as applications are concerned.  the usage of AMAP_SHARED
is just a side-effect of the current implementation, which bends
over backward to call a bunch of existing higher-level functions
rather than doing something more direct (which would be simpler
and more efficient as well).
 
  - MAP_INHERIT_SHARE, used to implement minherit().
this is the one that is the most trouble, since it's what AMAP_SHARED
was invented for.  however, it's also of least importance since
some searching with google finds absolutely no evidence of
any application actually using it, just lots of copies of
the implementations and manpages for various operating systems.
 
with that in mind, there are several ways this could be handled.
   (1) just drop support for minherit() entirely.
   (2) reject attempts to set MAP_INHERIT_SHARE on XIP mappings.
   (3) copy XIP pages into normal anon pages when setting
   MAP_INHERIT_SHARE.
   (4) copy XIP pages into normal vnode pages when setting
   MAP_INHERIT_SHARE.  this would mean that the getpages
   code would need to look in the normal page cache
   before using XIP pages.  I think this option would also
   need getpages to know about the inherit flag to
   correctly handle later faults on XIP mappings,
and there are probably other subtle complications.
 
of these choices, (2) sounds like the best compromise to me.
 
 
 this approach would also bring back some issues where our previous
 discussion went around in circles, such as callers of VOP_GETPAGES()
 wanting vm_page pointers but XIP pages not having them, but
 I'll leave that additional discussion for future email if necessary.
 
 I'm just going over all this now so that it's clear to everyone
 that this kind of approach is possible if the memory overhead of
 the full vm_page structures for XIP pages is deemed too high.
 
 
 -Chuck

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: XIP

2010-10-31 Thread Masao Uebayashi
Even if XIP doesn't need to track PV at all, you still need a way
to pass page identities from XIP vnode pager to fault handler.

You said you liked a single pointer for that.  I started going that
route, and gave up.  My current plan is (struct vm_physseg * +
off_t).  This will involve many changes, so I'll leave this for
now too.

Masao

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: xmd(4) (Re: XIP)

2010-10-31 Thread Masao Uebayashi
On Thu, Oct 28, 2010 at 07:59:17AM +, YAMAMOTO Takashi wrote:
 hi,
 
  On Thu, Oct 28, 2010 at 05:31:45AM +, YAMAMOTO Takashi wrote:
  hi,
  
   Here's the reason why I've written xmd_machdep.c:
   
   xmd(4) is a read-only RAM-based disk driver capable of XIP.  The
   main purpose is to test XIP functionality.  xmd(4) can be implemented
   on any platform that supports VM, in theory.  xmd(4) may also be
   useful for other cases where md(4) is used, but users want to save
   memory.  md(4) allocates memory for its storage, and copies pages
   from or to page cache.
   
   xmd(4) allocates a static, read-only array and provides it as a
   backing store.  When it's used as XIP, it registers the array as
   a physical device page segment.  From VM's POV, the registered
   region is seen like a ROM in a device connected over some buses.
   
   The procedure to register an array as a physical segment is somewhat
   strange.  The registered array resides in kernel's read-only data
   section.  Kernel already maps its static region (text, rodata,
   data, bss, ...) at boot time.  xmd(4) re-defines part of it as
   a physical device segment, like a ROM connected via another bus.
   
   As far as the backing store array resides in main memory, you'll
   end up with some way to convert kernel VA back to physical address.
   There is no alternative to achieve the goal in MI way, or xmd.c is
   sprinkled like mem.c.
  
  why can't you use pmap_extract?
  
  Because looking up a paddr_t doesn't help alone.
  
  The driver needs to allocate a physical segment object (struct
  vm_physseg) and per-page objects (struct vm_page), so that its
  region can be mapped to user address.  This is done by calling
  bus_space_physload_device() or xmd_machdep_physload(), which in
  turn call uvm_page_physload_device().
  
  This is what happens during a fault onto xmd:
  
  - User opens a cdev (/dev/XXX), then calls mmap() with its fd
  - User touch a mapped address
  - Fault is triggered, fault handler looks up user's map and map
entry
  - uvm_fault() -> udv_fault() -> cdev_mmap() -> xmd_mmap()
  - xmd_mmap() returns a paddr_t
  - udv_fault() enters the paddr_t to pmap_enter()
  - pmap_enter looks up a vm_physseg from a paddr_t
  - pmap_enter looks up a vm_page from a vm_physseg
  - pmap_enter looks up a vm_page_md from a vm_page
  :
  
  This is redundant.  The problem is we use paddr_t as a cookie
  to identify a page in a segment, overriding its original meaning,
  physical address.  What pmap_enter needs is an ID.  Looking up a
  physical address from an ID is easy.  The reverse is not.
  
  After these observations, I have concluded that any appearance of
  paddr_t in any MI code (sys/uvm, sys/kern, sys/dev) is a wrong
  approach.  I don't see how pmap_extract() helps this situation?
 
 because you seem saying that there is no MI way to
 convert kernel VA back to physical address, i suggested 
 pmap_extract.  i guess i don't understand your situation. :-)

I have come to think that the pmap_extract(9) API is entirely unnecessary.
See other mails for the details...

Masao

 
 YAMAMOTO Takashi
 
  
  Masao
  
  
  YAMAMOTO Takashi
  
   
   Masao
  
  -- 
  Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: XIP

2010-10-30 Thread Masao Uebayashi
Hi,

On Sat, Oct 30, 2010 at 06:55:42PM -0700, Chuck Silvers wrote:
 On Wed, Oct 27, 2010 at 06:38:11PM +0900, Masao Uebayashi wrote:
  On Tue, Oct 26, 2010 at 02:06:38PM -0700, Chuck Silvers wrote:
   On Mon, Oct 25, 2010 at 02:09:43AM +0900, Masao Uebayashi wrote:
I think the uebayasi-xip branch is ready to be merged.
   
   hi,
   
   here's what I found looking at the current code in the branch:
   
   
- the biggest issue I had with the version that I reviewed earlier
  was that it muddled the notion of a managed page.  you wanted
  to create a new kind of partially-managed page for XIP devices
  which would not be managed by the UVM in the sense that it could
  contain different things depending on what was needed but only that
  the page's mappings would be tracked so that pmap_page_protect()
  could be called on it.  this is what led to all the pmap changes
  the pmaps needed to be able to handle being called with a vm_page
  pointer that didn't actually point to a struct vm_page.
   
  it looks like you've gotten rid of that, which I like, but you've
  replaced it with allocating a full vm_page structure for every page in
  an XIP device, which seems like a waste of memory.  as we discussed
  earlier, I think it would be better to treat XIP pages as unmanaged
  and change a few other parts of UVM to avoid needing to track the
  mappings of XIP page mappings.  I have thoughts on how to do all that,
  which I'll list at the end of this mail.  however, if XIP devices
  are small enough that the memory overhead of treating device pages
  as managed is reasonable, then I'll go along with it.
  so how big do these XIP devices get?
  
  It's a waste of memory, yes.  With a 64M ROM on arm (4K pages, 80-byte
  vm_page), the array is 1.25M.  If vm_page were made a single pointer
  (sizeof(void *) == 4), the array size becomes 64K.  Not a small difference.
  Typical XIP'ed products would be mobile devices with FlashROM, or small
  servers with memory disk (md or xmd).  About 16M~1G RAM/ROM?
  
  I made it back to have vm_page to simplify code.  We can make it
  to vm_page_md or whatever minimal, once after we figure out the
  new design of MI vm_page_md.
  
  either way, the changes to the various pmaps to handle the fake vm_page
  pointers aren't necessary anymore, so all those changes should be 
   reverted.
  
  Those mechanical vm_page -> vm_page_md changes done in pmaps have
  a valid point by themselves.  Passing a vm_page pointer around across
  pmap functions is unnecessary.  I'd rather say wrong.  All pmap
  needs is vm_page_md.
  
  I'd propose to do this vm_page -> vm_page_md change in pmaps first
  in HEAD and sync the branch with it, rather than revert it.
 
 that seems a bit premature, don't you think?
 since you're already talking about redesigning it?
 
 ... apparently not, you already checked it in.

It's not premature.  It clarifies that passing struct vm_page * to
pmap is entirely unnecessary.  We'll need to move those MD PV data
structures to MI anyway.

This is what I'm having in my head:

struct vm_physseg {
:
paddr_t start, end;
:
struct vm_page *pgs;
:
struct vm_pv *pvs;
:
};

struct vm_pv {
/* == struct vm_page_md, or what's called PV head now */
};

struct vm_pv_entry {
/* == what's called PV entry now */
};

All pmaps are converted to use struct vm_pv array, instead of struct
vm_page::struct vm_page_md.

Here, the page identity is (struct vm_physseg *, off_t).  You can
retrieve the physical address of a given struct vm_page * or struct
vm_pv *, by looking up the matching struct vm_physseg from the
global list (as talked about killing vm_page::phys_addr).

Of course XIP-capable, read-only physical segments have only vm_pv[].
I think tracking PV there for shared amap is worth doing in the
generic XIP design.

pmaps are passed struct vm_pv *, not struct vm_page *.  Physical
address is looked up by calling a function.

 
 
  it doesn't look like the PMAP_UNMANAGED flag that you added before
  is necessary anymore either, is that right?  if so, it should also
  be reverted.  if not, what's it for now?
  
  pmap_enter() passes paddr_t directly to pmap.  pmap has no clue if
  the given paddr_t is to be cached/uncached.  So a separate flag,
  PMAP_UNMANAGED.  Without it, a FlashROM which is registered as
  (potentially) managed (using bus_space_physload_device) is always
  mapped to user address as cached, even if it's not XIP.  This made
  userland flash writer program not work.
  
  The point is whether to be cached or not is decided by virtual
  address.  Thus such an information should be stored in vm_map_entry
  explicitly, in theory.
  
  (This is related to framebuffers too.)
 
 there is already an MI mechanism for 

Re: xmd(4) (Re: XIP)

2010-10-28 Thread Masao Uebayashi
On Thu, Oct 28, 2010 at 05:31:45AM +, YAMAMOTO Takashi wrote:
 hi,
 
  Here's the reason why I've written xmd_machdep.c:
  
  xmd(4) is a read-only RAM-based disk driver capable of XIP.  The
  main purpose is to test XIP functionality.  xmd(4) can be implemented
  on any platform that supports VM, in theory.  xmd(4) may also be
  useful for other cases where md(4) is used, but users want to save
  memory.  md(4) allocates memory for its storage, and copies pages
  from or to page cache.
  
  xmd(4) allocates a static, read-only array and provides it as a
  backing store.  When it's used as XIP, it registers the array as
  a physical device page segment.  From VM's POV, the registered
  region is seen like a ROM in a device connected over some buses.
  
  The procedure to register an array as a physical segment is somewhat
  strange.  The registered array resides in kernel's read-only data
  section.  Kernel already maps its static region (text, rodata,
  data, bss, ...) at boot time.  xmd(4) re-defines part of it as
  a physical device segment, like a ROM connected via another bus.
  
  As far as the backing store array resides in main memory, you'll
  end up with some way to convert kernel VA back to physical address.
  There is no alternative to achieve the goal in MI way, or xmd.c is
  sprinkled like mem.c.
 
 why can't you use pmap_extract?

Because looking up a paddr_t alone doesn't help.

The driver needs to allocate a physical segment object (struct
vm_physseg) and per-page objects (struct vm_page), so that its
region can be mapped to user address.  This is done by calling
bus_space_physload_device() or xmd_machdep_physload(), which in
turn call uvm_page_physload_device().

This is what happens during a fault onto xmd:

- User opens a cdev (/dev/XXX), then calls mmap() with its fd
- User touches a mapped address
- Fault is triggered; the fault handler looks up the user's map and map
  entry
- uvm_fault() -> udv_fault() -> cdev_mmap() -> xmd_mmap()
- xmd_mmap() returns a paddr_t
- udv_fault() passes the paddr_t to pmap_enter()
- pmap_enter looks up a vm_physseg from the paddr_t
- pmap_enter looks up a vm_page from the vm_physseg
- pmap_enter looks up a vm_page_md from the vm_page
:

This is redundant.  The problem is we use paddr_t as a cookie
to identify a page in a segment, overriding its original meaning,
physical address.  What pmap_enter needs is an ID.  Looking up a
physical address from an ID is easy.  The reverse is not.

After these observations, I have concluded that any appearance of
paddr_t in any MI code (sys/uvm, sys/kern, sys/dev) is a wrong
approach.  I don't see how pmap_extract() helps this situation?

Masao

 
 YAMAMOTO Takashi
 
  
  Masao

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: xmd(4) (Re: XIP)

2010-10-27 Thread YAMAMOTO Takashi
hi,

 Here's the reason why I've written xmd_machdep.c:
 
 xmd(4) is a read-only RAM-based disk driver capable of XIP.  The
 main purpose is to test XIP functionality.  xmd(4) can be implemented
 on any platform that supports VM, in theory.  xmd(4) may also be
 useful for other cases where md(4) is used, but users want to save
 memory.  md(4) allocates memory for its storage, and copies pages
 from or to page cache.
 
 xmd(4) allocates a static, read-only array and provides it as a
 backing store.  When it's used as XIP, it registers the array as
 a physical device page segment.  From VM's POV, the registered
 region is seen like a ROM in a device connected over some buses.
 
 The procedure to register an array as a physical segment is somewhat
 strange.  The registered array resides in kernel's read-only data
 section.  Kernel already maps its static region (text, rodata,
 data, bss, ...) at boot time.  xmd(4) re-defines part of it as
 a physical device segment, like a ROM connected via another bus.
 
 As far as the backing store array resides in main memory, you'll
 end up with some way to convert kernel VA back to physical address.
 There is no alternative to achieve the goal in MI way, or xmd.c is
 sprinkled like mem.c.

why can't you use pmap_extract?

YAMAMOTO Takashi

 
 Masao


Re: XIP

2010-10-26 Thread Alan Barrett
On Mon, 25 Oct 2010, Masao Uebayashi wrote:
 I think the uebayasi-xip branch is ready to be merged.
 
 This branch implements a preliminary support of eXecute-In-Place;
 execute programs directly from memory-mappable devices without
 copying files into RAM.  This benefits mainly resource restricted
 embedded systems to save RAM consumption.

Would memory disks (such as md(4)) also benefit from XIP, or do they
already do something to avoid having multiple copies of the same data?

--apb (Alan Barrett)



Re: XIP

2010-10-26 Thread Alan Barrett
On Tue, 26 Oct 2010, Alan Barrett wrote:
 Would memory disks (such as md(4)) also benefit from XIP, or do they
 already do something to avoid having multiple copies of the same data?

Never mind.  I see you discuss this in section 11.6 of the paper.

--apb (Alan Barrett)


Re: XIP

2010-10-26 Thread Masao Uebayashi
   http://uebayasi.dyndns.org/~uebayasi/tmp/bsdcon-2010-xip.pdf

http://uebayasi.dyndns.org/~uebayasi/tmp/bsdcan-2010-xip.pdf
 ^^


Re: XIP

2010-10-26 Thread Chuck Silvers
On Mon, Oct 25, 2010 at 02:09:43AM +0900, Masao Uebayashi wrote:
 I think the uebayasi-xip branch is ready to be merged.

hi,

here's what I found looking at the current code in the branch:


 - the biggest issue I had with the version that I reviewed earlier
    was that it muddled the notion of a managed page.  you wanted
    to create a new kind of partially-managed page for XIP devices:
    one that would not be managed by UVM in the sense that it could
    contain different things depending on what was needed, but whose
    mappings would still be tracked so that pmap_page_protect()
    could be called on it.  this is what led to all the pmap changes:
    the pmaps needed to be able to handle being called with a vm_page
    pointer that didn't actually point to a struct vm_page.

   it looks like you've gotten rid of that, which I like, but you've
   replaced it with allocating a full vm_page structure for every page in
   an XIP device, which seems like a waste of memory.  as we discussed
   earlier, I think it would be better to treat XIP pages as unmanaged
    and change a few other parts of UVM to avoid needing to track
    the mappings of XIP pages.  I have thoughts on how to do all that,
   which I'll list at the end of this mail.  however, if XIP devices
   are small enough that the memory overhead of treating device pages
   as managed is reasonable, then I'll go along with it.
   so how big do these XIP devices get?

   either way, the changes to the various pmaps to handle the fake vm_page
   pointers aren't necessary anymore, so all those changes should be reverted.

   it doesn't look like the PMAP_UNMANAGED flag that you added before
   is necessary anymore either, is that right?  if so, it should also
   be reverted.  if not, what's it for now?

   if we do keep the vm_page structures for device pages, I think
   it would be better to put them in the same vm_physseg array as
   the normal memory rather than having a parallel set of functions
   and data structures for device pages.  device pages would just be
   loaded with avail_start and avail_end indicating that none of the pages
   should be added to the freelist.  is there any other way that
   device pages need to be treated differently from normal pages
   as far as their physseg information is concerned?

 - I mentioned before that I think the driver for testing this using
   normal memory to simulate an XIP device should just be the existing
   md driver, rather than a whole new driver whose only purpose
   would be testing the XIP code.  however, you wrote a separate
   xmd driver anyway.  so I'll say again: I think the xmd should be
   merged into the md driver.

   you also have an xmd_machdep.c for various platforms, but nothing
   in that file is really machine-specific.  rather than use the
   machine-specific macros for converting between bytes and pages,
   it would be better to either use the MI macros (ptoa / atop)
   or just shift by PAGE_SHIFT, so that there doesn't need to be
   extra code written for each platform.

 - any function that might be called from a kernel module should always exist,
   regardless of which kernel options are enabled.  the particular one
   that I noticed is uvm_page_physload_device().  in general, it seems to me
    that if this were better integrated then it wouldn't be worthwhile having
   an XIP kernel option at all, since there would be very little code to omit.

 - trivial amounts of code in kernel modules shouldn't be compile-time
   conditional either, so that there don't need to be separate binaries
   for the module with and without the option.  the various XIP bits in xmd.c
   fall into this category.

 - as we discussed before, the addition of bus_space_physload() and related
   interfaces seems very strange to me, especially since the two implementations
   you've added both do exactly the same thing (call bus_space_mmap() on the
   start and end of the address range, and then call uvm_page_physload()
   with the resulting paddr_t values).  is there any reason this can't be
   a MI function?   also, the non-device versions of these are unused.

 - in many files the only change you made was to replace the include of
   uvm/uvm_extern.h with uvm/uvm.h, what's the reason for that?

 - why is the majors entry for the flash driver in the MD majors files?
   isn't this an MI driver?

 - in flash.c, most of the body of flash_init() is ifdef'd out,
   what's up with that?

 - we talked before about removing the xip mount option and enabling
   this functionality automatically when it's available, which you did
   but then recently changed back.  so I'll ask again,
   why do we need a mount option?

 - as we've discussed before, I think that the XIP genfs_getpages()
   should be merged with the existing one before you merge this into
   -current.  merging it as-is would make a mess, and it's better to
    avoid creating messes than to promise to clean them up later.
   we've 

Re: XIP

2010-10-25 Thread Izumi Tsutsui
uebay...@tombi.co.jp wrote:

 I think the uebayasi-xip branch is ready to be merged.
 
 This branch implements preliminary support for eXecute-In-Place:
 executing programs directly from memory-mappable devices without
 copying files into RAM.  This mainly benefits resource-restricted
 embedded systems by reducing RAM consumption.
 
 My approach to achieve XIP is to implement a generic XIP vnode
 pager which is neutral to underlying filesystem formats.  The result
 is a minimal code impact + sharing the generic fault handler.

Probably it's better to post a more verbose summary that describes:
- which MI kernel structures/APIs are modified or added
- which sources are mainly affected
- how many changes are/were required in MD pmap or each file system etc.
- which ports / devices are actually tested
- related man pages in the branch
- benchmark results
etc.

 I asked Chuck Silvers to review the branch.  I believe I've addressed
 most of his questions except a few ones.

It's also better to post all his questions and your answers
so that others can also see what's going on.

---
Izumi Tsutsui


Re: XIP

2010-10-25 Thread Masao Uebayashi
Attachments forgotten in the previous mail.
--- dest.pdk/root/vmstat-s.ro.12010-10-26 02:51:27.0 +0900
+++ dest.pdk/root/vmstat-s.ro+xip.12010-10-26 02:47:23.0 +0900
@@ -1,8 +1,8 @@
  4096 bytes per page
 1 page color
 63479 pages managed
-59826 pages free
- 2572 pages active
+61961 pages free
+  435 pages active
 0 pages inactive
 0 pages paging
 0 pages wired
@@ -10,7 +10,7 @@
 1 reserve pagedaemon pages
 5 reserve kernel pages
98 anonymous pages
- 2184 cached file pages
+   47 cached file pages
   290 cached executable pages
   256 minimum free pages
   341 target free pages
@@ -20,10 +20,10 @@
 0 swap pages in use
 0 swap allocations
  2355 total faults taken
- 2811 traps
+ 2760 traps
 0 device interrupts
- 2773 CPU context switches
- 1426 software interrupts
+ 3196 CPU context switches
+ 1592 software interrupts
  1275 system calls
 0 pagein requests
 0 pageout requests
@@ -35,9 +35,9 @@
 0 pagealloc zero wanted and avail
   309 pagealloc zero wanted and not avail
 0 aborts of idle page zeroing
- 3895 pagealloc desired color avail
+ 1760 pagealloc desired color avail
 0 pagealloc desired color not avail
- 3895 pagealloc local cpu avail
+ 1760 pagealloc local cpu avail
 0 pagealloc local cpu not avail
 0 faults with no memory
 0 faults with no anons
--- ksh-nfs.txt 2010-10-26 14:19:02.0 +0900
+++ ksh-xip.txt 2010-10-26 14:18:50.0 +0900
@@ -1,16 +1,16 @@
-# env ENV= time -l /bin/ksh -c :
-0.08 real 0.00 user 0.00 sys
+# env ENV= time -l /mnt/ksh -c :
+0.03 real 0.00 user 0.02 sys
  0  maximum resident set size
  0  average shared memory size
  0  average unshared data size
  0  average unshared stack size
 90  page reclaims
-19  page faults
+ 0  page faults
  0  swaps
- 0  block input operations
+ 3  block input operations
  0  block output operations
-28  messages sent
-28  messages received
+ 2  messages sent
+ 2  messages received
  0  signals received
-28  voluntary context switches
+ 2  voluntary context switches
  0  involuntary context switches