Re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread Masao Uebayashi
On Tue, Nov 02, 2010 at 06:31:05AM +, YAMAMOTO Takashi wrote:
> hi,
> 
> > A few more comments:
> > 
> > - AMAP_SHARED itself is a fine concept; it's used by shared memory.
> > 
> > sys/kern/sysv_shm.c:
> >     error = uvm_map(&vm->vm_map, &attach_va, size, uobj, 0, 0,
> >         UVM_MAPFLAG(prot, prot, UVM_INH_SHARE, UVM_ADV_RANDOM, flags));
> > 
> >   (Note UVM_INH_SHARE == MAP_INHERIT_SHARE).
> 
> sysv shm is backed by an aobj and does not involve amap.

Hmm, you're right.

> 
> iirc, you need minherit to create shared-cow mappings.
> (imo, the introduction of minherit was a mistake because it
> merely complicated vm.)

I'm now 100% fine with removing the AMAP_SHARED functionality.  Even
without it, UVM is very good at avoiding unnecessary memory copies.

Masao

> 
> YAMAMOTO Takashi
> 
> > 
> >   I guess MAP_INHERIT_SHARE was added because adding it was easy
> >   after shared amap was implemented for shared memory?
> > 
> > - For highly tuned, XIP'ed systems, programs should be designed to
> >   avoid .data, because they're COW'ed to page cache sooner or later.


Re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread YAMAMOTO Takashi
hi,

> A few more comments:
> 
> - AMAP_SHARED itself is a fine concept; it's used by shared memory.
> 
>   sys/kern/sysv_shm.c:
>       error = uvm_map(&vm->vm_map, &attach_va, size, uobj, 0, 0,
>           UVM_MAPFLAG(prot, prot, UVM_INH_SHARE, UVM_ADV_RANDOM, flags));
> 
>   (Note UVM_INH_SHARE == MAP_INHERIT_SHARE).

sysv shm is backed by an aobj and does not involve amap.

iirc, you need minherit to create shared-cow mappings.
(imo, the introduction of minherit was a mistake because it
merely complicated vm.)

YAMAMOTO Takashi

> 
>   I guess MAP_INHERIT_SHARE was added because adding it was easy
>   after shared amap was implemented for shared memory?
> 
> - For highly tuned, XIP'ed systems, programs should be designed to
>   avoid .data, because they're COW'ed to page cache sooner or later.


Re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread Masao Uebayashi
On Tue, Nov 02, 2010 at 03:47:55PM +1100, matthew green wrote:
> 
> > > > > > - For highly tuned, XIP'ed systems, programs should be designed to
> > > > > >   avoid .data, because they're COW'ed to page cache sooner or later.
> > > > > 
> > > > > why is this a problem?
> > > > > 
> > > > > if the data is needed, and it will be written to, then these pages
> > > > > will be allocated (COW'd) eventually, and the same space will be used.
> > > > 
> > > > Not a problem, as in it works.
> > > > 
> > > > As already explained, we allocate PV for XIP segments, only for
> > > > vnode-backed AMAP_SHARED == shared .data.  Careful users may design
> > > > the whole system to not allocate PV at all, by giving up that
> > > > feature.  To help users make that design decision, I stated the
> > > > obvious: .data is XIP-unfriendly.
> > > 
> > > but why is it unfriendly?  i don't see why.  there's going to
> > > be the same number of pages allocated for writeable data in
> > > both cases, so the same amount of resources will be consumed.
> > 
> > What do you mean by "both cases" here?
> 
> i mean moving stuff from .data to elsewhere, compared to the
> normal method.
> 
> > If a small program has both .data and .bss, and if .data is small,
> > I'd use .rodata and copy it to .bss explicitly, so that the resulting
> > process allocates only .bss anon instead of .data + .bss.
> 
> why is this useful?  what's the saving?  maybe one page if
> roundup(data + bss) is smaller than roundup(data) +
> roundup(bss).
> 
> my point is that if a program needs data, whether it is from
> the .data or .bss, the same amount of resources will be
> consumed when pages are written to (or not.)  (possibly there
> is a one-page saving...)

You're right.

One page saving is a saving too.

> 
> 
> .mrg.


re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread matthew green

> > > > > - For highly tuned, XIP'ed systems, programs should be designed to
> > > > >   avoid .data, because they're COW'ed to page cache sooner or later.
> > > > 
> > > > why is this a problem?
> > > > 
> > > > if the data is needed, and it will be written to, then these pages
> > > > will be allocated (COW'd) eventually, and the same space will be used.
> > > 
> > > Not a problem, as in it works.
> > > 
> > > As already explained, we allocate PV for XIP segments, only for
> > > vnode-backed AMAP_SHARED == shared .data.  Careful users may design
> > > the whole system to not allocate PV at all, by giving up that
> > > feature.  To help users make that design decision, I stated the
> > > obvious: .data is XIP-unfriendly.
> > 
> > but why is it unfriendly?  i don't see why.  there's going to
> > be the same number of pages allocated for writeable data in
> > both cases, so the same amount of resources will be consumed.
> 
> What do you mean by "both cases" here?

i mean moving stuff from .data to elsewhere, compared to the
normal method.

> If a small program has both .data and .bss, and if .data is small,
> I'd use .rodata and copy it to .bss explicitly, so that the resulting
> process allocates only .bss anon instead of .data + .bss.

why is this useful?  what's the saving?  maybe one page if
roundup(data + bss) is smaller than roundup(data) +
roundup(bss).

my point is that if a program needs data, whether it is from
the .data or .bss, the same amount of resources will be
consumed when pages are written to (or not.)  (possibly there
is a one-page saving...)


.mrg.
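
The roundup() argument above can be checked with a few lines of arithmetic.  This is only a sketch: the 4096-byte page size and the helper names are assumptions, and it takes the two formulas in the message at face value rather than modelling how a real linker lays out .data and .bss.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Number of whole pages needed to hold `bytes` bytes. */
static size_t
pages_for(size_t bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* roundup(data) + roundup(bss): each area is rounded up on its own. */
static size_t
pages_separate(size_t data, size_t bss)
{
	return pages_for(data) + pages_for(bss);
}

/* roundup(data + bss): everything lives in one writable area. */
static size_t
pages_merged(size_t data, size_t bss)
{
	return pages_for(data + bss);
}
```

For 100 bytes of .data and 3000 bytes of .bss this gives 2 pages separately and 1 page merged.  For two areas of exactly 4096 bytes each there is no difference.  The difference can never exceed one page, which is the "one-page saving" mentioned above.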


Re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread Masao Uebayashi
On Tue, Nov 02, 2010 at 03:09:51PM +1100, matthew green wrote:
> 
> > > > - For highly tuned, XIP'ed systems, programs should be designed to
> > > >   avoid .data, because they're COW'ed to page cache sooner or later.
> > > 
> > > why is this a problem?
> > > 
> > > if the data is needed, and it will be written to, then these pages
> > > will be allocated (COW'd) eventually, and the same space will be used.
> > 
> > Not a problem, as in it works.
> > 
> > As already explained, we allocate PV for XIP segments, only for
> > vnode-backed AMAP_SHARED == shared .data.  Careful users may design
> > the whole system to not allocate PV at all, by giving up that
> > feature.  To help users make that design decision, I stated the
> > obvious: .data is XIP-unfriendly.
> 
> but why is it unfriendly?  i don't see why.  there's going to
> be the same number of pages allocated for writeable data in
> both cases, so the same amount of resources will be consumed.

What do you mean by "both cases" here?

If a small program has both .data and .bss, and if .data is small,
I'd use .rodata and copy it to .bss explicitly, so that the resulting
process allocates only .bss anon instead of .data + .bss.
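
A minimal sketch of that pattern, assuming a typical toolchain that places initialized const objects in .rodata and uninitialized statics in .bss.  The struct and all names are invented for illustration.

```c
#include <string.h>

struct settings {
	int verbose;
	int retries;
	char name[16];
};

/*
 * const with an initializer: normally placed in .rodata, so it can stay
 * mapped read-only (and, on an XIP system, be read in place).
 */
static const struct settings settings_defaults = { 0, 3, "default" };

/* No initializer: normally placed in .bss, i.e. zero-filled anon memory. */
static struct settings settings;

/* Copy the read-only defaults into the writable copy once at startup. */
static void
settings_init(void)
{
	memcpy(&settings, &settings_defaults, sizeof(settings));
}
```

The program calls settings_init() early and from then on only writes to `settings`, so the executable needs no writable initialized data for this object.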

> 
> 
> .mrg.

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread matthew green

> > > - For highly tuned, XIP'ed systems, programs should be designed to
> > >   avoid .data, because they're COW'ed to page cache sooner or later.
> > 
> > why is this a problem?
> > 
> > if the data is needed, and it will be written to, then these pages
> > will be allocated (COW'd) eventually, and the same space will be used.
> 
> Not a problem, as in it works.
> 
> As already explained, we allocate PV for XIP segments, only for
> vnode-backed AMAP_SHARED == shared .data.  Careful users may design
> the whole system to not allocate PV at all, by giving up that
> feature.  To help users make that design decision, I stated the
> obvious: .data is XIP-unfriendly.

but why is it unfriendly?  i don't see why.  there's going to
be the same number of pages allocated for writeable data in
both cases, so the same amount of resources will be consumed.


.mrg.


Re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread Masao Uebayashi
On Tue, Nov 02, 2010 at 02:36:30PM +1100, matthew green wrote:
> 
> > - AMAP_SHARED itself is a fine concept; it's used by shared memory.
> > 
> > sys/kern/sysv_shm.c:
> >     error = uvm_map(&vm->vm_map, &attach_va, size, uobj, 0, 0,
> >         UVM_MAPFLAG(prot, prot, UVM_INH_SHARE, UVM_ADV_RANDOM, flags));
> > 
> >   (Note UVM_INH_SHARE == MAP_INHERIT_SHARE).
> > 
> >   I guess MAP_INHERIT_SHARE was added because adding it was easy
> >   after shared amap was implemented for shared memory?
> 
> MAP_INHERIT_SHARE was originally MAP_INHERIT, and came from
> machvm.

This was a reply to:

http://mail-index.netbsd.org/tech-kern/2010/10/26/msg009085.html

> - MAP_INHERIT_SHARE, used to implement minherit().
>   this is the one that is the most trouble, since it's what AMAP_SHARED
>   was invented for.  however, it's also of least importance since

Even if MAP_INHERIT_SHARE preceded SYSV SHM, we need AMAP_SHARED anyway.

(I don't know if Mach had shared memory.)

> 
> > - For highly tuned, XIP'ed systems, programs should be designed to
> >   avoid .data, because they're COW'ed to page cache sooner or later.
> 
> why is this a problem?
> 
> if the data is needed, and it will be written to, then these pages
> will be allocated (COW'd) eventually, and the same space will be used.

Not a problem, as in it works.

As already explained, we allocate PV for XIP segments, only for
vnode-backed AMAP_SHARED == shared .data.  Careful users may design
the whole system to not allocate PV at all, by giving up that
feature.  To help users make that design decision, I stated the
obvious: .data is XIP-unfriendly.

> 
> 
> .mrg.


re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread matthew green

> - AMAP_SHARED itself is a fine concept; it's used by shared memory.
> 
>   sys/kern/sysv_shm.c:
>       error = uvm_map(&vm->vm_map, &attach_va, size, uobj, 0, 0,
>           UVM_MAPFLAG(prot, prot, UVM_INH_SHARE, UVM_ADV_RANDOM, flags));
> 
>   (Note UVM_INH_SHARE == MAP_INHERIT_SHARE).
> 
>   I guess MAP_INHERIT_SHARE was added because adding it was easy
>   after shared amap was implemented for shared memory?

MAP_INHERIT_SHARE was originally MAP_INHERIT, and came from
machvm.

> - For highly tuned, XIP'ed systems, programs should be designed to
>   avoid .data, because they're COW'ed to page cache sooner or later.

why is this a problem?

if the data is needed, and it will be written to, then these pages
will be allocated (COW'd) eventually, and the same space will be used.


.mrg.


Re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread Masao Uebayashi
A few more comments:

- AMAP_SHARED itself is a fine concept; it's used by shared memory.

sys/kern/sysv_shm.c:
    error = uvm_map(&vm->vm_map, &attach_va, size, uobj, 0, 0,
        UVM_MAPFLAG(prot, prot, UVM_INH_SHARE, UVM_ADV_RANDOM, flags));

  (Note UVM_INH_SHARE == MAP_INHERIT_SHARE).

  I guess MAP_INHERIT_SHARE was added because adding it was easy
  after shared amap was implemented for shared memory?

- For highly tuned, XIP'ed systems, programs should be designed to
  avoid .data, because they're COW'ed to page cache sooner or later.


RFC: ppath(3): property list paths library

2010-11-01 Thread David Young
I'm working on a library called ppath(3) for making property lists more
convenient to use in the kernel.  With ppath(3), you refer to a property
to read/write/delete in a property list by the path from the list's
outermost container.  Comments welcome.

The latest source is at
.

Here is an example of using ppath(3):

/* Read and write a user's "favorite color" in a personnel property list. */
const char *s;
int rc;
ppath_t *p;
prop_dictionary_t d;

/* Create the property list. */
d = prop_dictionary_internalize("<dict> "
    "   <key>David Young</key>"
    "   <dict>"
    "   <key>favorite color</key>"
    "   <string>green</string>"
    "   </dict>"
    "</dict>");

assert(d != NULL);

/* Set up the path. */
p = ppath_create();

assert(p != NULL);

ppath_push_key(p, "David Young");
ppath_push_key(p, "favorite color");

/* Get the string at the path. */
switch (ppath_get_string(d, p, &s)) {
case ENOENT:
    errx(EXIT_FAILURE, "favorite color not found");
    break;
case EFTYPE:
    errx(EXIT_FAILURE, "favorite color is not a string");
    break;
case 0:
    printf("old favorite color: %s\n", s);
    break;
default:
    errx(EXIT_FAILURE, "unknown error");
    break;
}

/* Replace with a new value. */
switch (ppath_set_string(d, p, "brown")) {
case ENOENT:
    errx(EXIT_FAILURE, "favorite color not found");
    break;
case EFTYPE:
    errx(EXIT_FAILURE, "favorite color is not a string");
    break;
case 0:
    printf("set a new favorite color\n");
    break;
default:
    errx(EXIT_FAILURE, "unknown error");
    break;
}

Dave

-- 
David Young OJC Technologies
dyo...@ojctech.com  Urbana, IL * (217) 278-3933


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-01 Thread Matt Thomas

On Nov 1, 2010, at 8:55 AM, Eduardo Horvath wrote:

> On Mon, 1 Nov 2010, Masao Uebayashi wrote:
> 
>> I think pmap_extract(9) is a bad API.
>> 
>> After MD bootstrap code detects all physical memory, it gives
>> all of that information to UVM, including available KVA.  At this
>> point UVM knows all the available resources of virtual/physical
>> addresses.  UVM is responsible for managing all of these.
> 
> This is managed RAM.  What about I/O pages?

Indeed.  Also consider that pmaps are designed to have fast
V->P translations; using that instead of UVM makes a lot of
sense.


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-01 Thread Eduardo Horvath
On Mon, 1 Nov 2010, Masao Uebayashi wrote:

> I think pmap_extract(9) is a bad API.
> 
> After MD bootstrap code detects all physical memory, it gives
> all of that information to UVM, including available KVA.  At this
> point UVM knows all the available resources of virtual/physical
> addresses.  UVM is responsible for managing all of these.

This is managed RAM.  What about I/O pages?

> Calling pmap_extract(9) means that some kernel code asks pmap(9)
> to look up a physical address.  pmap(9) is only responsible for
> handling the CPU and MMU.  Using it as a lookup database is an abuse.
> The only reasonable use of pmap_extract(9) is for debugging.
> I think that pmap_extract(9) should be changed to something like:
> 
>   bool pmap_mapped_p(struct pmap *, vaddr_t);
> 
> and allow it to be used for KASSERT()s.
> 
> The only right way to retrieve P->V translation is to lookup from
> vm_map (== the fault handler).  If we honour this principle, VM
> and I/O code will be much more consistent.

pmap(9) has always needed a database to keep track of V->P mappings(*) as 
well as P->V mappings so pmap_page_protect() can be implemented.  

Are you planning on moving the responsibility of tracking P->V mappings to 
UVM?

* While you can claim that keeping track of P->V mappings is the primary 
function of pmap(9) and a side effect of page tables, that posits the 
machine in question uses page tables.  In a machine with a software managed 
TLB you could implement pmap(9) by walking the UVM structures on a page 
fault and generating TLB entries from the vm_page structure.  This would 
reduce the amount of duplicate information maintained by the VM subsystems.  
However, UVM currently assumes pmap() remembers all forward and reverse 
mappings.  If pmap() forgets them, bad things happen.

Eduardo


Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-01 Thread der Mouse
> The only right way to retrieve P->V translation is to lookup from
> vm_map (== the fault handler).

What about setting up DMA on machines whose DMA uses physical
addresses?  Or does the DMA code get an exception to this rule?

I also suspect debugging may well be a non-ignorable use case, though I
could also be wrong about that.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML    mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


pmap_extract(9) (was Re: xmd(4) (Re: XIP))

2010-11-01 Thread Masao Uebayashi
I think pmap_extract(9) is a bad API.

After MD bootstrap code detects all physical memory, it gives
all of that information to UVM, including available KVA.  At this
point UVM knows all the available resources of virtual/physical
addresses.  UVM is responsible for managing all of these.

Calling pmap_extract(9) means that some kernel code asks pmap(9)
to look up a physical address.  pmap(9) is only responsible for
handling the CPU and MMU.  Using it as a lookup database is an abuse.
The only reasonable use of pmap_extract(9) is for debugging.
I think that pmap_extract(9) should be changed to something like:

bool pmap_mapped_p(struct pmap *, vaddr_t);

and allow it to be used for KASSERT()s.

The only right way to retrieve P->V translation is to lookup from
vm_map (== the fault handler).  If we honour this principle, VM
and I/O code will be much more consistent.
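
To illustrate the difference between the two interfaces, here is a self-contained toy model in userland C.  Everything prefixed toy_ is hypothetical and is not NetBSD kernel code; it only mirrors the shape of the existing bool pmap_extract(pmap_t, vaddr_t, paddr_t *) and of the proposed pmap_mapped_p().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t toy_vaddr_t;
typedef uint64_t toy_paddr_t;

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_MASK	(((toy_vaddr_t)1 << TOY_PAGE_SHIFT) - 1)
#define TOY_NPTE	8	/* the toy address space has 8 pages */

/* A one-level "page table": one slot per virtual page. */
struct toy_pmap {
	bool valid[TOY_NPTE];
	toy_paddr_t frame[TOY_NPTE];	/* physical frame number */
};

/* Enter a V->P mapping for the page containing va. */
static void
toy_pmap_enter(struct toy_pmap *pm, toy_vaddr_t va, toy_paddr_t pa)
{
	size_t idx = va >> TOY_PAGE_SHIFT;

	if (idx >= TOY_NPTE)
		return;
	pm->valid[idx] = true;
	pm->frame[idx] = pa >> TOY_PAGE_SHIFT;
}

/* pmap_extract-style lookup: hands the physical address to the caller. */
static bool
toy_pmap_extract(const struct toy_pmap *pm, toy_vaddr_t va, toy_paddr_t *pap)
{
	size_t idx = va >> TOY_PAGE_SHIFT;

	if (idx >= TOY_NPTE || !pm->valid[idx])
		return false;
	if (pap != NULL)
		*pap = (pm->frame[idx] << TOY_PAGE_SHIFT) |
		    (va & TOY_PAGE_MASK);
	return true;
}

/* Proposed style: only answers "is it mapped?", e.g. for assertions. */
static bool
toy_pmap_mapped_p(const struct toy_pmap *pm, toy_vaddr_t va)
{
	return toy_pmap_extract(pm, va, NULL);
}
```

With the narrower interface a caller can still write an assertion such as assert(toy_pmap_mapped_p(&pm, va)), but it can no longer use the pmap as a V->P database; it would have to get the physical address from whatever layer owns the mapping.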

Masao

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: AMAP_SHARED (was Re: XIP)

2010-11-01 Thread Masao Uebayashi
On Mon, Nov 01, 2010 at 02:32:23AM +0900, Masao Uebayashi wrote:
> Hm.  I may be seriously missing something.  I think I've
> understood the following description you wrote, and I got hit by this
> in the very early development stage ... >1 year ago.  It was
> uvm_fault() -> uvmfault_promote() -> amap_add() -> pmap_page_protect()
> Since then I've assumed that shared amap is a pretty much common
> thing...  Now I realize it should not be, as you describe.  Worse,
> I can't reproduce that code path...
> 
> Now I have to *really* understand how this works...

I still have no idea how I was hit by this.  Strange.

Considering how AMAP_SHARED is used, especially when it's backed by
a vnode, this is only for highly tuned server/client programs which
share initialized data (.data).  While such a use case may have
some value, I'd say this is a rarely used feature.

I append a test program.

> 
> (I'll respond about this topic again later.)
> 
> Masao
> 
> On Tue, Oct 26, 2010 at 02:06:38PM -0700, Chuck Silvers wrote:
> (snip)
> > now here's the explanation I promised for how to treat XIP pages
> > as unmanaged instead of managed.
> > 
> > first, some background for other people who don't know all this:
> > the only reason that treating XIP pages as managed pages is
> > relevant at all is because of the AMAP_SHARED flag in UVM,
> > which allows anonymous memory to be shared between processes
> > such that the changes made by one process are seen by the other.
> > this impacts XIP pages (which are not anonymous) because a
> > PROT_WRITE, MAP_PRIVATE mapping of an XIP vnode should point to
> > the XIP pages as long as all access to the mapping is for reads,
> > but when the mapping is written to then the XIP page should be
> > copied to an anonymous page (the normal COW operation) but that
> > new anonymous page should still be shared between all processes
> > that are sharing the AMAP_SHARED mapping.  to force those other
> > processes to take another page fault the next time they access
> > their copy of the mapping (which we need to do so that they will
> > start accessing the new anonymous page instead of the XIP page),
> > we must invalidate all the other pmap entries for the XIP page,
> > which we do by calling pmap_page_protect() on it.  the pmap layer
> > tracks all the mappings of the page and thus it can find them all.
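
The sequence described above can be modelled with a small toy state machine.  This is not UVM code: the types and functions are invented, one page stands in for the whole mapping, and the loop over pte[] plays the role that pmap_page_protect() plays in the kernel.

```c
#include <stdbool.h>

#define TOY_NPROC 3	/* processes sharing the AMAP_SHARED mapping */

enum toy_page { TOY_NONE = 0, TOY_XIP, TOY_ANON };

struct toy_mapping {
	enum toy_page amap_slot;	/* shared amap: empty until COW */
	enum toy_page pte[TOY_NPROC];	/* what each pmap maps right now */
};

/* Resolve a fault: shared amap first, then the backing XIP page. */
static void
toy_fault(struct toy_mapping *m, int proc, bool write)
{
	if (write && m->amap_slot == TOY_NONE) {
		int i;

		/* COW: promote the XIP page to a shared anonymous page. */
		m->amap_slot = TOY_ANON;
		/* Invalidate every existing mapping of the XIP page. */
		for (i = 0; i < TOY_NPROC; i++)
			if (m->pte[i] == TOY_XIP)
				m->pte[i] = TOY_NONE;
	}
	m->pte[proc] = (m->amap_slot != TOY_NONE) ? m->amap_slot : TOY_XIP;
}

/* Access the page, faulting if unmapped or writing to the XIP page. */
static enum toy_page
toy_access(struct toy_mapping *m, int proc, bool write)
{
	bool need_fault = m->pte[proc] == TOY_NONE ||
	    (write && m->pte[proc] == TOY_XIP);

	if (need_fault)
		toy_fault(m, proc, write);
	return m->pte[proc];
}
```

In this model two processes that only read both end up mapping the XIP page.  As soon as one of them writes, the other's mapping is torn down, so its next access faults and picks up the shared anonymous page.  Without the invalidation step the second process would keep reading the stale XIP page, which is why the mappings of the XIP page have to be findable.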
> > 
> > > there are several ways that the AMAP_SHARED functionality is used,
> > > and unfortunately they would need to be changed in different ways to
> > > make this work for XIP pages without needing to track mappings:
> > > 
> > >  - uvm_io(), which copies data between the kernel or current process
> > >    and an arbitrary other process address space.
> > >    currently this works by sharing the other address space with
> > >    the kernel via uvm_map_extract() and then just using uiomove()
> > >    to transfer the data.  this could be done instead by using part
> > >    of the uvm_fault() code to find the physical page in the other
> > >    address space that we want to access and lock it (ie. set PG_BUSY),
> > >    map the page into the kernel (perhaps using uvm_pager_mapin()),
> > >    transfer the data, then unmap the page and unlock it.
> > > 
> > >  - uvm_mremap(), which resizes an existing mapping.
> > >    this uses uvm_map_extract() internally, which uses AMAP_SHARED,
> > >    but the mremap operation doesn't actually need the semantics of
> > >    AMAP_SHARED since mremap doesn't create any additional mappings
> > >    as far as applications are concerned.  the usage of AMAP_SHARED
> > >    is just a side-effect of the current implementation, which bends
> > >    over backward to call a bunch of existing higher-level functions
> > >    rather than doing something more direct (which would be simpler
> > >    and more efficient as well).
> > > 
> > >  - MAP_INHERIT_SHARE, used to implement minherit().
> > >    this is the one that is the most trouble, since it's what AMAP_SHARED
> > >    was invented for.  however, it's also of least importance since
> > >    some searching with google finds absolutely no evidence of
> > >    any application actually using it, just lots of copies of
> > >    the implementations and manpages for various operating systems.
> > > 
> > >    with that in mind, there are several ways this could be handled.
> > >      (1) just drop support for minherit() entirely.
> > >      (2) reject attempts to set MAP_INHERIT_SHARE on XIP mappings.
> > >      (3) copy XIP pages into normal anon pages when setting
> > >          MAP_INHERIT_SHARE.
> > >      (4) copy XIP pages into normal vnode pages when setting
> > >          MAP_INHERIT_SHARE.  this would mean that the getpages
> > >          code would need to look in the normal page cache
> > >          before using XIP pages.  I think this option would also
> > >          need getpages to know about the inherit flag to
> > >          correctly handle later faults on XIP mappings,
> > >          and there are probably other subtle complications.
> > > 
> > >    of these choices, (2) sounds like