Re: CVS commit: src/sys/uvm

2011-08-13 Thread Mindaugas Rasiukevicius
y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
> > Right, uvm_anon_locklaonpg() dance can happen only in O->A case.
> > However, having uvm_anfree() able to release the lock by its interface
> > definition is potentially defective.  It is the main motivation why I
> > want to slightly rework the code into uvm_anon_freelst() which would
> > always drop the lock and move freeing of anons to the end point.
> > Cleaner, less error prone.
> 
> the committed change seems broken in case uvm_anon_dispose sets
> PG_RELEASED. in that case, uvm_anon_freelst should leave the anon as it
> will be freed by uvm_anon_release later.

Right.  Fix committed.

Do we have a better regression test to trigger PG_RELEASED cases?

-- 
Mindaugas


Re: CVS commit: src/sys/uvm

2011-08-10 Thread YAMAMOTO Takashi
[ moving to tech-kern ]

hi,

> y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
>> > 
>> > Here is the updated patch after your changes:
>> > 
>> > http://www.netbsd.org/~rmind/uvm_anon_freelst2.diff
>> > 
>> > As you noted, uvm_anfree() can temporarily release the amap lock - that
>> > can happen in amap_copy().  Patch closes the race by moving uvm_anfree
>> > () further, and changes the semantics of the function, now called
>> > uvm_anon_freelst(), to return with amap lock released (plus free anons
>> > without lock held).
>> 
>> the temporary release of the amap lock is only for O->A loan
>> which you disabled, isn't it?
> 
> Right, uvm_anon_locklaonpg() dance can happen only in O->A case.  However,
> having uvm_anfree() able to release the lock by its interface definition
> is potentially defective.  It is the main motivation why I want to slightly
> rework the code into uvm_anon_freelst() which would always drop the lock
> and move freeing of anons to the end point.  Cleaner, less error prone.

the committed change seems broken in case uvm_anon_dispose sets PG_RELEASED.
in that case, uvm_anon_freelst should leave the anon as it will be freed by
uvm_anon_release later.

YAMAMOTO Takashi

> 
> -- 
> Mindaugas


Re: CVS commit: src/sys/uvm

2011-01-16 Thread YAMAMOTO Takashi
hi,

>> >> (Just in case, have you read my paper?)
>> 
>> which paper?  i guess no.
> 
> http://uebayasi.dyndns.org/~uebayasi/tmp/xip.pdf

i will put it on my todo list.  thanks.

YAMAMOTO Takashi


Re: CVS commit: src/sys/uvm

2011-01-05 Thread Masao Uebayashi
On Wed, Jan 05, 2011 at 05:58:34AM +, YAMAMOTO Takashi wrote:
> hi,
> 
> > I take silence as "no objection".
> 
> the silence in this case means was-busy-for-other-things-and-forgot.
> sorry.
> 
> >> I have no real code for this big picture at this moment.  Making
> >> vm_physseg available as reference is the first step.  This only
> >> changes uvm_page_physload() to return a pointer:
> >> 
> >>-void uvm_page_physload();
> >>+void *uvm_page_physload();
> >> 
> >> But this makes XIP pager MUCH cleaner.  The reason has been explained
> >> many times.
> 
> because the separate uvm_page_physload_device is no longer necessary,
> you mean?  i have no problem with the step.
> 
> >> Making fault handlers and pagers to use vm_physseg * + off_t is
> >> the next step, and I don't intend to work on it now.  I just want
> >> to explain the big picture.
> >> 
> >> > 
> >> > >> 
> >> > >> Keep vm_physseg * + off_t array on stack.  If UVM objects uses
> >> > >> vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
> >> > >> + off_t *once* and cache it on stack.
> >> > 
> >> > do you mean something like this?
> >> >  struct {
> >> >  vm_physseg *hoge;
> >> >  off_t fuga;
> >> >  } foo [16];
> >> 
> >> Yes.
> >> 
> >> Or cache vm_page * with it, like:
> >> 
> >>struct vm_id {
> >>vm_physseg *seg;
> >>off_t off;
> >>vm_page *pg;
> >>};
> >> 
> >>uvm_fault()
> >>{
> >>vm_id pgs[];
> >>:
> >>}
> >> 
> >> Vnode pager (genfs_getpages) takes vm_page's by looking up
> >> vnode::v_uobj's list, or uvn_findpages().
> >> 
> >> When it returns back to fault handler, we have to lookup vm_physseg
> >> for each page.  Then fill the "seg" slot above (assuming we'll
> >> remove vm_page::phys_addr soon).
> >> 
> >> Fault handler calls per-vm_page operations iff vm_page slot is filled.
> >> XIP pages are not pageq'ed.
> 
> pgo_get returns either seg+off or pg for each vm_id slots?

vm_id[] is passed to pgo_get, each pgo_get (uvn_get, udv_get, ...)
will fill seg + off and return the array back to uvm_fault.

Pagers that use vm_page (e.g. vnode) will get pg from vnode, then
lookup seg+off, fill them into the vm_id slot.

Those that don't use vm_page (e.g. cdev) will get seg+off from
underlying objects, fill them into the vm_id slot, and return.

> 
> >> XIP pages don't need vm_page, but
> >> cached because it's vnode.
> 
> can you explain this sentence?

Sorry, it was so unclear.

XIP pages don't need vm_page (as a paging state variable),
because the pages are read-only and never involve paging.

XIP pages have to be mapped as cacheable at MMU level,
because it's vnode.

I'm allocating the full vm_page for XIP'able pages, only because
chs@ wanted to support loaning, which needs vm_page::loan_count.

> 
> >> (Just in case, have you read my paper?)
> 
> which paper?  i guess no.

http://uebayasi.dyndns.org/~uebayasi/tmp/xip.pdf

> 
> YAMAMOTO Takashi

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: CVS commit: src/sys/uvm

2011-01-04 Thread YAMAMOTO Takashi
hi,

> I take silence as "no objection".

the silence in this case means was-busy-for-other-things-and-forgot.
sorry.

>> I have no real code for this big picture at this moment.  Making
>> vm_physseg available as reference is the first step.  This only
>> changes uvm_page_physload() to return a pointer:
>> 
>>  -void uvm_page_physload();
>>  +void *uvm_page_physload();
>> 
>> But this makes XIP pager MUCH cleaner.  The reason has been explained
>> many times.

because the separate uvm_page_physload_device is no longer necessary,
you mean?  i have no problem with the step.

>> Making fault handlers and pagers to use vm_physseg * + off_t is
>> the next step, and I don't intend to work on it now.  I just want
>> to explain the big picture.
>> 
>> > 
>> > >> 
>> > >> Keep vm_physseg * + off_t array on stack.  If UVM objects uses
>> > >> vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
>> > >> + off_t *once* and cache it on stack.
>> > 
>> > do you mean something like this?
>> >struct {
>> >vm_physseg *hoge;
>> >off_t fuga;
>> >} foo [16];
>> 
>> Yes.
>> 
>> Or cache vm_page * with it, like:
>> 
>>  struct vm_id {
>>  vm_physseg *seg;
>>  off_t off;
>>  vm_page *pg;
>>  };
>> 
>>  uvm_fault()
>>  {
>>  vm_id pgs[];
>>  :
>>  }
>> 
>> Vnode pager (genfs_getpages) takes vm_page's by looking up
>> vnode::v_uobj's list, or uvn_findpages().
>> 
>> When it returns back to fault handler, we have to lookup vm_physseg
>> for each page.  Then fill the "seg" slot above (assuming we'll
>> remove vm_page::phys_addr soon).
>> 
>> Fault handler calls per-vm_page operations iff vm_page slot is filled.
>> XIP pages are not pageq'ed.

pgo_get returns either seg+off or pg for each vm_id slots?

>> XIP pages don't need vm_page, but
>> cached because it's vnode.

can you explain this sentence?

>> (Just in case, have you read my paper?)

which paper?  i guess no.

YAMAMOTO Takashi


Re: CVS commit: src/sys/uvm

2011-01-04 Thread Masao Uebayashi
On Mon, Jan 03, 2011 at 10:28:17PM -0800, Matt Thomas wrote:
> 
> On Jan 3, 2011, at 9:01 PM, Masao Uebayashi wrote:
> 
> > I take silence as "no objection".
> 
> Not so safe.
> 
> > On Thu, Dec 23, 2010 at 12:48:04PM +0900, Masao Uebayashi wrote:
> >> On Wed, Dec 22, 2010 at 05:37:57AM +, YAMAMOTO Takashi wrote:
> >>> hi,
> >>> 
>  Could you ack this discussion?
> >>> 
> >>> sorry for dropping a ball.
> >>> 
>  
>  On Tue, Dec 07, 2010 at 01:19:46AM +0900, Masao Uebayashi wrote:
> > On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
> >> [ adding cc: tech-kern@ ]
> >> 
> >> hi,
> >> 
> >>> On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
>  
>  On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
>  
> > On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
> >> hi,
> >> 
> >>> On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
>  hi,
>  
> > Hi, thanks for review.
> > 
> > On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi 
> > wrote:
> >> hi,
> >> 
> >> - what's VM_PHYSSEG_OP_PG?
> > 
> > It's to lookup vm_physseg by "struct vm_page *", relying on that
> > "struct vm_page *[]" is allocated linearly.  It'll be used to 
> > remove
> > vm_page::phys_addr as we talked some time ago.
>  
>  i'm not sure if commiting this unused uncommented code now helps 
>  it.
>  some try-and-benchmark cycles might be necessary given that
>  vm_page <-> paddr conversion could be performace critical.
> >>> 
> >>> If you really care performance, we can directly pass "struct 
> >>> vm_page
> >>> *" to pmap_enter().
> >>> 
> >>> We're doing "struct vm_page *" -> "paddr_t" just before 
> >>> pmap_enter(),
> >>> then doing "paddr_t" -> "vm_physseg" reverse lookup again in
> >>> pmap_enter() to check if a given PA is managed.  What is really
> >>> needed here is, to lookup "struct vm_page *" -> "vm_physseg" once
> >>> and you'll know both paddr_t and managed or not.
> >> 
> >> i agree that the current code is not ideal in that respect.
> >> otoh, i'm not sure if passing vm_physseg around is a good idea.
> > 
> > It's great you share the interest.
> > 
> > I chose vm_physseg, because it was there.  I'm open to alternatives,
> > but I don't think you have many options...
>  
>  Passing vm_page * doesn't work if the page isn't managed since there
>  won't be a vm_page for the paddr_t.
>  
>  Now passing both paddr_t and vm_page * would work and if the pointer
>  to the vm_page, it would be an unmanaged mapping.  This also gives 
>  the
>  access to mdpg without another lookup.
> >>> 
> >>> What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
> >>> 
> >>> And don't forget that you're the one who first pointed out that
> >>> allocating vm_pages for XIP is a pure waste of memory. ;)
> >> 
> >> i guess matt meant "if the pointer to the vm_page is NULL,".
> >> 
> >>> 
> >>> I'm allocating vm_pages, only because of phys_addr and loan_count.
> >>> I believe vm_pages is unnecessary for read-only XIP segments.
> >>> Because they're read-only, and stateless.
> >>> 
> >>> I've already concluded that the current "managed or not" model
> >>> doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
> >>> model can explain everything.  I'm rather waiting for a counter
> >>> example how vm_physseg doesn't work...
> >> 
> >> i guess your suggestion is too vague.
> >> where do you want to use vm_physseg * + off_t instead of vm_page * ?
> >> getpages, pmap_enter, and?  how their function prototypes would be?
> > 
> > The basic idea is straightforward; always allocate vm_physseg for
> > memories/devices.  If a vm_physseg is used as general purpose
> > memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
> > potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
> >>> 
> >>> can you provide function prototypes?
> >> 
> >> I have no real code for this big picture at this moment.  Making
> >> vm_physseg available as reference is the first step.  This only
> >> changes uvm_page_physload() to return a pointer:
> >> 
> >>-void uvm_page_physload();
> >>+void *uvm_page_physload();
> >> 
> >> But this makes XIP pager MUCH cleaner.  The reason has been explained
> >> many times.
> >> 
> >> Making fault handlers and pagers to use vm_physseg * + off_t is
> >> the next step, and I don't intend to work on it now.  I ju

Re: CVS commit: src/sys/uvm

2011-01-03 Thread Matt Thomas

On Jan 3, 2011, at 9:01 PM, Masao Uebayashi wrote:

> I take silence as "no objection".

Not so safe.

> On Thu, Dec 23, 2010 at 12:48:04PM +0900, Masao Uebayashi wrote:
>> On Wed, Dec 22, 2010 at 05:37:57AM +, YAMAMOTO Takashi wrote:
>>> hi,
>>> 
 Could you ack this discussion?
>>> 
>>> sorry for dropping a ball.
>>> 
 
 On Tue, Dec 07, 2010 at 01:19:46AM +0900, Masao Uebayashi wrote:
> On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
>> [ adding cc: tech-kern@ ]
>> 
>> hi,
>> 
>>> On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
 
 On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
 
> On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
>> hi,
>> 
>>> On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
 hi,
 
> Hi, thanks for review.
> 
> On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi wrote:
>> hi,
>> 
>> - what's VM_PHYSSEG_OP_PG?
> 
> It's to lookup vm_physseg by "struct vm_page *", relying on that
> "struct vm_page *[]" is allocated linearly.  It'll be used to 
> remove
> vm_page::phys_addr as we talked some time ago.
 
 i'm not sure if commiting this unused uncommented code now helps 
 it.
 some try-and-benchmark cycles might be necessary given that
 vm_page <-> paddr conversion could be performace critical.
>>> 
>>> If you really care performance, we can directly pass "struct vm_page
>>> *" to pmap_enter().
>>> 
>>> We're doing "struct vm_page *" -> "paddr_t" just before 
>>> pmap_enter(),
>>> then doing "paddr_t" -> "vm_physseg" reverse lookup again in
>>> pmap_enter() to check if a given PA is managed.  What is really
>>> needed here is, to lookup "struct vm_page *" -> "vm_physseg" once
>>> and you'll know both paddr_t and managed or not.
>> 
>> i agree that the current code is not ideal in that respect.
>> otoh, i'm not sure if passing vm_physseg around is a good idea.
> 
> It's great you share the interest.
> 
> I chose vm_physseg, because it was there.  I'm open to alternatives,
> but I don't think you have many options...
 
 Passing vm_page * doesn't work if the page isn't managed since there
 won't be a vm_page for the paddr_t.
 
 Now passing both paddr_t and vm_page * would work and if the pointer
 to the vm_page, it would be an unmanaged mapping.  This also gives the
 access to mdpg without another lookup.
>>> 
>>> What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
>>> 
>>> And don't forget that you're the one who first pointed out that
>>> allocating vm_pages for XIP is a pure waste of memory. ;)
>> 
>> i guess matt meant "if the pointer to the vm_page is NULL,".
>> 
>>> 
>>> I'm allocating vm_pages, only because of phys_addr and loan_count.
>>> I believe vm_pages is unnecessary for read-only XIP segments.
>>> Because they're read-only, and stateless.
>>> 
>>> I've already concluded that the current "managed or not" model
>>> doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
>>> model can explain everything.  I'm rather waiting for a counter
>>> example how vm_physseg doesn't work...
>> 
>> i guess your suggestion is too vague.
>> where do you want to use vm_physseg * + off_t instead of vm_page * ?
>> getpages, pmap_enter, and?  how their function prototypes would be?
> 
> The basic idea is straightforward; always allocate vm_physseg for
> memories/devices.  If a vm_physseg is used as general purpose
> memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
> potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
>>> 
>>> can you provide function prototypes?
>> 
>> I have no real code for this big picture at this moment.  Making
>> vm_physseg available as reference is the first step.  This only
>> changes uvm_page_physload() to return a pointer:
>> 
>>  -void uvm_page_physload();
>>  +void *uvm_page_physload();
>> 
>> But this makes XIP pager MUCH cleaner.  The reason has been explained
>> many times.
>> 
>> Making fault handlers and pagers to use vm_physseg * + off_t is
>> the next step, and I don't intend to work on it now.  I just want
>> to explain the big picture.
>> 
>>> 
> 
> Keep vm_physseg * + off_t array on stack.  If UVM objects uses
> vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
> + off_t *once* and cache it on stack.
>>> 
>>> do you mean something like this?
>>> struct {
>>> vm_physseg *hog

Re: CVS commit: src/sys/uvm

2011-01-03 Thread Masao Uebayashi
I take silence as "no objection".

On Thu, Dec 23, 2010 at 12:48:04PM +0900, Masao Uebayashi wrote:
> On Wed, Dec 22, 2010 at 05:37:57AM +, YAMAMOTO Takashi wrote:
> > hi,
> > 
> > > Could you ack this discussion?
> > 
> > sorry for dropping a ball.
> > 
> > > 
> > > On Tue, Dec 07, 2010 at 01:19:46AM +0900, Masao Uebayashi wrote:
> > >> On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
> > >> > [ adding cc: tech-kern@ ]
> > >> > 
> > >> > hi,
> > >> > 
> > >> > > On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
> > >> > >> 
> > >> > >> On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
> > >> > >> 
> > >> > >> > On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
> > >> > >> >> hi,
> > >> > >> >> 
> > >> > >> >>> On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi 
> > >> > >> >>> wrote:
> > >> > >>  hi,
> > >> > >>  
> > >> > >> > Hi, thanks for review.
> > >> > >> > 
> > >> > >> > On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi 
> > >> > >> > wrote:
> > >> > >> >> hi,
> > >> > >> >> 
> > >> > >> >> - what's VM_PHYSSEG_OP_PG?
> > >> > >> > 
> > >> > >> > It's to lookup vm_physseg by "struct vm_page *", relying on 
> > >> > >> > that
> > >> > >> > "struct vm_page *[]" is allocated linearly.  It'll be used to 
> > >> > >> > remove
> > >> > >> > vm_page::phys_addr as we talked some time ago.
> > >> > >>  
> > >> > >>  i'm not sure if commiting this unused uncommented code now 
> > >> > >>  helps it.
> > >> > >>  some try-and-benchmark cycles might be necessary given that
> > >> > >>  vm_page <-> paddr conversion could be performace critical.
> > >> > >> >>> 
> > >> > >> >>> If you really care performance, we can directly pass "struct 
> > >> > >> >>> vm_page
> > >> > >> >>> *" to pmap_enter().
> > >> > >> >>> 
> > >> > >> >>> We're doing "struct vm_page *" -> "paddr_t" just before 
> > >> > >> >>> pmap_enter(),
> > >> > >> >>> then doing "paddr_t" -> "vm_physseg" reverse lookup again in
> > >> > >> >>> pmap_enter() to check if a given PA is managed.  What is really
> > >> > >> >>> needed here is, to lookup "struct vm_page *" -> "vm_physseg" 
> > >> > >> >>> once
> > >> > >> >>> and you'll know both paddr_t and managed or not.
> > >> > >> >> 
> > >> > >> >> i agree that the current code is not ideal in that respect.
> > >> > >> >> otoh, i'm not sure if passing vm_physseg around is a good idea.
> > >> > >> > 
> > >> > >> > It's great you share the interest.
> > >> > >> > 
> > >> > >> > I chose vm_physseg, because it was there.  I'm open to 
> > >> > >> > alternatives,
> > >> > >> > but I don't think you have many options...
> > >> > >> 
> > >> > >> Passing vm_page * doesn't work if the page isn't managed since there
> > >> > >> won't be a vm_page for the paddr_t.
> > >> > >> 
> > >> > >> Now passing both paddr_t and vm_page * would work and if the pointer
> > >> > >> to the vm_page, it would be an unmanaged mapping.  This also gives 
> > >> > >> the
> > >> > >> access to mdpg without another lookup.
> > >> > > 
> > >> > > What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
> > >> > > 
> > >> > > And don't forget that you're the one who first pointed out that
> > >> > > allocating vm_pages for XIP is a pure waste of memory. ;)
> > >> > 
> > >> > i guess matt meant "if the pointer to the vm_page is NULL,".
> > >> > 
> > >> > > 
> > >> > > I'm allocating vm_pages, only because of phys_addr and loan_count.
> > >> > > I believe vm_pages is unnecessary for read-only XIP segments.
> > >> > > Because they're read-only, and stateless.
> > >> > > 
> > >> > > I've already concluded that the current "managed or not" model
> > >> > > doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
> > >> > > model can explain everything.  I'm rather waiting for a counter
> > >> > > example how vm_physseg doesn't work...
> > >> > 
> > >> > i guess your suggestion is too vague.
> > >> > where do you want to use vm_physseg * + off_t instead of vm_page * ?
> > >> > getpages, pmap_enter, and?  how their function prototypes would be?
> > >> 
> > >> The basic idea is straightforward; always allocate vm_physseg for
> > >> memories/devices.  If a vm_physseg is used as general purpose
> > >> memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
> > >> potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
> > 
> > can you provide function prototypes?
> 
> I have no real code for this big picture at this moment.  Making
> vm_physseg available as reference is the first step.  This only
> changes uvm_page_physload() to return a pointer:
> 
>   -void uvm_page_physload();
>   +void *uvm_page_physload();
> 
> But this makes XIP pager MUCH cleaner.  The reason has been explained
> many times.
> 
> Making fault handlers and pagers to use vm_physseg * + off_t is
> the next step, and I don't intend to work on it now.  I just want
> to explain the big 

Re: CVS commit: src/sys/uvm

2011-01-03 Thread Masao Uebayashi
Do you have any questions yet?

On Wed, Dec 22, 2010 at 12:52:21PM +0900, Masao Uebayashi wrote:
> On Tue, Dec 21, 2010 at 06:33:53PM -0800, Matt Thomas wrote:
> > 
> > On Dec 21, 2010, at 6:13 PM, Masao Uebayashi wrote:
> > 
> > > On Tue, Dec 21, 2010 at 11:29:01AM -0800, Matt Thomas wrote:
> > >> 
> > >> On Dec 6, 2010, at 8:19 AM, Masao Uebayashi wrote:
> > >> 
> > >>> On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
> >  [ adding cc: tech-kern@ ]
> > >>> 
> > >>> The basic idea is straightforward; always allocate vm_physseg for
> > >>> memories/devices.  If a vm_physseg is used as general purpose
> > >>> memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
> > >>> potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
> > >> 
> > >> Ewww.  How p->v is managed needs to be kept out of the MI code.
> > > 
> > > Could you elaborate the reason why so?
> > 
> > PowerPC OEA uses an inverted page table.  It's p->v works very differently.
> 
> True.
> 
> > 
> > > I've already proven that __HAVE_VM_PAGE_MD pmaps don't need struct
> > > vm_page *.
> > 
> > Well, pmap_page_* definitely need vm_page * since it's one of the arguments.
> 
> Yes, and vm_page * is used as page identity, right?
> 
> What I call "vm_page is used" means for example read/write'ing
> pg->flags.  In that sense, vm_page is not really used in any pmaps.
> 
> > 
> >  any valid paddr_t value will belong to exactly one vm_phsseg?
> > >>> 
> > >>> That's the idea.  This would clarify mem(4) backend too.
> > >>> 
> > >>> Note that allocating vm_physseg for device segments is cheap.
> > >> 
> > >> that's depends on how much more expensive finding physseg gets.
> > > 
> > > "Finding physseg" == "(reverse) lookup of vm_page -> vm_physseg".
> > > It is done only once (for each page) in pagers that use vm_page.
> > > 
> > > Is the biggest concern lookup cost?  Then I'd point out that
> > > uvm_pageismanaged() in pmap_enter() should die.  Cacheability is
> > > decided by how VA is mapped.  Those uvm_pageismanaged() calls are
> > > both inefficient and wrong.
> > 
> > Not all pmaps use that.  Instead, we should have a PMAP_UNMANAGED
> > flag passed to pmap_enter since the caller probably know the
> > answer.
> 
> s/probably know/know/
> 
> IMO cacheability is always explicitly specified by UVM.
> 
> Cacheability is decided by how VA is mapped, i.e., shared or not.
> Vnode are shared but read-only, so it's cacheable.  Device registers
> are shared between CPU and H/W, so it can't be cacheable.

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Re: CVS commit: src/sys/uvm

2010-12-22 Thread Masao Uebayashi
On Wed, Dec 22, 2010 at 05:37:57AM +, YAMAMOTO Takashi wrote:
> hi,
> 
> > Could you ack this discussion?
> 
> sorry for dropping a ball.
> 
> > 
> > On Tue, Dec 07, 2010 at 01:19:46AM +0900, Masao Uebayashi wrote:
> >> On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
> >> > [ adding cc: tech-kern@ ]
> >> > 
> >> > hi,
> >> > 
> >> > > On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
> >> > >> 
> >> > >> On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
> >> > >> 
> >> > >> > On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
> >> > >> >> hi,
> >> > >> >> 
> >> > >> >>> On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
> >> > >>  hi,
> >> > >>  
> >> > >> > Hi, thanks for review.
> >> > >> > 
> >> > >> > On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi 
> >> > >> > wrote:
> >> > >> >> hi,
> >> > >> >> 
> >> > >> >> - what's VM_PHYSSEG_OP_PG?
> >> > >> > 
> >> > >> > It's to lookup vm_physseg by "struct vm_page *", relying on that
> >> > >> > "struct vm_page *[]" is allocated linearly.  It'll be used to 
> >> > >> > remove
> >> > >> > vm_page::phys_addr as we talked some time ago.
> >> > >>  
> >> > >>  i'm not sure if commiting this unused uncommented code now helps 
> >> > >>  it.
> >> > >>  some try-and-benchmark cycles might be necessary given that
> >> > >>  vm_page <-> paddr conversion could be performace critical.
> >> > >> >>> 
> >> > >> >>> If you really care performance, we can directly pass "struct 
> >> > >> >>> vm_page
> >> > >> >>> *" to pmap_enter().
> >> > >> >>> 
> >> > >> >>> We're doing "struct vm_page *" -> "paddr_t" just before 
> >> > >> >>> pmap_enter(),
> >> > >> >>> then doing "paddr_t" -> "vm_physseg" reverse lookup again in
> >> > >> >>> pmap_enter() to check if a given PA is managed.  What is really
> >> > >> >>> needed here is, to lookup "struct vm_page *" -> "vm_physseg" once
> >> > >> >>> and you'll know both paddr_t and managed or not.
> >> > >> >> 
> >> > >> >> i agree that the current code is not ideal in that respect.
> >> > >> >> otoh, i'm not sure if passing vm_physseg around is a good idea.
> >> > >> > 
> >> > >> > It's great you share the interest.
> >> > >> > 
> >> > >> > I chose vm_physseg, because it was there.  I'm open to alternatives,
> >> > >> > but I don't think you have many options...
> >> > >> 
> >> > >> Passing vm_page * doesn't work if the page isn't managed since there
> >> > >> won't be a vm_page for the paddr_t.
> >> > >> 
> >> > >> Now passing both paddr_t and vm_page * would work and if the pointer
> >> > >> to the vm_page, it would be an unmanaged mapping.  This also gives the
> >> > >> access to mdpg without another lookup.
> >> > > 
> >> > > What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
> >> > > 
> >> > > And don't forget that you're the one who first pointed out that
> >> > > allocating vm_pages for XIP is a pure waste of memory. ;)
> >> > 
> >> > i guess matt meant "if the pointer to the vm_page is NULL,".
> >> > 
> >> > > 
> >> > > I'm allocating vm_pages, only because of phys_addr and loan_count.
> >> > > I believe vm_pages is unnecessary for read-only XIP segments.
> >> > > Because they're read-only, and stateless.
> >> > > 
> >> > > I've already concluded that the current "managed or not" model
> >> > > doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
> >> > > model can explain everything.  I'm rather waiting for a counter
> >> > > example how vm_physseg doesn't work...
> >> > 
> >> > i guess your suggestion is too vague.
> >> > where do you want to use vm_physseg * + off_t instead of vm_page * ?
> >> > getpages, pmap_enter, and?  how their function prototypes would be?
> >> 
> >> The basic idea is straightforward; always allocate vm_physseg for
> >> memories/devices.  If a vm_physseg is used as general purpose
> >> memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
> >> potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
> 
> can you provide function prototypes?

I have no real code for this big picture at this moment.  Making
vm_physseg available as reference is the first step.  This only
changes uvm_page_physload() to return a pointer:

-void uvm_page_physload();
+void *uvm_page_physload();

But this makes XIP pager MUCH cleaner.  The reason has been explained
many times.

Making fault handlers and pagers to use vm_physseg * + off_t is
the next step, and I don't intend to work on it now.  I just want
to explain the big picture.

> 
> >> 
> >> Keep vm_physseg * + off_t array on stack.  If UVM objects uses
> >> vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
> >> + off_t *once* and cache it on stack.
> 
> do you mean something like this?
>   struct {
>   vm_physseg *hoge;
>   off_t fuga;
>   } foo [16];

Yes.

Or cache vm_page * with it, like:

struct v

Re: CVS commit: src/sys/uvm

2010-12-21 Thread YAMAMOTO Takashi
hi,

> Could you ack this discussion?

sorry for dropping a ball.

> 
> On Tue, Dec 07, 2010 at 01:19:46AM +0900, Masao Uebayashi wrote:
>> On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
>> > [ adding cc: tech-kern@ ]
>> > 
>> > hi,
>> > 
>> > > On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
>> > >> 
>> > >> On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
>> > >> 
>> > >> > On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
>> > >> >> hi,
>> > >> >> 
>> > >> >>> On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
>> > >>  hi,
>> > >>  
>> > >> > Hi, thanks for review.
>> > >> > 
>> > >> > On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi wrote:
>> > >> >> hi,
>> > >> >> 
>> > >> >> - what's VM_PHYSSEG_OP_PG?
>> > >> > 
>> > >> > It's to lookup vm_physseg by "struct vm_page *", relying on that
>> > >> > "struct vm_page *[]" is allocated linearly.  It'll be used to 
>> > >> > remove
>> > >> > vm_page::phys_addr as we talked some time ago.
>> > >>  
>> > >>  i'm not sure if commiting this unused uncommented code now helps 
>> > >>  it.
>> > >>  some try-and-benchmark cycles might be necessary given that
>> > >>  vm_page <-> paddr conversion could be performace critical.
>> > >> >>> 
>> > >> >>> If you really care performance, we can directly pass "struct vm_page
>> > >> >>> *" to pmap_enter().
>> > >> >>> 
>> > >> >>> We're doing "struct vm_page *" -> "paddr_t" just before 
>> > >> >>> pmap_enter(),
>> > >> >>> then doing "paddr_t" -> "vm_physseg" reverse lookup again in
>> > >> >>> pmap_enter() to check if a given PA is managed.  What is really
>> > >> >>> needed here is, to lookup "struct vm_page *" -> "vm_physseg" once
>> > >> >>> and you'll know both paddr_t and managed or not.
>> > >> >> 
>> > >> >> i agree that the current code is not ideal in that respect.
>> > >> >> otoh, i'm not sure if passing vm_physseg around is a good idea.
>> > >> > 
>> > >> > It's great you share the interest.
>> > >> > 
>> > >> > I chose vm_physseg, because it was there.  I'm open to alternatives,
>> > >> > but I don't think you have many options...
>> > >> 
>> > >> Passing vm_page * doesn't work if the page isn't managed since there
>> > >> won't be a vm_page for the paddr_t.
>> > >> 
>> > >> Now passing both paddr_t and vm_page * would work and if the pointer
>> > >> to the vm_page, it would be an unmanaged mapping.  This also gives the
>> > >> access to mdpg without another lookup.
>> > > 
>> > > What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
>> > > 
>> > > And don't forget that you're the one who first pointed out that
>> > > allocating vm_pages for XIP is a pure waste of memory. ;)
>> > 
>> > i guess matt meant "if the pointer to the vm_page is NULL,".
>> > 
>> > > 
>> > > I'm allocating vm_pages, only because of phys_addr and loan_count.
>> > > I believe vm_pages is unnecessary for read-only XIP segments.
>> > > Because they're read-only, and stateless.
>> > > 
>> > > I've already concluded that the current "managed or not" model
>> > > doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
>> > > model can explain everything.  I'm rather waiting for a counter
>> > > example how vm_physseg doesn't work...
>> > 
>> > i guess your suggestion is too vague.
>> > where do you want to use vm_physseg * + off_t instead of vm_page * ?
>> > getpages, pmap_enter, and?  how their function prototypes would be?
>> 
>> The basic idea is straightforward; always allocate vm_physseg for
>> memories/devices.  If a vm_physseg is used as general purpose
>> memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
>> potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).

can you provide function prototypes?

>> 
>> Keep vm_physseg * + off_t array on stack.  If UVM objects uses
>> vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
>> + off_t *once* and cache it on stack.

do you mean something like this?
struct {
vm_physseg *hoge;
off_t fuga;
} foo [16];

YAMAMOTO Takashi

>> 
>> > any valid paddr_t value will belong to exactly one vm_phsseg?
>> 
>> That's the idea.  This would clarify mem(4) backend too.
>> 
>> Note that allocating vm_physseg for device segments is cheap.


Re: CVS commit: src/sys/uvm

2010-12-21 Thread Masao Uebayashi
On Wed, Dec 22, 2010 at 11:13:52AM +0900, Masao Uebayashi wrote:
> On Tue, Dec 21, 2010 at 11:29:01AM -0800, Matt Thomas wrote:
> > 
> > On Dec 6, 2010, at 8:19 AM, Masao Uebayashi wrote:
> > 
> > > On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
> > >> [ adding cc: tech-kern@ ]
> > > 
> > > The basic idea is straightforward; always allocate vm_physseg for
> > > memories/devices.  If a vm_physseg is used as general purpose
> > > memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
> > > potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
> > 
> > Ewww.  How p->v is managed needs to be kept out of the MI code.
> 
> Could you elaborate the reason why so?
> 
> I've already proven that __HAVE_VM_PAGE_MD pmaps don't need struct
> vm_page *.
> 
> > There can be a common implementation of it for use by MD code but
> > that isn't the same as MI.
> > 
> > > Keep vm_physseg * + off_t array on stack.  If UVM objects uses
> > > vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
> > > + off_t *once* and cache it on stack.
> > 
> > off_t is not the right type.  psize_t would be more accurate.
> 
> Probably.

Well, I want to make this MI.  One of motivations to use vm_physseg
is to make page identity independent of machine-specific types
(paddr_t / psize_t).  So that kernel modules are shared between
different sizeof(paddr_t).

I see pmap as "MMU driver".  paddr_t is an opaque type, which should
be used only for H/Ws that touch the physical bus (mainbus).


Re: CVS commit: src/sys/uvm

2010-12-21 Thread Masao Uebayashi
On Tue, Dec 21, 2010 at 06:33:53PM -0800, Matt Thomas wrote:
> 
> On Dec 21, 2010, at 6:13 PM, Masao Uebayashi wrote:
> 
> > On Tue, Dec 21, 2010 at 11:29:01AM -0800, Matt Thomas wrote:
> >> 
> >> On Dec 6, 2010, at 8:19 AM, Masao Uebayashi wrote:
> >> 
> >>> On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
>  [ adding cc: tech-kern@ ]
> >>> 
> >>> The basic idea is straightforward; always allocate vm_physseg for
> >>> memories/devices.  If a vm_physseg is used as general purpose
> >>> memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
> >>> potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
> >> 
> >> Ewww.  How p->v is managed needs to be kept out of the MI code.
> > 
> > Could you elaborate the reason why so?
> 
> PowerPC OEA uses an inverted page table.  It's p->v works very differently.

True.

> 
> > I've already proven that __HAVE_VM_PAGE_MD pmaps don't need struct
> > vm_page *.
> 
> Well, pmap_page_* definitely need vm_page * since it's one of the arguments.

Yes, and vm_page * is used as page identity, right?

What I call "vm_page is used" means for example read/write'ing
pg->flags.  In that sense, vm_page is not really used in any pmaps.

> 
>  any valid paddr_t value will belong to exactly one vm_phsseg?
> >>> 
> >>> That's the idea.  This would clarify mem(4) backend too.
> >>> 
> >>> Note that allocating vm_physseg for device segments is cheap.
> >> 
> >> that's depends on how much more expensive finding physseg gets.
> > 
> > "Finding physseg" == "(reverse) lookup of vm_page -> vm_physseg".
> > It is done only once (for each page) in pagers that use vm_page.
> > 
> > Is the biggest concern lookup cost?  Then I'd point out that
> > uvm_pageismanaged() in pmap_enter() should die.  Cacheability is
> > decided by how VA is mapped.  Those uvm_pageismanaged() calls are
> > both inefficient and wrong.
> 
> Not all pmaps use that.  Instead, we should have a PMAP_UNMANAGED
> flag passed to pmap_enter since the caller probably know the
> answer.

s/probably know/know/

IMO cacheability is always explicitly specified by UVM.

Cacheability is decided by how VA is mapped, i.e., shared or not.
Vnode are shared but read-only, so it's cacheable.  Device registers
are shared between CPU and H/W, so it can't be cacheable.


Re: CVS commit: src/sys/uvm

2010-12-21 Thread Matt Thomas

On Dec 21, 2010, at 6:13 PM, Masao Uebayashi wrote:

> On Tue, Dec 21, 2010 at 11:29:01AM -0800, Matt Thomas wrote:
>> 
>> On Dec 6, 2010, at 8:19 AM, Masao Uebayashi wrote:
>> 
>>> On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
 [ adding cc: tech-kern@ ]
>>> 
>>> The basic idea is straightforward; always allocate vm_physseg for
>>> memories/devices.  If a vm_physseg is used as general purpose
>>> memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
>>> potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
>> 
>> Ewww.  How p->v is managed needs to be kept out of the MI code.
> 
> Could you elaborate the reason why so?

PowerPC OEA uses an inverted page table.  It's p->v works very differently.

> I've already proven that __HAVE_VM_PAGE_MD pmaps don't need struct
> vm_page *.

Well, pmap_page_* definitely need vm_page * since it's one of the arguments.

 any valid paddr_t value will belong to exactly one vm_phsseg?
>>> 
>>> That's the idea.  This would clarify mem(4) backend too.
>>> 
>>> Note that allocating vm_physseg for device segments is cheap.
>> 
>> that's depends on how much more expensive finding physseg gets.
> 
> "Finding physseg" == "(reverse) lookup of vm_page -> vm_physseg".
> It is done only once (for each page) in pagers that use vm_page.
> 
> Is the biggest concern lookup cost?  Then I'd point out that
> uvm_pageismanaged() in pmap_enter() should die.  Cacheability is
> decided by how VA is mapped.  Those uvm_pageismanaged() calls are
> both inefficient and wrong.

Not all pmaps use that.  Instead, we should have a PMAP_UNMANAGED
flag passed to pmap_enter since the caller probably know the
answer.


Re: CVS commit: src/sys/uvm

2010-12-21 Thread Masao Uebayashi
On Tue, Dec 21, 2010 at 11:29:01AM -0800, Matt Thomas wrote:
> 
> On Dec 6, 2010, at 8:19 AM, Masao Uebayashi wrote:
> 
> > On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
> >> [ adding cc: tech-kern@ ]
> > 
> > The basic idea is straightforward; always allocate vm_physseg for
> > memories/devices.  If a vm_physseg is used as general purpose
> > memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
> > potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
> 
> Ewww.  How p->v is managed needs to be kept out of the MI code.

Could you elaborate the reason why so?

I've already proven that __HAVE_VM_PAGE_MD pmaps don't need struct
vm_page *.

> There can be a common implementation of it for use by MD code but
> that isn't the same as MI.
> 
> > Keep vm_physseg * + off_t array on stack.  If UVM objects uses
> > vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
> > + off_t *once* and cache it on stack.
> 
> off_t is not the right type.  psize_t would be more accurate.

Probably.

> 
> >> any valid paddr_t value will belong to exactly one vm_phsseg?
> > 
> > That's the idea.  This would clarify mem(4) backend too.
> > 
> > Note that allocating vm_physseg for device segments is cheap.
> 
> that's depends on how much more expensive finding physseg gets.

"Finding physseg" == "(reverse) lookup of vm_page -> vm_physseg".
It is done only once (for each page) in pagers that use vm_page.

Is the biggest concern lookup cost?  Then I'd point out that
uvm_pageismanaged() in pmap_enter() should die.  Cacheability is
decided by how VA is mapped.  Those uvm_pageismanaged() calls are
both inefficient and wrong.


Re: CVS commit: src/sys/uvm

2010-12-21 Thread Matt Thomas

On Dec 6, 2010, at 8:19 AM, Masao Uebayashi wrote:

> On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
>> [ adding cc: tech-kern@ ]
> 
> The basic idea is straightforward; always allocate vm_physseg for
> memories/devices.  If a vm_physseg is used as general purpose
> memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
> potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).

Ewww.  How p->v is managed needs to be kept out of the MI code.
There can be a common implementation of it for use by MD code but
that isn't the same as MI.

> Keep vm_physseg * + off_t array on stack.  If UVM objects uses
> vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
> + off_t *once* and cache it on stack.

off_t is not the right type.  psize_t would be more accurate.

>> any valid paddr_t value will belong to exactly one vm_phsseg?
> 
> That's the idea.  This would clarify mem(4) backend too.
> 
> Note that allocating vm_physseg for device segments is cheap.

that's depends on how much more expensive finding physseg gets.



Re: CVS commit: src/sys/uvm

2010-12-21 Thread Masao Uebayashi
Could you ack this discussion?

On Tue, Dec 07, 2010 at 01:19:46AM +0900, Masao Uebayashi wrote:
> On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
> > [ adding cc: tech-kern@ ]
> > 
> > hi,
> > 
> > > On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
> > >> 
> > >> On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
> > >> 
> > >> > On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
> > >> >> hi,
> > >> >> 
> > >> >>> On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
> > >>  hi,
> > >>  
> > >> > Hi, thanks for review.
> > >> > 
> > >> > On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi wrote:
> > >> >> hi,
> > >> >> 
> > >> >> - what's VM_PHYSSEG_OP_PG?
> > >> > 
> > >> > It's to lookup vm_physseg by "struct vm_page *", relying on that
> > >> > "struct vm_page *[]" is allocated linearly.  It'll be used to 
> > >> > remove
> > >> > vm_page::phys_addr as we talked some time ago.
> > >>  
> > >>  i'm not sure if commiting this unused uncommented code now helps it.
> > >>  some try-and-benchmark cycles might be necessary given that
> > >>  vm_page <-> paddr conversion could be performace critical.
> > >> >>> 
> > >> >>> If you really care performance, we can directly pass "struct vm_page
> > >> >>> *" to pmap_enter().
> > >> >>> 
> > >> >>> We're doing "struct vm_page *" -> "paddr_t" just before pmap_enter(),
> > >> >>> then doing "paddr_t" -> "vm_physseg" reverse lookup again in
> > >> >>> pmap_enter() to check if a given PA is managed.  What is really
> > >> >>> needed here is, to lookup "struct vm_page *" -> "vm_physseg" once
> > >> >>> and you'll know both paddr_t and managed or not.
> > >> >> 
> > >> >> i agree that the current code is not ideal in that respect.
> > >> >> otoh, i'm not sure if passing vm_physseg around is a good idea.
> > >> > 
> > >> > It's great you share the interest.
> > >> > 
> > >> > I chose vm_physseg, because it was there.  I'm open to alternatives,
> > >> > but I don't think you have many options...
> > >> 
> > >> Passing vm_page * doesn't work if the page isn't managed since there
> > >> won't be a vm_page for the paddr_t.
> > >> 
> > >> Now passing both paddr_t and vm_page * would work and if the pointer
> > >> to the vm_page, it would be an unmanaged mapping.  This also gives the
> > >> access to mdpg without another lookup.
> > > 
> > > What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
> > > 
> > > And don't forget that you're the one who first pointed out that
> > > allocating vm_pages for XIP is a pure waste of memory. ;)
> > 
> > i guess matt meant "if the pointer to the vm_page is NULL,".
> > 
> > > 
> > > I'm allocating vm_pages, only because of phys_addr and loan_count.
> > > I believe vm_pages is unnecessary for read-only XIP segments.
> > > Because they're read-only, and stateless.
> > > 
> > > I've already concluded that the current "managed or not" model
> > > doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
> > > model can explain everything.  I'm rather waiting for a counter
> > > example how vm_physseg doesn't work...
> > 
> > i guess your suggestion is too vague.
> > where do you want to use vm_physseg * + off_t instead of vm_page * ?
> > getpages, pmap_enter, and?  how their function prototypes would be?
> 
> The basic idea is straightforward; always allocate vm_physseg for
> memories/devices.  If a vm_physseg is used as general purpose
> memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
> potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
> 
> Keep vm_physseg * + off_t array on stack.  If UVM objects uses
> vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
> + off_t *once* and cache it on stack.
> 
> > any valid paddr_t value will belong to exactly one vm_phsseg?
> 
> That's the idea.  This would clarify mem(4) backend too.
> 
> Note that allocating vm_physseg for device segments is cheap.


Re: CVS commit: src/sys/uvm

2010-12-06 Thread Masao Uebayashi
On Thu, Nov 25, 2010 at 11:32:39PM +, YAMAMOTO Takashi wrote:
> [ adding cc: tech-kern@ ]
> 
> hi,
> 
> > On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
> >> 
> >> On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
> >> 
> >> > On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
> >> >> hi,
> >> >> 
> >> >>> On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
> >>  hi,
> >>  
> >> > Hi, thanks for review.
> >> > 
> >> > On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi wrote:
> >> >> hi,
> >> >> 
> >> >> - what's VM_PHYSSEG_OP_PG?
> >> > 
> >> > It's to lookup vm_physseg by "struct vm_page *", relying on that
> >> > "struct vm_page *[]" is allocated linearly.  It'll be used to remove
> >> > vm_page::phys_addr as we talked some time ago.
> >>  
> >>  i'm not sure if commiting this unused uncommented code now helps it.
> >>  some try-and-benchmark cycles might be necessary given that
> >>  vm_page <-> paddr conversion could be performace critical.
> >> >>> 
> >> >>> If you really care performance, we can directly pass "struct vm_page
> >> >>> *" to pmap_enter().
> >> >>> 
> >> >>> We're doing "struct vm_page *" -> "paddr_t" just before pmap_enter(),
> >> >>> then doing "paddr_t" -> "vm_physseg" reverse lookup again in
> >> >>> pmap_enter() to check if a given PA is managed.  What is really
> >> >>> needed here is, to lookup "struct vm_page *" -> "vm_physseg" once
> >> >>> and you'll know both paddr_t and managed or not.
> >> >> 
> >> >> i agree that the current code is not ideal in that respect.
> >> >> otoh, i'm not sure if passing vm_physseg around is a good idea.
> >> > 
> >> > It's great you share the interest.
> >> > 
> >> > I chose vm_physseg, because it was there.  I'm open to alternatives,
> >> > but I don't think you have many options...
> >> 
> >> Passing vm_page * doesn't work if the page isn't managed since there
> >> won't be a vm_page for the paddr_t.
> >> 
> >> Now passing both paddr_t and vm_page * would work and if the pointer
> >> to the vm_page, it would be an unmanaged mapping.  This also gives the
> >> access to mdpg without another lookup.
> > 
> > What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
> > 
> > And don't forget that you're the one who first pointed out that
> > allocating vm_pages for XIP is a pure waste of memory. ;)
> 
> i guess matt meant "if the pointer to the vm_page is NULL,".
> 
> > 
> > I'm allocating vm_pages, only because of phys_addr and loan_count.
> > I believe vm_pages is unnecessary for read-only XIP segments.
> > Because they're read-only, and stateless.
> > 
> > I've already concluded that the current "managed or not" model
> > doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
> > model can explain everything.  I'm rather waiting for a counter
> > example how vm_physseg doesn't work...
> 
> i guess your suggestion is too vague.
> where do you want to use vm_physseg * + off_t instead of vm_page * ?
> getpages, pmap_enter, and?  how their function prototypes would be?

The basic idea is straightforward; always allocate vm_physseg for
memories/devices.  If a vm_physseg is used as general purpose
memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).

Keep vm_physseg * + off_t array on stack.  If UVM objects uses
vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
+ off_t *once* and cache it on stack.

> any valid paddr_t value will belong to exactly one vm_phsseg?

That's the idea.  This would clarify mem(4) backend too.

Note that allocating vm_physseg for device segments is cheap.


Re: CVS commit: src/sys/uvm

2010-11-25 Thread YAMAMOTO Takashi
[ adding cc: tech-kern@ ]

hi,

> On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
>> 
>> On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
>> 
>> > On Thu, Nov 25, 2010 at 05:44:21AM +, YAMAMOTO Takashi wrote:
>> >> hi,
>> >> 
>> >>> On Thu, Nov 25, 2010 at 04:18:25AM +, YAMAMOTO Takashi wrote:
>>  hi,
>>  
>> > Hi, thanks for review.
>> > 
>> > On Thu, Nov 25, 2010 at 01:58:04AM +, YAMAMOTO Takashi wrote:
>> >> hi,
>> >> 
>> >> - what's VM_PHYSSEG_OP_PG?
>> > 
>> > It's to lookup vm_physseg by "struct vm_page *", relying on that
>> > "struct vm_page *[]" is allocated linearly.  It'll be used to remove
>> > vm_page::phys_addr as we talked some time ago.
>>  
>>  i'm not sure if commiting this unused uncommented code now helps it.
>>  some try-and-benchmark cycles might be necessary given that
>>  vm_page <-> paddr conversion could be performace critical.
>> >>> 
>> >>> If you really care performance, we can directly pass "struct vm_page
>> >>> *" to pmap_enter().
>> >>> 
>> >>> We're doing "struct vm_page *" -> "paddr_t" just before pmap_enter(),
>> >>> then doing "paddr_t" -> "vm_physseg" reverse lookup again in
>> >>> pmap_enter() to check if a given PA is managed.  What is really
>> >>> needed here is, to lookup "struct vm_page *" -> "vm_physseg" once
>> >>> and you'll know both paddr_t and managed or not.
>> >> 
>> >> i agree that the current code is not ideal in that respect.
>> >> otoh, i'm not sure if passing vm_physseg around is a good idea.
>> > 
>> > It's great you share the interest.
>> > 
>> > I chose vm_physseg, because it was there.  I'm open to alternatives,
>> > but I don't think you have many options...
>> 
>> Passing vm_page * doesn't work if the page isn't managed since there
>> won't be a vm_page for the paddr_t.
>> 
>> Now passing both paddr_t and vm_page * would work and if the pointer
>> to the vm_page, it would be an unmanaged mapping.  This also gives the
>> access to mdpg without another lookup.
> 
> What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
> 
> And don't forget that you're the one who first pointed out that
> allocating vm_pages for XIP is a pure waste of memory. ;)

i guess matt meant "if the pointer to the vm_page is NULL,".

> 
> I'm allocating vm_pages, only because of phys_addr and loan_count.
> I believe vm_pages is unnecessary for read-only XIP segments.
> Because they're read-only, and stateless.
> 
> I've already concluded that the current "managed or not" model
> doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
> model can explain everything.  I'm rather waiting for a counter
> example how vm_physseg doesn't work...

i guess your suggestion is too vague.
where do you want to use vm_physseg * + off_t instead of vm_page * ?
getpages, pmap_enter, and?  how their function prototypes would be?
any valid paddr_t value will belong to exactly one vm_phsseg?

YAMAMOTO Takashi


Re: sysctl node names (Re: CVS commit: src/sys/uvm)

2010-04-28 Thread Patrick Welche
On Fri, Feb 19, 2010 at 07:15:21PM +, Patrick Welche wrote:
> 
> Which reminds me of sys/sysctl.h
> 
> #define CTLTYPE_BOOL6   /* name describes a bool */
> 
> but bool doesn't seem to have i/o routines (I bet that all the above
> are CTLTYPE_INT) And then what would be output for bool 1, t, true, vrai...

Which has been fixed :-)

date: 2010/04/11 01:52:10;  author: mrg;  state: Exp;  lines: +66 -5
implement CTLTYPE_BOOL support.  it was entirely missing.  HI MATT!



Re: sysctl node names (Re: CVS commit: src/sys/uvm)

2010-04-27 Thread David Holland
On Fri, Feb 19, 2010 at 06:42:03AM -0500, der Mouse wrote:
 > I'd say it's a question of whether you think of them as input to the
 > kernel, commands ("enable this"), or as output from the kernel,
 > reporting state ("this is enabled").
 > 
 > Of course, in most cases, they're actually both, so that doesn't help
 > much.  But, as a native anglophone, it's what the difference between
 > "enable" and "enabled" here feels like to me.

Well, the sysctl tree is a presentation of the state. Changing it is a
command. So I think it should be "enabled".

We could also avoid this problem by using "on" instead, but that's
probably not a great idea.

-- 
David A. Holland
dholl...@netbsd.org


Re: sysctl node names (Re: CVS commit: src/sys/uvm)

2010-02-19 Thread Patrick Welche
On Fri, Feb 19, 2010 at 05:44:48AM +, YAMAMOTO Takashi wrote:
> hi,
> 
> > Module Name:src
> > Committed By:   drochner
> > Date:   Thu Feb 18 14:57:01 UTC 2010
> > 
> > Modified Files:
> > src/sys/uvm: files.uvm uvm_map.c
> > 
> > Log Message:
> > Disable mapping of virtual address 0 by user programs per default.
> > This blocks an easy exploit of kernel bugs leading to dereference
> > of a NULL pointer on some architectures (eg i386).
> > The check can be disabled in various ways:
> > -by CPP definitions in machine/types.h (portmaster's choice)
> > -by a kernel config option USER_VA0_DISABLED_DEFAULT=0
> > -at runtime by sysctl vm.user_va0_disabled (cannot be cleared
> >  at securelevel>0)
> 
> it reminded me this...
> 
> can we have some policy for future sysctl node names?
> the current mixture of "enable" vs "enabled" seems ugly to me.
> if my preference matters, i'd say "enable"/"disable".
> 
> YAMAMOTO Takashi
> 
> ushi% sysctl -a|grep enable
> net.inet.tcp.sack.enable = 1
> net.inet.tcp.ecn.enable = 0
> net.inet.tcp.abc.enable = 1
> net.inet6.tcp6.sack.enable = 1
> net.inet6.tcp6.ecn.enable = 0
> net.inet6.tcp6.abc.enable = 1
> hw.fwohci.phydma_enable = 1
> security.pax.mprotect.enabled = 1
> security.pax.aslr.enabled = 1
> ushi% 

Which reminds me of sys/sysctl.h

#define CTLTYPE_BOOL6   /* name describes a bool */

but bool doesn't seem to have i/o routines (I bet that all the above
are CTLTYPE_INT) And then what would be output for bool 1, t, true, vrai...

Cheers,

Patrick


Re: sysctl node names (Re: CVS commit: src/sys/uvm)

2010-02-19 Thread der Mouse
>> the current mixture of "enable" vs "enabled" seems ugly to me.
>> if my preference matters, i'd say "enable"/"disable".
> For me, "enabled" sounds more logical because such a flag refers to a
> state, not a process.

I'd say it's a question of whether you think of them as input to the
kernel, commands ("enable this"), or as output from the kernel,
reporting state ("this is enabled").

Of course, in most cases, they're actually both, so that doesn't help
much.  But, as a native anglophone, it's what the difference between
"enable" and "enabled" here feels like to me.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: sysctl node names (Re: CVS commit: src/sys/uvm)

2010-02-19 Thread Matthias Drochner

y...@mwd.biglobe.ne.jp said:
> the current mixture of "enable" vs "enabled" seems ugly to me. if my
> preference matters, i'd say "enable"/"disable".

For me, "enabled" sounds more logical because such a flag
refers to a state, not a process. But I'm ready to sacrifice
that for consistency. The "enable"s are the majority, and they
are the older ones.
In the "va0" case it is not too late -- if no native
speaker convinces me otherwise I'll change it to _disable.

best regards
Matthias





Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzende des Aufsichtsrats: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
Dr. Ulrich Krafft (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt




sysctl node names (Re: CVS commit: src/sys/uvm)

2010-02-18 Thread YAMAMOTO Takashi
hi,

> Module Name:  src
> Committed By: drochner
> Date: Thu Feb 18 14:57:01 UTC 2010
> 
> Modified Files:
>   src/sys/uvm: files.uvm uvm_map.c
> 
> Log Message:
> Disable mapping of virtual address 0 by user programs per default.
> This blocks an easy exploit of kernel bugs leading to dereference
> of a NULL pointer on some architectures (eg i386).
> The check can be disabled in various ways:
> -by CPP definitions in machine/types.h (portmaster's choice)
> -by a kernel config option USER_VA0_DISABLED_DEFAULT=0
> -at runtime by sysctl vm.user_va0_disabled (cannot be cleared
>  at securelevel>0)

it reminded me this...

can we have some policy for future sysctl node names?
the current mixture of "enable" vs "enabled" seems ugly to me.
if my preference matters, i'd say "enable"/"disable".

YAMAMOTO Takashi

ushi% sysctl -a|grep enable
net.inet.tcp.sack.enable = 1
net.inet.tcp.ecn.enable = 0
net.inet.tcp.abc.enable = 1
net.inet6.tcp6.sack.enable = 1
net.inet6.tcp6.ecn.enable = 0
net.inet6.tcp6.abc.enable = 1
hw.fwohci.phydma_enable = 1
security.pax.mprotect.enabled = 1
security.pax.aslr.enabled = 1
ushi% 

> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.16 -r1.17 src/sys/uvm/files.uvm
> cvs rdiff -u -r1.287 -r1.288 src/sys/uvm/uvm_map.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.