Re: [PATCH 2/2] page table iterators

2005-02-25 Thread Andi Kleen
On Thu, Feb 24, 2005 at 11:33:50AM -0800, David S. Miller wrote:
> On Thu, 24 Feb 2005 11:58:42 + (GMT)
> Hugh Dickins <[EMAIL PROTECTED]> wrote:
> 
> > Has anyone _ever_ seen a p??_ERROR message?
> 
> It triggers when you're writing new platform pagetable support
> or making drastric changes in same.  But on sparc64 I've set
> them all to nops to make the code output smaller. :-)

I don't think it's useful except for early debugging.

Also at least on i386/x86-64 the CPU sets a bit in the page fault
handler when it encounters a corrupted page table. On x86-64 
it is handled (not on i386) 


-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-24 Thread Nick Piggin
Hugh Dickins wrote:
At one stage I was adding unlikelies to all the p??_bads, then it
seemed more sensible to hide that in a new macro (which of course
must do the none and bad tests inline, before going off to the function).
Yeah that sounds OK. I think (un)likely can propagate through
inline functions too, if that's any help to you.
We could at little cost.  But I think if these messages come up at all,
they're likely to come up in clumps, where the backtrace won't actually
be giving any interesting info, and the quantity of them be a nuisance
itself.  I'd rather leave it to the next person who gets the error and
wants the backtrace to add it.
You're probably right - I know when I see them (from my
hacking up the code) they usually come in clumps :P
Nick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-24 Thread Hugh Dickins
On Fri, 25 Feb 2005, Nick Piggin wrote:
> Hugh Dickins wrote:
> > 
> > Has anyone _ever_ seen a p??_ERROR message?  I'm inclined to just
> > put three functions into mm/memory.c to do the p??_ERROR and p??_clear,
> > but that way the __FILE__ and __LINE__ will always come out the same.
> > I think if it ever proves a problem, we'd just add in a dump_stack.
> 
> I think a function is the most sensible. And a good idea, it should
> reduce the icache pressure in the loops (although gcc does seem to
> do a pretty good job of moving unlikely()s away from the fastpath).

At one stage I was adding unlikelies to all the p??_bads, then it
seemed more sensible to hide that in a new macro (which of course
must do the none and bad tests inline, before going off to the function).

David's response confirms that __FILE__,__LINE__ shouldn't be an issue.

> I think at the point these things get detected, there is little use
> for having a dump_stack. But we may as well add one anyway if it is
> an out of line function?

We could at little cost.  But I think if these messages come up at all,
they're likely to come up in clumps, where the backtrace won't actually
be giving any interesting info, and the quantity of them be a nuisance
itself.  I'd rather leave it to the next person who gets the error and
wants the backtrace to add it.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-24 Thread Nick Piggin
Hugh Dickins wrote:
On Thu, 24 Feb 2005, Nick Piggin wrote:
pud_addr_end?

next = pud_addr_end(addr, end);
Hmm, yes, I'll go with that, thanks (unless a better idea follows).
Something I do intend on top of what I sent before, is another set
of three macros, like
if (pud_none_or_clear_bad(pud))
continue;
to replace all the p??_none, p??_bad clauses: not to save space,
but just for clarity, those loops now seeming dominated by the
unlikeliest of cases.
Has anyone _ever_ seen a p??_ERROR message?  I'm inclined to just
put three functions into mm/memory.c to do the p??_ERROR and p??_clear,
but that way the __FILE__ and __LINE__ will always come out the same.
I think if it ever proves a problem, we'd just add in a dump_stack.
I think a function is the most sensible. And a good idea, it should
reduce the icache pressure in the loops (although gcc does seem to
do a pretty good job of moving unlikely()s away from the fastpath).
I think at the point these things get detected, there is little use
for having a dump_stack. But we may as well add one anyway if it is
an out of line function?
Nick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-24 Thread David S. Miller
On Thu, 24 Feb 2005 11:58:42 + (GMT)
Hugh Dickins <[EMAIL PROTECTED]> wrote:

> Has anyone _ever_ seen a p??_ERROR message?

It triggers when you're writing new platform pagetable support
or making drastric changes in same.  But on sparc64 I've set
them all to nops to make the code output smaller. :-)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-24 Thread Hugh Dickins
On Thu, 24 Feb 2005, Nick Piggin wrote:
> 
> pud_addr_end?

next = pud_addr_end(addr, end);

Hmm, yes, I'll go with that, thanks (unless a better idea follows).

Something I do intend on top of what I sent before, is another set
of three macros, like

if (pud_none_or_clear_bad(pud))
continue;

to replace all the p??_none, p??_bad clauses: not to save space,
but just for clarity, those loops now seeming dominated by the
unlikeliest of cases.

Has anyone _ever_ seen a p??_ERROR message?  I'm inclined to just
put three functions into mm/memory.c to do the p??_ERROR and p??_clear,
but that way the __FILE__ and __LINE__ will always come out the same.
I think if it ever proves a problem, we'd just add in a dump_stack.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-23 Thread Nick Piggin
On Thu, 2005-02-24 at 05:12 +, Hugh Dickins wrote:
> On Thu, 24 Feb 2005, Nick Piggin wrote:

> > OK after sleeping on it, I'm warming to your way.
> > 
> > I don't think it makes something like David's modifications any
> > easier, but mine didn't go a long way to that end either. And
> > being a more incremental approach gives us more room to move in
> > future (for example, maybe toward something that really *will*
> > accommodate the bitmap walking code nicely).
> 
> I'll take a quick look at David's today.
> Just so long as we don't make them harder.
> 

No, I think we may want to move to something better abstracted:
it makes things sufficiently complex that you wouldn't want to
have it open coded everywhere.

But no, you're not making it harder than the present situation.

> > So I'd be pretty happy for you to queue this up with Andrew for
> > 2.6.12. Anyone else?
> 
> Oh, okay, thanks.  You weren't very happy with p??_limit(addr, end),
> and good naming is important to me.  I didn't care for your tentative
> p??_span or p??_span_end.  Would p??_end be better?  p??_enda would
> be fun for one of them...
> 

pud_addr_end?



http://mobile.yahoo.com.au - Yahoo! Mobile

- Check & compose your email via SMS on your Telstra or Vodafone mobile.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-23 Thread Hugh Dickins
On Thu, 24 Feb 2005, Nick Piggin wrote:
> Hugh Dickins wrote:
> 
> > I'm inlining pmd and pud levels, but not pte and pgd levels.
> 
> OK - that's probably sufficient for debugging. There is only so
> much that can go wrong in the middle levels... 

Yes, that was my thinking.

> how does it look
> performance wise? (I can give it a test when it gets split out)

Yesterday shattered in various directions, I hope to try today.

> > One point worth making, I do believe throughout that whatever the
> > address layout, "end" cannot be 0 - BUG_ON(addr >= end) assures.

Of course, that does allow some simplifications in your for_each
macros; but it still looked like my p??_limits were better for
shortest codepath, and close to yours for codesize.

> OK after sleeping on it, I'm warming to your way.
> 
> I don't think it makes something like David's modifications any
> easier, but mine didn't go a long way to that end either. And
> being a more incremental approach gives us more room to move in
> future (for example, maybe toward something that really *will*
> accommodate the bitmap walking code nicely).

I'll take a quick look at David's today.
Just so long as we don't make them harder.

> So I'd be pretty happy for you to queue this up with Andrew for
> 2.6.12. Anyone else?

Oh, okay, thanks.  You weren't very happy with p??_limit(addr, end),
and good naming is important to me.  I didn't care for your tentative
p??_span or p??_span_end.  Would p??_end be better?  p??_enda would
be fun for one of them...

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-23 Thread David S. Miller
On Thu, 24 Feb 2005 10:52:23 +1100
Nick Piggin <[EMAIL PROTECTED]> wrote:

> So I'd be pretty happy for you to queue this up with Andrew for
> 2.6.12. Anyone else?

No objections from me.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-23 Thread Nick Piggin
Hugh Dickins wrote:
I'm off to bed, but since your appetite for looking at patches
is greater than mine, I'll throw what I'm currently testing over
the wall to you now.  Against 2.6.11-rc4-bk9, but my starting point
was obviously your patches.  Not yet split up, but clearly should be.
Yeah you've snuck a few other clever things in there ;)
Includes mm/swapfile.c which you missed.  I'm inlining pmd and pud
Thanks.
levels, but not pte and pgd levels.  No description yet, sorry.
OK - that's probably sufficient for debugging. There is only so
much that can go wrong in the middle levels... how does it look
performance wise? (I can give it a test when it gets split out)
One point worth making, I do believe throughout that whatever the
address layout, "end" cannot be 0 - BUG_ON(addr >= end) assures.
OK after sleeping on it, I'm warming to your way.
I don't think it makes something like David's modifications any
easier, but mine didn't go a long way to that end either. And
being a more incremental approach gives us more room to move in
future (for example, maybe toward something that really *will*
accommodate the bitmap walking code nicely).
So I'd be pretty happy for you to queue this up with Andrew for
2.6.12. Anyone else?
Nick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-22 Thread Nick Piggin
On Tue, 2005-02-22 at 20:31 -0800, David S. Miller wrote:

> I just got also reminded that we walk these damn pagetables completely
> twice every exit, once to unmap the VMAs pte mappings, once again to
> zap the page tables.  It might be fruitful to explore combining
> those two steps, perhaps not.
> 

I'm going to have a look at refcounting page table pages, which
will hopefully allow us to get back (and more) the clear_page_range
overhead introduced by the aggressive page table freeing.

It may also allow nice things like dropping file backed page table
mappings if they get reclaimed, and also a single walk to do the
freeing. I haven't looked into details yet though, these are just
vague hopes.


> Anyways, comments and improvment suggestions welcome.  Particularly
> interesting would be if this thing helps a lot on other platforms
> too, such as x86_64, ia64, alpha and ppc64.
> 

I have a feeling it should provide nice benefits to all archs if
we get it into all the walkers. Downsides are few - the bitmap walk
probably only becomes more expensive when all but a handful of
cachelines are present in a page table page.

I'd like to look at ways to make this patch happen with you soon...
First, for 2.6.12 my main concern is to get pt walking consistent,
and try to claw back some of the clear_page_range regression.

Thanks,
Nick


Find local movie times and trailers on Yahoo! Movies.

http://au.movies.yahoo.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-22 Thread David S. Miller
On Wed, 23 Feb 2005 15:49:30 +1100
Nick Piggin <[EMAIL PROTECTED]> wrote:

> > It's easy to toy with the sparc64 optimization on other platforms,
> > just add the necessary hacks to pmd_set and pgd_set, allocation
> > of pmd and pgd tables
> 
> David: just an implementation detail that I had meant to bring
> up earlier - would it feel like less of a hack to put these in
> pmd_populate and pgd_populate?

Sure, no problem.  They get defined to pmd_set/pgd_set calls
anyways.  But wouldn't that miss pgd_clear() and pmd_clear()?
Someone may find it worthwhile to, on a *_clear(), to see if
a set bit can now be clear because all the neighboring entries
are empty as well.

That might have been the reason I put it there, but I may be
giving myself too much credit :-)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-22 Thread Nick Piggin
On Tue, 2005-02-22 at 20:31 -0800, David S. Miller wrote:
> On Wed, 23 Feb 2005 02:06:28 + (GMT)
> Hugh Dickins <[EMAIL PROTECTED]> wrote:
> 
> > I've not seen Dave's bitmap walking functions (for clearing?),
> > would they fit in better with my way?
> 

Hugh: I'll have more of a look through your patch when I get
some time... to be honest I'm not too worried either way, so
long as one or the other gets in.

Very trivial point, but I'm not sure that I like the name
p?d_limit... maybe p?d_span or _span_end... hmm, they're not
really pleasing either.

You _are_ repeating a bit of mindless loop accounting in every
page table walk, and it isn't completely clear to me that it is
giving you much more flexibility (than for_each_*). But my loops
_are_ a bit contorted.

> This is what Nick is referring to:
> 

[snip]

> It's easy to toy with the sparc64 optimization on other platforms,
> just add the necessary hacks to pmd_set and pgd_set, allocation
> of pmd and pgd tables

David: just an implementation detail that I had meant to bring
up earlier - would it feel like less of a hack to put these in
pmd_populate and pgd_populate?

Nick





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-22 Thread David S. Miller
On Wed, 23 Feb 2005 02:06:28 + (GMT)
Hugh Dickins <[EMAIL PROTECTED]> wrote:

> I've not seen Dave's bitmap walking functions (for clearing?),
> would they fit in better with my way?

This is what Nick is referring to:



I hacked up something slightly different today.  I only
have it being used by clear_page_range() but it is extremely
effective.

Things like fork+exit latencies on my 750Mhz sparc64 box went
from ~490 microseconds to ~367 microseconds.  fork+execve
latency went down from ~1595 microseconds to ~1351 microseconds.

Two issues:

1) I'm not terribly satisfied with the interface.  I think
   with some improvements it can be applies to the two other
   routines this thing really makes sense for, namely copy_page_range
   and unmap_page_range

2) I don't think it will collapse well for 2-level page tables,
   someone take a look?

It's easy to toy with the sparc64 optimization on other platforms,
just add the necessary hacks to pmd_set and pgd_set, allocation
of pmd and pgd tables, use "PAGE_SHIFT - 5" instead of "PAGE_SHIFT - 6"
on 32-bit platforms, and then copy the asm-sparc64/pgwalk.h bits over
into your platforms asm-${ARCH}/pgwalk.h

I just got also reminded that we walk these damn pagetables completely
twice every exit, once to unmap the VMAs pte mappings, once again to
zap the page tables.  It might be fruitful to explore combining
those two steps, perhaps not.

Anyways, comments and improvment suggestions welcome.  Particularly
interesting would be if this thing helps a lot on other platforms
too, such as x86_64, ia64, alpha and ppc64.

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/08/10 23:44:24-07:00 [EMAIL PROTECTED] 
#   [MM]: Add arch-overridable page table walking machinery.
#   
#   Currently very rudimentary but is used fully for
#   clear_page_range().  An optimized implementation
#   is there for sparc64 and it is extremely effective
#   particularly for 64-bit processes.
#   
#   For things like lat_fork and friends clear_page_tables()
#   use to be 2nd or 3rd in the kernel profile, now it has
#   dropped to the 20th or so entry.
#   
#   Signed-off-by: David S. Miller 
# 
# mm/memory.c
#   2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +10 -26
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sparc64/pgtable.h
#   2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +28 -4
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sparc64/pgalloc.h
#   2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +10 -2
#   [MM]: Add arch-overridable page table walking machinery.
# 
# arch/sparc64/mm/init.c
#   2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +2 -2
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-x86_64/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-v850/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-um/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sparc64/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +114 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sparc/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sh64/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-sh/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-s390/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-ppc64/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-ppc/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-parisc/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-mips/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-m68knommu/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-m68k/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-ia64/pgwalk.h
#   2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0
#   [MM]: Add arch-overridable page table walking machinery.
# 
# include/asm-i386/pgw

Re: [PATCH 2/2] page table iterators

2005-02-22 Thread Nick Piggin
Hugh Dickins wrote:
On Sun, 20 Feb 2005, Nick Piggin wrote:


Open coding is probably the smaller evil.
And they're really not changed that often.

My opinion FWIW: I'm all for regularizing the pagetable loops to
work the same way, changing their variables to use the same names,
improving their efficiency; but I do like to see what a loop is up to.
list_for_each and friends are very widely used, they're fine, and I'm
quite glad to have their prefetching hidden away from me; but usually
I groan, grin and bear it, each time someone devises a clever new
for_each macro concealing half the details of some specialist loop.
In a minority?
OK, I think Andrew is now sitting on the fence after seeing the
code. So you (and Andi?) are the ones with remaining reservations
about this.
I don't disagree with your stance entirely, Hugh. I think these
macros are close to being too complicated... But I don't think
they is hiding too much detail: we all know that conceptually,
walking a page table page is reasonably simple. There are just a
few tricky bits like wrapping and termination that caused such
a divergent range of implementations - I would argue that hiding
these details is OK, because they are basically inconsequencial
to the job at hand. I think that actually makes the high level
intention of the code clearer, if anything.
If you are reading just the patch, that doesn't quite do it
justice IMO - in that case, have a look at the code after the
patch is applied (I can send you one which applies to current
kernels if you'd like).
Also, the implementation of the macros is not insanely difficult
to understand, so the details are still accessible.
Lastly, they fold to 2 and 3 levels easily, which is something
that couldn't sanely be done with the open-coded implementation.
I think with an infinitely smart compiler, there shouldn't need
to be any folding here. But in practice I see quite a large
speedup, which is something we shouldn't ignore.
I do think that they are probably not ideal candidates for a
more general abstraction that would allow for example the
transparent drop in of Dave's bitmap walking functions (it
would be possible, but would not be pretty AFAIKS). I have some
other ideas to work towards those goals, but before that I
think these macros do help with the deficiencies of the current
situation.
Nick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-21 Thread Nick Piggin
Nick Piggin wrote:
Haven't yet pulled out a pre-4-level kernel to see how 3-level compares
I guess I'll do that now.
Close.
Before 4level: 119.5us, after folded walkers: 132.8us
I think most of this is now coming from clear_page_range, rather
than the actual traversing of the page tables (because they should
be completely folded by now):
before:
  4089 total  0.0017
   753 kmap_atomic4.7358
   682 do_wp_page 0.6713
   680 do_page_fault  0.4561
   261 zap_pte_range  0.3625
   176 copy_page_range0.2133
   159 pte_alloc_one  1.5743
   145 clear_page_tables  0.4866
after:
  4307 total  0.0018
   676 kmap_atomic4.2516
   665 do_page_fault  0.4472
   615 do_wp_page 0.6225
   550 clear_page_range   0.9982
   262 zap_pte_range  0.4870
I think the additional work done by clear_page_range (versus
clear_page_tables) justifies the extra cost, even for 3-level
architectures.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-21 Thread Nick Piggin
Benjamin Herrenschmidt wrote:
All of them are slightly differently implemented, some check overflow,
some don't, some have redudant checking, some aren't even consistent
between all 3/4 loops of a given walk routine set, and we have seen the
tendency to introduce subtle bugs in one of them when they all have to
be changed for some reason.
I'm all for turning them into something more consistent, and I like the
for_each_* idea...
It also allows to completely remove the code of the unused levels on 2
and 3 level page tables easily, regaining some of the perfs lost by the
move to 4 levels.
It appears to do even better on 2-levels (i386, !PAE) than the old
3-level code, not surprisingly. lmbench fork+exit overhead is under
100us on a 3.4GHz xeon now, which is the lowest I've seen.
Haven't yet pulled out a pre-4-level kernel to see how 3-level compares
I guess I'll do that now.
Now, we also need, in the long run, to improve perfs of walking the page
tables, especially PTEs, for things like tearing down processes or fork,
for example via a bitmap of used PGD entries etc... 

With proper iterators, such a thing could be implemented just by
modifying the iterator, and all loops would benefit from it.
After looking at David's bitmap walking code, I'm starting to think
that my current macros only _just_ scrape by because of the uniform
nature of the walkers, and their relative simplicity. Anything much
more complex will start to get ugly.
I'd like to look at a slightly more involved reworking in order to
nicely support optimisations like bitmap walking, without blowing out
the complexity of the macros and without hiding too much of the
workings.
However, my main aim for these macros was mainly to fix the
performance regressions on 2 and 3 level architectures. Ben's
complaints about these loops just served to hurry it along. I think
that these reasons (performance, code consistency) make it a good
idea.
Nick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-20 Thread Benjamin Herrenschmidt
On Sun, 2005-02-20 at 22:40 -0800, Andrew Morton wrote:
> Hugh Dickins <[EMAIL PROTECTED]> wrote:
> >
> > My opinion FWIW: I'm all for regularizing the pagetable loops to
> >  work the same way, changing their variables to use the same names,
> >  improving their efficiency; but I do like to see what a loop is up to.
> > 
> >  list_for_each and friends are very widely used, they're fine, and I'm
> >  quite glad to have their prefetching hidden away from me; but usually
> >  I groan, grin and bear it, each time someone devises a clever new
> >  for_each macro concealing half the details of some specialist loop.
> > 
> >  In a minority?
> 
> Of two.

Well, we basically have that bunch of loops that all do the same thing
to iterate the page tables. Only the inner part is different (that is
what is done on each PTE).

All of them are slightly differently implemented, some check overflow,
some don't, some have redudant checking, some aren't even consistent
between all 3/4 loops of a given walk routine set, and we have seen the
tendency to introduce subtle bugs in one of them when they all have to
be changed for some reason.

I'm all for turning them into something more consistent, and I like the
for_each_* idea...

It also allows to completely remove the code of the unused levels on 2
and 3 level page tables easily, regaining some of the perfs lost by the
move to 4 levels.

Now, we also need, in the long run, to improve perfs of walking the page
tables, especially PTEs, for things like tearing down processes or fork,
for example via a bitmap of used PGD entries etc... 

With proper iterators, such a thing could be implemented just by
modifying the iterator, and all loops would benefit from it.

I think that is enough to justify the move.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-20 Thread Andrew Morton
Hugh Dickins <[EMAIL PROTECTED]> wrote:
>
> My opinion FWIW: I'm all for regularizing the pagetable loops to
>  work the same way, changing their variables to use the same names,
>  improving their efficiency; but I do like to see what a loop is up to.
> 
>  list_for_each and friends are very widely used, they're fine, and I'm
>  quite glad to have their prefetching hidden away from me; but usually
>  I groan, grin and bear it, each time someone devises a clever new
>  for_each macro concealing half the details of some specialist loop.
> 
>  In a minority?

Of two.

Let's see what they look like.  They'd need to be very good, IMO.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-20 Thread Hugh Dickins
On Sun, 20 Feb 2005, Nick Piggin wrote:
> Andi Kleen wrote:
> > 
> > The problem is just that these walker macros when they
> > do all the lazy walking stuff will be quite complicated.
> > And I don't really want another uaccess.h-like macro mess.
> > 
> > Yes currently they look simple, but that will change.
> 
> But even in that case, it will still be better to have the
> extra complexity once in the macro rather than throughout mm/
> 
> > Open coding is probably the smaller evil.
> > And they're really not changed that often.

My opinion FWIW: I'm all for regularizing the pagetable loops to
work the same way, changing their variables to use the same names,
improving their efficiency; but I do like to see what a loop is up to.

list_for_each and friends are very widely used, they're fine, and I'm
quite glad to have their prefetching hidden away from me; but usually
I groan, grin and bear it, each time someone devises a clever new
for_each macro concealing half the details of some specialist loop.

In a minority?
Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-20 Thread Nick Piggin
Andi Kleen wrote:
On Thu, Feb 17, 2005 at 03:30:31PM -0800, David S. Miller wrote:
On Fri, 18 Feb 2005 00:03:42 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

And to be honest we only have about 6 or 7 of these walkers
in the whole kernel. And 90% of them are in memory.c
While doing 4level I think I changed all of them around several
times and it wasn't that big an issue.  So it's not that we
have a big pressing problem here... 
It's super error prone.  A regression added by your edit of these

Actually it was in Nick's code (PUD layer ;-).  But I won't argue
that my code didn't have bugs too...

I won't look back to see where the error came from :) But
yeah it is equally (if not more) likely to have come from
me. And it probably did happen because all the code is
slightly different and hard to understand.
walkers for the 4level changes was only discovered and fixed
yesterday by the ppc folks.
I absolutely support any change which consolidates these things.

The problem is just that these walker macros when they
do all the lazy walking stuff will be quite complicated.
And I don't really want another uaccess.h-like macro mess.
Yes currently they look simple, but that will change.
But even in that case, it will still be better to have the
extra complexity once in the macro rather than throughout mm/
Open coding is probably the smaller evil.
And they're really not changed that often.
It is not so much a matter of changing, so much as having 10
slightly different implementations.
I think it should be easier to go from the iterators patch to
perhaps more complex iterators, or some open coding, etc etc.
rather than try to put a big complex pt walker on top of these
10 different open coded implementations.
But perhaps I'm missing something you're not - I'd need to see
the lazy walking code I guess.
Nick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-17 Thread Andi Kleen
On Thu, Feb 17, 2005 at 03:30:31PM -0800, David S. Miller wrote:
> On Fri, 18 Feb 2005 00:03:42 +0100
> Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > And to be honest we only have about 6 or 7 of these walkers
> > in the whole kernel. And 90% of them are in memory.c
> > While doing 4level I think I changed all of them around several
> > times and it wasn't that big an issue.  So it's not that we
> > have a big pressing problem here... 
> 
> It's super error prone.  A regression added by your edit of these

Actually it was in Nick's code (PUD layer ;-).  But I won't argue
that my code didn't have bugs too...

> walkers for the 4level changes was only discovered and fixed
> yesterday by the ppc folks.
> 
> I absolutely support any change which consolidates these things.

The problem is just that these walker macros when they
do all the lazy walking stuff will be quite complicated.
And I don't really want another uaccess.h-like macro mess.

Yes currently they look simple, but that will change.

Open coding is probably the smaller evil.

And they're really not changed that often.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-17 Thread Andi Kleen
On Fri, Feb 18, 2005 at 10:21:03AM +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2005-02-18 at 00:03 +0100, Andi Kleen wrote:
> 
> > And to be honest we only have about 6 or 7 of these walkers
> > in the whole kernel. And 90% of them are in memory.c
> > While doing 4level I think I changed all of them around several
> > times and it wasn't that big an issue.  So it's not that we
> > have a big pressing problem here... 
> 
> We have about 50% of them in memory.c :) But my main problem is more
> that every single of them is implemented slightly differently.

No much more. But I only count real walkers, not stuff like vmalloc. 

The ioremap duplication over architectures is a bit annoying, but
the fix for that would be to factor the code out completely, not
only improve walking.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-17 Thread David S. Miller
On Fri, 18 Feb 2005 00:03:42 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

> And to be honest we only have about 6 or 7 of these walkers
> in the whole kernel. And 90% of them are in memory.c
> While doing 4level I think I changed all of them around several
> times and it wasn't that big an issue.  So it's not that we
> have a big pressing problem here... 

It's super error prone.  A regression added by your edit of these
walkers for the 4level changes was only discovered and fixed
yesterday by the ppc folks.

I absolutely support any change which consolidates these things.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-17 Thread Benjamin Herrenschmidt
On Fri, 2005-02-18 at 00:03 +0100, Andi Kleen wrote:

> And to be honest we only have about 6 or 7 of these walkers
> in the whole kernel. And 90% of them are in memory.c
> While doing 4level I think I changed all of them around several
> times and it wasn't that big an issue.  So it's not that we
> have a big pressing problem here... 

We have about 50% of them in memory.c :) But my main problem is more
that every single of them is implemented slightly differently.

Going Nick's way is a good start. If they are all consolidated to use
the same macro, they will be easier for you to change later on anyway.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-17 Thread Andi Kleen
> I though about both ways yesterday, and in the end, I prefer Nick stuff,
> at least for now. It gives us also more flexibility to change gory
> implementation details in the future. I still have to run it through a
> bit of torture testing though.

They're really solving different problems. My code is just aimed
at getting x86-64 fork/exec/etc. as fast as before 4level again
(currently they are significantly slower because they have to walk
a lot more page tables) 

The problem is that the index based approach (I think you have to use
indexes for this, pointers get very messy) probably does not 
fit very well into Nick's complex macros.  

Nick's macros are essentially just code transformations with
some micro optimizations. 

That's not bad, but it won't give you the big speedups 
the lazy walking approach will give.

And to be honest we only have about 6 or 7 of these walkers
in the whole kernel. And 90% of them are in memory.c
While doing 4level I think I changed all of them around several
times and it wasn't that big an issue.  So it's not that we
have a big pressing problem here... 

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-17 Thread Benjamin Herrenschmidt
On Thu, 2005-02-17 at 20:43 +0100, Andi Kleen wrote:
> On Fri, Feb 18, 2005 at 01:03:35AM +1100, Nick Piggin wrote:
> > I am pretty surprised myself that I was able to consolidate
> > all "page table range" functions into a single type of iterator
> > (well, there are a couple of variations, but it's not too bad).
> 
> I started a similar project - but it uses the existing loops,
> just using {pte,pmd,pud,pgd}_next. The idea is to optimize
> page table walking by keeping some state in the struct page
> of the page table page that says whether an entry is set 
> or not. To make this work I switched everything to indexes
> instead of pointers.
> 
> Main problem are some nasty include loops. 

I though about both ways yesterday, and in the end, I prefer Nick stuff,
at least for now. It gives us also more flexibility to change gory
implementation details in the future. I still have to run it through a
bit of torture testing though.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-17 Thread Andi Kleen
On Fri, Feb 18, 2005 at 01:03:35AM +1100, Nick Piggin wrote:
> I am pretty surprised myself that I was able to consolidate
> all "page table range" functions into a single type of iterator
> (well, there are a couple of variations, but it's not too bad).

I started a similar project - but it uses the existing loops,
just using {pte,pmd,pud,pgd}_next. The idea is to optimize
page table walking by keeping some state in the struct page
of the page table page that says whether an entry is set 
or not. To make this work I switched everything to indexes
instead of pointers.

Main problem are some nasty include loops. 

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-17 Thread Nick Piggin
Linus Torvalds wrote:
On Fri, 18 Feb 2005, Nick Piggin wrote:
I am pretty surprised myself that I was able to consolidate
all "page table range" functions into a single type of iterator
(well, there are a couple of variations, but it's not too bad).

Ok, this is post-2.6.11 material, so please remind me.
Sure... it will probably be best to go through -mm, but either
way I'll package the patches up nicely and rediff them against
2.6.11 when it comes out.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] page table iterators

2005-02-17 Thread Linus Torvalds


On Fri, 18 Feb 2005, Nick Piggin wrote:
>
> I am pretty surprised myself that I was able to consolidate
> all "page table range" functions into a single type of iterator
> (well, there are a couple of variations, but it's not too bad).

Ok, this is post-2.6.11 material, so please remind me.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/