Re: [PATCH 2/2] page table iterators
On Thu, Feb 24, 2005 at 11:33:50AM -0800, David S. Miller wrote: > On Thu, 24 Feb 2005 11:58:42 + (GMT) > Hugh Dickins <[EMAIL PROTECTED]> wrote: > > > Has anyone _ever_ seen a p??_ERROR message? > > It triggers when you're writing new platform pagetable support > or making drastric changes in same. But on sparc64 I've set > them all to nops to make the code output smaller. :-) I don't think it's useful except for early debugging. Also at least on i386/x86-64 the CPU sets a bit in the page fault handler when it encounters a corrupted page table. On x86-64 it is handled (not on i386) -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
Hugh Dickins wrote: At one stage I was adding unlikelies to all the p??_bads, then it seemed more sensible to hide that in a new macro (which of course must do the none and bad tests inline, before going off to the function). Yeah that sounds OK. I think (un)likely can propagate through inline functions too, if that's any help to you. We could at little cost. But I think if these messages come up at all, they're likely to come up in clumps, where the backtrace won't actually be giving any interesting info, and the quantity of them be a nuisance itself. I'd rather leave it to the next person who gets the error and wants the backtrace to add it. You're probably right - I know when I see them (from my hacking up the code) they usually come in clumps :P Nick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Fri, 25 Feb 2005, Nick Piggin wrote: > Hugh Dickins wrote: > > > > Has anyone _ever_ seen a p??_ERROR message? I'm inclined to just > > put three functions into mm/memory.c to do the p??_ERROR and p??_clear, > > but that way the __FILE__ and __LINE__ will always come out the same. > > I think if it ever proves a problem, we'd just add in a dump_stack. > > I think a function is the most sensible. And a good idea, it should > reduce the icache pressure in the loops (although gcc does seem to > do a pretty good job of moving unlikely()s away from the fastpath). At one stage I was adding unlikelies to all the p??_bads, then it seemed more sensible to hide that in a new macro (which of course must do the none and bad tests inline, before going off to the function). David's response confirms that __FILE__,__LINE__ shouldn't be an issue. > I think at the point these things get detected, there is little use > for having a dump_stack. But we may as well add one anyway if it is > an out of line function? We could at little cost. But I think if these messages come up at all, they're likely to come up in clumps, where the backtrace won't actually be giving any interesting info, and the quantity of them be a nuisance itself. I'd rather leave it to the next person who gets the error and wants the backtrace to add it. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
Hugh Dickins wrote: On Thu, 24 Feb 2005, Nick Piggin wrote: pud_addr_end? next = pud_addr_end(addr, end); Hmm, yes, I'll go with that, thanks (unless a better idea follows). Something I do intend on top of what I sent before, is another set of three macros, like if (pud_none_or_clear_bad(pud)) continue; to replace all the p??_none, p??_bad clauses: not to save space, but just for clarity, those loops now seeming dominated by the unlikeliest of cases. Has anyone _ever_ seen a p??_ERROR message? I'm inclined to just put three functions into mm/memory.c to do the p??_ERROR and p??_clear, but that way the __FILE__ and __LINE__ will always come out the same. I think if it ever proves a problem, we'd just add in a dump_stack. I think a function is the most sensible. And a good idea, it should reduce the icache pressure in the loops (although gcc does seem to do a pretty good job of moving unlikely()s away from the fastpath). I think at the point these things get detected, there is little use for having a dump_stack. But we may as well add one anyway if it is an out of line function? Nick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Thu, 24 Feb 2005 11:58:42 + (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: > Has anyone _ever_ seen a p??_ERROR message? It triggers when you're writing new platform pagetable support or making drastric changes in same. But on sparc64 I've set them all to nops to make the code output smaller. :-) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Thu, 24 Feb 2005, Nick Piggin wrote: > > pud_addr_end? next = pud_addr_end(addr, end); Hmm, yes, I'll go with that, thanks (unless a better idea follows). Something I do intend on top of what I sent before, is another set of three macros, like if (pud_none_or_clear_bad(pud)) continue; to replace all the p??_none, p??_bad clauses: not to save space, but just for clarity, those loops now seeming dominated by the unlikeliest of cases. Has anyone _ever_ seen a p??_ERROR message? I'm inclined to just put three functions into mm/memory.c to do the p??_ERROR and p??_clear, but that way the __FILE__ and __LINE__ will always come out the same. I think if it ever proves a problem, we'd just add in a dump_stack. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Thu, 2005-02-24 at 05:12 +, Hugh Dickins wrote: > On Thu, 24 Feb 2005, Nick Piggin wrote: > > OK after sleeping on it, I'm warming to your way. > > > > I don't think it makes something like David's modifications any > > easier, but mine didn't go a long way to that end either. And > > being a more incremental approach gives us more room to move in > > future (for example, maybe toward something that really *will* > > accommodate the bitmap walking code nicely). > > I'll take a quick look at David's today. > Just so long as we don't make them harder. > No, I think we may want to move to something better abstracted: it makes things sufficiently complex that you wouldn't want to have it open coded everywhere. But no, you're not making it harder than the present situation. > > So I'd be pretty happy for you to queue this up with Andrew for > > 2.6.12. Anyone else? > > Oh, okay, thanks. You weren't very happy with p??_limit(addr, end), > and good naming is important to me. I didn't care for your tentative > p??_span or p??_span_end. Would p??_end be better? p??_enda would > be fun for one of them... > pud_addr_end? http://mobile.yahoo.com.au - Yahoo! Mobile - Check & compose your email via SMS on your Telstra or Vodafone mobile. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Thu, 24 Feb 2005, Nick Piggin wrote: > Hugh Dickins wrote: > > > I'm inlining pmd and pud levels, but not pte and pgd levels. > > OK - that's probably sufficient for debugging. There is only so > much that can go wrong in the middle levels... Yes, that was my thinking. > how does it look > performance wise? (I can give it a test when it gets split out) Yesterday shattered in various directions, I hope to try today. > > One point worth making, I do believe throughout that whatever the > > address layout, "end" cannot be 0 - BUG_ON(addr >= end) assures. Of course, that does allow some simplifications in your for_each macros; but it still looked like my p??_limits were better for shortest codepath, and close to yours for codesize. > OK after sleeping on it, I'm warming to your way. > > I don't think it makes something like David's modifications any > easier, but mine didn't go a long way to that end either. And > being a more incremental approach gives us more room to move in > future (for example, maybe toward something that really *will* > accommodate the bitmap walking code nicely). I'll take a quick look at David's today. Just so long as we don't make them harder. > So I'd be pretty happy for you to queue this up with Andrew for > 2.6.12. Anyone else? Oh, okay, thanks. You weren't very happy with p??_limit(addr, end), and good naming is important to me. I didn't care for your tentative p??_span or p??_span_end. Would p??_end be better? p??_enda would be fun for one of them... Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Thu, 24 Feb 2005 10:52:23 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: > So I'd be pretty happy for you to queue this up with Andrew for > 2.6.12. Anyone else? No objections from me. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
Hugh Dickins wrote: I'm off to bed, but since your appetite for looking at patches is greater than mine, I'll throw what I'm currently testing over the wall to you now. Against 2.6.11-rc4-bk9, but my starting point was obviously your patches. Not yet split up, but clearly should be. Yeah you've snuck a few other clever things in there ;) Includes mm/swapfile.c which you missed. I'm inlining pmd and pud Thanks. levels, but not pte and pgd levels. No description yet, sorry. OK - that's probably sufficient for debugging. There is only so much that can go wrong in the middle levels... how does it look performance wise? (I can give it a test when it gets split out) One point worth making, I do believe throughout that whatever the address layout, "end" cannot be 0 - BUG_ON(addr >= end) assures. OK after sleeping on it, I'm warming to your way. I don't think it makes something like David's modifications any easier, but mine didn't go a long way to that end either. And being a more incremental approach gives us more room to move in future (for example, maybe toward something that really *will* accommodate the bitmap walking code nicely). So I'd be pretty happy for you to queue this up with Andrew for 2.6.12. Anyone else? Nick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Tue, 2005-02-22 at 20:31 -0800, David S. Miller wrote: > I just got also reminded that we walk these damn pagetables completely > twice every exit, once to unmap the VMAs pte mappings, once again to > zap the page tables. It might be fruitful to explore combining > those two steps, perhaps not. > I'm going to have a look at refcounting page table pages, which will hopefully allow us to get back (and more) the clear_page_range overhead introduced by the aggressive page table freeing. It may also allow nice things like dropping file backed page table mappings if they get reclaimed, and also a single walk to do the freeing. I haven't looked into details yet though, these are just vague hopes. > Anyways, comments and improvment suggestions welcome. Particularly > interesting would be if this thing helps a lot on other platforms > too, such as x86_64, ia64, alpha and ppc64. > I have a feeling it should provide nice benefits to all archs if we get it into all the walkers. Downsides are few - the bitmap walk probably only becomes more expensive when all but a handful of cachelines are present in a page table page. I'd like to look at ways to make this patch happen with you soon... First, for 2.6.12 my main concern is to get pt walking consistent, and try to claw back some of the clear_page_range regression. Thanks, Nick Find local movie times and trailers on Yahoo! Movies. http://au.movies.yahoo.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Wed, 23 Feb 2005 15:49:30 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: > > It's easy to toy with the sparc64 optimization on other platforms, > > just add the necessary hacks to pmd_set and pgd_set, allocation > > of pmd and pgd tables > > David: just an implementation detail that I had meant to bring > up earlier - would it feel like less of a hack to put these in > pmd_populate and pgd_populate? Sure, no problem. They get defined to pmd_set/pgd_set calls anyways. But wouldn't that miss pgd_clear() and pmd_clear()? Someone may find it worthwhile to, on a *_clear(), to see if a set bit can now be clear because all the neighboring entries are empty as well. That might have been the reason I put it there, but I may be giving myself too much credit :-) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Tue, 2005-02-22 at 20:31 -0800, David S. Miller wrote: > On Wed, 23 Feb 2005 02:06:28 + (GMT) > Hugh Dickins <[EMAIL PROTECTED]> wrote: > > > I've not seen Dave's bitmap walking functions (for clearing?), > > would they fit in better with my way? > Hugh: I'll have more of a look through your patch when I get some time... to be honest I'm not too worried either way, so long as one or the other gets in. Very trivial point, but I'm not sure that I like the name p?d_limit... maybe p?d_span or _span_end... hmm, they're not really pleasing either. You _are_ repeating a bit of mindless loop accounting in every page table walk, and it isn't completely clear to me that it is giving you much more flexibility (than for_each_*). But my loops _are_ a bit contorted. > This is what Nick is referring to: > [snip] > It's easy to toy with the sparc64 optimization on other platforms, > just add the necessary hacks to pmd_set and pgd_set, allocation > of pmd and pgd tables David: just an implementation detail that I had meant to bring up earlier - would it feel like less of a hack to put these in pmd_populate and pgd_populate? Nick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Wed, 23 Feb 2005 02:06:28 + (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: > I've not seen Dave's bitmap walking functions (for clearing?), > would they fit in better with my way? This is what Nick is referring to: I hacked up something slightly different today. I only have it being used by clear_page_range() but it is extremely effective. Things like fork+exit latencies on my 750Mhz sparc64 box went from ~490 microseconds to ~367 microseconds. fork+execve latency went down from ~1595 microseconds to ~1351 microseconds. Two issues: 1) I'm not terribly satisfied with the interface. I think with some improvements it can be applies to the two other routines this thing really makes sense for, namely copy_page_range and unmap_page_range 2) I don't think it will collapse well for 2-level page tables, someone take a look? It's easy to toy with the sparc64 optimization on other platforms, just add the necessary hacks to pmd_set and pgd_set, allocation of pmd and pgd tables, use "PAGE_SHIFT - 5" instead of "PAGE_SHIFT - 6" on 32-bit platforms, and then copy the asm-sparc64/pgwalk.h bits over into your platforms asm-${ARCH}/pgwalk.h I just got also reminded that we walk these damn pagetables completely twice every exit, once to unmap the VMAs pte mappings, once again to zap the page tables. It might be fruitful to explore combining those two steps, perhaps not. Anyways, comments and improvment suggestions welcome. Particularly interesting would be if this thing helps a lot on other platforms too, such as x86_64, ia64, alpha and ppc64. # This is a BitKeeper generated diff -Nru style patch. # # ChangeSet # 2004/08/10 23:44:24-07:00 [EMAIL PROTECTED] # [MM]: Add arch-overridable page table walking machinery. # # Currently very rudimentary but is used fully for # clear_page_range(). An optimized implementation # is there for sparc64 and it is extremely effective # particularly for 64-bit processes. # # For things like lat_fork and friends clear_page_tables() # use to be 2nd or 3rd in the kernel profile, now it has # dropped to the 20th or so entry. # # Signed-off-by: David S. Miller # # mm/memory.c # 2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +10 -26 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-sparc64/pgtable.h # 2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +28 -4 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-sparc64/pgalloc.h # 2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +10 -2 # [MM]: Add arch-overridable page table walking machinery. # # arch/sparc64/mm/init.c # 2004/08/10 23:42:42-07:00 [EMAIL PROTECTED] +2 -2 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-x86_64/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-v850/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-um/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-sparc64/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +114 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-sparc/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-sh64/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-sh/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-s390/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-ppc64/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-ppc/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-parisc/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-mips/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-m68knommu/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-m68k/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-ia64/pgwalk.h # 2004/08/10 23:42:14-07:00 [EMAIL PROTECTED] +6 -0 # [MM]: Add arch-overridable page table walking machinery. # # include/asm-i386/pgw
Re: [PATCH 2/2] page table iterators
Hugh Dickins wrote: On Sun, 20 Feb 2005, Nick Piggin wrote: Open coding is probably the smaller evil. And they're really not changed that often. My opinion FWIW: I'm all for regularizing the pagetable loops to work the same way, changing their variables to use the same names, improving their efficiency; but I do like to see what a loop is up to. list_for_each and friends are very widely used, they're fine, and I'm quite glad to have their prefetching hidden away from me; but usually I groan, grin and bear it, each time someone devises a clever new for_each macro concealing half the details of some specialist loop. In a minority? OK, I think Andrew is now sitting on the fence after seeing the code. So you (and Andi?) are the ones with remaining reservations about this. I don't disagree with your stance entirely, Hugh. I think these macros are close to being too complicated... But I don't think they is hiding too much detail: we all know that conceptually, walking a page table page is reasonably simple. There are just a few tricky bits like wrapping and termination that caused such a divergent range of implementations - I would argue that hiding these details is OK, because they are basically inconsequencial to the job at hand. I think that actually makes the high level intention of the code clearer, if anything. If you are reading just the patch, that doesn't quite do it justice IMO - in that case, have a look at the code after the patch is applied (I can send you one which applies to current kernels if you'd like). Also, the implementation of the macros is not insanely difficult to understand, so the details are still accessible. Lastly, they fold to 2 and 3 levels easily, which is something that couldn't sanely be done with the open-coded implementation. I think with an infinitely smart compiler, there shouldn't need to be any folding here. But in practice I see quite a large speedup, which is something we shouldn't ignore. I do think that they are probably not ideal candidates for a more general abstraction that would allow for example the transparent drop in of Dave's bitmap walking functions (it would be possible, but would not be pretty AFAIKS). I have some other ideas to work towards those goals, but before that I think these macros do help with the deficiencies of the current situation. Nick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
Nick Piggin wrote: Haven't yet pulled out a pre-4-level kernel to see how 3-level compares I guess I'll do that now. Close. Before 4level: 119.5us, after folded walkers: 132.8us I think most of this is now coming from clear_page_range, rather than the actual traversing of the page tables (because they should be completely folded by now): before: 4089 total 0.0017 753 kmap_atomic4.7358 682 do_wp_page 0.6713 680 do_page_fault 0.4561 261 zap_pte_range 0.3625 176 copy_page_range0.2133 159 pte_alloc_one 1.5743 145 clear_page_tables 0.4866 after: 4307 total 0.0018 676 kmap_atomic4.2516 665 do_page_fault 0.4472 615 do_wp_page 0.6225 550 clear_page_range 0.9982 262 zap_pte_range 0.4870 I think the additional work done by clear_page_range (versus clear_page_tables) justifies the extra cost, even for 3-level architectures. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
Benjamin Herrenschmidt wrote: All of them are slightly differently implemented, some check overflow, some don't, some have redudant checking, some aren't even consistent between all 3/4 loops of a given walk routine set, and we have seen the tendency to introduce subtle bugs in one of them when they all have to be changed for some reason. I'm all for turning them into something more consistent, and I like the for_each_* idea... It also allows to completely remove the code of the unused levels on 2 and 3 level page tables easily, regaining some of the perfs lost by the move to 4 levels. It appears to do even better on 2-levels (i386, !PAE) than the old 3-level code, not surprisingly. lmbench fork+exit overhead is under 100us on a 3.4GHz xeon now, which is the lowest I've seen. Haven't yet pulled out a pre-4-level kernel to see how 3-level compares I guess I'll do that now. Now, we also need, in the long run, to improve perfs of walking the page tables, especially PTEs, for things like tearing down processes or fork, for example via a bitmap of used PGD entries etc... With proper iterators, such a thing could be implemented just by modifying the iterator, and all loops would benefit from it. After looking at David's bitmap walking code, I'm starting to think that my current macros only _just_ scrape by because of the uniform nature of the walkers, and their relative simplicity. Anything much more complex will start to get ugly. I'd like to look at a slightly more involved reworking in order to nicely support optimisations like bitmap walking, without blowing out the complexity of the macros and without hiding too much of the workings. However, my main aim for these macros was mainly to fix the performance regressions on 2 and 3 level architectures. Ben's complaints about these loops just served to hurry it along. I think that these reasons (performance, code consistency) make it a good idea. Nick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Sun, 2005-02-20 at 22:40 -0800, Andrew Morton wrote: > Hugh Dickins <[EMAIL PROTECTED]> wrote: > > > > My opinion FWIW: I'm all for regularizing the pagetable loops to > > work the same way, changing their variables to use the same names, > > improving their efficiency; but I do like to see what a loop is up to. > > > > list_for_each and friends are very widely used, they're fine, and I'm > > quite glad to have their prefetching hidden away from me; but usually > > I groan, grin and bear it, each time someone devises a clever new > > for_each macro concealing half the details of some specialist loop. > > > > In a minority? > > Of two. Well, we basically have that bunch of loops that all do the same thing to iterate the page tables. Only the inner part is different (that is what is done on each PTE). All of them are slightly differently implemented, some check overflow, some don't, some have redudant checking, some aren't even consistent between all 3/4 loops of a given walk routine set, and we have seen the tendency to introduce subtle bugs in one of them when they all have to be changed for some reason. I'm all for turning them into something more consistent, and I like the for_each_* idea... It also allows to completely remove the code of the unused levels on 2 and 3 level page tables easily, regaining some of the perfs lost by the move to 4 levels. Now, we also need, in the long run, to improve perfs of walking the page tables, especially PTEs, for things like tearing down processes or fork, for example via a bitmap of used PGD entries etc... With proper iterators, such a thing could be implemented just by modifying the iterator, and all loops would benefit from it. I think that is enough to justify the move. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
Hugh Dickins <[EMAIL PROTECTED]> wrote: > > My opinion FWIW: I'm all for regularizing the pagetable loops to > work the same way, changing their variables to use the same names, > improving their efficiency; but I do like to see what a loop is up to. > > list_for_each and friends are very widely used, they're fine, and I'm > quite glad to have their prefetching hidden away from me; but usually > I groan, grin and bear it, each time someone devises a clever new > for_each macro concealing half the details of some specialist loop. > > In a minority? Of two. Let's see what they look like. They'd need to be very good, IMO. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Sun, 20 Feb 2005, Nick Piggin wrote: > Andi Kleen wrote: > > > > The problem is just that these walker macros when they > > do all the lazy walking stuff will be quite complicated. > > And I don't really want another uaccess.h-like macro mess. > > > > Yes currently they look simple, but that will change. > > But even in that case, it will still be better to have the > extra complexity once in the macro rather than throughout mm/ > > > Open coding is probably the smaller evil. > > And they're really not changed that often. My opinion FWIW: I'm all for regularizing the pagetable loops to work the same way, changing their variables to use the same names, improving their efficiency; but I do like to see what a loop is up to. list_for_each and friends are very widely used, they're fine, and I'm quite glad to have their prefetching hidden away from me; but usually I groan, grin and bear it, each time someone devises a clever new for_each macro concealing half the details of some specialist loop. In a minority? Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
Andi Kleen wrote: On Thu, Feb 17, 2005 at 03:30:31PM -0800, David S. Miller wrote: On Fri, 18 Feb 2005 00:03:42 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: And to be honest we only have about 6 or 7 of these walkers in the whole kernel. And 90% of them are in memory.c While doing 4level I think I changed all of them around several times and it wasn't that big an issue. So it's not that we have a big pressing problem here... It's super error prone. A regression added by your edit of these Actually it was in Nick's code (PUD layer ;-). But I won't argue that my code didn't have bugs too... I won't look back to see where the error came from :) But yeah it is equally (if not more) likely to have come from me. And it probably did happen because all the code is slightly different and hard to understand. walkers for the 4level changes was only discovered and fixed yesterday by the ppc folks. I absolutely support any change which consolidates these things. The problem is just that these walker macros when they do all the lazy walking stuff will be quite complicated. And I don't really want another uaccess.h-like macro mess. Yes currently they look simple, but that will change. But even in that case, it will still be better to have the extra complexity once in the macro rather than throughout mm/ Open coding is probably the smaller evil. And they're really not changed that often. It is not so much a matter of changing, so much as having 10 slightly different implementations. I think it should be easier to go from the iterators patch to perhaps more complex iterators, or some open coding, etc etc. rather than try to put a big complex pt walker on top of these 10 different open coded implementations. But perhaps I'm missing something you're not - I'd need to see the lazy walking code I guess. Nick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Thu, Feb 17, 2005 at 03:30:31PM -0800, David S. Miller wrote: > On Fri, 18 Feb 2005 00:03:42 +0100 > Andi Kleen <[EMAIL PROTECTED]> wrote: > > > And to be honest we only have about 6 or 7 of these walkers > > in the whole kernel. And 90% of them are in memory.c > > While doing 4level I think I changed all of them around several > > times and it wasn't that big an issue. So it's not that we > > have a big pressing problem here... > > It's super error prone. A regression added by your edit of these Actually it was in Nick's code (PUD layer ;-). But I won't argue that my code didn't have bugs too... > walkers for the 4level changes was only discovered and fixed > yesterday by the ppc folks. > > I absolutely support any change which consolidates these things. The problem is just that these walker macros when they do all the lazy walking stuff will be quite complicated. And I don't really want another uaccess.h-like macro mess. Yes currently they look simple, but that will change. Open coding is probably the smaller evil. And they're really not changed that often. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Fri, Feb 18, 2005 at 10:21:03AM +1100, Benjamin Herrenschmidt wrote: > On Fri, 2005-02-18 at 00:03 +0100, Andi Kleen wrote: > > > And to be honest we only have about 6 or 7 of these walkers > > in the whole kernel. And 90% of them are in memory.c > > While doing 4level I think I changed all of them around several > > times and it wasn't that big an issue. So it's not that we > > have a big pressing problem here... > > We have about 50% of them in memory.c :) But my main problem is more > that every single of them is implemented slightly differently. No much more. But I only count real walkers, not stuff like vmalloc. The ioremap duplication over architectures is a bit annoying, but the fix for that would be to factor the code out completely, not only improve walking. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Fri, 18 Feb 2005 00:03:42 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: > And to be honest we only have about 6 or 7 of these walkers > in the whole kernel. And 90% of them are in memory.c > While doing 4level I think I changed all of them around several > times and it wasn't that big an issue. So it's not that we > have a big pressing problem here... It's super error prone. A regression added by your edit of these walkers for the 4level changes was only discovered and fixed yesterday by the ppc folks. I absolutely support any change which consolidates these things. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Fri, 2005-02-18 at 00:03 +0100, Andi Kleen wrote: > And to be honest we only have about 6 or 7 of these walkers > in the whole kernel. And 90% of them are in memory.c > While doing 4level I think I changed all of them around several > times and it wasn't that big an issue. So it's not that we > have a big pressing problem here... We have about 50% of them in memory.c :) But my main problem is more that every single of them is implemented slightly differently. Going Nick's way is a good start. If they are all consolidated to use the same macro, they will be easier for you to change later on anyway. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
> I though about both ways yesterday, and in the end, I prefer Nick stuff, > at least for now. It gives us also more flexibility to change gory > implementation details in the future. I still have to run it through a > bit of torture testing though. They're really solving different problems. My code is just aimed at getting x86-64 fork/exec/etc. as fast as before 4level again (currently they are significantly slower because they have to walk a lot more page tables) The problem is that the index based approach (I think you have to use indexes for this, pointers get very messy) probably does not fit very well into Nick's complex macros. Nick's macros are essentially just code transformations with some micro optimizations. That's not bad, but it won't give you the big speedups the lazy walking approach will give. And to be honest we only have about 6 or 7 of these walkers in the whole kernel. And 90% of them are in memory.c While doing 4level I think I changed all of them around several times and it wasn't that big an issue. So it's not that we have a big pressing problem here... -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Thu, 2005-02-17 at 20:43 +0100, Andi Kleen wrote: > On Fri, Feb 18, 2005 at 01:03:35AM +1100, Nick Piggin wrote: > > I am pretty surprised myself that I was able to consolidate > > all "page table range" functions into a single type of iterator > > (well, there are a couple of variations, but it's not too bad). > > I started a similar project - but it uses the existing loops, > just using {pte,pmd,pud,pgd}_next. The idea is to optimize > page table walking by keeping some state in the struct page > of the page table page that says whether an entry is set > or not. To make this work I switched everything to indexes > instead of pointers. > > Main problem are some nasty include loops. I though about both ways yesterday, and in the end, I prefer Nick stuff, at least for now. It gives us also more flexibility to change gory implementation details in the future. I still have to run it through a bit of torture testing though. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Fri, Feb 18, 2005 at 01:03:35AM +1100, Nick Piggin wrote: > I am pretty surprised myself that I was able to consolidate > all "page table range" functions into a single type of iterator > (well, there are a couple of variations, but it's not too bad). I started a similar project - but it uses the existing loops, just using {pte,pmd,pud,pgd}_next. The idea is to optimize page table walking by keeping some state in the struct page of the page table page that says whether an entry is set or not. To make this work I switched everything to indexes instead of pointers. Main problem are some nasty include loops. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
Linus Torvalds wrote: On Fri, 18 Feb 2005, Nick Piggin wrote: I am pretty surprised myself that I was able to consolidate all "page table range" functions into a single type of iterator (well, there are a couple of variations, but it's not too bad). Ok, this is post-2.6.11 material, so please remind me. Sure... it will probably be best to go through -mm, but either way I'll package the patches up nicely and rediff them against 2.6.11 when it comes out. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] page table iterators
On Fri, 18 Feb 2005, Nick Piggin wrote: > > I am pretty surprised myself that I was able to consolidate > all "page table range" functions into a single type of iterator > (well, there are a couple of variations, but it's not too bad). Ok, this is post-2.6.11 material, so please remind me. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/