Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
Matt Mackall wrote: > I was counting from the top down. > Bottom-up is better; that way the levels don't change for 2,3,4 level pagetables. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
On Thu, Apr 19, 2007 at 02:37:53PM -0700, Jeremy Fitzhardinge wrote: > Matt Mackall wrote: > > Haven't thought a huge amount about that. Perhaps it's best done with > > the level 3 callback? > > > > Level 2, I think, assuming you count the pte pages as level 1. I think > it can be dealt with, so long as it correctly skips level 1 callbacks > for superpages, and does the test after the level 2 callback has returned. I was counting from the top down. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
Matt Mackall wrote: > Haven't thought a huge amount about that. Perhaps it's best done with > the level 3 callback? > Level 2, I think, assuming you count the pte pages as level 1. I think it can be dealt with, so long as it correctly skips level 1 callbacks for superpages, and does the test after the level 2 callback has returned. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
On Thu, Apr 19, 2007 at 12:44:57PM -0700, Jeremy Fitzhardinge wrote: > Matt Mackall wrote: > > I think adding a flags field and an allocate flag to my callback > > struct would be sufficient here. > > > > Yes, probably. > > What about something that wants to shatter superpages? Haven't thought a huge amount about that. Perhaps it's best done with the level 3 callback? > Don't know that that's a huge concern. The typedef namespace (*_t) is > pretty empty, and its not like we're talking about something that's > visible outside the kernel. And the syntax is *really* ugly. Don't really care about this much. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
Matt Mackall wrote: > I think adding a flags field and an allocate flag to my callback > struct would be sufficient here. > Yes, probably. What about something that wants to shatter superpages? > The syntax is horrible, but I don't think we end up using the > resultant type enough to justify the namespace pollution. > Don't know that that's a huge concern. The typedef namespace (*_t) is pretty empty, and its not like we're talking about something that's visible outside the kernel. And the syntax is *really* ugly. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
Matt Mackall wrote: I think adding a flags field and an allocate flag to my callback struct would be sufficient here. Yes, probably. What about something that wants to shatter superpages? The syntax is horrible, but I don't think we end up using the resultant type enough to justify the namespace pollution. Don't know that that's a huge concern. The typedef namespace (*_t) is pretty empty, and its not like we're talking about something that's visible outside the kernel. And the syntax is *really* ugly. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
On Thu, Apr 19, 2007 at 12:44:57PM -0700, Jeremy Fitzhardinge wrote: Matt Mackall wrote: I think adding a flags field and an allocate flag to my callback struct would be sufficient here. Yes, probably. What about something that wants to shatter superpages? Haven't thought a huge amount about that. Perhaps it's best done with the level 3 callback? Don't know that that's a huge concern. The typedef namespace (*_t) is pretty empty, and its not like we're talking about something that's visible outside the kernel. And the syntax is *really* ugly. Don't really care about this much. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
Matt Mackall wrote: Haven't thought a huge amount about that. Perhaps it's best done with the level 3 callback? Level 2, I think, assuming you count the pte pages as level 1. I think it can be dealt with, so long as it correctly skips level 1 callbacks for superpages, and does the test after the level 2 callback has returned. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
On Thu, Apr 19, 2007 at 02:37:53PM -0700, Jeremy Fitzhardinge wrote: Matt Mackall wrote: Haven't thought a huge amount about that. Perhaps it's best done with the level 3 callback? Level 2, I think, assuming you count the pte pages as level 1. I think it can be dealt with, so long as it correctly skips level 1 callbacks for superpages, and does the test after the level 2 callback has returned. I was counting from the top down. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
Matt Mackall wrote: I was counting from the top down. Bottom-up is better; that way the levels don't change for 2,3,4 level pagetables. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
On Wed, Apr 04, 2007 at 11:52:57PM -0700, Jeremy Fitzhardinge wrote: > Matt Mackall wrote: > >> +/* > >> + * Scan a region of virtual memory, filling in page tables as necessary > >> + * and calling a provided function on each leaf page table. > >> + */ > >> > > > > But I'm not sure what the use case is that wants filling in the page > > table..? If both modes really make sense, perhaps a flag could unify > > these differences. > > > > Well, two reasons: > > One is the general one that if you're traversing ptes then they need to > exist to traverse them (for example, if you're creating new mappings). > Obviously if you want to just visit existing mappings, then > instantiating new pagetable is not the right thing to do (and I could > make use of this too). > > The other is that there are various places in the Xen hypervisor API > where you pass in a reference to pte entry for the hypervisor to put > mappings into, and the rest of the pagetable needs to exist. The Xen > code uses the side-effect of apply_to_page_range() to create pagetable > for these calls. I think adding a flags field and an allocate flag to my callback struct would be sufficient here. > > I'd gotten the impression that these sorts of typedefs were out of > > fashion. > > > > In general yes, but for function pointers the syntax is so clumsy that I > think typedefs are OK. The syntax is horrible, but I don't think we end up using the resultant type enough to justify the namespace pollution. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
On Wed, Apr 04, 2007 at 11:52:57PM -0700, Jeremy Fitzhardinge wrote: Matt Mackall wrote: +/* + * Scan a region of virtual memory, filling in page tables as necessary + * and calling a provided function on each leaf page table. + */ But I'm not sure what the use case is that wants filling in the page table..? If both modes really make sense, perhaps a flag could unify these differences. Well, two reasons: One is the general one that if you're traversing ptes then they need to exist to traverse them (for example, if you're creating new mappings). Obviously if you want to just visit existing mappings, then instantiating new pagetable is not the right thing to do (and I could make use of this too). The other is that there are various places in the Xen hypervisor API where you pass in a reference to pte entry for the hypervisor to put mappings into, and the rest of the pagetable needs to exist. The Xen code uses the side-effect of apply_to_page_range() to create pagetable for these calls. I think adding a flags field and an allocate flag to my callback struct would be sufficient here. I'd gotten the impression that these sorts of typedefs were out of fashion. In general yes, but for function pointers the syntax is so clumsy that I think typedefs are OK. The syntax is horrible, but I don't think we end up using the resultant type enough to justify the namespace pollution. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
Matt Mackall wrote: >> +/* >> + * Scan a region of virtual memory, filling in page tables as necessary >> + * and calling a provided function on each leaf page table. >> + */ >> > > But I'm not sure what the use case is that wants filling in the page > table..? If both modes really make sense, perhaps a flag could unify > these differences. > Well, two reasons: One is the general one that if you're traversing ptes then they need to exist to traverse them (for example, if you're creating new mappings). Obviously if you want to just visit existing mappings, then instantiating new pagetable is not the right thing to do (and I could make use of this too). The other is that there are various places in the Xen hypervisor API where you pass in a reference to pte entry for the hypervisor to put mappings into, and the rest of the pagetable needs to exist. The Xen code uses the side-effect of apply_to_page_range() to create pagetable for these calls. >> +typedef int (*pte_fn_t)(pte_t *pte, struct page *pmd_page, unsigned long >> addr, >> +void *data); >> > > I'd gotten the impression that these sorts of typedefs were out of > fashion. > In general yes, but for function pointers the syntax is so clumsy that I think typedefs are OK. >> +static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd, >> + unsigned long addr, unsigned long end, >> + pte_fn_t fn, void *data) >> +{ >> +pte_t *pte; >> +int err; >> +struct page *pmd_page; >> +spinlock_t *ptl; >> + >> +pte = (mm == _mm) ? >> +pte_alloc_kernel(pmd, addr) : >> +pte_alloc_map_lock(mm, pmd, addr, ); >> +if (!pte) >> +return -ENOMEM; >> > > Seems a bit awkward to pass mm all the way down the tree just for this > quirk. Which is a bit awkward as it means that whether or not a lock > is held in the callback is context dependent. > Well, it would need mm for just pte_alloc_map_lock() anyway. > smaps, clear_ref, and my pagemap code all use the callback at the > pmd_range level, which a) localizes the pte-level locking concerns > with the user b) amortizes the indirection overhead and c) > (unfortunately) makes the user a bit more complex. > > We should try to measure whether (b) actually makes a difference. > I'll need to look closely at your code again. It would be nice to have one pagewalker. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
Matt Mackall wrote: +/* + * Scan a region of virtual memory, filling in page tables as necessary + * and calling a provided function on each leaf page table. + */ But I'm not sure what the use case is that wants filling in the page table..? If both modes really make sense, perhaps a flag could unify these differences. Well, two reasons: One is the general one that if you're traversing ptes then they need to exist to traverse them (for example, if you're creating new mappings). Obviously if you want to just visit existing mappings, then instantiating new pagetable is not the right thing to do (and I could make use of this too). The other is that there are various places in the Xen hypervisor API where you pass in a reference to pte entry for the hypervisor to put mappings into, and the rest of the pagetable needs to exist. The Xen code uses the side-effect of apply_to_page_range() to create pagetable for these calls. +typedef int (*pte_fn_t)(pte_t *pte, struct page *pmd_page, unsigned long addr, +void *data); I'd gotten the impression that these sorts of typedefs were out of fashion. In general yes, but for function pointers the syntax is so clumsy that I think typedefs are OK. +static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd, + unsigned long addr, unsigned long end, + pte_fn_t fn, void *data) +{ +pte_t *pte; +int err; +struct page *pmd_page; +spinlock_t *ptl; + +pte = (mm == init_mm) ? +pte_alloc_kernel(pmd, addr) : +pte_alloc_map_lock(mm, pmd, addr, ptl); +if (!pte) +return -ENOMEM; Seems a bit awkward to pass mm all the way down the tree just for this quirk. Which is a bit awkward as it means that whether or not a lock is held in the callback is context dependent. Well, it would need mm for just pte_alloc_map_lock() anyway. smaps, clear_ref, and my pagemap code all use the callback at the pmd_range level, which a) localizes the pte-level locking concerns with the user b) amortizes the indirection overhead and c) (unfortunately) makes the user a bit more complex. We should try to measure whether (b) actually makes a difference. I'll need to look closely at your code again. It would be nice to have one pagewalker. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
On Wed, Apr 04, 2007 at 12:12:11PM -0700, Jeremy Fitzhardinge wrote: > Add a new mm function apply_to_page_range() which applies a given > function to every pte in a given virtual address range in a given mm > structure. This is a generic alternative to cut-and-pasting the Linux > idiomatic pagetable walking code in every place that a sequence of > PTEs must be accessed. As we discussed before, this obviously has a lot in common with my walk_page_range code. The major difference and one your above description seems to be missing the important detail of why it's doing this: > + pte_alloc_kernel(pmd, addr) : > + pmd = pmd_alloc(mm, pud, addr); > + pud = pud_alloc(mm, pgd, addr); ..which is mentioned here: > +/* > + * Scan a region of virtual memory, filling in page tables as necessary > + * and calling a provided function on each leaf page table. > + */ But I'm not sure what the use case is that wants filling in the page table..? If both modes really make sense, perhaps a flag could unify these differences. > +typedef int (*pte_fn_t)(pte_t *pte, struct page *pmd_page, unsigned long > addr, > + void *data); I'd gotten the impression that these sorts of typedefs were out of fashion. > +static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd, > + unsigned long addr, unsigned long end, > + pte_fn_t fn, void *data) > +{ > + pte_t *pte; > + int err; > + struct page *pmd_page; > + spinlock_t *ptl; > + > + pte = (mm == _mm) ? > + pte_alloc_kernel(pmd, addr) : > + pte_alloc_map_lock(mm, pmd, addr, ); > + if (!pte) > + return -ENOMEM; Seems a bit awkward to pass mm all the way down the tree just for this quirk. Which is a bit awkward as it means that whether or not a lock is held in the callback is context dependent. smaps, clear_ref, and my pagemap code all use the callback at the pmd_range level, which a) localizes the pte-level locking concerns with the user b) amortizes the indirection overhead and c) (unfortunately) makes the user a bit more complex. We should try to measure whether (b) actually makes a difference. > + do { > + err = fn(pte, pmd_page, addr, data); > + if (err) > + break; > + } while (pte++, addr += PAGE_SIZE, addr != end); I was about to say this do/while format seems a bit non-idiomatic for page table walkers, but then I looked at the code in mm/memory.c and realized the stuff I've been hacking on is the odd one out. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.
On Wed, Apr 04, 2007 at 12:12:11PM -0700, Jeremy Fitzhardinge wrote: Add a new mm function apply_to_page_range() which applies a given function to every pte in a given virtual address range in a given mm structure. This is a generic alternative to cut-and-pasting the Linux idiomatic pagetable walking code in every place that a sequence of PTEs must be accessed. As we discussed before, this obviously has a lot in common with my walk_page_range code. The major difference and one your above description seems to be missing the important detail of why it's doing this: + pte_alloc_kernel(pmd, addr) : + pmd = pmd_alloc(mm, pud, addr); + pud = pud_alloc(mm, pgd, addr); ..which is mentioned here: +/* + * Scan a region of virtual memory, filling in page tables as necessary + * and calling a provided function on each leaf page table. + */ But I'm not sure what the use case is that wants filling in the page table..? If both modes really make sense, perhaps a flag could unify these differences. +typedef int (*pte_fn_t)(pte_t *pte, struct page *pmd_page, unsigned long addr, + void *data); I'd gotten the impression that these sorts of typedefs were out of fashion. +static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd, + unsigned long addr, unsigned long end, + pte_fn_t fn, void *data) +{ + pte_t *pte; + int err; + struct page *pmd_page; + spinlock_t *ptl; + + pte = (mm == init_mm) ? + pte_alloc_kernel(pmd, addr) : + pte_alloc_map_lock(mm, pmd, addr, ptl); + if (!pte) + return -ENOMEM; Seems a bit awkward to pass mm all the way down the tree just for this quirk. Which is a bit awkward as it means that whether or not a lock is held in the callback is context dependent. smaps, clear_ref, and my pagemap code all use the callback at the pmd_range level, which a) localizes the pte-level locking concerns with the user b) amortizes the indirection overhead and c) (unfortunately) makes the user a bit more complex. We should try to measure whether (b) actually makes a difference. + do { + err = fn(pte, pmd_page, addr, data); + if (err) + break; + } while (pte++, addr += PAGE_SIZE, addr != end); I was about to say this do/while format seems a bit non-idiomatic for page table walkers, but then I looked at the code in mm/memory.c and realized the stuff I've been hacking on is the odd one out. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/