Re: [patch 1/3 take2] smaps: extract pte walker from smaps code
On Wed, Feb 07, 2007 at 12:30:38AM -0800, Christoph Lameter wrote:
> On Tue, 6 Feb 2007, David Rientjes wrote:
>
> > Extracts the page table entry walker from the smaps-specific code in
> > fs/proc/task_mmu.c.  This will be used later for clearing the reference
> > bits on pages to measure the number of pages accessed over a time period
> > through /proc/pid/smaps.
>
> Clearing reference bits? Ummm... That is a pretty inaccurate measure since
> reclaim can remove pages and revert the reference bits. It can never work
> reliably.
>
> > The new struct pte_walker includes the struct vm_area_struct of the memory
> > to walk over.  Iteration begins at the start address and completes at the
> > end address.  A pointer to another data structure may be stored in the
> > private field such as the struct mem_size_stats, which acts as the smaps
> > accumulator.  For each page table entry in the VMA, the func function is
> > called with the corresponding struct pte_walker, the pte_t, and its
> > address.
>
> Would it be possible to sync up with the people doing the page table
> interface?

Have a pointer to this?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 1/3 take2] smaps: extract pte walker from smaps code
On Thu, 8 Feb 2007, Matt Mackall wrote:

> I've been looking at a similar refactoring of other code and I think
> the way to go is a callback per block-of-PTEs with start and end
> pointers. That gets rid of most of the call indirection overhead.
>
Yes, but only in a limited number of cases (including smaps).  It is not
always sufficient to do it on the block-of-ptes level.  Many of the pte
iterators in the code right now are slightly different on each level: the
ioremap code, for example, allocates the various directories as it
progresses in depth.

To have one library pte iterator would only be possible with callback
functions on each pgd, pud, and pmd level that (at least) ioremap would
require to be implemented on top of.  That isn't very appealing.  The PTI
patchset from July moved these iterators into .h files but did not
implement a single pte iterator to build on.

It just turned out nicely in the smaps case that there was no difference
on the pgd, pud, or pmd level between the two required iterators.  I added
a void *private to carry the struct for statistics accumulating in the
smaps case, but this is worthless in the clear_refs case.

		David
Re: [patch 1/3 take2] smaps: extract pte walker from smaps code
On Wed, Feb 07, 2007 at 03:29:34PM +0900, Paul Mundt wrote:
> On Tue, Feb 06, 2007 at 10:15:47PM -0800, David Rientjes wrote:
> > Extracts the page table entry walker from the smaps-specific code in
> > fs/proc/task_mmu.c.  This will be used later for clearing the reference
> > bits on pages to measure the number of pages accessed over a time period
> > through /proc/pid/smaps.
>
> I like the general idea of this patch set, however..
>
> > Since the PTE walker is now extracted from the smaps code,
> > smaps_pte_func() is invoked for each PTE in the VMA.  Its behavior is
> > identical to the existing implementation, except it is slightly slower
> > because each PTE now invokes a function call.
>
> Perhaps this is something that needs to be looked at more closely and
> made more generic? There are many ranged page table walkers that aren't
> so performance critical that the function call cost would cause too much
> pain. ioremap_page_range() comes to mind, and there's bound to be others.
> This would also help people to get the pte map/unmap right, which seems
> to pop up from time to time as well..

I've been looking at a similar refactoring of other code and I think
the way to go is a callback per block-of-PTEs with start and end
pointers. That gets rid of most of the call indirection overhead.

-- 
Mathematics is the supreme nostalgia of our time.
Re: [patch 1/3 take2] smaps: extract pte walker from smaps code
On Wed, 7 Feb 2007, Christoph Lameter wrote:

> Clearing reference bits? Ummm... That is a pretty inaccurate measure since
> reclaim can remove pages and revert the reference bits. It can never work
> reliably.
>
It's not intended to work precisely; it's intended to give a good estimate
of task memory footprint.  There are several different scenarios that can
interfere with getting a precise measurement via this method; then again,
/proc/pid/smaps is so expensive already because of the iteration through
VMAs that it's not something you'd do regularly.  Any other suggestions on
how to obtain this data are certainly welcome.

> Would it be possible to sync up with the people doing the page table
> interface?
>
(+pauld)  Is there an update on Paul Davies' implementation that would
allow us to iterate through a set of pte's in a vm_area_struct range?

> Could we somehow consolidate smaps and numa_maps?
>
Sure, but that's beyond the scope of this patchset.  My intention in
extracting away a new pte_walker was to prevent having two identical
implementations, and it could easily be extracted even further to lib for
the ioremap case that Paul mentioned.

		David
Re: [patch 1/3 take2] smaps: extract pte walker from smaps code
On Tue, 6 Feb 2007, David Rientjes wrote:

> Extracts the page table entry walker from the smaps-specific code in
> fs/proc/task_mmu.c.  This will be used later for clearing the reference
> bits on pages to measure the number of pages accessed over a time period
> through /proc/pid/smaps.

Clearing reference bits? Ummm... That is a pretty inaccurate measure since
reclaim can remove pages and revert the reference bits. It can never work
reliably.

> The new struct pte_walker includes the struct vm_area_struct of the memory
> to walk over.  Iteration begins at the start address and completes at the
> end address.  A pointer to another data structure may be stored in the
> private field such as the struct mem_size_stats, which acts as the smaps
> accumulator.  For each page table entry in the VMA, the func function is
> called with the corresponding struct pte_walker, the pte_t, and its
> address.

Would it be possible to sync up with the people doing the page table
interface?

Could we somehow consolidate smaps and numa_maps?
Re: [patch 1/3 take2] smaps: extract pte walker from smaps code
On Tue, Feb 06, 2007 at 10:49:14PM -0800, Andrew Morton wrote:
> On Wed, 7 Feb 2007 15:29:34 +0900 Paul Mundt <[EMAIL PROTECTED]> wrote:
> > I like the general idea of this patch set, however..
>
> David didn't really spell out the rationale.  Userspace people ask "how
> much memory is my application using".  For system planning and such.  We
> don't know, really.  So the approach taken here is to nuke all the
> referenced bits and to then monitor them coming back over time.
>
> Obviously it doesn't work very well if the page scanner is doing things,
> but one hopes that when someone is instrumenting their application they'll
> ensure that sufficient memory is available to prevent this inaccuracy.
>
This was the problem we kept running into as well: people who wanted to
instrument their applications on the desktop, where all of the required
infrastructure already brought the system to its knees long before
instrumentation could begin, didn't really get the most consistent
metrics, especially when the application depended heavily on overcommit.
This resulted in a lot of extra hacks on top of the smaps code trying to
balance out the numbers, which just ended up being a massive headache.
smaps seems to be an appealing target for piling additional semantics on,
unfortunately.

> I don't really have a sense for how much stuff we want to put into the
> kernel to support this sort of thing.  The proposed patches are, I think,
> minimal.  Perhaps it needs more.  If so, opinions are solicited before we
> go and add (and hence be forced to maintain) this new interface.
>
This sort of minimalist interface is not necessarily a bad approach if it
helps in a way that works for the people that really want it.  The issue
with any of the smaps extensions is being able to provide people with a
"good enough" metric without falling into interface hell.  This really
depends on how one defines "good enough", though.

It would be nice to hear from application folks if this ends up being
useful for them or not.

> > Perhaps this is something that needs to be looked at more closely and
> > made more generic? There are many ranged page table walkers that aren't
> > so performance critical that the function call cost would cause too much
> > pain. ioremap_page_range() comes to mind, and there's bound to be others.
> > This would also help people to get the pte map/unmap right, which seems
> > to pop up from time to time as well..
>
> That's what Paul Davies' patches are aimed at:
>
> http://marc.theaimsgroup.com/?l=linux-mm&m=115276500100695&w=2
>
> I promised Paul that I'd take a serious look at those patches next time
> they pop up.  It would be good if others could help out in that.

It would be nice to see this smaps patchset rebased on something like that
rather than introducing another walker abstraction simply to have it
overhauled later.  I realize this isn't really David's problem, but we
shouldn't be trying to solve the same problem multiple times.
Re: [patch 1/3 take2] smaps: extract pte walker from smaps code
On Wed, 7 Feb 2007 15:29:34 +0900 Paul Mundt <[EMAIL PROTECTED]> wrote:

> On Tue, Feb 06, 2007 at 10:15:47PM -0800, David Rientjes wrote:
> > Extracts the page table entry walker from the smaps-specific code in
> > fs/proc/task_mmu.c.  This will be used later for clearing the reference
> > bits on pages to measure the number of pages accessed over a time period
> > through /proc/pid/smaps.
>
> I like the general idea of this patch set, however..

David didn't really spell out the rationale.  Userspace people ask "how
much memory is my application using".  For system planning and such.  We
don't know, really.  So the approach taken here is to nuke all the
referenced bits and to then monitor them coming back over time.

Obviously it doesn't work very well if the page scanner is doing things,
but one hopes that when someone is instrumenting their application they'll
ensure that sufficient memory is available to prevent this inaccuracy.

The other part of the question is of course pagecache.  Hopefully
/proc/pid/io will suffice for that, probably in combination with
/proc/sys/vm/drop_caches.

I don't really have a sense for how much stuff we want to put into the
kernel to support this sort of thing.  The proposed patches are, I think,
minimal.  Perhaps it needs more.  If so, opinions are solicited before we
go and add (and hence be forced to maintain) this new interface.

> > Since the PTE walker is now extracted from the smaps code,
> > smaps_pte_func() is invoked for each PTE in the VMA.  Its behavior is
> > identical to the existing implementation, except it is slightly slower
> > because each PTE now invokes a function call.
>
> Perhaps this is something that needs to be looked at more closely and
> made more generic? There are many ranged page table walkers that aren't
> so performance critical that the function call cost would cause too much
> pain. ioremap_page_range() comes to mind, and there's bound to be others.
> This would also help people to get the pte map/unmap right, which seems
> to pop up from time to time as well..

That's what Paul Davies' patches are aimed at:

http://marc.theaimsgroup.com/?l=linux-mm&m=115276500100695&w=2

I promised Paul that I'd take a serious look at those patches next time
they pop up.  It would be good if others could help out in that.
Re: [patch 1/3 take2] smaps: extract pte walker from smaps code
On Tue, Feb 06, 2007 at 10:15:47PM -0800, David Rientjes wrote:
> Extracts the page table entry walker from the smaps-specific code in
> fs/proc/task_mmu.c.  This will be used later for clearing the reference
> bits on pages to measure the number of pages accessed over a time period
> through /proc/pid/smaps.
>
I like the general idea of this patch set, however..

> Since the PTE walker is now extracted from the smaps code,
> smaps_pte_func() is invoked for each PTE in the VMA.  Its behavior is
> identical to the existing implementation, except it is slightly slower
> because each PTE now invokes a function call.
>
Perhaps this is something that needs to be looked at more closely and
made more generic?  There are many ranged page table walkers that aren't
so performance critical that the function call cost would cause too much
pain.  ioremap_page_range() comes to mind, and there's bound to be others.
This would also help people to get the pte map/unmap right, which seems
to pop up from time to time as well..
[patch 1/3 take2] smaps: extract pte walker from smaps code
Extracts the page table entry walker from the smaps-specific code in
fs/proc/task_mmu.c.  This will be used later for clearing the reference
bits on pages to measure the number of pages accessed over a time period
through /proc/pid/smaps.

The new struct pte_walker includes the struct vm_area_struct of the memory
to walk over.  Iteration begins at the start address and completes at the
end address.  A pointer to another data structure may be stored in the
private field such as the struct mem_size_stats, which acts as the smaps
accumulator.  For each page table entry in the VMA, the func function is
called with the corresponding struct pte_walker, the pte_t, and its
address.

Since the PTE walker is now extracted from the smaps code,
smaps_pte_func() is invoked for each PTE in the VMA.  Its behavior is
identical to the existing implementation, except it is slightly slower
because each PTE now invokes a function call.

Cc: Hugh Dickins <[EMAIL PROTECTED]>
Cc: Paul Mundt <[EMAIL PROTECTED]>
Cc: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 fs/proc/task_mmu.c | 126 ++--
 1 files changed, 82 insertions(+), 44 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 55ade0d..e87824b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -122,6 +122,15 @@ struct mem_size_stats
 	unsigned long private_dirty;
 };
 
+struct pte_walker {
+	struct vm_area_struct *vma;	/* VMA */
+	unsigned long start;		/* start address */
+	unsigned long end;		/* end address */
+	void *private;			/* private data */
+	/* function to invoke for each pte in the above range */
+	void (*func)(struct pte_walker *, pte_t *, unsigned long);
+};
+
 static int show_map_internal(struct seq_file *m, void *v, struct mem_size_stats *mss)
 {
 	struct proc_maps_private *priv = m->private;
@@ -204,98 +213,127 @@ static int show_map(struct seq_file *m, void *v)
 	return show_map_internal(m, v, NULL);
 }
 
-static void smaps_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
-			    unsigned long addr, unsigned long end,
-			    struct mem_size_stats *mss)
+/*
+ * Walks each PTE in the struct pte_walker address range and calls
+ * walker->func() for each entry.
+ */
+static void walk_ptes(struct pte_walker *walker, pmd_t *pmd)
 {
-	pte_t *pte, ptent;
+	struct vm_area_struct *vma = walker->vma;
+	unsigned long addr = walker->start;
 	spinlock_t *ptl;
-	struct page *page;
+	pte_t *pte;
 
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	do {
-		ptent = *pte;
-		if (!pte_present(ptent))
-			continue;
-
-		mss->resident += PAGE_SIZE;
-
-		page = vm_normal_page(vma, addr, ptent);
-		if (!page)
-			continue;
-
-		if (page_mapcount(page) >= 2) {
-			if (pte_dirty(ptent))
-				mss->shared_dirty += PAGE_SIZE;
-			else
-				mss->shared_clean += PAGE_SIZE;
-		} else {
-			if (pte_dirty(ptent))
-				mss->private_dirty += PAGE_SIZE;
-			else
-				mss->private_clean += PAGE_SIZE;
-		}
-	} while (pte++, addr += PAGE_SIZE, addr != end);
+		walker->func(walker, pte, addr);
+	} while (pte++, addr += PAGE_SIZE, addr != walker->end);
 	pte_unmap_unlock(pte - 1, ptl);
 	cond_resched();
 }
 
-static inline void smaps_pmd_range(struct vm_area_struct *vma, pud_t *pud,
-				   unsigned long addr, unsigned long end,
-				   struct mem_size_stats *mss)
+static inline void walk_pmds(struct pte_walker *walker, pud_t *pud)
 {
-	pmd_t *pmd;
+	unsigned long addr = walker->start;
+	unsigned long end = walker->end;
 	unsigned long next;
+	pmd_t *pmd;
 
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
-		smaps_pte_range(vma, pmd, addr, next, mss);
+		walk_ptes(walker, pmd);
	} while (pmd++, addr = next, addr != end);
 }
 
-static inline void smaps_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
-				   unsigned long addr, unsigned long end,
-				   struct mem_size_stats *mss)
+static inline void walk_puds(struct pte_walker *walker, pgd_t *pgd)
 {
-	pud_t *pud;
+	unsigned long addr = walker->start;
+	unsigned long end = walker->end;
 	unsigned long next;
+	pud_t *pud;
 
 	pud = pud_offset(pgd, addr);
 	do {
[patch 1/3 take2] smaps: extract pte walker from smaps code
Extracts the page table entry walker from the smaps-specific code in fs/proc/task_mmu.c. This will be used later for clearing the reference bits on pages to measure the number of pages accessed over a time period through /proc/pid/smaps. The new struct pte_walker includes the struct vm_area_struct of the memory to walk over. Iteration begins at the start address and completes at the the end address. A pointer to another data structure may be stored in the private field such as the struct mem_size_stats, which acts as the smaps accumulator. For each page table entry in the VMA, the func function is called with the corresponding struct pte_walker, the pte_t, and its address. Since the PTE walker is now extracted from the smaps code, smaps_pte_func() is invoked for each PTE in the VMA. Its behavior is identical to the existing implementation, except it is slightly slower because each PTE now invokes a function call. Cc: Hugh Dickins [EMAIL PROTECTED] Cc: Paul Mundt [EMAIL PROTECTED] Cc: Christoph Lameter [EMAIL PROTECTED] Signed-off-by: David Rientjes [EMAIL PROTECTED] --- fs/proc/task_mmu.c | 126 ++-- 1 files changed, 82 insertions(+), 44 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 55ade0d..e87824b 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -122,6 +122,15 @@ struct mem_size_stats unsigned long private_dirty; }; +struct pte_walker { + struct vm_area_struct *vma; /* VMA */ + unsigned long start;/* start address */ + unsigned long end; /* end address */ + void *private; /* private data */ + /* function to invoke for each pte in the above range */ + void (*func)(struct pte_walker*, pte_t*, unsigned long); +}; + static int show_map_internal(struct seq_file *m, void *v, struct mem_size_stats *mss) { struct proc_maps_private *priv = m-private; @@ -204,98 +213,127 @@ static int show_map(struct seq_file *m, void *v) return show_map_internal(m, v, NULL); } -static void smaps_pte_range(struct vm_area_struct *vma, pmd_t *pmd, - 
unsigned long addr, unsigned long end, - struct mem_size_stats *mss) +/* + * Walks each PTE in the struct pte_walker address range and calls + * walker-func() for each entry. + */ +static void walk_ptes(struct pte_walker *walker, pmd_t *pmd) { - pte_t *pte, ptent; + struct vm_area_struct *vma = walker-vma; + unsigned long addr = walker-start; spinlock_t *ptl; - struct page *page; + pte_t *pte; pte = pte_offset_map_lock(vma-vm_mm, pmd, addr, ptl); do { - ptent = *pte; - if (!pte_present(ptent)) - continue; - - mss-resident += PAGE_SIZE; - - page = vm_normal_page(vma, addr, ptent); - if (!page) - continue; - - if (page_mapcount(page) = 2) { - if (pte_dirty(ptent)) - mss-shared_dirty += PAGE_SIZE; - else - mss-shared_clean += PAGE_SIZE; - } else { - if (pte_dirty(ptent)) - mss-private_dirty += PAGE_SIZE; - else - mss-private_clean += PAGE_SIZE; - } - } while (pte++, addr += PAGE_SIZE, addr != end); + walker-func(walker, pte, addr); + } while (pte++, addr += PAGE_SIZE, addr != walker-end); pte_unmap_unlock(pte - 1, ptl); cond_resched(); } -static inline void smaps_pmd_range(struct vm_area_struct *vma, pud_t *pud, - unsigned long addr, unsigned long end, - struct mem_size_stats *mss) +static inline void walk_pmds(struct pte_walker *walker, pud_t *pud) { - pmd_t *pmd; + unsigned long addr = walker-start; + unsigned long end = walker-end; unsigned long next; + pmd_t *pmd; pmd = pmd_offset(pud, addr); do { next = pmd_addr_end(addr, end); if (pmd_none_or_clear_bad(pmd)) continue; - smaps_pte_range(vma, pmd, addr, next, mss); + walk_ptes(walker, pmd); } while (pmd++, addr = next, addr != end); } -static inline void smaps_pud_range(struct vm_area_struct *vma, pgd_t *pgd, - unsigned long addr, unsigned long end, - struct mem_size_stats *mss) +static inline void walk_puds(struct pte_walker *walker, pgd_t *pgd) { - pud_t *pud; + unsigned long addr = walker-start; + unsigned long end = walker-end; unsigned long next; + pud_t *pud; pud = pud_offset(pgd, addr); do {
Re: [patch 1/3 take2] smaps: extract pte walker from smaps code
On Tue, Feb 06, 2007 at 10:15:47PM -0800, David Rientjes wrote: Extracts the page table entry walker from the smaps-specific code in fs/proc/task_mmu.c. This will be used later for clearing the reference bits on pages to measure the number of pages accessed over a time period through /proc/pid/smaps. I like the general idea of this patch set, however.. Since the PTE walker is now extracted from the smaps code, smaps_pte_func() is invoked for each PTE in the VMA. Its behavior is identical to the existing implementation, except it is slightly slower because each PTE now invokes a function call. Perhaps this is something that needs to be looked at more closely and made more generic? There are many ranged page table walkers that aren't so performance critical that the function call cost would cause too much pain. ioremap_page_range() comes to mind, and there's bound to be others. This would also help people to get the pte map/unmap right, which seems to pop up from time to time as well.. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3 take2] smaps: extract pte walker from smaps code
On Wed, 7 Feb 2007 15:29:34 +0900 Paul Mundt [EMAIL PROTECTED] wrote: On Tue, Feb 06, 2007 at 10:15:47PM -0800, David Rientjes wrote: Extracts the page table entry walker from the smaps-specific code in fs/proc/task_mmu.c. This will be used later for clearing the reference bits on pages to measure the number of pages accessed over a time period through /proc/pid/smaps. I like the general idea of this patch set, however.. David didn't really spell out the rationale. Userspace people ask how much memory is my application using. For system planning and such. We don't know, really. So the approach taken here is to nuke all the referenced bits and to then monitor them coming back over time. Obviously it doesn't work very well if the page scanner is doing things, but one hopes that when someone is instrumenting their application they'll ensure that sufficient memory is available to prevent this inaccuracy. The other part of the question is of course pagecache. Hopefully /proc/pid/io will suffice for that, probably in combination with /proc/sys/vm/drop_caches. I don't really have a sense for how much stuff we want to put into the kernel to support this sort of thing. The proposed patches are, I think, minimal. Perhaps it needs more. If so, opinions are solicited before we go and add (and hence be forced to maintain) this new interface. Since the PTE walker is now extracted from the smaps code, smaps_pte_func() is invoked for each PTE in the VMA. Its behavior is identical to the existing implementation, except it is slightly slower because each PTE now invokes a function call. Perhaps this is something that needs to be looked at more closely and made more generic? There are many ranged page table walkers that aren't so performance critical that the function call cost would cause too much pain. ioremap_page_range() comes to mind, and there's bound to be others. This would also help people to get the pte map/unmap right, which seems to pop up from time to time as well.. 
That's what Paul Davies' patches are aimed at:

	http://marc.theaimsgroup.com/?l=linux-mm&m=115276500100695&w=2

I promised Paul that I'd take a serious look at those patches next time
they pop up.  It would be good if others could help out in that.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
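[Editor's note: the pte_walker interface under discussion — a struct
carrying the VMA, a start/end range, a private accumulator pointer, and a
per-PTE callback — can be sketched as a toy userspace model.  The struct
and field names (pte_walker, func, private) follow the posted description;
the flag bits and the flat array standing in for page tables are
illustrative assumptions, not actual kernel code.]

```c
#include <stddef.h>

/* Toy stand-ins for a page table entry and its flag bits. */
typedef unsigned long toy_pte_t;

#define TOY_PTE_PRESENT    0x1UL
#define TOY_PTE_REFERENCED 0x2UL

struct pte_walker {
	toy_pte_t *table;         /* stands in for the VMA's page tables   */
	unsigned long start, end; /* entry indices to walk, [start, end)   */
	void *private;            /* caller data, e.g. an smaps accumulator */
	void (*func)(struct pte_walker *w, toy_pte_t *pte, unsigned long idx);
};

/* Invoke w->func on every entry in [start, end) — one call per "PTE",
 * which is the per-entry function call overhead mentioned above. */
static void walk_page_entries(struct pte_walker *w)
{
	unsigned long idx;

	for (idx = w->start; idx < w->end; idx++)
		w->func(w, &w->table[idx], idx);
}

/* Example callback: count present, referenced entries into *private. */
static void count_referenced(struct pte_walker *w, toy_pte_t *pte,
			     unsigned long idx)
{
	(void)idx;
	if ((*pte & TOY_PTE_PRESENT) && (*pte & TOY_PTE_REFERENCED))
		(*(unsigned long *)w->private)++;
}

/* Example callback: clear the referenced bit, modeling the "nuke all the
 * referenced bits and monitor them coming back" approach. */
static void clear_referenced(struct pte_walker *w, toy_pte_t *pte,
			     unsigned long idx)
{
	(void)w;
	(void)idx;
	*pte &= ~TOY_PTE_REFERENCED;
}
```

A caller would fill in the struct once and swap callbacks to reuse the
same walk, which is the point of extracting the walker from the
smaps-specific code.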
Re: [patch 1/3 take2] smaps: extract pte walker from smaps code
On Tue, Feb 06, 2007 at 10:49:14PM -0800, Andrew Morton wrote:
> On Wed, 7 Feb 2007 15:29:34 +0900 Paul Mundt [EMAIL PROTECTED] wrote:
> > I like the general idea of this patch set, however..
>
> David didn't really spell out the rationale.  Userspace people ask "how
> much memory is my application using?"  For system planning and such.
> We don't know, really.
>
> So the approach taken here is to nuke all the referenced bits and to
> then monitor them coming back over time.  Obviously it doesn't work
> very well if the page scanner is doing things, but one hopes that when
> someone is instrumenting their application they'll ensure that
> sufficient memory is available to prevent this inaccuracy.

This was the problem we kept running into as well: people who wanted to
instrument their applications on the desktop, where the required
infrastructure already brought the system to its knees long before
instrumentation could begin, didn't really get the most consistent
metrics, especially when the application depended heavily on overcommit.

This resulted in a lot of extra hacks on top of the smaps code trying to
balance out the numbers, which just ended up being a massive headache.
smaps seems to be an appealing target for piling additional semantics on,
unfortunately.

> I don't really have a sense for how much stuff we want to put into the
> kernel to support this sort of thing.  The proposed patches are, I
> think, minimal.  Perhaps it needs more.  If so, opinions are solicited
> before we go and add (and hence be forced to maintain) this new
> interface.

This sort of minimalist interface is not necessarily a bad approach if it
helps in a way that works for the people that really want it.  The issue
with any of the smaps extensions is being able to provide people with a
good enough metric without falling into interface hell.  This really
depends on how one defines "good enough", though.  It would be nice to
hear from application folks whether this ends up being useful for them or
not.
> > Perhaps this is something that needs to be looked at more closely
> > and made more generic?  There are many ranged page table walkers
> > that aren't so performance critical that the function call cost
> > would cause too much pain.  ioremap_page_range() comes to mind, and
> > there's bound to be others.  This would also help people to get the
> > pte map/unmap right, which seems to pop up from time to time as
> > well..
>
> That's what Paul Davies' patches are aimed at:
>
> 	http://marc.theaimsgroup.com/?l=linux-mm&m=115276500100695&w=2
>
> I promised Paul that I'd take a serious look at those patches next
> time they pop up.  It would be good if others could help out in that.

It would be nice to see this smaps patch set rebased on something like
that rather than introducing another walker abstraction simply to have it
overhauled later.  I realize this isn't really David's problem, but we
shouldn't be trying to solve the same problem multiple times.