Re: [PATCH 12/13] maps#2: Add /proc/pid/pagemap interface
On Thu, Apr 19, 2007 at 12:12:29PM -0700, Dave Hansen wrote:
> On Fri, 2007-04-06 at 17:03 -0500, Matt Mackall wrote:
> >
> > +static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> > +			     void *private)
> > +{
> > +	struct pagemapread *pm = private;
> > +	pte_t *pte;
> > +	int err;
> > +
> > +	pte = pte_offset_map(pmd, addr);
> > +	for (; addr != end; pte++, addr += PAGE_SIZE) {
> > +		if (addr < pm->next)
> > +			continue;
> > +		if (!pte_present(*pte))
> > +			err = add_to_pagemap(addr, -1, pm);
> > +		else
> > +			err = add_to_pagemap(addr, pte_pfn(*pte), pm);
> > +		if (err)
> > +			return err;
> > +	}
> > +	pte_unmap(pte - 1);
> > +	return 0;
> > +}
>
> Sorry for the horribly late reply. ;)
>
> Would you have any problems with this being extended for the
> !pte_present() case to show pages that happen to be in swap?
>
> I'm playing with some process checkpoint/restart code, and using the
> existing swap mechanisms to get the current memory contents out of the
> process. I've also created a hackish syscall to make a pretty raw dump
> of pte contents.
>
> Perhaps we could steal the high bits of the pfn and have its presence in
> swap, plus some handle to which swapfile it is in. Or, would you rather
> I just create a new /proc file that utilizes most of the code you
> already put in place, and _just_ deals with swap?

It seems reasonable to deal with swap here in some fashion. It's just
a matter of how. Current swap entries aren't terribly portable.

I'm planning on using a couple high bits for exposing active/dirty.
Adding present should be fine as well.

--
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
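To make the "steal the high bits" idea above concrete, here is a minimal sketch in C of one way an entry could carry a present flag, a swap flag, and either a PFN or a swap handle. Nothing in it comes from the patch: the 64-bit entry width, the bit positions, the 5-bit swap-type field, and every PM_* and pm_* name are assumptions made up for illustration, and the active/dirty bits mentioned above are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, not taken from the patch: two flag bits at the top
 * of a 64-bit entry, payload (PFN or swap handle) in the remaining bits. */
#define PM_PRESENT        (1ULL << 63)
#define PM_SWAP           (1ULL << 62)
#define PM_FLAG_BITS      2
#define PM_VALUE_MASK     ((1ULL << (64 - PM_FLAG_BITS)) - 1)
#define PM_SWAP_TYPE_BITS 5	/* assumed width of the swapfile handle */
#define PM_SWAP_TYPE_MASK ((1ULL << PM_SWAP_TYPE_BITS) - 1)

/* Entry for a page that is resident: flag plus page frame number. */
static uint64_t pm_encode_present(uint64_t pfn)
{
	return PM_PRESENT | (pfn & PM_VALUE_MASK);
}

/* Entry for a page in swap: flag plus (offset, swapfile handle). */
static uint64_t pm_encode_swap(unsigned int type, uint64_t offset)
{
	assert(type <= PM_SWAP_TYPE_MASK);
	return PM_SWAP |
	       (((offset << PM_SWAP_TYPE_BITS) | type) & PM_VALUE_MASK);
}

static int pm_is_present(uint64_t entry)
{
	return (entry & PM_PRESENT) != 0;
}

static int pm_is_swapped(uint64_t entry)
{
	return (entry & PM_SWAP) != 0;
}

/* Only meaningful when pm_is_present(entry) is true. */
static uint64_t pm_pfn(uint64_t entry)
{
	return entry & PM_VALUE_MASK;
}

/* Only meaningful when pm_is_swapped(entry) is true. */
static unsigned int pm_swap_type(uint64_t entry)
{
	return (unsigned int)(entry & PM_SWAP_TYPE_MASK);
}

static uint64_t pm_swap_offset(uint64_t entry)
{
	return (entry & PM_VALUE_MASK) >> PM_SWAP_TYPE_BITS;
}
```

With a layout like this, a page that is neither present nor in swap would simply have both flags clear, so the -1 sentinel used by the current patch would no longer be needed. Whether the real interface ends up looking anything like this is up to the discussion above.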
Re: [PATCH 12/13] maps#2: Add /proc/pid/pagemap interface
On Fri, 2007-04-06 at 17:03 -0500, Matt Mackall wrote:
>
> +static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> +			     void *private)
> +{
> +	struct pagemapread *pm = private;
> +	pte_t *pte;
> +	int err;
> +
> +	pte = pte_offset_map(pmd, addr);
> +	for (; addr != end; pte++, addr += PAGE_SIZE) {
> +		if (addr < pm->next)
> +			continue;
> +		if (!pte_present(*pte))
> +			err = add_to_pagemap(addr, -1, pm);
> +		else
> +			err = add_to_pagemap(addr, pte_pfn(*pte), pm);
> +		if (err)
> +			return err;
> +	}
> +	pte_unmap(pte - 1);
> +	return 0;
> +}

Sorry for the horribly late reply. ;)

Would you have any problems with this being extended for the
!pte_present() case to show pages that happen to be in swap?

I'm playing with some process checkpoint/restart code, and using the
existing swap mechanisms to get the current memory contents out of the
process. I've also created a hackish syscall to make a pretty raw dump
of pte contents.

Perhaps we could steal the high bits of the pfn and have its presence in
swap, plus some handle to which swapfile it is in. Or, would you rather
I just create a new /proc file that utilizes most of the code you
already put in place, and _just_ deals with swap?

-- Dave
Re: [PATCH 12/13] maps#2: Add /proc/pid/pagemap interface
On Fri, Apr 06, 2007 at 11:55:10PM -0700, Andrew Morton wrote:
> On Fri, 06 Apr 2007 17:03:13 -0500 Matt Mackall <[EMAIL PROTECTED]> wrote:
>
> > Add /proc/pid/pagemap interface
> >
> > This interface provides a mapping for each page in an address space to
> > its physical page frame number, allowing precise determination of what
> > pages are mapped and what pages are shared between processes.
>
> Could we please have a simple read-proc-pid-pagemap.c placed under
> Documentation/ somewhere? Also some sample output for the changelog
> so we can see what all this does.

Working on that. The userspace portion of my tools is very rough at
the moment, and in Python.

> Also for kpagemap, please.
>
> Should /proc/pid/pagemap and kpagemap be versioned?

They've both got a variable-sized header, so we can add things there.

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH 12/13] maps#2: Add /proc/pid/pagemap interface
On Fri, 06 Apr 2007 17:03:13 -0500 Matt Mackall <[EMAIL PROTECTED]> wrote:

> Add /proc/pid/pagemap interface
>
> This interface provides a mapping for each page in an address space to
> its physical page frame number, allowing precise determination of what
> pages are mapped and what pages are shared between processes.

Could we please have a simple read-proc-pid-pagemap.c placed under
Documentation/ somewhere? Also some sample output for the changelog
so we can see what all this does.

Also for kpagemap, please.

Should /proc/pid/pagemap and kpagemap be versioned?
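As a rough illustration of what a userspace consumer of this data might do, the sketch below works on entries that have already been read from /proc/pid/pagemap into memory. It relies only on what the patch comment states: one unsigned long per page, holding the PFN or -1 if the page isn't present. Opening the file and skipping the variable-sized header are deliberately left out, since the header layout is not shown in this thread. The function names and the PAGEMAP_NOT_PRESENT macro are hypothetical, and this is not the read-proc-pid-pagemap.c requested above.

```c
#include <assert.h>
#include <stddef.h>

/* Per the patch comment, an entry is the PFN, or -1 if not present. */
#define PAGEMAP_NOT_PRESENT (~0UL)

/* Count how many entries carry a PFN, i.e. how many pages are mapped. */
static size_t pagemap_count_present(const unsigned long *entries, size_t n)
{
	size_t i, present = 0;

	assert(entries != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (entries[i] != PAGEMAP_NOT_PRESENT)
			present++;
	return present;
}

/*
 * Count the pages of process A whose PFN also shows up in process B.
 * Quadratic on purpose to keep the sketch short; a real tool would
 * sort or hash the PFNs first.
 */
static size_t pagemap_count_shared(const unsigned long *a, size_t na,
				   const unsigned long *b, size_t nb)
{
	size_t i, j, shared = 0;

	assert((a != NULL || na == 0) && (b != NULL || nb == 0));
	for (i = 0; i < na; i++) {
		if (a[i] == PAGEMAP_NOT_PRESENT)
			continue;
		for (j = 0; j < nb; j++) {
			if (a[i] == b[j]) {
				shared++;
				break;
			}
		}
	}
	return shared;
}
```

A caller would fill the two arrays from the pagemap files of two processes, restricted to the ranges listed in /proc/pid/maps, and then compare them. Any PFN seen in both arrays is a physical page shared between the two address spaces.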
[PATCH 12/13] maps#2: Add /proc/pid/pagemap interface
Add /proc/pid/pagemap interface

This interface provides a mapping for each page in an address space to
its physical page frame number, allowing precise determination of what
pages are mapped and what pages are shared between processes.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: mm/fs/proc/base.c
===
--- mm.orig/fs/proc/base.c	2007-04-04 18:03:03.0 -0500
+++ mm/fs/proc/base.c	2007-04-04 18:03:03.0 -0500
@@ -664,7 +664,7 @@ out_no_task:
 }
 #endif

-static loff_t mem_lseek(struct file * file, loff_t offset, int orig)
+loff_t mem_lseek(struct file * file, loff_t offset, int orig)
 {
 	switch (orig) {
 	case 0:
@@ -2006,6 +2006,9 @@ static const struct pid_entry tgid_base_
 #ifdef CONFIG_PROC_SMAPS
 	REG("smaps", S_IRUGO, smaps),
 #endif
+#ifdef CONFIG_PROC_PAGEMAP
+	REG("pagemap",S_IRUSR, pagemap),
+#endif
 #endif
 #ifdef CONFIG_SECURITY
 	DIR("attr", S_IRUGO|S_IXUGO, attr_dir),
@@ -2293,6 +2296,9 @@ static const struct pid_entry tid_base_s
 #ifdef CONFIG_PROC_SMAPS
 	REG("smaps", S_IRUGO, smaps),
 #endif
+#ifdef CONFIG_PROC_PAGEMAP
+	REG("pagemap",S_IRUSR, pagemap),
+#endif
 #endif
 #ifdef CONFIG_SECURITY
 	DIR("attr", S_IRUGO|S_IXUGO, attr_dir),
Index: mm/fs/proc/internal.h
===
--- mm.orig/fs/proc/internal.h	2007-04-04 18:01:16.0 -0500
+++ mm/fs/proc/internal.h	2007-04-04 18:03:03.0 -0500
@@ -45,11 +45,13 @@ extern int proc_tid_stat(struct task_str
 extern int proc_tgid_stat(struct task_struct *, char *);
 extern int proc_pid_status(struct task_struct *, char *);
 extern int proc_pid_statm(struct task_struct *, char *);
+extern loff_t mem_lseek(struct file * file, loff_t offset, int orig);

 extern const struct file_operations proc_maps_operations;
 extern const struct file_operations proc_numa_maps_operations;
 extern const struct file_operations proc_smaps_operations;
 extern const struct file_operations proc_clear_refs_operations;
+extern const struct file_operations proc_pagemap_operations;

 void free_proc_entry(struct proc_dir_entry *de);

Index: mm/fs/proc/task_mmu.c
===
--- mm.orig/fs/proc/task_mmu.c	2007-04-04 18:03:03.0 -0500
+++ mm/fs/proc/task_mmu.c	2007-04-05 14:25:39.0 -0500
@@ -530,3 +530,187 @@ const struct file_operations proc_numa_m
 };
 #endif

+#ifdef CONFIG_PROC_PAGEMAP
+struct pagemapread {
+	struct mm_struct *mm;
+	unsigned long next;
+	unsigned long *buf;
+	unsigned long pos;
+	size_t count;
+	int index;
+	char __user *out;
+};
+
+static int flush_pagemap(struct pagemapread *pm)
+{
+	int n = min(pm->count, pm->index * sizeof(unsigned long));
+	if (copy_to_user(pm->out, pm->buf, n))
+		return -EFAULT;
+	pm->out += n;
+	pm->pos += n;
+	pm->count -= n;
+	pm->index = 0;
+	cond_resched();
+	return 0;
+}
+
+static int add_to_pagemap(unsigned long addr, unsigned long pfn,
+			  struct pagemapread *pm)
+{
+	pm->buf[pm->index++] = pfn;
+	pm->next = addr + PAGE_SIZE;
+	if (pm->index * sizeof(unsigned long) >= PAGE_SIZE ||
+	    pm->index * sizeof(unsigned long) >= pm->count)
+		return flush_pagemap(pm);
+	return 0;
+}
+
+static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+			     void *private)
+{
+	struct pagemapread *pm = private;
+	pte_t *pte;
+	int err;
+
+	pte = pte_offset_map(pmd, addr);
+	for (; addr != end; pte++, addr += PAGE_SIZE) {
+		if (addr < pm->next)
+			continue;
+		if (!pte_present(*pte))
+			err = add_to_pagemap(addr, -1, pm);
+		else
+			err = add_to_pagemap(addr, pte_pfn(*pte), pm);
+		if (err)
+			return err;
+	}
+	pte_unmap(pte - 1);
+	return 0;
+}
+
+static int pagemap_fill(struct pagemapread *pm, unsigned long end)
+{
+	int ret;
+
+	while (pm->next != end) {
+		ret = add_to_pagemap(pm->next, -1UL, pm);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static struct mm_walk pagemap_walk = { .pmd_entry = pagemap_pte_range };
+
+/*
+ * /proc/pid/pagemap - an array mapping virtual pages to pfns
+ *
+ * For each page in the address space, this file contains one long
+ * representing the corresponding physical page frame number (PFN) or
+ * -1 if the page isn't present. This allows determining precisely
+ * which pages are mapped and comparing mapped pages between
+ * processes.
+ *
+ * Efficient users of this interface will use /proc/pid/maps to
+ * determine which ar