Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-19 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
> I was counting from the top down.
>   

Bottom-up is better; that way the levels don't change for 2,3,4 level
pagetables.

J

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-19 Thread Matt Mackall
On Thu, Apr 19, 2007 at 02:37:53PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> > Haven't thought a huge amount about that. Perhaps it's best done with
> > the level 3 callback?
> >   
> 
> Level 2, I think, assuming you count the pte pages as level 1.  I think
> it can be dealt with, so long as it correctly skips level 1 callbacks
> for superpages, and does the test after the level 2 callback has returned.

I was counting from the top down.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-19 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
> Haven't thought a huge amount about that. Perhaps it's best done with
> the level 3 callback?
>   

Level 2, I think, assuming you count the pte pages as level 1.  I think
it can be dealt with, so long as it correctly skips level 1 callbacks
for superpages, and does the test after the level 2 callback has returned.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-19 Thread Matt Mackall
On Thu, Apr 19, 2007 at 12:44:57PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> > I think adding a flags field and an allocate flag to my callback
> > struct would be sufficient here.
> >   
> 
> Yes, probably.
> 
> What about something that wants to shatter superpages?

Haven't thought a huge amount about that. Perhaps it's best done with
the level 3 callback?

> Don't know that that's a huge concern.  The typedef namespace (*_t) is
> pretty empty, and its not like we're talking about something that's
> visible outside the kernel.  And the syntax is *really* ugly.

Don't really care about this much.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-19 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
> I think adding a flags field and an allocate flag to my callback
> struct would be sufficient here.
>   

Yes, probably.

What about something that wants to shatter superpages?

> The syntax is horrible, but I don't think we end up using the
> resultant type enough to justify the namespace pollution.
>   

Don't know that that's a huge concern.  The typedef namespace (*_t) is
pretty empty, and its not like we're talking about something that's
visible outside the kernel.  And the syntax is *really* ugly.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-19 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
 I think adding a flags field and an allocate flag to my callback
 struct would be sufficient here.
   

Yes, probably.

What about something that wants to shatter superpages?

 The syntax is horrible, but I don't think we end up using the
 resultant type enough to justify the namespace pollution.
   

Don't know that that's a huge concern.  The typedef namespace (*_t) is
pretty empty, and its not like we're talking about something that's
visible outside the kernel.  And the syntax is *really* ugly.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-19 Thread Matt Mackall
On Thu, Apr 19, 2007 at 12:44:57PM -0700, Jeremy Fitzhardinge wrote:
 Matt Mackall wrote:
  I think adding a flags field and an allocate flag to my callback
  struct would be sufficient here.

 
 Yes, probably.
 
 What about something that wants to shatter superpages?

Haven't thought a huge amount about that. Perhaps it's best done with
the level 3 callback?

 Don't know that that's a huge concern.  The typedef namespace (*_t) is
 pretty empty, and its not like we're talking about something that's
 visible outside the kernel.  And the syntax is *really* ugly.

Don't really care about this much.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-19 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
 Haven't thought a huge amount about that. Perhaps it's best done with
 the level 3 callback?
   

Level 2, I think, assuming you count the pte pages as level 1.  I think
it can be dealt with, so long as it correctly skips level 1 callbacks
for superpages, and does the test after the level 2 callback has returned.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-19 Thread Matt Mackall
On Thu, Apr 19, 2007 at 02:37:53PM -0700, Jeremy Fitzhardinge wrote:
 Matt Mackall wrote:
  Haven't thought a huge amount about that. Perhaps it's best done with
  the level 3 callback?

 
 Level 2, I think, assuming you count the pte pages as level 1.  I think
 it can be dealt with, so long as it correctly skips level 1 callbacks
 for superpages, and does the test after the level 2 callback has returned.

I was counting from the top down.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-19 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
 I was counting from the top down.
   

Bottom-up is better; that way the levels don't change for 2,3,4 level
pagetables.

J

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-17 Thread Matt Mackall
On Wed, Apr 04, 2007 at 11:52:57PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> >> +/*
> >> + * Scan a region of virtual memory, filling in page tables as necessary
> >> + * and calling a provided function on each leaf page table.
> >> + */
> >> 
> >
> > But I'm not sure what the use case is that wants filling in the page
> > table..? If both modes really make sense, perhaps a flag could unify
> > these differences.
> >   
> 
> Well, two reasons:
> 
> One is the general one that if you're traversing ptes then they need to
> exist to traverse them (for example, if you're creating new mappings). 
> Obviously if you want to just visit existing mappings, then
> instantiating new pagetable is not the right thing to do (and I could
> make use of this too).
> 
> The other is that there are various places in the Xen hypervisor API
> where you pass in a reference to pte entry for the hypervisor to put
> mappings into, and the rest of the pagetable needs to exist.  The Xen
> code uses the side-effect of apply_to_page_range() to create pagetable
> for these calls.

I think adding a flags field and an allocate flag to my callback
struct would be sufficient here.

> > I'd gotten the impression that these sorts of typedefs were out of
> > fashion.
> >   
> 
> In general yes, but for function pointers the syntax is so clumsy that I
> think typedefs are OK.

The syntax is horrible, but I don't think we end up using the
resultant type enough to justify the namespace pollution.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-17 Thread Matt Mackall
On Wed, Apr 04, 2007 at 11:52:57PM -0700, Jeremy Fitzhardinge wrote:
 Matt Mackall wrote:
  +/*
  + * Scan a region of virtual memory, filling in page tables as necessary
  + * and calling a provided function on each leaf page table.
  + */
  
 
  But I'm not sure what the use case is that wants filling in the page
  table..? If both modes really make sense, perhaps a flag could unify
  these differences.

 
 Well, two reasons:
 
 One is the general one that if you're traversing ptes then they need to
 exist to traverse them (for example, if you're creating new mappings). 
 Obviously if you want to just visit existing mappings, then
 instantiating new pagetable is not the right thing to do (and I could
 make use of this too).
 
 The other is that there are various places in the Xen hypervisor API
 where you pass in a reference to pte entry for the hypervisor to put
 mappings into, and the rest of the pagetable needs to exist.  The Xen
 code uses the side-effect of apply_to_page_range() to create pagetable
 for these calls.

I think adding a flags field and an allocate flag to my callback
struct would be sufficient here.

  I'd gotten the impression that these sorts of typedefs were out of
  fashion.

 
 In general yes, but for function pointers the syntax is so clumsy that I
 think typedefs are OK.

The syntax is horrible, but I don't think we end up using the
resultant type enough to justify the namespace pollution.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-05 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
>> +/*
>> + * Scan a region of virtual memory, filling in page tables as necessary
>> + * and calling a provided function on each leaf page table.
>> + */
>> 
>
> But I'm not sure what the use case is that wants filling in the page
> table..? If both modes really make sense, perhaps a flag could unify
> these differences.
>   

Well, two reasons:

One is the general one that if you're traversing ptes then they need to
exist to traverse them (for example, if you're creating new mappings). 
Obviously if you want to just visit existing mappings, then
instantiating new pagetable is not the right thing to do (and I could
make use of this too).

The other is that there are various places in the Xen hypervisor API
where you pass in a reference to pte entry for the hypervisor to put
mappings into, and the rest of the pagetable needs to exist.  The Xen
code uses the side-effect of apply_to_page_range() to create pagetable
for these calls.

>> +typedef int (*pte_fn_t)(pte_t *pte, struct page *pmd_page, unsigned long 
>> addr,
>> +void *data);
>> 
>
> I'd gotten the impression that these sorts of typedefs were out of
> fashion.
>   

In general yes, but for function pointers the syntax is so clumsy that I
think typedefs are OK.

>> +static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
>> + unsigned long addr, unsigned long end,
>> + pte_fn_t fn, void *data)
>> +{
>> +pte_t *pte;
>> +int err;
>> +struct page *pmd_page;
>> +spinlock_t *ptl;
>> +
>> +pte = (mm == _mm) ?
>> +pte_alloc_kernel(pmd, addr) :
>> +pte_alloc_map_lock(mm, pmd, addr, );
>> +if (!pte)
>> +return -ENOMEM;
>> 
>
> Seems a bit awkward to pass mm all the way down the tree just for this
> quirk. Which is a bit awkward as it means that whether or not a lock
> is held in the callback is context dependent.
>   

Well, it would need mm for just pte_alloc_map_lock() anyway.

> smaps, clear_ref, and my pagemap code all use the callback at the
> pmd_range level, which a) localizes the pte-level locking concerns
> with the user b) amortizes the indirection overhead and c)
> (unfortunately) makes the user a bit more complex.
>
> We should try to measure whether (b) actually makes a difference.
>   

I'll need to look closely at your code again.  It would be nice to have
one pagewalker.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-05 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
 +/*
 + * Scan a region of virtual memory, filling in page tables as necessary
 + * and calling a provided function on each leaf page table.
 + */
 

 But I'm not sure what the use case is that wants filling in the page
 table..? If both modes really make sense, perhaps a flag could unify
 these differences.
   

Well, two reasons:

One is the general one that if you're traversing ptes then they need to
exist to traverse them (for example, if you're creating new mappings). 
Obviously if you want to just visit existing mappings, then
instantiating new pagetable is not the right thing to do (and I could
make use of this too).

The other is that there are various places in the Xen hypervisor API
where you pass in a reference to pte entry for the hypervisor to put
mappings into, and the rest of the pagetable needs to exist.  The Xen
code uses the side-effect of apply_to_page_range() to create pagetable
for these calls.

 +typedef int (*pte_fn_t)(pte_t *pte, struct page *pmd_page, unsigned long 
 addr,
 +void *data);
 

 I'd gotten the impression that these sorts of typedefs were out of
 fashion.
   

In general yes, but for function pointers the syntax is so clumsy that I
think typedefs are OK.

 +static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
 + unsigned long addr, unsigned long end,
 + pte_fn_t fn, void *data)
 +{
 +pte_t *pte;
 +int err;
 +struct page *pmd_page;
 +spinlock_t *ptl;
 +
 +pte = (mm == init_mm) ?
 +pte_alloc_kernel(pmd, addr) :
 +pte_alloc_map_lock(mm, pmd, addr, ptl);
 +if (!pte)
 +return -ENOMEM;
 

 Seems a bit awkward to pass mm all the way down the tree just for this
 quirk. Which is a bit awkward as it means that whether or not a lock
 is held in the callback is context dependent.
   

Well, it would need mm for just pte_alloc_map_lock() anyway.

 smaps, clear_ref, and my pagemap code all use the callback at the
 pmd_range level, which a) localizes the pte-level locking concerns
 with the user b) amortizes the indirection overhead and c)
 (unfortunately) makes the user a bit more complex.

 We should try to measure whether (b) actually makes a difference.
   

I'll need to look closely at your code again.  It would be nice to have
one pagewalker.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-04 Thread Matt Mackall
On Wed, Apr 04, 2007 at 12:12:11PM -0700, Jeremy Fitzhardinge wrote:
> Add a new mm function apply_to_page_range() which applies a given
> function to every pte in a given virtual address range in a given mm
> structure. This is a generic alternative to cut-and-pasting the Linux
> idiomatic pagetable walking code in every place that a sequence of
> PTEs must be accessed.

As we discussed before, this obviously has a lot in common with my
walk_page_range code.

The major difference and one your above description seems to be
missing the important detail of why it's doing this:

> + pte_alloc_kernel(pmd, addr) :
> + pmd = pmd_alloc(mm, pud, addr);
> + pud = pud_alloc(mm, pgd, addr);

..which is mentioned here:

> +/*
> + * Scan a region of virtual memory, filling in page tables as necessary
> + * and calling a provided function on each leaf page table.
> + */

But I'm not sure what the use case is that wants filling in the page
table..? If both modes really make sense, perhaps a flag could unify
these differences.

> +typedef int (*pte_fn_t)(pte_t *pte, struct page *pmd_page, unsigned long 
> addr,
> + void *data);

I'd gotten the impression that these sorts of typedefs were out of
fashion.

> +static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
> +  unsigned long addr, unsigned long end,
> +  pte_fn_t fn, void *data)
> +{
> + pte_t *pte;
> + int err;
> + struct page *pmd_page;
> + spinlock_t *ptl;
> +
> + pte = (mm == _mm) ?
> + pte_alloc_kernel(pmd, addr) :
> + pte_alloc_map_lock(mm, pmd, addr, );
> + if (!pte)
> + return -ENOMEM;

Seems a bit awkward to pass mm all the way down the tree just for this
quirk. Which is a bit awkward as it means that whether or not a lock
is held in the callback is context dependent.

smaps, clear_ref, and my pagemap code all use the callback at the
pmd_range level, which a) localizes the pte-level locking concerns
with the user b) amortizes the indirection overhead and c)
(unfortunately) makes the user a bit more complex.

We should try to measure whether (b) actually makes a difference.

> + do {
> + err = fn(pte, pmd_page, addr, data);
> + if (err)
> + break;
> + } while (pte++, addr += PAGE_SIZE, addr != end);

I was about to say this do/while format seems a bit non-idiomatic for
page table walkers, but then I looked at the code in mm/memory.c and
realized the stuff I've been hacking on is the odd one out.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 20/20] Add apply_to_page_range() which applies a function to a pte range.

2007-04-04 Thread Matt Mackall
On Wed, Apr 04, 2007 at 12:12:11PM -0700, Jeremy Fitzhardinge wrote:
 Add a new mm function apply_to_page_range() which applies a given
 function to every pte in a given virtual address range in a given mm
 structure. This is a generic alternative to cut-and-pasting the Linux
 idiomatic pagetable walking code in every place that a sequence of
 PTEs must be accessed.

As we discussed before, this obviously has a lot in common with my
walk_page_range code.

The major difference and one your above description seems to be
missing the important detail of why it's doing this:

 + pte_alloc_kernel(pmd, addr) :
 + pmd = pmd_alloc(mm, pud, addr);
 + pud = pud_alloc(mm, pgd, addr);

..which is mentioned here:

 +/*
 + * Scan a region of virtual memory, filling in page tables as necessary
 + * and calling a provided function on each leaf page table.
 + */

But I'm not sure what the use case is that wants filling in the page
table..? If both modes really make sense, perhaps a flag could unify
these differences.

 +typedef int (*pte_fn_t)(pte_t *pte, struct page *pmd_page, unsigned long 
 addr,
 + void *data);

I'd gotten the impression that these sorts of typedefs were out of
fashion.

 +static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
 +  unsigned long addr, unsigned long end,
 +  pte_fn_t fn, void *data)
 +{
 + pte_t *pte;
 + int err;
 + struct page *pmd_page;
 + spinlock_t *ptl;
 +
 + pte = (mm == init_mm) ?
 + pte_alloc_kernel(pmd, addr) :
 + pte_alloc_map_lock(mm, pmd, addr, ptl);
 + if (!pte)
 + return -ENOMEM;

Seems a bit awkward to pass mm all the way down the tree just for this
quirk. Which is a bit awkward as it means that whether or not a lock
is held in the callback is context dependent.

smaps, clear_ref, and my pagemap code all use the callback at the
pmd_range level, which a) localizes the pte-level locking concerns
with the user b) amortizes the indirection overhead and c)
(unfortunately) makes the user a bit more complex.

We should try to measure whether (b) actually makes a difference.

 + do {
 + err = fn(pte, pmd_page, addr, data);
 + if (err)
 + break;
 + } while (pte++, addr += PAGE_SIZE, addr != end);

I was about to say this do/while format seems a bit non-idiomatic for
page table walkers, but then I looked at the code in mm/memory.c and
realized the stuff I've been hacking on is the odd one out.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/