Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-06-01 Thread Mark Fasheh
On Fri, Jun 01, 2007 at 03:47:35PM -0700, Andrew Morton wrote:
> Right.  I did a lot of tricksy work for rc3-mm1 to merge git-ocfs2 on top
> of Nick's stuff.  Then I repulled your tree and lost it all.  This is
> because I was dumb and I fixed rc3-mm1's git-ocfs.patch rather than doing a
> separate fix-rejects-in-git-ocfs2.patch.
> 
> This is all unique-to-akpm stuff which you don't need to worry about ;)

Ok, I am no longer concerned! :)


> > So, which of Nick's patches are we talking about here?
> > 
> > Btw, I know you tend to handle rejects yourself, but if it's a major PITA
> > I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of
> > trouble there :)
> 
> Is OK - I'll move Nick's patches back to behind the git trees and it'll all 
> come
> good.

Phew ok. Once again, thanks for all the work you do getting the ocfs2 git
patches into -mm.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-06-01 Thread Andrew Morton
On Fri, 1 Jun 2007 15:33:02 -0700
Mark Fasheh <[EMAIL PROTECTED]> wrote:

> On Fri, Jun 01, 2007 at 03:25:37PM -0700, Andrew Morton wrote:
> > > Andrew, if this is ok with you I'd really like to see that fix in -mm. 
> > > Ocfs2
> > > shared write mmap will instantly deadlock without it.
> > 
> > ug, OK.  I get a ginormous reject when merging ocfs2 on Nick's stuff which
> > I've been largely ignoring thus far.
> 
> Huh, I'm a bit confused... I created this patch on top of 2.6.22-rc3-mm1
> which most certainly contains a merge of git-ocfs2.patch and the series
> which at least contains
> mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch.

Right.  I did a lot of tricksy work for rc3-mm1 to merge git-ocfs2 on top
of Nick's stuff.  Then I repulled your tree and lost it all.  This is
because I was dumb and I fixed rc3-mm1's git-ocfs.patch rather than doing a
separate fix-rejects-in-git-ocfs2.patch.

This is all unique-to-akpm stuff which you don't need to worry about ;)

> So, which of Nick's patches are we talking about here?
> 
> Btw, I know you tend to handle rejects yourself, but if it's a major PITA
> I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of
> trouble there :)

Is OK - I'll move Nick's patches back to behind the git trees and it'll all come
good.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-06-01 Thread Mark Fasheh
On Fri, Jun 01, 2007 at 03:25:37PM -0700, Andrew Morton wrote:
> > Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2
> > shared write mmap will instantly deadlock without it.
> 
> ug, OK.  I get a ginormous reject when merging ocfs2 on Nick's stuff which
> I've been largely ignoring thus far.

Huh, I'm a bit confused... I created this patch on top of 2.6.22-rc3-mm1
which most certainly contains a merge of git-ocfs2.patch and the series
which at least contains
mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch.

So, which of Nick's patches are we talking about here?

Btw, I know you tend to handle rejects yourself, but if it's a major PITA
I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of
trouble there :)
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-06-01 Thread Andrew Morton
On Fri, 1 Jun 2007 15:01:18 -0700
Mark Fasheh <[EMAIL PROTECTED]> wrote:

> On Thu, May 31, 2007 at 10:20:39PM -0700, Mark Fasheh wrote:
> > Ok. So how about the attached patch? It's a bit different than discussed,
> > but I think it's much cleaner because it preserves the current behavior of
> > the callback and keeps that bit of page locking inside core code. Not tested
> > as of yet, but I can run it tommorrow.
> 
> Ok - this patch seems to check out fine in testing - no more deadlocking.
> 
> Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2
> shared write mmap will instantly deadlock without it.

ug, OK.  I get a ginormous reject when merging ocfs2 on Nick's stuff which
I've been largely ignoring thus far.

Perhaps I need to go back to staging Nick's stuff after the git trees.  I'll
take a look.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-06-01 Thread Mark Fasheh
On Thu, May 31, 2007 at 10:20:39PM -0700, Mark Fasheh wrote:
> Ok. So how about the attached patch? It's a bit different than discussed,
> but I think it's much cleaner because it preserves the current behavior of
> the callback and keeps that bit of page locking inside core code. Not tested
> as of yet, but I can run it tommorrow.

Ok - this patch seems to check out fine in testing - no more deadlocking.

Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2
shared write mmap will instantly deadlock without it.

>From reading Nick's description of the problem in
mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch I think we're
pretty safe (as I noted before) because Ocfs2 re-checks the mapping under
lock to protect against trucate races. That's been an "unwritten"
requirement of page_mkwrite() anyway.

Speaking of requirements, attached is my sad attempt at documenting the API.
I know it might be merged into ->fault at some point, but we really ought to
have _something_ in the meantime.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]


From: Mark Fasheh <[EMAIL PROTECTED]>

[PATCH] Document ->page_mkwrite() locking

There seems to be very little documentation about this callback in general.
The locking in particular is a bit tricky, so it's worth having this in
writing.

Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]>

---

 Documentation/filesystems/Locking |   11 ++-
 1 files changed, 10 insertions(+), 1 deletions(-)

2320eadfa34199c779638edbdbb6c491df09c49b
diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 970c8ec..91ec4b4 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -512,13 +512,22 @@ prototypes:
void (*close)(struct vm_area_struct*);
struct page *(*fault)(struct vm_area_struct*, struct fault_data *);
struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *);
+   int (*page_mkwrite)(struct vm_area_struct *, struct page *);
 
 locking rules:
-   BKL mmap_sem
+   BKL mmap_semPageLocked(page)
 open:  no  yes
 close: no  yes
 fault: no  yes
 nopage:no  yes
+page_mkwrite:  no  yes no
+
+   ->page_mkwrite() is called when a previously read-only page is
+about to become writeable. The file system is responsible for
+protecting against truncate races. Once appropriate action has been
+taking to lock out truncate, the page range should be verified to be
+within i_size. The page mapping should also be checked that it is not
+NULL.
 
 

Dubious stuff
-- 
1.3.3

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-06-01 Thread Mark Fasheh
On Thu, May 31, 2007 at 10:20:39PM -0700, Mark Fasheh wrote:
 Ok. So how about the attached patch? It's a bit different than discussed,
 but I think it's much cleaner because it preserves the current behavior of
 the callback and keeps that bit of page locking inside core code. Not tested
 as of yet, but I can run it tommorrow.

Ok - this patch seems to check out fine in testing - no more deadlocking.

Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2
shared write mmap will instantly deadlock without it.

From reading Nick's description of the problem in
mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch I think we're
pretty safe (as I noted before) because Ocfs2 re-checks the mapping under
lock to protect against trucate races. That's been an unwritten
requirement of page_mkwrite() anyway.

Speaking of requirements, attached is my sad attempt at documenting the API.
I know it might be merged into -fault at some point, but we really ought to
have _something_ in the meantime.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]


From: Mark Fasheh [EMAIL PROTECTED]

[PATCH] Document -page_mkwrite() locking

There seems to be very little documentation about this callback in general.
The locking in particular is a bit tricky, so it's worth having this in
writing.

Signed-off-by: Mark Fasheh [EMAIL PROTECTED]

---

 Documentation/filesystems/Locking |   11 ++-
 1 files changed, 10 insertions(+), 1 deletions(-)

2320eadfa34199c779638edbdbb6c491df09c49b
diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 970c8ec..91ec4b4 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -512,13 +512,22 @@ prototypes:
void (*close)(struct vm_area_struct*);
struct page *(*fault)(struct vm_area_struct*, struct fault_data *);
struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *);
+   int (*page_mkwrite)(struct vm_area_struct *, struct page *);
 
 locking rules:
-   BKL mmap_sem
+   BKL mmap_semPageLocked(page)
 open:  no  yes
 close: no  yes
 fault: no  yes
 nopage:no  yes
+page_mkwrite:  no  yes no
+
+   -page_mkwrite() is called when a previously read-only page is
+about to become writeable. The file system is responsible for
+protecting against truncate races. Once appropriate action has been
+taking to lock out truncate, the page range should be verified to be
+within i_size. The page mapping should also be checked that it is not
+NULL.
 
 

Dubious stuff
-- 
1.3.3

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-06-01 Thread Andrew Morton
On Fri, 1 Jun 2007 15:01:18 -0700
Mark Fasheh [EMAIL PROTECTED] wrote:

 On Thu, May 31, 2007 at 10:20:39PM -0700, Mark Fasheh wrote:
  Ok. So how about the attached patch? It's a bit different than discussed,
  but I think it's much cleaner because it preserves the current behavior of
  the callback and keeps that bit of page locking inside core code. Not tested
  as of yet, but I can run it tommorrow.
 
 Ok - this patch seems to check out fine in testing - no more deadlocking.
 
 Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2
 shared write mmap will instantly deadlock without it.

ug, OK.  I get a ginormous reject when merging ocfs2 on Nick's stuff which
I've been largely ignoring thus far.

Perhaps I need to go back to staging Nick's stuff after the git trees.  I'll
take a look.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-06-01 Thread Mark Fasheh
On Fri, Jun 01, 2007 at 03:25:37PM -0700, Andrew Morton wrote:
  Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2
  shared write mmap will instantly deadlock without it.
 
 ug, OK.  I get a ginormous reject when merging ocfs2 on Nick's stuff which
 I've been largely ignoring thus far.

Huh, I'm a bit confused... I created this patch on top of 2.6.22-rc3-mm1
which most certainly contains a merge of git-ocfs2.patch and the series
which at least contains
mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch.

So, which of Nick's patches are we talking about here?

Btw, I know you tend to handle rejects yourself, but if it's a major PITA
I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of
trouble there :)
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-06-01 Thread Andrew Morton
On Fri, 1 Jun 2007 15:33:02 -0700
Mark Fasheh [EMAIL PROTECTED] wrote:

 On Fri, Jun 01, 2007 at 03:25:37PM -0700, Andrew Morton wrote:
   Andrew, if this is ok with you I'd really like to see that fix in -mm. 
   Ocfs2
   shared write mmap will instantly deadlock without it.
  
  ug, OK.  I get a ginormous reject when merging ocfs2 on Nick's stuff which
  I've been largely ignoring thus far.
 
 Huh, I'm a bit confused... I created this patch on top of 2.6.22-rc3-mm1
 which most certainly contains a merge of git-ocfs2.patch and the series
 which at least contains
 mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch.

Right.  I did a lot of tricksy work for rc3-mm1 to merge git-ocfs2 on top
of Nick's stuff.  Then I repulled your tree and lost it all.  This is
because I was dumb and I fixed rc3-mm1's git-ocfs.patch rather than doing a
separate fix-rejects-in-git-ocfs2.patch.

This is all unique-to-akpm stuff which you don't need to worry about ;)

 So, which of Nick's patches are we talking about here?
 
 Btw, I know you tend to handle rejects yourself, but if it's a major PITA
 I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of
 trouble there :)

Is OK - I'll move Nick's patches back to behind the git trees and it'll all come
good.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-06-01 Thread Mark Fasheh
On Fri, Jun 01, 2007 at 03:47:35PM -0700, Andrew Morton wrote:
 Right.  I did a lot of tricksy work for rc3-mm1 to merge git-ocfs2 on top
 of Nick's stuff.  Then I repulled your tree and lost it all.  This is
 because I was dumb and I fixed rc3-mm1's git-ocfs.patch rather than doing a
 separate fix-rejects-in-git-ocfs2.patch.
 
 This is all unique-to-akpm stuff which you don't need to worry about ;)

Ok, I am no longer concerned! :)


  So, which of Nick's patches are we talking about here?
  
  Btw, I know you tend to handle rejects yourself, but if it's a major PITA
  I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of
  trouble there :)
 
 Is OK - I'll move Nick's patches back to behind the git trees and it'll all 
 come
 good.

Phew ok. Once again, thanks for all the work you do getting the ocfs2 git
patches into -mm.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Mark Fasheh
On Fri, Jun 01, 2007 at 03:53:49AM +0200, Nick Piggin wrote:
> On Thu, May 31, 2007 at 06:45:17PM -0700, Mark Fasheh wrote:
> > On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote:
> > > > Here's a nasty idea... Would it be valid for ->page_mkwrite to unlock 
> > > > the
> > > > page, so long as it's returned in a locked state? Though, do we even 
> > > > need
> > > > the page lock that early? It seemed to me that you were adding it for
> > > > consistency reasons (I could be wrong though).
> > > 
> > > You could do that, but you'd have to probably check that it is
> > > within i_size after you relock it, I think... yeah, that might
> > > be the best thing for ocfs to do for now.

Ok. So how about the attached patch? It's a bit different than discussed,
but I think it's much cleaner because it preserves the current behavior of
the callback and keeps that bit of page locking inside core code. Not tested
as of yet, but I can run it tommorrow.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]

From: Mark Fasheh <[EMAIL PROTECTED]>

[PATCH] Release page lock before calling ->page_mkwrite

__do_fault() was calling ->page_mkwrite() with the page lock held, which
violates the locking rules for that callback. Release and retake the page
lock around the callback to avoid deadlocking file systems which manually
take it.

Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]>
---
 mm/memory.c |   14 +-
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7221618..491cc27 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2378,11 +2378,14 @@ static int __do_fault(struct mm_struct *
 * address space wants to know that the page is about
 * to become writable
 */
-   if (vma->vm_ops->page_mkwrite &&
-   vma->vm_ops->page_mkwrite(vma, page) < 0) {
-   fdata.type = VM_FAULT_SIGBUS;
-   anon = 1; /* no anon but release faulted_page */
-   goto out;
+   if (vma->vm_ops->page_mkwrite) {
+   unlock_page(page);
+   if (vma->vm_ops->page_mkwrite(vma, page) < 0) {
+   fdata.type = VM_FAULT_SIGBUS;
+   anon = 1; /* no anon but release 
faulted_page */
+   goto out_unlocked;
+   }
+   lock_page(page);
}
}
 
@@ -2434,6 +2437,7 @@ static int __do_fault(struct mm_struct *
 
 out:
unlock_page(faulted_page);
+out_unlocked:
if (anon)
page_cache_release(faulted_page);
else if (dirty_page) {
-- 
1.4.2.3

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Nick Piggin
On Thu, May 31, 2007 at 06:45:17PM -0700, Mark Fasheh wrote:
> On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote:
> > > Here's a nasty idea... Would it be valid for ->page_mkwrite to unlock the
> > > page, so long as it's returned in a locked state? Though, do we even need
> > > the page lock that early? It seemed to me that you were adding it for
> > > consistency reasons (I could be wrong though).
> > 
> > You could do that, but you'd have to probably check that it is
> > within i_size after you relock it, I think... yeah, that might
> > be the best thing for ocfs to do for now.
> 
> Well, ocfs2 already does i_size checks in page_mkwrite, so we're covered
> with respect to truncate races.
> 
> I'm still not clear though - what was the reason for adding the page locking
> there in the 1st place?

Yeah, its to cover page invalidation races. There is a description in
an earlier patch's changelog.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Mark Fasheh
On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote:
> > Here's a nasty idea... Would it be valid for ->page_mkwrite to unlock the
> > page, so long as it's returned in a locked state? Though, do we even need
> > the page lock that early? It seemed to me that you were adding it for
> > consistency reasons (I could be wrong though).
> 
> You could do that, but you'd have to probably check that it is
> within i_size after you relock it, I think... yeah, that might
> be the best thing for ocfs to do for now.

Well, ocfs2 already does i_size checks in page_mkwrite, so we're covered
with respect to truncate races.

I'm still not clear though - what was the reason for adding the page locking
there in the 1st place?
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Nick Piggin
On Thu, May 31, 2007 at 06:24:40PM -0700, Mark Fasheh wrote:
> On Fri, Jun 01, 2007 at 03:01:29AM +0200, Nick Piggin wrote:
> > Ah, I didn't realise you were using that yet. I expect ocfs2 is using
> > VM_CAN_INVALIDATE there anyway.
> > 
> > Hmm, this becomes easier to deal with after page_mkwrite is merged with
> > ->fault. But for now, can we just lock the page at the do_wp_page site
> > as well, and change the API? All users I have seen want the page locked
> > there anyway...
> 
> Unfortunately that doesn't work for ocfs2 for exactly the same reasons page
> lock doesn't work during a write either - there's a cluster lock inversion
> and we might have to zero adjacent pages for an allocating write.

I guess you could just fail the page_mkwrite and have it try again? ..
that would require changing it not to always go SIGBUS though.


> What's involved in merging it with ->fault?

Just I have't sent the patch (because at the time there were no
page_mkwrite users to look at).

It is nicer too, because the nopage path only has  to call into
the filesystem once, to return the page (the filesystem can check
whether it is for write, and do the page_mkwrite thing at that
time). do_wp_page obviously still involves the extra call, and
that will be with a flag telling the fs that it isn't a "nopage"
fault, but a write fault on an existing page.

 
> Here's a nasty idea... Would it be valid for ->page_mkwrite to unlock the
> page, so long as it's returned in a locked state? Though, do we even need
> the page lock that early? It seemed to me that you were adding it for
> consistency reasons (I could be wrong though).

You could do that, but you'd have to probably check that it is
within i_size after you relock it, I think... yeah, that might
be the best thing for ocfs to do for now.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Mark Fasheh
On Fri, Jun 01, 2007 at 03:01:29AM +0200, Nick Piggin wrote:
> Ah, I didn't realise you were using that yet. I expect ocfs2 is using
> VM_CAN_INVALIDATE there anyway.
> 
> Hmm, this becomes easier to deal with after page_mkwrite is merged with
> ->fault. But for now, can we just lock the page at the do_wp_page site
> as well, and change the API? All users I have seen want the page locked
> there anyway...

Unfortunately that doesn't work for ocfs2 for exactly the same reasons page
lock doesn't work during a write either - there's a cluster lock inversion
and we might have to zero adjacent pages for an allocating write.

What's involved in merging it with ->fault?

Here's a nasty idea... Would it be valid for ->page_mkwrite to unlock the
page, so long as it's returned in a locked state? Though, do we even need
the page lock that early? It seemed to me that you were adding it for
consistency reasons (I could be wrong though).
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Nick Piggin
On Thu, May 31, 2007 at 04:13:54PM -0700, Mark Fasheh wrote:
> On Wed, May 30, 2007 at 11:58:23PM -0700, Andrew Morton wrote:
> >  git-ocfs2.patch
> 
> Andrew, thanks for getting that back in there.
> 
> 
> mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch broke ocfs2 shared
> writable mmap. We hang on a page lock because ->page_mkwrite() is
> being called with the page already locked:
> 
> + /*
> +  * For consistency in subsequent calls, make the nopage_page always
> +  * locked.
> +  */
> + if (unlikely(!(vma->vm_flags & VM_CAN_INVALIDATE)))
> + lock_page(nopage_page);
> 
> It wasn't previously being called with the page lock held, intentionally.

Ah, I didn't realise you were using that yet. I expect ocfs2 is using
VM_CAN_INVALIDATE there anyway.

Hmm, this becomes easier to deal with after page_mkwrite is merged with
->fault. But for now, can we just lock the page at the do_wp_page site
as well, and change the API? All users I have seen want the page locked
there anyway...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Mark Fasheh
On Wed, May 30, 2007 at 11:58:23PM -0700, Andrew Morton wrote:
>  git-ocfs2.patch

Andrew, thanks for getting that back in there.


mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch broke ocfs2 shared
writable mmap. We hang on a page lock because ->page_mkwrite() is
being called with the page already locked:

+   /*
+* For consistency in subsequent calls, make the nopage_page always
+* locked.
+*/
+   if (unlikely(!(vma->vm_flags & VM_CAN_INVALIDATE)))
+   lock_page(nopage_page);

It wasn't previously being called with the page lock held, intentionally.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Mark Fasheh
On Wed, May 30, 2007 at 11:58:23PM -0700, Andrew Morton wrote:
  git-ocfs2.patch

Andrew, thanks for getting that back in there.


mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch broke ocfs2 shared
writable mmap. We hang on a page lock because -page_mkwrite() is
being called with the page already locked:

+   /*
+* For consistency in subsequent calls, make the nopage_page always
+* locked.
+*/
+   if (unlikely(!(vma-vm_flags  VM_CAN_INVALIDATE)))
+   lock_page(nopage_page);

It wasn't previously being called with the page lock held, intentionally.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Nick Piggin
On Thu, May 31, 2007 at 04:13:54PM -0700, Mark Fasheh wrote:
 On Wed, May 30, 2007 at 11:58:23PM -0700, Andrew Morton wrote:
   git-ocfs2.patch
 
 Andrew, thanks for getting that back in there.
 
 
 mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch broke ocfs2 shared
 writable mmap. We hang on a page lock because -page_mkwrite() is
 being called with the page already locked:
 
 + /*
 +  * For consistency in subsequent calls, make the nopage_page always
 +  * locked.
 +  */
 + if (unlikely(!(vma-vm_flags  VM_CAN_INVALIDATE)))
 + lock_page(nopage_page);
 
 It wasn't previously being called with the page lock held, intentionally.

Ah, I didn't realise you were using that yet. I expect ocfs2 is using
VM_CAN_INVALIDATE there anyway.

Hmm, this becomes easier to deal with after page_mkwrite is merged with
-fault. But for now, can we just lock the page at the do_wp_page site
as well, and change the API? All users I have seen want the page locked
there anyway...

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Mark Fasheh
On Fri, Jun 01, 2007 at 03:01:29AM +0200, Nick Piggin wrote:
 Ah, I didn't realise you were using that yet. I expect ocfs2 is using
 VM_CAN_INVALIDATE there anyway.
 
 Hmm, this becomes easier to deal with after page_mkwrite is merged with
 -fault. But for now, can we just lock the page at the do_wp_page site
 as well, and change the API? All users I have seen want the page locked
 there anyway...

Unfortunately that doesn't work for ocfs2 for exactly the same reasons page
lock doesn't work during a write either - there's a cluster lock inversion
and we might have to zero adjacent pages for an allocating write.

What's involved in merging it with -fault?

Here's a nasty idea... Would it be valid for -page_mkwrite to unlock the
page, so long as it's returned in a locked state? Though, do we even need
the page lock that early? It seemed to me that you were adding it for
consistency reasons (I could be wrong though).
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Nick Piggin
On Thu, May 31, 2007 at 06:24:40PM -0700, Mark Fasheh wrote:
 On Fri, Jun 01, 2007 at 03:01:29AM +0200, Nick Piggin wrote:
  Ah, I didn't realise you were using that yet. I expect ocfs2 is using
  VM_CAN_INVALIDATE there anyway.
  
  Hmm, this becomes easier to deal with after page_mkwrite is merged with
  -fault. But for now, can we just lock the page at the do_wp_page site
  as well, and change the API? All users I have seen want the page locked
  there anyway...
 
 Unfortunately that doesn't work for ocfs2 for exactly the same reasons page
 lock doesn't work during a write either - there's a cluster lock inversion
 and we might have to zero adjacent pages for an allocating write.

I guess you could just fail the page_mkwrite and have it try again? ..
that would require changing it not to always go SIGBUS though.


 What's involved in merging it with -fault?

Just I have't sent the patch (because at the time there were no
page_mkwrite users to look at).

It is nicer too, because the nopage path only has  to call into
the filesystem once, to return the page (the filesystem can check
whether it is for write, and do the page_mkwrite thing at that
time). do_wp_page obviously still involves the extra call, and
that will be with a flag telling the fs that it isn't a nopage
fault, but a write fault on an existing page.

 
 Here's a nasty idea... Would it be valid for -page_mkwrite to unlock the
 page, so long as it's returned in a locked state? Though, do we even need
 the page lock that early? It seemed to me that you were adding it for
 consistency reasons (I could be wrong though).

You could do that, but you'd have to probably check that it is
within i_size after you relock it, I think... yeah, that might
be the best thing for ocfs to do for now.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Nick Piggin
On Thu, May 31, 2007 at 06:45:17PM -0700, Mark Fasheh wrote:
 On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote:
   Here's a nasty idea... Would it be valid for -page_mkwrite to unlock the
   page, so long as it's returned in a locked state? Though, do we even need
   the page lock that early? It seemed to me that you were adding it for
   consistency reasons (I could be wrong though).
  
  You could do that, but you'd have to probably check that it is
  within i_size after you relock it, I think... yeah, that might
  be the best thing for ocfs to do for now.
 
 Well, ocfs2 already does i_size checks in page_mkwrite, so we're covered
 with respect to truncate races.
 
 I'm still not clear though - what was the reason for adding the page locking
 there in the 1st place?

Yeah, its to cover page invalidation races. There is a description in
an earlier patch's changelog.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Mark Fasheh
On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote:
  Here's a nasty idea... Would it be valid for -page_mkwrite to unlock the
  page, so long as it's returned in a locked state? Though, do we even need
  the page lock that early? It seemed to me that you were adding it for
  consistency reasons (I could be wrong though).
 
 You could do that, but you'd have to probably check that it is
 within i_size after you relock it, I think... yeah, that might
 be the best thing for ocfs to do for now.

Well, ocfs2 already does i_size checks in page_mkwrite, so we're covered
with respect to truncate races.

I'm still not clear though - what was the reason for adding the page locking
there in the 1st place?
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage

2007-05-31 Thread Mark Fasheh
On Fri, Jun 01, 2007 at 03:53:49AM +0200, Nick Piggin wrote:
 On Thu, May 31, 2007 at 06:45:17PM -0700, Mark Fasheh wrote:
  On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote:
Here's a nasty idea... Would it be valid for -page_mkwrite to unlock 
the
page, so long as it's returned in a locked state? Though, do we even 
need
the page lock that early? It seemed to me that you were adding it for
consistency reasons (I could be wrong though).
   
   You could do that, but you'd have to probably check that it is
   within i_size after you relock it, I think... yeah, that might
   be the best thing for ocfs to do for now.

Ok. So how about the attached patch? It's a bit different than discussed,
but I think it's much cleaner because it preserves the current behavior of
the callback and keeps that bit of page locking inside core code. Not tested
as of yet, but I can run it tommorrow.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]

From: Mark Fasheh [EMAIL PROTECTED]

[PATCH] Release page lock before calling -page_mkwrite

__do_fault() was calling -page_mkwrite() with the page lock held, which
violates the locking rules for that callback. Release and retake the page
lock around the callback to avoid deadlocking file systems which manually
take it.

Signed-off-by: Mark Fasheh [EMAIL PROTECTED]
---
 mm/memory.c |   14 +-
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7221618..491cc27 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2378,11 +2378,14 @@ static int __do_fault(struct mm_struct *
 * address space wants to know that the page is about
 * to become writable
 */
-   if (vma-vm_ops-page_mkwrite 
-   vma-vm_ops-page_mkwrite(vma, page)  0) {
-   fdata.type = VM_FAULT_SIGBUS;
-   anon = 1; /* no anon but release faulted_page */
-   goto out;
+   if (vma-vm_ops-page_mkwrite) {
+   unlock_page(page);
+   if (vma-vm_ops-page_mkwrite(vma, page)  0) {
+   fdata.type = VM_FAULT_SIGBUS;
+   anon = 1; /* no anon but release 
faulted_page */
+   goto out_unlocked;
+   }
+   lock_page(page);
}
}
 
@@ -2434,6 +2437,7 @@ static int __do_fault(struct mm_struct *
 
 out:
unlock_page(faulted_page);
+out_unlocked:
if (anon)
page_cache_release(faulted_page);
else if (dirty_page) {
-- 
1.4.2.3

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/