Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, Jun 01, 2007 at 03:47:35PM -0700, Andrew Morton wrote: > Right. I did a lot of tricksy work for rc3-mm1 to merge git-ocfs2 on top > of Nick's stuff. Then I repulled your tree and lost it all. This is > because I was dumb and I fixed rc3-mm1's git-ocfs.patch rather than doing a > separate fix-rejects-in-git-ocfs2.patch. > > This is all unique-to-akpm stuff which you don't need to worry about ;) Ok, I am no longer concerned! :) > > So, which of Nick's patches are we talking about here? > > > > Btw, I know you tend to handle rejects yourself, but if it's a major PITA > > I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of > > trouble there :) > > Is OK - I'll move Nick's patches back to behind the git trees and it'll all > come > good. Phew ok. Once again, thanks for all the work you do getting the ocfs2 git patches into -mm. --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, 1 Jun 2007 15:33:02 -0700 Mark Fasheh <[EMAIL PROTECTED]> wrote: > On Fri, Jun 01, 2007 at 03:25:37PM -0700, Andrew Morton wrote: > > > Andrew, if this is ok with you I'd really like to see that fix in -mm. > > > Ocfs2 > > > shared write mmap will instantly deadlock without it. > > > > ug, OK. I get a ginormous reject when merging ocfs2 on Nick's stuff which > > I've been largely ignoring thus far. > > Huh, I'm a bit confused... I created this patch on top of 2.6.22-rc3-mm1 > which most certainly contains a merge of git-ocfs2.patch and the series > which at least contains > mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch. Right. I did a lot of tricksy work for rc3-mm1 to merge git-ocfs2 on top of Nick's stuff. Then I repulled your tree and lost it all. This is because I was dumb and I fixed rc3-mm1's git-ocfs.patch rather than doing a separate fix-rejects-in-git-ocfs2.patch. This is all unique-to-akpm stuff which you don't need to worry about ;) > So, which of Nick's patches are we talking about here? > > Btw, I know you tend to handle rejects yourself, but if it's a major PITA > I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of > trouble there :) Is OK - I'll move Nick's patches back to behind the git trees and it'll all come good. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, Jun 01, 2007 at 03:25:37PM -0700, Andrew Morton wrote: > > Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2 > > shared write mmap will instantly deadlock without it. > > ug, OK. I get a ginormous reject when merging ocfs2 on Nick's stuff which > I've been largely ignoring thus far. Huh, I'm a bit confused... I created this patch on top of 2.6.22-rc3-mm1 which most certainly contains a merge of git-ocfs2.patch and the series which at least contains mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch. So, which of Nick's patches are we talking about here? Btw, I know you tend to handle rejects yourself, but if it's a major PITA I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of trouble there :) --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, 1 Jun 2007 15:01:18 -0700 Mark Fasheh <[EMAIL PROTECTED]> wrote: > On Thu, May 31, 2007 at 10:20:39PM -0700, Mark Fasheh wrote: > > Ok. So how about the attached patch? It's a bit different than discussed, > > but I think it's much cleaner because it preserves the current behavior of > > the callback and keeps that bit of page locking inside core code. Not tested > > as of yet, but I can run it tommorrow. > > Ok - this patch seems to check out fine in testing - no more deadlocking. > > Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2 > shared write mmap will instantly deadlock without it. ug, OK. I get a ginormous reject when merging ocfs2 on Nick's stuff which I've been largely ignoring thus far. Perhaps I need to go back to staging Nick's stuff after the git trees. I'll take a look. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Thu, May 31, 2007 at 10:20:39PM -0700, Mark Fasheh wrote: > Ok. So how about the attached patch? It's a bit different than discussed, > but I think it's much cleaner because it preserves the current behavior of > the callback and keeps that bit of page locking inside core code. Not tested > as of yet, but I can run it tommorrow. Ok - this patch seems to check out fine in testing - no more deadlocking. Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2 shared write mmap will instantly deadlock without it. >From reading Nick's description of the problem in mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch I think we're pretty safe (as I noted before) because Ocfs2 re-checks the mapping under lock to protect against trucate races. That's been an "unwritten" requirement of page_mkwrite() anyway. Speaking of requirements, attached is my sad attempt at documenting the API. I know it might be merged into ->fault at some point, but we really ought to have _something_ in the meantime. --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] From: Mark Fasheh <[EMAIL PROTECTED]> [PATCH] Document ->page_mkwrite() locking There seems to be very little documentation about this callback in general. The locking in particular is a bit tricky, so it's worth having this in writing. Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]> --- Documentation/filesystems/Locking | 11 ++- 1 files changed, 10 insertions(+), 1 deletions(-) 2320eadfa34199c779638edbdbb6c491df09c49b diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 970c8ec..91ec4b4 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -512,13 +512,22 @@ prototypes: void (*close)(struct vm_area_struct*); struct page *(*fault)(struct vm_area_struct*, struct fault_data *); struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *); + int (*page_mkwrite)(struct vm_area_struct *, struct page *); locking rules: - BKL mmap_sem + BKL mmap_semPageLocked(page) open: no yes close: no yes fault: no yes nopage:no yes +page_mkwrite: no yes no + + ->page_mkwrite() is called when a previously read-only page is +about to become writeable. The file system is responsible for +protecting against truncate races. Once appropriate action has been +taking to lock out truncate, the page range should be verified to be +within i_size. The page mapping should also be checked that it is not +NULL. Dubious stuff -- 1.3.3 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Thu, May 31, 2007 at 10:20:39PM -0700, Mark Fasheh wrote: Ok. So how about the attached patch? It's a bit different than discussed, but I think it's much cleaner because it preserves the current behavior of the callback and keeps that bit of page locking inside core code. Not tested as of yet, but I can run it tommorrow. Ok - this patch seems to check out fine in testing - no more deadlocking. Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2 shared write mmap will instantly deadlock without it. From reading Nick's description of the problem in mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch I think we're pretty safe (as I noted before) because Ocfs2 re-checks the mapping under lock to protect against trucate races. That's been an unwritten requirement of page_mkwrite() anyway. Speaking of requirements, attached is my sad attempt at documenting the API. I know it might be merged into -fault at some point, but we really ought to have _something_ in the meantime. --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] From: Mark Fasheh [EMAIL PROTECTED] [PATCH] Document -page_mkwrite() locking There seems to be very little documentation about this callback in general. The locking in particular is a bit tricky, so it's worth having this in writing. Signed-off-by: Mark Fasheh [EMAIL PROTECTED] --- Documentation/filesystems/Locking | 11 ++- 1 files changed, 10 insertions(+), 1 deletions(-) 2320eadfa34199c779638edbdbb6c491df09c49b diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 970c8ec..91ec4b4 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -512,13 +512,22 @@ prototypes: void (*close)(struct vm_area_struct*); struct page *(*fault)(struct vm_area_struct*, struct fault_data *); struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *); + int (*page_mkwrite)(struct vm_area_struct *, struct page *); locking rules: - BKL mmap_sem + BKL mmap_semPageLocked(page) open: no yes close: no yes fault: no yes nopage:no yes +page_mkwrite: no yes no + + -page_mkwrite() is called when a previously read-only page is +about to become writeable. The file system is responsible for +protecting against truncate races. Once appropriate action has been +taking to lock out truncate, the page range should be verified to be +within i_size. The page mapping should also be checked that it is not +NULL. Dubious stuff -- 1.3.3 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, 1 Jun 2007 15:01:18 -0700 Mark Fasheh [EMAIL PROTECTED] wrote: On Thu, May 31, 2007 at 10:20:39PM -0700, Mark Fasheh wrote: Ok. So how about the attached patch? It's a bit different than discussed, but I think it's much cleaner because it preserves the current behavior of the callback and keeps that bit of page locking inside core code. Not tested as of yet, but I can run it tommorrow. Ok - this patch seems to check out fine in testing - no more deadlocking. Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2 shared write mmap will instantly deadlock without it. ug, OK. I get a ginormous reject when merging ocfs2 on Nick's stuff which I've been largely ignoring thus far. Perhaps I need to go back to staging Nick's stuff after the git trees. I'll take a look. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, Jun 01, 2007 at 03:25:37PM -0700, Andrew Morton wrote: Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2 shared write mmap will instantly deadlock without it. ug, OK. I get a ginormous reject when merging ocfs2 on Nick's stuff which I've been largely ignoring thus far. Huh, I'm a bit confused... I created this patch on top of 2.6.22-rc3-mm1 which most certainly contains a merge of git-ocfs2.patch and the series which at least contains mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch. So, which of Nick's patches are we talking about here? Btw, I know you tend to handle rejects yourself, but if it's a major PITA I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of trouble there :) --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, 1 Jun 2007 15:33:02 -0700 Mark Fasheh [EMAIL PROTECTED] wrote: On Fri, Jun 01, 2007 at 03:25:37PM -0700, Andrew Morton wrote: Andrew, if this is ok with you I'd really like to see that fix in -mm. Ocfs2 shared write mmap will instantly deadlock without it. ug, OK. I get a ginormous reject when merging ocfs2 on Nick's stuff which I've been largely ignoring thus far. Huh, I'm a bit confused... I created this patch on top of 2.6.22-rc3-mm1 which most certainly contains a merge of git-ocfs2.patch and the series which at least contains mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch. Right. I did a lot of tricksy work for rc3-mm1 to merge git-ocfs2 on top of Nick's stuff. Then I repulled your tree and lost it all. This is because I was dumb and I fixed rc3-mm1's git-ocfs.patch rather than doing a separate fix-rejects-in-git-ocfs2.patch. This is all unique-to-akpm stuff which you don't need to worry about ;) So, which of Nick's patches are we talking about here? Btw, I know you tend to handle rejects yourself, but if it's a major PITA I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of trouble there :) Is OK - I'll move Nick's patches back to behind the git trees and it'll all come good. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, Jun 01, 2007 at 03:47:35PM -0700, Andrew Morton wrote: Right. I did a lot of tricksy work for rc3-mm1 to merge git-ocfs2 on top of Nick's stuff. Then I repulled your tree and lost it all. This is because I was dumb and I fixed rc3-mm1's git-ocfs.patch rather than doing a separate fix-rejects-in-git-ocfs2.patch. This is all unique-to-akpm stuff which you don't need to worry about ;) Ok, I am no longer concerned! :) So, which of Nick's patches are we talking about here? Btw, I know you tend to handle rejects yourself, but if it's a major PITA I'd be happy to help out. Boy, I'm hoping I didn't just ask for a load of trouble there :) Is OK - I'll move Nick's patches back to behind the git trees and it'll all come good. Phew ok. Once again, thanks for all the work you do getting the ocfs2 git patches into -mm. --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, Jun 01, 2007 at 03:53:49AM +0200, Nick Piggin wrote: > On Thu, May 31, 2007 at 06:45:17PM -0700, Mark Fasheh wrote: > > On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote: > > > > Here's a nasty idea... Would it be valid for ->page_mkwrite to unlock > > > > the > > > > page, so long as it's returned in a locked state? Though, do we even > > > > need > > > > the page lock that early? It seemed to me that you were adding it for > > > > consistency reasons (I could be wrong though). > > > > > > You could do that, but you'd have to probably check that it is > > > within i_size after you relock it, I think... yeah, that might > > > be the best thing for ocfs to do for now. Ok. So how about the attached patch? It's a bit different than discussed, but I think it's much cleaner because it preserves the current behavior of the callback and keeps that bit of page locking inside core code. Not tested as of yet, but I can run it tommorrow. --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] From: Mark Fasheh <[EMAIL PROTECTED]> [PATCH] Release page lock before calling ->page_mkwrite __do_fault() was calling ->page_mkwrite() with the page lock held, which violates the locking rules for that callback. Release and retake the page lock around the callback to avoid deadlocking file systems which manually take it. Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]> --- mm/memory.c | 14 +- 1 files changed, 9 insertions(+), 5 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 7221618..491cc27 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2378,11 +2378,14 @@ static int __do_fault(struct mm_struct * * address space wants to know that the page is about * to become writable */ - if (vma->vm_ops->page_mkwrite && - vma->vm_ops->page_mkwrite(vma, page) < 0) { - fdata.type = VM_FAULT_SIGBUS; - anon = 1; /* no anon but release faulted_page */ - goto out; + if (vma->vm_ops->page_mkwrite) { + unlock_page(page); + if (vma->vm_ops->page_mkwrite(vma, page) < 0) { + fdata.type = VM_FAULT_SIGBUS; + anon = 1; /* no anon but release faulted_page */ + goto out_unlocked; + } + lock_page(page); } } @@ -2434,6 +2437,7 @@ static int __do_fault(struct mm_struct * out: unlock_page(faulted_page); +out_unlocked: if (anon) page_cache_release(faulted_page); else if (dirty_page) { -- 1.4.2.3 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Thu, May 31, 2007 at 06:45:17PM -0700, Mark Fasheh wrote: > On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote: > > > Here's a nasty idea... Would it be valid for ->page_mkwrite to unlock the > > > page, so long as it's returned in a locked state? Though, do we even need > > > the page lock that early? It seemed to me that you were adding it for > > > consistency reasons (I could be wrong though). > > > > You could do that, but you'd have to probably check that it is > > within i_size after you relock it, I think... yeah, that might > > be the best thing for ocfs to do for now. > > Well, ocfs2 already does i_size checks in page_mkwrite, so we're covered > with respect to truncate races. > > I'm still not clear though - what was the reason for adding the page locking > there in the 1st place? Yeah, its to cover page invalidation races. There is a description in an earlier patch's changelog. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote: > > Here's a nasty idea... Would it be valid for ->page_mkwrite to unlock the > > page, so long as it's returned in a locked state? Though, do we even need > > the page lock that early? It seemed to me that you were adding it for > > consistency reasons (I could be wrong though). > > You could do that, but you'd have to probably check that it is > within i_size after you relock it, I think... yeah, that might > be the best thing for ocfs to do for now. Well, ocfs2 already does i_size checks in page_mkwrite, so we're covered with respect to truncate races. I'm still not clear though - what was the reason for adding the page locking there in the 1st place? --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Thu, May 31, 2007 at 06:24:40PM -0700, Mark Fasheh wrote: > On Fri, Jun 01, 2007 at 03:01:29AM +0200, Nick Piggin wrote: > > Ah, I didn't realise you were using that yet. I expect ocfs2 is using > > VM_CAN_INVALIDATE there anyway. > > > > Hmm, this becomes easier to deal with after page_mkwrite is merged with > > ->fault. But for now, can we just lock the page at the do_wp_page site > > as well, and change the API? All users I have seen want the page locked > > there anyway... > > Unfortunately that doesn't work for ocfs2 for exactly the same reasons page > lock doesn't work during a write either - there's a cluster lock inversion > and we might have to zero adjacent pages for an allocating write. I guess you could just fail the page_mkwrite and have it try again? .. that would require changing it not to always go SIGBUS though. > What's involved in merging it with ->fault? Just I have't sent the patch (because at the time there were no page_mkwrite users to look at). It is nicer too, because the nopage path only has to call into the filesystem once, to return the page (the filesystem can check whether it is for write, and do the page_mkwrite thing at that time). do_wp_page obviously still involves the extra call, and that will be with a flag telling the fs that it isn't a "nopage" fault, but a write fault on an existing page. > Here's a nasty idea... Would it be valid for ->page_mkwrite to unlock the > page, so long as it's returned in a locked state? Though, do we even need > the page lock that early? It seemed to me that you were adding it for > consistency reasons (I could be wrong though). You could do that, but you'd have to probably check that it is within i_size after you relock it, I think... yeah, that might be the best thing for ocfs to do for now. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, Jun 01, 2007 at 03:01:29AM +0200, Nick Piggin wrote: > Ah, I didn't realise you were using that yet. I expect ocfs2 is using > VM_CAN_INVALIDATE there anyway. > > Hmm, this becomes easier to deal with after page_mkwrite is merged with > ->fault. But for now, can we just lock the page at the do_wp_page site > as well, and change the API? All users I have seen want the page locked > there anyway... Unfortunately that doesn't work for ocfs2 for exactly the same reasons page lock doesn't work during a write either - there's a cluster lock inversion and we might have to zero adjacent pages for an allocating write. What's involved in merging it with ->fault? Here's a nasty idea... Would it be valid for ->page_mkwrite to unlock the page, so long as it's returned in a locked state? Though, do we even need the page lock that early? It seemed to me that you were adding it for consistency reasons (I could be wrong though). --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Thu, May 31, 2007 at 04:13:54PM -0700, Mark Fasheh wrote: > On Wed, May 30, 2007 at 11:58:23PM -0700, Andrew Morton wrote: > > git-ocfs2.patch > > Andrew, thanks for getting that back in there. > > > mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch broke ocfs2 shared > writable mmap. We hang on a page lock because ->page_mkwrite() is > being called with the page already locked: > > + /* > + * For consistency in subsequent calls, make the nopage_page always > + * locked. > + */ > + if (unlikely(!(vma->vm_flags & VM_CAN_INVALIDATE))) > + lock_page(nopage_page); > > It wasn't previously being called with the page lock held, intentionally. Ah, I didn't realise you were using that yet. I expect ocfs2 is using VM_CAN_INVALIDATE there anyway. Hmm, this becomes easier to deal with after page_mkwrite is merged with ->fault. But for now, can we just lock the page at the do_wp_page site as well, and change the API? All users I have seen want the page locked there anyway... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Wed, May 30, 2007 at 11:58:23PM -0700, Andrew Morton wrote: > git-ocfs2.patch Andrew, thanks for getting that back in there. mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch broke ocfs2 shared writable mmap. We hang on a page lock because ->page_mkwrite() is being called with the page already locked: + /* +* For consistency in subsequent calls, make the nopage_page always +* locked. +*/ + if (unlikely(!(vma->vm_flags & VM_CAN_INVALIDATE))) + lock_page(nopage_page); It wasn't previously being called with the page lock held, intentionally. --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Wed, May 30, 2007 at 11:58:23PM -0700, Andrew Morton wrote: git-ocfs2.patch Andrew, thanks for getting that back in there. mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch broke ocfs2 shared writable mmap. We hang on a page lock because -page_mkwrite() is being called with the page already locked: + /* +* For consistency in subsequent calls, make the nopage_page always +* locked. +*/ + if (unlikely(!(vma-vm_flags VM_CAN_INVALIDATE))) + lock_page(nopage_page); It wasn't previously being called with the page lock held, intentionally. --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Thu, May 31, 2007 at 04:13:54PM -0700, Mark Fasheh wrote: On Wed, May 30, 2007 at 11:58:23PM -0700, Andrew Morton wrote: git-ocfs2.patch Andrew, thanks for getting that back in there. mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch broke ocfs2 shared writable mmap. We hang on a page lock because -page_mkwrite() is being called with the page already locked: + /* + * For consistency in subsequent calls, make the nopage_page always + * locked. + */ + if (unlikely(!(vma-vm_flags VM_CAN_INVALIDATE))) + lock_page(nopage_page); It wasn't previously being called with the page lock held, intentionally. Ah, I didn't realise you were using that yet. I expect ocfs2 is using VM_CAN_INVALIDATE there anyway. Hmm, this becomes easier to deal with after page_mkwrite is merged with -fault. But for now, can we just lock the page at the do_wp_page site as well, and change the API? All users I have seen want the page locked there anyway... - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, Jun 01, 2007 at 03:01:29AM +0200, Nick Piggin wrote: Ah, I didn't realise you were using that yet. I expect ocfs2 is using VM_CAN_INVALIDATE there anyway. Hmm, this becomes easier to deal with after page_mkwrite is merged with -fault. But for now, can we just lock the page at the do_wp_page site as well, and change the API? All users I have seen want the page locked there anyway... Unfortunately that doesn't work for ocfs2 for exactly the same reasons page lock doesn't work during a write either - there's a cluster lock inversion and we might have to zero adjacent pages for an allocating write. What's involved in merging it with -fault? Here's a nasty idea... Would it be valid for -page_mkwrite to unlock the page, so long as it's returned in a locked state? Though, do we even need the page lock that early? It seemed to me that you were adding it for consistency reasons (I could be wrong though). --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Thu, May 31, 2007 at 06:24:40PM -0700, Mark Fasheh wrote: On Fri, Jun 01, 2007 at 03:01:29AM +0200, Nick Piggin wrote: Ah, I didn't realise you were using that yet. I expect ocfs2 is using VM_CAN_INVALIDATE there anyway. Hmm, this becomes easier to deal with after page_mkwrite is merged with -fault. But for now, can we just lock the page at the do_wp_page site as well, and change the API? All users I have seen want the page locked there anyway... Unfortunately that doesn't work for ocfs2 for exactly the same reasons page lock doesn't work during a write either - there's a cluster lock inversion and we might have to zero adjacent pages for an allocating write. I guess you could just fail the page_mkwrite and have it try again? .. that would require changing it not to always go SIGBUS though. What's involved in merging it with -fault? Just I have't sent the patch (because at the time there were no page_mkwrite users to look at). It is nicer too, because the nopage path only has to call into the filesystem once, to return the page (the filesystem can check whether it is for write, and do the page_mkwrite thing at that time). do_wp_page obviously still involves the extra call, and that will be with a flag telling the fs that it isn't a nopage fault, but a write fault on an existing page. Here's a nasty idea... Would it be valid for -page_mkwrite to unlock the page, so long as it's returned in a locked state? Though, do we even need the page lock that early? It seemed to me that you were adding it for consistency reasons (I could be wrong though). You could do that, but you'd have to probably check that it is within i_size after you relock it, I think... yeah, that might be the best thing for ocfs to do for now. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Thu, May 31, 2007 at 06:45:17PM -0700, Mark Fasheh wrote: On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote: Here's a nasty idea... Would it be valid for -page_mkwrite to unlock the page, so long as it's returned in a locked state? Though, do we even need the page lock that early? It seemed to me that you were adding it for consistency reasons (I could be wrong though). You could do that, but you'd have to probably check that it is within i_size after you relock it, I think... yeah, that might be the best thing for ocfs to do for now. Well, ocfs2 already does i_size checks in page_mkwrite, so we're covered with respect to truncate races. I'm still not clear though - what was the reason for adding the page locking there in the 1st place? Yeah, its to cover page invalidation races. There is a description in an earlier patch's changelog. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote: Here's a nasty idea... Would it be valid for -page_mkwrite to unlock the page, so long as it's returned in a locked state? Though, do we even need the page lock that early? It seemed to me that you were adding it for consistency reasons (I could be wrong though). You could do that, but you'd have to probably check that it is within i_size after you relock it, I think... yeah, that might be the best thing for ocfs to do for now. Well, ocfs2 already does i_size checks in page_mkwrite, so we're covered with respect to truncate races. I'm still not clear though - what was the reason for adding the page locking there in the 1st place? --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc3-mm1 - page_mkwrite() breakage
On Fri, Jun 01, 2007 at 03:53:49AM +0200, Nick Piggin wrote: On Thu, May 31, 2007 at 06:45:17PM -0700, Mark Fasheh wrote: On Fri, Jun 01, 2007 at 03:34:02AM +0200, Nick Piggin wrote: Here's a nasty idea... Would it be valid for -page_mkwrite to unlock the page, so long as it's returned in a locked state? Though, do we even need the page lock that early? It seemed to me that you were adding it for consistency reasons (I could be wrong though). You could do that, but you'd have to probably check that it is within i_size after you relock it, I think... yeah, that might be the best thing for ocfs to do for now. Ok. So how about the attached patch? It's a bit different than discussed, but I think it's much cleaner because it preserves the current behavior of the callback and keeps that bit of page locking inside core code. Not tested as of yet, but I can run it tommorrow. --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] From: Mark Fasheh [EMAIL PROTECTED] [PATCH] Release page lock before calling -page_mkwrite __do_fault() was calling -page_mkwrite() with the page lock held, which violates the locking rules for that callback. Release and retake the page lock around the callback to avoid deadlocking file systems which manually take it. Signed-off-by: Mark Fasheh [EMAIL PROTECTED] --- mm/memory.c | 14 +- 1 files changed, 9 insertions(+), 5 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 7221618..491cc27 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2378,11 +2378,14 @@ static int __do_fault(struct mm_struct * * address space wants to know that the page is about * to become writable */ - if (vma-vm_ops-page_mkwrite - vma-vm_ops-page_mkwrite(vma, page) 0) { - fdata.type = VM_FAULT_SIGBUS; - anon = 1; /* no anon but release faulted_page */ - goto out; + if (vma-vm_ops-page_mkwrite) { + unlock_page(page); + if (vma-vm_ops-page_mkwrite(vma, page) 0) { + fdata.type = VM_FAULT_SIGBUS; + anon = 1; /* no anon but release faulted_page */ + goto out_unlocked; + } + lock_page(page); } } @@ -2434,6 +2437,7 @@ static int __do_fault(struct mm_struct * out: unlock_page(faulted_page); +out_unlocked: if (anon) page_cache_release(faulted_page); else if (dirty_page) { -- 1.4.2.3 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/