Re: unionfs: several more problems
In message <[EMAIL PROTECTED]>, Hugh Dickins writes:
> On Mon, 26 Nov 2007, Erez Zadok wrote:
[...]
> > The small patch below fixed the problem.  Let me know what you think.
>
> I've one issue with it: please move that wait_on_page_writeback before
> the clear_page_dirty_for_io instead of after it, then resubmit your 14/16.
[...]

Done, tested, and working.  Here's the revised patch (pushed to
unionfs.git on korg).

Thanks,
Erez.

--

Unionfs: prevent multiple writers to lower_page

Without this patch, the LTP fs test "rwtest04" triggers a
BUG_ON(PageWriteback(page)) in fs/buffer.c:1706.

CC: Hugh Dickins <[EMAIL PROTECTED]>
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>

diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
index 623a913..74f2e53 100644
--- a/fs/unionfs/mmap.c
+++ b/fs/unionfs/mmap.c
@@ -72,6 +72,7 @@ static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
 	}
 
 	BUG_ON(!lower_mapping->a_ops->writepage);
+	wait_on_page_writeback(lower_page); /* prevent multiple writers */
 	clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
 	err = lower_mapping->a_ops->writepage(lower_page, wbc);
 	if (err < 0)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: unionfs: several more problems
On Mon, 26 Nov 2007, Erez Zadok wrote:
>
> I just posted a series of patches to unionfs (already in unionfs.git on
> korg), which fix every problem LTP found, as well as other problems
> mentioned in your email.  With this series of patches, the same set of
> tests which pass on ext3 also pass with unionfs mounted over ext3.

Good, I'll give them a try, but not tonight - my own bugs to work on!

> I'd like to call your attention to one small unionfs_writepage fix.  One
> of the ltp fs tests (rwtest04) triggered a BUG_ON(PageWriteback(page)) in
> fs/buffer.c:1706, when we call the lower ->writepage.  This was due to
> multiple writers to the same page.  Looking at other kernel code, it
> seems that the right fix is to use wait_on_page_writeback(lower_page) to
> serialize concurrent writebacks to the same page, no?

Ah, yes.  I believe that's the right thing to do (and you do have the
lower_page locked there, which is necessary for that serialization).

> The small patch below fixed the problem.  Let me know what you think.

I've one issue with it: please move that wait_on_page_writeback before
the clear_page_dirty_for_io instead of after it, then resubmit your 14/16.

As it stands, you're still mixing the end of the previous writeback with
the beginning of the new one (e.g. clear_page_dirty_for_io is clearing
the PageReclaim flag when it shouldn't be interfering at all); and the
page might get dirtied during that wait for the previous writeback to
complete, in which case you'd end up doing an extra unnecessary write
(because the page is later found "dirty" even though it was written back
long after its last modification).

Whether this is a big deal in any way, I don't know; but if you look at
other instances, you'll find the wait_on_page_writeback is done before
the clear_page_dirty_for_io, so you'll surely be on safer ground to
follow the same ordering.  (And so far as I can see, everyone who calls
write_one_page does so with the wait flag 1: that's the clearest example.
Hmm, I wonder if its initial wait_on_page_writeback is right to be
conditional on wait at all.  Well, until someone calls it with wait 0,
it's hard to tell.)

Hugh

> Thanks,
> Erez.
>
> diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
> index eef43aa..d0fd4d0 100644
> --- a/fs/unionfs/mmap.c
> +++ b/fs/unionfs/mmap.c
> @@ -73,6 +73,7 @@ static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
>
>  	BUG_ON(!lower_mapping->a_ops->writepage);
>  	clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
> +	wait_on_page_writeback(lower_page); /* prevent multiple writers */
>  	err = lower_mapping->a_ops->writepage(lower_page, wbc);
>  	if (err < 0)
>  		goto out_release;
Re: unionfs: several more problems
In message <[EMAIL PROTECTED]>, Hugh Dickins writes:
[...]
> Please try running LTP (e.g. ltp-full-20071031.tgz) after something like
[...]

Hugh,

I just posted a series of patches to unionfs (already in unionfs.git on
korg), which fix every problem LTP found, as well as other problems
mentioned in your email.  With this series of patches, the same set of
tests which pass on ext3 also pass with unionfs mounted over ext3.

I'd like to call your attention to one small unionfs_writepage fix.  One of
the ltp fs tests (rwtest04) triggered a BUG_ON(PageWriteback(page)) in
fs/buffer.c:1706, when we call the lower ->writepage.  This was due to
multiple writers to the same page.  Looking at other kernel code, it seems
that the right fix is to use wait_on_page_writeback(lower_page) to
serialize concurrent writebacks to the same page, no?

The small patch below fixed the problem.  Let me know what you think.

Thanks,
Erez.

diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
index eef43aa..d0fd4d0 100644
--- a/fs/unionfs/mmap.c
+++ b/fs/unionfs/mmap.c
@@ -73,6 +73,7 @@ static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
 
 	BUG_ON(!lower_mapping->a_ops->writepage);
 	clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
+	wait_on_page_writeback(lower_page); /* prevent multiple writers */
 	err = lower_mapping->a_ops->writepage(lower_page, wbc);
 	if (err < 0)
 		goto out_release;
Re: unionfs: several more problems
In message <[EMAIL PROTECTED]>, Hugh Dickins writes:
[...]
> > I deceived myself for a while that the danger of shmem_writepage
> > hitting its BUG_ON(entry->val) was dealt with too; but that's wrong,
> > I must go back to working out an escape from that one (despite never
> > seeing it).
>
> Once I tried a more appropriate test (fsx while swapping) I hit that
> easily.  After some thought and testing, I'm happy with the mm/shmem.c
> + mm/swap_state.c fixes I've arrived at for that; but since it's not
> easy to reproduce in normal usage, and hasn't been holding you up,
> I'd prefer for the moment to hold on to that patch.  I need to make
> changes around the same pagecache<->swapcache area to solve some mem
> cgroup issues: there might turn out to be some interaction, so I'd
> rather finalize both patches in the same series if I can.
[...]

If you want, send me those patches and I'll run them w/ my tests, even if
they're not finalized; my testing can give you another useful point of
reference.

> But perhaps before fixing up the several LTP tests, you'll want
> to concentrate on a more directed test.  Please try this sequence:
>
> # Running with mem=512M, probably irrelevant
> swapoff -a			# Merely to rule out one potential confusion
> mkfs -t ext2 /dev/sdb1
> mount -t ext2 /dev/sdb1 /mnt
> df /mnt			# I have 2280 Used out of 1517920 KB
> cp -a 2.6.24-rc3 /mnt		# Copy a kernel source tree into ext2
> rm -rf /mnt/2.6.24-rc3	# Delete the copy
> df /mnt			# Again 2280 Used, just as you'd expect
> mount -t unionfs -o dirs=/mnt unionfs /tmp
> cp -a 2.6.24-rc3 /tmp		# Copy a kernel source tree into unionfs
> rm -rf /tmp/2.6.24-rc3	# Generates 176 unionfs: filldir error messages
> df /mnt			# Now 68380 Used (df /tmp shows the same)
> ls -a /mnt			# Shows . .. .wh.2.6.24-rc3 lost+found
> echo 1 >/proc/sys/vm/drop_caches	# to free pagecache
> df /mnt			# Still 68380 Used (df /tmp shows the same)
> echo 2 >/proc/sys/vm/drop_caches	# to free dentries and inodes
> df /mnt			# Now 2280 Used as it should be (df /tmp same)
> ls -a /mnt			# But still shows that .wh.2.6.24-rc3
> umount /tmp			# Restore
> umount /mnt			# Restore
> swapon -a			# Restore
>
> Three different problems there:
>
> 1. Whiteouts seem to get left behind (at this top level anyway):
> I'm getting an increasing number of .wh.run-crons.? files there.
> I'm not familiar with the correct behaviour for whiteouts (and it's
> not clear to me why whiteouts are needed at all in this degenerate
> case of a single directory in the union, but never mind).

I could spend a lot of time explaining the history of whiteouts in
unioning file systems, and all the different techniques and algorithms
we've tried ourselves over the years.  But suffice it to say that I'd be
very happy the day every Linux f/s has native whiteout support. :-)

Our current policy for when/where to create whiteouts has evolved after
much experience with users.  The most common use case for unionfs is one
or more read-only branches, plus a high-priority writable branch (for
copyup).  Therefore, in the most common case we cannot remove the objects
from the read-only branches, and have to create a whiteout instead.
Using a single branch with unionfs is very uncommon among unionfs users,
but it serves nicely as a useful "null layer" for testing (a la BSD's
Nullfs or my fistgen's wrapfs).

Anyway, upon further thinking about this issue I realized that whiteouts
in the single-branch situation are just a special case of a possibly more
common one -- when the object being unlink'ed (or rmdir'ed) is on the
rightmost, lowest-priority branch in which it is known to exist.  In that
case, there's no need to create a whiteout there, b/c there's no chance
that a read-only file by the same name could exist below that branch.
The same is true if you try to rmdir a directory anywhere in one of the
union's branches: if we know (thanks to ->lookup) that there is no dir by
the same name anywhere else, then we can safely skip creating a whiteout
when the lowest-priority dir is being rmdir'ed.  I've got a small patch
that does just that.

> 2. Why does copying then deleting a tree leave blocks allocated,
> which remain allocated indefinitely, until memory pressure or
> drop_caches removes them?  Hmm, I should have done "df -i" instead,
> that would be more revealing.  This may well be the same as the LTP
> mkdir problem - inodes remaining half-allocated after they're unlinked.

Turns out we weren't releasing the refs on the lower directory being
rmdir'ed as early as we could.  We'd have done it in delete/clear_inode,
upon memory pressure, or at unmount -- so those resources wouldn't have
stuck around forever.  I now have a small patch that releases those
resources on rmdir, and the space (df and df -i) is reclaimed right away.
And, with the above patch to not create whiteouts on
unionfs: several more problems
I think it may be time to move away from that by now misleading "msync(2)
bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland" thread.  And I've cut
most out of the Cc list, but by all means add any back.

On Saturday I said:
> I'm glad to report that this unionfs, not the one in 2.6.24-rc2-mm1
> but the one including those 9 patches you posted, now gets through
> my testing with tmpfs without a problem.  I do still get occasional
> "unionfs: new lower inode mtime (bindex=0, name=directory)"
> messages, but nothing worse seen yet: a big improvement.

I have seen worse now: details below.

> I deceived myself for a while that the danger of shmem_writepage
> hitting its BUG_ON(entry->val) was dealt with too; but that's wrong,
> I must go back to working out an escape from that one (despite never
> seeing it).

Once I tried a more appropriate test (fsx while swapping) I hit that
easily.  After some thought and testing, I'm happy with the mm/shmem.c
+ mm/swap_state.c fixes I've arrived at for that; but since it's not
easy to reproduce in normal usage, and hasn't been holding you up,
I'd prefer for the moment to hold on to that patch.  I need to make
changes around the same pagecache<->swapcache area to solve some mem
cgroup issues: there might turn out to be some interaction, so I'd
rather finalize both patches in the same series if I can.

For a while I thought the further unionfs problems I'm seeing were tmpfs
ones, but now I seem to be seeing all(?) of them with ext2: so I'd better
spend no more time on them, and hand them over to you.

Please try running LTP (e.g. ltp-full-20071031.tgz) after something like

mkfs -t ext2 /dev/sdb1
mount -t ext2 /dev/sdb1 /mnt
mount -t unionfs -o dirs=/mnt unionfs /tmp

That throws up quite a number of errors which don't occur with a
straightforward ext2 mounted on /tmp.  Notably, when it embarks upon
"MULTIPLE PROCESSES CREATING AND DELETING FILES", mkdir seems to collapse
into lots of

mkdir: cannot create directory `dirN': No space left on device
mkdir dirN FAILED

(the same happened when I was using a tmpfs instead of an ext2).

But perhaps before fixing up the several LTP tests, you'll want
to concentrate on a more directed test.  Please try this sequence:

# Running with mem=512M, probably irrelevant
swapoff -a			# Merely to rule out one potential confusion
mkfs -t ext2 /dev/sdb1
mount -t ext2 /dev/sdb1 /mnt
df /mnt				# I have 2280 Used out of 1517920 KB
cp -a 2.6.24-rc3 /mnt		# Copy a kernel source tree into ext2
rm -rf /mnt/2.6.24-rc3		# Delete the copy
df /mnt				# Again 2280 Used, just as you'd expect
mount -t unionfs -o dirs=/mnt unionfs /tmp
cp -a 2.6.24-rc3 /tmp		# Copy a kernel source tree into unionfs
rm -rf /tmp/2.6.24-rc3		# Generates 176 unionfs: filldir error messages
df /mnt				# Now 68380 Used (df /tmp shows the same)
ls -a /mnt			# Shows . .. .wh.2.6.24-rc3 lost+found
echo 1 >/proc/sys/vm/drop_caches	# to free pagecache
df /mnt				# Still 68380 Used (df /tmp shows the same)
echo 2 >/proc/sys/vm/drop_caches	# to free dentries and inodes
df /mnt				# Now 2280 Used as it should be (df /tmp same)
ls -a /mnt			# But still shows that .wh.2.6.24-rc3
umount /tmp			# Restore
umount /mnt			# Restore
swapon -a			# Restore

Three different problems there:

1. Whiteouts seem to get left behind (at this top level anyway):
I'm getting an increasing number of .wh.run-crons.? files there.
I'm not familiar with the correct behaviour for whiteouts (and it's
not clear to me why whiteouts are needed at all in this degenerate
case of a single directory in the union, but never mind).

2. Why does copying then deleting a tree leave blocks allocated,
which remain allocated indefinitely, until memory pressure or
drop_caches removes them?  Hmm, I should have done "df -i" instead,
that would be more revealing.  This may well be the same as the LTP
mkdir problem - inodes remaining half-allocated after they're unlinked.

3. The unionfs filldir error messages:

unionfs: filldir: possible I/O error: a file is duplicated in the same branch 0: page_offset.h
<... 174 more include/asm-* header files named until ...>
unionfs: filldir: possible I/O error: a file is duplicated in the same branch 0: sfafsr.h

It's tempting to suppose these are related to the unfreed inodes, but
retrying again gives me 176 messages, whereas inodes fall from 2672 to 30.
And I didn't see those messages with tmpfs, just with ext2.

Hugh