On Fri, 29 Dec 2006, Linus Torvalds wrote:
> 
> Hmm? I'd love it if somebody else wrote the patch and tested it, because 
> I'm getting sick and tired of this bug ;)

Who the hell am I kidding? I haven't been able to sleep right for the last 
few days over this bug. It was really getting to me.

And putting on the thinking cap, there's actually a fairly simple and 
nonintrusive patch. It still has a tiny tiny race (see the comment), but I 
bet nobody can really hit it in real life anyway, and I know several ways 
to fix it, so I'm not really _that_ worried about it.

The patch is mostly a comment. The "real" meat of it is actually just a 
few lines.

Can anybody get corruption with this thing applied? It goes on top of 
plain v2.6.20-rc2.

                Linus

----
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b3a198c..ec01da1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -862,17 +862,46 @@ int clear_page_dirty_for_io(struct page *page)
 {
        struct address_space *mapping = page_mapping(page);
 
-       if (!mapping)
-               return TestClearPageDirty(page);
-
-       if (TestClearPageDirty(page)) {
-               if (mapping_cap_account_dirty(mapping)) {
-                       page_mkclean(page);
+       if (mapping && mapping_cap_account_dirty(mapping)) {
+               /*
+                * Yes, Virginia, this is indeed insane.
+                *
+                * We use this sequence to make sure that
+                *  (a) we account for dirty stats properly
+                *  (b) we tell the low-level filesystem to
+                *      mark the whole page dirty if it was
+                *      dirty in a pagetable. Only to then
+                *  (c) clean the page again and return 1 to
+                *      cause the writeback.
+                *
+                * This way we avoid all nasty races with the
+                * dirty bit in multiple places and clearing
+                * them concurrently from different threads.
+                *
+                * Note! Normally the "set_page_dirty(page)"
+                * has no effect on the actual dirty bit - since
+                * that will already usually be set. But we
+                * need the side effects, and it can help us
+                * avoid races.
+                *
+                * We basically use the page "master dirty bit"
+                * as a serialization point for all the different
+                * threads doing their things.
+                *
+                * FIXME! We still have a race here: if somebody
+                * adds the page back to the page tables in
+                * between the "page_mkclean()" and the "TestClearPageDirty()",
+                * we might have it mapped without the dirty bit set.
+                */
+               if (page_mkclean(page))
+                       set_page_dirty(page);
+               if (TestClearPageDirty(page)) {
                        dec_zone_page_state(page, NR_FILE_DIRTY);
+                       return 1;
                }
-               return 1;
+               return 0;
        }
-       return 0;
+       return TestClearPageDirty(page);
 }
 EXPORT_SYMBOL(clear_page_dirty_for_io);
 
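
[Editorial sketch, not part of the original message.] The ordering the
comment in the patch describes can be modeled in userspace as below. This
is only a toy under loud assumptions: every toy_* name is hypothetical and
is not a kernel API, a single bool stands in for "dirty in some pagetable",
a plain counter stands in for NR_FILE_DIRTY, and the model is
single-threaded, so it cannot show the race the FIXME mentions. It only
illustrates steps (a) to (c): fold the pagetable dirty state into the
page's master dirty bit first, then test-and-clear that one bit.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_page {
    bool pte_dirty;   /* stands in for a dirty bit in a pagetable entry */
    bool page_dirty;  /* stands in for the page's "master dirty bit" */
};

/* Stands in for the NR_FILE_DIRTY accounting. */
static long toy_nr_file_dirty;

/* Models page_mkclean(): clear the pagetable dirty bit and
 * report whether it had been set. */
static bool toy_mkclean(struct toy_page *p)
{
    bool was_dirty = p->pte_dirty;

    p->pte_dirty = false;
    return was_dirty;
}

/* Models set_page_dirty(): accounting changes only on a
 * clean -> dirty transition of the master bit. */
static void toy_set_dirty(struct toy_page *p)
{
    if (!p->page_dirty) {
        p->page_dirty = true;
        toy_nr_file_dirty++;
    }
}

/* Models the patched sequence: transfer pagetable dirtiness to the
 * master bit, then test-and-clear the master bit. Returns 1 if the
 * page needs writeback, 0 otherwise. */
static int toy_clear_dirty_for_io(struct toy_page *p)
{
    if (toy_mkclean(p))
        toy_set_dirty(p);

    if (p->page_dirty) {
        p->page_dirty = false;
        toy_nr_file_dirty--;
        return 1;
    }
    return 0;
}
```

In this model, a page that is dirty only through its pagetable entry still
returns 1 and leaves the counter balanced, which is the case the sequence
in the patch is meant to cover.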