[EMAIL PROTECTED] (Niels Möller) writes:

> [EMAIL PROTECTED] (Thomas Bushnell, BSG) writes:
>
> > I have a more concrete idea about how to change diskfs into an
> > "ordered writes" instead of a "synchronous writes" model.  If someone
> > prods me, I can explain it.
>
> Please do.

Suppose diskfs/ufs/ext2fs needs to guarantee that block A is written
before block B.

Right now, it just does a synchronous update of block A before
proceeding to the modification of B.  That guarantees it, at some
cost.  (It's the way BSD always worked, by the way.)

The ordered writes way is to make both modifications, but keep track
of the dependency: "A must be written before B".

When a block gets written, you can delete all records saying that it
must be written before other things.  For example, if your table says
"A must be written before B", and you are now writing A, you can drop
that entry.

But if the table says "A must be written before B", and the pager is
now presented with block B to be written, it must *first* go find
block A and have it written.

I think the best way to arrange both of these is the following
algorithm (in the pageout routine):

  while (table contains a block A that must be written before this block)
    mark table "waiting for block A to be written"
    ask kernel to pageout block A
    wait on condition
  for each (block A such that this block must be written first)
    remove mark from table
  if (table marked "waiting for this block to be written")
    wakeup condition

This is not quite enough yet, however.  There are cases (as noted
before) where the following sequence arises:

  write block A
  write block B
  write block A again

and where the writes *must* occur in that sequence.  (This often
happens when block A contains two inodes, and block B must be written
*after* the update of the first, and *before* the update of the
second.)

If no actual pageouts happen until after all three writes, then the
table will contain two records:

  "A must be written before B"
  "B must be written before A"

And as soon as a pageout happens, you'll get a deadlock.  And indeed,
by that time, there's nothing you can do, because the intermediate
state of block A is gone forever.

So the "add mark to table" routine must detect cycles.

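The pageout rule above can be sketched as a small Python model.  This
is only an illustration under loud assumptions: it is single-threaded,
so a recursive call stands in for "ask kernel to pageout block A" plus
the wait on the condition, and the names OrderedWrites, must_precede,
disk_log, add_dependency and pageout are all made up for the example.
None of them come from diskfs.

```python
from collections import defaultdict


class OrderedWrites:
    """Toy, single-threaded model of the "A must be written before B" table."""

    def __init__(self):
        # must_precede[b] is the set of blocks that must reach disk before b.
        self.must_precede = defaultdict(set)
        # Order in which blocks were actually written (stands in for the disk).
        self.disk_log = []

    def add_dependency(self, first, then):
        # Record "first must be written before then".  No cycle check here.
        self.must_precede[then].add(first)

    def pageout(self, block):
        # Write every prerequisite first.  The recursive call replaces
        # "ask kernel to pageout block A" and "wait on condition".
        while self.must_precede[block]:
            prerequisite = next(iter(self.must_precede[block]))
            self.pageout(prerequisite)
        self.disk_log.append(block)
        # The block is on disk now, so drop every record saying that it
        # must be written before something else.
        for prerequisites in self.must_precede.values():
            prerequisites.discard(block)


table = OrderedWrites()
table.add_dependency("A", "B")
table.pageout("B")          # the pager is handed B first
print(table.disk_log)       # A reaches the log before B
```

If the table held both "A must be written before B" and "B must be
written before A", this toy pageout would recurse without end, which
is the single-threaded analogue of the deadlock.
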
To say "block A must be written before block B" you must:

  while (table contains "block B must be written before block A"
         [TRANSITIVELY, even if not identically])
    mark table "waiting for block A to be written"
    ask kernel to pageout block A
    wait for condition
  mark table "block A must be written before block B"

Now, one final wrinkle.  Suppose we are changing A, and then B, and
the modification to A must get written first.  Then which should you
do:

  modify A
  mark table "A must be written before B"
  modify B

or

  modify A
  modify B
  mark table "A must be written before B"

The answer is clearly the former.

Thomas

_______________________________________________
Bug-hurd mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/bug-hurd
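
The cycle-detecting "add mark to table" step can be sketched the same
way.  Again this is a single-threaded Python toy, not diskfs code: the
names DependencyTable, modify, pageout, add_dependency and
_must_come_before are hypothetical, and the dirty set is an extra
assumption of the sketch so that a block already on disk is not logged
a second time.

```python
from collections import defaultdict


class DependencyTable:
    """Toy model: ordered writes with a cycle check when adding a record."""

    def __init__(self):
        self.must_precede = defaultdict(set)  # block -> blocks written before it
        self.dirty = set()                    # assumption: modified, not yet written
        self.disk_log = []                    # order of actual writes

    def modify(self, block):
        self.dirty.add(block)

    def _must_come_before(self, earlier, later):
        # True if "earlier must be written before later" holds, transitively.
        stack, seen = [later], set()
        while stack:
            block = stack.pop()
            for prerequisite in self.must_precede.get(block, ()):
                if prerequisite == earlier:
                    return True
                if prerequisite not in seen:
                    seen.add(prerequisite)
                    stack.append(prerequisite)
        return False

    def pageout(self, block):
        # Prerequisites go first; recursion replaces the kernel request + wait.
        while self.must_precede.get(block):
            self.pageout(next(iter(self.must_precede[block])))
        if block in self.dirty:
            self.disk_log.append(block)
            self.dirty.discard(block)
        for prerequisites in self.must_precede.values():
            prerequisites.discard(block)

    def add_dependency(self, first, then):
        # If "then" already has to be written before "first", the new record
        # would close a cycle, so flush "first" now to clear the old records.
        while self._must_come_before(then, first):
            self.pageout(first)
        self.must_precede[then].add(first)


table = DependencyTable()
table.modify("A")                 # first inode in block A
table.add_dependency("A", "B")
table.modify("B")
table.add_dependency("B", "A")    # would be a cycle: forces A, then B, to disk
table.modify("A")                 # second inode in block A
table.pageout("A")
print(table.disk_log)             # writes happen as A, B, A
```

Each record is added right after the modification it protects and
before the dependent block is touched, which is the "former" ordering
from the final wrinkle.
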