To expand a bit: the on-disk format must keep the roots of the last N transactions reachable at all times. At open time you look for the latest transaction and verify that it was written completely[0]. If it was, use it; otherwise fall back to the preceding transaction, verify that one, and so on.
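To make the open-time procedure concrete, here is a minimal sketch in Python. Everything in it is made up for illustration and is not taken from any real on-disk format: the "disk" is a dict of block number to bytes, TxnRecord is a hypothetical commit record carrying a per-transaction block manifest, and CRC32 stands in for whatever block checksum a real format would use.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class TxnRecord:
    """Hypothetical commit record; one is retained for each of the last N transactions."""
    txn_id: int
    root_block: int
    # (block number, CRC32 of expected contents) for every block this
    # transaction wrote, including its root block.
    manifest: Tuple[Tuple[int, int], ...]


def txn_is_complete(disk: Dict[int, bytes], rec: TxnRecord) -> bool:
    """True if every block in the manifest is present and matches its checksum."""
    for block_no, expected_crc in rec.manifest:
        data = disk.get(block_no)
        if data is None or zlib.crc32(data) != expected_crc:
            return False
    return True


def find_usable_txn(disk: Dict[int, bytes],
                    records: Iterable[TxnRecord]) -> Optional[TxnRecord]:
    """Walk the retained transactions newest first and return the first one that verifies."""
    for rec in sorted(records, key=lambda r: r.txn_id, reverse=True):
        if txn_is_complete(disk, rec):
            return rec
    return None  # nothing verifiable: the store cannot be opened


def make_record(txn_id: int, root_block: int, blocks: Dict[int, bytes]) -> TxnRecord:
    """Illustration helper: build a commit record from the blocks a transaction wrote."""
    manifest = tuple((n, zlib.crc32(d)) for n, d in blocks.items())
    return TxnRecord(txn_id, root_block, manifest)


# Two retained transactions (N = 2); neither reuses the other's blocks.
t1_blocks = {10: b"root-v1", 11: b"leaf-a"}
t2_blocks = {20: b"root-v2", 21: b"leaf-b"}
disk: Dict[int, bytes] = {**t1_blocks, **t2_blocks}
rec1 = make_record(1, 10, t1_blocks)
rec2 = make_record(2, 20, t2_blocks)

# Simulate a power failure that tore one block of the latest transaction.
disk[21] = b"torn"
chosen = find_usable_txn(disk, [rec1, rec2])
print(chosen.txn_id if chosen else None)
```

The fallback only works because transaction 2 never overwrote blocks 10 and 11. If it had reused them, there would be nothing intact to fall back to.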
N needs to be at least 2: the last transaction and the one preceding it. No block may be freed or reused while it is referenced by any transaction that is still in use or that might still be needed (e.g., for power failure recovery). For high read concurrency you can let connections lock a past transaction so that no blocks needed to access the DB at that state are freed.

This all goes back to 1980s DB and filesystem concepts. See, for example, the 4.4BSD Log-Structured File System. (I mention this in case there are concerns about patents, though IANAL and I make no particular assertions here other than that there is plenty of old prior art and expired patents that can probably be used to obtain sufficient certainty as to the patent law risks in the approach described herein.)

[0] E.g., check a transaction block manifest and check that those blocks were written correctly; or traverse the tree looking for differences from the previous transaction; this may require checking block content checksums.

Nico
--