Hello,

It looks like there is a scenario in which Metakit can damage a datafile on commit (for a certain interpretation of "damage", see below):

- the file is growing, i.e. needs to be extended
- there is an error in the very first write
  (possibly because the disk is full)
- some subsequent write succeeds
  (e.g. because the disk-full condition clears, even briefly)
- the file is closed and re-opened

The problem is that MK sometimes has to write an end marker at a new seek position *past* the current end of the file. It does not always check *immediately* whether this write succeeded (i.e. via fflush in stdio), and carries on writing the rest of its changes, some of which would then end up *past* the last valid tail marker, which is a big no-no.

Even though the final commit step is never performed once any I/O error has occurred, this *can* leave the file without a valid tail marker at its end. A subsequent open will then fail to find valid data, and will extend the file with a new dataset.

So while this can never cause an incomplete commit to be reported as successful, it does lead to a state where all data appears to be gone on the next open.
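The ordering rule behind the fix (make sure the write that extends the file really succeeded before putting anything into the new space) can be sketched with plain stdio. This is a hypothetical illustration, not Metakit code: `writeMarkerChecked` and `extendThenWrite` are made-up names, and the 8-byte marker is an assumed stand-in for the real tail marker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper, not part of Metakit: write `len` bytes at offset
// `pos`, which may lie past the current end of the file, and report
// whether the bytes actually reached the OS. A disk-full error usually
// surfaces in fflush() rather than in the buffered fwrite() call.
bool writeMarkerChecked(std::FILE* f, long pos, const void* marker, std::size_t len) {
    assert(f != nullptr);
    if (std::fseek(f, pos, SEEK_SET) != 0)
        return false;
    if (std::fwrite(marker, 1, len, f) != len)
        return false;
    if (std::fflush(f) != 0)  // force the write out now, not later
        return false;
    return std::ferror(f) == 0;
}

// Hypothetical commit step: first extend the file by writing an (assumed)
// 8-byte end marker at `newEnd`, and only if that is known to have worked,
// write the payload into the space in front of it. On failure nothing
// else is written, so no data can land past the last valid marker.
bool extendThenWrite(std::FILE* f, long newEnd, const unsigned char (&marker)[8],
                     long dataPos, const void* data, std::size_t dataLen) {
    if (!writeMarkerChecked(f, newEnd, marker, sizeof marker))
        return false;  // bail out, like the early return in the patch
    if (std::fseek(f, dataPos, SEEK_SET) != 0)
        return false;
    if (std::fwrite(data, 1, dataLen, f) != dataLen)
        return false;
    return std::fflush(f) == 0;
}
```

Note that `fflush` only hands the bytes to the operating system; on some file systems a disk-full condition may not be reported until later (e.g. at `fsync` or close), so this narrows the window rather than giving a hard guarantee.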

Mea culpa, mea culpa, mea maxima culpa.

The fix I'm contemplating for this is:

$ cvs diff persist.cpp
Index: persist.cpp
===================================================================
RCS file: /home/cvs/metakit/src/persist.cpp,v
retrieving revision 1.26
diff -u -r1.26 persist.cpp
--- persist.cpp 28 Jan 2004 21:33:40 -0000      1.26
+++ persist.cpp 17 Nov 2005 20:31:15 -0000
@@ -686,6 +686,11 @@
   } else {
    c4_FileMark head (limit + 16 - end, _strategy._bytesFlipped, end > 0);
    _strategy.DataWrite(end, &head, sizeof head);
+
+    /* file is extended, force successful write before using the new space! */
+    _strategy.DataCommit(0);
+    if (_strategy._failure != 0)
+      return;

     if (end0 < limit)
       end0 = limit; // create a gap
$

Note that such damaged files can still be recovered manually, at least in some cases. The trick is to find out where the file originally ended, and then truncate it there. The file size as of the last successful commit is stored in bytes 4..7 of the header. It is counted from the start of the header, so it is not an absolute seek position if there is non-Metakit data in front (as is the case with starkits).

In other words, to attempt recovery of a file damaged in the above way, do this:

- locate the header (starting with 0x4A4C "JL" or 0x4C4A "LJ")
- read the big-endian file size from header bytes 4..7
- truncate the file at that position, relative to the start of the header
- check: the truncated file must have byte 0x80 at offset "end-8"
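The recovery steps above can be sketched in C++17 as follows. This is a minimal, hypothetical helper, not part of Metakit: it assumes the header layout exactly as described here (a "JL"/"LJ" marker, a big-endian size in bytes 4..7 counted from the header, and 0x80 at end-8). Because non-Metakit data in front may contain the same two bytes by chance, it simply takes the first candidate that passes the 0x80 check. Always run it on a copy of the damaged file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <vector>

// Read a 32-bit big-endian value starting at data[pos].
std::uint32_t readBigEndian32(const std::vector<unsigned char>& data, std::size_t pos) {
    assert(pos + 4 <= data.size());
    return (std::uint32_t(data[pos]) << 24) | (std::uint32_t(data[pos + 1]) << 16) |
           (std::uint32_t(data[pos + 2]) << 8) | std::uint32_t(data[pos + 3]);
}

// Return the length the file should be truncated to, or nothing if no
// candidate header passes the check. Heuristic: every "JL"/"LJ" pair is
// treated as a possible header, since data in front (e.g. a starkit's
// script) could contain those bytes by accident.
std::optional<std::size_t> findRecoveryLength(const std::vector<unsigned char>& data) {
    for (std::size_t h = 0; h + 8 <= data.size(); ++h) {
        bool magic = (data[h] == 'J' && data[h + 1] == 'L') ||
                     (data[h] == 'L' && data[h + 1] == 'J');
        if (!magic)
            continue;
        // Header bytes 4..7: last committed size, relative to the header.
        std::size_t size = readBigEndian32(data, h + 4);
        if (size < 8 || size > data.size() - h)
            continue;
        std::size_t end = h + size;
        // Sanity check from the recipe: byte 0x80 at offset end-8.
        if (data[end - 8] == 0x80)
            return end;
    }
    return std::nullopt;
}

// Hypothetical driver: truncate the file at `path` in place (irreversible).
bool recoverFile(const std::filesystem::path& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    std::vector<unsigned char> data((std::istreambuf_iterator<char>(in)),
                                    std::istreambuf_iterator<char>());
    in.close();
    std::optional<std::size_t> end = findRecoveryLength(data);
    if (!end)
        return false;
    std::filesystem::resize_file(path, *end);
    return true;
}
```

For example, `recoverFile("copy-of-broken.kit")` truncates that copy in place and returns false if no plausible header was found; re-opening the result with Metakit is the real test of whether the data is back.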

Such a recovery restores the state from just before the damage first occurred. Any changes made afterwards to the newly created tail dataset will be lost. Note that this problem always goes through the following phases:

1. all is well, commits work, next open sees the changes
2. a commit fails as described above
3. the next open succeeds, but sees a dataset with no content
4. if further commits are done, they add to this initially empty state

So in essence, it's as if someone deleted all data and the app started from scratch again, except that the "deleted" data still exists at the head of the file and may be recoverable as described above.

I am still doing checks and discussing this with the developer who ran into this issue, but wanted to flag it as soon as possible for everyone concerned.

Once the problem has been confirmed as correctly identified and resolved, I'll set up a new release as soon as possible.

Comments welcome,
-jcw
_____________________________________________
Metakit mailing list
http://www.equi4.com/mailman/listinfo/metakit
