> Am I wrong when saying that ending up with replay journals that have
> unexpected data and that can't be replayed is just inevitable and something
> any journalling filesystem must deal with?

If by journal you mean the btrfs log then yes, strictly speaking, you're
wrong.  btrfs does deal with the kind of incomplete and reordered writes
that you're talking about and it should not result in corruption of what
it calls the log.

But it's a reasonable thing to be confused by.  I'm guessing that you're
being tripped up by what ext3 means by a journal and by what btrfs means
by a log.

The journal in ext3 can be partially written during a crash.  Journal
replay on mount notices this because the transaction's commit block
isn't present, and just throws the incomplete transaction away.  No
worries.
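
To make that rule concrete, here's a toy sketch of commit-block replay.
It is not ext3's real on-disk format: the record tuples and the name
replay() are made up for illustration.

```python
# Toy model of commit-block journaling; NOT ext3's real on-disk format.
# A journal is a list of records:
#   ("data", txid, block_no, payload)   or   ("commit", txid)

def replay(journal, disk):
    """Apply only transactions whose commit record reached the journal."""
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "data" and rec[1] in committed:
            _, _, block_no, payload = rec
            disk[block_no] = payload
    return disk

# Transaction 1 was fully written.  Transaction 2 was interrupted by a
# crash before its commit record was written.
journal = [
    ("data", 1, 10, "A"),
    ("commit", 1),
    ("data", 2, 11, "B"),
]
disk = replay(journal, {})
```

Transaction 2 has no commit record, so its block never reaches the
final location and nothing is lost that was ever promised to be stable.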

The equivalent consistent update mechanism in btrfs is cow tree updates.
The superblock that references new tree blocks written to free space is
itself only written once all those blocks are stable on disk.  If the
tree block writes are interrupted then the superblock isn't updated and
btrfs won't see the partially written blocks.  No worries.
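
The same idea as a toy sketch, assuming a single root pointer stands in
for the superblock.  ToyCowStore and its methods are hypothetical names
and have nothing to do with the real btrfs structures.

```python
# Toy model of copy-on-write updates; NOT the real btrfs structures.

class ToyCowStore:
    def __init__(self):
        self.blocks = {0: {"file": "old"}}  # address -> tree block contents
        self.superblock = 0                 # address of the current tree root
        self.next_free = 1

    def write_new_root(self, contents, crash_before_super=False):
        addr = self.next_free
        self.next_free += 1
        self.blocks[addr] = contents        # new block goes to free space
        if crash_before_super:
            return                          # crash: superblock never updated
        self.superblock = addr              # only now is the new tree visible

    def read_root(self):
        return self.blocks[self.superblock]

store = ToyCowStore()
store.write_new_root({"file": "new"}, crash_before_super=True)
after_crash = store.read_root()     # still the old tree

store.write_new_root({"file": "newer"})
after_commit = store.read_root()    # the new tree, via the superblock
```

The interrupted write leaves an orphaned block in free space, but
nothing references it, so a reader starting from the superblock never
sees it.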

The btrfs "log" is itself just a logical btree *inside these consistent
tree updates* that records logical operations that will need to be
replayed.  For the log to be corrupted, assuming the btrfs code is
perfect, the storage must have lied to btrfs: it reported the tree
update blocks as stable when they weren't, so the superblock that
references them was written prematurely.

The equivalent problem in the ext3 journal would be a transaction that
has blocks missing but which has a valid commit block.  ext3 couldn't
just throw this transaction away because once the commit block was
written it could already have been writing the transaction's blocks to
their final locations on disk.  And it's now missing some of the blocks
it needs to replay.  This kind of corruption Shouldn't Happen and the fs can't
just silently ignore it.
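
A toy sketch of why that case has to be a hard error rather than a
quiet discard.  It assumes, purely for illustration, that each commit
record carries the number of data blocks it covers; that field and the
names replay_strict() and JournalCorruption are hypothetical, not how
ext3 actually lays out its journal.

```python
# Toy model; NOT ext3's real journal format.  Records are:
#   ("data", txid, block_no, payload)
#   ("commit", txid, expected_block_count)   <- count field is an assumption

class JournalCorruption(Exception):
    pass

def replay_strict(journal, disk):
    """Replay committed transactions; refuse if a committed one is incomplete."""
    seen = {}
    for rec in journal:
        if rec[0] == "data":
            seen[rec[1]] = seen.get(rec[1], 0) + 1

    committed = set()
    for rec in journal:
        if rec[0] == "commit":
            _, txid, expected = rec
            if seen.get(txid, 0) != expected:
                # The commit claims blocks that are not in the journal.
                # Dropping the transaction could revert changes that were
                # already reported stable, so fail loudly instead.
                raise JournalCorruption(
                    f"tx {txid}: expected {expected} blocks, found {seen.get(txid, 0)}"
                )
            committed.add(txid)

    for rec in journal:
        if rec[0] == "data" and rec[1] in committed:
            disk[rec[2]] = rec[3]
    return disk
```

An uncommitted transaction is still dropped quietly here.  Only the
"committed but incomplete" case raises, which is the distinction the
paragraph above is drawing.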

I absolutely agree that the error messages should be greatly improved in
this case, yes, and that it shouldn't BUG_ON (it should *never* BUG_ON).

But btrfs is right to refuse to silently revert previously stable
changes by just ignoring the corrupt log.

- z