On 07/03/2018 12:22 PM, Marc MERLIN wrote:
On Mon, Jul 02, 2018 at 06:31:43PM -0600, Chris Murphy wrote:
So the idea behind journaled file systems is that journal replay
enables mount-time "repair" that's faster than an fsck. Already, Btrfs
use cases with big, but not huge, file systems make btrfs check a
problem: it either runs out of memory or takes too long. So already
it isn't scaling as well as ext4 or XFS in this regard.

So what's the future hold? It seems like the goal is that problems
must be avoided in the first place rather than repaired after the
fact.

Are the problems Marc is running into understood well enough that
there can eventually be a fix, maybe even an on-disk format change,
that prevents such problems from happening in the first place?

Or does it make sense for him to be running with btrfs debug or some
subset of the btrfs integrity checking mask to try to catch the
problems as they happen?

Those are all good questions.
To be fair, I cannot claim that btrfs was at fault for whatever filesystem
damage I ended up with. It's very possible that it happened due to a flaky
SATA card that kicked drives off the bus when it shouldn't have.
Sure, in theory a journaling filesystem can recover from unexpected power
loss and from drives dropping off at bad times, but I'm going to guess that
btrfs' complexity also means it has data structures (the extent tree?) that
need to be updated completely, "or else".

Yes, the extent tree is the hardest part for lowmem mode. I'm quite
confident the tool can deal well with file trees (which record metadata
about file and directory names and their relationships).
As for the extent tree, I have much less confidence, due to its complexity.
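
To give a feel for why, here is a toy sketch in Python (nothing like the
real btrfs-progs code or on-disk format, just an illustration): a file tree
can largely be checked against itself, while every extent item has to be
cross-checked against back references that may live in any subvolume or
snapshot tree on the filesystem.

# Toy sketch, not real btrfs-progs code or on-disk structures.

# A "file tree" here is just directory entries plus inodes; checking it
# only needs data from that one tree.
file_trees = {
    "subvol_1": {"dirents": {"a.txt": 257}, "inodes": {257}},
    "snap_1":   {"dirents": {"a.txt": 257}, "inodes": {257}},
}

# The extent tree records, for each extent, which trees/inodes reference it.
extent_tree = {
    0x1000000: [("subvol_1", 257), ("snap_1", 257)],
    0x2000000: [("subvol_1", 999)],          # bogus back reference
}

def check_file_tree(name, tree):
    # Local check: every directory entry must point at an inode in this tree.
    for fname, ino in tree["dirents"].items():
        if ino not in tree["inodes"]:
            print(f"{name}: dirent {fname} -> missing inode {ino}")

def check_extent_tree(extent_tree, file_trees):
    # Global check: every back reference must resolve in some other tree,
    # so a single bad extent item can involve any subvolume or snapshot.
    for bytenr, backrefs in extent_tree.items():
        for tree_name, ino in backrefs:
            tree = file_trees.get(tree_name)
            if tree is None or ino not in tree["inodes"]:
                print(f"extent {bytenr:#x}: dangling backref ({tree_name}, {ino})")

for name, tree in file_trees.items():
    check_file_tree(name, tree)
check_extent_tree(extent_tree, file_trees)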

I'm obviously ok with a filesystem check being necessary to recover in cases
like this; after all, I still occasionally have to run e2fsck on ext4 too. But
I'm a lot less thrilled with the btrfs situation, where basically the repair
tools can either completely crash your kernel, or take days and then either
get stuck in an infinite loop or hit an algorithm that can't scale if you
have too many hardlinks/snapshots.

It's not surprising that real-world filesystems have many snapshots.
Original-mode repair eats a lot of memory, so lowmem mode was created
to save memory at the cost of time. The latter is just not robust
enough to handle complex situations.
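
Roughly speaking (a toy model of the trade-off only, not the actual
btrfs-progs algorithms): original mode walks the metadata once and keeps
every reference it has seen in memory, while lowmem mode keeps almost
nothing and re-searches the trees for every extent item. Something like:

# Toy model of the memory/time trade-off only; not the actual algorithms.
from collections import Counter

# Hypothetical data: each tree is the list of extent byte numbers it
# references; the extent tree records how many references each extent expects.
trees = [[100, 200], [100, 300], [100, 200]]      # e.g. three snapshots
extent_tree = {100: 3, 200: 2, 300: 1, 400: 1}    # 400 has a lost reference

def check_original_mode(trees, extent_tree):
    # One pass over the trees, but the reference map grows with metadata size.
    seen = Counter()
    for tree in trees:
        seen.update(tree)
    return [b for b, refs in extent_tree.items() if seen[b] != refs]

def check_lowmem_mode(trees, extent_tree):
    # Almost no state is kept, but every extent item re-walks every tree.
    bad = []
    for bytenr, refs in extent_tree.items():
        found = sum(tree.count(bytenr) for tree in trees)
        if found != refs:
            bad.append(bytenr)
    return bad

print(check_original_mode(trees, extent_tree))    # [400]
print(check_lowmem_mode(trees, extent_tree))      # [400]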

It sounds like there may not be a fix to this problem within the filesystem's
design, outside of "do not get there, or else".
It would even be useful for the btrfs tools to start computing heuristics and
output warnings like "you have more than 100 snapshots on this filesystem,
this is not recommended, please read http://url/".
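
Even a rough heuristic would help. Purely as an illustration (the threshold
is arbitrary, and real tooling would count snapshots internally rather than
parsing 'btrfs subvolume list -s' output), something like:

# Hypothetical sketch only; a real tool would use the btrfs ioctls
# instead of shelling out and parsing command output.
import subprocess
import sys

SNAPSHOT_WARN_THRESHOLD = 100   # arbitrary number, per the suggestion above

def count_snapshots(mountpoint):
    # 'btrfs subvolume list -s' prints one line per snapshot subvolume.
    out = subprocess.run(
        ["btrfs", "subvolume", "list", "-s", mountpoint],
        check=True, capture_output=True, text=True,
    ).stdout
    return sum(1 for line in out.splitlines() if line.strip())

if __name__ == "__main__":
    mnt = sys.argv[1] if len(sys.argv) > 1 else "/"
    n = count_snapshots(mnt)
    if n > SNAPSHOT_WARN_THRESHOLD:
        print(f"warning: {mnt} has {n} snapshots (more than "
              f"{SNAPSHOT_WARN_THRESHOLD}); this is not recommended, "
              f"btrfs check/repair may become very slow")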

Qu, Su, does that sound both reasonable and doable?

Thanks,
Marc


