Hey Hugo,

On Thu, 2015-11-26 at 00:33 +0000, Hugo Mills wrote:
>    Answering the second part first, no, it can't.
Thanks so far :)


>    The issue is that nodatacow bypasses the transactional nature of
> the FS, making changes to live data immediately. This then means that
> if you modify a nodatacow file, the csum for that modified section is
> out of date, and won't be back in sync again until the latest
> transaction is committed. So you can end up with an inconsistent
> filesystem if there's a crash between the two events.
Sure,... (and btw: is some kind of journal planned for nodatacow'ed
files?),... but why not simply try to write an updated checksum after
the modified section has been flushed to disk? Of course there's no
guarantee that both are consistent in case of a crash (but that's also
the case without any checksum)... at least one would keep the csum
protection against everything else (block errors and the like) whenever
no crash occurs?



> > For me the checksumming is actually the most important part of
> > btrfs
> > (not that I wouldn't like its other features as well)... so turning
> > it
> > off is something I really would want to avoid.
> > 
> > Plus it opens questions like: When there are no checksums, how can
> > it
> > (in the RAID cases) decide which block is the good one in case of
> > corruptions?
>    It doesn't decide -- both copies look equally good, because
> there's
> no checksum, so if you read the data, the FS will return whatever
> data
> was on the copy it happened to pick.
Hmm, I see... so one basically gets the behaviour of traditional RAID.
Isn't that kind of a big loss? I always considered the guarantee
against block errors and the like one of the big and basic features of
btrfs.
It seems that for certain (not unimportant) cases, like DBs and VMs,
one has to choose between two evils: losing the guaranteed consistency
via checksums, or running into severe trouble (like Mitch's reported
fragmentation issues).
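As an aside, for the data that *does* keep its checksums, one can let btrfs verify (and, on redundant profiles, repair) blocks explicitly with a scrub. A rough sketch; the mountpoint is a placeholder:

```shell
# Scrub the filesystem mounted at /mnt/data (placeholder path).
# -B: stay in the foreground, -d: print per-device statistics.
# On RAID1/RAID10 profiles, blocks whose csum fails are rewritten from
# a good mirror; on single/DUP data profiles, errors are only reported.
btrfs scrub start -Bd /mnt/data

# Inspect the result afterwards (error counters, duration, bytes scrubbed).
btrfs scrub status /mnt/data
```

Note that this is exactly what stops working for nodatacow files: with no csums, a scrub has nothing to verify them against.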


> > 3) When I would actually disable datacow for e.g. a subvolume that
> > holds VMs or DBs... what are all the implications?
> > Obviously no checksumming, but what happens if I snapshot such a
> > subvolume or if I send/receive it?
> 
>    After snapshotting, modifications are CoWed precisely once, and
> then it reverts to nodatacow again. This means that making a snapshot
> of a nodatacow object will cause it to fragment as writes are made to
> it.
I see... something that should probably go into some advanced admin
documentation (if it isn't there already).
It basically means that one must ensure that any such files (VM
images, DB data dirs) are created with nodatacow from the start
(perhaps on a subvolume which is mounted as such).
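For reference, the usual way to arrange this is to set the NOCOW attribute on an empty directory before creating any files in it, since the flag only takes effect reliably on newly created (empty) files. A sketch; paths are placeholders:

```shell
# Dedicated directory (or subvolume) for VM images / DB data;
# /mnt/data is a placeholder mountpoint.
mkdir /mnt/data/vmimages

# Set the NOCOW attribute on the *empty* directory; files created
# inside it afterwards inherit it. Setting +C on an existing,
# non-empty file is not reliable.
chattr +C /mnt/data/vmimages

# Verify: the 'C' flag should appear in the attribute list.
lsattr -d /mnt/data/vmimages
```

Alternatively, `mount -o nodatacow` applies it filesystem-wide, but then checksumming and compression are lost for everything, not just the DB/VM files.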


> > 4) Duncan mentioned that defrag (and I guess that's also for auto-
> > defrag) isn't ref-link aware...
> > Isn't that somehow a complete showstopper?
>    It is, but the one attempt at dealing with it caused massive data
> corruption, and it was turned off again.
So... does this mean that it's still planned to be implemented some day
or has it been given up forever?
And is it (hopefully) also planned to be implemented for reflinks when
compression is added/changed/removed?


Given that you (or Duncan?... sorry, I sometimes mix up who said
exactly what, since both of you are notoriously helpful :-) ) mentioned
that autodefrag basically fails with larger files, and given that it
seems quite important for btrfs not to be fragmented too heavily, it
sounds a bit as if anything that uses (multiple) reflinks (e.g.
snapshots) cannot really be used very well.


>  autodefrag, however, has
> always been snapshot aware and snapshot safe, and would be the
> recommended approach here.
Ahhh... so autodefrag *is* snapshot aware, and that's basically why the
suggestion (AFAIU) is to turn it on, right?
So, I'm afraid O:-), that triggers a follow-up question:
Why isn't it the default? Or in other words, what are its drawbacks
(e.g. other cases where reflinks would be broken up, or issues with
compression)?
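For what it's worth, turning it on is just a mount option; a sketch, with device and mountpoint as placeholders:

```shell
# Enable autodefrag at mount time:
mount -o autodefrag /dev/sdb1 /mnt/data

# ...or persistently via /etc/fstab:
# /dev/sdb1  /mnt/data  btrfs  defaults,autodefrag  0  0

# It can also be toggled on an already-mounted filesystem:
mount -o remount,autodefrag /mnt/data
```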

And also, when I now activate it on an already populated fs, will it
also defrag old files (even if they're not being rewritten)?
I tried to look for some general description (rather "for dummies" than
for core developers) of how defrag and autodefrag work... but couldn't
find anything in the usual places... :-(

btw: The wiki (https://btrfs.wiki.kernel.org/index.php/UseCases#How_do_I_defragment_many_files.3F)
doesn't mention that autodefrag doesn't suffer from that problem.
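(The manual counterpart that the wiki page describes, which as discussed above is *not* reflink-aware, would be something like the following; the path is a placeholder:)

```shell
# Recursively defragment everything below /mnt/data (placeholder path).
# Caveat: this can break reflinks/snapshot sharing and thereby increase
# actual disk usage. -v prints each file as it is processed.
btrfs filesystem defragment -r -v /mnt/data
```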


>  (Actually, it was broken in the same
> incident I just described -- but fixed again when the broken patches
> were reverted).
So it just couldn't be fixed (hopefully: yet) for the (manual) online
defragmentation?!


> > 5) Especially keeping (4) in mind but also the other comments in
> > from
> > Duncan and Austin...
> > Is auto-defrag now recommended to be generally used?
>
>    Absolutely, yes.
I see... well, I'll probably wait for some answers about its drawbacks
and then give it a try.


>    It's late for me, and this email was longer than I suspected, so
> I'm going to stop here, but I'll try to pick it up again and answer
> your other questions tomorrow.
Thanks so far :)

I know I haven't replied to that thread for some days, but if you have
anything to add to the remaining questions, I'd still be happy to read
it :)


Thanks and best wishes,
Chris.
