Re: btrfs, journald logs, fragmentation, and fallocate

Peter Grandi Fri, 28 Apr 2017 10:55:06 -0700

> Old news is that systemd-journald journals end up pretty
> heavily fragmented on Btrfs due to COW.


This has indeed been discussed in detail on this list before,
and also here: http://www.sabi.co.uk/blog/15-one.html?150203#150203

> While journald uses chattr +C on journal files now, COW still
> happens if the subvolume the journal is in gets snapshot. e.g.
> a week old system.journal has 19000+ extents. [ ... ]  It
> appears to me (see below URLs pointing to example journals)
> that journald fallocated in 8MiB increments but then ends up
> doing 4KiB writes; [ ... ]

So there are three layers of silliness here:

* Writing large files slowly to a COW filesystem and
  snapshotting it frequently.
* A filesystem that does delayed allocation instead of
  allocate-ahead, and does not have psychic code.
* Working around that by using no-COW and preallocation
  with a fixed size regardless of snapshot frequency.

The primary problem here is that there is no way to have slow
small writes and frequent snapshots without generating small
extents: if a file is written at a rate of 1MiB/hour and is
snapshotted every hour, the extent size will not be larger than
1MiB, *obviously*.
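
To put numbers on that bound, here is a rough sketch in Python. It
assumes a simplified model in which everything written between two
snapshots ends up in at most one contiguous extent; the function
names are made up for illustration and are not part of any Btrfs or
journald tooling.

```python
MIB = 1024 * 1024


def max_extent_bytes(write_rate_per_hour: int, snapshot_interval_hours: int) -> int:
    """Upper bound on extent size under the simplified model: the data
    written between two snapshots forms at most one extent."""
    return write_rate_per_hour * snapshot_interval_hours


def min_extent_count(file_bytes: int, write_rate_per_hour: int,
                     snapshot_interval_hours: int) -> int:
    """Lower bound on the number of extents for a file of file_bytes."""
    bound = max_extent_bytes(write_rate_per_hour, snapshot_interval_hours)
    if bound <= 0:
        raise ValueError("write rate and snapshot interval must be positive")
    # Ceiling division: a partially filled last extent still counts.
    return -(-file_bytes // bound)


if __name__ == "__main__":
    week_hours = 7 * 24
    file_bytes = MIB * week_hours  # one week of writes at 1MiB/hour
    print(max_extent_bytes(MIB, 1) // MIB, "MiB is the largest possible extent")
    print(min_extent_count(file_bytes, MIB, 1), "extents at the very least")
```

This is only a floor on the extent count: small writes landing on
extents shared with a snapshot can push the real number much higher.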

Filesystem-level snapshots are not designed to snapshot slowly
growing files, but to snapshot changing collections of
files. There are harsh tradeoffs involved. Application-level
snapshots (also known as log rotations :->) are needed for
special cases and finer grained policies.

The secondary problem is that a fixed preallocation of 8MiB is
good only if in between snapshots the file grows by a little
less than 8MiB or by substantially more.
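
A rough way to see this, again as a Python sketch under loud
assumptions: each snapshot interval is taken to start on a fresh
preallocated chunk, and the only thing measured is how much of the
preallocated space is still unwritten when the snapshot happens.
The function name and the model itself are illustrative, not
anything journald or Btrfs computes.

```python
MIB = 1024 * 1024


def unwritten_fraction(growth_bytes: int, chunk_bytes: int = 8 * MIB) -> float:
    """Fraction of preallocated space still unwritten at snapshot time,
    assuming the interval began on a fresh chunk (a simplification)."""
    if growth_bytes <= 0 or chunk_bytes <= 0:
        raise ValueError("growth and chunk size must be positive")
    chunks = -(-growth_bytes // chunk_bytes)  # ceiling division
    allocated = chunks * chunk_bytes
    return (allocated - growth_bytes) / allocated


if __name__ == "__main__":
    # Growth per snapshot interval: tiny, a little under 8MiB, much more.
    for growth in (MIB, 7 * MIB + MIB // 2, 80 * MIB + MIB // 2):
        print(growth / MIB, "MiB growth ->", unwritten_fraction(growth))
```

With 1MiB of growth most of the chunk is snapshotted while still
empty; with a little under 8MiB, or with many chunks' worth, the
unwritten share is small.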
