On Fri, Apr 28, 2017 at 11:05 AM, Goffredo Baroncelli
<kreij...@inwind.it> wrote:

> In the past I faced the same problems; I collected some data here:
> http://kreijack.blogspot.it/2014/06/btrfs-and-systemd-journal.html.
> Unfortunately the journald files behave very badly, because first the data
> is written (appended), then the index fields are updated, and these
> indexes sit right after the last write. So fragmentation is unavoidable.
>
> After some thinking I adopted a different strategy: I use journald as the
> collector, then forward all the logs to rsyslogd, which uses an append-only
> format. Journald never writes on the root filesystem, only in tmp.

The gotcha though is there's a pile of data in the journal that would
never make it to rsyslogd. If you use journalctl -o verbose you can
see some of it: there's a bunch of extra metadata in the journal, and
filtering on that metadata is useful rather than being limited to grep
on a syslog file. Which, you know, is fine for many use cases. I guess
I'm just interested in whether there's an enhancement that can be done
to make journals more compatible with Btrfs, or vice versa. It's not a
huge problem anyway.
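For example (illustrative commands; the unit name and UID are placeholders,
but the flags and field names are standard journalctl):

```shell
# Dump every metadata field journald stored for the most recent entry
journalctl -o verbose -n 1

# Filter on trusted metadata fields instead of grepping flat text
journalctl _SYSTEMD_UNIT=sshd.service --since today

# Combine a field match with a priority filter
journalctl _UID=1000 -p warning
```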


>
> The thing became interesting when I discovered that searching in an
> rsyslog file is faster than journalctl (on rotational media).
> Unfortunately I don't have any data to support this.


Yes, on drives all of these scattered extents cause a lot of head
seeking. And I also suspect there's a lot of metadata spread out
everywhere too, to account for all of these extents. That's why they
moved to chattr +C to make the journals nocow. An idea I had on the
systemd list was to automatically make the journal directory a Btrfs
subvolume, similar to how systemd already creates a /var/lib/machines
subvolume for nspawn containers. This keeps the journals from being
caught up in a snapshot of the parent subvolume that typically
contains them (the root fs). There's no practical use I can think of
for snapshotting logs. You'd really want the logs to always be linear,
contiguous, and never rolled back. Even if something in the system
does get rolled back, you'd want the logs to show that and continue
on, rather than being rolled back themselves.

So the super simple option would be to continue with +C on the
journals, plus a separate subvolume to keep them out of snapshots so
COW never happens inadvertently.
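A manual approximation of that idea (a sketch, assuming /var/log lives on
Btrfs, and run before journald has created any persistent journals; +C only
takes effect on files created after it is set, so it does nothing for
existing journal files):

```shell
# Dedicated subvolume so snapshots of the root fs skip the journals
btrfs subvolume create /var/log/journal

# +C on the still-empty directory: new journal files inherit nocow
chattr +C /var/log/journal

# Restart journald so it starts persisting into the new location
systemctl restart systemd-journald
```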

The same behavior happens with NTFS in qcow2 files: they quickly end
up with 100,000+ extents unless set nocow. It's basically the
worst-case scenario.
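The usual workaround there is the same trick (a sketch; the image path and
size are hypothetical):

```shell
# Set +C on the empty images directory before creating any qcow2 files,
# so everything created inside inherits nocow
mkdir -p /var/lib/libvirt/images
chattr +C /var/lib/libvirt/images

# Images created from now on are nocow and stay mostly contiguous
qemu-img create -f qcow2 /var/lib/libvirt/images/win.qcow2 60G

# Check the extent count later to see the difference
filefrag /var/lib/libvirt/images/win.qcow2
```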

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
