On Wed, Feb 10, 2021 at 11:12 PM Zygo Blaxell
<ce3g8...@umail.furryterror.org> wrote:


>
> If we want the data compressed (and who doesn't?  journal data compresses
> 8:1 with btrfs zstd) then we'll always need to make a copy at close.
> Because systemd used prealloc, the copy is necessarily to a new inode,
> as there's no way to re-enable compression on an inode once prealloc
> is used (this has deep disk-format reasons, but not as deep as the
> nodatacow ones).

Pretty sure sd-journald still fallocates even when the journals are
datacow, which is what you get by touching
/etc/tmpfiles.d/journal-nocow.conf to mask the packaged nocow rule.

And I know for sure those datacow files do compress on rotation.
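
For reference, the packaged rule is, as far as I remember, roughly the
line below (exact contents differ between systemd versions); an empty
file of the same name under /etc/tmpfiles.d/ masks it, which is what
keeps the journals datacow:

  # /usr/lib/tmpfiles.d/journal-nocow.conf (approximate, from memory):
  # sets the NOCOW attribute on the journal directory so new journal
  # files inherit it.
  h /var/log/journal - - - - +C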

Preallocated datacow might not be so bad if it weren't for that one
damn header or indexing block, whatever the proper term is, that
sd-journald hammers every time it fsyncs. I'm not sure I want to know
what it means to snapshot a preallocated datacow file. But in theory,
if the same blocks weren't all being hammered, a preallocated file
shouldn't fragment like hell, since each prealloc block would get just
one write.


> If we don't care about compression or datasums, then keep the file
> nodatacow and do nothing at close.  The defrag isn't needed and the
> FS_NOCOW_FL flag change doesn't work.

Agreed.
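
To illustrate what "doesn't work" means from userspace, here's a rough
sketch of mine (not systemd's code) that pokes FS_NOCOW_FL through
FS_IOC_GETFLAGS/FS_IOC_SETFLAGS. As I understand it, btrfs quietly
skips the NOCOW change on a regular file that already has data, so
flipping the flag on an existing journal accomplishes nothing:

/* Rough sketch: read and try to set FS_NOCOW_FL on an existing file.
 * As far as I know, btrfs only honors the change while the file is
 * still empty; on a file with data the ioctl returns success but the
 * COW behavior doesn't change. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */

int main(int argc, char **argv)
{
        int fd, flags;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
                perror("FS_IOC_GETFLAGS");
                return 1;
        }
        printf("NOCOW is currently %s\n",
               (flags & FS_NOCOW_FL) ? "set" : "clear");

        flags |= FS_NOCOW_FL;
        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
                perror("FS_IOC_SETFLAGS");
        /* even on "success", a non-empty btrfs file stays datacow */

        close(fd);
        return 0;
}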


> It makes sense for SSD too.  It's 4K extents, so the metadata and small-IO
> overheads will be non-trivial even on SSD.  Deleting or truncating datacow
> journal files will put a lot of tiny free space holes into the filesystem.
> It will flood the next commit with delayed refs and push up latency.

I haven't seen meaningful latency on a single journal file, datacow
and heavily fragmented, on SSD. But to test on more than one file at a
time I need to revert the defrag commits, build systemd, and let a
bunch of journals accumulate somehow. If I dump too much data
artificially to try and mimic aging, I know I'll get nowhere near as
many of those 4KiB extents. So I dunno.


>
> > In that case the fragmentation is
> > quite considerable, hundreds to thousands of extents. It's
> > sufficiently bad that it'd probably be better if they were
> > defragmented automatically with a trigger that tests for number of
> > non-contiguous small blocks that somehow cheaply estimates latency
> > reading all of them.
>
> Yeah it would be nice if autodefrag could be made to not suck.

It triggers on inserts, not appends. So it doesn't do anything for the
sd-journald case.

I would think the active journals are the ones more likely to get
searched for recent events than archived journals. So in the datacow
case, you only get relief once they're rotated. It'd be nice to find a
decent, not necessarily perfect, way for them to not get so fragmented
in the first place. Or just defrag once a file has 16M of
non-contiguous extents.

Estimating extents though is another issue, especially with compression enabled.
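
Something like the sketch below is roughly what I have in mind
(hypothetical, my own thresholds, and definitely not how autodefrag
works today): walk the file with FIEMAP, add up the data sitting in
small extents that aren't physically contiguous with the previous one,
and kick off BTRFS_IOC_DEFRAG once that crosses ~16 MiB. Compression
is exactly where it gets murky, since btrfs caps compressed extents at
128 KiB of logical data and reports them as FIEMAP_EXTENT_ENCODED, so
a compressed file can look "fragmented" by this metric even when its
layout is fine.

/* Sketch: trigger a defrag once enough small, scattered extents
 * accumulate. BATCH, SMALL_EXT and TRIGGER are made-up knobs. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, struct fiemap_extent */
#include <linux/btrfs.h>    /* BTRFS_IOC_DEFRAG */

#define BATCH     256
#define SMALL_EXT (128ULL * 1024)         /* "small" extent threshold */
#define TRIGGER   (16ULL * 1024 * 1024)   /* ~16 MiB of small scattered data */

int main(int argc, char **argv)
{
        struct fiemap *fm;
        __u64 start = 0, prev_phys_end = 0, small_bytes = 0;
        int fd, done = 0;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDWR);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        fm = calloc(1, sizeof(*fm) + BATCH * sizeof(struct fiemap_extent));
        if (!fm)
                return 1;

        while (!done) {
                memset(fm, 0, sizeof(*fm));
                fm->fm_start = start;
                fm->fm_length = ~0ULL;          /* map through EOF */
                fm->fm_extent_count = BATCH;
                if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
                        perror("FIEMAP");
                        return 1;
                }
                if (fm->fm_mapped_extents == 0)
                        break;
                for (__u32 i = 0; i < fm->fm_mapped_extents; i++) {
                        struct fiemap_extent *e = &fm->fm_extents[i];

                        /* count small extents that don't continue the
                         * previous extent on disk */
                        if (e->fe_length <= SMALL_EXT &&
                            e->fe_physical != prev_phys_end)
                                small_bytes += e->fe_length;
                        prev_phys_end = e->fe_physical + e->fe_length;
                        start = e->fe_logical + e->fe_length;
                        if (e->fe_flags & FIEMAP_EXTENT_LAST)
                                done = 1;
                }
        }
        printf("%llu bytes in small non-contiguous extents\n",
               (unsigned long long)small_bytes);

        /* whole-file defrag; a real tool would probably use
         * BTRFS_IOC_DEFRAG_RANGE to pick a target extent size and
         * keep compression */
        if (small_bytes >= TRIGGER && ioctl(fd, BTRFS_IOC_DEFRAG, NULL) < 0)
                perror("BTRFS_IOC_DEFRAG");

        free(fm);
        close(fd);
        return 0;
}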

-- 
Chris Murphy
