On Tue, Feb 9, 2021 at 12:45 PM Goffredo Baroncelli <kreij...@inwind.it> wrote:
>
> On 2/9/21 8:01 PM, Chris Murphy wrote:
> > On Tue, Feb 9, 2021 at 11:13 AM Goffredo Baroncelli <kreij...@inwind.it> wrote:
> >>
> >> On 2/9/21 1:42 AM, Chris Murphy wrote:
> >>> Perhaps. Attach strace to journald before running journalctl --rotate, and then --rotate
> >>>
> >>> https://pastebin.com/UGihfCG9
> >>
> >> I looked at this strace.
> >>
> >> At line 115, ioctl(<BTRFS-DEFRAG>) is called.
> >> At line 123, ioctl(<BTRFS-DEFRAG>) is called.
> >>
> >> However, the two descriptors for which the defrag is invoked are never
> >> synced beforehand.
> >>
> >> I was expecting to see a sync (flushing the data to the platters) and then
> >> an ioctl(<BTRFS-DEFRAG>). Judging from the strace, that doesn't happen.
> >>
> >> I wrote a script (see below) which basically:
> >> - create a fragmented file
> >> - run filefrag on it
> >> - optionally sync the file             <-----
> >> - run btrfs fi defrag on it
> >> - run filefrag on it
> >>
> >> If I don't perform the sync, the defrag is ineffective. But if I sync the
> >> file BEFORE doing the defrag, I get only one extent.
> >> So my hypothesis is: the journal log files are badly defragmented because
> >> they are not synced beforehand.
> >> This could be tested quite easily by putting an fsync() before the
> >> ioctl(<BTRFS_DEFRAG>).
> >>
> >> Any thoughts?
> >
> > No idea. If it's a full sync then it could be expensive on either
> > slower devices or heavier workloads. On the one hand, there's no point
> > in doing an ineffective defrag, so maybe the defrag ioctl should just
> > do the sync first? On the other hand, this would effectively make the
> > defrag ioctl a full file system sync, which might be unexpected. It's a
> > set of tradeoffs and I don't know what the expectation is.
> >
> > What about fdatasync() on the journal file rather than a full sync?
>
> I tried an fsync(2) call, and the result is the same.
> Only after reading your reply did I realize that I had used sync(2) when
> I meant to use fsync(2).
>
> I updated my Python test code.
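
Goffredo's script itself isn't quoted above. As a rough, hypothetical stand-in for the steps he lists (not his code), here is a Python sketch. It assumes a test file on a btrfs mount, filefrag (e2fsprogs) and btrfs (btrfs-progs) in PATH, and enough privileges to defragment; the fragmentation pattern and all helper names are made up for illustration.

```python
import os
import re
import subprocess
import sys


def parse_extent_count(filefrag_output):
    # filefrag prints e.g. "/mnt/test/f: 12 extents found" or "f: 1 extent found"
    m = re.search(r"(\d+) extents? found", filefrag_output)
    if m is None:
        raise ValueError("unexpected filefrag output: %r" % filefrag_output)
    return int(m.group(1))


def extent_count(path):
    out = subprocess.run(["filefrag", path], check=True,
                         capture_output=True, text=True).stdout
    return parse_extent_count(out)


def make_fragmented_file(path, blocks=256, block_size=4096):
    # Hypothetical pattern: write the even-numbered blocks first, fsync-ing
    # after each one so it is written out on its own, then fill the odd
    # gaps WITHOUT fsync so they stay dirty in the page cache.
    data = b"x" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(0, blocks, 2):
            os.pwrite(fd, data, i * block_size)
            os.fsync(fd)
        for i in range(1, blocks, 2):
            os.pwrite(fd, data, i * block_size)
    finally:
        os.close(fd)


def run(path, do_fsync):
    make_fragmented_file(path)
    print("before defrag:", extent_count(path))
    if do_fsync:
        # The optional step: flush this one file, not the whole filesystem.
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    subprocess.run(["btrfs", "filesystem", "defragment", path], check=True)
    print("after defrag: ", extent_count(path))


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: script.py /mnt/btrfs/testfile [--fsync]
    run(sys.argv[1], "--fsync" in sys.argv[2:])
```

Running it once without and once with --fsync and comparing the "after defrag" counts is the comparison described above.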

OK, fsync should be the least costly of the three.

The three unique things about systemd-journald that might be factors:

* nodatacow file
* file fallocated in 8MB increments, multiple times, up to 128MB
* BTRFS_IOC_DEFRAG, whereas btrfs-progs uses BTRFS_IOC_DEFRAG_RANGE
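
The allocation pattern in the second point can be mimicked with posix_fallocate(). This is a sketch under assumptions: the 8MB step and 128MB cap come from the list above, grow_by_fallocate() is a hypothetical helper rather than journald's code, and reproducing the nodatacow part would additionally need `chattr +C` on the file while it is still empty.

```python
import os

MiB = 1024 * 1024


def grow_by_fallocate(path, step=8 * MiB, limit=128 * MiB):
    """Extend a file with posix_fallocate() in fixed-size steps until it
    reaches `limit` bytes. Returns the file size after each step."""
    sizes = []
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o640)
    try:
        size = os.fstat(fd).st_size
        while size < limit:
            length = min(step, limit - size)
            # Reserve [size, size + length) and extend the file size,
            # without writing any data into the new range.
            os.posix_fallocate(fd, size, length)
            size = os.fstat(fd).st_size
            sizes.append(size)
    finally:
        os.close(fd)
    return sizes
```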

So maybe it's all explained by the lack of fsync; I'm not sure. But the
commit that added this doesn't show any form of sync.

https://github.com/systemd/systemd/commit/f27a386430cc7a27ebd06899d93310fb3bd4cee7



-- 
Chris Murphy
