Re: R: Re: Slow startup of systemd-journal on BTRFS

2014-06-13 Thread Goffredo Baroncelli
Hi Dave

On 06/13/2014 01:24 AM, Dave Chinner wrote:
> On Thu, Jun 12, 2014 at 12:37:13PM +, Duncan wrote:
>> Goffredo Baroncelli kreij...@libero.it posted on Thu, 12 Jun 2014
>> 13:13:26 +0200 as excerpted:
>>
>>>> systemd has a very stupid journal write pattern. It checks if there is
>>>> space in the file for the write, and if not it fallocates the small
>>>> amount of space it needs (it does *4 byte* fallocate calls!) and then
>>>> does the write to it.  All this does is fragment the crap out of the log
>>>> files because the filesystems cannot optimise the allocation patterns.
>>>
>>> I checked the code, and to me it seems that the fallocate() are done in
>>> FILE_SIZE_INCREASE unit (actually 8MB).
>>
>> FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think
>> actually pretty much equally bad without NOCOW set on the file.
>
> So maybe it's been fixed in systemd since the last time I looked.
> Yup:
>
> http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journal-file.c?id=eda4b58b50509dc8ad0428a46e20f6c5cf516d58
>
> The reason it was changed? To save a syscall per append, not to
> prevent fragmentation of the file, which was the problem everyone
> was complaining about...

Thanks for pointing that out. However, I am performing my tests on a Fedora 20
system with systemd-208, which already seems to have this change.
 
>> Why?  Because btrfs data blocks are 4 KiB.  With COW, the effect for
>> either 4 byte or 8 MiB file allocations is going to end up being the
>> same, forcing (repeated until full) rewrite of each 4 KiB block into its
>> own extent.


I am reaching the conclusion that fallocate() is not the problem. Each
fallocate() call grows the file by about 8 MB, which is enough for a fair
amount of logging, so it is not called very often.
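
For what it's worth, this allocation pattern can be simulated from the shell;
a rough sketch, assuming util-linux fallocate(1) and that /var/tmp is on btrfs
(the file name and sizes are mine, not journald's):

  f=/var/tmp/fallocate-test
  rm -f "$f"
  for i in 0 1 2 3; do
      # Preallocate the next 8 MiB, as journald's FILE_SIZE_INCREASE does,
      fallocate -o $((i * 8 * 1024 * 1024)) -l $((8 * 1024 * 1024)) "$f"
      # then fill it with 4 KiB writes (2048 blocks = 8 MiB).
      dd if=/dev/zero of="$f" bs=4k count=2048 seek=$((i * 2048)) \
         conv=notrunc status=none
  done
  filefrag "$f"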

I have to investigate further what happens when the logs are copied from /run
to /var/log/journal: this is when journald seems to slow everything down.
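
(For anyone wanting to reproduce this: systemd-journald(8) documents that
SIGUSR1 asks journald to flush the journal from /run to /var, so the copy can
be triggered by hand; a sketch:)

  # Ask journald to flush /run/log/journal to /var/log/journal now.
  kill -USR1 "$(pidof systemd-journald)"
  # Then inspect the freshly written file.
  filefrag /var/log/journal/*/system.journal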

I have prepared a PC which reboots continuously; I am collecting the time
required to finish the boot vs. the fragmentation of the system.journal file
vs. the number of boots. The results are dramatic: after 20 reboots, the boot
time increases by 20-30 seconds. Defragmenting system.journal brings the boot
time back to its original value, but after another 20 reboots it again takes
20-30 seconds more.
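
One way to collect these numbers at each boot; a sketch, not necessarily the
exact script used, with the log path chosen arbitrarily:

  # Append one record per boot: boot time plus journal extent count.
  j=$(echo /var/log/journal/*/system.journal)
  {
      systemd-analyze           # reports the time spent to boot
      filefrag "$j"             # reports the extent count
  } >> /var/log/boot-frag.log

  # The "defrag" referred to above is btrfs' online defragmenter:
  btrfs filesystem defragment "$j"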

It is a slow PC, but I saw the same behaviour on a more modern PC as well (an
i5 with 8 GB of RAM).

In both cases the disk is a mechanical one...

 
> And that's now a btrfs problem :/

Are you sure?

ghigo@venice:/var/log$ sudo filefrag messages
messages: 29 extents found

ghigo@venice:/var/log$ sudo filefrag journal/*/system.journal
journal/41d686199835445395ac629d576dfcb9/system.journal: 1378 extents found

So the old rsyslog creates files with far fewer fragments. BTRFS (and it seems
XFS too) certainly exposes this problem more than other filesystems do, but
systemd also seems to create a lot of extents.

BR
G.Baroncelli



 
> Cheers,
>
> Dave.
 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it)
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: R: Re: Slow startup of systemd-journal on BTRFS

2014-06-12 Thread Duncan
Goffredo Baroncelli kreij...@libero.it posted on Thu, 12 Jun 2014
13:13:26 +0200 as excerpted:

>> systemd has a very stupid journal write pattern. It checks if there is
>> space in the file for the write, and if not it fallocates the small
>> amount of space it needs (it does *4 byte* fallocate calls!) and then
>> does the write to it.  All this does is fragment the crap out of the log
>> files because the filesystems cannot optimise the allocation patterns.
>
> I checked the code, and to me it seems that the fallocate() are done in
> FILE_SIZE_INCREASE unit (actually 8MB).

FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think 
actually pretty much equally bad without NOCOW set on the file.

Why?  Because btrfs data blocks are 4 KiB.  With COW, the effect for 
either 4 byte or 8 MiB file allocations is going to end up being the 
same, forcing (repeated until full) rewrite of each 4 KiB block into its 
own extent.
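
Setting NOCOW on the journal directory is the usual workaround; a sketch (the
machine-id path is a placeholder, and it only helps files created after the
attribute is set, while also disabling btrfs checksums for them):

  # Set NOCOW on the journal directory; new files inherit the attribute.
  chattr +C /var/log/journal/<machine-id>/
  # Existing journal files keep COW; rotate or recreate them to benefit.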

Turning off the fallocate should allow btrfs to at least consolidate a 
bit, tho to the extent that multiple 4 KiB blocks cannot be written, 
repeated fsync will still cause issues.
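
A quick way to see this effect; a sketch, assuming /var/tmp is on a
COW-enabled btrfs mount:

  f=/var/tmp/fsync-test
  rm -f "$f"
  for i in $(seq 0 255); do
      # One 4 KiB append, fsynced before the next write arrives.
      dd if=/dev/zero of="$f" bs=4k count=1 seek="$i" \
         conv=notrunc,fsync status=none
  done
  filefrag "$f"    # extent count tends toward the number of appends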

80-100 MiB logs (size mentioned in another reply) should be reasonably 
well handled by btrfs autodefrag, however, if it's turned on.  I'd be 
worried if sizes were > 256 MiB and certainly as sizes approached a GiB, 
but it should handle 80-100 MiB just fine.
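
For anyone wanting to try it, autodefrag is a btrfs mount option; an example
(mine; adjust the mount point to wherever /var/log/journal lives):

  # Enable autodefrag on the filesystem that holds the journal.
  mount -o remount,autodefrag /
  # Or persistently in /etc/fstab:
  #   UUID=<fs-uuid>  /  btrfs  defaults,autodefrag  0 0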

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman
