On 06/16/2014 03:52 PM, Martin wrote:
> On 16/06/14 17:05, Josef Bacik wrote:
>>
>> On 06/16/2014 03:14 AM, Lennart Poettering wrote:
>>> On Mon, 16.06.14 10:17, Russell Coker (russ...@coker.com.au) wrote:
>>>
>>>>> I am not really following why this trips up btrfs, though. I am
>>>>> not sure I understand why this breaks btrfs COW behaviour. I mean,
>
>>>> I don't believe that fallocate() makes any difference to
>>>> fragmentation on BTRFS. Blocks will be allocated when writes occur,
>>>> so regardless of an fallocate() call the usage pattern in
>>>> systemd-journald will cause fragmentation.
>>>
>>> journald's write pattern looks something like this: append something to
>>> the end, make sure it is written, then update a few offsets stored at
>>> the beginning of the file to point to the newly appended data. This is
>>> of course not easy to handle for COW file systems. But then again, it's
>>> probably not too different from the access patterns of other database or
>>> database-like engines...
>
> Even though this appears to be a problem case for btrfs/COW, is there a
> more favourable write/access sequence, easily implemented, that works
> well both for ext4-like filesystems /and/ for COW filesystems?
>
> Database-like writing is known to be 'difficult' for filesystems: can a
> data log be a simpler case?
>
>
>> Was waiting for you to show up before I said anything, since most
>> systemd-related emails always devolve into how evil you are rather than
>> what is actually happening.
>
> Ouch! Hope you two know each other!! :-P :-)
>
>
> [...]
>> since we shouldn't be fragmenting this badly.
>>
>> Like I said, what you guys are doing is fine; if btrfs falls on its face
>> then it's not your fault. I'd just like an exact idea of when you guys
>> are fsync'ing so I can replicate it in a smaller way. Thanks,
>
> Good if COW can be so resilient.
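The append-then-update-header pattern Lennart describes can be sketched roughly
as follows. This is a hypothetical illustration, not journald's actual code:
HEADER_SIZE and the record layout are invented for the example; only the I/O
sequence (append, fsync, rewrite offset 0, fsync) mirrors the description above.

```python
import os

HEADER_SIZE = 16  # hypothetical fixed-size header holding the offsets

def init_log(path):
    # Reserve a header region at the start of the file.
    with open(path, "wb") as f:
        f.write(b"\0" * HEADER_SIZE)

def append_record(path, record, new_header):
    """Append a record, fsync, then rewrite the header at offset 0 and
    fsync again -- the two-step pattern that is hard on COW filesystems,
    since both the head and the tail of the file are dirtied (and hence
    rewritten to new extents) on every append."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.lseek(fd, 0, os.SEEK_END)
        os.write(fd, record)
        os.fsync(fd)                  # ensure the appended data is durable
        os.pwrite(fd, new_header, 0)  # then point the header at it
        os.fsync(fd)                  # and make the header update durable
    finally:
        os.close(fd)
```

On a COW filesystem each such cycle rewrites the first block of the file to a
new location, which is one plausible source of the fragmentation discussed in
this thread.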
> I have about 2GBytes of data logging files, and I must defrag those as
> part of my backups to stop the system from fragmenting to a halt (I use
> "cp -a" to copy the files to a new area, which defragments them, and then
> restart the data logger software on the copy).
>
>
> Random thoughts:
>
> Would using a second small file just for the mmap-ed pointers help avoid
> the repeated rewriting of random offsets in the log file that causes
> excessive fragmentation?
>
> Align the data writes to 16kByte or 64kByte boundaries/chunks?
>
> Are mmap-ed files a similar problem to using a swap file, and so should
> the same "btrfs file swap" code be used for both?
>
>
> Not looked over the code, so these are all random guesses...
>
> Regards,
> Martin

Just a thought, partly inspired by the mention of the swap code: has anyone
tried making the file NOCOW and pre-allocating it to the maximum journal
size? A similar approach has seemed to help on my systems with generic log
files (I keep debug-level logs from almost everything, so I end up with very
active log files with ridiculous numbers of fragments if I don't pre-allocate
them and mark them NOCOW). I don't know for certain how BTRFS handles appends
to NOCOW files, but I would be willing to bet that it ends up with a new
fragment for each filesystem block worth of space allocated.
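The NOCOW-plus-preallocate idea in the last paragraph can be done from a
script with `chattr +C` and `fallocate`, or programmatically via the
FS_IOC_GETFLAGS/FS_IOC_SETFLAGS ioctls. A rough sketch follows; the ioctl
numbers are the x86-64 values from linux/fs.h, the flag only has an effect on
btrfs, and it must be set while the file is still empty, so any ioctl failure
on other filesystems is simply skipped:

```python
import fcntl
import os

# x86-64 values from linux/fs.h
FS_IOC_GETFLAGS = 0x80086601
FS_IOC_SETFLAGS = 0x40086602
FS_NOCOW_FL = 0x00800000  # the chattr +C / "No_COW" attribute

def preallocate_nocow(path, size):
    """Create an empty file, try to mark it NOCOW, then preallocate it.

    NOCOW must be set before any data is written; on filesystems that do
    not support the flag the ioctl step is silently skipped."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o640)
    try:
        try:
            buf = bytearray(8)
            fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf)
            flags = int.from_bytes(buf, "little") | FS_NOCOW_FL
            fcntl.ioctl(fd, FS_IOC_SETFLAGS, flags.to_bytes(8, "little"))
        except OSError:
            pass  # not btrfs (or flag unsupported); NOCOW is a no-op here
        os.posix_fallocate(fd, 0, size)  # reserve the full journal size
    finally:
        os.close(fd)
```

From the shell, the equivalent is `touch FILE; chattr +C FILE;
fallocate -l SIZE FILE`, again with `chattr +C` applied before any data is
written to the file.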