On 2015-09-16 15:04, Vincent Olivier wrote:

On Sep 16, 2015, at 2:22 PM, Austin S Hemmelgarn <ahferro...@gmail.com> wrote:

On 2015-09-16 12:51, Vincent Olivier wrote:
Hi,


On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn <ahferro...@gmail.com> wrote:

On 2015-09-16 10:43, M G Berberich wrote:
Hello,

Just for information: I stumbled upon a rant about btrfs performance:

  http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp
I read it too.
It is worth noting a few things that were done incorrectly in this testing:
1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly 
breaks the data integrity guarantees of _ALL_ filesystems, but especially so on 
COW filesystems like BTRFS.  With this off, you will have a much higher chance 
that a power loss will cause data loss.  It shouldn't be turned off unless you 
are also turning off write-caching in the hardware or know for certain that no 
write-reordering is done by the hardware (and almost all modern hardware does 
write-reordering for performance reasons).
But can the “nobarrier” mount option affect performance negatively on Btrfs 
(and not just data integrity)?
Using it improves performance for every filesystem on Linux that supports it.  
That does not mean it is _EVER_ a good idea to do so.  This mount option is one 
of the few things I will _NEVER_ personally provide support for, because it 
almost guarantees that you will lose data if the system dies unexpectedly (even 
if it's for a reason other than power loss).
OK fine. Let it be clearer then (on the Btrfs wiki): nobarrier is an absolute 
no go. Case closed.
From https://btrfs.wiki.kernel.org/index.php/Mount_options:
NOTE: Using this option greatly increases the chances of you experiencing data corruption during a power failure situation. This means full file-system corruption, and not just losing or corrupting data that was being written during a power cut or kernel panic.

It could be a bit clearer, but it's pretty well spelled out.
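As an aside, for anyone auditing their own systems: barriers are on by default, 
so the only thing to look for is an explicit 'nobarrier' in /etc/fstab or in 
the output of mount(8).  A minimal sketch (the device and mount point here are 
made up):

    # Bad: drops the flush/FUA ordering guarantees, so an unexpected power loss
    # can corrupt the whole filesystem, not just in-flight data
    /dev/sdb1  /data  btrfs  defaults,nobarrier  0  0

    # Fine: just leave the option out, barriers are the default
    /dev/sdb1  /data  btrfs  defaults  0  0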
2. He provides no comparison with any other filesystem with TRIM support turned 
on (it is very likely that all filesystems would demonstrate such performance 
drops; based on that graph, it looks like the device doesn't support 
asynchronous TRIM commands).
I think that, going by the text surrounding the only graph that mentions TRIM, 
he means that this exact same test on the other filesystems he benchmarked 
yielded much better results.
Possibly, but there are also known issues with TRIM/DISCARD on BTRFS in 4.0.  
And his claim is still baseless unless he actually provides a reference for it.
Same as above: TRIM/DISCARD officially not recommended in production until 
further notice?
TRIM/DISCARD do work; they just don't work to the degree they are expected to.  There are some cases where BTRFS doesn't issue a discard when it should, and fstrim doesn't properly trim everything.
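To make that concrete, the two usual ways of trimming a btrfs SSD are the 
'discard' mount option (inline discards issued as extents are freed, which is 
presumably what the benchmark exercised) and running fstrim periodically.  A 
quick sketch, with the device and mount point as examples only:

    # inline discard on every extent free
    mount -o discard /dev/sdc1 /srv/db

    # or leave discard off and trim on a schedule (e.g. from cron)
    fstrim -v /srv/db

Given the issues above, periodic fstrim is the less surprising option right 
now, with the caveat already mentioned that it doesn't yet trim everything it 
should.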
3. He's testing a workload that is a known and documented problem for BTRFS, 
and claiming that this means it isn't worth considering as a general-usage 
filesystem.  Most people don't run RDBMS servers on their systems, and as such, 
that workload is not worth considering for most people.
Apparently, RDBMS workloads being a problem on Btrfs is neither known nor 
documented well enough (and he's right about how that contrasts with publicly 
claiming that Btrfs is production ready).
OK, maybe not documented, but RDBMS falls under 'Large files with highly random 
access patterns and heavy RMW usage', which is a known issue for BTRFS, and 
also applies to VM images.
This guy is no idiot. If it wasn't clear enough for him, it's not clear enough, 
period.
From https://btrfs.wiki.kernel.org/index.php/Gotchas
Fragmentation
Files with a lot of random writes can become heavily fragmented (10000+ extents), causing thrashing on HDDs and excessive multi-second spikes of CPU load on systems with an SSD or a large amount of RAM. On servers and workstations this affects databases and virtual machine images.
The nodatacow mount option may be of use here, with associated gotchas.
On desktops this primarily affects application databases (including Firefox and Chromium profiles, GNOME Zeitgeist, Ubuntu Desktop Couch, Banshee, and Evolution's datastore.) Workarounds include manually defragmenting your home directory using btrfs fi defragment. Auto-defragment (mount option autodefrag) should solve this problem in 3.0. Symptoms include btrfs-transacti and btrfs-endio-wri taking up a lot of CPU time (in spikes, possibly triggered by syncs). You can use filefrag to locate heavily fragmented files (may not work correctly with compression).
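To illustrate the workarounds the wiki is pointing at (the paths below are just 
examples): chattr +C marks a directory so that files created in it afterwards 
skip COW, and filefrag / btrfs filesystem defragment let you inspect and repair 
files that are already fragmented:

    # files created under this directory from now on will be nodatacow
    chattr +C /var/lib/postgresql

    # check how many extents an existing file has
    filefrag /var/lib/postgresql/path/to/some-relation-file

    # defragment it in place (note: this currently un-shares extents that were
    # shared with snapshots or reflinked copies)
    btrfs filesystem defragment /var/lib/postgresql/path/to/some-relation-file

Keep in mind that +C only takes effect for files created after the flag is set, 
so existing database files need to be copied back into place to pick it up.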
His points about the degree of performance jitter are valid, however, as are 
the complaints of apparent CPU-intensive stalls in the BTRFS code, and I 
occasionally see both on my own systems.
Me too. My two cents is that focusing on improving performance for 
Btrfs-optimal use cases is much more interesting than bringing in new features 
like automatically turning COW off for RDBMS usage or debugging TRIM support.
It depends. BTRFS is still not feature-complete relative to the overall intent 
when it was started (raid56 and qgroups being the two big issues at the 
moment), and attempting to optimize things tends to introduce bugs, which we 
have quite enough of already without people adding more (and they still seem to 
be breeding like rabbits).
I would just like a clear statement from a dev-lead saying : until we are 
feature-complete (with a finite list of features to complete) the focus will be 
on feature-completion and not optimizing already-implemented features. Ideally 
with an ETA on when optimization will be more of a priority than it is today.
As of right now, the list as far as I know is (in no particular order):
* working raid5/6
* n-copy replication (i.e., replication with three or more copies)
* qgroups
* improved read-balancing (technically an optimization)
* proper swap file support
* better random-write performance (again, optimization)
* online fsck (not scrub, but actual fsck)
* in-band data de-duplication
* various code cleanups
* many more things listed on the wiki
That said, my systems (which are usually doing mostly CPU or memory bound 
tasks, and not I/O bound like the aforementioned benchmarks were testing) run 
no slower than they did with ext4 as the main filesystem, and in some cases 
work much faster (even after averaging out the jitter in performance).  Based 
on this, I wouldn't advocate it for most server usage (except possibly as the 
root filesystem), but it does work very well for most desktop usage patterns 
and a number of HPC usage patterns as well.
See, this is interesting: I’d rather have a super-fast and discardable SSD 
F2FS/ext4 root with a large Btrfs RAID for (NAS) server usage. Does your 
non-advocacy of Btrfs for server usage include a <10-user Samba NAS?
If it's light usage, and you keep the software running on it up to date (and make sure you have other backups), it should do just fine. By server usage, I meant large-scale deployments with very big volumes of critical data.
Are more details about the Facebook deployment going to be available soon? I’m 
very curious about this.
I really have no idea about this.


