Re: FYIO: A rant about btrfs
On 18 September 2015 at 14:10, Austin S Hemmelgarn wrote:
> On 2015-09-17 10:52, Aneurin Price wrote:
>> [...]
> I should qualify this further: in particular I meant using ZFS on Linux
> (not *BSD; they did an amazing job WRT stability), and actually taking
> advantage of the volume management (i.e. not just storing files on it,
> but also using zvols). A better way to put it is that I've never seen a
> truly stable system using zfsonlinux with less than 16G of RAM which is
> using it for volume management as well as file storage.

That's interesting, and I think it supports the 'large L2ARC on a pool
with many/large zvols' theory.

Apologies that this is now so off-topic, but I really wish this were more
widely known: I've seen major problems using an L2ARC with zvols, and from
reading various reports I get the impression that a lot of other people
have too, though they may not always have diagnosed it. It's easy to get
the general impression that an L2ARC is a great thing to have, something
that can improve your read throughput with minimal downside; in fact the
downside can be pretty large, and it is much larger by default with
volumes than with filesystems.

Whenever a chunk of data is stored in the L2ARC, a header for that data is
stored in the ARC, where it must remain in memory (I'm not sure whether
it's pinned to stop it being swapped out; I think it is, but if you're at
the point where that makes a difference you're probably already having a
bad day). For a filesystem, the header corresponds to a record, which is
by default a variable size up to 128k; for a volume, the header
corresponds to a block, which is by default 8k. Assuming that your records
are mostly close to their maximum size, this means that the RAM overhead
of having a given amount of data in L2ARC is *16x* higher for a volume
than for a filesystem.
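To put rough numbers on that, here is a back-of-envelope sketch. The ~180-byte per-buffer header size is an assumption (the real figure varies across ZFS versions), but the 16x ratio follows purely from the default record and block sizes:

```shell
# Back-of-envelope ARC header overhead for a full L2ARC device.
# The 180-byte header size is an assumption; defaults of 128k records
# (filesystems) and 8k blocks (zvols) are the ZFS defaults of the time.
l2arc=$((200 * 1024 * 1024 * 1024))  # 200 GiB cache device, filled
hdr=180                              # assumed bytes of ARC header per buffer

fs_bufs=$((l2arc / (128 * 1024)))    # filesystem: one header per 128k record
zvol_bufs=$((l2arc / (8 * 1024)))    # zvol: one header per 8k block

echo "filesystem headers: $((fs_bufs * hdr / 1048576)) MiB of RAM"
echo "zvol headers:       $((zvol_bufs * hdr / 1048576)) MiB of RAM"
echo "overhead ratio:     $((zvol_bufs / fs_bufs))x"
```

With these assumed numbers, the same 200 GiB of cached data costs about 281 MiB of ARC headers for a filesystem but about 4500 MiB for a zvol, a 16x difference.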
Unfortunately, if you're using volumes it's probably because you're
running VMs or a dedicated storage server, and in those circumstances we
naturally think "yes, some caching sounds like a great idea", and then
proceed to wildly underestimate the amount of memory required.
Consequently, as the L2ARC fills, ZFS fills its memory allowance with data
that is basically pure overhead, often blowing past its configured limit
without getting much mileage out of it, and becoming barely responsive.
The effect is that the system seems fine when it first starts, then
becomes slower and slower over time until it's unusable. Practically
speaking, there's not much difference between a storage server that
spends three minutes thrashing in response to each read and one that's
locked up hard.

Basically, having an L2ARC of non-trivial size without a lot of RAM to
back it up is a bad idea, so I think there's a good chance that the
problems you've observed/heard about may well be due to inappropriate
L2ARC usage by people who didn't fully appreciate the memory requirements.

Nye
Re: FYIO: A rant about btrfs
On 09/16/2015 10:43 AM, M G Berberich wrote:
> Hello, just for information. I stumbled across a rant about btrfs
> performance:

Found this through reddit. I'm reproducing some of his issues
artificially; he's definitely run into some real bugs that aren't related
to "databases suck on btrfs."

Thanks,

Josef
Re: FYIO: A rant about btrfs
On 2015-09-17 20:34, Duncan wrote:
> Zygo Blaxell posted on Wed, 16 Sep 2015 18:08:56 -0400 as excerpted:
>> [...]
>> In other words, nobarrier is for a little better performance when you
>> already want to _intentionally_ destroy your filesystem on power
>> failure.
>
> Very good explanation of why it's useful to have such an otherwise
> destructive mount option even available in the first place. Thanks! =:^)

The other reason, as has been pointed out in a different sub-thread, is
that if you have a guaranteed-good hardware RAID controller, which has a
known-good built-in non-volatile write cache, and you turn off
write reordering, and you turn off the write caches on all the connected
hard drives, then it is relatively safe. Of course, the chance of most
people actually meeting all those conditions is pretty slim.
Re: FYIO: A rant about btrfs
On 2015-09-17 10:52, Aneurin Price wrote:
> On 16 September 2015 at 20:21, Austin S Hemmelgarn wrote:
>> [...]
> [...]
> Bear in mind that these are Linux machines, and zfsonlinux's memory
> management is known to be inferior to ZFS on Solaris and FreeBSD
> (because it does not integrate with the page cache and instead grabs a
> [configurable] chunk of memory, and doesn't always do a great job of
> dropping it in response to memory pressure).

I should qualify this further: in particular I meant using ZFS on Linux
(not *BSD; they did an amazing job WRT stability), and actually taking
advantage of the volume management (i.e. not just storing files on it,
but also using zvols).
A better way to put it is that I've never seen a truly stable system
using zfsonlinux with less than 16G of RAM which is using it for volume
management as well as file storage.
Re: FYIO: A rant about btrfs
On 2015-09-17 11:57, Martin Steigerwald wrote:
> On Wednesday, 16 September 2015, 23:29:30 CEST, Hugo Mills wrote:
>> [...]
> The official recommendation for XFS differs from that:
>
> Q. Should barriers be enabled with storage which has a persistent write
> cache?
>
> Many hardware RAIDs have a persistent write cache which preserves data
> across power failure, interface resets, system crashes, etc.
> Using write barriers in this instance is not recommended and will in
> fact lower performance. Therefore, it is recommended to turn off the
> barrier support and mount the filesystem with "nobarrier", assuming
> your RAID controller is infallible and not resetting randomly like some
> common ones do. But take care about the hard disk write cache, which
> should be off.
>
> http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F

There's a difference there still: XFS isn't quite as dependent on
ordering as BTRFS is, and they're also giving some other strict
requirements, namely the hard-disk write cache being off (which I see a
rather large number of people ignore) and a high-end RAID controller.
Re: FYIO: A rant about btrfs
Zygo Blaxell posted on Wed, 16 Sep 2015 18:08:56 -0400 as excerpted:
> On Wed, Sep 16, 2015 at 03:04:38PM -0400, Vincent Olivier wrote:
>> OK fine. Let it be clearer then (on the Btrfs wiki): nobarrier is an
>> absolute no go. Case closed.
>
> [...]
>
> In other words, nobarrier is for a little better performance when you
> already want to _intentionally_ destroy your filesystem on power
> failure.

Very good explanation of why it's useful to have such an otherwise
destructive mount option even available in the first place. Thanks! =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: FYIO: A rant about btrfs
On Wednesday, 16 September 2015, 23:29:30 CEST, Hugo Mills wrote:
> [...]
> Any complex data structure, like a filesystem, is going to rely on some
> kind of ordering guarantees, somewhere in its structure. (The ordering
> might be strict, with a global clock, or barrier-based, or lattice-like,
> as for example a vector clock, but there's going to be _some_ concept of
> order.) nobarrier allows the FS to ignore those guarantees, and even
> without knowing anything about the FS at all, doing so is a big red
> DANGER flag.

The official recommendation for XFS differs from that:

  Q. Should barriers be enabled with storage which has a persistent write
  cache?

  Many hardware RAIDs have a persistent write cache which preserves data
  across power failure, interface resets, system crashes, etc.
  Using write barriers in this instance is not recommended and will in
  fact lower performance. Therefore, it is recommended to turn off the
  barrier support and mount the filesystem with "nobarrier", assuming
  your RAID controller is infallible and not resetting randomly like some
  common ones do. But take care about the hard disk write cache, which
  should be off.

http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F

Thanks,
-- 
Martin
Re: FYIO: A rant about btrfs
On 16 September 2015 at 20:21, Austin S Hemmelgarn wrote:
> ZFS has been around for much longer; it's been mature and feature
> complete for more than a decade, and has had a long time to improve
> performance-wise. It is important to note, though, that on low-end
> hardware BTRFS can (and often does, in my experience) perform better
> than ZFS, because ZFS is a serious resource hog (I have yet to see a
> stable ZFS deployment with less than 16G of RAM, even with all the
> fancy features turned off).

If you have a real example of ZFS becoming unstable with, say, 4 or 8GB
of memory, that doesn't involve attempting deduplication (which I guess
is what you mean by 'all the fancy features') on a many-TB pool, I'd be
interested to hear about it. (Psychic debugger says 'possibly somebody
trying to use a large L2ARC on a pool with many/large zvols'.)

My home fileserver has been running ZFS for about 5 years, on a system
maxed out at 4GB RAM, currently up to ~9TB of data. The only stability
problems I ever had were towards the beginning, when I was using zfs-fuse
because zfsonlinux wasn't ready then *and* I was trying out
deduplication.

I have a couple of work machines with 2GB RAM and pools currently around
2.5TB full; no problems with these either in the couple of years they've
been in use, though granted these are lightly loaded machines, since
what they mostly do is receive backup streams.

Bear in mind that these are Linux machines, and zfsonlinux's memory
management is known to be inferior to ZFS on Solaris and FreeBSD
(because it does not integrate with the page cache and instead grabs a
[configurable] chunk of memory, and doesn't always do a great job of
dropping it in response to memory pressure).
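For reference, the "[configurable] chunk of memory" is the ARC size cap, which zfsonlinux exposes as the `zfs_arc_max` module parameter. A sketch of capping it (the 1 GiB value is just an example, not a recommendation):

```shell
# Cap the ZFS-on-Linux ARC at 1 GiB (value in bytes; example value only).
# Persistent: applied the next time the zfs module is loaded.
echo "options zfs zfs_arc_max=1073741824" >> /etc/modprobe.d/zfs.conf

# Or at runtime, if the module is already loaded:
echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max
```

Note that, as discussed above, zfsonlinux has not always respected this limit tightly under pressure, so it is a cap on intent more than a hard guarantee.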
Re: FYIO: A rant about btrfs
On 2015-09-16 19:31, Hugo Mills wrote:
> On Wed, Sep 16, 2015 at 03:21:26PM -0400, Austin S Hemmelgarn wrote:
>> On 2015-09-16 12:45, Martin Tippmann wrote:
>>> [...]
>> Performance is on the roadmap, but the roadmap is notoriously
>> short-sighted when it comes to the time-frame for completion of
>> anything. You have to understand also that the focus in BTRFS has been
>> more on data safety than performance, because that's the intended
>> niche, and the area most people look to ZFS for.
>
> Wait... there's a roadmap? ;)

Yeah, maybe it's better to say that there's a directed graph of feature
interdependence. I was just basing my statement on the presence of a list
of project ideas on the wiki. :)
Re: FYIO: A rant about btrfs
On Wed, Sep 16, 2015 at 12:45 PM, Martin Tippmann wrote:
> From reading the list I understand that btrfs is still very much a work
> in progress and performance is not a top priority at this stage, but I
> don't see why it shouldn't perform at least as well as ZFS/F2FS on the
> same workloads. Is looking at performance problems on the development
> roadmap?

My sense is that the sufferings in comparison to ZFS just represent a
lack of maturity; there just hasn't been as much focus on performance.
I'm not aware of any fundamental design issues which are likely to make
btrfs perform worse than ZFS in the long term.

F2FS is a fundamentally different beast. It is a log-based filesystem as
far as I'm aware, and on flash that gives it some substantial advantages,
but it doesn't support snapshotting etc. as far as I'm aware. I'm sure
that in the long term some operations are just going to be faster on F2FS
no matter what, due to its design, and other operations will always be
slower on F2FS.

To draw an analogy, imagine you have a 1TB ext4 filesystem and a 1TB
btrfs filesystem. On each you create a 900GB file, and then proceed to
make millions of internal writes all over it. The ext4 filesystem is just
going to completely outperform btrfs at this job, and I suspect it would
outperform ZFS as well. For such a use case you don't really even need a
filesystem; you might as well just be reading/writing random blocks right
off the disk, and ext4 is pretty close to that in behavior when it comes
to internal file modifications. The COW filesystems are going to be
fragmenting the living daylights out of the file and its metadata.

Of course, if you pulled the plug in the middle of one of those
operations, the COW filesystems are more likely to end up in a sane state
if you care about the order of file modifications, and if you're doing
this on RAID, both ZFS and btrfs will be immune to any write-hole issues.
Also, if you make reflink copies of large files on a btrfs filesystem, it
will perform MUCH better than doing the equivalent on ext4 (which
requires copying all the data, at a cost of both time and space).

In the end you have to look at your application, and not just performance
stats. There are tradeoffs. Personally, I've had enough hard drive
failures that btrfs is worth it to me just for the assurance that when
something goes wrong, the filesystem knows what is good and what isn't.
As drives get bigger this becomes more and more important.

-- 
Rich
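The reflink copy mentioned above is just cp(1) with a flag; with `--reflink=auto` it falls back to a normal copy on filesystems (like ext4 at the time) that can't share extents:

```shell
# Reflink (COW) copy: on btrfs this shares extents with the source and
# completes almost instantly regardless of file size. --reflink=auto
# falls back to a plain data copy where reflinks aren't supported.
workdir=$(mktemp -d) && cd "$workdir"
dd if=/dev/urandom of=image.raw bs=1M count=8 2>/dev/null
cp --reflink=auto image.raw clone.raw
cmp image.raw clone.raw && echo "contents identical"
```

On btrfs, the clone initially consumes no extra data space; blocks are only duplicated as one copy or the other is modified.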
Re: FYIO: A rant about btrfs
On Wed, Sep 16, 2015 at 03:21:26PM -0400, Austin S Hemmelgarn wrote:
> On 2015-09-16 12:45, Martin Tippmann wrote:
>> [...]
>> From reading the list I understand that btrfs is still very much a work
>> in progress and performance is not a top priority at this stage, but I
>> don't see why it shouldn't perform at least as well as ZFS/F2FS on the
>> same workloads. Is looking at performance problems on the development
>> roadmap?
> Performance is on the roadmap, but the roadmap is notoriously
> short-sighted when it comes to the time-frame for completion of
> anything. You have to understand also that the focus in BTRFS has been
> more on data safety than performance, because that's the intended
> niche, and the area most people look to ZFS for.

Wait... there's a roadmap? ;)

Hugo.

-- 
Hugo Mills             | Our so-called leaders speak
hugo@... carfax.org.uk | with words they try to jail ya
http://carfax.org.uk/  | They subjugate the meek
PGP: E2AB1DE4          | but it's the rhetoric of failure.
                       |     The Police
Re: FYIO: A rant about btrfs
On Wed, Sep 16, 2015 at 03:08:43PM -0400, Austin S Hemmelgarn wrote:
> On 2015-09-16 12:25, Zia Nayamuth wrote:
>> Some response to your criticism:
>>
>> 1. How would that hole fare with a fully battery-backed/flash-backed
>> path (a battery-backed or flash-backed HBA with disks with full
>> power-loss protection, like the Intel S3500)? In such a situation
>> (quite commonplace in server-land), power loss should not cause any
>> data loss, since all data in the cache is guaranteed to be committed
>> to non-volatile memory at some point (whether such assurances may be
>> trusted is another matter entirely, though, and well outside the scope
>> of this discussion).
> It's not as much of an issue if you have full power loss protection
> (assuming of course it works),

"Power loss" is shorthand for a number of issues, only one of which is
related to electrons ceasing to vibrate through the wires. You can have a
power-loss-like event when your kernel crashes hard. No UPS will help you
there.

> but even then, having write barriers turned off is still not as safe as
> having them turned on. Most of the time when I've tried testing with
> 'nobarrier' (not just on BTRFS but on ext* as well), I had just as many
> issues with data loss when the system crashed as when it lost power
> (simulated via killing the virtual machine). Both journaling and COW
> filesystems need to ensure ordering of certain write operations to be
> able to maintain consistency. For example, the new/updated data blocks
> need to be on disk before the metadata is updated to point to them,
> otherwise your database can end up corrupted.

Indeed. The barriers are an ordering condition. The FS relies on (i.e.
*requires*) that ordering condition in order to be truly consistent.
Running with "nobarrier" is a very strong signal that you really don't
care about the data on the FS.
This is not a case of me simply believing that, because I've been using
btrfs for so long, I've got used to the peculiarities. The first time I
heard about the nobarrier option, something like 6 years ago when I was
first using btrfs, I thought "that's got to be a really silly idea". Any
complex data structure, like a filesystem, is going to rely on some kind
of ordering guarantees somewhere in its structure. (The ordering might be
strict, with a global clock, or barrier-based, or lattice-like, as for
example a vector clock, but there's going to be _some_ concept of order.)
nobarrier allows the FS to ignore those guarantees, and even without
knowing anything about the FS at all, doing so is a big red DANGER flag.

Hugo.

>> 2. Fair point. I'd like to know his hardware, given how strongly
>> hardware can influence things.
>>
>> 3. It's pretty obvious that the author of that blog is specifically
>> targeting OLTP performance (explicit statement in the intro, choice of
>> benchmark, name and focus of blog), not the common case, and he even
>> states that in the first two paragraphs of his conclusion. The focus
>> is somewhat less clear in said conclusion, namely: is he truly talking
>> about general-purpose use, or is he talking about general-purpose OLTP
>> use?
> My takeaway was that he intended 'general purpose use' to mean generic
> everyday usage across a wide variety of systems; he was not
> particularly specific about it, however.
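The ordering condition described in the exchange above (data durable on disk before the metadata that points at it) has a familiar userspace analogue: the write-then-rename pattern. A sketch, assuming GNU coreutils sync(1), which accepts file operands since version 8.24:

```shell
# Replace a file's contents so a crash leaves either the old or the new
# version, never a torn mix: write the new data, force it to disk, then
# atomically repoint the name (the "metadata") at it.
cd "$(mktemp -d)"
printf 'new contents\n' > state.tmp
sync state.tmp      # barrier: the data blocks must be durable first
mv state.tmp state  # rename() is atomic within one filesystem
sync .              # then persist the directory entry itself
cat state
```

Swap the two sync steps away (which is roughly what nobarrier licenses the filesystem to do internally, via drive-level reordering) and a badly timed crash can leave the name pointing at data that never reached the disk.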
Re: FYIO: A rant about btrfs
Vincent Olivier posted on Wed, 16 Sep 2015 15:04:38 -0400 as excerpted:

>>>> 3. He's testing it for a workload that is a known and documented
>>>> problem for BTRFS, and claiming that that means it isn't worth
>>>> considering as a general-usage filesystem. Most people don't run
>>>> RDBMS servers on their systems, and as such, such a workload is not
>>>> worth considering for most people.
>>> Apparently RDBMS being a problem on Btrfs is neither known nor
>>> documented enough (he's right about the contrast with claiming
>>> publicly that Btrfs is indeed production ready).
>> OK, maybe not documented, but RDBMS falls under 'Large files with
>> highly random access patterns and heavy RMW usage', which is a known
>> issue for BTRFS, and also applies to VM images.
>
> This guy is no idiot. If it wasn't clear enough for him, it's not clear
> enough, period.

I'd argue that while he's "clearly no idiot", he's equally clearly an
"idiot savant" (relatively speaking). He has an extremely high level of
knowledge in a very few specific areas (RDBMS being the primary one in
discussion here), and much higher than average in a number of others, but
unfortunately tends to lack what many would call "common sense" in some
or many others, to the point that he does what people in those areas
would consider "idiotic things", because he simply doesn't have a
reasonable level of experience with them. What's more, he is demonstrably
uninterested in getting that level of "functional experience", as he has
higher priorities for where he's going to be spending his time:
basically, staying at the top of his field in the areas where he's
already extremely highly knowledgeable.

Note that I'm not mocking him. I'm much the same way, as are many
geeks/nerds, thus the term "nerdview".

(Personal example: back in college I once _stapled_ a note to a football.
It simply never occurred to me that a football is inflated, and that
stapling something to it was a _very_ bad idea (!!), because I simply
didn't have even a minimal level of functional experience in that area,
such that I lacked "common sense" in regard to it; further, I wasn't
then, and am not now, particularly interested in spending my time getting
that minimal functional experience, as my priorities simply lie
elsewhere.)

See also the relatively high rate of Asperger syndrome among the geek
crowd.

As for his conclusions, I find myself "in violent agreement", as I've
seen it said, with most of them, the exception being that I don't agree
that btrfs isn't appropriate as a general-purpose filesystem. I'd say a
more accurate statement is that, taking into account btrfs' maturity
level, btrfs is acceptable as a general-purpose filesystem, with the
caveat that as a COW-based filesystem where optimization has not yet been
a priority (and may well not be for a few more years), btrfs is
definitely COUNTER-indicated for GiB+-sized VM-image and database use.

However, accepting that RDBMS is in fact one of his primary focus areas,
I can see how his definition of "general purpose" would tend to include
that, where a more "general purpose" definition of "general purpose
filesystem" (ha, recursive definitions!) has room for that particular
case being an exception.

But I _definitely_ agree with him that btrfs is unfortunately being
billed as "mature" and "production ready", where, for the general use
case, I'd... let's just say I'd not choose that characterization,
preferring instead to characterize btrfs as "not yet entirely stable and
mature", and definitely not yet optimized.
It's certainly ready for "cautious" use after doing a bit of research on
your use case, and with backups at the ready if you care about the data.
But as any good sysadmin will tell you, by definition, if you care about
the data, you have it backed up, and if you don't have it backed up, by
equal definition, you do _NOT_ care about losing the data, so that's a
_given_.

But "cautious use after researching your use-case" isn't what he's
interested in doing, or rather... he's not particularly interested in
doing that level of pre-deployment research, and to his credit, for a
mature filesystem, he really shouldn't have to... though people who care
enough about their use case, as he really should for RDBMS given that
it's a major focus point for him, will be doing that research anyway,
_because_ they care.

Which, to his credit, he's doing, in a way. It's not the way _I_ would go
about it, certainly, but by running those benchmarks, etc., and comparing
against other filesystems, he's doing research in his own way, and under
the parameters he uses I certainly don't disagree with his conclusions:
that btrfs is simply a rather poor choice for his use-case of interest,
particularly given the level of additional research and tuning he's
demonstrably willing to put into btrfs specifically, that being (very
close to) zero.

As for the btrfs s
Re: FYIO: A rant about btrfs
On Wed, Sep 16, 2015 at 03:04:38PM -0400, Vincent Olivier wrote:
> > On Sep 16, 2015, at 2:22 PM, Austin S Hemmelgarn wrote:
> > On 2015-09-16 12:51, Vincent Olivier wrote:
> >>> On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn wrote:
> >>> On 2015-09-16 10:43, M G Berberich wrote:
> >>> It is worth noting a few things that were done incorrectly in this testing:
> >>> 1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).
> >> But can the “nobarrier” mount option negatively affect performance for Btrfs (and not only data integrity)?
> > Using it improves performance for every filesystem on Linux that supports it. This does not mean that it is _EVER_ a good idea to do so. This mount option is one of the few things on my list of things that I will _NEVER_ personally provide support to people for, because it almost guarantees that you will lose data if the system dies unexpectedly (even if it's for a reason other than power loss).
> OK fine. Let it be clearer then (on the Btrfs wiki): nobarrier is an absolute no go. Case closed.

Sometimes it is useful to make an ephemeral filesystem, i.e. a btrfs on a dm-crypt device with a random key that is not stored. This configuration intentionally and completely destroys the entire filesystem, and all data on it, in the event of a power failure.
It's useful for things like temporary table storage, where ramfs is too small, swap-backed tmpfs is too slow, and/or there is a requirement that the data not be persisted across reboots. In other words, nobarrier is for a little better performance when you already want to _intentionally_ destroy your filesystem on power failure.
Re: FYIO: A rant about btrfs
On 2015-09-16 15:04, Vincent Olivier wrote:
On Sep 16, 2015, at 2:22 PM, Austin S Hemmelgarn wrote:
On 2015-09-16 12:51, Vincent Olivier wrote:
Hi,
On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn wrote:
On 2015-09-16 10:43, M G Berberich wrote:
Hello, just for information. I stumbled about a rant about btrfs-performance: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp

I read it too.

It is worth noting a few things that were done incorrectly in this testing:
1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).

But can the “nobarrier” mount option negatively affect performance for Btrfs (and not only data integrity)?

Using it improves performance for every filesystem on Linux that supports it. This does not mean that it is _EVER_ a good idea to do so. This mount option is one of the few things on my list of things that I will _NEVER_ personally provide support to people for, because it almost guarantees that you will lose data if the system dies unexpectedly (even if it's for a reason other than power loss).

OK fine. Let it be clearer then (on the Btrfs wiki): nobarrier is an absolute no go. Case closed.

From the https://btrfs.wiki.kernel.org/index.php/Mount_options
NOTE: Using this option greatly increases the chances of you experiencing data corruption during a power failure situation. This means full file-system corruption, and not just losing or corrupting data that was being written during a power cut or kernel panic.

It could be a bit clearer, but it's pretty well spelled out.
2. 
He provides no comparison of any other filesystem with TRIM support turned on (it is very likely that all filesystems will demonstrate such performance drops. Based on that graph, it looks like the device doesn't support asynchronous trim commands).

I think he means, by the text surrounding the only graph that mentions TRIM, that this exact same test on the other filesystems he benchmarked yields much better results.

Possibly, but there are also known issues with TRIM/DISCARD on BTRFS in 4.0. And his claim is still baseless unless he actually provides a reference for it.

Same as above: TRIM/DISCARD officially not recommended in production until further notice?

TRIM/DISCARD do work, it's just that they don't work to the degree they are expected to; there are some cases where BTRFS doesn't issue a discard when it should, and fstrim doesn't properly trim everything.

3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.

Apparently RDBMS being a problem on Btrfs is neither known nor documented enough (he’s right about the contrast with claiming publicly that Btrfs is indeed production ready).

OK, maybe not documented, but RDBMS falls under 'Large files with highly random access patterns and heavy RMW usage', which is a known issue for BTRFS, and also applies to VM images.

This guy is no idiot. If it wasn’t clear enough for him, it’s not clear enough, period.

From https://btrfs.wiki.kernel.org/index.php/Gotchas:
Fragmentation: Files with a lot of random writes can become heavily fragmented (10000+ extents), causing thrashing on HDDs and excessive multi-second spikes of CPU load on systems with an SSD or a large amount of RAM. On servers and workstations this affects databases and virtual machine images.
The nodatacow mount option may be of use here, with associated gotchas. On desktops this primarily affects application databases (including Firefox and Chromium profiles, GNOME Zeitgeist, Ubuntu Desktop Couch, Banshee, and Evolution's datastore). Workarounds include manually defragmenting your home directory using btrfs fi defragment. Auto-defragment (mount option autodefrag) should solve this problem in 3.0.

Symptoms include btrfs-transacti and btrfs-endio-wri taking up a lot of CPU time (in spikes, possibly triggered by syncs). You can use filefrag to locate heavily fragmented files (may not work correctly with compression).

His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.

Me too. My two cents is that focusing on improving performance for Btrfs-optimal use cases is much more interesting than bringing new features like automatically turning COW off for RDBMS usage or debugging TRIM support.
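The mechanism behind that fragmentation gotcha is easy to see in a toy model. This is purely illustrative (it is not how the real btrfs allocator works, and the function name is invented for this sketch): under COW, each random overwrite of a block inside a large file relocates that block to a fresh extent instead of rewriting it in place, so the extent count grows with the number of rewrites, whereas an overwrite-in-place filesystem keeps the file in one extent.

```python
import random

def cow_extents_after_rewrites(file_blocks: int, rewrites: int, seed: int = 0) -> int:
    """Toy COW model: a file starts as one contiguous extent.  Each
    rewrite of a random block relocates it into its own new extent,
    splitting whatever contiguous run it used to belong to.  Returns
    the resulting extent count (relocated blocks + surviving runs)."""
    rng = random.Random(seed)
    rewritten = set()
    for _ in range(rewrites):
        rewritten.add(rng.randrange(file_blocks))
    blocks = sorted(rewritten)
    extents = len(blocks)          # one relocated extent per COW-rewritten block
    runs = 0                       # contiguous runs of untouched blocks
    prev = -1
    for b in blocks:
        if b - prev > 1:           # a gap of untouched blocks -> one surviving run
            runs += 1
        prev = b
    if not blocks:
        runs = 1                   # nothing rewritten: still one extent
    elif blocks[-1] < file_blocks - 1:
        runs += 1                  # untouched tail of the file
    return extents + runs
```

Scattering a few thousand random rewrites across a large file in this model already yields extent counts on the order of twice the rewrite count, which is the kind of growth the wiki's fragmentation warning (and autodefrag/nodatacow as countermeasures) is about.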
Re: FYIO: A rant about btrfs
On 2015-09-16 12:45, Martin Tippmann wrote:
Hi, 2015-09-16 17:20 GMT+02:00 Austin S Hemmelgarn: [...]
3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people. His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.

Are there any conceptual or design properties that make btrfs perform worse in this workload compared to ZFS or F2FS, which both do CoW and share some similarity?

ZFS has been around for much longer, it's been mature and feature complete for more than a decade, and has had a long time to improve performance-wise. It is important to note though, that on low-end hardware, BTRFS can (and often does in my experience) perform better than ZFS, because ZFS is a serious resource hog (I have yet to see a stable ZFS deployment with less than 16G of RAM, even with all the fancy features turned off). As for F2FS, it was designed from the ground up for high performance and efficiency. BTRFS started (from what I understand) as more of an experiment, and as such the original code was not particularly optimized.

If not, then I still think (from a user perspective) it is still interesting to look at where the difference is coming from. There is also a quite old talk with some unfortunate numbers for btrfs from the XFS developer Dave Chinner: https://www.youtube.com/watch?v=FegjLbCnoBw - is this already resolved? 2012 is 3 years ago.

That depends on what you mean by resolved.
In every test I've done that wasn't a specialized benchmark designed to simulate a particular workload, I've gotten roughly equal performance between XFS and BTRFS (YMMV of course, especially because my use case is not very typical of most people (primarily building software and running BOINC projects)). I use BTRFS mostly because it makes online reprovisioning much easier (you can't migrate an XFS filesystem to a new block device online, and you also can't shrink one). It's also worth noting that Dave Chinner usually comes forth with numbers extolling the value of XFS when a new filesystem gets proposed. In my experience, for small scale usage, ext* gets better performance than almost anything else, including XFS.

From reading the list I understand that btrfs is still very much a work in progress and performance is not a top priority at this stage, but I don't see why it shouldn't perform at least equally well as ZFS/F2FS on the same workloads. Is looking at performance problems on the development roadmap?

Performance is on the roadmap, but the roadmap is notoriously short-sighted when it comes to time-frames for completion of something. You have to understand also that the focus in BTRFS has been more on data safety than performance, because that's the intended niche, and the area most people look to ZFS for.
Re: FYIO: A rant about btrfs
On 2015-09-16 12:25, Zia Nayamuth wrote:
Some responses to your criticism:
1. How would that hole fare with a fully battery-backed/flash-backed path (battery-backed or flash-backed HBA with disks with full power-loss protection, like the Intel S3500)? In such a situation (quite commonplace in server-land), power loss should not cause any data loss, since all data in the cache is guaranteed to be committed to non-volatile memory at some point (whether such assurances may be trusted is another matter entirely though, and well outside the scope of this discussion).

It's not as much of an issue if you have full power-loss protection (assuming of course it works), but even then, having write barriers turned off is still not as safe as having them turned on. Most of the time when I've tried testing with 'nobarrier' (not just on BTRFS but on ext* as well), I had just as many issues with data loss when the system crashed as when it lost power (simulated via killing the virtual machine). Both journaling and COW filesystems need to ensure ordering of certain write operations to be able to maintain consistency. For example, the new/updated data blocks need to be on disk before the metadata is updated to point to them, otherwise your database can end up corrupted.

2. Fair point. I'd like to know his hardware, given how strongly hardware can influence things.

3. It's pretty obvious that the author of that blog is specifically targeting OLTP performance (explicit statement in intro, choice of benchmark, name and focus of blog), not the common case, and even states that in the first two paragraphs of his conclusion.

The focus is somewhat less clear in said conclusion, namely, is he truly talking about general purpose use or is he talking about general purpose OLTP use?

My takeaway was that he intended 'general purpose use' to mean generic everyday usage across a wide variety of systems; he was not particularly specific about it, however.
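The ordering requirement described above (data blocks durable on disk *before* the metadata that points at them) is the same discipline applications follow from userspace with fsync(). A minimal sketch of that pattern, with illustrative names and nothing btrfs-specific about it; barriers are simply how the filesystem gets the equivalent ordering guarantee out of a write-caching, write-reordering drive:

```python
import os

def atomic_replace(path: str, data: bytes) -> None:
    """Crash-safe file update: make the new data durable first, then
    atomically switch the metadata (the directory entry) to point at it.
    If power dies at any point, the file is either the old version or
    the new one, never a torn mix."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # step 1: new data blocks on stable storage
    os.replace(tmp, path)             # step 2: atomically repoint the name
    dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)               # step 3: persist the rename itself
    finally:
        os.close(dirfd)

# With nobarrier (or a lying write cache), the device may internally
# reorder steps 1 and 2, so after a power cut the name can point at
# data that never reached the media -- exactly the corruption above.
```

Journaling and COW filesystems do the analogous dance internally for every transaction commit, which is why disabling barriers breaks them both.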
Re: FYIO: A rant about btrfs
> On Sep 16, 2015, at 2:22 PM, Austin S Hemmelgarn wrote: > > On 2015-09-16 12:51, Vincent Olivier wrote: >> Hi, >> >> >>> On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn >>> wrote: >>> >>> On 2015-09-16 10:43, M G Berberich wrote: Hello, just for information. I stumbled about a rant about btrfs-performance: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp >> I read it too. >>> It is worth noting a few things that were done incorrectly in this testing: >>> 1. _NEVER_ turn off write barriers (nobarrier mount option), doing so >>> subtly breaks the data integrity guarantees of _ALL_ filesystems, but >>> especially so on COW filesystems like BTRFS. With this off, you will have >>> a much higher chance that a power loss will cause data loss. It shouldn't >>> be turned off unless you are also turning off write-caching in the hardware >>> or know for certain that no write-reordering is done by the hardware (and >>> almost all modern hardware does write-reordering for performance reasons). >> But can the “nobarrier” mount option affect performances negatively for >> Btrfs (and not only data integrity)? > Using it improves performance for every filesystem on Linux that supports it. > This does not mean that it is _EVER_ a good idea to do so. This mount > option is one of the few things on my list of things that I will _NEVER_ > personally provide support to people for, because it almost guarantees that > you will lose data if the system dies unexpectedly (even if it's for a reason > other than power loss). OK fine. Let it be clearer then (on the Btrfs wiki): nobarrier is an absolute no go. Case closed. >>> 2. He provides no comparison of any other filesystem with TRIM support >>> turned on (it is very likely that all filesystems will demonstrate such >>> performance drops. Based on that graph, it looks like the device doesn't >>> support asynchronous trim commands). 
>> I think he means by the text surrounding the only graph that mentions TRIM that this exact same test on the other filesystems he benchmarked yields much better results.
> Possibly, but there are also known issues with TRIM/DISCARD on BTRFS in 4.0. And his claim is still baseless unless he actually provides a reference for it.

Same as above: TRIM/DISCARD officially not recommended in production until further notice?

>>> 3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.
>> Apparently RDBMS being a problem on Btrfs is neither known nor documented enough (he’s right about the contrast with claiming publicly that Btrfs is indeed production ready).
> OK, maybe not documented, but RDBMS falls under 'Large files with highly random access patterns and heavy RMW usage', which is a known issue for BTRFS, and also applies to VM images.

This guy is no idiot. If it wasn’t clear enough for him, it’s not clear enough, period.

>>> His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.
>> Me too. My two cents is that focusing on improving performance for Btrfs-optimal use cases is much more interesting than bringing new features like automatically turning COW off for RDBMS usage or debugging TRIM support.
> It depends, BTRFS is still not feature complete with the overall intent when it was started (raid56 and qgroups being the two big issues at the moment), and attempting to optimize things tends to introduce bugs, which we have quite enough of already without people adding more (and they still seem to be breeding like rabbits).
I would just like a clear statement from a dev-lead saying : until we are feature-complete (with a finite list of features to complete) the focus will be on feature-completion and not optimizing already-implemented features. Ideally with an ETA on when optimization will be more of a priority than it is today. > That said, my systems (which are usually doing mostly CPU or memory bound > tasks, and not I/O bound like the aforementioned benchmarks were testing) run > no slower than they did with ext4 as the main filesystem, and in some cases > work much faster (even after averaging out the jitter in performance). Based > on this, I wouldn't advocate it for most server usage (except possibly as the > root filesystem), but it does work very well for most desktop usage patterns > and a number of HPC usage patterns as well. See, this is interesting: I’d rather have a super fast and discardable SSD F2FS/ext4 root with a large Btrfs RAID for (NAS) server usage. Does your non-advocacy
Re: FYIO: A rant about btrfs
On 2015-09-16 12:51, Vincent Olivier wrote:
Hi,
On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn wrote:
On 2015-09-16 10:43, M G Berberich wrote:
Hello, just for information. I stumbled about a rant about btrfs-performance: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp

I read it too.

It is worth noting a few things that were done incorrectly in this testing:
1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).

But can the “nobarrier” mount option negatively affect performance for Btrfs (and not only data integrity)?

Using it improves performance for every filesystem on Linux that supports it. This does not mean that it is _EVER_ a good idea to do so. This mount option is one of the few things on my list of things that I will _NEVER_ personally provide support to people for, because it almost guarantees that you will lose data if the system dies unexpectedly (even if it's for a reason other than power loss).

2. He provides no comparison of any other filesystem with TRIM support turned on (it is very likely that all filesystems will demonstrate such performance drops. Based on that graph, it looks like the device doesn't support asynchronous trim commands).

I think he means by the text surrounding the only graph that mentions TRIM that this exact same test on the other filesystems he benchmarked yields much better results.

Possibly, but there are also known issues with TRIM/DISCARD on BTRFS in 4.0. And his claim is still baseless unless he actually provides a reference for it.

3. 
He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.

Apparently RDBMS being a problem on Btrfs is neither known nor documented enough (he’s right about the contrast with claiming publicly that Btrfs is indeed production ready).

OK, maybe not documented, but RDBMS falls under 'Large files with highly random access patterns and heavy RMW usage', which is a known issue for BTRFS, and also applies to VM images.

My view on this is that having one filesystem to rule them all (all storage technologies, all use cases) is unrealistic. Also, the time when you could put your NAS on an old i386 with 3MB of RAM is over. Compression, checksumming, COW, snapshotting, quotas, etc. are all computationally intensive features. In 2015, block storage has become computationally intensive. How about saying non-root Btrfs RAID10 is the best choice for a Samba NAS on rotational HDDs with no SMR (my use case)? For root and RDBMS, I use ext4 on an M.2 SSD with a sane initramfs and the most recent stable kernel. I am happy with the performance and delighted with the features Btrfs provides. I think it is much more productive to document and compare the most successful Btrfs deployments rather than trying to find bugs and bottlenecks for use cases that are the development focus of other filesystems. I don’t know, I might not make a lot of sense here, but on top of refactoring the Gotchas, I would be happy to start a successful-deployment story section on the wiki and maybe improve my usage of Btrfs along the way (who else here is using Btrfs in a similar fashion?).

Agreed, there's a reason that XFS was never the default in most Linux distributions, and similarly why there are so many filesystem drivers available.
Any given filesystem can have a number of arguments made against it, for example:
* ZFS: Ridiculously resource hungry, and doesn't use the normal page-cache.
* XFS: filesystems can't be shrunk, and it tends to perform slowly under light load compared to most other filesystems.
* NTFS: Poor file layout for many use cases, and clusters all the metadata together in one place.
* ext*: Lacks some useful functionality (reflinks for example), and the file layout and aggressive journaling are usually bad for flash.
* reiserfs: numerous gotchas in usage, and fsck loses its mind when dealing with filesystems that have reiserfs images stored in them as regular files.

His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.

Me too. My two cents is that focusing on improving performance for Btrfs-optimal use cases is much more interesting than bringing new features like automatically turning COW off for RDBMS usage or debugging TRIM support.
Re: FYIO: A rant about btrfs
Hi,
> On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn wrote:
> On 2015-09-16 10:43, M G Berberich wrote:
>> Hello,
>> just for information. I stumbled about a rant about btrfs-performance:
>> http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp

I read it too.

> It is worth noting a few things that were done incorrectly in this testing:
> 1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).

But can the “nobarrier” mount option negatively affect performance for Btrfs (and not only data integrity)?

> 2. He provides no comparison of any other filesystem with TRIM support turned on (it is very likely that all filesystems will demonstrate such performance drops. Based on that graph, it looks like the device doesn't support asynchronous trim commands).

I think he means, by the text surrounding the only graph that mentions TRIM, that this exact same test on the other filesystems he benchmarked yields much better results.

> 3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.

Apparently RDBMS being a problem on Btrfs is neither known nor documented enough (he’s right about the contrast with claiming publicly that Btrfs is indeed production ready).
My view on this is that having one filesystem to rule them all (all storage technologies, all use cases) is unrealistic. Also, the time when you could put your NAS on an old i386 with 3MB of RAM is over. Compression, checksumming, COW, snapshotting, quotas, etc. are all computationally intensive features. In 2015, block storage has become computationally intensive.

How about saying non-root Btrfs RAID10 is the best choice for a Samba NAS on rotational HDDs with no SMR (my use case)? For root and RDBMS, I use ext4 on an M.2 SSD with a sane initramfs and the most recent stable kernel. I am happy with the performance and delighted with the features Btrfs provides.

I think it is much more productive to document and compare the most successful Btrfs deployments rather than trying to find bugs and bottlenecks for use cases that are the development focus of other filesystems. I don’t know, I might not make a lot of sense here, but on top of refactoring the Gotchas, I would be happy to start a successful-deployment story section on the wiki and maybe improve my usage of Btrfs along the way (who else here is using Btrfs in a similar fashion?).

> His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.

Me too. My two cents is that focusing on improving performance for Btrfs-optimal use cases is much more interesting than bringing new features like automatically turning COW off for RDBMS usage or debugging TRIM support.

Vincent
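The point about modern block storage being computationally intensive is easy to put a rough number on for one of those features. The sketch below measures per-block checksumming throughput; it is illustrative only (the kernel's btrfs uses crc32c with hardware acceleration, so zlib's crc32 in pure Python is just a pessimistic stand-in, and the function name is invented here):

```python
import time
import zlib

def checksum_throughput(total_mib: int = 64, block_size: int = 4096) -> float:
    """Checksum `total_mib` MiB of data in `block_size` chunks, the way a
    checksumming filesystem touches every block it writes, and return the
    achieved rate in MiB/s."""
    block = b"\x00" * block_size
    blocks = (total_mib * 1024 * 1024) // block_size
    start = time.perf_counter()
    crc = 0
    for _ in range(blocks):
        # Feed the previous CRC back in, as a running checksum would.
        crc = zlib.crc32(block, crc)
    elapsed = time.perf_counter() - start
    return total_mib / elapsed
```

Whatever rate this prints on a given machine, every written block pays some such per-block cost, and compression, COW bookkeeping, and quota accounting stack on top of it; that is the sense in which features trade CPU for integrity and flexibility.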
Re: FYIO: A rant about btrfs
Hi,
2015-09-16 17:20 GMT+02:00 Austin S Hemmelgarn: [...]
> 3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.
>
> His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.

Are there any conceptual or design properties that make btrfs perform worse in this workload compared to ZFS or F2FS, which both do CoW and share some similarity? If not, then I still think (from a user perspective) it is still interesting to look at where the difference is coming from.

There is also a quite old talk with some unfortunate numbers for btrfs from the XFS developer Dave Chinner: https://www.youtube.com/watch?v=FegjLbCnoBw - is this already resolved? 2012 is 3 years ago.

From reading the list I understand that btrfs is still very much a work in progress and performance is not a top priority at this stage, but I don't see why it shouldn't perform at least equally well as ZFS/F2FS on the same workloads. Is looking at performance problems on the development roadmap?

regards Martin
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: FYIO: A rant about btrfs
Some responses to your criticism:

1. How would that hole fare with a fully battery-backed/flash-backed path (battery-backed or flash-backed HBA with disks with full power-loss protection, like the Intel S3500)? In such a situation (quite commonplace in server-land), power loss should not cause any data loss, since all data in the cache is guaranteed to be committed to non-volatile memory at some point (whether such assurances may be trusted is another matter entirely though, and well outside the scope of this discussion).

2. Fair point. I'd like to know his hardware, given how strongly hardware can influence things.

3. It's pretty obvious that the author of that blog is specifically targeting OLTP performance (explicit statement in intro, choice of benchmark, name and focus of blog), not the common case, and even states that in the first two paragraphs of his conclusion. The focus is somewhat less clear in said conclusion, namely, is he truly talking about general purpose use or is he talking about general purpose OLTP use?

-- Zia Nayamuth

On 17/09/2015 01:20, Austin S Hemmelgarn wrote:
On 2015-09-16 10:43, M G Berberich wrote:
Hello, just for information. I stumbled about a rant about btrfs-performance: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp
MfG bmg

It is worth noting a few things that were done incorrectly in this testing:
1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).
2. 
He provides no comparison of any other filesystem with TRIM support turned on (it is very likely that all filesystems will demonstrate such performance drops. Based on that graph, it looks like the device doesn't support asynchronous trim commands).

3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.

His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.
Re: FYIO: A rant about btrfs
On 2015-09-16 10:43, M G Berberich wrote:
Hello, just for information. I stumbled about a rant about btrfs-performance: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp
MfG bmg

It is worth noting a few things that were done incorrectly in this testing:
1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).
2. He provides no comparison of any other filesystem with TRIM support turned on (it is very likely that all filesystems will demonstrate such performance drops. Based on that graph, it looks like the device doesn't support asynchronous trim commands).
3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.

His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.