On 2017-03-30 11:55, Peter Grandi wrote:
My guess is that very complex risky slow operations like that are
provided by "clever" filesystem developers for "marketing" purposes,
to win box-ticking competitions. That applies to those system
developers who do know better; I suspect that even some filesystem
developers are "optimistic" as to what they can actually achieve.
There are cases where there really is no other sane option. Not
everyone has the kind of budget needed for proper HA setups,
Thanks for letting me know, that must have never occurred to me, just as
it must have never occurred to me that some people expect extremely
advanced features that imply big-budget high-IOPS high-reliability
storage to be fast and reliable on small-budget storage too :-)
You're missing my point (or intentionally ignoring it). Those types of
operations are implemented because there are use cases that actually
need them, not because some developer thought it would be cool. The one
possible counter-example of this is XFS, which doesn't support shrinking
the filesystem at all, but that was a conscious decision because their
target use case (very large scale data storage) does not need that
feature and not implementing it allows them to make certain other parts
of the filesystem faster.
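
For reference, on a filesystem that does support it, the operation
itself is mundane. Roughly something like the following (a sketch only:
the mount point and LV names are made up, lvreduce will ask for
confirmation, and you obviously want backups and careful size math
before doing this to real data):

import subprocess

MOUNT = "/srv/data"      # hypothetical btrfs mount point
LV = "/dev/vg0/data"     # hypothetical LVM volume underneath it

# BTRFS can shrink while mounted; give 10GiB back to the volume manager.
subprocess.run(["btrfs", "filesystem", "resize", "-10G", MOUNT], check=True)

# Only shrink the device once the filesystem is smaller, and leave some
# slack so the LV never ends up smaller than the filesystem on top of it.
subprocess.run(["lvreduce", "--size", "-9G", LV], check=True)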
and if you need maximal uptime and as a result have to reprovision the
system online, then you pretty much need a filesystem that supports
online shrinking.
That's a bigger topic than we can address here. The topic used to be
known in one related domain as "Very Large Databases", which were
defined as databases so large and critical that the time needed for
maintenance and backup was too long to take them offline etc.; that is
a topic that has largely vanished from discussion, I guess because most
management just don't want to hear it :-).
No, it's mostly vanished because of changes in best current practice.
That was a topic in an era when the only platform that could handle
high availability was VMS, and software wasn't routinely written to
handle things like load balancing. As a result, people ran a single
system which hosted the database, and if that went down, everything went
down. By contrast, it's rare these days outside of small companies to
see singly hosted databases that aren't specific to the local system,
and once you start parallelizing on the system level, backup and
maintenance times generally go down.
Also, it's not really all that slow on most filesystems; BTRFS is just
hurt by its comparatively poor performance and the COW metadata updates
that are needed.
Btrfs in realistic situations has pretty good speed *and* performance,
and COW actually helps, as it often results in less head repositioning
than update-in-place. What makes it a bit slower with metadata is having
'dup' by default to recover from especially damaging bitflips in
metadata, but then that does not impact performance, only speed.
I and numerous other people have done benchmarks running single metadata
and single data profiles on BTRFS, and it consistently performs worse
than XFS and ext4 even under those circumstances. It's not horrible
performance (it's better for example than trying the same workload on
NTFS on Windows), but it's still not what most people would call 'high'
performance or speed.
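
For what it's worth, the comparisons I mean are of the "same job,
different mount point" kind. Not the exact runs I did, but roughly
this shape (mount points and fio parameters are illustrative, fio is
assumed to be installed, and on a single device the data profile is
already 'single' by default, so only metadata needs converting):

import subprocess

TARGETS = ["/mnt/btrfs", "/mnt/xfs", "/mnt/ext4"]   # placeholder mounts

# Put the BTRFS metadata profile on an equal footing with the others;
# -f is needed because this reduces metadata redundancy.
subprocess.run(["btrfs", "balance", "start", "-f", "-mconvert=single",
                "/mnt/btrfs"], check=True)

# Run the same synthetic random-write job against each filesystem.
for directory in TARGETS:
    subprocess.run([
        "fio", "--name=randwrite", "--rw=randwrite", "--bs=4k",
        "--size=1G", "--numjobs=4", "--direct=1", "--group_reporting",
        f"--directory={directory}",
    ], check=True)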
That feature set is arguably not appropriate for VM images, but
lots of people know better :-).
That depends on a lot of factors. I have no issues personally running
small VM images on BTRFS, but I'm also running on decent SSDs
(>500MB/s read and write speeds), using sparse files, and keeping on
top of managing them. [ ... ]
Having (relatively) big-budget high-IOPS storage for high-IOPS workloads
helps, that must have never occurred to me either :-).
It's not big budget; the SSDs in question are at best mid-range
consumer SSDs that cost only marginally more than a decent hard drive,
and they really don't get all that great performance in terms of IOPS
because they're all on the same cheap SATA controller. The point I was
trying to make (which I should have been clearer about) is that they
have good bulk throughput, which means that the OS can do much more
aggressive writeback caching, which in turn means that COW and
fragmentation have less impact.
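
The knobs involved are just the kernel's ordinary dirty-page
thresholds. This little sketch only shows where they live (the value
at the end is an example, not a recommendation, and the write is
commented out because it needs root):

from pathlib import Path

# How much of RAM may hold dirty pages before background writeback
# starts, and before writers are forced to flush synchronously.
for knob in ("dirty_background_ratio", "dirty_ratio"):
    value = Path(f"/proc/sys/vm/{knob}").read_text().strip()
    print(knob, "=", value)

# On fast storage you can afford to let more dirty data accumulate,
# which is what blunts the COW/fragmentation cost mentioned above, e.g.:
# Path("/proc/sys/vm/dirty_ratio").write_text("40\n")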
XFS and 'ext4' are essentially equivalent, except for the fixed-size
inode table limitation of 'ext4' (and XFS reportedly has finer
grained locking). Btrfs is nearly as good as either on most workloads
in single-device mode [ ... ]
No, if you look at actual data, [ ... ]
Well, I have looked at actual data in many published but often poorly
made "benchmarks", and to me they seem quite equivalent indeed, within
somewhat differently shaped performance envelopes, so the results
depend on the testing point within that envelope. I have done my own
simplistic actual data gathering, most recently here:
http://www.sabi.co.uk/blog/17-one.html?170302#170302
http://www.sabi.co.uk/blog/17-one.html?170228#170228
and however simplistic, they are fairly informative (and for writes
they point a finger at a layer below the filesystem type).
In terms of performance, yes they are roughly equivalent. Performance
isn't all that matters though, and once you get past that point, ext4
and XFS
are significantly different in what they offer.
[ ... ]
"Flexibility" in filesystems, especially on rotating disk
storage with extremely anisotropic performance envelopes, is
very expensive, but of course lots of people know better :-).
Time is not free,
Your time seems especially and uniquely precious as you "waste"
as little as possible editing your replies into readability.
and humans generally prefer to minimize the amount of time they have
to work on things. This is why ZFS is so popular: it handles most
errors correctly by itself and usually requires very little human
intervention for maintenance.
That seems to me a pretty illusion, as it does not contain any magical
AI, just pretty ordinary and limited error correction for trivial cases.
On average, trivial cases account for most errors in any computer. So,
by definition, to handle most errors correctly, you can get by with just
handling all 'trivial' cases correctly. By handling all trivial cases
correctly, ZFS is doing far better than any other current filesystem or
storage stack can even begin to claim. It's been doing this since
before most modern Linux distributions made their first release too, so
compared to just about anything else people are using these days, it's
got a pretty solid track record. Anyone trying to claim it's the best
option in every case is obviously either a zealot or being paid, but
for many cases it really is one of the top options.
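
As a concrete example of "very little human intervention": most of the
routine care for a ZFS pool is a scheduled scrub plus reading its
report, roughly like this (the pool name is a placeholder, and the
scrub itself runs in the background, so the status check really
belongs some time later):

import subprocess

POOL = "tank"  # placeholder pool name

# Kick off a scrub; ZFS verifies every block's checksum and repairs
# bad copies from redundancy where it can, without taking anything
# offline.
subprocess.run(["zpool", "scrub", POOL], check=True)

# Later, check whether anything was found or repaired.
subprocess.run(["zpool", "status", "-v", POOL], check=True)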
'Flexibility' in a filesystem costs some time on a regular basis, but
can save a huge amount of time in the long run.
Like everything else. The difficulty is having flexibility at scale with
challenging workloads. "An engineer can do for a nickel what any damn
fool can do for a dollar" :-).
To look at it another way, I have a home server system running BTRFS
on top of LVM. [ ... ]
But usually home servers have "unchallenging" workloads, and it is
relatively easy to overbudget their storage, because the total absolute
cost is "affordable".
OK, so running
* Almost a dozen statically allocated VMs with a variety of
differing workloads including web-servers, a local mail server, DHCP and
DNS for the network, a VPN server, and 3 different file sharing
protocols (which see rather regular use) among other things
* On average between 4 and 10 transient VMs running regression
testing on kernel patches (including automation of almost everything but
selecting patches)
* A BOINC client
* GlusterFS (both client and storage node)
* Network security monitoring (Nagios plus a handful of custom scripts)
* Cloud storage software
All on the same system is an 'unchallenging' workload. Given the fact
that it's only got 32G of RAM and a cheap quad-core Xeon, that's a
pretty damn challenging workload by most people's standards. I call it a
home server because I run it out of my house, not because it's some
trivial dinky little file server that could run just fine on something
like a Raspberry Pi.