On Mon, Sep 12, 2016 at 08:20:20AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-09-11 09:02, Hugo Mills wrote:
> >On Sun, Sep 11, 2016 at 02:39:14PM +0200, Waxhead wrote:
> >>Martin Steigerwald wrote:
> >>>Am Sonntag, 11. September 2016, 13:43:59 CEST schrieb Martin Steigerwald:
> >>>>>>Thing is: this just seems to be a "when was this feature
> >>>>>>implemented" matrix, not one showing when a feature is considered
> >>>>>>stable. I think this could be done with colors or similar: red for
> >>>>>>not supported, yellow for implemented, and green for production
> >>>>>>ready.
> >>>>>Exactly, just like the Nouveau matrix. It clearly shows what you can
> >>>>>expect from it.
> >>>I mentioned this matrix as a good *starting* point. And I think it
> >>>would be easy to extend it:
> >>>
> >>>Just add another column called "Production ready". Then research / ask
> >>>about the production stability of each feature. The only challenge is:
> >>>who is authoritative on that? I'd certainly ask the developer of a
> >>>feature, but I'd also consider user reports to some extent.
> >>>
> >>>Maybe that's the real challenge.
> >>>
> >>>If you wish, I'd go through each feature there and give my own
> >>>estimation. But I think there are others who are deeper into this.
> >>That is exactly the reason I don't edit the wiki myself. I could of
> >>course get it started and hope someone will correct what I write, but
> >>I feel I don't have deep enough knowledge to get it off to a proper
> >>start. Perhaps I will change my mind about this.
> >
> >   Given that nobody else has done it yet, what are the odds that
> >someone else will step up to do it now? I would say that you should at
> >least try. Yes, you don't have as much knowledge as some others, but
> >if you keep working at it, you'll gain that knowledge. You'll probably
> >get some of it wrong to start with, but probably not *very* wrong. And
> >even if you do get something horribly wrong at some point, remember
> >that even the more knowledgeable people you're deferring to didn't
> >identify the problems with parity RAID until Zygo and Austin and Chris
> >(and others) put in the work to pin down the exact issues.
> FWIW, here's a list of what I personally consider stable (as in, I'm
> willing to bet my uptime on using this stuff on production systems at
> work and on personal systems at home):
> 1. Single device mode, including DUP data profiles on a single device
> without mixed-bg.
> 2. Multi-device raid0, raid1, and raid10 profiles with symmetrical devices
> (all devices are the same size).
> 3. Multi-device single profiles with asymmetrical devices.
> 4. Small numbers of snapshots (low double digits at most), taken at
> infrequent intervals (no more than once an hour).  I use single snapshots
> regularly to get stable images of the filesystem for backups, and I keep
> hourly ones of my home directory for about 48 hours.
> 5. Subvolumes used to isolate parts of a filesystem from snapshots.  I use
> this regularly to isolate areas of my filesystems from backups.
> 6. Non-incremental send/receive (no clone source, no parents, no
> deduplication).  I use this regularly for cloning virtual machines.
> 7. Checksumming and scrubs using any of the profiles I've listed above.
> 8. Defragmentation, including autodefrag.
> 9. All of the compat_features, including no-holes and skinny-metadata.
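
For anyone who wants concrete examples to sit next to entries like these
on a status page, here is roughly what I would document for items 1, 4,
6, and 7 above (a minimal sketch: device names, mount points, and
snapshot paths are placeholders, and the flags should be double-checked
against your btrfs-progs version):

    # Item 1: single device with DUP metadata and data (newer btrfs-progs
    # accept -d dup at mkfs time; on older versions, convert afterwards
    # with: btrfs balance start -dconvert=dup /mnt)
    mkfs.btrfs -m dup -d dup /dev/sdX1

    # Item 4: a read-only snapshot as a stable image for backups
    btrfs subvolume snapshot -r /home /home/.snapshots/home.$(date +%F.%H)

    # Item 6: non-incremental send/receive (no -p parent, no -c clone
    # source), e.g. for cloning a virtual machine's subvolume
    btrfs send /home/.snapshots/home.2016-09-12.08 | btrfs receive /mnt/backup

    # Item 7: start a scrub and check its progress
    btrfs scrub start /home
    btrfs scrub status /home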
> 
> Things I consider stable enough that I'm willing to use them on my personal
> systems but not systems at work:
> 1. In-line data compression with compress=lzo.  I use this on my laptop and
> home server system.  I've never had any issues with it myself, but I know
> that other people have, and it does seem to make other things more likely to
> have issues.
> 2. Batch deduplication.  I only use this on the back-end filesystems for my
> personal storage cluster, and only because I have multiple copies as a
> result of GlusterFS on top of BTRFS.  I've not had any significant issues
> with it, and I don't remember any reports of data loss resulting from it,
> but it's something that people should not be using if they don't understand
> all the implications.
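
Both of these are easy to demonstrate concretely too. A minimal sketch,
with placeholder paths, and noting that duperemove is just one of
several out-of-band dedup tools built on the kernel's dedup ioctl:

    # In-line lzo compression is a mount option (usually set in fstab)
    mount -o compress=lzo /dev/sdX1 /mnt

    # Batch (out-of-band) deduplication with duperemove: -r recurses
    # into subdirectories, -d actually submits the dedup requests
    # instead of only reporting duplicate extents
    duperemove -dr /mnt/data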
> 
> Things that I don't consider stable but some people do:
> 1. Quotas and qgroups.  Some people (such as SUSE) consider these to be
> stable.  However, there are still a couple of known issues with them, such
> as returning the wrong errno when a quota is hit (it should return -EDQUOT,
> but instead returns -ENOSPC).
> 2. RAID5/6.  There are a few people who use this, but it's generally agreed
> to be unstable.  There are still at least 3 known bugs which can cause
> complete loss of a filesystem, and there's also a known issue with rebuilds
> taking insanely long, which puts data at risk as well.
> 3. Multi-device filesystems with asymmetrical devices running raid0, raid1,
> or raid10.  The issue I have here is that it's much easier to hit free-space
> errors than it should be on a reliable system.  This can be avoided with
> careful planning (for example, a 3-disk raid1 profile with one disk exactly
> twice the size of the other two will work fine, albeit with more load on the
> larger disk).
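
If a status page ever documents these caveats, the commands involved are
simple enough to list right beside them (again a sketch with placeholder
paths):

    # Quotas/qgroups: easy to enable, but note the errno caveat above
    btrfs quota enable /mnt
    btrfs qgroup show /mnt

    # On asymmetrical multi-device arrays, keep an eye on per-profile
    # allocation so free-space trouble is visible before it bites
    btrfs filesystem usage /mnt
    btrfs filesystem df /mnt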
> 
...
> As far as documentation goes, though, we [BTRFS] really do need to get our
> act together.  It really doesn't look good to have most of the best
> documentation live in the distros' wikis instead of ours.  I'm not trying
> to say the distros shouldn't be documenting BTRFS, but when Debian (for
> example) documents the upstream version of BTRFS better than the upstream
> project itself does, that starts to look bad.

I would have loved to have this feature-to-stability list when I
started working on the Debian documentation!  I started it because I
was saddened by the number of horror-story "adventures with btrfs"
articles and posts I had read, combined with the perception among
certain members of the Debian community that it was a toy fs.

Are my contributions to that wiki of a high enough quality that I
can work on the upstream one?  Do you think the broader btrfs
community is interested in citations and curated links to discussions?

E.g.: if a company wants to use btrfs, they check the status page, see that
a feature they want is still in the yellow zone of stabilisation, and
then follow the links to familiarise themselves with past discussions.
I imagine this would also help individuals or grad students more
quickly familiarise themselves with the available literature before
choosing a specific project.  If regular updates from SUSE, STRATO,
Facebook, and Fujitsu are also publicly available, the k.org wiki would
be a wonderful place to syndicate them!

Sincerely,
Nicholas

> >
> >   So, go for it. You have a lot to offer the community.
> >
> >   Hugo.
> >
> >>>I do think, for example, that scrubbing and auto raid repair are
> >>>stable, except for RAID 5/6. I also consider device statistics and
> >>>RAID 0 and 1 to be stable. I think RAID 10 is also stable, but as I
> >>>do not run it, I don't know. For me skinny-metadata is stable as
> >>>well. So far even compress=lzo seems stable for me, though for
> >>>others it may not be.
> >>>
> >>>Since what kernel version? Now, there you go. I have no idea. All I
> >>>know is that I started using BTRFS with kernel 2.6.38 or 2.6.39 on my
> >>>laptop, but not as RAID 1 at that time.
> >>>
> >>>See, the implementation time of a feature is much easier to assess.
> >>>Maybe that's part of the reason why there is no stability matrix:
> >>>maybe no one knows *exactly* and *for sure*. How could you? So I
> >>>would even put a footnote on that "production ready" column
> >>>explaining "Considered stable based on developer and user opinions".
> >>>
> >>>Of course it would additionally be good to read about experiences with
> >>>corporate usage of BTRFS. I know at least Fujitsu, SUSE, Facebook, and
> >>>Oracle are using it, but I don't know in what configurations and with
> >>>what experiences. One Oracle developer invests a lot of time in
> >>>bringing BTRFS-like features to XFS, Red Hat still favors XFS over
> >>>BTRFS, and even SLES defaults to XFS for /home and other non-root
> >>>filesystems. That also tells a story.
> >>>
> >>>You can get some ideas from the SUSE release notes. Even if you do not
> >>>want to use SUSE, they tell you something, and I bet they are one of
> >>>the better sources of information regarding your question available at
> >>>this time, because I believe SUSE developers invested some time in
> >>>assessing the stability of features: they would carefully assess what
> >>>they can support in enterprise environments. There is also someone
> >>>from Fujitsu who shared experiences in a talk; I can search for the
> >>>URL to the slides again.
> >>By all means, SUSE's wiki is very valuable. I just said that I
> >>*prefer* to have that stuff on the BTRFS wiki and feel that is the
> >>right place for it.
> >>>
> >>>I bet Chris Mason and other BTRFS developers at Facebook have some
> >>>idea of what they use within Facebook as well. To what extent they are
> >>>allowed to talk about it… I don't know. My personal impression is that
> >>>as soon as Chris went to Facebook he became quite quiet. Maybe just
> >>>due to being busy. Maybe due to Facebook being much more concerned
> >>>about its own privacy than that of its users.
> >>>
> >>>Thanks,
> >>
> >
> 
