Re: Is stability a joke? (wiki updated)

2016-09-19 Thread Zygo Blaxell
On Mon, Sep 19, 2016 at 01:38:36PM -0400, Austin S. Hemmelgarn wrote:
> >>I'm not sure if the btrfsck is really all that helpful to users as much
> >>as it is for developers to better learn about the failure vectors of
> >>the file system.
> >
> >ReiserFS had no working fsck for all of the 8 years I used it (and still
> >didn't last year when I tried to use it on an old disk).  "Not working"
> >here means "much less data is readable from the filesystem after running
> >fsck than before."  It's not that much of an inconvenience if you have
> >backups.
> For a small array, this may be the case.  Once you start looking into double
> digit TB scale arrays though, restoring backups becomes a very expensive
> operation.  If you had a multi-PB array with a single dentry which had no
> inode, would you rather be spending multiple days restoring files and
> possibly losing recent changes, or spend a few hours to check the filesystem
> and fix it with minimal data loss?

I'd really prefer to be able to delete the dead dentry with 'rm' as root,
or failing that, with a ZDB-like tool or ioctl, if it's the only known
instance of such a bad metadata object and I already know where it's
located.

Usually the ultimate failure mode of a btrfs filesystem is a read-only
filesystem from which you can read most or all of your data, but you
can't ever make it writable again because of fsck limitations.

The one thing I do miss about every filesystem that isn't ext2/ext3 is
automated fsck that prioritizes availability, making the filesystem
safely writable even if it can't recover lost data.  On the other
hand, fixing an ext[23] filesystem is utterly trivial compared to any
btree-based filesystem.





Re: Is stability a joke? (wiki updated)

2016-09-19 Thread Austin S. Hemmelgarn

On 2016-09-19 14:27, Chris Murphy wrote:

On Mon, Sep 19, 2016 at 11:38 AM, Austin S. Hemmelgarn
 wrote:

ReiserFS had no working fsck for all of the 8 years I used it (and still
didn't last year when I tried to use it on an old disk).  "Not working"
here means "much less data is readable from the filesystem after running
fsck than before."  It's not that much of an inconvenience if you have
backups.


For a small array, this may be the case.  Once you start looking into double
digit TB scale arrays though, restoring backups becomes a very expensive
operation.  If you had a multi-PB array with a single dentry which had no
inode, would you rather be spending multiple days restoring files and
possibly losing recent changes, or spend a few hours to check the filesystem
and fix it with minimal data loss?


Yep, restoring backups, even fully re-replicating data in a cluster, is
so expensive it's untenable. But even offline fsck is sufficiently
non-scalable that at a certain volume size it's not tenable. 100TB
takes a long time to fsck offline, and is it even possible to fsck 1PB
of Btrfs? Seems to me it's another case where, if it were possible to
isolate which tree limbs are sick, we'd just cut them off and report the
data loss rather than consider the whole fs unusable. That's what we
do with living things.

This is part of why I said the ZFS approach is valid.  At the moment 
though, we can't even do that, and to do it properly, we'd need a tool 
to bypass the VFS layer to prune the tree, which is non-trivial in and 
of itself.  It would be nice to have a mode in check where you could say 
'I know this path in the FS has some kind of issue, figure out what's 
wrong and fix it if possible, otherwise optionally prune that branch 
from the appropriate tree'.  On the same note, it would be nice to be 
able to manually restrict it to specific checks (e.g., 'check only for 
orphaned inodes', or 'only validate the FSC/FST').  If we were to add 
such functionality, dealing with some minor corruption in a 100TB+ array 
wouldn't be quite as much of an issue.



Re: Is stability a joke? (wiki updated)

2016-09-19 Thread Chris Murphy
On Mon, Sep 19, 2016 at 11:38 AM, Austin S. Hemmelgarn
 wrote:
>> ReiserFS had no working fsck for all of the 8 years I used it (and still
>> didn't last year when I tried to use it on an old disk).  "Not working"
>> here means "much less data is readable from the filesystem after running
>> fsck than before."  It's not that much of an inconvenience if you have
>> backups.
>
> For a small array, this may be the case.  Once you start looking into double
> digit TB scale arrays though, restoring backups becomes a very expensive
> operation.  If you had a multi-PB array with a single dentry which had no
> inode, would you rather be spending multiple days restoring files and
> possibly losing recent changes, or spend a few hours to check the filesystem
> and fix it with minimal data loss?

Yep, restoring backups, even fully re-replicating data in a cluster, is
so expensive it's untenable. But even offline fsck is sufficiently
non-scalable that at a certain volume size it's not tenable. 100TB
takes a long time to fsck offline, and is it even possible to fsck 1PB
of Btrfs? Seems to me it's another case where, if it were possible to
isolate which tree limbs are sick, we'd just cut them off and report the
data loss rather than consider the whole fs unusable. That's what we
do with living things.


-- 
Chris Murphy


Re: Is stability a joke? (wiki updated)

2016-09-19 Thread Austin S. Hemmelgarn

On 2016-09-19 00:08, Zygo Blaxell wrote:

On Thu, Sep 15, 2016 at 01:02:43PM -0600, Chris Murphy wrote:

Right, well I'm vaguely curious why ZFS, as different as it is,
basically takes the position that if the hardware went so batshit that
they can't unwind it on a normal mount, then an fsck probably can't
help either... they still don't have an fsck and don't appear to want
one.


ZFS has no automated fsck, but it does have a kind of interactive
debugger that can be used to manually fix things.

ZFS seems to be a lot more robust when it comes to handling bad metadata
(contrast with btrfs-style BUG_ON panics).

When you delete a directory entry that has a missing inode on ZFS,
the dirent goes away.  In the ZFS administrator documentation they give
examples of this as a response in cases where ZFS metadata gets corrupted.

When you delete a file with a missing inode on btrfs, something
(VFS?) wants to check the inode to see if it has attributes that might
affect unlink (e.g. the immutable bit), gets an error reading the
inode, and bombs out of the unlink() before unlink() can get rid of the
dead dirent.  So if you get a dirent with no inode on btrfs on a large
filesystem (too large for btrfs check to handle), you're basically stuck
with it forever.  You can't even rename it.  Hopefully it doesn't happen
in a top-level directory.

ZFS is also infamous for saying "sucks to be you, I'm outta here" when
things go wrong.  People do want ZFS fsck and defrag, but nobody seems
to be bothered much about making those things happen.

At the end of the day I'm not sure fsck really matters.  If the filesystem
is getting corrupted enough that both copies of metadata are broken,
there's something fundamentally wrong with that setup (hardware bugs,
software bugs, bad RAM, etc) and it's just going to keep slowly eating
more data until the underlying problem is fixed, and there's no guarantee
that a repair is going to restore data correctly.  If we exclude broken
hardware, the only thing btrfs check is going to repair is btrfs kernel
bugs...and in that case, why would we expect btrfs check to have fewer
bugs than the filesystem itself?
I wouldn't, but I would still expect to have some tool to deal with 
things like orphaned inodes, dentries which are missing inodes, and 
other similar cases that don't make the filesystem unusable, but can't 
easily be fixed in a sane manner on a live filesystem.  The ZFS approach 
is valid, but it can't deal with things like orphaned inodes where 
there's no reference in the directories any more.



I'm not sure if the btrfsck is really all that helpful to users as much
as it is for developers to better learn about the failure vectors of
the file system.


ReiserFS had no working fsck for all of the 8 years I used it (and still
didn't last year when I tried to use it on an old disk).  "Not working"
here means "much less data is readable from the filesystem after running
fsck than before."  It's not that much of an inconvenience if you have
backups.
For a small array, this may be the case.  Once you start looking into 
double digit TB scale arrays though, restoring backups becomes a very 
expensive operation.  If you had a multi-PB array with a single dentry 
which had no inode, would you rather be spending multiple days restoring 
files and possibly losing recent changes, or spend a few hours to check 
the filesystem and fix it with minimal data loss?



Re: Is stability a joke? (wiki updated)

2016-09-19 Thread Zygo Blaxell
On Mon, Sep 19, 2016 at 08:32:14AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-09-18 23:47, Zygo Blaxell wrote:
> >On Mon, Sep 12, 2016 at 12:56:03PM -0400, Austin S. Hemmelgarn wrote:
> >>4. File Range Cloning and Out-of-band Dedupe: Similarly, work fine if the FS
> >>is healthy.
> >
> >I've found issues with OOB dedup (clone/extent-same):
> >
> >1.  Don't dedup data that has not been committed--either call fsync()
> >on it, or check the generation numbers on each extent before deduping
> >it, or make sure the data is not being actively modified during dedup;
> >otherwise, a race condition may lead to the filesystem locking up and
> >becoming inaccessible until the kernel is rebooted.  This is particularly
> >important if you are doing bedup-style incremental dedup on a live system.
> >
> >I've worked around #1 by placing a fsync() call on the src FD immediately
> >before calling FILE_EXTENT_SAME.  When I do an A/B experiment with and
> >without the fsync, "with-fsync" runs for weeks at a time without issues,
> >while "without-fsync" hangs, sometimes in just a matter of hours.  Note
> >that the fsync() doesn't resolve the underlying race condition, it just
> >makes the filesystem hang less often.
> >
> >2.  There is a practical limit to the number of times a single duplicate
> >extent can be deduplicated.  As more references to a shared extent
> >are created, any part of the filesystem that uses backref walking code
> >gets slower.  This includes dedup itself, balance, device replace/delete,
> >FIEMAP, LOGICAL_INO, and mmap() (which can be bad news if the duplicate
> >files are executables).  Several factors (including file size and number
> >of snapshots) are involved, making it difficult to devise workarounds or
> >set up test cases.  99.5% of the time, these operations just get slower
> >by a few ms each time a new reference is created, but the other 0.5% of
> >the time, write operations will abruptly grow to consume hours of CPU
> >time or dozens of gigabytes of RAM (in millions of kmalloc-32 slabs)
> >when they touch one of these over-shared extents.  When this occurs,
> >it effectively (but not literally) crashes the host machine.
> >
> >I've worked around #2 by building tables of "toxic" hashes that occur too
> >frequently in a filesystem to be deduped, and using these tables in dedup
> >software to ignore any duplicate data matching them.  These tables can
> >be relatively small as they only need to list hashes that are repeated
> >more than a few thousand times, and typical filesystems (up to 10TB or
> >so) have only a few hundred such hashes.
> >
> >I happened to have a couple of machines taken down by these issues this
> >very weekend, so I can confirm the issues are present in kernels 4.4.21,
> >4.5.7, and 4.7.4.
> OK, that's good to know.  In my case, I'm not operating on a very big data
> set (less than 40GB, but the storage cluster I'm doing this on only has
> about 200GB of total space, so I'm trying to conserve as much as possible),
> and it's mostly static data (less than 100MB worth of changes a day except
> on Sunday when I run backups), so it makes sense that I've not seen either
> of these issues.

I ran into issue #2 on an 8GB filesystem last weekend.  The lower limit
on filesystem size could be as low as a few megabytes if they're arranged
in *just* the right way.

> The second one sounds like the same performance issue caused by having very
> large numbers of snapshots, and based on what's happening, I don't think
> there's any way we could fix it without rewriting certain core code.

find_parent_nodes is the usual culprit for CPU usage.  Fixing this is
required for in-band dedup as well, so I assume someone has it on their
roadmap and will get it done eventually.





Re: Is stability a joke? (wiki updated)

2016-09-19 Thread Sean Greenslade
On Mon, Sep 19, 2016 at 12:08:55AM -0400, Zygo Blaxell wrote:
> 
> At the end of the day I'm not sure fsck really matters.  If the filesystem
> is getting corrupted enough that both copies of metadata are broken,
> there's something fundamentally wrong with that setup (hardware bugs,
> software bugs, bad RAM, etc) and it's just going to keep slowly eating
> more data until the underlying problem is fixed, and there's no guarantee
> that a repair is going to restore data correctly.  If we exclude broken
> hardware, the only thing btrfs check is going to repair is btrfs kernel
> bugs...and in that case, why would we expect btrfs check to have fewer
> bugs than the filesystem itself?

I see btrfs check as having a very useful role: fixing known problems
introduced by previous versions of kernel / progs. In my ext conversion
thread, I seem to have discovered a problem introduced by convert,
balance, or defrag. The data and metadata seem to be OK, however the
filesystem cannot be written to without btrfs falling over. If this was
caused by some edge-case data in the btrfs partition, it makes a lot
more sense to have btrfs check repair it than it does to modify the
kernel code to work around this and possibly many other bugs. The upshot
to this is that since (potentially all of) the data is intact, a
functional btrfs check would save me the hassle of restoring from
backup.

--Sean



Re: Is stability a joke? (wiki updated)

2016-09-19 Thread Austin S. Hemmelgarn

On 2016-09-18 22:57, Zygo Blaxell wrote:

On Fri, Sep 16, 2016 at 08:00:44AM -0400, Austin S. Hemmelgarn wrote:

To be entirely honest, both zero-log and super-recover could probably be
pretty easily integrated into btrfs check such that it detects when they
need to be run and does so.  zero-log has a very well defined situation in
which it's absolutely needed (log tree corrupted such that it can't be
replayed), which is pretty easy to detect (the kernel obviously does so,
albeit by crashing).


Check already includes zero-log.  It loses a little data that way, but
that is probably better than the alternative (try to teach btrfs check
how to replay the log tree and keep up with kernel changes).
Interesting, as I've never seen check try to zero the log (even in cases 
where it would fix things) unless it makes some other change in the FS. 
I won't dispute that it clears the log tree _if_ it makes other changes 
to the FS (it kind of has to for safety reasons), but that's the only 
circumstance that I've seen it do so (even on filesystems where the log 
tree was corrupted, but the rest of the FS was fine).


There have been at least two log-tree bugs (or, more accurately,
bugs triggered while processing the log tree during mount) in the 3.x
and 4.x kernels.  The most recent I've encountered was in one of the
4.7-rc kernels.  zero-log is certainly not obsolete.
I won't dispute this, as I've had it happen myself (albeit not quite 
that recently); all I was trying to say was that it fixes a very 
well-defined problem.


For a filesystem where availability is more important than integrity
(e.g. root filesystems) it's really handy to have zero-log as a separate
tool without the huge overhead (and regression risk) of check.
Agreed, hence my later statement that if it gets fully merged, there 
should be an option to run just that.




Re: Is stability a joke? (wiki updated)

2016-09-19 Thread Austin S. Hemmelgarn

On 2016-09-18 23:47, Zygo Blaxell wrote:

On Mon, Sep 12, 2016 at 12:56:03PM -0400, Austin S. Hemmelgarn wrote:

4. File Range Cloning and Out-of-band Dedupe: Similarly, work fine if the FS
is healthy.


I've found issues with OOB dedup (clone/extent-same):

1.  Don't dedup data that has not been committed--either call fsync()
on it, or check the generation numbers on each extent before deduping
it, or make sure the data is not being actively modified during dedup;
otherwise, a race condition may lead to the filesystem locking up and
becoming inaccessible until the kernel is rebooted.  This is particularly
important if you are doing bedup-style incremental dedup on a live system.

I've worked around #1 by placing a fsync() call on the src FD immediately
before calling FILE_EXTENT_SAME.  When I do an A/B experiment with and
without the fsync, "with-fsync" runs for weeks at a time without issues,
while "without-fsync" hangs, sometimes in just a matter of hours.  Note
that the fsync() doesn't resolve the underlying race condition, it just
makes the filesystem hang less often.

2.  There is a practical limit to the number of times a single duplicate
extent can be deduplicated.  As more references to a shared extent
are created, any part of the filesystem that uses backref walking code
gets slower.  This includes dedup itself, balance, device replace/delete,
FIEMAP, LOGICAL_INO, and mmap() (which can be bad news if the duplicate
files are executables).  Several factors (including file size and number
of snapshots) are involved, making it difficult to devise workarounds or
set up test cases.  99.5% of the time, these operations just get slower
by a few ms each time a new reference is created, but the other 0.5% of
the time, write operations will abruptly grow to consume hours of CPU
time or dozens of gigabytes of RAM (in millions of kmalloc-32 slabs)
when they touch one of these over-shared extents.  When this occurs,
it effectively (but not literally) crashes the host machine.

I've worked around #2 by building tables of "toxic" hashes that occur too
frequently in a filesystem to be deduped, and using these tables in dedup
software to ignore any duplicate data matching them.  These tables can
be relatively small as they only need to list hashes that are repeated
more than a few thousand times, and typical filesystems (up to 10TB or
so) have only a few hundred such hashes.

I happened to have a couple of machines taken down by these issues this
very weekend, so I can confirm the issues are present in kernels 4.4.21,
4.5.7, and 4.7.4.
OK, that's good to know.  In my case, I'm not operating on a very big 
data set (less than 40GB, but the storage cluster I'm doing this on only 
has about 200GB of total space, so I'm trying to conserve as much as 
possible), and it's mostly static data (less than 100MB worth of changes 
a day except on Sunday when I run backups), so it makes sense that I've 
not seen either of these issues.


The second one sounds like the same performance issue caused by having 
very large numbers of snapshots, and based on what's happening, I don't 
think there's any way we could fix it without rewriting certain core code.



Re: Is stability a joke? (wiki updated)

2016-09-18 Thread Zygo Blaxell
On Thu, Sep 15, 2016 at 01:02:43PM -0600, Chris Murphy wrote:
> Right, well I'm vaguely curious why ZFS, as different as it is,
> basically takes the position that if the hardware went so batshit that
> they can't unwind it on a normal mount, then an fsck probably can't
> help either... they still don't have an fsck and don't appear to want
> one.

ZFS has no automated fsck, but it does have a kind of interactive
debugger that can be used to manually fix things.

ZFS seems to be a lot more robust when it comes to handling bad metadata
(contrast with btrfs-style BUG_ON panics).

When you delete a directory entry that has a missing inode on ZFS,
the dirent goes away.  In the ZFS administrator documentation they give
examples of this as a response in cases where ZFS metadata gets corrupted.

When you delete a file with a missing inode on btrfs, something
(VFS?) wants to check the inode to see if it has attributes that might
affect unlink (e.g. the immutable bit), gets an error reading the
inode, and bombs out of the unlink() before unlink() can get rid of the
dead dirent.  So if you get a dirent with no inode on btrfs on a large
filesystem (too large for btrfs check to handle), you're basically stuck
with it forever.  You can't even rename it.  Hopefully it doesn't happen
in a top-level directory.
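
For illustration, here's a minimal sketch of what that failure looks like
from userspace (the path is hypothetical, and the exact errno depends on
how the inode lookup fails):

/* Sketch: try to remove a dirent whose inode is missing.  The VFS has
 * to look up the inode behind the dirent (permission/immutable-bit
 * checks) before it will let unlink() proceed, so the call errors out
 * and the dead dirent stays. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *victim = "/mnt/btrfs/somedir/dangling-entry"; /* hypothetical */

	if (unlink(victim) != 0)
		fprintf(stderr, "unlink(%s): %s\n", victim, strerror(errno));
	return 0;
}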

ZFS is also infamous for saying "sucks to be you, I'm outta here" when
things go wrong.  People do want ZFS fsck and defrag, but nobody seems
to be bothered much about making those things happen.

At the end of the day I'm not sure fsck really matters.  If the filesystem
is getting corrupted enough that both copies of metadata are broken,
there's something fundamentally wrong with that setup (hardware bugs,
software bugs, bad RAM, etc) and it's just going to keep slowly eating
more data until the underlying problem is fixed, and there's no guarantee
that a repair is going to restore data correctly.  If we exclude broken
hardware, the only thing btrfs check is going to repair is btrfs kernel
bugs...and in that case, why would we expect btrfs check to have fewer
bugs than the filesystem itself?

> I'm not sure if the btrfsck is really all that helpful to users as much
> as it is for developers to better learn about the failure vectors of
> the file system.

ReiserFS had no working fsck for all of the 8 years I used it (and still
didn't last year when I tried to use it on an old disk).  "Not working"
here means "much less data is readable from the filesystem after running
fsck than before."  It's not that much of an inconvenience if you have
backups.





Re: Is stability a joke? (wiki updated)

2016-09-18 Thread Zygo Blaxell
On Mon, Sep 12, 2016 at 12:56:03PM -0400, Austin S. Hemmelgarn wrote:
> 4. File Range Cloning and Out-of-band Dedupe: Similarly, work fine if the FS
> is healthy.

I've found issues with OOB dedup (clone/extent-same):

1.  Don't dedup data that has not been committed--either call fsync()
on it, or check the generation numbers on each extent before deduping
it, or make sure the data is not being actively modified during dedup;
otherwise, a race condition may lead to the filesystem locking up and
becoming inaccessible until the kernel is rebooted.  This is particularly
important if you are doing bedup-style incremental dedup on a live system.

I've worked around #1 by placing a fsync() call on the src FD immediately
before calling FILE_EXTENT_SAME.  When I do an A/B experiment with and
without the fsync, "with-fsync" runs for weeks at a time without issues,
while "without-fsync" hangs, sometimes in just a matter of hours.  Note
that the fsync() doesn't resolve the underlying race condition, it just
makes the filesystem hang less often.
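
For reference, a rough sketch of that workaround in code (paths, offsets,
and the extent length are made up, and error handling is minimal):

/* Sketch: fsync() the source fd immediately before asking the kernel to
 * dedupe it with BTRFS_IOC_FILE_EXTENT_SAME. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

int main(void)
{
	int src = open("/mnt/data/a", O_RDONLY);   /* hypothetical source */
	int dst = open("/mnt/data/b", O_RDWR);     /* hypothetical duplicate */
	if (src < 0 || dst < 0) {
		perror("open");
		return 1;
	}

	/* The workaround: make sure the source data is committed before
	 * the dedup request, which avoids (most of) the hangs. */
	if (fsync(src) != 0) {
		perror("fsync");
		return 1;
	}

	struct btrfs_ioctl_same_args *args =
		calloc(1, sizeof(*args) +
			  sizeof(struct btrfs_ioctl_same_extent_info));
	args->logical_offset = 0;          /* start of the range in src */
	args->length = 128 * 1024;         /* hypothetical extent length */
	args->dest_count = 1;
	args->info[0].fd = dst;
	args->info[0].logical_offset = 0;  /* start of the range in dst */

	if (ioctl(src, BTRFS_IOC_FILE_EXTENT_SAME, args) < 0)
		perror("BTRFS_IOC_FILE_EXTENT_SAME");
	else if (args->info[0].status == BTRFS_SAME_DATA_DIFFERS)
		fprintf(stderr, "ranges differ, nothing deduped\n");
	else if (args->info[0].status < 0)
		fprintf(stderr, "dedupe failed: %s\n",
			strerror(-args->info[0].status));
	else
		printf("deduped %llu bytes\n",
		       (unsigned long long)args->info[0].bytes_deduped);

	free(args);
	close(src);
	close(dst);
	return 0;
}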

2.  There is a practical limit to the number of times a single duplicate
extent can be deduplicated.  As more references to a shared extent
are created, any part of the filesystem that uses backref walking code
gets slower.  This includes dedup itself, balance, device replace/delete,
FIEMAP, LOGICAL_INO, and mmap() (which can be bad news if the duplicate
files are executables).  Several factors (including file size and number
of snapshots) are involved, making it difficult to devise workarounds or
set up test cases.  99.5% of the time, these operations just get slower
by a few ms each time a new reference is created, but the other 0.5% of
the time, write operations will abruptly grow to consume hours of CPU
time or dozens of gigabytes of RAM (in millions of kmalloc-32 slabs)
when they touch one of these over-shared extents.  When this occurs,
it effectively (but not literally) crashes the host machine.

I've worked around #2 by building tables of "toxic" hashes that occur too
frequently in a filesystem to be deduped, and using these tables in dedup
software to ignore any duplicate data matching them.  These tables can
be relatively small as they only need to list hashes that are repeated
more than a few thousand times, and typical filesystems (up to 10TB or
so) have only a few hundred such hashes.
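
To make the idea concrete, here's a toy sketch of that filtering (the
64-bit hash width, the table size, and the threshold are illustrative
choices for the example, not the actual tool's values; the counting and
the skip check are folded together here for brevity):

/* Count how many times each block hash has been seen; once a hash has
 * been referenced "too many" times, the dedup pass leaves it alone so
 * backref walks on that extent don't get any slower. */
#include <stdint.h>
#include <stdio.h>

#define TABLE_SLOTS (1u << 16)   /* illustrative table size */
#define TOXIC_THRESHOLD 2000     /* "more than a few thousand" refs */

static uint64_t keys[TABLE_SLOTS];
static uint32_t counts[TABLE_SLOTS];

/* Open-addressed counter: returns the updated count for this hash. */
static uint32_t bump(uint64_t h)
{
	uint32_t i = (uint32_t)(h % TABLE_SLOTS);

	while (counts[i] && keys[i] != h)
		i = (i + 1) % TABLE_SLOTS;
	keys[i] = h;
	return ++counts[i];
}

/* Called by the dedup pass for each candidate duplicate block. */
static int should_dedupe(uint64_t block_hash)
{
	return bump(block_hash) <= TOXIC_THRESHOLD;
}

int main(void)
{
	/* Toy demonstration: one hash repeated heavily becomes "toxic". */
	uint64_t hot = 0xdeadbeefcafe1234ULL;
	unsigned skipped = 0;

	for (unsigned i = 0; i < 5000; i++)
		if (!should_dedupe(hot))
			skipped++;
	printf("skipped %u of 5000 dedup attempts for the hot hash\n", skipped);
	return 0;
}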

I happened to have a couple of machines taken down by these issues this
very weekend, so I can confirm the issues are present in kernels 4.4.21,
4.5.7, and 4.7.4.





Re: Is stability a joke? (wiki updated)

2016-09-18 Thread Zygo Blaxell
On Fri, Sep 16, 2016 at 08:00:44AM -0400, Austin S. Hemmelgarn wrote:
> To be entirely honest, both zero-log and super-recover could probably be
> pretty easily integrated into btrfs check such that it detects when they
> need to be run and does so.  zero-log has a very well defined situation in
> which it's absolutely needed (log tree corrupted such that it can't be
> replayed), which is pretty easy to detect (the kernel obviously does so,
> albeit by crashing).  

Check already includes zero-log.  It loses a little data that way, but
that is probably better than the alternative (try to teach btrfs check
how to replay the log tree and keep up with kernel changes).

There have been at least two log-tree bugs (or, more accurately,
bugs triggered while processing the log tree during mount) in the 3.x
and 4.x kernels.  The most recent I've encountered was in one of the
4.7-rc kernels.  zero-log is certainly not obsolete.

For a filesystem where availability is more important than integrity
(e.g. root filesystems) it's really handy to have zero-log as a separate
tool without the huge overhead (and regression risk) of check.





Re: Is stability a joke? (wiki updated)

2016-09-16 Thread Austin S. Hemmelgarn

On 2016-09-15 17:23, Christoph Anton Mitterer wrote:

On Thu, 2016-09-15 at 14:20 -0400, Austin S. Hemmelgarn wrote:

3. Fsck should be needed only for un-mountable filesystems.  Ideally, we
should be handling things like Windows does.  Perform slightly better
checking when reading data, and if we see an error, flag the filesystem
for expensive repair on the next mount.


That philosophy also has some drawbacks:
- The user doesn't directly notice that anything went wrong. Thus errors may
even continue to accumulate and get much worse than if the fs had
immediately gone ro, giving the user the chance to manually
intervene (possibly then with help from upstream).
Except that the fsck implementation in Windows for NTFS actually fixes 
things that are broken.  MS policy is 'if chkdsk can't fix it, you need 
to just reinstall and restore from backups'.  They don't beat around the 
bush trying to figure out what exactly went wrong, because 99% of the 
time on Windows a corrupted filesystem means broken hardware or a virus. 
BTRFS obviously isn't at that point yet, but it has the potential: if 
we were to start focusing on fixing stuff that's broken instead of 
working on shiny new features that will inevitably make everything else 
harder to debug, we could probably get there faster than most other 
Linux filesystems.


- Any smart auto-magical™ repair may also just fail (and make things
worse, as the current --repair e.g. may). Not performing such
auto-repair gives the user at least the chance to make a bitwise
copy of the whole fs before trying any rescue operations.
This wouldn't be the case if the user never noticed that something
happened and the fs tried to repair things right at mounting.
People talk about it being dangerous, but I have yet to see it break a 
filesystem that wasn't already in a state that in XFS or ext4 would be 
considered broken beyond repair.  For pretty much all of the common 
cases (orphaned inodes, dangling hardlinks, isize mismatches, etc), it 
does fix things correctly.  Most of that stuff could be optionally 
checked at mount and fixed without causing issues, but it's not 
something that should be done all the time since it's expensive, hence 
me suggesting checking such things dynamically on-access and flagging 
them for cleanup next mount.


So I think any such auto-repair should be used with extreme caution and
only in those cases where one is absolutely 100% sure that the action
will help and do only good.
In general, I agree with this, and I'd say it should be opt-in, not 
mandatory.  I'm not talking about doing things that are all that risky 
though, but things which btrfs check can handle safely.



Re: Is stability a joke? (wiki updated)

2016-09-16 Thread Austin S. Hemmelgarn

On 2016-09-15 16:26, Chris Murphy wrote:

On Thu, Sep 15, 2016 at 2:16 PM, Hugo Mills  wrote:

On Thu, Sep 15, 2016 at 01:02:43PM -0600, Chris Murphy wrote:

On Thu, Sep 15, 2016 at 12:20 PM, Austin S. Hemmelgarn
 wrote:


2. We're developing new features without making sure that check can fix
issues in any associated metadata.  Part of merging a new feature needs to
be proving that fsck can handle fixing any issues in the metadata for that
feature short of total data loss or complete corruption.

3. Fsck should be needed only for un-mountable filesystems.  Ideally, we
should be handling things like Windows does.  Perform slightly better
checking when reading data, and if we see an error, flag the filesystem for
expensive repair on the next mount.


Right, well I'm vaguely curious why ZFS, as different as it is,
basically takes the position that if the hardware went so batshit that
they can't unwind it on a normal mount, then an fsck probably can't
help either... they still don't have an fsck and don't appear to want
one.

I'm not sure if the btrfsck is really all that helpful to users as much
as it is for developers to better learn about the failure vectors of
the file system.



4. Btrfs check should know itself if it can fix something or not, and that
should be reported.  I have an otherwise perfectly fine filesystem that
throws some (apparently harmless) errors in check, and check can't repair
them.  Despite this, it gives zero indication that it can't repair them,
zero indication that it didn't repair them, and doesn't even seem to give a
non-zero exit status for this filesystem.


Yeah, it's really not a user tool in my view...





As far as the other tools:
- Self-repair at mount time: This isn't a repair tool; if the FS mounts,
it's not broken, it's just messy and the kernel is tidying things up.
- btrfsck/btrfs check: I think I covered the issues here well.
- Mount options: These are mostly just for expensive checks during mount,
and most people should never need them except in very unusual circumstances.
- btrfs rescue *: These are all fixes for very specific issues.  They should
be folded into check with special aliases, and not be separate tools.  The
first fixes an issue that's pretty much non-existent in any modern kernel,
and the other two are for very low-level data recovery of horribly broken
filesystems.
- scrub: This is a very purpose specific tool which is supposed to be part
of regular maintenance, and only works to fix things as a side effect of
what it does.
- balance: This is also a relatively purpose specific tool, and again only
fixes things as a side effect of what it does.


   You've forgotten btrfs-zero-log, which seems to have built itself a
reputation on the internet as the tool you run to fix all btrfs ills,
rather than a very finely-targeted tool that was introduced to deal
with approximately one bug somewhere back in the 2.x era (IIRC).

   Hugo.


:-) It's in my original list, and it's in Austin's by way of being
lumped into 'btrfs rescue *' along with chunk and super recover. Seems
like super recover should be built into Btrfs check, and would be one
of the first ambiguities to get out of the way but I'm just an ape
that wears pants so what do I know.

Thing is... zero log has fixed file systems in cases where I never
would have expected it to, and the user was recommended not to use it,
or use it as a 2nd to last resort. So, pfff... It's like throwing salt
around.

To be entirely honest, both zero-log and super-recover could probably be 
pretty easily integrated into btrfs check such that it detects when they 
need to be run and does so.  zero-log has a very well defined situation 
in which it's absolutely needed (log tree corrupted such that it can't 
be replayed), which is pretty easy to detect (the kernel obviously does 
so, albeit by crashing).  super-recover is also used in a pretty 
specific set of circumstances (first SB corrupted, backups fine), which 
are also pretty easy to detect.  In both cases, I'd like to see some 
switch (--single-fix maybe?) for directly invoking just those functions 
(as well as a few others like dropping the FSC/FST or cancelling a 
paused or crashed balance) that operate at a filesystem level instead of 
a block/inode/extent level like most of the other stuff in check does.



Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Christoph Anton Mitterer
On Thu, 2016-09-15 at 14:20 -0400, Austin S. Hemmelgarn wrote:
> 3. Fsck should be needed only for un-mountable filesystems.  Ideally, we
> should be handling things like Windows does.  Perform slightly better
> checking when reading data, and if we see an error, flag the filesystem
> for expensive repair on the next mount.

That philosophy also has some drawbacks:
- The user doesn't directly notice that anything went wrong. Thus errors may
even continue to accumulate and get much worse than if the fs had
immediately gone ro, giving the user the chance to manually
intervene (possibly then with help from upstream).

- Any smart auto-magical™ repair may also just fail (and make things
worse, as the current --repair e.g. may). Not performing such
auto-repair gives the user at least the chance to make a bitwise
copy of the whole fs before trying any rescue operations.
This wouldn't be the case if the user never noticed that something
happened and the fs tried to repair things right at mounting.

So I think any such auto-repair should be used with extreme caution and
only in those cases where one is absolutely 100% sure that the action
will help and do only good.



Cheers,
Chris.



Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Chris Murphy
On Thu, Sep 15, 2016 at 2:16 PM, Hugo Mills  wrote:
> On Thu, Sep 15, 2016 at 01:02:43PM -0600, Chris Murphy wrote:
>> On Thu, Sep 15, 2016 at 12:20 PM, Austin S. Hemmelgarn
>>  wrote:
>>
>> > 2. We're developing new features without making sure that check can fix
>> > issues in any associated metadata.  Part of merging a new feature needs to
>> > be proving that fsck can handle fixing any issues in the metadata for that
>> > feature short of total data loss or complete corruption.
>> >
>> > 3. Fsck should be needed only for un-mountable filesystems.  Ideally, we
>> > should be handling things like Windows does.  Perform slightly better
>> > checking when reading data, and if we see an error, flag the filesystem for
>> > expensive repair on the next mount.
>>
>> Right, well I'm vaguely curious why ZFS, as different as it is,
>> basically takes the position that if the hardware went so batshit that
>> they can't unwind it on a normal mount, then an fsck probably can't
>> help either... they still don't have an fsck and don't appear to want
>> one.
>>
>> I'm not sure if the btrfsck is really all that helpful to users as much
>> as it is for developers to better learn about the failure vectors of
>> the file system.
>>
>>
>> > 4. Btrfs check should know itself if it can fix something or not, and that
>> > should be reported.  I have an otherwise perfectly fine filesystem that
>> > throws some (apparently harmless) errors in check, and check can't repair
>> > them.  Despite this, it gives zero indication that it can't repair them,
>> > zero indication that it didn't repair them, and doesn't even seem to give a
>> > non-zero exit status for this filesystem.
>>
>> Yeah, it's really not a user tool in my view...
>>
>>
>>
>> >
>> > As far as the other tools:
>> > - Self-repair at mount time: This isn't a repair tool; if the FS mounts,
>> > it's not broken, it's just messy and the kernel is tidying things up.
>> > - btrfsck/btrfs check: I think I covered the issues here well.
>> > - Mount options: These are mostly just for expensive checks during mount,
>> > and most people should never need them except in very unusual 
>> > circumstances.
>> > - btrfs rescue *: These are all fixes for very specific issues.  They 
>> > should
>> > be folded into check with special aliases, and not be separate tools.  The
>> > first fixes an issue that's pretty much non-existent in any modern kernel,
>> > and the other two are for very low-level data recovery of horribly broken
>> > filesystems.
>> > - scrub: This is a very purpose specific tool which is supposed to be part
>> > of regular maintenance, and only works to fix things as a side effect of
>> > what it does.
>> > - balance: This is also a relatively purpose specific tool, and again only
>> > fixes things as a side effect of what it does.
>
>You've forgotten btrfs-zero-log, which seems to have built itself a
> reputation on the internet as the tool you run to fix all btrfs ills,
> rather than a very finely-targeted tool that was introduced to deal
> with approximately one bug somewhere back in the 2.x era (IIRC).
>
>Hugo.

:-) It's in my original list, and it's in Austin's by way of being
lumped into 'btrfs rescue *' along with chunk and super recover. Seems
like super recover should be built into Btrfs check, and would be one
of the first ambiguities to get out of the way but I'm just an ape
that wears pants so what do I know.

Thing is... zero log has fixed file systems in cases where I never
would have expected it to, and the user was recommended not to use it,
or use it as a 2nd to last resort. So, pfff... It's like throwing salt
around.

-- 
Chris Murphy


Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Hugo Mills
On Thu, Sep 15, 2016 at 01:02:43PM -0600, Chris Murphy wrote:
> On Thu, Sep 15, 2016 at 12:20 PM, Austin S. Hemmelgarn
>  wrote:
> 
> > 2. We're developing new features without making sure that check can fix
> > issues in any associated metadata.  Part of merging a new feature needs to
> > be proving that fsck can handle fixing any issues in the metadata for that
> > feature short of total data loss or complete corruption.
> >
> > 3. Fsck should be needed only for un-mountable filesystems.  Ideally, we
> > should be handling things like Windows does.  Perform slightly better
> > checking when reading data, and if we see an error, flag the filesystem for
> > expensive repair on the next mount.
> 
> Right, well I'm vaguely curious why ZFS, as different as it is,
> basically takes the position that if the hardware went so batshit that
> they can't unwind it on a normal mount, then an fsck probably can't
> help either... they still don't have an fsck and don't appear to want
> one.
> 
> I'm not sure if the btrfsck is really all that helpful to users as much
> as it is for developers to better learn about the failure vectors of
> the file system.
> 
> 
> > 4. Btrfs check should know itself if it can fix something or not, and that
> > should be reported.  I have an otherwise perfectly fine filesystem that
> > throws some (apparently harmless) errors in check, and check can't repair
> > them.  Despite this, it gives zero indication that it can't repair them,
> > zero indication that it didn't repair them, and doesn't even seem to give a
> > non-zero exit status for this filesystem.
> 
> Yeah, it's really not a user tool in my view...
> 
> 
> 
> >
> > As far as the other tools:
> > - Self-repair at mount time: This isn't a repair tool; if the FS mounts,
> > it's not broken, it's just messy and the kernel is tidying things up.
> > - btrfsck/btrfs check: I think I covered the issues here well.
> > - Mount options: These are mostly just for expensive checks during mount,
> > and most people should never need them except in very unusual circumstances.
> > - btrfs rescue *: These are all fixes for very specific issues.  They should
> > be folded into check with special aliases, and not be separate tools.  The
> > first fixes an issue that's pretty much non-existent in any modern kernel,
> > and the other two are for very low-level data recovery of horribly broken
> > filesystems.
> > - scrub: This is a very purpose specific tool which is supposed to be part
> > of regular maintenance, and only works to fix things as a side effect of
> > what it does.
> > - balance: This is also a relatively purpose specific tool, and again only
> > fixes things as a side effect of what it does.

   You've forgotten btrfs-zero-log, which seems to have built itself a
reputation on the internet as the tool you run to fix all btrfs ills,
rather than a very finely-targeted tool that was introduced to deal
with approximately one bug somewhere back in the 2.x era (IIRC).

   Hugo.

> 
> Yeah I know, it's just much of this is non-obvious to users unfamiliar
> with this file system. And even I'm often throwing spaghetti on a
> wall.
> 
> 
> -- 
> Chris Murphy

-- 
Hugo Mills | It's against my programming to impersonate a deity!
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |  C3PO, Return of the Jedi




Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Chris Murphy
On Thu, Sep 15, 2016 at 12:20 PM, Austin S. Hemmelgarn
 wrote:

> 2. We're developing new features without making sure that check can fix
> issues in any associated metadata.  Part of merging a new feature needs to
> be proving that fsck can handle fixing any issues in the metadata for that
> feature short of total data loss or complete corruption.
>
> 3. Fsck should be needed only for un-mountable filesystems.  Ideally, we
> should be handling things like Windows does.  Perform slightly better
> checking when reading data, and if we see an error, flag the filesystem for
> expensive repair on the next mount.

Right, well I'm vaguely curious why ZFS, as different as it is,
basically takes the position that if the hardware went so batshit that
they can't unwind it on a normal mount, then an fsck probably can't
help either... they still don't have an fsck and don't appear to want
one.

I'm not sure if the btrfsck is really all that helpful to users as much
as it is for developers to better learn about the failure vectors of
the file system.


> 4. Btrfs check should know itself if it can fix something or not, and that
> should be reported.  I have an otherwise perfectly fine filesystem that
> throws some (apparently harmless) errors in check, and check can't repair
> them.  Despite this, it gives zero indication that it can't repair them,
> zero indication that it didn't repair them, and doesn't even seem to give a
> non-zero exit status for this filesystem.

Yeah, it's really not a user tool in my view...



>
> As far as the other tools:
> - Self-repair at mount time: This isn't a repair tool; if the FS mounts,
> it's not broken, it's just messy and the kernel is tidying things up.
> - btrfsck/btrfs check: I think I covered the issues here well.
> - Mount options: These are mostly just for expensive checks during mount,
> and most people should never need them except in very unusual circumstances.
> - btrfs rescue *: These are all fixes for very specific issues.  They should
> be folded into check with special aliases, and not be separate tools.  The
> first fixes an issue that's pretty much non-existent in any modern kernel,
> and the other two are for very low-level data recovery of horribly broken
> filesystems.
> - scrub: This is a very purpose specific tool which is supposed to be part
> of regular maintenance, and only works to fix things as a side effect of
> what it does.
> - balance: This is also a relatively purpose specific tool, and again only
> fixes things as a side effect of what it does.
>

Yeah I know, it's just much of this is non-obvious to users unfamiliar
with this file system. And even I'm often throwing spaghetti on a
wall.


-- 
Chris Murphy


Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Austin S. Hemmelgarn

On 2016-09-15 14:01, Chris Murphy wrote:

On Tue, Sep 13, 2016 at 5:35 AM, Austin S. Hemmelgarn
 wrote:

On 2016-09-12 16:08, Chris Murphy wrote:


- btrfsck status
e.g. btrfs-progs 4.7.2 still warns against using --repair, and lists
it under dangerous options also;  while that's true, Btrfs can't be
considered stable or recommended by default
e.g. There's still way too many separate repair tools for Btrfs.
Depending on how you count there's at least 4, and more realistically
8 ways, scattered across multiple commands. This excludes btrfs
check's -E, -r, and -s flags. And it ignores sequence in the success
rate. The permutations are just excessive. It's definitely not easy to
know how to fix a Btrfs volume should things go wrong.


I assume you're counting balance and scrub in that, plus check gives 3, what
are you considering the 4th?


- Self repair at mount time, similar to other fs's with a journal
- fsck, similar to other fs's except the output is really unclear
about what the prognosis is compared to ext4 or xfs
- mount option usebackuproot/recovery
- btrfs rescue zero-log
- btrfs rescue super-recover
- btrfs rescue chunk-recover
- scrub
- balance

check --repair really needed to be fail-safe a long time ago; it's
what everyone's come to expect from fscks, that they don't make
things worse; and in particular on Btrfs it seems like its repairs
should be reversible but the reality is the man page says do not use
(except under advisement) and that it's dangerous (twice). And a user
got a broken system in the bug that affects 4.7, 4.7.1, that 4.7.2
apparently can't fix. So... life is hard, file systems are hard. But
it's also hard to see how distros can possibly feel comfortable with
Btrfs by default when the fsck tool is dangerous, even if in theory it
shouldn't often be necessary.


For check specifically, I see four issues:
1. It spits out pretty low-level information about the internals in many 
cases when it returns an error.  xfs_repair does this too, but it's 
needed even less frequently than btrfs check, and it at least uses 
relatively simple jargon by comparison.  I've been using BTRFS for years 
and still can't tell what more than half the error messages check can 
return mean.  In contrast to that, deciphering an error message from 
e2fsck is pretty trivial if you have some basic understanding of VFS 
level filesystem abstractions (stuff like what inodes and dentries are), 
and I never needed to learn low level things about the internals of ext4 
to parse the fsck output (I did anyway, but that's beside the point).


2. We're developing new features without making sure that check can fix 
issues in any associated metadata.  Part of merging a new feature needs 
to be proving that fsck can handle fixing any issues in the metadata for 
that feature short of total data loss or complete corruption.


3. Fsck should be needed only for un-mountable filesystems.  Ideally, we 
should be handling things like Windows does.  Perform slightly better 
checking when reading data, and if we see an error, flag the filesystem 
for expensive repair on the next mount.


4. Btrfs check should know itself if it can fix something or not, and 
that should be reported.  I have an otherwise perfectly fine filesystem 
that throws some (apparently harmless) errors in check, and check can't 
repair them.  Despite this, it gives zero indication that it can't 
repair them, zero indication that it didn't repair them, and doesn't 
even seem to give a non-zero exit status for this filesystem.


As far as the other tools:
- Self-repair at mount time: This isn't a repair tool; if the FS mounts, 
it's not broken, it's just messy and the kernel is tidying things up.

- btrfsck/btrfs check: I think I covered the issues here well.
- Mount options: These are mostly just for expensive checks during 
mount, and most people should never need them except in very unusual 
circumstances.
- btrfs rescue *: These are all fixes for very specific issues.  They 
should be folded into check with special aliases, and not be separate 
tools.  The first fixes an issue that's pretty much non-existent in any 
modern kernel, and the other two are for very low-level data recovery of 
horribly broken filesystems.
- scrub: This is a very purpose specific tool which is supposed to be 
part of regular maintenance, and only works to fix things as a side 
effect of what it does.
- balance: This is also a relatively purpose specific tool, and again 
only fixes things as a side effect of what it does.




Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Chris Murphy
On Tue, Sep 13, 2016 at 5:35 AM, Austin S. Hemmelgarn
 wrote:
> On 2016-09-12 16:08, Chris Murphy wrote:
>>
>> - btrfsck status
>> e.g. btrfs-progs 4.7.2 still warns against using --repair, and lists
>> it under dangerous options also;  while that's true, Btrfs can't be
>> considered stable or recommended by default
>> e.g. There's still way too many separate repair tools for Btrfs.
>> Depending on how you count there's at least 4, and more realistically
>> 8 ways, scattered across multiple commands. This excludes btrfs
>> check's -E, -r, and -s flags. And it ignores sequence in the success
>> rate. The permutations are just excessive. It's definitely not easy to
>> know how to fix a Btrfs volume should things go wrong.
>
> I assume you're counting balance and scrub in that, plus check gives 3, what
> are you considering the 4th?

- Self repair at mount time, similar to other fs's with a journal
- fsck, similar to other fs's except the output is really unclear
about what the prognosis is compared to ext4 or xfs
- mount option usebackuproot/recovery
- btrfs rescue zero-log
- btrfs rescue super-recover
- btrfs rescue chunk-recover
- scrub
- balance

check --repair really needed to be fail-safe a long time ago; it's
what everyone's come to expect from fscks, that they don't make
things worse; and in particular on Btrfs it seems like its repairs
should be reversible but the reality is the man page says do not use
(except under advisement) and that it's dangerous (twice). And a user
got a broken system in the bug that affects 4.7, 4.7.1, that 4.7.2
apparently can't fix. So... life is hard, file systems are hard. But
it's also hard to see how distros can possibly feel comfortable with
Btrfs by default when the fsck tool is dangerous, even if in theory it
shouldn't often be necessary.


-- 
Chris Murphy


Re: Is stability a joke? (wiki updated)

2016-09-13 Thread Marc Haber
On Mon, Sep 12, 2016 at 02:44:35PM -0600, Chris Murphy wrote:
> Just to cut yourself some slack, you could skip 3.14 because it's EOL
> now, and just go from 4.4.

Don't the btrfs-tools used to create the filesystem also play a huge
role in this game?

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


Re: Is stability a joke? (wiki updated)

2016-09-13 Thread Martin Steigerwald
On Tuesday, 13 September 2016, 07:28:38 CEST, Austin S. Hemmelgarn wrote:
> On 2016-09-12 16:44, Chris Murphy wrote:
> > On Mon, Sep 12, 2016 at 2:35 PM, Martin Steigerwald  
wrote:
> >> On Monday, 12 September 2016, 23:21:09 CEST, Pasi Kärkkäinen wrote:
> >>> On Mon, Sep 12, 2016 at 09:57:17PM +0200, Martin Steigerwald wrote:
>  On Monday, 12 September 2016, 18:27:47 CEST, David Sterba wrote:
> > On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:
[…]
> > https://btrfs.wiki.kernel.org/index.php/Status
>  
>  Great.
>  
>  I made two minor adaptations. I added a link to the Status page to my
>  warning before the kernel log by feature page. And I also mentioned that
>  at the time the page was last updated the latest kernel version was 4.7.
>  Yes, that's some extra work to update the kernel version, but I think
>  it's beneficial to explicitly mention the kernel version the page talks
>  about. Everyone who updates the page can update the version within a
>  second.
> >>> 
> >>> Hmm.. that will still leave people wondering "but I'm running Linux 4.4,
> >>> not 4.7, I wonder what the status of feature X is.."
> >>> 
> >>> Should we also add a column for kernel version, so we can add "feature X
> >>> is
> >>> known to be OK on Linux 3.18 and later"..  ? Or add those to "notes"
> >>> field,
> >>> where applicable?
> >> 
> >> That was my initial idea, and it may be better than a generic kernel
> >> version for all features. Even if we fill in 4.7 for any of the features
> >> that are known to work okay for the table.
> >> 
> >> For RAID 1 I am willing to say it works stably since kernel 3.14, as this
> >> was the kernel I used when I switched /home and / to Dual SSD RAID 1 on
> >> this ThinkPad T520.
> > 
> > Just to cut yourself some slack, you could skip 3.14 because it's EOL
> > now, and just go from 4.4.
> 
> That reminds me, we should probably make a point to make it clear that
> this is for the _upstream_ mainline kernel versions, not for versions
> from some arbitrary distro, and that people should check the distro's
> documentation for that info.

I'd do the following:

Really state the first kernel version known to work stably for each feature.

But before the table, state this:

1) Rather than the first kernel known to work stably for a feature, recommend 
using the latest upstream kernel, or alternatively the latest upstream LTS 
kernel for those users who want to play it a bit safer.

2) For stable distros such as SLES, RHEL, Ubuntu LTS, and Debian Stable, 
recommend checking the distro documentation. Note that some distro kernels 
track upstream kernels quite closely, like the Debian backports kernel or the 
Ubuntu kernel backports PPA.

Thanks,
-- 
Martin


Re: Is stability a joke? (wiki updated)

2016-09-13 Thread Austin S. Hemmelgarn

On 2016-09-12 16:08, Chris Murphy wrote:

On Mon, Sep 12, 2016 at 10:56 AM, Austin S. Hemmelgarn
 wrote:



Things listed as TBD status:
1. Seeding: Seems to work fine the couple of times I've tested it, however
I've only done very light testing, and the whole feature is pretty much
undocumented.


Mostly OK.

Odd behaviors:
- mount seed (ro), add device, remount mountpoint: this just changed
the mounted fs volume UUID
- if two sprouts for a seed exist, ambiguous which is remounted rw,
you'd have to check
- remount should probably be disallowed in this case somehow; require
explicit mount of the sprout

btrfs fi usage crash when multiple device volume contains seed device
https://bugzilla.kernel.org/show_bug.cgi?id=115851
Yeah, like I said, I've only done very light testing.  I kind of lost 
interest in seeding when overlayfs went mainline, as it offers pretty 
much everything I care about that seeding does, and it's filesystem 
agnostic.




2. Device Replace: Works perfectly as long as the filesystem itself is not
corrupted, all the component devices are working, and the FS isn't using any
raid56 profiles.  Works fine if only the device being replaced is failing.
I've not done much testing WRT replacement when multiple devices are
suspect, but what I've done seems to suggest that it might be possible to
make it work, but it doesn't currently.  On raid56 it sometimes works fine,
sometimes corrupts data, and sometimes takes an insanely long time to
complete (putting data at risk from subsequent failures while the replace is
running).
3. Balance: Works perfectly as long as the filesystem is not corrupted and
nothing throws any read or write errors.  IOW, only run this on a generally
healthy filesystem.  Similar caveats to those for replace with raid56 apply
here too.
4. File Range Cloning and Out-of-band Dedupe: Similarly, work fine if the FS
is healthy.


Concur.


Missing from the matrix:

- default file system for distros recommendation
e.g. between enospc and btrfsck status, I'd say in general this is not
currently recommended by upstream (short of having a Btrfs kernel
developer on staff)

I'd add the whole UUID issue to that too.


- enospc status
e.g. there's new stuff in 4.8 that probably still needs to shake out,
and Jeff's found some metadata accounting problem resulting in enospc
where there's tons of unallocated space available.
e.g. I have empty block groups, and they are not being deallocated,
they just stick around, and this is with 4.7 and 4.8 kernels; so
whatever was at one time automatically removing totally empty bg's
isn't happening anymore.

FWIW, that's still working on my systems.


- btrfsck status
e.g. btrfs-progs 4.7.2 still warns against using --repair, and lists
it under dangerous options also;  while that's true, Btrfs can't be
considered stable or recommended by default
e.g. There's still way too many separate repair tools for Btrfs.
Depending on how you count there's at least 4, and more realistically
8 ways, scattered across multiple commands. This excludes btrfs
check's -E, -r, and -s flags. And it ignores sequence in the success
rate. The permutations are just excessive. It's definitely not easy to
know how to fix a Btrfs volume should things go wrong.
I assume you're counting balance and scrub in that; together with check that 
gives 3, so what are you considering the 4th?


In the case of just balance, scrub, and check, the differentiation there 
makes more sense IMHO than combining them: check only runs on offline 
filesystems (and as much as we want online fsck, I doubt that will 
happen any time soon), while scrub and balance operate on online 
filesystems and do two semantically different things.




Re: Is stability a joke? (wiki updated)

2016-09-13 Thread Austin S. Hemmelgarn

On 2016-09-12 16:44, Chris Murphy wrote:

On Mon, Sep 12, 2016 at 2:35 PM, Martin Steigerwald  wrote:

On Monday, 12 September 2016 at 23:21:09 CEST, Pasi Kärkkäinen wrote:

On Mon, Sep 12, 2016 at 09:57:17PM +0200, Martin Steigerwald wrote:

On Monday, 12 September 2016 at 18:27:47 CEST, David Sterba wrote:

On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:

I therefore would like to propose that some sort of feature /
stability
matrix for the latest kernel is added to the wiki preferably
somewhere
where it is easy to find. It would be nice to archive old matrix'es
as
well in case someone runs on a bit older kernel (we who use Debian
tend
to like older kernels). In my opinion it would make things bit
easier
and perhaps a bit less scary too. Remember if you get bitten badly
once
you tend to stay away from from it all just in case, if you on the
other
hand know what bites you can safely pet the fluffy end instead :)


Somebody has put that table on the wiki, so it's a good starting
point.
I'm not sure we can fit everything into one table, some combinations
do
not bring new information and we'd need n-dimensional matrix to get
the
whole picture.


https://btrfs.wiki.kernel.org/index.php/Status


Great.

I made to minor adaption. I added a link to the Status page to my warning
in before the kernel log by feature page. And I also mentioned that at
the time the page was last updated the latest kernel version was 4.7.
Yes, thats some extra work to update the kernel version, but I think its
beneficial to explicitely mention the kernel version the page talks
about. Everyone who updates the page can update the version within a
second.


Hmm.. that will still leave people wondering "but I'm running Linux 4.4, not
4.7, I wonder what the status of feature X is.."

Should we also add a column for kernel version, so we can add "feature X is
known to be OK on Linux 3.18 and later"..  ? Or add those to "notes" field,
where applicable?


That was my initial idea, and it may be better than a generic kernel version
for all features. Even if we fill in 4.7 for any of the features that are
known to work okay for the table.

For RAID 1 I am willing to say it works stable since kernel 3.14, as this was
the kernel I used when I switched /home and / to Dual SSD RAID 1 on this
ThinkPad T520.


Just to cut yourself some slack, you could skip 3.14 because it's EOL
now, and just go from 4.4.
That reminds me, we should probably make it clear that this is for the 
_upstream_ mainline kernel versions, not for versions from some arbitrary 
distro, and that people should check the distro's documentation for that 
info.




Re: Is stability a joke? (wiki updated)

2016-09-13 Thread Austin S. Hemmelgarn

On 2016-09-13 04:38, Timofey Titovets wrote:

https://btrfs.wiki.kernel.org/index.php/Status
I suggest marking RAID1/10 as 'mostly ok',
as on btrfs RAID1/10 is safe for the data, but not for the application that uses it,
i.e. it does not hide an I/O error even when it could be masked.
https://www.spinics.net/lists/linux-btrfs/msg56739.html

/* Retested with upstream 4.7.2 - not fixed */
This doesn't match what my own testing indicates, at least for raid1 
mode.  I run similar tests myself every time a new stable kernel version 
comes out (but only on the most recent stable version) once I get my own 
patches re-based onto it, and I haven't seen issues like this in any of 
the 4.7 kernels; nor do I recall any issues like this in any of the 4.6 
kernels.  In fact, I've actually dealt with systems with failing disks 
using BTRFS raid1 mode, including one at work just yesterday where the 
SATA cable had worked loose from vibrations and was causing significant 
data corruption.  It survived just fine, as have all the other systems 
I've dealt with which had hardware issues while running BTRFS in raid1 mode.
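
For anyone who wants to run the same sanity check on their own raid1 systems, a 
minimal sketch (the mountpoint is a placeholder):

  btrfs device stats /mnt      # per-device read/write/corruption error counters
  btrfs scrub start -B /mnt    # rewrite any bad copies from the good mirror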


The indicated behavior would, however, be consistent with issues sometimes 
seen when using compression, but the OP in the linked message gave no 
indication that any in-line compression was involved.




Re: Is stability a joke? (wiki updated)

2016-09-13 Thread Timofey Titovets
https://btrfs.wiki.kernel.org/index.php/Status
I suggest marking RAID1/10 as 'mostly ok',
as on btrfs RAID1/10 is safe for the data, but not for the application that uses it,
i.e. it does not hide an I/O error even when it could be masked.
https://www.spinics.net/lists/linux-btrfs/msg56739.html

/* Retested with upstream 4.7.2 - not fixed */


-- 
Have a nice day,
Timofey.


Re: Is stability a joke? (wiki updated)

2016-09-12 Thread Waxhead

Pasi Kärkkäinen wrote:

On Mon, Sep 12, 2016 at 09:57:17PM +0200, Martin Steigerwald wrote:


Great.

I made to minor adaption. I added a link to the Status page to my warning in
before the kernel log by feature page. And I also mentioned that at the time
the page was last updated the latest kernel version was 4.7. Yes, thats some
extra work to update the kernel version, but I think its beneficial to
explicitely mention the kernel version the page talks about. Everyone who
updates the page can update the version within a second.


Hmm.. that will still leave people wondering "but I'm running Linux 4.4, not 4.7, I 
wonder what the status of feature X is.."

Should we also add a column for kernel version, so we can add "feature X is known to 
be OK on Linux 3.18 and later"..  ?
Or add those to "notes" field, where applicable?


-- Pasi

I think a separate column would be the best solution. Archiving the status 
page per kernel version (as I suggested) would lead to issues too: if 
something that appears to be just fine in 4.6 is found to be horribly broken 
in, for example, 4.10, the archive would still indicate that it WAS ok at 
that time even though it perhaps was not. Then you have regressions - 
something that worked in 4.4 may not work in 4.9 - but I still think the 
best idea is to simply label the status as ok / broken since 4.x, as those 
who really want to use a broken feature would probably do the research to 
see whether it used to work. Besides, if something that used to work goes 
haywire it should be fixed quickly :)



Re: Is stability a joke? (wiki updated)

2016-09-12 Thread Chris Murphy
On Mon, Sep 12, 2016 at 2:35 PM, Martin Steigerwald  wrote:
> On Monday, 12 September 2016 at 23:21:09 CEST, Pasi Kärkkäinen wrote:
>> On Mon, Sep 12, 2016 at 09:57:17PM +0200, Martin Steigerwald wrote:
>> > On Monday, 12 September 2016 at 18:27:47 CEST, David Sterba wrote:
>> > > On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:
>> > > > > I therefore would like to propose that some sort of feature /
>> > > > > stability
>> > > > > matrix for the latest kernel is added to the wiki preferably
>> > > > > somewhere
>> > > > > where it is easy to find. It would be nice to archive old matrix'es
>> > > > > as
>> > > > > well in case someone runs on a bit older kernel (we who use Debian
>> > > > > tend
>> > > > > to like older kernels). In my opinion it would make things bit
>> > > > > easier
>> > > > > and perhaps a bit less scary too. Remember if you get bitten badly
>> > > > > once
>> > > > > you tend to stay away from from it all just in case, if you on the
>> > > > > other
>> > > > > hand know what bites you can safely pet the fluffy end instead :)
>> > > >
>> > > > Somebody has put that table on the wiki, so it's a good starting
>> > > > point.
>> > > > I'm not sure we can fit everything into one table, some combinations
>> > > > do
>> > > > not bring new information and we'd need n-dimensional matrix to get
>> > > > the
>> > > > whole picture.
>> > >
>> > > https://btrfs.wiki.kernel.org/index.php/Status
>> >
>> > Great.
>> >
>> > I made to minor adaption. I added a link to the Status page to my warning
>> > in before the kernel log by feature page. And I also mentioned that at
>> > the time the page was last updated the latest kernel version was 4.7.
>> > Yes, thats some extra work to update the kernel version, but I think its
>> > beneficial to explicitely mention the kernel version the page talks
>> > about. Everyone who updates the page can update the version within a
>> > second.
>>
>> Hmm.. that will still leave people wondering "but I'm running Linux 4.4, not
>> 4.7, I wonder what the status of feature X is.."
>>
>> Should we also add a column for kernel version, so we can add "feature X is
>> known to be OK on Linux 3.18 and later"..  ? Or add those to "notes" field,
>> where applicable?
>
> That was my initial idea, and it may be better than a generic kernel version
> for all features. Even if we fill in 4.7 for any of the features that are
> known to work okay for the table.
>
> For RAID 1 I am willing to say it works stable since kernel 3.14, as this was
> the kernel I used when I switched /home and / to Dual SSD RAID 1 on this
> ThinkPad T520.

Just to cut yourself some slack, you could skip 3.14 because it's EOL
now, and just go from 4.4.



-- 
Chris Murphy


Re: Is stability a joke? (wiki updated)

2016-09-12 Thread Martin Steigerwald
On Monday, 12 September 2016 at 23:21:09 CEST, Pasi Kärkkäinen wrote:
> On Mon, Sep 12, 2016 at 09:57:17PM +0200, Martin Steigerwald wrote:
> > On Monday, 12 September 2016 at 18:27:47 CEST, David Sterba wrote:
> > > On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:
> > > > > I therefore would like to propose that some sort of feature /
> > > > > stability
> > > > > matrix for the latest kernel is added to the wiki preferably
> > > > > somewhere
> > > > > where it is easy to find. It would be nice to archive old matrix'es
> > > > > as
> > > > > well in case someone runs on a bit older kernel (we who use Debian
> > > > > tend
> > > > > to like older kernels). In my opinion it would make things bit
> > > > > easier
> > > > > and perhaps a bit less scary too. Remember if you get bitten badly
> > > > > once
> > > > > you tend to stay away from from it all just in case, if you on the
> > > > > other
> > > > > hand know what bites you can safely pet the fluffy end instead :)
> > > > 
> > > > Somebody has put that table on the wiki, so it's a good starting
> > > > point.
> > > > I'm not sure we can fit everything into one table, some combinations
> > > > do
> > > > not bring new information and we'd need n-dimensional matrix to get
> > > > the
> > > > whole picture.
> > > 
> > > https://btrfs.wiki.kernel.org/index.php/Status
> > 
> > Great.
> > 
> > I made to minor adaption. I added a link to the Status page to my warning
> > in before the kernel log by feature page. And I also mentioned that at
> > the time the page was last updated the latest kernel version was 4.7.
> > Yes, thats some extra work to update the kernel version, but I think its
> > beneficial to explicitely mention the kernel version the page talks
> > about. Everyone who updates the page can update the version within a
> > second.
> 
> Hmm.. that will still leave people wondering "but I'm running Linux 4.4, not
> 4.7, I wonder what the status of feature X is.."
> 
> Should we also add a column for kernel version, so we can add "feature X is
> known to be OK on Linux 3.18 and later"..  ? Or add those to "notes" field,
> where applicable?

That was my initial idea, and it may be better than a generic kernel version 
for all features, even if we just fill in 4.7 in the table for any of the 
features that are known to work okay.

For RAID 1 I am willing to say it has been stable since kernel 3.14, as that was 
the kernel I used when I switched /home and / to Dual SSD RAID 1 on this 
ThinkPad T520.


-- 
Martin


Re: Is stability a joke? (wiki updated)

2016-09-12 Thread Pasi Kärkkäinen
On Mon, Sep 12, 2016 at 09:57:17PM +0200, Martin Steigerwald wrote:
> On Monday, 12 September 2016 at 18:27:47 CEST, David Sterba wrote:
> > On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:
> > > > I therefore would like to propose that some sort of feature / stability
> > > > matrix for the latest kernel is added to the wiki preferably somewhere
> > > > where it is easy to find. It would be nice to archive old matrix'es as
> > > > well in case someone runs on a bit older kernel (we who use Debian tend
> > > > to like older kernels). In my opinion it would make things bit easier
> > > > and perhaps a bit less scary too. Remember if you get bitten badly once
> > > > you tend to stay away from from it all just in case, if you on the other
> > > > hand know what bites you can safely pet the fluffy end instead :)
> > > 
> > > Somebody has put that table on the wiki, so it's a good starting point.
> > > I'm not sure we can fit everything into one table, some combinations do
> > > not bring new information and we'd need n-dimensional matrix to get the
> > > whole picture.
> > 
> > https://btrfs.wiki.kernel.org/index.php/Status
> 
> Great.
> 
> I made to minor adaption. I added a link to the Status page to my warning in 
> before the kernel log by feature page. And I also mentioned that at the time 
> the page was last updated the latest kernel version was 4.7. Yes, thats some 
> extra work to update the kernel version, but I think its beneficial to 
> explicitely mention the kernel version the page talks about. Everyone who 
> updates the page can update the version within a second.
> 

Hmm.. that will still leave people wondering "but I'm running Linux 4.4, not 
4.7, I wonder what the status of feature X is.." 

Should we also add a column for kernel version, so we can add "feature X is 
known to be OK on Linux 3.18 and later"..  ?
Or add those to "notes" field, where applicable? 


-- Pasi

> -- 
> Martin



Re: Is stability a joke? (wiki updated)

2016-09-12 Thread Chris Murphy
On Mon, Sep 12, 2016 at 10:56 AM, Austin S. Hemmelgarn
 wrote:

>
> Things listed as TBD status:
> 1. Seeding: Seems to work fine the couple of times I've tested it, however
> I've only done very light testing, and the whole feature is pretty much
> undocumented.

Mostly OK.

Odd behaviors:
- mount seed (ro), add device, remount mountpoint: this just changed
the mounted fs volume UUID
- if two sprouts for a seed exist, ambiguous which is remounted rw,
you'd have to check
- remount should probably be disallowed in this case somehow; require
explicit mount of the sprout

btrfs fi usage crash when multiple device volume contains seed device
https://bugzilla.kernel.org/show_bug.cgi?id=115851
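
For reference, a minimal sketch of the sequence in question (device names are 
placeholders, not a recommendation):

  btrfstune -S 1 /dev/sdb          # mark the unmounted filesystem as a seed
  mount /dev/sdb /mnt              # the seed mounts read-only
  btrfs device add /dev/sdc /mnt   # attach a writable device, creating the sprout
  mount -o remount,rw /mnt         # this is the step where the mounted volume UUID changes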


> 2. Device Replace: Works perfectly as long as the filesystem itself is not
> corrupted, all the component devices are working, and the FS isn't using any
> raid56 profiles.  Works fine if only the device being replaced is failing.
> I've not done much testing WRT replacement when multiple devices are
> suspect, but what I've done seems to suggest that it might be possible to
> make it work, but it doesn't currently.  On raid56 it sometimes works fine,
> sometimes corrupts data, and sometimes takes an insanely long time to
> complete (putting data at risk from subsequent failures while the replace is
> running).
> 3. Balance: Works perfectly as long as the filesystem is not corrupted and
> nothing throws any read or write errors.  IOW, only run this on a generally
> healthy filesystem.  Similar caveats to those for replace with raid56 apply
> here too.
> 4. File Range Cloning and Out-of-band Dedupe: Similarly, work fine if the FS
> is healthy.

Concur.


Missing from the matrix:

- default file system for distros recommendation
e.g. between enospc and btrfsck status, I'd say in general this is not
currently recommended by upstream (short of having a Btrfs kernel
developer on staff)

- enospc status
e.g. there's new stuff in 4.8 that probably still needs to shake out,
and Jeff's found some metadata accounting problem resulting in enospc
where there's tons of unallocated space available.
e.g. I have empty block groups, and they are not being deallocated,
they just stick around, and this is with 4.7 and 4.8 kernels; so
whatever was at one time automatically removing totally empty bg's
isn't happening anymore.

- btrfsck status
e.g. btrfs-progs 4.7.2 still warns against using --repair, and lists
it under dangerous options also;  while that's true, Btrfs can't be
considered stable or recommended by default
e.g. There are still way too many separate repair tools for Btrfs.
Depending on how you count there are at least 4, and more realistically
8 ways, scattered across multiple commands. This excludes btrfs
check's -E, -r, and -s flags. And it ignores how much the sequence you
run them in affects the success rate. The permutations are just
excessive. It's definitely not easy to know how to fix a Btrfs volume
should things go wrong.
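
For illustration, a rough and non-exhaustive sampling of what those separate
paths look like today (device names and mountpoints are placeholders):

  btrfs scrub start /mnt                 # online: verify checksums, repair from a good copy
  btrfs balance start /mnt               # online: rewrite block groups
  btrfs check /dev/sdb                   # offline: read-only consistency check
  btrfs check --repair /dev/sdb          # offline: still listed as dangerous
  btrfs rescue zero-log /dev/sdb         # offline: clear a corrupted log tree
  btrfs rescue super-recover /dev/sdb    # offline: restore a good superblock copy
  btrfs rescue chunk-recover /dev/sdb    # offline: rebuild the chunk tree
  btrfs restore /dev/sdb /recovery       # offline: scrape files off a broken fs
  mount -o usebackuproot /dev/sdb /mnt   # mount-time fallback to an older tree root

And on the enospc point above, the usual manual workaround for empty block
groups that never get deallocated is a filtered balance, something like:

  btrfs balance start -dusage=0 -musage=0 /mnt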


-- 
Chris Murphy


Re: Is stability a joke? (wiki updated)

2016-09-12 Thread Martin Steigerwald
On Monday, 12 September 2016 at 18:27:47 CEST, David Sterba wrote:
> On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:
> > > I therefore would like to propose that some sort of feature / stability
> > > matrix for the latest kernel is added to the wiki preferably somewhere
> > > where it is easy to find. It would be nice to archive old matrix'es as
> > > well in case someone runs on a bit older kernel (we who use Debian tend
> > > to like older kernels). In my opinion it would make things bit easier
> > > and perhaps a bit less scary too. Remember if you get bitten badly once
> > > you tend to stay away from from it all just in case, if you on the other
> > > hand know what bites you can safely pet the fluffy end instead :)
> > 
> > Somebody has put that table on the wiki, so it's a good starting point.
> > I'm not sure we can fit everything into one table, some combinations do
> > not bring new information and we'd need n-dimensional matrix to get the
> > whole picture.
> 
> https://btrfs.wiki.kernel.org/index.php/Status

Great.

I made two minor adaptations. I added a link to the Status page to my warning at 
the top of the kernel-log-by-feature page. And I also mentioned that at the time 
the page was last updated the latest kernel version was 4.7. Yes, that's some 
extra work to update the kernel version, but I think it's beneficial to 
explicitly mention the kernel version the page talks about. Everyone who 
updates the page can update the version within a second.

-- 
Martin


Re: Is stability a joke? (wiki updated)

2016-09-12 Thread Austin S. Hemmelgarn

On 2016-09-12 13:29, Filipe Manana wrote:

On Mon, Sep 12, 2016 at 5:56 PM, Austin S. Hemmelgarn
 wrote:

On 2016-09-12 12:27, David Sterba wrote:


On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:


I therefore would like to propose that some sort of feature / stability
matrix for the latest kernel is added to the wiki preferably somewhere
where it is easy to find. It would be nice to archive old matrix'es as
well in case someone runs on a bit older kernel (we who use Debian tend
to like older kernels). In my opinion it would make things bit easier
and perhaps a bit less scary too. Remember if you get bitten badly once
you tend to stay away from from it all just in case, if you on the other
hand know what bites you can safely pet the fluffy end instead :)



Somebody has put that table on the wiki, so it's a good starting point.
I'm not sure we can fit everything into one table, some combinations do
not bring new information and we'd need n-dimensional matrix to get the
whole picture.



https://btrfs.wiki.kernel.org/index.php/Status



Some things to potentially add based on my own experience:

Things listed as TBD status:
1. Seeding: Seems to work fine the couple of times I've tested it, however
I've only done very light testing, and the whole feature is pretty much
undocumented.
2. Device Replace: Works perfectly as long as the filesystem itself is not
corrupted, all the component devices are working, and the FS isn't using any
raid56 profiles.  Works fine if only the device being replaced is failing.
I've not done much testing WRT replacement when multiple devices are
suspect, but what I've done seems to suggest that it might be possible to
make it work, but it doesn't currently.  On raid56 it sometimes works fine,
sometimes corrupts data, and sometimes takes an insanely long time to
complete (putting data at risk from subsequent failures while the replace is
running).
3. Balance: Works perfectly as long as the filesystem is not corrupted and
nothing throws any read or write errors.  IOW, only run this on a generally
healthy filesystem.  Similar caveats to those for replace with raid56 apply
here too.
4. File Range Cloning and Out-of-band Dedupe: Similarly, work fine if the FS
is healthy.


Virtually all other features work fine if the fs is healthy...
I would add more, but I don't often have the time to test broken 
filesystems...


TBH though, that's most of the issue I see with BTRFS in general at the 
moment.  RAID5/6 works fine, as long as all the devices keep working and 
you don't try to replace them and don't lose power.  Qgroups appear to 
work fine as long as no other bug shows up (other than the issues with 
accounting and returning ENOSPC instead of EDQUOT).  We do so much 
testing on pristine filesystems, but most of the utilities and less 
widely used features have had near zero testing on filesystems that are 
in bad shape.  If you pay attention, many (possibly most?) of the 
recently reported bugs are from broken (or poorly curated) filesystems, 
not some random kernel bug.  New features are nice, but they generally 
don't improve stability, and for BTRFS to be truly production ready 
outside of constrained environments like Facebook, it needs to not choke 
on encountering a FS with some small amount of corruption.




Other stuff:
1. Compression: The specific known issue is that compressed extents don't
always get recovered properly on failed reads when dealing with lots of
failed reads.  This can be demonstrated by generating a large raid1
filesystem image with huge numbers of small (1MB) readily compressible
files, then putting that on top of a dm-flakey or dm-error target set to give
a high read-error rate, then mounting and running cat `find .` > /dev/null
from the top level of the FS multiple times in a row.



2. Send: The particular edge case appears to be caused by metadata
corruption on the sender and results in send choking on the same file every
time you try to run it.  The quick fix is to copy the contents of the file
to another file and rename that over the original.


I don't remember having seen such case at least for the last 2 or 3
years, all the problems I've seen/solved or seen fixes from others
were all related to bugs in the send algorithm and definitely not any
metadata corruption.
So I wonder what evidence you have about this.
For the compression related issue, I can still reproduce it, but it 
takes a while.


As for the send issues, I do still see these on rare occasions, but only 
on 2+ year old filesystems, and I think the last time I saw one happen 
was more than 3 months ago.




Re: Is stability a joke? (wiki updated)

2016-09-12 Thread Filipe Manana
On Mon, Sep 12, 2016 at 5:56 PM, Austin S. Hemmelgarn
 wrote:
> On 2016-09-12 12:27, David Sterba wrote:
>>
>> On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:

 I therefore would like to propose that some sort of feature / stability
 matrix for the latest kernel is added to the wiki preferably somewhere
 where it is easy to find. It would be nice to archive old matrix'es as
 well in case someone runs on a bit older kernel (we who use Debian tend
 to like older kernels). In my opinion it would make things bit easier
 and perhaps a bit less scary too. Remember if you get bitten badly once
 you tend to stay away from from it all just in case, if you on the other
 hand know what bites you can safely pet the fluffy end instead :)
>>>
>>>
>>> Somebody has put that table on the wiki, so it's a good starting point.
>>> I'm not sure we can fit everything into one table, some combinations do
>>> not bring new information and we'd need n-dimensional matrix to get the
>>> whole picture.
>>
>>
>> https://btrfs.wiki.kernel.org/index.php/Status
>
>
> Some things to potentially add based on my own experience:
>
> Things listed as TBD status:
> 1. Seeding: Seems to work fine the couple of times I've tested it, however
> I've only done very light testing, and the whole feature is pretty much
> undocumented.
> 2. Device Replace: Works perfectly as long as the filesystem itself is not
> corrupted, all the component devices are working, and the FS isn't using any
> raid56 profiles.  Works fine if only the device being replaced is failing.
> I've not done much testing WRT replacement when multiple devices are
> suspect, but what I've done seems to suggest that it might be possible to
> make it work, but it doesn't currently.  On raid56 it sometimes works fine,
> sometimes corrupts data, and sometimes takes an insanely long time to
> complete (putting data at risk from subsequent failures while the replace is
> running).
> 3. Balance: Works perfectly as long as the filesystem is not corrupted and
> nothing throws any read or write errors.  IOW, only run this on a generally
> healthy filesystem.  Similar caveats to those for replace with raid56 apply
> here too.
> 4. File Range Cloning and Out-of-band Dedupe: Similarly, work fine if the FS
> is healthy.

Virtually all other features work fine if the fs is healthy...

>
> Other stuff:
> 1. Compression: The specific known issue is that compressed extents don't
> always get recovered properly on failed reads when dealing with lots of
> failed reads.  This can be demonstrated by generating a large raid1
> filesystem image with huge numbers of small (1MB) readily compressible
> files, then putting that on top of a dm-flakey or dm-error target set to give
> a high read-error rate, then mounting and running cat `find .` > /dev/null
> from the top level of the FS multiple times in a row.

> 2. Send: The particular edge case appears to be caused by metadata
> corruption on the sender and results in send choking on the same file every
> time you try to run it.  The quick fix is to copy the contents of the file
> to another file and rename that over the original.

I don't remember having seen such a case, at least for the last 2 or 3
years; all the problems I've seen/solved or seen fixes from others
were all related to bugs in the send algorithm and definitely not to any
metadata corruption.
So I wonder what evidence you have about this.




-- 
Filipe David Manana,

"People will forget what you said,
 people will forget what you did,
 but people will never forget how you made them feel."


Re: Is stability a joke? (wiki updated)

2016-09-12 Thread Austin S. Hemmelgarn

On 2016-09-12 12:27, David Sterba wrote:

On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:

I therefore would like to propose that some sort of feature / stability
matrix for the latest kernel is added to the wiki preferably somewhere
where it is easy to find. It would be nice to archive old matrix'es as
well in case someone runs on a bit older kernel (we who use Debian tend
to like older kernels). In my opinion it would make things bit easier
and perhaps a bit less scary too. Remember if you get bitten badly once
you tend to stay away from from it all just in case, if you on the other
hand know what bites you can safely pet the fluffy end instead :)


Somebody has put that table on the wiki, so it's a good starting point.
I'm not sure we can fit everything into one table, some combinations do
not bring new information and we'd need n-dimensional matrix to get the
whole picture.


https://btrfs.wiki.kernel.org/index.php/Status


Some things to potentially add based on my own experience:

Things listed as TBD status:
1. Seeding: Seems to work fine the couple of times I've tested it, 
however I've only done very light testing, and the whole feature is 
pretty much undocumented.
2. Device Replace: Works perfectly as long as the filesystem itself is 
not corrupted, all the component devices are working, and the FS isn't 
using any raid56 profiles.  Works fine if only the device being replaced 
is failing.  I've not done much testing WRT replacement when multiple 
devices are suspect, but what I've done seems to suggest that it might 
be possible to make it work, but it doesn't currently.  On raid56 it 
sometimes works fine, sometimes corrupts data, and sometimes takes an 
insanely long time to complete (putting data at risk from subsequent 
failures while the replace is running).
3. Balance: Works perfectly as long as the filesystem is not corrupted 
and nothing throws any read or write errors.  IOW, only run this on a 
generally healthy filesystem.  Similar caveats to those for replace with 
raid56 apply here too.
4. File Range Cloning and Out-of-band Dedupe: Similarly, work fine if 
the FS is healthy.


Other stuff:
1. Compression: The specific known issue is that compressed extents 
don't always get recovered properly on failed reads when dealing with 
lots of failed reads.  This can be demonstrated by generating a large 
raid1 filesystem image with huge numbers of small (1MB) readily 
compressible files, then putting that on top of a dm-flakey or dm-error 
target set to give a high read-error rate, then mounting and running cat 
`find .` > /dev/null from the top level of the FS multiple times in a row 
(a rough sketch follows after this list).
2. Send: The particular edge case appears to be caused by metadata 
corruption on the sender and results in send choking on the same file 
every time you try to run it.  The quick fix is to copy the contents of 
the file to another file and rename that over the original.
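
A rough sketch of that compression reproduction, just to show the shape of the 
test and not a polished recipe (sizes, device names, and the dm-flakey timings 
are all illustrative; note that udev may also scan the underlying loop device, 
so treat this purely as an outline):

  truncate -s 8G a.img b.img
  losetup /dev/loop0 a.img && losetup /dev/loop1 b.img
  # put one mirror leg behind dm-flakey: 50s of normal I/O, then 5s where all I/O errors
  dmsetup create flaky1 --table "0 $(blockdev --getsz /dev/loop1) flakey /dev/loop1 0 50 5"
  mkfs.btrfs -m raid1 -d raid1 /dev/loop0 /dev/mapper/flaky1
  mount -o compress /dev/loop0 /mnt
  for i in $(seq 1 5000); do yes | head -c 1M > /mnt/file.$i; done
  sync
  # repeated full reads should transparently fall back to the good copy
  for pass in 1 2 3; do cat $(find /mnt -type f) > /dev/null; done

And the send workaround from point 2 is just a plain copy-and-rename (the path 
is a placeholder; --reflink=never forces a real copy of the data rather than a 
clone):

  cp --reflink=never /mnt/subvol/troublesome.file /mnt/subvol/troublesome.file.new
  mv /mnt/subvol/troublesome.file.new /mnt/subvol/troublesome.file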



Re: Is stability a joke? (wiki updated)

2016-09-12 Thread David Sterba
On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:
> > I therefore would like to propose that some sort of feature / stability 
> > matrix for the latest kernel is added to the wiki preferably somewhere 
> > where it is easy to find. It would be nice to archive old matrix'es as 
> > well in case someone runs on a bit older kernel (we who use Debian tend 
> > to like older kernels). In my opinion it would make things bit easier 
> > and perhaps a bit less scary too. Remember if you get bitten badly once 
> > you tend to stay away from from it all just in case, if you on the other 
> > hand know what bites you can safely pet the fluffy end instead :)
> 
> Somebody has put that table on the wiki, so it's a good starting point.
> I'm not sure we can fit everything into one table, some combinations do
> not bring new information and we'd need n-dimensional matrix to get the
> whole picture.

https://btrfs.wiki.kernel.org/index.php/Status