On 2016-09-12 13:29, Filipe Manana wrote:
On Mon, Sep 12, 2016 at 5:56 PM, Austin S. Hemmelgarn
<ahferro...@gmail.com> wrote:
On 2016-09-12 12:27, David Sterba wrote:

On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:

I therefore would like to propose that some sort of feature / stability
matrix for the latest kernel is added to the wiki, preferably somewhere
where it is easy to find. It would be nice to archive old matrices as
well in case someone runs a somewhat older kernel (we who use Debian tend
to like older kernels). In my opinion it would make things a bit easier
and perhaps a bit less scary too. Remember, if you get bitten badly once
you tend to stay away from it all just in case; if, on the other hand,
you know what bites, you can safely pet the fluffy end instead :)


Somebody has put that table on the wiki, so it's a good starting point.
I'm not sure we can fit everything into one table; some combinations do
not bring new information, and we'd need an n-dimensional matrix to get
the whole picture.


https://btrfs.wiki.kernel.org/index.php/Status


Some things to potentially add based on my own experience:

Things listed as TBD status:
1. Seeding: Seems to work fine the couple of times I've tested it; however,
I've only done very light testing, and the whole feature is pretty much
undocumented.
2. Device Replace: Works perfectly as long as the filesystem itself is not
corrupted, all the component devices are working, and the FS isn't using any
raid56 profiles.  It also works fine if only the device being replaced is
failing.  I've not done much testing of replacement when multiple devices
are suspect; what I have done suggests it might be made to work, but it
doesn't work currently.  On raid56 it sometimes works fine, sometimes
corrupts data, and sometimes takes an insanely long time to complete
(putting data at risk from subsequent failures while the replace is
running).
3. Balance: Works perfectly as long as the filesystem is not corrupted and
nothing throws any read or write errors.  IOW, only run this on a generally
healthy filesystem.  Similar caveats to those for replace with raid56 apply
here too.
4. File Range Cloning and Out-of-band Dedupe: Similarly, these work fine if
the FS is healthy.
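For reference, the operations in the list above correspond to commands
along these lines (device names and the mount point are placeholders;
this is a sketch, not a tested recipe, and it needs root):

```shell
# Rough sketch of the operations above; /dev/sdX, /dev/sdY, /dev/sdZ and
# /mnt are placeholders.

# 1. Seeding: mark a device as a read-only seed, then sprout a writable
#    filesystem from it.
btrfstune -S 1 /dev/sdX          # flag sdX as a seed device
mount /dev/sdX /mnt              # seed mounts read-only
btrfs device add /dev/sdY /mnt   # add a writable device (the "sprout")
mount -o remount,rw /mnt

# 2. Device replace: copy sdX's data onto sdZ while the FS stays online.
btrfs replace start /dev/sdX /dev/sdZ /mnt
btrfs replace status /mnt

# 3. Balance: rewrite all chunks (per the caveats above, only advisable
#    on a generally healthy filesystem).
btrfs balance start /mnt
```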

Virtually all other features work fine if the fs is healthy...
I would add more, but I don't often have the time to test broken filesystems...

TBH, though, that covers most of the issues I see with BTRFS in general at
the moment. RAID5/6 works fine as long as all the devices keep working, you
don't try to replace them, and you don't lose power. Qgroups appear to work
fine as long as no other bug shows up (aside from the issues with
accounting and returning ENOSPC instead of EDQUOT).

We do a lot of testing on pristine filesystems, but most of the utilities
and less widely used features have had near-zero testing on filesystems
that are in bad shape. If you pay attention, many (possibly most?) of the
recently reported bugs are from broken (or poorly curated) filesystems, not
some random kernel bug. New features are nice, but they generally don't
improve stability, and for BTRFS to be truly production ready outside of
constrained environments like Facebook, it needs to not choke on
encountering an FS with some small amount of corruption.


Other stuff:
1. Compression: The specific known issue is that compressed extents don't
always get recovered properly on failed reads when dealing with lots of
failed reads.  This can be demonstrated by generating a large raid1
filesystem image with a huge number of small (1MB) readily compressible
files, putting that on top of a dm-flakey or dm-error target set to give
a high read-error rate, then mounting it and running cat `find .` > /dev/null
from the top level of the FS multiple times in a row.
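That reproduction recipe looks roughly like this (a sketch only: the
image sizes, file count, and dm-flakey up/down intervals are my
assumptions, not exact values from any report, and it needs root):

```shell
# Sketch of the reproduction described above; sizes, counts, and the
# flakey intervals are illustrative assumptions. Requires root.

# Two loop devices backing a raid1 filesystem.
truncate -s 4G disk0.img disk1.img
DEV0=$(losetup -f --show disk0.img)
DEV1=$(losetup -f --show disk1.img)

# Wrap one device in dm-flakey: up for 1s, then fail all I/O for 3s,
# repeating, to get a high read-error rate.
dmsetup create flaky \
    --table "0 $(blockdev --getsz "$DEV1") flakey $DEV1 0 1 3"

# raid1 data and metadata, with compression forced on.
mkfs.btrfs -f -d raid1 -m raid1 "$DEV0" /dev/mapper/flaky
mount -o compress-force=zlib "$DEV0" /mnt

# Huge numbers of small, readily compressible files.
for i in $(seq 1 1000); do
    dd if=/dev/zero of=/mnt/file.$i bs=1M count=1 status=none
done
sync

# Read everything back repeatedly; with enough failed reads, recovery
# of compressed extents can go wrong.
cd /mnt
for pass in 1 2 3; do
    cat `find . -type f` > /dev/null
done
```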

2. Send: The particular edge case appears to be caused by metadata
corruption on the sender and results in send choking on the same file every
time you try to run it.  The quick fix is to copy the contents of the file
to another file and rename that over the original.
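The quick fix described there amounts to rewriting the file in place,
something like the following (the directory and file name are purely
illustrative stand-ins for the affected file):

```shell
# Sketch of the workaround: copy the contents of the problem file to a
# new file, then rename that over the original so the file's metadata
# is written from scratch. Path and name are illustrative.
set -e
dir=$(mktemp -d)
echo "original contents" > "$dir/problem-file"   # stand-in for the bad file

cp "$dir/problem-file" "$dir/problem-file.new"
mv "$dir/problem-file.new" "$dir/problem-file"

cat "$dir/problem-file"
```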

I don't remember having seen such a case in at least the last 2 or 3
years; all the problems I've seen/solved, or seen fixes for from others,
were related to bugs in the send algorithm and definitely not to any
metadata corruption.
So I wonder what evidence you have for this.
For the compression related issues, I can still reproduce it, but it takes a while.

As for the send issues, I do still see these on rare occasions, but only on 2+ year old filesystems, and I think the last time I saw it happen was more than 3 months ago.
