Dmitry Katsubo posted on Wed, 14 Oct 2015 22:27:29 +0200 as excerpted:

> On 14/10/2015 16:40, Anand Jain wrote:
>>> # mount -o degraded /var
>>> Oct 11 18:20:15 kernel: BTRFS: too many missing devices,
>>> writeable mount is not allowed
>>>
>>> # mount -o degraded,ro /var
>>> # btrfs device add /dev/sdd1 /var
>>> ERROR: error adding the device '/dev/sdd1' - Read-only file system
>>>
>>> Now I am stuck: I cannot add device to the volume to satisfy raid
>>> pre-requisite.
>> 
>>  This is a known issue. Would you be able to test below set of patches
>>  and update us..
>> 
>>    [PATCH 0/5] Btrfs: Per-chunk degradable check
> 
> Many thanks for the reply. Unfortunately I have no environment to
> recompile the kernel, and setting it up will perhaps take a day. Can the
> latest kernel be pushed to Debian sid?

In the way of general information...

While btrfs is no longer entirely unstable (the experimental tag was 
removed in 3.12) and kernel patch backports are generally done where 
stability is a factor, it's not yet fully stable and mature, either.  
As such, wishing to remain on kernels more than one LTS series behind 
the latest LTS series (4.1, with 3.18 the one-back LTS series) is 
incompatible with wishing to run btrfs, which is still under heavy 
development, at least as soon as problems are reported.  A request to 
upgrade to current and/or to try patches not yet integrated into 
mainline is thus to be expected on report of problems.

As for userspace, the division between btrfs kernel and userspace code 
works like this:  Under normal operating conditions, userspace simply 
makes requests of the kernel, which does the actual work, so updated 
kernel code is what matters most.  However, once a problem occurs and 
repair/recovery is attempted, it's generally userspace code operating 
directly on the unmounted filesystem, so the latest userspace fixes 
become most important once something has gone wrong and you're trying 
to fix it.

So upgrading to a 3.18 series kernel, at minimum, is very strongly 
recommended for those running btrfs, with an upgrade to 4.1 planned 
and tested, for deployment as soon as it passes on-site pre-deployment 
testing.  And an upgrade to current or close-to-current btrfs-progs 
4.2.2 userspace is recommended as soon as you need its features, which 
include the latest repair and recovery patches -- so as soon as you 
have a filesystem that's not working as expected, if not before.  
(Note that btrfs-progs 4.2 releases before 4.2.2 had a buggy 
mkfs.btrfs, so skip them if you'll be running mkfs.btrfs, and any 
btrfs created with those versions should be backed up if it isn't 
already, and recreated with 4.2.2, as such filesystems are unstable 
and subject to failure.)
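
(If you're not sure what you're currently running, checking is quick:

# uname -r
# btrfs --version

the first giving the running kernel version, the second the installed 
btrfs-progs version.)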

> 1. Is there any way to recover btrfs at the moment? Or the easiest I can
> do is to mount ro, copy all data to another drive, re-create btrfs
> volume and copy back?

Sysadmin's rule of backups:  If data isn't backed up, by definition 
you value it less than the time/hassle/resources the backup would 
cost, so loss of a filesystem is never a big problem.  If the data was 
of any value, it was backed up and can be restored from that backup; 
if it wasn't backed up, then by definition you already saved the 
commodity more important to you, the time/hassle/resources you would 
have spent doing the backup.  Either way, loss of a filesystem is loss 
of throw-away data: either it was backed up (and a would-be backup 
that hasn't been tested restorable isn't yet a completed backup, so 
doesn't count), or the data really was throw-away data, not worth the 
hassle of backing up in the first place, even at the risk of losing 
it.

No exceptions.  Any after-the-fact protests to the contrary simply put 
the lie to claims that the data was valuable, since actions spoke 
louder than words, and actions defined the data as throw-away.

Therefore, no worries.  Worst-case, you either recover the data from 
backup, or if it wasn't backed up, by definition, it wasn't valuable data 
in the first place.  Either way, no valuable data was, or can be, lost.

(It's worth noting that this rule nicely takes care of loss of both 
the working copy and the N'th backup as well, since again, either the 
data was worth the cost of N+1 levels of backup, or that N+1 backup 
wasn't made, which automatically defines the data as not worth the 
cost of the N+1 backup, at least relative to the risk that it might 
actually be needed.  That remains the case whether N=0 or N=10^1000, 
since even at N=10^1000, backup to level N+1 is either worth the cost 
vs. risk -- the data really is THAT valuable -- or it's not.)

Thus, the easiest way may very well be to blow away the filesystem, 
recreate it, and restore from backup, assuming the data was valuable 
enough to have made that backup in the first place.  If it wasn't, 
then we already know its value is relatively limited, and the question 
becomes whether the chance of recovering that already limited-value 
data is worth the hassle of attempting the recovery.

FWIW, here, I do have backups, but I don't always keep them as current 
as I might.  In doing so, I know my actions define the value of the 
data in the delta between the backups and current state as very 
limited, but that's the choice I'm making.

Fortunately for me, btrfs restore (the actual btrfs restore command), 
working on the unmounted filesystem, can often recover the data even 
when the filesystem won't mount, so the risk of actually losing that 
data is much lower than the risk of merely being unable to mount the 
filesystem.  That of course lets me get away with delaying backup 
updates even longer: with the risk of total loss of the delta between 
backup and current much lower than it would otherwise be, the relative 
cost of backup updates is higher in comparison, so I can and do space 
them further apart.

FWIW, I've had to use btrfs restore twice since I started using btrfs.  
Newer btrfs restore (from newer btrfs-progs) works better than older 
versions, too, optionally restoring ownership/permissions and 
symlinks.  Previously both were lost: symlinks simply weren't 
restored, and ownership/permissions ended up as those of the restore 
process itself (root, obviously, with default umask).  See what I mean 
about current userspace being recommended? =:^)
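
(For reference, a typical invocation looks something like this, device 
and target path of course being examples:

# btrfs restore -m -S /dev/sdd1 /mnt/recovery

with -m/--metadata restoring owner/mode/times and -S/--symlink 
restoring symlinks, both of which need a reasonably current 
btrfs-progs, as above.)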

Since in your case you can mount, even if it must be read-only, the same 
logic applies, except that grabbing the data off the filesystem is easier 
since you can simply copy it off and don't need btrfs restore to do it.
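
(IOW, something as simple as this should do it, the target path being 
an example of course:

# mount -o degraded,ro /var
# cp -ax /var/. /mnt/elsewhere/

with cp's -a preserving ownership/permissions/symlinks, and -x keeping 
it from descending into other filesystems mounted below /var.)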

Of course the existence of those patches gives you another alternative 
as well, letting you weigh the hassle of setting up the build 
environment and updating, against that of copying the data off the 
read-only mounted filesystem, against that of simply declaring the 
filesystem a loss and blowing it away, to either restore from backup 
or, if it wasn't backed up, lose what is already defined as data of 
very limited value anyway.

> 2. How to avoid such a trap in the future?

Keep current. =:^)  At least to the latest LTS kernel, and the last 
release of the last-but-one userspace series (which would be 4.1.2, 
IIRC, as I don't remember a 4.1.3 being released).

Or, at the bigger picture, ask yourself whether running btrfs is 
really appropriate for you until it stabilizes further, since it's not 
fully stable and mature yet, and running it is thereby incompatible 
with the conservative stability objectives of those who wish to run 
older, tried-and-tested, really stable versions.  Perhaps ext4 (or 
even ext3), or reiserfs (my previous filesystem of choice, with which 
I've had extremely good experience), or xfs would be a more 
appropriate choice, if you really need that stability and maturity.

> 3. How can I know what version of kernel the patch "Per-chunk degradable
> check" is targeting?

It may be worth (re)reading the btrfs wiki page on sources.  Generally 
speaking, there's an integration branch, where patches deemed mostly 
ready (after on-list review) are included before they're accepted into 
the mainline Linus kernel.  Otherwise, patches are generally based on 
mainline, currently 4.3-rcX, unless otherwise noted.  If you follow 
the list you'll see the pull requests as they're posted, and for the 
Linus kernel, pulls are usually accepted within a day or so, if you're 
following Linus kernel git, as I am.
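
(So to test a set like the per-chunk degradable one, the usual 
procedure would be something like the following, the base tag being an 
example since the exact base depends on what the set was posted 
against:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git checkout -b degraded-test v4.3-rc5
$ git am /path/to/saved/patches/*.patch

then build and install the kernel the way your distro normally does 
it.)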

For userspace, the git master branch is always the current release.  
There's a devel branch that's effectively the same as current 
integration, except that it's no longer updated on the kernel.org 
mirrors.  The github and .cz mirrors (again, as listed on the wiki) do 
carry the current devel branch, however, and that's what gentoo's 
"live" ebuild now points at, and what I'm running here.  (I filed a 
gentoo bug because the live ebuild pointed at the stale devel branch 
of the kernel.org kdave mirror and thus was no longer updating; that 
got the live ebuild pointed at the current devel branch on the .cz 
mirrors.)

So you can either run the current release and cherry-pick patches you 
want/need as they're posted to the list, or if you want something live 
but a bit more managed than that, run the integration branches and/or, 
for userspace, the devel branch.
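
(Getting the userspace devel branch is similarly simple, using the 
github mirror as the example here:

$ git clone https://github.com/kdave/btrfs-progs.git
$ cd btrfs-progs
$ git checkout devel

then build per the INSTALL file.)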

> 4. What is the best way to express/vote for new features or suggestions
> (wikipage "Project_ideas" / bugzilla)?

Well, the wiki page is user-editable, if you register.  (Tho last I knew, 
there was some problem with at least some wiki user registrations, 
requiring admin intervention in some cases as posted to the list.)  
Personally, I'm more a list person, however, and have never registered on 
the wiki.

In general, however, there are only a few btrfs devs, and between bug 
tracking and fixing and development of the features they're already 
working on or have already roadmapped as their next project, with each 
feature typically taking one kernel cycle and often several to develop 
and stabilize, they don't often pick up "new" features to work on.

There are independent devs who sometimes pick a particular feature 
they're interested in and submit patches for it, but those features 
may or may not be immediately integrated, depending on the maturity of 
the patch set, how it meshes with the existing roadmap, whether the 
dev intends to continue supporting the feature or leave it to the 
existing devs after development, and in general how well that dev 
works with the existing longer-term btrfs devs.  A dev interested in 
such a project should either be prepared to carry and maintain the 
patches as an independent set for some time if they're not immediately 
integrated, or should plan on a one-time "proof of concept" set that 
will then go stale if not integrated, tho even that may be better than 
starting from scratch, should somebody later want to pick up the set 
and update it for integration.

So definitely, I'd say add it to the wiki page, so it doesn't get lost 
and can be picked up when it fits into the roadmap, but be prepared 
for it to sit there, unimplemented, for some years, as there are 
simply far more ideas than resources to implement them, and the most 
in-demand features will obviously be listed there already.

For more minor suggestions, tweaks to current functionality or output, 
etc., run current so your suggestions are on top of a current base, 
and either post the suggestions here or, where they fit, add them as 
comments to proposed patches as they're posted.  Of course, if you're 
a dev and can code them up as patches yourself, so much the better! 
=:^)  (I'm not, FWIW. =:^( )

Many of your suggestions above fit this category, minor improvements 
to current output.  However, in some cases the wording in current is 
already better than in what you were running, so your suggestions read 
as stale, and in others they don't quite read as practical (to me at 
least, tho as I said, I'm not a dev).

In particular, tracking the last-seen device doesn't appear practical 
to me, since in many instances device assignment is dynamic, and what 
was /dev/sdc3 a couple of boots ago may well be /dev/sde3 this time 
around, in which case listing /dev/sdc3 could well just confuse the 
user even more.

That isn't to say the suggestions are without merit, tho; they do 
point out where some change of wording, if not to your suggested 
wording, might be useful.

In particular, btrfs filesystem show should work with both mounted and 
unmounted filesystems, and would perhaps have given you some hints 
about what devices should have been in the filesystem.  The assumption 
seems to be that a user will implicitly know to run it, but perhaps an 
explicit suggestion to run btrfs filesystem show would be worthwhile.  
It can of course be argued that such an explicit suggestion isn't 
appropriate for dmesg either, but to my thinking it's at least 
practical and could be debated on the merits, where I don't consider 
tracking the last-seen device practical at all.

Anyway, btrfs filesystem show should work for unmounted as well as 
mounted filesystems, and is already designed to do what you were 
expecting btrfs device scan to do, in terms of output.  Meanwhile, 
btrfs device scan is designed purely to update the btrfs kernel 
module's idea of what btrfs filesystems are available, and as such it 
doesn't output anything.  Tho if there was some change the kernel 
module didn't know about, a btrfs filesystem show, followed by a btrfs 
device scan and another btrfs filesystem show, would produce different 
results for the two show outputs.  (Meanwhile, show's --mounted and 
--all-devices options can change what's listed as well, and if you're 
interested in just one filesystem, you can feed that to show to get 
output for just it, instead of for all the btrfs the system knows 
about.  See the manpage...)
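
(Concretely, using your /var filesystem from above as the example:

# btrfs filesystem show /var
# btrfs device scan
# btrfs filesystem show /var

where any difference between the two show outputs reflects what scan 
just updated the kernel module about.)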

Similarly, your btrfs scrub "was aborted after X seconds" issue is 
known, and I believe fixed in something that isn't effectively ancient 
history in terms of btrfs development.  So remarking on it simply 
highlights the fact that you're running ancient versions and 
complaining about long-since-fixed issues, instead of running current 
versions, where at least your complaints might still have some 
validity.  And if you were running current and still had the problem, 
at least I'd know that while I remember it being discussed, the fix 
couldn't have made it into current yet, since the bad output would 
then still be reported there.  I /think/ it has been fixed since it 
was discussed, but I never actually tracked that individual fix to see 
whether it made current, since I never saw the problem here anyway.  
(I don't recall seeing it in older versions either, possibly because I 
run multiple small btrfs on partitioned ssds, so the other scrubs 
completed fast enough that I didn't have a chance to see the "aborted" 
on one before the others finished.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
