Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
I'm not sure my situation is quite like the one you linked, so here's
my bug report:

https://bugzilla.kernel.org/show_bug.cgi?id=102881

On Fri, Aug 14, 2015 at 2:44 PM, Chris Murphy li...@colorremedies.com wrote:
> On Fri, Aug 14, 2015 at 12:12 PM, Timothy Normand Miller
> theo...@gmail.com wrote:
>> Sorry about that empty email.  I hit a wrong key, and gmail decided to send.
>>
>> Anyhow, my replacement drive is going to arrive this evening, and I
>> need to know how to add it to my btrfs array.  Here's the situation:
>>
>> - I had a drive fail, so I removed it and mounted degraded.
>> - I hooked up a replacement drive, did an add on that one, and did a
>> delete missing.
>> - During the rebalance, the replacement drive failed, there were OOPSes, etc.
>> - Now, although all of my data is there, I can't mount degraded,
>> because btrfs is complaining that too many devices are missing (3 are
>> there, but it sees 2 missing).
>
> It might be related to this (long) bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=92641
>
> While Btrfs RAID 1 can tolerate only a single device failure, what you
> have is an in-progress rebuild of a missing device. If it becomes
> missing, the volume should be no worse off than it was before. But
> Btrfs doesn't see it this way; instead it sees this as two separate
> missing devices, decides too many devices are missing, and refuses to
> proceed. And there's no mechanism to remove missing devices unless you
> can mount rw. So it's stuck.
>
>> So I could use some help with cleaning up this mess.  All the data is
>> there, so I need to know how to either force it to mount degraded, or
>> add and remove devices offline.  Where do I begin?
>
> You can try asking on IRC. I have no ideas for this scenario; I've
> tried and failed. My case was a throwaway; what should still be
> possible is using btrfs restore.
>
>> Also, doesn't it seem a bit arbitrary that there are "too many
>> missing" when all of the data is there?  If I understand correctly,
>> all four drives in my RAID1 should have copies of the metadata,
>
> No, that's not correct. RAID 1 means 2 copies of metadata. In a
> 4-device RAID 1 that's still only 2 copies. It is not n-way RAID 1.
>
> But that doesn't matter here. The problem is that Btrfs has a narrow
> idea of the volume: it assumes, without context, that once the number
> of devices is below the minimum, the volume can't be mounted. In
> reality, an exception exists when the failure is of an in-progress
> rebuild of a missing drive. That drive failing should mean the volume
> is no worse off than before, but Btrfs doesn't know that.
>
> Pretty sure about that anyway.
>
>> and of the remaining three good drives, there should be one or two
>> copies of every data block.  So it's all there, but btrfs has decided,
>> based on the NUMBER of missing devices, that it won't mount.
>> Shouldn't it refuse to mount only if it knows data is actually
>> missing?  For that matter, why should it even refuse in that case?  If
>> some data is missing, it could just throw errors when you try to
>> access it.  Right?
>
> I think no data is missing, no metadata is missing, and Btrfs is
> confused and stuck in this case.
>
> --
> Chris Murphy



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Chris Murphy
On Fri, Aug 14, 2015 at 1:03 PM, Timothy Normand Miller
theo...@gmail.com wrote:
> I'm not sure my situation is quite like the one you linked, so here's
> my bug report:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=102881

I can easily reproduce this with just a 2-device RAID 1. I updated the
bug. It's best to keep these as separate bugs, but I think the
underlying problems are related.
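
A rough reproduction sketch on loop devices (sizes, paths, and loop
device numbers are illustrative):

  truncate -s 3G /tmp/a.img /tmp/b.img
  losetup -f --show /tmp/a.img          # say it returns /dev/loop0
  losetup -f --show /tmp/b.img          # say it returns /dev/loop1
  mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1
  mount /dev/loop0 /mnt
  cp -a /usr/share/doc /mnt/            # populate with some data
  umount /mnt
  losetup -d /dev/loop1                 # simulate one device failing
  mount -o degraded /dev/loop0 /mnt     # interrupting a rebuild started
                                        # from here leads to the refusal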

The workaround is to mount with -o ro,degraded and then move the data
to a new Btrfs volume, using btrfs send/receive for read-only
snapshots or a conventional copy for anything not already in one.
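
A sketch of that recovery path; device names, snapshot name, and mount
points are illustrative:

  mount -o ro,degraded /dev/sdb /mnt/old
  mkfs.btrfs /dev/sdX                   # fresh volume on a good drive
  mount /dev/sdX /mnt/new
  btrfs send /mnt/old/snap | btrfs receive /mnt/new   # read-only snapshots
  rsync -a /mnt/old/data/ /mnt/new/data/              # everything else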


-- 
Chris Murphy


Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Chris Murphy
On Fri, Aug 14, 2015 at 12:12 PM, Timothy Normand Miller
theo...@gmail.com wrote:
> Sorry about that empty email.  I hit a wrong key, and gmail decided to send.
>
> Anyhow, my replacement drive is going to arrive this evening, and I
> need to know how to add it to my btrfs array.  Here's the situation:
>
> - I had a drive fail, so I removed it and mounted degraded.
> - I hooked up a replacement drive, did an add on that one, and did a
> delete missing.
> - During the rebalance, the replacement drive failed, there were OOPSes, etc.
> - Now, although all of my data is there, I can't mount degraded,
> because btrfs is complaining that too many devices are missing (3 are
> there, but it sees 2 missing).

It might be related to this (long) bug:
https://bugzilla.kernel.org/show_bug.cgi?id=92641

While Btrfs RAID 1 can tolerate only a single device failure, what you
have is an in-progress rebuild of a missing device. If it becomes
missing, the volume should be no worse off than it was before. But
Btrfs doesn't see it this way; instead it sees this as two separate
missing devices, decides too many devices are missing, and refuses to
proceed. And there's no mechanism to remove missing devices unless you
can mount rw. So it's stuck.


> So I could use some help with cleaning up this mess.  All the data is
> there, so I need to know how to either force it to mount degraded, or
> add and remove devices offline.  Where do I begin?

You can try asking on IRC. I have no ideas for this scenario; I've
tried and failed. My case was a throwaway; what should still be
possible is using btrfs restore.
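
btrfs restore works on the unmounted device; a minimal sketch, assuming
the filesystem is on /dev/sdb and /mnt/recovery is an empty destination
(both illustrative):

  btrfs restore -v /dev/sdb /mnt/recovery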


> Also, doesn't it seem a bit arbitrary that there are "too many
> missing" when all of the data is there?  If I understand correctly,
> all four drives in my RAID1 should have copies of the metadata,

No, that's not correct. RAID 1 means 2 copies of metadata. In a
4-device RAID 1 that's still only 2 copies. It is not n-way RAID 1.
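
A quick way to confirm which profiles are in use (mount point
illustrative, output abbreviated):

  $ btrfs filesystem df /mnt
  Data, RAID1: total=..., used=...
  Metadata, RAID1: total=..., used=...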

But that doesn't matter here. The problem is that Btrfs has a narrow
idea of the volume: it assumes, without context, that once the number
of devices is below the minimum, the volume can't be mounted. In
reality, an exception exists when the failure is of an in-progress
rebuild of a missing drive. That drive failing should mean the volume
is no worse off than before, but Btrfs doesn't know that.

Pretty sure about that anyway.


> and of the remaining three good drives, there should be one or two
> copies of every data block.  So it's all there, but btrfs has decided,
> based on the NUMBER of missing devices, that it won't mount.
> Shouldn't it refuse to mount only if it knows data is actually
> missing?  For that matter, why should it even refuse in that case?  If
> some data is missing, it could just throw errors when you try to
> access it.  Right?

I think no data is missing, no metadata is missing, and Btrfs is
confused and stuck in this case.

-- 
Chris Murphy


Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
My

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
Sorry about that empty email.  I hit a wrong key, and gmail decided to send.

Anyhow, my replacement drive is going to arrive this evening, and I
need to know how to add it to my btrfs array.  Here's the situation:

- I had a drive fail, so I removed it and mounted degraded.
- I hooked up a replacement drive, did an add on that one, and did a
delete missing (see the command sketch after this list).
- During the rebalance, the replacement drive failed, there were OOPSes, etc.
- Now, although all of my data is there, I can't mount degraded,
because btrfs is complaining that too many devices are missing (3 are
there, but it sees 2 missing).
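
For reference, the sequence was essentially the standard drive
replacement procedure; device names and mount point below are
illustrative, not my actual ones:

  mount -o degraded /dev/sdb /mnt       # mount with the failed drive absent
  btrfs device add /dev/sde /mnt        # add the replacement
  btrfs device delete missing /mnt      # rebuild onto it (this is where
                                        # the replacement drive died)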

So I could use some help with cleaning up this mess.  All the data is
there, so I need to know how to either force it to mount degraded, or
add and remove devices offline.  Where do I begin?

Also, doesn't it seem a bit arbitrary that there are "too many
missing" when all of the data is there?  If I understand correctly,
all four drives in my RAID1 should have copies of the metadata,
and of the remaining three good drives, there should be one or two
copies of every data block.  So it's all there, but btrfs has decided,
based on the NUMBER of missing devices, that it won't mount.
Shouldn't it refuse to mount only if it knows data is actually
missing?  For that matter, why should it even refuse in that case?  If
some data is missing, it could just throw errors when you try to
access it.  Right?

Thanks!

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Anand Jain



On 08/15/2015 02:12 AM, Timothy Normand Miller wrote:

> Sorry about that empty email.  I hit a wrong key, and gmail decided to send.
>
> Anyhow, my replacement drive is going to arrive this evening, and I
> need to know how to add it to my btrfs array.  Here's the situation:
>
> - I had a drive fail, so I removed it and mounted degraded.


That's a bit dangerous to do without the patch below; the patch has
more details on why.



> - I hooked up a replacement drive, did an add on that one, and did a
> delete missing.
> - During the rebalance, the replacement drive failed, there were OOPSes, etc.
> - Now, although all of my data is there, I can't mount degraded,
> because btrfs is complaining that too many devices are missing (3 are
> there, but it sees 2 missing).



This is addressed in the patch:

  [PATCH 23/23] Btrfs: allow -o rw,degraded for single group profile


Thanks, Anand



> So I could use some help with cleaning up this mess.  All the data is
> there, so I need to know how to either force it to mount degraded, or
> add and remove devices offline.  Where do I begin?
>
> Also, doesn't it seem a bit arbitrary that there are "too many
> missing" when all of the data is there?  If I understand correctly,
> all four drives in my RAID1 should have copies of the metadata,
> and of the remaining three good drives, there should be one or two
> copies of every data block.  So it's all there, but btrfs has decided,
> based on the NUMBER of missing devices, that it won't mount.
> Shouldn't it refuse to mount only if it knows data is actually
> missing?  For that matter, why should it even refuse in that case?  If
> some data is missing, it could just throw errors when you try to
> access it.  Right?
>
> Thanks!




Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Anand Jain




> Just to be clear, I removed the drive (the original failed drive) when
> the power was off, then powered up, and then mounted degraded.  That's
> not dangerous, as far as I know.


The patch has the details; please refer to it.


> Where is this patch, and what kernel versions can this be applied to?



https://patchwork.kernel.org/patch/7014141/

It's against 4.3, but it should apply cleanly on earlier kernels.
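
A minimal sketch of applying it to a kernel source tree; the mbox URL
form here is an assumption about patchwork, so grab the raw patch from
the page above if it differs:

  cd linux-4.1.4
  wget -O degraded.mbox https://patchwork.kernel.org/patch/7014141/mbox/
  git am degraded.mbox            # or: patch -p1 < degraded.mbox
  # then rebuild and install the kernel as usual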

thanks
Anand


Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
On Fri, Aug 14, 2015 at 7:49 PM, Anand Jain anand.j...@oracle.com wrote:



>> - I had a drive fail, so I removed it and mounted degraded.
>
> That's a bit dangerous to do without the patch below; the patch has
> more details on why.

Just to be clear, I removed the drive (the original failed drive) when
the power was off, then powered up, and then mounted degraded.  That's
not dangerous, as far as I know.

>> - I hooked up a replacement drive, did an add on that one, and did a
>> delete missing.
>> - During the rebalance, the replacement drive failed, there were
>> OOPSes, etc.
>> - Now, although all of my data is there, I can't mount degraded,
>> because btrfs is complaining that too many devices are missing (3 are
>> there, but it sees 2 missing).
>
> This is addressed in the patch:
>
>   [PATCH 23/23] Btrfs: allow -o rw,degraded for single group profile

Where is this patch, and what kernel versions can this be applied to?



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
I applied that patch to my 4.1.4 kernel, it mounted degraded, and now
it's balancing onto the new drive.
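
For reference, the steps after booting the patched kernel were roughly
the following (device names and mount point illustrative):

  mount -o degraded /dev/sdb /mnt       # now succeeds with the patch
  btrfs device add /dev/sdf /mnt        # add the new replacement drive
  btrfs device delete missing /mnt      # relocates chunks onto it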

Thanks for all the help!

On Fri, Aug 14, 2015 at 8:28 PM, Anand Jain anand.j...@oracle.com wrote:


>> Just to be clear, I removed the drive (the original failed drive) when
>> the power was off, then powered up, and then mounted degraded.  That's
>> not dangerous, as far as I know.
>
> The patch has the details; please refer to it.
>
>> Where is this patch, and what kernel versions can this be applied to?
>
> https://patchwork.kernel.org/patch/7014141/
>
> It's against 4.3, but it should apply cleanly on earlier kernels.
>
> thanks
> Anand



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Chris Murphy
I thought for a second that maybe the problem was due to the phantom
single chunk(s) created at mkfs time. I redid the test and did a
balance to get rid of the single chunks, right after populating the
volume with some data. But the problem still happens.
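
The cleanup balance was along these lines, with the soft filter so
chunks already in the target profile are skipped (mount point
illustrative):

  btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt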

---
Chris Murphy