Re: Possible Raid Bug

2016-03-28 Thread Patrik Lundquist
On 28 March 2016 at 05:54, Anand Jain  wrote:
>
> On 03/26/2016 07:51 PM, Patrik Lundquist wrote:
>>
>> # btrfs device stats /mnt
>>
>> [/dev/sde].write_io_errs   11
>> [/dev/sde].read_io_errs    0
>> [/dev/sde].flush_io_errs   2
>> [/dev/sde].corruption_errs 0
>> [/dev/sde].generation_errs 0
>>
>> The old counters are back. That's good, but wtf?
>
>
>  No. I doubt if they are old counters. The steps above didn't
>  show old error counts, but since you have created the file
>  test3 there will be some write_io_errs, which we don't
>  see after the balance. So I doubt if they are old counters;
>  instead they are new flush errors.

No, /mnt/test3 doesn't generate errors, only 'single' block groups.
The old counters seem to be cached somewhere and replace doesn't reset
them everywhere.
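
One way to see which counters actually survive on disk (as opposed to
the in-memory copy) is to remount and grep the kernel log for the
per-device error summary, as done further down; a minimal sketch,
assuming the same /dev/sdb mount:

# umount /mnt; mount /dev/sdb /mnt
# dmesg | grep 'errs: wr'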

One more time, with more device stats. I've also upgraded the kernel to
Linux debian 4.5.0-trunk-amd64 #1 SMP Debian 4.5-1~exp1 (2016-03-20)
x86_64 GNU/Linux.

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# mount /dev/sdb /mnt; dmesg | tail
# touch /mnt/test1; sync; btrfs device usage /mnt

Only raid10 profiles.

# echo 1 >/sys/block/sde/device/delete; dmesg | tail

[  426.831037] sd 5:0:0:0: [sde] Synchronizing SCSI cache
[  426.831517] sd 5:0:0:0: [sde] Stopping disk
[  426.845199] ata6.00: disabled

We lost a disk.

# touch /mnt/test2; sync; dmesg | tail

[  467.126471] BTRFS error (device sde): bdev /dev/sde errs: wr 1, rd
0, flush 0, corrupt 0, gen 0
[  467.127386] BTRFS error (device sde): bdev /dev/sde errs: wr 2, rd
0, flush 0, corrupt 0, gen 0
[  467.128125] BTRFS error (device sde): bdev /dev/sde errs: wr 3, rd
0, flush 0, corrupt 0, gen 0
[  467.128640] BTRFS error (device sde): bdev /dev/sde errs: wr 4, rd
0, flush 0, corrupt 0, gen 0
[  467.129215] BTRFS error (device sde): bdev /dev/sde errs: wr 4, rd
0, flush 1, corrupt 0, gen 0
[  467.129331] BTRFS warning (device sde): lost page write due to IO
error on /dev/sde
[  467.129334] BTRFS error (device sde): bdev /dev/sde errs: wr 5, rd
0, flush 1, corrupt 0, gen 0
[  467.129420] BTRFS warning (device sde): lost page write due to IO
error on /dev/sde
[  467.129422] BTRFS error (device sde): bdev /dev/sde errs: wr 6, rd
0, flush 1, corrupt 0, gen 0

We've got write errors on the lost disk.

# btrfs device usage /mnt

No 'single' profiles because we haven't remounted yet.

# btrfs device stat /mnt

[/dev/sde].write_io_errs   6
[/dev/sde].read_io_errs    0
[/dev/sde].flush_io_errs   1
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

# reboot
# wipefs -a /dev/sde; reboot

# mount -o degraded /dev/sdb /mnt; dmesg | tail

[   52.876897] BTRFS info (device sdb): allowing degraded mounts
[   52.876901] BTRFS info (device sdb): disk space caching is enabled
[   52.876902] BTRFS: has skinny extents
[   52.878008] BTRFS warning (device sdb): devid 4 uuid
231d7892-3f31-40b5-8dff-baf8fec1a8aa is missing
[   52.879057] BTRFS info (device sdb): bdev (null) errs: wr 6, rd 0,
flush 1, corrupt 0, gen 0

# btrfs device usage /mnt

Still only raid10 profiles.

# btrfs device stat /mnt

[(null)].write_io_errs   6
[(null)].read_io_errs    0
[(null)].flush_io_errs   1
[(null)].corruption_errs 0
[(null)].generation_errs 0

/dev/sde is now called "(null)". Print device id instead? E.g.
"[devid:4].write_io_errs   6"

# touch /mnt/test3; sync; btrfs device usage /mnt
/dev/sdb, ID: 1
   Device size: 2.00GiB
   Data,single:   624.00MiB
   Data,RAID10:   102.38MiB
   Metadata,RAID10:   102.38MiB
   System,RAID10:   4.00MiB
   Unallocated: 1.19GiB

/dev/sdc, ID: 2
   Device size: 2.00GiB
   Data,RAID10:   102.38MiB
   Metadata,RAID10:   102.38MiB
   System,single:  32.00MiB
   System,RAID10:   4.00MiB
   Unallocated: 1.76GiB

/dev/sdd, ID: 3
   Device size: 2.00GiB
   Data,RAID10:   102.38MiB
   Metadata,single:   256.00MiB
   Metadata,RAID10:   102.38MiB
   System,RAID10:   4.00MiB
   Unallocated: 1.55GiB

missing, ID: 4
   Device size:   0.00B
   Data,RAID10:   102.38MiB
   Metadata,RAID10:   102.38MiB
   System,RAID10:   4.00MiB
   Unallocated: 1.80GiB

Now we've got 'single' profiles on all devices except the missing one.
Replace the missing device before unmounting, or you get stuck with a
read-only filesystem.

# btrfs device stat /mnt

Same as before. Only old errors on the missing device.

# btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail

[ 1268.598652] BTRFS info (device sdb): dev_replace from <missing disk> (devid 4) to /dev/sde started
[ 1268.615601] BTRFS info (device sdb): dev_replace from <missing disk> (devid 4) to /dev/sde finished

# btrfs device stats /mnt

[/dev/sde].write_io_errs   0
[/dev/sde].read_io_errs    0
[/dev/sde].flush_io_errs   0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

Device "(null)" is back to /dev/sde and the error counts have been reset.

# btrfs b
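
For reference, the balance step that converts the leftover 'single'
chunks back to raid10, as used in the earlier run quoted in the replies
below, is:

# btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft -sconvert=raid10,soft -vf /mnt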

Re: Possible Raid Bug

2016-03-27 Thread Anand Jain


Hi Patrik,

Thanks for posting a test case. More below.

On 03/26/2016 07:51 PM, Patrik Lundquist wrote:

So with the lessons learned:

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# mount /dev/sdb /mnt; dmesg | tail
# touch /mnt/test1; sync; btrfs device usage /mnt

Only raid10 profiles.

# echo 1 >/sys/block/sde/device/delete

We lost a disk.

# touch /mnt/test2; sync; dmesg | tail

We've got write errors.

# btrfs device usage /mnt

No 'single' profiles because we haven't remounted yet.

# reboot
# wipefs -a /dev/sde; reboot

# mount -o degraded /dev/sdb /mnt; dmesg | tail
# btrfs device usage /mnt

Still only raid10 profiles.

# touch /mnt/test3; sync; btrfs device usage /mnt

Now we've got 'single' profiles. Replace now or get hosed.


 Since you are replacing the failed device without an unmount/mount or
 reboot in between, this should work.

 You would need those parts of the hot spare / auto replace patches only
 if the test case had an unmount/mount or reboot at this stage.



# btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs   0
[/dev/sde].read_io_errs    0
[/dev/sde].flush_io_errs   0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

We didn't inherit the /dev/sde error count. Is that a bug?


  No. It's the other way around; it would have been a bug if the
  replace target inherited the error counters.


# btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft
-sconvert=raid10,soft -vf /mnt; dmesg | tail

# btrfs device usage /mnt

Back to only 'raid10' profiles.

# umount /mnt; mount /dev/sdb /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs   11
[/dev/sde].read_io_errs    0
[/dev/sde].flush_io_errs   2
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

The old counters are back. That's good, but wtf?


 No. I doubt if they are old counters. The steps above didn't
 show old error counts, but since you have created the file
 test3 there will be some write_io_errs, which we don't
 see after the balance. So I doubt if they are old counters;
 instead they are new flush errors.


# btrfs device stats -z /dev/sde

Give /dev/sde a clean bill of health. Won't warn when mounting again.





Thanks, Anand



Re: Possible Raid Bug

2016-03-27 Thread Stephen Williams
Yeah I think the Gotchas page would be a good place to give people a
heads up.

-- 
  Stephen Williams
  steph...@veryfast.biz

On Sat, Mar 26, 2016, at 09:58 PM, Chris Murphy wrote:
> On Sat, Mar 26, 2016 at 8:00 AM, Stephen Williams 
> wrote:
> 
> > I know this is quite a rare occurrence for home use but for Data center
> > use this is something that will happen A LOT.
> > This really should be placed in the wiki while we wait for a fix. I can
> > see a lot of sys admins crying over this.
> 
> Maybe on the gotchas page? While it's not a data loss bug, it might be
> viewed as an uptime bug because the dataset is stuck being ro and
> hence unmodifiable, until a restore to a rw volume is complete.


Re: Possible Raid Bug

2016-03-26 Thread Chris Murphy
On Sat, Mar 26, 2016 at 8:00 AM, Stephen Williams  wrote:

> I know this is quite a rare occurrence for home use but for Data center
> use this is something that will happen A LOT.
> This really should be placed in the wiki while we wait for a fix. I can
> see a lot of sys admins crying over this.

Maybe on the gotchas page? While it's not a data loss bug, it might be
viewed as an uptime bug because the dataset is stuck being ro and
hence unmodifiable, until a restore to a rw volume is complete.

Since we can ro mount a volume, some way to safely make it a seed
device could be useful. All that's needed to make it rw is adding even
a small USB stick, for example, and then at least ro snapshots can be
taken and data migrated off the volume. A larger device used for rw
would allow this raid to be brought back online. And then once the new
array is up and has most of the data restored, a short downtime to send
over the latest incremental changes.
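
For comparison, the existing seed/sprout workflow on a healthy,
non-degraded filesystem looks roughly like this (a sketch with
hypothetical device names; whether something similar can be made safe
for a degraded, read-only RAID volume is exactly the open question):

# btrfstune -S 1 /dev/sdX          # mark the unmounted filesystem as a seed
# mount /dev/sdX /mnt              # a seed device mounts read-only
# btrfs device add /dev/sdY /mnt   # sprout onto a writable device
# mount -o remount,rw /mnt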

Yeah, the alternative to this is a cluster, and you just consider this
one brick a loss and move on. But most regular users don't do
clusters, even with big (for them) storage.


-- 
Chris Murphy


Re: Possible Raid Bug

2016-03-26 Thread Chris Murphy
On Sat, Mar 26, 2016 at 5:51 AM, Patrik Lundquist
 wrote:

> # btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail
>
> # btrfs device stats /mnt
>
> [/dev/sde].write_io_errs   0
> [/dev/sde].read_io_errs    0
> [/dev/sde].flush_io_errs   0
> [/dev/sde].corruption_errs 0
> [/dev/sde].generation_errs 0
>
> We didn't inherit the /dev/sde error count. Is that a bug?

I'm not sure where this information is stored. Presumably in the fs
metadata? So when mounted degraded the counters are zeroed; is that
what's going on?
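
As far as I know the counters are kept in the filesystem metadata and
reloaded at mount time (that's what the "bdev ... errs:" lines in the
mount logs earlier in this thread are showing); the only explicit reset
is the -z flag Patrik uses at the end. A sketch against the same mount:

# btrfs device stats -z /dev/sde   # print and zero the counters for one device
# btrfs device stats -z /mnt       # same, for every device in the filesystem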



-- 
Chris Murphy


Re: Possible Raid Bug

2016-03-26 Thread Stephen Williams
Can confirm that you only get one chance to fix the problem before the
array is dead.

I know this is quite a rare occurrence for home use but for Data center
use this is something that will happen A LOT. 
This really should be placed in the wiki while we wait for a fix. I can
see a lot of sys admins crying over this. 

-- 
  Stephen Williams
  steph...@veryfast.biz

On Sat, Mar 26, 2016, at 11:51 AM, Patrik Lundquist wrote:
> So with the lessons learned:
> 
> # mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
> 
> # mount /dev/sdb /mnt; dmesg | tail
> # touch /mnt/test1; sync; btrfs device usage /mnt
> 
> Only raid10 profiles.
> 
> # echo 1 >/sys/block/sde/device/delete
> 
> We lost a disk.
> 
> # touch /mnt/test2; sync; dmesg | tail
> 
> We've got write errors.
> 
> # btrfs device usage /mnt
> 
> No 'single' profiles because we haven't remounted yet.
> 
> # reboot
> # wipefs -a /dev/sde; reboot
> 
> # mount -o degraded /dev/sdb /mnt; dmesg | tail
> # btrfs device usage /mnt
> 
> Still only raid10 profiles.
> 
> # touch /mnt/test3; sync; btrfs device usage /mnt
> 
> Now we've got 'single' profiles. Replace now or get hosed.
> 
> # btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail
> 
> # btrfs device stats /mnt
> 
> [/dev/sde].write_io_errs   0
> [/dev/sde].read_io_errs    0
> [/dev/sde].flush_io_errs   0
> [/dev/sde].corruption_errs 0
> [/dev/sde].generation_errs 0
> 
> We didn't inherit the /dev/sde error count. Is that a bug?
> 
> # btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft
> -sconvert=raid10,soft -vf /mnt; dmesg | tail
> 
> # btrfs device usage /mnt
> 
> Back to only 'raid10' profiles.
> 
> # umount /mnt; mount /dev/sdb /mnt; dmesg | tail
> 
> # btrfs device stats /mnt
> 
> [/dev/sde].write_io_errs   11
> [/dev/sde].read_io_errs    0
> [/dev/sde].flush_io_errs   2
> [/dev/sde].corruption_errs 0
> [/dev/sde].generation_errs 0
> 
> The old counters are back. That's good, but wtf?
> 
> # btrfs device stats -z /dev/sde
> 
> Give /dev/sde a clean bill of health. Won't warn when mounting again.


Re: Possible Raid Bug

2016-03-26 Thread Patrik Lundquist
So with the lessons learned:

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# mount /dev/sdb /mnt; dmesg | tail
# touch /mnt/test1; sync; btrfs device usage /mnt

Only raid10 profiles.

# echo 1 >/sys/block/sde/device/delete

We lost a disk.

# touch /mnt/test2; sync; dmesg | tail

We've got write errors.

# btrfs device usage /mnt

No 'single' profiles because we haven't remounted yet.

# reboot
# wipefs -a /dev/sde; reboot

# mount -o degraded /dev/sdb /mnt; dmesg | tail
# btrfs device usage /mnt

Still only raid10 profiles.

# touch /mnt/test3; sync; btrfs device usage /mnt

Now we've got 'single' profiles. Replace now or get hosed.

# btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs   0
[/dev/sde].read_io_errs    0
[/dev/sde].flush_io_errs   0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

We didn't inherit the /dev/sde error count. Is that a bug?

# btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft
-sconvert=raid10,soft -vf /mnt; dmesg | tail

# btrfs device usage /mnt

Back to only 'raid10' profiles.

# umount /mnt; mount /dev/sdb /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs   11
[/dev/sde].read_io_errs    0
[/dev/sde].flush_io_errs   2
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

The old counters are back. That's good, but wtf?

# btrfs device stats -z /dev/sde

Give /dev/sde a clean bill of health. Won't warn when mounting again.


Re: Possible Raid Bug

2016-03-25 Thread Duncan
Chris Murphy posted on Fri, 25 Mar 2016 15:34:11 -0600 as excerpted:

> Basically you get one chance to mount rw,degraded and you have to fix
> the problem at that time. And you have to balance away any phantom
> single chunks that have appeared. For what it's worth it's not the
> reboot that degraded it further, it's the unmount and then attempt to
> mount rw,degraded a 2nd time that's not allowed due to this bug.

As CMurphy says here but without mentioning the patch, as Alexander F 
says in sibling to CMurphy's reply, and as I said in my longer 
explanation further upthread, this is a known bug, with a patch in the 
pipeline that really should have made it into 4.5 but didn't as it was 
part of a larger patch set that apparently wasn't considered ready, and 
unfortunately it wasn't cherrypicked.

So right now, yes, known bug.  You get one chance at a degraded-writable 
mount to rebuild the array.  If you crash after writing but before the 
rebuild is complete, too bad, so sad, now you can only mount degraded-
readonly and your only possibility of saving the data (other than 
rebuilding with the appropriate patch) is to do just that, mount degraded-
readonly, and copy off the data to elsewhere.

But there's a patch that has been demonstrated to fix the bug, not only 
in tests, but in live-deployments where people found themselves with a 
degraded-readonly mount until they built with the patch.  Hopefully that 
patch will hit the 4.6 development kernel with a CC to stable, and be 
backported as necessary there, but I'm not sure it will be in 4.6 at this 
point, tho it should hit mainline /eventually/.  Meanwhile, the patch can 
still be applied manually if necessary, and I suppose some distros may 
already be applying it to their shipped versions as it's certainly a fix 
worth having.
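
Applying such a patch by hand is just the usual kernel build routine; a
generic sketch (the patch filename here is hypothetical):

# cd linux-4.5
# patch -p1 < /path/to/btrfs-per-chunk-degraded-check.patch
# make olddefconfig
# make -j"$(nproc)" deb-pkg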

I'll simply refer you to previous discussion on the list for the patch, 
as that's where I'd have to look for it if I needed it myself before it 
gets mainlined.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Possible Raid Bug

2016-03-25 Thread Anand Jain



On 03/26/2016 04:09 AM, Alexander Fougner wrote:

2016-03-25 20:57 GMT+01:00 Patrik Lundquist :

On 25 March 2016 at 18:20, Stephen Williams  wrote:


Your information below was very helpful and I was able to recreate the
Raid array. However my initial question still stands - What if the
drives dies completely? I work in a Data center and we see this quite a
lot where a drive is beyond dead - The OS will literally not detect it.


That's currently a weakness of Btrfs. I don't know how people deal
with it in production. I think Anand Jain is working on improving it.


 We need this issue to be fixed for real production usage.

 The hot spare patch set contains the fix for this. Currently I am
 fixing an issue (#5) which Yauhen reported that's related to the
 auto replace. A refreshed v2 will be out soon.

Thanks, Anand


At this point would the Raid10 array be beyond repair? As you need the
drive present in order to mount the array in degraded mode.


Right... let's try it again but a little bit differently.

# mount /dev/sdb /mnt

Let's drop the disk.

# echo 1 >/sys/block/sde/device/delete

[ 3669.024256] sd 5:0:0:0: [sde] Synchronizing SCSI cache
[ 3669.024934] sd 5:0:0:0: [sde] Stopping disk
[ 3669.037028] ata6.00: disabled

# touch /mnt/test3
# sync

[ 3845.960839] BTRFS error (device sdb): bdev /dev/sde errs: wr 1, rd
0, flush 0, corrupt 0, gen 0
[ 3845.961525] BTRFS error (device sdb): bdev /dev/sde errs: wr 2, rd
0, flush 0, corrupt 0, gen 0
[ 3845.962738] BTRFS error (device sdb): bdev /dev/sde errs: wr 3, rd
0, flush 0, corrupt 0, gen 0
[ 3845.963038] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd
0, flush 0, corrupt 0, gen 0
[ 3845.963422] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd
0, flush 1, corrupt 0, gen 0
[ 3845.963686] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 3845.963691] BTRFS error (device sdb): bdev /dev/sde errs: wr 5, rd
0, flush 1, corrupt 0, gen 0
[ 3845.963932] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 3845.963941] BTRFS error (device sdb): bdev /dev/sde errs: wr 6, rd
0, flush 1, corrupt 0, gen 0

# umount /mnt

[ 4095.276831] BTRFS error (device sdb): bdev /dev/sde errs: wr 7, rd
0, flush 1, corrupt 0, gen 0
[ 4095.278368] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd
0, flush 1, corrupt 0, gen 0
[ 4095.279152] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd
0, flush 2, corrupt 0, gen 0
[ 4095.279373] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 4095.279377] BTRFS error (device sdb): bdev /dev/sde errs: wr 9, rd
0, flush 2, corrupt 0, gen 0
[ 4095.279609] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 4095.279612] BTRFS error (device sdb): bdev /dev/sde errs: wr 10, rd
0, flush 2, corrupt 0, gen 0

# mount -o degraded /dev/sdb /mnt

[ 4608.113751] BTRFS info (device sdb): allowing degraded mounts
[ 4608.113756] BTRFS info (device sdb): disk space caching is enabled
[ 4608.113757] BTRFS: has skinny extents
[ 4608.116557] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd
0, flush 1, corrupt 0, gen 0

# touch /mnt/test4
# sync

Writing to the filesystem works while the device is missing.
No new errors in dmesg after re-mounting degraded. Reboot to get back /dev/sde.

[4.329852] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 4 transid 26 /dev/sde
[4.330157] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 3 transid 31 /dev/sdd
[4.330511] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 2 transid 31 /dev/sdc
[4.330865] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 1 transid 31 /dev/sdb

/dev/sde transid is lagging behind, of course.

# wipefs -a /dev/sde
# btrfs device scan

# mount -o degraded /dev/sdb /mnt

[  507.248621] BTRFS info (device sdb): allowing degraded mounts
[  507.248626] BTRFS info (device sdb): disk space caching is enabled
[  507.248628] BTRFS: has skinny extents
[  507.252815] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd
0, flush 1, corrupt 0, gen 0
[  507.252919] BTRFS: missing devices(1) exceeds the limit(0),


The single/dup profiles have zero tolerance for missing devices. Only
an ro mount is allowed in that case.


writeable mount is not allowed
[  507.278277] BTRFS: open_ctree failed

Well, that was unexpected! Reboot again.

# mount -o degraded /dev/sdb /mnt

[   94.368514] BTRFS info (device sdd): allowing degraded mounts
[   94.368519] BTRFS info (device sdd): disk space caching is enabled
[   94.368521] BTRFS: has skinny extents
[   94.370909] BTRFS warning (device sdd): devid 4 uuid
8549a275-f663-4741-b410-79b49a1d465f is missing
[   94.372170] BTRFS info (device sdd): bdev (null) errs: wr 6, rd 0,
flush 1, corrupt 0, gen 0
[   94.372284] BTRFS: missing devices(1) exceeds the limit(0),
writeable mount is not allowed
[   94.395021] BTRFS: open_ctree failed

No go.

# mount -o degraded,ro /dev/sdb /mnt
# btrfs device sta

Re: Possible Raid Bug

2016-03-25 Thread Chris Murphy
On Fri, Mar 25, 2016 at 1:57 PM, Patrik Lundquist
 wrote:

>
> Only errors on the device formerly known as /dev/sde, so why won't it
> mount degraded,rw? Now I'm stuck like Stephen.
>
> # btrfs device usage /mnt
> /dev/sdb, ID: 1
>Device size: 2.00GiB
>Data,single:   624.00MiB   <<--
>Data,RAID10:   102.38MiB
>Metadata,RAID10:   102.38MiB
>System,RAID10:   4.00MiB
>Unallocated: 1.19GiB
>
> /dev/sdc, ID: 2
>Device size: 2.00GiB
>Data,RAID10:   102.38MiB
>Metadata,RAID10:   102.38MiB
>System,single:  32.00MiB   <<--
>System,RAID10:   4.00MiB
>Unallocated: 1.76GiB
>
> /dev/sdd, ID: 3
>Device size: 2.00GiB
>Data,RAID10:   102.38MiB
>Metadata,single:   256.00MiB   <<--
>Metadata,RAID10:   102.38MiB
>System,RAID10:   4.00MiB
>Unallocated: 1.55GiB
>
> missing, ID: 4
>Device size:   0.00B
>Data,RAID10:   102.38MiB
>Metadata,RAID10:   102.38MiB
>System,RAID10:   4.00MiB
>Unallocated: 1.80GiB
>
> The data written while mounted degraded is in profile 'single' and
> will have to be converted to 'raid10' once the filesystem is whole
> again.
>
> So what do I do now? Why did it degrade further after a reboot?

You're hosed. The file system is read only and can't be fixed. It's an
old bug. It's not a data loss bug, but it's a major time loss bug
because now the volume has to be rebuilt, which is totally unworkable
for production use.

While the appearance of the single chunks is one bug that shouldn't
happen, the worse bug is the truly bogus one that claims there aren't
enough drives for rw degraded mount. Those single chunks aren't on the
missing drive. They're on the three remaining ones. So the rw fail is
just a bad bug. It's a PITA but at least it's not a data loss bug.

Basically you get one chance to mount rw,degraded and you have to fix
the problem at that time. And you have to balance away any phantom
single chunks that have appeared. For what it's worth it's not the
reboot that degraded it further, it's the unmount and then attempt to
mount rw,degraded a 2nd time that's not allowed due to this bug.


-- 
Chris Murphy


Re: Possible Raid Bug

2016-03-25 Thread Alexander Fougner
2016-03-25 20:57 GMT+01:00 Patrik Lundquist :
> On 25 March 2016 at 18:20, Stephen Williams  wrote:
>>
>> Your information below was very helpful and I was able to recreate the
>> Raid array. However my initial question still stands - What if the
>> drives dies completely? I work in a Data center and we see this quite a
>> lot where a drive is beyond dead - The OS will literally not detect it.
>
> That's currently a weakness of Btrfs. I don't know how people deal
> with it in production. I think Anand Jain is working on improving it.
>
>> At this point would the Raid10 array be beyond repair? As you need the
>> drive present in order to mount the array in degraded mode.
>
> Right... let's try it again but a little bit differently.
>
> # mount /dev/sdb /mnt
>
> Let's drop the disk.
>
> # echo 1 >/sys/block/sde/device/delete
>
> [ 3669.024256] sd 5:0:0:0: [sde] Synchronizing SCSI cache
> [ 3669.024934] sd 5:0:0:0: [sde] Stopping disk
> [ 3669.037028] ata6.00: disabled
>
> # touch /mnt/test3
> # sync
>
> [ 3845.960839] BTRFS error (device sdb): bdev /dev/sde errs: wr 1, rd
> 0, flush 0, corrupt 0, gen 0
> [ 3845.961525] BTRFS error (device sdb): bdev /dev/sde errs: wr 2, rd
> 0, flush 0, corrupt 0, gen 0
> [ 3845.962738] BTRFS error (device sdb): bdev /dev/sde errs: wr 3, rd
> 0, flush 0, corrupt 0, gen 0
> [ 3845.963038] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd
> 0, flush 0, corrupt 0, gen 0
> [ 3845.963422] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd
> 0, flush 1, corrupt 0, gen 0
> [ 3845.963686] BTRFS warning (device sdb): lost page write due to IO
> error on /dev/sde
> [ 3845.963691] BTRFS error (device sdb): bdev /dev/sde errs: wr 5, rd
> 0, flush 1, corrupt 0, gen 0
> [ 3845.963932] BTRFS warning (device sdb): lost page write due to IO
> error on /dev/sde
> [ 3845.963941] BTRFS error (device sdb): bdev /dev/sde errs: wr 6, rd
> 0, flush 1, corrupt 0, gen 0
>
> # umount /mnt
>
> [ 4095.276831] BTRFS error (device sdb): bdev /dev/sde errs: wr 7, rd
> 0, flush 1, corrupt 0, gen 0
> [ 4095.278368] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd
> 0, flush 1, corrupt 0, gen 0
> [ 4095.279152] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd
> 0, flush 2, corrupt 0, gen 0
> [ 4095.279373] BTRFS warning (device sdb): lost page write due to IO
> error on /dev/sde
> [ 4095.279377] BTRFS error (device sdb): bdev /dev/sde errs: wr 9, rd
> 0, flush 2, corrupt 0, gen 0
> [ 4095.279609] BTRFS warning (device sdb): lost page write due to IO
> error on /dev/sde
> [ 4095.279612] BTRFS error (device sdb): bdev /dev/sde errs: wr 10, rd
> 0, flush 2, corrupt 0, gen 0
>
> # mount -o degraded /dev/sdb /mnt
>
> [ 4608.113751] BTRFS info (device sdb): allowing degraded mounts
> [ 4608.113756] BTRFS info (device sdb): disk space caching is enabled
> [ 4608.113757] BTRFS: has skinny extents
> [ 4608.116557] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd
> 0, flush 1, corrupt 0, gen 0
>
> # touch /mnt/test4
> # sync
>
> Writing to the filesystem works while the device is missing.
> No new errors in dmesg after re-mounting degraded. Reboot to get back 
> /dev/sde.
>
> [4.329852] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
> devid 4 transid 26 /dev/sde
> [4.330157] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
> devid 3 transid 31 /dev/sdd
> [4.330511] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
> devid 2 transid 31 /dev/sdc
> [4.330865] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
> devid 1 transid 31 /dev/sdb
>
> /dev/sde transid is lagging behind, of course.
>
> # wipefs -a /dev/sde
> # btrfs device scan
>
> # mount -o degraded /dev/sdb /mnt
>
> [  507.248621] BTRFS info (device sdb): allowing degraded mounts
> [  507.248626] BTRFS info (device sdb): disk space caching is enabled
> [  507.248628] BTRFS: has skinny extents
> [  507.252815] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd
> 0, flush 1, corrupt 0, gen 0
> [  507.252919] BTRFS: missing devices(1) exceeds the limit(0),

The single/dup profiles have zero tolerance for missing devices. Only
an ro mount is allowed in that case.

> writeable mount is not allowed
> [  507.278277] BTRFS: open_ctree failed
>
> Well, that was unexpected! Reboot again.
>
> # mount -o degraded /dev/sdb /mnt
>
> [   94.368514] BTRFS info (device sdd): allowing degraded mounts
> [   94.368519] BTRFS info (device sdd): disk space caching is enabled
> [   94.368521] BTRFS: has skinny extents
> [   94.370909] BTRFS warning (device sdd): devid 4 uuid
> 8549a275-f663-4741-b410-79b49a1d465f is missing
> [   94.372170] BTRFS info (device sdd): bdev (null) errs: wr 6, rd 0,
> flush 1, corrupt 0, gen 0
> [   94.372284] BTRFS: missing devices(1) exceeds the limit(0),
> writeable mount is not allowed
> [   94.395021] BTRFS: open_ctree failed
>
> No go.
>
> # mount -o degraded,ro /dev/sdb /mnt
> # btrfs device stats /mnt
> [/dev/sdb].write_io_errs   0
> [/dev/sdb].read_io_errs    0
> [/dev/sdb].flush_io

Re: Possible Raid Bug

2016-03-25 Thread Patrik Lundquist
On 25 March 2016 at 18:20, Stephen Williams  wrote:
>
> Your information below was very helpful and I was able to recreate the
> Raid array. However my initial question still stands - What if the
> drives dies completely? I work in a Data center and we see this quite a
> lot where a drive is beyond dead - The OS will literally not detect it.

That's currently a weakness of Btrfs. I don't know how people deal
with it in production. I think Anand Jain is working on improving it.

> At this point would the Raid10 array be beyond repair? As you need the
> drive present in order to mount the array in degraded mode.

Right... let's try it again but a little bit differently.

# mount /dev/sdb /mnt

Let's drop the disk.

# echo 1 >/sys/block/sde/device/delete

[ 3669.024256] sd 5:0:0:0: [sde] Synchronizing SCSI cache
[ 3669.024934] sd 5:0:0:0: [sde] Stopping disk
[ 3669.037028] ata6.00: disabled

# touch /mnt/test3
# sync

[ 3845.960839] BTRFS error (device sdb): bdev /dev/sde errs: wr 1, rd
0, flush 0, corrupt 0, gen 0
[ 3845.961525] BTRFS error (device sdb): bdev /dev/sde errs: wr 2, rd
0, flush 0, corrupt 0, gen 0
[ 3845.962738] BTRFS error (device sdb): bdev /dev/sde errs: wr 3, rd
0, flush 0, corrupt 0, gen 0
[ 3845.963038] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd
0, flush 0, corrupt 0, gen 0
[ 3845.963422] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd
0, flush 1, corrupt 0, gen 0
[ 3845.963686] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 3845.963691] BTRFS error (device sdb): bdev /dev/sde errs: wr 5, rd
0, flush 1, corrupt 0, gen 0
[ 3845.963932] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 3845.963941] BTRFS error (device sdb): bdev /dev/sde errs: wr 6, rd
0, flush 1, corrupt 0, gen 0

# umount /mnt

[ 4095.276831] BTRFS error (device sdb): bdev /dev/sde errs: wr 7, rd
0, flush 1, corrupt 0, gen 0
[ 4095.278368] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd
0, flush 1, corrupt 0, gen 0
[ 4095.279152] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd
0, flush 2, corrupt 0, gen 0
[ 4095.279373] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 4095.279377] BTRFS error (device sdb): bdev /dev/sde errs: wr 9, rd
0, flush 2, corrupt 0, gen 0
[ 4095.279609] BTRFS warning (device sdb): lost page write due to IO
error on /dev/sde
[ 4095.279612] BTRFS error (device sdb): bdev /dev/sde errs: wr 10, rd
0, flush 2, corrupt 0, gen 0

# mount -o degraded /dev/sdb /mnt

[ 4608.113751] BTRFS info (device sdb): allowing degraded mounts
[ 4608.113756] BTRFS info (device sdb): disk space caching is enabled
[ 4608.113757] BTRFS: has skinny extents
[ 4608.116557] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd
0, flush 1, corrupt 0, gen 0

# touch /mnt/test4
# sync

Writing to the filesystem works while the device is missing.
No new errors in dmesg after re-mounting degraded. Reboot to get back /dev/sde.

[4.329852] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 4 transid 26 /dev/sde
[4.330157] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 3 transid 31 /dev/sdd
[4.330511] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 2 transid 31 /dev/sdc
[4.330865] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d
devid 1 transid 31 /dev/sdb

/dev/sde transid is lagging behind, of course.
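
The lagging generation can also be read straight off the device; a
sketch with the btrfs-progs of that era (btrfs-show-super was later
folded into btrfs inspect-internal dump-super):

# btrfs-show-super /dev/sde | grep generation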

# wipefs -a /dev/sde
# btrfs device scan

# mount -o degraded /dev/sdb /mnt

[  507.248621] BTRFS info (device sdb): allowing degraded mounts
[  507.248626] BTRFS info (device sdb): disk space caching is enabled
[  507.248628] BTRFS: has skinny extents
[  507.252815] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd
0, flush 1, corrupt 0, gen 0
[  507.252919] BTRFS: missing devices(1) exceeds the limit(0),
writeable mount is not allowed
[  507.278277] BTRFS: open_ctree failed

Well, that was unexpected! Reboot again.

# mount -o degraded /dev/sdb /mnt

[   94.368514] BTRFS info (device sdd): allowing degraded mounts
[   94.368519] BTRFS info (device sdd): disk space caching is enabled
[   94.368521] BTRFS: has skinny extents
[   94.370909] BTRFS warning (device sdd): devid 4 uuid
8549a275-f663-4741-b410-79b49a1d465f is missing
[   94.372170] BTRFS info (device sdd): bdev (null) errs: wr 6, rd 0,
flush 1, corrupt 0, gen 0
[   94.372284] BTRFS: missing devices(1) exceeds the limit(0),
writeable mount is not allowed
[   94.395021] BTRFS: open_ctree failed

No go.

# mount -o degraded,ro /dev/sdb /mnt
# btrfs device stats /mnt
[/dev/sdb].write_io_errs   0
[/dev/sdb].read_io_errs    0
[/dev/sdb].flush_io_errs   0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdc].write_io_errs   0
[/dev/sdc].read_io_errs    0
[/dev/sdc].flush_io_errs   0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
[/dev/sdd].write_io_errs   0
[/dev/sdd].read_io_errs    0
[/dev/sdd].flush_io_errs   0
[/dev/sdd].corruption_errs 0
[/dev/sdd].generation_errs 0
[(null)].wri

Re: Possible Raid Bug

2016-03-25 Thread Stephen Williams
Hi Patrik,

[root@Xen ~]# uname -r
4.4.5-1-ARCH

[root@Xen ~]# pacman -Q btrfs-progs
btrfs-progs 4.4.1-1

Your information below was very helpful and I was able to recreate the
Raid array. However my initial question still stands - What if the
drives dies completely? I work in a Data center and we see this quite a
lot where a drive is beyond dead - The OS will literally not detect it.
At this point would the Raid10 array be beyond repair? As you need the
drive present in order to mount the array in degraded mode.

-- 
  Stephen Williams
  steph...@veryfast.biz

On Fri, Mar 25, 2016, at 02:57 PM, Patrik Lundquist wrote:
> On Debian Stretch with Linux 4.4.6, btrfs-progs 4.4 in VirtualBox
> 5.0.16 with 4*2GB VDIs:
> 
> # mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
> 
> # mount /dev/sdb /mnt
> # touch /mnt/test
> # umount /mnt
> 
> Everything fine so far.
> 
> # wipefs -a /dev/sde
> 
> *reboot*
> 
> # mount /dev/sdb /mnt
> mount: wrong fs type, bad option, bad superblock on /dev/sdb,
>missing codepage or helper program, or other error
> 
>In some cases useful info is found in syslog - try
>dmesg | tail or so.
> 
> # dmesg | tail
> [   85.979655] BTRFS info (device sdb): disk space caching is enabled
> [   85.979660] BTRFS: has skinny extents
> [   85.982377] BTRFS: failed to read the system array on sdb
> [   85.996793] BTRFS: open_ctree failed
> 
> Not very informative! An information regression?
> 
> # mount -o degraded /dev/sdb /mnt
> 
> # dmesg | tail
> [  919.899071] BTRFS info (device sdb): allowing degraded mounts
> [  919.899075] BTRFS info (device sdb): disk space caching is enabled
> [  919.899077] BTRFS: has skinny extents
> [  919.903216] BTRFS warning (device sdb): devid 4 uuid
> 8549a275-f663-4741-b410-79b49a1d465f is missing
> 
> # touch /mnt/test2
> # ls -l /mnt/
> total 0
> -rw-r--r-- 1 root root 0 mar 25 15:17 test
> -rw-r--r-- 1 root root 0 mar 25 15:42 test2
> 
> # btrfs device remove missing /mnt
> ERROR: error removing device 'missing': unable to go below four
> devices on raid10
> 
> As expected.
> 
> # btrfs replace start -B missing /dev/sde /mnt
> ERROR: source device must be a block device or a devid
> 
> Would have been nice if missing worked here too. Maybe it does in
> btrfs-progs 4.5?
> 
> # btrfs replace start -B 4 /dev/sde /mnt
> 
> # dmesg | tail
> [ 1618.170619] BTRFS info (device sdb): dev_replace from <missing disk> (devid 4) to /dev/sde started
> [ 1618.184979] BTRFS info (device sdb): dev_replace from <missing disk> (devid 4) to /dev/sde finished
> 
> Repaired!
> 
> # umount /mnt
> # mount /dev/sdb /mnt
> # dmesg | tail
> [ 1729.917661] BTRFS info (device sde): disk space caching is enabled
> [ 1729.917665] BTRFS: has skinny extents
> 
> All in all it works just fine with Linux 4.4.6.


Re: Possible Raid Bug

2016-03-25 Thread Patrik Lundquist
On Debian Stretch with Linux 4.4.6, btrfs-progs 4.4 in VirtualBox
5.0.16 with 4*2GB VDIs:

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# mount /dev/sdb /mnt
# touch /mnt/test
# umount /mnt

Everything fine so far.

# wipefs -a /dev/sde

*reboot*

# mount /dev/sdb /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

# dmesg | tail
[   85.979655] BTRFS info (device sdb): disk space caching is enabled
[   85.979660] BTRFS: has skinny extents
[   85.982377] BTRFS: failed to read the system array on sdb
[   85.996793] BTRFS: open_ctree failed

Not very informative! An information regression?

# mount -o degraded /dev/sdb /mnt

# dmesg | tail
[  919.899071] BTRFS info (device sdb): allowing degraded mounts
[  919.899075] BTRFS info (device sdb): disk space caching is enabled
[  919.899077] BTRFS: has skinny extents
[  919.903216] BTRFS warning (device sdb): devid 4 uuid
8549a275-f663-4741-b410-79b49a1d465f is missing

# touch /mnt/test2
# ls -l /mnt/
total 0
-rw-r--r-- 1 root root 0 mar 25 15:17 test
-rw-r--r-- 1 root root 0 mar 25 15:42 test2

# btrfs device remove missing /mnt
ERROR: error removing device 'missing': unable to go below four
devices on raid10

As expected.

# btrfs replace start -B missing /dev/sde /mnt
ERROR: source device must be a block device or a devid

Would have been nice if missing worked here too. Maybe it does in
btrfs-progs 4.5?
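
The devid to pass instead of 'missing' can be read from the usage
listing, where the absent device shows up with its ID but no path
("missing, ID: 4" elsewhere in this thread); a sketch against the same
mount:

# btrfs device usage /mnt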

# btrfs replace start -B 4 /dev/sde /mnt

# dmesg | tail
[ 1618.170619] BTRFS info (device sdb): dev_replace from <missing disk> (devid 4) to /dev/sde started
[ 1618.184979] BTRFS info (device sdb): dev_replace from <missing disk> (devid 4) to /dev/sde finished

Repaired!

# umount /mnt
# mount /dev/sdb /mnt
# dmesg | tail
[ 1729.917661] BTRFS info (device sde): disk space caching is enabled
[ 1729.917665] BTRFS: has skinny extents

All in all it works just fine with Linux 4.4.6.


Re: Possible Raid Bug

2016-03-25 Thread Duncan
Patrik Lundquist posted on Fri, 25 Mar 2016 13:48:08 +0100 as excerpted:

> On 25 March 2016 at 12:49, Stephen Williams 
> wrote:
>>
>> So catch 22, you need all the drives otherwise it won't let you mount,
>> But what happens if a drive dies and the OS doesn't detect it? BTRFS
>> won't allow you to mount the raid volume to remove the bad disk!
> 
> Version of Linux and btrfs-progs?

Yes, please.  This can be very critical information, as a lot of bugs
that are known to exist in older versions will be fixed in new versions,
and occasionally new ones are introduced as well, where older versions
won't be affected.

> You can't have a raid10 with less than 4 devices so you need to add a
> new device before deleting the missing. That is of course still a
> problem with a read-only fs.
> 
> btrfs replace is also the recommended way to replace a failed device
> nowadays. The wiki is outdated.

In theory, what it's supposed to do in a missing device situation that 
takes it below the minimum (four devices for a raid10) for a given raid 
mode, is allow writable mounting, unless the number of missing devices is 
too high (more than one missing on raid10) to allow functional degraded 
operation.

What it will often end up doing in that case, since it can't write the
full raid10, is this: once the current raid10 chunks fill up and it
needs to create more, it doesn't have enough devices to create them in
raid10, so it degrades to creating them in 'single' mode (as the device
usage output earlier in this thread shows).

The problem, however, is that on subsequent mounts, btrfs will see that 
single chunk in addition to the raid10 chunks, and will see the missing 
device, and knowing single mode is broken with /any/ missing devices, 
will at that point only mount read-only.

That's a currently known bug, which effectively means you may well get 
only one read-write mount to fix the problem, before btrfs will see that 
new single chunk created in the first degraded writable mount, and will 
refuse to mount writable again.

There are patches available that will fix this known bug by changing this 
detection to per-chunk, instead of per-filesystem.  The degraded-writable 
mount will still degrade to writing single chunks, but btrfs will see 
that all single chunks are accounted for, and all raid10 chunks only have 
one device missing and thus can still be used, and the filesystem will 
thus continue to be write mountable, unless of course another device 
fails.

But AFAIK, those patches were part of a patch set (the hot-spare patches) 
that as a whole wasn't picked for 4.5, tho by rights the per-chunk 
checking patches should have been cherry-picked as ready and fixing an 
existing bug, but weren't.  So as of 4.5, AFAIK, they still have to be 
applied separately before build.  Hopefully they'll be in 4.6.

However, lack of the per-chunk checking patch would be expected to allow
one degraded-writable mount before no more are allowed. Unless you got
that one writable mount to work and didn't mention it, and unless that
btrfs fi usage output was taken before such a writable mount (it doesn't
show the single-mode chunk that would then prevent further writable
mounts), it looks like you may have a possibly related, but definitely
more severe bug, as it appears you aren't even being allowed that
expected one-shot degraded-writable mount.

And without that, as mentioned, you have a problem, since you have to 
have a writable mount to repair the filesystem, and it's not allowing you 
even that one-shot writable mount that should be possible even with that 
known bug.

Assuming you're using a current kernel and post that information, it's 
quite likely the dev working on the other bug will be interested, and 
will have you build a kernel with those patches to see if that alone 
fixes it, before possibly having you try various debugging patches to 
hone in on the problem, if it doesn't, so he can hopefully duplicate the 
problem himself, and ultimately come up with a fix.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Possible Raid Bug

2016-03-25 Thread Patrik Lundquist
On 25 March 2016 at 12:49, Stephen Williams  wrote:
>
> So catch 22, you need all the drives otherwise it won't let you mount,
> But what happens if a drive dies and the OS doesn't detect it? BTRFS
> won't allow you to mount the raid volume to remove the bad disk!

Version of Linux and btrfs-progs?

You can't have a raid10 with less than 4 devices so you need to add a
new device before deleting the missing. That is of course still a
problem with a read-only fs.

btrfs replace is also the recommended way to replace a failed device
nowadays. The wiki is outdated.
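
A sketch of the two approaches on a degraded but still-mounted raid10,
with a hypothetical /dev/sdf as the fresh disk:

# btrfs replace start -B 4 /dev/sdf /mnt   # recommended; 4 = devid of the missing device

# btrfs device add /dev/sdf /mnt           # older wiki method: add first...
# btrfs device delete missing /mnt         # ...then drop the missing device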


Possible Raid Bug

2016-03-25 Thread Stephen Williams
Hi,

Find instructions on how to recreate below -

I have a BTRFS raid 10 setup in Virtualbox (I'm getting to grips with
the Filesystem) 
I have the raid mounted to /mnt like so -
 
[root@Xen ~]# btrfs filesystem show /mnt/
Label: none  uuid: ad1d95ee-5cdc-420f-ad30-bd16158ad8cb
Total devices 4 FS bytes used 1.00GiB
devid1 size 2.00GiB used 927.00MiB path /dev/sdb
devid2 size 2.00GiB used 927.00MiB path /dev/sdc
devid3 size 2.00GiB used 927.00MiB path /dev/sdd
devid4 size 2.00GiB used 927.00MiB path /dev/sde
And -
[root@Xen ~]# btrfs filesystem usage /mnt/
Overall:
Device size:   8.00GiB
Device allocated:  3.62GiB
Device unallocated:4.38GiB
Device missing:  0.00B
Used:  2.00GiB
Free (estimated):  2.69GiB  (min: 2.69GiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:   16.00MiB  (used: 0.00B)

Data,RAID10: Size:1.50GiB, Used:1.00GiB
   /dev/sdb  383.50MiB
   /dev/sdc  383.50MiB
   /dev/sdd  383.50MiB
   /dev/sde  383.50MiB

Metadata,RAID10: Size:256.00MiB, Used:1.16MiB
   /dev/sdb   64.00MiB
   /dev/sdc   64.00MiB
   /dev/sdd   64.00MiB
   /dev/sde   64.00MiB

System,RAID10: Size:64.00MiB, Used:16.00KiB
   /dev/sdb   16.00MiB
   /dev/sdc   16.00MiB
   /dev/sdd   16.00MiB
   /dev/sde   16.00MiB

Unallocated:
   /dev/sdb1.55GiB
   /dev/sdc1.55GiB
   /dev/sdd1.55GiB
   /dev/sde1.55GiB

Right so everything looks good and I stuck some dummy files in there too
-
[root@Xen ~]# ls -lh /mnt/
total 1.1G
-rw-r--r-- 1 root root 1.0G May 30  2008 1GB.zip
-rw-r--r-- 1 root root   28 Mar 24 15:16 hello
-rw-r--r-- 1 root root6 Mar 24 16:12 niglu
-rw-r--r-- 1 root root4 Mar 24 15:32 test

The bug appears to happen when you try and test out its ability to
handle a dead drive.
If you follow the instructions here:
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
It tells you to mount the drive with the 'degraded' option, however
this just does not work. Allow me to show -

1) I power off the VM and remove one of the drives (Simulating a drive
being pulled from a machine)
2) Power on the VM
3) Check DMESG - Everything looks good
4) Check how BTRFS is feeling -

Label: none  uuid: ad1d95ee-5cdc-420f-ad30-bd16158ad8cb
Total devices 4 FS bytes used 1.00GiB
devid1 size 2.00GiB used 1.31GiB path /dev/sdb
devid2 size 2.00GiB used 1.31GiB path /dev/sdc
devid3 size 2.00GiB used 1.31GiB path /dev/sdd
*** Some devices missing

So far so good, /dev/sde is missing and BTRFS has detected this.
5) Try and mount it as per the wiki so I can remove the bad drive and
replace it with a good one -

[root@Xen ~]# mount -o degraded /dev/sdb /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

Ok, this is not good, I check DMESG -

[root@Xen ~]# dmesg | tail
[4.416445] e1000: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow
Control: RX
[4.416672] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s3: link becomes ready
[4.631812] snd_intel8x0 :00:05.0: white list rate for 1028:0177
is 48000
[7.091047] floppy0: no floppy controllers found
[   27.488345] BTRFS info (device sdb): allowing degraded mounts
[   27.488348] BTRFS info (device sdb): disk space caching is enabled
[   27.488349] BTRFS: has skinny extents
[   27.489794] BTRFS warning (device sdb): devid 4 uuid
ebcd53d9-5956-41d9-b0ef-c59d08e5830f is missing
[   27.491465] BTRFS: missing devices(1) exceeds the limit(0), writeable
mount is not allowed
[   27.520231] BTRFS: open_ctree failed

So here lies the problem - BTRFS needs you to have all the devices
present in order to mount it as writeable, however if a drive dies
spectacularly (as they can do) you can't do that. And as a result you
cannot mount any of the remaining drives and fix the problem.
Now you ARE able to mount it read only, but you can't issue the fix that
is recommended on the wiki, see here -

[root@Xen ~]# mount -o ro,degraded /dev/sdb /mnt/
[root@Xen ~]# btrfs device delete missing /mnt/
ERROR: error removing device 'missing': Read-only file system

So catch 22: you need all the drives, otherwise it won't let you mount.
But what happens if a drive dies and the OS doesn't detect it? BTRFS
won't allow you to mount the raid volume to remove the bad disk!

I also tried it with read only -

[root@Xen ~]# mount -o ro,degraded /dev/sdb /mnt/
[root@Xen ~]# btrfs device delete missing /mnt/
ERROR: error removing device 'missing': Read-only file system
