Re: size 2.73TiB used 240.97GiB after balance

2015-07-09 Thread Austin S Hemmelgarn

On 2015-07-08 15:06, Donald Pearson wrote:

I wouldn't use dd.

I would use recover to get the data if at all possible, then you can
experiment with trying to fix the degraded condition live.  If you have
any chance of getting data from the pool, you reduce that chance every
time you make a change.

If btrfs did the balance like you said, it wouldn't be raid5.  What
you just described is raid4 where only one drive holds parity data.  I
can't say that I actually know for a fact that btrfs doesn't do this,
but I'd be shocked and some dev would need to eat their underwear if
the balance job didn't distribute the parity also.

That is correct, it does distribute the parity among all the member 
drives.  That said, it would still have to modify the existing drives 
even if it did put the parity on just the new drive, because raid{4,5,6} 
are defined as _striped_ data with parity, not mirrored (ie, if you just 
removed the parity, you'd have a raid0, not a raid1).







Re: size 2.73TiB used 240.97GiB after balance

2015-07-09 Thread Austin S Hemmelgarn

On 2015-07-08 18:16, Donald Pearson wrote:

Basically I wouldn't trust the drive that's already showing signs of
failure to survive a dd.  It isn't completely full, so a recover puts
less load on it.  That's just the way I see it.  But I see your point of
trying to get drive images now to hedge against failures.

Unfortunately those errors are over my head so hopefully someone else
has insights.

A better option if you want a block level copy would probably be 
ddrescue (it's available in almost every distro in a package of the same 
name), it's designed for recovering as much data as possible from failed 
disks (and gives a much nicer status display than plain old dd).  If you 
do go for a block level copy however, make certain that no more than one 
of the copies is visible to the system at any given time, especially 
when the filesystem is mounted, otherwise things _WILL_ get 
exponentially worse.
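As a rough sketch of that suggestion (device names and the mapfile path are
placeholders, not details from this thread):

  # First pass: grab everything readable from the failing disk, keeping a
  # mapfile so the run can be resumed.
  ddrescue -f /dev/FAILING_DISK /dev/NEW_DISK /root/rescue.map
  # Optional second pass: retry the areas that failed a few more times.
  ddrescue -f -r3 /dev/FAILING_DISK /dev/NEW_DISK /root/rescue.map

The warning above about keeping at most one copy visible to the system
applies just as much to the ddrescue output as to a plain dd image.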







Re: size 2.73TiB used 240.97GiB after balance

2015-07-08 Thread Donald Pearson
Basically I wouldn't trust the drive that's already showing signs of
failure to survive a dd.  It isn't completely full, so a recover puts
less load on it.  That's just the way I see it.  But I see your point of
trying to get drive images now to hedge against failures.

Unfortunately those errors are over my head so hopefully someone else
has insights.

Also the possessive think's at the end of those outputs made me chuckle.



On Wed, Jul 8, 2015 at 4:29 PM, Hendrik Friedel hend...@friedels.name wrote:
 Hello Donald,

 thanks for your reply. I appreciate your help.

 I would use recover to get the data if at all possible, then you can

 experiment with trying to fix the degraded condition live.  If you have
 any chance of getting data from the pool, you reduce that chance every
 time you make a change.


 Ok, you assume that btrfs recover is the most likely way of recovering data.
 But if mounting degraded, scrubbing, btrfsck, ... are more successful, your
 proposal is more risky, isn't it? With a dd-image I can always go back to
 today's status.

 If btrfs did the balance like you said, it wouldn't be raid5.  What
 you just described is raid4 where only one drive holds parity data.  I
 can't say that I actually know for a fact that btrfs doesn't do this,
 but I'd be shocked and some dev would need to eat their underwear if
 the balance job didn't distribute the parity also.


 Ok, I was not aware of the difference between raid4 and raid5.

 So, I did try a btrfs recover:
 warning devid 3 not found already
 Check tree block failed, want=8300102483968, have=65536
 Check tree block failed, want=8300102483968, have=65536
 Check tree block failed, want=8300102483968, have=65536
 read block failed check_tree_block
 Couldn't setup extent tree
 [it is still running]

 btrfs-find-root gives me:
 http://paste.ubuntu.com/11844005/
 http://paste.ubuntu.com/11844009/
 (on the two disks)


 btrfs-show-super:
 http://paste.ubuntu.com/11844016/

 Greetings,
 Hendrik








Re: size 2.73TiB used 240.97GiB after balance

2015-07-08 Thread Hendrik Friedel

Hello Donald,

thanks for your reply. I appreciate your help.

 I would use recover to get the data if at all possible, then you can

experiment with trying to fix the degraded condition live.  If you have
any chance of getting data from the pool, you reduce that chance every
time you make a change.


Ok, you assume that btrfs recover is the most likely way of recovering 
data. But if mounting degraded, scrubbing, btrfsck, ... are more 
successful, your proposal is more risky, isn't it? With a dd-image I can 
always go back to today's status.



If btrfs did the balance like you said, it wouldn't be raid5.  What
you just described is raid4 where only one drive holds parity data.  I
can't say that I actually know for a fact that btrfs doesn't do this,
but I'd be shocked and some dev would need to eat their underwear if
the balance job didn't distribute the parity also.


Ok, I was not aware of the difference between raid4 and raid5.

So, I did try a btrfs recover:
warning devid 3 not found already
Check tree block failed, want=8300102483968, have=65536
Check tree block failed, want=8300102483968, have=65536
Check tree block failed, want=8300102483968, have=65536
read block failed check_tree_block
Couldn't setup extent tree
[it is still running]

btrfs-find-root gives me:
http://paste.ubuntu.com/11844005/
http://paste.ubuntu.com/11844009/
(on the two disks)


btrfs-show-super:
http://paste.ubuntu.com/11844016/

Greetings,
Hendrik







Re: size 2.73TiB used 240.97GiB after balance

2015-07-08 Thread Hendrik Friedel

Hello,

yes, I will check the cables, thanks for the hint.
Before trying to recover the data, I would like to save the status quo.
I have two new drives. Is it advisable to dd-copy the data onto the new
drives and then try to recover?


I am asking, because I suppose that dd will also copy the UUID, which 
might confuse BTRFS (two drives with same UUID attached)?


And then I have a technical question on btrfs balance when converting to 
raid5 (from raid1): does the balance create the parity information on 
the newly-added (empty) drive, so that the data on the two original 
disks is not touched at all?


Regards,
Hendrik


On 07.07.2015 15:14, Donald Pearson wrote:

That's what it looks like.  You may want to try reseating cables, etc.

Instead of mounting and file copy, btrfs restore might be worth a shot
to recover what you can.

On Tue, Jul 7, 2015 at 12:42 AM, Hendrik Friedel hend...@friedels.name wrote:

Hello,

while mounting works with the recovery option, the system locks after
reading.
dmesg shows:
[  684.258246] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  684.258249] ata6.00: irq_stat 0x4001
[  684.258252] ata6.00: failed command: DATA SET MANAGEMENT
[  684.258255] ata6.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 26 dma
512 out
[  684.258255]  res 51/04:01:01:00:00/00:00:00:00:00/a0 Emask 0x1
(device error)
[  684.258256] ata6.00: status: { DRDY ERR }
[  684.258258] ata6.00: error: { ABRT }
[  684.258266] sd 5:0:0:0: [sdd] tag#26 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  684.258268] sd 5:0:0:0: [sdd] tag#26 Sense Key : Illegal Request
[current] [descriptor]
[  684.258270] sd 5:0:0:0: [sdd] tag#26 Add. Sense: Unaligned write command
[  684.258272] sd 5:0:0:0: [sdd] tag#26 CDB: Write same(16) 93 08 00 00 00
00 00 01 d3 80 00 00 00 80 00 00


So, also this drive is failing?!

Regards,
Hendrik


On 07.07.2015 00:59, Donald Pearson wrote:


Anything in dmesg?

On Mon, Jul 6, 2015 at 5:07 PM, hend...@friedels.name
hend...@friedels.name wrote:


Hello,

It seems that mounting works, but the system locks completely soon after
I start backing up.


Greetings,

Hendrik


-- Original message --
From: Donald Pearson
Date: Mon., 6 July 2015 23:49
To: Hendrik Friedel;
Cc: Omar Sandoval; Hugo Mills; Btrfs BTRFS;
Subject: Re: size 2.73TiB used 240.97GiB after balance


If you can mount it RO, first thing to do is back up any data that you
care about. According to the bug that Omar posted you should not try a
device replace and you should not try a scrub with a missing device. You
may be able to just do a device delete missing, then separately do a
device add of a new drive, or rebalance back in to raid1.

On Mon, Jul 6, 2015 at 4:12 PM, Hendrik Friedel wrote:

Hello, oh dear, I fear I am in trouble: recovery-mounted, I tried to save
some data, but the system hung. So I re-booted and sdc is now physically
disconnected.

Label: none  uuid: b4a6cce6-dc9c-4a13-80a4-ed6bc5b40bb8
Total devices 3 FS bytes used 4.67TiB
devid 1 size 2.73TiB used 2.67TiB path /dev/sdc
devid 2 size 2.73TiB used 2.67TiB path /dev/sdb
*** Some devices missing

I try to mount the rest again:
mount -o recovery,ro /dev/sdb /mnt/__Complete_Disk
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try dmesg | tail or so

root@homeserver:~# dmesg | tail
[  447.059275] BTRFS info (device sdc): enabling auto recovery
[  447.059280] BTRFS info (device sdc): disk space caching is enabled
[  447.086844] BTRFS: failed to read chunk tree on sdc
[  447.110588] BTRFS: open_ctree failed
[  474.496778] BTRFS info (device sdc): enabling auto recovery
[  474.496781] BTRFS info (device sdc): disk space caching is enabled
[  474.519005] BTRFS: failed to read chunk tree on sdc
[  474.540627] BTRFS: open_ctree failed

mount -o degraded,ro /dev/sdb /mnt/__Complete_Disk
Does work now though.

So, how can I remove the reference to the failed disk and check the data
for consistency (scrub I suppose, but is it safe?)?

Regards,
Hendrik

On 06.07.2015 22:52, Omar Sandoval wrote:
On 07/06/2015 01:01 PM, Donald Pearson wrote:
Based on my experience Hugo's advice is critical, get the bad drive out
of the pool when in raid56 and do not try to replace or delete it while
it's still attached and recognized. If you add a new device, mount
degraded and rebalance. If you don't, mount degraded then device delete
missing.
Watch out, replacing a missing device in RAID 5/6 currently doesn't work
and will cause a kernel BUG(). See my patch series here:
http://www.spinics.net/lists/linux-btrfs/msg44874.html




--
Hendrik Friedel
Auf dem Brink 12
28844 Weyhe
Tel. 04203 8394854
Mobil 0178 1874363


Re: size 2.73TiB used 240.97GiB after balance

2015-07-08 Thread Donald Pearson
I wouldn't use dd.

I would use recover to get the data if at all possible, then you can
experiment with trying to fix the degraded condition live.  If you have
any chance of getting data from the pool, you reduce that chance every
time you make a change.

If btrfs did the balance like you said, it wouldn't be raid5.  What
you just described is raid4 where only one drive holds parity data.  I
can't say that I actually know for a fact that btrfs doesn't do this,
but I'd be shocked and some dev would need to eat their underwear if
the balance job didn't distribute the parity also.

On Wed, Jul 8, 2015 at 1:56 PM, Hendrik Friedel hend...@friedels.name wrote:
 Hello,

 yes, I will check the cables, thanks for the hint.
 Before trying to recover the data, I would like to save the status quo. I
 have two new drives. Is it advisable to dd-copy the data onto the new drives
 and then try to recover?

 I am asking, because I suppose that dd will also copy the UUID, which might
 confuse BTRFS (two drives with same UUID attached)?

 And then I have a technical question on btrfs balance when converting to
 raid5 (from raid1): does the balance create the parity information on the
 newly-added (empty) drive, so that the data on the two original disks is not
 touched at all?

 Regards,
 Hendrik



 On 07.07.2015 15:14, Donald Pearson wrote:

 That's what it looks like.  You may want to try reseating cables, etc.

 Instead of mounting and file copy, btrfs restore might be worth a shot
 to recover what you can.

 On Tue, Jul 7, 2015 at 12:42 AM, Hendrik Friedel hend...@friedels.name
 wrote:

 Hello,

 while mounting works with the recovery option, the system locks after
 reading.
 dmesg shows:
 [  684.258246] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
 [  684.258249] ata6.00: irq_stat 0x4001
 [  684.258252] ata6.00: failed command: DATA SET MANAGEMENT
 [  684.258255] ata6.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 26
 dma
 512 out
 [  684.258255]  res 51/04:01:01:00:00/00:00:00:00:00/a0 Emask 0x1
 (device error)
 [  684.258256] ata6.00: status: { DRDY ERR }
 [  684.258258] ata6.00: error: { ABRT }
 [  684.258266] sd 5:0:0:0: [sdd] tag#26 FAILED Result: hostbyte=DID_OK
 driverbyte=DRIVER_SENSE
 [  684.258268] sd 5:0:0:0: [sdd] tag#26 Sense Key : Illegal Request
 [current] [descriptor]
 [  684.258270] sd 5:0:0:0: [sdd] tag#26 Add. Sense: Unaligned write
 command
 [  684.258272] sd 5:0:0:0: [sdd] tag#26 CDB: Write same(16) 93 08 00 00
 00
 00 00 01 d3 80 00 00 00 80 00 00


 So, also this drive is failing?!

 Regards,
 Hendrik


 On 07.07.2015 00:59, Donald Pearson wrote:


 Anything in dmesg?

 On Mon, Jul 6, 2015 at 5:07 PM, hend...@friedels.name
 hend...@friedels.name wrote:


 Hello,

 It seems that mounting works, but the system locks completely soon after
 I start backing up.


 Greetings,

 Hendrik


 -- Original message --
 From: Donald Pearson
 Date: Mon., 6 July 2015 23:49
 To: Hendrik Friedel;
 Cc: Omar Sandoval; Hugo Mills; Btrfs BTRFS;
 Subject: Re: size 2.73TiB used 240.97GiB after balance


 If you can mount it RO, first thing to do is back up any data that you
 care about. According to the bug that Omar posted you should not try a
 device replace and you should not try a scrub with a missing device. You
 may be able to just do a device delete missing, then separately do a
 device add of a new drive, or rebalance back in to raid1.

 On Mon, Jul 6, 2015 at 4:12 PM, Hendrik Friedel wrote:

 Hello, oh dear, I fear I am in trouble: recovery-mounted, I tried to save
 some data, but the system hung. So I re-booted and sdc is now physically
 disconnected.

 Label: none  uuid: b4a6cce6-dc9c-4a13-80a4-ed6bc5b40bb8
 Total devices 3 FS bytes used 4.67TiB
 devid 1 size 2.73TiB used 2.67TiB path /dev/sdc
 devid 2 size 2.73TiB used 2.67TiB path /dev/sdb
 *** Some devices missing

 I try to mount the rest again:
 mount -o recovery,ro /dev/sdb /mnt/__Complete_Disk
 mount: wrong fs type, bad option, bad superblock on /dev/sdb,
 missing codepage or helper program, or other error
 In some cases useful info is found in syslog - try dmesg | tail or so

 root@homeserver:~# dmesg | tail
 [  447.059275] BTRFS info (device sdc): enabling auto recovery
 [  447.059280] BTRFS info (device sdc): disk space caching is enabled
 [  447.086844] BTRFS: failed to read chunk tree on sdc
 [  447.110588] BTRFS: open_ctree failed
 [  474.496778] BTRFS info (device sdc): enabling auto recovery
 [  474.496781] BTRFS info (device sdc): disk space caching is enabled
 [  474.519005] BTRFS: failed to read chunk tree on sdc
 [  474.540627] BTRFS: open_ctree failed

 mount -o degraded,ro /dev/sdb /mnt/__Complete_Disk
 Does work now though.

 So, how can I remove the reference to the failed disk and check the data
 for consistency (scrub I suppose, but is it safe?)?

 Regards,
 Hendrik

 On 06.07.2015 22:52, Omar Sandoval wrote:
 On 07/06/2015 01:01 PM, Donald Pearson wrote: Based

Re: size 2.73TiB used 240.97GiB after balance

2015-07-07 Thread Donald Pearson
That's what it looks like.  You may want to try reseating cables, etc.

Instead of mounting and file copy, btrfs restore might be worth a shot
to recover what you can.
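
A minimal sketch of that, assuming the recovered files go to some separate,
healthy filesystem (the target path is a placeholder):

  # Copy whatever is recoverable off the unmountable filesystem.
  # -v lists files as they are restored, -i ignores errors and keeps going.
  btrfs restore -v -i /dev/sdb /mnt/backup_target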

On Tue, Jul 7, 2015 at 12:42 AM, Hendrik Friedel hend...@friedels.name wrote:
 Hello,

 while mounting works with the recovery option, the system locks after
 reading.
 dmesg shows:
 [  684.258246] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
 [  684.258249] ata6.00: irq_stat 0x4001
 [  684.258252] ata6.00: failed command: DATA SET MANAGEMENT
 [  684.258255] ata6.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 26 dma
 512 out
 [  684.258255]  res 51/04:01:01:00:00/00:00:00:00:00/a0 Emask 0x1
 (device error)
 [  684.258256] ata6.00: status: { DRDY ERR }
 [  684.258258] ata6.00: error: { ABRT }
 [  684.258266] sd 5:0:0:0: [sdd] tag#26 FAILED Result: hostbyte=DID_OK
 driverbyte=DRIVER_SENSE
 [  684.258268] sd 5:0:0:0: [sdd] tag#26 Sense Key : Illegal Request
 [current] [descriptor]
 [  684.258270] sd 5:0:0:0: [sdd] tag#26 Add. Sense: Unaligned write command
 [  684.258272] sd 5:0:0:0: [sdd] tag#26 CDB: Write same(16) 93 08 00 00 00
 00 00 01 d3 80 00 00 00 80 00 00


 So, also this drive is failing?!

 Regards,
 Hendrik


 On 07.07.2015 00:59, Donald Pearson wrote:

 Anything in dmesg?

 On Mon, Jul 6, 2015 at 5:07 PM, hend...@friedels.name
 hend...@friedels.name wrote:

 Hello,

 It seems that mounting works, but the system locks completely soon after
 I start backing up.


 Greetings,

 Hendrik


 -- Original message --
 From: Donald Pearson
 Date: Mon., 6 July 2015 23:49
 To: Hendrik Friedel;
 Cc: Omar Sandoval; Hugo Mills; Btrfs BTRFS;
 Subject: Re: size 2.73TiB used 240.97GiB after balance


 If you can mount it RO, first thing to do is back up any data that you
 care about. According to the bug that Omar posted you should not try a
 device replace and you should not try a scrub with a missing device. You
 may be able to just do a device delete missing, then separately do a
 device add of a new drive, or rebalance back in to raid1.

 On Mon, Jul 6, 2015 at 4:12 PM, Hendrik Friedel wrote:

 Hello, oh dear, I fear I am in trouble: recovery-mounted, I tried to save
 some data, but the system hung. So I re-booted and sdc is now physically
 disconnected.

 Label: none  uuid: b4a6cce6-dc9c-4a13-80a4-ed6bc5b40bb8
 Total devices 3 FS bytes used 4.67TiB
 devid 1 size 2.73TiB used 2.67TiB path /dev/sdc
 devid 2 size 2.73TiB used 2.67TiB path /dev/sdb
 *** Some devices missing

 I try to mount the rest again:
 mount -o recovery,ro /dev/sdb /mnt/__Complete_Disk
 mount: wrong fs type, bad option, bad superblock on /dev/sdb,
 missing codepage or helper program, or other error
 In some cases useful info is found in syslog - try dmesg | tail or so

 root@homeserver:~# dmesg | tail
 [  447.059275] BTRFS info (device sdc): enabling auto recovery
 [  447.059280] BTRFS info (device sdc): disk space caching is enabled
 [  447.086844] BTRFS: failed to read chunk tree on sdc
 [  447.110588] BTRFS: open_ctree failed
 [  474.496778] BTRFS info (device sdc): enabling auto recovery
 [  474.496781] BTRFS info (device sdc): disk space caching is enabled
 [  474.519005] BTRFS: failed to read chunk tree on sdc
 [  474.540627] BTRFS: open_ctree failed

 mount -o degraded,ro /dev/sdb /mnt/__Complete_Disk
 Does work now though.

 So, how can I remove the reference to the failed disk and check the data
 for consistency (scrub I suppose, but is it safe?)?

 Regards,
 Hendrik

 On 06.07.2015 22:52, Omar Sandoval wrote:
 On 07/06/2015 01:01 PM, Donald Pearson wrote:
 Based on my experience Hugo's advice is critical, get the bad drive out
 of the pool when in raid56 and do not try to replace or delete it while
 it's still attached and recognized. If you add a new device, mount
 degraded and rebalance. If you don't, mount degraded then device delete
 missing.
 Watch out, replacing a missing device in RAID 5/6 currently doesn't work
 and will cause a kernel BUG(). See my patch series here:
 http://www.spinics.net/lists/linux-btrfs/msg44874.html



 --
 Hendrik Friedel
 Auf dem Brink 12
 28844 Weyhe
 Tel. 04203 8394854
 Mobil 0178 1874363




Re: size 2.73TiB used 240.97GiB after balance

2015-07-06 Thread Omar Sandoval
On 07/06/2015 01:01 PM, Donald Pearson wrote:
 Based on my experience Hugo's advice is critical, get the bad drive
 out of the pool when in raid56 and do not try to replace or delete it
 while it's still attached and recognized.
 
 If you add a new device, mount degraded and rebalance.  If you don't,
 mount degraded then device delete missing.
 

Watch out, replacing a missing device in RAID 5/6 currently doesn't work
and will cause a kernel BUG(). See my patch series here:
http://www.spinics.net/lists/linux-btrfs/msg44874.html

-- 
Omar


Re: size 2.73TiB used 240.97GiB after balance

2015-07-06 Thread Donald Pearson
Based on my experience Hugo's advice is critical, get the bad drive
out of the pool when in raid56 and do not try to replace or delete it
while it's still attached and recognized.

If you add a new device, mount degraded and rebalance.  If you don't,
mount degraded then device delete missing.
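
Spelled out as commands, the two branches above would look roughly like this
(the mount point follows the thread; the new-device name is a placeholder):

  # With the failed drive detached, mount the remaining members degraded:
  mount -o degraded /dev/sdb /mnt/__Complete_Disk

  # Either add a replacement and rebalance ...
  btrfs device add /dev/NEW_DISK /mnt/__Complete_Disk
  btrfs balance start /mnt/__Complete_Disk

  # ... or, with no replacement at hand, drop the missing member:
  btrfs device delete missing /mnt/__Complete_Disk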

On Mon, Jul 6, 2015 at 2:49 PM, Hugo Mills h...@carfax.org.uk wrote:
 On Mon, Jul 06, 2015 at 09:44:53PM +0200, Hendrik Friedel wrote:
 Hello,

 ok, sdc seems to have failed (sorry, I checked only sdd and sdb
 SMART values, as sdc is brand new; maybe a bad assumption on my
 side).

 I have mounted the device
 mount -o recovery,ro

 So, what should I do now:
 btrfs device delete /dev/sdc /mnt

 or

 mount -o degraded /dev/sdb /mnt
 btrfs device delete missing /mnt

 I do have a backup of the most valuable data.
 But if you consider one of the above options risky, I might better
 get a new drive before, but this might take a couple of days (in
 which sdc could further degrade).
 What is your recommendation?

Physically remove the device from the array, mount with -o
 degraded, optionally add the new device, and run a balance.

Hugo.

 --
 Hugo Mills | I lost my leg in 1942. Some bastard stole it in a
 hugo@... carfax.org.uk | pub in Pimlico.
 http://carfax.org.uk/  |
 PGP: E2AB1DE4  |


Re: size 2.73TiB used 240.97GiB after balance

2015-07-06 Thread Hendrik Friedel

Hello,

oh dear, I fear I am in trouble:
recovery-mounted, I tried to save some data, but the system hung.
So I re-booted and sdc is now physically disconnected.

Label: none  uuid: b4a6cce6-dc9c-4a13-80a4-ed6bc5b40bb8
Total devices 3 FS bytes used 4.67TiB
devid1 size 2.73TiB used 2.67TiB path /dev/sdc
devid2 size 2.73TiB used 2.67TiB path /dev/sdb
*** Some devices missing

I try to mount the rest again:
mount -o recovery,ro /dev/sdb /mnt/__Complete_Disk
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so

root@homeserver:~# dmesg | tail 


[  447.059275] BTRFS info (device sdc): enabling auto recovery
[  447.059280] BTRFS info (device sdc): disk space caching is enabled
[  447.086844] BTRFS: failed to read chunk tree on sdc
[  447.110588] BTRFS: open_ctree failed
[  474.496778] BTRFS info (device sdc): enabling auto recovery
[  474.496781] BTRFS info (device sdc): disk space caching is enabled
[  474.519005] BTRFS: failed to read chunk tree on sdc
[  474.540627] BTRFS: open_ctree failed


mount -o degraded,ro /dev/sdb /mnt/__Complete_Disk
Does work now though.

So, how can I remove the reference to the failed disk and check the data 
for consistency (scrub I suppose, but is it safe?)?


Regards,
Hendrik



On 06.07.2015 22:52, Omar Sandoval wrote:

On 07/06/2015 01:01 PM, Donald Pearson wrote:

Based on my experience Hugo's advice is critical, get the bad drive
out of the pool when in raid56 and do not try to replace or delete it
while it's still attached and recognized.

If you add a new device, mount degraded and rebalance.  If you don't,
mount degraded then device delete missing.



Watch out, replacing a missing device in RAID 5/6 currently doesn't work
and will cause a kernel BUG(). See my patch series here:
http://www.spinics.net/lists/linux-btrfs/msg44874.html




--
Hendrik Friedel
Auf dem Brink 12
28844 Weyhe
Tel. 04203 8394854
Mobil 0178 1874363




size 2.73TiB used 240.97GiB after balance

2015-07-06 Thread Hendrik Friedel

Hello,

I started with a raid1:
devid1 size 2.73TiB used 2.67TiB path /dev/sdd
devid2 size 2.73TiB used 2.67TiB path /dev/sdb
Then I added a third device, /dev/sdc1, and ran a balance:
btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/__Complete_Disk/

Now the file-system looks like this:
Total devices 3 FS bytes used 4.68TiB
devid1 size 2.73TiB used 2.67TiB path /dev/sdd
devid2 size 2.73TiB used 2.67TiB path /dev/sdb
devid3 size 2.73TiB used 240.97GiB path /dev/sdc1

I am surprised by the 240.97GiB...

In the syslog and dmesg I find several:
[108274.415499] btrfs_dev_stat_print_on_error: 8 callbacks suppressed
[108279.840334] btrfs_dev_stat_print_on_error: 12 callbacks suppressed

What's wrong here?
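
One hedged way to pin those suppressed dev_stat messages to a specific drive
is btrfs's per-device error counters (mount point as used for the balance
above):

  # Per-device read/write/flush/corruption/generation error counts:
  btrfs device stats /mnt/__Complete_Disk/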

Regards,
Hendrik




Re: size 2.73TiB used 240.97GiB after balance

2015-07-06 Thread Hugo Mills
On Mon, Jul 06, 2015 at 09:44:53PM +0200, Hendrik Friedel wrote:
 Hello,
 
 ok, sdc seems to have failed (sorry, I checked only sdd and sdb
 SMART values, as sdc is brand new; maybe a bad assumption on my
 side).
 
 I have mounted the device
 mount -o recovery,ro
 
 So, what should I do now:
 btrfs device delete /dev/sdc /mnt
 
 or
 
 mount -o degraded /dev/sdb /mnt
 btrfs device delete missing /mnt
 
 I do have a backup of the most valuable data.
 But if you consider one of the above options risky, I might better
 get a new drive before, but this might take a couple of days (in
 which sdc could further degrade).
 What is your recommendation?

   Physically remove the device from the array, mount with -o
degraded, optionally add the new device, and run a balance.

   Hugo.

-- 
Hugo Mills | I lost my leg in 1942. Some bastard stole it in a
hugo@... carfax.org.uk | pub in Pimlico.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: size 2.73TiB used 240.97GiB after balance

2015-07-06 Thread Hendrik Friedel

Hello,

ok, sdc seems to have failed (sorry, I checked only sdd and sdb SMART
values, as sdc is brand new; maybe a bad assumption on my side).


I have mounted the device
mount -o recovery,ro

So, what should I do now:
btrfs device delete /dev/sdc /mnt

or

mount -o degraded /dev/sdb /mnt
btrfs device delete missing /mnt

I do have a backup of the most valuable data.
But if you consider one of the above options risky, I might better get a 
new drive before, but this might take a couple of days (in which sdc 
could further degrade).

What is your recommendation?


Regards,
Hendrik




Re: size 2.73TiB used 240.97GiB after balance

2015-07-06 Thread Donald Pearson
Anything in dmesg?

On Mon, Jul 6, 2015 at 5:07 PM, hend...@friedels.name
hend...@friedels.name wrote:
 Hello,

 It seems that mounting works, but the system locks completely soon after
 I start backing up.


 Greetings,

 Hendrik


 -- Original message --
 From: Donald Pearson
 Date: Mon., 6 July 2015 23:49
 To: Hendrik Friedel;
 Cc: Omar Sandoval; Hugo Mills; Btrfs BTRFS;
 Subject: Re: size 2.73TiB used 240.97GiB after balance


 If you can mount it RO, first thing to do is back up any data that you
 care about. According to the bug that Omar posted you should not try a
 device replace and you should not try a scrub with a missing device. You
 may be able to just do a device delete missing, then separately do a
 device add of a new drive, or rebalance back in to raid1.

 On Mon, Jul 6, 2015 at 4:12 PM, Hendrik Friedel wrote:

 Hello, oh dear, I fear I am in trouble: recovery-mounted, I tried to save
 some data, but the system hung. So I re-booted and sdc is now physically
 disconnected.

 Label: none  uuid: b4a6cce6-dc9c-4a13-80a4-ed6bc5b40bb8
 Total devices 3 FS bytes used 4.67TiB
 devid 1 size 2.73TiB used 2.67TiB path /dev/sdc
 devid 2 size 2.73TiB used 2.67TiB path /dev/sdb
 *** Some devices missing

 I try to mount the rest again:
 mount -o recovery,ro /dev/sdb /mnt/__Complete_Disk
 mount: wrong fs type, bad option, bad superblock on /dev/sdb,
 missing codepage or helper program, or other error
 In some cases useful info is found in syslog - try dmesg | tail or so

 root@homeserver:~# dmesg | tail
 [  447.059275] BTRFS info (device sdc): enabling auto recovery
 [  447.059280] BTRFS info (device sdc): disk space caching is enabled
 [  447.086844] BTRFS: failed to read chunk tree on sdc
 [  447.110588] BTRFS: open_ctree failed
 [  474.496778] BTRFS info (device sdc): enabling auto recovery
 [  474.496781] BTRFS info (device sdc): disk space caching is enabled
 [  474.519005] BTRFS: failed to read chunk tree on sdc
 [  474.540627] BTRFS: open_ctree failed

 mount -o degraded,ro /dev/sdb /mnt/__Complete_Disk
 Does work now though.

 So, how can I remove the reference to the failed disk and check the data
 for consistency (scrub I suppose, but is it safe?)?

 Regards,
 Hendrik

 On 06.07.2015 22:52, Omar Sandoval wrote:
 On 07/06/2015 01:01 PM, Donald Pearson wrote:
 Based on my experience Hugo's advice is critical, get the bad drive out
 of the pool when in raid56 and do not try to replace or delete it while
 it's still attached and recognized. If you add a new device, mount
 degraded and rebalance. If you don't, mount degraded then device delete
 missing.
 Watch out, replacing a missing device in RAID 5/6 currently doesn't work
 and will cause a kernel BUG(). See my patch series here:
 http://www.spinics.net/lists/linux-btrfs/msg44874.html


Re: size 2.73TiB used 240.97GiB after balance

2015-07-06 Thread Donald Pearson
If you can mount it RO, first thing to do is back up any data that you
care about.

According to the bug that Omar posted you should not try a device
replace and you should not try a scrub with a missing device.

You may be able to just do a device delete missing, then separately do
a device add of a new drive, or rebalance back in to raid1.
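
As a sketch of that order of operations (the backup destination is a
placeholder):

  # First: mount read-only (degraded, since a member is missing) and copy
  # the data somewhere safe.
  mount -o degraded,ro /dev/sdb /mnt/__Complete_Disk
  cp -a /mnt/__Complete_Disk/. /mnt/backup_target/

  # Only afterwards, remounted read-write degraded, one of:
  #   btrfs device delete missing /mnt/__Complete_Disk
  #   btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/__Complete_Disk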

On Mon, Jul 6, 2015 at 4:12 PM, Hendrik Friedel hend...@friedels.name wrote:
 Hello,

 oh dear, I fear I am in trouble:
 recovery-mounted, I tried to save some data, but the system hung.
 So I re-booted and sdc is now physically disconnected.

 Label: none  uuid: b4a6cce6-dc9c-4a13-80a4-ed6bc5b40bb8
 Total devices 3 FS bytes used 4.67TiB
 devid1 size 2.73TiB used 2.67TiB path /dev/sdc
 devid2 size 2.73TiB used 2.67TiB path /dev/sdb
 *** Some devices missing

 I try to mount the rest again:
 mount -o recovery,ro /dev/sdb /mnt/__Complete_Disk
 mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

 root@homeserver:~# dmesg | tail
 [  447.059275] BTRFS info (device sdc): enabling auto recovery
 [  447.059280] BTRFS info (device sdc): disk space caching is enabled
 [  447.086844] BTRFS: failed to read chunk tree on sdc
 [  447.110588] BTRFS: open_ctree failed
 [  474.496778] BTRFS info (device sdc): enabling auto recovery
 [  474.496781] BTRFS info (device sdc): disk space caching is enabled
 [  474.519005] BTRFS: failed to read chunk tree on sdc
 [  474.540627] BTRFS: open_ctree failed


 mount -o degraded,ro /dev/sdb /mnt/__Complete_Disk
 Does work now though.

 So, how can I remove the reference to the failed disk and check the data for
 consistency (scrub I suppose, but is it safe?)?

 Regards,
 Hendrik




 On 06.07.2015 22:52, Omar Sandoval wrote:

 On 07/06/2015 01:01 PM, Donald Pearson wrote:

 Based on my experience Hugo's advice is critical, get the bad drive
 out of the pool when in raid56 and do not try to replace or delete it
 while it's still attached and recognized.

 If you add a new device, mount degraded and rebalance.  If you don't,
 mount degraded then device delete missing.


 Watch out, replacing a missing device in RAID 5/6 currently doesn't work
 and will cause a kernel BUG(). See my patch series here:
 http://www.spinics.net/lists/linux-btrfs/msg44874.html



 --
 Hendrik Friedel
 Auf dem Brink 12
 28844 Weyhe
 Tel. 04203 8394854
 Mobil 0178 1874363





Re: size 2.73TiB used 240.97GiB after balance

2015-07-06 Thread Hendrik Friedel

Hello,

while mounting works with the recovery option, the system locks after 
reading.

dmesg shows:
[  684.258246] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  684.258249] ata6.00: irq_stat 0x4001
[  684.258252] ata6.00: failed command: DATA SET MANAGEMENT
[  684.258255] ata6.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 26 
dma 512 out
[  684.258255]  res 51/04:01:01:00:00/00:00:00:00:00/a0 Emask 
0x1 (device error)

[  684.258256] ata6.00: status: { DRDY ERR }
[  684.258258] ata6.00: error: { ABRT }
[  684.258266] sd 5:0:0:0: [sdd] tag#26 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
[  684.258268] sd 5:0:0:0: [sdd] tag#26 Sense Key : Illegal Request 
[current] [descriptor]

[  684.258270] sd 5:0:0:0: [sdd] tag#26 Add. Sense: Unaligned write command
[  684.258272] sd 5:0:0:0: [sdd] tag#26 CDB: Write same(16) 93 08 00 00 
00 00 00 01 d3 80 00 00 00 80 00 00



So, also this drive is failing?!
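
For what it's worth, the aborted command in that trace is DATA SET MANAGEMENT
(the ATA TRIM command), which some drives or controllers simply reject, so it
is not necessarily a sign of failing media. A quick, hedged way to check the
drive itself is its SMART data (smartmontools assumed to be installed):

  # Health summary, attributes and error log for the suspect drive:
  smartctl -a /dev/sdd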

Regards,
Hendrik

On 07.07.2015 00:59, Donald Pearson wrote:

Anything in dmesg?

On Mon, Jul 6, 2015 at 5:07 PM, hend...@friedels.name
hend...@friedels.name wrote:

Hello,

It seems that mounting works, but the system locks completely soon after
I start backing up.


Greetings,

Hendrik


-- Original message --
From: Donald Pearson
Date: Mon., 6 July 2015 23:49
To: Hendrik Friedel;
Cc: Omar Sandoval; Hugo Mills; Btrfs BTRFS;
Subject: Re: size 2.73TiB used 240.97GiB after balance


If you can mount it RO, first thing to do is back up any data that you
care about. According to the bug that Omar posted you should not try a
device replace and you should not try a scrub with a missing device. You
may be able to just do a device delete missing, then separately do a
device add of a new drive, or rebalance back in to raid1.

On Mon, Jul 6, 2015 at 4:12 PM, Hendrik Friedel wrote:

Hello, oh dear, I fear I am in trouble: recovery-mounted, I tried to save
some data, but the system hung. So I re-booted and sdc is now physically
disconnected.

Label: none  uuid: b4a6cce6-dc9c-4a13-80a4-ed6bc5b40bb8
Total devices 3 FS bytes used 4.67TiB
devid 1 size 2.73TiB used 2.67TiB path /dev/sdc
devid 2 size 2.73TiB used 2.67TiB path /dev/sdb
*** Some devices missing

I try to mount the rest again:
mount -o recovery,ro /dev/sdb /mnt/__Complete_Disk
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try dmesg | tail or so

root@homeserver:~# dmesg | tail
[  447.059275] BTRFS info (device sdc): enabling auto recovery
[  447.059280] BTRFS info (device sdc): disk space caching is enabled
[  447.086844] BTRFS: failed to read chunk tree on sdc
[  447.110588] BTRFS: open_ctree failed
[  474.496778] BTRFS info (device sdc): enabling auto recovery
[  474.496781] BTRFS info (device sdc): disk space caching is enabled
[  474.519005] BTRFS: failed to read chunk tree on sdc
[  474.540627] BTRFS: open_ctree failed

mount -o degraded,ro /dev/sdb /mnt/__Complete_Disk
Does work now though.

So, how can I remove the reference to the failed disk and check the data
for consistency (scrub I suppose, but is it safe?)?

Regards,
Hendrik

On 06.07.2015 22:52, Omar Sandoval wrote:
On 07/06/2015 01:01 PM, Donald Pearson wrote:
Based on my experience Hugo's advice is critical, get the bad drive out
of the pool when in raid56 and do not try to replace or delete it while
it's still attached and recognized. If you add a new device, mount
degraded and rebalance. If you don't, mount degraded then device delete
missing.
Watch out, replacing a missing device in RAID 5/6 currently doesn't work
and will cause a kernel BUG(). See my patch series here:
http://www.spinics.net/lists/linux-btrfs/msg44874.html



--
Hendrik Friedel
Auf dem Brink 12
28844 Weyhe
Tel. 04203 8394854
Mobil 0178 1874363

