Recovering a 4xhdd RAID10 file system with 2 failed disks

2014-08-06 Thread TM

Hi all,

  Quick and dirty:
  A 4-disk RAID10 with 2 missing devices mounts as degraded,ro; a read-only
scrub ends with no errors.
  Recovery options:
  A/ If you still had at least 3 hdds, you could replace/add a device.
  B/ If you only have 2 hdds, even if the read-only scrub is ok,
 you cannot replace/add a device.
  So I guess the best option is:
  B.1/ create a new RAID0 filesystem, copy the data over to it, add the old
 drives to the new filesystem, then re-balance it as RAID10 (see the sketch below).
  B.2/ any other way to recover that I am missing? anything easier/faster?
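
  A minimal sketch of B.1, assuming the surviving pair is /dev/sdc+/dev/sdd,
the two fresh drives are /dev/sde+/dev/sdf, and the mount points are made up:

mount -o degraded,ro /dev/sdc /mnt/old
mkfs.btrfs -d raid0 -m raid1 /dev/sde /dev/sdf      # new 2-disk filesystem
mount /dev/sde /mnt/new
cp -a /mnt/old/. /mnt/new/                          # copy everything over
umount /mnt/old
wipefs -a /dev/sdc /dev/sdd                         # only after verifying the copy!
btrfs device add /dev/sdc /dev/sdd /mnt/new
btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/new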


  Long story:
  A couple of weeks back I had a failed hdd in a 4-disk RAID10 btrfs.
  I added a new device and removed the failed one, but three days after the
recovery finished I ended up with another 2 failing disks.
  So I physically removed the 2 failing disks from the drive bays.
  (I sent one back to Seagate for replacement; the other one I kept and will
send later.)
  (please note I do have a backup)

  The good thing is that the two drives I have left in this RAID10 seem to
hold all the data, and the data seems ok according to a read-only scrub.
  The remaining 2 disks from the RAID can be mounted with -o degraded,ro.
  I did a read-only scrub on the filesystem (while mounted with -o
degraded,ro) and the scrub ended with no errors.
  I hope this ro scrub is 100% validation that I have not lost any files and
that all files are ok.
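
  Roughly the commands I mean (device name and mount point are illustrative):

mount -o degraded,ro /dev/sdc /mnt/raid10
btrfs scrub start -B -d -r /mnt/raid10    # -r = read-only scrub, -B = wait, -d = per-device stats
btrfs scrub status /mnt/raid10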

  Just today I *tried* to insert a new disk and add it to the RAID10 setup.
  If I mount the filesystem as degraded,ro I cannot add a new device (btrfs
device add), and I cannot replace a disk (btrfs replace start -r).
  That is because the filesystem is mounted not only as degraded but also as
read-only.
  But a two-disk RAID10 can only be mounted as ro.
  This is by design:
gitorious.org/linux-n900/linux-n900/commit/bbb651e469d99f0088e286fdeb54acca7bb4ad4e
  
  But again, a RAID10 filesystem should be recoverable somehow if the data is
all there but half of the disks are missing.
  (I.e. the raid0 part is there and only the raid1 part is missing: the
striped data is ok, the mirror copies are missing.)
  If it were an ordinary RAID10, replacing the two mirror disks at the same
time should be acceptable and the RAID should be recoverable.

  I am lucky myself, since I still have one of the old failing disks in my
hands (the other one is being RMAd currently).
  I can insert the old failing disk and mount the filesystem as degraded
(but not ro), and then run a btrfs replace or btrfs device add.
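
  Roughly what I would run with the old disk back in place (the devid,
target device and mount point below are placeholders):

mount -o degraded /dev/sdc /mnt/raid10
# either replace the still-missing device directly (devid from btrfs fi show)...
btrfs replace start -r <devid> /dev/sde /mnt/raid10
# ...or add the new disk and then drop the missing one
btrfs device add /dev/sde /mnt/raid10
btrfs device delete missing /mnt/raid10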

  But if I did not have the old failing disk in my hands, or if the disk
were damaged beyond recognition/repair (e.g. not recognized by the BIOS),
then as far as I understand it is impossible to add/replace drives in a
filesystem mounted as read-only.

  Am I missing something?
  Is there a better and faster way to recover a RAID10 when only the striped
data is there but not the mirror data?

Thanks in advance,
TM


Re: 1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-22 Thread TM
Wang Shilong wangsl.fnst at cn.fujitsu.com writes:


 The latest btrfs-progs includes a man page for btrfs-replace. Actually, you
 could use it something like:
 
 btrfs replace start <srcdev>|<devid> <targetdev> <mnt>
 
 You could use 'btrfs file show' to see the missing device id, and then run
 btrfs replace.
 

Hi Wang,

  I physically removed the drive before the rebuild; having a failing device
as a source is not a good idea anyway.
  Without the device in place, the device name does not show up, since the
missing device is not under /dev/sdXX or anything else.

  That is why I asked whether the special parameter 'missing' can be used in
a replace. I can't say if it is supported, but I guess not, since I found no
documentation on the matter.
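
  If the numeric devid form does work for a missing device (I have not
verified this), I assume the invocation would look something like the
following (the target device and mount point are placeholders):

btrfs filesystem show                       # note the devid reported as missing
btrfs replace start -r <devid> /dev/sde /mnt/raid10
btrfs replace status /mnt/raid10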

  So I guess replace is not aimed at fault tolerance / rebuilding. It's just
a convenient way to, let's say, replace the disks with larger ones to extend
your array. A convenience tool, not an emergency tool.

TM

 Thanks,
 Wang


Re: 1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-22 Thread TM
Stefan Behrens sbehrens at giantdisaster.de writes:


 TM, Just read the man-page. You could have used the replace tool after
 physically removing the failing device.
 
 Quoting the man page:
 If the source device is not available anymore, or if the -r option is
 set, the data is built only using the RAID redundancy mechanisms.
 
 Options
 -r   only read from srcdev if no other zero-defect mirror
  exists (enable this if your drive has lots of read errors,
  the access would be very slow)
 
 Concerning the rebuild performance, the access to the disk is linear for
 both reading and writing, I measured above 75 MByte/s at that time with
 regular 7200 RPM disks, which would be less than 10 hours to replace a
 3TB disk (in worst case, if it is completely filled up).
 Unused/unallocated areas are skipped and additionally improve the
 rebuild speed.
 
 For missing disks, unfortunately the command invocation does not use the
 term 'missing' but the numerical device-id instead of the device name.
 'missing' _is_ implemented in the kernel part of the replace code, but
 was simply forgotten in the user-mode part, or at least it was forgotten in
 the man page.
 

Hi Stefan,
thank you very much for the comprehensive info; I will opt to use replace
next time.

Breaking news :-) 
from Jul 19 14:41:36 microserver kernel: [ 1134.244007] btrfs: relocating
block group 8974430633984 flags 68
to  Jul 22 16:54:54 microserver kernel: [268419.463433] btrfs: relocating
block group 2991474081792 flags 65

Rebuild ended before counting down to 
So the flight time was 3 days, and I see no more messages or btrfs processes
using CPU, so the rebuild seems done.
Just a few hours ago another disk showed some early trouble, accumulating
Current_Pending_Sector but no Reallocated_Sector_Ct yet.
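
The kind of check I keep running on the suspect disk (smartctl from
smartmontools; the device name is just an example):

smartctl -A /dev/sdd | egrep 'Current_Pending_Sector|Reallocated_Sector_Ct'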


TM


Re: 1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-21 Thread TM
Wang Shilong wangsl.fnst at cn.fujitsu.com writes:

 Just my two cents:
 
 Since 'btrfs replace' supports RAID10, I suppose using the replace
 operation is better than 'device removal and add'.
 
 Another question is related to btrfs snapshot-aware balance.
 How many snapshots did you have in your system?
 
 Of course, during balance/resize/device removal operations
 you could still snapshot, but fewer snapshots should speed things up!
 
 Anyway, 'btrfs replace' is implemented more efficiently than
 'device removal and add'.
 


Hi Wang,
just one subvolume, no snapshots or anything else.

device replace: to tell you the truth, I have not used it in the past. Most
of my testing was done 2 years ago, so on this 'kind of production' system I
did not try it. But if I had known it was faster, perhaps I would have used
it. Does anyone have statistics for such a replace and the time it takes?

Also, can replace be used when one device is missing? I can't find any
documentation. E.g.:
btrfs replace start missing /dev/sdXX


TM


1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-20 Thread TM
Hi,

I have a raid10 with 4x 3TB disks on a microserver
(http://n40l.wikia.com/wiki/Base_Hardware_N54L), 8 GB RAM.

Recently one disk started to fail (SMART errors), so I replaced it:
mounted as degraded, added the new disk, removed the old one.
Started yesterday.
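
Roughly the sequence I ran (device names and mount point are just examples):

mount -o degraded /dev/sda /mnt/raid10
btrfs device add /dev/sde /mnt/raid10
btrfs device delete missing /mnt/raid10    # kicks off the long rebalance below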
I am monitoring /var/log/messages and it seems it will take a long time.
The rebuild started at about block group 8010631739392,
and 20 hours later I am at 6910631739392:
btrfs: relocating block group 6910631739392 flags 65

At this rate it will take a week to complete the raid rebuild!!!

Furthermore, it seems that the operation is getting slower and slower.
When the rebuild started I had a new message every half a minute; now it's
getting to one and a half minutes between messages.
Most files are small files like flac/jpeg.

One week for a raid10 rebuild of 4x 3TB drives is a very long time.
Any thoughts?
Can you share any statistics from your RAID10 rebuilds?

If I shut down the system before the rebuild finishes, what is the proper
procedure to remount it? Again degraded? Or normally? Can the process of
rebuilding the raid continue after a reboot? Will it survive and continue
rebuilding?

Thanks in advance
TM


block rsv returned -28

2013-01-06 Thread TM
Hi all,
In a newly created btrfs filesystem, after just two days and ~4K dirs / 40K
files, performance has degraded very badly.
I did a chmod/chown over all the files once (which might have implications
for the filesystem), but this is casual/expected use.

The only notable thing in dmesg is the following, repeated 9 times:

btrfs: block rsv returned -28
------------[ cut here ]------------
WARNING: at fs/btrfs/extent-tree.c:6297 use_block_rsv+0x192/0x1a0 [btrfs]()
Hardware name: ProLiant MicroServer
Modules linked in: bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt
vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) 8021q vboxdrv(OF) garp stp llc sunrpc
ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables ipv6 btrfs zlib_deflate libcrc32c kvm_amd kvm
microcode pcspkr k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 tg3 sg
shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci libahci radeon ttm
drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
[last unloaded: nf_defrag_ipv4]
Pid: 1859, comm: btrfs-transacti Tainted: GF   W  O 3.7.1 #1
Call Trace:
 [8105556f] warn_slowpath_common+0x7f/0xc0
 [810555ca] warn_slowpath_null+0x1a/0x20
 [a0406e02] use_block_rsv+0x192/0x1a0 [btrfs]
 [a040bb9d] btrfs_alloc_free_block+0x3d/0x210 [btrfs]
 [a0432ab1] ? read_extent_buffer+0xd1/0x130 [btrfs]
 [a03f6eb0] __btrfs_cow_block+0x130/0x560 [btrfs]
 [a03f7942] btrfs_cow_block+0x102/0x210 [btrfs]
 [a03fad11] btrfs_search_slot+0x391/0x810 [btrfs]
 [a04577f7] __btrfs_write_out_cache+0x757/0x960 [btrfs]
 [a045b3ae] ? btrfs_find_ref_cluster+0x5e/0x160 [btrfs]
 [a0457b72] btrfs_write_out_cache+0xb2/0xf0 [btrfs]
 [a0409fe8] btrfs_write_dirty_block_groups+0x238/0x270 [btrfs]
 [a041a2c1] commit_cowonly_roots+0x171/0x250 [btrfs]
 [a041b120] btrfs_commit_transaction+0x570/0xa20 [btrfs]
 [a041ba84] ? start_transaction+0x94/0x430 [btrfs]
 [81079cb0] ? wake_up_bit+0x40/0x40
 [a0415d26] transaction_kthread+0x1a6/0x220 [btrfs]
 [a0415b80] ? btree_readpage_end_io_hook+0x290/0x290 [btrfs]
 [a0415b80] ? btree_readpage_end_io_hook+0x290/0x290 [btrfs]
 [8107942e] kthread+0xce/0xe0
 [81079360] ? kthread_freezable_should_stop+0x70/0x70
 [8153e6ac] ret_from_fork+0x7c/0xb0
 [81079360] ? kthread_freezable_should_stop+0x70/0x70
---[ end trace 592323d6a331318d ]---



Kernel: 
Linux microserver 3.7.1 #1 SMP Sun Dec 30 21:34:59 EET 2012 x86_64 x86_64 x86_64
GNU/Linux
Tools:
Btrfs v0.20-rc1-37-g91d9eec

# btrfs fi show
Label: 'tm_0'  uuid: f2866a33-fe53-4fc0-98bf-52d347b43824
Total devices 1 FS bytes used 712.21GB
devid 1 size 1.82TB used 714.04GB path /dev/sdb1


# btrfs fi df /mnt/dls
Data: total=712.01GB, used=711.24GB
System, DUP: total=8.00MB, used=80.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=998.53MB
Metadata: total=8.00MB, used=0.00


Performance has been horrible for the last 12 hours at least.
I tried remounting with noatime: nothing changed.
I tried rebooting and stopping any services like smb etc.
I tried defragmenting: nothing changed.

Metadata seems full.
Darksatanic suggested "That's ENOSPC".
(Thanks to the guys on the IRC channel for their support :)
But how can I regain space for Metadata?
Any suggestions?
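
One thing I am considering trying (I am not sure it will help): a filtered
balance that compacts nearly-empty data chunks, so the freed space can be
reallocated for metadata, e.g.:

btrfs balance start -dusage=5 /mnt/dls
btrfs fi df /mnt/dls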

TM


Re: block rsv returned -28

2013-01-06 Thread TM
TM tmjuju at yahoo.com writes:

# btrfs fi df /mnt/dls
Data: total=712.01GB, used=711.24GB
System, DUP: total=8.00MB, used=80.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=998.53MB
Metadata: total=8.00MB, used=0.00

btrfs fi show
Label: 'tm_0'  uuid: f2866a33-fe53-4fc0-98bf-52d347b43824
Total devices 1 FS bytes used 712.21GB
devid 1 size 1.82TB used 714.04GB path /dev/sdb1

...

[root@microserver mnt]# btrfs balance start -dusage=5 /mnt/dls/
Done, had to relocate 0 out of 716 chunks
[root@microserver mnt]# btrfs balance start -dusage=10 /mnt/dls/
Done, had to relocate 0 out of 716 chunks
[root@microserver mnt]# btrfs balance start -dusage=15 /mnt/dls/
Done, had to relocate 0 out of 716 chunks
[root@microserver mnt]# btrfs balance start -dusage=25 /mnt/dls/
Done, had to relocate 0 out of 716 chunks
[root@microserver mnt]# btrfs balance start -dusage=35 /mnt/dls/
Done, had to relocate 1 out of 716 chunks

...
Meanwhile the situation got worse
...


[root@microserver ~]# btrfs fi df /mnt/dls/
Data: total=713.01GB, used=712.23GB
System, DUP: total=8.00MB, used=80.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=999.31MB
Metadata: total=8.00MB, used=0.00


and I kept getting more messages:
dmesg | grep "block rsv returned -28" | wc -l
60

(I had started with only 9 messages just a few hours ago)

So I decided to remove files.
I started to remove (rm -rfv) thousands of files and directories.
At first the rate was about 1 file deletion per second...
I kept monitoring usage while deleting,
until I got down to:

# btrfs fi df /mnt/dls/
Data: total=712.01GB, used=638.04GB
System, DUP: total=8.00MB, used=80.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=895.24MB
Metadata: total=8.00MB, used=0.00

And then the filesystem became responsive again,
back to the ~100 MB/s range from ~1 MB/s.
