Recovering a 4xhdd RAID10 file system with 2 failed disks
Hi all,

Quick and dirty: 4-disk RAID10 with 2 missing devices; it mounts as degraded,ro, and a read-only scrub ends with no errors.

Recovery options:
A/ If you had at least 3 hdds left, you could replace/add a device.
B/ If you only have 2 hdds left, even if a read-only scrub is OK, you cannot replace/add a device.

So I guess the best option is:
B.1/ Create a new RAID0 filesystem, copy the data over to the new filesystem, add the old drives to the new filesystem, and re-balance it as RAID10.
B.2/ Any other ways to recover that I am missing? Anything easier/faster?

Long story:
A couple of weeks back I had a failed hdd in a 4-disk RAID10 btrfs. I added a new device and removed the failed one, but three days after the recovery I ended up with another 2 failing disks. So I physically removed the 2 failing disks from the drive bays. (I sent one back to Seagate for replacement; the other one I kept and will send later.) (Please note I do have a backup.)

The good thing is that the two drives I have left in this RAID10 seem to hold all the data, and the data seems OK according to a read-only scrub. The remaining 2 disks from the RAID can be mounted with -o degraded,ro. I did a read-only scrub on the filesystem (while mounted with -o degraded,ro) and the scrub ended with no errors. I hope this read-only scrub is 100% validation that I have not lost any files and that all files are OK.

Just today I *tried* to insert a new disk and add it to the RAID10 setup. If I mount the filesystem as degraded,ro, I cannot add a new device (btrfs device add), and I cannot replace a disk (btrfs replace -r start). That is because the filesystem is mounted not only as degraded but also as read-only. But a two-disk RAID10 can only be mounted read-only. This is by design:
gitorious.org/linux-n900/linux-n900/commit/bbb651e469d99f0088e286fdeb54acca7bb4ad4e

But again, a RAID10 system should be recoverable somehow if the data is all there but half of the disks are missing. (I.e. the raid0 stripes are there and only the raid1 part is missing: the striped volume is OK, the mirror copies are missing.) If it was an ordinary RAID10, replacing the two mirror disks at the same time should be acceptable and the RAID should be recoverable.

I myself am lucky, since I still have one of the old failing disks in my hands (the other one is being RMA'd currently). I can insert the old failing disk and mount the filesystem as degraded (but not ro), and then run a btrfs replace or btrfs device add. But if I did not have the old failing disk in my hands, or if the disk was damaged beyond recognition/repair (e.g. not recognized in the BIOS), then, as far as I understand, it is impossible to add/replace drives in a filesystem mounted as read-only.

Am I missing something? Is there a better and faster way to recover a RAID10 when only the striped data is there but not the mirror data?

Thanks in advance,
TM
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
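For what it's worth, option B.1 can be written down as a plan. The sketch below only *prints* the commands (a dry-run, since every step is destructive); all device names and mount points are hypothetical placeholders, and the -dconvert/-mconvert balance filters assume a kernel and btrfs-progs recent enough to support profile conversion:

```shell
#!/bin/sh
# Dry-run sketch of option B.1 -- device names and mount points are
# hypothetical placeholders; the plan is only printed, never executed.
OLD_A=/dev/sda      # one surviving member of the degraded raid10
OLD_B=/dev/sdb      # the other surviving member
NEW_A=/dev/sdc      # first new disk
NEW_B=/dev/sdd      # second new disk

plan() {
    # 1. Build a fresh two-disk filesystem on the new disks.
    echo "mkfs.btrfs -d raid0 -m raid1 $NEW_A $NEW_B"
    # 2. Mount the old pair read-only/degraded and copy everything over.
    echo "mount -o degraded,ro $OLD_A /mnt/old"
    echo "mount $NEW_A /mnt/new"
    echo "rsync -aHAX /mnt/old/ /mnt/new/"
    # 3. Retire the old filesystem, then donate its disks to the new one.
    echo "umount /mnt/old"
    echo "wipefs -a $OLD_A $OLD_B"
    echo "btrfs device add $OLD_A $OLD_B /mnt/new"
    # 4. Convert the now 4-disk filesystem to raid10.
    echo "btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/new"
}
plan
```

The copy step is the slow part; the add/balance at the end is where the data gets restriped and re-mirrored across all four disks.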
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
Wang Shilong wangsl.fnst at cn.fujitsu.com writes:
> The latest btrfs-progs includes a man page for btrfs-replace. Actually, you could use it something like: btrfs replace start srcdev|devid targetdev mnt
> You could use 'btrfs fi show' to see the missing device id, and then run btrfs replace.

Hi Wang,

I physically removed the drive before the rebuild; having a failing device as a source is not a good idea anyway. Without the device in place, the device name does not show up, since the missing device is not under /dev/sdXX or anything else. That is why I asked if the special parameter 'missing' may be used in a replace. I can't say if it is supported, but I guess not, since I found no documentation on this matter.

So I guess replace is not aimed at fault tolerance / rebuilding. It's just a convenient way to, let's say, replace the disks with larger disks to extend your array. A convenience tool, not an emergency tool.

TM

> Thanks,
> Wang
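Wang's suggestion above, spelled out as a sketch. A missing device keeps its numerical devid even though it has no /dev node, so the devid stands in for the source device name. The devid (3), new disk (/dev/sde) and mount point (/mnt) below are made-up placeholders, and the commands are only echoed as a dry-run:

```shell
#!/bin/sh
# Hypothetical values: devid 3 is the missing device, /dev/sde the new disk.
MISSING_DEVID=3
NEWDEV=/dev/sde
MNT=/mnt

plan() {
    # 'btrfs fi show' lists all members; a missing one still shows its devid.
    echo "btrfs fi show $MNT"
    # Replace by devid, since no device node exists for the missing disk.
    echo "btrfs replace start $MISSING_DEVID $NEWDEV $MNT"
    echo "btrfs replace status $MNT"
}
plan
```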
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
Stefan Behrens sbehrens at giantdisaster.de writes:
> TM, just read the man page. You could have used the replace tool after physically removing the failing device. Quoting the man page:
>
> "If the source device is not available anymore, or if the -r option is set, the data is built only using the RAID redundancy mechanisms."
>
> Options:
> -r   only read from srcdev if no other zero-defect mirror exists (enable this if your drive has lots of read errors, the access would be very slow)
>
> Concerning the rebuild performance: the access to the disk is linear for both reading and writing. I measured above 75 MByte/s at that time with regular 7200 RPM disks, which would be less than 10 hours to replace a 3TB disk (in the worst case, if it is completely filled up). Unused/unallocated areas are skipped and additionally improve the rebuild speed.
>
> For missing disks, unfortunately the command invocation does not use the term 'missing' but the numerical device-id instead of the device name. 'missing' _is_ implemented in the kernel part of the replace code, but was simply forgotten in the user-mode part; at least it was forgotten in the man page.

Hi Stefan,

Thank you very much for the comprehensive info; I will opt to use replace next time.

Breaking news :-)

from: Jul 19 14:41:36 microserver kernel: [ 1134.244007] btrfs: relocating block group 8974430633984 flags 68
to:   Jul 22 16:54:54 microserver kernel: [268419.463433] btrfs: relocating block group 2991474081792 flags 65

The rebuild ended before counting all the way down. So flight time was 3 days, and I see no more messages or btrfs processes utilizing CPU, so the rebuild seems done. Just a few hours ago another disk showed some early trouble: accumulating Current_Pending_Sector, but no Reallocated_Sector_Ct yet.

TM
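Putting Stefan's two points together: with the failing disk already pulled, the invocation would use -r plus the devid of the absent device. Everything below is a dry-run with hypothetical values (devid 2, /dev/sdf, /mnt), and the time estimate just replays the 75 MByte/s figure quoted above; skipped unallocated areas are what bring the real number below the flat-rate worst case:

```shell
#!/bin/sh
# Hypothetical: devid 2 is the failing/missing source, /dev/sdf the new disk.
plan() {
    # -r: rebuild from the remaining mirrors, not from the failing source.
    echo "btrfs replace start -r 2 /dev/sdf /mnt"
    echo "btrfs replace status /mnt"
}
plan

# Worst-case rebuild time for a completely full 3TB disk at a flat 75 MByte/s:
secs=$(( 3000000 / 75 ))         # 3,000,000 MB / 75 MB/s = 40000 s
echo "$(( secs / 3600 )) hours"  # about 11 hours before unallocated-area skipping
```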
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
Wang Shilong wangsl.fnst at cn.fujitsu.com writes:
> Just my two cents: since 'btrfs replace' supports RAID10, I suppose using the replace operation is better than 'device removal and add'.
> Another question is related to btrfs snapshot-aware balance: how many snapshots do you have in your system? Of course, during balance/resize/device removal operations you can still snapshot, but fewer snapshots should speed things up!
> Anyway, 'btrfs replace' is implemented more efficiently than 'device removal and add'.

Hi Wang,

Just one subvolume, no snapshots or anything else.

device replace: to tell you the truth, I have not used it in the past. Most of my testing was done 2 years ago, so on this 'kind of production' system I did not try it. But if I had known it was faster, perhaps I would have used it. Does anyone have statistics for such a replace and the time it takes?

Also, can replace be used when one device is missing? I can't find documentation. E.g.:

btrfs replace start missing /dev/sdXX

TM
1 week to rebuild 4x 3TB raid10 is a long time!
Hi,

I have a raid10 with 4x 3TB disks on a microserver (http://n40l.wikia.com/wiki/Base_Hardware_N54L), 8GB RAM.

Recently one disk started to fail (SMART errors), so I replaced it: mounted as degraded, added the new disk, removed the old one. Started yesterday.

I am monitoring /var/log/messages and it seems it will take a long time. It started at about block group 8010631739392, and 20 hours later I am at:

btrfs: relocating block group 6910631739392 flags 65

At this rate it will take a week to complete the raid rebuild!!! Furthermore, it seems that the operation is getting slower and slower: when the rebuild started I had a new message every half a minute, now it's up to one and a half minutes. Most files are small files like flac/jpeg.

One week for a raid10 rebuild of 4x 3TB drives is a very long time. Any thoughts? Can you share any statistics from your RAID10 rebuilds?

If I shut down the system before the rebuild finishes, what is the proper procedure to remount it? Again degraded? Or normally? Can the process of rebuilding the raid continue after a reboot? Will it survive and continue rebuilding?

Thanks in advance,
TM
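The replace-by-add-and-remove procedure described above, as a dry-run sketch. The surviving member (/dev/sda) and the new disk (/dev/sde) are placeholder names; the 'missing' keyword for device delete is the usual idiom when the failed disk has already been physically removed:

```shell
#!/bin/sh
# Dry-run sketch of the add-then-remove rebuild; device names are placeholders.
plan() {
    echo "mount -o degraded /dev/sda /mnt"   # writable degraded mount
    echo "btrfs device add /dev/sde /mnt"    # bring the array back to 4 devices
    echo "btrfs device delete missing /mnt"  # kicks off the (slow) relocation
}
plan
```

The 'device delete missing' step is what produces the stream of "relocating block group" messages: it rewrites every chunk that had a copy on the absent disk.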
block rsv returned -28
Hi all,

On a newly created btrfs filesystem, just two days and ~4K dirs / 40K files in, performance has degraded very badly. I did a chmod/chown over the files once (that might have implications for the filesystem), but this is casual/expected use.

The only notable thing in dmesg is [9 times]:

btrfs: block rsv returned -28
------------[ cut here ]------------
WARNING: at fs/btrfs/extent-tree.c:6297 use_block_rsv+0x192/0x1a0 [btrfs]()
Hardware name: ProLiant MicroServer
Modules linked in: bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) 8021q vboxdrv(OF) garp stp llc sunrpc ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 btrfs zlib_deflate libcrc32c kvm_amd kvm microcode pcspkr k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 tg3 sg shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci libahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Pid: 1859, comm: btrfs-transacti Tainted: GF W O 3.7.1 #1
Call Trace:
[8105556f] warn_slowpath_common+0x7f/0xc0
[810555ca] warn_slowpath_null+0x1a/0x20
[a0406e02] use_block_rsv+0x192/0x1a0 [btrfs]
[a040bb9d] btrfs_alloc_free_block+0x3d/0x210 [btrfs]
[a0432ab1] ? read_extent_buffer+0xd1/0x130 [btrfs]
[a03f6eb0] __btrfs_cow_block+0x130/0x560 [btrfs]
[a03f7942] btrfs_cow_block+0x102/0x210 [btrfs]
[a03fad11] btrfs_search_slot+0x391/0x810 [btrfs]
[a04577f7] __btrfs_write_out_cache+0x757/0x960 [btrfs]
[a045b3ae] ? btrfs_find_ref_cluster+0x5e/0x160 [btrfs]
[a0457b72] btrfs_write_out_cache+0xb2/0xf0 [btrfs]
[a0409fe8] btrfs_write_dirty_block_groups+0x238/0x270 [btrfs]
[a041a2c1] commit_cowonly_roots+0x171/0x250 [btrfs]
[a041b120] btrfs_commit_transaction+0x570/0xa20 [btrfs]
[a041ba84] ? start_transaction+0x94/0x430 [btrfs]
[81079cb0] ? wake_up_bit+0x40/0x40
[a0415d26] transaction_kthread+0x1a6/0x220 [btrfs]
[a0415b80] ? btree_readpage_end_io_hook+0x290/0x290 [btrfs]
[a0415b80] ? btree_readpage_end_io_hook+0x290/0x290 [btrfs]
[8107942e] kthread+0xce/0xe0
[81079360] ? kthread_freezable_should_stop+0x70/0x70
[8153e6ac] ret_from_fork+0x7c/0xb0
[81079360] ? kthread_freezable_should_stop+0x70/0x70
---[ end trace 592323d6a331318d ]---

Kernel: Linux microserver 3.7.1 #1 SMP Sun Dec 30 21:34:59 EET 2012 x86_64 x86_64 x86_64 GNU/Linux
Tools: Btrfs v0.20-rc1-37-g91d9eec

# btrfs fi show
Label: 'tm_0'  uuid: f2866a33-fe53-4fc0-98bf-52d347b43824
        Total devices 1 FS bytes used 712.21GB
        devid 1 size 1.82TB used 714.04GB path /dev/sdb1

# btrfs fi df /mnt/dls
Data: total=712.01GB, used=711.24GB
System, DUP: total=8.00MB, used=80.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=998.53MB
Metadata: total=8.00MB, used=0.00

Performance has been horrible for the last 12 hours at least. I tried remounting with noatime: nothing changed. I tried rebooting and stopping any services like smb etc. I tried defragmenting: nothing changed.

Metadata seems full. Darksatanic suggested that's ENOSPC. (Thanks to the guys at the IRC channel for their support. :)

But how can I regain space on Metadata? Any suggestions?

TM
Re: block rsv returned -28
TM tmjuju at yahoo.com writes:
> # btrfs fi df /mnt/dls
> Data: total=712.01GB, used=711.24GB
> System, DUP: total=8.00MB, used=80.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=1.00GB, used=998.53MB
> Metadata: total=8.00MB, used=0.00
>
> # btrfs fi show
> Label: 'tm_0'  uuid: f2866a33-fe53-4fc0-98bf-52d347b43824
>         Total devices 1 FS bytes used 712.21GB
>         devid 1 size 1.82TB used 714.04GB path /dev/sdb1

...

[root@microserver mnt]# btrfs balance start -dusage=5 /mnt/dls/
Done, had to relocate 0 out of 716 chunks
[root@microserver mnt]# btrfs balance start -dusage=10 /mnt/dls/
Done, had to relocate 0 out of 716 chunks
[root@microserver mnt]# btrfs balance start -dusage=15 /mnt/dls/
Done, had to relocate 0 out of 716 chunks
[root@microserver mnt]# btrfs balance start -dusage=25 /mnt/dls/
Done, had to relocate 0 out of 716 chunks
[root@microserver mnt]# btrfs balance start -dusage=35 /mnt/dls/
Done, had to relocate 1 out of 716 chunks

...

Meanwhile the situation got worse:

[root@microserver ~]# btrfs fi df /mnt/dls/
Data: total=713.01GB, used=712.23GB
System, DUP: total=8.00MB, used=80.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=999.31MB
Metadata: total=8.00MB, used=0.00

and I kept getting more messages:

# dmesg | grep 'block rsv returned -28' | wc -l
60

(I had started with only 9 messages just a few hours ago.)

So I decided to remove files. I started to remove (rm -rfv) thousands of files and directories. At first the rate was about 1 file delete per second... always monitoring usage while deleting. Until I got down to:

# btrfs fi df /mnt/dls/
Data: total=712.01GB, used=638.04GB
System, DUP: total=8.00MB, used=80.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=895.24MB
Metadata: total=8.00MB, used=0.00

And then the filesystem became responsive again: back to the 100MBps range from ~1MBps.
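For the record, the escalating -dusage attempts shown above can be scripted instead of typed one by one. A dry-run sketch (mount point /mnt/dls as in the thread): each pass asks balance to rewrite only data chunks that are at most N% used, which frees whole chunks back to the allocator so metadata can grow again; whether any such chunks exist depends entirely on the filesystem's state:

```shell
#!/bin/sh
# Dry-run: print escalating balance invocations that try to free
# lightly-used data chunks so metadata has room to allocate.
MNT=/mnt/dls
plan() {
    for pct in 5 10 15 25 35; do
        echo "btrfs balance start -dusage=$pct $MNT"
    done
}
plan
```

On a filesystem where data chunks are all nearly full (as the "relocate 0 out of 716 chunks" results above show), only deleting data, as was done here, or adding a device can give metadata room.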