Re: Failed Disk RAID10 Problems
Chris,

Thanks for the continued help. I had to put the recovery on hiatus while I waited for new hard drives to be delivered. I never was able to figure out how to replace the failed drive, but I did learn a lot about how Btrfs works. The approach of doing practically all operations with the file system mounted (with special options where needed) was quite a surprise.

In the end, I created a Btrfs RAID5 file system with the newly delivered drives on another system and used rsync to copy from the degraded array. There was a little file system damage, which showed up as "csum failed" errors in the logs for the I/O that was in progress when the original failure occurred. Fortunately, it was all data that could be recovered from other systems, and there wasn't any need to troubleshoot the errors.

Thanks, Justin

On Wed, May 28, 2014 at 3:40 PM, Chris Murphy li...@colorremedies.com wrote:

> I'm going to guess at what I think has happened. You had a 5-device raid10. devid 5 is the failed device, but at the time you added new device devid 6, it was not considered failed by btrfs. [...] So I think you might have to get a 7th device to fix this with btrfs replace start. [...] Or you can just do a backup now while you can mount degraded, and then blow away the btrfs volume and start over. [...]
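A minimal sketch of that migration path, assuming the degraded source still mounts read-only. The device names, label, and mount points are placeholders, and the RAID5 target simply mirrors what Justin describes above (RAID5 was still young in that era, so treat it accordingly):

  # Mount the damaged array degraded and read-only
  mount -o degraded,ro /dev/mapper/SAMSUNG_HD103SI_499431FS734755p1 /var/media

  # Build the replacement filesystem on the new drives (shown on the same host;
  # in the thread it lived on another system, reachable over the network)
  mkfs.btrfs -m raid5 -d raid5 -L media-new /dev/sdx1 /dev/sdy1 /dev/sdz1
  mount /dev/disk/by-label/media-new /mnt/media-new

  # Copy everything, preserving hard links, ACLs and xattrs; files hit by the
  # csum errors mentioned above fail with I/O errors and can be restored
  # from other sources afterwards
  rsync -aHAX /var/media/ /mnt/media-new/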
Failed Disk RAID10 Problems
Hi,

I have a Btrfs RAID 10 (data and metadata) file system that I believe suffered a disk failure. In my attempt to replace the disk, I think that I've made the problem worse and need some help recovering it. I happened to notice a lot of errors in the journal:

  end_request: I/O error, dev dm-11, sector 1549378344
  BTRFS: bdev /dev/mapper/Hitachi_HDS721010KLA330_GTA040PBG71HXF1 errs: wr 759675, rd 539730, flush 23, corrupt 0, gen 0

The file system continued to work for some time, but eventually an NFS client encountered I/O errors. I figured that the device was failing (it was very old). I attached a new drive to the hot-swappable SATA slot on my computer, partitioned it with GPT, and ran partprobe to detect it. Next I attempted to add a new device, which was successful. However, something peculiar happened:

  ~: btrfs fi df /var/media/
  Data, RAID10: total=2.33TiB, used=2.33TiB
  Data, RAID6: total=72.00GiB, used=71.96GiB
  System, RAID10: total=96.00MiB, used=272.00KiB
  Metadata, RAID10: total=4.12GiB, used=2.60GiB

I don't know where that RAID6 allocation came from, but it did not exist over the weekend when I last checked. I attempted to run a balance operation, but this is when the I/O errors became severe, and I cancelled it. Next, I tried to remove the failed device, thinking that Btrfs could rebalance after that. Removing the failed device failed:

  ~: btrfs device delete /dev/dm-11 /var/media
  ERROR: error removing the device '/dev/dm-11' - Device or resource busy

I shut down the system and detached the failed disk. Upon reboot, I cannot mount the file system:

  ~: mount /dev/mapper/SAMSUNG_HD103SI_499431FS734755p1 /var/media
  mount: wrong fs type, bad option, bad superblock on /dev/mapper/SAMSUNG_HD103SI_499431FS734755p1,
         missing codepage or helper program, or other error
         In some cases useful info is found in syslog - try dmesg | tail or so.

  BTRFS: device label media devid 2 transid 44804 /dev/mapper/WDC_WD10EACS-00D6B0_WD-WCAU40229179p1
  BTRFS info (device dm-10): disk space caching is enabled
  BTRFS: failed to read the system array on dm-10
  BTRFS: open_ctree failed

I reattached the failed disk, and I'm still getting the same mount error as above. Here's where the array currently stands:

  Label: 'media'  uuid: 7b7afc82-f77c-44c0-b315-669ebd82f0c5
          Total devices 5 FS bytes used 2.39TiB
          devid 1 size 931.51GiB used 919.41GiB path /dev/mapper/SAMSUNG_HD103SI_499431FS734755p1
          devid 2 size 931.51GiB used 919.41GiB path /dev/mapper/WDC_WD10EACS-00D6B0_WD-WCAU40229179p1
          devid 3 size 1.82TiB used 1.19TiB path /dev/mapper/WDC_WD20EFRX-68AX9N0_WD-WMC1T1268493p1
          devid 4 size 931.51GiB used 920.41GiB path /dev/mapper/WDC_WD10EARS-00Y5B1_WD-WMAV50654875p1
          devid 5 size 931.51GiB used 918.50GiB path /dev/mapper/Hitachi_HDS721010KLA330_GTA040PBG71HXF1
          devid 6 size 1.82TiB used 3.41GiB path /dev/mapper/WDC_WD20EFRX-68AX9N0_WD-WMC300239240p1
  Btrfs v3.12

Devid 6 is the drive that I added earlier. What can I do to recover this file system? I have another spare drive that I can use if it's any help.

Thanks, Justin
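For context, the disk preparation and add steps described above amount to roughly the following. /dev/sdg is a placeholder for the hot-added drive, and btrfs device stats (btrfs-progs v0.20 or newer) shows which member is accumulating errors:

  # Partition the new disk with GPT and let the kernel re-read the table
  parted -s /dev/sdg mklabel gpt mkpart primary 1MiB 100%
  partprobe /dev/sdg

  # Add it to the mounted filesystem
  btrfs device add /dev/sdg1 /var/media

  # Per-device error counters, useful before deciding which drive to remove
  btrfs device stats /var/media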
Re: Failed Disk RAID10 Problems
On May 28, 2014, at 12:19 AM, Justin Brown justin.br...@fandingo.org wrote:

> I have a Btrfs RAID 10 (data and metadata) file system that I believe suffered a disk failure. [...] I attached a new drive to the hot-swappable SATA slot on my computer, partitioned it with GPT, and ran partprobe to detect it. Next I attempted to add a new device, which was successful.

For future reference, it should work to add a device and then use btrfs device delete missing. But I've found btrfs replace start to be more reliable. It does the add, delete and balance in one step.

> ~: mount /dev/mapper/SAMSUNG_HD103SI_499431FS734755p1 /var/media
> mount: wrong fs type, bad option, bad superblock on /dev/mapper/SAMSUNG_HD103SI_499431FS734755p1,
>        missing codepage or helper program, or other error
>        In some cases useful info is found in syslog - try dmesg | tail or so.
> BTRFS: device label media devid 2 transid 44804 /dev/mapper/WDC_WD10EACS-00D6B0_WD-WCAU40229179p1
> BTRFS info (device dm-10): disk space caching is enabled
> BTRFS: failed to read the system array on dm-10
> BTRFS: open_ctree failed

I'd try, in order:

  mount -o degraded,ro
  mount -o recovery,ro
  mount -o degraded,recovery,ro

If any of those works, then update your backup before trying anything else. Whatever command above worked, try it without ro. If a degraded option is needed, that makes me think a btrfs device delete missing won't work, but then I'm also not seeing a missing device in your btrfs fi show either. You definitely need to make sure the device producing the errors is the device that's missing and is the one you're removing.

Chris Murphy
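The suggestion above, written out as commands against the device from the report; the backup path is a placeholder:

  dev=/dev/mapper/SAMSUNG_HD103SI_499431FS734755p1

  # Try progressively more permissive mounts, read-only first
  mount -o degraded,ro "$dev" /var/media \
      || mount -o recovery,ro "$dev" /var/media \
      || mount -o degraded,recovery,ro "$dev" /var/media

  # If one of them succeeds, refresh the backup before any repair attempt...
  rsync -aHAX /var/media/ /path/to/backup/

  # ...then unmount and retry the successful option set without 'ro'
  umount /var/media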
Re: Failed Disk RAID10 Problems
On May 28, 2014, at 1:03 AM, Chris Murphy li...@colorremedies.com wrote:

> For future reference, it should work to add a device and then use btrfs device delete missing.

To be clear: it should work (if not, it's probably a bug).

Chris Murphy
Fwd: Failed Disk RAID10 Problems
Chris,

Thanks for the tip. I was able to mount the file system with the degraded and recovery options. Then, I deleted the faulty drive, leaving me with the following array:

  Label: media  uuid: 7b7afc82-f77c-44c0-b315-669ebd82f0c5
          Total devices 6 FS bytes used 2.40TiB
          devid 1 size 931.51GiB used 919.88GiB path /dev/mapper/SAMSUNG_HD103SI_499431FS734755p1
          devid 2 size 931.51GiB used 919.38GiB path /dev/dm-8
          devid 3 size 1.82TiB used 1.19TiB path /dev/dm-6
          devid 4 size 931.51GiB used 919.88GiB path /dev/dm-5
          devid 5 size 0.00 used 918.38GiB path /dev/dm-11
          devid 6 size 1.82TiB used 3.88GiB path /dev/dm-9

/dev/dm-11 is the failed drive. I take it that size 0 is a good sign. I'm not really sure where to go from here. I tried rebooting the system with the failed drive attached, and Btrfs re-adds it to the array. Should I physically remove the drive now? Is a balance recommended?

Thanks, Justin
Re: Failed Disk RAID10 Problems
On May 28, 2014, at 12:39 PM, Justin Brown justin.br...@fandingo.org wrote:

> Thanks for the tip. I was able to mount the file system with the degraded and recovery options. Then, I deleted the faulty drive, leaving me with the following array:
> [...]
>   devid 5 size 0.00 used 918.38GiB path /dev/dm-11
> [...]
> /dev/dm-11 is the failed drive.

You deleted a faulty drive, and dm-11 is a failed drive. Is there a difference between the faulty drive and the failed drive, or are they the same drive? And which drive is the one you said you successfully added? I don't see how you have a 6-device raid10 with one failed and one added device. You need an even number of good drives to fix this.

> I take it that size 0 is a good sign.

Seems neither good nor bad to me; it's 0 presumably because it's a dead drive and therefore Btrfs isn't getting device information from it.

> I'm not really sure where to go from here. I tried rebooting the system with the failed drive attached, and Btrfs re-adds it to the array. Should I physically remove the drive now? Is a balance recommended?

No, don't do anything else until someone actually understands faulty vs. failed vs. added drives.

Chris Murphy
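Before removing anything, it helps to pin down which physical drive the dm-N name from the kernel errors corresponds to, and whether it is the one accumulating errors. A minimal sketch; the dm-11 name is taken from the report, the underlying /dev/sdX name is a placeholder, and btrfs device stats needs btrfs-progs v0.20 or newer:

  # Map the kernel name dm-11 back to its device-mapper name and the block device tree
  ls -l /dev/mapper/ | grep dm-11
  lsblk -o NAME,KNAME,SIZE,TYPE,MOUNTPOINT

  # Per-device error counters kept by btrfs (write/read/flush/corruption/generation)
  btrfs device stats /var/media

  # SMART health of the suspect disk
  smartctl -a /dev/sdX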
Re: Failed Disk RAID10 Problems
On May 28, 2014, at 12:39 PM, Justin Brown justin.br...@fandingo.org wrote:

> Thanks for the tip. I was able to mount the file system with the degraded and recovery options. Then, I deleted the faulty drive, leaving me with the following array:
> [...]
> /dev/dm-11 is the failed drive. I take it that size 0 is a good sign. I'm not really sure where to go from here. I tried rebooting the system with the failed drive attached, and Btrfs re-adds it to the array. Should I physically remove the drive now? Is a balance recommended?

I'm going to guess at what I think has happened. You had a 5-device raid10. devid 5 is the failed device, but at the time you added new device devid 6, it was not considered failed by btrfs. Your first btrfs fi show does not show size 0 for devid 5. So I think btrfs made you a 6-device raid10 volume. But now devid 5 has failed and shows up as size 0.

The reason you still have to mount degraded is that you have a 6-device raid10 now, and 1 device has failed. And you can't remove the failed device because you've mounted degraded. So it was actually a mistake to add a new device first, but it's an easy mistake to make, because right now btrfs tolerates a lot of error conditions that it probably should give up on and outright fail the device.

So I think you might have to get a 7th device to fix this with btrfs replace start. You can delete devices later once you're not mounted degraded. Or you can just do a backup now while you can mount degraded, and then blow away the btrfs volume and start over.

If you have current backups and are willing to lose data on this volume, you can try the following:

1. Power off, remove the failed drive, boot, and do a normal mount. That probably won't work, but it's worth a shot. If it doesn't work, try mount -o degraded. [That might not work either, in which case stop here; I think you'll need to go with a 7th device and use 'btrfs replace start 5 /dev/newdevice7 /mp'. That will explicitly replace failed device 5 with the new device.]

2. Assuming mount -o degraded works, take a btrfs fi show. There should be a missing device listed. Now try btrfs device delete missing /mp and see what happens. If it at least doesn't complain, it means it's working and might take hours to replicate the data that was on the missing device onto the new one. So I'd leave it alone until iotop or something like that tells you it's not busy anymore.

3. Unmount the file system. Try to mount normally (not degraded).

Chris Murphy
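Condensed into commands, the plan above looks roughly like this; device names and the /mnt mount point are placeholders, and btrfs replace needs kernel 3.8 or newer:

  # 1. With the failed drive pulled, try a normal mount, then degraded
  mount /dev/sdb1 /mnt || mount -o degraded /dev/sdb1 /mnt

  # 2. If the degraded mount worked, look for a "missing" device, then drop it
  btrfs fi show
  btrfs device delete missing /mnt
  iotop -o        # leave it alone until the relocation I/O goes quiet

  # 3. Remount normally
  umount /mnt
  mount /dev/sdb1 /mnt

  # Alternative with a spare 7th drive: replace failed devid 5 in one step
  btrfs replace start 5 /dev/sdh1 /mnt
  btrfs replace status /mnt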
Re: Scrub causes oom after removal of failed disk (linux 3.10)
On Wed, Jul 03, 2013 at 08:35:48PM +0200, Torbjørn wrote:

> I have a btrfs raid10 array consisting of 2TB drives. [...] After this I ran a scrub on the array. The scrub was soon stopped by the oom-killer. After another reboot I started a new scrub. About 3TB into the scrub, over 10 GB of memory was being consumed. The scrub had then fixed roughly 3,000,000 errors. Canceling the scrub and resuming it frees the 10 GB of memory.

Thanks for the report. This looks like the same problem that was fixed by

  https://patchwork.kernel.org/patch/2697501/
  Btrfs: free csums when we're done scrubbing an extent

but I don't see it included in the current for-linus branch. We want this in the 3.10.x stable series, and according to stable tree policy it has to be merged into Linus' tree first.

david
Re: Scrub causes oom after removal of failed disk (linux 3.10)
On 07/08/2013 11:36 PM, David Sterba wrote:

> Thanks for the report. This looks like the same problem that was fixed by
>
>   https://patchwork.kernel.org/patch/2697501/
>   Btrfs: free csums when we're done scrubbing an extent
>
> but I don't see it included in the current for-linus branch. We want this in the 3.10.x stable series, and according to stable tree policy it has to be merged into Linus' tree first.

Ok, thanks

-- Torbjørn
Scrub causes oom after removal of failed disk (linux 3.10)
Hi btrfs devs,

I have a btrfs raid10 array consisting of 2TB drives.

- I added a new drive to the array, then balanced. The balance failed after ~50GB was moved to the new drive. The balance fixed lots of errors according to dmesg.
- The server rebooted.
- The newly added drive was no longer detected as a btrfs disk.
- The array was then mounted -o recovery.
- I ran btrfs dev del missing, and everything seemed to be fine.

After this I ran a scrub on the array. The scrub was soon stopped by the oom-killer. After another reboot I started a new scrub. About 3TB into the scrub, over 10 GB of memory was being consumed. The scrub had then fixed roughly 3,000,000 errors. Canceling the scrub and resuming it frees the 10 GB of memory.

I'm assuming this is not expected behavior. If I can help in any way please let me know.

dmesg from the failed balance:

  [68190.748909] btrfs csum failed ino 1512 extent 1540228509696 csum 2089345036 wanted 864794082 mirror 1
  [68190.809090] BUG: unable to handle kernel paging request at 87fe167a32c0
  [68190.814638] IP: [a0272287] repair_io_failure+0x117/0x230 [btrfs]
  [68190.820709] PGD 0
  [68190.826781] Oops: [#1] SMP
  [68190.833090] Modules linked in: xfs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE ipt_REJECT xt_CHECKSUM sch_prio bridge stp llc dm_crypt xt_state iptable_filter xt_CLASSIFY xt_tcpudp xt_DSCP iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables intel_powerclamp kvm_intel kvm psmouse serio_raw microcode lpc_ich ppdev parport_pc w83627ehf hwmon_vid coretemp nfsd nfs_acl auth_rpcgss nfs lp fscache lockd parport sunrpc btrfs zlib_deflate libcrc32c raid10 raid1 raid0 multipath linear raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx hid_generic usbhid hid ast ttm drm_kms_helper crc32_pclmul ghash_clmulni_intel drm aesni_intel ablk_helper cryptd lrw i2c_algo_bit gf128mul sysimgblt glue_helper sysfillrect aes_x86_64 syscopyarea e1000e mpt2sas ahci ptp libahci scsi_transport_sas pps_core raid_class video
  [68190.926164] CPU: 3 PID: 16472 Comm: btrfs-endio-8 Not tainted 3.10.0+ #11
  [68190.941478] Hardware name: To be filled by O.E.M. To be filled by O.E.M./P8B-X series, BIOS 2107 05/04/2012
  [68190.957876] task: 880125fe1740 ti: 8802cb6ee000 task.ti: 8802cb6ee000
  [68190.974836] RIP: 0010:[a0272287] [a0272287] repair_io_failure+0x117/0x230 [btrfs]
  [68190.992754] RSP: 0018:8802cb6efca8 EFLAGS: 00010287
  [68191.010933] RAX: fffa43a60fe8 RBX: 1000 RCX: 019e1ce5e000
  [68191.029830] RDX: 8803d2d422a0 RSI: 8802cb6efcc0 RDI: 8803dd444be0
  [68191.049014] RBP: 8802cb6efd18 R08: R09:
  [68191.068584] R10: c2d195ff R11: 3fb5 R12: 880416adc000
  [68191.088446] R13: 0929834ae000 R14: c2d19600 R15: 8803db4c5910
  [68191.108648] FS: () GS:88042fcc() knlGS:
  [68191.129491] CS: 0010 DS: ES: CR0: 80050033
  [68191.150587] CR2: 87fe167a32c0 CR3: 01c0c000 CR4: 001427e0
  [68191.172318] DR0: DR1: DR2:
  [68191.194339] DR3: DR6: 0ff0 DR7: 0400
  [68191.216403] Stack:
  [68191.238415] 0006c000 ea0005becf40 2000 8803d2d422a0
  [68191.261527] 8802 8802cb6efcd8 8802cb6efcd8
  [68191.285026] 8802cb6efd18 0006c000 8802137488a0 ea0005becf40
  [68191.308855] Call Trace:
  [68191.332739] [a0272bdf] end_bio_extent_readpage+0x78f/0x7f0 [btrfs]
  [68191.357675] [811a38ad] bio_endio+0x1d/0x30
  [68191.382816] [a024cf41] end_workqueue_fn+0x41/0x50 [btrfs]
  [68191.408455] [a02822d8] worker_loop+0x148/0x520 [btrfs]
  [68191.434422] [816902c7] ? __schedule+0x3d7/0x800
  [68191.460669] [a0282190] ? btrfs_queue_worker+0x320/0x320 [btrfs]
  [68191.487415] [81064410] kthread+0xc0/0xd0
  [68191.514246] [81064350] ? kthread_create_on_node+0x130/0x130
  [68191.541603] [81699f1c] ret_from_fork+0x7c/0xb0
  [68191.569187] [81064350] ? kthread_create_on_node+0x130/0x130
  [68191.597279] Code: a0 e8 4e c1 00 00 85 c0 0f 85 b6 00 00 00 48 8b 55 a8 44 3b 72 2c 0f 85 e8 00 00 00 45 8d 56 ff 4d 63 d2 4b 8d 04 52 48 c1 e0 03 4c 8b 6c 02 38 49 c1 ed 09 4d 89 2f 48 8b 7d a8 4c 8b 64 07 30
  [68191.657235] RIP [a0272287] repair_io_failure+0x117/0x230 [btrfs]
  [68191.687689] RSP 8802cb6efca8
  [68191.718273] CR2: 87fe167a32c0
  [68191.870900] ---[ end trace ad5eb9d56280bbe5 ]---
  [68191.870902] BUG: unable to handle kernel paging request at 87f6dfa7da60
  [68191.870910] IP: [a0272287] repair_io_failure+0x117/0x230 [btrfs]
  [68191.870911] PGD 0
  [68191.870912] Oops: [#2] SMP
  [68191.870992] Modules linked in: xfs
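For reference, the scrub operations mentioned above, including the cancel/resume cycle the reporter used to release the leaked memory on kernels without the fix; the mount point is a placeholder:

  # Kick off a background scrub and check on it
  btrfs scrub start /mnt/array
  btrfs scrub status /mnt/array

  # Workaround observed above: cancelling and resuming frees the accumulated csum memory
  btrfs scrub cancel /mnt/array
  btrfs scrub resume /mnt/array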
failed disk (was: kernel 3.3.4 damages filesystem (?))
Hello Hugo,

you wrote on 07.05.12:

> mkfs.btrfs -m raid1 -d single should give you that.
>
> What's the difference to mkfs.btrfs -m raid1 -d raid0?
>
> - RAID-0 stripes each piece of data across all the disks.
> - single puts data on one disk at a time.
>
> [...] In fact, this is probably a good argument for having the option to put back the old allocator algorithm, which would have ensured that the first disk would fill up completely before it touched the next one...

The current version seems to oscillate from disk to disk. Copying about 160 GiByte shows:

  Label: none  uuid: fd0596c6-d819-42cd-bb4a-420c38d2a60b
          Total devices 2 FS bytes used 155.64GB
          devid 2 size 136.73GB used 114.00GB path /dev/sdl1
          devid 1 size 68.37GB used 45.04GB path /dev/sdk1
  Btrfs v0.19

Watching the amounts showed that both disks are filled nearly simultaneously. That would be more difficult to restore ...

Best regards, Helmut
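To watch the alternating allocation described above while a copy is running, the per-device usage can simply be polled; a small sketch, with the mount point as a placeholder:

  # Re-check the device usage and allocation profiles every minute during the copy
  watch -n 60 "btrfs fi show; btrfs fi df /mnt/Scsi"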
failed disk (was: kernel 3.3.4 damages filesystem (?))
Hello Hugo,

you wrote on 07.05.12:

>> [...] With a file system like ext2/3/4 I can work with several directories which are mounted together, but (as said before) one broken disk doesn't disturb the others.
>
> mkfs.btrfs -m raid1 -d single should give you that.

Just a small bug, perhaps: I created a system with

  mkfs.btrfs -m raid1 -d single /dev/sdl1
  mount /dev/sdl1 /mnt/Scsi
  btrfs device add /dev/sdk1 /mnt/Scsi
  btrfs device add /dev/sdm1 /mnt/Scsi
  (filling with data)

and btrfs fi df /mnt/Scsi now reports:

  Data, RAID0: total=183.18GB, used=76.60GB
  Data: total=80.01GB, used=79.83GB
  System, DUP: total=8.00MB, used=32.00KB
  System: total=4.00MB, used=0.00
  Metadata, DUP: total=1.00GB, used=192.74MB
  Metadata: total=8.00MB, used=0.00

"Data, RAID0" confuses me (not very much ...), and the profile for metadata (RAID-1) is not shown.

Best regards, Helmut
Re: failed disk (was: kernel 3.3.4 damages filesystem (?))
On Wed, May 09, 2012 at 04:25:00PM +0200, Helmut Hullen wrote:

> Just a small bug, perhaps: I created a system with
>
>   mkfs.btrfs -m raid1 -d single /dev/sdl1
>   mount /dev/sdl1 /mnt/Scsi
>   btrfs device add /dev/sdk1 /mnt/Scsi
>   btrfs device add /dev/sdm1 /mnt/Scsi
>   (filling with data)
>
> and btrfs fi df /mnt/Scsi now reports:
>
>   Data, RAID0: total=183.18GB, used=76.60GB
>   Data: total=80.01GB, used=79.83GB
>   System, DUP: total=8.00MB, used=32.00KB
>   System: total=4.00MB, used=0.00
>   Metadata, DUP: total=1.00GB, used=192.74MB
>   Metadata: total=8.00MB, used=0.00
>
> "Data, RAID0" confuses me (not very much ...), and the profile for metadata (RAID-1) is not shown.

DUP is two copies of each block, but it allows the two copies to live on the same device. It's done this way because you started with a single device, and you can't do RAID-1 on one device. The first bit of metadata you write to it should automatically upgrade the DUP chunk to RAID-1.

As to the spurious upgrade of single to RAID-0, I thought Ilya had stopped it doing that. What kernel version are you running?

Out of interest, why did you do the device adds separately, instead of just this?

  # mkfs.btrfs -m raid1 -d single /dev/sdl1 /dev/sdk1 /dev/sdm1

Hugo.
Re: failed disk
Hello Hugo,

you wrote on 09.05.12:

> > Just a small bug, perhaps: I created a system with mkfs.btrfs -m raid1 -d single /dev/sdl1 [...] "Data, RAID0" confuses me (not very much ...), and the profile for metadata (RAID-1) is not shown.
>
> DUP is two copies of each block, but it allows the two copies to live on the same device. It's done this way because you started with a single device, and you can't do RAID-1 on one device. The first bit of metadata you write to it should automatically upgrade the DUP chunk to RAID-1.

Ok. Sounds familiar - have you explained that to me many months ago?

> As to the spurious upgrade of single to RAID-0, I thought Ilya had stopped it doing that. What kernel version are you running?

3.2.9, self made. I could test the message with 3.3.4, but not today (if it's only an interpretation of always the same data).

> Out of interest, why did you do the device adds separately, instead of just this?
>
>   # mkfs.btrfs -m raid1 -d single /dev/sdl1 /dev/sdk1 /dev/sdm1

a) Making the first 2 devices: I have tested both versions (one line with 2 devices, or 2 lines with 1 device each); no big difference. But I had also tested the -L (labelling) option, and that makes a mess for the one-liner: both devices get the same label, and then findfs finds none of them. The really safe way would be: drop this option from the mkfs.btrfs command and only use btrfs fi label device [newlabel].

b) Third device: that's my usual test: make a cluster of 2 devices, fill them with data, add a third device, delete the smallest device.

Best regards, Helmut
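On the label point: the label is a property of the filesystem, so every member device reports the same one. A small sketch of how that looks and how to change it afterwards; the label and device names are placeholders, and on btrfs-progs of that era setting a label requires the filesystem to be unmounted:

  mkfs.btrfs -m raid1 -d single -L Scsi /dev/sdl1 /dev/sdk1
  blkid -s LABEL /dev/sdl1 /dev/sdk1    # both report LABEL="Scsi"
  findfs LABEL=Scsi                     # resolves the label to a device, if it resolves at all

  # Show or change the label later via any member device
  btrfs filesystem label /dev/sdl1
  btrfs filesystem label /dev/sdl1 NewLabel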
Re: failed disk
On Wed, May 09, 2012 at 05:14:00PM +0200, Helmut Hullen wrote:

> > DUP is two copies of each block, but it allows the two copies to live on the same device. [...] The first bit of metadata you write to it should automatically upgrade the DUP chunk to RAID-1.
>
> Ok. Sounds familiar - have you explained that to me many months ago?

Probably. I tend to explain this kind of thing a lot to people.

> > As to the spurious upgrade of single to RAID-0, I thought Ilya had stopped it doing that. What kernel version are you running?
>
> 3.2.9, self made.

OK, I'm pretty sure that's too old -- it will upgrade single to RAID-0. You can probably turn it back to single using balance filters:

  # btrfs fi balance -dconvert=single /mountpoint

(You may want to write at least a little data to the FS first -- balance has some slightly odd behaviour on empty filesystems.)

> a) Making the first 2 devices: I have tested both versions (one line with 2 devices, or 2 lines with 1 device each); no big difference. But I had also tested the -L (labelling) option, and that makes a mess for the one-liner: both devices get the same label, and then findfs finds none of them.

Umm... Yes, of course both devices will get the same label -- you're labelling the filesystem, not the devices. (Didn't we have this argument some time ago?) I don't know what findfs is doing that it can't find the filesystem by label; you may need to run sync after mkfs, possibly.

> The really safe way would be: drop this option from the mkfs.btrfs command and only use btrfs fi label device [newlabel].

... except that it'd have to take a filesystem as parameter, not a device (see above).

> b) Third device: that's my usual test: make a cluster of 2 devices, fill them with data, add a third device, delete the smallest device.

What are you testing? And by delete do you mean btrfs dev delete or pull the cable out?

Hugo.
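For completeness, the conversion suggested above in the spelling later btrfs-progs accept ('btrfs fi balance' was the older alias); the mount point is a placeholder and balance filters need kernel 3.3 or newer:

  # Convert the data profile back to 'single' using balance filters
  btrfs balance start -dconvert=single /mnt/Scsi

  # Verify that the RAID0 data chunks are gone
  btrfs fi df /mnt/Scsi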
Re: failed disk (was: kernel 3.3.4 damages filesystem (?))
On Wed, May 09, 2012 at 03:37:35PM +0100, Hugo Mills wrote:

> > Just a small bug, perhaps: I created a system with mkfs.btrfs -m raid1 -d single /dev/sdl1 [...]
>
> DUP is two copies of each block, but it allows the two copies to live on the same device. It's done this way because you started with a single device, and you can't do RAID-1 on one device.

What Hugo said. Newer mkfs.btrfs will error out if you try to do this.

> The first bit of metadata you write to it should automatically upgrade the DUP chunk to RAID-1.

We don't upgrade chunks in place, only during balance.

> As to the spurious upgrade of single to RAID-0, I thought Ilya had stopped it doing that. What kernel version are you running?

I did, but again, we were doing it only as part of balance, not as part of normal operation.

Helmut, do you have any additional data points - the output of btrfs fi df right after you created the FS, or somewhere in the middle of filling it? Also, could you please paste the output of btrfs fi show and tell us what kernel version you are running?

Thanks, Ilya
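A minimal sketch of gathering the data points requested above, plus an explicit profile conversion, since chunks only change profile during a balance; the mount point is a placeholder and balance filters need kernel 3.3 or newer:

  # Requested data points
  uname -r
  btrfs fi show
  btrfs fi df /mnt/Scsi

  # Profiles are not upgraded in place; convert the metadata explicitly if desired
  btrfs balance start -mconvert=raid1 /mnt/Scsi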
Re: failed disk
Hello Hugo,

you wrote on 09.05.12:

> > > As to the spurious upgrade of single to RAID-0, I thought Ilya had stopped it doing that. What kernel version are you running?
> >
> > 3.2.9, self made.
>
> OK, I'm pretty sure that's too old -- it will upgrade single to RAID-0. You can probably turn it back to single using balance filters:
>
>   # btrfs fi balance -dconvert=single /mountpoint
>
> (You may want to write at least a little data to the FS first -- balance has some slightly odd behaviour on empty filesystems.)

Mañana ... the system is still running the balance after the device delete, and that may still need 4 ... 5 hours.

> Umm... Yes, of course both devices will get the same label -- you're labelling the filesystem, not the devices. (Didn't we have this argument some time ago?)

Not with that special case (and that led me to misinterpret the error ...).

> I don't know what findfs is doing that it can't find the filesystem by label; you may need to run sync after mkfs, possibly.

No - findfs works quite simply: if it finds exactly one matching label, it reports the partition. If it finds more or fewer, it reports nothing.

> What are you testing? And by delete do you mean btrfs dev delete or pull the cable out?

First, a pure software delete. Tomorrow I'll reboot the system and look at the results with btrfs fi show. It should report only 2 devices (that's the part which seems to work as described, at least since kernel 3.2).

By the way: it seems to be necessary to run btrfs fi balance ... after btrfs device add ... and after btrfs device delete.

Best regards, Helmut
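A sketch of the add-then-balance sequence mentioned at the end; the device names and mount point are placeholders. Note that btrfs device delete relocates its own chunks as part of the removal, so the extra balance matters mainly after an add:

  # Spread existing chunks over the enlarged array after adding a device
  btrfs device add /dev/sdm1 /mnt/Scsi
  btrfs balance start /mnt/Scsi

  # Removing a device triggers its own relocation; wait for it to finish
  btrfs device delete /dev/sdk1 /mnt/Scsi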
Re: failed disk
Hello Hugo,

you wrote on 09.05.12:

> > and btrfs fi df /mnt/Scsi now reports:
> > [...]
> > "Data, RAID0" confuses me (not very much ...), and the profile for metadata (RAID-1) is not shown.
>
> DUP is two copies of each block, but it allows the two copies to live on the same device. [...] The first bit of metadata you write to it should automatically upgrade the DUP chunk to RAID-1.

It has done so - ok. Adding and removing disks/partitions works as expected.

Best regards, Helmut