On Sun, Mar 16, 2014 at 11:12:43PM -0600, Chris Murphy wrote:
>
> On Mar 16, 2014, at 9:44 PM, Marc MERLIN <m...@merlins.org> wrote:
>
> > On Sun, Mar 16, 2014 at 08:56:35PM -0600, Chris Murphy wrote:
> >
> >>> If I add a device, isn't it going to grow my raid to make it bigger
> >>> instead of trying to replace the bad device?
> >>
> >> Yes if it's successful. No if it fails which is the problem I'm having.
> >
> > That's where I don't follow you.
> > You just agreed that it will grow my raid.
> > So right now it's 4.5TB with 10 drives, if I add one drive, it will grow to
> > 5TB with 11 drives.
> > How does that help?
>
> If you swap the faulty drive for a good drive, I'm thinking then you'll be
> able to device delete the bad device, which ought to be "missing" at that
> point; or if that fails you should be able to do a balance, and then be able
> to device delete the faulty drive.
>
> The problem I'm having is that when I detach one device out of a 3 device
> raid5, btrfs fi show doesn't list it as missing. It's listed without the
> /dev/sdd designation it had when attached, but now it's just blank.

Ok, I tried unmounting and remounting degraded this morning:

polgara:~# mount -v -t btrfs -o compress=zlib,space_cache,noatime,degraded LABEL=backupcopy /mnt/btrfs_backupcopy
Mar 17 08:57:35 polgara kernel: [123824.344085] BTRFS: device label backupcopy devid 9 transid 3837 /dev/mapper/crypt_sdk1
Mar 17 08:57:35 polgara kernel: [123824.454641] BTRFS info (device dm-9): allowing degraded mounts
Mar 17 08:57:35 polgara kernel: [123824.454978] BTRFS info (device dm-9): disk space caching is enabled
Mar 17 08:57:35 polgara kernel: [123824.497437] BTRFS: bdev /dev/mapper/crypt_sde1 errs: wr 3888, rd 321927975, flush 0, corrupt 0, gen 0
/dev/mapper/crypt_sdk1 on /mnt/btrfs_backupcopy type btrfs (rw,noatime,compress=zlib,space_cache,degraded)

What's confusing is that mounting in degraded mode shows all devices:

polgara:~# btrfs fi show
Label: backupcopy  uuid: 7d8e1197-69e4-40d8-8d86-278d275af896
        Total devices 10 FS bytes used 376.27GiB
        devid    1 size 465.76GiB used 42.42GiB path /dev/dm-0
        devid    2 size 465.76GiB used 42.40GiB path /dev/dm-1
        devid    3 size 465.75GiB used 42.40GiB path /dev/mapper/crypt_sde1  << this is missing
        devid    4 size 465.76GiB used 42.40GiB path /dev/dm-3
        devid    5 size 465.76GiB used 42.40GiB path /dev/dm-4
        devid    6 size 465.76GiB used 42.40GiB path /dev/dm-5
        devid    7 size 465.76GiB used 42.40GiB path /dev/dm-6
        devid    8 size 465.76GiB used 42.40GiB path /dev/mapper/crypt_sdj1
        devid    9 size 465.76GiB used 42.40GiB path /dev/mapper/crypt_sdk1
        devid   10 size 465.76GiB used 42.40GiB path /dev/dm-8

Ok, so mount in degraded mode works.

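For comparing the error counters from one mount to the next, a small sketch like the following could pull the numbers out of the "bdev ... errs:" kernel line. The function name parse_bdev_errs is made up for illustration, and the regex only assumes the line format seen in the mount log above; it is not taken from any btrfs tool.

```python
import re

# Assumed format, copied from the kernel line in the mount log:
#   BTRFS: bdev <path> errs: wr N, rd N, flush N, corrupt N, gen N
ERRS_RE = re.compile(
    r"BTRFS: bdev (?P<dev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_bdev_errs(line):
    """Return (device, counters dict) for a 'bdev ... errs:' line, else None."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    # Every named group except the device path is an integer counter.
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters


if __name__ == "__main__":
    sample = (
        "Mar 17 08:57:35 polgara kernel: [123824.497437] BTRFS: bdev "
        "/dev/mapper/crypt_sde1 errs: wr 3888, rd 321927975, flush 0, "
        "corrupt 0, gen 0"
    )
    print(parse_bdev_errs(sample))
```

Feeding it successive syslog lines and diffing the dictionaries would show which counters are still climbing on crypt_sde1.
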
Adding a new device failed though:

polgara:~# btrfs device add /dev/mapper/crypt_sdm1 /mnt/btrfs_backupcopy
BTRFS: bad tree block start 852309604880683448 156237824
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1963 at fs/btrfs/super.c:257 __btrfs_abort_transaction+0x50/0x100()
BTRFS: Transaction aborted (error -5)
Modules linked in: xts gf128mul ipt_MASQUERADE ipt_REJECT xt_tcpudp xt_conntrack xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_userspace cpufreq_powersave cpufreq_conservative cpufreq_stats ppdev rfcomm bnep autofs4 binfmt_misc uinput nfsd auth_rpcgss nfs_acl nfs lockd fscache sunrpc fuse dm_crypt dm_mod configs parport_pc lp parport input_polldev loop firewire_sbp2 firewire_core crc_itu_t ecryptfs btusb bluetooth 6lowpan_iphc rfkill usbkbd usbmouse joydev hid_generic usbhid hid iTCO_wdt iTCO_vendor_support gpio_ich coretemp kvm_intel kvm microcode snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec pcspkr snd_hwdep i2c_i801 snd_pcm_oss snd_mixer_oss lpc_ich snd_pcm snd_seq_midi snd_seq_midi_event sg sr_mod cdrom snd_rawmidi snd_seq snd_seq_device snd_timer atl1 mii mvsas snd nouveau libsas scsi_transport_ soundcore ttm ehci_pci asus_atk0110 floppy uhci_hcd ehci_hcd usbcore acpi_cpufreq usb_common processor evdev
CPU: 0 PID: 1963 Comm: btrfs Tainted: G W 3.14.0-rc5-amd64-i915-preempt-20140216c #1
Hardware name: System manufacturer P5KC/P5KC, BIOS 0502 05/24/2007
 0000000000000000 ffff88004b5c9988 ffffffff816090b3 ffff88004b5c99d0
 ffff88004b5c99c0 ffffffff81050025 ffffffff8120913a 00000000fffffffb
 ffff8800144d5800 ffff88007bd3ba00 ffffffff81839280 ffff88004b5c9a20
Call Trace:
 [<ffffffff816090b3>] dump_stack+0x4e/0x7a
 [<ffffffff81050025>] warn_slowpath_common+0x7f/0x98
 [<ffffffff8120913a>] ? __btrfs_abort_transaction+0x50/0x100
 [<ffffffff8105008a>] warn_slowpath_fmt+0x4c/0x4e
 [<ffffffff8120913a>] __btrfs_abort_transaction+0x50/0x100
 [<ffffffff81216fed>] __btrfs_free_extent+0x6ce/0x712
 [<ffffffff8121bc89>] __btrfs_run_delayed_refs+0x939/0xbdf
 [<ffffffff8121dac8>] btrfs_run_delayed_refs+0x81/0x18f
 [<ffffffff8122aeb2>] btrfs_commit_transaction+0xeb/0x849
 [<ffffffff8124e777>] btrfs_init_new_device+0x9a1/0xc00
 [<ffffffff8114069b>] ? ____cache_alloc+0x1c/0x29b
 [<ffffffff81129d3e>] ? mem_cgroup_end_update_page_stat+0x17/0x26
 [<ffffffff8125570f>] ? btrfs_ioctl+0x989/0x24b1
 [<ffffffff81141096>] ? __kmalloc_track_caller+0x130/0x144
 [<ffffffff8125570f>] ? btrfs_ioctl+0x989/0x24b1
 [<ffffffff81255730>] btrfs_ioctl+0x9aa/0x24b1
 [<ffffffff81611e15>] ? __do_page_fault+0x330/0x3df
 [<ffffffff8116da43>] ? mntput_no_expire+0x33/0x12b
 [<ffffffff81163b16>] do_vfs_ioctl+0x3d2/0x41d
 [<ffffffff8115676b>] ? ____fput+0xe/0x10
 [<ffffffff8106973a>] ? task_work_run+0x87/0x98
 [<ffffffff81163bb8>] SyS_ioctl+0x57/0x82
 [<ffffffff81611ed2>] ? do_page_fault+0xe/0x10
 [<ffffffff816154ad>] system_call_fastpath+0x1a/0x1f
---[ end trace 7d08b9b7f2f17b38 ]---
BTRFS: error (device dm-9) in __btrfs_free_extent:5755: errno=-5 IO failure
BTRFS info (device dm-9): forced readonly
ERROR: error adding the device '/dev/mapper/crypt_sdm1' - Input/output error
polgara:~#
Mar 17 09:07:14 polgara kernel: [124403.240880] BTRFS: error (device dm-9) in btrfs_run_delayed_refs:2713: errno=-5 IO failure

Mmmh, dm-9 is another device, although it seems to work:

polgara:~# dd if=/dev/dm-9 of=/dev/null bs=1M
^C1255+0 records in
1254+0 records out
1314914304 bytes (1.3 GB) copied, 15.169 s, 86.7 MB/s
polgara:~# btrfs device stats /dev/dm-9
[/dev/mapper/crypt_sdk1].write_io_errs 0
[/dev/mapper/crypt_sdk1].read_io_errs 0
[/dev/mapper/crypt_sdk1].flush_io_errs 0
[/dev/mapper/crypt_sdk1].corruption_errs 0
[/dev/mapper/crypt_sdk1].generation_errs 0

I also started getting errors on my device after hours of use last night
(pasted below). Not sure if I really have a 2nd device problem or not:

/dev/mapper/crypt_sde1 is dm-2,

BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
quiet_error: 123 callbacks suppressed
Buffer I/O error on device dm-2, logical block 16
Buffer I/O error on device dm-2, logical block 16384
Buffer I/O error on device dm-2, logical block 67108864
Buffer I/O error on device dm-2, logical block 16
Buffer I/O error on device dm-2, logical block 16384
Buffer I/O error on device dm-2, logical block 67108864
BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
BTRFS: lost page write due to I/O error on /dev/mapper/crypt_sde1
Buffer I/O error on device dm-2, logical block 0
Buffer I/O error on device dm-2, logical block 1
Buffer I/O error on device dm-2, logical block 2
Buffer I/O error on device dm-2, logical block 3
Buffer I/O error on device dm-2, logical block 0
Buffer I/O error on device dm-2, logical block 122095101
Buffer I/O error on device dm-2, logical block 122095101
Buffer I/O error on device dm-2, logical block 0
Buffer I/O error on device dm-2, logical block 0
btrfs_dev_stat_print_on_error: 366 callbacks suppressed
btrfs_dev_stat_print_on_error: 346 callbacks suppressed
btrfs_dev_stat_print_on_error: 606 callbacks suppressed
btrfs_dev_stat_print_on_error: 276 callbacks suppressed
BTRFS: bad tree block start 16817792799093053571 2701656064
BTRFS: bad tree block start 16817792799093053571 2701656064
BTRFS: bad tree block start 16817792799093053571 2701656064
BTRFS: bad tree block start 16817792799093053571 2701656064
BTRFS: bad tree block start 16817792799093053571 2701656064
BTRFS: bad tree block start 16817792799093053571 2701656064
BTRFS: bad tree block start 16817792799093053571 2701656064
BTRFS: bad tree block start 16817792799093053571 2701656064
BTRFS: bad tree block start 16817792799093053571 2701656064
BTRFS: bad tree block start 16817792799093053571 2701656064
btrfs_dev_stat_print_on_error: 11469 callbacks suppressed
btree_readpage_end_io_hook: 31227 callbacks suppressed
BTRFS: bad tree block start 16817792799093053571 2701656064
BTRFS: bad tree block start 16817792799093053571 2701656064
BTRFS: bad tree block start 16817792799093053571 2701656064

eventually it turned into:

BTRFS: bdev /dev/mapper/crypt_sde1 errs: wr 3891, rd 321927996, flush 0, corrupt 0, gen 0
BTRFS: bdev /dev/mapper/crypt_sde1 errs: wr 3891, rd 321927997, flush 0, corrupt 0, gen 0
BTRFS: bad tree block start 17271740454546054736 1265680384
------------[ cut here ]------------
WARNING: CPU: 1 PID: 10414 at fs/btrfs/super.c:257 __btrfs_abort_transaction+0x50/0x100()
BTRFS: Transaction aborted (error -5)
Modules linked in: xts gf128mul ipt_MASQUERADE ipt_REJECT xt_tcpudp xt_conntrack xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_userspace cpufreq_powersave cpufreq_conservative cpufreq_stats ppdev rfcomm bnep autofs4 binfmt_misc uinput nfsd auth_rpcgss nfs_acl nfs lockd fscache sunrpc fuse dm_crypt dm_mod configs parport_pc lp parport input_polldev loop firewire_sbp2 firewire_core crc_itu_t ecryptfs btusb bluetooth 6lowpan_iphc rfkill usbkbd usbmouse joydev hid_generic usbhid hid iTCO_wdt iTCO_vendor_support gpio_ich coretemp kvm_intel kvm microcode snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec pcspkr snd_hwdep i2c_i801 snd_pcm_oss snd_mixer_oss lpc_ich snd_pcm snd_seq_midi snd_seq_midi_event sg sr_mod cdrom snd_rawmidi snd_seq snd_seq_device snd_timer atl1 mii mvsas snd nouveau libsas scsi_transport_ soundcore ttm ehci_pci asus_atk0110 floppy uhci_hcd ehci_hcd usbcore acpi_cpufreq usb_common processor evdev
CPU: 1 PID: 10414 Comm: btrfs-transacti Not tainted 3.14.0-rc5-amd64-i915-preempt-20140216c #1
Hardware name: System manufacturer P5KC/P5KC, BIOS 0502 05/24/2007
 0000000000000000 ffff88004ae4fb30 ffffffff816090b3 ffff88004ae4fb78
 ffff88004ae4fb68 ffffffff81050025 ffffffff8120913a 00000000fffffffb
 ffff88004f2e7800 ffff8800603804c0 ffffffff81839280 ffff88004ae4fbc8
Call Trace:
 [<ffffffff816090b3>] dump_stack+0x4e/0x7a
 [<ffffffff81050025>] warn_slowpath_common+0x7f/0x98
 [<ffffffff8120913a>] ? __btrfs_abort_transaction+0x50/0x100
 [<ffffffff8105008a>] warn_slowpath_fmt+0x4c/0x4e
 [<ffffffff8120913a>] __btrfs_abort_transaction+0x50/0x100
 [<ffffffff81216fed>] __btrfs_free_extent+0x6ce/0x712
 [<ffffffff8121bc89>] __btrfs_run_delayed_refs+0x939/0xbdf
 [<ffffffff8121dac8>] btrfs_run_delayed_refs+0x81/0x18f
 [<ffffffff8122ae40>] btrfs_commit_transaction+0x79/0x849
 [<ffffffff812277ca>] transaction_kthread+0xf8/0x1ab
 [<ffffffff812276d2>] ? btrfs_cleanup_transaction+0x43f/0x43f
 [<ffffffff8106bc56>] kthread+0xae/0xb6
 [<ffffffff8106bba8>] ? __kthread_parkme+0x61/0x61
 [<ffffffff816153fc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106bba8>] ? __kthread_parkme+0x61/0x61
---[ end trace 7d08b9b7f2f17b35 ]---
BTRFS: error (device dm-9) in __btrfs_free_extent:5755: errno=-5 IO failure
BTRFS info (device dm-9): forced readonly
BTRFS: error (device dm-9) in btrfs_run_delayed_refs:2713: errno=-5 IO failure
------------[ cut here ]------------

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/