On Sat, 2013-08-31 at 11:42 -0600, Chris Murphy wrote:
> On Aug 31, 2013, at 4:12 AM, Steven Post <redalert.comman...@gmail.com> wrote:
> >
> > The system is running Debian Wheezy (kernel 3.2.0-4-amd64 #1 SMP Debian
> > 3.2.46-1 x86_64).
> >
> > Is this something known (and possibly resolved in a later version), or
> > should I open a bug report about it?
>
> Try 3.10 or 3.11 before filing a bug on it.

I don't intend to upgrade the first machine at this point, but I'll see
if I can reproduce this on the second machine, which is running Debian
Testing (Jessie) with a 3.10.7 kernel. Hugo Mills suggested using a
kernel from experimental, but I don't feel comfortable running that at
this point, as that would be a 3.11-rc4 kernel. I might consider it once
the 3.11 release becomes available in 'unstable' (I understand that
Linus might release 3.11 this weekend). I might also consider running
the 3.10 kernel from backports on the first machine if that turns out to
be necessary, but we'll see.

> > Could it be that the device removal
> > was completed, but still shows as part of the array for some reason?
>
> Yes. It might take a few minutes after the chunks are reallocated for the
> device to be removed from the volume. I've had some cases where even a reboot
> was needed for the information in fi sh to refresh.

I see, so that might be normal behaviour. However, several hours have
passed now, and there has been a reboot since the first "unable to go
below four drives" error. I did start a balance operation after the
reboot; we'll see what that gives. Once that completes, I intend to try
removing the device again with the 'device delete' command. If that
still gives the error, I'll just remove the drive from the machine and
go from there.

> > The reason for the remove is actually that I want to (gradually) replace
> > the 3TB drives with 1 TB ones, and somewhere in the middle move some of
> > the data of the array, to another machine, that currently has the 1 TB
> > drives which I intend to replace with the 3TB ones.
>
> Use a newer kernel for sure. What you suggest should work. If you're testing
> to see if it does work, and you're prepared for it not working (i.e. totally
> losing the entire file system) and prepared to find a consistent reproducer
> if it doesn't work, then have at it.
>
> Otherwise, create a whole new btrfs volume with recent kernel and btrfs-progs
> on the other machine; and then rsync everything from old to new. Rsync has a
> checksum option, it will take longer, but you can then be reasonably assured
> of file integrity.

The plan was to swap 2 or 3 of the 3TB drives for 1TB drives, then move
data using sftp (scp), and then swap the remaining drives, keeping the
raid10 configuration all the while. The exception was the first swap on
machine 2: I didn't have the capacity to remove a single drive, so I had
to mount degraded.

While I was working on the second machine (3.10.7 kernel), the
filesystem suddenly became read-only during a 'device delete missing'
operation (after I had already added a new 3TB device), with a warning
in /var/log/syslog. I'll add the Call Trace from the log at the end of
this message for reference. After remounting (with -o degraded again) I
ran a balance, which completed successfully. After that, the device
delete command returned immediately and the filesystem seemed alright,
with no sign of data loss or corruption.

As an aside, I'd rather not recreate the arrays if that can be avoided.
On the other hand, we're not talking about a mission critical system. I
wouldn't use btrfs for such a system at this point, but for home use
(with backups) or testing, things seem to be in good shape.

> Chris Murphy

Thanks to all who replied.

Best regards,
Steven

PS: I forgot to mention it in my first mail, but please CC me, I'm not
subscribed to the list. I'll try to check the archives to see if I
missed anything though. I see I missed 1 reply on the list, while 1
reply was sent to me directly, and a third didn't even hit the list
archives (yet?) at spinics.net.

PPS: Sorry if I seem to be rambling on a bit in an unstructured e-mail
message.

/var/log/syslog (3.10-2-amd64 #1 SMP Debian 3.10.7-1 (2013-08-17) x86_64):
[16431.789463] btrfs: relocating block group 1573890818048 flags 65
[16456.635819] btrfs: found 3392 extents
[16459.691201] BTRFS error (device sdb) in btrfs_commit_transaction:1809: errno=-5 IO failure (Error while writing out transaction)
[16459.691207] BTRFS info (device sdb): forced readonly
[16459.691210] BTRFS warning (device sdb): Skipping commit of aborted transaction.
[16459.691212] ------------[ cut here ]------------
[16459.691252] WARNING: at /build/linux-kDQkfE/linux-3.10.7/fs/btrfs/super.c:254 __btrfs_abort_transaction+0x4a/0xbe [btrfs]()
[16459.691253] btrfs: Transaction aborted (error -5)
[16459.691254] Modules linked in: rpcsec_gss_krb5 nfsv4 nfnetlink_queue nfnetlink nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack xt_tcpudp ip6table_filter ip6_tables ebtable_nat ebtables iptable_filter ip_tables xt_iprange xt_state nf_conntrack ipt_REJECT xt_mark xt_NFQUEUE x_tables parport_pc ppdev lp parport bnep rfcomm bluetooth snd_hrtimer pci_stub vboxpci(O) vboxnetadp(O) cpufreq_userspace cpufreq_conservative cpufreq_powersave cpufreq_stats vboxnetflt(O) vboxdrv(O) binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd dns_resolver fscache sunrpc loop fuse joydev adt7475 hwmon_vid snd_hda_codec_realtek snd_hda_intel coretemp snd_hda_codec snd_hwdep kvm_intel snd_pcm_oss snd_mixer_oss kvm snd_pcm snd_page_alloc crc32c_intel snd_seq_midi snd_seq_midi_event ghash_clmulni_intel snd_rawmidi snd_seq eeepc_wmi iTCO_wdt asus_wmi iTCO_vendor_support sparse_keymap rfkill evdev aesni_intel snd_seq_device snd_timer aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper microcode pcspkr snd nouveau psmouse serio_raw i2c_i801 mxm_wmi lpc_ich video mfd_core ttm drm_kms_helper drm mperf i2c_algo_bit i2c_core soundcore wmi mei_me processor button mei thermal_sys ext4 crc16 jbd2 mbcache btrfs xor zlib_deflate raid6_pq crc32c libcrc32c dm_mod hid_generic md_mod usbhid hid sg sd_mod crc_t10dif ata_generic xhci_hcd ehci_pci ehci_hcd pata_via ata_piix ahci libahci usbcore usb_common libata r8169 mii scsi_mod
[16459.691308] CPU: 0 PID: 5381 Comm: btrfs Tainted: G O 3.10-2-amd64 #1 Debian 3.10.7-1
[16459.691309] Hardware name: System manufacturer System Product Name/P8H67, BIOS 1103 08/12/2011
[16459.691311] 0000000000000000 ffffffff8103bb5f ffff8801f75039f0 00000000fffffffb
[16459.691313] ffff8801f7503a40 ffff88005ed243b0 ffffffffa01f5500 ffffffff8103bc0a
[16459.691315] ffffffffa01f7288 0000000000000020 ffff8801f7503a50 ffff8801f7503a10
[16459.691317] Call Trace:
[16459.691323] [<ffffffff8103bb5f>] ? warn_slowpath_common+0x5b/0x70
[16459.691326] [<ffffffff8103bc0a>] ? warn_slowpath_fmt+0x47/0x49
[16459.691334] [<ffffffffa017d657>] ? __btrfs_abort_transaction+0x4a/0xbe [btrfs]
[16459.691344] [<ffffffffa019dbbe>] ? cleanup_transaction+0x84/0x24f [btrfs]
[16459.691347] [<ffffffff81057c67>] ? abort_exclusive_wait+0x79/0x79
[16459.691357] [<ffffffffa019d870>] ? btrfs_commit_transaction+0x866/0x878 [btrfs]
[16459.691359] [<ffffffff81057c67>] ? abort_exclusive_wait+0x79/0x79
[16459.691368] [<ffffffffa019e0ae>] ? start_transaction+0x325/0x448 [btrfs]
[16459.691371] [<ffffffff8105f669>] ? should_resched+0x5/0x23
[16459.691374] [<ffffffff81384167>] ? mutex_lock+0xa/0x27
[16459.691384] [<ffffffffa01d3988>] ? prepare_to_relocate+0xc2/0xd0 [btrfs]
[16459.691395] [<ffffffffa01d7d45>] ? relocate_block_group+0x3d/0x4db [btrfs]
[16459.691404] [<ffffffffa01d8327>] ? btrfs_relocate_block_group+0x144/0x268 [btrfs]
[16459.691415] [<ffffffffa01b9c23>] ? btrfs_relocate_chunk.isra.59+0x50/0x3f6 [btrfs]
[16459.691421] [<ffffffffa017e0eb>] ? btrfs_item_key_to_cpu+0x12/0x30 [btrfs]
[16459.691432] [<ffffffffa01af0fc>] ? btrfs_get_token_64+0x76/0xc6 [btrfs]
[16459.691442] [<ffffffffa01b19a1>] ? release_extent_buffer+0x90/0x97 [btrfs]
[16459.691452] [<ffffffffa01bbea0>] ? btrfs_shrink_device+0x1f8/0x35e [btrfs]
[16459.691462] [<ffffffffa01be84b>] ? btrfs_rm_device+0x2b8/0x690 [btrfs]
[16459.691472] [<ffffffffa01c49ed>] ? btrfs_ioctl+0x8ee/0x197d [btrfs]
[16459.691474] [<ffffffff810dee28>] ? handle_mm_fault+0x1f1/0x238
[16459.691476] [<ffffffff81388c33>] ? __do_page_fault+0x32d/0x3cb
[16459.691479] [<ffffffff81115f74>] ? vfs_ioctl+0x1b/0x25
[16459.691480] [<ffffffff81116795>] ? do_vfs_ioctl+0x3e8/0x42a
[16459.691482] [<ffffffff81116825>] ? SyS_ioctl+0x4e/0x79
[16459.691484] [<ffffffff8138ade9>] ? system_call_fastpath+0x16/0x1b
[16459.691485] ---[ end trace 92cca53f6fe2bc37 ]---
[16459.691487] BTRFS error (device sdb) in cleanup_transaction:1449: errno=-5 IO failure
[16459.691488] delayed_refs has NO entry
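
For anyone who wants to pull the relevant records out of a longer
syslog, here is a minimal Python sketch. It assumes the
"BTRFS error (device X) in func:line: errno=N" layout seen in the
excerpt above; the function name find_btrfs_errors and the regular
expression are purely illustrative and not part of any btrfs tooling.

```python
import errno
import re

# Assumed record layout, taken from the excerpt above:
# [16459.691201] BTRFS error (device sdb) in btrfs_commit_transaction:1809: errno=-5 IO failure
BTRFS_ERROR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BTRFS error \(device (?P<dev>\w+)\) "
    r"in (?P<func>\w+):(?P<line>\d+): errno=(?P<errno>-?\d+)"
)


def find_btrfs_errors(log_text):
    """Return one dict per 'BTRFS error' record found in log_text (hypothetical helper)."""
    errors = []
    for m in BTRFS_ERROR.finditer(log_text):
        code = int(m.group("errno"))
        errors.append({
            "timestamp": float(m.group("ts")),
            "device": m.group("dev"),
            "function": m.group("func"),
            "source_line": int(m.group("line")),
            "errno": code,
            # The kernel reports negative codes; look up the symbolic name by magnitude.
            "name": errno.errorcode.get(abs(code), "unknown"),
        })
    return errors


# Three lines copied from the log above; only the two "error" records should match.
sample = """\
[16459.691201] BTRFS error (device sdb) in btrfs_commit_transaction:1809: errno=-5 IO failure (Error while writing out transaction)
[16459.691207] BTRFS info (device sdb): forced readonly
[16459.691487] BTRFS error (device sdb) in cleanup_transaction:1449: errno=-5 IO failure
"""

for err in find_btrfs_errors(sample):
    print(err["timestamp"], err["device"], err["function"], err["errno"], err["name"])
```

The "info" and "warning" records are skipped on purpose, since only the
error records carry the function name and errno.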