Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q
On 2016-11-26 19:54, Zygo Blaxell wrote:
> On Sat, Nov 26, 2016 at 02:12:56PM +0100, Goffredo Baroncelli wrote:
>> On 2016-11-25 05:31, Zygo Blaxell wrote:
[...]
>> BTW Btrfs in RAID1 mode corrects the data even in the read case. So
>
> Have you tested this?  I think you'll find that it doesn't.

Yes, I tested it, and it does the rebuild automatically. I corrupted one
disk of the mirror, then read the related file. The log says:

[   59.287748] BTRFS warning (device vdb): csum failed ino 257 off 0 csum 12813760 expected csum 3114703128
[   59.291542] BTRFS warning (device vdb): csum failed ino 257 off 0 csum 12813760 expected csum 3114703128
[   59.294950] BTRFS info (device vdb): read error corrected: ino 257 off 0 (dev /dev/vdb sector 2154496)

IIRC, in the RAID5/6 case the last line ("read error corrected") is
missing. In both cases the data returned is good, but in RAID1 the data
is also corrected on the disk. Where did you read that the data is not
rebuilt automatically? In fact, I was surprised that RAID5/6 behaves
differently.

>> I am still convinced that is the RAID5/6 behavior "strange".
>>
>> BR
>> G.Baroncelli
>> --
>> gpg @keyserver.linux.it: Goffredo Baroncelli
>> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5

--
gpg @keyserver.linux.it: Goffredo Baroncelli
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
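[Editor's note: the "csum failed ... expected csum ..." lines above are btrfs data-checksum mismatches; btrfs data blocks at this time were checksummed with CRC-32C (Castagnoli). As an illustration only — this is not btrfs code, and the kernel's exact seeding/finalization may differ — a minimal pure-Python sketch of the standard CRC-32C algorithm:]

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """CRC-32C (Castagnoli), bitwise reflected form.

    Reflected polynomial 0x82F63B78; initial value and final XOR are
    both 0xFFFFFFFF, per the standard CRC-32C definition.
    """
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# The well-known CRC-32C check value for the ASCII string "123456789":
print(hex(crc32c(b"123456789")))  # -> 0xe3069283
```

A mismatch between the stored and recomputed value of such a function is exactly what triggers the "csum failed" warnings and, on redundant profiles, the repair path.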
Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
I haven't seen this with 4.7.10. I suggest running 'btrfs check'
(without --repair) using a recent btrfs-progs. You can find 4.8.3 in
koji; just download the appropriate rpm and run 'dnf update *.rpm'.

As for the kernel, Fedora 23 has 4.8.8 in updates (stable) and 4.8.10
in updates-testing; I suggest moving to one of those.

Chris Murphy
Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q
On Sat, Nov 26, 2016 at 02:12:56PM +0100, Goffredo Baroncelli wrote:
> On 2016-11-25 05:31, Zygo Blaxell wrote:
> >>> Do you mean, read the corrupted data won't repair it?
> >>>
> >>> IIRC that's the designed behavior.
> >> :O
> >>
> >> You are right... I was unaware of that
> > This is correct.
> >
> > Ordinary reads shouldn't touch corrupt data, they should only read
> > around it.  Scrubs in read-write mode should write corrected data over
> > the corrupt data.  Read-only scrubs can only report errors without
> > correcting them.
> >
> > Rewriting corrupt data outside of scrub (i.e. on every read) is a
> > bad idea.  Consider what happens if a RAM controller gets too hot:
> > checksums start failing randomly, but the data on disk is still OK.
> > If we tried to fix the bad data on every read, we'd probably just trash
> > the filesystem in some cases.
>
> I cant agree. If the filesystem is mounted read-only this behavior may
> be correct; bur in others cases I don't see any reason to not correct
> wrong data even in the read case. If your ram is unreliable you have
> big problem anyway.

If you don't like RAM corruption, pick any other failure mode. Laptops
have to deal with things like vibration and temperature extremes which
produce the same results (spurious csum failures and IO errors under
conditions where writing will only destroy data that would otherwise be
recoverable).

> The likelihood that the data contained in a disk is "corrupted" is
> higher than the likelihood that the RAM is bad.
>
> BTW Btrfs in RAID1 mode corrects the data even in the read case. So

Have you tested this?  I think you'll find that it doesn't.

> I am still convinced that is the RAID5/6 behavior "strange".
>
> BR
> G.Baroncelli
> --
> gpg @keyserver.linux.it: Goffredo Baroncelli
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
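[Editor's note: for readers unfamiliar with the P/Q parity the patch subject refers to — P is simply the byte-wise XOR of the data stripes, so any single lost data block can be rebuilt from P plus the surviving blocks. Q, the second RAID6 parity, uses GF(2^8) arithmetic and is omitted here. A toy sketch of the arithmetic, not btrfs code:]

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-sized blocks byte by byte (RAID5/6 'P' parity)."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"   # data stripes
p = xor_blocks(d0, d1, d2)               # parity written to the P stripe

# Lose d1 (one device fails): XOR the survivors with P to recover it.
rebuilt_d1 = xor_blocks(d0, d2, p)
assert rebuilt_d1 == d1
```

This is also why computing P from stale or wrongly "stolen" pages, as in the bug the patch fixes, silently poisons later reconstruction: parity is only as good as the data blocks it was computed from.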
Re: reproducable oops in btrfs/130 with latests mainline
On 11/25/2016 03:07 AM, Christoph Hellwig wrote:
> Any chance to get someone look at this or the next bug report?

I've been trying to reproduce, but haven't yet. This test does hit the
CPU hard; which PREEMPT setting are you using?

-chris

On Mon, Nov 14, 2016 at 04:35:29AM -0800, Christoph Hellwig wrote:
> btrfs/130
>
> [  384.645337] run fstests btrfs/130 at 2016-11-14 12:33:26
> [  384.827333] BTRFS: device fsid bf118b00-e2e0-4a96-a177-765789170093 devid 1 transid 3 /dev/vdc
> [  384.851643] BTRFS info (device vdc): disk space caching is enabled
> [  384.852113] BTRFS info (device vdc): flagging fs with big metadata feature
> [  384.857043] BTRFS info (device vdc): creating UUID tree
> [  384.988347] BTRFS: device fsid 3b92b8c1-295d-4099-8623-d71a3cb270f8 devid 1 transid 3 /dev/vdc
> [  385.001946] BTRFS info (device vdc): disk space caching is enabled
> [  385.002846] BTRFS info (device vdc): flagging fs with big metadata feature
> [  385.008870] BTRFS info (device vdc): creating UUID tree
> [  416.318581] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [btrfs:12782]
> [  416.319139] Modules linked in:
> [  416.319366] CPU: 3 PID: 12782 Comm: btrfs Not tainted 4.9.0-rc1 #826
> [  416.319789] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [  416.320466] task: 8801355a4140 task.stack: c90a4000
> [  416.320864] RIP: 0010:[] [] find_parent_nodes+0xb7d/0x1530
> [  416.321455] RSP: 0018:c90a79b0 EFLAGS: 0286
> [  416.321811] RAX: 88012de45640 RBX: RCX: c90a7a28
> [  416.322285] RDX: 88012de45660 RSI: 01ca8000 RDI: 88013b803e40
> [  416.322759] RBP: c90a7ab0 R08: 02400040 R09: 88010077a478
> [  416.323317] R10: 880127652f70 R11: 880127652f08 R12: 8800
> [  416.323791] R13: 6db6db6db6db6db7 R14: 8801295093b0 R15:
> [  416.324262] FS: 7f83ef8398c0() GS:88013fd8() knlGS:
> [  416.324795] CS: 0010 DS: ES: CR0: 80050033
> [  416.325176] CR2: 7f83ee4dbe38 CR3: 000136b56000 CR4: 06e0
> [  416.325649] DR0: DR1: DR2:
> [  416.326120] DR3: DR6: fffe0ff0 DR7: 0400
> [  416.326590] Stack:
> [  416.326730]  880102400040 88012a34 1063
> [  416.327257]  00010001 88012dd0e800 88012a34 0001
> [  416.327780]  88012a34 88013b803e40 00c4
> [  416.328304] Call Trace:
> [  416.328475]  [] ? changed_cb+0xb70/0xb70
> [  416.328841]  [] iterate_extent_inodes+0xe7/0x270
> [  416.329251]  [] ? release_extent_buffer+0x26/0xc0
> [  416.329657]  [] ? free_extent_buffer+0x46/0x80
> [  416.330068]  [] process_extent+0x69f/0xb00
> [  416.330452]  [] changed_cb+0x2cb/0xb70
> [  416.330811]  [] ? read_extent_buffer+0xe2/0x140
> [  416.331380]  [] ? btrfs_search_slot_for_read+0xc2/0x1b0
> [  416.331905]  [] btrfs_ioctl_send+0x1187/0x12c0
> [  416.332309]  [] ? kmem_cache_alloc+0x8a/0x160
> [  416.332704]  [] btrfs_ioctl+0x7dc/0x21f0
> [  416.333071]  [] ? flat_send_IPI_mask+0xc/0x10
> [  416.333465]  [] ? default_send_IPI_single+0x2d/0x30
> [  416.333893]  [] ? native_smp_send_reschedule+0x27/0x40
> [  416.334340]  [] ? resched_curr+0xad/0xb0
> [  416.334706]  [] do_vfs_ioctl+0x8b/0x5b0
> [  416.335065]  [] ? _do_fork+0x132/0x390
> [  416.335423]  [] SyS_ioctl+0x3c/0x70
> [  416.335763]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
---end quoted text---
Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q
On 11/22/2016 07:26 PM, Qu Wenruo wrote:
>> We're changing which pages we kmap() but not which ones we kunmap().
>> Can you please update the kunmap loop to use this pointers array?
>> Also it looks like this kmap is never unmapped.
>
> Oh I forget that.
> I'll update it soon. Thanks!
>
> This reminds me, is there any kernel debug option to trace such
> unmapped pages?

I don't think so, which is surprising. It explodes so quickly on 32 bit
machines that it's easiest to boot it on a 32 bit qemu.

-chris
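[Editor's note: the bug class being discussed — unmapping a different set of pages than was mapped, or forgetting an unmap entirely — can be modeled outside the kernel. This is a toy user-space analogue with an invented `Page` class, not the kernel kmap/kunmap API; it shows both the fix (walk the same pointers array the map loop filled) and the kind of refcount bookkeeping a debug option for leaked mappings would check:]

```python
class Page:
    """Toy stand-in for a struct page with a kmap()-style mapping count."""
    def __init__(self) -> None:
        self.map_count = 0

    def kmap(self) -> "Page":
        self.map_count += 1
        return self                      # pretend this is the mapped address

    def kunmap(self) -> None:
        assert self.map_count > 0, "kunmap without matching kmap"
        self.map_count -= 1

pages = [Page() for _ in range(4)]

# Map loop: remember exactly which pages were mapped.
pointers = [p.kmap() for p in pages]

# Correct unmap loop: walk the SAME pointers array, not a recomputed set,
# so every kmap has exactly one matching kunmap.
for ptr in pointers:
    ptr.kunmap()

# The "debug check": no page may be left with an outstanding mapping.
assert all(p.map_count == 0 for p in pages)
```

On 32-bit kernels the real kmap address space is tiny, which is why (as noted above) a leaked mapping exhausts it almost immediately there while going unnoticed on 64-bit.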
Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
from /var/log/messages

Nov 20 20:04:29 exnetold kernel: Modules linked in: fuse bridge ebtable_filter ebtables tun 8021q bnx2fc cnic uio garp mrp fcoe stp llc libfcoe libfc scsi_transport_fc nvidia_modeset(POE) nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_multiport ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_CHECKSUM xt_conntrack iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables hwmon_vid btrfs snd_hda_codec_analog xor snd_hda_codec_generic raid6_pq snd_hda_codec_hdmi powernow_k8 snd_hda_intel kvm_amd snd_hda_codec nvidia(POE) kvm snd_hda_core snd_hwdep ppdev irqbypass snd_seq snd_seq_device snd_pcm amd64_edac_mod edac_mce_amd edac_core snd_timer snd drm parport_pc k8temp soundcore shpchp parport asus_atk0110 i2c_nforce2
Nov 20 20:04:29 exnetold kernel: acpi_cpufreq tpm_tis tpm vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) binfmt_misc ata_generic pata_acpi serio_raw sata_nv forcedeth fjes analog gameport joydev i2c_dev
Nov 20 20:04:29 exnetold kernel: CPU: 1 PID: 16330 Comm: kworker/u4:0 Tainted: P OE 4.4.6-301.fc23.x86_64 #1
Nov 20 20:04:29 exnetold kernel: Hardware name: System manufacturer System Product Name/M2N, BIOS 0902 02/16/2009
Nov 20 20:04:29 exnetold kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
Nov 20 20:04:29 exnetold kernel:  0286 61a6ce66 88008c797ae0 813b542e
Nov 20 20:04:29 exnetold kernel:  a0c944d2 88008c797b18 810a40f2
Nov 20 20:04:29 exnetold kernel:  0006e7a8 fffe 88004edc
Nov 20 20:04:29 exnetold kernel: Call Trace:
Nov 20 20:04:29 exnetold kernel:  [] dump_stack+0x63/0x85
Nov 20 20:04:29 exnetold kernel:  [] warn_slowpath_common+0x82/0xc0
Nov 20 20:04:29 exnetold kernel:  [] warn_slowpath_null+0x1a/0x20
Nov 20 20:04:29 exnetold kernel:  [] __btrfs_free_extent.isra.69+0x898/0xd30 [btrfs]
Nov 20 20:04:29 exnetold kernel:  [] ? btrfs_free_path+0x26/0x30 [btrfs]
Nov 20 20:04:29 exnetold kernel:  [] ? __btrfs_inc_extent_ref.isra.53+0x11/0x270 [btrfs]
Nov 20 20:04:29 exnetold kernel:  [] __btrfs_run_delayed_refs+0xa7c/0x11a0 [btrfs]
Nov 20 20:04:29 exnetold kernel:  [] btrfs_run_delayed_refs+0x82/0x2a0 [btrfs]
Nov 20 20:04:29 exnetold kernel:  [] delayed_ref_async_start+0x37/0x90 [btrfs]
Nov 20 20:04:29 exnetold kernel:  [] btrfs_scrubparity_helper+0xc2/0x2e0 [btrfs]
Nov 20 20:04:29 exnetold kernel:  [] ? pwq_activate_delayed_work+0x3d/0x90
Nov 20 20:04:29 exnetold kernel:  [] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
Nov 20 20:04:29 exnetold kernel:  [] process_one_work+0x156/0x430
Nov 20 20:04:29 exnetold kernel:  [] worker_thread+0x4e/0x450
Nov 20 20:04:29 exnetold kernel:  [] ? process_one_work+0x430/0x430
Nov 20 20:04:29 exnetold kernel:  [] kthread+0xd8/0xf0
Nov 20 20:04:29 exnetold kernel:  [] ? kthread_worker_fn+0x160/0x160
Nov 20 20:04:29 exnetold kernel:  [] ret_from_fork+0x3f/0x70
Nov 20 20:04:29 exnetold kernel:  [] ? kthread_worker_fn+0x160/0x160
Nov 20 20:04:29 exnetold kernel: ---[ end trace 4a87ca727985e15e ]---
Nov 20 20:04:29 exnetold kernel: BTRFS info (device sda1): leaf 95421874176 total ptrs 209 free space 3459
Nov 20 20:04:29 exnetold kernel:     item 0 key (29654663168 169 0) itemoff 16250 itemsize 33
Nov 20 20:04:29 exnetold kernel:         extent refs 1 gen 400 flags 258 8717 itemsize 33

.. long text ..

Nov 20 20:04:30 exnetold kernel:         extent refs 1 gen 473 flags 258
Nov 20 20:04:30 exnetold kernel:         shared block backref parent 29657972736
Nov 20 20:04:30 exnetold kernel:     item 208 key (29658071040 169 0) itemoff 8684 itemsize 33
Nov 20 20:04:30 exnetold kernel:         extent refs 1 gen 473 flags 258
Nov 20 20:04:30 exnetold kernel:         shared block backref parent 29657972736
Nov 20 20:04:30 exnetold kernel: BTRFS error (device sda1): unable to find ref byte nr 29656350720 parent 0 root 448 owner 1 offset 0
Nov 20 20:04:30 exnetold kernel: ------------[ cut here ]------------
Nov 20 20:04:30 exnetold kernel: WARNING: CPU: 1 PID: 16330 at fs/btrfs/extent-tree.c:6549 __btrfs_free_extent.isra.69+0x8ff/0xd30 [btrfs]()
Nov 20 20:04:30 exnetold kernel: Modules linked in: fuse bridge ebtable_filter ebtables tun 8021q bnx2fc cnic uio garp mrp fcoe stp llc libfcoe libfc scsi_transport_fc nvidia_modeset(POE) nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_multiport ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_CHECKSUM xt_conntrack iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables hwmon_vid btrfs snd_hda_codec_analog xor snd_hda_codec_generic raid6_pq snd_hda_codec_hdmi powernow_k8 snd_hda_intel kvm_amd snd_hda_codec nvidia(POE) kvm snd_hda_core snd_hwdep ppdev irqbypass snd_seq snd_seq_device snd_pcm amd64_edac_mod
Re: btrfs: still lockdep splat for 4.9-rc5+ (btrfs_log_inode)
On Fri, Nov 25, 2016 at 10:03:25AM +0100, Christian Borntraeger wrote:
> FWIW, I still see the lockdep splat in btrfs in 4.9-rc5+

Filipe reworked the code to avoid taking the same lock twice. As far as
I can tell, this just needs some annotation.

-chris
Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q
On 2016-11-25 05:31, Zygo Blaxell wrote:
>>> Do you mean, read the corrupted data won't repair it?
>>>
>>> IIRC that's the designed behavior.
>> :O
>>
>> You are right... I was unaware of that
> This is correct.
>
> Ordinary reads shouldn't touch corrupt data, they should only read
> around it.  Scrubs in read-write mode should write corrected data over
> the corrupt data.  Read-only scrubs can only report errors without
> correcting them.
>
> Rewriting corrupt data outside of scrub (i.e. on every read) is a
> bad idea.  Consider what happens if a RAM controller gets too hot:
> checksums start failing randomly, but the data on disk is still OK.
> If we tried to fix the bad data on every read, we'd probably just trash
> the filesystem in some cases.

I can't agree. If the filesystem is mounted read-only this behavior may
be correct, but in other cases I don't see any reason not to correct
wrong data even in the read case. If your RAM is unreliable you have
big problems anyway.

The likelihood that the data contained on a disk is corrupted is higher
than the likelihood that the RAM is bad.

BTW, Btrfs in RAID1 mode corrects the data even in the read case. So I
am still convinced that it is the RAID5/6 behavior that is "strange".

BR
G.Baroncelli
--
gpg @keyserver.linux.it: Goffredo Baroncelli
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: mount option nodatacow for VMs on SSD?
On Fri, 25 Nov 2016 09:28:40 +0100, Ulli Horlacher wrote:
> I have vmware and virtualbox VMs on btrfs SSD.
>
> I read in
> https://btrfs.wiki.kernel.org/index.php/SysadminGuide#When_To_Make_Subvolumes
>
>   certain types of data (databases, VM images and similar
>   typically big files that are randomly written internally) may require
>   CoW to be disabled for them. So for example such areas could be
>   placed in a subvolume, that is always mounted with the option
>   "nodatacow".
>
> Does this apply to SSDs, too?

As a side note: I don't think you can use "nodatacow" for just one
subvolume while the other subvolumes of the same btrfs are mounted
differently. The wiki is just wrong here: its own list of possible
mount options explicitly says that "nodatacow" does not work per
subvolume, only globally for the whole fs.

--
Regards,
Kai

Replies to list-only preferred.
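[Editor's note: since the per-subvolume mount option is not available, the usual workaround is the NOCOW file attribute: setting `chattr +C` on a directory makes files subsequently created inside it NOCOW, independent of mount options. The path below is an example, not from the thread; note that +C is only effective on files created after the attribute is set (or on empty files):]

```shell
# Example path; adjust to your layout. Set +C before creating the images.
mkdir -p /var/lib/vmimages
chattr +C /var/lib/vmimages   # new files created here inherit NOCOW
lsattr -d /var/lib/vmimages   # the 'C' flag should appear in the output
```

Existing VM images must be copied (not moved) into such a directory to actually lose their CoW behavior, since the attribute cannot retroactively rewrite already-CoW extents.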