Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-26 Thread Goffredo Baroncelli
On 2016-11-26 19:54, Zygo Blaxell wrote:
> On Sat, Nov 26, 2016 at 02:12:56PM +0100, Goffredo Baroncelli wrote:
>> On 2016-11-25 05:31, Zygo Blaxell wrote:
[...]
>>
>> BTW Btrfs in RAID1 mode corrects the data even in the read case. So
> 
> Have you tested this?  I think you'll find that it doesn't.

Yes, I tested it, and it does the rebuild automatically.
I corrupted one disk of the mirror, then read the related file. The log says:

[   59.287748] BTRFS warning (device vdb): csum failed ino 257 off 0 csum 
12813760 expected csum 3114703128
[   59.291542] BTRFS warning (device vdb): csum failed ino 257 off 0 csum 
12813760 expected csum 3114703128
[   59.294950] BTRFS info (device vdb): read error corrected: ino 257 off 0 
(dev /dev/vdb sector 2154496)
^

IIRC, in the RAID5/6 case the last line is missing. However, in both cases the
data returned is good; but in RAID1 the data is also corrected on disk.

Where did you read that the data is not rebuilt automatically?

In fact, I was surprised that RAID5/6 behaves differently.
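
For anyone who wants to reproduce this kind of test, a rough sketch (loop
devices, sizes, the corruption offset and the mount point are examples, not
my exact setup):

  truncate -s 1G disk1.img disk2.img
  DEV1=$(losetup -f --show disk1.img)
  DEV2=$(losetup -f --show disk2.img)
  mkfs.btrfs -f -d raid1 -m raid1 "$DEV1" "$DEV2"
  mount "$DEV1" /mnt
  dd if=/dev/urandom of=/mnt/testfile bs=1M count=8 conv=fsync
  umount /mnt
  # overwrite a few KiB of ONE copy; a careful test would locate the
  # file's extent first (e.g. with btrfs-map-logical) instead of guessing
  dd if=/dev/zero of="$DEV2" bs=4K seek=100000 count=4 conv=notrunc
  mount "$DEV1" /mnt
  cat /mnt/testfile > /dev/null            # the read triggers the csum check
  dmesg | grep -i 'read error corrected'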

> 
>> I am still convinced that it is the RAID5/6 behavior that is "strange".
>>
>> BR
>> G.Baroncelli
>> -- 
>> gpg @keyserver.linux.it: Goffredo Baroncelli 
>> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
>>


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

2016-11-26 Thread Chris Murphy
I haven't seen this with 4.7.10. I suggest running 'btrfs check'
(without repair) using a recent btrfs-progs. You can find 4.8.3 in
koji; just download the appropriate rpm and run 'dnf update *rpm'.

As for the kernel, Fedora 23 has 4.8.8 in updates (stable) and 4.8.10
in updates-testing; I suggest moving to one of those.
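
Roughly (the device name and exact rpm file below are placeholders):

  # newer btrfs-progs from koji (the real rpm name will differ)
  dnf update ./btrfs-progs-4.8.3-1.fc23.x86_64.rpm
  # with the filesystem unmounted; plain 'check' is read-only, no --repair
  btrfs check /dev/sdXn
  # kernel from updates-testing, if you want 4.8.10 instead of 4.8.8
  dnf --enablerepo=updates-testing update kernel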


Chris Murphy


Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-26 Thread Zygo Blaxell
On Sat, Nov 26, 2016 at 02:12:56PM +0100, Goffredo Baroncelli wrote:
> On 2016-11-25 05:31, Zygo Blaxell wrote:
> >>> Do you mean, read the corrupted data won't repair it?
> >>>
> >>> IIRC that's the designed behavior.
> >> :O
> >>
> >> You are right... I was unaware of that
> > This is correct.
> > 
> > Ordinary reads shouldn't touch corrupt data, they should only read
> > around it.  Scrubs in read-write mode should write corrected data over
> > the corrupt data.  Read-only scrubs can only report errors without
> > correcting them.
> > 
> > Rewriting corrupt data outside of scrub (i.e. on every read) is a
> > bad idea.  Consider what happens if a RAM controller gets too hot:
> > checksums start failing randomly, but the data on disk is still OK.
> > If we tried to fix the bad data on every read, we'd probably just trash
> > the filesystem in some cases.
> 
> 
> 
> I can't agree. If the filesystem is mounted read-only this behavior may
> be correct; but in other cases I don't see any reason not to correct
> wrong data even in the read case. If your RAM is unreliable you have
> big problems anyway.

If you don't like RAM corruption, pick any other failure mode.  Laptops
have to deal with things like vibration and temperature extremes which
produce the same results (spurious csum failures and IO errors under
conditions where writing will only destroy data that would otherwise
be recoverable).
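
For reference, the repair paths described above map onto scrub roughly like
this (the mount point is a placeholder):

  btrfs scrub start /mnt       # read-write scrub: rewrites data that fails csum
  btrfs scrub start -r /mnt    # read-only scrub: only reports errors
  btrfs scrub status /mnt      # corrected vs. uncorrectable error counts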

> The likelihood that the data stored on a disk is "corrupted" is
> higher than the likelihood that the RAM is bad.
>
> BTW Btrfs in RAID1 mode corrects the data even in the read case. So

Have you tested this?  I think you'll find that it doesn't.

> I am still convinced that it is the RAID5/6 behavior that is "strange".
> 
> BR
> G.Baroncelli
> -- 
> gpg @keyserver.linux.it: Goffredo Baroncelli 
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
> 




Re: reproducable oops in btrfs/130 with latests mainline

2016-11-26 Thread Chris Mason

On 11/25/2016 03:07 AM, Christoph Hellwig wrote:

Any chance to get someone to look at this or the next bug report?


I've been trying to reproduce, but haven't yet.  This test does hit the
CPU hard; which PREEMPT setting are you using?
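
For reference, the active preemption model can usually be read back from the
kernel config, e.g. (assuming a distro config in /boot, or CONFIG_IKCONFIG_PROC):

  grep PREEMPT /boot/config-$(uname -r)
  # or, if CONFIG_IKCONFIG_PROC is enabled:
  zcat /proc/config.gz | grep PREEMPT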


-chris



On Mon, Nov 14, 2016 at 04:35:29AM -0800, Christoph Hellwig wrote:

btrfs/130   [  384.645337] run fstests btrfs/130 at 2016-11-14
12:33:26
[  384.827333] BTRFS: device fsid bf118b00-e2e0-4a96-a177-765789170093 devid 1 
transid 3 /dev/vdc
[  384.851643] BTRFS info (device vdc): disk space caching is enabled
[  384.852113] BTRFS info (device vdc): flagging fs with big metadata feature
[  384.857043] BTRFS info (device vdc): creating UUID tree
[  384.988347] BTRFS: device fsid 3b92b8c1-295d-4099-8623-d71a3cb270f8 devid 1 
transid 3 /dev/vdc
[  385.001946] BTRFS info (device vdc): disk space caching is enabled
[  385.002846] BTRFS info (device vdc): flagging fs with big metadata
feature
[  385.008870] BTRFS info (device vdc): creating UUID tree
[  416.318581] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! 
[btrfs:12782]
[  416.319139] Modules linked in:
[  416.319366] CPU: 3 PID: 12782 Comm: btrfs Not tainted 4.9.0-rc1 #826
[  416.319789] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[  416.320466] task: 8801355a4140 task.stack: c90a4000
[  416.320864] RIP: 0010:[]  [] 
find_parent_nodes+0xb7d/0x1530
[  416.321455] RSP: 0018:c90a79b0  EFLAGS: 0286
[  416.321811] RAX: 88012de45640 RBX:  RCX: c90a7a28
[  416.322285] RDX: 88012de45660 RSI: 01ca8000 RDI: 88013b803e40
[  416.322759] RBP: c90a7ab0 R08: 02400040 R09: 88010077a478
[  416.323317] R10: 880127652f70 R11: 880127652f08 R12: 8800
[  416.323791] R13: 6db6db6db6db6db7 R14: 8801295093b0 R15: 
[  416.324262] FS:  7f83ef8398c0() GS:88013fd8() 
knlGS:
[  416.324795] CS:  0010 DS:  ES:  CR0: 80050033
[  416.325176] CR2: 7f83ee4dbe38 CR3: 000136b56000 CR4: 06e0
[  416.325649] DR0:  DR1:  DR2: 
[  416.326120] DR3:  DR6: fffe0ff0 DR7: 0400
[  416.326590] Stack:
[  416.326730]   880102400040 88012a34 
1063
[  416.327257]  00010001 88012dd0e800 88012a34 
0001
[  416.327780]  88012a34  88013b803e40 
00c4
[  416.328304] Call Trace:
[  416.328475]  [] ? changed_cb+0xb70/0xb70
[  416.328841]  [] iterate_extent_inodes+0xe7/0x270
[  416.329251]  [] ? release_extent_buffer+0x26/0xc0
[  416.329657]  [] ? free_extent_buffer+0x46/0x80
[  416.330068]  [] process_extent+0x69f/0xb00
[  416.330452]  [] changed_cb+0x2cb/0xb70
[  416.330811]  [] ? read_extent_buffer+0xe2/0x140
[  416.331380]  [] ? btrfs_search_slot_for_read+0xc2/0x1b0
[  416.331905]  [] btrfs_ioctl_send+0x1187/0x12c0
[  416.332309]  [] ? kmem_cache_alloc+0x8a/0x160
[  416.332704]  [] btrfs_ioctl+0x7dc/0x21f0
[  416.333071]  [] ? flat_send_IPI_mask+0xc/0x10
[  416.333465]  [] ? default_send_IPI_single+0x2d/0x30
[  416.333893]  [] ? native_smp_send_reschedule+0x27/0x40
[  416.334340]  [] ? resched_curr+0xad/0xb0
[  416.334706]  [] do_vfs_ioctl+0x8b/0x5b0
[  416.335065]  [] ? _do_fork+0x132/0x390
[  416.335423]  [] SyS_ioctl+0x3c/0x70
[  416.335763]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9


---end quoted text---




Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-26 Thread Chris Mason

On 11/22/2016 07:26 PM, Qu Wenruo wrote:




We're changing which pages we kmap() but not which ones we kunmap(). Can
you please update the kunmap loop to use this pointer array?  Also it
looks like this kmap is never unmapped.


Oh, I forgot that.
I'll update it soon.


Thanks!



This reminds me: is there any kernel debug option to trace pages that are
kmap()ed but never kunmap()ed?



I don't think so, which is surprising.  It explodes so quickly on 32-bit
machines that it's easiest to boot it on a 32-bit qemu.
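
(The reason it shows up so fast there: on 64-bit kmap() is essentially free,
while on 32-bit highmem the kmap pool is a small fixed-size table, so a leaked
mapping exhausts it almost immediately.)  A rough sketch of such a setup;
kernel tree layout, image name and options are examples:

  # build a 32-bit kernel with CONFIG_HIGHMEM enabled, then boot it with
  # the test image attached:
  qemu-system-i386 -m 2048 -nographic \
      -kernel arch/x86/boot/bzImage \
      -append "root=/dev/vda console=ttyS0" \
      -drive file=rootfs.img,if=virtio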


-chris



Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

2016-11-26 Thread Giuseppe Della Bianca
from /var/log/messages

Nov 20 20:04:29 exnetold kernel: Modules linked in: fuse bridge ebtable_filter 
ebtables tun 8021q bnx2fc cnic uio garp mrp fcoe stp llc libfcoe libfc 
scsi_transport_fc nvidia_modeset(POE) nf_log_ipv4 nf_log_common xt_LOG 
xt_limit xt_multiport ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 
nf_defrag_ipv6 xt_CHECKSUM xt_conntrack iptable_mangle ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables hwmon_vid btrfs 
snd_hda_codec_analog xor snd_hda_codec_generic raid6_pq snd_hda_codec_hdmi 
powernow_k8 snd_hda_intel kvm_amd snd_hda_codec nvidia(POE) kvm snd_hda_core 
snd_hwdep ppdev irqbypass snd_seq snd_seq_device snd_pcm amd64_edac_mod 
edac_mce_amd edac_core snd_timer snd drm parport_pc k8temp soundcore shpchp 
parport asus_atk0110 i2c_nforce2
Nov 20 20:04:29 exnetold kernel: acpi_cpufreq tpm_tis tpm vboxnetadp(OE) 
vboxnetflt(OE) vboxdrv(OE) binfmt_misc ata_generic pata_acpi serio_raw sata_nv 
forcedeth fjes analog gameport joydev i2c_dev
Nov 20 20:04:29 exnetold kernel: CPU: 1 PID: 16330 Comm: kworker/u4:0 Tainted: 
P   OE   4.4.6-301.fc23.x86_64 #1
Nov 20 20:04:29 exnetold kernel: Hardware name: System manufacturer System 
Product Name/M2N, BIOS 0902 02/16/2009
Nov 20 20:04:29 exnetold kernel: Workqueue: btrfs-extent-refs 
btrfs_extent_refs_helper [btrfs]
Nov 20 20:04:29 exnetold kernel: 0286 61a6ce66 
88008c797ae0 813b542e
Nov 20 20:04:29 exnetold kernel:  a0c944d2 
88008c797b18 
810a40f2
Nov 20 20:04:29 exnetold kernel: 0006e7a8 fffe 
 88004edc
Nov 20 20:04:29 exnetold kernel: Call Trace:
Nov 20 20:04:29 exnetold kernel: [] dump_stack+0x63/0x85
Nov 20 20:04:29 exnetold kernel: [] 
warn_slowpath_common+0x82/0xc0
Nov 20 20:04:29 exnetold kernel: [] 
warn_slowpath_null+0x1a/0x20
Nov 20 20:04:29 exnetold kernel: [] 
__btrfs_free_extent.isra.69+0x898/0xd30 [btrfs]
Nov 20 20:04:29 exnetold kernel: [] ? 
btrfs_free_path+0x26/0x30 
[btrfs]
Nov 20 20:04:29 exnetold kernel: [] ? 
__btrfs_inc_extent_ref.isra.53+0x11/0x270 [btrfs]
Nov 20 20:04:29 exnetold kernel: [] 
__btrfs_run_delayed_refs+0xa7c/0x11a0 [btrfs]
Nov 20 20:04:29 exnetold kernel: [] 
btrfs_run_delayed_refs+0x82/0x2a0 [btrfs]
Nov 20 20:04:29 exnetold kernel: [] 
delayed_ref_async_start+0x37/0x90 [btrfs]
Nov 20 20:04:29 exnetold kernel: [] 
btrfs_scrubparity_helper+0xc2/0x2e0 [btrfs]
Nov 20 20:04:29 exnetold kernel: [] ? 
pwq_activate_delayed_work+0x3d/0x90
Nov 20 20:04:29 exnetold kernel: [] 
btrfs_extent_refs_helper+0xe/0x10 [btrfs]
Nov 20 20:04:29 exnetold kernel: [] 
process_one_work+0x156/0x430
Nov 20 20:04:29 exnetold kernel: [] worker_thread+0x4e/0x450
Nov 20 20:04:29 exnetold kernel: [] ? 
process_one_work+0x430/0x430
Nov 20 20:04:29 exnetold kernel: [] kthread+0xd8/0xf0
Nov 20 20:04:29 exnetold kernel: [] ? 
kthread_worker_fn+0x160/0x160
Nov 20 20:04:29 exnetold kernel: [] ret_from_fork+0x3f/0x70
Nov 20 20:04:29 exnetold kernel: [] ? 
kthread_worker_fn+0x160/0x160
Nov 20 20:04:29 exnetold kernel: ---[ end trace 4a87ca727985e15e ]---
Nov 20 20:04:29 exnetold kernel: BTRFS info (device sda1): leaf 95421874176 
total ptrs 209 free space 3459
Nov 20 20:04:29 exnetold kernel: #011item 0 key (29654663168 169 0) itemoff 
16250 itemsize 33
Nov 20 20:04:29 exnetold kernel: #011#011extent refs 1 gen 400 flags 258
8717 itemsize 33
.. long text ..
Nov 20 20:04:30 exnetold kernel: #011#011extent refs 1 gen 473 flags 258
Nov 20 20:04:30 exnetold kernel: #011#011shared block backref parent 
29657972736
Nov 20 20:04:30 exnetold kernel: #011item 208 key (29658071040 169 0) itemoff 
8684 itemsize 33
Nov 20 20:04:30 exnetold kernel: #011#011extent refs 1 gen 473 flags 258
Nov 20 20:04:30 exnetold kernel: #011#011shared block backref parent 
29657972736
Nov 20 20:04:30 exnetold kernel: BTRFS error (device sda1): unable to find ref 
byte nr 29656350720 parent 0 root 448  owner 1 offset 0
Nov 20 20:04:30 exnetold kernel: [ cut here ]
Nov 20 20:04:30 exnetold kernel: WARNING: CPU: 1 PID: 16330 at 
fs/btrfs/extent-tree.c:6549 __btrfs_free_extent.isra.69+0x8ff/0xd30 [btrfs]()
Nov 20 20:04:30 exnetold kernel: Modules linked in: fuse bridge ebtable_filter 
ebtables tun 8021q bnx2fc cnic uio garp mrp fcoe stp llc libfcoe libfc 
scsi_transport_fc nvidia_modeset(POE) nf_log_ipv4 nf_log_common xt_LOG 
xt_limit xt_multiport ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 
nf_defrag_ipv6 xt_CHECKSUM xt_conntrack iptable_mangle ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables hwmon_vid btrfs 
snd_hda_codec_analog xor snd_hda_codec_generic raid6_pq snd_hda_codec_hdmi 
powernow_k8 snd_hda_intel kvm_amd snd_hda_codec nvidia(POE) kvm snd_hda_core 
snd_hwdep ppdev irqbypass snd_seq snd_seq_device snd_pcm amd64_edac_mod 

Re: btrfs: still lockdep splat for 4.9-rc5+ (btrfs_log_inode)

2016-11-26 Thread Chris Mason

On Fri, Nov 25, 2016 at 10:03:25AM +0100, Christian Borntraeger wrote:

FWIW, I still see the lockdep splat in btrfs in 4.9-rc5+


Filipe reworked the code to avoid taking the same lock twice.  As far as 
I can tell, this just needs some annotation.


-chris


Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-26 Thread Goffredo Baroncelli
On 2016-11-25 05:31, Zygo Blaxell wrote:
>>> Do you mean, read the corrupted data won't repair it?
>>>
>>> IIRC that's the designed behavior.
>> :O
>>
>> You are right... I was unaware of that
> This is correct.
> 
> Ordinary reads shouldn't touch corrupt data, they should only read
> around it.  Scrubs in read-write mode should write corrected data over
> the corrupt data.  Read-only scrubs can only report errors without
> correcting them.
> 
> Rewriting corrupt data outside of scrub (i.e. on every read) is a
> bad idea.  Consider what happens if a RAM controller gets too hot:
> checksums start failing randomly, but the data on disk is still OK.
> If we tried to fix the bad data on every read, we'd probably just trash
> the filesystem in some cases.



I can't agree. If the filesystem is mounted read-only this behavior may be
correct; but in other cases I don't see any reason not to correct wrong data
even in the read case. If your RAM is unreliable you have big problems anyway.

The likelihood that the data stored on a disk is "corrupted" is higher than
the likelihood that the RAM is bad.

BTW Btrfs in RAID1 mode corrects the data even in the read case. So I am still
convinced that it is the RAID5/6 behavior that is "strange".

BR
G.Baroncelli
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: mount option nodatacow for VMs on SSD?

2016-11-26 Thread Kai Krakow
On Fri, 25 Nov 2016 09:28:40 +0100,
Ulli Horlacher wrote:

> I have vmware and virtualbox VMs on btrfs SSD.
> 
> I read in
> https://btrfs.wiki.kernel.org/index.php/SysadminGuide#When_To_Make_Subvolumes
> 
>  certain types of data (databases, VM images and similar
> typically big files that are randomly written internally) may require
> CoW to be disabled for them.  So for example such areas could be
> placed in a subvolume, that is always mounted with the option
> "nodatacow".
> 
> Does this apply to SSDs, too?

As a side note: I don't think you can use "nodatacow" just for one
subvolume while the other subvolumes of the same btrfs are mounted
differently. The wiki is just wrong here.

The list of possible mount options in the wiki explicitly lists
"nodatacow" as not working per subvolume - just globally for the whole
fs.
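
As an aside (this is general btrfs behavior, not from that wiki section):
the usual way to get nodatacow selectively is the per-directory NOCOW file
attribute rather than a mount option, e.g. (paths are examples):

  mkdir /mnt/vm-images
  chattr +C /mnt/vm-images       # new files created here inherit NOCOW
  lsattr -d /mnt/vm-images       # should show the 'C' attribute

Note this only takes effect for files created after the attribute is set;
it does not convert existing VM images.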

-- 
Regards,
Kai

Replies to list-only preferred.
