Re: [PATCH] btrfs: remove spurious WARN_ON(ref->count) in find_parent_nodes

2018-01-22 Thread Lu Fengqi
On Mon, Jan 22, 2018 at 08:35:43PM -0700, Edmund Nadolski wrote:
>On 1/22/18 5:58 AM, Nikolay Borisov wrote:
>> 
>> 
>> On 21.01.2018 21:08, Zygo Blaxell wrote:
>>> This warning appears during execution of the LOGICAL_INO ioctl and
>>> appears to be spurious:
>>>
>>> [ cut here ]
>>> WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 
>>> find_parent_nodes+0xc41/0x14e0
>>> Modules linked in: ib_iser rdma_cm iw_cm ib_cm ib_core configfs 
>>> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi overlay r8169 ufs qnx4 
>>> hfsplus hfs minix ntfs vfat msdos fat jfs xfs cpuid rpcsec_gss_krb5 nfsv4 
>>> nfsv3 nfs fscache algif_skcipher af_alg softdog nfsd auth_rpcgss nfs_acl 
>>> lockd grace sunrpc bnep cpufreq_userspace cpufreq_powersave 
>>> cpufreq_conservative nfnetlink_queue nfnetlink_log nfnetlink bluetooth 
>>> rfkill snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_oss 
>>> snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device binfmt_misc fuse nbd 
>>> xt_REDIRECT nf_nat_redirect ipt_REJECT nf_reject_ipv4 xt_nat xt_conntrack 
>>> xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_conntrack_ipv6 
>>> nf_defrag_ipv6 nf_nat_ipv6 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
>>> nf_nat_ipv4 nf_nat nf_conntrack
>>>  ip6table_mangle iptable_mangle ip6table_filter ip6_tables 
>>> iptable_filter ip_tables x_tables tcp_cubic dummy lp dm_crypt edac_mce_amd 
>>> edac_core snd_hda_codec_hdmi ppdev kvm_amd kvm irqbypass crct10dif_pclmul 
>>> crc32_pclmul ghash_clmulni_intel snd_hda_codec_via pcbc amdkfd 
>>> snd_hda_codec_generic amd_iommu_v2 aesni_intel snd_hda_intel radeon 
>>> snd_hda_codec aes_x86_64 snd_hda_core snd_hwdep crypto_simd glue_helper sg 
>>> snd_pcm_oss cryptd input_leds joydev pcspkr serio_raw snd_mixer_oss 
>>> rtc_cmos snd_pcm parport_pc parport shpchp wmi acpi_cpufreq evdev snd_timer 
>>> asus_atk0110 k10temp fam15h_power snd soundcore sp5100_tco hid_generic ipv6 
>>> af_packet crc_ccitt raid10 raid456 async_raid6_recov async_memcpy async_pq 
>>> async_xor async_tx libcrc32c raid0 multipath linear dm_mod raid1 md_mod 
>>> ohci_pci ide_pci_generic
>>>  sr_mod cdrom pdc202xx_new ohci_hcd crc32c_intel atiixp ehci_pci 
>>> psmouse ide_core i2c_piix4 ehci_hcd xhci_pci mii xhci_hcd [last unloaded: 
>>> r8169]
>>> CPU: 3 PID: 18172 Comm: bees Tainted: G  D WL  4.11.9-zb64+ #1
>>> Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, 
>>> BIOS 2101 12/02/2014
>>> Call Trace:
>>>  dump_stack+0x85/0xc2
>>>  __warn+0xd1/0xf0
>>>  warn_slowpath_null+0x1d/0x20
>>>  find_parent_nodes+0xc41/0x14e0
>>>  __btrfs_find_all_roots+0xad/0x120
>>>  ? extent_same_check_offsets+0x70/0x70
>>>  iterate_extent_inodes+0x168/0x300
>>>  iterate_inodes_from_logical+0x87/0xb0
>>>  ? iterate_inodes_from_logical+0x87/0xb0
>>>  ? extent_same_check_offsets+0x70/0x70
>>>  btrfs_ioctl+0x8ac/0x2820
>>>  ? lock_acquire+0xc2/0x200
>>>  do_vfs_ioctl+0x91/0x700
>>>  ? __fget+0x112/0x200
>>>  SyS_ioctl+0x79/0x90
>>>  entry_SYSCALL_64_fastpath+0x23/0xc6
>>> RIP: 0033:0x7f727b20be07
>>> RSP: 002b:7f7279f1e018 EFLAGS: 0246 ORIG_RAX: 0010
>>> RAX: ffda RBX: 9c0f4d7f RCX: 7f727b20be07
>>> RDX: 7f7279f1e118 RSI: c0389424 RDI: 0003
>>> RBP: 0035 R08: 7f72581bf340 R09: 
>>> R10: 0020 R11: 0246 R12: 0040
>>> R13: 7f725818d230 R14: 7f7279f1b640 R15: 7f725820
>>>  ? trace_hardirqs_off_caller+0x1f/0x140
>>> ---[ end trace 5de243350f6762c6 ]---
>>> [ cut here ]
>>>
>>> ref->count can be below zero under normal conditions (for delayed refs),
>>> so there is no need to spam dmesg when it happens.
>> 
>> Why do you think it's normal for this to be a negative value under
>> normal conditions? There should be some rationale about that otherwise
>> you are papering over a bug.
>
>
>The ref->count in the prelim_ref can be <0 for a delayed ref that
>has a node->action of BTRFS_DROP_DELAYED_REF.  The prelim_ref_insert()
>relies on this when merging identical refs to keep the overall
>count correct.  So it looks to me like it should be OK to remove
>the WARN.

The call graph of find_parent_nodes:
add_delayed_refs
add_inline_refs
add_keyed_refs
add_missing_keys
-merge_refs (MERGE_IDENTICAL_KEYS)
resolve_indirect_refs
-merge_refs (MERGE_IDENTICAL_PARENTS)
WARN_ON(ref->count < 0)

Yes, I agree that the ref->count in the prelim_ref can be less than 0
between add_delayed_refs and add_inline_refs. However, prelim_ref_insert
(or merge_refs before commit 86d5f9944252 ("btrfs: convert prelimary
reference tracking to use rbtrees")) has merged all refs for the same
block before this WARN_ON, so I'm still confused about why there is an
independent negative delayed ref.
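
To make the merging argument concrete, here is a toy model in Python. It is
not kernel code: the (root, objectid, offset, parent) key tuple, the function
names, and the sample values are all illustrative assumptions. It only shows
how signed counts for identical keys add up, and when a negative count could
survive the merge.

```python
from collections import defaultdict


def merge_refs(refs):
    """Sum the signed counts of refs that share the same key.

    refs is an iterable of (key, count) pairs, where a pending drop is
    modelled as count == -1 and an existing reference as count == +1.
    """
    merged = defaultdict(int)
    for key, count in refs:
        merged[key] += count
    return dict(merged)


def negative_refs(merged):
    """Return the keys whose merged count is still below zero."""
    return [key for key, count in merged.items() if count < 0]


# Hypothetical keys: (root, objectid, offset, parent).
KEY_A = ("root5", 257, 0, 0)
KEY_B = ("root5", 258, 0, 0)

# An existing ref (+1) and a pending drop (-1) for the same key cancel out.
paired = merge_refs([(KEY_A, 1), (KEY_A, -1)])

# A pending drop with no matching positive ref stays negative after merging.
unpaired = merge_refs([(KEY_A, 1), (KEY_B, -1)])
```

In this model a negative count survives only when no positive ref with an
identical key was collected, which is exactly the case being questioned here.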

>
>(However the ref_mod in the 

Re: [PATCH] btrfs: remove spurious WARN_ON(ref->count) in find_parent_nodes

2018-01-22 Thread Nikolay Borisov


On 23.01.2018 05:35, Edmund Nadolski wrote:
> On 1/22/18 5:58 AM, Nikolay Borisov wrote:
>>
>>
>> On 21.01.2018 21:08, Zygo Blaxell wrote:
>>> This warning appears during execution of the LOGICAL_INO ioctl and
>>> appears to be spurious:
>>>
>>> [ cut here ]
>>> WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 
>>> find_parent_nodes+0xc41/0x14e0
>>> Modules linked in: ib_iser rdma_cm iw_cm ib_cm ib_core configfs 
>>> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi overlay r8169 ufs qnx4 
>>> hfsplus hfs minix ntfs vfat msdos fat jfs xfs cpuid rpcsec_gss_krb5 nfsv4 
>>> nfsv3 nfs fscache algif_skcipher af_alg softdog nfsd auth_rpcgss nfs_acl 
>>> lockd grace sunrpc bnep cpufreq_userspace cpufreq_powersave 
>>> cpufreq_conservative nfnetlink_queue nfnetlink_log nfnetlink bluetooth 
>>> rfkill snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_oss 
>>> snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device binfmt_misc fuse nbd 
>>> xt_REDIRECT nf_nat_redirect ipt_REJECT nf_reject_ipv4 xt_nat xt_conntrack 
>>> xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_conntrack_ipv6 
>>> nf_defrag_ipv6 nf_nat_ipv6 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
>>> nf_nat_ipv4 nf_nat nf_conntrack
>>>  ip6table_mangle iptable_mangle ip6table_filter ip6_tables 
>>> iptable_filter ip_tables x_tables tcp_cubic dummy lp dm_crypt edac_mce_amd 
>>> edac_core snd_hda_codec_hdmi ppdev kvm_amd kvm irqbypass crct10dif_pclmul 
>>> crc32_pclmul ghash_clmulni_intel snd_hda_codec_via pcbc amdkfd 
>>> snd_hda_codec_generic amd_iommu_v2 aesni_intel snd_hda_intel radeon 
>>> snd_hda_codec aes_x86_64 snd_hda_core snd_hwdep crypto_simd glue_helper sg 
>>> snd_pcm_oss cryptd input_leds joydev pcspkr serio_raw snd_mixer_oss 
>>> rtc_cmos snd_pcm parport_pc parport shpchp wmi acpi_cpufreq evdev snd_timer 
>>> asus_atk0110 k10temp fam15h_power snd soundcore sp5100_tco hid_generic ipv6 
>>> af_packet crc_ccitt raid10 raid456 async_raid6_recov async_memcpy async_pq 
>>> async_xor async_tx libcrc32c raid0 multipath linear dm_mod raid1 md_mod 
>>> ohci_pci ide_pci_generic
>>>  sr_mod cdrom pdc202xx_new ohci_hcd crc32c_intel atiixp ehci_pci 
>>> psmouse ide_core i2c_piix4 ehci_hcd xhci_pci mii xhci_hcd [last unloaded: 
>>> r8169]
>>> CPU: 3 PID: 18172 Comm: bees Tainted: G  D WL  4.11.9-zb64+ #1
>>> Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, 
>>> BIOS 2101 12/02/2014
>>> Call Trace:
>>>  dump_stack+0x85/0xc2
>>>  __warn+0xd1/0xf0
>>>  warn_slowpath_null+0x1d/0x20
>>>  find_parent_nodes+0xc41/0x14e0
>>>  __btrfs_find_all_roots+0xad/0x120
>>>  ? extent_same_check_offsets+0x70/0x70
>>>  iterate_extent_inodes+0x168/0x300
>>>  iterate_inodes_from_logical+0x87/0xb0
>>>  ? iterate_inodes_from_logical+0x87/0xb0
>>>  ? extent_same_check_offsets+0x70/0x70
>>>  btrfs_ioctl+0x8ac/0x2820
>>>  ? lock_acquire+0xc2/0x200
>>>  do_vfs_ioctl+0x91/0x700
>>>  ? __fget+0x112/0x200
>>>  SyS_ioctl+0x79/0x90
>>>  entry_SYSCALL_64_fastpath+0x23/0xc6
>>> RIP: 0033:0x7f727b20be07
>>> RSP: 002b:7f7279f1e018 EFLAGS: 0246 ORIG_RAX: 0010
>>> RAX: ffda RBX: 9c0f4d7f RCX: 7f727b20be07
>>> RDX: 7f7279f1e118 RSI: c0389424 RDI: 0003
>>> RBP: 0035 R08: 7f72581bf340 R09: 
>>> R10: 0020 R11: 0246 R12: 0040
>>> R13: 7f725818d230 R14: 7f7279f1b640 R15: 7f725820
>>>  ? trace_hardirqs_off_caller+0x1f/0x140
>>> ---[ end trace 5de243350f6762c6 ]---
>>> [ cut here ]
>>>
>>> ref->count can be below zero under normal conditions (for delayed refs),
>>> so there is no need to spam dmesg when it happens.
>>
>> Why do you think it's normal for this to be a negative value under
>> normal conditions? There should be some rationale about that otherwise
>> you are papering over a bug.
> 
> 
> The ref->count in the prelim_ref can be <0 for a delayed ref that
> has a node->action of BTRFS_DROP_DELAYED_REF.  The prelim_ref_insert()
> relies on this when merging identical refs to keep the overall
> count correct.  So it looks to me like it should be OK to remove
> the WARN.

Right, I don't understand the backref code so I will have to agree with
you :) At least the explanation you provided should be in the change log.

> 
> (However the ref_mod in the btrfs_delayed_ref_node evidently cannot
> go <0).
> 
> 
>>> On kernel v4.14 this warning occurs 100-1000 times more frequently than
>>> on kernels v4.2..v4.12.  In the worst case, one test machine had 59020
>>> warnings in 24 hours on v4.14.14 compared to 55 on v4.12.14.
>>>
>>> Signed-off-by: Zygo Blaxell 
>>> ---
>>>  fs/btrfs/backref.c | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
>>> 

Superblock update: Is there really any benefits of updating synchronously?

2018-01-22 Thread waxhead
Note: This has been mentioned before, but since I see some issues
related to superblocks I think it would be good to bring up the question
again.


According to the information found in the wiki: 
https://btrfs.wiki.kernel.org/index.php/On-disk_Format#Superblock


The superblocks are updated synchronously on HDDs and one after the
other on SSDs.


Superblocks are also (to my knowledge) not protected by copy-on-write
and are updated by read-modify-write.


On a storage device with >256GB there will be three superblocks.

BTRFS will always prefer the superblock with the highest generation 
number providing that the checksum is good.
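
As a rough illustration of the two statements above, here is a small Python
sketch. It assumes the superblock copies sit at 64 KiB, 64 MiB and 256 GiB as
described on the wiki page linked above, and it models each copy as a plain
dict with hypothetical `generation` and `csum_ok` fields; the real kernel
logic has more checks than this.

```python
# Assumed superblock offsets (64 KiB, 64 MiB, 256 GiB) and size in bytes.
SUPERBLOCK_OFFSETS = [64 * 1024, 64 * 1024**2, 256 * 1024**3]
SUPERBLOCK_SIZE = 4096


def superblock_copies(device_size):
    """Return the offsets of the copies that fit on a device of this size."""
    return [off for off in SUPERBLOCK_OFFSETS
            if off + SUPERBLOCK_SIZE <= device_size]


def pick_superblock(copies):
    """Pick the copy with the highest generation among those whose
    checksum is good; return None if no copy is usable."""
    valid = [sb for sb in copies if sb["csum_ok"]]
    if not valid:
        return None
    return max(valid, key=lambda sb: sb["generation"])


# Example: the newest copy is damaged, so the next-newest one is chosen.
example = [
    {"offset": SUPERBLOCK_OFFSETS[0], "generation": 42, "csum_ok": False},
    {"offset": SUPERBLOCK_OFFSETS[1], "generation": 41, "csum_ok": True},
    {"offset": SUPERBLOCK_OFFSETS[2], "generation": 40, "csum_ok": True},
]
chosen = pick_superblock(example)
```

With these assumptions, a 300 GiB device has three copies and a 100 GiB
device has two.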


On the list there seem to be a few incidents where the superblocks have
gone toast, and I am pondering what benefits (if any) there are in
updating the superblocks synchronously.


The superblock is checkpointed every 30 seconds by default, and if
someone pulls the plug (power outage) on HDDs then a synchronous write,
depending on (the quality of) your hardware, may perhaps ruin all the
superblock copies in one go. E.g. Copy A, B and C will all be updated at 30s.


On SSDs, since one superblock is updated after the other, the default
30-second checkpoint would mean Copy A=30s, Copy B=1m, Copy C=1m30s.


Why is the SSD method not used on hard drives also?! If two superblocks
are toast you would at most lose 1m30s by default, and if this is
considered a problem then you can always adjust the commit time
downwards. If this is set to 15 seconds you would still only lose 30 seconds
of "action time" and would in my opinion be far better off from a
reliability point of view than having to update multiple superblocks at
the same time. I can't see why on earth updating all superblocks at the
same time would have any benefits.


So this all boils down to the questions three (ere the other side will 
see. :P )


1. What are the benefits of updating all superblocks at the same time? 
(Just imagine if your memory is bad - you could risk updating all 
superblocks simultaneously with kebab'ed data).


2. What would the negative consequences be of using the SSD scheme also
for hard disks? Especially if the commit time is set to 15s instead of 30s.


3. In a RAID1 / 10 / 5 / 6 like setup. Would a set of corrupt 
superblocks on a single drive be recoverable from other disks or do the 
superblocks need to be intact on the (possibly) damaged drive?
(If the superblocks are needed then why would not SSD mode be better 
especially if the drive is partly working)


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: remove spurious WARN_ON(ref->count) in find_parent_nodes

2018-01-22 Thread Edmund Nadolski
On 1/22/18 5:58 AM, Nikolay Borisov wrote:
> 
> 
> On 21.01.2018 21:08, Zygo Blaxell wrote:
>> This warning appears during execution of the LOGICAL_INO ioctl and
>> appears to be spurious:
>>
>>  [ cut here ]
>>  WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 
>> find_parent_nodes+0xc41/0x14e0
>>  Modules linked in: ib_iser rdma_cm iw_cm ib_cm ib_core configfs 
>> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi overlay r8169 ufs qnx4 
>> hfsplus hfs minix ntfs vfat msdos fat jfs xfs cpuid rpcsec_gss_krb5 nfsv4 
>> nfsv3 nfs fscache algif_skcipher af_alg softdog nfsd auth_rpcgss nfs_acl 
>> lockd grace sunrpc bnep cpufreq_userspace cpufreq_powersave 
>> cpufreq_conservative nfnetlink_queue nfnetlink_log nfnetlink bluetooth 
>> rfkill snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_oss snd_seq_midi_event 
>> snd_rawmidi snd_seq snd_seq_device binfmt_misc fuse nbd xt_REDIRECT 
>> nf_nat_redirect ipt_REJECT nf_reject_ipv4 xt_nat xt_conntrack xt_tcpudp 
>> nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_conntrack_ipv6 
>> nf_defrag_ipv6 nf_nat_ipv6 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
>> nf_nat_ipv4 nf_nat nf_conntrack
>>   ip6table_mangle iptable_mangle ip6table_filter ip6_tables 
>> iptable_filter ip_tables x_tables tcp_cubic dummy lp dm_crypt edac_mce_amd 
>> edac_core snd_hda_codec_hdmi ppdev kvm_amd kvm irqbypass crct10dif_pclmul 
>> crc32_pclmul ghash_clmulni_intel snd_hda_codec_via pcbc amdkfd 
>> snd_hda_codec_generic amd_iommu_v2 aesni_intel snd_hda_intel radeon 
>> snd_hda_codec aes_x86_64 snd_hda_core snd_hwdep crypto_simd glue_helper sg 
>> snd_pcm_oss cryptd input_leds joydev pcspkr serio_raw snd_mixer_oss rtc_cmos 
>> snd_pcm parport_pc parport shpchp wmi acpi_cpufreq evdev snd_timer 
>> asus_atk0110 k10temp fam15h_power snd soundcore sp5100_tco hid_generic ipv6 
>> af_packet crc_ccitt raid10 raid456 async_raid6_recov async_memcpy async_pq 
>> async_xor async_tx libcrc32c raid0 multipath linear dm_mod raid1 md_mod 
>> ohci_pci ide_pci_generic
>>   sr_mod cdrom pdc202xx_new ohci_hcd crc32c_intel atiixp ehci_pci 
>> psmouse ide_core i2c_piix4 ehci_hcd xhci_pci mii xhci_hcd [last unloaded: 
>> r8169]
>>  CPU: 3 PID: 18172 Comm: bees Tainted: G  D WL  4.11.9-zb64+ #1
>>  Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, 
>> BIOS 2101 12/02/2014
>>  Call Trace:
>>   dump_stack+0x85/0xc2
>>   __warn+0xd1/0xf0
>>   warn_slowpath_null+0x1d/0x20
>>   find_parent_nodes+0xc41/0x14e0
>>   __btrfs_find_all_roots+0xad/0x120
>>   ? extent_same_check_offsets+0x70/0x70
>>   iterate_extent_inodes+0x168/0x300
>>   iterate_inodes_from_logical+0x87/0xb0
>>   ? iterate_inodes_from_logical+0x87/0xb0
>>   ? extent_same_check_offsets+0x70/0x70
>>   btrfs_ioctl+0x8ac/0x2820
>>   ? lock_acquire+0xc2/0x200
>>   do_vfs_ioctl+0x91/0x700
>>   ? __fget+0x112/0x200
>>   SyS_ioctl+0x79/0x90
>>   entry_SYSCALL_64_fastpath+0x23/0xc6
>>  RIP: 0033:0x7f727b20be07
>>  RSP: 002b:7f7279f1e018 EFLAGS: 0246 ORIG_RAX: 0010
>>  RAX: ffda RBX: 9c0f4d7f RCX: 7f727b20be07
>>  RDX: 7f7279f1e118 RSI: c0389424 RDI: 0003
>>  RBP: 0035 R08: 7f72581bf340 R09: 
>>  R10: 0020 R11: 0246 R12: 0040
>>  R13: 7f725818d230 R14: 7f7279f1b640 R15: 7f725820
>>   ? trace_hardirqs_off_caller+0x1f/0x140
>>  ---[ end trace 5de243350f6762c6 ]---
>>  [ cut here ]
>>
>> ref->count can be below zero under normal conditions (for delayed refs),
>> so there is no need to spam dmesg when it happens.
> 
> Why do you think it's normal for this to be a negative value under
> normal conditions? There should be some rationale about that otherwise
> you are papering over a bug.


The ref->count in the prelim_ref can be <0 for a delayed ref that
has a node->action of BTRFS_DROP_DELAYED_REF.  The prelim_ref_insert()
relies on this when merging identical refs to keep the overall
count correct.  So it looks to me like it should be OK to remove
the WARN.

(However the ref_mod in the btrfs_delayed_ref_node evidently cannot
go <0).


>> On kernel v4.14 this warning occurs 100-1000 times more frequently than
>> on kernels v4.2..v4.12.  In the worst case, one test machine had 59020
>> warnings in 24 hours on v4.14.14 compared to 55 on v4.12.14.
>>
>> Signed-off-by: Zygo Blaxell 
>> ---
>>  fs/btrfs/backref.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
>> index 7d0dc100a09a..57e8d2562ed5 100644
>> --- a/fs/btrfs/backref.c
>> +++ b/fs/btrfs/backref.c
>> @@ -1263,7 +1263,6 @@ static int find_parent_nodes(struct btrfs_trans_handle 
>> *trans,
>>  while (node) {
>>  ref = rb_entry(node, struct prelim_ref, 

Re: bad key ordering - repairable?

2018-01-22 Thread Chris Murphy
On Mon, Jan 22, 2018 at 2:06 PM, Claes Fransson wrote:
> Hi!
>
> I really like the features of BTRFS, especially deduplication,
> snapshotting and checksumming. However, when using it on my laptop the
> last couple of years, it has become corrupted a lot of times.
> Sometimes I have managed to fix the problems (at least so much that I
> can continue to use the filesystem) with check --repair, but several
> times I had to recreate the file system and reinstall the operating
> system.
>
> I am guessing the corruptions might be the results of unclean
> shutdowns, mostly after system hangs, but also because of running out
> of battery sometimes?

I think it's something else because I intentionally and
unintentionally do unclean shutdowns (I'm really impatient and I'm a
saboteur) on my laptop and I never get corruptions. In 18 months with
an HP Spectre which doesn't even have ECC memory, and has an NVMe
drive, *and* really remarkable for almost half this time I used the
discard mount option which pretty much instantly obliterates unused
roots, even when referenced in the super block as backup roots - and
yet still zero corruption. No complaints on mount, scrub, or readonly
checks. *shrug*

Anyway I suspect hardware or power issue. Or even SSD firmware issue.

> Furthermore, the power-led has recently started blinking (also when
> the power-cable is plugged in), I guess because of an old and bad
> battery. Maybe the current corruption also can have something to do
> with this? However I have almost always run with the power cable plugged
> in during the last year, only on battery a few seconds a few times when
> moving the laptop.
>
> Currently, I can only mount the filesystem readonly, it goes readonly
> automatically if I try to mount it normally.

Btrfs is confused and doesn't want to make the corruption worse.




>
> Fstab mount options: noatime,autodefrag (I have been using the option
> nossd with older kernels one period in the past on the filesystem).
>
> If it matters, I have been running duperemove many times on the
> filesystem since creation.

I don't think it's related.


>
> To test the RAM, I have been running mprime Blend-test for 24 hours
> after the corruption without any error or warning.

I'm not familiar with it, pretty sure you want this for UEFI:

https://www.memtest86.com/download.htm

Where you can use that or memtest86+ if the firmware is BIOS based.


> I have never noticed any corruptions on the NTFS and Ext4 file systems
> on the laptop, only on the Btrfs file systems.

NTFS and ext4 likely won't notice such corruptions either (although
new ext4 volumes any day now will have checksummed metadata by
default) as they weren't designed with such detection in mind.


-- 
Chris Murphy


[PATCH v2 1/2] btrfs: fix device order consistency

2018-01-22 Thread Anand Jain
Maintaining a consistent device order makes problems related to
missing chunks in degraded mode much easier to reproduce. So sort
the devices by devid within the kernel, so that we know which device
is assigned to struct fs_info::latest_bdev when all the devices have
the same SB generation.

Signed-off-by: Anand Jain 
---
 fs/btrfs/volumes.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 03f2685a5018..98e41d286283 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include <linux/list_sort.h>
 #include 
 #include "ctree.h"
 #include "extent_map.h"
@@ -1107,6 +1108,20 @@ static int __btrfs_open_devices(struct btrfs_fs_devices 
*fs_devices,
return ret;
 }
 
+static int device_sort(void *priv, struct list_head *a, struct list_head *b)
+{
+   struct btrfs_device *dev1, *dev2;
+
+   dev1 = list_entry(a, struct btrfs_device, dev_list);
+   dev2 = list_entry(b, struct btrfs_device, dev_list);
+
+   if (dev1->devid < dev2->devid)
+   return -1;
+   else if (dev1->devid > dev2->devid)
+   return 1;
+   return 0;
+}
+
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
   fmode_t flags, void *holder)
 {
@@ -1117,6 +1132,7 @@ int btrfs_open_devices(struct btrfs_fs_devices 
*fs_devices,
fs_devices->opened++;
ret = 0;
} else {
+   list_sort(NULL, &fs_devices->devices, device_sort);
ret = __btrfs_open_devices(fs_devices, flags, holder);
}
mutex_unlock(&uuid_mutex);
-- 
2.7.0
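
For readers who want to see the effect of the sort outside the kernel, here
is a minimal Python sketch. It assumes, for illustration only, that the
"latest" device is the first one in list order with the highest generation;
the `Device` class, its fields, and the device paths are hypothetical
stand-ins for struct btrfs_device.

```python
from dataclasses import dataclass


@dataclass
class Device:
    devid: int
    path: str
    generation: int


def sort_devices(devices):
    """Return the devices ordered by devid, like the device_sort comparator."""
    return sorted(devices, key=lambda dev: dev.devid)


def latest_device(devices):
    """Return the first device in list order with the highest generation
    (assumed selection rule, for illustration)."""
    best = None
    for dev in devices:
        if best is None or dev.generation > best.generation:
            best = dev
    return best


# Devices as they might be discovered in arbitrary scan order, all with
# the same superblock generation.
scanned = [
    Device(devid=3, path="/dev/sdc", generation=10),
    Device(devid=1, path="/dev/sda", generation=10),
    Device(devid=2, path="/dev/sdb", generation=10),
]

before = latest_device(scanned)               # depends on scan order
after = latest_device(sort_devices(scanned))  # always the lowest devid
```

Without the sort, the chosen device depends on discovery order; with it, a
tie on generation always resolves to the lowest devid.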



[PATCH v2 0/2] fix device orders consistency

2018-01-22 Thread Anand Jain
Hi David,

 Could you pls consider this for 4.16 ?

v1->v2:
 No code change. Change log updated to include the type
 of problem that this consistency would help. And
 I don't see patch 2/2 in the ML. So trying to resend.

Maintaining (some) consistency in the device order makes reproducing
the missing-chunk-related problems more consistent. (More fixes of this
sort are coming up.)

Anand Jain (2):
  btrfs: fix device order consistency
  btrfs: fix alloc device order consistency

 fs/btrfs/volumes.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

-- 
2.7.0



Re: bad key ordering - repairable?

2018-01-22 Thread Hugo Mills
On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
> Hi!
> 
> I really like the features of BTRFS, especially deduplication,
> snapshotting and checksumming. However, when using it on my laptop the
> last couple of years, it has become corrupted a lot of times.
> Sometimes I have managed to fix the problems (at least so much that I
> can continue to use the filesystem) with check --repair, but several
> times I had to recreate the file system and reinstall the operating
> system.
> 
> I am guessing the corruptions might be the results of unclean
> shutdowns, mostly after system hangs, but also because of running out
> of battery sometimes?
> Furthermore, the power-led has recently started blinking (also when
> the power-cable is plugged in), I guess because of an old and bad
> battery. Maybe the current corruption also can have something to do
> with this? However I have almost always run with the power cable plugged
> in during the last year, only on battery a few seconds a few times when
> moving the laptop.
> 
> Currently, I can only mount the filesystem readonly, it goes readonly
> automatically if I try to mount it normally.
> 
> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
> localhost:~ # uname -r
> 4.14.13-1-default
> localhost:~ # btrfs --version
> btrfs-progs v4.14.1
> 
> localhost:~ # btrfs check -p /dev/sda12
> Checking filesystem on /dev/sda12

[fixing up bad paste]

> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
> bad key ordering 159 160 bad block 690436964352
> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache [.]
> checking fs roots [o]
> checking csums
> bad key ordering 159 160
> Error looking up extent record -1

[snip]

> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352
> /dev/sda12
> btrfs-progs v4.14.1
> leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
> leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
> fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
> chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
> .
> .
> .
> item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
> refs 1 gen 821 flags DATA
> extent data backref root 287 objectid 51665 offset 0 count 1
> item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
> refs 1 gen 821 flags DATA
> extent data backref root 287 objectid 51666 offset 0 count 1
> item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` 
> triggered, value 1
> btrfs(+0x365c6)[0x55bdfaada5c6]
> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
> btrfs(main+0x7d)[0x55bdfaac7d4d]
> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
> btrfs(_start+0x2a)[0x55bdfaac7e5a]
> Aborted (core dumped)

   Wow, I've never seen it do that before. It's the next thing I'd
have asked for, so it's good you've preempted it.

   The main thing is that bad key ordering is almost always due to RAM
corruption. That's either bad RAM, or dodgy power regulation -- the
latter could be the PSU, or capacitors on the motherboard. (In this
case, it might also be something funny with the battery).
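
To illustrate what "bad key ordering" means, here is a minimal Python sketch.
It assumes that keys in a leaf must be strictly increasing when compared as
(objectid, type, offset) tuples. The keys for items 157 to 159 are copied
from the dump above; the key for item 160 is NOT from the dump, it is a
hypothetical value made by clearing one bit in the objectid that would
naturally follow, to mimic a single-bit RAM error.

```python
EXTENT_ITEM = 168  # numeric value of the EXTENT_ITEM key type


def first_bad_pair(keys, first_index=0):
    """Return the item numbers (i, i + 1) of the first adjacent pair that
    is not strictly increasing, or None if the whole list is ordered."""
    for i in range(1, len(keys)):
        if keys[i - 1] >= keys[i]:
            return (first_index + i - 1, first_index + i)
    return None


# Hypothetical corrupted objectid: the expected next value with bit 30 cleared.
expected_next = 22732533760 + 16384
flipped = expected_next & ~(1 << 30)

keys = [
    (22732500992, EXTENT_ITEM, 16384),  # item 157 (from the dump)
    (22732517376, EXTENT_ITEM, 16384),  # item 158 (from the dump)
    (22732533760, EXTENT_ITEM, 16384),  # item 159 (from the dump)
    (flipped, EXTENT_ITEM, 16384),      # item 160 (hypothetical)
]

bad = first_bad_pair(keys, first_index=157)
```

A single flipped bit is enough to move a key far out of sequence, which is
why this error so often points at memory rather than at the disk.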

   I would definitely recommend a long run of memtest86. At least 8
hours, preferably 24. If you get errors repeatedly in the same place,
it's the RAM. If they appear randomly, it's probably the power
regulation.

[snip]

> 
> The filesystem had become pretty full, I had planned to increase the
> Btrfs-partition size before it became corrupt.
> 
> Active kernel when the filesystem went read only: OpenSUSE Linux
> 4.14.14-1.geef6178-default, from the
> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
> repository.
> 
> Fstab mount options: noatime,autodefrag (I have been using the option
> nossd with older kernels one period in the past on the filesystem).
> 
> If it matters, I have been running duperemove many times on the
> filesystem since creation.
> 
> To test the RAM, I have been running mprime Blend-test for 24 hours
> after the corruption without any error or warning.

   Of all of the bad key order errors I've seen (dozens), I think
there were a whole two which turned out not to be obviously related to
corrupt RAM. I still say that it's most likely the hardware.

> Is there a way I can try to repair this filesystem without the need to
> recreate it and reinstall the operating system? A reinstall including
> all currently installed packages, and restoring all current system
> settings, would probably take some time for me to do.
> If it is currently not repairable, it would be nice if this kind of
> corruption could be repaired in the future, even if losing a few
> files. Or if the corruptions could be avoided in the 

Re: Periodic frame losses when recording to btrfs volume with OBS

2018-01-22 Thread Sebastian Ochmann

Hello,

I attached to the ffmpeg-mux process for a little while and pasted the 
result here:


https://pastebin.com/XHaMLX8z

Can you help me with interpreting this result? If you'd like me to run 
strace with specific options, please let me know. This is a level of 
debugging I'm not dealing with on a daily basis. :)


Best regards
Sebastian


On 22.01.2018 20:08, Chris Mason wrote:
> On 01/22/2018 01:33 PM, Sebastian Ochmann wrote:
>
> [ skipping to the traces ;) ]
>
>> 2866 ffmpeg-mux D
>> [] btrfs_start_ordered_extent+0x101/0x130 [btrfs]
>> [] lock_and_cleanup_extent_if_need+0x340/0x380 [btrfs]
>> [] __btrfs_buffered_write+0x261/0x740 [btrfs]
>> [] btrfs_file_write_iter+0x20f/0x650 [btrfs]
>> [] __vfs_write+0xf9/0x170
>> [] vfs_write+0xad/0x1a0
>> [] SyS_write+0x52/0xc0
>> [] entry_SYSCALL_64_fastpath+0x1a/0x7d
>> [] 0x
>
> This is where we wait for writes that are already in flight before we're
> allowed to redirty those pages in the file.  It'll happen when we either
> overwrite a page in the file that we've already written, or when we're
> trickling down writes slowly in non-4K aligned writes.
>
> You can probably figure out pretty quickly which is the case by stracing
> ffmpeg-mux.  Since lower dirty ratios made it happen more often for you,
> my guess is the app is sending down unaligned writes.
>
> -chris





Re: Periodic frame losses when recording to btrfs volume with OBS

2018-01-22 Thread Chris Mason

On 01/22/2018 01:33 PM, Sebastian Ochmann wrote:

[ skipping to the traces ;) ]


> 2866 ffmpeg-mux D
> [] btrfs_start_ordered_extent+0x101/0x130 [btrfs]
> [] lock_and_cleanup_extent_if_need+0x340/0x380 [btrfs]
> [] __btrfs_buffered_write+0x261/0x740 [btrfs]
> [] btrfs_file_write_iter+0x20f/0x650 [btrfs]
> [] __vfs_write+0xf9/0x170
> [] vfs_write+0xad/0x1a0
> [] SyS_write+0x52/0xc0
> [] entry_SYSCALL_64_fastpath+0x1a/0x7d
> [] 0x


This is where we wait for writes that are already in flight before we're 
allowed to redirty those pages in the file.  It'll happen when we either 
overwrite a page in the file that we've already written, or when we're 
trickling down writes slowly in non-4K aligned writes.


You can probably figure out pretty quickly which is the case by stracing 
ffmpeg-mux.  Since lower dirty ratios made it happen more often for you, 
my guess is the app is sending down unaligned writes.
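
Once the write sizes have been collected (for example from strace output),
a quick check like the following Python sketch can tell the two cases apart.
It assumes a 4 KiB page size and that the application appends sequentially
starting at offset 0; the function name and the sample sizes are
illustrative, not taken from the actual trace.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def unaligned_writes(lengths, start_offset=0):
    """Walk a sequence of sequential write lengths and return the
    (offset, length) pairs that do not start and end on a page boundary."""
    offset = start_offset
    unaligned = []
    for length in lengths:
        if offset % PAGE_SIZE or length % PAGE_SIZE:
            unaligned.append((offset, length))
        offset += length
    return unaligned


# Page-sized appends never touch a page that is already under writeback.
aligned_case = unaligned_writes([4096, 8192, 4096])

# One odd-sized write leaves every later write straddling a partial page.
odd_case = unaligned_writes([4096, 1000, 4096])
```

If most writes show up in the unaligned list, the stall is more likely the
"trickling down" case than an overwrite of already written pages.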


-chris




Re: Periodic frame losses when recording to btrfs volume with OBS

2018-01-22 Thread Sebastian Ochmann
First off, thank you for all the responses! Let me reply to multiple 
suggestions at once in this mail.


On 22.01.2018 01:39, Qu Wenruo wrote:

> Either such mount option has a bug, or some unrelated problem.
>
> As you mentioned the output is about 10~50MiB/s, 30s means 300~1500MiB.
> Maybe it's related to the dirty data amount?
>
> Would you please verify if a lower or higher profile (resulting in a much
> smaller or larger data stream) would affect it?


A much lower rate seems to mitigate the problem somewhat, however I'm 
talking about low single-digit MB/s when the problem seems to vanish. 
But even with low, but more realistic, amounts of data the drops still 
happen.



> Despite that, I'll dig to see if commit= option has any bug.
>
> And you could also try the nospace_cache mount option provided by Chris
> Murphy, which may also help.


I tried the nospace_cache option but it doesn't seem to make a 
difference to me.


On 22.01.2018 15:27, Chris Mason wrote:
> This could be a few different things, trying without the space cache was
> already suggested, and that's a top suspect.
>
> How does the application do the writes?  Are they always 4K aligned or
> does it send them out in odd sizes?
>
> The easiest way to nail it down is to use offcputime from the iovisor
> project:
>
>
> https://github.com/iovisor/bcc/blob/master/tools/offcputime.py
>
> If you haven't already configured this it may take a little while, but
> it's the perfect tool for this problem.
>
> Otherwise, if the stalls are long enough you can try to catch it with
> /proc/<pid>/stack.  I've attached a helper script I often use to dump
> the stack trace of all the tasks in D state.
>
> Just run walker.py and it should give you something useful.  You can use
> walker.py -a to see all the tasks instead of just D state.  This just
> walks /proc/<pid>/stack, so you'll need to run it as someone with
> permissions to see the stack traces of the procs you care about.
>
> -chris

I tried the walker.py script and was able to catch stack traces when the 
lag happens. I'm pasting two traces at the end of this mail - one when 
it happened using a USB-connected HDD and one when it happened on a SATA 
SSD. The latter is encrypted, hence the dmcrypt_write process. Note 
however that my original problem appeared on a SSD that was not encrypted.


In reply to the mail by Duncan:

> 64 GB RAM...
>
> Do you know about the /proc/sys/vm/dirty_* files and how to use/tweak
> them?  If not, read $KERNDIR/Documentation/sysctl/vm.txt, focusing on
> these files.


At least I have never tweaked those settings yet. I certainly didn't 
know about the foreground/background distinction, that is really 
interesting. Thank you for the very extensive info and guide btw!


> So try setting something a bit more reasonable and see if it helps.  That
> 1% ratio at 16 GiB RAM for ~160 MB was fine for me, but I'm not doing
> critical streaming, and at 64 GiB you're looking at ~640 MB per 1%, as I
> said, too chunky.  For streaming, I'd suggest something approaching the
> value of your per-second IO bandwidth, we're assuming 100 MB/sec here so
> 100 MiB but let's round that up to a nice binary 128 MiB, for the
> background value, perhaps half a GiB or 5 seconds worth of writeback time
> for foreground, 4 times the background value.  So:
>
>
> vm.dirty_background_bytes = 134217728   # 128*1024*1024, 128 MiB
> vm.dirty_bytes = 536870912  # 512*1024*1024, 512 MiB
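
The arithmetic behind these two values can be sketched as follows. This is only an illustration of the rule of thumb quoted above (a background threshold of about one second of write bandwidth, rounded up to a power of two, and a foreground threshold of four times that). The function name suggest_dirty_bytes and its parameters are made up for this example.

```python
MIB = 1024 * 1024


def suggest_dirty_bytes(write_bytes_per_sec, foreground_factor=4):
    """Return suggested vm.dirty_* byte values for a given write rate."""
    # Round the per-second bandwidth up to the next power of two.
    background = 1
    while background < write_bytes_per_sec:
        background *= 2
    return {
        "vm.dirty_background_bytes": background,
        "vm.dirty_bytes": background * foreground_factor,
    }


if __name__ == "__main__":
    # 100 MiB/s is the bandwidth assumed in the quoted advice.
    for key, value in suggest_dirty_bytes(100 * MIB).items():
        print("%s = %d" % (key, value))
```

For 100 MiB/s this reproduces the two values quoted above (134217728 and 536870912).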


Now I have good and bad news. The good news is that setting these 
tunables to different values does change something. The bad news is that 
lowering these values only seems to let the lag and frame drops happen 
quicker/more frequently. I have also tried lowering the background bytes 
to, say, 128 MB but the non-background bytes to 1 or 2 GB, but even the 
background task seems to already have a bad enough effect to start 
dropping frames. :( When writing to the SSD, the effect seems to be 
mitigated a little bit, but still frame drops are quickly occurring 
which is unacceptable given that the system is generally able to do better.


By the way, as you can see from the stack traces, in the SSD case blk_mq 
is in use.


> But I know less about that stuff and it's googlable, should you decide to
> try playing with it too.  I know what the dirty_* stuff does from
> personal experience. =:^)


"I know what the dirty_* stuff does from personal experience. =:^)" 
sounds quite interesting... :D



Best regards and thanks again
Sebastian


First stack trace:

690 usb-storage D
[] usb_sg_wait+0xf4/0x150 [usbcore]
[] usb_stor_bulk_transfer_sglist.part.1+0x63/0xb0 [usb_storage]
[] usb_stor_bulk_srb+0x49/0x80 [usb_storage]
[] usb_stor_Bulk_transport+0x163/0x3d0 [usb_storage]
[] usb_stor_invoke_transport+0x37/0x4c0 [usb_storage]
[] usb_stor_control_thread+0x1d8/0x2c0 [usb_storage]
[] kthread+0x118/0x130
[] ret_from_fork+0x1f/0x30
[] 0x

2505 kworker/u16:2 D
[] io_schedule+0x12/0x40
[] wbt_wait+0x1b8/0x340
[] blk_mq_make_request+0xe6/0x6e0
[] 

Re: [PATCH] btrfs: remove spurious WARN_ON(ref->count) in find_parent_nodes

2018-01-22 Thread Zygo Blaxell
On Mon, Jan 22, 2018 at 11:34:52AM +0800, Lu Fengqi wrote:
> On Sun, Jan 21, 2018 at 02:08:58PM -0500, Zygo Blaxell wrote:
> >This warning appears during execution of the LOGICAL_INO ioctl and
> >appears to be spurious:
> >
> > [ cut here ]
> > WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 
> > find_parent_nodes+0xc41/0x14e0
> > Modules linked in: ib_iser rdma_cm iw_cm ib_cm ib_core configfs 
> > iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi overlay r8169 ufs qnx4 
> > hfsplus hfs minix ntfs vfat msdos fat jfs xfs cpuid rpcsec_gss_krb5 nfsv4 
> > nfsv3 nfs fscache algif_skcipher af_alg softdog nfsd auth_rpcgss nfs_acl 
> > lockd grace sunrpc bnep cpufreq_userspace cpufreq_powersave 
> > cpufreq_conservative nfnetlink_queue nfnetlink_log nfnetlink bluetooth 
> > rfkill snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_oss 
> > snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device binfmt_misc fuse nbd 
> > xt_REDIRECT nf_nat_redirect ipt_REJECT nf_reject_ipv4 xt_nat xt_conntrack 
> > xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_conntrack_ipv6 
> > nf_defrag_ipv6 nf_nat_ipv6 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
> > nf_nat_ipv4 nf_nat nf_conntrack
> >  ip6table_mangle iptable_mangle ip6table_filter ip6_tables 
> > iptable_filter ip_tables x_tables tcp_cubic dummy lp dm_crypt edac_mce_amd 
> > edac_core snd_hda_codec_hdmi ppdev kvm_amd kvm irqbypass crct10dif_pclmul 
> > crc32_pclmul ghash_clmulni_intel snd_hda_codec_via pcbc amdkfd 
> > snd_hda_codec_generic amd_iommu_v2 aesni_intel snd_hda_intel radeon 
> > snd_hda_codec aes_x86_64 snd_hda_core snd_hwdep crypto_simd glue_helper sg 
> > snd_pcm_oss cryptd input_leds joydev pcspkr serio_raw snd_mixer_oss 
> > rtc_cmos snd_pcm parport_pc parport shpchp wmi acpi_cpufreq evdev snd_timer 
> > asus_atk0110 k10temp fam15h_power snd soundcore sp5100_tco hid_generic ipv6 
> > af_packet crc_ccitt raid10 raid456 async_raid6_recov async_memcpy async_pq 
> > async_xor async_tx libcrc32c raid0 multipath linear dm_mod raid1 md_mod 
> > ohci_pci ide_pci_generic
> >  sr_mod cdrom pdc202xx_new ohci_hcd crc32c_intel atiixp ehci_pci 
> > psmouse ide_core i2c_piix4 ehci_hcd xhci_pci mii xhci_hcd [last unloaded: 
> > r8169]
> > CPU: 3 PID: 18172 Comm: bees Tainted: G  D WL  4.11.9-zb64+ #1
> > Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, 
> > BIOS 210112/02/2014
> > Call Trace:
> >  dump_stack+0x85/0xc2
> >  __warn+0xd1/0xf0
> >  warn_slowpath_null+0x1d/0x20
> >  find_parent_nodes+0xc41/0x14e0
> >  __btrfs_find_all_roots+0xad/0x120
> >  ? extent_same_check_offsets+0x70/0x70
> >  iterate_extent_inodes+0x168/0x300
> >  iterate_inodes_from_logical+0x87/0xb0
> >  ? iterate_inodes_from_logical+0x87/0xb0
> >  ? extent_same_check_offsets+0x70/0x70
> >  btrfs_ioctl+0x8ac/0x2820
> >  ? lock_acquire+0xc2/0x200
> >  do_vfs_ioctl+0x91/0x700
> >  ? __fget+0x112/0x200
> >  SyS_ioctl+0x79/0x90
> >  entry_SYSCALL_64_fastpath+0x23/0xc6
> > RIP: 0033:0x7f727b20be07
> > RSP: 002b:7f7279f1e018 EFLAGS: 0246 ORIG_RAX: 0010
> > RAX: ffda RBX: 9c0f4d7f RCX: 7f727b20be07
> > RDX: 7f7279f1e118 RSI: c0389424 RDI: 0003
> > RBP: 0035 R08: 7f72581bf340 R09: 
> > R10: 0020 R11: 0246 R12: 0040
> > R13: 7f725818d230 R14: 7f7279f1b640 R15: 7f725820
> >  ? trace_hardirqs_off_caller+0x1f/0x140
> > ---[ end trace 5de243350f6762c6 ]---
> > [ cut here ]
> >
> >ref->count can be below zero under normal conditions (for delayed refs),
> >so there is no need to spam dmesg when it happens.
> >
> 
> Added Edmund.
> 
> Hi,
> 
> I've also encountered the same problem when running the test case
> xfstests/btrfs/004. However, I'm not sure whether the negative ref->count
> is reasonable.
> 
> IMO, these functions (such as add_delayed_refs, add_inline_refs,
> add_keyed_refs, add_missing_keys and resolve_indirect_refs) have been
> executed at this point in time. Hence, these references not only include
> these refs in the memory (delayed) but also include those refs in the disk
> (inline/keyed). 

I don't have the complete picture, but while looking at other code, comments,
and git log messages surrounding ref->count in btrfs, I found:

  * ref->count starts off at -1 (for a delayed ref).  During the process
of becoming a non-delayed ref, ref->count is incremented until it
is positive.

  * refs are only deleted during a positive-to-zero transition of
ref->count, not a negative-to-zero transition.

  * ref->count is sometimes incremented by more than one, specifically 2
in some cases (e.g. when the ref is attached to an inode?).
This would skip directly from ref->count == -1 to ref->count ==
1 without 

Re: [PATCH] btrfs: remove spurious WARN_ON(ref->count) in find_parent_nodes

2018-01-22 Thread Zygo Blaxell
On Mon, Jan 22, 2018 at 09:06:23PM +0800, Lu Fengqi wrote:
> On Mon, Jan 22, 2018 at 02:38:42PM +0200, Nikolay Borisov wrote:
> >
> >
> >On 22.01.2018 14:19, Lu Fengqi wrote:
> >> On 01/22/2018 04:46 PM, Nikolay Borisov wrote:
> >>>
> >>>
> >>> On 22.01.2018 05:34, Lu Fengqi wrote:
> >>>> According to my bisect result, The frequency of the warning occurrence
> >>>> increased to the detectable degree after this patch
> >>>
> >>> That sentence implies that even before Ed's patch it was possible to
> >>> trigger those warnings, is that true? Personally I've never seen such
> >>> warnings while executing btrfs/004. How do you configure the filesystem
> >>> for the test runs?
> >>>
> >> 
> >> Just only default mount option.
> >> 
> >> ➜  xfstests-dev git:(master) for i in $(seq 1 100); do echo $i; if !
> >> sudo ./check btrfs/004; then break; fi; done
> >> 1
> >> 
> >> FSTYP -- btrfs
> >> 
> >> PLATFORM  -- Linux/x86_64 sarch 4.15.0-rc9
> >> 
> >> MKFS_OPTIONS  -- /dev/vdd1
> >> 
> >> MOUNT_OPTIONS -- /dev/vdd1 /mnt/scratch
> >> 
> >> 
> >> 
> >> 
> >> btrfs/004 47s ... 49s
> >> 
> >> Ran: btrfs/004
> >> 
> >> Passed all 1 tests
> >> 
> >> 
> >> 
> >> 
> >> 2
> >> 
> >> FSTYP -- btrfs
> >> 
> >> PLATFORM  -- Linux/x86_64 sarch 4.15.0-rc9
> >> 
> >> MKFS_OPTIONS  -- /dev/vdd1
> >> 
> >> MOUNT_OPTIONS -- /dev/vdd1 /mnt/scratch
> >> 
> >> 
> >> 
> >> 
> >> btrfs/004 49s ... 52s
> >> 
> >> _check_dmesg: something found in dmesg (see
> >> /home/luke/workspace/xfstests-dev/results//btrfs/004.dmesg)
> >> 
> >> Ran: btrfs/004
> >> 
> >> Failures: btrfs/004
> >> 
> >> Failed 1 of 1 tests
> >> 
> >> The probability of this warning appearing is rather low, and I only
> >> encountered 52 warnings when I looped 1008 times btrfs/004 for 20 hours
> >> in 4.15-rc6 (IOW, the probability is nearly 5%). So you want to trigger
> >> warning also need more luck or patience.
> >
> >Thanks but is this before or after the mentioned commit below?
> >
> 
> After this commit. The bisect condition I use to locate this commit is
> to repeat btrfs/004 20 times without warning (This may not be accurate enough,
> can only be used as a reference). 

I have been seeing this warning since at least 2015 (v3.18?),
possibly earlier.  In the past it has never been correlated to any
event I've needed to take action to correct (i.e. no data corruption,
no crashes, no hangs, no filesystem damage, and no obvious functional
failures in userspace).

In v4.14 nothing seems to have changed, except the warning now appears
three orders of magnitude more often.  This spams console terminals and
kernel logs with gigabytes of stacktrace and bumps this phenomenon up
to the top of my priority list.

It looks like the warning has been there with only minor editorial changes
since Jan Schmidt's 2011 commit "Btrfs: added btrfs_find_all_roots()"
in v3.3-rc1.

> Maybe Zygo has found a finer way to reproduce
> it, so he reproduce this warning more frequently than me.

It's not really a finer way, but bees hits this warning most often,
sometimes many times per second in bursts lasting minutes at a time.

btrfs balance also hits the warning occasionally (it was the most common
trigger of that warning in 2015 before I was running bees everywhere).

The net effect of the bees worker loop looks fairly similar to btrfs/004,
basically calling LOGICAL_INO many times per second on a busy filesystem.

bees focuses its activity on active parts of the filesystem, which
means it's more likely to do backref walks against extents that are also
being affected by user activity and therefore more likely to encounter
delayed refs.

Contrast with 'btrfs balance' which spreads its effect across the entire
filesystem and is much less likely to collide with user activity.

Every duplicate extent hit in bees uses LOGICAL_INO at least once to map
a stored duplicate block bytenr back to something that can be passed to
open() and FILE_EXTENT_SAME.  The warnings do arrive in bursts at the
same time as bees hitting clusters of duplicate extents.



> >
> >> 
> >>>> 86d5f9944252 ("btrfs: convert prelimary reference tracking to use
> >>>> rbtrees")
> >>>> is committed. I understand that this does not mean that this patch
> >>>> caused
> >>>> the problem, but maybe Edmund can give us some help, so I added him
> >>>> to the
> >>>> recipient.
> >>>
> >>>
> >> 
> >> 
> >
> >
> 
> -- 
> Thanks,
> Lu
> 
> 




Re: [PATCH] btrfs: remove spurious WARN_ON(ref->count) in find_parent_nodes

2018-01-22 Thread Edmund Nadolski


On 01/21/2018 08:34 PM, Lu Fengqi wrote:
> On Sun, Jan 21, 2018 at 02:08:58PM -0500, Zygo Blaxell wrote:
>> This warning appears during execution of the LOGICAL_INO ioctl and
>> appears to be spurious:
>>
>>  [ cut here ]
>>  WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 
>> find_parent_nodes+0xc41/0x14e0
>>  Modules linked in: ib_iser rdma_cm iw_cm ib_cm ib_core configfs 
>> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi overlay r8169 ufs qnx4 
>> hfsplus hfs minix ntfs vfat msdos fat jfs xfs cpuid rpcsec_gss_krb5 nfsv4 
>> nfsv3 nfs fscache algif_skcipher af_alg softdog nfsd auth_rpcgss nfs_acl 
>> lockd grace sunrpc bnep cpufreq_userspace cpufreq_powersave 
>> cpufreq_conservative nfnetlink_queue nfnetlink_log nfnetlink bluetooth 
>> rfkill snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_oss snd_seq_midi_event 
>> snd_rawmidi snd_seq snd_seq_device binfmt_misc fuse nbd xt_REDIRECT 
>> nf_nat_redirect ipt_REJECT nf_reject_ipv4 xt_nat xt_conntrack xt_tcpudp 
>> nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_conntrack_ipv6 
>> nf_defrag_ipv6 nf_nat_ipv6 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
>> nf_nat_ipv4 nf_nat nf_conntrack
>>   ip6table_mangle iptable_mangle ip6table_filter ip6_tables 
>> iptable_filter ip_tables x_tables tcp_cubic dummy lp dm_crypt edac_mce_amd 
>> edac_core snd_hda_codec_hdmi ppdev kvm_amd kvm irqbypass crct10dif_pclmul 
>> crc32_pclmul ghash_clmulni_intel snd_hda_codec_via pcbc amdkfd 
>> snd_hda_codec_generic amd_iommu_v2 aesni_intel snd_hda_intel radeon 
>> snd_hda_codec aes_x86_64 snd_hda_core snd_hwdep crypto_simd glue_helper sg 
>> snd_pcm_oss cryptd input_leds joydev pcspkr serio_raw snd_mixer_oss rtc_cmos 
>> snd_pcm parport_pc parport shpchp wmi acpi_cpufreq evdev snd_timer 
>> asus_atk0110 k10temp fam15h_power snd soundcore sp5100_tco hid_generic ipv6 
>> af_packet crc_ccitt raid10 raid456 async_raid6_recov async_memcpy async_pq 
>> async_xor async_tx libcrc32c raid0 multipath linear dm_mod raid1 md_mod 
>> ohci_pci ide_pci_generic
>>   sr_mod cdrom pdc202xx_new ohci_hcd crc32c_intel atiixp ehci_pci 
>> psmouse ide_core i2c_piix4 ehci_hcd xhci_pci mii xhci_hcd [last unloaded: 
>> r8169]
>>  CPU: 3 PID: 18172 Comm: bees Tainted: G  D WL  4.11.9-zb64+ #1
>>  Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, 
>> BIOS 210112/02/2014
>>  Call Trace:
>>   dump_stack+0x85/0xc2
>>   __warn+0xd1/0xf0
>>   warn_slowpath_null+0x1d/0x20
>>   find_parent_nodes+0xc41/0x14e0
>>   __btrfs_find_all_roots+0xad/0x120
>>   ? extent_same_check_offsets+0x70/0x70
>>   iterate_extent_inodes+0x168/0x300
>>   iterate_inodes_from_logical+0x87/0xb0
>>   ? iterate_inodes_from_logical+0x87/0xb0
>>   ? extent_same_check_offsets+0x70/0x70
>>   btrfs_ioctl+0x8ac/0x2820
>>   ? lock_acquire+0xc2/0x200
>>   do_vfs_ioctl+0x91/0x700
>>   ? __fget+0x112/0x200
>>   SyS_ioctl+0x79/0x90
>>   entry_SYSCALL_64_fastpath+0x23/0xc6
>>  RIP: 0033:0x7f727b20be07
>>  RSP: 002b:7f7279f1e018 EFLAGS: 0246 ORIG_RAX: 0010
>>  RAX: ffda RBX: 9c0f4d7f RCX: 7f727b20be07
>>  RDX: 7f7279f1e118 RSI: c0389424 RDI: 0003
>>  RBP: 0035 R08: 7f72581bf340 R09: 
>>  R10: 0020 R11: 0246 R12: 0040
>>  R13: 7f725818d230 R14: 7f7279f1b640 R15: 7f725820
>>   ? trace_hardirqs_off_caller+0x1f/0x140
>>  ---[ end trace 5de243350f6762c6 ]---
>>  [ cut here ]
>>
>> ref->count can be below zero under normal conditions (for delayed refs),
>> so there is no need to spam dmesg when it happens.
>>
> 
> Added Edmund.
> 
> Hi,
> 
> I've also encountered the same problem when running the test case
> xfstests/btrfs/004. However, I'm not sure whether the negative ref->count
> is reasonable.
> 
> IMO, these functions (such as add_delayed_refs, add_inline_refs,
> add_keyed_refs, add_missing_keys and resolve_indirect_refs) have been
> executed at this point in time. Hence, these references not only include
> these refs in the memory (delayed) but also include those refs in the disk
> (inline/keyed). I would appreciate it if you could explain to me why the
> reference count can be reduced to less than zero.

It’s not clear to me that a negative count in this case is expected. For
direct refs the code gets the count from the btrfs_shared_data_ref (in
the item) or the ref_mod (in the delayed ref). For the latter it
_should_ convert a negative count to positive when using it as a
prelim_ref.  An indirect prelim_ref can also be converted from a direct
in which case it uses the count from the indirect.

I haven't seen this previously but will continue to test.  Please let me
know of any additional ways to trigger this.

Thanks,
Ed

> 
>> On kernel v4.14 this warning occurs 100-1000 

Re: [PATCH RESEND v4 0/4] device_list_add() peparation to add reappearing missing device

2018-01-22 Thread David Sterba
On Mon, Jan 22, 2018 at 09:31:47PM +0800, Anand Jain wrote:
>   Problem was mainly due to the patch 3/4, which tried to access the
>   return pointer even for the failed condition. The fix is to bring the
>   device point access under the else part as show below [2]. I have
>   included this fix in V5. Which is tested with btrfs xfstests.
>   Pls could you consider v5 for 4.16 ?

Hm ok, there's still some time to test it. One more fstests report that
appeared before and also with the v5:

btrfs/007 4s ...[16:38:09] [16:38:12] [failed, exit status 1] - output 
mismatch (see 
/root/test/mmtests/work/sources/xfstests-git-installed/results//btrfs/007.out.bad)
--- tests/btrfs/007.out 2017-09-20 14:24:58.334716658 +0200 
+++ 
/root/test/mmtests/work/sources/xfstests-git-installed/results//btrfs/007.out.bad
   2018-01-22 16:38:12.883931593 +0100
@@ -1,4 +1,5 @@
 QA output created by 007
 *** test send / receive
-*** done
+failed: '/root/test/mmtests/work/sources/xfstests-git-installed/src/fssum 
-r /tmp/tmp.eZcr17wqNn/incr.fssum /root/test/mmtests/scratch_mnt/incr'
+(see 
/root/test/mmtests/work/sources/xfstests-git-installed/results//btrfs/007.full 
for details)
 *** unmount
...
(Run 'diff -u tests/btrfs/007.out 
/root/test/mmtests/work/sources/xfstests-git-installed/results//btrfs/007.out.bad'
  to see the entire diff)


Re: btrfs check: backref lost, mismatch with its hash -- can't repair

2018-01-22 Thread ^m'e
Thanks for the quick reply, Qu!

I forgot to say that I see weird characters in the btrfs check repair
output, in the "ERROR: DIR_ITEM... name ..." lines. Although that could
be due to corruption, I seem to remember that a previous version of
btrfs-progs I used didn't show that...
I also see:

   [19428.934684] init_special_inode: bogus i_mode (700) for inode
sdb3:18446744073709551361

BTW, there are no sensible names in the debug output, and as far as I can
see, it might all be stuff in '[rootfs]/usr/portage'. If that's the case,
the corrupted inodes can be safely removed, as the portage package tree
can easily be rebuilt. Here you are:

-->8-
# cat btrfs-debug.30039322.log
location key (30037910 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 3
name: bam
item 52 key (30037720 DIR_ITEM 508462201) itemoff 14104 itemsize 40
location key (30039832 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 10
name: suse-build
item 53 key (30037720 DIR_ITEM 541125215) itemoff 14070 itemsize 34
location key (30038354 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 4
name: cram
item 54 key (30037720 DIR_ITEM 543235706) itemoff 14035 itemsize 35
location key (30039133 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 5
name: lsuio
item 55 key (30037720 DIR_ITEM 586823170) itemoff 14000 itemsize 35
location key (30038846 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 5
name: geany
item 56 key (30037720 DIR_ITEM 603413733) itemoff 13938 itemsize 62
location key (30039322 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 32
name: obs-service-download_src_package
item 57 key (30037720 DIR_ITEM 623694194) itemoff 13903 itemsize 35
location key (30038092 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 5
name: byacc
item 58 key (30037720 DIR_ITEM 637448305) itemoff 13868 itemsize 35
location key (43374420 INODE_ITEM 0) type DIR
transid 200308 data_len 0 name_len 5
name: vpuml
item 59 key (30037720 DIR_ITEM 660989717) itemoff 13828 itemsize 40
location key (30038283 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 10
name: comparator
item 60 key (30037720 DIR_ITEM 666000672) itemoff 13782 itemsize 46
location key (30039257 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 16
name: molecule-plugins
item 61 key (30037720 DIR_ITEM 679217690) itemoff 13749 itemsize 33
location key (36281336 INODE_ITEM 0) type DIR
--
location key (30039292 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 15
name: nvidia-cuda-sdk
item 73 key (30037720 DIR_INDEX 238) itemoff 13448 itemsize 49
location key (30039299 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 19
name: nvidia-cuda-toolkit
item 74 key (30037720 DIR_INDEX 239) itemoff 13411 itemsize 37
location key (30039309 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 7
name: objconv
item 75 key (30037720 DIR_INDEX 240) itemoff 13361 itemsize 50
location key (30039314 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 20
name: obs-service-cpanspec
item 76 key (30037720 DIR_INDEX 241) itemoff 13305 itemsize 56
location key (30039318 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 26
name: obs-service-download_files
item 77 key (30037720 DIR_INDEX 242) itemoff 13243 itemsize 62
location key (30039322 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 32
name: obs-service-download_src_package
item 78 key (30037720 DIR_INDEX 243) itemoff 13189 itemsize 54
location key (30039326 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 24
name: obs-service-download_url
item 79 key (30037720 DIR_INDEX 244) itemoff 13135 itemsize 54
location key (30039330 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 24
name: obs-service-extract_file
item 80 key (30037720 DIR_INDEX 245) itemoff 13077 itemsize 58
location key (30039334 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 28
name: obs-service-format_spec_file
item 81 key (30037720 DIR_INDEX 246) itemoff 13007 itemsize 70
location key (30039338 INODE_ITEM 0) type DIR
transid 136248 data_len 0 name_len 40
name: obs-service-generator_driver_update_disk
item 82 key (30037720 DIR_INDEX 247) itemoff 12953 itemsize 54
location key (30039342 INODE_ITEM 0) type DIR
--
mtime 1504685599.188061317 (2017-09-06 08:13:19)
otime 1504685599.188061317 (2017-09-06 08:13:19)
item 73 key (30039320 

Re: btrfs check: backref lost, mismatch with its hash -- can't repair

2018-01-22 Thread Qu Wenruo


On 2018年01月22日 22:11, ^m'e wrote:
> Hi there,
> 
> After resuming from hibernation, my system shows weirdness; the
> following dmesg line alerted me:
> 
>   Jan 22 08:10:33 [kernel] BTRFS critical (device sdb3): invalid dir
> item name + data len: 3 + 32907

This is a real problem.

No dir item should have such a large size, so it's definitely corruption.

But until you ls the dir containing the offending dir item, it shouldn't
cause a problem.

> 
> Although I can boot (root is BTRFS), (re-)mount the concerned FS,
> successfully scrub it (no error reported), I'm unable to repair... More
> info (using a sysrescue with a static btrfs-progs build -- the same
> installed on the main Funtoo system):
> 
>   # uname -a
>   Linux sysresccd 4.9.60-std512-amd64 #2 SMP Thu Nov 2 17:43:13 UTC
> 2017 x86_64 Intel(R) Core(TM) i7-4702MQ CPU @ 2.20GHz GenuineIntel
> GNU/Linux
> 
>   # ./btrfs.static --version
>   btrfs-progs v4.14.1
> 
>   # ./btrfs.static fi show
>Label: 'BTR-POOL'  uuid: de1723e2-150c-4448-bb36-be14d7d96093
> Total devices 1 FS bytes used 97.73GiB
> devid1 size 230.35GiB used 173.04GiB path /dev/sdb3
> 
>   # ./btrfs.static fi df /mnt/gentoo/
>   Data, single: total=167.00GiB, used=95.64GiB
>   System, single: total=32.00MiB, used=48.00KiB
>   Metadata, single: total=6.01GiB, used=2.09GiB
>   GlobalReserve, single: total=512.00MiB, used=0.00B
> 
>   # dmesg | grep BTRFS
>  [14963.234135] BTRFS info (device sdb3): disk space caching is enabled
>  [14963.234138] BTRFS info (device sdb3): has skinny extents
>  [14963.261054] BTRFS info (device sdb3): detected SSD devices,
> enabling SSD mode
>  [14963.261485] BTRFS info (device sdb3): checking UUID tree
> 
> 
> As for check and repair attempts:
> 
> --8<--
> # ./btrfs.static check -p --mode=lowmem --check-data-csum /dev/sdb3
> Checking filesystem on /dev/sdb3
> UUID: de1723e2-150c-4448-bb36-be14d7d96093
> ERROR: data extent[1862352896 425984] backref lost
> ERROR: data extent[1886453760 479232] backref lost
> ERROR: data extent[1902219264 524288] backref lost
> ERROR: data extent[1817378816 151552] backref lost
> ERROR: data extent[1799688192 57344] backref lost
> ERROR: data extent[1830277120 258048] backref lost
> ERROR: data extent[2558107648 1368064] backref lost

This is not the best way to verify a "backref lost" error.

The current lowmem mode is not 100% proven for detecting this kind of
error, so these may be false alerts unless original mode (without
--repair) reports the same problem.

But since your later original mode repair tries to insert extent items
for them, we know they are not false alerts.

btrfs-progs repair should handle these errors well, so they are not the
main concern.

> ERROR: errors found in extent allocation tree or chunk allocation
> cache and super generation don't match, space cache will be invalidated
> ERROR: root 257 DIR_ITEM[30039322 4007295565] name  namelen 0 filetype
> 0 mismatch with its hash, wanted 4007295565 have 4294967294

And this kind of error is what lowmem mode is better at.

One dir item's hash does not match its data, so either the data len or
the hash is corrupted.
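
For context on the "wanted ... have ..." numbers: as far as I know, the kernel's btrfs_name_hash() is crc32c over the name with a seed of ~1 (0xFFFFFFFE) and no final inversion, so an empty name hashes to 4294967294, which matches the "have" value reported for namelen 0 above. A minimal Python sketch under that assumption (crc32c_raw and its bitwise loop are illustrative only, not the btrfs-progs code):

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def crc32c_raw(seed, data):
    """Bitwise CRC32C without the final XOR, starting from seed."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc


def btrfs_name_hash(name):
    """Assumed btrfs dir item name hash: crc32c seeded with (u32)~1."""
    return crc32c_raw(0xFFFFFFFE, name)


if __name__ == "__main__":
    # With no name bytes the hash is just the seed.
    print(btrfs_name_hash(b""))
```

If that assumption holds, a DIR_ITEM whose key offset is 4007295565 but whose recorded name is empty can never pass the check, which is consistent with the name length having been overwritten.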

> ERROR: root 257 INODE_ITEM[0] index 18446744073709551615 name
> filetype 0 missing
> ERROR: root 257 DIR_ITEM[30039322 4007295565] data_len shouldn't be 32907

Paired with the first error, it's obvious the namelen is corrupted.

> ERROR: root 257 DIR_ITEM[30039322 4007295565] name  namelen 3 filetype
> 0 mismatch with its hash, wanted 4007295565 have 987418363
> ERROR: root 257 INODE_ITEM[0] index 18446744073709551615 name
> filetype 0 missing

There is also obvious index key corruption in INODE_REF.
The index is way too large for any sane fs.

So this is clearly corruption of a leaf (maybe several leaves).

And that's why btrfs uses DUP for its metadata by default.
(You are paying that cost because you chose the single metadata profile.)

> ERROR: root 1385 INODE_ITEM[0] index 18446744073709551615 name
> metadata.xml filetype 1 missing
> ERROR: root 1385 DIR_ITEM[30039322 4007295565] name  namelen 0
> filetype 0 mismatch with its hash, wanted 4007295565 have 4294967294
> ERROR: root 1385 INODE_ITEM[0] index 18446744073709551615 name
> filetype 0 missing
> ERROR: root 1385 DIR_ITEM[30039322 4007295565] data_len shouldn't be 32907
> ERROR: root 1385 DIR_ITEM[30039322 4007295565] name  namelen 3
> filetype 0 mismatch with its hash, wanted 4007295565 have 987418363
> ERROR: root 1385 INODE_ITEM[0] index 18446744073709551615 name
> filetype 0 missing
> ERROR: root 1385 DIR ITEM[30039322 2] name  filetype 1 missing
> ERROR: root 1385 INODE REF[30039323, 30039322] name  filetype 1 missing
> ERROR: root 1385 DIR ITEM[30039322 3] name metadata.xml filetype 1 mismath
> ERROR: root 1385 DIR ITEM[30039322 5] name Manifest filetype 1 mismath
> ERROR: root 1385 INODE_ITEM[47302014] index 6 name metadata.xml
> filetype 1 missing
> ERROR: root 1385 INODE_ITEM[47302015] index 7 name 

Re: Periodic frame losses when recording to btrfs volume with OBS

2018-01-22 Thread Chris Mason

On 01/20/2018 05:47 AM, Sebastian Ochmann wrote:

> Hello,
>
> I would like to describe a real-world use case where btrfs does not
> perform well for me. I'm recording 60 fps, larger-than-1080p video using
> OBS Studio [1] where it is important that the video stream is encoded
> and written out to disk in real-time for a prolonged period of time (2-5
> hours). The result is a H264 video encoded on the GPU with a data rate
> ranging from approximately 10-50 MB/s.
>
> The hardware used is powerful enough to handle this task. When I use a
> XFS volume for recording, no matter whether it's a SSD or HDD, the
> recording is smooth and no frame drops are reported (OBS has a nice
> Stats window where it shows the number of frames dropped due to encoding
> lag which seemingly also includes writing the data out to disk).
>
> However, when using a btrfs volume I quickly observe severe, periodic
> frame drops. It's not single frames but larger chunks of frames that are
> dropped at a time. I tried mounting the volume with nobarrier but to no
> avail.
>
> Of course, the simple fix is to use a FS that works for me(TM). However
> I thought since this is a common real-world use case I'd describe the
> symptoms here in case anyone is interested in analyzing this behavior.
> It's not immediately obvious that the FS makes such a difference. Also,
> if anyone has an idea what I could try to mitigate this issue (mount or
> mkfs options?) I can try that.
>
> I saw this behavior on two different machines with kernels 4.14.13 and
> 4.14.5, both Arch Linux. btrfs-progs 4.14, OBS 20.1.3-241-gf5c3af1b
> built from git.




This could be a few different things. Trying without the space cache was
already suggested, and that's a top suspect.


How does the application do the writes?  Are they always 4K aligned or 
does it send them out in odd sizes?


The easiest way to nail it down is to use offcputime from the iovisor 
project:



https://github.com/iovisor/bcc/blob/master/tools/offcputime.py

If you haven't already configured this it may take a little while, but 
it's the perfect tool for this problem.


Otherwise, if the stalls are long enough you can try to catch it with 
/proc/<pid>/stack.  I've attached a helper script I often use to dump
the stack trace of all the tasks in D state.


Just run walker.py and it should give you something useful.  You can use 
walker.py -a to see all the tasks instead of just D state.  This just 
walks /proc/<pid>/stack, so you'll need to run it as someone with
permissions to see the stack traces of the procs you care about.
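
The core idea (find tasks in D state and read their kernel stacks from /proc) can be sketched roughly as below. This is not the attached walker.py, only a minimal illustration: the names parse_stat and d_state_stacks are made up here, and the sketch looks at processes only, not at the individual threads under /proc/<pid>/task.

```python
import os


def parse_stat(stat_text):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so search for the first '(' and the last ')'.
    lparen = stat_text.index('(')
    rparen = stat_text.rindex(')')
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_stacks(proc_root="/proc"):
    """Return (pid, comm, stack) for every process currently in D state."""
    results = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return results
    for entry in entries:
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            # Reading the stack file usually requires root.
            with open(os.path.join(base, "stack")) as f:
                stack = f.read()
        except (OSError, ValueError, IndexError):
            # The task exited in the meantime or we lack permission.
            continue
        results.append((pid, comm, stack))
    return results


if __name__ == "__main__":
    for pid, comm, stack in d_state_stacks():
        print("%d %s D" % (pid, comm))
        print(stack)
```

Because stalls are short, a sketch like this would have to be run in a loop while the problem is happening in order to catch anything.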


-chris


#!/usr/bin/env python3
#
# this walks all the tasks on the system and prints out a stack trace
# of any tasks waiting in D state.  If you pass -a, it will print out
# the stack of every task it finds.
#
# It also makes a histogram of the common stacks so you can see where
# more of the tasks are.
#

import sys
import os

from optparse import OptionParser

usage = "usage: %prog [options]"
parser = OptionParser(usage=usage)
parser.add_option("-a", "--all_tasks", help="Dump all stacks", default=False,
  action="store_true")
parser.add_option("-s", "--smaps", help="Dump /proc/pid/smaps", default=False,
  action="store_true")
parser.add_option("-S", "--sort", help="/proc/pid/smaps sort key",
  default="size", type="str")
parser.add_option("-p", "--pid", help="Filter on pid", default=None,
 type="str")
parser.add_option("-c", "--command", help="Filter on command name",
 default=None, type="str")
parser.add_option("-f", "--files", help="don't collapse open files",
  default=False, action="store_true")
parser.add_option("-v", "--verbose", help="details++", default=False,
  action="store_true")
(options, args) = parser.parse_args()

stacks = {}

maps = {}

lines = []

# parse the units from a number and normalize into KB
def parse_number(s):
    try:
        words = s.split()
        unit = words[-1].lower()
        number = int(words[1])
        tag = words[0].lower().rstrip(':')

        # we store in kb
        if unit == "mb":
            number = number * 1024
        elif unit == "gb":
            number = number * 1024 * 1024
        elif unit == "tb":
            number = number * 1024 * 1024 * 1024

        return (tag, number)
    except (IndexError, ValueError):
        # not a "Tag: <number> <unit>" line
        return (None, None)

# pretty print a number in KB with units
def print_number(num):
    # we store the number in kb
    units = ['KB', 'MB', 'GB', 'TB']
    index = 0

    # scale down, but stop at the largest unit we know about
    while num > 1024 and index < len(units) - 1:
        num /= 1024
        index += 1

    return (float(num), units[index])

# find a given line in the record and pretty print it
def print_line(header, record, tag):
    num, unit = print_number(record[tag])

    if options.verbose:
        header = header + " "
    else:
        header = ""

    print("\t%s%s: %.2f %s" % (header, tag.capitalize(), num, unit))


# print all the lines we care about in a given record
def 

btrfs check: backref lost, mismatch with its hash -- can't repair

2018-01-22 Thread ^m'e
Hi there,

After resuming from hibernation, my system shows weirdness; the
following dmesg line alerted me:

  Jan 22 08:10:33 [kernel] BTRFS critical (device sdb3): invalid dir
item name + data len: 3 + 32907

Although I can boot (root is BTRFS), (re-)mount the concerned FS, and
successfully scrub it (no error reported), I'm unable to repair it. More
info (using a sysrescue with a static btrfs-progs build, the same version
installed on the main Funtoo system):

  # uname -a
  Linux sysresccd 4.9.60-std512-amd64 #2 SMP Thu Nov 2 17:43:13 UTC
2017 x86_64 Intel(R) Core(TM) i7-4702MQ CPU @ 2.20GHz GenuineIntel
GNU/Linux

  # ./btrfs.static --version
  btrfs-progs v4.14.1

  # ./btrfs.static fi show
  Label: 'BTR-POOL'  uuid: de1723e2-150c-4448-bb36-be14d7d96093
        Total devices 1 FS bytes used 97.73GiB
        devid    1 size 230.35GiB used 173.04GiB path /dev/sdb3

  # ./btrfs.static fi df /mnt/gentoo/
  Data, single: total=167.00GiB, used=95.64GiB
  System, single: total=32.00MiB, used=48.00KiB
  Metadata, single: total=6.01GiB, used=2.09GiB
  GlobalReserve, single: total=512.00MiB, used=0.00B

  # dmesg | grep BTRFS
 [14963.234135] BTRFS info (device sdb3): disk space caching is enabled
 [14963.234138] BTRFS info (device sdb3): has skinny extents
 [14963.261054] BTRFS info (device sdb3): detected SSD devices,
enabling SSD mode
 [14963.261485] BTRFS info (device sdb3): checking UUID tree


As for check and repair attempts:

--8<--
# ./btrfs.static check -p --mode=lowmem --check-data-csum /dev/sdb3
Checking filesystem on /dev/sdb3
UUID: de1723e2-150c-4448-bb36-be14d7d96093
ERROR: data extent[1862352896 425984] backref lost
ERROR: data extent[1886453760 479232] backref lost
ERROR: data extent[1902219264 524288] backref lost
ERROR: data extent[1817378816 151552] backref lost
ERROR: data extent[1799688192 57344] backref lost
ERROR: data extent[1830277120 258048] backref lost
ERROR: data extent[2558107648 1368064] backref lost
ERROR: errors found in extent allocation tree or chunk allocation
cache and super generation don't match, space cache will be invalidated
ERROR: root 257 DIR_ITEM[30039322 4007295565] name  namelen 0 filetype
0 mismatch with its hash, wanted 4007295565 have 4294967294
ERROR: root 257 INODE_ITEM[0] index 18446744073709551615 name
filetype 0 missing
ERROR: root 257 DIR_ITEM[30039322 4007295565] data_len shouldn't be 32907
ERROR: root 257 DIR_ITEM[30039322 4007295565] name  namelen 3 filetype
0 mismatch with its hash, wanted 4007295565 have 987418363
ERROR: root 257 INODE_ITEM[0] index 18446744073709551615 name
filetype 0 missing
ERROR: root 1385 INODE_ITEM[0] index 18446744073709551615 name
metadata.xml filetype 1 missing
ERROR: root 1385 DIR_ITEM[30039322 4007295565] name  namelen 0
filetype 0 mismatch with its hash, wanted 4007295565 have 4294967294
ERROR: root 1385 INODE_ITEM[0] index 18446744073709551615 name
filetype 0 missing
ERROR: root 1385 DIR_ITEM[30039322 4007295565] data_len shouldn't be 32907
ERROR: root 1385 DIR_ITEM[30039322 4007295565] name  namelen 3
filetype 0 mismatch with its hash, wanted 4007295565 have 987418363
ERROR: root 1385 INODE_ITEM[0] index 18446744073709551615 name
filetype 0 missing
ERROR: root 1385 DIR ITEM[30039322 2] name  filetype 1 missing
ERROR: root 1385 INODE REF[30039323, 30039322] name  filetype 1 missing
ERROR: root 1385 DIR ITEM[30039322 3] name metadata.xml filetype 1 mismath
ERROR: root 1385 DIR ITEM[30039322 5] name Manifest filetype 1 mismath
ERROR: root 1385 INODE_ITEM[47302014] index 6 name metadata.xml
filetype 1 missing
ERROR: root 1385 INODE_ITEM[47302015] index 7 name metadata.xml
filetype 1 missing
ERROR: root 1385 DIR INODE [30039322] size 152 not equal to 163
ERROR: errors found in fs roots
found 104939749376 bytes used, error(s) found
total csum bytes: 100264520
total tree bytes: 15521644544
total fs tree bytes: 15299624960
total extent tree bytes: 88293376
btree space waste bytes: 3468278611
file data blocks allocated: 373752188928
 referenced 322517573632
--8<--
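
A side note on the "have 4294967294" values in the lowmem output above: that number is 0xFFFFFFFE. Assuming the btrfs directory name hash is a raw CRC32C (no final inversion) seeded with ~1, which is how btrfs_name_hash() is usually described and should be treated as an assumption here, a zero-length name hashes to the seed itself. That would be consistent with the "namelen 0" reported on the same lines. The sketch below uses a plain bitwise CRC32C; the function names are illustrative only.

```python
# Sketch only: assumes the btrfs directory name hash is a raw CRC32C
# (Castagnoli polynomial, no final inversion) seeded with ~1 = 0xFFFFFFFE.
CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def crc32c_raw(seed, data):
    """Bitwise CRC32C update without the usual initial/final XOR."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc


def name_hash(name):
    """Hypothetical stand-in for the hash of a directory entry name."""
    return crc32c_raw(0xFFFFFFFE, name)


print(name_hash(b""))  # an empty name hashes to the seed: 4294967294
```

The "wanted" value on those lines is the offset of the DIR_ITEM key, so a mismatch of this shape suggests the item's name length or name bytes were damaged rather than the key.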


--8<--
# ./btrfs.static check -p  --repair /dev/sdb3
Fixed 0 roots.
enabling repair mode
Checking filesystem on /dev/sdb3
UUID: de1723e2-150c-4448-bb36-be14d7d96093
ref mismatch on [1799688192 57344] extent item 2, found 1
Incorrect local backref count on 1799688192 parent 186517012480 owner
0 offset 0 found 0 wanted 1 back 0xde7d020
Backref disk bytenr does not match extent record, bytenr=1799688192,
ref bytenr=0
backpointer mismatch on [1799688192 57344]
repair deleting extent record: key 1799688192 168 57344
adding new data backref on 1799688192 root 257 owner 47301992 offset 0 found 1
Repaired extent references for 1799688192
ref mismatch on [1817378816 151552] extent item 2, found 1
Incorrect local backref count on 1817378816 parent 186517012480 owner
0 

[PATCH v5 0/4] device_list_add() preparation to add reappearing missing device

2018-01-22 Thread Anand Jain
(Apply on top of my patchset
   [PATCH v4 0/6] preparatory work to add device forget
 for conflict free apply. They don't actually depend on
 each other though).

v4->v5:
 @3/4: Fix null pointer dereference of device pointer in fn
 btrfs_scan_one_device() when device_list_add() fails.

v3->v4:
 @3/4: Just return device instead of PTR_ERR(ERR_PTR(device));

v2->v3:
 Fix device_list_add() fn description which was still referring to the
 previous return values.

v1->v2:
 Drop patch 5/5 for uuid_mutex optimize. That was wrong. Thanks Josef.
 In patch 3/5 make btrfs_device * as return.

Clean up device_list_add(), mainly in preparation for handling a
reappearing missing device. The next reroll of that work will be sent
separately.



Anand Jain (4):
  btrfs: move pr_info into device_list_add
  btrfs: set the total_devices in device_list_add()
  btrfs: get device pointer from device_list_add()
  btrfs: drop devid as device_list_add() arg

 fs/btrfs/volumes.c | 65 +++---
 1 file changed, 28 insertions(+), 37 deletions(-)

-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] btrfs: drop devid as device_list_add() arg

2018-01-22 Thread Anand Jain
Since struct btrfs_super_block (disk_super) is already being passed,
device_list_add() can get devid from it the same way its caller does.

Signed-off-by: Anand Jain 
Reviewed-by: Josef Bacik 
---
 fs/btrfs/volumes.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a86c3a14ec89..03f2685a5018 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -733,12 +733,13 @@ static int btrfs_open_one_device(struct btrfs_fs_devices 
*fs_devices,
  * error pointer when failed
  */
 static noinline struct btrfs_device *device_list_add(const char *path,
-  struct btrfs_super_block *disk_super, u64 devid)
+  struct btrfs_super_block *disk_super)
 {
struct btrfs_device *device;
struct btrfs_fs_devices *fs_devices;
struct rcu_string *name;
u64 found_transid = btrfs_super_generation(disk_super);
+   u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
 
fs_devices = find_fsid(disk_super->fsid);
if (!fs_devices) {
@@ -1186,7 +1187,6 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
struct block_device *bdev;
struct page *page;
int ret = 0;
-   u64 devid;
u64 bytenr;
 
/*
@@ -1207,10 +1207,8 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
goto error_bdev_put;
}
 
-   devid = btrfs_stack_device_id(&disk_super->dev_item);
-
mutex_lock(&uuid_mutex);
-   device = device_list_add(path, disk_super, devid);
+   device = device_list_add(path, disk_super);
mutex_unlock(&uuid_mutex);
if (IS_ERR(device))
ret = PTR_ERR(device);
-- 
2.7.0



[PATCH 3/4] btrfs: get device pointer from device_list_add()

2018-01-22 Thread Anand Jain
Instead of passing a pointer to btrfs_fs_devices as an argument to
device_list_add(), it is better to return a pointer to btrfs_device.
The caller then has both btrfs_device and, through it, btrfs_fs_devices.
btrfs_device is needed to handle a reappearing missing device.

Signed-off-by: Anand Jain 
---
 fs/btrfs/volumes.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4a4298017fe1..a86c3a14ec89 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -729,12 +729,11 @@ static int btrfs_open_one_device(struct btrfs_fs_devices 
*fs_devices,
  * Add new device to list of registered devices
  *
  * Returns:
- * 0   - device already known or newly added
- * < 0 - error
+ * device pointer which was just added or updated when successful
+ * error pointer when failed
  */
-static noinline int device_list_add(const char *path,
-  struct btrfs_super_block *disk_super,
-  u64 devid, struct btrfs_fs_devices **fs_devices_ret)
+static noinline struct btrfs_device *device_list_add(const char *path,
+  struct btrfs_super_block *disk_super, u64 devid)
 {
struct btrfs_device *device;
struct btrfs_fs_devices *fs_devices;
@@ -745,7 +744,7 @@ static noinline int device_list_add(const char *path,
if (!fs_devices) {
fs_devices = alloc_fs_devices(disk_super->fsid);
if (IS_ERR(fs_devices))
-   return PTR_ERR(fs_devices);
+   return ERR_PTR(PTR_ERR(fs_devices));
 
list_add(&fs_devices->list, &fs_uuids);
 
@@ -757,19 +756,19 @@ static noinline int device_list_add(const char *path,
 
if (!device) {
if (fs_devices->opened)
-   return -EBUSY;
+   return ERR_PTR(-EBUSY);
 
device = btrfs_alloc_device(NULL, &devid,
disk_super->dev_item.uuid);
if (IS_ERR(device)) {
/* we can safely leave the fs_devices entry around */
-   return PTR_ERR(device);
+   return device;
}
 
name = rcu_string_strdup(path, GFP_NOFS);
if (!name) {
free_device(device);
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
}
rcu_assign_pointer(device->name, name);
 
@@ -823,12 +822,12 @@ static noinline int device_list_add(const char *path,
 * with larger generation number or the last-in if
 * generation are equal.
 */
-   return -EEXIST;
+   return ERR_PTR(-EEXIST);
}
 
name = rcu_string_strdup(path, GFP_NOFS);
if (!name)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
rcu_string_free(device->name);
rcu_assign_pointer(device->name, name);
if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state)) {
@@ -848,9 +847,7 @@ static noinline int device_list_add(const char *path,
 
fs_devices->total_devices = btrfs_super_num_devices(disk_super);
 
-   *fs_devices_ret = fs_devices;
-
-   return 0;
+   return device;
 }
 
 static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
@@ -1185,9 +1182,10 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
  struct btrfs_fs_devices **fs_devices_ret)
 {
struct btrfs_super_block *disk_super;
+   struct btrfs_device *device;
struct block_device *bdev;
struct page *page;
-   int ret;
+   int ret = 0;
u64 devid;
u64 bytenr;
 
@@ -1212,8 +1210,12 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
devid = btrfs_stack_device_id(&disk_super->dev_item);
 
mutex_lock(&uuid_mutex);
-   ret = device_list_add(path, disk_super, devid, fs_devices_ret);
+   device = device_list_add(path, disk_super, devid);
mutex_unlock(&uuid_mutex);
+   if (IS_ERR(device))
+   ret = PTR_ERR(device);
+   else
+   *fs_devices_ret = device->fs_devices;
 
btrfs_release_disk_super(page);
 
-- 
2.7.0



[PATCH 2/4] btrfs: set the total_devices in device_list_add()

2018-01-22 Thread Anand Jain
device_list_add() has no caller other than btrfs_scan_one_device(),
which sets btrfs_fs_devices::total_devices when device_list_add()
succeeds. This can be done within device_list_add() itself.

Signed-off-by: Anand Jain 
Reviewed-by: Josef Bacik 
---
 fs/btrfs/volumes.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3a272ae7f32d..4a4298017fe1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -846,6 +846,8 @@ static noinline int device_list_add(const char *path,
if (!fs_devices->opened)
device->generation = found_transid;
 
+   fs_devices->total_devices = btrfs_super_num_devices(disk_super);
+
*fs_devices_ret = fs_devices;
 
return 0;
@@ -1187,7 +1189,6 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
struct page *page;
int ret;
u64 devid;
-   u64 total_devices;
u64 bytenr;
 
/*
@@ -1209,12 +1210,9 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
}
 
devid = btrfs_stack_device_id(&disk_super->dev_item);
-   total_devices = btrfs_super_num_devices(disk_super);
 
mutex_lock(&uuid_mutex);
ret = device_list_add(path, disk_super, devid, fs_devices_ret);
-   if (!ret && fs_devices_ret)
-   (*fs_devices_ret)->total_devices = total_devices;
mutex_unlock(&uuid_mutex);
 
btrfs_release_disk_super(page);
-- 
2.7.0



[PATCH 1/4] btrfs: move pr_info into device_list_add

2018-01-22 Thread Anand Jain
Commit 60999ca4b403 ("btrfs: make device scan less noisy")
added return value 1 to device_list_add(), so that the caller can
call pr_info() only when a new device is added. Move the pr_info() part
into device_list_add() so that this function can be kept simple.

Signed-off-by: Anand Jain 
Reviewed-by: Josef Bacik 
---
 fs/btrfs/volumes.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9e743d289dfd..3a272ae7f32d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -729,8 +729,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices 
*fs_devices,
  * Add new device to list of registered devices
  *
  * Returns:
- * 1   - first time device is seen
- * 0   - device already known
+ * 0   - device already known or newly added
  * < 0 - error
  */
 static noinline int device_list_add(const char *path,
@@ -740,7 +739,6 @@ static noinline int device_list_add(const char *path,
struct btrfs_device *device;
struct btrfs_fs_devices *fs_devices;
struct rcu_string *name;
-   int ret = 0;
u64 found_transid = btrfs_super_generation(disk_super);
 
fs_devices = find_fsid(disk_super->fsid);
@@ -780,9 +778,16 @@ static noinline int device_list_add(const char *path,
fs_devices->num_devices++;
mutex_unlock(&fs_devices->device_list_mutex);
 
-   ret = 1;
device->fs_devices = fs_devices;
btrfs_free_stale_devices(path, device);
+
+   if (disk_super->label[0])
+   pr_info("BTRFS: device label %s devid %llu transid %llu %s\n",
+   disk_super->label, devid, found_transid, path);
+   else
+   pr_info("BTRFS: device fsid %pU devid %llu transid %llu %s\n",
+   disk_super->fsid, devid, found_transid, path);
+
} else if (!device->name || strcmp(device->name->str, path)) {
/*
 * When FS is already mounted.
@@ -843,7 +848,7 @@ static noinline int device_list_add(const char *path,
 
*fs_devices_ret = fs_devices;
 
-   return ret;
+   return 0;
 }
 
 static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
@@ -1182,7 +1187,6 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
struct page *page;
int ret;
u64 devid;
-   u64 transid;
u64 total_devices;
u64 bytenr;
 
@@ -1205,25 +1209,14 @@ int btrfs_scan_one_device(const char *path, fmode_t 
flags, void *holder,
}
 
devid = btrfs_stack_device_id(&disk_super->dev_item);
-   transid = btrfs_super_generation(disk_super);
total_devices = btrfs_super_num_devices(disk_super);
 
mutex_lock(&uuid_mutex);
ret = device_list_add(path, disk_super, devid, fs_devices_ret);
-   if (ret >= 0 && fs_devices_ret)
+   if (!ret && fs_devices_ret)
(*fs_devices_ret)->total_devices = total_devices;
mutex_unlock(&uuid_mutex);
 
-   if (ret > 0) {
-   if (disk_super->label[0])
-   pr_info("BTRFS: device label %s ", disk_super->label);
-   else
-   pr_info("BTRFS: device fsid %pU ", disk_super->fsid);
-
-   pr_cont("devid %llu transid %llu %s\n", devid, transid, path);
-   ret = 0;
-   }
-
btrfs_release_disk_super(page);
 
 error_bdev_put:
-- 
2.7.0



Re: [PATCH RESEND v4 0/4] device_list_add() preparation to add reappearing missing device

2018-01-22 Thread Anand Jain



On 01/20/2018 07:27 AM, David Sterba wrote:

On Thu, Jan 18, 2018 at 06:47:17PM +0100, David Sterba wrote:

On Thu, Jan 18, 2018 at 10:02:32PM +0800, Anand Jain wrote:

(Apply on top of my patchset
[PATCH v4 0/6] preparatory work to add device forget
  for conflict free apply. They don't actually depend on
  each other though).



Cleanup of device_list_add(), mainly in preparation to handle
reappearing missing device which its next reroll will be sent
separately.


I'm adding the two patchsets to the 4.16 queue but will push the updated
branch after the current tests finish and I also test the updated branch
as well.


So this did not survive the first fstests run, I'm going to move the patchset
to the 4.17 dev queue.

[ 2912.493351] run fstests btrfs/064 at 2018-01-19 20:55:50
[ 2914.218654] BTRFS: device fsid ee7e811a-fdb3-42e9-8a81-5ed8e1a4282b devid 1 
transid 5 /dev/sdb6
[ 2914.261560] BTRFS: device fsid ee7e811a-fdb3-42e9-8a81-5ed8e1a4282b devid 2 
transid 5 /dev/sdc5
[ 2914.296819] BTRFS: device fsid ee7e811a-fdb3-42e9-8a81-5ed8e1a4282b devid 3 
transid 5 /dev/sdb7
[ 2914.348140] BTRFS: device fsid ee7e811a-fdb3-42e9-8a81-5ed8e1a4282b devid 4 
transid 5 /dev/sdc6
[ 2914.389368] BTRFS: device fsid ee7e811a-fdb3-42e9-8a81-5ed8e1a4282b devid 5 
transid 5 /dev/sdb8
[ 2914.425378] BTRFS: device fsid ee7e811a-fdb3-42e9-8a81-5ed8e1a4282b devid 6 
transid 5 /dev/sdc7
[ 2914.443497] BTRFS: device fsid ee7e811a-fdb3-42e9-8a81-5ed8e1a4282b devid 7 
transid 5 /dev/sdb9
[ 2914.488145] BTRFS info (device sdb9): disk space caching is enabled
[ 2914.494744] BTRFS info (device sdb9): has skinny extents
[ 2914.500328] BTRFS info (device sdb9): flagging fs with big metadata feature
[ 2914.514809] BTRFS info (device sdb9): enabling ssd optimizations
[ 2914.522114] BTRFS info (device sdb9): creating UUID tree
[ 2914.716867] BTRFS info (device sdb9): dev_replace from /dev/sdc5 (devid 2) 
to /dev/sdc8 started
[ 2914.852699] BTRFS info (device sdb9): dev_replace from /dev/sdc5 (devid 2) 
to /dev/sdc8 finished
[ 2915.028666] BTRFS info (device sdb9): dev_replace from /dev/sdb7 (devid 3) 
to /dev/sdc5 started
[ 2915.110374] BTRFS info (device sdb9): dev_replace from /dev/sdb7 (devid 3) 
to /dev/sdc5 finished
[ 2915.309674] BTRFS info (device sdb9): dev_replace from /dev/sdc6 (devid 4) 
to /dev/sdb7 started
[ 2915.340819] BTRFS info (device sdb9): dev_replace from /dev/sdc6 (devid 4) 
to /dev/sdb7 finished
[ 2915.350220] BUG: unable to handle kernel NULL pointer dereference at 
0010
[ 2915.358350] IP: btrfs_scan_one_device+0x127/0x180 [btrfs]
[ 2915.358353] PGD 0 P4D 0
[ 2915.358366] Oops:  [#1] PREEMPT SMP
[ 2915.358493] CPU: 2 PID: 1076 Comm: systemd-udevd Tainted: GW
4.15.0-rc8-1.ge195904-vanilla+ #128
[ 2915.358495] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
[ 2915.358534] RIP: 0010:btrfs_scan_one_device+0x127/0x180 [btrfs]


 I couldn't reproduce it with btrfs/064, which ran for several iterations,
 but the script [1] does trigger the problem.

 [1]
---
 mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
 modprobe -r btrfs
 mount -o degraded /dev/sdb /btrfs
 btrfs repl start -Bf 2 /dev/sdd /btrfs
 umount /btrfs
 modprobe -r btrfs
 btrfs dev scan
 btrfs dev scan /dev/sdc
---

 The problem was mainly due to patch 3/4, which accessed the returned
 device pointer even when device_list_add() failed. The fix is to move
 the device pointer access under the else branch, as shown below [2].
 I have included this fix in v5, which is tested with the btrfs
 xfstests. Could you please consider v5 for 4.16?

[2]
-
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 462bae3627e3..a86c3a14ec89 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1214,8 +1214,8 @@ int btrfs_scan_one_device(const char *path, 
fmode_t flags, void *holder,

mutex_unlock(&uuid_mutex);
if (IS_ERR(device))
ret = PTR_ERR(device);
-
-   *fs_devices_ret = device->fs_devices;
+   else
+   *fs_devices_ret = device->fs_devices;

btrfs_release_disk_super(page);
--


Thanks, Anand


[ 2915.358537] RSP: 0018:b35a4524be30 EFLAGS: 00010206
[ 2915.358541] RAX: fff0 RBX: 0081 RCX: 000f
[ 2915.358544] RDX: 96a791c7e10b RSI: 0001 RDI: 96a79f734200
[ 2915.358546] RBP: 96a7a2ea6000 R08: 002b R09: 
[ 2915.358548] R10:  R11: 0004 R12: fff0
[ 2915.358550] R13: b35a4524be60 R14: f07d48471f80 R15: 55ee5775ad74
[ 2915.358554] FS:  7f736b3648c0() GS:96a7a6a0() 
knlGS:
[ 2915.358556] CS:  0010 DS:  ES:  CR0: 80050033
[ 2915.358558] CR2: 0010 CR3: 00021201e000 CR4: 06e0
[ 2915.358560] Call Trace:
[ 2915.358600]  btrfs_control_ioctl+0xad/0xe0 [btrfs]
[ 2915.358610]  ? trace_hardirqs_on_caller+0xf2/0x1a0
[ 2915.358618]  do_vfs_ioctl+0x90/0x6b0
[ 2915.358625]  

Re: [PATCH] btrfs: remove spurious WARN_ON(ref->count) in find_parent_nodes

2018-01-22 Thread Lu Fengqi
On Mon, Jan 22, 2018 at 02:38:42PM +0200, Nikolay Borisov wrote:
>
>
>On 22.01.2018 14:19, Lu Fengqi wrote:
>> On 01/22/2018 04:46 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On 22.01.2018 05:34, Lu Fengqi wrote:
>>>> According to my bisect result, The frequency of the warning occurrence
>>>> increased to the detectable degree after this patch
>>>
>>> That sentence implies that even before Ed's patch it was possible to
>>> trigger those warnings, is that true? Personally I've never seen such
>>> warnings while executing btrfs/004. How do you configure the filesystem
>>> for the test runs?
>>>
>> 
>> Just only default mount option.
>> 
>> ➜  xfstests-dev git:(master) for i in $(seq 1 100); do echo $i; if !
>> sudo ./check btrfs/004; then break; fi; done
>> 1
>> 
>> FSTYP -- btrfs
>> 
>> PLATFORM  -- Linux/x86_64 sarch 4.15.0-rc9
>> 
>> MKFS_OPTIONS  -- /dev/vdd1
>> 
>> MOUNT_OPTIONS -- /dev/vdd1 /mnt/scratch
>> 
>> 
>> 
>> 
>> btrfs/004 47s ... 49s
>> 
>> Ran: btrfs/004
>> 
>> Passed all 1 tests
>> 
>> 
>> 
>> 
>> 2
>> 
>> FSTYP -- btrfs
>> 
>> PLATFORM  -- Linux/x86_64 sarch 4.15.0-rc9
>> 
>> MKFS_OPTIONS  -- /dev/vdd1
>> 
>> MOUNT_OPTIONS -- /dev/vdd1 /mnt/scratch
>> 
>> 
>> 
>> 
>> btrfs/004 49s ... 52s
>> 
>> _check_dmesg: something found in dmesg (see
>> /home/luke/workspace/xfstests-dev/results//btrfs/004.dmesg)
>> 
>> Ran: btrfs/004
>> 
>> Failures: btrfs/004
>> 
>> Failed 1 of 1 tests
>> 
>> The probability of this warning appearing is rather low, and I only
>> encountered 52 warnings when I looped 1008 times btrfs/004 for 20 hours
>> in 4.15-rc6 (IOW, the probability is nearly 5%). So you want to trigger
>> warning also need more luck or patience.
>
>Thanks but is this before or after the mentioned commit below?
>

After this commit. The bisect condition I used to locate this commit was
20 consecutive runs of btrfs/004 without a warning (this may not be accurate
enough, and can only be used as a reference). Maybe Zygo has found a better
way to reproduce it, since he hits this warning more frequently than I do.
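
As a rough check on how reliable a "20 clean runs" bisect condition is, the sketch below plugs in the figures quoted above (52 warnings in 1008 runs). It assumes each run trips the warning independently and at most once per run. Both are simplifications, and the function names are illustrative only.

```python
# Rough estimate, assuming each btrfs/004 run trips the warning
# independently and with the same probability.
import math


def warning_rate(warnings, runs):
    """Observed per-run probability of seeing the warning."""
    return warnings / runs


def clean_streak_probability(rate, runs):
    """Chance that `runs` consecutive runs all pass on an affected kernel."""
    return (1.0 - rate) ** runs


def runs_needed(rate, max_miss):
    """Smallest run count whose clean-streak chance is at most max_miss."""
    return math.ceil(math.log(max_miss) / math.log(1.0 - rate))


rate = warning_rate(52, 1008)
print("per-run rate: %.2f%%" % (rate * 100))
print("P(20 clean runs on an affected kernel): %.2f"
      % clean_streak_probability(rate, 20))
print("runs for a 5%% miss chance: %d" % runs_needed(rate, 0.05))
```

Under these assumptions an affected kernel still gets through 20 clean runs roughly one time in three, so a bisect step can easily be mislabelled as good. Around 57 runs per step would be needed to bring that chance down to 5%.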

>
>> 
>>>> 86d5f9944252 ("btrfs: convert prelimary reference tracking to use
>>>> rbtrees")
>>>> is committed. I understand that this does not mean that this patch
>>>> caused
>>>> the problem, but maybe Edmund can give us some help, so I added him
>>>> to the
>>>> recipient.
>>>
>>>
>> 
>> 
>
>

-- 
Thanks,
Lu




Re: [PATCH] btrfs: remove spurious WARN_ON(ref->count) in find_parent_nodes

2018-01-22 Thread Nikolay Borisov


On 21.01.2018 21:08, Zygo Blaxell wrote:
> This warning appears during execution of the LOGICAL_INO ioctl and
> appears to be spurious:
> 
>   [ cut here ]
>   WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 
> find_parent_nodes+0xc41/0x14e0
>   Modules linked in: ib_iser rdma_cm iw_cm ib_cm ib_core configfs 
> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi overlay r8169 ufs qnx4 
> hfsplus hfs minix ntfs vfat msdos fat jfs xfs cpuid rpcsec_gss_krb5 nfsv4 
> nfsv3 nfs fscache algif_skcipher af_alg softdog nfsd auth_rpcgss nfs_acl 
> lockd grace sunrpc bnep cpufreq_userspace cpufreq_powersave 
> cpufreq_conservative nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill 
> snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_oss snd_seq_midi_event 
> snd_rawmidi snd_seq snd_seq_device binfmt_misc fuse nbd xt_REDIRECT 
> nf_nat_redirect ipt_REJECT nf_reject_ipv4 xt_nat xt_conntrack xt_tcpudp 
> nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_conntrack_ipv6 
> nf_defrag_ipv6 nf_nat_ipv6 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
> nf_nat_ipv4 nf_nat nf_conntrack
>ip6table_mangle iptable_mangle ip6table_filter ip6_tables 
> iptable_filter ip_tables x_tables tcp_cubic dummy lp dm_crypt edac_mce_amd 
> edac_core snd_hda_codec_hdmi ppdev kvm_amd kvm irqbypass crct10dif_pclmul 
> crc32_pclmul ghash_clmulni_intel snd_hda_codec_via pcbc amdkfd 
> snd_hda_codec_generic amd_iommu_v2 aesni_intel snd_hda_intel radeon 
> snd_hda_codec aes_x86_64 snd_hda_core snd_hwdep crypto_simd glue_helper sg 
> snd_pcm_oss cryptd input_leds joydev pcspkr serio_raw snd_mixer_oss rtc_cmos 
> snd_pcm parport_pc parport shpchp wmi acpi_cpufreq evdev snd_timer 
> asus_atk0110 k10temp fam15h_power snd soundcore sp5100_tco hid_generic ipv6 
> af_packet crc_ccitt raid10 raid456 async_raid6_recov async_memcpy async_pq 
> async_xor async_tx libcrc32c raid0 multipath linear dm_mod raid1 md_mod 
> ohci_pci ide_pci_generic
>sr_mod cdrom pdc202xx_new ohci_hcd crc32c_intel atiixp ehci_pci 
> psmouse ide_core i2c_piix4 ehci_hcd xhci_pci mii xhci_hcd [last unloaded: 
> r8169]
>   CPU: 3 PID: 18172 Comm: bees Tainted: G  D WL  4.11.9-zb64+ #1
>   Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, 
> BIOS 210112/02/2014
>   Call Trace:
>dump_stack+0x85/0xc2
>__warn+0xd1/0xf0
>warn_slowpath_null+0x1d/0x20
>find_parent_nodes+0xc41/0x14e0
>__btrfs_find_all_roots+0xad/0x120
>? extent_same_check_offsets+0x70/0x70
>iterate_extent_inodes+0x168/0x300
>iterate_inodes_from_logical+0x87/0xb0
>? iterate_inodes_from_logical+0x87/0xb0
>? extent_same_check_offsets+0x70/0x70
>btrfs_ioctl+0x8ac/0x2820
>? lock_acquire+0xc2/0x200
>do_vfs_ioctl+0x91/0x700
>? __fget+0x112/0x200
>SyS_ioctl+0x79/0x90
>entry_SYSCALL_64_fastpath+0x23/0xc6
>   RIP: 0033:0x7f727b20be07
>   RSP: 002b:7f7279f1e018 EFLAGS: 0246 ORIG_RAX: 0010
>   RAX: ffda RBX: 9c0f4d7f RCX: 7f727b20be07
>   RDX: 7f7279f1e118 RSI: c0389424 RDI: 0003
>   RBP: 0035 R08: 7f72581bf340 R09: 
>   R10: 0020 R11: 0246 R12: 0040
>   R13: 7f725818d230 R14: 7f7279f1b640 R15: 7f725820
>? trace_hardirqs_off_caller+0x1f/0x140
>   ---[ end trace 5de243350f6762c6 ]---
>   [ cut here ]
> 
> ref->count can be below zero under normal conditions (for delayed refs),
> so there is no need to spam dmesg when it happens.

Why do you think it's normal for this to be negative under normal
conditions? There should be some rationale for that; otherwise
you are papering over a bug.

> 
> On kernel v4.14 this warning occurs 100-1000 times more frequently than
> on kernels v4.2..v4.12.  In the worst case, one test machine had 59020
> warnings in 24 hours on v4.14.14 compared to 55 on v4.12.14.
> 
> Signed-off-by: Zygo Blaxell 
> ---
>  fs/btrfs/backref.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index 7d0dc100a09a..57e8d2562ed5 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -1263,7 +1263,6 @@ static int find_parent_nodes(struct btrfs_trans_handle 
> *trans,
>   while (node) {
>   ref = rb_entry(node, struct prelim_ref, rbnode);
>   node = rb_next(&ref->rbnode);
> - WARN_ON(ref->count < 0);
>   if (roots && ref->count && ref->root_id && ref->parent == 0) {
>   if (sc && sc->root_objectid &&
>   ref->root_id != sc->root_objectid) {
> 

Re: [PATCH] btrfs: remove spurious WARN_ON(ref->count) in find_parent_nodes

2018-01-22 Thread Nikolay Borisov


On 22.01.2018 14:19, Lu Fengqi wrote:
> On 01/22/2018 04:46 PM, Nikolay Borisov wrote:
>>
>>
>> On 22.01.2018 05:34, Lu Fengqi wrote:
>>> According to my bisect result, The frequency of the warning occurrence
>>> increased to the detectable degree after this patch
>>
>> That sentence implies that even before Ed's patch it was possible to
>> trigger those warnings, is that true? Personally I've never seen such
>> warnings while executing btrfs/004. How do you configure the filesystem
>> for the test runs?
>>
> 
> Just only default mount option.
> 
> ➜  xfstests-dev git:(master) for i in $(seq 1 100); do echo $i; if !
> sudo ./check btrfs/004; then break; fi; done
> 1
> 
> FSTYP -- btrfs
> 
> PLATFORM  -- Linux/x86_64 sarch 4.15.0-rc9
> 
> MKFS_OPTIONS  -- /dev/vdd1
> 
> MOUNT_OPTIONS -- /dev/vdd1 /mnt/scratch
> 
> 
> 
> 
> btrfs/004 47s ... 49s
> 
> Ran: btrfs/004
> 
> Passed all 1 tests
> 
> 
> 
> 
> 2
> 
> FSTYP -- btrfs
> 
> PLATFORM  -- Linux/x86_64 sarch 4.15.0-rc9
> 
> MKFS_OPTIONS  -- /dev/vdd1
> 
> MOUNT_OPTIONS -- /dev/vdd1 /mnt/scratch
> 
> 
> 
> 
> btrfs/004 49s ... 52s
> 
> _check_dmesg: something found in dmesg (see
> /home/luke/workspace/xfstests-dev/results//btrfs/004.dmesg)
> 
> Ran: btrfs/004
> 
> Failures: btrfs/004
> 
> Failed 1 of 1 tests
> 
> The probability of this warning appearing is rather low, and I only
> encountered 52 warnings when I looped 1008 times btrfs/004 for 20 hours
> in 4.15-rc6 (IOW, the probability is nearly 5%). So you want to trigger
> warning also need more luck or patience.

Thanks but is this before or after the mentioned commit below?


> 
>>> 86d5f9944252 ("btrfs: convert prelimary reference tracking to use
>>> rbtrees")
>>> is committed. I understand that this does not mean that this patch
>>> caused
>>> the problem, but maybe Edmund can give us some help, so I added him
>>> to the
>>> recipient.
>>
>>
> 
> 


Re: [PATCH] btrfs: remove spurious WARN_ON(ref->count) in find_parent_nodes

2018-01-22 Thread Lu Fengqi

On 01/22/2018 04:46 PM, Nikolay Borisov wrote:
>
>
> On 22.01.2018 05:34, Lu Fengqi wrote:
>> According to my bisect result, The frequency of the warning occurrence
>> increased to the detectable degree after this patch
>
> That sentence implies that even before Ed's patch it was possible to
> trigger those warnings, is that true? Personally I've never seen such
> warnings while executing btrfs/004. How do you configure the filesystem
> for the test runs?
>

Just the default mount options.

➜  xfstests-dev git:(master) for i in $(seq 1 100); do echo $i; if ! sudo ./check btrfs/004; then break; fi; done
1
FSTYP -- btrfs
PLATFORM  -- Linux/x86_64 sarch 4.15.0-rc9
MKFS_OPTIONS  -- /dev/vdd1
MOUNT_OPTIONS -- /dev/vdd1 /mnt/scratch

btrfs/004 47s ... 49s
Ran: btrfs/004
Passed all 1 tests

2
FSTYP -- btrfs
PLATFORM  -- Linux/x86_64 sarch 4.15.0-rc9
MKFS_OPTIONS  -- /dev/vdd1
MOUNT_OPTIONS -- /dev/vdd1 /mnt/scratch

btrfs/004 49s ... 52s
_check_dmesg: something found in dmesg (see /home/luke/workspace/xfstests-dev/results//btrfs/004.dmesg)
Ran: btrfs/004
Failures: btrfs/004
Failed 1 of 1 tests

The probability of this warning appearing is rather low: I encountered
only 52 warnings while looping btrfs/004 1008 times over 20 hours on
4.15-rc6 (IOW, the probability is about 5%). So triggering the warning
takes some luck or patience.
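
To put the 52-in-1008 figure in perspective, here is a minimal Python sketch that estimates the per-run probability and how many loop iterations it takes before a hit becomes likely. It is not part of xfstests, the function names are made up for illustration, and it assumes the runs are independent, which may not hold exactly in practice.

```python
import math


def hit_probability(hits: int, runs: int) -> float:
    """Observed per-run probability of the warning."""
    return hits / runs


def chance_of_at_least_one(p: float, n: int) -> float:
    """Probability of seeing the warning at least once in n independent runs."""
    return 1.0 - (1.0 - p) ** n


def runs_needed(p: float, confidence: float) -> int:
    """Smallest n for which chance_of_at_least_one(p, n) >= confidence."""
    n = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))
    # Guard against floating-point rounding right at the boundary.
    while chance_of_at_least_one(p, n) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    p = hit_probability(52, 1008)  # figures quoted in this mail
    print(f"per-run probability: {p:.2%}")
    print(f"chance of a hit in 100 runs: {chance_of_at_least_one(p, 100):.1%}")
    print(f"runs for a 95% chance: {runs_needed(p, 0.95)}")
```

Under these assumptions a 100-iteration loop like the one above is very likely to hit the warning at least once, although a single unlucky stretch of runs is still possible.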



>> 86d5f9944252 ("btrfs: convert prelimary reference tracking to use rbtrees")
>> is committed. I understand that this does not mean that this patch caused
>> the problem, but maybe Edmund can give us some help, so I added him to the
>> recipient.






--
Thanks,
Lu




miss match in btrfs getattr

2018-01-22 Thread Ilan Schwarts
If I stat a file from userspace:

  File: ‘/home/builder/leon10’
   Size: 0   Blocks: 0  IO Block: 4096   directory
  Device: 31h/49d Inode: 550109  Links: 1


Inode is 550109 and device id is 49.

When I am in the kernel and try to get that device id (49), I call
vfs_stat("/home/builder/leon10"), which also returns 49.
If I look at the vfs_stat implementation (located at fs/stat.c):

vfs_stat(..) -> vfs_fstatat(..) -> vfs_getattr(..) -> this returns:
inode->i_op->getattr(path->mnt, path->dentry, stat);

This is a btrfs file system, so I am looking at the getattr function in
inode.c (fs/btrfs/inode.c in the kernel sources):

static int btrfs_getattr(struct vfsmount *mnt,
                         struct dentry *dentry, struct kstat *stat)
{
        u64 delalloc_bytes;
        struct inode *inode = dentry->d_inode;
        u32 blocksize = inode->i_sb->s_blocksize;

        generic_fillattr(inode, stat);
        stat->dev = BTRFS_I(inode)->root->anon_dev;

Look carefully at stat->dev = BTRFS_I(inode)->root->anon_dev; when, in
my kernel module code, I take an inode I hold and try to print this:

LOG( V_LOG_NOTICE, "BTRFS_I(inode)->root->anon_dev=%d",
     BTRFS_I(inode)->root->anon_dev);

I get 0 or some weird numbers like 1064756416.

Why the mismatch? What am I missing?

Isn't that the right btrfs getattr implementation?
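
One thing worth ruling out when comparing these numbers is the encoding of the device number itself. As far as I know, the kernel's in-memory dev_t is built with MKDEV() (major shifted left by 20 bits), while the value user space sees has gone through new_encode_dev(). The Python sketch below mirrors those helpers from include/linux/kdev_t.h so the values can be compared by hand. The Python function names are mine, and the sketch does not explain the 0 or 1064756416 readings, which may well have another cause.

```python
MINORBITS = 20
MINORMASK = (1 << MINORBITS) - 1


def mkdev(major: int, minor: int) -> int:
    """Kernel-internal dev_t, as built by the MKDEV() macro."""
    return (major << MINORBITS) | minor


def major(dev: int) -> int:
    """Equivalent of the kernel MAJOR() macro."""
    return dev >> MINORBITS


def minor(dev: int) -> int:
    """Equivalent of the kernel MINOR() macro."""
    return dev & MINORMASK


def new_encode_dev(dev: int) -> int:
    """Convert a kernel-internal dev_t to the encoding user space sees."""
    ma, mi = major(dev), minor(dev)
    return (mi & 0xFF) | (ma << 8) | ((mi & ~0xFF) << 12)


if __name__ == "__main__":
    anon = mkdev(0, 49)  # anonymous device: major 0, minor 49
    print(f"kernel dev_t: {anon}, user-space encoding: {new_encode_dev(anon):#x}")
```

For an anonymous device (major 0) with a minor below 256, both encodings give the same number, which is consistent with the 31h/49d that stat prints.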


Re: Periodic frame losses when recording to btrfs volume with OBS

2018-01-22 Thread Nikolay Borisov


On 22.01.2018 02:39, Qu Wenruo wrote:
> 
> 
> On 2018年01月21日 23:27, Sebastian Ochmann wrote:
>> On 21.01.2018 11:04, Qu Wenruo wrote:
>>>
>>>
>>> On 2018年01月20日 18:47, Sebastian Ochmann wrote:
>>>> Hello,
>>>>
>>>> I would like to describe a real-world use case where btrfs does not
>>>> perform well for me. I'm recording 60 fps, larger-than-1080p video using
>>>> OBS Studio [1] where it is important that the video stream is encoded
>>>> and written out to disk in real-time for a prolonged period of time (2-5
>>>> hours). The result is a H264 video encoded on the GPU with a data rate
>>>> ranging from approximately 10-50 MB/s.
>>>
>>>> The hardware used is powerful enough to handle this task. When I use a
>>>> XFS volume for recording, no matter whether it's a SSD or HDD, the
>>>> recording is smooth and no frame drops are reported (OBS has a nice
>>>> Stats window where it shows the number of frames dropped due to encoding
>>>> lag which seemingly also includes writing the data out to disk).
>>>>
>>>> However, when using a btrfs volume I quickly observe severe, periodic
>>>> frame drops. It's not single frames but larger chunks of frames that a
>>>> dropped at a time. I tried mounting the volume with nobarrier but to no
>>>> avail.
>>>
>>> What's the drop internal? Something near 30s?
>>> If so, try mount option commit=300 to see if it helps.
>>
>> Thank you for your reply. I observed the interval more closely and it
>> shows that the first, quite small drop occurs about 10 seconds after
>> starting the recording (some initial metadata being written?). After
>> that, the interval is indeed about 30 seconds with large drops each time.
> 
> This almost proves my assumption to transaction commitment performance.
> 
> But...
> 
>>
>> Thus I tried setting the commit option to different values. I confirmed
>> that the setting was activated by looking at the options "mount" shows
>> (see below). However, no matter whether I set the commit interval to
>> 300, 60 or 10 seconds, the results were always similar. About every 30
>> seconds the drive shows activity for a few seconds and the drop occurs
>> shortly thereafter.
> 
> Either such mount option has a bug, or some unrelated problem.

Looking at transaction_kthread, the schedule there is done in
TASK_INTERRUPTIBLE state, so it's entirely possible that the kthread may
be woken up earlier. The code has been there since 2008. I'm going to run
some write-heavy tests to see how often the transaction kthread is woken
up before its timeout has elapsed.

> 
> As you mentioned the output is about 10~50MiB/s, 30s means 300~1500MiBs.
> Maybe it's related to the dirty data amount?
> 
> Would you please verify if a lower or higher profile (resulting much
> larger or smaller data stream) would affect?
> 
> 
> Despite that, I'll dig to see if commit= option has any bug.
> 
> And you could also try the nospace_cache mount option provided by Chris
> Murphy, which may also help.
> 
> Thanks,
> Qu
> 
>> It almost seems like the commit setting doesn't have
>> any effect. By the way, the machine I'm currently testing on has 64 GB
>> of RAM so it should have plenty of room for caching.
>>

>>>> Of course, the simple fix is to use a FS that works for me(TM). However
>>>> I thought since this is a common real-world use case I'd describe the
>>>> symptoms here in case anyone is interested in analyzing this behavior.
>>>> It's not immediately obvious that the FS makes such a difference. Also,
>>>> if anyone has an idea what I could try to mitigate this issue (mount or
>>>> mkfs options?) I can try that.
>>>
>>> Mkfs.options can help, but only marginally AFAIK.
>>>
>>> You could try mkfs with -n 4K (minimal supported nodesize), to reduce
>>> the tree lock critical region by a little, at the cost of more metadata
>>> fragmentation.
>>>
>>> And is there any special features enabled like quota?
>>> Or scheduled balance running at background?
>>> Which is known to dramatically impact performance of transaction
>>> commitment, so it's recommended to disable quota/scheduled balance first.
>>>
>>>
>>> Another recommendation is to use nodatacow mount option to reduce the
>>> CoW metadata overhead, but I doubt about the effectiveness.
>>
>> I tried the -n 4K and nodatacow options, but it doesn't seem to make a
>> big difference, if at all. No quota or auto-balance is active. It's
>> basically using Arch Linux default options.
>>
>> The output of "mount" after setting 10 seconds commit interval:
>>
>> /dev/sdc1 on /mnt/rec type btrfs
>> (rw,relatime,space_cache,commit=10,subvolid=5,subvol=/)
>>
>> Also tried noatime, but didn't make a difference either.
>>
>> Best regards
>> Sebastian
>>
>>> Thanks,
>>> Qu
>>
>>>> I saw this behavior on two different machines with kernels 4.14.13 and
>>>> 4.14.5, both Arch Linux. btrfs-progs 4.14, OBS 20.1.3-241-gf5c3af1b
>>>> built from git.
>>>>
>>>> Best regards
>>>> Sebastian
>>>>
>>>> [1] https://github.com/jp9000/obs-studio

Re: Can't mount (even in ro) after power outage - corrupt leaf, open_ctree failed

2018-01-22 Thread Zatkovský Dušan

Hi.

Badblocks finished on both disks with no errors. The only messages from
the kernel during the night are 6x "perf: interrupt took too long
(2511 > 2500), lowering kernel.perf_event_max_sample_rate to 79500".


root@nas:~# smartctl -l scterc /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-4-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
   Read: 70 (7.0 seconds)
  Write: 70 (7.0 seconds)

root@nas:~# smartctl -l scterc /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-4-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
   Read: 70 (7.0 seconds)
  Write: 70 (7.0 seconds)

root@nas:~# btrfs-debug-tree -t chunk /dev/sda4 | grep 'METADATA\|SYSTEM'
incorrect offsets 13686 13622
    type METADATA|RAID1 num_stripes 2
    type METADATA|RAID1 num_stripes 2
    type SYSTEM|RAID1 num_stripes 2
    type METADATA|RAID1 num_stripes 2
    type METADATA|RAID1 num_stripes 2

root@nas:~# btrfs-debug-tree -t chunk /dev/sdb4 | grep 'METADATA\|SYSTEM'
incorrect offsets 13686 13622
    type METADATA|RAID1 num_stripes 2
    type METADATA|RAID1 num_stripes 2
    type SYSTEM|RAID1 num_stripes 2
    type METADATA|RAID1 num_stripes 2
    type METADATA|RAID1 num_stripes 2

(I'm still using the "old" version of the btrfs tools since I'm working
remotely now; I will boot something newer when I get access to the NAS
at EOD.)


Thank you
msk


On 22. 1. 2018 at 0:24, Chris Murphy wrote:

> On Sun, Jan 21, 2018 at 4:13 PM, Chris Murphy wrote:
>> On Sun, Jan 21, 2018 at 3:31 PM, msk conf wrote:
>>> Hello,
>>>
>>> thank you for the reply.
>>>
>>>> What do you get for btrfs fi df /array
>>>
>>> Can't do that because filesystem is not mountable. I will get stats for '/'
>>> filesystem instead (because '/array' is an empty directory - mountpoint on /
>>
>> Try
>> $ sudo btrfs-debug-tree -t chunk /dev/mapper/first | grep 'METADATA\|SYSTEM'
>
> You need to adapt that /dev/ node for your case, I just copy pasted
> that from my setup. Anyway, that will look at the chunk tree and show
> the profile for these chunk types.




Re: Periodic frame losses when recording to btrfs volume with OBS

2018-01-22 Thread Duncan
Sebastian Ochmann posted on Sun, 21 Jan 2018 16:27:55 +0100 as excerpted:

> On 21.01.2018 11:04, Qu Wenruo wrote:
>> 
>> 
>> On 2018年01月20日 18:47, Sebastian Ochmann wrote:
>>> Hello,
>>>
>>> I would like to describe a real-world use case where btrfs does not
>>> perform well for me. I'm recording 60 fps, larger-than-1080p video
>>> using OBS Studio [1] where it is important that the video stream is
>>> encoded and written out to disk in real-time for a prolonged period of
>>> time (2-5 hours). The result is a H264 video encoded on the GPU with a
>>> data rate ranging from approximately 10-50 MB/s.
>> 
>> 
>>> The hardware used is powerful enough to handle this task. When I use a
>>> XFS volume for recording, no matter whether it's a SSD or HDD, the
>>> recording is smooth and no frame drops are reported (OBS has a nice
>>> Stats window where it shows the number of frames dropped due to
>>> encoding lag which seemingly also includes writing the data out to
>>> disk).
>>>
>>> However, when using a btrfs volume I quickly observe severe, periodic
>>> frame drops. It's not single frames but larger chunks of frames that a
>>> dropped at a time. I tried mounting the volume with nobarrier but to
>>> no avail.
>> 
>> What's the drop internal? Something near 30s?
>> If so, try mount option commit=300 to see if it helps.
> 
> Thank you for your reply. I observed the interval more closely and it
> shows that the first, quite small drop occurs about 10 seconds after
> starting the recording (some initial metadata being written?). After
> that, the interval is indeed about 30 seconds with large drops each
> time.
> 
> Thus I tried setting the commit option to different values. I confirmed
> that the setting was activated by looking at the options "mount" shows
> (see below). However, no matter whether I set the commit interval to
> 300, 60 or 10 seconds, the results were always similar. About every 30
> seconds the drive shows activity for a few seconds and the drop occurs
> shortly thereafter. It almost seems like the commit setting doesn't have
> any effect. By the way, the machine I'm currently testing on has 64 GB
> of RAM so it should have plenty of room for caching.

64 GB RAM...

Do you know about the /proc/sys/vm/dirty_* files and how to use/tweak 
them?  If not, read $KERNDIR/Documentation/sysctl/vm.txt, focusing on 
these files.

These tunables control the amount of writeback cache that is allowed to 
accumulate before the system starts flushing it.  The problem is that the 
defaults for these tunables were selected back when system memory 
normally measured in the MiB, not the GiB of today, so the default ratios 
allow too much dirty data to accumulate before attempting to flush it to 
storage, resulting in flush storms that hog the available IO and starve 
other tasks that might be trying to use it.

The fix is to tweak these settings to try to smooth things out, starting 
background flush earlier, so with a bit of luck the system never hits 
high priority foreground flush mode, or if it does there's not so much to 
be written as much of it has already been done in the background.

There are five files, two pairs of files, one pair controlling foreground 
sizes, the other background, and one file setting the time limit.  The 
sizes can be set by either ratio, percentage of RAM, or bytes, with the 
other appearing as zero when read.

To set these temporarily you write to the appropriate file.  Once you 
have a setting that works well for you, write it to your distro's sysctl 
configuration (/etc/sysctl.conf or /etc/sysctl.d/*.conf, usually), and 
it should be automatically applied at boot for you.
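
Before changing anything it helps to see what the system currently uses. Here is a minimal Python sketch that reads the dirty_* files; the function name and the base parameter are made up for illustration, and the list of file names is the set I know of under /proc/sys/vm, so treat it as an assumption if your kernel differs.

```python
from pathlib import Path

# Writeback tunables exposed under /proc/sys/vm on Linux.
DIRTY_TUNABLES = (
    "dirty_ratio",
    "dirty_bytes",
    "dirty_background_ratio",
    "dirty_background_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_dirty_settings(base: str = "/proc/sys/vm") -> dict:
    """Return {name: int} for each dirty_* file that exists under base."""
    settings = {}
    for name in DIRTY_TUNABLES:
        path = Path(base) / name
        if path.is_file():
            settings[name] = int(path.read_text().strip())
    return settings


if __name__ == "__main__":
    for name, value in read_dirty_settings().items():
        print(f"vm.{name} = {value}")
```

The sketch only reads. Writing new values needs root and is usually done with sysctl or by echoing into the same files.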

Here's the settings in my /etc/sysctl.conf, complete with notes about the 
defaults and the values I've chosen for my 16G of RAM.  Note that while I 
have fast ssds now, I set these values back when I had spinning rust.  I 
was happy with them then, and while I shouldn't really need the settings 
on my ssds, I've seen no reason to change them.

At 16G, 1% ~ 160M.  At 64G, it'd be four times larger, 640M, likely too 
chunky a granularity to be useful, so you'll probably want to set the 
bytes value instead of ratio.

# write-cache, foreground/background flushing
# vm.dirty_ratio = 10 (% of RAM)
# make it 3% of 16G ~ half a gig
vm.dirty_ratio = 3
# vm.dirty_bytes = 0

# vm.dirty_background_ratio = 5 (% of RAM)
# make it 1% of 16G ~ 160 M
vm.dirty_background_ratio = 1
# vm.dirty_background_bytes = 0

# vm.dirty_expire_centisecs = 2999 (30 sec)
# vm.dirty_writeback_centisecs = 499 (5 sec)
# make it 10 sec
vm.dirty_writeback_centisecs = 1000
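
The ratio-versus-bytes point above is easier to judge with numbers. This is a rough Python sketch with helper names of my own choosing; it applies the percentage to total RAM as the notes above do, whereas the kernel (as far as I understand vm.txt) applies it to available memory, so real thresholds will be somewhat lower. The 10-50 MB/s rates are the recording rates mentioned earlier in the thread.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def ratio_to_bytes(ram_bytes: int, percent: float) -> int:
    """Bytes of dirty data that a vm.dirty_*_ratio percentage corresponds to."""
    return int(ram_bytes * percent / 100)


def seconds_to_fill(threshold_bytes: int, rate_bytes_per_s: float) -> float:
    """Time a steady writer needs to accumulate threshold_bytes of dirty data."""
    return threshold_bytes / rate_bytes_per_s


if __name__ == "__main__":
    for ram_gib in (16, 64):
        one_percent = ratio_to_bytes(ram_gib * GIB, 1)
        print(f"{ram_gib} GiB RAM: 1% = {one_percent / MIB:.0f} MiB")
        for rate_mib in (10, 50):
            secs = seconds_to_fill(one_percent, rate_mib * MIB)
            print(f"  filled in {secs:.0f} s at {rate_mib} MiB/s")
```

With 64 GiB, each percentage step is several hundred MiB, which is why the bytes variants (vm.dirty_bytes, vm.dirty_background_bytes) give finer control there.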


Now the other factor in the picture is how fast your actual hardware can 
write.  hdparm's -t parameter times sequential (buffered) reads rather 
than writes, but it can still give you some idea.  You'll need to run it 
as root:

hdparm -t /dev/sda

/dev/sda:
 Timing buffered disk reads: 1578 MB in  3.00 seconds = 525.73 MB/sec

... Like I said, fast ssd...  I believe fast modern spinning rust should 
be 100 

Re: [PATCH] btrfs: remove spurious WARN_ON(ref->count) in find_parent_nodes

2018-01-22 Thread Nikolay Borisov


On 22.01.2018 05:34, Lu Fengqi wrote:
> According to my bisect result, The frequency of the warning occurrence
> increased to the detectable degree after this patch

That sentence implies that even before Ed's patch it was possible to
trigger those warnings, is that true? Personally I've never seen such
warnings while executing btrfs/004. How do you configure the filesystem
for the test runs?

> 86d5f9944252 ("btrfs: convert prelimary reference tracking to use rbtrees")
> is committed. I understand that this does not mean that this patch caused
> the problem, but maybe Edmund can give us some help, so I added him to the
> recipient.


Re: [PATCH v2] btrfs: Add chunk allocation ENOSPC debug message for enospc_debug mount option

2018-01-22 Thread Nikolay Borisov


On 22.01.2018 07:50, Qu Wenruo wrote:
> Enospc_debug makes extent allocator to print more debug messages,
> however for chunk allocation, there is no debug message for enospc_debug
> at all.
> 
> This patch will add message for the following parts of chunk allocator:
> 
> 1) No rw device at all
>Quite rare, but at least output one message for this case.
> 
> 2) No enough space for some device
>This debug message is quite handy for unbalanced disks with stripe
>based profiles (RAID0/10/5/6).
> 
> 3) Not enough free devices
>This debug message should tell us if current chunk allocator is
>working correctly on minimal device requirement.
> 
> Although under most case, we will hit other ENOSPC before we even hit a
> chunk allocator ENOSPC, but such debug info won't help.
> 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Nikolay Borisov 

> ---
> v2:
>   Unify all message level to btrfs_debug().
>   More meaningful message if we don't have enough device.
> ---
>  fs/btrfs/volumes.c | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a25684287501..86cae6a15b1e 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4622,8 +4622,11 @@ static int __btrfs_alloc_chunk(struct 
> btrfs_trans_handle *trans,
>  
>   BUG_ON(!alloc_profile_is_valid(type, 0));
>  
> - if (list_empty(&fs_devices->alloc_list))
> + if (list_empty(&fs_devices->alloc_list)) {
> + if (btrfs_test_opt(info, ENOSPC_DEBUG))
> + btrfs_debug(info, "%s: No writable device", __func__);
>   return -ENOSPC;
> + }
>  
>   index = __get_raid_index(type);
>  
> @@ -4705,8 +4708,14 @@ static int __btrfs_alloc_chunk(struct 
> btrfs_trans_handle *trans,
>   if (ret == 0)
>   max_avail = max_stripe_size * dev_stripes;
>  
> - if (max_avail < BTRFS_STRIPE_LEN * dev_stripes)
> + if (max_avail < BTRFS_STRIPE_LEN * dev_stripes) {
> + if (btrfs_test_opt(info, ENOSPC_DEBUG))
> + btrfs_debug(info,
> + "%s: devid %llu has no free space, have=%llu want=%u",
> + __func__, device->devid, max_avail,
> + BTRFS_STRIPE_LEN * dev_stripes);
>   continue;
> + }
>  
>   if (ndevs == fs_devices->rw_devices) {
>   WARN(1, "%s: found more than %llu devices\n",
> @@ -4731,6 +4740,12 @@ static int __btrfs_alloc_chunk(struct 
> btrfs_trans_handle *trans,
>  
>   if (ndevs < devs_increment * sub_stripes || ndevs < devs_min) {
>   ret = -ENOSPC;
> + if (btrfs_test_opt(info, ENOSPC_DEBUG)) {
> + btrfs_debug(info,
> + "%s: not enough devices with free space: have=%d minimal 
> required=%d",

nit: s/minimal/minimum
But there is no point in resending just for that, I guess David could
fix it while merging.

> + __func__, ndevs, min(devs_min,
> + devs_increment * sub_stripes));
> + }
>   goto error;
>   }
>  
> 
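
A side note on the last hunk, offered as an observation rather than a review verdict: the guard "ndevs < devs_increment * sub_stripes || ndevs < devs_min" fails whenever ndevs is below the larger of the two limits, so the effective requirement is their maximum, while the message prints min(). The Python sketch below models only that condition. The function names are mine and the example values are hypothetical, not taken from btrfs_raid_array.

```python
def has_enough_devices(ndevs: int, devs_min: int,
                       devs_increment: int, sub_stripes: int) -> bool:
    """Mirror of the guard in the hunk: True when allocation may proceed."""
    return not (ndevs < devs_increment * sub_stripes or ndevs < devs_min)


def required_devices(devs_min: int, devs_increment: int,
                     sub_stripes: int) -> int:
    """Smallest ndevs that passes the guard: the larger of the two limits."""
    return max(devs_min, devs_increment * sub_stripes)


if __name__ == "__main__":
    # Hypothetical profile: devs_min=3, devs_increment=1, sub_stripes=1.
    devs_min, devs_increment, sub_stripes = 3, 1, 1
    print("min() would report:", min(devs_min, devs_increment * sub_stripes))
    print("guard actually needs:",
          required_devices(devs_min, devs_increment, sub_stripes))
    print("2 devices enough?",
          has_enough_devices(2, devs_min, devs_increment, sub_stripes))
```

Whenever the two limits differ, the printed "required" value would be the smaller one, which may be misleading when reading the debug output.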


Re: [PATCH v2.1 2/3] btrfs-progs: dir-item: Don't do extra filetype validaction check for btrfs_match_dir_item_name

2018-01-22 Thread Nikolay Borisov


On 22.01.2018 07:45, Qu Wenruo wrote:
> verify_dir_item() is called in btrfs_match_dir_item_name() to ensure we
> won't search beyond item boundary and does extra filetype check.
> 
> However in the follow call chain, such extra filetype check can cause
> problems:
> 
> 1) btrfs_add_link()
>|- check_dir_conflict()
>   |- btrfs_lookup_dir_index()
>  |- btrfs_match_dir_item_name()
> 
>And if we have an offending dir index whose filetype is invalid,
>btrfs_match_dir_item_name() will return NULL, meaning no match dir
>index is found.
>So btrfs_add_link() will still try to insert a dir index, which may
>have same key->offset and leading to duplicated dir index.
> 
> 2) btrfs_unlink()
>|- btrfs_lookup_dir_index()
>   |- btrfs_lookup_dir_index()
>  |- btrfs_match_dir_item_name()
> 
>For the same offending dir index with invalid filetype, this will
>return NULL, and btrfs_unlink() will just consider there is no
>existing dir_index and do nothing.
>Leave an orphan and invalid dir_index hanging there forever.
> 
> The patch removes the extra filetype check, as "btrfs check" can already
> handle invalid filetype correctly for both modes.
> 
> And this makes "btrfs check --repair --mode=lowmem" to delete the
> offending dir index to repair it correctly.
> 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Nikolay Borisov 
> ---
> v2:
>   Get rid of the new parameter.
> v2.1:
>   Better commit message.
> ---
>  dir-item.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/dir-item.c b/dir-item.c
> index 462546c0eaf4..e0a0ab4d7a5d 100644
> --- a/dir-item.c
> +++ b/dir-item.c
> @@ -294,12 +294,6 @@ static int verify_dir_item(struct btrfs_root *root,
>   u16 namelen = BTRFS_NAME_LEN;
>   u8 type = btrfs_dir_type(leaf, dir_item);
>  
> - if (type >= BTRFS_FT_MAX) {
> - fprintf(stderr, "invalid dir item type: %d\n",
> -(int)type);
> - return 1;
> - }
> -
>   if (type == BTRFS_FT_XATTR)
>   namelen = XATTR_NAME_MAX;
>  
> 
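
The failure mode described in the commit message can be shown with a toy model. The Python sketch below is not btrfs-progs code: the dict-based "dir index", the function names and the FT_MAX stand-in are all made up for illustration. It only shows how a lookup that rejects an invalid filetype looks the same to its callers as "name not found".

```python
FT_MAX = 9  # stand-in for BTRFS_FT_MAX; the exact value does not matter here


def match_dir_item(items, name, check_filetype):
    """Return the stored item with this name, or None.

    With check_filetype=True an item with an invalid type is rejected,
    which the caller cannot tell apart from "no such name".
    """
    for item in items:
        if check_filetype and item["type"] >= FT_MAX:
            return None
        if item["name"] == name:
            return item
    return None


def add_link(items, name, ftype, check_filetype):
    """Insert a new entry unless a conflicting one is found."""
    if match_dir_item(items, name, check_filetype) is not None:
        return False  # conflict detected, nothing inserted
    items.append({"name": name, "type": ftype})
    return True


def unlink(items, name, check_filetype):
    """Remove the entry if the lookup finds it."""
    item = match_dir_item(items, name, check_filetype)
    if item is None:
        return False  # nothing found, entry stays behind
    items.remove(item)
    return True


if __name__ == "__main__":
    index = [{"name": "foo", "type": 200}]  # offending entry, invalid type
    print("add with check:", add_link(index, "foo", 1, True), len(index))
    index = [{"name": "foo", "type": 200}]
    print("unlink with check:", unlink(index, "foo", True), len(index))
    print("unlink without check:", unlink(index, "foo", False), len(index))
```

With the check enabled, the model inserts a duplicate on add and leaves the bad entry behind on unlink; without it, the conflict is seen and the entry can be removed, which is the behaviour the patch is after.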


btrfs inode is different across file systems ?

2018-01-22 Thread Ilan Schwarts
Hey,

If I get the btrfs inode number this way: btrfs_ino(inode),
implemented in btrfs_inode.h:

static inline u64 btrfs_ino(struct inode *inode)
{
        u64 ino = BTRFS_I(inode)->location.objectid;

        if (!ino || BTRFS_I(inode)->location.type == BTRFS_ROOT_ITEM_KEY)
                ino = inode->i_ino;
        return ino;
}

Is that inode number unique between 2 btrfs file systems?
Let's assume I have 2 btrfs file systems on my machine, file system A
and file system B, and each of these file systems has volumes.
Is the inode number obtained via BTRFS_I(inode)->location.objectid
guaranteed to be unique across all the btrfs file systems, or just within
the file system this inode is in?
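
For context, and as far as I know, btrfs inode numbers are only unique within a single subvolume, so they are not unique across file systems and not even across subvolumes of one file system. From user space the usual way around this is to pair the inode number with the device number, since each btrfs subvolume reports its own st_dev. A minimal Python sketch follows; file_identity and same_file are names made up here, and inside the kernel a different key would be needed.

```python
import os


def file_identity(path: str) -> tuple:
    """Return (st_dev, st_ino), which together identify a file on a running system."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_file(path_a: str, path_b: str) -> bool:
    """True when both paths resolve to the same underlying file."""
    return file_identity(path_a) == file_identity(path_b)


if __name__ == "__main__":
    print(file_identity("."))
    print(same_file(".", os.getcwd()))
```

Comparing only st_ino would treat unrelated files in two subvolumes as identical whenever their inode numbers happen to match.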