Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-06 Thread Stefan Priebe - Profihost AG
Thanks Wang,

I applied them both on top of vanilla v4.8; I hope this is OK. I will
report back on what happens.

Greets,
Stefan

Am 06.10.2016 um 05:04 schrieb Wang Xiaoguang:
> Hi,
> 
> On 09/29/2016 03:27 PM, Stefan Priebe - Profihost AG wrote:
>> Am 29.09.2016 um 09:13 schrieb Wang Xiaoguang:
>>>>> I found that compression sometimes reports ENOSPC errors even in 4.8-rc8,
>>>>> currently
>>>> I cannot confirm that, as I do not have enough space to test this
>>>> without
>>>> compression ;-( But yes, I have compression enabled.
>>> I might not get you, my poor English :)
>>> Do you mean that you only get the ENOSPC error when compression is enabled?
>>>
>>> And when compression is not enabled, you do not get the ENOSPC error?
>> I can't tell you. I cannot test with compression disabled. I do not
>> have enough free space on this disk.
> I have just sent two patches to fix the false ENOSPC error for compression;
> please give them a try. They fix the false ENOSPC error in my test environment.
> btrfs: fix false enospc for compression
> btrfs: improve inode's outstanding_extents computation
> 
> I applied these two patches on top of the upstream Linux tree, whose latest
> commit is 41844e36206be90cd4d962ea49b0abc3612a99d0.
> 
> Regards,
> Xiaoguang Wang
> 
>>
>>>>> I'm trying to fix it.
>>>> That sounds good but do you also get the
>>>> BTRFS: space_info 4 has 18446742286429913088 free, is not full
>>>>
>>>> kernel messages on umount? If not, you might have found another problem.
>>> Yes, I see similar messages; you can paste your whole dmesg output here.
>> [ cut here ]
>> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5790
>> btrfs_free_block_groups+0x346/0x430 [btrfs]()
>> Modules linked in: netconsole xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm
>> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan
>> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop
>> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit
>> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd
>> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid
>> CPU: 2 PID: 5187 Comm: umount Tainted: G O 4.4.22+63-ph #1
>> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015
>>  880fda777d00 813b69c3 
>> c067a099 880fda777d38 810821c6 
>> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098
>> Call Trace:
>> [] dump_stack+0x63/0x90
>> [] warn_slowpath_common+0x86/0xc0
>> [] warn_slowpath_null+0x1a/0x20
>> [] btrfs_free_block_groups+0x346/0x430 [btrfs]
>> [] close_ctree+0x15d/0x330 [btrfs]
>> [] btrfs_put_super+0x19/0x20 [btrfs]
>> [] generic_shutdown_super+0x6f/0x100
>> [] kill_anon_super+0x12/0x20
>> [] btrfs_kill_super+0x16/0xa0 [btrfs]
>> [] deactivate_locked_super+0x43/0x70
>> [] deactivate_super+0x5c/0x60
>> [] cleanup_mnt+0x3f/0x90
>> [] __cleanup_mnt+0x12/0x20
>> [] task_work_run+0x81/0xa0
>> [] exit_to_usermode_loop+0xb0/0xc0
>> [] syscall_return_slowpath+0xd4/0x130
>> [] int_ret_from_sys_call+0x25/0x8f
>> ---[ end trace cee6ace13018e13e ]---
>> [ cut here ]
>> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5791
>> btrfs_free_block_groups+0x365/0x430 [btrfs]()
>> Modules linked in: netconsole xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm
>> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan
>> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop
>> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit
>> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd
>> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid
>> CPU: 2 PID: 5187 Comm: umount Tainted: G W O 4.4.22+63-ph #1
>> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015
>>  880fda777d00 813b69c3 
>> c067a099 880fda777d38 810821c6 
>> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098
>> Call Trace:
>> [] dump_stack+0x63/0x90
>> [] warn_slowpath_comm
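
A note on the number in the subject line: 18446742286429913088 is only slightly below 2^64, which is what a negative value looks like when it is printed as an unsigned 64-bit integer. The following is a minimal sketch of that reading, not something confirmed in this thread; the helper name u64_as_signed is made up for illustration.

```python
U64_MODULUS = 1 << 64


def u64_as_signed(value):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    if not 0 <= value < U64_MODULUS:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negative numbers.
    return value - U64_MODULUS if value >= (1 << 63) else value


reported = 18446742286429913088  # copied from the kernel message
signed = u64_as_signed(reported)
print(signed)                    # byte count as a signed number
print(signed / 2**40)            # the same value expressed in TiB
```

Read this way, the "free" figure would be about 1.6 TiB below zero, which would fit an accounting underflow rather than a real amount of free space.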

Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-07 Thread Stefan Priebe - Profihost AG
Dear Wang,

I can't use v4.8.0, as I always get OOMs and total machine crashes.

Complete traces with your patch and some more btrfs patches applied (in
the hope that they would fix the OOM, but they did not):
http://pastebin.com/raw/6vmRSDm1

Greets,
Stefan

Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-07 Thread Stefan Priebe - Profihost AG
Am 07.10.2016 um 09:17 schrieb Wang Xiaoguang:
> Hi,
> 
> On 10/07/2016 03:03 PM, Stefan Priebe - Profihost AG wrote:
>> Dear Wang,
>>
>> I can't use v4.8.0, as I always get OOMs and total machine crashes.
>>
>> Complete traces with your patch and some more btrfs patches applied (in
>> the hope that they would fix the OOM, but they did not):
>> http://pastebin.com/raw/6vmRSDm1
> I didn't see any such OOMs...
> Can you try Holger's tree with my patches?

Dear Wang, I already tried that. It doesn't help. It also happens on only two
out of three servers. After some time it starts killing low-memory processes.
But I have no idea where all that memory is consumed. (The machines have 64 GB.)

Greets,
Stefan


> Regards,
> Xiaoguang Wang

Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-07 Thread Stefan Priebe - Profihost AG
This is what atop shows for memory usage 5 minutes before the crash:

MEM | tot   62.8G | free 198.2M | cache 56.8G | buff   1.4M |
    | slab   3.5G | shmem  1.1M | vmbal  0.0M | hptot  0.0M |

SWP | tot    3.7G | free   3.2G | vmcom  2.8G | vmlim 35.1G |

Greets,
Stefan
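
A quick back-of-the-envelope check of the atop MEM figures above, as a sketch. The numbers are copied from the atop output; treating atop's "G" and "M" suffixes as binary units is an assumption, and the function names are made up for illustration.

```python
# Figures copied from the atop MEM line, converted to MiB.
# Assumption: atop's "G" and "M" mean GiB and MiB.
MIB_PER_GIB = 1024

atop_mem_mib = {
    "total": 62.8 * MIB_PER_GIB,
    "free": 198.2,
    "cache": 56.8 * MIB_PER_GIB,
    "buff": 1.4,
    "slab": 3.5 * MIB_PER_GIB,
}


def percent_of_total(mem, field):
    """Return one field as a percentage of total memory."""
    return 100.0 * mem[field] / mem["total"]


def unaccounted_mib(mem):
    """Memory not covered by free, cache, buffers or slab."""
    accounted = sum(v for k, v in mem.items() if k != "total")
    return mem["total"] - accounted


for field in ("free", "cache", "slab"):
    print(f"{field:>5}: {percent_of_total(atop_mem_mib, field):5.1f}% of total")
print(f"other: {unaccounted_mib(atop_mem_mib) / MIB_PER_GIB:.1f} GiB")
```

On these figures about nine tenths of RAM is page cache and well under 1% is free, so almost everything the kernel could hand out would have to come from reclaiming cache.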


Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-07 Thread Stefan Priebe - Profihost AG
And it shows:

PAG | scan 33829e5 | steal 1968e3 | stall 0 | swin 257071 | swout 346960 |

But the highest user-space program uses only 190 MB.

greets,
Stefan


Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-07 Thread Stefan Priebe - Profihost AG
Am 07.10.2016 um 10:07 schrieb Wang Xiaoguang:
> hello,
> 
> On 10/07/2016 04:06 PM, Stefan Priebe - Profihost AG wrote:
>> And it shows:
>>
>> PAG | scan 33829e5 | steal 1968e3 | stall 0 | swin 257071 | swout 346960 |
>>
>> But the highest user-space program uses only 190 MB.
> If you don't apply my patches, are there no OOMs in your test
> environment?
> I want to confirm whether this OOM is caused by my patches...

This also happens without your patches. That's what I meant by "can't
use v4.8.0".

Is it OK to try v4.7.6?

Greets,
Stefan

> 
> Regards,
> Xiaoguang Wang
> 

Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-07 Thread Stefan Priebe - Profihost AG
Hi Holger,

Am 07.10.2016 um 11:33 schrieb Holger Hoffstätte:
> On 10/07/16 09:17, Wang Xiaoguang wrote:
>> Hi,
>>
>> On 10/07/2016 03:03 PM, Stefan Priebe - Profihost AG wrote:
>>> Dear Wang,
>>>
>>> I can't use v4.8.0, as I always get OOMs and total machine crashes.
>>>
>>> Complete traces with your patch and some more btrfs patches applied (in
>>> the hope that they would fix the OOM, but they did not):
>>> http://pastebin.com/raw/6vmRSDm1
>> I didn't see any such OOMs...
>> Can you try Holger's tree with my patches?
> 
> They don't really apply to either 4.4.x (because it has diverged too
> much by now) or 4.8.x because of the initial dedupe support which came
> in as part of 4.9rc1 - there are way too many conflicts all over the
> place and merging them manually took way too much time.
> It would be useful if you could rebase your patches to for-next.
> 
> Stefan, have you tried setting THP to 'madvise' or 'never'?
> Try 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled'
> or boot with transparent_hugepage=madvise (or never) kernel flag.
> I have no idea if it will help, but it's worth a try.

It's already set to never. The hosts are currently still up and running,
but only if I run
echo 3 >/proc/sys/vm/drop_caches

every 30 minutes. It seems the kernel fails to reclaim the cache by itself
when user space needs memory.

Greets,
Stefan
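
The workaround described above (writing 3 to /proc/sys/vm/drop_caches every 30 minutes) could be scripted roughly as follows. This is only a sketch: the function names, the injectable path and sleep parameters, and the sync before each write are illustrative choices, not part of what was actually run on these hosts. It needs root when pointed at the real file.

```python
import os
import time

DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_caches(mode=3, path=DROP_CACHES_PATH):
    """Ask the kernel to drop clean caches.

    mode 1 = page cache, 2 = dentries and inodes, 3 = both.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    os.sync()  # write back dirty pages first; only clean pages are dropped
    with open(path, "w") as handle:
        handle.write(f"{mode}\n")


def drop_caches_every(interval_seconds=30 * 60, rounds=None,
                      path=DROP_CACHES_PATH, sleep=time.sleep):
    """Drop caches repeatedly; rounds=None means run until interrupted."""
    done = 0
    while rounds is None or done < rounds:
        drop_caches(3, path)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_seconds)
    return done
```

Calling drop_caches_every() with the defaults as root would reproduce the 30-minute interval. This only hides the symptom; it does not explain why reclaim fails.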

> 
> -h
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-07 Thread Stefan Priebe - Profihost AG
Hi Wang,

Currently it's working fine on the system, with no ENOSPC errors. But
it will take a week to be sure they don't come back.

Thanks!

Greets,
Stefan

Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-08 Thread Stefan Priebe - Profihost AG
The main difference between the systems where the OOM happens and the one
where it does not is:
- Single Xeon  => no OOM
- Dual Xeon / NUMA => OOM

Both have 64 GB of memory.
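
One possible way to follow up on the NUMA observation would be to compare memory per node, since Linux exposes a per-node meminfo file under /sys/devices/system/node/. This was not done in the thread; the sketch below only parses text in that file's "Node N Field: value kB" layout, and the sample values are invented for illustration, not taken from these servers.

```python
import re

# One line of /sys/devices/system/node/nodeN/meminfo, for example:
#   "Node 0 MemFree:          150000 kB"
NODE_LINE = re.compile(
    r"^Node\s+(\d+)\s+(\w+(?:\(\w+\))?):\s+(\d+)(?:\s+kB)?\s*$"
)


def parse_node_meminfo(text):
    """Map field name to integer value for one node's meminfo text."""
    fields = {}
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            fields[match.group(2)] = int(match.group(3))
    return fields


# Hypothetical sample input, NOT measured on the machines in this thread.
SAMPLE = """\
Node 0 MemTotal:       33000000 kB
Node 0 MemFree:          150000 kB
Node 0 FilePages:      29000000 kB
"""

node0 = parse_node_meminfo(SAMPLE)
print(node0["MemFree"], "kB free on node 0 (sample data)")
```

Feeding it the contents of each nodeN/meminfo file would show whether one node runs out of free memory well before the other.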


Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-10 Thread Stefan Priebe - Profihost AG
Dear Wang,

Am 06.10.2016 um 05:04 schrieb Wang Xiaoguang:
> Hi,
> 
> On 09/29/2016 03:27 PM, Stefan Priebe - Profihost AG wrote:
>> Am 29.09.2016 um 09:13 schrieb Wang Xiaoguang:
>>>>> I found that compression sometimes reports ENOSPC errors even in 4.8-rc8,
>>>>> currently
>>>> I cannot confirm that, as I do not have enough space to test this
>>>> without
>>>> compression ;-( But yes, I have compression enabled.
>>> I might not get you, my poor English :)
>>> Do you mean that you only get the ENOSPC error when compression is enabled?
>>>
>>> And when compression is not enabled, you do not get the ENOSPC error?
>> I can't tell you. I cannot test with compression disabled. I do not
>> have enough free space on this disk.
> I have just sent two patches to fix the false ENOSPC error for compression;
> please give them a try. They fix the false ENOSPC error in my test environment.
> btrfs: fix false enospc for compression
> btrfs: improve inode's outstanding_extents computation
> 
> I apply these two patchs in linux upstream tree, the latest commit
> is 41844e36206be90cd4d962ea49b0abc3612a99d0.

No ENOSPC errors for 5 days now! That's amazing. I hope it stays this way
and your patches get into 4.9.

Greets,
Stefan

> 
> Regards,
> Xiaoguang Wang
> 
>>
>>>>> I'm trying to fix it.
>>>> That sounds good but do you also get the
>>>> BTRFS: space_info 4 has 18446742286429913088 free, is not full
>>>>
>>>> kernel messages on umount? if not you might have found another problem.
>>> Yes, I seem similar messages, you can paste you whole dmesg info here.
>> [ cut here ]
>> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5790
>> btrfs_free_block_groups+0x346/0x430 [btrfs]()
>> Modules linked in: netconsole xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm
>> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan
>> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop
>> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit
>> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd
>> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid
>> CPU: 2 PID: 5187 Comm: umount Tainted: G O 4.4.22+63-ph #1
>> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015
>>  880fda777d00 813b69c3 
>> c067a099 880fda777d38 810821c6 
>> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098
>> Call Trace:
>> [] dump_stack+0x63/0x90
>> [] warn_slowpath_common+0x86/0xc0
>> [] warn_slowpath_null+0x1a/0x20
>> [] btrfs_free_block_groups+0x346/0x430 [btrfs]
>> [] close_ctree+0x15d/0x330 [btrfs]
>> [] btrfs_put_super+0x19/0x20 [btrfs]
>> [] generic_shutdown_super+0x6f/0x100
>> [] kill_anon_super+0x12/0x20
>> [] btrfs_kill_super+0x16/0xa0 [btrfs]
>> [] deactivate_locked_super+0x43/0x70
>> [] deactivate_super+0x5c/0x60
>> [] cleanup_mnt+0x3f/0x90
>> [] __cleanup_mnt+0x12/0x20
>> [] task_work_run+0x81/0xa0
>> [] exit_to_usermode_loop+0xb0/0xc0
>> [] syscall_return_slowpath+0xd4/0x130
>> [] int_ret_from_sys_call+0x25/0x8f
>> ---[ end trace cee6ace13018e13e ]---
>> [ cut here ]
>> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5791
>> btrfs_free_block_groups+0x365/0x430 [btrfs]()
>> Modules linked in: netconsole xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm
>> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan
>> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop
>> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit
>> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd
>> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid
>> CPU: 2 PID: 5187 Comm: umount Tainted: G W O 4.4.22+63-ph #1
>> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015
>>  880fda777d00 813b69c3 
>> c067a099 880fda777d38 810821c6 
>> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098
>> Call Trace:
>> [] dump_stack+0x63/0x90
>> [] warn_sl

btrfs and numa - needing drop_caches to keep speed up

2016-10-13 Thread Stefan Priebe - Profihost AG
Hello list,

I'm running the same workload on two machines (a single Xeon and a dual
Xeon), both with 64GB RAM.

On the dual-Xeon (NUMA) machine I need to run
echo 3 >/proc/sys/vm/drop_caches every 15-30 minutes to keep the speed as
good as on the non-NUMA system. I'm not sure whether this is related to NUMA.

Is there any sysctl parameter to tune?

Tested with vanilla v4.8.1

Greets,
Stefan


Re: [PATCH 1/2] btrfs: improve inode's outstanding_extents computation

2016-10-14 Thread Stefan Priebe - Profihost AG

Am 06.10.2016 um 04:51 schrieb Wang Xiaoguang:
> This issue was revealed by changing BTRFS_MAX_EXTENT_SIZE (128MB) to 64KB.
> With that change, the fsstress test often triggers these warnings in
> btrfs_destroy_inode():
>   WARN_ON(BTRFS_I(inode)->outstanding_extents);
>   WARN_ON(BTRFS_I(inode)->reserved_extents);
> 
> Simple test program below can reproduce this issue steadily.
> Note: you need to modify BTRFS_MAX_EXTENT_SIZE to 64KB to have test,
> otherwise there won't be such WARNING.
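>   #include <sys/types.h>
>   #include <sys/stat.h>
>   #include <fcntl.h>
>   #include <unistd.h>
>   #include <string.h>
> 
>   int main(void)
>   {
>           int fd;
>           char buf[68 * 1024];
> 
>           memset(buf, 0, 68 * 1024);
>           /* O_CREAT needs an explicit mode argument */
>           fd = open("testfile", O_CREAT | O_EXCL | O_RDWR, 0644);
>           /* write 68KB at offset 64KB, i.e. the range [64KB, 132KB) */
>           pwrite(fd, buf, 68 * 1024, 64 * 1024);
>           return 0;
>   }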
> 
> When BTRFS_MAX_EXTENT_SIZE is 64KB, and buffered data range is:
> 64KB            128KB   132KB
> |---------------|-------|
>         64 + 4KB
> 
> 1) for above data range, btrfs_delalloc_reserve_metadata() will reserve
> metadata and set BTRFS_I(inode)->outstanding_extents to 2.
> (68KB + 64KB - 1) / 64KB == 2
> 
> Outstanding_extents: 2
> 
> 2) then btrfs_dirty_page() will be called to dirty pages and set
> EXTENT_DELALLOC flag. In this case, btrfs_set_bit_hook() will be called
> twice.
> The 1st set_bit_hook() call will set DELALLOC flag for the first 64K.
> 64KB            128KB
> |---------------|
>   64KB DELALLOC
> Outstanding_extents: 2
> 
> Set_bit_hooks() uses FIRST_DELALLOC flag to avoid re-increase
> outstanding_extents counter.
> So for 1st set_bit_hooks() call, it won't modify outstanding_extents,
> it's still 2.
> 
> Then FIRST_DELALLOC flag is *CLEARED*.
> 
> 3) 2nd btrfs_set_bit_hook() call.
> Because FIRST_DELALLOC have been cleared by previous set_bit_hook(),
> btrfs_set_bit_hook() will increase BTRFS_I(inode)->outstanding_extents by
> one, so now BTRFS_I(inode)->outstanding_extents is 3.
> 64KB            128KB   132KB
> |---------------|-------|
>   64K DELALLOC   4K DELALLOC
> Outstanding_extents: 3
> 
> But the correct outstanding_extents number should be 2, not 3.
> The 2nd btrfs_set_bit_hook() call just screwed up this, and leads to the
> WARN_ON().
> 
> Normally, we can solve it by only increasing outstanding_extents in
> set_bit_hook().
> But the problem is that for delalloc_reserve/release_metadata() we only have
> a 'length' parameter, so they calculate an inaccurate outstanding_extents.
> If we only rely on set_bit_hook(), release_metadata() will screw things up
> as it will decrease by an inaccurate number.
> 
> So the fix we use is:
> 1) Increase *INACCURATE* outstanding_extents at delalloc_reserve_meta
>    Just as a place holder.
> 2) Increase *accurate* outstanding_extents at set_bit_hooks()
>    This is the real increaser.
> 3) Decrease *INACCURATE* outstanding_extents before returning
>    This makes outstanding_extents to correct value.
> 
> For 128M BTRFS_MAX_EXTENT_SIZE, due to limitation of
> __btrfs_buffered_write(), each iteration will only handle about 2MB
> data.
> So btrfs_dirty_pages() won't need to handle cases cross 2 extents.
> 
> Signed-off-by: Wang Xiaoguang 

Tested-by: Stefan Priebe 

Has been working fine for 8 days - no ENOSPC errors anymore.

Greets,
Stefan
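
The accounting problem described in the quoted commit message can be
illustrated with a small toy model. This is only a sketch under stated
assumptions, not the kernel code: MAX_EXTENT_SIZE is the 64KB test value
from the description, the function names are hypothetical, and each hook
call is modelled as one entry in a list of chunk lengths.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the 64KB test value used in the patch description. */
#define MAX_EXTENT_SIZE (64ULL * 1024)

/* Extents needed to cover len bytes, rounded up. */
uint64_t extents_for(uint64_t len)
{
	return (len + MAX_EXTENT_SIZE - 1) / MAX_EXTENT_SIZE;
}

/* Sum of all chunk lengths, i.e. the 'length' seen by the reserve step. */
uint64_t total_len(const uint64_t *chunks, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += chunks[i];
	return sum;
}

/*
 * Toy model of the old behaviour: the reserve step counts extents by total
 * length, the first hook call is skipped (FIRST_DELALLOC), and every later
 * hook call adds one more extent.
 */
uint64_t old_outstanding(const uint64_t *chunks, size_t n)
{
	uint64_t outstanding = extents_for(total_len(chunks, n));

	for (size_t i = 1; i < n; i++)
		outstanding += 1;
	return outstanding;
}

/*
 * Toy model of the fix: the reserve step adds a placeholder, every hook
 * call adds the accurate count for its own chunk, and the placeholder is
 * subtracted again before returning.
 */
uint64_t new_outstanding(const uint64_t *chunks, size_t n)
{
	uint64_t placeholder = extents_for(total_len(chunks, n));
	uint64_t outstanding = placeholder;

	for (size_t i = 0; i < n; i++)
		outstanding += extents_for(chunks[i]);
	return outstanding - placeholder;
}
```

Calling both functions with the two chunks from the example (64KB and 4KB)
gives 3 for the old model and 2 for the new one, matching the numbers in
the description.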

> ---
>  fs/btrfs/ctree.h |  2 ++
>  fs/btrfs/inode.c | 65 ++--
>  fs/btrfs/ioctl.c |  6 ++
>  3 files changed, 62 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 33fe035..16885f6 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3119,6 +3119,8 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput,
>  int nr);
>  int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
> struct extent_state **cached_state);
> +int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
> + struct extent_state **cached_state);
>  int btrfs_create_subvol_root(struct btrfs_trans_handle *trans,
>struct btrfs_root *new_root,
>struct btrfs_root *parent_root,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e6811c4..a7193b1 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1590,6 +1590,9 @@ static void btrfs_split_extent_hook(struct inode *inode,
>   if (!(orig->state & EXTENT_DELALLOC))
>   return;
>  
> + if (btrfs_is_free_space_inode(inode))
> + return;
> +
>   size = orig->end - orig->start + 1;
>   if (si

Re: [PATCH 2/2] btrfs: fix false enospc for compression

2016-10-14 Thread Stefan Priebe - Profihost AG

Am 06.10.2016 um 04:51 schrieb Wang Xiaoguang:
> When testing btrfs compression, sometimes we got ENOSPC error, though fs
> still has much free space, xfstests generic/171, generic/172, generic/173,
> generic/174, generic/175 can reveal this bug in my test environment when
> compression is enabled.
> 
> After some debugging work, we found that it's btrfs_delalloc_reserve_metadata()
> which sometimes tries to reserve plenty of metadata space, even for very small
> data range. In btrfs_delalloc_reserve_metadata(), the number of metadata bytes
> we try to reserve is calculated by the difference between outstanding_extents
> and reserved_extents. Please see below case for how ENOSPC occurs:
> 
>   1, Buffered write 128MB data in unit of 128KB, so finally we'll have inode
> outstanding extents be 1, and reserved_extents be 1024. Note it's
> btrfs_merge_extent_hook() that merges these 128KB units into one big
> outstanding extent, but do not change reserved_extents.
> 
>   2, When writing dirty pages, for compression, cow_file_range_async() will
> split above big extent in unit of 128KB(compression extent size is 128KB).
> When the first split operation finishes, we'll have 2 outstanding extents and 1024
> reserved extents, and just right now the currently generated ordered extent is
> dispatched to run and complete, then btrfs_delalloc_release_metadata()(see
> btrfs_finish_ordered_io()) will be called to release metadata, after that we
> will have 1 outstanding extents and 1 reserved extents(also see logic in
> drop_outstanding_extent()). Later cow_file_range_async() continues to handles
> left data range[128KB, 128MB), and if no other ordered extent was dispatched
> to run, there will be 1023 outstanding extents and 1 reserved extent.
> 
>   3, Now if another buffered write for this file enters, then
> btrfs_delalloc_reserve_metadata() will at least try to reserve metadata
> for 1023 outstanding extents. For 16KB node size, that is 1023*16384*2*8,
> about 255MB; for 64K node size, it is 1023*65536*8*2, about 1GB of metadata,
> so obviously it's not sane and can easily result in an ENOSPC error.
> 
> The root cause is that for compression, its max extent size will no longer be
> BTRFS_MAX_EXTENT_SIZE(128MB), it'll be 128KB, so current metadata reservation
> method in btrfs is not appropriate or correct, here we introduce:
>   enum btrfs_metadata_reserve_type {
>   BTRFS_RESERVE_NORMAL,
>   BTRFS_RESERVE_COMPRESS,
>   };
> and expand btrfs_delalloc_reserve_metadata() and btrfs_delalloc_reserve_space()
> by adding a new enum btrfs_metadata_reserve_type argument. When a data range
> will go through compression, we use BTRFS_RESERVE_COMPRESS to reserve metadata.
> Meanwhile we introduce EXTENT_COMPRESS flag to mark a data range that will go
> through compression path.
> 
> With this patch, we can fix these false enospc error for compression.
> 
> Signed-off-by: Wang Xiaoguang 

Tested-by: Stefan Priebe 

Has been working fine for 8 days - no ENOSPC errors anymore.

Greets,
Stefan
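
The numbers in the quoted description can be re-checked with a short
sketch. It only reproduces the arithmetic given there: the factors 2 and 8
are copied from the quoted formula, and the function names are
hypothetical, not kernel symbols.

```c
#include <stdint.h>

/* Extents needed to cover 'bytes' when one extent holds at most
 * 'max_extent_size' bytes (rounded up). */
uint64_t extents_needed(uint64_t bytes, uint64_t max_extent_size)
{
	return (bytes + max_extent_size - 1) / max_extent_size;
}

/* Metadata bytes reserved for nr_extents outstanding extents, using the
 * formula from the description: nr_extents * nodesize * 2 * 8. */
uint64_t metadata_reserve_bytes(uint64_t nr_extents, uint64_t nodesize)
{
	return nr_extents * nodesize * 2 * 8;
}
```

With a 128MB write, extents_needed() yields 1 for a 128MB extent limit and
1024 for the 128KB compression limit. metadata_reserve_bytes(1023, 16384)
is 268173312 bytes (about 255MB) and metadata_reserve_bytes(1023, 65536)
is 1072693248 bytes (about 1GB), which matches the figures quoted above.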


> ---
>  fs/btrfs/ctree.h |  31 ++--
>  fs/btrfs/extent-tree.c   |  55 +
>  fs/btrfs/extent_io.c |  59 +-
>  fs/btrfs/extent_io.h |   2 +
>  fs/btrfs/file.c  |  26 +--
>  fs/btrfs/free-space-cache.c  |   6 +-
>  fs/btrfs/inode-map.c |   5 +-
>  fs/btrfs/inode.c | 181 ---
>  fs/btrfs/ioctl.c |  12 ++-
>  fs/btrfs/relocation.c|  14 +++-
>  fs/btrfs/tests/inode-tests.c |  15 ++--
>  11 files changed, 309 insertions(+), 97 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 16885f6..fa6a19a 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -97,6 +97,19 @@ static const int btrfs_csum_sizes[] = { 4 };
>  
>  #define BTRFS_DIRTY_METADATA_THRESH  SZ_32M
>  
> +/*
> + * for compression, max file extent size would be limited to 128K, so when
> + * reserving metadata for such delalloc writes, pass BTRFS_RESERVE_COMPRESS to
> + * btrfs_delalloc_reserve_metadata() or btrfs_delalloc_reserve_space() to
> + * calculate metadata, for none-compression, use BTRFS_RESERVE_NORMAL.
> + */
> +enum btrfs_metadata_reserve_type {
> + BTRFS_RESERVE_NORMAL,
> + BTRFS_RESERVE_COMPRESS,
> +};
> +int inode_need_compress(struct inode *inode);
> +u64 btrfs_max_extent_size(enum btrfs_metadata_reserve_type reserve_type);
> +
>  #define BTRFS_MAX_EXTENT_SIZE SZ_128M
>  
>  struct btrfs_mapping_tree {
> @@ -2677,10 +2690,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
>  void btrfs_subvolume_release_metadata(struct btrfs_root *root,
> struct btrfs_block_rsv *rsv,
> u64 qgroup_reserved);
> -int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
> -void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
> -i

Re: btrfs and numa - needing drop_caches to keep speed up

2016-10-14 Thread Stefan Priebe - Profihost AG
Dear Julian,

Am 14.10.2016 um 14:26 schrieb Julian Taylor:
> On 10/14/2016 08:28 AM, Stefan Priebe - Profihost AG wrote:
>> Hello list,
>>
>> while running the same workload on two machines (single xeon and a dual
>> xeon) both with 64GB RAM.
>>
>> I need to run echo 3 >/proc/sys/vm/drop_caches every 15-30 minutes to
>> keep the speed as good as on the non numa system. I'm not sure whether
>> this is related to numa.
>>
>> Is there any sysctl parameter to tune?
>>
>> Tested with vanilla v4.8.1
>>
>> Greets,
>> Stefan
> 
> hi,
> why do you think this is related to btrfs?

It was just an idea, as I couldn't find any other difference between those
systems.

> This is easy to diagnose by recording some kernel stacks during the
> problem with perf.

Do you just mean perf top? Does it also show locking problems? I don't see
much CPU usage in that case.

> The only known issue that has this type of workaround that I know of are
> transparent huge pages.

I already disabled thp by:
echo never > /sys/kernel/mm/transparent_hugepage/enabled

cat /proc/meminfo says:
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0



Greets,
Stefan

> 
> cheers,
> Julian


Re: btrfs and numa - needing drop_caches to keep speed up

2016-10-14 Thread Stefan Priebe - Profihost AG
Hi,
Am 14.10.2016 um 15:19 schrieb Stefan Priebe - Profihost AG:
> Dear julian,
> 
> Am 14.10.2016 um 14:26 schrieb Julian Taylor:
>> On 10/14/2016 08:28 AM, Stefan Priebe - Profihost AG wrote:
>>> Hello list,
>>>
>>> while running the same workload on two machines (single xeon and a dual
>>> xeon) both with 64GB RAM.
>>>
>>> I need to run echo 3 >/proc/sys/vm/drop_caches every 15-30 minutes to
>>> keep the speed as good as on the non numa system. I'm not sure whether
>>> this is related to numa.
>>>
>>> Is there any sysctl parameter to tune?
>>>
>>> Tested with vanilla v4.8.1
>>>
>>> Greets,
>>> Stefan
>>
>> hi,
>> why do you think this is related to btrfs?
> 
> was just an idea as i couldn't find any other difference between those
> systems.
> 
>> This is easy to diagnose by recording some kernel stacks during the
>> problem with perf.
> 
> you just mean perf top? Does it also show locking problems? As i see not
> much CPU usage in that case.


perf top looks like this:
   5,46%  libc-2.19.so   [.] memset
   5,26%  [kernel]   [k] page_fault
   3,63%  [kernel]   [k] clear_page_c_e
   1,38%  [kernel]   [k] _raw_spin_lock
   1,06%  [kernel]   [k] get_page_from_freelist
   0,83%  [kernel]   [k] copy_user_enhanced_fast_string
   0,79%  [kernel]   [k] release_pages
   0,68%  [kernel]   [k] handle_mm_fault
   0,57%  [kernel]   [k] free_hot_cold_page
   0,55%  [kernel]   [k] handle_pte_fault
   0,54%  [kernel]   [k] __pagevec_lru_add_fn
   0,45%  [kernel]   [k] unmap_page_range
   0,45%  [kernel]   [k] __mod_zone_page_state
   0,43%  [kernel]   [k] page_add_new_anon_rmap
   0,38%  [kernel]   [k] free_pcppages_bulk

> 
>> The only known issue that has this type of workaround that I know of are
>> transparent huge pages.
> 
> I already disabled thp by:
> echo never > /sys/kernel/mm/transparent_hugepage/enabled
> 
> cat /proc/meminfo says:
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> 
> 
> Greets,
> Stefan
> 
>>
>> cheers,
>> Julian


speed up cp --reflink=always

2016-10-15 Thread Stefan Priebe - Profihost AG
Hello,

cp --reflink=always sometimes takes very long (e.g. 25-35 minutes).

An example:

source file:
# ls -la vm-279-disk-1.img
-rw-r--r-- 1 root root 204010946560 Oct 14 12:15 vm-279-disk-1.img

target file after around 10 minutes:
# ls -la vm-279-disk-1.img.tmp
-rw-r--r-- 1 root root 65022328832 Oct 15 22:13 vm-279-disk-1.img.tmp

I/O Waits are at around 6% but disk usage is at around 100%.

The process using most of the disk I/O is a kworker process. A function
trace of this kworker for 30s is already 44MB - no idea where to upload.
This volume uses space_cache=v2.

While digging through it I see a lot of these calls:

   kworker/u65:4-20679 [007]  46021.641882: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641882: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641882: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641882: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641882: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641882: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641882: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641882: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641883: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641883: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641883: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641883: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641883: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641883: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641883: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641883: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641883: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641883: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641883: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641884: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641884: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641884: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641884: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641884: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641884: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641884: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641884: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641884: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641884: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641884: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641885: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641885: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641885: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641885: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641885: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641885: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641886: btrfs_set_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641886: btrfs_get_token_32 <-btrfs_del_items
   kworker/u65:4-20679 [007]  46021.641886: btrfs_set_token_32 <-btrfs_del_items

Sorting the calls shows:
   4892 _raw_spin_lock <-free_extent_buffer
   4894 release_extent_buffer <-free_extent_buffer
   6803 map_private_extent_buffer <-generic_bin_search.constprop.36
   6839 __set_page_dirty_nobuffers <-btree_set_page_dirty
   6840 btree_set_page_dirty <-set_page_dirty
   6840 mem_cgroup_begin_page_stat <-__set_page_dirty_nobuffers
   6840 page_mapping <-set_page_dirty
   6840 set_page_dirty <-set_extent_buffer_dirty
   6841 mem_cgroup_end_page_stat <-__set_page_dirty_nobuffers
   7521 btrfs_clear_lock_blocking_rw <-btrfs_clear_path_blocking
   7967 btrfs_get_token_64 <-read_block_for_search.isra.33
   8018 btrfs_set_token_32 <-btrfs_del_items
   8235 btrfs_get_token_32 <-btrfs_del_items
   8813 btrfs_set_lock_blocking_rw <-btrfs_set_path_blocking
   9235 map_private_extent_buffer <-btrfs_get_token_32
  11824 btrfs_set_token_32 <-btrfs_extend_item
  12090 map_private_extent_buffer <-btrfs_get_token_64
  12367 mark_page_accessed <-mark_extent_buffer_accessed
  12621 btrfs_get_token_32 <-btrfs_extend_item
  16267 btr

Re: speed up cp --reflink=always

2016-10-16 Thread Stefan Priebe - Profihost AG
Am 16.10.2016 um 00:37 schrieb Hans van Kranenburg:
> Hi,
> 
> On 10/15/2016 10:49 PM, Stefan Priebe - Profihost AG wrote:
>>
>> cp --reflink=always takes sometimes very long. (i.e. 25-35 minutes)
>>
>> An example:
>>
>> source file:
>> # ls -la vm-279-disk-1.img
>> -rw-r--r-- 1 root root 204010946560 Oct 14 12:15 vm-279-disk-1.img
>>
>> target file after around 10 minutes:
>> # ls -la vm-279-disk-1.img.tmp
>> -rw-r--r-- 1 root root 65022328832 Oct 15 22:13 vm-279-disk-1.img.tmp
> 
> Two quick thoughts:
> 1. How many extents does this img have?

filefrag says:
1011508 extents found

> 2. Is this an XY problem? Why not just put the img in a subvolume and
> snapshot that?

Sorry, what's an XY problem?

Implementing cp --reflink was easier, as the original code was based on
XFS. But shouldn't cp --reflink / cloning a file be nearly identical to a
snapshot? Just creating refs to the extents?

Greets,
Stefan


Re: speed up cp --reflink=always

2016-10-16 Thread Stefan Priebe - Profihost AG
Am 16.10.2016 um 21:48 schrieb Hans van Kranenburg:
> On 10/16/2016 08:54 PM, Stefan Priebe - Profihost AG wrote:
>> Am 16.10.2016 um 00:37 schrieb Hans van Kranenburg:
>>> On 10/15/2016 10:49 PM, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> cp --reflink=always takes sometimes very long. (i.e. 25-35 minutes)
>>>>
>>>> An example:
>>>>
>>>> source file:
>>>> # ls -la vm-279-disk-1.img
>>>> -rw-r--r-- 1 root root 204010946560 Oct 14 12:15 vm-279-disk-1.img
>>>>
>>>> target file after around 10 minutes:
>>>> # ls -la vm-279-disk-1.img.tmp
>>>> -rw-r--r-- 1 root root 65022328832 Oct 15 22:13 vm-279-disk-1.img.tmp
>>>
>>> Two quick thoughts:
>>> 1. How many extents does this img have?
>>
>> filefrag says:
>> 1011508 extents found
> 
> To cp --reflink this, the filesystem needs to create a million new
> EXTENT_DATA objects for the new file, which point all parts of the new
> file to all the little same parts of the old file, and probably also
> needs to update a million EXTENT_DATA objects in the btrees to add a
> second backreference back to the new file.

Thanks for this explanation.

> 
>>> 2. Is this an XY problem? Why not just put the img in a subvolume and
>>> snapshot that?
>>
>> Sorry what's XY problem?
> 
> It means that I suspected that your actual goal is not spending time to
> work on optimizing how cp --reflink works, but that you just want to use
> the quickest way to have a clone of the file.
> 
> An XY problem is when someone has problem X, then thinks about solution
> Y to solve it, then runs into a problem/limitation/whatever when trying
> Y and asks help with that actual problem when doing Y while there might
> in the end be a better solution to get X done.

ah ;-) makes sense.

>> Implementing cp reflink was easier - as the original code was based on
>> XFS. But shouldn't be cp reflink / clone a file be nearly identical to a
>> snapshot? Just creating refs to the extents?
> 
> Snapshotting a subvolume only has to write a cowed copy of the top-level
> information of the subvolume filesystem tree, and leaves the extent tree
> alone. It doesn't have to do 2 million different things. \o/

Thanks for this explanation. I will look into switching to subvolumes.
I wasn't able to do this before as I was always running into ENOSPC issues,
which were solved last week.

Greets,
Stefan


Re: speed up cp --reflink=always

2016-10-16 Thread Stefan Priebe - Profihost AG
Am 17.10.2016 um 03:50 schrieb Qu Wenruo:
> At 10/17/2016 02:54 AM, Stefan Priebe - Profihost AG wrote:
>> Am 16.10.2016 um 00:37 schrieb Hans van Kranenburg:
>>> Hi,
>>>
>>> On 10/15/2016 10:49 PM, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> cp --reflink=always takes sometimes very long. (i.e. 25-35 minutes)
>>>>
>>>> An example:
>>>>
>>>> source file:
>>>> # ls -la vm-279-disk-1.img
>>>> -rw-r--r-- 1 root root 204010946560 Oct 14 12:15 vm-279-disk-1.img
>>>>
>>>> target file after around 10 minutes:
>>>> # ls -la vm-279-disk-1.img.tmp
>>>> -rw-r--r-- 1 root root 65022328832 Oct 15 22:13 vm-279-disk-1.img.tmp
>>>
>>> Two quick thoughts:
>>> 1. How many extents does this img have?
>>
>> filefrag says:
>> 1011508 extents found
> 
> Too many fragments.
> Average extent size is only about 200K.
> Quite common for VM images, if not setting no copy-on-write (C) attr.
> 
> Normally it's not a good idea to put VM images into btrfs without any
> tuning.

Those are backups, written sequentially just once. As far as I know, the
extent size is hardcoded to 128K for compression, isn't it?

Stefan
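
For reference, the "about 200K" average quoted above, and the roughly two
million item updates from Hans's explanation earlier in the thread, follow
directly from the file size and the filefrag count. A minimal sketch of
that arithmetic (hypothetical helper names; the factor of two is only the
rough estimate from that explanation, one new EXTENT_DATA item plus one
backreference update per extent):

```c
#include <stdint.h>

/* Average extent size in bytes; returns 0 if there are no extents. */
uint64_t average_extent_size(uint64_t file_size, uint64_t nr_extents)
{
	return nr_extents ? file_size / nr_extents : 0;
}

/* Rough estimate of tree items touched by a reflink copy:
 * one new EXTENT_DATA item and one backref update per extent. */
uint64_t reflink_item_updates(uint64_t nr_extents)
{
	return 2 * nr_extents;
}
```

For the image above, average_extent_size(204010946560, 1011508) comes out
at 201689 bytes, and reflink_item_updates(1011508) at 2023016.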

> Thanks,
> Qu
>>
>>> 2. Is this an XY problem? Why not just put the img in a subvolume and
>>> snapshot that?
>>
>> Sorry what's XY problem?
>>
>> Implementing cp reflink was easier - as the original code was based on
>> XFS. But shouldn't be cp reflink / clone a file be nearly identical to a
>> snapshot? Just creating refs to the extents?
>>
>> Greets,
>> Stefan
> 
> 


Re: btrfs goes readonly + No space left on 4.3

2016-05-09 Thread Stefan Priebe - Profihost AG

Am 03.05.2016 um 00:05 schrieb Omar Sandoval:
> On Fri, Apr 29, 2016 at 10:48:15PM +0200, Stefan Priebe wrote:
>> just want to drop a note that all those ENOSPC msg are gone with v4.5 and
>> space_cache=v2. Any plans to make space_cache=v2 default?
>>
>> Greets,
>> Stefan
> 
> Yup, we want to make space_cache=v2 the default at some point. I'm
> running it on my own machines and testing it here at Facebook and
> haven't run into any issues yet. Besides stability, I also want to make
> sure there aren't any performance regressions versus the old free space
> cache that we haven't thought about yet.
> 
> Thanks for trying it out :)

Can I patch v2 in as the default for myself? I just looked at the code but
didn't find an easy way to make v2 the default.

Greets,
Stefan


big volumes only work reliable with ssd_spread

2018-01-15 Thread Stefan Priebe - Profihost AG
Hello,

I have been using btrfs for incremental VM backups for around two or three
years.

Some data:
- volume size 60TB
- around 2000 subvolumes
- each differential backup stacks on top of a subvolume
- compress-force=zstd
- space_cache=v2
- no quota / qgroup

This has worked fine since kernel 4.14, except that I need ssd_spread as a
mount option. If I do not use ssd_spread, I always end up with very slow
performance and a single kworker process using 100% CPU after some days.

With ssd_spread those boxes have been running fine for around 6 months. Is
this expected? I haven't found any hint regarding such an impact.

Thanks!

Greets,
Stefan


btrfs-progs: check --repair crashes with BUG ON

2017-11-04 Thread Stefan Priebe - Profihost AG
Hello,

after a power failure i have a btrfs volume which isn't mountable.

dmesg shows:
parent transid verify failed on 181846016 wanted 143404 found 143399

If i run:
btrfs check --repair /dev/mapper/crypt_md1

The output is:
parent transid verify failed on 181846016 wanted 143404 found 143399
parent transid verify failed on 181846016 wanted 143404 found 143399
Ignoring transid failure
Clearing log on /dev/mapper/crypt_md0, previous log_root 1520200695808, level 0
parent transid verify failed on 308183040 wanted 143404 found 143399
parent transid verify failed on 308183040 wanted 143404 found 143399
Ignoring transid failure
parent transid verify failed on 338870272 wanted 143404 found 143399
parent transid verify failed on 338870272 wanted 143404 found 143399
Ignoring transid failure
parent transid verify failed on 12778157178880 wanted 143404 found 143399
parent transid verify failed on 12778157178880 wanted 143404 found 143399
Ignoring transid failure
leaf parent key incorrect 38699008
btrfs unable to find ref byte nr 12778147823616 parent 0 root 2  owner 0 offset 0
parent transid verify failed on 308183040 wanted 143404 found 143399
Ignoring transid failure
leaf parent key incorrect 91766784
extent-tree.c:2725: alloc_reserved_tree_block: BUG_ON `ret` triggered, value -1
./btrfs[0x415cb3]
./btrfs[0x416ee5]
./btrfs[0x417104]
./btrfs[0x418cea]
./btrfs[0x418f06]
./btrfs(btrfs_alloc_free_block+0x1e4)[0x41b8d0]
./btrfs(__btrfs_cow_block+0xd3)[0x40c5f9]
./btrfs(btrfs_cow_block+0x110)[0x40d03b]
./btrfs(commit_tree_roots+0x53)[0x439baa]
./btrfs(btrfs_commit_transaction+0xf9)[0x439f75]
./btrfs[0x467212]
./btrfs(handle_command_group+0x5d)[0x40b360]
./btrfs(cmd_rescue+0x15)[0x46749f]
./btrfs(main+0x163)[0x40b5e9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fc63f25db45]
./btrfs[0x40b0b9]
Aborted

This is with the btrfs-progs devel branch; the same happens with master and v4.13.3.

Stefan
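
A note on reading the output above: every "parent transid verify failed" line reports a block whose on-disk generation (found 143399) is five transactions older than the one its parent expects (wanted 143404). A minimal sketch for condensing such a log into one line per affected block follows. The function name summarize_transid is hypothetical, and it assumes the log was saved to a file or is piped in on stdin.

```shell
#!/bin/sh
# Summarize "parent transid verify failed" lines from a saved btrfs log.
# Prints one line per distinct block: bytenr, wanted, found, generation gap.
# summarize_transid is a hypothetical helper name, not part of btrfs-progs.
summarize_transid() {
    awk '
        /parent transid verify failed on/ {
            # Expected shape: ... failed on <bytenr> wanted <gen> found <gen>
            for (i = 1; i <= NF; i++) {
                if ($i == "on")     bytenr = $(i + 1)
                if ($i == "wanted") wanted = $(i + 1)
                if ($i == "found")  found  = $(i + 1)
            }
            # Report each block address only once.
            if (!(bytenr in seen)) {
                seen[bytenr] = 1
                printf "%s wanted=%d found=%d gap=%d\n", bytenr, wanted, found, wanted - found
            }
        }
    ' "$@"
}
```

Usage, assuming the check output was saved as check.log: summarize_transid check.log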


how to repair or access broken btrfs?

2017-11-14 Thread Stefan Priebe - Profihost AG
Hello,

After a controller firmware bug/failure I have a broken btrfs.

# parent transid verify failed on 181846016 wanted 143404 found 143399

Running repair, fsck or zero-log always results in the same failure message:
extent-tree.c:2725: alloc_reserved_tree_block: BUG_ON `ret` triggered,
value -1
.. stack trace ..

Is there any chance to get at least a single file out of the broken fs?

Greets,
Stefan


Complete output:
./btrfs check --repair /dev/mapper/crypt_md0
enabling repair mode
parent transid verify failed on 181846016 wanted 143404 found 143399
parent transid verify failed on 181846016 wanted 143404 found 143399
Ignoring transid failure
Checking filesystem on /dev/mapper/crypt_md0
UUID: d3f9eee9-efbd-4590-858f-27b39d453350
repair mode will force to clear out log tree, are you sure? [y/N]: y
parent transid verify failed on 308183040 wanted 143404 found 143399
parent transid verify failed on 308183040 wanted 143404 found 143399
Ignoring transid failure
parent transid verify failed on 338870272 wanted 143404 found 143399
parent transid verify failed on 338870272 wanted 143404 found 143399
Ignoring transid failure
parent transid verify failed on 12778157178880 wanted 143404 found 143399
parent transid verify failed on 12778157178880 wanted 143404 found 143399
Ignoring transid failure
leaf parent key incorrect 38699008
btrfs unable to find ref byte nr 12778147823616 parent 0 root 2  owner 0
offset 0
parent transid verify failed on 308183040 wanted 143404 found 143399
Ignoring transid failure
leaf parent key incorrect 91766784
extent-tree.c:2725: alloc_reserved_tree_block: BUG_ON `ret` triggered,
value -1
./btrfs[0x415cb3]
./btrfs[0x416ee5]
./btrfs[0x417104]
./btrfs[0x418cea]
./btrfs[0x418f06]
./btrfs(btrfs_alloc_free_block+0x1e4)[0x41b8d0]
./btrfs(__btrfs_cow_block+0xd3)[0x40c5f9]
./btrfs(btrfs_cow_block+0x110)[0x40d03b]
./btrfs(commit_tree_roots+0x53)[0x439a37]
./btrfs(btrfs_commit_transaction+0xf9)[0x439e02]
./btrfs(cmd_check+0x861)[0x46172e]
./btrfs(main+0x163)[0x40b5e9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f44b14fab45]
./btrfs[0x40b0b9]
Aborted


Re: how to repair or access broken btrfs?

2017-11-14 Thread Stefan Priebe - Profihost AG

On 14.11.2017 at 18:45, Andrei Borzenkov wrote:
> On 14.11.2017 12:56, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> After a controller firmware bug/failure I have a broken btrfs.
>>
>> # parent transid verify failed on 181846016 wanted 143404 found 143399
>>
>> Running repair, fsck or zero-log always results in the same failure message:
>> extent-tree.c:2725: alloc_reserved_tree_block: BUG_ON `ret` triggered,
>> value -1
>> .. stack trace ..
>>
>> Is there any chance to get at least a single file out of the broken fs?
>>
> 
> Did you try "btrfs restore"?

Great, that worked for that file. I'm still wondering why a repair is not
possible.

Greets,
Stefan
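
For readers who land here with the same problem: "btrfs restore" reads an unmounted device and copies files out to another filesystem without writing to the damaged one. The sketch below only prints the command lines rather than running them. The helper name restore_cmd and the destination directory are assumptions; the -D (dry run), -i (ignore errors) and -v (verbose) options are the ones documented for btrfs restore.

```shell
#!/bin/sh
# Print a "btrfs restore" command line for a given device and destination.
# restore_cmd is a hypothetical helper; it only echoes, it never touches the disk.
restore_cmd() {
    mode="$1"; dev="$2"; dest="$3"
    case "$mode" in
        # list: dry run, shows what would be restored without copying anything
        list) echo "btrfs restore -D -i -v $dev $dest" ;;
        # copy: actually copy files to the destination, skipping over errors
        copy) echo "btrfs restore -i -v $dev $dest" ;;
        *)    echo "usage: restore_cmd list|copy <device> <destination>" >&2
              return 2 ;;
    esac
}
```

Usage, with /mnt/rescue as an assumed destination on a healthy filesystem: restore_cmd list /dev/mapper/crypt_md0 /mnt/rescue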



Re: problems with btrfs send / restore

2012-10-16 Thread Stefan Priebe - Profihost AG

On 15.10.2012 22:14, Alex Lyakas wrote:
> Stefan,
> the second issue you're seeing was discussed here:
> http://www.spinics.net/lists/linux-btrfs/msg19672.html
>
> You can apply the patch I sent there in the meantime, but as Miao pointed
> out, I will need to make a better patch (I hope to do it soon,
> together with this one).

Ah OK, thanks. So I hope we'll see an updated btrfs-progs git repo soon ;-)

Thanks!

Stefan


Production use with vanilla 3.6.6

2012-11-05 Thread Stefan Priebe - Profihost AG

Hello list,

Is btrfs ready for production use in 3.6.6? Or should I backport fixes
from 3.7-rc?

Is a stable kernel planned that will get all btrfs fixes backported?


Greets
Stefan


problem with ceph and btrfs patch: set journal_info in async trans commit worker

2012-11-14 Thread Stefan Priebe - Profihost AG

Hello list,

I wanted to try out ceph with the latest vanilla kernel 3.7-rc5. I was
seeing a massive performance degradation. I see around 22x
btrfs-endio-write processes every 10-20 seconds and they run a long time
while consuming a massive amount of CPU.

So my performance drops from a steady 23,000 IOPS to fluctuating between
23,000 and 0; the average is now 2,500 IOPS instead of 23,000.

Git bisect shows me commit e209db7ace281ca347b1ac699bf1fb222eac03fe
"Btrfs: set journal_info in async trans commit worker" as the
problematic patch.

When I revert this one everything is fine again.

Is this known?

Greets,
Stefan
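
The bisect-then-revert workflow described above can be tried out on a throwaway repository. The sketch below is a toy, not the kernel bisect itself: it assumes a file named iops that stands in for the measured throughput, eight hypothetical commits with the regression introduced in the fifth, and an assumed threshold of 20000 for calling a commit good. It uses only git bisect start, git bisect run, git bisect reset and git revert.

```shell
#!/bin/sh
# Toy repository: the file "iops" stands in for the throughput measured at each commit.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Bisect Demo"
git config user.email "bisect@example.invalid"

commit() { git add -A && git commit -q -m "$1"; }

echo 23000 > iops; echo 1 > n; commit "commit 1"
good=$(git rev-parse HEAD)
for k in 2 3 4; do echo "$k" > n; commit "commit $k"; done
echo 2500 > iops; commit "commit 5: regression"
expected=$(git rev-parse HEAD)
for k in 6 7 8; do echo "$k" > n; commit "commit $k"; done

# Bisect between the bad tip and the known-good first commit.
# The test command exits 0 (good) while throughput is at least 20000.
git bisect start HEAD "$good" >/dev/null
git bisect run sh -c 'test "$(cat iops)" -ge 20000' >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

# Revert the offending commit on top of the tip and re-check the measurement.
git revert --no-edit "$first_bad" >/dev/null
echo "first bad commit: $first_bad"
echo "iops after revert: $(cat iops)"
```

On a real kernel tree the test command would be a build, boot and benchmark step, which is why a manual git bisect good / git bisect bad loop is usually more practical there.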


Re: problem with ceph and btrfs patch: set journal_info in async trans commit worker

2012-11-15 Thread Stefan Priebe - Profihost AG

Hi Miao,

On 15.11.2012 06:18, Miao Xie wrote:
> Hi, Stefan
>
> On Wed, 14 Nov 2012 14:42:07 +0100, Stefan Priebe - Profihost AG wrote:
>> Hello list,
>>
>> I wanted to try out ceph with the latest vanilla kernel 3.7-rc5. I was seeing a
>> massive performance degradation. I see around 22x btrfs-endio-write processes
>> every 10-20 seconds and they run a long time while consuming a massive amount
>> of CPU.
>>
>> So my performance drops from a steady 23,000 IOPS to fluctuating between
>> 23,000 and 0; the average is now 2,500 IOPS instead of 23,000.
>>
>> Git bisect shows me commit e209db7ace281ca347b1ac699bf1fb222eac03fe "Btrfs: set
>> journal_info in async trans commit worker" as the problematic patch.
>>
>> When I revert this one everything is fine again.
>>
>> Is this known?
>
> Could you try the following patch?
>
> http://marc.info/?l=linux-btrfs&m=135175512030453&w=2
>
> I think the patch
>
>    Btrfs: set journal_info in async trans commit worker
>
> is not the real reason that caused the regression.
>
> I guess it is caused by a bug in the reservation. When we join the
> same transaction handle more than 2 times, the pointer to the reservation
> in the transaction handle is lost, and the statistical data in the
> reservation is corrupted. We then trigger the space flush,
> which may block your tasks.

I applied your whole patchset. It looks a lot better now, but the average
is now 5,000 IOPS and not the 23,000 I get when reverting the mentioned
commit (e209db7ace281ca347b1ac699bf1fb222eac03fe).


Stefan


Re: btrfs deadlock in 3.5-rc3

2012-06-25 Thread Stefan Priebe - Profihost AG

On 25.06.2012 15:08, Josef Bacik wrote:
> This isn't showing the guy who's actually trying to commit the
> transaction.  Next time this happens can you do a sysrq+w and capture
> the output and post it here so we can see what everybody is doing?
> Thanks,
>
> Josef

No problem.

Kernel trace:
http://pastebin.com/raw.php?i=puZkCRCn

# sysrq+w trigger:
# echo w > /proc/sysrq-trigger
http://pastebin.com/raw.php?i=AQA8xxCX

Hope that helps. I'm able to trigger this pretty easily with ceph, so I
can produce as much info as you want.


Thanks!

Greets,
Stefan
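
For anyone asked for the same data: writing "w" to /proc/sysrq-trigger (as root) makes the kernel dump the tasks in blocked (uninterruptible) state to the kernel log, which can then be saved with dmesg. A minimal sketch follows. The helper name capture_blocked_tasks is hypothetical, and the trigger path and log command are parameters only so that the helper can be dry-run without root.

```shell
#!/bin/sh
# Request a blocked-task dump (sysrq+w) and save the kernel log to a file.
# capture_blocked_tasks is a hypothetical helper name.
#   $1  output file for the kernel log
#   $2  trigger file (default: /proc/sysrq-trigger, needs root)
#   $3  command that prints the kernel log (default: dmesg)
capture_blocked_tasks() {
    out="$1"
    trigger="${2:-/proc/sysrq-trigger}"
    logcmd="${3:-dmesg}"
    # "w" asks the kernel to log all tasks in uninterruptible sleep.
    echo w > "$trigger" || return 1
    # Unquoted on purpose so that a command with arguments can be passed in.
    $logcmd > "$out"
}
```

Usage as root while the hang is ongoing: capture_blocked_tasks sysrq-w-1.txt
Repeating the capture a few times gives several snapshots to compare.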



Re: btrfs deadlock in 3.5-rc3

2012-06-25 Thread Stefan Priebe - Profihost AG


> That's weird, sysrq+w should have a bunch of stacktraces but it's empty, so
> unless there's a bug there's nothing blocked.  Is the box actually hung or is it
> just taking forever?  Maybe try sysrq+w again to see if the one you pasted was
> just a fluke?  Thanks,


This one looks better:
http://pastebin.com/raw.php?i=R4pztDRt

Sorry.

Stefan


Re: btrfs deadlock in 3.5-rc3

2012-06-26 Thread Stefan Priebe - Profihost AG
Yes, I will do so. Right now I was trying to compare discard with
non-discard using this simple command:
for i in $(seq 0 1 1000); do dd if=/dev/zero of=t_$i bs=4M count=1; rm t_$i; done
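
A slightly more reusable form of the loop above, as a sketch: it takes the target directory as a parameter and prints the elapsed seconds, so the same run can be pointed at a filesystem mounted with the discard option and at one mounted without it. The function name churn_files and its parameters are hypothetical; the defaults mirror the command above (files t_0 to t_1000, 4M each).

```shell
#!/bin/sh
# Create and delete count+1 files of one block each in a directory,
# then print the elapsed wall-clock time in whole seconds.
# churn_files is a hypothetical helper name.
#   $1  target directory
#   $2  highest file index (default 1000, i.e. t_0 .. t_1000)
#   $3  dd block size (default 4M)
churn_files() {
    dir="$1"; count="${2:-1000}"; bs="${3:-4M}"
    i=0
    start=$(date +%s)
    while [ "$i" -le "$count" ]; do
        dd if=/dev/zero of="$dir/t_$i" bs="$bs" count=1 2>/dev/null || return 1
        rm -f "$dir/t_$i"
        i=$((i + 1))
    done
    end=$(date +%s)
    echo "$((end - start))"
}
```

Usage, with assumed mount points: churn_files /mnt/with-discard and churn_files /mnt/without-discard, then compare the two numbers.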


But I hit a new bug:

[39577.660228] BUG: unable to handle kernel paging request at fe50
[39577.686517] IP: [] btrfs_finish_ordered_io+0x23/0x3f0
[39577.713417] PGD 1c0d067 PUD 1c0e067 PMD 0
[39577.740039] Oops:  [#1] SMP
[39577.766401] CPU 6
[39577.792540] Modules linked in: nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipv6 i2c_i801 coretemp i2c_core ixgbe(O) [last unloaded: scsi_wait_scan]
[39577.847511]
[39577.847513] Pid: 3447, comm: btrfs-endio-wri Tainted: G   O 3.5.0-rc4intel #15 Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
[39577.847516] RIP: 0010:[]  [] btrfs_finish_ordered_io+0x23/0x3f0
[39577.847516] RSP: 0018:880e3b861d90  EFLAGS: 00010287
[39577.847517] RAX: 880e3b861e90 RBX: 880e3a8fb100 RCX: 880e3b861e90
[39577.847517] RDX: 880e3b861e90 RSI: 880e3a8fb190 RDI: 880e3a8fb100
[39577.847518] RBP: 880e3b861e10 R08: dead00100100 R09: dead00200200
[39577.847518] R10:  R11: 0001 R12: 880e3a624770
[39577.847518] R13:  R14: 880e3a8fb1b8 R15: 880e3b861e80
[39577.847519] FS:  () GS:880e7fd8() knlGS:
[39577.847520] CS:  0010 DS:  ES:  CR0: 8005003b
[39577.847520] CR2: fe50 CR3: 01c0b000 CR4: 000407e0
[39577.847521] DR0:  DR1:  DR2:
[39577.847521] DR3:  DR6: 0ff0 DR7: 0400
[39577.847522] Process btrfs-endio-wri (pid: 3447, threadinfo 880e3b86, task 880e40e58000)
[39577.847522] Stack:
[39577.847524]   dead00200200 000100965b86 880e40e94000
[39577.847525]  8104dc20 880e40e58000
[39577.847526]    880e40e58000 880e3a624720
[39577.847527] Call Trace:
[39577.847530]  [] ? lock_timer_base+0x70/0x70
[39577.847531]  [] finish_ordered_fn+0x10/0x20
[39577.847533]  [] worker_loop+0x14e/0x530
[39577.847534]  [] ? btrfs_queue_worker+0x310/0x310
[39577.847535]  [] ? btrfs_queue_worker+0x310/0x310
[39577.847538]  [] kthread+0x96/0xa0
[39577.847541]  [] kernel_thread_helper+0x4/0x10
[39577.847543]  [] ? kthread_worker_fn+0x130/0x130
[39577.847544]  [] ? gs_change+0xb/0xb
[39577.847555] Code: 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 c4 80 48 89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 89 fb 4c 8b 6f 38 <4d> 8b a5 50 fe ff ff 4d 8d 95 50 fe ff ff 48 c7 45 c8 00 00 00
[39577.847556] RIP  [] btrfs_finish_ordered_io+0x23/0x3f0
[39577.847557]  RSP
[39577.847557] CR2: fe50
[39577.847558] ---[ end trace 27bdc0b318ad6463 ]---

On 26.06.2012 22:48, Josef Bacik wrote:
> On Tue, Jun 26, 2012 at 02:19:17PM -0600, Stefan Priebe wrote:
>> On 26.06.2012 22:14, Josef Bacik wrote:
>>> I can't reproduce so I'm going to have to figure out a way to debug it through
>>> you, as soon as I think of something I will let you know.  Thanks,
>>
>> Thanks. You mentioned that discard shouldn't have any positive effects
>> on an SSD.
>>
>> Could I be seeing a side effect? I mean, with discard I get 13,000 IOPS in
>> ceph, without discard just 6,000-9,000 IOPS. Could this be real, or might
>> it just be due to the bug I see?
>
> Beats me, it would seem to me that discard would make things slower since we
> have to wait for the commit to discard everybody, but who knows, stranger things
> have happened.  Can you reproduce 2 more times and get sysrq+w each time so I
> have a few different outputs to compare, maybe I'm missing something.  Thanks,
>
> Josef




