Re: btrfs crash when low on memory.
On Wednesday, 27 February 2013, Ahmet Inan wrote:
> > Yeah we have a lot of
> >
> > ptr = kmalloc();
> > BUG_ON(!ptr);
> >
> > everywhere. I'll fix this one up but I really need to sit down and go
> > through all of them and make sure we do the right thing in all these
> > places. Thanks,
>
> But what would be the right thing to do when you got no memory?
> Spinlock until you can kmalloc? Pre-reserve some memory?
>
> At the moment I'm using:
>
> vm.min_free_kbytes = 65536
>
> Which helps most of the time and I think is the better way to handle
> this kind of situation.

Thank you. Raising /proc/sys/vm/min_free_kbytes to about 65000 KiB helped here on a ThinkPad T520 equipped with 8 GiB of RAM.

This time the OOM killer was invoked while the Planeshift client was accessing BTRFS. So I now have a complete OOM backtrace, and I bet this is the place where BTRFS previously crashed the kernel before the OOM killer could step in to clear up the situation.

It was rtkit-daemon that invoked the OOM killer. Since I do not use PulseAudio anymore (it didn't work to my satisfaction on this machine either), I might remove it. But then something else will likely trigger it. :)

If need be, I can lower min_free_kbytes again until it crashes once more and take a screenshot, but a screenshot only ever shows part of the trace. From memory, the backtrace I saw on tty1 was similar to the psclient.bin backtrace in the following OOM excerpt.

Otherwise I will leave it at that and make the min_free_kbytes setting permanent :), or maybe disable overcommit so that the Planeshift client fails, and possibly crashes, earlier under out-of-memory conditions.

Mar 4 22:56:00 merkaba rtkit-daemon[1575]: The canary thread is apparently starving. Taking action.
Mar 4 22:56:00 merkaba kernel: [183059.738831] rtkit-daemon invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Mar 4 22:56:00 merkaba kernel: [183059.738837] rtkit-daemon cpuset=/ mems_allowed=0
Mar 4 22:56:00 merkaba kernel: [183059.738841] Pid: 1579, comm: rtkit-daemon Tainted: G O 3.8.0-tp520 #40
Mar 4 22:56:00 merkaba kernel: [183059.738843] Call Trace:
Mar 4 22:56:00 merkaba kernel: [183059.738851] [] ? _raw_spin_unlock+0x26/0x31
Mar 4 22:56:00 merkaba kernel: [183059.738856] [] dump_header.isra.9+0x6b/0x1cd
Mar 4 22:56:00 merkaba kernel: [183059.738860] [] ? _raw_spin_unlock_irqrestore+0x2e/0x39
Mar 4 22:56:00 merkaba kernel: [183059.738865] [] ? ___ratelimit+0xc9/0xe7
Mar 4 22:56:00 merkaba kernel: [183059.738869] [] oom_kill_process+0x62/0x2bc
Mar 4 22:56:00 merkaba kernel: [183059.738874] [] ? rcu_read_unlock_special+0x138/0x162
Mar 4 22:56:00 merkaba kernel: [183059.738877] [] out_of_memory+0x3c4/0x3f7
Mar 4 22:56:00 merkaba kernel: [183059.738881] [] __alloc_pages_nodemask+0x548/0x6c7
Mar 4 22:56:00 merkaba kernel: [183059.738886] [] alloc_pages_current+0xc0/0xdd
Mar 4 22:56:00 merkaba kernel: [183059.738889] [] __page_cache_alloc+0x87/0x93
Mar 4 22:56:00 merkaba kernel: [183059.738893] [] filemap_fault+0x250/0x35f
Mar 4 22:56:00 merkaba kernel: [183059.738897] [] __do_fault+0xa6/0x351
Mar 4 22:56:00 merkaba kernel: [183059.738900] [] handle_pte_fault+0x28e/0x73f
Mar 4 22:56:00 merkaba kernel: [183059.738905] [] ? try_to_wake_up+0x1b7/0x1c9
Mar 4 22:56:00 merkaba kernel: [183059.738908] [] ? pmd_offset+0x10/0x3d
Mar 4 22:56:00 merkaba kernel: [183059.738911] [] handle_mm_fault+0x1d8/0x1f2
Mar 4 22:56:00 merkaba kernel: [183059.738915] [] __do_page_fault+0x37b/0x3c5
Mar 4 22:56:00 merkaba kernel: [183059.738920] [] ? timespec_add_safe+0x22/0x51
Mar 4 22:56:00 merkaba kernel: [183059.738924] [] ? paravirt_read_tsc+0x9/0xd
Mar 4 22:56:00 merkaba kernel: [183059.738928] [] ? read_tsc+0x9/0x19
Mar 4 22:56:00 merkaba kernel: [183059.738931] [] ? timekeeping_get_ns.constprop.8+0x13/0x3a
Mar 4 22:56:00 merkaba kernel: [183059.738935] [] ? ktime_get_ts+0x47/0x87
Mar 4 22:56:00 merkaba kernel: [183059.738939] [] ? poll_select_set_timeout+0x53/0x6f
Mar 4 22:56:00 merkaba kernel: [183059.738942] [] do_page_fault+0x9/0xb
Mar 4 22:56:00 merkaba kernel: [183059.738945] [] page_fault+0x28/0x30
Mar 4 22:56:00 merkaba kernel: [183059.738947] Mem-Info:
Mar 4 22:56:00 merkaba kernel: [183059.738949] Node 0 DMA per-cpu:
Mar 4 22:56:00 merkaba kernel: [183059.738952] CPU0: hi:0, btch: 1 usd: 0
Mar 4 22:56:00 merkaba kernel: [183059.738954] CPU1: hi:0, btch: 1 usd: 0
Mar 4 22:56:00 merkaba kernel: [183059.738956] CPU2: hi:0, btch: 1 usd: 0
Mar 4 22:56:00 merkaba kernel: [183059.738958] CPU3: hi:0, btch: 1 usd: 0
Mar 4 22:56:00 merkaba kernel: [183059.738960] Node 0 DMA32 per-cpu:
Mar 4 22:56:00 merkaba kernel: [183059.738962] CPU0: hi: 186, btch: 31 usd: 0
Mar 4 22:56:00 merkaba kernel: [183059.738964] CPU1: hi: 186, btch: 31 usd: 0
Mar 4 22:56:00 merkaba kernel: [183059.738966] CPU
Re: btrfs crash when low on memory.
> If we're corrupting on abort, that is a bug too and needs to be fixed.
> I've banged on the abort stuff a lot recently when trying to make it
> not panic the box and it appears to work fine. Obviously that kind of
> stuff needs to be tested as well, but so far I haven't seen abort
> corrupt the file system. Thanks,

Thank you for the info, Josef. I will report a bug the next time I hit such a case.

Ahmet
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs crash when low on memory.
On Wed, Feb 27, 2013 at 3:10 PM, Ahmet Inan wrote:
> On Wed, Feb 27, 2013 at 7:26 PM, Josef Bacik wrote:
>> On Wed, Feb 27, 2013 at 07:31:11AM -0700, Ahmet Inan wrote:
>>> > Yeah we have a lot of
>>> >
>>> > ptr = kmalloc();
>>> > BUG_ON(!ptr);
>>> >
>>> > everywhere. I'll fix this one up but I really need to sit down and go
>>> > through all of them and make sure we do the right thing in all these
>>> > places. Thanks,
>>>
>>> But what would be the right thing to do when you got no memory?
>>> Spinlock until you can kmalloc? Pre-reserve some memory?
>>>
>>
>> Return ENOMEM? We have a way to abort transactions now, if it's in a
>> horrible enough spot we can just abort the transaction and let the user
>> deal with the aftermath, it's nicer than panicking. Thanks,
>
> You're right. I am only afraid of silent corruption of data on aborts:
> our guys here trigger OOM all the time with their compilers and
> numerical codes (go figure).
> And until now we had no more aborts / panics because of
> "vm.min_free_kbytes = 65536" and thus no corruption.
>
> My point is:
> I like a freezing computer more than a corrupting computer, even if
> it's a server. Reboot to the rescue.
>

If we're corrupting on abort, that is a bug too and needs to be fixed. I've banged on the abort stuff a lot recently when trying to make it not panic the box and it appears to work fine. Obviously that kind of stuff needs to be tested as well, but so far I haven't seen abort corrupt the file system. Thanks,

Josef
Re: btrfs crash when low on memory.
On Wed, Feb 27, 2013 at 7:26 PM, Josef Bacik wrote:
> On Wed, Feb 27, 2013 at 07:31:11AM -0700, Ahmet Inan wrote:
>> > Yeah we have a lot of
>> >
>> > ptr = kmalloc();
>> > BUG_ON(!ptr);
>> >
>> > everywhere. I'll fix this one up but I really need to sit down and go
>> > through all of them and make sure we do the right thing in all these
>> > places. Thanks,
>>
>> But what would be the right thing to do when you got no memory?
>> Spinlock until you can kmalloc? Pre-reserve some memory?
>>
>
> Return ENOMEM? We have a way to abort transactions now, if it's in a
> horrible enough spot we can just abort the transaction and let the user
> deal with the aftermath, it's nicer than panicking. Thanks,

You're right. I am only afraid of silent corruption of data on aborts: our guys here trigger OOM all the time with their compilers and numerical codes (go figure). And until now we had no more aborts / panics because of "vm.min_free_kbytes = 65536" and thus no corruption.

My point is: I like a freezing computer more than a corrupting computer, even if it's a server. Reboot to the rescue.

Ahmet
Re: btrfs crash when low on memory.
On Wed, Feb 27, 2013 at 07:31:11AM -0700, Ahmet Inan wrote:
> > Yeah we have a lot of
> >
> > ptr = kmalloc();
> > BUG_ON(!ptr);
> >
> > everywhere. I'll fix this one up but I really need to sit down and go
> > through all of them and make sure we do the right thing in all these
> > places. Thanks,
>
> But what would be the right thing to do when you got no memory?
> Spinlock until you can kmalloc? Pre-reserve some memory?
>

Return ENOMEM? We have a way to abort transactions now, if it's in a horrible enough spot we can just abort the transaction and let the user deal with the aftermath, it's nicer than panicking. Thanks,

Josef
Re: btrfs crash when low on memory.
> Yeah we have a lot of
>
> ptr = kmalloc();
> BUG_ON(!ptr);
>
> everywhere. I'll fix this one up but I really need to sit down and go
> through all of them and make sure we do the right thing in all these
> places. Thanks,

But what would be the right thing to do when you got no memory?
Spinlock until you can kmalloc? Pre-reserve some memory?

At the moment I'm using:

vm.min_free_kbytes = 65536

Which helps most of the time and I think is the better way to handle this kind of situation.

Ahmet
Re: btrfs crash when low on memory.
On Tue, Feb 26, 2013 at 10:22:47PM -0700, Dave Jones wrote:
> Something I've yet to repeat managed to leak a whole bunch of memory
> while I was travelling, and locked up my workstation.
>
> When I got home, this was the last thing printed out before it locked up
> (it did make it into the logs thankfully) after a bunch of instances of
> the oom-killer's handiwork.

Yeah we have a lot of

    ptr = kmalloc();
    BUG_ON(!ptr);

everywhere. I'll fix this one up but I really need to sit down and go through all of them and make sure we do the right thing in all these places. Thanks,

Josef
Re: btrfs crash when low on memory.
On Wednesday, 27 February 2013, Dave Jones wrote:
> Something I've yet to repeat managed to leak a whole bunch of memory
> while I was travelling, and locked up my workstation.
>
> When I got home, this was the last thing printed out before it locked up
> (it did make it into the logs thankfully) after a bunch of instances of
> the oom-killer's handiwork.
>
> SLUB: Unable to allocate memory on node -1 (gfp=0x50)
> cache: btrfs_extent_state, object size: 176, buffer size: 504, default order: 1, min order: 0
> node 0: slabs: 49, objs: 640, free: 0
> [ cut here ]
> kernel BUG at fs/btrfs/extent_io.c:748!

Thank you for reporting this, Dave.

I have lockups under memory pressure on my ThinkPad T520 as well when playing Planeshift for some time. (AFAIR since I switched my home directory to BTRFS (/ was BTRFS before), but I am not sure about this.) Planeshift goes from 2 GB to about 4 GB RSS and then the machine usually starts to swap to SSD.

I did not get around to reporting this yet. The machine is basically locked up (at least for long periods of time, like minutes). I intend to collect some photos and upload them somewhere, because I do not see anything in the logs after reboot.

I think this happens *before* real OOM conditions are met (i.e. before all of swap is used up as well). BTRFS-related functions appear in the backtraces.

Expected results, of course: the system continues swapping and, if OOM conditions are met, calls the OOM killer (which might try to get rid of the running Planeshift client).

Current workaround: develop a good feeling for when to restart the PS client. :)

So for now just a heads-up that I have seen similar issues. (But I think my backtraces might have been different; difficult to say, since some of it scrolls by quite quickly.)

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
btrfs crash when low on memory.
Something I've yet to repeat managed to leak a whole bunch of memory while I was travelling, and locked up my workstation.

When I got home, this was the last thing printed out before it locked up (it did make it into the logs thankfully) after a bunch of instances of the oom-killer's handiwork.

SLUB: Unable to allocate memory on node -1 (gfp=0x50)
  cache: btrfs_extent_state, object size: 176, buffer size: 504, default order: 1, min order: 0
  node 0: slabs: 49, objs: 640, free: 0
[ cut here ]
kernel BUG at fs/btrfs/extent_io.c:748!
invalid opcode: [#1] PREEMPT SMP
Modules linked in: xfs vfat fat ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables snd_emu10k1 coretemp snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq snd_seq_device microcode snd_pcm pcspkr snd_page_alloc snd_timer snd soundcore e1000e vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c lzo_compress zlib_deflate ata_piix usb_storage firewire_ohci firewire_core sata_sil crc_itu_t radeon i2c_algo_bit hwmon drm_kms_helper ttm drm i2c_core floppy
CPU 1
Pid: 7017, comm: mutt Not tainted 3.8.0+ #67 /D975XBX
RIP: 0010:[] [] __set_extent_bit+0x3ae/0x4d0 [btrfs]
RSP: :8800a4c31838 EFLAGS: 00010246
RAX: RBX: 001bbfff RCX:
RDX: 0001 RSI: 00b0 RDI:
RBP: 8800a4c318b8 R08: 81cf0b80 R09: 0400
R10: 0001 R11: 0508 R12: 8800ba4ab2c8
R13: 8800ba4ab2c8 R14: R15: 001bb000
FS: 7eff96e14800() GS:8800bfc0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 00449cee CR3: 80ebb000 CR4: 07e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process mutt (pid: 7017, threadinfo 8800a4c3, task 880025cba4c0)
Stack:
 2ab2 8800a4c318e0 8800a4c31fd8 0292
 8800a4c31fd8 1000 001bbfff 10080008
 034a39a8 8800ba4ab2c8 8800a4c31898 001bbfff
Call Trace:
 [] lock_extent_bits+0x74/0xa0 [btrfs]
 [] lock_extent+0x13/0x20 [btrfs]
 [] __extent_read_full_page+0xc4/0x720 [btrfs]
 [] ? repair_io_failure+0x440/0x440 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] extent_readpages+0x116/0x1f0 [btrfs]
 [] btrfs_readpages+0x1f/0x30 [btrfs]
 [] __do_page_cache_readahead+0x2aa/0x350
 [] ? __do_page_cache_readahead+0x110/0x350
 [] ? find_get_page+0x5/0x280
 [] ra_submit+0x21/0x30
 [] filemap_fault+0x267/0x4a0
 [] __do_fault+0x6e/0x530
 [] handle_pte_fault+0x8f/0x900
 [] handle_mm_fault+0x210/0x300
 [] __do_page_fault+0x15c/0x4e0
 [] ? rcu_eqs_exit_common+0xc7/0x380
 [] ? rcu_eqs_exit+0x65/0xb0
 [] do_page_fault+0x2b/0x50
 [] page_fault+0x1f/0x30
Code: c9 0f 85 c7 fc ff ff 66 0f 1f 44 00 00 f6 45 18 10 0f 84 b7 fc ff ff 8b 7d 18 e8 8e f2 ff ff 48 85 c0 48 89 c1 0f 85 a3 fc ff ff <0f> 0b 4d 89 ef 31 c9 eb 89 66 0f 1f 84 00 00 00 00 00 48 83 7b
WARNING: at kernel/exit.c:721 do_exit+0x55/0xc70()
Hardware name:
Modules linked in: xfs vfat fat ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables snd_emu10k1 coretemp snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq snd_seq_device microcode snd_pcm pcspkr snd_page_alloc snd_timer snd soundcore e1000e vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c lzo_compress zlib_deflate ata_piix usb_storage firewire_ohci firewire_core sata_sil crc_itu_t radeon i2c_algo_bit hwmon drm_kms_helper ttm drm i2c_core floppy
Pid: 7017, comm: mutt Tainted: G D 3.8.0+ #67
Call Trace:
 [] warn_slowpath_common+0x7f/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] do_exit+0x55/0xc70
 [] ? __const_udelay+0x28/0x30
 [] ? __rcu_read_unlock+0x5c/0xa0
 [] ? kmsg_dump+0x1bd/0x230
 [] ? kmsg_dump+0x25/0x230
 [] oops_end+0x96/0xe0
 [] die+0x58/0x90
 [] do_trap+0x6b/0x170
 [] do_invalid_op+0x9a/0xc0
 [] ? __set_extent_bit+0x3ae/0x4d0 [btrfs]
 [] ? alloc_extent_state+0x2e/0x1b0 [btrfs]
 [] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [] ? restore_args+0x30/0x30
 [] invalid_op+0x15/0x20
 [] ? __set_extent_bit+0x3ae/0x4d0 [btrfs]
 [] ? __set_extent_bit+0x3a2/0x4d0 [btrfs]
 [] lock_extent_bits+0x74/0xa0 [btrfs]
 [] lock_extent+0x13/0x20 [btrfs]
 [] __extent_read_full_page+0xc4/0x720 [btrfs]
 [] ? repair_io_failure+0x440/0x440 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] extent_readpages+0x116/0x1f0 [btrfs]
 [] btrfs_readpages+0x1f/0x30 [btrfs]
 [] __