Re: PROBLEM: Crash cgdeleting empty memory cgroups with memory.kmem.limit_in_bytes set
On 02/21/2013 03:22 PM, Glauber Costa wrote:
> On 02/21/2013 03:00 AM, Tejun Heo wrote:
>> (cc'ing cgroup / memcg people and quoting whole body)
>>
>> Looks like something is going wrong with memcg cache destruction.
>> Glauber, any ideas? Also, can we please not use names as generic as
>> kmem_cache_destroy_work_func for something specific to memcg? How
>> about something like memcg_destroy_cache_workfn?
>>
>> Thanks.
>
> Steffen,
>
> Is there any chance you could test that using SLAB instead of SLUB?
> I haven't managed to reproduce it yet, but I am working on some theories
> about why this is happening. If I could at least know if this is likely
> a cache problem vs an inner-memcg problem, that would help. The calltrace
> is not incredibly helpful, but it does indicate that the problem happens
> when freeing cache objects.
>

Update: I've already reproduced this and determined that it is a problem
that plagues SLUB only, most likely due to initialization of the node
caches. But I still don't know the exact location for sure.

Expect a patch by tomorrow.
Re: PROBLEM: Crash cgdeleting empty memory cgroups with memory.kmem.limit_in_bytes set
On 02/21/2013 03:00 AM, Tejun Heo wrote:
> (cc'ing cgroup / memcg people and quoting whole body)
>
> Looks like something is going wrong with memcg cache destruction.
> Glauber, any ideas? Also, can we please not use names as generic as
> kmem_cache_destroy_work_func for something specific to memcg? How
> about something like memcg_destroy_cache_workfn?
>
> Thanks.

Steffen,

Is there any chance you could test that using SLAB instead of SLUB?
I haven't managed to reproduce it yet, but I am working on some theories
about why this is happening. If I could at least know if this is likely
a cache problem vs an inner-memcg problem, that would help. The calltrace
is not incredibly helpful, but it does indicate that the problem happens
when freeing cache objects.
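For readers following the remark about the calltrace: each frame in a kernel oops is printed as symbol+0xOFFSET/0xSIZE, for example kmem_cache_free+0x13a/0x1d0 in the oops quoted in this thread, meaning the fault was 0x13a bytes into a function that is 0x1d0 bytes long. The sketch below is only an illustration of how such a frame can be split apart; the names trace_frame and parse_frame are hypothetical and are not part of the kernel or of any tool mentioned here.

```c
#include <stdio.h>

/* One frame of an oops call trace, e.g. "kmem_cache_free+0x13a/0x1d0".
 * Hypothetical helper type, introduced for illustration only. */
struct trace_frame {
    char symbol[64];        /* function name */
    unsigned long offset;   /* bytes into the function */
    unsigned long size;     /* total size of the function in bytes */
};

/* Split "symbol+0xOFF/0xSIZE" into its three parts.
 * Returns 0 on success, -1 if the text does not have that shape. */
int parse_frame(const char *text, struct trace_frame *out)
{
    /* %63[^+] copies everything up to the '+'; %lx accepts the 0x prefix. */
    if (sscanf(text, "%63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}
```

Calling parse_frame("kmem_cache_free+0x13a/0x1d0", &f) fills f.symbol with the function name and f.offset and f.size with the two hex values. The offset can then be matched against a disassembly of the function to find the faulting instruction.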
Re: PROBLEM: Crash cgdeleting empty memory cgroups with memory.kmem.limit_in_bytes set
On 02/21/2013 03:00 AM, Tejun Heo wrote:
> (cc'ing cgroup / memcg people and quoting whole body)
>
> Looks like something is going wrong with memcg cache destruction.
> Glauber, any ideas? Also, can we please not use names as generic as
> kmem_cache_destroy_work_func for something specific to memcg? How
> about something like memcg_destroy_cache_workfn?
>

I will take a look.

Thanks for the report. For the reporter: I tested cgroup deletion quite
extensively (it is quite an important feature for me), so it is nice to
have an uncaught case.

About naming, I can change it, no problem.
Re: PROBLEM: Crash cgdeleting empty memory cgroups with memory.kmem.limit_in_bytes set
(2013/02/21 17:34), Glauber Costa wrote:
> On 02/21/2013 03:00 AM, Tejun Heo wrote:
>> (cc'ing cgroup / memcg people and quoting whole body)
>>
>> Looks like something is going wrong with memcg cache destruction.
>> Glauber, any ideas? Also, can we please not use names as generic as
>> kmem_cache_destroy_work_func for something specific to memcg? How
>> about something like memcg_destroy_cache_workfn?
>>
> I will take a look.
>
> Thanks for the report. For the reporter: I tested cgroup deletion quite
> extensively (it is quite an important feature for me), so it is nice to
> have an uncaught case.
>
> About naming, I can change it, no problem.

This seems to be reproduced on linux-3.8, on a KVM guest with
Fedora18's config + kmemcg.

-Kame
==
[ 250.533831] general protection fault: [#1] SMP
[ 250.538096] Modules linked in: ebtable_nat xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables be2iscsi iscsi_boot_sysfs ip6table_filter ip6_tables bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc 8139too snd_timer microcode snd 8139cp mii floppy pcspkr virtio_balloon soundcore i2c_piix4 btrfs libcrc32c zlib_deflate cirrus drm_kms_helper ttm drm virtio_blk i2c_core
[ 250.538096] CPU 1
[ 250.538096] Pid: 38, comm: kworker/1:1 Not tainted 3.8.0 #3 Bochs Bochs
[ 250.538096] RIP: 0010:[81181f8a] [81181f8a] kmem_cache_free+0x13a/0x1d0
[ 250.538096] RSP: 0018:880214345cc8 EFLAGS: 00010286
[ 250.538096] RAX: 81d84020 RBX: 880217000f00 RCX: 0068
[ 250.538096] RDX: RSI: 880217000f00 RDI: 880217000f00
[ 250.538096] RBP: 880214345ce8 R08: 13c0 R09: 006c
[ 250.538096] R10: 0007ebc0ffe0 R11: 0007ebc0ffe0 R12: 880217001100
[ 250.538096] R13: 880214042c00 R14: 0200 R15: 880217000ef0
[ 250.538096] FS: () GS:88021fc8() knlGS:
[ 250.538096] CS: 0010 DS: ES: CR0: 8005003b
[ 250.538096] CR2: 003e98ae6ef0 CR3: 00021365 CR4: 06e0
[ 250.538096] DR0: DR1: DR2:
[ 250.538096] DR3: DR6: 0ff0 DR7: 0400
[ 250.538096] Process kworker/1:1 (pid: 38, threadinfo 880214344000, task 88021435)
[ 250.538096] Stack:
[ 250.538096] e8c013c0 880214042c00
[ 250.538096] 880214345d18 81182084 880214042c00 880217000ef0
[ 250.538096] 880217000ef0 880214042c00 880214345d88 81184d7e
[ 250.538096] Call Trace:
[ 250.538096] [81182084] free_kmem_cache_nodes+0x64/0xb0
[ 250.538096] [81184d7e] __kmem_cache_shutdown+0x24e/0x320
[ 250.538096] [811842b0] ? kmem_cache_shrink+0x210/0x230
[ 250.538096] [81153f3f] kmem_cache_destroy+0x3f/0xe0
[ 250.538096] [8118f080] kmem_cache_destroy_work_func+0x30/0x60
[ 250.538096] [8107a3c7] process_one_work+0x147/0x490
[ 250.538096] [8118f050] ? mem_cgroup_slabinfo_read+0xb0/0xb0
[ 250.538096] [8107cc5e] worker_thread+0x15e/0x450
[ 250.538096] [8107cb00] ? busy_worker_rebind_fn+0x110/0x110
[ 250.538096] [81081d20] kthread+0xc0/0xd0
[ 250.538096] [8101] ? ftrace_define_fields_xen_mc_entry+0xa0/0xf0
[ 250.538096] [81081c60] ? kthread_create_on_node+0x120/0x120
[ 250.538096] [8165ab6c] ret_from_fork+0x7c/0xb0
[ 250.538096] [81081c60] ? kthread_create_on_node+0x120/0x120
[ 250.538096] Code: c1 e0 06 48 01 d0 48 8b 10 80 e6 80 0f 85 98 00 00 00 48 8b 40 30 49 39 c4 0f 84 f9 fe ff ff 48 8b 90 b8 00 00 00 48 85 d2 74 06 <4c> 3b 62 20 74 50 48 8b 50 60 49 8b 4c 24 60 31 c0 48 c7 c6 68
[ 250.538096] RIP [81181f8a] kmem_cache_free+0x13a/0x1d0
[ 250.538096] RSP 880214345cc8
[ 250.746175] ---[ end trace 91abe13b8481aaaf ]---
[ 250.748879] BUG: unable to handle kernel paging request at ffd8
[ 250.749818] IP: [81082100] kthread_data+0x10/0x20
[ 250.749818] PGD 1c0e067 PUD 1c0f067 PMD 0
[ 250.749818] Oops: [#2] SMP
[ 250.749818] Modules linked in: ebtable_nat xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables be2iscsi iscsi_boot_sysfs ip6table_filter ip6_tables bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio