Hi Hannes, Your rewrite of the memcg charge/uncharge API is bold and attractive, but I'm having some problems with the way release_pages() now does uncharging in I/O completion context.
At the bottom see the lockdep message I get when I start shmem swapping. Which I have not begun to attempt to decipher (over to you!), but I do see release_pages() mentioned in there (also i915, hope it's irrelevant). Which was already worrying me on the PowerPC G5, when moving tasks from one memcg to another and removing the old, while swapping and swappingoff (I haven't tried much else actually, maybe it's much easier to reproduce). I get "unable to handle kernel paging at 0x180" oops in __raw_spinlock < res_counter_uncharge_until < mem_cgroup_uncharge_end < release_pages < free_pages_and_swap_cache < tlb_flush_mmu_free < tlb_finish_mmu < unmap_region < do_munmap (or from exit_mmap < mmput < do_exit). I do have CONFIG_MEMCG_SWAP=y, and I think 0x180 corresponds to the memsw res_counter spinlock, if memcg is NULL. I don't understand why usually the PowerPC: I did see something like it once on this x86 laptop, maybe having lockdep in on this slows things down enough not to hit that. I've stopped those crashes with patch below: the memcg_batch uncharging was never designed for use from interrupts. But I bet it needs more work: to disable interrupts, or do something clever with atomics, or... over to you again. As it stands, I think an interrupt in the wrong place risks leaking charges (but actually I see the reverse - kernel/res_counter.c:28! underflow warnings; though I don't know if it's merely that the patch lets the machine stay up long enough to reach those, or causes them). Not-really-Signed-off-by: Hugh Dickins <hu...@google.com> --- mm/memcontrol.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) --- 3.16-rc2-mm1/mm/memcontrol.c 2014-06-25 18:43:59.856588121 -0700 +++ linux/mm/memcontrol.c 2014-06-29 21:45:03.896588350 -0700 @@ -3636,12 +3636,11 @@ void mem_cgroup_uncharge_end(void) if (!batch->do_batch) return; - batch->do_batch--; - if (batch->do_batch) /* If stacked, do nothing. */ - return; + if (batch->do_batch > 1) /* If stacked, do nothing. */ + goto out; if (!batch->memcg) - return; + goto out; /* * This "batch->memcg" is valid without any css_get/put etc... * bacause we hide charges behind us. @@ -3655,6 +3654,8 @@ void mem_cgroup_uncharge_end(void) memcg_oom_recover(batch->memcg); /* forget this pointer (for sanity check) */ batch->memcg = NULL; +out: + batch->do_batch--; } #ifdef CONFIG_MEMCG_SWAP And here's lockdep's little fortune cookie: ====================================================== [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ] 3.16.0-rc2-mm1 #3 Not tainted ------------------------------------------------------ cc1/2771 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: (&(&rtpz->lock)->rlock){+.+.-.}, at: [<ffffffff811518b5>] memcg_check_events+0x17e/0x206 dd and this task is already holding: (&(&zone->lru_lock)->rlock){..-.-.}, at: [<ffffffff8110da3f>] release_pages+0xe7/0x239 which would create a new lock dependency: (&(&zone->lru_lock)->rlock){..-.-.} -> (&(&rtpz->lock)->rlock){+.+.-.} but this new dependency connects a SOFTIRQ-irq-safe lock: (&(&zone->lru_lock)->rlock){..-.-.} ... which became SOFTIRQ-irq-safe at: [<ffffffff810c201e>] __lock_acquire+0x59f/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff815bdfbd>] _raw_spin_lock_irqsave+0x3f/0x51 [<ffffffff8110dc0e>] pagevec_lru_move_fn+0x7d/0xf6 [<ffffffff8110dca4>] pagevec_move_tail+0x1d/0x2c [<ffffffff8110e298>] rotate_reclaimable_page+0xb2/0xd4 [<ffffffff811018bf>] end_page_writeback+0x1c/0x45 [<ffffffff81134400>] end_swap_bio_write+0x5c/0x69 [<ffffffff8123473e>] bio_endio+0x50/0x6e [<ffffffff81238dee>] blk_update_request+0x163/0x255 [<ffffffff81238ef7>] blk_update_bidi_request+0x17/0x65 [<ffffffff81239242>] blk_end_bidi_request+0x1a/0x56 [<ffffffff81239289>] blk_end_request+0xb/0xd [<ffffffff813a075a>] scsi_io_completion+0x16d/0x553 [<ffffffff81399c0f>] scsi_finish_command+0xb6/0xbf [<ffffffff813a0564>] scsi_softirq_done+0xe9/0xf0 [<ffffffff8123e8e5>] blk_done_softirq+0x79/0x8b [<ffffffff81088675>] __do_softirq+0xfc/0x21f [<ffffffff8108898f>] irq_exit+0x3d/0x92 [<ffffffff81032379>] do_IRQ+0xcc/0xe5 [<ffffffff815bf5ac>] ret_from_intr+0x0/0x13 [<ffffffff81443ac0>] cpuidle_enter+0x12/0x14 [<ffffffff810bb4e4>] cpu_startup_entry+0x187/0x243 [<ffffffff815a90ab>] rest_init+0x12f/0x133 [<ffffffff81970e7c>] start_kernel+0x396/0x3a3 [<ffffffff81970489>] x86_64_start_reservations+0x2a/0x2c [<ffffffff81970552>] x86_64_start_kernel+0xc7/0xca to a SOFTIRQ-irq-unsafe lock: (&(&rtpz->lock)->rlock){+.+.-.} ... which became SOFTIRQ-irq-unsafe at: ... [<ffffffff810c2095>] __lock_acquire+0x616/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff815bde9f>] _raw_spin_lock+0x34/0x41 [<ffffffff811518b5>] memcg_check_events+0x17e/0x206 [<ffffffff811535bb>] commit_charge+0x260/0x26f [<ffffffff81157004>] mem_cgroup_commit_charge+0xb1/0xdb [<ffffffff81115b51>] shmem_getpage_gfp+0x400/0x6c2 [<ffffffff81115ecc>] shmem_write_begin+0x33/0x35 [<ffffffff81102a24>] generic_perform_write+0xb7/0x1a4 [<ffffffff8110391e>] __generic_file_write_iter+0x25b/0x29b [<ffffffff81103999>] generic_file_write_iter+0x3b/0xa5 [<ffffffff8115a115>] new_sync_write+0x7b/0x9f [<ffffffff8115a56c>] vfs_write+0xb5/0x169 [<ffffffff8115ae1f>] SyS_write+0x45/0x8c [<ffffffff815bead2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&rtpz->lock)->rlock); local_irq_disable(); lock(&(&zone->lru_lock)->rlock); lock(&(&rtpz->lock)->rlock); <Interrupt> lock(&(&zone->lru_lock)->rlock); *** DEADLOCK *** 1 lock held by cc1/2771: #0: (&(&zone->lru_lock)->rlock){..-.-.}, at: [<ffffffff8110da3f>] release_pages+0xe7/0x239 the dependencies between SOFTIRQ-irq-safe lock and the holding lock: -> (&(&zone->lru_lock)->rlock){..-.-.} ops: 413812 { IN-SOFTIRQ-W at: [<ffffffff810c201e>] __lock_acquire+0x59f/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff815bdfbd>] _raw_spin_lock_irqsave+0x3f/0x51 [<ffffffff8110dc0e>] pagevec_lru_move_fn+0x7d/0xf6 [<ffffffff8110dca4>] pagevec_move_tail+0x1d/0x2c [<ffffffff8110e298>] rotate_reclaimable_page+0xb2/0xd4 [<ffffffff811018bf>] end_page_writeback+0x1c/0x45 [<ffffffff81134400>] end_swap_bio_write+0x5c/0x69 [<ffffffff8123473e>] bio_endio+0x50/0x6e [<ffffffff81238dee>] blk_update_request+0x163/0x255 [<ffffffff81238ef7>] blk_update_bidi_request+0x17/0x65 [<ffffffff81239242>] blk_end_bidi_request+0x1a/0x56 [<ffffffff81239289>] blk_end_request+0xb/0xd [<ffffffff813a075a>] scsi_io_completion+0x16d/0x553 [<ffffffff81399c0f>] scsi_finish_command+0xb6/0xbf [<ffffffff813a0564>] scsi_softirq_done+0xe9/0xf0 [<ffffffff8123e8e5>] blk_done_softirq+0x79/0x8b [<ffffffff81088675>] __do_softirq+0xfc/0x21f [<ffffffff8108898f>] irq_exit+0x3d/0x92 [<ffffffff81032379>] do_IRQ+0xcc/0xe5 [<ffffffff815bf5ac>] ret_from_intr+0x0/0x13 [<ffffffff81443ac0>] cpuidle_enter+0x12/0x14 [<ffffffff810bb4e4>] cpu_startup_entry+0x187/0x243 [<ffffffff815a90ab>] rest_init+0x12f/0x133 [<ffffffff81970e7c>] start_kernel+0x396/0x3a3 [<ffffffff81970489>] x86_64_start_reservations+0x2a/0x2c [<ffffffff81970552>] x86_64_start_kernel+0xc7/0xca IN-RECLAIM_FS-W at: [<ffffffff810c20c3>] __lock_acquire+0x644/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff815bdfbd>] _raw_spin_lock_irqsave+0x3f/0x51 [<ffffffff8110dc0e>] pagevec_lru_move_fn+0x7d/0xf6 [<ffffffff8110dca4>] pagevec_move_tail+0x1d/0x2c [<ffffffff8110e66d>] lru_add_drain_cpu+0x4d/0xb3 [<ffffffff8110e783>] lru_add_drain+0x1a/0x37 [<ffffffff81111b95>] shrink_active_list+0x62/0x2cb [<ffffffff81112eaa>] balance_pgdat+0x156/0x503 [<ffffffff8111355e>] kswapd+0x307/0x341 [<ffffffff810a1923>] kthread+0xf1/0xf9 [<ffffffff815bea2c>] ret_from_fork+0x7c/0xb0 INITIAL USE at: [<ffffffff810c20db>] __lock_acquire+0x65c/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff815bdfbd>] _raw_spin_lock_irqsave+0x3f/0x51 [<ffffffff8110dc0e>] pagevec_lru_move_fn+0x7d/0xf6 [<ffffffff8110dcc5>] __pagevec_lru_add+0x12/0x14 [<ffffffff8110dd37>] __lru_cache_add+0x70/0x9f [<ffffffff8110e44e>] lru_cache_add_anon+0x14/0x16 [<ffffffff81115b5a>] shmem_getpage_gfp+0x409/0x6c2 [<ffffffff81115fcb>] shmem_read_mapping_page_gfp+0x2e/0x49 [<ffffffff8133168f>] i915_gem_object_get_pages_gtt+0xe5/0x3f9 [<ffffffff8132d66e>] i915_gem_object_get_pages+0x64/0x8f [<ffffffff81330eaa>] i915_gem_object_pin+0x2a0/0x5af [<ffffffff813408fb>] intel_init_ring_buffer+0x2ba/0x3e6 [<ffffffff8134323a>] intel_init_render_ring_buffer+0x38b/0x3a6 [<ffffffff8132faae>] i915_gem_init_hw+0x127/0x2c6 [<ffffffff8132fd57>] i915_gem_init+0x10a/0x189 [<ffffffff81381d0c>] i915_driver_load+0xb1b/0xde7 [<ffffffff812fff60>] drm_dev_register+0x7f/0xf8 [<ffffffff81302185>] drm_get_pci_dev+0xf7/0x1b4 [<ffffffff81311d2f>] i915_pci_probe+0x40/0x49 [<ffffffff8127dddd>] local_pci_probe+0x1f/0x51 [<ffffffff8127ded5>] pci_device_probe+0xc6/0xec [<ffffffff81389720>] driver_probe_device+0x99/0x1b9 [<ffffffff813898d4>] __driver_attach+0x5c/0x7e [<ffffffff81387e7f>] bus_for_each_dev+0x55/0x89 [<ffffffff813893f6>] driver_attach+0x19/0x1b [<ffffffff81388fb2>] bus_add_driver+0xec/0x1d3 [<ffffffff81389e21>] driver_register+0x89/0xc5 [<ffffffff8127d48f>] __pci_register_driver+0x58/0x5b [<ffffffff8130229b>] drm_pci_init+0x59/0xda [<ffffffff8199497f>] i915_init+0x89/0x90 [<ffffffff8100030e>] do_one_initcall+0xea/0x18c [<ffffffff81970f8d>] kernel_init_freeable+0x104/0x196 [<ffffffff815a90b8>] kernel_init+0x9/0xd5 [<ffffffff815bea2c>] ret_from_fork+0x7c/0xb0 } ... key at: [<ffffffff8273c920>] __key.37664+0x0/0x8 ... acquired at: [<ffffffff810c0f1b>] check_irq_usage+0x54/0xa8 [<ffffffff810c2b50>] __lock_acquire+0x10d1/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff815bde9f>] _raw_spin_lock+0x34/0x41 [<ffffffff811518b5>] memcg_check_events+0x17e/0x206 [<ffffffff811571aa>] mem_cgroup_uncharge+0xf6/0x1c0 [<ffffffff8110db2a>] release_pages+0x1d2/0x239 [<ffffffff81134ea2>] free_pages_and_swap_cache+0x72/0x8c [<ffffffff8112136f>] tlb_flush_mmu_free+0x21/0x3c [<ffffffff81121d5d>] tlb_flush_mmu+0x1b/0x1e [<ffffffff81121d6f>] tlb_finish_mmu+0xf/0x34 [<ffffffff8112a968>] exit_mmap+0xb5/0x117 [<ffffffff81081a9d>] mmput+0x52/0xce [<ffffffff81086842>] do_exit+0x355/0x9b7 [<ffffffff81086f46>] do_group_exit+0x76/0xb5 [<ffffffff81086f94>] __wake_up_parent+0x0/0x23 [<ffffffff815bead2>] system_call_fastpath+0x16/0x1b the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock: -> (&(&rtpz->lock)->rlock){+.+.-.} ops: 2348 { HARDIRQ-ON-W at: [<ffffffff810c2073>] __lock_acquire+0x5f4/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff815bde9f>] _raw_spin_lock+0x34/0x41 [<ffffffff811518b5>] memcg_check_events+0x17e/0x206 [<ffffffff811535bb>] commit_charge+0x260/0x26f [<ffffffff81157004>] mem_cgroup_commit_charge+0xb1/0xdb [<ffffffff81115b51>] shmem_getpage_gfp+0x400/0x6c2 [<ffffffff81115ecc>] shmem_write_begin+0x33/0x35 [<ffffffff81102a24>] generic_perform_write+0xb7/0x1a4 [<ffffffff8110391e>] __generic_file_write_iter+0x25b/0x29b [<ffffffff81103999>] generic_file_write_iter+0x3b/0xa5 [<ffffffff8115a115>] new_sync_write+0x7b/0x9f [<ffffffff8115a56c>] vfs_write+0xb5/0x169 [<ffffffff8115ae1f>] SyS_write+0x45/0x8c [<ffffffff815bead2>] system_call_fastpath+0x16/0x1b SOFTIRQ-ON-W at: [<ffffffff810c2095>] __lock_acquire+0x616/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff815bde9f>] _raw_spin_lock+0x34/0x41 [<ffffffff811518b5>] memcg_check_events+0x17e/0x206 [<ffffffff811535bb>] commit_charge+0x260/0x26f [<ffffffff81157004>] mem_cgroup_commit_charge+0xb1/0xdb [<ffffffff81115b51>] shmem_getpage_gfp+0x400/0x6c2 [<ffffffff81115ecc>] shmem_write_begin+0x33/0x35 [<ffffffff81102a24>] generic_perform_write+0xb7/0x1a4 [<ffffffff8110391e>] __generic_file_write_iter+0x25b/0x29b [<ffffffff81103999>] generic_file_write_iter+0x3b/0xa5 [<ffffffff8115a115>] new_sync_write+0x7b/0x9f [<ffffffff8115a56c>] vfs_write+0xb5/0x169 [<ffffffff8115ae1f>] SyS_write+0x45/0x8c [<ffffffff815bead2>] system_call_fastpath+0x16/0x1b IN-RECLAIM_FS-W at: [<ffffffff810c20c3>] __lock_acquire+0x644/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff815bde9f>] _raw_spin_lock+0x34/0x41 [<ffffffff81156311>] mem_cgroup_soft_limit_reclaim+0x80/0x6b9 [<ffffffff81112fc2>] balance_pgdat+0x26e/0x503 [<ffffffff8111355e>] kswapd+0x307/0x341 [<ffffffff810a1923>] kthread+0xf1/0xf9 [<ffffffff815bea2c>] ret_from_fork+0x7c/0xb0 INITIAL USE at: [<ffffffff810c20db>] __lock_acquire+0x65c/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff815bde9f>] _raw_spin_lock+0x34/0x41 [<ffffffff811518b5>] memcg_check_events+0x17e/0x206 [<ffffffff811535bb>] commit_charge+0x260/0x26f [<ffffffff81157004>] mem_cgroup_commit_charge+0xb1/0xdb [<ffffffff81115b51>] shmem_getpage_gfp+0x400/0x6c2 [<ffffffff81115ecc>] shmem_write_begin+0x33/0x35 [<ffffffff81102a24>] generic_perform_write+0xb7/0x1a4 [<ffffffff8110391e>] __generic_file_write_iter+0x25b/0x29b [<ffffffff81103999>] generic_file_write_iter+0x3b/0xa5 [<ffffffff8115a115>] new_sync_write+0x7b/0x9f [<ffffffff8115a56c>] vfs_write+0xb5/0x169 [<ffffffff8115ae1f>] SyS_write+0x45/0x8c [<ffffffff815bead2>] system_call_fastpath+0x16/0x1b } ... key at: [<ffffffff82747bf0>] __key.49479+0x0/0x8 ... acquired at: [<ffffffff810c0f1b>] check_irq_usage+0x54/0xa8 [<ffffffff810c2b50>] __lock_acquire+0x10d1/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff815bde9f>] _raw_spin_lock+0x34/0x41 [<ffffffff811518b5>] memcg_check_events+0x17e/0x206 [<ffffffff811571aa>] mem_cgroup_uncharge+0xf6/0x1c0 [<ffffffff8110db2a>] release_pages+0x1d2/0x239 [<ffffffff81134ea2>] free_pages_and_swap_cache+0x72/0x8c [<ffffffff8112136f>] tlb_flush_mmu_free+0x21/0x3c [<ffffffff81121d5d>] tlb_flush_mmu+0x1b/0x1e [<ffffffff81121d6f>] tlb_finish_mmu+0xf/0x34 [<ffffffff8112a968>] exit_mmap+0xb5/0x117 [<ffffffff81081a9d>] mmput+0x52/0xce [<ffffffff81086842>] do_exit+0x355/0x9b7 [<ffffffff81086f46>] do_group_exit+0x76/0xb5 [<ffffffff81086f94>] __wake_up_parent+0x0/0x23 [<ffffffff815bead2>] system_call_fastpath+0x16/0x1b stack backtrace: CPU: 1 PID: 2771 Comm: cc1 Not tainted 3.16.0-rc2-mm1 #3 Hardware name: LENOVO 4174EH1/4174EH1, BIOS 8CET51WW (1.31 ) 11/29/2011 0000000000000000 ffff88000fe77a18 ffffffff815b2b2f ffff880004b09868 ffff88000fe77b10 ffffffff810c0eb6 0000000000000000 ffff880000000000 ffff880000000001 0000000400000006 ffffffff81811f22 ffff88000fe77a60 Call Trace: [<ffffffff815b2b2f>] dump_stack+0x4e/0x7a [<ffffffff810c0eb6>] check_usage+0x591/0x5a2 [<ffffffff81156261>] ? mem_cgroup_bad_page_check+0x15/0x1d [<ffffffff810c1809>] ? trace_hardirqs_on+0xd/0xf [<ffffffff815be16f>] ? _raw_spin_unlock_irq+0x32/0x46 [<ffffffff810c0f1b>] check_irq_usage+0x54/0xa8 [<ffffffff810c2b50>] __lock_acquire+0x10d1/0x17e8 [<ffffffff810c38a6>] lock_acquire+0x61/0x78 [<ffffffff811518b5>] ? memcg_check_events+0x17e/0x206 [<ffffffff815bde9f>] _raw_spin_lock+0x34/0x41 [<ffffffff811518b5>] ? memcg_check_events+0x17e/0x206 [<ffffffff811518b5>] memcg_check_events+0x17e/0x206 [<ffffffff811571aa>] mem_cgroup_uncharge+0xf6/0x1c0 [<ffffffff8110db2a>] release_pages+0x1d2/0x239 [<ffffffff81134ea2>] free_pages_and_swap_cache+0x72/0x8c [<ffffffff8112136f>] tlb_flush_mmu_free+0x21/0x3c [<ffffffff81121d5d>] tlb_flush_mmu+0x1b/0x1e [<ffffffff81121d6f>] tlb_finish_mmu+0xf/0x34 [<ffffffff8112a968>] exit_mmap+0xb5/0x117 [<ffffffff81081a9d>] mmput+0x52/0xce [<ffffffff81086842>] do_exit+0x355/0x9b7 [<ffffffff815bf64e>] ? retint_swapgs+0xe/0x13 [<ffffffff81086f46>] do_group_exit+0x76/0xb5 [<ffffffff81086f94>] SyS_exit_group+0xf/0xf [<ffffffff815bead2>] system_call_fastpath+0x16/0x1b Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/