On Tue, Jun 12, 2018 at 6:02 PM, John Stultz <john.stu...@linaro.org> wrote:
> Hey Tejun,
> With the current linus/master, I'm able to fairly regularly trip
> OOPSes (two examples below) in mem_cgroup_protected(), which seems to
> be new. I haven't managed to trigger this sort of thing with v4.17.
>
> I've not had much time to dig in or bisect it - I only know that
> enabling most of the memory debugging config options didn't seem to
> trip anything prior to the issue. So I wanted to send you a heads up
> to see if this was already known, or if there was anything you might
> suggest to help chase this down.
So the line where we're crashing seems to be in mem_cgroup_protected():
    parent_emin = READ_ONCE(parent->memory.emin);

where I'm guessing the parent->memory value is null, and emin is at
the 0x120 offset in the structure.

Reverting the following commits seems to avoid the issue.
  bf8d5d52ffe8 ("memcg: introduce memory.min")
  5f93ad67436b ("mm: treat memory.low value inclusive")
  230671533d64 ("mm: memory.low hierarchical behavior")

I'm guessing I'm tripping over some path where the memory value never
gets initialized?

Any ideas or suggestions?

thanks
-john

(usually I'd trim the backtraces below, but keeping them as I added
Roman to the CC list)

> console:/ $ [ 170.530896] Unable to handle kernel read from
> unreadable memory at virtual address 0000000000000120
> [ 170.540158] Mem abort info:
> [ 170.543092] ESR = 0x96000005
> [ 170.546193] Exception class = DABT (current EL), IL = 32 bits
> [ 170.552251] SET = 0, FnV = 0
> [ 170.555444] EA = 0, S1PTW = 0
> [ 170.558698] Data abort info:
> [ 170.561624] ISV = 0, ISS = 0x00000005
> [ 170.565572] CM = 0, WnR = 0
> [ 170.568650] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000190bb04e
> [ 170.575374] [0000000000000120] pgd=0000000000000000, pud=0000000000000000
> [ 170.582297] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [ 170.587929] CPU: 7 PID: 663 Comm: kswapd0 Not tainted
> 4.17.0-11699-gb4f23f3 #411
> [ 170.595358] Hardware name: HiKey Development Board (DT)
> [ 170.600623] pstate: a0400005 (NzCv daif +PAN -UAO)
> [ 170.605478] pc : mem_cgroup_protected+0x34/0x120
> [ 170.610142] lr : shrink_node+0x120/0x478
> [ 170.614093] sp : ffffff8009d23c50
> [ 170.617438] x29: ffffff8009d23c50 x28: ffffff8009d23d48
> [ 170.622808] x27: ffffffc074ca1000 x26: ffffff8009d23e28
> [ 170.628160] x25: ffffff8009d23d88 x24: 0000000000000000
> [ 170.633481] x23: 0000000000000000 x22: ffffff8009071f80
> [ 170.638802] x21: 0000000000000012 x20: 0000000000000012
> [ 170.644124] x19: 0000000000000000 x18: 0000000000000400
> [ 170.649444] x17: 0000000000000000 x16: ffffffc074ca2000
> [ 170.654765] x15: 0000000000000000 x14: 0000000000000400
> [ 170.660087] x13: 00000000000000b1 x12: 0000000000000003
> [ 170.665408] x11: 0000000000000020 x10: 0000000000000000
> [ 170.670729] x9 : 0000000000000001 x8 : 0000000000000004
> [ 170.676050] x7 : ffffffc074d43c00 x6 : 0000000000000000
> [ 170.681370] x5 : 0000000000000000 x4 : 0000000000000000
> [ 170.686690] x3 : 000000000000dafa x2 : 0000000000000000
> [ 170.692010] x1 : ffffffc074ca1000 x0 : ffffffc0386e8000
> [ 170.697335] Process kswapd0 (pid: 663, stack limit = 0x00000000e0f0ae51)
> [ 170.704039] Call trace:
> [ 170.706497] mem_cgroup_protected+0x34/0x120
> [ 170.710775] balance_pgdat+0x1cc/0x418
> [ 170.714529] kswapd+0x180/0x3b8
> [ 170.717674] kthread+0xf8/0x128
> [ 170.720824] ret_from_fork+0x10/0x18
> [ 170.724411] Code: b40007a2 d103e042 eb02001f 540006c0 (f9409046)
> [ 170.730542] ---[ end trace 7c961b6d409886f1 ]---
> [ 170.839299] Kernel panic - not syncing: Fatal exception
> [ 170.844549] SMP: stopping secondary CPUs
> [ 170.848488] Kernel Offset: disabled
> [ 170.851982] CPU features: 0x24802004
> [ 170.855556] Memory Limit: none
> [ 170.888494] Rebooting in 5 seconds..
>
>
>
>
> console:/ # [ 348.612152] Unable to handle kernel read from
> unreadable memory at virtual address 0000000000000120
> [ 348.617384] Unable to handle kernel access to user memory outside
> uaccess routines at virtual address 0000000000000120
> [ 348.621360] Mem abort info:
> [ 348.632086] Mem abort info:
> [ 348.634870] ESR = 0x96000005
> [ 348.634885] Exception class = DABT (current EL), IL = 32 bits
> [ 348.637686] ESR = 0x96000005
> [ 348.640785] SET = 0, FnV = 0
> [ 348.646740] Exception class = DABT (current EL), IL = 32 bits
> [ 348.649799] EA = 0, S1PTW = 0
> [ 348.652892] SET = 0, FnV = 0
> [ 348.652901] EA = 0, S1PTW = 0
> [ 348.652913] Data abort info:
> [ 348.658905] Data abort info:
> [ 348.662041] ISV = 0, ISS = 0x00000005
> [ 348.662050] CM = 0, WnR = 0
> [ 348.662071] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000697cecc4
> [ 348.665129] ISV = 0, ISS = 0x00000005
> [ 348.668298] [0000000000000120] pgd=000000003a915003, pud=000000003a915003
> [ 348.671224] CM = 0, WnR = 0
> [ 348.671242] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000c568bd29
> [ 348.674193] , pmd=0000000000000000
> [ 348.678021] [0000000000000120] pgd=0000000000000000, pud=0000000000000000
> [ 348.691540] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [ 348.723733] CPU: 5 PID: 3246 Comm: CrRendererMain Not tainted
> 4.17.0-11699-gb4f23f3 #412
> [ 348.731857] Hardware name: HiKey Development Board (DT)
> [ 348.737121] pstate: a0400005 (NzCv daif +PAN -UAO)
> [ 348.741975] pc : mem_cgroup_protected+0x34/0x120
> [ 348.746640] lr : shrink_node+0x120/0x478
> [ 348.750590] sp : ffffff800ac9b8a0
> [ 348.753934] x29: ffffff800ac9b8a0 x28: ffffff800ac9b9d8
> [ 348.759304] x27: ffffffc071982480 x26: ffffff800ac9bb30
> [ 348.764673] x25: ffffff800ac9ba18 x24: 0000000000000000
> [ 348.770038] x23: 0000000000000000 x22: ffffff8009113d00
> [ 348.775404] x21: 000000000000000f x20: 000000000000000f
> [ 348.780769] x19: 0000000000000000 x18: 0000000000000000
> [ 348.786134] x17: 0000000000000000 x16: ffffffc071985a80
> [ 348.791500] x15: 0000000000000000 x14: 00000000d5e75c2f
> [ 348.796868] x13: 00000000d7237d18 x12: 0000000000000003
> [ 348.802233] x11: 0000000000000020 x10: 0000000000000000
> [ 348.807598] x9 : 0000000000000001 x8 : 0000000000000004
> [ 348.812963] x7 : ffffffc072d58c80 x6 : 0000000000000000
> [ 348.818311] x5 : 0000000000000000 x4 : 0000000000000000
> [ 348.823626] x3 : 000000000000e1fc x2 : 0000000000000000
> [ 348.828941] x1 : ffffffc071982480 x0 : ffffffc038700080
> [ 348.834258] Process CrRendererMain (pid: 3246, stack limit =
> 0x00000000b82069c1)
> [ 348.841652] Call trace:
> [ 348.844100] mem_cgroup_protected+0x34/0x120
> [ 348.848370] do_try_to_free_pages+0xd0/0x3c0
> [ 348.852639] try_to_free_pages+0xf8/0x120
> [ 348.856651] __alloc_pages_nodemask+0x460/0xb68
> [ 348.861181] do_huge_pmd_anonymous_page+0x328/0x7d8
> [ 348.866061] __handle_mm_fault+0x57c/0xea0
> [ 348.870157] handle_mm_fault+0x128/0x1f8
> [ 348.874082] do_page_fault+0x1d0/0x490
> [ 348.877830] do_translation_fault+0x5c/0x68
> [ 348.882012] do_mem_abort+0x54/0x118
> [ 348.885587] el0_da+0x20/0x24
> [ 348.888557] Code: b40007a2 d103e042 eb02001f 540006c0 (f9409046)
> [ 348.894651] ---[ end trace 58afd90183767ac2 ]---
> [ 348.942150] Kernel panic - not syncing: Fatal exception
> [ 348.947448] SMP: stopping secondary CPUs
> [ 349.784747] SMP: failed to stop secondary CPUs 2,5
> [ 349.789569] Kernel Offset: disabled
> [ 349.793089] CPU features: 0x24802004
> [ 349.796691] Memory Limit: none
> [ 349.909567] Rebooting in 5 seconds..
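
A note on reading the fault address: both oopses fault at virtual address 0x120, the same value quoted above as the offset of emin. A load through a NULL struct pointer faults at the offset of the member being read, so the address by itself is consistent with a NULL base pointer somewhere in the parent->memory.emin chain. The sketch below only illustrates that arithmetic in userspace. The struct names, the field list, and the 0x110 padding are hypothetical, picked so the numbers match the oops; they are not the real struct mem_cgroup or struct page_counter layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a page counter; NOT the kernel's layout. */
struct page_counter_sketch {
	uint64_t usage;
	uint64_t min;
	uint64_t emin;
};

/*
 * Hypothetical stand-in for the cgroup structure. The padding size is an
 * assumption chosen so that memory.emin lands at offset 0x120.
 */
struct memcg_sketch {
	char pad[0x110];
	struct page_counter_sketch memory; /* embedded struct, not a pointer */
};

/* Compile-time check of the assumed layout. */
static_assert(offsetof(struct memcg_sketch, memory) == 0x110,
	      "sketch layout assumption does not hold");

/*
 * Address a load of parent->memory.emin would touch if parent were NULL:
 * base 0 plus the member's offset inside the outer structure.
 */
static size_t null_parent_fault_address(void)
{
	return offsetof(struct memcg_sketch, memory) +
	       offsetof(struct page_counter_sketch, emin);
}
```

In this sketch the embedded member cannot itself be NULL, so the only way to fault at exactly the member offset is for the outer pointer to be NULL. Whether that matches what actually happens in mem_cgroup_protected() would need checking against the real structure layout.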