On Mon, May 23, 2016 at 05:24:59PM +0300, Kirill A. Shutemov wrote:
> On Mon, May 23, 2016 at 05:06:38PM +0300, Mika Westerberg wrote:
> > Hi,
> >
> > After upgrading kernel of my desktop system from v4.6-rc7 to v4.6, I've
> > started seeing following:
> >
> > [176611.093747] page:ffffea0000360000 count:1 mapcount:0
> > mapping:ffff880034d2e0a1 index:0x1f9b06600 compound_mapcount: 0
> > [176611.093751] flags:
> > 0x3fff8000044079(locked|uptodate|dirty|lru|active|head|swapbacked)
> > [176611.093752] page dumped because: VM_BUG_ON_PAGE(page->index !=
> > linear_page_index(vma, address))
> > [176611.093753] page->mem_cgroup:ffff88049e81b800
> > [176611.093765] ------------[ cut here ]------------
> > [176611.093778] kernel BUG at mm/rmap.c:1101!
> > [176611.093787] invalid opcode: 0000 [#1] PREEMPT SMP
> > [176611.093800] Modules linked in: vfat fat usb_storage fuse bridge stp llc
> > ebtable_filter ebtables ip6table_filter ip6_tables pl2303
> > snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic
> > snd_hda_intel snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel
> > snd_hwdep snd_hda_core kvm snd_seq snd_seq_device iTCO_wdt
> > iTCO_vendor_support snd_pcm mxm_wmi irqbypass crct10dif_pclmul joydev
> > crc32_pclmul crc32c_intel mei_me snd_timer ghash_clmulni_intel snd mei
> > lpc_ich i2c_i801 shpchp mfd_core soundcore wmi i915 drm_kms_helper drm
> > e1000e igb serio_raw dca i2c_algo_bit i2c_core ptp pps_core video
> > [176611.093947] CPU: 1 PID: 2851 Comm: BrowserBlocking Tainted: G
> > I 4.6.0 #71
> > [176611.093962] Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD7
> > TH/Z87X-UD7 TH-CF, BIOS F4 03/18/2014
> > [176611.093981] task: ffff880492193600 ti: ffff8804971e0000 task.ti:
> > ffff8804971e0000
> > [176611.093996] RIP: 0010:[<ffffffff811dbcb3>] [<ffffffff811dbcb3>]
> > page_move_anon_rmap+0x93/0xa0
> > [176611.094018] RSP: 0000:ffff8804971e3d58 EFLAGS: 00010296
> > [176611.094030] RAX: 0000000000000021 RBX: ffffea0000360000 RCX:
> > 0000000000000002
> > [176611.094045] RDX: 0000000080000002 RSI: ffffffff81a2dce2 RDI:
> > 00000000ffffffff
> > [176611.094059] RBP: ffff8804971e3d70 R08: 0000000000016e39 R09:
> > 0000000000000004
> > [176611.094074] R10: 800000000d81f065 R11: ffffffff81f19c4e R12:
> > ffff880034d2e0a0
> > [176611.094088] R13: 00000001f9b06600 R14: ffffea00003607c0 R15:
> > ffff880495b3bc00
> > [176611.094103] FS: 00007f0a91e71700(0000) GS:ffff8804af240000(0000)
> > knlGS:0000000000000000
> > [176611.094119] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [176611.094131] CR2: 00001f9b0661fcc8 CR3: 0000000497097000 CR4:
> > 00000000001406e0
> > [176611.094146] Stack:
> > [176611.094151] ffff880042301398 00001f9b0661fcc8 ffffea0011c746b0
> > ffff8804971e3df8
> > [176611.094169] ffffffff811ccdd7 000000000000000c ffff880471d1a0f8
> > ffff880498d2f198
> > [176611.094186] 0000000000000001 ffff8804971e3e50 ffffffff8119b156
> > 0000000000000001
> > [176611.094203] Call Trace:
> > [176611.094213] [<ffffffff811ccdd7>] do_wp_page+0x487/0x710
> > [176611.094225] [<ffffffff8119b156>] ? generic_file_read_iter+0x606/0x6f0
> > [176611.094238] [<ffffffff811cf1e9>] handle_mm_fault+0xf59/0x1d30
> > [176611.094252] [<ffffffff8121eef7>] ? __vfs_read+0xa7/0xd0
> > [176611.094266] [<ffffffff81066298>] __do_page_fault+0x1a8/0x520
> > [176611.094280] [<ffffffff81066632>] do_page_fault+0x22/0x30
> > [176611.094295] [<ffffffff81759508>] page_fault+0x28/0x30
> > [176611.094306] Code: 20 05 a1 81 e8 2f d0 fe ff 0f 0b e8 68 ce fe ff 0f 0b
> > 48 89 d6 e8 ee 32 01 00 eb cd 48 c7 c6 b0 2e a1 81 48 89 df e8 0d d0 fe ff
> > <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89
> > [176611.094386] RIP [<ffffffff811dbcb3>] page_move_anon_rmap+0x93/0xa0
> > [176611.094400] RSP <ffff8804971e3d58>
> > [176611.099920] ---[ end trace d9cb6b7ad0bd6c55 ]---
> > [176611.099922] note: BrowserBlocking[2851] exited with preempt_count 1
> >
> > I haven't bisected this yet but there seems to be only one commit
> > touching mm in v4.6 so I kind of suspect that it has something to do
> > with the issue. I'll try to revert it next and see if that changes
> > anything.
> >
> > I've seen the issue now few times but I have no easy means to reproduce
> > it. Only thing that seems to be consistent is the fact that the running
> > process is always chrome.
> >
> > The commit in question is:
> >
> > 6d0a07edd17c ("mm: thp: calculate the mapcount correctly for THP pages
> > during WP faults").
> >
> > Does this ring any bells? Thanks in advance.
>
> Looks like we forgot to align address if the page is huge.
> I'm not sure if caller or callee should do this.
>
> Below is callee version.
>
> Note that we use address only in CONFIG_DEBUG_VM=y case and the bug is not
> visible on production kernels with the option disabled.
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 8a839935b18c..0ea5d9071b32 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1098,6 +1098,8 @@ void page_move_anon_rmap(struct page *page,
>
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>  	VM_BUG_ON_VMA(!anon_vma, vma);
> +	if (IS_ENABLED(CONFIG_DEBUG_VM) && PageTransHuge(page))
> +		address &= HPAGE_PMD_MASK;
>  	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
>
>  	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>

Just sent a patch doing the exact same thing, just embedded in the
VM_BUG_ON_PAGE; either version is fine with me.