Re: v4.6 kernel BUG at mm/rmap.c:1101!

Andrea Arcangeli Mon, 23 May 2016 08:19:55 -0700

On Mon, May 23, 2016 at 05:24:59PM +0300, Kirill A. Shutemov wrote:
> On Mon, May 23, 2016 at 05:06:38PM +0300, Mika Westerberg wrote:
> > Hi,
> > 
> > After upgrading the kernel on my desktop system from v4.6-rc7 to v4.6,
> > I've started seeing the following:
> > 
> > [176611.093747] page:ffffea0000360000 count:1 mapcount:0 
> > mapping:ffff880034d2e0a1 index:0x1f9b06600 compound_mapcount: 0
> > [176611.093751] flags: 
> > 0x3fff8000044079(locked|uptodate|dirty|lru|active|head|swapbacked)
> > [176611.093752] page dumped because: VM_BUG_ON_PAGE(page->index != 
> > linear_page_index(vma, address))
> > [176611.093753] page->mem_cgroup:ffff88049e81b800
> > [176611.093765] ------------[ cut here ]------------
> > [176611.093778] kernel BUG at mm/rmap.c:1101!
> > [176611.093787] invalid opcode: 0000 [#1] PREEMPT SMP 
> > [176611.093800] Modules linked in: vfat fat usb_storage fuse bridge stp llc 
> > ebtable_filter ebtables ip6table_filter ip6_tables pl2303 
> > snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic 
> > snd_hda_intel snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel 
> > snd_hwdep snd_hda_core kvm snd_seq snd_seq_device iTCO_wdt 
> > iTCO_vendor_support snd_pcm mxm_wmi irqbypass crct10dif_pclmul joydev 
> > crc32_pclmul crc32c_intel mei_me snd_timer ghash_clmulni_intel snd mei 
> > lpc_ich i2c_i801 shpchp mfd_core soundcore wmi i915 drm_kms_helper drm 
> > e1000e igb serio_raw dca i2c_algo_bit i2c_core ptp pps_core video
> > [176611.093947] CPU: 1 PID: 2851 Comm: BrowserBlocking Tainted: G          
> > I     4.6.0 #71
> > [176611.093962] Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD7 
> > TH/Z87X-UD7 TH-CF, BIOS F4 03/18/2014
> > [176611.093981] task: ffff880492193600 ti: ffff8804971e0000 task.ti: 
> > ffff8804971e0000
> > [176611.093996] RIP: 0010:[<ffffffff811dbcb3>]  [<ffffffff811dbcb3>] 
> > page_move_anon_rmap+0x93/0xa0
> > [176611.094018] RSP: 0000:ffff8804971e3d58  EFLAGS: 00010296
> > [176611.094030] RAX: 0000000000000021 RBX: ffffea0000360000 RCX: 
> > 0000000000000002
> > [176611.094045] RDX: 0000000080000002 RSI: ffffffff81a2dce2 RDI: 
> > 00000000ffffffff
> > [176611.094059] RBP: ffff8804971e3d70 R08: 0000000000016e39 R09: 
> > 0000000000000004
> > [176611.094074] R10: 800000000d81f065 R11: ffffffff81f19c4e R12: 
> > ffff880034d2e0a0
> > [176611.094088] R13: 00000001f9b06600 R14: ffffea00003607c0 R15: 
> > ffff880495b3bc00
> > [176611.094103] FS:  00007f0a91e71700(0000) GS:ffff8804af240000(0000) 
> > knlGS:0000000000000000
> > [176611.094119] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [176611.094131] CR2: 00001f9b0661fcc8 CR3: 0000000497097000 CR4: 
> > 00000000001406e0
> > [176611.094146] Stack:
> > [176611.094151]  ffff880042301398 00001f9b0661fcc8 ffffea0011c746b0 
> > ffff8804971e3df8
> > [176611.094169]  ffffffff811ccdd7 000000000000000c ffff880471d1a0f8 
> > ffff880498d2f198
> > [176611.094186]  0000000000000001 ffff8804971e3e50 ffffffff8119b156 
> > 0000000000000001
> > [176611.094203] Call Trace:
> > [176611.094213]  [<ffffffff811ccdd7>] do_wp_page+0x487/0x710
> > [176611.094225]  [<ffffffff8119b156>] ? generic_file_read_iter+0x606/0x6f0
> > [176611.094238]  [<ffffffff811cf1e9>] handle_mm_fault+0xf59/0x1d30
> > [176611.094252]  [<ffffffff8121eef7>] ? __vfs_read+0xa7/0xd0
> > [176611.094266]  [<ffffffff81066298>] __do_page_fault+0x1a8/0x520
> > [176611.094280]  [<ffffffff81066632>] do_page_fault+0x22/0x30
> > [176611.094295]  [<ffffffff81759508>] page_fault+0x28/0x30
> > [176611.094306] Code: 20 05 a1 81 e8 2f d0 fe ff 0f 0b e8 68 ce fe ff 0f 0b 
> > 48 89 d6 e8 ee 32 01 00 eb cd 48 c7 c6 b0 2e a1 81 48 89 df e8 0d d0 fe ff 
> > <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 
> > [176611.094386] RIP  [<ffffffff811dbcb3>] page_move_anon_rmap+0x93/0xa0
> > [176611.094400]  RSP <ffff8804971e3d58>
> > [176611.099920] ---[ end trace d9cb6b7ad0bd6c55 ]---
> > [176611.099922] note: BrowserBlocking[2851] exited with preempt_count 1
> > 
> > I haven't bisected this yet but there seems to be only one commit
> > touching mm in v4.6 so I kind of suspect that it has something to do
> > with the issue. I'll try to revert it next and see if that changes
> > anything.
> > 
> > I've seen the issue a few times now but I have no easy means to reproduce
> > it. The only thing that seems to be consistent is the fact that the
> > running process is always chrome.
> > 
> > The commit in question is:
> > 
> > 6d0a07edd17c ("mm: thp: calculate the mapcount correctly for THP pages
> > during WP faults").
> > 
> > Does this ring any bells? Thanks in advance.
> 
> Looks like we forgot to align address if the page is huge.
> I'm not sure if caller or callee should do this.
> 
> Below is callee version.
> 
> Note that we use address only in CONFIG_DEBUG_VM=y case and the bug is not
> visible on production kernels with the option disabled.
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 8a839935b18c..0ea5d9071b32 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1098,6 +1098,8 @@ void page_move_anon_rmap(struct page *page,
>  
>       VM_BUG_ON_PAGE(!PageLocked(page), page);
>       VM_BUG_ON_VMA(!anon_vma, vma);
> +     if (IS_ENABLED(CONFIG_DEBUG_VM) && PageTransHuge(page))
> +             address &= HPAGE_PMD_MASK;
>       VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
>  
>       anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;


Reviewed-by: Andrea Arcangeli <[email protected]>

I just sent a patch doing the exact same thing, only embedded in the
VM_BUG_ON_PAGE; either version is fine with me.
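
For anyone following the arithmetic, here is a minimal user-space sketch
(not kernel code) of why the check fires on a THP head page and why
masking the address fixes it. The values come from the oops above: CR2 is
taken as the faulting address and page->index is 0x1f9b06600. Everything
prefixed model_ is a hypothetical stand-in, the 4K page / 2M PMD sizes are
the usual x86-64 values, and vm_start is an assumed value for a plain
anonymous mapping with vm_pgoff == vm_start >> PAGE_SHIFT (the report does
not show the real VMA).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 geometry: 4 KiB base pages, 2 MiB PMD-sized huge pages. */
#define MODEL_PAGE_SHIFT      12
#define MODEL_HPAGE_PMD_SHIFT 21
#define MODEL_HPAGE_PMD_MASK  (~((UINT64_C(1) << MODEL_HPAGE_PMD_SHIFT) - 1))

/* Hypothetical stand-in for the two vm_area_struct fields used here. */
struct model_vma {
	uint64_t vm_start;
	uint64_t vm_pgoff;
};

/* Same arithmetic as linear_page_index() for a non-hugetlb VMA. */
static inline uint64_t model_linear_page_index(const struct model_vma *vma,
					       uint64_t address)
{
	return ((address - vma->vm_start) >> MODEL_PAGE_SHIFT) + vma->vm_pgoff;
}

/* Replays the values from the report; aborts if the sketch is wrong. */
void model_check_reported_values(void)
{
	const uint64_t fault_addr = UINT64_C(0x00001f9b0661fcc8); /* CR2 */
	const uint64_t page_index = UINT64_C(0x1f9b06600);        /* page->index */

	/* Assumed VMA: the real vm_start is not in the report. */
	const struct model_vma vma = {
		.vm_start = UINT64_C(0x1f9b06400000),
		.vm_pgoff = UINT64_C(0x1f9b06400000) >> MODEL_PAGE_SHIFT,
	};

	/* Unaligned address: index of the faulting subpage, not the head. */
	assert(model_linear_page_index(&vma, fault_addr) != page_index);

	/* Address rounded down to the huge page boundary: matches the head. */
	assert(model_linear_page_index(&vma, fault_addr & MODEL_HPAGE_PMD_MASK)
	       == page_index);
}
```

With these numbers the unaligned address gives index 0x1f9b0661f, which is
the subpage that faulted, while the head page carries 0x1f9b06600. Rounding
the address down with the PMD mask yields the head page's index, so the
VM_BUG_ON_PAGE comparison holds.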
