On 12.10.2013 14:54, Fengguang Wu wrote:
> On Sat, Oct 12, 2013 at 07:47:05AM +1000, Dave Airlie wrote:
>>>> This is my preferred method of fixing it as I don't think the lifetimes
>>>> need to be tied so closely, though this requires review by someone to
>>>> make sure my unregistering etc is correct and in the right place.
>>> Apparently this fixes the problem for Fengguang, and the code looks
>>> cleaner too. Thanks,
>> Leaves the fixes or next question, since I suppose it's not technically
>> a regression, -next is probably fine, let me know if you'd like them
>> earlier.
> Dave, I tested the two patches on top of drm-next and find that they
> do help eliminate lots of oops messages in the below kernels.
>
> However in the 2nd config, the patched kernel has one more oops type
> than its base kernels (6aba5b6 and v3.12-rc3). v3.12-rc4 is also
> tested for your reference. In the 3rd config, both patched and base
> kernels have that oops:
>
> [ 96.969429] init: plymouth-upstart-bridge main process (309) terminated with status 1
> * Asking all remaining processes to terminate...
> [ 97.260371] BUG: Bad page map in process killall5 pte:4f426de0 pmd:0f4f4067
> [ 97.261114] addr:3fc00000 vm_flags:00100173 anon_vma:4f4066c0 mapping: (null) index:3ffe6
> [ 97.261912] CPU: 0 PID: 334 Comm: killall5 Not tainted 3.12.0-rc3-00156-gdaeb5e3 #1
> [ 97.262633] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 97.263192] 3fc00000 4f4c1e14 4212e45c 4fbff9a0 4f4c1e4c 411a9c4b 4262ade0 3fc00000
> [ 97.264051] 00100173 4f4066c0 00000000 0003ffe6 4f426de0 0003ffe6 00000000 4fbff9a0
> [ 97.264906] 3fc00000 3fc00000 4f4c1e60 411ab50e 00000000 4f464000 00000000 4f4c1ed0
> [ 97.265751] Call Trace:
> [ 97.266022] [<4212e45c>] dump_stack+0xbb/0x14b
> [ 97.266456] [<411a9c4b>] print_bad_pte+0x28b/0x2c0
> [ 97.266931] [<411ab50e>] vm_normal_page+0xae/0xe0
> [ 97.267388] [<411b37f3>] munlock_vma_pages_range+0x143/0x320
> [ 97.267950] [<410d30fd>] ? sched_clock_cpu+0x20d/0x250
> [ 97.268451] [<411bacee>] exit_mmap+0x7e/0x200
> [ 97.268893] [<4131de9c>] ? __const_udelay+0x2c/0x40
> [ 97.269369] [<410adf28>] ? __rcu_read_unlock+0x68/0x150
> [ 97.269888] [<4123227b>] ? exit_aio+0x11b/0x280
> [ 97.270412] [<4123217c>] ? exit_aio+0x1c/0x280
> [ 97.270892] [<410829c7>] ? do_exit+0x697/0x1280
> [ 97.271332] [<4107a1a1>] mmput+0x81/0x170
> [ 97.271726] [<410829dc>] do_exit+0x6ac/0x1280
> [ 97.272166] [<410bad75>] ? hrtimer_nanosleep+0x1f5/0x210
> [ 97.272679] [<4215328a>] ? sysenter_exit+0xf/0x45
> [ 97.273151] [<41083751>] do_group_exit+0x131/0x160
> [ 97.273617] [<410837ad>] SyS_exit_group+0x2d/0x30
> [ 97.274088] [<42153251>] sysenter_do_call+0x12/0x3c
> [ 97.274560] Disabling lock debugging due to kernel taint
> * All processes ended within 1 seconds....
>
> That oops message looks very like the one I reported for this commit.
>
> commit 7a8010cd36273ff5f6fea5201ef9232f30cebbd9
> Author: Vlastimil Babka<vbabka at suse.cz>
> Date: Wed Sep 11 14:22:35 2013 -0700
>
>     mm: munlock: manual pte walk in fast path instead of follow_page_mask()
>
> Vlastimil Babka, has this bug been fixed in -rc4?
>

Yes, this should have been fixed by commit eadb41ae82f80210 "mm/mlock.c:
prevent walking off the end of a pagetable in no-pmd configuration",
merged between rc3 and rc4.

Vlastimil Babka