On Tuesday, June 14, 2016 08:06:49 PM chenyu wrote:
> On Mon, Jun 13, 2016 at 9:42 PM, Rafael J. Wysocki <r...@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
> >
> > Logan Gunthorpe reports that hibernation stopped working reliably for
> > him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
> > and rodata). Most likely, what happens is that the page containing
> > the image kernel's entry point is sometimes marked as non-executable
> > in the page tables used at the time of the final jump to the image
> > kernel. That at least is why commit ab76f7b4ab23 may matter.
> >
> > However, there is one more long-standing issue with the code in
> > question, which is that the temporary page tables set up by it
> > to avoid page tables corruption when the last bits of the image
> > kernel's memory contents are copied into their original page frames
> > re-use the boot kernel's text mapping, but that mapping may very
> > well get corrupted just like any other part of the page tables.
> > Of course, if that happens, the final jump to the image kernel's
> > entry point will go to nowhere.
> >
>
> 100 rounds test has passed with this patch on top of 4.7-rc3,
> Tested-by: Chen Yu <yu.c.c...@intel.com>
>
> BTW, I'm thinking of another possible scenario in which this patch fixes
> the NX issue, according to the log previously provided by Logan in
> bugzilla 116941
>
> without ab76f7b4ab23:
>
> ---[ High Kernel Mapping ]---
> 0xffffffff80000000-0xffffffff81000000    16M                    pmd
> 0xffffffff81000000-0xffffffff81600000     6M  ro  PSE  GLB  x   pmd
> 0xffffffff81600000-0xffffffff81800000     2M  ro  PSE  GLB  NX  pmd
> 0xffffffff81800000-0xffffffff81c00000     4M  RW       GLB  NX  pte
> 0xffffffff81c00000-0xffffffffa0000000   484M                    pmd
>
> with ab76f7b4ab23:
>
> ---[ High Kernel Mapping ]---
> 0xffffffff80000000-0xffffffff81000000    16M                    pmd
> 0xffffffff81000000-0xffffffff81400000     4M  ro  PSE  GLB  x   pmd
> 0xffffffff81400000-0xffffffff8155e000  1400K  ro       GLB  x   pte
> 0xffffffff8155e000-0xffffffff81600000   648K  RW       GLB  NX  pte
> 0xffffffff81600000-0xffffffff81800000     2M  ro  PSE  GLB  NX  pmd
> 0xffffffff81800000-0xffffffff81c00000     4M  RW       GLB  NX  pte
> 0xffffffff81c00000-0xffffffffa0000000   484M                    pmd
>
> ffffffff81446bb0 T restore_registers
>
>
> It looks like after the NX modification, the 'huge page' text mapping
> is split into smaller pieces, from pmd to pte mapping, and since the
> original pmd is located in the .data section (which should be the same
> across hibernation), while after the modification the pte table is
> allocated dynamically, we cannot guarantee that the dynamically
> allocated pte tables are the same across hibernation, thus the kernel
> entry of restore_registers might become inaccessible because of a
> broken page table.

Right.  Quite frankly, I suspected something like that, but wasn't quite
sure, so thanks a lot for that analysis!

Rafael