On Fri, May 20, 2016 at 6:57 PM, Logan Gunthorpe <log...@deltatee.com> wrote:
> On 20/05/16 04:16 PM, Kees Cook wrote:
>>
>> On Fri, May 20, 2016 at 2:59 PM, Kees Cook <keesc...@chromium.org> wrote:
>>>
>>> On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <raf...@kernel.org> wrote:
>>>>
>>>> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <s...@tycho.nsa.gov> wrote:
>>>>>
>>>>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote:
>>>>>>
>>>>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mi...@kernel.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> * Logan Gunthorpe <log...@deltatee.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have been working on a bug that causes my laptop to freeze during
>>>>>>>> resume from hibernation. I did a bisect to find the offending commit:
>>>>>>>>
>>>>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata
>>>>>>>>
>>>>>>>> There is more information in the bugzilla report [1] that
>>>>>>>> I've been working on but I will summarize things below.
>>>>>>>>
>>>>>>>> I've experienced intermittent but reproducible freezes when resuming
>>>>>>>> from hibernation since about kernel version 3.19. The freeze was
>>>>>>>> significantly more reproducible when a few applications were loaded
>>>>>>>> before hibernation and would largely not happen if hibernated
>>>>>>>> immediately after booting to a desktop. I did some tracing work to find
>>>>>>>> that the kernel gets as far as the resume_image call in
>>>>>>>> swsusp_arch_resume and I could not find any response from the image
>>>>>>>> kernel when I hit the bug. I also did testing that seemed to rule out
>>>>>>>> this being caused by a problematic driver.
>>>>>>>>
>>>>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in
>>>>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4.
>>>>>>>> Then, I did a second bisect with a ported version of the fix to the
>>>>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation
>>>>>>>> with what appears to be the exact same symptoms. Reverting that commit
>>>>>>>> in recent kernels up to and including 4.6 fixes the issue and restores
>>>>>>>> reliable hibernation. However, it's not at all clear to me why that
>>>>>>>> commit would cause this issue or how to fix the issue without reverting.
>>>>>>>
>>>>>>>
>>>>>>> I've attached that commit below and also Cc:-ed a few more people who
>>>>>>> might have an idea about why this regressed. Worst-case we'll have to
>>>>>>> revert it.
>>>>>>
>>>>>>
>>>>>> Without looking deep into mm, my theory would be that after this patch
>>>>>> the final jump from the boot kernel to the image kernel's trampoline
>>>>>> code during resume may crash the kernel if the trampoline page turns
>>>>>> out to be NX in the boot kernel (it has to be executable in both the
>>>>>> boot and the image kernels).
>>>>>
>>>>>
>>>>> So, pardon my ignorance, but where is this trampoline page placed in
>>>>> kernel memory?
>>>>
>>>>
>>>> On 32-bit its location has to be the same in both the boot and the
>>>> image kernels and that's within kernel text in both cases, so that
>>>> shouldn't be a problem.
>>>>
>>>> On 64-bit its location depends on the image kernel and specifically on
>>>> the location of the restore_registers routine in it. The (virtual)
>>>> address of that routine is stored in the restore_jump_address
>>>> variable, so the page containing it (the trampoline page) can be found
>>>> with the help of that.
>>>>
>>>> swsusp_arch_resume() sets up a temporary kernel mapping to finalize
>>>> the image restoration and that page must not be NX in that mapping for
>>>> things to work.
>>>
>>>
>>> It looks like nothing in the swsusp_arch_resume() -> get_safe_page()
>>> -> get_image_page() path sets the page executable...
>>>
>>> Untested, but I wonder if this work work in swsusp_arch_resume()
>>> before the memcpy?
>>
>>
>> I can't type today, it seems. It should read "... if this would work ..."
>>
>> If you can test this and it works for you, I'll send a proper patch... :P
>>
>> -Kees
>>
>
> Hi Kees,
>
> Thanks. I tried the patch but it only resulted in a kernel warning and
> freeze. I've attached a photo showing as much of the messages as I could
> get.
>
> Logan

Ah, dang, ok, thanks for trying it. I'll let Rafael try to figure this one out.

-Kees

--
Kees Cook
Chrome OS & Brillo Security