Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, 12 Jan 2018, Greg Kroah-Hartman wrote:
> On Fri, Jan 12, 2018 at 12:03:10AM +0100, Thomas Gleixner wrote:
> > So the transition to long mode for secondaries uses the trampoline pgd for
> > long mode transition and then jumping to secondary_startup_64 where CR3 is
> > set to the real kernel page tables.
>
> Ok, so the summary is that this patch is only needed for the 4.4 and 4.9
> kernels, and _NOT_ for Linus's tree and 4.14, right?

Correct.
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 12, 2018 at 12:03:10AM +0100, Thomas Gleixner wrote:
> On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> > On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> > > On Thu, 11 Jan 2018, Linus Torvalds wrote:
> > >
> > > > On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner wrote:
> > > > >
> > > > > 67a9108ed431 ("x86/efi: Build our own page table structures")
> > > > >
> > > > > got rid of EFI depending on real_mode_header->trampoline_pgd
> > > >
> > > > So I think it only got rid of by default - the codepath is still
> > > > there, the allocation is still there, it's just that it's not actually
> > > > used unless somebody does that "efi=old_mmap" thing.
> > >
> > > Yes, the trampoline_pgd is still around, but I can't figure out how it
> > > would be used after boot. Confused, digging more.
> >
> > So coming back to the same commit. From the changelog:
> >
> >     This is caused by mapping EFI regions with RWX permissions.
> >     There isn't much we can do to restrict the permissions for these
> >     regions due to the way the firmware toolchains mix code and
> >     data, but we can at least isolate these mappings so that they do
> >     not appear in the regular kernel page tables.
> >
> >     In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
> >     mapping") we started using 'trampoline_pgd' to map the EFI
> >     regions because there was an existing identity mapping there
> >     which we use during the SetVirtualAddressMap() call and for
> >     broken firmware that accesses those addresses.
> >
> > So this very commit gets rid of the (ab)use of trampoline_pgd and allocates
> > efi_pgd, which we made use the proper size.
> >
> > trampoline_pgd is since then only used to get into long mode in
> > realmode/rm/trampoline_64.S and for reboot in machine_real_restart().
> >
> > The runtime services stuff does not use it in kernel versions >= 4.6
>
> But there is one very well hidden user for it after boot:
>
>     It's used for booting secondary CPUs from real mode
>
> So the transition to long mode for secondaries uses the trampoline pgd for
> long mode transition and then jumping to secondary_startup_64 where CR3 is
> set to the real kernel page tables.

Ok, so the summary is that this patch is only needed for the 4.4 and 4.9
kernels, and _NOT_ for Linus's tree and 4.14, right?

thanks,

greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Thu, Jan 11, 2018 at 11:47:23PM +0100, Thomas Gleixner wrote:
> On Thu, 11 Jan 2018, Steven Sistare wrote:
> > On 1/11/2018 5:30 PM, Thomas Gleixner wrote:
> > > On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> > >> On Thu, 11 Jan 2018, Linus Torvalds wrote:
> > >>
> > >>> On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner wrote:
> > >>>>
> > >>>> 67a9108ed431 ("x86/efi: Build our own page table structures")
> > >>>>
> > >>>> got rid of EFI depending on real_mode_header->trampoline_pgd
> > >>>
> > >>> So I think it only got rid of by default - the codepath is still
> > >>> there, the allocation is still there, it's just that it's not actually
> > >>> used unless somebody does that "efi=old_mmap" thing.
> > >>
> > >> Yes, the trampoline_pgd is still around, but I can't figure out how it
> > >> would be used after boot. Confused, digging more.
> > >
> > > So coming back to the same commit. From the changelog:
> > >
> > >     This is caused by mapping EFI regions with RWX permissions.
> > >     There isn't much we can do to restrict the permissions for these
> > >     regions due to the way the firmware toolchains mix code and
> > >     data, but we can at least isolate these mappings so that they do
> > >     not appear in the regular kernel page tables.
> > >
> > >     In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
> > >     mapping") we started using 'trampoline_pgd' to map the EFI
> > >     regions because there was an existing identity mapping there
> > >     which we use during the SetVirtualAddressMap() call and for
> > >     broken firmware that accesses those addresses.
> > >
> > > So this very commit gets rid of the (ab)use of trampoline_pgd and
> > > allocates efi_pgd, which we made use the proper size.
> > >
> > > trampoline_pgd is since then only used to get into long mode in
> > > realmode/rm/trampoline_64.S and for reboot in machine_real_restart().
> > >
> > > The runtime services stuff does not use it in kernel versions >= 4.6
> > >
> > > Thanks,
> > >
> > >     tglx
> >
> > Yes, and addressing Linus' concern about EFI_OLD_MEMMAP, those paths are
> > independent of it. When EFI_OLD_MMAP is enabled, the efi pgd is not
> > used, and the bug will not bite.
>
> We have a fix queued in tip/x86/pti which addresses a missing NX clear, but
> that's a different story.
>

Since you are talking about NX, I see this in last night's -next:

kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at fe007000
IP: 0xfe006e9d
PGD ffd6067 P4D ffd6067 PUD ffd5067 PMD ff73067 PTE 8fc09063
Oops: 0011 [#1] PREEMPT SMP PTI
Modules linked in:
CPU: 0 PID: 1 Comm: init Tainted: GW 4.15.0-rc7-next-20180111-yocto-standard #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:0xfe006e9d
RSP: 0018:aee28000ffd0 EFLAGS: 0006
RAX: 000c RBX: 00400040 RCX: 7f2c4186ad6a
RDX: RSI: RDI: b6a0
RBP: 0008 R08: 037f R09: 0064
R10: 078bfbfd R11: 0246 R12: 7f2c41856a60
R13: R14: 00402368 R15: 1000
FS: () GS:95fecfc0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: fe007000 CR3: 0d88a000 CR4: 003406f0
Call Trace:
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <90> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RIP: 0xfe006e9d RSP: aee28000ffd0
CR2: fe007000
---[ end trace a82b8742114c1785 ]---

Is this the issue you are talking about, or is the fix triggering the crash ?

Guenter
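For readers decoding the report above: the `Oops: 0011` field is the x86 page-fault error code in hex. The sketch below decodes it using the architectural bit layout; the `pf_info` struct and both function names are made up for this illustration and are not kernel APIs. Read this way, 0x0011 would be a protection violation on an instruction fetch in kernel mode, which is consistent with the "tried to execute NX-protected page" line, though it says nothing about which patch is responsible.

```c
#include <stdbool.h>
#include <stdio.h>

/* x86 page-fault error code bits, as defined by the architecture. */
#define PF_PROT  (1UL << 0) /* 1: protection violation, 0: page not present */
#define PF_WRITE (1UL << 1) /* 1: the access was a write */
#define PF_USER  (1UL << 2) /* 1: the fault was taken in user mode */
#define PF_RSVD  (1UL << 3) /* 1: a reserved bit was set in a paging entry */
#define PF_INSTR (1UL << 4) /* 1: the access was an instruction fetch */

/* Hypothetical helper type for this sketch; not a kernel structure. */
struct pf_info {
    bool protection;
    bool write;
    bool user;
    bool rsvd;
    bool instr;
};

/* Split the raw error code into its individual flags. */
struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info = {
        .protection = (code & PF_PROT) != 0,
        .write      = (code & PF_WRITE) != 0,
        .user       = (code & PF_USER) != 0,
        .rsvd       = (code & PF_RSVD) != 0,
        .instr      = (code & PF_INSTR) != 0,
    };
    return info;
}

/* Print a one-line, human-readable summary of the error code. */
void print_pf_info(unsigned long code)
{
    struct pf_info i = decode_pf_error(code);

    printf("Oops: %04lx -> %s, %s, %s mode%s%s\n", code,
           i.protection ? "protection violation" : "page not present",
           i.write ? "write" : "not a write",
           i.user ? "user" : "kernel",
           i.rsvd ? ", reserved bit set" : "",
           i.instr ? ", instruction fetch" : "");
}
```

Calling `print_pf_info(0x0011)` from a small `main()` prints the summary for the code in this report.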
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> > On Thu, 11 Jan 2018, Linus Torvalds wrote:
> >
> > > On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner wrote:
> > > >
> > > > 67a9108ed431 ("x86/efi: Build our own page table structures")
> > > >
> > > > got rid of EFI depending on real_mode_header->trampoline_pgd
> > >
> > > So I think it only got rid of by default - the codepath is still
> > > there, the allocation is still there, it's just that it's not actually
> > > used unless somebody does that "efi=old_mmap" thing.
> >
> > Yes, the trampoline_pgd is still around, but I can't figure out how it
> > would be used after boot. Confused, digging more.
>
> So coming back to the same commit. From the changelog:
>
>     This is caused by mapping EFI regions with RWX permissions.
>     There isn't much we can do to restrict the permissions for these
>     regions due to the way the firmware toolchains mix code and
>     data, but we can at least isolate these mappings so that they do
>     not appear in the regular kernel page tables.
>
>     In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
>     mapping") we started using 'trampoline_pgd' to map the EFI
>     regions because there was an existing identity mapping there
>     which we use during the SetVirtualAddressMap() call and for
>     broken firmware that accesses those addresses.
>
> So this very commit gets rid of the (ab)use of trampoline_pgd and allocates
> efi_pgd, which we made use the proper size.
>
> trampoline_pgd is since then only used to get into long mode in
> realmode/rm/trampoline_64.S and for reboot in machine_real_restart().
>
> The runtime services stuff does not use it in kernel versions >= 4.6

But there is one very well hidden user for it after boot:

    It's used for booting secondary CPUs from real mode

So the transition to long mode for secondaries uses the trampoline pgd for
long mode transition and then jumping to secondary_startup_64 where CR3 is
set to the real kernel page tables.

Thanks,

    tglx
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Thu, Jan 11, 2018 at 2:42 PM, Steven Sistare wrote:
>
> Yes, and addressing Linus' concern about EFI_OLD_MEMMAP, those paths are
> independent of it. When EFI_OLD_MMAP is enabled, the efi pgd is not
> used, and the bug will not bite.

Ok, good. Thanks for checking.

            Linus
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Thu, 11 Jan 2018, Steven Sistare wrote:
> On 1/11/2018 5:30 PM, Thomas Gleixner wrote:
> > On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> >> On Thu, 11 Jan 2018, Linus Torvalds wrote:
> >>
> >>> On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner wrote:
> >>>>
> >>>> 67a9108ed431 ("x86/efi: Build our own page table structures")
> >>>>
> >>>> got rid of EFI depending on real_mode_header->trampoline_pgd
> >>>
> >>> So I think it only got rid of by default - the codepath is still
> >>> there, the allocation is still there, it's just that it's not actually
> >>> used unless somebody does that "efi=old_mmap" thing.
> >>
> >> Yes, the trampoline_pgd is still around, but I can't figure out how it
> >> would be used after boot. Confused, digging more.
> >
> > So coming back to the same commit. From the changelog:
> >
> >     This is caused by mapping EFI regions with RWX permissions.
> >     There isn't much we can do to restrict the permissions for these
> >     regions due to the way the firmware toolchains mix code and
> >     data, but we can at least isolate these mappings so that they do
> >     not appear in the regular kernel page tables.
> >
> >     In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
> >     mapping") we started using 'trampoline_pgd' to map the EFI
> >     regions because there was an existing identity mapping there
> >     which we use during the SetVirtualAddressMap() call and for
> >     broken firmware that accesses those addresses.
> >
> > So this very commit gets rid of the (ab)use of trampoline_pgd and allocates
> > efi_pgd, which we made use the proper size.
> >
> > trampoline_pgd is since then only used to get into long mode in
> > realmode/rm/trampoline_64.S and for reboot in machine_real_restart().
> >
> > The runtime services stuff does not use it in kernel versions >= 4.6
> >
> > Thanks,
> >
> >     tglx
>
> Yes, and addressing Linus' concern about EFI_OLD_MEMMAP, those paths are
> independent of it. When EFI_OLD_MMAP is enabled, the efi pgd is not
> used, and the bug will not bite.

We have a fix queued in tip/x86/pti which addresses a missing NX clear, but
that's a different story.

Thanks,

    tglx
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On 1/11/2018 5:30 PM, Thomas Gleixner wrote:
> On Thu, 11 Jan 2018, Thomas Gleixner wrote:
>> On Thu, 11 Jan 2018, Linus Torvalds wrote:
>>
>>> On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner wrote:
>>>>
>>>> 67a9108ed431 ("x86/efi: Build our own page table structures")
>>>>
>>>> got rid of EFI depending on real_mode_header->trampoline_pgd
>>>
>>> So I think it only got rid of by default - the codepath is still
>>> there, the allocation is still there, it's just that it's not actually
>>> used unless somebody does that "efi=old_mmap" thing.
>>
>> Yes, the trampoline_pgd is still around, but I can't figure out how it
>> would be used after boot. Confused, digging more.
>
> So coming back to the same commit. From the changelog:
>
>     This is caused by mapping EFI regions with RWX permissions.
>     There isn't much we can do to restrict the permissions for these
>     regions due to the way the firmware toolchains mix code and
>     data, but we can at least isolate these mappings so that they do
>     not appear in the regular kernel page tables.
>
>     In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
>     mapping") we started using 'trampoline_pgd' to map the EFI
>     regions because there was an existing identity mapping there
>     which we use during the SetVirtualAddressMap() call and for
>     broken firmware that accesses those addresses.
>
> So this very commit gets rid of the (ab)use of trampoline_pgd and allocates
> efi_pgd, which we made use the proper size.
>
> trampoline_pgd is since then only used to get into long mode in
> realmode/rm/trampoline_64.S and for reboot in machine_real_restart().
>
> The runtime services stuff does not use it in kernel versions >= 4.6
>
> Thanks,
>
>     tglx

Yes, and addressing Linus' concern about EFI_OLD_MEMMAP, those paths are
independent of it. When EFI_OLD_MMAP is enabled, the efi pgd is not
used, and the bug will not bite.

- Steve
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Thu, 11 Jan 2018, Thomas Gleixner wrote:
> On Thu, 11 Jan 2018, Linus Torvalds wrote:
>
> > On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner wrote:
> > >
> > > 67a9108ed431 ("x86/efi: Build our own page table structures")
> > >
> > > got rid of EFI depending on real_mode_header->trampoline_pgd
> >
> > So I think it only got rid of by default - the codepath is still
> > there, the allocation is still there, it's just that it's not actually
> > used unless somebody does that "efi=old_mmap" thing.
>
> Yes, the trampoline_pgd is still around, but I can't figure out how it
> would be used after boot. Confused, digging more.

So coming back to the same commit. From the changelog:

    This is caused by mapping EFI regions with RWX permissions.
    There isn't much we can do to restrict the permissions for these
    regions due to the way the firmware toolchains mix code and
    data, but we can at least isolate these mappings so that they do
    not appear in the regular kernel page tables.

    In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
    mapping") we started using 'trampoline_pgd' to map the EFI
    regions because there was an existing identity mapping there
    which we use during the SetVirtualAddressMap() call and for
    broken firmware that accesses those addresses.

So this very commit gets rid of the (ab)use of trampoline_pgd and allocates
efi_pgd, which we made use the proper size.

trampoline_pgd is since then only used to get into long mode in
realmode/rm/trampoline_64.S and for reboot in machine_real_restart().

The runtime services stuff does not use it in kernel versions >= 4.6

Thanks,

    tglx
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Thu, 11 Jan 2018, Steven Sistare wrote:
> On 1/11/2018 3:46 PM, Linus Torvalds wrote:
> > On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner wrote:
> >>
> >> 67a9108ed431 ("x86/efi: Build our own page table structures")
> >>
> >> got rid of EFI depending on real_mode_header->trampoline_pgd
> >
> > So I think it only got rid of by default - the codepath is still
> > there, the allocation is still there, it's just that it's not actually
> > used unless somebody does that "efi=old_mmap" thing.
> >
> > Looking around, there's at least one quirk for the SGI UV1 system that
> > enables EFI_OLD_MMAP automatically. There might be others that I
> > missed, but I think that's it.
> >
> > So it *can* trigger without "efi=old_mmap", but not on any normal machines.
> >
> > And as Pavel points out, even when the bug is active, it's pretty hard
> > to actually trigger.
> >
> > But yeah, there may be other EFI patches that I didn't notice that
> > changed things in other ways too.
> >
> >            Linus
>
> The bug is not present in the latest upstream kernel because the efi_pgd is
> correctly aligned:
>
>   arch/x86/platform/efi/efi_64.c
>     int __init efi_alloc_page_tables(void)
>         efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER);

Yes, I came exactly to the same conclusion, but I didn't want to call Linus
a moron before I triple checked that trampoline_pgd is still there, but
only ever used to get out of the realmode swamp at boot.

Thanks,

    tglx
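As a rough illustration of the alignment point quoted above: a page allocation of order N is 2^N pages long and is naturally aligned to its own size, so an order-1 allocation (the value `PGD_ALLOCATION_ORDER` takes under `CONFIG_PAGE_TABLE_ISOLATION` in the pgalloc.h snippet quoted elsewhere in this thread) is 8 KiB long and 8 KiB aligned on a 4 KiB page system. The sketch below is plain user-space arithmetic, not kernel code. All three function names are made up for this example, and `companion_table()` in particular encodes an assumption about how page table isolation locates the second half of the PGD pair (by setting the `PAGE_SIZE` bit of the address); the thread itself does not spell that mechanism out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Number of bytes covered by a page allocation of the given order. */
unsigned long alloc_bytes(unsigned int order)
{
    return PAGE_SIZE << order;
}

/* True if addr has the natural alignment of an order-N allocation. */
bool is_aligned_for_order(uintptr_t addr, unsigned int order)
{
    return (addr & (alloc_bytes(order) - 1)) == 0;
}

/*
 * ASSUMPTION for illustration only: the second table of the pair is found
 * by setting the PAGE_SIZE bit in the address of the first. Hypothetical
 * helper, not a kernel function.
 */
uintptr_t companion_table(uintptr_t pgd)
{
    return pgd | PAGE_SIZE;
}
```

Under that assumption, a table at the 8 KiB aligned address 0x2000 gets its companion at 0x3000, inside the same order-1 allocation. A table at 0x3000, which is only 4 KiB aligned, already has that bit set, so `companion_table(0x3000)` returns 0x3000 again and no distinct second table is reached. That may be one reason a single-page `__get_free_page()` allocation is a problem here, but treat it as a reading of the thread rather than a statement of the actual failure mode.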
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On 1/11/2018 3:46 PM, Linus Torvalds wrote:
> On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner wrote:
>>
>> 67a9108ed431 ("x86/efi: Build our own page table structures")
>>
>> got rid of EFI depending on real_mode_header->trampoline_pgd
>
> So I think it only got rid of by default - the codepath is still
> there, the allocation is still there, it's just that it's not actually
> used unless somebody does that "efi=old_mmap" thing.
>
> Looking around, there's at least one quirk for the SGI UV1 system that
> enables EFI_OLD_MMAP automatically. There might be others that I
> missed, but I think that's it.
>
> So it *can* trigger without "efi=old_mmap", but not on any normal machines.
>
> And as Pavel points out, even when the bug is active, it's pretty hard
> to actually trigger.
>
> But yeah, there may be other EFI patches that I didn't notice that
> changed things in other ways too.
>
>          Linus

The bug is not present in the latest upstream kernel because the efi_pgd is
correctly aligned:

  arch/x86/platform/efi/efi_64.c
    int __init efi_alloc_page_tables(void)
      efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER);

  arch/x86/include/asm/pgalloc.h
    +#ifdef CONFIG_PAGE_TABLE_ISOLATION
    +#define PGD_ALLOCATION_ORDER 1
    +#else
    +#define PGD_ALLOCATION_ORDER 0
    +#endif

Pavel's patch fixes kernels prior to 67a9108ed431 ("x86/efi: Build our own
page table structures") where the efi pgd allocation looks like:

  arch/x86/realmode/init.c
    void __init reserve_real_mode(void)
      mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
      base = __va(mem);
      real_mode_header = (struct real_mode_header *) base;

    void __init setup_real_mode(void)
      trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);

Kernel versions between 67a9108ed431 and the latest also have the bug and
need a similar fix:

  arch/x86/platform/efi/efi_64.c
    int __init efi_alloc_page_tables(void)
      efi_pgd = (pgd_t *)__get_free_page(gfp_mask);

    int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
      pgd = efi_pgd;
      efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);

All of the code paths above are taken when *not* EFI_OLD_MMAP.

- Steve
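
[Editorial illustration, not part of Steve's mail.] The alignment argument above can be checked with a small user-space sketch. It assumes that a page-allocator block of order N is aligned to PAGE_SIZE << N, and that the entry code clears bit 12 (0x1000) of the page table address. The names DEMO_PAGE_SIZE, SHADOW_PGD_BIT, order_alignment() and all_survive_mask() are hypothetical helpers made up for this example; they are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE UINT64_C(0x1000)
#define SHADOW_PGD_BIT UINT64_C(0x1000)  /* bit 12, cleared by the entry code */

/* Alignment of a page-allocator block of the given order (assumed natural). */
static uint64_t order_alignment(unsigned int order)
{
	return DEMO_PAGE_SIZE << order;
}

/*
 * Return 1 if every address below 'limit' with the given alignment is left
 * unchanged when bit 12 is cleared, 0 otherwise.
 */
static int all_survive_mask(uint64_t alignment, uint64_t limit)
{
	assert(alignment != 0);	/* avoid an endless loop on bad input */
	for (uint64_t addr = 0; addr < limit; addr += alignment)
		if ((addr & ~SHADOW_PGD_BIT) != addr)
			return 0;
	return 1;
}
```

Under these assumptions, all_survive_mask(order_alignment(0), 1 << 20) is 0 because every other 4 KiB page has bit 12 set, while the order-1 case is 1. That matches the point that PGD_ALLOCATION_ORDER 1 is what makes the upstream efi_pgd safe.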
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Thu, 11 Jan 2018, Linus Torvalds wrote:

> On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner wrote:
> >
> > 67a9108ed431 ("x86/efi: Build our own page table structures")
> >
> > got rid of EFI depending on real_mode_header->trampoline_pgd
>
> So I think it only got rid of by default - the codepath is still
> there, the allocation is still there, it's just that it's not actually
> used unless somebody does that "efi=old_mmap" thing.

Yes, the trampoline_pgd is still around, but I can't figure out how it
would be used after boot. Confused, digging more.

Thanks,

	tglx
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Thu, Jan 11, 2018 at 12:37 PM, Thomas Gleixner wrote:
>
> 67a9108ed431 ("x86/efi: Build our own page table structures")
>
> got rid of EFI depending on real_mode_header->trampoline_pgd

So I think it only got rid of by default - the codepath is still
there, the allocation is still there, it's just that it's not actually
used unless somebody does that "efi=old_mmap" thing.

Looking around, there's at least one quirk for the SGI UV1 system that
enables EFI_OLD_MMAP automatically. There might be others that I
missed, but I think that's it.

So it *can* trigger without "efi=old_mmap", but not on any normal machines.

And as Pavel points out, even when the bug is active, it's pretty hard
to actually trigger.

But yeah, there may be other EFI patches that I didn't notice that
changed things in other ways too.

         Linus
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Thu, 11 Jan 2018, Linus Torvalds wrote:

> [ Patch to make sure the EFI trampoline_pgd is properly aligned and
> has the double pgd that KPTI requires ]
>
> On Thu, Jan 11, 2018 at 10:40 AM, Pavel Tatashin wrote:
> > If it is better to resubmit this patch via git send-email, please let me
> > know.
>
> It would be better, because that way the patch can be more easily
> quoted and discussed.
>
> That said, I do not see why this isn't an issue upstream too.
>
> As far as I can tell, it's not just 4.4.110. Our current entry code
> does that ADJUST_KERNEL_CR3 dance too, which clears the
> PTI_SWITCH_MASK bit from cr3.
>
> And that realmode trampoline pgd seems all to be just aligned to PAGE_SIZE.

Right, but see below.

> Now, in the modern world, we generate new page tables for EFI, but we
> still have that EFI_OLD_MEMMAP code that disables that. And afaik,
> EFI_OLD_MEMMAP has the exact same problem that your patch fixes in 4.4
> (where it's always on).
>
> So I think this patch should go into the development kernel too.
>
> Or maybe it already is, and I just haven't gotten it yet.

It's not. There is an efi oldmap fix pending, but that's a different story.

> Or - even more likely - I'm missing something entirely, and even
> EFI_OLD_MEMMAP solved this some other way upstream.

  67a9108ed431 ("x86/efi: Build our own page table structures")

got rid of EFI depending on real_mode_header->trampoline_pgd

So I don't see how upstream needs the fix as the trampoline_pgd seems only
to be used when coming out of the boot loader.

Adding Matt. He stepped back from EFI, but he might still know.

> Adding Thomas Gleixner explicitly to the participants so that he can
> tell me I'm a moron and point me to the right thing.

Your wish is my command, but I need to stare some more before doing so.

Thanks,

	tglx
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On 01/11/2018 03:10 PM, Greg Kroah-Hartman wrote:
> On Thu, Jan 11, 2018 at 01:36:50PM -0500, Pavel Tatashin wrote:
>> I have root caused the memory corruption panics/hangs that I've been
>> experiencing during boot with the latest 4.4.110 kernel. The problem
>> as was suspected by Andy Lutomirski is with interaction between PTI
>> and EFI. It may affect any system that has EFI bios. I have not
>> verified if it can affect any other kernel beside 4.4.110
>>
>> Attached is the fix for this issue with explanations that Steve
>> Sistare and I developed.
>
> Nice, but why does this not show up in 4.9 and 4.14 and Linus's tree as
> well on this hardware? Nor on the SLES12 SP3 kernel?
>
> What is different there that 4.4 requires? That worries me more than
> your fix (which looks good to me, fwiw.)

Hi Greg,

I have not studied other kernel versions; EFI has changed substantially
since 4.4. But even on 4.4.110, several things have to happen for this
bug to show up:

1. During boot, memblock must allocate an address that is not
   2 * PAGE_SIZE aligned.
2. An NMI must arrive exactly while EFI has replaced the page table.

While I was debugging this problem, I tried to enable kasan and vm_debug,
add more printfs, etc., but every little change would cause the problem to
disappear or to appear less frequently.

Thank you,
Pavel
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Thu, Jan 11, 2018 at 12:10 PM, Greg Kroah-Hartman wrote:
>
> Nice, but why does this not show up in 4.9 and 4.14 and Linus's tree as
> well on this hardware? Nor on the SLES12 SP3 kernel?
>
> What is different there that 4.4 requires? That worries me more than
> your fix (which looks good to me, fwiw.)

I really think it's simply that since v4.6, we've had commit
67a9108ed431 ("x86/efi: Build our own page table structures"), so no
normal EFI use actually uses the old legacy mapping unless you passed
in "efi=old_map" on the kernel command line.

So the bug is there in all versions, it's just that it's normally only
noticeable in 4.4.

But I might be missing some other difference, so take that with a
pinch of salt.

         Linus
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Thu, Jan 11, 2018 at 01:36:50PM -0500, Pavel Tatashin wrote:
> I have root caused the memory corruption panics/hangs that I've been
> experiencing during boot with the latest 4.4.110 kernel. The problem
> as was suspected by Andy Lutomirski is with interaction between PTI
> and EFI. It may affect any system that has EFI bios. I have not
> verified if it can affect any other kernel beside 4.4.110
>
> Attached is the fix for this issue with explanations that Steve
> Sistare and I developed.

Nice, but why does this not show up in 4.9 and 4.14 and Linus's tree as
well on this hardware? Nor on the SLES12 SP3 kernel?

What is different there that 4.4 requires? That worries me more than
your fix (which looks good to me, fwiw.)

thanks,

greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
[ Patch to make sure the EFI trampoline_pgd is properly aligned and
has the double pgd that KPTI requires ]

On Thu, Jan 11, 2018 at 10:40 AM, Pavel Tatashin wrote:
> If it is better to resubmit this patch via git send-email, please let me know.

It would be better, because that way the patch can be more easily
quoted and discussed.

That said, I do not see why this isn't an issue upstream too.

As far as I can tell, it's not just 4.4.110. Our current entry code
does that ADJUST_KERNEL_CR3 dance too, which clears the
PTI_SWITCH_MASK bit from cr3.

And that realmode trampoline pgd seems all to be just aligned to PAGE_SIZE.

Now, in the modern world, we generate new page tables for EFI, but we
still have that EFI_OLD_MEMMAP code that disables that. And afaik,
EFI_OLD_MEMMAP has the exact same problem that your patch fixes in 4.4
(where it's always on).

So I think this patch should go into the development kernel too.

Or maybe it already is, and I just haven't gotten it yet.

Or - even more likely - I'm missing something entirely, and even
EFI_OLD_MEMMAP solved this some other way upstream.

Adding Thomas Gleixner explicitly to the participants so that he can
tell me I'm a moron and point me to the right thing.

         Linus
Re: [PATCH 4.4 00/37] 4.4.110-stable review
If it is better to resubmit this patch via git send-email, please let me know.

Thank you,
Pavel

On Thu, Jan 11, 2018 at 1:36 PM, Pavel Tatashin wrote:
> I have root caused the memory corruption panics/hangs that I've been
> experiencing during boot with the latest 4.4.110 kernel. The problem
> as was suspected by Andy Lutomirski is with interaction between PTI
> and EFI. It may affect any system that has EFI bios. I have not
> verified if it can affect any other kernel beside 4.4.110
>
> Attached is the fix for this issue with explanations that Steve
> Sistare and I developed.
Re: [PATCH 4.4 00/37] 4.4.110-stable review
I have root caused the memory corruption panics/hangs that I've been
experiencing during boot with the latest 4.4.110 kernel. The problem
as was suspected by Andy Lutomirski is with interaction between PTI
and EFI. It may affect any system that has EFI bios. I have not
verified if it can affect any other kernel beside 4.4.110

Attached is the fix for this issue with explanations that Steve
Sistare and I developed.

From 1189f3568a90ddd40e1418b9687def5d89153ee3 Mon Sep 17 00:00:00 2001
From: Pavel Tatashin
Date: Thu, 11 Jan 2018 06:50:25 -0800
Subject: [PATCH] x86/pti/efi: broken conversion from efi to kernel page table

In entry_64.S we have code like this:

	/* Unconditionally use kernel CR3 for do_nmi() */
	/* %rax is saved above, so OK to clobber here */
	ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
	/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
	ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
	pushq	%rax
	/* mask off "user" bit of pgd address and 12 PCID bits: */
	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
	movq	%rax, %cr3
2:
	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
	call	do_nmi

With this instruction:

	andq	$(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax

we unconditionally switch from whatever our CR3 was to the kernel page
table. But in arch/x86/platform/efi/efi_64.c we temporarily set a
different page table, one that does not have the kernel page table at a
0x1000 offset from it. Look in efi_thunk() and
efi_thunk_set_virtual_address_map().

So, while CR3 points to the other page table, we get an NMI and clear
0x1000 from CR3, resulting in a bogus CR3 if the 0x1000 bit was set.

The efi page table comes from realmode/rm/trampoline_64.S:

arch/x86/realmode/rm/trampoline_64.S

	.bss
	.balign	PAGE_SIZE
GLOBAL(trampoline_pgd)		.space	PAGE_SIZE

Notice: the alignment is PAGE_SIZE, so after applying
KAISER_SHADOW_PGD_OFFSET, which is equal to PAGE_SIZE, we can get a
different page table.

But even if we fix the alignment here, the trampoline binary is later
copied into dynamically allocated memory in reserve_real_mode(), so we
need to fix that place as well.

Fixes: 8a43ddfb93a0 ("KAISER: Kernel Address Isolation")

Signed-off-by: Pavel Tatashin
Reviewed-by: Steven Sistare
---
 arch/x86/include/asm/kaiser.h        | 8 ++++++++
 arch/x86/realmode/init.c             | 4 +++-
 arch/x86/realmode/rm/trampoline_64.S | 3 ++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kaiser.h b/arch/x86/include/asm/kaiser.h
index 802bbbdfe143..e087bd7a8d29 100644
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -19,6 +19,12 @@

 #define KAISER_SHADOW_PGD_OFFSET 0x1000

+/*
+ * A page table address must have this alignment to stay the same when
+ * KAISER_SHADOW_PGD_OFFSET mask is applied
+ */
+#define KAISER_KERNEL_PGD_ALIGNMENT (KAISER_SHADOW_PGD_OFFSET << 1)
+
 #ifdef __ASSEMBLY__
 #ifdef CONFIG_PAGE_TABLE_ISOLATION

@@ -71,6 +77,8 @@ movq PER_CPU_VAR(unsafe_stack_register_backup), %rax

 #else /* CONFIG_PAGE_TABLE_ISOLATION */

+#define KAISER_KERNEL_PGD_ALIGNMENT PAGE_SIZE
+
 .macro SWITCH_KERNEL_CR3
 .endm
 .macro SWITCH_USER_CR3
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 0b7a63d98440..cfecb7d6c6a8 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -1,5 +1,6 @@
 #include
 #include
+#include
 #include
 #include

@@ -15,7 +16,8 @@ void __init reserve_real_mode(void)
 	size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);

 	/* Has to be under 1M so we can execute real-mode AP code. */
-	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
+	mem = memblock_find_in_range(0, 1 << 20, size,
+				     KAISER_KERNEL_PGD_ALIGNMENT);
 	if (!mem)
 		panic("Cannot allocate trampoline\n");

diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index dac7b20d2f9d..781cca63f795 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include "realmode.h"

 	.text
@@ -139,7 +140,7 @@ tr_gdt:
 tr_gdt_end:

 	.bss
-	.balign	PAGE_SIZE
+	.balign	KAISER_KERNEL_PGD_ALIGNMENT
 GLOBAL(trampoline_pgd)		.space	PAGE_SIZE

 	.balign	8
--
1.8.3.1
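
[Editorial illustration, not part of the patch.] The effect of the andq in the NMI entry path can be modelled in user space. This is only a sketch under stated assumptions: the two mask values below are assumed for illustration (12 PCID bits and bit 12 as the shadow-pgd selector, as the commit message describes), the physical addresses are made up, the NOFLUSH bit 63 set by the bts is ignored, and nmi_kernel_cr3() and nmi_cr3_examples() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; check the real headers for your tree. */
#define X86_CR3_PCID_ASID_MASK   UINT64_C(0xfff)   /* low 12 bits carry the PCID */
#define KAISER_SHADOW_PGD_OFFSET UINT64_C(0x1000)  /* bit 12 selects the shadow pgd */

/* Mirrors the andq in the NMI entry path: drop the PCID bits and bit 12. */
static uint64_t nmi_kernel_cr3(uint64_t cr3)
{
	return cr3 & ~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET);
}

/* Worked examples with made-up physical addresses. */
void nmi_cr3_examples(void)
{
	/* Kernel pgd on an 8 KiB boundary, shadow pgd one page above it. */
	uint64_t kernel_pgd = UINT64_C(0x1234000);
	uint64_t shadow_pgd = kernel_pgd + KAISER_SHADOW_PGD_OFFSET;

	/* Entered from user mode (shadow pgd, PCID 0x80): lands on the kernel pgd. */
	assert(nmi_kernel_cr3(shadow_pgd | 0x80) == kernel_pgd);
	/* Entered from kernel mode: CR3 is unchanged. */
	assert(nmi_kernel_cr3(kernel_pgd) == kernel_pgd);

	/* A trampoline pgd that is only 4 KiB aligned and has bit 12 set. */
	uint64_t trampoline_pgd = UINT64_C(0x99000);
	/* The mask yields the page below it, which is not a page table. */
	assert(nmi_kernel_cr3(trampoline_pgd) == UINT64_C(0x98000));
}
```

The last case is the failure the commit message describes: the masked value no longer points at trampoline_pgd. Requesting KAISER_KERNEL_PGD_ALIGNMENT (2 * PAGE_SIZE) for the table rules that case out, because an address with that alignment never has bit 12 set.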
Re: [PATCH 4.4 00/37] 4.4.110-stable review
Quoting Greg Kroah-Hartman (gre...@linuxfoundation.org):
> On Tue, Jan 09, 2018 at 01:49:48PM -0600, Serge E. Hallyn wrote:
> > Quoting Greg Kroah-Hartman (gre...@linuxfoundation.org):
> > > On Sat, Jan 06, 2018 at 02:20:16AM +0900, Alice Ferrazzi wrote:
> > > > On Thu, Jan 4, 2018 at 5:11 AM, Greg Kroah-Hartman wrote:
> > > > > This is the start of the stable review cycle for the 4.4.110 release.
> > > > > There are 37 patches in this series, all will be posted as a response
> > > > > to this one. If anyone has any issues with these being applied, please
> > > > > let me know.
> > > > >
> > > > > Responses should be made by Fri Jan 5 19:50:38 UTC 2018.
> > > > > Anything received after that time might be too late.
> > > > >
> > > > > The whole patch series can be found in one patch at:
> > > > >   kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz
> > > > > or in the git tree and branch at:
> > > > >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> > > > > and the diffstat can be found below.
> > > > >
> > > >
> > > > This patchset merges correctly with Gentoo patches and GCC version 6.4.0
> > > > The kernel boot up correctly.
> > > > Logs: http://kernel1.amd64.dev.gentoo.org:8010/#/builders/5/builds/44
> > >
> > > Great, but Gentoo really should be moving to 4.9 and 4.14 here, I
> > > hope no one running Gentoo is relying on 4.4 :)
> >
> > Wait what?
> >
> > According to https://www.kernel.org/category/releases.html
> > 4.4 should be the best bet for longest support, right? Does
> > that page need to be updated? If 4.4 is not going to be
> > supported, is there anything else with a possible 5-6 years
> > of support?
>
> 4.4 is going to be supported, yes, but really, for a desktop/server
> system, why would you ever want to stick with it for anything longer
> than a year? No new hardware support is added, and no new features that
> you would want are in there.
>
> The LTS kernels are for the crazy embedded people that don't change
> their hardware systems, and have the insane huge number of out-of-tree
> patches. No one else should be using those kernels, they should always
> be using newer ones, as there are always more issues fixed in newer
> kernels than older ones.
>
> So again, I hope no one running Gentoo, which is a rolling, constantly
> updated distro, is using the old and crusty 4.4 kernel release. To do
> so is to defeat the purpose of relying on Gentoo in the first place...

Ah, I see, yeah that makes sense :)

thanks,
-serge
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Tue, Jan 09, 2018 at 01:49:48PM -0600, Serge E. Hallyn wrote: > Quoting Greg Kroah-Hartman (gre...@linuxfoundation.org): > > On Sat, Jan 06, 2018 at 02:20:16AM +0900, Alice Ferrazzi wrote: > > > On Thu, Jan 4, 2018 at 5:11 AM, Greg Kroah-Hartman > > > wrote: > > > > This is the start of the stable review cycle for the 4.4.110 release. > > > > There are 37 patches in this series, all will be posted as a response > > > > to this one. If anyone has any issues with these being applied, please > > > > let me know. > > > > > > > > Responses should be made by Fri Jan 5 19:50:38 UTC 2018. > > > > Anything received after that time might be too late. > > > > > > > > The whole patch series can be found in one patch at: > > > > > > > > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz > > > > or in the git tree and branch at: > > > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > > > > linux-4.4.y > > > > and the diffstat can be found below. > > > > > > > > > > This patchset merges correctly with Gentoo patches and GCC version 6.4.0 > > > The kernel boot up correctly. > > > Logs: http://kernel1.amd64.dev.gentoo.org:8010/#/builders/5/builds/44 > > > > Great, but Gentoo really should be moving to 4.9 and 4.14 here, I > > hope no one running Gentoo is relying on 4.4 :) > > Wait what? > > According to https://www.kernel.org/category/releases.html > 4.4 should be the best bet for longest support, right? Does > that page need to be updated? If 4.4 is not going to be > supported, is there anything else with a possible 5-6 years > of support? 4.4 is going to be supported, yes, but really, for a desktop/server system, why would you ever want to stick with it for anything longer than a year? No new hardware support is added, and no new features that you would want are in there. The LTS kernels are for the crazy embedded people that don't change their hardware systems, and have the insane huge number of out-of-tree patches. 
No one else should be using those kernels, they should always be using newer ones, as there are always more issues fixed in newer kernels than older ones. So again, I hope no one running Gentoo, which is a rolling, constantly updated distro, is using the old and crusty 4.4 kernel release. To do so is to defeat the purpose of relying on Gentoo in the first place... thanks, greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
Quoting Greg Kroah-Hartman (gre...@linuxfoundation.org): > On Sat, Jan 06, 2018 at 02:20:16AM +0900, Alice Ferrazzi wrote: > > On Thu, Jan 4, 2018 at 5:11 AM, Greg Kroah-Hartman > > wrote: > > > This is the start of the stable review cycle for the 4.4.110 release. > > > There are 37 patches in this series, all will be posted as a response > > > to this one. If anyone has any issues with these being applied, please > > > let me know. > > > > > > Responses should be made by Fri Jan 5 19:50:38 UTC 2018. > > > Anything received after that time might be too late. > > > > > > The whole patch series can be found in one patch at: > > > > > > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz > > > or in the git tree and branch at: > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > > > linux-4.4.y > > > and the diffstat can be found below. > > > > > > > This patchset merges correctly with Gentoo patches and GCC version 6.4.0 > > The kernel boot up correctly. > > Logs: http://kernel1.amd64.dev.gentoo.org:8010/#/builders/5/builds/44 > > Great, but Gentoo really should be moving to 4.9 and 4.14 here, I > hope no one running Gentoo is relying on 4.4 :) Wait what? According to https://www.kernel.org/category/releases.html 4.4 should be the best bet for longest support, right? Does that page need to be updated? If 4.4 is not going to be supported, is there anything else with a possible 5-6 years of support?
Re: [PATCH 4.4 00/37] 4.4.110-stable review
Here is one more: [6.284763] EFI Variables Facility v0.08 2004-May-17 [6.555990] [ cut here ] [6.561145] kernel BUG at /scratch/ptatashi/linux-stable/mm/slub.c:3627! [6.568625] invalid opcode: [#1] SMP [6.573219] Modules linked in: [6.576639] CPU: 1 PID: 364 Comm: kworker/1:1 Not tainted 4.4.110_pt_stable #3 [6.584692] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016 [6.595766] Workqueue: events clocksource_watchdog_work [6.601611] task: 881fecd82b00 ti: 881fecda4000 task.ti: 881fecda4000 [6.609963] RIP: 0010:[] [] kfree+0x14a/0x150 [6.618419] RSP: :881fecda7d40 EFLAGS: 00010246 [6.624348] RAX: 8106c280 RBX: 883ff114bfc0 RCX: ffd8 [6.632314] RDX: 77ff8000 RSI: 0246 RDI: 883ff114bfc0 [6.640280] RBP: 881fecda7d58 R08: R09: 881fff917300 [6.648244] R10: R11: ea00ffc452c0 R12: 883fec2f4080 [6.656208] R13: 810a5bee R14: R15: [6.664175] FS: () GS:881fff84() knlGS: [6.673208] CS: 0010 DS: ES: CR0: 80050033 [6.679623] CR2: CR3: 01aa2000 CR4: 00360670 [6.687587] DR0: DR1: DR2: [6.695553] DR3: DR6: fffe0ff0 DR7: 0400 [6.703516] Stack: [6.705759] 883ff114bfc0 883fec2f4080 819a26e8 881fecda7e00 [6.714061] 810a5bee 881f0020 881fecda7e10 881fecda7da8 [6.722363] 881f 881fecda7d90 881fecda7d90 [6.730666] Call Trace: [6.733400] [] kthread_create_on_node+0x14e/0x1a0 [6.740495] [] clocksource_watchdog_work+0x25/0x40 [6.747679] [] process_one_work+0x14f/0x400 [6.754181] [] worker_thread+0x114/0x480 [6.760402] [] ? rescuer_thread+0x310/0x310 [6.766913] [] kthread+0xe5/0x100 [6.772456] [] ? kthread_park+0x60/0x60 [6.778580] [] ret_from_fork+0x3f/0x70 [6.784608] [] ? 
kthread_park+0x60/0x60 [6.790721] Code: 8b 03 31 f6 f6 c4 40 74 04 41 8b 73 6c 4c 89 df e8 1c a8 fa ff e9 73 ff ff ff 4c 8d 58 ff e9 20 ff ff ff 49 8b 43 20 a8 01 75 d4 <0f> 0b 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 [6.812429] RIP [] kfree+0x14a/0x150 [6.818273] RSP [6.822177] ---[ end trace 4ce44d21c6d68eed ]--- On Mon, Jan 8, 2018 at 3:38 PM, Pavel Tatashin wrote: > Hi Greg, > > > > On Mon, Jan 8, 2018 at 2:46 AM, Greg Kroah-Hartman > wrote: >> On Sun, Jan 07, 2018 at 10:06:59AM -0500, Pavel Tatashin wrote: >>> Hi Greg, >>> >>> I reverted suse12 back to: >>> 13dae54cb229d078635f159dd8afe16ae683980b >>> x86/kaiser: Move feature detection up (bsc#1068032). >>> >>> And, still do not see the problem. So, whatever fixes the issue comes >>> before kaiser. >> >> Ok, thanks for the hint. >> >> As I can't duplicate this here at all, any specifics as to what >> hardware/procesor type this is? >> > > BIOS: > Version 2.17.1249. Copyright (C) 2016 American Megatrends, Inc. > BIOS Date: 08/30/2016 10:35:36 Ver: 38050100 > > ca-ostest442:linux-stable$ lscpu > Architecture: x86_64 > CPU op-mode(s):32-bit, 64-bit > Byte Order:Little Endian > CPU(s):40 > On-line CPU(s) list: 0-39 > Thread(s) per core:2 > Core(s) per socket:10 > Socket(s): 2 > NUMA node(s): 2 > Vendor ID: GenuineIntel > CPU family:6 > Model: 79 > Model name:Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz > Stepping: 1 > CPU MHz: 1738.601 > BogoMIPS: 4396.18 > Virtualization:VT-x > L1d cache: 32K > L1i cache: 32K > L2 cache: 256K > L3 cache: 25600K > NUMA node0 CPU(s): 0-9,20-29 > NUMA node1 CPU(s): 10-19,30-39 > > Note, if I boot with nr_cpus=1, hang never happens, with nr_cpus=4 > happens but seldomly, and with all 40 CPUs happens on almost every > reboot. > > As Hugh Dickins suggested, I am going to show panic outputs, as I get > them. 
Here is one more panic (note output is not complete because > machine reboots): > > [6.276456] EFI Variables Facility v0.08 2004-May-17 > [6.384665] BUG: unable to handle kernel paging request at > 901fff5a6000 > [6.392461] IP: [] vmalloc_fault+0x1f8/0x340 > [6.398987] PGD 0 > [6.401242] Oops: [#1] SMP > [6.404866] Modules linked in: > [6.408287] CPU: 10 PID: 0 Comm: swapper/10 Not tainted > 4.4.110_pt_stable #2 > [6.416156] Hardware name: Oracle Corporation ORACLE SERVER > X6-2/ASM,MOTHERBOARD,1U, BIOS 3 > 8050100 08/30/2016 > [6.427226] task: 883ff1e28000 ti: 883ff1e24000 task.ti: > 883ff1e24000 > [6.435580]
Re: [PATCH 4.4 00/37] 4.4.110-stable review
Hi Greg, On Mon, Jan 8, 2018 at 2:46 AM, Greg Kroah-Hartman wrote: > On Sun, Jan 07, 2018 at 10:06:59AM -0500, Pavel Tatashin wrote: >> Hi Greg, >> >> I reverted suse12 back to: >> 13dae54cb229d078635f159dd8afe16ae683980b >> x86/kaiser: Move feature detection up (bsc#1068032). >> >> And, still do not see the problem. So, whatever fixes the issue comes >> before kaiser. > > Ok, thanks for the hint. > > As I can't duplicate this here at all, any specifics as to what > hardware/procesor type this is? > BIOS: Version 2.17.1249. Copyright (C) 2016 American Megatrends, Inc. BIOS Date: 08/30/2016 10:35:36 Ver: 38050100 ca-ostest442:linux-stable$ lscpu Architecture: x86_64 CPU op-mode(s):32-bit, 64-bit Byte Order:Little Endian CPU(s):40 On-line CPU(s) list: 0-39 Thread(s) per core:2 Core(s) per socket:10 Socket(s): 2 NUMA node(s): 2 Vendor ID: GenuineIntel CPU family:6 Model: 79 Model name:Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz Stepping: 1 CPU MHz: 1738.601 BogoMIPS: 4396.18 Virtualization:VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 25600K NUMA node0 CPU(s): 0-9,20-29 NUMA node1 CPU(s): 10-19,30-39 Note, if I boot with nr_cpus=1, hang never happens, with nr_cpus=4 happens but seldomly, and with all 40 CPUs happens on almost every reboot. As Hugh Dickins suggested, I am going to show panic outputs, as I get them. 
Here is one more panic (note output is not complete because machine reboots): [6.276456] EFI Variables Facility v0.08 2004-May-17 [6.384665] BUG: unable to handle kernel paging request at 901fff5a6000 [6.392461] IP: [] vmalloc_fault+0x1f8/0x340 [6.398987] PGD 0 [6.401242] Oops: [#1] SMP [6.404866] Modules linked in: [6.408287] CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.4.110_pt_stable #2 [6.416156] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 3 8050100 08/30/2016 [6.427226] task: 883ff1e28000 ti: 883ff1e24000 task.ti: 883ff1e24000 [6.435580] RIP: 0010:[] [] vmalloc_fault+0x1f8/0x340 [6.444819] RSP: :883ff1e27cc0 EFLAGS: 00010086 [6.450749] RAX: 881fff5a6058 RBX: 3000 RCX: 081fff5a6000 [6.458714] RDX: 8800 RSI: 901fff5a6000 RDI: [6.466681] RBP: 883ff1e27cf0 R08: 0018 R09: 0002d2de [6.474647] R10: 00032ef3 R11: 2e04 R12: c9f0 [6.482615] R13: 8800 R14: 901fff5a6000 R15: 881fff5a6000 [6.490574] FS: () GS:88407e60() knlGS: [6.499607] CS: 0010 DS: ES: CR0: 80050033 [6.506022] CR2: 901fff5a6000 CR3: 01aa2000 CR4: 00360670 [6.513989] DR0: DR1: DR2: [6.521956] DR3: DR6: fffe0ff0 DR7: 0400 [6.529923] Stack: [6.532169] 881fff5a6000[6.532405] [ cut here ] [6.532414] WARNING: CPU: 22 PID: 162
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On 05/01/18 00:06, Kevin Hilman wrote: kernelci.org bot writes: stable-rc/linux-4.4.y boot: 100 boots: 4 failed, 93 passed with 1 offline, 2 conflicts (v4.4.109-38-g99abd6cdd65e) Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.4.y/kernel/v4.4.109-38-g99abd6cdd65e/ Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.109-38-g99abd6cdd65e/ Tree: stable-rc Branch: linux-4.4.y Git Describe: v4.4.109-38-g99abd6cdd65e Git Commit: 99abd6cdd65e984d89c8565508a7a96ea0fce179 Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Tested: 53 unique boards, 19 SoC families, 16 builds out of 178 TL;DR; All is well. Boot Regressions Detected: arm: exynos_defconfig: exynos5422-odroidxu3: lab-collabora: failing since 58 days (last pass: v4.4.95-21-g32458fcb7bd6 - first fail: v4.4.96-41-g336421367b9c) Long standing issue in lab-collabora (passing in other labs) Guillaume? This should be fixed now, with a tweak to the device config to enable relocating the ramdisk and dtb: https://review.linaro.org/#/c/23238/ multi_v7_defconfig: armada-xp-linksys-mamba: lab-free-electrons: new failure (last pass: v4.4.109-36-g8b381424010c) Not a kernel issue, bootROM fails to start bootloader. I pinged lab owners (Free Electrons) tegra124-nyan-big: lab-collabora: failing since 1 day (last pass: v4.4.109 - first fail: v4.4.109-36-g8b381424010c) tegra_defconfig: tegra124-nyan-big: lab-collabora: failing since 1 day (last pass: v4.4.108-65-g57856049c0f8 - first fail: v4.4.109) This one is booting fine, but the command to power-off the board is timing out, resulting in a failure report. Indeed, this was due to a crash of the lavapdu daemon - it's back on track now. (On a side note, the tegra124-nyan-big is still failing to boot in mainline due to a genuine kernel driver issue.) Guillaume
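As a quick sanity check on the boot summary line quoted in the report above, the per-status counts can be tallied against the reported total. This is only a minimal sketch: it assumes the summary keeps the wording shown in this report, and the regular expressions and the function name `tally_boot_summary` are illustrative, not part of any kernelci.org tooling.

```python
import re

# Summary line copied from the kernelci.org report quoted in this message.
SUMMARY = ("stable-rc/linux-4.4.y boot: 100 boots: 4 failed, 93 passed "
           "with 1 offline, 2 conflicts (v4.4.109-38-g99abd6cdd65e)")


def tally_boot_summary(line):
    """Return (total, counts) parsed from a boot summary line.

    Hypothetical helper: the patterns assume the exact phrasing used in
    the report above ("N boots:", "N failed", "N passed", ...).
    """
    total = int(re.search(r"(\d+) boots:", line).group(1))
    counts = {
        status: int(number)
        for number, status in re.findall(
            r"(\d+) (failed|passed|offline|conflicts?)\b", line)
    }
    return total, counts


total, counts = tally_boot_summary(SUMMARY)
# Report the parsed counts and whether they add up to the stated total.
print(total, counts, sum(counts.values()) == total)
```

For this report the four status counts account for every boot in the stated total, so no boards are missing from the summary.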
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Sun, Jan 07, 2018 at 10:06:59AM -0500, Pavel Tatashin wrote: > Hi Greg, > > I reverted suse12 back to: > 13dae54cb229d078635f159dd8afe16ae683980b > x86/kaiser: Move feature detection up (bsc#1068032). > > And, still do not see the problem. So, whatever fixes the issue comes > before kaiser. Ok, thanks for the hint. As I can't duplicate this here at all, any specifics as to what hardware/processor type this is? I can punt and say just "use 4.9 on this hardware if you have it", right? :) I'll try to dig through the sles kernel some more, but given it is 2 patches, and I can't actually test the problem myself, it's not exactly easy going... greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
Hi Greg, I reverted suse12 back to: 13dae54cb229d078635f159dd8afe16ae683980b x86/kaiser: Move feature detection up (bsc#1068032). And, still do not see the problem. So, whatever fixes the issue comes before kaiser. Pavel On Sun, Jan 7, 2018 at 9:17 AM, Pavel Tatashin wrote: > Hi Greg, > > I cloned and built suse12, and it does not have issues with EFI + PTI > (kaiser) on my machine. > > BTW, i have also reproduced this problem on another machine with the > same configuration, therefore, it is not specific only to one box. > Also, as I mentioned earlier I am seeing the same issue with 4.1 + > kaiser patches taken from 4.4.110. > > Thank you, > Pavel > > On Sun, Jan 7, 2018 at 5:45 AM, Greg Kroah-Hartman > wrote: >> On Fri, Jan 05, 2018 at 04:03:54PM -0500, Pavel Tatashin wrote: >>> The hardware works :) I meant that before the patch linked in >>> https://lkml.org/lkml/2018/1/5/534, I was never able to boot 4.4.110. But >>> with that patch applied, I was able to boot it at least once, but it could >>> be accidental. The hang/panic does not happen at the same time on every >>> boot. >> >> Any chance you can grab the latest SLES 12 kernel and run it with pti >> and efi enabled to see if that works properly for you or not? I trust >> SUSE's testing of their kernel, and odds are I'm just missing one of >> their many other patches they have in their tree for other issues that >> they have seen in the past. >> >> If you want, I can just send you the full patch that they run on top of >> the latest 4.4 stable tree, so you don't have to dig it out of their git >> repo if you can't find the binary image. >> >> thanks, >> >> greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
Hi Greg, I cloned and built suse12, and it does not have issues with EFI + PTI (kaiser) on my machine. BTW, i have also reproduced this problem on another machine with the same configuration, therefore, it is not specific only to one box. Also, as I mentioned earlier I am seeing the same issue with 4.1 + kaiser patches taken from 4.4.110. Thank you, Pavel On Sun, Jan 7, 2018 at 5:45 AM, Greg Kroah-Hartman wrote: > On Fri, Jan 05, 2018 at 04:03:54PM -0500, Pavel Tatashin wrote: >> The hardware works :) I meant that before the patch linked in >> https://lkml.org/lkml/2018/1/5/534, I was never able to boot 4.4.110. But >> with that patch applied, I was able to boot it at least once, but it could >> be accidental. The hang/panic does not happen at the same time on every >> boot. > > Any chance you can grab the latest SLES 12 kernel and run it with pti > and efi enabled to see if that works properly for you or not? I trust > SUSE's testing of their kernel, and odds are I'm just missing one of > their many other patches they have in their tree for other issues that > they have seen in the past. > > If you want, I can just send you the full patch that they run on top of > the latest 4.4 stable tree, so you don't have to dig it out of their git > repo if you can't find the binary image. > > thanks, > > greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 05, 2018 at 04:03:54PM -0500, Pavel Tatashin wrote:
> The hardware works :) I meant that before the patch linked in
> https://lkml.org/lkml/2018/1/5/534, I was never able to boot 4.4.110. But
> with that patch applied, I was able to boot it at least once, but it could
> be accidental. The hang/panic does not happen at the same time on every
> boot.

Any chance you can grab the latest SLES 12 kernel and run it with pti
and efi enabled to see if that works properly for you or not? I trust
SUSE's testing of their kernel, and odds are I'm just missing one of
their many other patches they have in their tree for other issues that
they have seen in the past.

If you want, I can just send you the full patch that they run on top of
the latest 4.4 stable tree, so you don't have to dig it out of their git
repo if you can't find the binary image.

thanks,

greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, 2018-01-05 at 15:28 -0800, Hugh Dickins wrote:
> On Fri, Jan 5, 2018 at 6:03 AM, Mike Galbraith wrote:
> > On Fri, 2018-01-05 at 14:34 +0100, Greg Kroah-Hartman wrote:
> >>
> >> Ok, we found two patches that were missing in 4.4-stable that were in
> >> the SLES12 tree (thanks to Jamie Iles), now I only have 19k more to sift
> >> through :)
> >
> > As you know, in enterprise, uname -r means you might find something
> > this old in your kernel if you look hard enough :)
>
> Mike, I think there's a good chance that Greg's 4.4.110 final will fix
> your "segfault at ff5ff100" crashes: please give it a try when
> you can, and let us know - thanks.

Already done, and yes, it did.

-Mike
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On 01/05/2018 12:54 PM, Greg Kroah-Hartman wrote:
> On Fri, Jan 05, 2018 at 09:56:16AM -0800, Guenter Roeck wrote:
>> On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
>>> This is the start of the stable review cycle for the 4.4.110 release.
>>> There are 37 patches in this series, all will be posted as a response
>>> to this one. If anyone has any issues with these being applied, please
>>> let me know.
>>>
>>> Responses should be made by Fri Jan 5 19:50:38 UTC 2018.
>>> Anything received after that time might be too late.
>>
>> Update: v4.4.110 final nosmp builds fail as follows:
>>
>> Error log:
>> arch/x86/entry/vdso/vma.c: In function ‘map_vdso’:
>> arch/x86/entry/vdso/vma.c:173:9: error:
>> implicit declaration of function ‘pvclock_pvti_cpu0_va’
>
> x86-64 or i386?
>
> That should be a CONFIG_PARAVIRT_CLOCK issue, not a smp build issue,
> have a .config I can try?

Here is an easier way to reproduce the problem:

make allnoconfig ; make

Guenter
Re: [PATCH 4.4 00/37] 4.4.110-stable review
Hi Hugh,

Thank you very much for your very thoughtful input. I am quite positive this problem is a PTI regression, because I see exactly the same problem with kernel 4.1, to which I back-ported all the necessary PTI patches from 4.4.110. I will provide this thread with more information as I collect it. I will also try to root cause the problem. The bug has memory corruption behavior, but with both 4.1 and 4.4 kernels the problem goes away when I boot with the noefi parameter. So, EFI + PTI is the culprit for this memory corruption.

Thank you,
Pavel

On 01/05/2018 06:15 PM, Hugh Dickins wrote:
> On Fri, Jan 5, 2018 at 1:03 PM, Pavel Tatashin wrote:
>> The hardware works :) I meant that before the patch linked in
>> https://lkml.org/lkml/2018/1/5/534, I was never able to boot 4.4.110. But
>> with that patch applied, I was able to boot it at least once, but it could
>> be accidental. The hang/panic does not happen at the same time on every
>> boot.
>
> I get the feeling that it was accidental: it seems to me that you have
> a memory corruption problem, that gets shifted around by the different
> patches (or "noefi" or "nopti"). Because yesterday your boots were
> able to get way beyond the "EFI Variables Facility" message, and I
> can't imagine why the EFI issue would not have been equally debilitating
> on yesterday's 110-rc, if it were in play.
>
> I did intend to ask you to send your System.map, for us to scan through:
> maybe some variable is marked __init and should not be, then the
> "Freeing unused kernel memory" frees it for random reuse. But today
> you didn't get anywhere near the "Freeing unused kernel memory", so
> that can't be it - or do you sometimes get that far today?
>
> You mention that the hang/panic does not happen at the same time on
> every boot: I think all I can ask is for you to keep supplying us with
> different examples (console messages) of where it occurs, in the hope
> that one of them will point us in the right direction.
>
> And it even seems possible that this has nothing to do with the 4.4.110
> changes - that 4.4.109 plus some other random patches would unleash
> similar corruption. Though on balance that does seem unlikely.
>
> Hugh
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 5, 2018 at 6:03 AM, Mike Galbraith wrote:
> On Fri, 2018-01-05 at 14:34 +0100, Greg Kroah-Hartman wrote:
>>
>> Ok, we found two patches that were missing in 4.4-stable that were in
>> the SLES12 tree (thanks to Jamie Iles), now I only have 19k more to sift
>> through :)
>
> As you know, in enterprise, uname -r means you might find something
> this old in your kernel if you look hard enough :)

Mike, I think there's a good chance that Greg's 4.4.110 final will fix
your "segfault at ff5ff100" crashes: please give it a try when
you can, and let us know - thanks.

Hugh
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 5, 2018 at 1:03 PM, Pavel Tatashin wrote:
> The hardware works :) I meant that before the patch linked in
> https://lkml.org/lkml/2018/1/5/534, I was never able to boot 4.4.110. But
> with that patch applied, I was able to boot it at least once, but it could
> be accidental. The hang/panic does not happen at the same time on every
> boot.

I get the feeling that it was accidental: it seems to me that you have
a memory corruption problem, that gets shifted around by the different
patches (or "noefi" or "nopti"). Because yesterday your boots were
able to get way beyond the "EFI Variables Facility" message, and I
can't imagine why the EFI issue would not have been equally debilitating
on yesterday's 110-rc, if it were in play.

I did intend to ask you to send your System.map, for us to scan through:
maybe some variable is marked __init and should not be, then the
"Freeing unused kernel memory" frees it for random reuse. But today
you didn't get anywhere near the "Freeing unused kernel memory", so
that can't be it - or do you sometimes get that far today?

You mention that the hang/panic does not happen at the same time on
every boot: I think all I can ask is for you to keep supplying us with
different examples (console messages) of where it occurs, in the hope
that one of them will point us in the right direction.

And it even seems possible that this has nothing to do with the 4.4.110
changes - that 4.4.109 plus some other random patches would unleash
similar corruption. Though on balance that does seem unlikely.

Hugh

>
> Pasha
>
>
> On 01/05/2018 03:45 PM, Greg Kroah-Hartman wrote:
>>
>> On Fri, Jan 05, 2018 at 02:18:32PM -0500, Pavel Tatashin wrote:
>>>
>>> Actually it helps, if before 4.4.110 never booted on my machine, now I
>>> was able to boot on a second try.
>>
>>
>> Wait, what? This has never booted on 4.4.x before? Did 4.4.108 work?
>> 109? Are you sure this hardware even works? :)
>>
>> thanks,
>>
>> greg k-h
>>
>
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 05, 2018 at 09:54:45PM +0100, Greg Kroah-Hartman wrote:
> On Fri, Jan 05, 2018 at 09:56:16AM -0800, Guenter Roeck wrote:
> > On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.4.110 release.
> > > There are 37 patches in this series, all will be posted as a response
> > > to this one. If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Fri Jan 5 19:50:38 UTC 2018.
> > > Anything received after that time might be too late.
> > >
> >
> > Update: v4.4.110 final nosmp builds fail as follows:
> >
> >
> > Error log:
> > arch/x86/entry/vdso/vma.c: In function ‘map_vdso’:
> > arch/x86/entry/vdso/vma.c:173:9: error:
> > implicit declaration of function ‘pvclock_pvti_cpu0_va’
>
> x86-64 or i386?

x86-64

> That should be a CONFIG_PARAVIRT_CLOCK issue, not a smp build issue,
> have a .config I can try?
>
https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/qemu_x86_64_pc_nosmp_defconfig

However,

https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/qemu_x86_64_pc_defconfig

does build, and the only differences are:

30a31
> CONFIG_SMP=y
32a34,35
> CONFIG_NR_CPUS=24
> CONFIG_SCHED_SMT=y
44d46
< CONFIG_ACPI_CONTAINER=y

Both configurations have CONFIG_PARAVIRT_CLOCK disabled.

Guenter
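For anyone who wants to repeat the config comparison Guenter describes without reading raw diff output, here is a minimal sketch in Python. It only compares option names and values. The two config strings are illustrative stand-ins built from the four options listed above plus one made-up shared option; they are not the contents of the real defconfig files, and the helper names are hypothetical.

```python
def parse_config(text):
    """Return {option: value} for every CONFIG_*=value line; comments are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(old, new):
    """Return (only_in_old, only_in_new, changed) for two parsed configs."""
    only_old = {k: v for k, v in old.items() if k not in new}
    only_new = {k: v for k, v in new.items() if k not in old}
    changed = {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
    return only_old, only_new, changed


# Illustrative stand-ins, NOT the real defconfig contents.
# CONFIG_PCI=y is a made-up shared option to show that common lines are ignored.
NOSMP = """\
CONFIG_PCI=y
CONFIG_ACPI_CONTAINER=y
"""
SMP = """\
CONFIG_PCI=y
CONFIG_SMP=y
CONFIG_NR_CPUS=24
CONFIG_SCHED_SMT=y
"""

if __name__ == "__main__":
    only_old, only_new, changed = diff_configs(parse_config(NOSMP), parse_config(SMP))
    print("only in nosmp:", sorted(only_old))
    print("only in smp:  ", sorted(only_new))
    print("changed:      ", changed)
```

To use it on real files, read each defconfig with `open(path).read()` and pass the text to `parse_config`.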
Re: [PATCH 4.4 00/37] 4.4.110-stable review
The hardware works :) I meant that before the patch linked in https://lkml.org/lkml/2018/1/5/534, I was never able to boot 4.4.110. But with that patch applied, I was able to boot it at least once, but it could be accidental. The hang/panic does not happen at the same time on every boot.

Pasha

On 01/05/2018 03:45 PM, Greg Kroah-Hartman wrote:
> On Fri, Jan 05, 2018 at 02:18:32PM -0500, Pavel Tatashin wrote:
>> Actually it helps, if before 4.4.110 never booted on my machine, now I
>> was able to boot on a second try.
>
> Wait, what? This has never booted on 4.4.x before? Did 4.4.108 work?
> 109? Are you sure this hardware even works? :)
>
> thanks,
>
> greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 05, 2018 at 09:56:16AM -0800, Guenter Roeck wrote:
> On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.110 release.
> > There are 37 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Fri Jan 5 19:50:38 UTC 2018.
> > Anything received after that time might be too late.
> >
>
> Update: v4.4.110 final nosmp builds fail as follows:
>
>
> Error log:
> arch/x86/entry/vdso/vma.c: In function ‘map_vdso’:
> arch/x86/entry/vdso/vma.c:173:9: error:
> implicit declaration of function ‘pvclock_pvti_cpu0_va’

x86-64 or i386?

That should be a CONFIG_PARAVIRT_CLOCK issue, not a smp build issue,
have a .config I can try?

thanks,

greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 05, 2018 at 10:12:38AM -0800, Guenter Roeck wrote:
> On Fri, Jan 05, 2018 at 04:00:55PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Jan 04, 2018 at 09:56:47AM -0800, Guenter Roeck wrote:
> > >
> > > FWIW, v4.4.110-rc1 boots fine when merged into chromeos-4.4, on i7-7Y75.
> >
> > That's good to know, hopefully 4.4.110-final also still works for you :)
>
> It seems to be working. One patch to add for v4.4.111:
>
> 063fb3e56f6d ("x86/kasan: Write protect kasan zero shadow")
>
> It is needed to be able to run KASAN enabled images in KVM.

Ugh, thanks for that, it looks like SLES is missing that one too.

thanks,

greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 05, 2018 at 02:18:32PM -0500, Pavel Tatashin wrote:
> Actually it helps, if before 4.4.110 never booted on my machine, now I
> was able to boot on a second try.

Wait, what? This has never booted on 4.4.x before? Did 4.4.108 work?
109? Are you sure this hardware even works? :)

thanks,

greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 05, 2018 at 10:15:00AM -0800, Andy Lutomirski wrote:
> On Fri, Jan 5, 2018 at 9:52 AM, Greg Kroah-Hartman wrote:
> > On Fri, Jan 05, 2018 at 12:48:54PM -0500, Pavel Tatashin wrote:
> >> Boots successfully with "noefi" kernel parameter :)
> >
> > Thanks, that will help me narrow it down. I'll dig through more patches
> > when I get home tonight...
>
> I wish you luck. The 4.4 series is "KAISER", not "KPTI", and the
> relevant code is spread all over the place and is generally garbage.
> See, for example, the turd called kaiser_set_shadow_pgd(). I would
> not be terribly surprised if that particular turd is biting here.
>
> An alternative theory is that something is screwy in the EFI code. I
> don't see anything directly wrong, but it's certainly a bit sketchy.
> The newer kernels carefully avoid using PCID 0 for real work to avoid
> corruption due to EFI and similar things. The "KAISER" code has no
> such mitigation. Fortunately, it seems to use PCID=0 for kernel and
> PCID=nonzero for user, so the obvious problem isn't present, but
> something could still be wrong.
>
> Pavel, can you send your /proc/cpuinfo on a noefi boot? (Just the
> first CPU worth is fine.)
>
> FWIW, I said before that I have very little desire to help debug
> "KAISER". I stand by that.

I totally understand, and do not expect your help at all.

Worst case, I point people at 4.14 and tell them to upgrade, I'm not
going to waste a ton of time on this for the same exact reasons you list
here.

And yeah, kaiser_set_shadow_pgd() is horrid, I've already gotten sucked
into it for long enough...

greg k-h
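As background to the PCID remarks quoted above: on x86-64 with CR4.PCIDE set, bits 11:0 of CR3 carry the PCID and the page-table base sits above them. The Python sketch below splits a CR3 value that way. It is only an illustration of the register layout, not KAISER code; the function name is hypothetical and both sample values are made up.

```python
PCID_MASK = 0xFFF                          # CR3 bits 11:0 hold the PCID when CR4.PCIDE=1
BASE_MASK = ((1 << 52) - 1) & ~PCID_MASK   # bits 51:12: page-table base address


def split_cr3(cr3):
    """Split a CR3 value into (page-table base, PCID), assuming CR4.PCIDE is set."""
    return cr3 & BASE_MASK, cr3 & PCID_MASK


if __name__ == "__main__":
    # Made-up values: one with PCID 0, one with a nonzero PCID.
    for value in (0x01AA2000, 0x01AA3080):
        base, pcid = split_cr3(value)
        print(f"cr3={value:#x} base={base:#x} pcid={pcid:#x}")
```

A CR3 whose low 12 bits are zero is running with PCID 0, which is the case Andy says newer kernels avoid for real work.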
Re: [PATCH 4.4 00/37] 4.4.110-stable review
Actually it helps: before, 4.4.110 never booted on my machine; now I
was able to boot on a second try.

On Fri, Jan 5, 2018 at 2:14 PM, Pavel Tatashin wrote:
> I hoped this patch would fix the efi issue:
> https://lkml.org/lkml/2018/1/5/534
>
> But, unfortunately, it does not. I got a partial panic message this time:
>
> [4.737578] usb 1-1: new high-speed USB device number 2 using ehci-pci
> [4.846712] BUG: unable to handle kernel paging request at 00017e10
> [4.854509] IP: [] native_queued_spin_lock_slowpath+0xfe/0x170
> [4.862780] PGD 0
> [4.865034] Oops: 0002 [#1] SMP
> [4.868657] Modules linked in:
> [4.872075] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.110_pt_linux-v4.4.110 #3
> [4.880526] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016
> [4.891596] task: 81aab500 ti: 81a98000 task.ti: 81a98000
> [4.899950] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0xfe/0x170
> [4.910936] RSP: :881fff803c88 EFLAGS: 00010002
> [4.916865] RAX: 206b RBX: 88407e611900 RCX: 881fff817e00
> [4.924831] RDX: 00017e10 RSI: 0004 RDI: 88407e611a58
> [4.932797] RBP: 881fff803c88 R08: 0101 R09:
> [4.940764] R10: 5c96d000 R11: 88005c96d0c0 R12: 881ff25e52c8
> [4.948730] R13: 88407e6d1900 R14: 881fff8118c0 R15: 88407e6118c0
> [4.956696] FS: () GS:881fff80() knlGS:
> [4.965727] CS: 0010 DS: ES: CR0: 80050033
> [4.972140] CR2: 00017e10 CR3: 01aa2000 CR4: 003606
>
> On Fri, Jan 5, 2018 at 1:21 PM, Pavel Tatashin wrote:
>>> Pavel, can you send your /proc/cpuinfo on a noefi boot? (Just the
>>> first CPU worth is fine.)
>>
>> With noefi option:
>>
>> [root@ca-ostest441 ~]# more /proc/cpuinfo
>> processor : 0
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 79
>> model name : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
>> stepping : 1
>> microcode : 0xb1d
>> cpu MHz : 1971.406
>> cache size : 25600 KB
>> physical id : 0
>> siblings : 20
>> core id : 0
>> cpu cores : 10
>> apicid : 0
>> initial apicid : 0
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 20
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb invpcid_single pln pts dtherm intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc
>> bugs :
>> bogomips : 4390.08
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 46 bits physical, 48 bits virtual
>> power management:
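The "Oops: 0002" line in the panic quoted above is the x86 page-fault error code printed in hex. As a reading aid, here is a small Python sketch that decodes the low five bits of that code using the architectural bit meanings; the function and table names are hypothetical. It says nothing about why the fault happened.

```python
# (bit mask, meaning when set, meaning when clear or None if not worth printing)
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in a paging entry", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code):
    """Return a list of human-readable facts encoded in a page-fault error code."""
    facts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            facts.append(if_set)
        elif if_clear is not None:
            facts.append(if_clear)
    return facts


if __name__ == "__main__":
    print(decode_pf_error(0x0002))
```

For 0002 this yields a kernel-mode write to a page that is not present, which is consistent with the "unable to handle kernel paging request" line and with CR2 holding the same address as RDX in the dump.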
Re: [PATCH 4.4 00/37] 4.4.110-stable review
I hoped this patch would fix the EFI issue: https://lkml.org/lkml/2018/1/5/534 But unfortunately, it does not. I got a partial panic message this time: [4.737578] usb 1-1: new high-speed USB device number 2 using ehci-pci [4.846712] BUG: unable to handle kernel paging request at 00017e10 [4.854509] IP: [] native_queued_spin_lock_slowpath+0xfe/0x170 [4.862780] PGD 0 [4.865034] Oops: 0002 [#1] SMP [4.868657] Modules linked in: [4.872075] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.110_pt_linux-v4.4.110 #3 [4.880526] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38050100 08/30/2016 [4.891596] task: 81aab500 ti: 81a98000 task.ti: 81a98000 [4.899950] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0xfe/0x170 [4.910936] RSP: :881fff803c88 EFLAGS: 00010002 [4.916865] RAX: 206b RBX: 88407e611900 RCX: 881fff817e00 [4.924831] RDX: 00017e10 RSI: 0004 RDI: 88407e611a58 [4.932797] RBP: 881fff803c88 R08: 0101 R09: [4.940764] R10: 5c96d000 R11: 88005c96d0c0 R12: 881ff25e52c8 [4.948730] R13: 88407e6d1900 R14: 881fff8118c0 R15: 88407e6118c0 [4.956696] FS: () GS:881fff80() knlGS: [4.965727] CS: 0010 DS: ES: CR0: 80050033 [4.972140] CR2: 00017e10 CR3: 01aa2000 CR4: 003606 On Fri, Jan 5, 2018 at 1:21 PM, Pavel Tatashin wrote: >> Pavel, can you send your /proc/cpuinfo on a noefi boot? (Just the >> first CPU worth is fine.) 
> > With noefi option: > > [root@ca-ostest441 ~]# more /proc/cpuinfo > processor : 0 > vendor_id : GenuineIntel > cpu family : 6 > model : 79 > model name : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz > stepping: 1 > microcode : 0xb1d > cpu MHz : 1971.406 > cache size : 25600 KB > physical id : 0 > siblings: 20 > core id : 0 > cpu cores : 10 > apicid : 0 > initial apicid : 0 > fpu : yes > fpu_exception : yes > cpuid level : 20 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca > cmov > pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb > rdt > scp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc > ap > erfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 > sdbg > fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt > tsc_deadline_time > r aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb > invpcid_singl > e pln pts dtherm intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid > fsgsbase > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap > xsaveopt > cqm_llc cqm_occup_llc > bugs: > bogomips: 4390.08 > clflush size: 64 > cache_alignment : 64 > address sizes : 46 bits physical, 48 bits virtual > power management:
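For anyone reading the "Oops: 0002" line in the panic above, here is a minimal sketch of how that number can be decoded, assuming it is the hexadecimal x86 page-fault error code. The bit meanings follow the architectural definition; the function name decode_oops_code is made up for this illustration and is not a kernel interface.

```python
# x86 page-fault error code bits (architectural definition).
# Each entry: (mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code):
    """Return a human-readable description of a page-fault error code.

    Hypothetical helper for illustration only.
    """
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


# "Oops: 0002" from the log above, read as hex.
print(decode_oops_code(0x0002))
# -> ['page not present', 'write access', 'kernel mode']
```

Read this way, the oops would be a kernel-mode write to an unmapped address, which fits the "PGD 0" line and the low fault address in CR2. The register values in the pasted log look truncated, so this should be taken as a reading aid rather than a diagnosis.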
Re: [PATCH 4.4 00/37] 4.4.110-stable review
> Pavel, can you send your /proc/cpuinfo on a noefi boot? (Just the > first CPU worth is fine.) With noefi option: [root@ca-ostest441 ~]# more /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 79 model name : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz stepping: 1 microcode : 0xb1d cpu MHz : 1971.406 cache size : 25600 KB physical id : 0 siblings: 20 core id : 0 cpu cores : 10 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 20 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdt scp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc ap erfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_time r aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb invpcid_singl e pln pts dtherm intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc bugs: bogomips: 4390.08 clflush size: 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management:
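The flags that seem most relevant to this discussion are pcid, invpcid, invpcid_single and kaiser. A minimal sketch for pulling them out of a cpuinfo dump, assuming the usual "key : value" layout of /proc/cpuinfo; the helper names and the abbreviated SAMPLE text are made up for this illustration. Note that the flags pasted above were wrapped by the mail client (for example "invpcid_singl e"), so running this against the live file is more reliable than against the quoted text.

```python
# Flags of interest for the KAISER/PCID discussion in this thread.
INTERESTING = ("pcid", "invpcid", "invpcid_single", "kaiser")


def first_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text):
    """Map each interesting flag name to True/False."""
    flags = first_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in INTERESTING}


# Abbreviated, hand-written sample; not the full output from the machine.
SAMPLE = """processor : 0
model name : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
flags : fpu vme pcid invpcid_single kaiser invpcid smap
"""

print(report(SAMPLE))
```

On a running system the same function can be fed the contents of /proc/cpuinfo, e.g. report(open("/proc/cpuinfo").read()).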
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 5, 2018 at 9:52 AM, Greg Kroah-Hartman wrote: > On Fri, Jan 05, 2018 at 12:48:54PM -0500, Pavel Tatashin wrote: >> Boots successfully with "noefi" kernel parameter :) > > Thanks, that will help me narrow it down. I'll dig through more patches > when I get home tonight... I wish you luck. The 4.4 series is "KAISER", not "KPTI", and the relevant code is spread all over the place and is generally garbage. See, for example, the turd called kaiser_set_shadow_pgd(). I would not be terribly surprised if that particular turd is biting here. An alternative theory is that something is screwy in the EFI code. I don't see anything directly wrong, but it's certainly a bit sketchy. The newer kernels carefully avoid using PCID 0 for real work to avoid corruption due to EFI and similar things. The "KAISER" code has no such mitigation. Fortunately, it seems to use PCID=0 for kernel and PCID=nonzero for user, so the obvious problem isn't present, but something could still be wrong. Pavel, can you send your /proc/cpuinfo on a noefi boot? (Just the first CPU worth is fine.) FWIW, I said before that I have very little desire to help debug "KAISER". I stand by that.
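To make the PCID remark above concrete, here is a minimal sketch of how a PCID is carried in CR3 on x86-64 when CR4.PCIDE is set: the low 12 bits hold the PCID and bit 63 of the written value requests that TLB entries for that PCID not be flushed. The helper names and the user PCID value are hypothetical; this is not the KAISER code, only the bit layout it has to work with.

```python
CR3_PCID_MASK = 0xFFF      # bits 0-11 hold the PCID when CR4.PCIDE=1
CR3_NOFLUSH = 1 << 63      # set on write to keep TLB entries for this PCID

KERNEL_PCID = 0            # per the message above: kernel runs with PCID 0
USER_PCID = 1              # hypothetical nonzero value, for illustration only


def make_cr3(pgd_phys, pcid, noflush=False):
    """Compose a CR3 value from a page-table base and a PCID."""
    if pgd_phys & CR3_PCID_MASK:
        raise ValueError("page-table base must be 4 KiB aligned")
    if not 0 <= pcid <= CR3_PCID_MASK:
        raise ValueError("PCID is a 12-bit value")
    cr3 = pgd_phys | pcid
    if noflush:
        cr3 |= CR3_NOFLUSH
    return cr3


def pcid_of(cr3):
    """Extract the PCID field from a CR3 value."""
    return cr3 & CR3_PCID_MASK


# Example base address chosen arbitrarily (4 KiB aligned).
kernel_cr3 = make_cr3(0x01AA2000, KERNEL_PCID)
user_cr3 = make_cr3(0x01AA3000, USER_PCID, noflush=True)
print(hex(kernel_cr3), pcid_of(kernel_cr3))
print(hex(user_cr3), pcid_of(user_cr3))
```

The concern raised above is that firmware such as EFI runs with whatever PCID is current and may leave stale translations behind, which is why newer kernels avoid PCID 0 for real work.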
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 05, 2018 at 04:00:55PM +0100, Greg Kroah-Hartman wrote: > On Thu, Jan 04, 2018 at 09:56:47AM -0800, Guenter Roeck wrote: > > > > FWIW, v4.4.110-rc1 boots fine when merged into chromeos-4.4, on i7-7Y75. > > That's good to know, hopefully 4.4.110-final also still works for you :) It seems to be working. One patch to add for v4.4.111: 063fb3e56f6d ("x86/kasan: Write protect kasan zero shadow") It is needed to be able to run KASAN enabled images in KVM. Guenter
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Sat, Jan 06, 2018 at 02:20:16AM +0900, Alice Ferrazzi wrote: > On Thu, Jan 4, 2018 at 5:11 AM, Greg Kroah-Hartman > wrote: > > This is the start of the stable review cycle for the 4.4.110 release. > > There are 37 patches in this series, all will be posted as a response > > to this one. If anyone has any issues with these being applied, please > > let me know. > > > > Responses should be made by Fri Jan 5 19:50:38 UTC 2018. > > Anything received after that time might be too late. > > > > The whole patch series can be found in one patch at: > > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.110-rc1.gz > > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > > linux-4.4.y > > and the diffstat can be found below. > > > > This patchset merges correctly with Gentoo patches and GCC version 6.4.0 > The kernel boot up correctly. > Logs: http://kernel1.amd64.dev.gentoo.org:8010/#/builders/5/builds/44 Great, but Gentoo really should be moving to 4.9 and 4.14 here, I hope no one running Gentoo is relying on 4.4 :) thanks, greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 05, 2018 at 04:57:15PM +0100, Willy Tarreau wrote: > On Fri, Jan 05, 2018 at 04:51:32PM +0100, Greg Kroah-Hartman wrote: > > On Fri, Jan 05, 2018 at 10:32:49AM -0500, Pavel Tatashin wrote: > (...) > > > Reboots after about 30 seconds. > > > > > > Boots fine with nopti option. > > > > Crap. > > > > And 4.9.75 works for you just fine? Same with 4.15-rc6? > > > > I'm wondering if this is some crazy gcc thing, given the ancient age of > > what you are using (gcc 4.8.5). I haven't used 4.x in many many years, > > is this what comes with RHEL6? What is the "base" distro you are > > building this on, and anything special about the hardware being used > > here? > > I don't think so, I'm personally building with 4.7.4 and am not seeing > this with 4.4.110. Ok, looks like an efi issue... greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 4.4.110 release. > There are 37 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Fri Jan 5 19:50:38 UTC 2018. > Anything received after that time might be too late. > Update: v4.4.110 final nosmp builds fail as follows: Error log: arch/x86/entry/vdso/vma.c: In function ‘map_vdso’: arch/x86/entry/vdso/vma.c:173:9: error: implicit declaration of function ‘pvclock_pvti_cpu0_va’ Guenter
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 05, 2018 at 12:48:54PM -0500, Pavel Tatashin wrote: > Boots successfully with "noefi" kernel parameter :) Thanks, that will help me narrow it down. I'll dig through more patches when I get home tonight... greg k-h
Re: [PATCH 4.4 00/37] 4.4.110-stable review
On Fri, Jan 05, 2018 at 02:41:04PM +0100, Greg Kroah-Hartman wrote: > On Thu, Jan 04, 2018 at 03:45:55PM -0800, Guenter Roeck wrote: > > On Wed, Jan 03, 2018 at 09:11:06PM +0100, Greg Kroah-Hartman wrote: > > > This is the start of the stable review cycle for the 4.4.110 release. > > > There are 37 patches in this series, all will be posted as a response > > > to this one. If anyone has any issues with these being applied, please > > > let me know. > > > > > > Responses should be made by Fri Jan 5 19:50:38 UTC 2018. > > > Anything received after that time might be too late. > > > > > > > This is also reported to crash if loaded under qemu + haxm under windows. [ ... ] > > The crash part of this problem may be solved with the following patch > > (thanks to Hugh for the hint). There is still another problem, though - > > with this patch applied, the qemu session aborts with "VCPU Shutdown > > request", whatever that means. > > v4.4.110 still suffers from "VCPU Shutdown request" with qemu+haxm. Unfortunately I don't have any other information about the problem at this time. Guenter