Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled
On Mon, Aug 20, 2018 at 01:18:43PM -0700, Andi Kleen wrote:
> On Mon, Aug 20, 2018 at 11:03:53AM -0700, Andi Kleen wrote:
> > On Mon, Aug 20, 2018 at 06:29:38PM +0200, Michal Hocko wrote:
> > > On Fri 17-08-18 15:27:33, Guenter Roeck wrote:
> > > > Hi,
> > > >
> > > > the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and
> > > > v4.9.121 with CONFIG_TRANSPARENT_HUGEPAGE=y,
> > > > CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.
> > >
> > > Could you try to apply fd7e315988b7 ("x86/mm: Simplify p[g4um]d_page()
> > > macros")? I do not see it in the stable 4.4 tree, and it was introduced
> > > much later, in 4.14. This one gave us quite a headache because it is
> > > so easy to overlook.
> >
> > Good catch!
> >
> > I tested that with 4.9, and backporting the patch indeed fixes the
> > syzkaller test case running in a KVM VM. Backported patch appended.
>
> Tested on 4.4 too, and it fixes the syzkaller test case there as well.
>

Confirmed that the problem is fixed in v4.4.151-rc1 and v4.9.123-rc1.

Thanks a lot for tracking this down and backporting the fix!

Guenter
Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled
On Mon, Aug 20, 2018 at 11:03:53AM -0700, Andi Kleen wrote:
> On Mon, Aug 20, 2018 at 06:29:38PM +0200, Michal Hocko wrote:
> > On Fri 17-08-18 15:27:33, Guenter Roeck wrote:
> > > Hi,
> > >
> > > the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and v4.9.121
> > > with CONFIG_TRANSPARENT_HUGEPAGE=y, CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.
> >
> > Could you try to apply fd7e315988b7 ("x86/mm: Simplify p[g4um]d_page()
> > macros")? I do not see it in the stable 4.4 tree, and it was introduced
> > much later, in 4.14. This one gave us quite a headache because it is
> > so easy to overlook.
>
> Good catch!
>
> I tested that with 4.9, and backporting the patch indeed fixes the
> syzkaller test case running in a KVM VM. Backported patch appended.

Tested on 4.4 too, and it fixes the syzkaller test case there as well.

-Andi
Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled
On Mon 20-08-18 11:03:53, Andi Kleen wrote:
> On Mon, Aug 20, 2018 at 06:29:38PM +0200, Michal Hocko wrote:
> > On Fri 17-08-18 15:27:33, Guenter Roeck wrote:
> > > Hi,
> > >
> > > the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and v4.9.121
> > > with CONFIG_TRANSPARENT_HUGEPAGE=y, CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.
> >
> > Could you try to apply fd7e315988b7 ("x86/mm: Simplify p[g4um]d_page()
> > macros")? I do not see it in the stable 4.4 tree, and it was introduced
> > much later, in 4.14. This one gave us quite a headache because it is
> > so easy to overlook.
>
> Good catch!
>
> I tested that with 4.9, and backporting the patch indeed fixes the
> syzkaller test case running in a KVM VM. Backported patch appended.
>
> Should probably go into 4.4 and 4.9.
>
> Cannot explain the 4.17 report unfortunately.

I haven't seen that one yet and likely won't get to it tomorrow either,
but I would start by looking for direct pte_val() usage. We have had some
out-of-tree Xen code which was doing exactly this. It is not really easy
to find by code inspection.

--
Michal Hocko
SUSE Labs
Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled
On Mon, Aug 20, 2018 at 06:29:38PM +0200, Michal Hocko wrote:
> On Fri 17-08-18 15:27:33, Guenter Roeck wrote:
> > Hi,
> >
> > the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and v4.9.121
> > with CONFIG_TRANSPARENT_HUGEPAGE=y, CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.
>
> Could you try to apply fd7e315988b7 ("x86/mm: Simplify p[g4um]d_page()
> macros")? I do not see it in the stable 4.4 tree, and it was introduced
> much later, in 4.14. This one gave us quite a headache because it is
> so easy to overlook.

Good catch!

I tested that with 4.9, and backporting the patch indeed fixes the
syzkaller test case running in a KVM VM. Backported patch appended.

It should probably go into 4.4 and 4.9.

I cannot explain the 4.17 report, unfortunately.

I'll resend it as an email too.

---

x86/mm: Simplify p[g4um]d_page() macros

Create a pgd_pfn() macro similar to the p[4um]d_pfn() macros and then
use the p[g4um]d_pfn() macros in the p[g4um]d_page() macros instead of
duplicating the code.

[Needed to fix crashes caused by earlier backports in 4.9 stable, likely 4.4 too]

Signed-off-by: Tom Lendacky
Reviewed-by: Thomas Gleixner
Reviewed-by: Borislav Petkov
Cc: Alexander Potapenko
Cc: Andrey Ryabinin
Cc: Andy Lutomirski
Cc: Arnd Bergmann
Cc: Borislav Petkov
Cc: Brijesh Singh
Cc: Dave Young
Cc: Dmitry Vyukov
Cc: Jonathan Corbet
Cc: Konrad Rzeszutek Wilk
Cc: Larry Woodman
Cc: Linus Torvalds
Cc: Matt Fleming
Cc: Michael S. Tsirkin
Cc: Paolo Bonzini
Cc: Peter Zijlstra
Cc: Radim Krčmář
Cc: Rik van Riel
Cc: Toshimitsu Kani
Cc: kasan-...@googlegroups.com
Cc: k...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@kvack.org
Link: http://lkml.kernel.org/r/e61eb533a6d0aac941db2723d8aa63ef6b882dee.1500319216.git.thomas.lenda...@amd.com
[Backported to 4.9 stable by AK, suggested by Michal Hocko]
Signed-off-by: Ingo Molnar
Signed-off-by: Andi Kleen

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 4de6c282c02a..68a55273ce0f 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -173,6 +173,11 @@ static inline unsigned long pud_pfn(pud_t pud)
 	return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT;
 }
 
+static inline unsigned long pgd_pfn(pgd_t pgd)
+{
+	return (pgd_val(pgd) & PTE_PFN_MASK) >> PAGE_SHIFT;
+}
+
 #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
 
 static inline int pmd_large(pmd_t pte)
@@ -578,8 +583,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pmd_page(pmd)		\
-	pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)
+#define pmd_page(pmd)	pfn_to_page(pmd_pfn(pmd))
 
 /*
  * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
@@ -647,8 +651,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pud_page(pud)		\
-	pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)
+#define pud_page(pud)	pfn_to_page(pud_pfn(pud))
 
 /* Find an entry in the second-level page table.. */
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
@@ -688,7 +691,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pgd_page(pgd)		pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
+#define pgd_page(pgd)	pfn_to_page(pgd_pfn(pgd))
 
 /* to find an entry in a page-table-directory. */
 static inline unsigned long pud_index(unsigned long address)
Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled
On Fri 17-08-18 15:27:33, Guenter Roeck wrote:
> Hi,
>
> the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and v4.9.121
> with CONFIG_TRANSPARENT_HUGEPAGE=y, CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.

Could you try to apply fd7e315988b7 ("x86/mm: Simplify p[g4um]d_page()
macros")? I do not see it in the stable 4.4 tree, and it was introduced
much later, in 4.14. This one gave us quite a headache because it is so
easy to overlook.

--
Michal Hocko
SUSE Labs
Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled
> Plus I'd have expected the problem to have been in mainline too, and
> apparently it's just the 4.4 and 4.9 backports.

There's another problem in 4.17, but not in 4.18; see
https://bugzilla.redhat.com/show_bug.cgi?id=1618792

It could be the same problem or a different one.

-Andi

>
> Your test-case does have mprotect with PROT_NONE. Which together with
> that mask that *might* be PHYSICAL_PMD_PAGE_MASK makes me think it
> might be related.
>
>                Linus
Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled
On 08/17/2018 05:25 PM, Linus Torvalds wrote:
> On Fri, Aug 17, 2018 at 3:27 PM Guenter Roeck wrote:
>>
>> [6.649970] random: crng init done
>> [6.689002] BUG: unable to handle kernel paging request at eafffa1a0020
>
> Hmm. Lots of bits set.
>
>> [6.689082] RIP: 0010:[] [] page_remove_rmap+0x10/0x230
>> [6.689082] RSP: 0018:c97abc18 EFLAGS: 0296
>> [6.689082] RAX: ea0005e58000 RBX: eafffa1a RCX: 2020
>> [6.689082] RDX: 3fe0 RSI: 0001 RDI: eafffa1a
>
> Is that RDX value the same value as PHYSICAL_PMD_PAGE_MASK?
>
> If I did my math right, it would be, if your CPU has 46 bits of physical
> memory. Might that be the case?
>

Yes.

> The reason I mention that is because we had the bug with spurious
> inversion of the zero pte/pmd, fixed by
>
>     f19f5c49bbc3 ("x86/speculation/l1tf: Exempt zeroed PTEs from inversion")
>

I applied that patch, but it didn't help. I get exactly the same crash
and traceback.

> and that would make a zeroed pmd entry be inverted by
> PHYSICAL_PMD_PAGE_MASK, and then you get odd garbage page pointers etc.
>
> Maybe. I could have gotten the math wrong too, but it sounds like the
> register contents _potentially_ might match up with something like
> this, and then we'd zap a bogus hugepage because of some confusion.
>
> Although then I'd have expected the bisection to hit
> "x86/speculation/l1tf: Invert all not present mappings" instead of the
> one you hit, so I don't know.
>
> Plus I'd have expected the problem to have been in mainline too, and
> apparently it's just the 4.4 and 4.9 backports.
>

Personally, I suspect that something went wrong or is missing in the
backport from 4.14 to 4.9. 5-level paging was introduced in between, and
THP support was extended to additional architectures. With all those
changes, it is easy to miss something. I just have no idea what that
might be.

Guenter
Re: Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled
On Fri, Aug 17, 2018 at 3:27 PM Guenter Roeck wrote:
>
> [6.649970] random: crng init done
> [6.689002] BUG: unable to handle kernel paging request at eafffa1a0020

Hmm. Lots of bits set.

> [6.689082] RIP: 0010:[] [] page_remove_rmap+0x10/0x230
> [6.689082] RSP: 0018:c97abc18 EFLAGS: 0296
> [6.689082] RAX: ea0005e58000 RBX: eafffa1a RCX: 2020
> [6.689082] RDX: 3fe0 RSI: 0001 RDI: eafffa1a

Is that RDX value the same value as PHYSICAL_PMD_PAGE_MASK?

If I did my math right, it would be, if your CPU has 46 bits of physical
memory. Might that be the case?

The reason I mention that is because we had the bug with spurious
inversion of the zero pte/pmd, fixed by

    f19f5c49bbc3 ("x86/speculation/l1tf: Exempt zeroed PTEs from inversion")

and that would make a zeroed pmd entry be inverted by
PHYSICAL_PMD_PAGE_MASK, and then you get odd garbage page pointers etc.

Maybe. I could have gotten the math wrong too, but it sounds like the
register contents _potentially_ might match up with something like
this, and then we'd zap a bogus hugepage because of some confusion.

Although then I'd have expected the bisection to hit
"x86/speculation/l1tf: Invert all not present mappings" instead of the
one you hit, so I don't know.

Plus I'd have expected the problem to have been in mainline too, and
apparently it's just the 4.4 and 4.9 backports.

Your test-case does have mprotect with PROT_NONE. Which together with
that mask that *might* be PHYSICAL_PMD_PAGE_MASK makes me think it
might be related.

               Linus
Crash in MM code in v4.4.y, v4.9.y with TRANSPARENT_HUGEPAGE enabled
Hi,

the following crash is seen in v4.4.148, v4.4.149, v4.9.120, and v4.9.121
with CONFIG_TRANSPARENT_HUGEPAGE=y, CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.

[6.649970] random: crng init done
[6.689002] BUG: unable to handle kernel paging request at eafffa1a0020
[6.689082] IP: [] page_remove_rmap+0x10/0x230
[6.689082] PGD 0
[6.689082]
[6.689082] Oops: [#1] SMP
[6.689082] Modules linked in:
[6.689082] CPU: 3 PID: 1132 Comm: mmtest Not tainted 4.9.121 #16
[6.689082] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/204
[6.689082] task: 88017a558c40 task.stack: c97a8000
[6.689082] RIP: 0010:[] [] page_remove_rmap+0x10/0x230
[6.689082] RSP: 0018:c97abc18 EFLAGS: 0296
[6.689082] RAX: ea0005e58000 RBX: eafffa1a RCX: 2020
[6.689082] RDX: 3fe0 RSI: 0001 RDI: eafffa1a
[6.689082] RBP: c97abc38 R08: R09: 2080
[6.689082] R10: ea0005e51ec0 R11: R12: eafffa1a
[6.689082] R13: c97abdc0 R14: 880179426808 R15: c97abdc0
[6.689082] FS: () GS:88017fd8() knlGS:
[6.689082] CS: 0010 DS: ES: CR0: 80050033
[6.689082] CR2: eafffa1a0020 CR3: 00017a3f8000 CR4: 00340670
[6.689082] Stack:
[6.689082]  81138517 ea0005e50980 eafffa1a c97abdc0
[6.689082]  c97abc68 8118d52c 880179426808 2020
[6.689082]  c97abdc0 c97abdc0 c97abd40 8115e270
[6.689082] Call Trace:
[6.689082]  [] ? __alloc_pages_nodemask+0xd7/0x210
[6.689082]  [] zap_huge_pmd+0xec/0x2a0
[6.689082]  [] unmap_page_range+0x7d0/0x8d0
[6.689082]  [] unmap_single_vma+0x54/0xd0
[6.689082]  [] unmap_vmas+0x4c/0xa0
[6.689082]  [] exit_mmap+0xa7/0x130
[6.689082]  [] ? __khugepaged_exit+0x6f/0x100
[6.689082]  [] mmput+0x38/0x100
[6.689082]  [] do_exit+0x25c/0xb10
[6.689082]  [] do_group_exit+0x3e/0xa0
[6.689082]  [] SyS_exit_group+0xf/0x10
[6.689082]  [] do_syscall_64+0x5c/0xc0
[6.689082]  [] entry_SYSCALL_64_after_swapgs+0x58/0xc6
[6.689082] Code: 77 ff ff ff eb b8 be 13 00 00 00 4c 89 e7 e8 d8 40 fe ff 48 63 d3 eb b3 0f 1f 00 55 48 89 e5 41 55 41 54 53 48
[6.689082] RIP  [] page_remove_rmap+0x10/0x230
[6.689082]  RSP
[6.689082] CR2: eafffa1a0020
[6.689082] ---[ end trace 62ac9ace190510cd ]---
[6.689082] Fixing recursive fault but reboot is needed!

A test program to trigger the crash is attached, as are bisect results
and some additional information.

Upstream commit f19f5c49bbc3 ("x86/speculation/l1tf: Exempt zeroed PTEs
from inversion") does not fix the problem.

Many thanks to the syzkaller team for finding the problem and for
providing a reproducer.

Any help tracking down the problem would be appreciated. This is out of
my league.

Thanks,
Guenter

---
#define _GNU_SOURCE
#include <sys/syscall.h>	/* __NR_* syscall numbers */
#include <unistd.h>		/* syscall() */

int main(void)
{
	syscall(__NR_mmap, 0x2000, 0x100, 3, 0x32, -1, 0);
	syscall(__NR_madvise, 0x20a93000, 0x4000, 0xe);
	syscall(__NR_mremap, 0x20a96000, 0x1000, 0x80, 3, 0x2013);
	syscall(__NR_sigaltstack, 0x20341000, 0x20ef9ff8);
	syscall(__NR_mprotect, 0x2000, 0x80, 0);
	return 0;
}

---
# bad: [93e02ae4200184bab43ce29966e895826a756a37] Linux 4.9.120
# good: [8f21ecb4249a0914aea08bef1befca9019a3b44b] Linux 4.9.119
git bisect start 'v4.9.120' 'v4.9.119'
# bad: [a0695af3406ae2a08184bd47a9e948fe6f9858b9] x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present
git bisect bad a0695af3406ae2a08184bd47a9e948fe6f9858b9
# good: [1a4922e0f01d08a4789b1e17b195bc30bf234a3b] mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
git bisect good 1a4922e0f01d08a4789b1e17b195bc30bf234a3b
# bad: [e0439285c628dea71517a1e77cab805d9134f898] x86/cpu: Remove the pointless CPU printout
git bisect bad e0439285c628dea71517a1e77cab805d9134f898
# bad: [e3923475ebb1b503668dfdb3ba90e2ebd46931e6] x86/speculation/l1tf: Limit swap file size to MAX_PA/2
git bisect bad e3923475ebb1b503668dfdb3ba90e2ebd46931e6
# bad: [33182fe97add6e83c195e9d0f7297a6499563b52] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation
git bisect bad 33182fe97add6e83c195e9d0f7297a6499563b52
# good: [60712274887fcd4ad5eb8e01796022b6b202143c] x86/speculation/l1tf: Protect swap entries against L1TF
git bisect good 60712274887fcd4ad5eb8e01796022b6b202143c
# first bad commit: [33182fe97add6e83c195e9d0f7297a6499563b52] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation

---
qemu command line:

qemu-system-x86_64 \
	-kernel arch/x86/boot/bzImage -M q35 -cpu Broadwell-noTSX \
	-no-reboot -m 4G -smp 4 \
	-drive