I've seen the WARN_ON_ONCE(__read_cr3() != build_cr3()) in
switch_mm_irqs_off() every once in a while during a snapshotted system
upgrade.
I also saw the warning early during boot; that warning was introduced in
commit decab0888e6e ("x86/mm: Remove preempt_disable/enable() from
__native_flush_tlb()"). The call chain is

  get_page_from_freelist() -> post_alloc_hook() -> __kernel_map_pages()

with CONFIG_DEBUG_PAGEALLOC enabled.

It turns out that once preemption is disabled around __flush_tlb_all(),
neither warning appears.

Disable preemption during CR3 reset / __flush_tlb_all() and add a
comment why preemption is disabled.
Add another preemptible() check in __flush_tlb_all() so we catch callers
with preemption enabled on CPUs with the PGE feature, which would
otherwise not trigger the warning in __native_flush_tlb() (suggested by
Andy Lutomirski).

Fixes: decab0888e6e ("x86/mm: Remove preempt_disable/enable() from __native_flush_tlb()")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
v1…v2:
        - Add a comment before disabling preemption explaining why this
          is done.
        - Add a preemption check to __flush_tlb_all().

 arch/x86/include/asm/tlbflush.h | 6 ++++++
 arch/x86/mm/pageattr.c          | 6 +++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 58ce5288878e8..0e2130d8d6b12 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -469,6 +469,12 @@ static inline void __native_flush_tlb_one_user(unsigned long addr)
  */
 static inline void __flush_tlb_all(void)
 {
+       /*
+        * This is to catch users with enabled preemption and the PGE feature
+        * who would not otherwise trigger the warning in __native_flush_tlb().
+        */
+       VM_WARN_ON_ONCE(preemptible());
+
        if (boot_cpu_has(X86_FEATURE_PGE)) {
                __flush_tlb_global();
        } else {
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 51a5a69ecac9f..e2d4b25c7aa44 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2086,9 +2086,13 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
 
        /*
         * We should perform an IPI and flush all tlbs,
-        * but that can deadlock->flush only current cpu:
+        * but that can deadlock->flush only current cpu.
+        * Preemption needs to be disabled around __flush_tlb_all() due to
+        * CR3 reload in __native_flush_tlb().
         */
+       preempt_disable();
        __flush_tlb_all();
+       preempt_enable();
 
        arch_flush_lazy_mmu_mode();
 }
-- 
2.19.1
