Hello,

  Sorry for the wide distribution -- I'm not sure who this should be directed
  to...

  We had been seeing panics in the alpha 2.3.41 stream where a kernel thread,
  typically one of the nfsd daemons or kswapd, faults on the swap_info swap_map
  address, which is a mapped (vmalloc'd) address.  The problem was due to
  the disconnect between the active_mm pgd value and what's actually stored
  in the kernel task's ptbr value -- which is what gets loaded into the PTBR
  register with each alpha context switch.  Eventually kernel tasks find
  that the physical address recorded in their thread_struct's ptbr has become
  stale, as the page that it references is freed and re-used elsewhere.

  I note that in 2.3.47, the problem appears to have been addressed by
  the addition of the enter_lazy_tlb() call in schedule():

        if (!mm) {
                if (next->active_mm) BUG();
                next->active_mm = oldmm;
                atomic_inc(&oldmm->mm_count);
  +++           enter_lazy_tlb(oldmm, next, this_cpu);
        }

  Unfortunately the alpha enter_lazy_tlb() doesn't do anything:

static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk, unsigned cpu)
{
}

  If this is still a work in progress, excuse my interruption, but if not,
  the alpha enter_lazy_tlb() should update the kernel task's ptbr with the
  oldmm's pgd.  Right?
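
  A minimal sketch of that idea, written as a stand-alone user-space model
  rather than a kernel patch.  The struct layouts are cut-down stand-ins, the
  helper kseg_to_pfn() is made up for illustration, and IDENT_ADDR and
  PAGE_SHIFT are assumed to be the usual alpha values (KSEG base
  0xfffffc0000000000, 8KB pages):

```c
#include <stdint.h>

/* Assumed alpha constants: KSEG base address and 8KB pages. */
#define IDENT_ADDR 0xfffffc0000000000ULL
#define PAGE_SHIFT 13

/* Cut-down stand-ins for the kernel structures (hypothetical layouts). */
struct mm_struct     { uint64_t pgd; };   /* KSEG address of the pgd page */
struct thread_struct { uint64_t ptbr; };  /* page frame number of the pgd */
struct task_struct {
	struct thread_struct thread;
	struct mm_struct *active_mm;
};

/* Hypothetical helper: KSEG (identity-mapped) address to page frame number. */
static uint64_t kseg_to_pfn(uint64_t kaddr)
{
	return (kaddr - IDENT_ADDR) >> PAGE_SHIFT;
}

/* Sketch: point the lazy-TLB task's ptbr at the pgd of the mm it borrows. */
static void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk,
			   unsigned cpu)
{
	(void)cpu;  /* unused in this model */
	tsk->thread.ptbr = kseg_to_pfn(mm->pgd);
}
```

  With something along these lines, the ptbr loaded at the next context
  switch would follow the borrowed mm's pgd instead of keeping whatever
  value the kernel task had before.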

  If you're interested in the details, here's the evidence from a 2.3.47 crash
  dump, in which kswapd panicked trying to reference a swap_map address at
  fffffe0000000032:

crash> bt
PID: 2  TASK: fffffc001fd64000  CPU: 0  COMMAND: "kswapd"
 #0 [fffffc001fd67ad0] crash_save_current_state at fffffc0000336ffc
 #1 [fffffc001fd67ae0] panic at fffffc00003271f8
 #2 [fffffc001fd67b80] die_if_kernel at fffffc00003113d0
 #3 [fffffc001fd67bb0] do_page_fault at fffffc000031fecc
 #4 [fffffc001fd67bf0] entMM at fffffc000031055c
 EFRAME: fffffc001fd67c28      R24: 0000000000000cec
     R0: 0000000000000001      R25: 0000000000000007
     R1: fffffe0000000032      R26: fffffc0000350aec  <__delete_from_swap_cache+0x8c>
     R2: 0000000000000003      R27: fffffc00003514c0
     R3: 0000190000000000      R28: 0000000000000000
     R4: fffffc000052d888      HAE: 0000000000000000
     R5: 0000000000000200  TRAP_A0: fffffe0000000032
     R6: fffffc00006329d0  TRAP_A1: 0000000000000001
     R7: fffffc001fd67dc0  TRAP_A2: 0000000000000000
     R8: fffffc001fd64000       PS: 0000000000000000
    R19: 0000000000000400       PC: fffffc0000351544  <__swap_free+0x84>
    R20: fffffc00005317c0       GP: fffffc0000554030
    R21: 0000000000000000      R16: 0000190000000000
    R22: 0000000000000006      R17: 0000000000000001
    R23: fffffc0000345244      R18: 0000000000000059
 #5 [fffffc001fd67d10] __swap_free at fffffc0000351544
 #6 [fffffc001fd67d50] __delete_from_swap_cache at fffffc0000350aec
 #7 [fffffc001fd67d60] shrink_mmap at fffffc0000345460
 #8 [fffffc001fd67de0] do_try_to_free_pages at fffffc000034f87c
 #9 [fffffc001fd67e20] kswapd at fffffc000034fa2c
#10 [fffffc001fd67e60] kernel_thread at fffffc00003107f0

  In the case above, kswapd's ptbr references physical address
  5bd8000, which has long since been freed and re-assigned to the
  kmem slab area:

crash> task fffffc001fd64000 | grep ptbr
    ptbr = 0x2dec,
crash> ptob 0x2dec
2dec: 5bd8000
crash> kmem -p 5bd8000
      PAGE       PHYSICAL       MAPPING      INDEX  CNT FLAGS
fffffc0000c212e0   5bd8000  0000000000000000    106  1  uptodate,slab
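
  The ptob conversion above is just a shift by the page size.  A small
  sketch, assuming the 8KB alpha page size (a page shift of 13); the name
  ptob_alpha() is made up for illustration:

```c
#include <stdint.h>

#define ALPHA_PAGE_SHIFT 13  /* assumed: 8KB pages on alpha */

/* Convert a page frame number (as stored in ptbr) to a physical address. */
static uint64_t ptob_alpha(uint64_t pfn)
{
	return pfn << ALPHA_PAGE_SHIFT;
}
```

  Here ptob_alpha(0x2dec) gives 0x5bd8000, the page that kmem -p reports
  as belonging to the slab.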

  At the time of the panic above, the 8 nfsd daemons and the two
  idle tasks *all* contained ptbr values referencing physical addresses that
  had been freed and re-used.

  Thanks,
     Dave Anderson
     [EMAIL PROTECTED]
 
 
 
 
