PROBLEM: infinite loop do_sparc64_fault with fault_code 2

weiqi Mon, 01 Jun 2015 23:55:08 -0700

Hello everyone,

Recently I have been working on a sparc64 machine (32 cores, SMP) running
linux-2.6.32, with a 64-bit kernel and 32-bit userspace.

When I run the LTP test case with the command "./kill10 -c100 -g 1 -n 1",
it occasionally gets trapped in an infinite page-fault loop, and one of the
kill10 processes uses 100% CPU. (It is easy to reproduce: just run the
command again and again.)


After some debugging, I found:

1) The fault address is always the same and always lies on kill10's user
stack, for example "0xffb0b470".

2) The fault happens in put_user() while kill10 is handling a signal.
Code path: arch/sparc/kernel/signal32.c: setup_frame32() --> put_user().

3) The first fault is handled by do_wp_page() because of COW;
do_wp_page() finds PageAnon(old_page) and reuses old_page.

   
   4) then go into  infinite loop  fault  with fault_code 2 (D-TLB 
miss), and  handled by handle_pte_fault() out at flush_tlb_page()  which
 has a comment :
                /*
                 * This is needed only for protection faults but the arch code
                 * is not yet telling us if this is a protection fault or not.
                 * This still avoids useless tlb flushes for .text page faults
                 * with threads.
                 */
                if (flags & FAULT_FLAG_WRITE)
                        flush_tlb_page(vma, address);

I have also tested with linux-3.10, with almost the same result.

I know sparc handles TLB misses in software. do_wp_page() calls
flush_tlb_page() and update_mmu_cache(), but they seem to have no effect:
the D-TLB miss just repeats infinitely at the same address.

