I've been chasing after a bug in 2.4.0-test5 that I can't quite nail
down.  I don't see anything obvious between test5 and test11 that
leads me to believe it's been fixed.

I encountered a lockup on my SMP box.  One CPU got stuck in a spinlock
via the following call trace.  There were enough args and saved
registers on the stack for me to reconstruct a few of the calls:

  valid_swaphandles(entry=c218b268, offset=c68e7e78)
  swapin_readahead(entry=c218b268)
  shm_nopage_core(shp=c218b240, idx=0, address=40014000)
  shm_nopage
  do_no_page
  handle_mm_fault
  do_page_fault
  schedule
  sys_ipc (at call to sys_shmat)

"valid_swaphandles" locked on the:

        swap_device_lock(swapdev)

and it's not surprising it did.  The SWP_TYPE(entry) was swapfile
index 52 on my 2-swapfile system, so it was spinning on some random
piece of memory.

In "shm_nopage", the code

        if(!(shp = shm_lock(inode->i_ino)))
                BUG();

got a "shp" of 0xc218b240.  For some reason, this wasn't a valid
"shp", because in "shm_nopage_core", the

        pte = SHM_ENTRY(shp,idx);  // in our case, shp->shm_dir[0][0]

returned 0xc218b268 (i.e., the value of &shp->shm_dir, so maybe
shp->shm_dir was a pointer to itself---not possible if "shp" pointed
to a valid "struct shmid_kernel").

The SHM locking has thwarted my attempts at understanding.  Maybe
someone else can see the bug or reassure me that it's already been
fixed in test11?

Kevin <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Reply via email to