Hi all,

   I've been banging my head against this all day. I'm working on a kernel
module, attached to the kernel by small hook modifications in
net/ipv4/ip_input.c and net/ipv4/ip_output.c. Specifically, the hooks sit
early in ip_rcv() and ip_output(), neither of which runs in an interrupt
context.

   The module has mutex locking in various places, which I've implemented
with rwlock_t types. For now, I've aliased every lock operation to the most
stringent form, so all of them take the lock exclusively with write_lock()
and release it with write_unlock(). I've defined __SMP__ for the module
compile, and I've disassembled the .o files to make sure that the locking
code is actually being compiled in.

   My module is deadlocking, but not on a mutex. My mutex-locking action is
now a macro:

#define MUTEX_ENTER(x) do { printk("LOCK %p in %s, %d (%x)\n", (x), \
        __FUNCTION__, __LINE__, (x)->lock); write_lock(x); \
        printk("DONE. Lock= %x\n", (x)->lock); } while (0)

There's a similar macro for the unlocks.
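
A rough sketch of what the matching unlock macro would look like, assuming it
simply mirrors MUTEX_ENTER. The name MUTEX_EXIT and the helper
demo_lock_cycle() are made up for illustration. The rwlock_t, write_lock(),
write_unlock() and printk definitions below are user-space stand-ins so the
fragment compiles outside the kernel; they only mimic the `lock` word that the
macros print and provide no real mutual exclusion.

```c
#ifndef __KERNEL__
#include <assert.h>
#include <stdio.h>

/* ASSUMPTION: user-space stand-ins, not the kernel's definitions. */
typedef struct { volatile unsigned int lock; } rwlock_t;
#define printk printf

static inline void write_lock(rwlock_t *l)
{
        assert(l->lock == 0u);  /* a second acquire would spin forever */
        l->lock = 1u;
}

static inline void write_unlock(rwlock_t *l)
{
        assert(l->lock == 1u);  /* unlocking a lock that is not held */
        l->lock = 0u;
}
#endif

/* Log the lock word, take the lock exclusively, log the lock word again. */
#define MUTEX_ENTER(x) do { printk("LOCK %p in %s, %d (%x)\n", (void *)(x), \
        __FUNCTION__, __LINE__, (x)->lock); write_lock(x); \
        printk("DONE. Lock= %x\n", (x)->lock); } while (0)

/* Hypothetical mirror image of MUTEX_ENTER for the release side. */
#define MUTEX_EXIT(x) do { printk("UNLOCK %p in %s, %d (%x)\n", (void *)(x), \
        __FUNCTION__, __LINE__, (x)->lock); write_unlock(x); \
        printk("DONE. Lock= %x\n", (x)->lock); } while (0)

/* Hypothetical helper: one enter/exit cycle on a local lock. Stores the
 * lock word seen while held and returns the lock word after release. */
static unsigned int demo_lock_cycle(unsigned int *while_held)
{
        rwlock_t l = { 0 };

        MUTEX_ENTER(&l);
        *while_held = l.lock;
        MUTEX_EXIT(&l);
        return l.lock;
}
```

With these stand-ins, one call to demo_lock_cycle() prints a LOCK/DONE pair
followed by an UNLOCK/DONE pair, which is the pattern to look for when reading
the real printk() trace.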

I actually stuck a bit more in there, to print out big lines of asterisks
whenever it tried to lock a mutex which was held by another thread. In the
tests I did, it appears that I lock up solid the first time I get mutex
contention. This is probably just because one thread has already locked up
and the other one keeps running until it hits the mutex.

   In order to simplify the situation some, I put a global mutex around my
entire module, covering all entry points, including the ioctl()s, so that
only one thread can be in my module at a time. It still dies.

   I put printk() droppings throughout the module, in addition to the ones
made on every lock and unlock call. Every freeze occurs in a different
place. In some cases, it simply stops making progress in a code block which
contains no subroutine calls, so it appears that we've been pulled out of
there by an interrupt and never returned afterwards to release the held
locks. I still have the problem if I use the write_lock_irq() functions,
though!

   If I run this module compiled without __SMP__, on a uniprocessor kernel,
it works fine. No obvious memory corruption or infinite loops.
   
   I suspect I'm missing something fundamental here, and I wonder if
anybody can suggest something I haven't done, or something I can try to
narrow down the possibilities.

   I'm running an otherwise unpatched 2.2.14 kernel, with module support
but with all services other than my module built in monolithically.
   The machine is a Netfinity 4000R, dual PIII-500. There's an AIC-7881U
SCSI controller, 512 MB of RAM, and an Intel 82557 Ethernet controller.

   So, any suggestions on where to go next, please?


-- 
 Christopher Neufeld               [EMAIL PROTECTED]
 Home page:  http://caliban.physics.utoronto.ca/neufeld/
 "Don't edit reality for the sake of simplicity"