Ingo, I tried just removing the CONFIG_PREEMPT_RT code, but that drops packets if another task already holds the lock. Here are the debug printouts:
<4>xmit_lock_owner owned by sigd not softirq-net-rx/
<4>xmit_lock_owner owned by sigd not softirq-net-rx/
<4>xmit_lock_owner owned by sigd not softirq-net-rx/

Our quality department has been testing the patch below for a few days and has not seen any problems. It pretty much preserves the original -rt patch pieces, but adds a recursion check. I changed xmit_lock_owner to a void *, as it is now a pointer to the task which owns the lock.

What do you think?

Thanks,
Mark

diff -ur linux-2.6.23-rc1-rt0/include/linux/netdevice.h linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h
--- linux-2.6.23-rc1-rt0/include/linux/netdevice.h	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h	2007-08-01 09:01:32.000000000 -0400
@@ -468,7 +468,11 @@
 	/* cpu id of processor entered to hard_start_xmit or -1,
 	   if nobody entered there.
 	 */
-	int			xmit_lock_owner;
+#ifdef CONFIG_PREEMPT_RT
+	void			*xmit_lock_owner;
+#else
+	int			xmit_lock_owner;
+#endif
 	void			*priv;	/* pointer to private data	*/
 	int			(*hard_start_xmit) (struct sk_buff *skb,
 						    struct net_device *dev);
@@ -1041,32 +1045,54 @@
 static inline void netif_tx_lock(struct net_device *dev)
 {
 	spin_lock(&dev->_xmit_lock);
-	dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+	dev->xmit_lock_owner = (void *)current;
+#else
+	dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
 }
 
 static inline void netif_tx_lock_bh(struct net_device *dev)
 {
 	spin_lock_bh(&dev->_xmit_lock);
-	dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+	dev->xmit_lock_owner = (void *)current;
+#else
+	dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
 }
 
 static inline int netif_tx_trylock(struct net_device *dev)
 {
 	int ok = spin_trylock(&dev->_xmit_lock);
 	if (likely(ok))
-		dev->xmit_lock_owner = raw_smp_processor_id();
+	{
+#ifdef CONFIG_PREEMPT_RT
+		dev->xmit_lock_owner = (void *)current;
+#else
+		dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
+	}
 	return ok;
 }
 
 static inline void netif_tx_unlock(struct net_device *dev)
 {
+#ifdef CONFIG_PREEMPT_RT
+	dev->xmit_lock_owner = (void *)-1;
+#else
 	dev->xmit_lock_owner = -1;
+#endif
 	spin_unlock(&dev->_xmit_lock);
 }
 
 static inline void netif_tx_unlock_bh(struct net_device *dev)
 {
+#ifdef CONFIG_PREEMPT_RT
+	dev->xmit_lock_owner = (void *)-1;
+#else
 	dev->xmit_lock_owner = -1;
+#endif
 	spin_unlock_bh(&dev->_xmit_lock);
 }

diff -ur linux-2.6.23-rc1-rt0/net/core/dev.c linux-2.6.23-rc1-rt0_new/net/core/dev.c
--- linux-2.6.23-rc1-rt0/net/core/dev.c	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/core/dev.c	2007-08-01 08:56:02.000000000 -0400
@@ -1592,7 +1592,7 @@
 	 * No need to check for recursion with threaded interrupts:
 	 */
 #ifdef CONFIG_PREEMPT_RT
-	if (1) {
+	if (dev->xmit_lock_owner != (void *)current) {
 #else
 	int cpu = raw_smp_processor_id(); /* ok because BHs are off */

diff -ur linux-2.6.23-rc1-rt0/net/sched/sch_generic.c linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c
--- linux-2.6.23-rc1-rt0/net/sched/sch_generic.c	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c	2007-08-01 08:57:14.000000000 -0400
@@ -153,7 +153,13 @@
 	if (!lockless) {
 #ifdef CONFIG_PREEMPT_RT
-		netif_tx_lock(dev);
+		if (dev->xmit_lock_owner == (void *)current) {
+			kfree_skb(skb);
+			if (net_ratelimit())
+				printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
+			return -1;
+		}
+		netif_tx_lock(dev);
 #else
 		if (netif_tx_trylock(dev))
 			/* Another CPU grabbed the driver tx lock */

-----Original Message-----
From: Ingo Molnar [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 24, 2007 3:15 PM
To: Beauchemin, Mark
Cc: Thomas Gleixner; linux-kernel@vger.kernel.org; David Miller
Subject: Re: [PATCH -rt] Preemption problem in kernel RT Patch

* Beauchemin, Mark <[EMAIL PROTECTED]> wrote:

> I'm not sure why the check for recursion has been removed. In the
> backtrace below, I think it would be caught by this check and not
> recursively call the spinlock code.

ah ...
i think i did it like that because i didn't realize that there would be a recursive call sequence; i was concentrating on recursive locking. Incidentally, this code got cleaned up in .23-rc1-rt0, and now it looks quite similar to your suggested fix. Could you double-check that it solves your problem?

	Ingo