On Fri, Apr 15, 2005 at 10:54:22PM +0000, Thomas Graf wrote: > > Another case were it's not locked is upon a deletion of a class where > the class deletes its inner qdisc, although there is only one case > how this can happen and that's under rtnl semaphore so there is no > way we can have a dumper at the same time.
Sorry, that's where tc went astray :) The assumption that the rtnl is held during dumping is false. It is only true for the initial dump call. All subsequent dumps are not protected by the rtnl. The solution is certainly not to place the dumpers under rtnl :) The dump operation is read-only and should not need any exclusive locks. In fact the whole qdisc locking is a mess. It's a cross-breed between locking with ad-hoc reference counting and RCU. What's more, the RCU is completely useless too because for the case where we actually have a queue we still end up taking the spin lock on each transmit! I think someone's been benchmarking the loopback device again :) It needs to be reengineered. Here is a quick'n'dirty fix to the problem at hand. What happened between 2.6.10-rc1 and 2.6.10-rc2 is that qdisc_destroy started changing the next pointer of qdisc entries which totally confuses the readers because qdisc_destroy doesn't always take the tree lock. This patch tries to ensure that all top-level calls to qdisc_destroy come under the tree lock. As Thomas correctedly pointed out, most of the other qdisc_destroy calls occur after the top qdisc has been unlinked from the device qdisc_list. However, someone should go through each one of the remaining ones (they're all in the individual sch_* implementations) and make sure that this assumption is really true. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> If anyone has cycles to spare and a stomach strong enough for this stuff, here is your chance :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
===== net/sched/sch_api.c 1.47 vs edited ===== --- 1.47/net/sched/sch_api.c 2005-04-01 14:35:56 +10:00 +++ edited/net/sched/sch_api.c 2005-04-16 08:47:16 +10:00 @@ -608,9 +608,9 @@ return err; if (q) { qdisc_notify(skb, n, clid, q, NULL); - spin_lock_bh(&dev->queue_lock); + qdisc_lock_tree(dev); qdisc_destroy(q); - spin_unlock_bh(&dev->queue_lock); + qdisc_unlock_tree(dev); } } else { qdisc_notify(skb, n, clid, NULL, q); @@ -743,17 +743,17 @@ err = qdisc_graft(dev, p, clid, q, &old_q); if (err) { if (q) { - spin_lock_bh(&dev->queue_lock); + qdisc_lock_tree(dev); qdisc_destroy(q); - spin_unlock_bh(&dev->queue_lock); + qdisc_unlock_tree(dev); } return err; } qdisc_notify(skb, n, clid, old_q, q); if (old_q) { - spin_lock_bh(&dev->queue_lock); + qdisc_lock_tree(dev); qdisc_destroy(old_q); - spin_unlock_bh(&dev->queue_lock); + qdisc_unlock_tree(dev); } } return 0;