Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Tue, Jul 10, 2007 at 03:10:34PM +0200, Jarek Poplawski wrote:
> On Tue, Jul 10, 2007 at 02:20:12PM +0200, Patrick McHardy wrote:
> > Jarek Poplawski wrote:
> > > On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote:
> > > >
> > > > However I decided not to use _rcu based iteration, nor
> > > > rcu_read_lock(), after going through the RCU documentation and a
> > > > bunch of in-kernel examples that iterate through such lists using
> > > > non-_rcu macros and do list_del_rcu() just fine.
> > > >
> > > > For readability, the references to list_del_rcu as well as
> > > > call_rcu, I believe, should be enough of an indication. Please do
> > > > correct me if I am wrong here.
> > >
> > > It's only my opinion, and it's probably not very popular, at least
> > > in net/ code, so it's more about general policy than this
> > > particular code. But:
> > > - if somebody is chasing some rcu related problems, why shouldn't
> > >   he/she be able to save some time by skipping the lists without
> > >   _rcu, instead of analyzing why/how such lists are used and locked?
> >
> > RCU is used for the read-side, using it on the write-side just makes
> > things *less* understandable IMO. It will look like the read-side
> > but still do modifications.
>
> From Documentation/RCU/checklist:
>
> "9.  All RCU list-traversal primitives, which include
>      list_for_each_rcu(), list_for_each_entry_rcu(),
>      list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
>      must be within an RCU read-side critical section.  RCU
>      read-side critical sections are delimited by rcu_read_lock()
>      and rcu_read_unlock(), or by similar primitives such as
>      rcu_read_lock_bh() and rcu_read_unlock_bh()."

...But, on the other hand, Ranko didn't use any of these primitives...
So first I said he should use them, and then I asked why there were no
rcu_read_locks around... I'm really starting to be amazed at my own
consistency. I hope some day you'll forgive me (no hurry, I can wait).

Cheers,
Jarek P.
- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Tue, Jul 10, 2007 at 02:20:12PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
> > On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote:
> > >
> > > However I decided not to use _rcu based iteration, nor
> > > rcu_read_lock(), after going through the RCU documentation and a
> > > bunch of in-kernel examples that iterate through such lists using
> > > non-_rcu macros and do list_del_rcu() just fine.
> > >
> > > For readability, the references to list_del_rcu as well as
> > > call_rcu, I believe, should be enough of an indication. Please do
> > > correct me if I am wrong here.
> >
> > It's only my opinion, and it's probably not very popular, at least
> > in net/ code, so it's more about general policy than this
> > particular code. But:
> > - if somebody is chasing some rcu related problems, why shouldn't
> >   he/she be able to save some time by skipping the lists without
> >   _rcu, instead of analyzing why/how such lists are used and locked?
>
> RCU is used for the read-side, using it on the write-side just makes
> things *less* understandable IMO. It will look like the read-side
> but still do modifications.

From Documentation/RCU/checklist:

"9.  All RCU list-traversal primitives, which include
     list_for_each_rcu(), list_for_each_entry_rcu(),
     list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
     must be within an RCU read-side critical section.  RCU
     read-side critical sections are delimited by rcu_read_lock()
     and rcu_read_unlock(), or by similar primitives such as
     rcu_read_lock_bh() and rcu_read_unlock_bh().

     Use of the _rcu() list-traversal primitives outside of an
     RCU read-side critical section causes no harm other than
     a slight performance degradation on Alpha CPUs.  It can
     also be quite helpful in reducing code bloat when common
     code is shared between readers and updaters."

So, it seems, it's a question of taste as to what counts as bloat here.

I'd certainly agree with you if the write side looked like the examples
from the documentation: there are always a few lines of code with some
spinlocks. But here we see no locks and no comments, so each time we
have to guess which of the locks assumed here is needed for RCU, and
which maybe for the lists without "rcu".

Sorry for messing here with my private opinions: I really shouldn't
have started this topic.

Jarek P.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
Jarek Poplawski wrote:
> On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote:
> >
> > However I decided not to use _rcu based iteration, nor
> > rcu_read_lock(), after going through the RCU documentation and a
> > bunch of in-kernel examples that iterate through such lists using
> > non-_rcu macros and do list_del_rcu() just fine.
> >
> > For readability, the references to list_del_rcu as well as
> > call_rcu, I believe, should be enough of an indication. Please do
> > correct me if I am wrong here.
>
> It's only my opinion, and it's probably not very popular, at least
> in net/ code, so it's more about general policy than this
> particular code. But:
> - if somebody is chasing some rcu related problems, why shouldn't
>   he/she be able to save some time by skipping the lists without
>   _rcu, instead of analyzing why/how such lists are used and locked?

RCU is used for the read-side, using it on the write-side just makes
things *less* understandable IMO. It will look like the read-side
but still do modifications.
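[Editor's sketch] Patrick's read-side/write-side distinction can be illustrated outside the kernel. Below is a hypothetical userspace model in plain C (not the kernel API: the `_model` names, the fixed-size defer queue, and the `readers_active` counter are all invented here) of the discipline the checklist describes: readers traverse under `rcu_read_lock()`, while the updater, serialized by its own lock (the RTNL in this thread), unlinks with `list_del_rcu()` and defers the free until a grace period.

```c
/* Hypothetical userspace model of the RCU list discipline; all names
 * are invented for illustration and this is NOT the kernel API. */
#include <assert.h>
#include <stdlib.h>

struct node {
    int key;
    struct node *next;
};

static struct node *head;            /* the shared list */
static struct node *defer_q[16];     /* models the RCU callback queue */
static int defer_n;
static int readers_active;           /* models rcu_read_lock() nesting */

static void rcu_read_lock_model(void)   { readers_active++; }
static void rcu_read_unlock_model(void) { readers_active--; }

static void list_add_model(struct node *n)
{
    n->next = head;
    head = n;
}

/* updater side, serialized externally (the RTNL in this thread):
 * unlink now, free later -- models list_del_rcu() + call_rcu() */
static void list_del_rcu_model(struct node *n)
{
    struct node **pp = &head;

    while (*pp && *pp != n)
        pp = &(*pp)->next;
    if (*pp)
        *pp = n->next;   /* a reader already at n can still follow n->next */
}

static void call_rcu_model(struct node *n)
{
    defer_q[defer_n++] = n;
}

/* grace period: may only happen once no reader is inside a critical
 * section; only then are the deferred nodes actually freed */
static void rcu_grace_period_model(void)
{
    assert(readers_active == 0);
    for (int i = 0; i < defer_n; i++)
        free(defer_q[i]);
    defer_n = 0;
}
```

The point of the model: the unlink and the free are two separate events, and only the free has to wait for readers, which is why the updater never needs to look like a reader.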
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote:
> On Tue, 2007-07-10 at 09:34 +0200, Jarek Poplawski wrote:
> > On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote:
> > > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote:
> > > > Ranko Zivojnovic wrote:
> > > > > Patrick, I've taken liberty to try and implement this myself. Attached
> > > > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch
> > > > > that is RCU lists based. Please be kind to review.
> > ...
> > I've some doubts/suggestions too:
> >
> > > --- a/net/core/gen_estimator.c	2007-06-25 02:21:48.000000000 +0300
> > > +++ b/net/core/gen_estimator.c	2007-07-09 19:08:06.801544963 +0300
> > ...
> > >  	struct gnet_stats_rate_est *rate_est)
> > >  {
> > ...
> > > +	list_for_each_entry_safe(e, n, &elist[idx].list, list) {
> >
> > IMHO, at least for readability list_for_each_entry_rcu() is better here.
>
> Should not hurt - however functionality wise it is not necessary, as
> the list is protected by rtnl_mutex from being altered - also, for
> correctness, it should be list_for_each_safe_rcu() as there is a
> list_del_rcu in the loop...
>
> However I decided not to use _rcu based iteration, nor rcu_read_lock(),
> after going through the RCU documentation and a bunch of in-kernel
> examples that iterate through such lists using non-_rcu macros and do
> list_del_rcu() just fine.
>
> For readability, the references to list_del_rcu as well as call_rcu, I
> believe, should be enough of an indication. Please do correct me if I
> am wrong here.

It's only my opinion, and it's probably not very popular, at least
in net/ code, so it's more about general policy than this
particular code. But:
- if somebody is chasing some rcu related problems, why shouldn't
  he/she be able to save some time by skipping the lists without _rcu,
  instead of analyzing why/how such lists are used and locked?
- if some day the rcu code should change and needs to know precisely
  where all the reads and writes take place (or some automatic tool
  needs this for checking), and it's no problem to show this now, why
  leave this for the future?

> > > +		if (e->rate_est != rate_est || e->bstats != bstats)
> > > +			continue;
> > >
> > > -		kfree(est);
> > > -		killed++;
> > > +		list_del_rcu(&e->list);
> > > +		call_rcu(&e->e_rcu, __gen_kill_estimator);
> >
> > I think a race is possible here: e.g. a timer could still be running
> > after return from this function, trying to use *bstats, *rate_est
> > and maybe even stats_lock after their destruction.
>
> That will not happen, because est_timer protects the loop with
> rcu_read_lock() and the iteration is done using
> list_for_each_entry_rcu(). The destruction callback is deferred and will
> happen only after the outermost rcu_read_unlock() is called. Well - that
> is at least what the documentation says...

I'm not very sure it's really possible, so I hope it'll be
double-checked by somebody else. I don't mean the gen_estimator struct
here, but the gnet_stats_basic and gnet_stats_rate_est structs, which
are allocated elsewhere and could be destructed just 'after' - if not
before: that's the question... E.g. when we are leaving this function,
on another cpu a timer could be traversing this list, and I'm not sure
rcu is guaranteed to make the change visible fast enough - before these
other structs are freed.

> > BTW, I think, rcu_read_lock/unlock are recommended around e.g.
> > rcu list traversals, even if the current implementation doesn't
> > use them now.
>
> I am afraid I do not understand what the current implementation is
> not using right now... The whole concept and implementation of RCU,
> as I understand it, depends on rcu_read_lock/unlock to protect the
> read-side critical sections...
>
> Please be kind to elaborate...

Don't take this very seriously, there are many opinions...

Cheers,
Jarek P.
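[Editor's sketch] Jarek's worry is not the `struct gen_estimator` itself (that one is freed through `call_rcu`), but the caller-owned `gnet_stats_basic`/`gnet_stats_rate_est` blocks the estimator merely points at. A toy way to state the resulting invariant, in hypothetical userspace C (all names invented here, not kernel code):

```c
/* All names here are invented for illustration; this is NOT kernel code. */
#include <assert.h>

static int reader_inside;    /* models est_timer between rcu_read_lock()/unlock() */
static int estimator_gone;   /* estimator list_del_rcu()'d and grace period over */

/* the stats blocks are owned by the estimator's *caller*; they may only
 * be freed once no concurrent est_timer run can still dereference the
 * e->bstats / e->rate_est pointers into them */
static int safe_to_free_stats(void)
{
    return estimator_gone && !reader_inside;
}
```

In other words, deferring the free of the estimator answers only half the question; whether the caller also waits long enough before freeing the stats it owns is exactly what Jarek is asking.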
PS: I think you could read more about this here:
http://marc.info/?l=linux-kernel&m=118193431820959&w=2

Especially this:

List:       linux-kernel
Subject:    Re: Using RCU with rcu_read_lock()?
From:       Peter Zijlstra
Date:       2007-06-15 19:04:19
Message-ID: 1181934259.3.6.camel () lappy
[Download message RAW]

On Fri, 2007-06-15 at 15:00 -0400, Dmitry Torokhov wrote:
> Hi,
>
> I have a piece of code that is always called under a spinlock with
> interrupts disabled. Within that piece of code I iterate through a
> list. I have another piece of code that wants to modify that list. I
> have 2 options:
>
> 1) Have the other piece of code acquire the same spinlock
> 2) Use RCU
>
> I don't want to do 1) because the other piece of code does not really
> care about the object owning the spinlock and so acquiring the spinlock
> is "not nice". However it is guaranteed that the piece of code that
> accesses the lock runs atomically with interrupts disabled. So
> rcu_read_l
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Tue, 2007-07-10 at 09:34 +0200, Jarek Poplawski wrote:
> On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote:
> > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote:
> > > Ranko Zivojnovic wrote:
> > > > Patrick, I've taken liberty to try and implement this myself. Attached
> > > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch
> > > > that is RCU lists based. Please be kind to review.
> ...
> I've some doubts/suggestions too:
>
> > --- a/net/core/gen_estimator.c	2007-06-25 02:21:48.000000000 +0300
> > +++ b/net/core/gen_estimator.c	2007-07-09 19:08:06.801544963 +0300
> ...
> > @@ -173,20 +172,24 @@
> >  	est->last_packets = bstats->packets;
> >  	est->avpps = rate_est->pps<<10;
> >
> > -	est->next = elist[est->interval].list;
> > -	if (est->next == NULL) {
> > -		init_timer(&elist[est->interval].timer);
> > -		elist[est->interval].timer.data = est->interval;
> > -		elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
> > -		elist[est->interval].timer.function = est_timer;
> > -		add_timer(&elist[est->interval].timer);
> > +	if (!elist[idx].timer.function) {
> > +		INIT_LIST_HEAD(&elist[idx].list);
> > +		setup_timer(&elist[idx].timer, est_timer, est->interval);
>
> s/est->interval/idx/ here and below.

Agree - will do by final submission.

> >  	}
> > -	write_lock_bh(&est_lock);
> > -	elist[est->interval].list = est;
> > -	write_unlock_bh(&est_lock);
> > +
> > +	if (list_empty(&elist[est->interval].list))
> > +		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
> > +
> > +	list_add_rcu(&est->list, &elist[idx].list);
> >  	return 0;
> >  }
> >
> > +static void __gen_kill_estimator(struct rcu_head *head)
> > +{
> > +	struct gen_estimator *e = container_of(head, struct gen_estimator,
> > +					       e_rcu);
> > +	kfree(e);
> > +}
> > +
> >  /**
> >   * gen_kill_estimator - remove a rate estimator
> >   * @bstats: basic statistics
> > @@ -199,26 +202,21 @@
> >  	struct gnet_stats_rate_est *rate_est)
> >  {
> ...
> > +	list_for_each_entry_safe(e, n, &elist[idx].list, list) {
>
> IMHO, at least for readability list_for_each_entry_rcu() is better here.

Should not hurt - however functionality wise it is not necessary, as the
list is protected by rtnl_mutex from being altered - also, for
correctness, it should be list_for_each_safe_rcu() as there is a
list_del_rcu in the loop...

However I decided not to use _rcu based iteration, nor rcu_read_lock(),
after going through the RCU documentation and a bunch of in-kernel
examples that iterate through such lists using non-_rcu macros and do
list_del_rcu() just fine.

For readability, the references to list_del_rcu as well as call_rcu, I
believe, should be enough of an indication. Please do correct me if I am
wrong here.

> > +		if (e->rate_est != rate_est || e->bstats != bstats)
> > +			continue;
> >
> > -		kfree(est);
> > -		killed++;
> > +		list_del_rcu(&e->list);
> > +		call_rcu(&e->e_rcu, __gen_kill_estimator);
>
> I think a race is possible here: e.g. a timer could still be running
> after return from this function, trying to use *bstats, *rate_est
> and maybe even stats_lock after their destruction.

That will not happen, because est_timer protects the loop with
rcu_read_lock() and the iteration is done using
list_for_each_entry_rcu(). The destruction callback is deferred and will
happen only after the outermost rcu_read_unlock() is called. Well - that
is at least what the documentation says...

> BTW, I think, rcu_read_lock/unlock are recommended around e.g.
> rcu list traversals, even if the current implementation doesn't
> use them now.

I am afraid I do not understand what the current implementation is
not using right now... The whole concept and implementation of RCU,
as I understand it, depends on rcu_read_lock/unlock to protect the
read-side critical sections...

Please be kind to elaborate...

R.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote:
> On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote:
> > Ranko Zivojnovic wrote:
> > > Patrick, I've taken liberty to try and implement this myself. Attached
> > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch
> > > that is RCU lists based. Please be kind to review.
...

I've some doubts/suggestions too:

> --- a/net/core/gen_estimator.c	2007-06-25 02:21:48.000000000 +0300
> +++ b/net/core/gen_estimator.c	2007-07-09 19:08:06.801544963 +0300
...
> @@ -173,20 +172,24 @@
>  	est->last_packets = bstats->packets;
>  	est->avpps = rate_est->pps<<10;
>
> -	est->next = elist[est->interval].list;
> -	if (est->next == NULL) {
> -		init_timer(&elist[est->interval].timer);
> -		elist[est->interval].timer.data = est->interval;
> -		elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
> -		elist[est->interval].timer.function = est_timer;
> -		add_timer(&elist[est->interval].timer);
> +	if (!elist[idx].timer.function) {
> +		INIT_LIST_HEAD(&elist[idx].list);
> +		setup_timer(&elist[idx].timer, est_timer, est->interval);

s/est->interval/idx/ here and below.

>  	}
> -	write_lock_bh(&est_lock);
> -	elist[est->interval].list = est;
> -	write_unlock_bh(&est_lock);
> +
> +	if (list_empty(&elist[est->interval].list))
> +		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
> +
> +	list_add_rcu(&est->list, &elist[idx].list);
>  	return 0;
>  }
>
> +static void __gen_kill_estimator(struct rcu_head *head)
> +{
> +	struct gen_estimator *e = container_of(head, struct gen_estimator,
> +					       e_rcu);
> +	kfree(e);
> +}
> +
>  /**
>   * gen_kill_estimator - remove a rate estimator
>   * @bstats: basic statistics
> @@ -199,26 +202,21 @@
>  	struct gnet_stats_rate_est *rate_est)
>  {
...
> +	list_for_each_entry_safe(e, n, &elist[idx].list, list) {

IMHO, at least for readability list_for_each_entry_rcu() is better here.

> +		if (e->rate_est != rate_est || e->bstats != bstats)
> +			continue;
>
> -		kfree(est);
> -		killed++;
> +		list_del_rcu(&e->list);
> +		call_rcu(&e->e_rcu, __gen_kill_estimator);

I think a race is possible here: e.g. a timer could still be running
after return from this function, trying to use *bstats, *rate_est
and maybe even stats_lock after their destruction.

BTW, I think, rcu_read_lock/unlock are recommended around e.g.
rcu list traversals, even if the current implementation doesn't
use them now.

Regards,
Jarek P.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
Ranko Zivojnovic wrote:
> On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote:
>
> > You could also use synchronize_rcu(), estimator destruction is
> > not particularly performance critical.
>
> I've tried synchronize_rcu(), but it went kaboom with the below, thus
> call_rcu() must stay:
>
> ---cut---
> BUG: sleeping function called from invalid context at kernel/sched.c:3928
> in_atomic():1, irqs_disabled():0

Right, I forgot that we're holding some locks.

> [...]
> All the changes above are now included in the attached patch. I've clean
> compiled it with 2.6.22 and am now testing this version.

Looks good. Please resend with a Signed-off-by: line once you're done
testing so Dave can apply it (ideally with a new subject so it doesn't
get missed).
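[Editor's sketch] The BUG trace above follows directly from the semantics of the two primitives: `synchronize_rcu()` blocks until a grace period elapses, so it may sleep and is illegal under the spinlocks held on this destroy path, while `call_rcu()` only queues a callback. A hypothetical userspace model of that difference (the `_model` names and the `in_atomic_ctx` flag are invented here, not the kernel API):

```c
/* Hypothetical userspace model; all names are invented for illustration. */
#include <assert.h>
#include <stddef.h>

static int in_atomic_ctx;    /* models "a spinlock is held / BHs are off" */
static int freed;            /* set once the estimator is finally freed */

static void (*pending_cb[8])(void *);
static void *pending_arg[8];
static int npending;

static void kill_cb(void *p)
{
    (void)p;
    freed = 1;               /* models __gen_kill_estimator()'s kfree() */
}

/* call_rcu(): only queues the callback, so it is fine in atomic context */
static void call_rcu_model(void (*cb)(void *), void *arg)
{
    pending_cb[npending] = cb;
    pending_arg[npending] = arg;
    npending++;
}

/* synchronize_rcu(): waits for a grace period, i.e. may sleep -- which
 * is exactly the "sleeping function called from invalid context" BUG */
static int synchronize_rcu_model(void)
{
    if (in_atomic_ctx)
        return -1;           /* would sleep under a spinlock: refused */
    for (int i = 0; i < npending; i++)
        pending_cb[i](pending_arg[i]);
    npending = 0;
    return 0;
}
```

So Patrick's "I forgot that we're holding some locks" is the whole story: the caller of gen_kill_estimator sits under rtnl_mutex and dev->queue_lock, which rules the blocking variant out.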
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote:
> Ranko Zivojnovic wrote:
> > Patrick, I've taken liberty to try and implement this myself. Attached
> > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch
> > that is RCU lists based. Please be kind to review.
>
> Thanks for taking care of this, I'm a bit behind.

Glad to be able to help.

> See below for comments ..
>
> > --- a/net/core/gen_estimator.c	2007-06-25 02:21:48.000000000 +0300
> > +++ b/net/core/gen_estimator.c	2007-07-09 14:27:12.053336875 +0300
> > @@ -79,7 +79,7 @@
> >
> >  struct gen_estimator
> >  {
> > -	struct gen_estimator	*next;
> > +	struct list_head	list;
> >  	struct gnet_stats_basic	*bstats;
> >  	struct gnet_stats_rate_est	*rate_est;
> >  	spinlock_t		*stats_lock;
> > @@ -89,26 +89,27 @@
> >  	u32			last_packets;
> >  	u32			avpps;
> >  	u32			avbps;
> > +	struct rcu_head		e_rcu;
>
> You could also use synchronize_rcu(), estimator destruction is
> not particularly performance critical.

I've tried synchronize_rcu(), but it went kaboom with the below, thus
call_rcu() must stay:

---cut---
BUG: sleeping function called from invalid context at kernel/sched.c:3928
in_atomic():1, irqs_disabled():0
3 locks held by tc/2933:
 #0: (rtnl_mutex){--..}, at: [] rtnetlink_rcv+0x18/0x42
 #1: (&dev->queue_lock){-+..}, at: [] qdisc_lock_tree+0xe/0x1c
 #2: (&dev->ingress_lock){-...}, at: [] tc_get_qdisc+0x192/0x1e9
 [] wait_for_completion+0x1a/0xb7
 [] __spin_lock_init+0x29/0x4b
 [] synchronize_rcu+0x2a/0x2f
 [] wakeme_after_rcu+0x0/0x8
 [] gen_kill_estimator+0x70/0x9b
 [] htb_destroy_class+0x27/0x12f [sch_htb]
 [] htb_destroy+0x34/0x70 [sch_htb]
 [] qdisc_destroy+0x44/0x69
 .
 .
 .
---cut---

> >  static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
> >
> >  /* Estimator array lock */
> > -static DEFINE_RWLOCK(est_lock);
> > +static DEFINE_SPINLOCK(est_lock);
>
> The lock isn't needed anymore since we can rely on the rtnl
> for creation/destruction.

The 'est_lock' is now completely removed.

> >  /**
> > @@ -152,6 +153,7 @@
> >  {
> >  	struct gen_estimator *est;
> >  	struct gnet_estimator *parm = RTA_DATA(opt);
> > +	int idx;
> >
> >  	if (RTA_PAYLOAD(opt) < sizeof(*parm))
> >  		return -EINVAL;
> > @@ -163,7 +165,8 @@
> >  	if (est == NULL)
> >  		return -ENOBUFS;
> >
> > -	est->interval = parm->interval + 2;
> > +	INIT_LIST_HEAD(&est->list);
>
> Initializing the members' list_head isn't necessary.

Removed.

All the changes above are now included in the attached patch. I've clean
compiled it with 2.6.22 and am now testing this version.

R.

--- a/net/core/gen_estimator.c	2007-06-25 02:21:48.000000000 +0300
+++ b/net/core/gen_estimator.c	2007-07-09 19:08:06.801544963 +0300
@@ -79,7 +79,7 @@
 
 struct gen_estimator
 {
-	struct gen_estimator	*next;
+	struct list_head	list;
 	struct gnet_stats_basic	*bstats;
 	struct gnet_stats_rate_est	*rate_est;
 	spinlock_t		*stats_lock;
@@ -89,26 +89,24 @@
 	u32			last_packets;
 	u32			avpps;
 	u32			avbps;
+	struct rcu_head		e_rcu;
 };
 
 struct gen_estimator_head
 {
 	struct timer_list	timer;
-	struct gen_estimator	*list;
+	struct list_head	list;
 };
 
 static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
 
-/* Estimator array lock */
-static DEFINE_RWLOCK(est_lock);
-
 static void est_timer(unsigned long arg)
 {
 	int idx = (int)arg;
 	struct gen_estimator *e;
 
-	read_lock(&est_lock);
-	for (e = elist[idx].list; e; e = e->next) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(e, &elist[idx].list, list) {
 		u64 nbytes;
 		u32 npackets;
 		u32 rate;
@@ -127,9 +125,9 @@
 		e->rate_est->pps = (e->avpps+0x1FF)>>10;
 		spin_unlock(e->stats_lock);
 	}
-
-	mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
-	read_unlock(&est_lock);
+	if (!list_empty(&elist[idx].list))
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+	rcu_read_unlock();
 }
 
@@ -152,6 +150,7 @@
 {
 	struct gen_estimator *est;
 	struct gnet_estimator *parm = RTA_DATA(opt);
+	int idx;
 
 	if (RTA_PAYLOAD(opt) < sizeof(*parm))
 		return -EINVAL;
@@ -163,7 +162,7 @@
 	if (est == NULL)
 		return -ENOBUFS;
 
-	est->interval = parm->interval + 2;
+	est->interval = idx = parm->interval + 2;
 	est->bstats = bstats;
 	est->rate_est = rate_est;
 	est->stats_lock = stats_lock;
@@ -173,20 +172,24 @@
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
-	est->next = elist[est->interval].list;
-	if (est->next == NULL) {
-		init_timer(&elist[est->interval].timer);
-		elist[est->interval].timer.data = est->interval;
-		elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
-		elist[est->interval].timer.function = est_timer;
-		add_timer(&elist[est->interval].timer);
+	if (!elist[idx].timer.function) {
+		INIT_LIST_HEAD(&elist[idx].list);
+		setup_timer(&elist[idx].timer, est_timer, est->interval);
 	}
-	write_lock_bh(&est_lock);
-	elist[est->interval].list = est;
-	write_unlock_bh(&est_lock);
+
+	if (list_empty(&elist[est->interval].list))
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+
+	list_add_rcu(&est->list, &elist[idx].list);
 	return 0;
 }
 
+static void __gen_kill_estimator(struct rcu_head *head)
+{
+	struct gen_estimator *e = container_of(head, struct gen_estimator,
+					       e_rcu);
+	kfree(e);
+}
+
 /**
  * gen_kill_estimator - remove a rate estimator
  * @bstats: basic statistics
@@ -199,26 +202,21 @@
 	struct gnet_stats_rate_est *rate_est)
 {
 	int idx;
-	struct gen_estimator *est, **pest;
+	struct gen_estimator *e, *n;
 
 	for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
-		int killed = 0;
-		pest = &elist[idx].list;
-		while ((est=*pest) != NULL) {
-			if (est->rate_est != rate_est || est->bstats != bstats) {
-				pest = &est->next;
-				continue;
-			}
 
-			write_lock_bh(&est_lock);
-			*pest = est->next;
-			write_unlock_bh(&est_lock);
+		/* Skip non initialized indexes */
+		if (!elist[idx].timer.function)
+			continue;
+
+		list_for_each_entry_safe(e, n, &elist[idx].list, list) {
+			if (e->rate_est != rate_est || e->bstats != bstats)
+				continue;
 
-			kfree(est);
-			killed++;
+			list_del_rcu(&e->list);
+			call_rcu(&e->e_rcu, __gen_kill_estimator);
 		}
-		if (killed && elist[idx].list == NULL)
-			del_timer(&elist[idx].timer);
 	}
 }
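[Editor's sketch] The timer lifecycle in the revised patch is worth spelling out: the per-interval timer is armed with `mod_timer()` when a bucket goes from empty to non-empty, re-armed from `est_timer` only while the bucket stays non-empty, and simply allowed to lapse (no `del_timer()` anywhere) once the bucket drains. A toy model of just that state machine, in hypothetical userspace C (`bucket_len` and `timer_armed` are invented stand-ins for `list_empty(&elist[idx].list)` and the pending timer):

```c
/* Toy model of the per-interval timer lifecycle; all names invented. */
#include <assert.h>

static int bucket_len;   /* number of estimators in one elist[] bucket */
static int timer_armed;  /* is the per-bucket timer pending? */

/* gen_new_estimator(): arm the timer on the empty -> non-empty transition */
static void add_estimator(void)
{
    if (bucket_len == 0)
        timer_armed = 1;     /* models mod_timer() */
    bucket_len++;
}

/* gen_kill_estimator(): just unlink; there is no del_timer() anywhere */
static void kill_estimator(void)
{
    bucket_len--;
}

/* est_timer(): re-arm itself only while the bucket stays non-empty */
static void est_timer_tick(void)
{
    timer_armed = 0;
    if (bucket_len > 0)
        timer_armed = 1;     /* models the conditional mod_timer() */
}
```

The design choice this models: letting the timer die of natural causes avoids having the destroy path touch the timer at all, which is what removed the need for est_lock in the first place.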
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
Ranko Zivojnovic wrote:
> On Sat, 2007-07-07 at 17:10 +0200, Patrick McHardy wrote:
>
> > On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:
> >
> > > Maybe the appropriate way to fix this would be to call gen_kill_estimator,
> > > with the appropriate lock order, before the call to qdisc_destroy, so
> > > when dev->queue_lock is taken for qdisc_destroy - the structure is
> > > already off the list.
> >
> > Probably easier to just kill est_lock and use rcu lists.
> > I'm currently travelling, I'll look into it tomorrow.
>
> Patrick, I've taken liberty to try and implement this myself. Attached
> is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch
> that is RCU lists based. Please be kind to review.

Thanks for taking care of this, I'm a bit behind.
See below for comments ..

> --- a/net/core/gen_estimator.c	2007-06-25 02:21:48.000000000 +0300
> +++ b/net/core/gen_estimator.c	2007-07-09 14:27:12.053336875 +0300
> @@ -79,7 +79,7 @@
>
>  struct gen_estimator
>  {
> -	struct gen_estimator	*next;
> +	struct list_head	list;
>  	struct gnet_stats_basic	*bstats;
>  	struct gnet_stats_rate_est	*rate_est;
>  	spinlock_t		*stats_lock;
> @@ -89,26 +89,27 @@
>  	u32			last_packets;
>  	u32			avpps;
>  	u32			avbps;
> +	struct rcu_head		e_rcu;

You could also use synchronize_rcu(), estimator destruction is
not particularly performance critical.

>  static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
>
>  /* Estimator array lock */
> -static DEFINE_RWLOCK(est_lock);
> +static DEFINE_SPINLOCK(est_lock);

The lock isn't needed anymore since we can rely on the rtnl
for creation/destruction.

>  /**
> @@ -152,6 +153,7 @@
>  {
>  	struct gen_estimator *est;
>  	struct gnet_estimator *parm = RTA_DATA(opt);
> +	int idx;
>
>  	if (RTA_PAYLOAD(opt) < sizeof(*parm))
>  		return -EINVAL;
> @@ -163,7 +165,8 @@
>  	if (est == NULL)
>  		return -ENOBUFS;
>
> -	est->interval = parm->interval + 2;
> +	INIT_LIST_HEAD(&est->list);

Initializing the members' list_head isn't necessary.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Sat, 2007-07-07 at 17:10 +0200, Patrick McHardy wrote:
> On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:
>
> > Maybe the appropriate way to fix this would be to call gen_kill_estimator,
> > with the appropriate lock order, before the call to qdisc_destroy, so
> > when dev->queue_lock is taken for qdisc_destroy - the structure is
> > already off the list.
>
> Probably easier to just kill est_lock and use rcu lists.
> I'm currently travelling, I'll look into it tomorrow.

Patrick, I've taken liberty to try and implement this myself. Attached
is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch
that is RCU lists based. Please be kind to review.

I've just compiled it against 2.6.22 together with the sch_htb patch and
I am currently testing it. I will let it run until tomorrow or until it
crashes.

R.

--- a/net/core/gen_estimator.c	2007-06-25 02:21:48.000000000 +0300
+++ b/net/core/gen_estimator.c	2007-07-09 14:27:12.053336875 +0300
@@ -79,7 +79,7 @@
 
 struct gen_estimator
 {
-	struct gen_estimator	*next;
+	struct list_head	list;
 	struct gnet_stats_basic	*bstats;
 	struct gnet_stats_rate_est	*rate_est;
 	spinlock_t		*stats_lock;
@@ -89,26 +89,27 @@
 	u32			last_packets;
 	u32			avpps;
 	u32			avbps;
+	struct rcu_head		e_rcu;
 };
 
 struct gen_estimator_head
 {
 	struct timer_list	timer;
-	struct gen_estimator	*list;
+	struct list_head	list;
 };
 
 static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
 
 /* Estimator array lock */
-static DEFINE_RWLOCK(est_lock);
+static DEFINE_SPINLOCK(est_lock);
 
 static void est_timer(unsigned long arg)
 {
 	int idx = (int)arg;
 	struct gen_estimator *e;
 
-	read_lock(&est_lock);
-	for (e = elist[idx].list; e; e = e->next) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(e, &elist[idx].list, list) {
 		u64 nbytes;
 		u32 npackets;
 		u32 rate;
@@ -127,9 +128,9 @@
 		e->rate_est->pps = (e->avpps+0x1FF)>>10;
 		spin_unlock(e->stats_lock);
 	}
-
-	mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
-	read_unlock(&est_lock);
+	if (!list_empty(&elist[idx].list))
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+	rcu_read_unlock();
 }
 
@@ -152,6 +153,7 @@
 {
 	struct gen_estimator *est;
 	struct gnet_estimator *parm = RTA_DATA(opt);
+	int idx;
 
 	if (RTA_PAYLOAD(opt) < sizeof(*parm))
 		return -EINVAL;
@@ -163,7 +165,8 @@
 	if (est == NULL)
 		return -ENOBUFS;
 
-	est->interval = parm->interval + 2;
+	INIT_LIST_HEAD(&est->list);
+	est->interval = idx = parm->interval + 2;
 	est->bstats = bstats;
 	est->rate_est = rate_est;
 	est->stats_lock = stats_lock;
@@ -173,20 +176,26 @@
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
-	est->next = elist[est->interval].list;
-	if (est->next == NULL) {
-		init_timer(&elist[est->interval].timer);
-		elist[est->interval].timer.data = est->interval;
-		elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
-		elist[est->interval].timer.function = est_timer;
-		add_timer(&elist[est->interval].timer);
+	spin_lock_bh(&est_lock);
+	if (!elist[idx].timer.function) {
+		INIT_LIST_HEAD(&elist[idx].list);
+		setup_timer(&elist[idx].timer, est_timer, est->interval);
 	}
-	write_lock_bh(&est_lock);
-	elist[est->interval].list = est;
-	write_unlock_bh(&est_lock);
+
+	if (list_empty(&elist[est->interval].list))
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+
+	list_add_rcu(&est->list, &elist[idx].list);
+	spin_unlock_bh(&est_lock);
 	return 0;
 }
 
+static void __gen_kill_estimator(struct rcu_head *head)
+{
+	struct gen_estimator *e = container_of(head, struct gen_estimator,
+					       e_rcu);
+	kfree(e);
+}
+
 /**
  * gen_kill_estimator - remove a rate estimator
  * @bstats: basic statistics
@@ -199,26 +206,23 @@
 	struct gnet_stats_rate_est *rate_est)
 {
 	int idx;
-	struct gen_estimator *est, **pest;
+	struct gen_estimator *e, *n;
 
 	for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
-		int killed = 0;
-		pest = &elist[idx].list;
-		while ((est=*pest) != NULL) {
-			if (est->rate_est != rate_est || est->bstats != bstats) {
-				pest = &est->next;
-				continue;
-			}
 
-			write_lock_bh(&est_lock);
-			*pest = est->next;
-			write_unlock_bh(&est_lock);
+		/* Skip non initialized indexes */
+		if (!elist[idx].timer.function)
+			continue;
+
+		list_for_each_entry_safe(e, n, &elist[idx].list, list) {
+			if (e->rate_est != rate_est || e->bstats != bstats)
+				continue;
 
-			kfree(est);
-			killed++;
+			spin_lock_bh(&est_lock);
+			list_del_rcu(&e->list);
+			spin_unlock_bh(&est_lock);
+			call_rcu(&e->e_rcu, __gen_kill_estimator);
 		}
-		if (killed && elist[idx].list == NULL)
-			del_timer(&elist[idx].timer);
 	}
 }
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Fri, Jul 06, 2007 at 04:16:18PM +0300, Ranko Zivojnovic wrote:
> On Fri, 2007-07-06 at 14:47 +0200, Jarek Poplawski wrote:
> > On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote:
> > > On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> > ...
> > > > In order to get that parameter out of the way - I will make the same
> > > > test on a real machine.
> >
> > BTW, maybe it would be better to try with something more stable than -mm.
>
> That would be the next thing I was to try. Suggestion?

Of course, -mm needs testing too, and it's probably easier to find bugs
here, but it's hard to check fixes: you never know whether new bugs are
caused by the fix or merely uncovered by it. So, the latest stable or
-rc versions (at least with higher numbers) should make things faster.

I've looked a bit at this first log, and it's very strange, especially
the stats for cpu#0. Maybe it's really about vmware, or maybe there is
something more, but it doesn't seem this stats_lock deadlock could have
made all this mess by itself. Anyway, it would be interesting to repeat
such a test with -mm after checking e.g. with 2.6.22.

Thanks,
Jarek P.

PS: Sorry for the late response.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Sat, Jul 07, 2007 at 05:10:54PM +0200, Patrick McHardy wrote:
> On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:
>
> > I've got both code paths this time. It shows exactly the ABBA deadlock
> > you describe above. The details are below.
>
> Thanks. I'm still wondering what's causing the other lockup
> you're seeing.
>
> > Maybe the appropriate way to fix this would be to call gen_kill_estimator,
> > with the appropriate lock order, before the call to qdisc_destroy, so
> > when dev->queue_lock is taken for qdisc_destroy - the structure is
> > already off the list.
>
> Probably easier to just kill est_lock and use rcu lists.
> I'm currently travelling, I'll look into it tomorrow.

Congratulations to both of you! Ranko, it looks like you have some
licence to kill! (I hope bugs only...)

Cheers,
Jarek P.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:
> On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
> > There is at least one ABBA deadlock, est_timer does:
> >
> > read_lock(&est_lock)
> > spin_lock(e->stats_lock) (which is dev->queue_lock)
> >
> > and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> > calls htb_destroy_class, then gen_kill_estimator and this
> > write_locks est_lock.
> >
> > I can't see the problem above though, the qdisc_run path only takes
> > dev->queue_lock. Please enable lockdep and post the output if any.
>
> I've got both code paths this time. It shows exactly the ABBA deadlock
> you describe above. The details are below.

Thanks. I'm still wondering what's causing the other lockup
you're seeing.

> Maybe the appropriate way to fix this would be to call gen_kill_estimator,
> with the appropriate lock order, before the call to qdisc_destroy, so
> when dev->queue_lock is taken for qdisc_destroy - the structure is
> already off the list.

Probably easier to just kill est_lock and use rcu lists.
I'm currently travelling, I'll look into it tomorrow.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Fri, 2007-07-06 at 17:55 +0300, Ranko Zivojnovic wrote: > On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote: > > Ranko Zivojnovic wrote: > > > BUG: spinlock lockup on CPU#0, swapper/0, c03eff80 > > > [] _raw_spin_lock+0x108/0x13c > > > [] __qdisc_run+0x97/0x1b0 > > > [] qdisc_watchdog+0x19/0x58 > > > [] __lock_text_start+0x37/0x43 > > > [] qdisc_watchdog+0x56/0x58 > > > [] qdisc_watchdog+0x0/0x58 > > > [] run_hrtimer_softirq+0x58/0xb5 > > > [...] > > > > > BUG: spinlock lockup on CPU#1, swapper/0, c03eff80 > > > [] _raw_spin_lock+0x108/0x13c > > > [] est_timer+0x53/0x148 > > > [] run_timer_softirq+0x30/0x184 > > > [] run_timer_softirq+0x121/0x184 > > > [] __do_softirq+0x66/0xf3 > > > [] est_timer+0x0/0x148 > > > [...] > > > > > > There is at least one ABBA deadlock, est_timer does: > > > > read_lock(&est_lock) > > spin_lock(e->stats_lock) (which is dev->queue_lock) > > > > and qdisc_destroy calls htb_destroy under dev->queue_lock, which > > calls htb_destroy_class, then gen_kill_estimator and this > > write_locks est_lock. > > > > I can't see the problem above though, the qdisc_run path only takes > > dev->queue_lock. Please enable lockdep and post the output if any. > I've got both code paths this time. It shows exactly the ABBA deadlock you describe above. The details are below. Maybe the appropriate way to fix this would be to call gen_kill_estimator, with the appropriate lock order, before the call to qdisc_destroy, so when dev->queue_lock is taken for qdisc_destroy - the structure is already off the list.
-LOG
BUG: spinlock lockup on CPU#2, ping/27868, c03eff80
 [] _raw_spin_lock+0x108/0x13c
 [] est_timer+0x53/0x148
 [] run_timer_softirq+0x121/0x184
 [] __do_softirq+0x66/0xf3
 [] est_timer+0x0/0x148
 [] __do_softirq+0x7e/0xf3
 [] do_softirq+0x56/0x58
 [] smp_apic_timer_interrupt+0x5a/0x85
 [] apic_timer_interrupt+0x29/0x38
 [] apic_timer_interrupt+0x33/0x38
 [] local_bh_enable+0x94/0x13b
 [] dev_queue_xmit+0x95/0x2d5
 [] ip_output+0x193/0x32a
 [] ip_finish_output+0x0/0x29e
 [] ip_push_pending_frames+0x27f/0x46b
 [] dst_output+0x0/0x7
 [] raw_sendmsg+0x70b/0x7f2
 [] inet_sendmsg+0x2b/0x49
 [] sock_sendmsg+0xe2/0xfd
 [] autoremove_wake_function+0x0/0x37
 [] autoremove_wake_function+0x0/0x37
 [] enqueue_entity+0x139/0x4f8
 [] copy_from_user+0x2d/0x59
 [] sys_sendmsg+0x12d/0x243
 [] __lock_acquire+0x825/0x1002
 [] __lock_acquire+0x825/0x1002
 [] scheduler_tick+0x1a7/0x20e
 [] _spin_unlock_irq+0x20/0x23
 [] trace_hardirqs_on+0x73/0x147
 [] run_timer_softirq+0x30/0x184
 [] _spin_unlock_irq+0x20/0x23
 [] sys_socketcall+0x24f/0x271
 [] trace_hardirqs_on+0xab/0x147
 [] copy_to_user+0x2f/0x49
 [] sysenter_past_esp+0x8f/0x99
 [] sysenter_past_esp+0x5f/0x99
 ===

And here is the ABBA deadlock:

---cut---
SysRq : Show Locks Held
Showing all locks held in the system:
snip
3 locks held by ping/27868:
 #0: (sk_lock-AF_INET){--..}, at: [] raw_sendmsg+0x676/0x7f2
 #1: (est_lock#2){-.-+}, at: [] est_timer+0x15/0x148
 #2: (&dev->queue_lock){-+..}, at: [] est_timer+0x53/0x148
snip
8 locks held by tc/2278:
 #0: (rtnl_mutex){--..}, at: [] rtnetlink_rcv+0x18/0x42
 #1: (&dev->queue_lock){-+..}, at: [] qdisc_lock_tree+0xe/0x1c
 #2: (&dev->ingress_lock){-...}, at: [] tc_get_qdisc+0x192/0x1e9
 #3: (est_lock#2){-.-+}, at: [] gen_kill_estimator+0x58/0x6f
 #4: (&irq_lists[i].lock){++..}, at: [] serial8250_interrupt+0x14/0x132
 #5: (&port_lock_key){++..}, at: [] serial8250_interrupt+0x62/0x132
 #6: (sysrq_key_table_lock){+...}, at: [] __handle_sysrq+0x17/0x115
 #7: (tasklist_lock){..-?}, at: [] debug_show_all_locks+0x2e/0x15e
snip
---cut---

As well as 'tc' stack:

---cut---
SysRq : Show Regs
Pid: 2278, comm: tc
EIP: 0060:[] CPU: 0
EIP is at __write_lock_failed+0xf/0x1c
EFLAGS: 0287 Not tainted (2.6.22-rc6-mm1.SNET.Thors.htbpatch.2.lockdebug #1)
EAX: c03f5968 EBX: c03f5968 ECX: EDX: 0004 ESI: c9852840 EDI: c85eae24 EBP: c06aaa60
DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 008ba828 CR3: 11841000 CR4: 06d0
DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400
 [] _raw_write_lock+0x32/0x6e
 [] gen_kill_estimator+0x58/0x6f
 [] htb_destroy_class+0x27/0x12f [sch_htb]
 [] htb_destroy+0x34/0x70 [sch_htb]
 [] qdisc_destroy+0x52/0x8d
 [] trace_hardirqs_on+0x73/0x147
 [] htb_destroy_class+0xd2/0x12f [sch_htb]
 [] htb_destroy+0x34/0x70 [sch_htb]
 [] qdisc_destroy+0x52/0x8d
 [] tc_get_qdisc+0x19b/0x1e9
 [] tc_get_qdisc+0x0/0x1e9
 [] rtnetlink_rcv_msg+0x1c2/0x1f5
 [] netlink_run_queue+0x96/0xfd
 [] rtnetlink_rcv_msg+0x0/0x1f5
 [] rtnetlink_rcv+0x26/0x42
 [] netlink_data_ready+0x12/0x54
 [] netlink_sendskb+0x1f/0x53
 [] netlink_sendmsg+0x1f5/0x2d4
 [] sock_sendmsg+0xe2/0xfd
 [] autoremove_wake_function+0x0/0x37
 [] __lock_acquire+0x825/0x1002
 [] sock_sendmsg+0xe2/0xfd
 [] copy_from_user+0x2d/0x59
 [] sys_sendmsg+0x12d/0x243
 [] __do_fault+0x12b/0x38b
 [] __do_fau
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote: > Ranko Zivojnovic wrote: > > BUG: spinlock lockup on CPU#0, swapper/0, c03eff80 > > [] _raw_spin_lock+0x108/0x13c > > [] __qdisc_run+0x97/0x1b0 > > [] qdisc_watchdog+0x19/0x58 > > [] __lock_text_start+0x37/0x43 > > [] qdisc_watchdog+0x56/0x58 > > [] qdisc_watchdog+0x0/0x58 > > [] run_hrtimer_softirq+0x58/0xb5 > > [...] > > > BUG: spinlock lockup on CPU#1, swapper/0, c03eff80 > > [] _raw_spin_lock+0x108/0x13c > > [] est_timer+0x53/0x148 > > [] run_timer_softirq+0x30/0x184 > > [] run_timer_softirq+0x121/0x184 > > [] __do_softirq+0x66/0xf3 > > [] est_timer+0x0/0x148 > > [...] > > > There is at least one ABBA deadlock, est_timer does: > > read_lock(&est_lock) > spin_lock(e->stats_lock) (which is dev->queue_lock) > > and qdisc_destroy calls htb_destroy under dev->queue_lock, which > calls htb_destroy_class, then gen_kill_estimator and this > write_locks est_lock. > > I can't see the problem above though, the qdisc_run path only takes > dev->queue_lock. Please enable lockdep and post the output if any. That was with lockdep already enabled: $ zgrep LOCKDEP /proc/config.gz CONFIG_LOCKDEP_SUPPORT=y CONFIG_LOCKDEP=y # CONFIG_DEBUG_LOCKDEP is not set also with: CONFIG_PROVE_LOCKING=y CONFIG_DEBUG_LOCK_ALLOC=y that Jarek suggested, and including: CONFIG_DETECT_SOFTLOCKUP=y CONFIG_DEBUG_SPINLOCK=y I will give it another spin as it is, just to see if it will show a different path - and then I will try 2.6.21.5 with these two patches. R.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
Ranko Zivojnovic wrote: > BUG: spinlock lockup on CPU#0, swapper/0, c03eff80 > [] _raw_spin_lock+0x108/0x13c > [] __qdisc_run+0x97/0x1b0 > [] qdisc_watchdog+0x19/0x58 > [] __lock_text_start+0x37/0x43 > [] qdisc_watchdog+0x56/0x58 > [] qdisc_watchdog+0x0/0x58 > [] run_hrtimer_softirq+0x58/0xb5 > [...] > BUG: spinlock lockup on CPU#1, swapper/0, c03eff80 > [] _raw_spin_lock+0x108/0x13c > [] est_timer+0x53/0x148 > [] run_timer_softirq+0x30/0x184 > [] run_timer_softirq+0x121/0x184 > [] __do_softirq+0x66/0xf3 > [] est_timer+0x0/0x148 > [...] There is at least one ABBA deadlock, est_timer does: read_lock(&est_lock) spin_lock(e->stats_lock) (which is dev->queue_lock) and qdisc_destroy calls htb_destroy under dev->queue_lock, which calls htb_destroy_class, then gen_kill_estimator and this write_locks est_lock. I can't see the problem above though, the qdisc_run path only takes dev->queue_lock. Please enable lockdep and post the output if any.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Fri, 2007-07-06 at 15:27 +0200, Patrick McHardy wrote: > Ranko Zivojnovic wrote: > > Managed to get stuck on a normal machine as well (2.6.22-rc6-mm1 + > > sch_htb patch) ... here's the log: > > > Can you post the script you're using to reproduce this please? Attached. R. [Attachment: tc-crash.sh — application/shellscript]
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
Ranko Zivojnovic wrote: > Managed to get stuck on a normal machine as well (2.6.22-rc6-mm1 + > sch_htb patch) ... here's the log: > Can you post the script you're using to reproduce this please?
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Fri, 2007-07-06 at 14:47 +0200, Jarek Poplawski wrote: > On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote: > > On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote: > ... > > > In order to get that parameter out of the way - I will make the same > > > test on a real machine. > > BTW, maybe it would be better to try with something more stable than -mm. That would be the next thing I was going to try. Suggestion? > There is also one bug invoked by sysreq, but I hope you've started to use sysreq only after the lockup began. Yes - it was after the lockup. R.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Fri, 2007-07-06 at 08:45 +0200, Jarek Poplawski wrote: > On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote: > > On Thu, 2007-07-05 at 18:59 +0300, Ranko Zivojnovic wrote: > > > On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote: > > > > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and > > > > running a test script that always managed to reproduce the problem > > > > within half hour - so far it looks good, but I will leave it hammering > > > > until tomorrow and will let you know. > > > > > > > > > > Ahm...it managed to get stuck: > > > > > > Relevant piece of the process table ('tc' stays stuck): > > > 9608 pts/0S+ 0:00 /bin/bash ./tc-crash.sh > > > 9609 pts/0D+ 0:00 tc qdisc del dev lo root > > > > > > > Something really weird happened here... > > > > I've left it the way it was (stuck) overnight and this morning the > > process was terminated. > > > > It looks like some timer was out of whack. > > > > I'm not sure if it is possibly relevant - but the test I performed was > > done in VMWare. > > Yes, this could be relevant, so it's good to know. > > > > > In order to get that parameter out of the way - I will make the same > > test on a real machine. > > Fine! But, unless you have something to hide, try to reply > to these messages which went through netdev, or cc netdev > at least, because maybe more people would be interested > or even could help to solve this faster. > Nope - nothing to hide :). Managed to get stuck on a normal machine as well (2.6.22-rc6-mm1 + sch_htb patch) ... 
here's the log: ---CUT--- -- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 14:46:17 -- u32 classifier Performance counters on OLD policer on input device check on -- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 14:48:13 -- -- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 15:38:26 -- BUG: spinlock lockup on CPU#0, swapper/0, c03eff80 [] _raw_spin_lock+0x108/0x13c [] __qdisc_run+0x97/0x1b0 [] qdisc_watchdog+0x19/0x58 [] __lock_text_start+0x37/0x43 [] qdisc_watchdog+0x56/0x58 [] qdisc_watchdog+0x0/0x58 [] run_hrtimer_softirq+0x58/0xb5 [] __do_softirq+0x7e/0xf3 [] do_softirq+0x56/0x58 [] smp_apic_timer_interrupt+0x5a/0x85 [] apic_timer_interrupt+0x29/0x38 [] apic_timer_interrupt+0x33/0x38 [] mwait_idle_with_hints+0x3b/0x3f [] cpu_idle+0x5f/0x74 [] start_kernel+0x318/0x3ae [] unknown_bootoption+0x0/0x258 === BUG: spinlock lockup on CPU#1, swapper/0, c03eff80 [] _raw_spin_lock+0x108/0x13c [] est_timer+0x53/0x148 [] run_timer_softirq+0x30/0x184 [] run_timer_softirq+0x121/0x184 [] __do_softirq+0x66/0xf3 [] est_timer+0x0/0x148 [] __do_softirq+0x7e/0xf3 [] do_softirq+0x56/0x58 [] smp_apic_timer_interrupt+0x5a/0x85 [] apic_timer_interrupt+0x29/0x38 [] apic_timer_interrupt+0x33/0x38 [] mwait_idle_with_hints+0x3b/0x3f [] cpu_idle+0x5f/0x74 === telnet> send break SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks telnet> send break SysRq : Show Regs Pid: 0, comm: swapper EIP: 0060:[] CPU: 1 EIP is at _raw_spin_lock+0xa3/0x13c EFLAGS: 0246Not tainted (2.6.22-rc6-mm1.SNET.Thors.htbpatch.2.lockdebug #1) EAX: EBX: c03eff80 ECX: e4f7bf60 EDX: 98b928f8 ESI: 260e7970 EDI: EBP: beb75188 DS: 007b ES: 007b FS: 00d8 CR0: 8005003b CR2: 080f7578 CR3: 03651000 CR4: 06d0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 [] est_timer+0x53/0x148 [] run_timer_softirq+0x30/0x184 [] run_timer_softirq+0x121/0x184 [] __do_softirq+0x66/0xf3 [] est_timer+0x0/0x148 [] __do_softirq+0x7e/0xf3 [] 
do_softirq+0x56/0x58 [] smp_apic_timer_interrupt+0x5a/0x85 [] apic_timer_interrupt+0x29/0x38 [] apic_timer_interrupt+0x33/0x38 [] mwait_idle_with_hints+0x3b/0x3f [] cpu_idle+0x5f/0x74 === telnet> send break SysRq : Show Blocked State freesibling task PCstack pid father child younger older kjournald D 0343 0 518 2 (L-TLB) c2361e8c 0092 090e4f4b 0343 c2361e74 c2baf920 c236 0002 c2baf920 c21343f0 c21345a8 c1d24080 0002 c040ed80 c2361ec4 c4289280 003252cc c02feb97 c2361ecc Call Trace: [] _spin_unlock_irqrestore+0x34/0x39 [] io_schedule+0x1d/0x27 [] sync_buffer+0x30/0x35 [] __wait_on_bit+0x42/0x5e [] sync_buffer+0x0/0x35 [] sync_buffer+0x0/0x35 [] out_of_line_wait_on_bit+0x63/0x76 [] wake_bit_function+0x0/0x3c [] __wait_on_buffer+0x24/0x29 [] sync_dirty_buffer+0x9c/0xe6 [] journal_get_descriptor_buffer+0x88/0x9a [jbd] [] journal_commit_transaction+0xf2a/0x111f [jbd] [] _spin_unlock_irq+0x20/0x23 [] kjournald+0xac/0x1f4 [jbd] [] autoremove_wake_function+0x0/0x37 [] kjournald+0x0/0x1f4 [jbd] [] kthread+0x34/0
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote: > On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote: ... > > In order to get that parameter out of the way - I will make the same > > test on a real machine. BTW, maybe it would be better to try with something more stable than -mm. There is also one bug invoked by sysreq, but I hope you've started to use sysreq only after the lockup began. Jarek P.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote: > On Thu, 2007-07-05 at 18:59 +0300, Ranko Zivojnovic wrote: > > On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote: > > > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and > > > running a test script that always managed to reproduce the problem > > > within half hour - so far it looks good, but I will leave it hammering > > > until tomorrow and will let you know. > > > > > > > Ahm...it managed to get stuck: > > > > Relevant piece of the process table ('tc' stays stuck): > > 9608 pts/0S+ 0:00 /bin/bash ./tc-crash.sh > > 9609 pts/0D+ 0:00 tc qdisc del dev lo root > > > > Something really weird happened here... > > I've left it the way it was (stuck) overnight and this morning the > process was terminated. > > It looks like some timer was out of whack. > > I'm not sure if it is possibly relevant - but the test I performed was > done in VMWare. Yes, this could be relevant, so it's good to know. > > In order to get that parameter out of the way - I will make the same > test on a real machine. Fine! But, unless you have something to hide, try to reply to these messages which went through netdev, or cc netdev at least, because maybe more people would be interested or even could help to solve this faster. Cheers, Jarek P.
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Fri, Jul 06, 2007 at 08:14:20AM +0200, Jarek Poplawski wrote: ... > This new lockup bug you have just found needs some time > to figure out. BTW, I wonder if you had lockdep on > (CONFIG_PROVE_LOCKING or CONFIG_LOCK_ALLOC)? Should be: (CONFIG_PROVE_LOCKING or CONFIG_DEBUG_LOCK_ALLOC)? Jarek P.
Fwd: Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
- Forwarded message from Ranko Zivojnovic <[EMAIL PROTECTED]> - > Date: Thu, 05 Jul 2007 18:59:25 +0300 > From: Ranko Zivojnovic <[EMAIL PROTECTED]> > Subject: Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added > to -mm tree > To: Jarek Poplawski <[EMAIL PROTECTED]> > Cc: [EMAIL PROTECTED], [EMAIL PROTECTED] > X-Original-To: info > Delivered-To: [EMAIL PROTECTED] > In-Reply-To: <[EMAIL PROTECTED]> > Organization: SpiderNet Services Public Ltd. > X-Mailer: Evolution 2.8.3 (2.8.3-2.fc6) > X-o2-Trust: 2, 43 > > On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote: > > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and > > running a test script that always managed to reproduce the problem > > within half hour - so far it looks good, but I will leave it hammering > > until tomorrow and will let you know. > > > > Ahm...it managed to get stuck: > > Relevant piece of the process table ('tc' stays stuck): > 9608 pts/0S+ 0:00 /bin/bash ./tc-crash.sh > 9609 pts/0D+ 0:00 tc qdisc del dev lo root > > There was no soft lockup kernel warning however sysreq 'w' & 'q' dump > the following: > > ---cut--- > SysRq : Show Blocked State > > freesibling > task PCstack pid father child younger older > kjournald D 0531 0 337 2 (L-TLB) >c3083e9c 0046 c4ef3383 0531 c3083e84 c292f7fc c3082000 > c2903388 c29031d0 c2903388 c150a000 c03b5d80 >c29d9480 c2949080 c03ff080 c03ff080 c289e7fc c03ff080 0054b8e6 > Call Trace: > [] io_schedule+0x1e/0x28 > [] sync_buffer+0x34/0x37 > [] __wait_on_bit+0x42/0x5e > [] sync_buffer+0x0/0x37 > [] sync_buffer+0x0/0x37 > [] out_of_line_wait_on_bit+0x63/0x6b > [] wake_bit_function+0x0/0x3c > [] __wait_on_buffer+0x24/0x29 > [] journal_commit_transaction+0x63d/0x10df [jbd] > [] _spin_unlock_irq+0x5/0x7 > [] schedule+0x323/0x867 > [] _spin_lock_irqsave+0x9/0xd > [] lock_timer_base+0x19/0x35 > [] try_to_del_timer_sync+0x47/0x4f > [] kjournald+0xac/0x1f4 [jbd] > [] autoremove_wake_function+0x0/0x37 > [] kjournald+0x0/0x1f4 [jbd] > [] 
kthread+0x34/0x56 > [] kthread+0x0/0x56 > [] kernel_thread_helper+0x7/0x14 > === > syslogd D 0531 0 1414 1 (NOTLB) >c3045e84 00200082 c4e83c2f 0531 c1506a80 0002 c3044000 >1d5003ae 0552 c29de888 c29de6d0 c29de888 c150a000 0003 >00200082 c2949080 c03ff080 c03ff080 c2ac46b0 c03ff080 0054b8e6 > Call Trace: > [] log_wait_commit+0xaf/0x116 [jbd] > [] autoremove_wake_function+0x0/0x37 > [] journal_stop+0x158/0x1f7 [jbd] > [] __writeback_single_inode+0x267/0x329 > [] generic_writepages+0x20/0x29 > [] sync_inode+0x19/0x2a > [] ext3_sync_file+0xbc/0xc8 [ext3] > [] do_fsync+0x55/0x8a > [] __do_fsync+0x1d/0x2b > [] sysenter_past_esp+0x5f/0x85 > === > irqbalanceD 040E 0 1428 1 (NOTLB) >c304becc 00200082 f5b6412a 040e c304beb4 c34e c304a000 >0001 c2a754c4 c22a2848 c22a2690 c22a2848 c1513000 0001 c03b5d80 >000a c286e480 c03ff080 c03ff080 c03631be c18667e4 c03ff080 003fd3f2 > Call Trace: > [] __mutex_lock_slowpath+0x54/0x8d > [] mutex_lock+0x21/0x26 > [] sock_ioctl+0x0/0x1c6 > [] sock_ioctl+0x0/0x1c6 > [] dev_ioctl+0x20c/0x309 > [] inotify_d_instantiate+0x1b/0x80 > [] d_instantiate+0x56/0x8a > [] sock_ioctl+0x40/0x1c6 > [] sock_ioctl+0x0/0x1c6 > [] do_ioctl+0x1f/0x6d > [] sys_socket+0x29/0x44 > [] vfs_ioctl+0x50/0x26b > [] sys_ioctl+0x5d/0x6c > [] sysenter_past_esp+0x5f/0x85 > === > tcD 0409 0 9609 9608 (NOTLB) >c37a7be8 00200086 ee308e07 0409 c1506a80 0002 c37a6000 >3e416208 0414 c29021e8 c2902030 c29021e8 c150a000 c321c4c4 >c37a7bd0 c19d36c0 c03ff080 c03ff080 c0134d71 00200286 c03ff080 003fe1ed > Call Trace: > [] lock_hrtimer_base+0x15/0x2f > [] wait_for_completion+0x9e/0xb7 > [] default_wake_function+0x0/0xc > [] synchronize_rcu+0x2a/0x2f > [] wakeme_after_rcu+0x0/0x8 > [] dev_deactivate+0x8a/0xa3 > [] dev_graft_qdisc+0x81/0xa4 > [] qdisc_graft+0x28/0x88 > [] tc_get_qdisc+0x15d/0x1e9 > [] tc_get_qdisc+0x0/0x1e9 > [] rtnetlink_rcv_msg+0x1c2/0x1f5 > [] netlink_run_queue+0xa7/0x10e > [] rtnetlink_rcv_msg+0x0/0x1f5 > [] rtnetlink_rcv+0x26/0x42 > [] netlink_data_ready+0x12/0x54 &
Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree
On Thu, Jul 05, 2007 at 06:59:25PM +0300, Ranko Zivojnovic wrote: > On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote: > > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and > > running a test script that always managed to reproduce the problem > > within half hour - so far it looks good, but I will leave it hammering > > until tomorrow and will let you know. > > > > Ahm...it managed to get stuck: > > Relevant piece of the process table ('tc' stays stuck): > 9608 pts/0S+ 0:00 /bin/bash ./tc-crash.sh > 9609 pts/0D+ 0:00 tc qdisc del dev lo root > > There was no soft lockup kernel warning however sysreq 'w' & 'q' dump > the following: > > ---cut--- I have to admit you are really good at this testing! Especially after this first htb bug I was almost stunned by how long such a repeatable and not very well hidden bug can live in one of the most popular networking modules... This new lockup bug you have just found needs some time to figure out. BTW, I wonder if you had lockdep on (CONFIG_PROVE_LOCKING or CONFIG_LOCK_ALLOC)? Thanks & regards, Jarek P. PS: I hope you don't mind I forwarded this to netdev.