Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-10 Thread Jarek Poplawski
On Tue, Jul 10, 2007 at 03:10:34PM +0200, Jarek Poplawski wrote:
> On Tue, Jul 10, 2007 at 02:20:12PM +0200, Patrick McHardy wrote:
> > Jarek Poplawski wrote:
> > > On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote:
> > > 
> > >>However, I decided not to use _rcu-based iteration, nor the
> > >>rcu_read_lock(), after going through the RCU documentation and a bunch of
> > >>examples in the kernel that iterate through the lists using non-_rcu macros
> > >>and do list_del_rcu() just fine.
> > >>
> > >>For readability, the references to list_del_rcu() as well as call_rcu()
> > >>should, I believe, be indication enough. Please do correct me if I
> > >>am wrong here.
> > > 
> > > 
> > > It's only my opinion, and probably not a very popular one, at least
> > > in net/ code, so this is more about general policy than about this
> > > particular code. But:
> > > - if somebody is tracking down some RCU-related problem, why shouldn't
> > > he/she be able to save time by skipping lists without _rcu, instead of
> > > analyzing why/how such lists are used and locked?
> > 
> > 
> > RCU is used for the read-side; using it on the write-side just makes
> > things *less* understandable IMO. It will look like the read-side
> > but still do modifications.
> > 
> 
> From Documentation/RCU/checklist:
> 
> "9.  All RCU list-traversal primitives, which include
> list_for_each_rcu(), list_for_each_entry_rcu(),
> list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
> must be within an RCU read-side critical section.  RCU
> read-side critical sections are delimited by rcu_read_lock()
> and rcu_read_unlock(), or by similar primitives such as
> rcu_read_lock_bh() and rcu_read_unlock_bh().

...But, on the other hand, Ranko didn't use any of these primitives...
So first I said he should use them, and then I asked why there were
no rcu_read_locks around... I'm really starting to amaze myself with my consistency.

I hope some day you'll forgive me (no hurry, I can wait).

Cheers,
Jarek P. 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-10 Thread Jarek Poplawski
On Tue, Jul 10, 2007 at 02:20:12PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
> > On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote:
> > 
> >>However, I decided not to use _rcu-based iteration, nor the
> >>rcu_read_lock(), after going through the RCU documentation and a bunch of
> >>examples in the kernel that iterate through the lists using non-_rcu macros
> >>and do list_del_rcu() just fine.
> >>
> >>For readability, the references to list_del_rcu() as well as call_rcu()
> >>should, I believe, be indication enough. Please do correct me if I
> >>am wrong here.
> > 
> > 
> > It's only my opinion, and probably not a very popular one, at least
> > in net/ code, so this is more about general policy than about this
> > particular code. But:
> > - if somebody is tracking down some RCU-related problem, why shouldn't
> > he/she be able to save time by skipping lists without _rcu, instead of
> > analyzing why/how such lists are used and locked?
> 
> 
> > RCU is used for the read-side; using it on the write-side just makes
> > things *less* understandable IMO. It will look like the read-side
> > but still do modifications.
> 

From Documentation/RCU/checklist:

"9.  All RCU list-traversal primitives, which include
list_for_each_rcu(), list_for_each_entry_rcu(),
list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
must be within an RCU read-side critical section.  RCU
read-side critical sections are delimited by rcu_read_lock()
and rcu_read_unlock(), or by similar primitives such as
rcu_read_lock_bh() and rcu_read_unlock_bh().

Use of the _rcu() list-traversal primitives outside of an
RCU read-side critical section causes no harm other than
a slight performance degradation on Alpha CPUs.  It can
also be quite helpful in reducing code bloat when common
code is shared between readers and updaters."
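To make the two sides concrete, here is a rough sketch of the pattern being debated (paraphrased from the patch under discussion, not the literal code; error handling and the estimation math are omitted):

```c
/* Read side (est_timer runs in timer context, no RTNL held): the _rcu
 * traversal sits inside an explicit read-side critical section. */
rcu_read_lock();
list_for_each_entry_rcu(e, &elist[idx].list, list) {
	/* ... recompute e->rate_est from e->bstats ... */
}
rcu_read_unlock();

/* Write side (gen_kill_estimator, serialized by the RTNL mutex): plain
 * iteration is legal because no other writer can run, and the
 * list_del_rcu()/call_rcu() pair keeps concurrent readers safe.  The
 * point of contention is whether using list_for_each_entry_safe() here,
 * rather than an _rcu variant, hides that from whoever reads the code. */
list_for_each_entry_safe(e, n, &elist[idx].list, list) {
	if (e->rate_est != rate_est || e->bstats != bstats)
		continue;
	list_del_rcu(&e->list);
	call_rcu(&e->e_rcu, __gen_kill_estimator);
}
```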

So it seems it's a question of taste as to what counts as bloat here.

I'd certainly agree with you if the write side looked like the
examples from the documentation: there are always a few lines of code
with some spinlocks. But here we see no locks and no comments,
so each time we have to guess which of the locks assumed here
is needed for RCU, or maybe for the lists without "rcu".

Sorry for cluttering the thread with my private opinions: I really
shouldn't have started this topic.

Jarek P.


Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-10 Thread Patrick McHardy
Jarek Poplawski wrote:
> On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote:
> 
>>However, I decided not to use _rcu-based iteration, nor the
>>rcu_read_lock(), after going through the RCU documentation and a bunch of
>>examples in the kernel that iterate through the lists using non-_rcu macros
>>and do list_del_rcu() just fine.
>>
>>For readability, the references to list_del_rcu() as well as call_rcu()
>>should, I believe, be indication enough. Please do correct me if I
>>am wrong here.
> 
> 
> It's only my opinion, and probably not a very popular one, at least
> in net/ code, so this is more about general policy than about this
> particular code. But:
> - if somebody is tracking down some RCU-related problem, why shouldn't
> he/she be able to save time by skipping lists without _rcu, instead of
> analyzing why/how such lists are used and locked?


RCU is used for the read-side; using it on the write-side just makes
things *less* understandable IMO. It will look like the read-side
but still do modifications.



Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-10 Thread Jarek Poplawski
On Tue, Jul 10, 2007 at 01:09:07PM +0300, Ranko Zivojnovic wrote:
> On Tue, 2007-07-10 at 09:34 +0200, Jarek Poplawski wrote:
> > On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote:
> > > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote:
> > > > Ranko Zivojnovic wrote:
> > > > > Patrick, I've taken liberty to try and implement this myself. Attached
> > > > > is the whole new 
> > > > > gen_estimator-fix-locking-and-timer-related-bugs.patch
> > > > > that is RCU lists based. Please be kind to review.
> > ...
> > 
> > I've some doubts/suggestions too:
> > 
> > > --- a/net/core/gen_estimator.c2007-06-25 02:21:48.0 +0300
> > > +++ b/net/core/gen_estimator.c2007-07-09 19:08:06.801544963 +0300
...
> > >   struct gnet_stats_rate_est *rate_est)
> > >  {
> > ...
> > > + list_for_each_entry_safe(e, n, &elist[idx].list, list) {
> > 
> > IMHO, at least for readability list_for_each_entry_rcu() is better here.
> 
> Should not hurt - however, functionality-wise it is not necessary, as
> the list is protected by rtnl_mutex from being altered - also, for
> correctness, it should be list_for_each_safe_rcu(), as there is a
> list_del_rcu() in the loop...
> 
> However, I decided not to use _rcu-based iteration, nor the
> rcu_read_lock(), after going through the RCU documentation and a bunch of
> examples in the kernel that iterate through the lists using non-_rcu macros
> and do list_del_rcu() just fine.
> 
> For readability, the references to list_del_rcu() as well as call_rcu()
> should, I believe, be indication enough. Please do correct me if I
> am wrong here.

It's only my opinion, and probably not a very popular one, at least
in net/ code, so this is more about general policy than about this
particular code. But:
- if somebody is tracking down some RCU-related problem, why shouldn't
he/she be able to save time by skipping lists without _rcu, instead of
analyzing why/how such lists are used and locked?
- if some day the RCU code changes and needs to know precisely
where all the reads and writes take place (or some automatic tool
needs this for checking), and it's no problem to mark this now,
why leave it for the future?

> 
> > 
> > > + if (e->rate_est != rate_est || e->bstats != bstats)
> > > + continue;
> > >  
> > > - kfree(est);
> > > - killed++;
> > > + list_del_rcu(&e->list);
> > > + call_rcu(&e->e_rcu, __gen_kill_estimator);
> > 
> > I think a race is possible here: e.g. a timer could still be running
> > after this function returns, trying to use *bstats,
> > *rate_est and maybe even stats_lock after their destruction.
> > 
> 
> That will not happen, because est_timer protects the loop with
> rcu_read_lock() and the iteration is done using
> list_for_each_entry_rcu(). The destruction callback is deferred and will
> happen only after the outermost rcu_read_unlock() is called. Well - that
> is at least what the documentation says...

I'm not sure it's really possible, so I hope somebody else will
check this too. I don't mean the gen_estimator struct here, but the
gnet_stats_basic and gnet_stats_rate_est structs, which are
allocated elsewhere and could be destroyed just 'after' - if not
before: that's the question... E.g. while we are leaving this
function, a timer on another cpu could traverse this list, and I'm
not sure RCU has to propagate the change fast enough - before these
other structs are freed.
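To make the concern concrete: call_rcu() only defers freeing of the gen_estimator itself, not of the caller-owned stats it points to. A hypothetical caller sketch (not the actual qdisc teardown code) shows the window:

```c
/* Hypothetical caller, e.g. a qdisc teardown path (sketch): */
gen_kill_estimator(&q->bstats, &q->rate_est);	/* list_del_rcu() + call_rcu() */
/* No grace period is waited for here, so ... */
kfree(q);	/* ... q->bstats/q->rate_est die right now, while est_timer()
		 * on another CPU may still be inside rcu_read_lock() holding
		 * an estimator it found before the list_del_rcu(), and may
		 * still dereference e->bstats, e->rate_est, e->stats_lock. */
```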

> 
> > BTW, I think rcu_read_lock/unlock are recommended around e.g.
> > RCU list traversals, even if the current implementation doesn't
> > strictly need them now.
> 
> I am afraid I do not understand what it is that the current
> implementation is not using right now... The whole concept and
> implementation of RCU, as I understand it, depends on
> rcu_read_lock/unlock to protect the read-side critical sections...
> 
> Please be kind to elaborate...

Don't take this very seriously, there are many opinions...

Cheers,
Jarek P.

PS: I think you could read more about this here:

http://marc.info/?l=linux-kernel&m=118193431820959&w=2

Especially this:

List:   linux-kernel
Subject:Re: Using RCU with rcu_read_lock()?
From:   Peter Zijlstra 
Date:   2007-06-15 19:04:19
Message-ID: 1181934259.3.6.camel () lappy
[Download message RAW]

On Fri, 2007-06-15 at 15:00 -0400, Dmitry Torokhov wrote:
> Hi,
> 
> I have a piece of code that is always called under a spinlock with
> interrups disabled. Within that piece of code I iterate through a
> list. I have another piece of code that wants to modify that list. I
> have 2 options:
> 
> 1) Have the other piece of code acquire the same spinlock
> 2) Use RCU
> 
> I don't want to do 1) because the otheir piece of code does not really
> care about object owning the spinlock and so acquiring the spinlock is
> "not nice". However it is guaranteed that the piece of code that
> accesses lock runs atomically with interrupts disabled. So
> rcu_read_l

Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-10 Thread Ranko Zivojnovic
On Tue, 2007-07-10 at 09:34 +0200, Jarek Poplawski wrote:
> On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote:
> > On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote:
> > > Ranko Zivojnovic wrote:
> > > > Patrick, I've taken liberty to try and implement this myself. Attached
> > > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch
> > > > that is RCU lists based. Please be kind to review.
> ...
> 
> I've some doubts/suggestions too:
> 
> > --- a/net/core/gen_estimator.c  2007-06-25 02:21:48.0 +0300
> > +++ b/net/core/gen_estimator.c  2007-07-09 19:08:06.801544963 +0300
> ...
> > @@ -173,20 +172,24 @@
> > est->last_packets = bstats->packets;
> > est->avpps = rate_est->pps<<10;
> >  
> > -   est->next = elist[est->interval].list;
> > -   if (est->next == NULL) {
> > -   init_timer(&elist[est->interval].timer);
> > -   elist[est->interval].timer.data = est->interval;
> > > -   elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
> > > -   elist[est->interval].timer.function = est_timer;
> > -   add_timer(&elist[est->interval].timer);
> > +   if (!elist[idx].timer.function) {
> > +   INIT_LIST_HEAD(&elist[idx].list);
> > +   setup_timer(&elist[idx].timer, est_timer, est->interval);
> 
> s/est->interval/idx/ here and below.

Agree - will do by final submission.

> 
> > }
> > -   write_lock_bh(&est_lock);
> > -   elist[est->interval].list = est;
> > -   write_unlock_bh(&est_lock);
> > +   
> > +   if (list_empty(&elist[est->interval].list))
> > +   mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
> > +
> > +   list_add_rcu(&est->list, &elist[idx].list);
> > return 0;
> >  }
> >  
> > +static void __gen_kill_estimator(struct rcu_head *head)
> > +{
> > +   struct gen_estimator *e = container_of(head, struct gen_estimator, 
> > e_rcu);
> > +   kfree(e);
> > +}
> > +
> >  /**
> >   * gen_kill_estimator - remove a rate estimator
> >   * @bstats: basic statistics
> > @@ -199,26 +202,21 @@
> > struct gnet_stats_rate_est *rate_est)
> >  {
> ...
> > +   list_for_each_entry_safe(e, n, &elist[idx].list, list) {
> 
> IMHO, at least for readability list_for_each_entry_rcu() is better here.

Should not hurt - however, functionality-wise it is not necessary, as
the list is protected by rtnl_mutex from being altered - also, for
correctness, it should be list_for_each_safe_rcu(), as there is a
list_del_rcu() in the loop...

However, I decided not to use _rcu-based iteration, nor the
rcu_read_lock(), after going through the RCU documentation and a bunch of
examples in the kernel that iterate through the lists using non-_rcu macros
and do list_del_rcu() just fine.

For readability, the references to list_del_rcu() as well as call_rcu()
should, I believe, be indication enough. Please do correct me if I
am wrong here.

> 
> > +   if (e->rate_est != rate_est || e->bstats != bstats)
> > +   continue;
> >  
> > -   kfree(est);
> > -   killed++;
> > +   list_del_rcu(&e->list);
> > +   call_rcu(&e->e_rcu, __gen_kill_estimator);
> 
> I think a race is possible here: e.g. a timer could still be running
> after this function returns, trying to use *bstats,
> *rate_est and maybe even stats_lock after their destruction.
> 

That will not happen, because est_timer protects the loop with
rcu_read_lock() and the iteration is done using
list_for_each_entry_rcu(). The destruction callback is deferred and will
happen only after the outermost rcu_read_unlock() is called. Well - that
is at least what the documentation says...
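This argument rests on the core RCU guarantee: a call_rcu() callback is invoked only after every read-side critical section that existed when the callback was queued has completed. A minimal sketch of the pairing (names as in the patch):

```c
/* Deferred destructor: runs only after a grace period has elapsed. */
static void __gen_kill_estimator(struct rcu_head *head)
{
	struct gen_estimator *e = container_of(head, struct gen_estimator, e_rcu);

	/* No CPU can still be inside an rcu_read_lock() section that saw
	 * e on the list, so freeing it here is safe. */
	kfree(e);
}

/* Writer: unlink first, free later. */
list_del_rcu(&e->list);			 /* existing readers may still see e */
call_rcu(&e->e_rcu, __gen_kill_estimator); /* so the kfree() is deferred */
```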

> BTW, I think rcu_read_lock/unlock are recommended around e.g.
> RCU list traversals, even if the current implementation doesn't
> strictly need them now.

I am afraid I do not understand what it is that the current
implementation is not using right now... The whole concept and
implementation of RCU, as I understand it, depends on
rcu_read_lock/unlock to protect the read-side critical sections...

Please be kind to elaborate...

R.



Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-10 Thread Jarek Poplawski
On Mon, Jul 09, 2007 at 07:43:40PM +0300, Ranko Zivojnovic wrote:
> On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote:
> > Ranko Zivojnovic wrote:
> > > Patrick, I've taken liberty to try and implement this myself. Attached
> > > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch
> > > that is RCU lists based. Please be kind to review.
...

I've some doubts/suggestions too:

> --- a/net/core/gen_estimator.c2007-06-25 02:21:48.0 +0300
> +++ b/net/core/gen_estimator.c2007-07-09 19:08:06.801544963 +0300
...
> @@ -173,20 +172,24 @@
>   est->last_packets = bstats->packets;
>   est->avpps = rate_est->pps<<10;
>  
> - est->next = elist[est->interval].list;
> - if (est->next == NULL) {
> - init_timer(&elist[est->interval].timer);
> - elist[est->interval].timer.data = est->interval;
> - elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
> - elist[est->interval].timer.function = est_timer;
> - add_timer(&elist[est->interval].timer);
> + if (!elist[idx].timer.function) {
> + INIT_LIST_HEAD(&elist[idx].list);
> + setup_timer(&elist[idx].timer, est_timer, est->interval);

s/est->interval/idx/ here and below.

>   }
> - write_lock_bh(&est_lock);
> - elist[est->interval].list = est;
> - write_unlock_bh(&est_lock);
> + 
> + if (list_empty(&elist[est->interval].list))
> + mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
> +
> + list_add_rcu(&est->list, &elist[idx].list);
>   return 0;
>  }
>  
> +static void __gen_kill_estimator(struct rcu_head *head)
> +{
> + struct gen_estimator *e = container_of(head, struct gen_estimator, 
> e_rcu);
> + kfree(e);
> +}
> +
>  /**
>   * gen_kill_estimator - remove a rate estimator
>   * @bstats: basic statistics
> @@ -199,26 +202,21 @@
>   struct gnet_stats_rate_est *rate_est)
>  {
...
> + list_for_each_entry_safe(e, n, &elist[idx].list, list) {

IMHO, at least for readability list_for_each_entry_rcu() is better here.

> + if (e->rate_est != rate_est || e->bstats != bstats)
> + continue;
>  
> - kfree(est);
> - killed++;
> + list_del_rcu(&e->list);
> + call_rcu(&e->e_rcu, __gen_kill_estimator);

I think a race is possible here: e.g. a timer could still be running
after this function returns, trying to use *bstats,
*rate_est and maybe even stats_lock after their destruction.

BTW, I think rcu_read_lock/unlock are recommended around e.g.
RCU list traversals, even if the current implementation doesn't
strictly need them now.

Regards,
Jarek P.


Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-09 Thread Patrick McHardy
Ranko Zivojnovic wrote:
> On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote:
> 
>>You could also use synchronize_rcu(), estimator destruction is
>>not particulary performance critical.
> 
> 
> I've tried synchronize_rcu(), but it went kaboom with the below, thus
> call_rcu() must stay:
> 
> ---cut---
> BUG: sleeping function called from invalid context at kernel/sched.c:3928
> in_atomic():1, irqs_disabled():0


Right, I forgot that we're holding some locks.

> [...]
> All the changes above are now included in the attached patch. I've
> cleanly compiled it against 2.6.22 and I am now testing this version.


Looks good. Please resend with a Signed-off-by: line once you're done
testing so Dave can apply it (ideally with a new subject so it doesn't
get missed).



Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-09 Thread Ranko Zivojnovic
On Mon, 2007-07-09 at 15:52 +0200, Patrick McHardy wrote:
> Ranko Zivojnovic wrote:
> > Patrick, I've taken liberty to try and implement this myself. Attached
> > is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch
> > that is RCU lists based. Please be kind to review.
> 
> Thanks for taking care of this, I'm a bit behind.

Glad to be able to help.

> See below for comments ..
> 
> > --- a/net/core/gen_estimator.c  2007-06-25 02:21:48.0 +0300
> > +++ b/net/core/gen_estimator.c  2007-07-09 14:27:12.053336875 +0300
> > @@ -79,7 +79,7 @@
> >  
> >  struct gen_estimator
> >  {
> > -   struct gen_estimator*next;
> > +   struct list_headlist;
> > struct gnet_stats_basic *bstats;
> > struct gnet_stats_rate_est  *rate_est;
> > spinlock_t  *stats_lock;
> > @@ -89,26 +89,27 @@
> > u32 last_packets;
> > u32 avpps;
> > u32 avbps;
> > +   struct rcu_head e_rcu;
> 
> 
> You could also use synchronize_rcu(), estimator destruction is
> not particulary performance critical.

I've tried synchronize_rcu(), but it went kaboom with the below, thus
call_rcu() must stay:

---cut---
BUG: sleeping function called from invalid context at kernel/sched.c:3928
in_atomic():1, irqs_disabled():0
3 locks held by tc/2933:
 #0:  (rtnl_mutex){--..}, at: [] rtnetlink_rcv+0x18/0x42
 #1:  (&dev->queue_lock){-+..}, at: [] qdisc_lock_tree+0xe/0x1c
 #2:  (&dev->ingress_lock){-...}, at: [] tc_get_qdisc+0x192/0x1e9
 [] wait_for_completion+0x1a/0xb7
 [] __spin_lock_init+0x29/0x4b
 [] synchronize_rcu+0x2a/0x2f
 [] wakeme_after_rcu+0x0/0x8
 [] gen_kill_estimator+0x70/0x9b
 [] htb_destroy_class+0x27/0x12f [sch_htb]
 [] htb_destroy+0x34/0x70 [sch_htb]
 [] qdisc_destroy+0x44/0x69
 . 
 .
 .
---cut---
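That BUG is the expected outcome rather than an RCU problem: synchronize_rcu() sleeps until a grace period elapses, and the lockdep output above shows gen_kill_estimator() being reached with spinlocks held (in_atomic():1). The call chain, roughly sketched (not the literal code):

```c
/* rtnl_mutex held (a mutex, so sleeping would still be fine here) */
qdisc_lock_tree(dev);		/* takes dev->queue_lock and
				 * dev->ingress_lock: atomic from here on */
  qdisc_destroy(qdisc);
    htb_destroy(qdisc);
      htb_destroy_class(...);
        gen_kill_estimator(...);
          synchronize_rcu();	/* sleeps -> "sleeping function called from
				 * invalid context"; call_rcu() instead only
				 * queues a callback and never sleeps, which
				 * is why it must stay */
```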

> >  static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
> >  
> >  /* Estimator array lock */
> > -static DEFINE_RWLOCK(est_lock);
> > +static DEFINE_SPINLOCK(est_lock);
> 
> 
> The lock isn't needed anymore since we can rely on the rtnl
> for creation/destruction.

The 'est_lock' is now completely removed.

> >  /**
> > @@ -152,6 +153,7 @@
> >  {
> > struct gen_estimator *est;
> > struct gnet_estimator *parm = RTA_DATA(opt);
> > +   int idx;
> >  
> > if (RTA_PAYLOAD(opt) < sizeof(*parm))
> > return -EINVAL;
> > @@ -163,7 +165,8 @@
> > if (est == NULL)
> > return -ENOBUFS;
> >  
> > -   est->interval = parm->interval + 2;
> > +   INIT_LIST_HEAD(&est->list);
> 
> 
> Initializing the members' list_head isn't necessary.
> 
Removed.

All the changes above are now included in the attached patch. I've
cleanly compiled it against 2.6.22 and I am now testing this version.

R.
--- a/net/core/gen_estimator.c	2007-06-25 02:21:48.0 +0300
+++ b/net/core/gen_estimator.c	2007-07-09 19:08:06.801544963 +0300
@@ -79,7 +79,7 @@
 
 struct gen_estimator
 {
-	struct gen_estimator	*next;
+	struct list_head	list;
 	struct gnet_stats_basic	*bstats;
 	struct gnet_stats_rate_est	*rate_est;
 	spinlock_t		*stats_lock;
@@ -89,26 +89,24 @@
 	u32			last_packets;
 	u32			avpps;
 	u32			avbps;
+	struct rcu_head		e_rcu;
 };
 
 struct gen_estimator_head
 {
 	struct timer_list	timer;
-	struct gen_estimator	*list;
+	struct list_head	list;
 };
 
 static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
 
-/* Estimator array lock */
-static DEFINE_RWLOCK(est_lock);
-
 static void est_timer(unsigned long arg)
 {
 	int idx = (int)arg;
 	struct gen_estimator *e;
 
-	read_lock(&est_lock);
-	for (e = elist[idx].list; e; e = e->next) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(e, &elist[idx].list, list) {
 		u64 nbytes;
 		u32 npackets;
 		u32 rate;
@@ -127,9 +125,9 @@
 		e->rate_est->pps = (e->avpps+0x1FF)>>10;
 		spin_unlock(e->stats_lock);
 	}
-
-	mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
...
-	est->interval = parm->interval + 2;
+	est->interval = idx = parm->interval + 2;
 	est->bstats = bstats;
 	est->rate_est = rate_est;
 	est->stats_lock = stats_lock;
@@ -173,20 +172,24 @@
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
-	est->next = elist[est->interval].list;
-	if (est->next == NULL) {
-		init_timer(&elist[est->interval].timer);
-		elist[est->interval].timer.data = est->interval;
-		elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
-		elist[est->interval].timer.function = est_timer;
-		add_timer(&elist[est->interval].timer);
+	if (!elist[idx].timer.function) {
+		INIT_LIST_HEAD(&elist[idx].list);
+		setup_timer(&elist[idx].timer, est_timer, est->interval);
 	}
-	write_lock_bh(&est_lock);
-	elist[est->interval].list = est;
-	write_unlock_bh(&est_lock);
+		
+	if (list_empty(&elist[est->interval].list))
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+
+	list_add_rcu(&est->list, &elist[idx].list);
...

Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-09 Thread Patrick McHardy
Ranko Zivojnovic wrote:
> On Sat, 2007-07-07 at 17:10 +0200, Patrick McHardy wrote:
> 
>>On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:
>>
>>>Maybe the appropriate way to fix this would be to call gen_kill_estimator,
>>>with the appropriate lock order, before the call to qdisc_destroy, so
>>>when dev->queue_lock is taken for qdisc_destroy - the structure is
>>>already off the list.
>>
>>Probably easier to just kill est_lock and use rcu lists.
>>I'm currently travelling, I'll look into it tomorrow.
> 
> 
> Patrick, I've taken liberty to try and implement this myself. Attached
> is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch
> that is RCU lists based. Please be kind to review.


Thanks for taking care of this, I'm a bit behind.
See below for comments ..


> --- a/net/core/gen_estimator.c2007-06-25 02:21:48.0 +0300
> +++ b/net/core/gen_estimator.c2007-07-09 14:27:12.053336875 +0300
> @@ -79,7 +79,7 @@
>  
>  struct gen_estimator
>  {
> - struct gen_estimator*next;
> + struct list_headlist;
>   struct gnet_stats_basic *bstats;
>   struct gnet_stats_rate_est  *rate_est;
>   spinlock_t  *stats_lock;
> @@ -89,26 +89,27 @@
>   u32 last_packets;
>   u32 avpps;
>   u32 avbps;
> + struct rcu_head e_rcu;


You could also use synchronize_rcu(), estimator destruction is
not particulary performance critical.

>  static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
>  
>  /* Estimator array lock */
> -static DEFINE_RWLOCK(est_lock);
> +static DEFINE_SPINLOCK(est_lock);


The lock isn't needed anymore since we can rely on the rtnl
for creation/destruction.

>  /**
> @@ -152,6 +153,7 @@
>  {
>   struct gen_estimator *est;
>   struct gnet_estimator *parm = RTA_DATA(opt);
> + int idx;
>  
>   if (RTA_PAYLOAD(opt) < sizeof(*parm))
>   return -EINVAL;
> @@ -163,7 +165,8 @@
>   if (est == NULL)
>   return -ENOBUFS;
>  
> - est->interval = parm->interval + 2;
> + INIT_LIST_HEAD(&est->list);


Initializing the members' list_head isn't necessary.



Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-09 Thread Ranko Zivojnovic
On Sat, 2007-07-07 at 17:10 +0200, Patrick McHardy wrote:
> On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:
> > Maybe the appropriate way to fix this would be to call gen_kill_estimator,
> > with the appropriate lock order, before the call to qdisc_destroy, so
> > when dev->queue_lock is taken for qdisc_destroy - the structure is
> > already off the list.
> 
> Probably easier to just kill est_lock and use rcu lists.
> I'm currently travelling, I'll look into it tomorrow.

Patrick, I've taken the liberty to try and implement this myself. Attached
is the whole new gen_estimator-fix-locking-and-timer-related-bugs.patch
that is RCU lists based. Please be kind to review.

I've just compiled it against 2.6.22 together with the sch_htb patch and I
am currently testing it. I will let it run until tomorrow or until it
crashes.

R.
--- a/net/core/gen_estimator.c	2007-06-25 02:21:48.0 +0300
+++ b/net/core/gen_estimator.c	2007-07-09 14:27:12.053336875 +0300
@@ -79,7 +79,7 @@
 
 struct gen_estimator
 {
-	struct gen_estimator	*next;
+	struct list_head	list;
 	struct gnet_stats_basic	*bstats;
 	struct gnet_stats_rate_est	*rate_est;
 	spinlock_t		*stats_lock;
@@ -89,26 +89,27 @@
 	u32			last_packets;
 	u32			avpps;
 	u32			avbps;
+	struct rcu_head		e_rcu;
 };
 
 struct gen_estimator_head
 {
 	struct timer_list	timer;
-	struct gen_estimator	*list;
+	struct list_head	list;
 };
 
 static struct gen_estimator_head elist[EST_MAX_INTERVAL+1];
 
 /* Estimator array lock */
-static DEFINE_RWLOCK(est_lock);
+static DEFINE_SPINLOCK(est_lock);
 
 static void est_timer(unsigned long arg)
 {
 	int idx = (int)arg;
 	struct gen_estimator *e;
 
-	read_lock(&est_lock);
-	for (e = elist[idx].list; e; e = e->next) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(e, &elist[idx].list, list) {
 		u64 nbytes;
 		u32 npackets;
 		u32 rate;
@@ -127,9 +128,9 @@
 		e->rate_est->pps = (e->avpps+0x1FF)>>10;
 		spin_unlock(e->stats_lock);
 	}
-
-	mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
...
-	est->interval = parm->interval + 2;
+	INIT_LIST_HEAD(&est->list);
+	est->interval = idx = parm->interval + 2;
 	est->bstats = bstats;
 	est->rate_est = rate_est;
 	est->stats_lock = stats_lock;
@@ -173,20 +176,26 @@
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
-	est->next = elist[est->interval].list;
-	if (est->next == NULL) {
-		init_timer(&elist[est->interval].timer);
-		elist[est->interval].timer.data = est->interval;
-		elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
-		elist[est->interval].timer.function = est_timer;
-		add_timer(&elist[est->interval].timer);
+	spin_lock_bh(&est_lock);
+	if (!elist[idx].timer.function) {
+		INIT_LIST_HEAD(&elist[idx].list);
+		setup_timer(&elist[idx].timer, est_timer, est->interval);
 	}
-	write_lock_bh(&est_lock);
-	elist[est->interval].list = est;
-	write_unlock_bh(&est_lock);
+		
+	if (list_empty(&elist[est->interval].list))
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
...
-			if (est->rate_est != rate_est || est->bstats != bstats) {
-				pest = &est->next;
-				continue;
-			}
 
-			write_lock_bh(&est_lock);
-			*pest = est->next;
-			write_unlock_bh(&est_lock);
+		/* Skip non initialized indexes */
+		if (!elist[idx].timer.function)
+			continue;
+
+		list_for_each_entry_safe(e, n, &elist[idx].list, list) {
+			if (e->rate_est != rate_est || e->bstats != bstats)
+				continue;
 
-			kfree(est);
-			killed++;
+			spin_lock_bh(&est_lock);
+			list_del_rcu(&e->list);
+			spin_unlock_bh(&est_lock);
+			call_rcu(&e->e_rcu, __gen_kill_estimator);
 		}
-		if (killed && elist[idx].list == NULL)
-			del_timer(&elist[idx].timer);
 	}
 }
 


Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-09 Thread Jarek Poplawski
On Fri, Jul 06, 2007 at 04:16:18PM +0300, Ranko Zivojnovic wrote:
> On Fri, 2007-07-06 at 14:47 +0200, Jarek Poplawski wrote:
> > On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote:
> > > On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> > ...
> > > > In order to get that parameter out of the way - I will make the same
> > > > test on a real machine.
> > 
> > BTW, maybe it would be better to try with something more stable than -mm.
> 
> That would be the next thing I was to try. Suggestion?

Of course, -mm needs testing too, and it's probably easier
to find bugs there, but it's hard to check fixes: you never
know whether new bugs are caused by the fix or were only
uncovered by it. So, the last stable or -rc versions (at least
with higher numbers) should make things faster.

I've looked a bit at this first log, and it's very strange,
especially the stats of cpu#0. Maybe it's really about vmware,
or maybe there is something more, but it doesn't seem this
stats_lock deadlock could have made all this mess. Anyway, it
would be interesting to repeat such a test with -mm after
checking e.g. with 2.6.22.

Thanks,
Jarek P.

PS: Sorry for late responding.


Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-09 Thread Jarek Poplawski
On Sat, Jul 07, 2007 at 05:10:54PM +0200, Patrick McHardy wrote:
> On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:
> 
> >>On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
> >>>There is at least one ABBA deadlock, est_timer does:
> >>>
> >>>read_lock(&est_lock)
> >>>spin_lock(e->stats_lock) (which is dev->queue_lock)
> >>>
> >>>and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> >>>calls htb_destroy_class, then gen_kill_estimator and this
> >>>write_locks est_lock.
> >>>
> >>>I can't see the problem above though, the qdisc_run path only takes
> >>>dev->queue_lock. Please enable lockdep and post the output if any.
> >>
> >
> >I've got both code paths this time. It shows exactly the ABBA deadlock
> >you describe above. The details are below.
> 
> 
> Thanks. I'm still wondering what's causing the other lockup
> you're seeing.
> 
> >Maybe the appropriate way to fix this would be to call gen_kill_estimator,
> >with the appropriate lock order, before the call to qdisc_destroy, so
> >when dev->queue_lock is taken for qdisc_destroy - the structure is
> >already off the list.
> 
> 
> Probably easier to just kill est_lock and use rcu lists.
> I'm currently travelling, I'll look into it tomorrow.
> 

Congratulations to both of you!

Ranko, it looks like you have some licence to kill!
(I hope bugs only...)

Cheers,
Jarek P.


Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-07 Thread Patrick McHardy

On Sat, 7 Jul 2007, Ranko Zivojnovic wrote:


On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:

There is at least one ABBA deadlock, est_timer does:

read_lock(&est_lock)
spin_lock(e->stats_lock) (which is dev->queue_lock)

and qdisc_destroy calls htb_destroy under dev->queue_lock, which
calls htb_destroy_class, then gen_kill_estimator and this
write_locks est_lock.

I can't see the problem above though, the qdisc_run path only takes
dev->queue_lock. Please enable lockdep and post the output if any.




I've got both code paths this time. It shows exactly the ABBA deadlock
you describe above. The details are below.



Thanks. I'm still wondering what's causing the other lockup
you're seeing.


Maybe the appropriate way to fix this would be to call gen_kill_estimator,
with the appropriate lock order, before the call to qdisc_destroy, so
when dev->queue_lock is taken for qdisc_destroy - the structure is
already off the list.



Probably easier to just kill est_lock and use rcu lists.
I'm currently travelling, I'll look into it tomorrow.
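[Editor's note] The rcu-list conversion suggested here would look roughly like the kernel-style pseudocode below. This is a sketch only, not the actual patch: the list and field names (est_list, e_rcu) and the trimmed signatures are assumptions made for illustration. The read side (est_timer) traverses under rcu_read_lock() instead of read_lock(&est_lock), while the write side unlinks with list_del_rcu() and defers the kfree() with call_rcu() — removing est_lock entirely, and with it the ABBA cycle against dev->queue_lock.

```c
/* Sketch (kernel-style pseudocode, hypothetical names).
 * Read side: walk the estimator list under RCU, so this path
 * never holds a lock that the destroy path also needs. */
static void est_timer(unsigned long arg)
{
	struct gen_estimator *e;

	rcu_read_lock();
	list_for_each_entry_rcu(e, &est_list, list) {
		spin_lock(e->stats_lock);	/* dev->queue_lock */
		/* ... update the rate estimate ... */
		spin_unlock(e->stats_lock);
	}
	rcu_read_unlock();
}

/* Write side: unlink without est_lock; the kfree() is deferred
 * past a grace period, so a concurrent est_timer can safely
 * finish with the entry it is looking at. */
static void __gen_kill_estimator(struct rcu_head *head)
{
	struct gen_estimator *e =
		container_of(head, struct gen_estimator, e_rcu);
	kfree(e);
}

void gen_kill_estimator(/* ... lookup args ... */)
{
	/* ... find the matching estimator e ... */
	list_del_rcu(&e->list);
	call_rcu(&e->e_rcu, __gen_kill_estimator);
}
```

This mirrors the list_del_rcu()/call_rcu() approach the thread converges on.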


Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-07 Thread Ranko Zivojnovic
On Fri, 2007-07-06 at 17:55 +0300, Ranko Zivojnovic wrote:
> On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
> > Ranko Zivojnovic wrote:
> > > BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
> > >  [] _raw_spin_lock+0x108/0x13c
> > >  [] __qdisc_run+0x97/0x1b0
> > >  [] qdisc_watchdog+0x19/0x58
> > >  [] __lock_text_start+0x37/0x43
> > >  [] qdisc_watchdog+0x56/0x58
> > >  [] qdisc_watchdog+0x0/0x58
> > >  [] run_hrtimer_softirq+0x58/0xb5
> > >  [...]
> > 
> > > BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
> > >  [] _raw_spin_lock+0x108/0x13c
> > >  [] est_timer+0x53/0x148
> > >  [] run_timer_softirq+0x30/0x184
> > >  [] run_timer_softirq+0x121/0x184
> > >  [] __do_softirq+0x66/0xf3
> > >  [] est_timer+0x0/0x148
> > >  [...]
> > 
> > 
> > There is at least one ABBA deadlock, est_timer does:
> > 
> > read_lock(&est_lock)
> > spin_lock(e->stats_lock) (which is dev->queue_lock)
> > 
> > and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> > calls htb_destroy_class, then gen_kill_estimator and this
> > write_locks est_lock.
> > 
> > I can't see the problem above though, the qdisc_run path only takes
> > dev->queue_lock. Please enable lockdep and post the output if any.
> 

I've got both code paths this time. It shows exactly the ABBA deadlock
you describe above. The details are below.

Maybe the appropriate way to fix this would be to call gen_kill_estimator,
with the appropriate lock order, before the call to qdisc_destroy, so
when dev->queue_lock is taken for qdisc_destroy - the structure is
already off the list.

-LOG
BUG: spinlock lockup on CPU#2, ping/27868, c03eff80
 [] _raw_spin_lock+0x108/0x13c
 [] est_timer+0x53/0x148
 [] run_timer_softirq+0x121/0x184
 [] __do_softirq+0x66/0xf3
 [] est_timer+0x0/0x148
 [] __do_softirq+0x7e/0xf3
 [] do_softirq+0x56/0x58
 [] smp_apic_timer_interrupt+0x5a/0x85
 [] apic_timer_interrupt+0x29/0x38
 [] apic_timer_interrupt+0x33/0x38
 [] local_bh_enable+0x94/0x13b
 [] dev_queue_xmit+0x95/0x2d5
 [] ip_output+0x193/0x32a
 [] ip_finish_output+0x0/0x29e
 [] ip_push_pending_frames+0x27f/0x46b
 [] dst_output+0x0/0x7
 [] raw_sendmsg+0x70b/0x7f2
 [] inet_sendmsg+0x2b/0x49
 [] sock_sendmsg+0xe2/0xfd
 [] autoremove_wake_function+0x0/0x37
 [] autoremove_wake_function+0x0/0x37
 [] enqueue_entity+0x139/0x4f8
 [] copy_from_user+0x2d/0x59
 [] sys_sendmsg+0x12d/0x243
 [] __lock_acquire+0x825/0x1002
 [] __lock_acquire+0x825/0x1002
 [] scheduler_tick+0x1a7/0x20e
 [] _spin_unlock_irq+0x20/0x23
 [] trace_hardirqs_on+0x73/0x147
 [] run_timer_softirq+0x30/0x184
 [] _spin_unlock_irq+0x20/0x23
 [] sys_socketcall+0x24f/0x271
 [] trace_hardirqs_on+0xab/0x147
 [] copy_to_user+0x2f/0x49
 [] sysenter_past_esp+0x8f/0x99
 [] sysenter_past_esp+0x5f/0x99
 ===

And here is the ABBA deadlock:
---cut---
SysRq : Show Locks Held

Showing all locks held in the system:
snip
3 locks held by ping/27868:
 #0:  (sk_lock-AF_INET){--..}, at: [] raw_sendmsg+0x676/0x7f2
 #1:  (est_lock#2){-.-+}, at: [] est_timer+0x15/0x148
 #2:  (&dev->queue_lock){-+..}, at: [] est_timer+0x53/0x148
snip
8 locks held by tc/2278:
 #0:  (rtnl_mutex){--..}, at: [] rtnetlink_rcv+0x18/0x42
 #1:  (&dev->queue_lock){-+..}, at: [] qdisc_lock_tree+0xe/0x1c
 #2:  (&dev->ingress_lock){-...}, at: [] tc_get_qdisc+0x192/0x1e9
 #3:  (est_lock#2){-.-+}, at: [] gen_kill_estimator+0x58/0x6f
 #4:  (&irq_lists[i].lock){++..}, at: [] serial8250_interrupt+0x14/0x132
 #5:  (&port_lock_key){++..}, at: [] serial8250_interrupt+0x62/0x132
 #6:  (sysrq_key_table_lock){+...}, at: [] __handle_sysrq+0x17/0x115
 #7:  (tasklist_lock){..-?}, at: [] debug_show_all_locks+0x2e/0x15e
snip
---cut---

As well as 'tc' stack:
---cut---
SysRq : Show Regs

Pid: 2278, comm:   tc
EIP: 0060:[] CPU: 0
EIP is at __write_lock_failed+0xf/0x1c
 EFLAGS: 0287   Not tainted  (2.6.22-rc6-mm1.SNET.Thors.htbpatch.2.lockdebug #1)
EAX: c03f5968 EBX: c03f5968 ECX:  EDX: 0004
ESI: c9852840 EDI: c85eae24 EBP: c06aaa60 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 008ba828 CR3: 11841000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] _raw_write_lock+0x32/0x6e
 [] gen_kill_estimator+0x58/0x6f
 [] htb_destroy_class+0x27/0x12f [sch_htb]
 [] htb_destroy+0x34/0x70 [sch_htb]
 [] qdisc_destroy+0x52/0x8d
 [] trace_hardirqs_on+0x73/0x147
 [] htb_destroy_class+0xd2/0x12f [sch_htb]
 [] htb_destroy+0x34/0x70 [sch_htb]
 [] qdisc_destroy+0x52/0x8d
 [] tc_get_qdisc+0x19b/0x1e9
 [] tc_get_qdisc+0x0/0x1e9
 [] rtnetlink_rcv_msg+0x1c2/0x1f5
 [] netlink_run_queue+0x96/0xfd
 [] rtnetlink_rcv_msg+0x0/0x1f5
 [] rtnetlink_rcv+0x26/0x42
 [] netlink_data_ready+0x12/0x54
 [] netlink_sendskb+0x1f/0x53
 [] netlink_sendmsg+0x1f5/0x2d4
 [] sock_sendmsg+0xe2/0xfd
 [] autoremove_wake_function+0x0/0x37
 [] __lock_acquire+0x825/0x1002
 [] sock_sendmsg+0xe2/0xfd
 [] copy_from_user+0x2d/0x59
 [] sys_sendmsg+0x12d/0x243
 [] __do_fault+0x12b/0x38b
 [] __do_fau

Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-06 Thread Ranko Zivojnovic
On Fri, 2007-07-06 at 16:21 +0200, Patrick McHardy wrote:
> Ranko Zivojnovic wrote:
> > BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
> >  [] _raw_spin_lock+0x108/0x13c
> >  [] __qdisc_run+0x97/0x1b0
> >  [] qdisc_watchdog+0x19/0x58
> >  [] __lock_text_start+0x37/0x43
> >  [] qdisc_watchdog+0x56/0x58
> >  [] qdisc_watchdog+0x0/0x58
> >  [] run_hrtimer_softirq+0x58/0xb5
> >  [...]
> 
> > BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
> >  [] _raw_spin_lock+0x108/0x13c
> >  [] est_timer+0x53/0x148
> >  [] run_timer_softirq+0x30/0x184
> >  [] run_timer_softirq+0x121/0x184
> >  [] __do_softirq+0x66/0xf3
> >  [] est_timer+0x0/0x148
> >  [...]
> 
> 
> There is at least one ABBA deadlock, est_timer does:
> 
> read_lock(&est_lock)
> spin_lock(e->stats_lock) (which is dev->queue_lock)
> 
> and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> calls htb_destroy_class, then gen_kill_estimator and this
> write_locks est_lock.
> 
> I can't see the problem above though, the qdisc_run path only takes
> dev->queue_lock. Please enable lockdep and post the output if any.

That was with lockdep already enabled:
$ zgrep LOCKDEP /proc/config.gz
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
# CONFIG_DEBUG_LOCKDEP is not set

also with:
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCK_ALLOC=y

that Jarek suggested, and including:
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_DEBUG_SPINLOCK=y

I will give it another spin as it is, just to see if it will show a
different path - and then I will try 2.6.21.5 with these two patches.

R.



Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-06 Thread Patrick McHardy
Ranko Zivojnovic wrote:
> BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
>  [] _raw_spin_lock+0x108/0x13c
>  [] __qdisc_run+0x97/0x1b0
>  [] qdisc_watchdog+0x19/0x58
>  [] __lock_text_start+0x37/0x43
>  [] qdisc_watchdog+0x56/0x58
>  [] qdisc_watchdog+0x0/0x58
>  [] run_hrtimer_softirq+0x58/0xb5
>  [...]

> BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
>  [] _raw_spin_lock+0x108/0x13c
>  [] est_timer+0x53/0x148
>  [] run_timer_softirq+0x30/0x184
>  [] run_timer_softirq+0x121/0x184
>  [] __do_softirq+0x66/0xf3
>  [] est_timer+0x0/0x148
>  [...]


There is at least one ABBA deadlock, est_timer does:

read_lock(&est_lock)
spin_lock(e->stats_lock) (which is dev->queue_lock)

and qdisc_destroy calls htb_destroy under dev->queue_lock, which
calls htb_destroy_class, then gen_kill_estimator and this
write_locks est_lock.

I can't see the problem above though, the qdisc_run path only takes
dev->queue_lock. Please enable lockdep and post the output if any.


Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-06 Thread Ranko Zivojnovic
On Fri, 2007-07-06 at 15:27 +0200, Patrick McHardy wrote:
> Ranko Zivojnovic wrote:
> > Managed to get stuck on a normal machine as well (2.6.22-rc6-mm1 +
> > sch_htb patch) ... here's the log:
> > 
> 
> Can you post the script you're using to reproduce this please?

Attached.

R.


tc-crash.sh
Description: application/shellscript


Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-06 Thread Patrick McHardy
Ranko Zivojnovic wrote:
> Managed to get stuck on a normal machine as well (2.6.22-rc6-mm1 +
> sch_htb patch) ... here's the log:
> 

Can you post the script you're using to reproduce this please?



Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-06 Thread Ranko Zivojnovic
On Fri, 2007-07-06 at 14:47 +0200, Jarek Poplawski wrote:
> On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote:
> > On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> ...
> > > In order to get that parameter out of the way - I will make the same
> > > test on a real machine.
> 
> BTW, maybe it would be better to try with something more stable than -mm.

That would be the next thing I was going to try. Suggestion?

> There is also one bug invoked by sysreq, but I hope you've started to use
> sysreq only after the lockup began.

Yes - it was after the lockup.

R.



Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-06 Thread Ranko Zivojnovic
On Fri, 2007-07-06 at 08:45 +0200, Jarek Poplawski wrote: 
> On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> > On Thu, 2007-07-05 at 18:59 +0300, Ranko Zivojnovic wrote:
> > > On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > > > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > > > running a test script that always managed to reproduce the problem
> > > > within half hour - so far it looks good, but I will leave it hammering
> > > > until tomorrow and will let you know.
> > > > 
> > > 
> > > Ahm...it managed to get stuck:
> > > 
> > > Relevant piece of the process table ('tc' stays stuck):
> > >  9608 pts/0S+ 0:00 /bin/bash ./tc-crash.sh
> > >  9609 pts/0D+ 0:00 tc qdisc del dev lo root
> > > 
> > 
> > Something really weird happened here...
> > 
> > I've left it the way it was (stuck) over night and this morning the
> > process was terminated.
> > 
> > It looks like some timer was out of whack.
> > 
> > I'm not sure if it is possibly relevant - but the test I performed was
> > done in VMWare.
> 
> Yes, this could be relevant, so it's good to know.
> 
> > 
> > In order to get that parameter out of the way - I will make the same
> > test on a real machine.
> 
> Fine! But, unless you have something to hide, try to reply
> to these messages which went through netdev, or cc netdev
> at least, because maybe more people would be interested
> or even could help to solve this faster.
> 

Nope - nothing to hide :).

Managed to get stuck on a normal machine as well (2.6.22-rc6-mm1 +
sch_htb patch) ... here's the log:

---CUT--- 
-- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 14:46:17 --
u32 classifier
Performance counters on
OLD policer on
input device check on
-- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 14:48:13 --
-- 0:loop-telnet.sh -- time-stamp -- Jul/06/07 15:38:26 --
BUG: spinlock lockup on CPU#0, swapper/0, c03eff80
 [] _raw_spin_lock+0x108/0x13c
 [] __qdisc_run+0x97/0x1b0
 [] qdisc_watchdog+0x19/0x58
 [] __lock_text_start+0x37/0x43
 [] qdisc_watchdog+0x56/0x58
 [] qdisc_watchdog+0x0/0x58
 [] run_hrtimer_softirq+0x58/0xb5
 [] __do_softirq+0x7e/0xf3
 [] do_softirq+0x56/0x58
 [] smp_apic_timer_interrupt+0x5a/0x85
 [] apic_timer_interrupt+0x29/0x38
 [] apic_timer_interrupt+0x33/0x38
 [] mwait_idle_with_hints+0x3b/0x3f
 [] cpu_idle+0x5f/0x74
 [] start_kernel+0x318/0x3ae
 [] unknown_bootoption+0x0/0x258
 ===
BUG: spinlock lockup on CPU#1, swapper/0, c03eff80
 [] _raw_spin_lock+0x108/0x13c
 [] est_timer+0x53/0x148
 [] run_timer_softirq+0x30/0x184
 [] run_timer_softirq+0x121/0x184
 [] __do_softirq+0x66/0xf3
 [] est_timer+0x0/0x148
 [] __do_softirq+0x7e/0xf3
 [] do_softirq+0x56/0x58
 [] smp_apic_timer_interrupt+0x5a/0x85
 [] apic_timer_interrupt+0x29/0x38
 [] apic_timer_interrupt+0x33/0x38
 [] mwait_idle_with_hints+0x3b/0x3f
 [] cpu_idle+0x5f/0x74
 ===

telnet> send break
SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm Full kIll saK showMem 
Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount 
shoW-blocked-tasks

telnet> send break
SysRq : Show Regs

Pid: 0, comm:  swapper
EIP: 0060:[] CPU: 1
EIP is at _raw_spin_lock+0xa3/0x13c
 EFLAGS: 0246   Not tainted  (2.6.22-rc6-mm1.SNET.Thors.htbpatch.2.lockdebug #1)
EAX:  EBX: c03eff80 ECX: e4f7bf60 EDX: 98b928f8
ESI: 260e7970 EDI:  EBP: beb75188 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 080f7578 CR3: 03651000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [] est_timer+0x53/0x148
 [] run_timer_softirq+0x30/0x184
 [] run_timer_softirq+0x121/0x184
 [] __do_softirq+0x66/0xf3
 [] est_timer+0x0/0x148
 [] __do_softirq+0x7e/0xf3
 [] do_softirq+0x56/0x58
 [] smp_apic_timer_interrupt+0x5a/0x85
 [] apic_timer_interrupt+0x29/0x38
 [] apic_timer_interrupt+0x33/0x38
 [] mwait_idle_with_hints+0x3b/0x3f
 [] cpu_idle+0x5f/0x74
 ===

telnet> send break
SysRq : Show Blocked State

 free sibling
  task PC stack   pid father child younger older
kjournald D 0343 0   518  2 (L-TLB)
   c2361e8c 0092 090e4f4b 0343 c2361e74  c2baf920 c236
   0002 c2baf920 c21343f0 c21345a8 c1d24080 0002 c040ed80 c2361ec4
   c4289280   003252cc c02feb97 c2361ecc  
Call Trace:
 [] _spin_unlock_irqrestore+0x34/0x39
 [] io_schedule+0x1d/0x27
 [] sync_buffer+0x30/0x35
 [] __wait_on_bit+0x42/0x5e
 [] sync_buffer+0x0/0x35
 [] sync_buffer+0x0/0x35
 [] out_of_line_wait_on_bit+0x63/0x76
 [] wake_bit_function+0x0/0x3c
 [] __wait_on_buffer+0x24/0x29
 [] sync_dirty_buffer+0x9c/0xe6
 [] journal_get_descriptor_buffer+0x88/0x9a [jbd]
 [] journal_commit_transaction+0xf2a/0x111f [jbd]
 [] _spin_unlock_irq+0x20/0x23
 [] kjournald+0xac/0x1f4 [jbd]
 [] autoremove_wake_function+0x0/0x37
 [] kjournald+0x0/0x1f4 [jbd]
 [] kthread+0x34/0

Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-06 Thread Jarek Poplawski
On Fri, Jul 06, 2007 at 08:45:23AM +0200, Jarek Poplawski wrote:
> On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
...
> > In order to get that parameter out of the way - I will make the same
> > test on a real machine.

BTW, maybe it would be better to try with something more stable than -mm.
There is also one bug invoked by sysreq, but I hope you've started to use
sysreq only after the lockup began.

Jarek P.


Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-05 Thread Jarek Poplawski
On Fri, Jul 06, 2007 at 09:08:43AM +0300, Ranko Zivojnovic wrote:
> On Thu, 2007-07-05 at 18:59 +0300, Ranko Zivojnovic wrote:
> > On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > > running a test script that always managed to reproduce the problem
> > > within half hour - so far it looks good, but I will leave it hammering
> > > until tomorrow and will let you know.
> > > 
> > 
> > Ahm...it managed to get stuck:
> > 
> > Relevant piece of the process table ('tc' stays stuck):
> >  9608 pts/0S+ 0:00 /bin/bash ./tc-crash.sh
> >  9609 pts/0D+ 0:00 tc qdisc del dev lo root
> > 
> 
> Something really weird happened here...
> 
> I've left it the way it was (stuck) over night and this morning the
> process was terminated.
> 
> It looks like some timer was out of whack.
> 
> I'm not sure if it is possibly relevant - but the test I performed was
> done in VMWare.

Yes, this could be relevant, so it's good to know.

> 
> In order to get that parameter out of the way - I will make the same
> test on a real machine.

Fine! But, unless you have something to hide, try to reply
to these messages which went through netdev, or cc netdev
at least, because maybe more people would be interested
or even could help to solve this faster.

Cheers,
Jarek P.


Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-05 Thread Jarek Poplawski
On Fri, Jul 06, 2007 at 08:14:20AM +0200, Jarek Poplawski wrote:
...
> This new lockup bug you have just found needs some time
> to figure out. BTW, I wonder if you had lockdep on
> (CONFIG_PROVE_LOCKING or CONFIG_LOCK_ALLOC)?

Should be:
(CONFIG_PROVE_LOCKING or CONFIG_DEBUG_LOCK_ALLOC)?

Jarek P.


Fwd: Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-05 Thread Jarek Poplawski
- Forwarded message from Ranko Zivojnovic <[EMAIL PROTECTED]> -

> Date: Thu, 05 Jul 2007 18:59:25 +0300
> From: Ranko Zivojnovic <[EMAIL PROTECTED]>
> Subject: Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added
>   to -mm tree
> To: Jarek Poplawski <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
> X-Original-To: info
> Delivered-To: [EMAIL PROTECTED]
> In-Reply-To: <[EMAIL PROTECTED]>
> Organization: SpiderNet Services Public Ltd.
> X-Mailer: Evolution 2.8.3 (2.8.3-2.fc6) 
> X-o2-Trust: 2, 43
> 
> On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > running a test script that always managed to reproduce the problem
> > within half hour - so far it looks good, but I will leave it hammering
> > until tomorrow and will let you know.
> > 
> 
> Ahm...it managed to get stuck:
> 
> Relevant piece of the process table ('tc' stays stuck):
>  9608 pts/0S+ 0:00 /bin/bash ./tc-crash.sh
>  9609 pts/0D+ 0:00 tc qdisc del dev lo root
> 
> There was no soft lockup kernel warning however sysreq 'w' & 'q' dump
> the following:
> 
> ---cut---
> SysRq : Show Blocked State
> 
>  free sibling
>   task PC stack   pid father child younger older
> kjournald D 0531 0   337  2 (L-TLB)
>c3083e9c 0046 c4ef3383 0531 c3083e84  c292f7fc c3082000
>  c2903388 c29031d0 c2903388 c150a000  c03b5d80
>c29d9480 c2949080 c03ff080 c03ff080 c289e7fc  c03ff080 0054b8e6
> Call Trace:
>  [] io_schedule+0x1e/0x28
>  [] sync_buffer+0x34/0x37
>  [] __wait_on_bit+0x42/0x5e
>  [] sync_buffer+0x0/0x37
>  [] sync_buffer+0x0/0x37
>  [] out_of_line_wait_on_bit+0x63/0x6b
>  [] wake_bit_function+0x0/0x3c
>  [] __wait_on_buffer+0x24/0x29
>  [] journal_commit_transaction+0x63d/0x10df [jbd]
>  [] _spin_unlock_irq+0x5/0x7
>  [] schedule+0x323/0x867
>  [] _spin_lock_irqsave+0x9/0xd
>  [] lock_timer_base+0x19/0x35
>  [] try_to_del_timer_sync+0x47/0x4f
>  [] kjournald+0xac/0x1f4 [jbd]
>  [] autoremove_wake_function+0x0/0x37
>  [] kjournald+0x0/0x1f4 [jbd]
>  [] kthread+0x34/0x56
>  [] kthread+0x0/0x56
>  [] kernel_thread_helper+0x7/0x14
>  ===
> syslogd   D 0531 0  1414  1 (NOTLB)
>c3045e84 00200082 c4e83c2f 0531 c1506a80 0002  c3044000
>1d5003ae 0552 c29de888 c29de6d0 c29de888 c150a000  0003
>00200082 c2949080 c03ff080 c03ff080 c2ac46b0  c03ff080 0054b8e6
> Call Trace:
>  [] log_wait_commit+0xaf/0x116 [jbd]
>  [] autoremove_wake_function+0x0/0x37
>  [] journal_stop+0x158/0x1f7 [jbd]
>  [] __writeback_single_inode+0x267/0x329
>  [] generic_writepages+0x20/0x29
>  [] sync_inode+0x19/0x2a
>  [] ext3_sync_file+0xbc/0xc8 [ext3]
>  [] do_fsync+0x55/0x8a
>  [] __do_fsync+0x1d/0x2b
>  [] sysenter_past_esp+0x5f/0x85
>  ===
> irqbalanceD 040E 0  1428  1 (NOTLB)
>c304becc 00200082 f5b6412a 040e c304beb4  c34e c304a000
>0001 c2a754c4 c22a2848 c22a2690 c22a2848 c1513000 0001 c03b5d80
>000a c286e480 c03ff080 c03ff080 c03631be c18667e4 c03ff080 003fd3f2
> Call Trace:
>  [] __mutex_lock_slowpath+0x54/0x8d
>  [] mutex_lock+0x21/0x26
>  [] sock_ioctl+0x0/0x1c6
>  [] sock_ioctl+0x0/0x1c6
>  [] dev_ioctl+0x20c/0x309
>  [] inotify_d_instantiate+0x1b/0x80
>  [] d_instantiate+0x56/0x8a
>  [] sock_ioctl+0x40/0x1c6
>  [] sock_ioctl+0x0/0x1c6
>  [] do_ioctl+0x1f/0x6d
>  [] sys_socket+0x29/0x44
>  [] vfs_ioctl+0x50/0x26b
>  [] sys_ioctl+0x5d/0x6c
>  [] sysenter_past_esp+0x5f/0x85
>  ===
> tcD 0409 0  9609   9608 (NOTLB)
>c37a7be8 00200086 ee308e07 0409 c1506a80 0002  c37a6000
>3e416208 0414 c29021e8 c2902030 c29021e8 c150a000  c321c4c4
>c37a7bd0 c19d36c0 c03ff080 c03ff080 c0134d71 00200286 c03ff080 003fe1ed
> Call Trace:
>  [] lock_hrtimer_base+0x15/0x2f
>  [] wait_for_completion+0x9e/0xb7
>  [] default_wake_function+0x0/0xc
>  [] synchronize_rcu+0x2a/0x2f
>  [] wakeme_after_rcu+0x0/0x8
>  [] dev_deactivate+0x8a/0xa3
>  [] dev_graft_qdisc+0x81/0xa4
>  [] qdisc_graft+0x28/0x88
>  [] tc_get_qdisc+0x15d/0x1e9
>  [] tc_get_qdisc+0x0/0x1e9
>  [] rtnetlink_rcv_msg+0x1c2/0x1f5
>  [] netlink_run_queue+0xa7/0x10e
>  [] rtnetlink_rcv_msg+0x0/0x1f5
>  [] rtnetlink_rcv+0x26/0x42
>  [] netlink_data_ready+0x12/0x54
&

Re: + gen_estimator-fix-locking-and-timer-related-bugs.patch added to -mm tree

2007-07-05 Thread Jarek Poplawski
On Thu, Jul 05, 2007 at 06:59:25PM +0300, Ranko Zivojnovic wrote:
> On Thu, 2007-07-05 at 17:34 +0300, Ranko Zivojnovic wrote:
> > Anyhow - I am currently running 2.6.22-rc6-mm1 + sch_htb patch and
> > running a test script that always managed to reproduce the problem
> > within half hour - so far it looks good, but I will leave it hammering
> > until tomorrow and will let you know.
> > 
> 
> Ahm...it managed to get stuck:
> 
> Relevant piece of the process table ('tc' stays stuck):
>  9608 pts/0S+ 0:00 /bin/bash ./tc-crash.sh
>  9609 pts/0D+ 0:00 tc qdisc del dev lo root
> 
> There was no soft lockup kernel warning however sysreq 'w' & 'q' dump
> the following:
> 
> ---cut---

I have to admit you are really good at this testing!

Especially after this first htb bug I was almost stunned
at how long such a repeatable and not very well hidden bug can
live in one of the most popular networking modules...

This new lockup bug you have just found needs some time
to figure out. BTW, I wonder if you had lockdep on
(CONFIG_PROVE_LOCKING or CONFIG_LOCK_ALLOC)?

Thanks & regards,
Jarek P.

PS: I hope you don't mind I forward this to netdev.