Re: [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module

2007-07-04 Thread Jarek Poplawski
On Mon, Jul 02, 2007 at 09:52:26AM +0200, Jarek Poplawski wrote:
 
 From my recent patch:
 
  #1
  Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work()
  required a work function should always (unconditionally) rearm with
  delay  0 - otherwise it would endlessly loop. This patch replaces
  this function with cancel_delayed_work(). Later kernel versions don't
  require this, so here it's only for uniformity.
 
 But Oleg Nesterov [EMAIL PROTECTED] found:
 
  But 2.6.22 doesn't need this change, why it was merged?
  
  In fact, I suspect this change adds a race,
 ...
 
 His description was right (thanks), so this patch reverts #1.
 
 Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]

Oleg,

I think maybe you could ack these 2 netconsole patches...
They were done on your request but it looks like Andrew
is waiting on something...

Thanks,
Jarek P.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module

2007-07-04 Thread David Miller
From: Jarek Poplawski [EMAIL PROTECTED]
Date: Wed, 4 Jul 2007 08:41:59 +0200

 On Mon, Jul 02, 2007 at 09:52:26AM +0200, Jarek Poplawski wrote:
  
  From my recent patch:
  
   #1
   Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work()
   required a work function should always (unconditionally) rearm with
   delay  0 - otherwise it would endlessly loop. This patch replaces
   this function with cancel_delayed_work(). Later kernel versions don't
   require this, so here it's only for uniformity.
  
  But Oleg Nesterov [EMAIL PROTECTED] found:
  
   But 2.6.22 doesn't need this change, why it was merged?
   
   In fact, I suspect this change adds a race,
  ...
  
  His description was right (thanks), so this patch reverts #1.
  
  Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]
 
 Oleg,
 
 I think maybe you could ack these 2 netconsole patches...
 They were done on your request but it looks like Andrew
 is waiting on something...

I plan to apply this patch, don't worry about it :)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module

2007-07-04 Thread Jarek Poplawski
On Tue, Jul 03, 2007 at 11:47:18PM -0700, David Miller wrote:
...
 I plan to apply this patch, don't worry about it :)
 
Now I'm really worried! Don't you evere sleep?

Good night,
Jarek P.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module

2007-07-04 Thread Jarek Poplawski
On Wed, Jul 04, 2007 at 08:41:59AM +0200, Jarek Poplawski wrote:
...
 They were done on your request but it looks like Andrew
 is waiting on something...

Andrew,

This time I'm not sorry for my English because I've just
found I could speak Chiefly Midland and Southern U.S..

Jarek P.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NETPOLL] netconsole: fix soft lockup when removing module

2007-07-02 Thread Jarek Poplawski
On Sun, Jul 01, 2007 at 09:35:58PM +0400, Oleg Nesterov wrote:
 Jarek Poplawski wrote:
 
 #1
 Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work()
 required a work function should always (unconditionally) rearm with
 delay  0 - otherwise it would endlessly loop. This patch replaces
 this function with cancel_delayed_work(). Later kernel versions don't
 require this, so here it's only for uniformity.
 
 But 2.6.22 doesn't need this change, why it was merged?

One bad reason is given above. Should I look for another one?

 
 In fact, I suspect this change adds a race,

You are right!

 
  --- a/net/core/netpoll.c
  +++ b/net/core/netpoll.c
  @@ -72,7 +72,8 @@ static void queue_process(struct work_struct *work)
  netif_tx_unlock(dev);
  local_irq_restore(flags);
   
  -   schedule_delayed_work(npinfo-tx_work, HZ/10);
  +   if (atomic_read(npinfo-refcnt))
  +   schedule_delayed_work(npinfo-tx_work, HZ/10);
  return;
  }
  netif_tx_unlock(dev);
  @@ -785,9 +786,15 @@ void netpoll_cleanup(struct netpoll *np)
  if (atomic_dec_and_test(npinfo-refcnt)) {
  skb_queue_purge(npinfo-arp_tx);
  skb_queue_purge(npinfo-txq);
  -   cancel_rearming_delayed_work(npinfo-tx_work);
  +   cancel_delayed_work(npinfo-tx_work);
  flush_scheduled_work();
 
 Suppose that -refcnt == 1, and queue_process() was preempted just after
 atomic_read(npinfo-refcnt).
 
 netpoll_cleanup() comes, cancel_delayed_work() does nothing, 
 flush_scheduled_work()
 sleeps.
 
 queue_process() gets CPU, re-schedules -tx_work, and returns.
 
 flush_scheduled_work() completes, netpoll_cleanup() frees npinfo and returns
 while -tx_work is pending.
 
 No?

No no. (Yes!)

I had some doubts about this, and you found very good reason
for this.

I'll soon send a patch to restore cancel_rearming_delayed_work
in 2.6.22.

So, 2.6.21 needs something better (maybe you've found it btw.?),
but they weren't too interested, anyway.

Thanks very much  regards,
Jarek P.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module

2007-07-02 Thread Jarek Poplawski

From my recent patch:

 #1
 Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work()
 required a work function should always (unconditionally) rearm with
 delay  0 - otherwise it would endlessly loop. This patch replaces
 this function with cancel_delayed_work(). Later kernel versions don't
 require this, so here it's only for uniformity.

But Oleg Nesterov [EMAIL PROTECTED] found:

 But 2.6.22 doesn't need this change, why it was merged?
 
 In fact, I suspect this change adds a race,
...

His description was right (thanks), so this patch reverts #1.

Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]

---

diff -Nurp 2.6.22-rc7-/net/core/netpoll.c 2.6.22-rc7/net/core/netpoll.c
--- 2.6.22-rc7-/net/core/netpoll.c  2007-07-02 09:03:27.0 +0200
+++ 2.6.22-rc7/net/core/netpoll.c   2007-07-02 09:32:34.0 +0200
@@ -72,8 +72,7 @@ static void queue_process(struct work_st
netif_tx_unlock(dev);
local_irq_restore(flags);
 
-   if (atomic_read(npinfo-refcnt))
-   schedule_delayed_work(npinfo-tx_work, HZ/10);
+   schedule_delayed_work(npinfo-tx_work, HZ/10);
return;
}
netif_tx_unlock(dev);
@@ -786,7 +785,7 @@ void netpoll_cleanup(struct netpoll *np)
if (atomic_dec_and_test(npinfo-refcnt)) {
skb_queue_purge(npinfo-arp_tx);
skb_queue_purge(npinfo-txq);
-   cancel_delayed_work(npinfo-tx_work);
+   cancel_rearming_delayed_work(npinfo-tx_work);
flush_scheduled_work();
 
/* clean after last, unfinished work */
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Re: [NETPOLL] netconsole: fix soft lockup when removing module

2007-07-02 Thread Oleg Nesterov
On 07/02, Jarek Poplawski wrote:
 
 diff -Nurp 2.6.22-rc7-/net/core/netpoll.c 2.6.22-rc7/net/core/netpoll.c
 --- 2.6.22-rc7-/net/core/netpoll.c2007-07-02 09:03:27.0 +0200
 +++ 2.6.22-rc7/net/core/netpoll.c 2007-07-02 09:32:34.0 +0200
 @@ -72,8 +72,7 @@ static void queue_process(struct work_st
   netif_tx_unlock(dev);
   local_irq_restore(flags);
  
 - if (atomic_read(npinfo-refcnt))
 - schedule_delayed_work(npinfo-tx_work, HZ/10);
 + schedule_delayed_work(npinfo-tx_work, HZ/10);
   return;
   }
   netif_tx_unlock(dev);
 @@ -786,7 +785,7 @@ void netpoll_cleanup(struct netpoll *np)
   if (atomic_dec_and_test(npinfo-refcnt)) {
   skb_queue_purge(npinfo-arp_tx);
   skb_queue_purge(npinfo-txq);
 - cancel_delayed_work(npinfo-tx_work);
 + cancel_rearming_delayed_work(npinfo-tx_work);
   flush_scheduled_work();

While you are here, could you also delete this flush_scheduled_work() ?
It is not needed any longer.

Oleg.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NETPOLL] netconsole: fix soft lockup when removing module

2007-07-02 Thread Oleg Nesterov
On 07/02, Jarek Poplawski wrote:

   --- a/net/core/netpoll.c
   +++ b/net/core/netpoll.c
   @@ -72,7 +72,8 @@ static void queue_process(struct work_struct *work)
 netif_tx_unlock(dev);
 local_irq_restore(flags);

   - schedule_delayed_work(npinfo-tx_work, HZ/10);
   + if (atomic_read(npinfo-refcnt))
   + schedule_delayed_work(npinfo-tx_work, HZ/10);
 return;
 }

 [...snip...]
 
 So, 2.6.21 needs something better (maybe you've found it btw.?),
 but they weren't too interested, anyway.

We can do a double flush trick. If queue_process() checks -refcnt before
schedule_delayed_work() like above, netpoll_cleanup() can do

flush_scheduled_work();

// the next invocation of queue_process()
// must see -refcnt == 0
if (!cancel_delayed_work(npinfo-tx_work)) {
/* may be queued, wait for completion */
flush_scheduled_work();
}

Jarek, I don't understand net/, a silly question. Why do we need the #2 chunk?
Isn't it better to move skb_queue_purge(npinfo-txq) after cancel_..._work()
instead?

Oleg.   

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NETPOLL] netconsole: fix soft lockup when removing module

2007-07-02 Thread Jarek Poplawski
On Mon, Jul 02, 2007 at 01:24:08PM +0400, Oleg Nesterov wrote:
 On 07/02, Jarek Poplawski wrote:
 
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -72,7 +72,8 @@ static void queue_process(struct work_struct *work)
netif_tx_unlock(dev);
local_irq_restore(flags);
 
-   schedule_delayed_work(npinfo-tx_work, HZ/10);
+   if (atomic_read(npinfo-refcnt))
+   schedule_delayed_work(npinfo-tx_work, 
HZ/10);
return;
}
 
  [...snip...]
  
  So, 2.6.21 needs something better (maybe you've found it btw.?),
  but they weren't too interested, anyway.
 
 We can do a double flush trick. If queue_process() checks -refcnt before
 schedule_delayed_work() like above, netpoll_cleanup() can do
 
   flush_scheduled_work();
 
   // the next invocation of queue_process()
   // must see -refcnt == 0
   if (!cancel_delayed_work(npinfo-tx_work)) {
   /* may be queued, wait for completion */
   flush_scheduled_work();
   }

I'll try to think about it later, but I don't plan to do next patch,
so feel free to send this. I didn't plan to fix netpoll at all
(I never didn't use nor studied this before...). But couldn't stand
this stupid lockup stays in 2.6.21. Now, I see it probably doesn't
annoy more than 2 or 3 people around...

 
 Jarek, I don't understand net/, a silly question. Why do we need the #2 chunk?
 Isn't it better to move skb_queue_purge(npinfo-txq) after cancel_..._work()
 instead?

I've thought about this too, but because I don't know netpoll/netconsole
enough I didn't want to change more than minimum needed. 

skb_queue_purge() uses heavy locking (irqsave) and I don't remember now
if I've found the reason or simply believed somebody had to have a reason
to do this there, anyway, if moved after cancel_ it could be done without
this locking, and something like while () instead of my if () should be
enough.

But there was not much interest about this patch, and I'm not currently
interested to be the main netconsole expert too, so maybe you would
like to try...

Cheers,
Jarek P.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html