Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-19 Thread Andy Gospodarek
On Thu, Feb 15, 2007 at 03:45:23PM -0800, Jay Vosburgh wrote:
 
   For the short term, yes, I don't have any disagreement with
 switching the timer based stuff over to workqueues.  Basically a one for
 one replacement to get the functions in a process context and tweak the
 locking.

I did some testing of my patch last week and it definitely has some
issues.  I'm running into a problem similar to the one in last week's
thread titled "[BUG] RTNL and flush_scheduled_work deadlocks", but I
think I can patch around it if needed.

 
   I do think we're having a little confusion over details of
 terminology; if I'm not mistaken, you're thinking that workqueue means
 single threaded: even though each individual monitor thingie is a
 separate piece of work, they still can't collide.
 
   That's true, but (unless I've missed a call somewhere) there
 isn't a wq_pause_for_a_bit type of call (that, e.g., waits for
 anything running to stop, then doesn't run any further work until we
 later tell it to), so suspending all of the periodic things running for
 the bond is more hassle than if there's just one schedulable work thing,
 which internally calls the right functions to do the various things.
 This is also single threaded, but easier to stop and start.  It seems to
 be simpler to have multiple link monitors running in such a system as
 well (without having them thrashing the link state as would happen now).
 

I see by looking at your patch that you keep a list of timers and only
schedule work for the event that will happen next.  I've seen timer
implementations like this before and think it's reasonable.  It would be
good to account for timer skew, but otherwise it seems like a sound
solution -- though it is too bad that workqueues and their behavior seem
like somewhat of a mystery to most and cause people to code around them
(I don't blame you one bit).

I also plan to start testing your patch later this week and will let you
know what I find.

-andy


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-15 Thread Andy Gospodarek
On Tue, Feb 13, 2007 at 06:11:15PM -0800, Jay Vosburgh wrote:
 Andy Gospodarek [EMAIL PROTECTED] wrote:
 
 This is exactly the problem I've got, Jay.  I'd love to come up with
 something that will be a smaller patch to solve this in the near term
 and then focus on a larger set of changes down the road but it doesn't
 seem likely.
 
   That's more or less the conclusion I've reached as well.  The
 locking model needs to be redone from scratch; there are too many
 players and the conditions have changed since things were originally
 done (in terms of side effects, particularly the places that may sleep
 today that didn't in the past).  It's tricky to redo the bulk of the
 innards in small modular steps.
 

That might be the case, but I feel like the code is in a place where it
*could* work tomorrow with just some small tweaks (getting everything
into process context).  I'm reluctant to cram a whole bunch of changes
down someone's throat immediately (with a big patch) because it will be
difficult to learn incrementally what is being done well and what isn't
based on user feedback.

 Andy, one thought: do you think it would work better to simplify
  the locking that is there first, i.e., convert the timers to work
  queues, have a single dispatcher that handles everything (and can be
  suspended for mutexing purposes), as in the patch I sent you?  The
  problem isn't just rtnl; there also has to be a release of the bonding
  locks themselves (to handle the might sleep issues), and that's tricky
  to do with so many entities operating concurrently.  Reducing the number
  of involved parties should make the problem simpler.
  
 
 I really don't feel like there are that many operations happening
 concurrently, but having a workqueue that managed and dispatched the
 operations and detected current link status would probably be helpful
 for long term maintenance.  It would probably be wise to have individual
 workqueues that managed any mode-specific operations, so their processing
 doesn't interfere with any link-checking operations.
 
   I like having the mode-specific monitors simply be special case
 monitors in a unified monitor system. It resolves the link-check
 vs. mode-specific conflict, and allows all of the periodic things to be
 paused as a set for mutexing purposes.  I'm happy to be convinced that
 I'm just blowing hooey here, but that's how it seems to me.
 

I went back and looked at my patch from a few months ago and I actually
use a single-threaded workqueue for each bond device.  I did not use
different queues to replace each timer -- just different types of work
that are placed on the queue.  So it sounds like we agree -- now we just
need to pick a decent starting point?

   For balance-alb (the biggest offender in terms of locking
 violations), the link monitor, alb mode monitor, transmit activity, and
 user initiated stuff (add or remove slave, for example) all need to be
 mutexed against one another.  The user initiated stuff comes in with
 rtnl and the link monitor needs rtnl if it has to fail over.  All of
 them need the regular bonding lock, but the user initiated stuff and the
 link monitor both need to do things (change mac addresses, call
 dev_open) with that lock released.
 
   Do you have any work in progress patches or descendents of the
 big rework thing I sent you?  I need to get back to this, and whatever
 we do it's probably better if we're at least a little bit coordinated.
 

I've updated my original patch with some changes that take the rtnetlink
lock deeper into the code without adding any conditional locking.  I'm
going to work with it more today and let you know if I think it is
better or worse than what I had before.




Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-15 Thread Jay Vosburgh
Andy Gospodarek [EMAIL PROTECTED] wrote:

[...]
That might be the case, but I feel like the code is in a place where it
*could* work tomorrow with just some small tweaks (getting everything
into process context).  I'm reluctant to cram a whole bunch of changes
down someone's throat immediately (with a big patch) because it will be
difficult to learn incrementally what is being done well and what isn't
based on user feedback.

Fair enough.

  I like having the mode-specific monitors simply be special case
 monitors in a unified monitor system. It resolves the link-check
 vs. mode-specific conflict, and allows all of the periodic things to be
 paused as a set for mutexing purposes.  I'm happy to be convinced that
 I'm just blowing hooey here, but that's how it seems to me.
 

I went back and looked at my patch from a few months ago and I actually
use a single-threaded workqueue for each bond device.  I did not use
different queues to replace each timer -- just different types of work
that are placed on the queue.  So it sounds like we agree -- now we just
need to pick a decent starting point?

For the short term, yes, I don't have any disagreement with
switching the timer based stuff over to workqueues.  Basically a one for
one replacement to get the functions in a process context and tweak the
locking.

I do think we're having a little confusion over details of
terminology; if I'm not mistaken, you're thinking that workqueue means
single threaded: even though each individual monitor thingie is a
separate piece of work, they still can't collide.

That's true, but (unless I've missed a call somewhere) there
isn't a wq_pause_for_a_bit type of call (that, e.g., waits for
anything running to stop, then doesn't run any further work until we
later tell it to), so suspending all of the periodic things running for
the bond is more hassle than if there's just one schedulable work thing,
which internally calls the right functions to do the various things.
This is also single threaded, but easier to stop and start.  It seems to
be simpler to have multiple link monitors running in such a system as
well (without having them thrashing the link state as would happen now).

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-13 Thread Andy Gospodarek
On Fri, Feb 09, 2007 at 01:38:02PM -0800, Andrew Morton wrote:
 
 cond_resched() called from softirq, amongst other problems.
 
 On Fri, 9 Feb 2007 08:23:44 -0800
 [EMAIL PROTECTED] wrote:
 
  http://bugzilla.kernel.org/show_bug.cgi?id=7974
  
        Summary: BUG: scheduling while atomic: swapper/0x10000100/0
 Kernel Version: 2.6.20
         Status: NEW
       Severity: normal
          Owner: [EMAIL PROTECTED]
      Submitter: [EMAIL PROTECTED]
  
  
  The machine hangs in normal boot with 2.6.19 and 2.6.20 after network
  starts.  If I boot in single mode and start the services manually, the
  machine and network work fine, but I see this on dmesg:
  
  
  Call Trace:
   IRQ  [802607d0] __sched_text_start+0x60/0xb27
   [8802c8a1] :tg3:tg3_setup_copper_phy+0x9d9/0xad9
   [8026f8d5] smp_local_timer_interrupt+0x34/0x5b
   [8802d6d4] :tg3:tg3_setup_phy+0xd33/0xe16
   [8027f572] __cond_resched+0x1c/0x44
   [802613aa] cond_resched+0x2e/0x39
   [80209f5a] kmem_cache_alloc+0x14/0x58
   [8022df6f] __alloc_skb+0x36/0x134
   [804a4ea7] rtmsg_ifinfo+0x28/0xa1
   [804a4f81] rtnetlink_event+0x61/0x68
   [8028ad24] notifier_call_chain+0x24/0x36
   [8049ccad] dev_set_mac_address+0x53/0x66
   [88019648] :bonding:alb_set_slave_mac_addr+0x4b/0x73
   [88019b0c] :bonding:alb_swap_mac_addr+0x8b/0x15c
   [804a02e7] dev_mc_add+0x137/0x148
   [88014338] :bonding:bond_change_active_slave+0x1ea/0x34d
   [8801481d] :bonding:bond_select_active_slave+0xbe/0xf4
   [880165e5] :bonding:bond_mii_monitor+0x401/0x45b
   [880161e4] :bonding:bond_mii_monitor+0x0/0x45b
   [80287ed2] run_timer_softirq+0x142/0x1c0
   [80211f6a] __do_softirq+0x5c/0xd2
   [8025ef4c] call_softirq+0x1c/0x28
   [80268e95] do_softirq+0x2c/0x87
   [8026fdd7] smp_apic_timer_interrupt+0x57/0x6c
   [8025691a] mwait_idle+0x0/0x45
   [8025e9f6] apic_timer_interrupt+0x66/0x70
   EOI  [8025695c] mwait_idle+0x42/0x45
   [802484e5] cpu_idle+0x5b/0x7a
   [8065b88a] start_secondary+0x4e0/0x4f6
  
  RTNL: assertion failed at net/core/fib_rules.c (444)
  
  Call Trace:
   IRQ  [804a8abb] fib_rules_event+0x3b/0x120
   [8028ad24] notifier_call_chain+0x24/0x36
   [8049ccad] dev_set_mac_address+0x53/0x66
   [88019648] :bonding:alb_set_slave_mac_addr+0x4b/0x73
   [88019b0c] :bonding:alb_swap_mac_addr+0x8b/0x15c
   [804a02e7] dev_mc_add+0x137/0x148
   [88014338] :bonding:bond_change_active_slave+0x1ea/0x34d
   [8801481d] :bonding:bond_select_active_slave+0xbe/0xf4
   [880165e5] :bonding:bond_mii_monitor+0x401/0x45b
   [880161e4] :bonding:bond_mii_monitor+0x0/0x45b
   [80287ed2] run_timer_softirq+0x142/0x1c0
   [80211f6a] __do_softirq+0x5c/0xd2
   [8025ef4c] call_softirq+0x1c/0x28
   [80268e95] do_softirq+0x2c/0x87
   [8026fdd7] smp_apic_timer_interrupt+0x57/0x6c
   [8025691a] mwait_idle+0x0/0x45
   [8025e9f6] apic_timer_interrupt+0x66/0x70
   EOI  [8025695c] mwait_idle+0x42/0x45
   [802484e5] cpu_idle+0x5b/0x7a
   [8065b88a] start_secondary+0x4e0/0x4f6
  
  BUG: scheduling while atomic: swapper/0x10000100/0
  
  Call Trace:
   IRQ  [802607d0] __sched_text_start+0x60/0xb27
   [8026f8d5] smp_local_timer_interrupt+0x34/0x5b
   [8027f572] __cond_resched+0x1c/0x44
   [802613aa] cond_resched+0x2e/0x39
   [80262029] mutex_lock+0x9/0x18
   [8049ea76] netdev_run_todo+0x16/0x230
   [804bcc75] dst_rcu_free+0x0/0x3f
   [804d9c29] inetdev_event+0x29/0x2d0
   [80263138] _spin_lock_bh+0x9/0x19
   [8023d84a] rt_run_flush+0x92/0xcc
   [8028ad24] notifier_call_chain+0x24/0x36
   [8049ccad] dev_set_mac_address+0x53/0x66
   [88019648] :bonding:alb_set_slave_mac_addr+0x4b/0x73
   [88019b0c] :bonding:alb_swap_mac_addr+0x8b/0x15c
   [804a02e7] dev_mc_add+0x137/0x148
   [88014338] :bonding:bond_change_active_slave+0x1ea/0x34d
   [8801481d] :bonding:bond_select_active_slave+0xbe/0xf4
   [880165e5] :bonding:bond_mii_monitor+0x401/0x45b
   [880161e4] :bonding:bond_mii_monitor+0x0/0x45b
   [80287ed2] run_timer_softirq+0x142/0x1c0
   [80211f6a] __do_softirq+0x5c/0xd2
   [8025ef4c] call_softirq+0x1c/0x28
   [80268e95] do_softirq+0x2c/0x87
   [8026fdd7] smp_apic_timer_interrupt+0x57/0x6c
   [8025691a] mwait_idle+0x0/0x45
   [8025e9f6] apic_timer_interrupt+0x66/0x70
   EOI  [8025695c] mwait_idle+0x42/0x45
   [802484e5] cpu_idle+0x5b/0x7a
   [8065b88a] start_secondary+0x4e0/0x4f6
  
  RTNL: assertion failed at net/ipv4/devinet.c (1055)
  
  Call Trace:
   IRQ  

Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-13 Thread Jay Vosburgh
Andy Gospodarek [EMAIL PROTECTED] wrote:

[...]
I've been working off and on for a little while to resolve these issues
and even posted a patch not long ago to address some these by removing
the timers and using workqueues instead.  This enabled resolution of
quite a few of the issues with bonding since the code was no longer
running in an atomic context and could now more easily take locks.
[...]
On the side I've also been working to keep the timers and take the rtnl
lock in the correct place, so as to avoid messages like these:

I've also been giving this some thought.

I'm all in favor of moving the timers to work queues, and
eventually dispatching all of the various things from a single common
handler.  The advantage of concentrating everything like that is that it
effectively mutexes all of the timer things against one another, and
disabling (pausing) the work queue then provides mutexing against any of
the things that are dispatched by timed events.  That single thread can
acquire rtnl when it needs to, which mutexes against the user initiated
events (add or remove slave, change active, etc) since those also hold
rtnl.  Some of that is further down the road, but I've prototyped it
out.

In reference to Andy's recent patch (which first did conditional
locking for rtnl, then later acquired rtnl for every entry into the
timer function), I know the conditional locking isn't popular, but it
seems to me that it's a less bad alternative than holding rtnl every
time the bond_mii_monitor() runs (typically 10 - 50 times per second).
Or is the rtnl lock really so cheap that this isn't an issue?  In the
overwhelming majority of cases the mii_monitor won't need to do anything
that requires rtnl, so only holding it when needed is better.

Moving the slave lists to RCU doesn't help with this
particular problem, as it's necessary to drop the RCU read lock before
going into something that might sleep (which is the big problem, when the
alb code calls dev_set_mac_address()).  Unless I'm misremembering how RCU
hangs together.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-13 Thread David Miller
From: Jay Vosburgh [EMAIL PROTECTED]
Date: Tue, 13 Feb 2007 14:26:28 -0800

   In reference to Andy's recent patch (which first did conditional
 locking for rtnl, then later acquired rtnl for every entry into the
 timer function), I know the conditional locking isn't popular, but it
 seems to me that it's a less bad alternative than holding rtnl every
 time the bond_mii_monitor() runs (typically 10 - 50 times per second).
 Or is the rtnl lock really so cheap that this isn't an issue?  In the
 overwhelming majority of cases the mii_monitor won't need to do anything
 that requires rtnl, so only holding it when needed is better.

We definitely don't want to take the RTNL that often if it can
be avoided.

Maybe if you put the RTNL acquisition deeper into the call
path, ie. down into the code that knows RTNL is needed,
perhaps it won't be so ugly.  Replace the conditions with
functions.


Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-13 Thread Jay Vosburgh
Andy Gospodarek [EMAIL PROTECTED] wrote:

On Tue, Feb 13, 2007 at 02:32:43PM -0800, David Miller wrote:
[...]
 Maybe if you put the RTNL acquisition deeper into the call
 path, ie. down into the code that knows RTNL is needed,
 perhaps it won't be so ugly.  Replace the conditions with
 functions.

That is almost exactly what I am working on right now.  I'm trying to
determine where the best place to put this would be, to reduce the
chance that I'd be using conditional locking.

It's complicated to do this because the small number of places
that need rtnl are way down at the bottom of the chain, the top of
the chain can be entered either with or without rtnl, and we don't know
whether we'll actually end up doing the "need rtnl" bits until we're
pretty far down the chain.  Hence my original prototype that I sent to
Andy, which passed a "have rtnl" status down to the lower levels.

Andy, one thought: do you think it would work better to simplify
the locking that is there first, i.e., convert the timers to work
queues, have a single dispatcher that handles everything (and can be
suspended for mutexing purposes), as in the patch I sent you?  The
problem isn't just rtnl; there also has to be a release of the bonding
locks themselves (to handle the might sleep issues), and that's tricky
to do with so many entities operating concurrently.  Reducing the number
of involved parties should make the problem simpler.

I once put together a patch that used a macro like this (ignore the
whitespace problems, this was just a cut and paste):

/**
 * bond_rtnl_wrapper - take the rtnl_lock if needed
 * @x: function with args
 *
 */
#define RTNL_WRAPPER(x)\
({ \
   int __rc__; \
   if (rtnl_trylock()) {   \
   __rc__ = x; \
   rtnl_unlock();  \
   } else {\
   __rc__ = x; \
   }   \
   __rc__; \
})


and wrapped it around the calls to dev_set_mac_address.  I wasn't
pleased with it, but it seemed like it worked pretty well based on the
testing I did.

The problem with this is that it'll cause failures in the case
that some other unrelated entity holds rtnl (x will be performed
concurrently with whomever actually holds rtnl).

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-13 Thread Andy Gospodarek
On Tue, Feb 13, 2007 at 02:32:43PM -0800, David Miller wrote:
 From: Jay Vosburgh [EMAIL PROTECTED]
 Date: Tue, 13 Feb 2007 14:26:28 -0800
 
  In reference to Andy's recent patch (which first did conditional
  locking for rtnl, then later acquired rtnl for every entry into the
  timer function), I know the conditional locking isn't popular, but it
  seems to me that it's a less bad alternative than holding rtnl every
  time the bond_mii_monitor() runs (typically 10 - 50 times per second).
  Or is the rtnl lock really so cheap that this isn't an issue?  In the
  overwhelming majority of cases the mii_monitor won't need to do anything
  that requires rtnl, so only holding it when needed is better.
 
 We definitely don't want to take the RTNL that often if it can
 be avoided.
 
 Maybe if you put the RTNL acquisition deeper into the call
 path, ie. down into the code that knows RTNL is needed,
 perhaps it won't be so ugly.  Replace the conditions with
 functions.


That is almost exactly what I am working on right now.  I'm trying to
determine where the best place to put this would be, to reduce the
chance that I'd be using conditional locking.

I once put together a patch that used a macro like this (ignore the
whitespace problems, this was just a cut and paste):

/**
 * bond_rtnl_wrapper - take the rtnl_lock if needed
 * @x: function with args
 *
 */
#define RTNL_WRAPPER(x)\
({ \
   int __rc__; \
   if (rtnl_trylock()) {   \
   __rc__ = x; \
   rtnl_unlock();  \
   } else {\
   __rc__ = x; \
   }   \
   __rc__; \
})


and wrapped it around the calls to dev_set_mac_address.  I wasn't
pleased with it, but it seemed like it worked pretty well based on the
testing I did.



Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-13 Thread Stephen Hemminger

 /**
  * bond_rtnl_wrapper - take the rtnl_lock if needed
  * @x: function with args
  *
  */
 #define RTNL_WRAPPER(x)\
 ({ \
int __rc__; \
if (rtnl_trylock()) {   \
__rc__ = x; \
rtnl_unlock();  \
} else {\
__rc__ = x; \
}   \
__rc__; \
 })
 
 
 and wrapped it around the calls to dev_set_mac_address.  I wasn't
 pleased with it, but it seemed like it worked pretty well based on the
 testing I did.
 
   The problem with this is that it'll cause failures in the case
 that some other unrelated entity holds rtnl (x will be performed
 concurrently with whomever actually holds rtnl).
 
   -J

It is also a crap macro.


Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-13 Thread Andy Gospodarek
On Tue, Feb 13, 2007 at 03:54:46PM -0800, Stephen Hemminger wrote:
  
  The problem with this is that it'll cause failures in the case
  that some other unrelated entity holds rtnl (x will be performed
  concurrently with whomever actually holds rtnl).
  
 
 It is also a crap macro.

^^^ The two reasons why I never posted it for actual inclusion, but just
used it for testing.




Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-13 Thread Andy Gospodarek
On Tue, Feb 13, 2007 at 03:33:00PM -0800, Jay Vosburgh wrote:
 Andy Gospodarek [EMAIL PROTECTED] wrote:
 
 On Tue, Feb 13, 2007 at 02:32:43PM -0800, David Miller wrote:
 [...]
  Maybe if you put the RTNL acquisition deeper into the call
  path, ie. down into the code that knows RTNL is needed,
  perhaps it won't be so ugly.  Replace the conditions with
  functions.
 
 That is almost exactly what I am working on right now.  I'm trying to
 determine where the best place to put this would be, to reduce the
 chance that I'd be using conditional locking.
 
   It's complicated to do this because the small number of places
 that need rtnl are way down at the bottom of the chain, the top of
 the chain can be entered either with or without rtnl, and we don't know
 whether we'll actually end up doing the "need rtnl" bits until we're
 pretty far down the chain.  Hence my original prototype that I sent to
 Andy, which passed a "have rtnl" status down to the lower levels.

This is exactly the problem I've got, Jay.  I'd love to come up with
something that will be a smaller patch to solve this in the near term
and then focus on a larger set of changes down the road but it doesn't
seem likely.

   Andy, one thought: do you think it would work better to simplify
 the locking that is there first, i.e., convert the timers to work
 queues, have a single dispatcher that handles everything (and can be
 suspended for mutexing purposes), as in the patch I sent you?  The
 problem isn't just rtnl; there also has to be a release of the bonding
 locks themselves (to handle the might sleep issues), and that's tricky
 to do with so many entities operating concurrently.  Reducing the number
 of involved parties should make the problem simpler.
 

I really don't feel like there are that many operations happening
concurrently, but having a workqueue that managed and dispatched the
operations and detected current link status would probably be helpful
for long term maintenance.  It would probably be wise to have individual
workqueues that managed any mode-specific operations, so their processing
doesn't interfere with any link-checking operations.



Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-13 Thread Jay Vosburgh
Andy Gospodarek [EMAIL PROTECTED] wrote:

This is exactly the problem I've got, Jay.  I'd love to come up with
something that will be a smaller patch to solve this in the near term
and then focus on a larger set of changes down the road but it doesn't
seem likely.

That's more or less the conclusion I've reached as well.  The
locking model needs to be redone from scratch; there are too many
players and the conditions have changed since things were originally
done (in terms of side effects, particularly the places that may sleep
today that didn't in the past).  It's tricky to redo the bulk of the
innards in small modular steps.

  Andy, one thought: do you think it would work better to simplify
 the locking that is there first, i.e., convert the timers to work
 queues, have a single dispatcher that handles everything (and can be
 suspended for mutexing purposes), as in the patch I sent you?  The
 problem isn't just rtnl; there also has to be a release of the bonding
 locks themselves (to handle the might sleep issues), and that's tricky
 to do with so many entities operating concurrently.  Reducing the number
 of involved parties should make the problem simpler.
 

I really don't feel like there are that many operations happening
concurrently, but having a workqueue that managed and dispatched the
operations and detected current link status would probably be helpful
for long term maintenance.  It would probably be wise to have individual
workqueues that managed any mode-specific operations, so their processing
doesn't interfere with any link-checking operations.

I like having the mode-specific monitors simply be special case
monitors in a unified monitor system. It resolves the link-check
vs. mode-specific conflict, and allows all of the periodic things to be
paused as a set for mutexing purposes.  I'm happy to be convinced that
I'm just blowing hooey here, but that's how it seems to me.

For balance-alb (the biggest offender in terms of locking
violations), the link monitor, alb mode monitor, transmit activity, and
user initiated stuff (add or remove slave, for example) all need to be
mutexed against one another.  The user initiated stuff comes in with
rtnl and the link monitor needs rtnl if it has to fail over.  All of
them need the regular bonding lock, but the user initiated stuff and the
link monitor both need to do things (change mac addresses, call
dev_open) with that lock released.

Do you have any work in progress patches or descendents of the
big rework thing I sent you?  I need to get back to this, and whatever
we do it's probably better if we're at least a little bit coordinated.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-11 Thread Herbert Xu
Francois Romieu [EMAIL PROTECTED] wrote:
 
 diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
 index e76539a..a4bcfe2 100644
 --- a/net/core/rtnetlink.c
 +++ b/net/core/rtnetlink.c
 @@ -673,7 +673,7 @@ void rtmsg_ifinfo(int type, struct net_device *dev, 
 unsigned change)
struct sk_buff *skb;
int err = -ENOBUFS;
 
 -   skb = nlmsg_new(if_nlmsg_size(0), GFP_KERNEL);
 +   skb = nlmsg_new(if_nlmsg_size(0), GFP_ATOMIC);
if (skb == NULL)
goto errout;

This should only be called under RTNL so it should never be atomic.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-09 Thread Andrew Morton

cond_resched() called from softirq, amongst other problems.

On Fri, 9 Feb 2007 08:23:44 -0800
[EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=7974
 
        Summary: BUG: scheduling while atomic: swapper/0x10000100/0
 Kernel Version: 2.6.20
         Status: NEW
       Severity: normal
          Owner: [EMAIL PROTECTED]
      Submitter: [EMAIL PROTECTED]
 
 
 The machine hangs in normal boot with 2.6.19 and 2.6.20 after network starts.
 If I boot in single mode and start the services manually, the machine and
 network work fine, but I see this on dmesg:
 
 Linux version 2.6.20 ([EMAIL PROTECTED]) (gcc version 4.1.0 20060515 (Red
 Hat 4.1.0-18)) #6 SMP Mon Feb 5 15:14:30 BRST 2007
 Command line: ro root=/dev/sda2 vga=791 
 BIOS-provided physical RAM map:
  BIOS-e820:  - 000a (usable)
  BIOS-e820: 0010 - 3f7c (usable)
  BIOS-e820: 3f7c - 3f7cfc00 (ACPI data)
  BIOS-e820: 3f7cfc00 - 3f7ff000 (reserved)
  BIOS-e820: 3f80 - 4000 (reserved)
  BIOS-e820: e000 - fed00400 (reserved)
  BIOS-e820: fed13000 - feda (reserved)
  BIOS-e820: fee0 - fee1 (reserved)
  BIOS-e820: ffb0 - 0001 (reserved)
 Entering add_active_range(0, 0, 160) 0 entries of 256 used
 Entering add_active_range(0, 256, 260032) 1 entries of 256 used
 end_pfn_map = 1048576
 DMI 2.3 present.
 ACPI: RSDP (v000 DELL  ) @ 0x000fd690
 ACPI: RSDT (v001 DELL   PE8000x0001 MSFT 0x010a) @ 
 0x000fd6a4
 ACPI: FADT (v001 DELL   PE8000x0001 MSFT 0x010a) @ 
 0x000fd6dc
 ACPI: MADT (v001 DELL   PE8000x0001 MSFT 0x010a) @ 
 0x000fd750
 ACPI: SPCR (v001 DELL   PE8000x0001 MSFT 0x010a) @ 
 0x000fd7c4
 ACPI: HPET (v001 DELL   PE8000x0001 MSFT 0x010a) @ 
 0x000fd814
 ACPI: MCFG (v001 DELL   PE8000x0001 MSFT 0x010a) @ 
 0x000fd84c
 ACPI: DSDT (v001 DELL   PE8000x0001 MSFT 0x010e) @ 
 0x
 Entering add_active_range(0, 0, 160) 0 entries of 256 used
 Entering add_active_range(0, 256, 260032) 1 entries of 256 used
 Zone PFN ranges:
   DMA 0 - 4096
   DMA324096 -  1048576
   Normal1048576 -  1048576
 early_node_map[2] active PFN ranges
 0:0 -  160
 0:  256 -   260032
 On node 0 totalpages: 259936
   DMA zone: 56 pages used for memmap
   DMA zone: 1279 pages reserved
   DMA zone: 2665 pages, LIFO batch:0
   DMA32 zone: 3499 pages used for memmap
   DMA32 zone: 252437 pages, LIFO batch:31
   Normal zone: 0 pages used for memmap
 ACPI: PM-Timer IO Port: 0x808
 ACPI: Local APIC address 0xfee0
 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
 Processor #0 (Bootup-CPU)
 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
 Processor #1
 ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
 ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
 ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
 IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
 ACPI: IOAPIC (id[0x03] address[0xfec8] gsi_base[32])
 IOAPIC[1]: apic_id 3, address 0xfec8, GSI 32-55
 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
 ACPI: IRQ0 used by override.
 ACPI: IRQ2 used by override.
 ACPI: IRQ9 used by override.
 Setting APIC routing to flat
 ACPI: HPET id: 0x base: 0xfed0
 Using ACPI (MADT) for SMP configuration information
 Nosave address range: 000a - 0010
 Allocating PCI resources starting at 5000 (gap: 4000:a000)
 PERCPU: Allocating 33024 bytes of per cpu data
 Built 1 zonelists.  Total pages: 255102
 Kernel command line: ro root=/dev/sda2 vga=791 
 Initializing CPU#0
 PID hash table entries: 4096 (order: 12, 32768 bytes)
 Console: colour dummy device 80x25
 Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
 Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
 Checking aperture...
 Memory: 1017476k/1040128k available (3075k kernel code, 21768k reserved, 1225k data, 248k init)
 Calibrating delay using timer specific routine.. 6406.66 BogoMIPS (lpj=12813334)
 Mount-cache hash table entries: 256
 CPU: Trace cache: 12K uops, L1 D cache: 16K
 CPU: L2 cache: 1024K
 using mwait in idle threads.
 CPU: Physical Processor ID: 0
 CPU: Processor Core ID: 0
 CPU0: Thermal monitoring enabled (TM1)
 Freeing SMP alternatives: 32k freed
 ACPI: Core revision 20060707
 Using local APIC timer interrupts.
 result 12500392
 Detected 12.500 MHz APIC timer.
 Booting processor 1/2 APIC 0x1
 Initializing CPU#1
 Calibrating delay using timer specific routine.. 6400.14 BogoMIPS (lpj=12800283)
 CPU: Trace cache: 12K uops, L1 D cache: 16K
 CPU: L2 cache: 1024K
 CPU: Physical Processor ID: 0
 

Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-09 Thread Francois Romieu
Andrew Morton [EMAIL PROTECTED] :
 
 cond_resched() called from softirq, amongst other problems.

Seems too simple to be right. Btw calling dev_set_mac_address
may hurt some tg3:

- tg3_set_mac_addr
  - tg3_netif_stop (depending on the content of their sram):
    - netif_poll_disable
      - schedule_timeout_interruptible

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e76539a..a4bcfe2 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -673,7 +673,7 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change)
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 
-	skb = nlmsg_new(if_nlmsg_size(0), GFP_KERNEL);
+	skb = nlmsg_new(if_nlmsg_size(0), GFP_ATOMIC);
 	if (skb == NULL)
 		goto errout;
 
@@ -681,7 +681,7 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change)
 	/* failure implies BUG in if_nlmsg_size() */
 	BUG_ON(err < 0);
 
-	err = rtnl_notify(skb, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
+	err = rtnl_notify(skb, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
 errout:
 	if (err < 0)
 		rtnl_set_sk_err(RTNLGRP_LINK, err);
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 7974] New: BUG: scheduling while atomic: swapper/0x10000100/0

2007-02-09 Thread Michael Chan
On Fri, 2007-02-09 at 23:35 +0100, Francois Romieu wrote:
 Andrew Morton [EMAIL PROTECTED] :
  
  cond_resched() called from softirq, amongst other problems.
 
 Seems too simple to be right. Btw calling dev_set_mac_address
 may hurt some tg3:
 
 - tg3_set_mac_addr
   - tg3_netif_stop (depending on the content of their sram):
     - netif_poll_disable
       - schedule_timeout_interruptible
 

Yes, known problem.  Bonding calls dev_set_mac_address from unsleepable
context.
