Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-11 Thread Jarek Poplawski
On Thu, Jan 11, 2007 at 01:27:55AM -0800, David Miller wrote:
> From: Jarek Poplawski <[EMAIL PROTECTED]>
> Date: Thu, 11 Jan 2007 09:39:34 +0100
> 
> > Sure, but is this even legal to be preempted during
> > reading or modifying rcu list or be blocked while 
> > holding rcu protected pointer? Doesn't this disturb
> > rcu cycle and make possible memory release problems?
> 
> It's fine in this case.
> 
> Since the list cannot be changed by anyone else, and the hash linked
> list (as seen by readers) is modified atomically by a single store, it
> all works out.
> 
> Readers only look at foo->next in the hash traversal.  Since the
> preceding element cannot change outside of the current writer,
> the ->next pointer to update is protected.
> 
> Readers therefore will either see the hash list with the entry or
> without.
> 
> We then use call_rcu() to make sure any reading threads that happened
> to get a glimpse of the hash entry before the hlist_del_rcu()
> completed will go away and drop their references before we free that
> entry.
> 
> I really don't see any problem here. :-)

Probably because you care more about the internals and less
about the examples in the docs. It seems I worry too much about
the rules.

OK, I'll take your word for it and will try to stop annoying this
list with imagined RCU bugs, sorry.

Thanks for your precious (sleeping?) time
and explanations. Best regards,

Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-11 Thread David Miller
From: Jarek Poplawski <[EMAIL PROTECTED]>
Date: Thu, 11 Jan 2007 09:39:34 +0100

> Sure, but is this even legal to be preempted during
> reading or modifying rcu list or be blocked while 
> holding rcu protected pointer? Doesn't this disturb
> rcu cycle and make possible memory release problems?

It's fine in this case.

Since the list cannot be changed by anyone else, and the hash linked
list (as seen by readers) is modified atomically by a single store, it
all works out.

Readers only look at foo->next in the hash traversal.  Since the
preceding element cannot change outside of the current writer,
the ->next pointer to update is protected.

Readers therefore will either see the hash list with the entry or
without.

We then use call_rcu() to make sure any reading threads that happened
to get a glimpse of the hash entry before the hlist_del_rcu()
completed will go away and drop their references before we free that
entry.

I really don't see any problem here. :-)
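
A minimal userspace sketch of the unlink-then-defer scheme described
above, written in C11. It is an illustration under stated assumptions,
not the kernel's hlist code: every toy_* name is hypothetical, a plain
list stands in for call_rcu(), and toy_reclaim() simply assumes that
all readers which could have seen the entry have already finished. The
point it tries to show is that the writer changes a single ->next
pointer, so a reader sees the chain either with or without the entry,
and the removed entry stays valid until the deferred free runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy hash-chain node; NOT the kernel's struct hlist_node. */
struct toy_node {
	int key;
	struct toy_node *_Atomic next;	/* the only field readers follow */
	struct toy_node *defer_next;	/* link on the deferred-free list */
};

static struct toy_node *_Atomic toy_head;
static struct toy_node *toy_deferred;	/* waiting for the "grace period" */

/* Writer side: the caller is assumed to hold the single writer lock. */
static void toy_add_head(struct toy_node *n)
{
	assert(n != NULL);
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(&toy_head, memory_order_relaxed),
			      memory_order_relaxed);
	/* Release store: readers that see n also see its initialized fields. */
	atomic_store_explicit(&toy_head, n, memory_order_release);
}

/* Writer side: unlink with one pointer store, then defer the free. */
static int toy_del(int key)
{
	struct toy_node *_Atomic *pp = &toy_head;
	struct toy_node *n;

	while ((n = atomic_load_explicit(pp, memory_order_relaxed)) != NULL) {
		if (n->key == key) {
			struct toy_node *next =
				atomic_load_explicit(&n->next, memory_order_relaxed);
			/* The single store readers can observe. */
			atomic_store_explicit(pp, next, memory_order_release);
			/* n->next is left intact so a reader already on n can go on. */
			n->defer_next = toy_deferred;
			toy_deferred = n;
			return 1;
		}
		pp = &n->next;
	}
	return 0;
}

/* Reader side: only ever follows ->next. */
static struct toy_node *toy_find(int key)
{
	struct toy_node *n = atomic_load_explicit(&toy_head, memory_order_acquire);

	while (n != NULL && n->key != key)
		n = atomic_load_explicit(&n->next, memory_order_acquire);
	return n;
}

/* Stand-in for the callbacks that run once the grace period is over. */
static int toy_reclaim(void)
{
	int freed = 0;

	while (toy_deferred != NULL) {
		struct toy_node *n = toy_deferred;

		toy_deferred = n->defer_next;
		free(n);
		freed++;
	}
	return freed;
}
```

A reader that fetched the entry before toy_del() can keep using it
until toy_reclaim() runs; a reader that starts afterwards no longer
finds it.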



Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-11 Thread Jarek Poplawski
On Thu, Jan 11, 2007 at 09:35:26AM +0100, Jarek Poplawski wrote:
> On Thu, Jan 11, 2007 at 09:29:58AM +0100, Jarek Poplawski wrote:
> > On Wed, Jan 10, 2007 at 11:40:35PM -0800, David Miller wrote:
> > > From: Jarek Poplawski <[EMAIL PROTECTED]>
> > > Date: Thu, 11 Jan 2007 08:24:28 +0100
> > > 
> > > > Yesterday I did what I should do earlier - checked
> > > > this simple way, with printk, and now I have no doubts
> > > > it's a bug: if you add or remove vlan devices with
> > > > vconfig, register_vlan_device and unregister_vlan_dev
> > > > are called by ioctl and they use and change rcu
> > > > > protected data without preemption disabled so vlan
> > > > rcu hash lists could become corrupted or find results
> > > > could be wrong.
> > > 
> > > Those two operations do their modifications and changes under the RTNL
> > > > semaphore, via rtnl_lock() and rtnl_unlock(), which guarantees that no
> > > other modifications can occur.
> > 
> > Sure, but is this even legal to be preempted during
> 
> I should even say:
> 
> "... is this even legal to be blocked during ..."
> 
> > reading or modifying rcu list? Doesn't this disturb
> > rcu cycle and make possible memory release problems?

Sorry, one more time:

Sure, but is it even legal to be preempted while
reading or modifying an RCU list, or to block while
holding an RCU-protected pointer? Doesn't this disturb
the RCU cycle and make memory release problems possible?

Jarek P.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-11 Thread Jarek Poplawski
On Thu, Jan 11, 2007 at 09:29:58AM +0100, Jarek Poplawski wrote:
> On Wed, Jan 10, 2007 at 11:40:35PM -0800, David Miller wrote:
> > From: Jarek Poplawski <[EMAIL PROTECTED]>
> > Date: Thu, 11 Jan 2007 08:24:28 +0100
> > 
> > > Yesterday I did what I should do earlier - checked
> > > this simple way, with printk, and now I have no doubts
> > > it's a bug: if you add or remove vlan devices with
> > > vconfig, register_vlan_device and unregister_vlan_dev
> > > are called by ioctl and they use and change rcu
> > > > protected data without preemption disabled so vlan
> > > rcu hash lists could become corrupted or find results
> > > could be wrong.
> > 
> > Those two operations do their modifications and changes under the RTNL
> > > semaphore, via rtnl_lock() and rtnl_unlock(), which guarantees that no
> > other modifications can occur.
> 
> Sure, but is this even legal to be preempted during

I should even say:

"... is this even legal to be blocked during ..."

> reading or modifying rcu list? Doesn't this disturb
> rcu cycle and make possible memory release problems?

Jarek P.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-11 Thread Jarek Poplawski
On Wed, Jan 10, 2007 at 11:40:35PM -0800, David Miller wrote:
> From: Jarek Poplawski <[EMAIL PROTECTED]>
> Date: Thu, 11 Jan 2007 08:24:28 +0100
> 
> > Yesterday I did what I should do earlier - checked
> > this simple way, with printk, and now I have no doubts
> > it's a bug: if you add or remove vlan devices with
> > vconfig, register_vlan_device and unregister_vlan_dev
> > are called by ioctl and they use and change rcu
> > > protected data without preemption disabled so vlan
> > rcu hash lists could become corrupted or find results
> > could be wrong.
> 
> Those two operations do their modifications and changes under the RTNL
> > semaphore, via rtnl_lock() and rtnl_unlock(), which guarantees that no
> other modifications can occur.

Sure, but is it even legal to be preempted while
reading or modifying an RCU list? Doesn't this disturb
the RCU cycle and make memory release problems possible?

Jarek P.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-10 Thread David Miller
From: Jarek Poplawski <[EMAIL PROTECTED]>
Date: Thu, 11 Jan 2007 08:24:28 +0100

> Yesterday I did what I should do earlier - checked
> this simple way, with printk, and now I have no doubts
> it's a bug: if you add or remove vlan devices with
> vconfig, register_vlan_device and unregister_vlan_dev
> are called by ioctl and they use and change rcu
> > protected data without preemption disabled so vlan
> rcu hash lists could become corrupted or find results
> could be wrong.

Those two operations do their modifications and changes under the RTNL
semaphore, via rtnl_lock() and rtnl_unlock(), which guarantees that no
other modifications can occur.
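
A small userspace sketch of that writer-side rule, assuming a pthread
mutex as a stand-in for the RTNL semaphore. The toy_* names are
hypothetical and the counter only represents "the vlan lists"; the
assert in the update helpers is meant in the spirit of the kernel's
ASSERT_RTNL() check, not as a copy of it. It shows that updates are
serialized by a lock that may sleep, which is a separate question from
what readers are allowed to do.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the RTNL semaphore: one sleeping lock for all updaters. */
static pthread_mutex_t toy_rtnl = PTHREAD_MUTEX_INITIALIZER;
static int toy_rtnl_held;	/* only written while the mutex is held */
static int toy_vlan_count;	/* represents the protected vlan lists */

static void toy_rtnl_lock(void)
{
	pthread_mutex_lock(&toy_rtnl);	/* may block, which is fine for writers */
	toy_rtnl_held = 1;
}

static void toy_rtnl_unlock(void)
{
	toy_rtnl_held = 0;
	pthread_mutex_unlock(&toy_rtnl);
}

/* Update helpers refuse to run unless the caller holds the lock. */
static void toy_register_vlan(void)
{
	assert(toy_rtnl_held);
	toy_vlan_count++;
}

static void toy_unregister_vlan(void)
{
	assert(toy_rtnl_held);
	assert(toy_vlan_count > 0);
	toy_vlan_count--;
}

/* ioctl-like entry points: take the lock, modify, drop the lock. */
static int toy_ioctl_add(void)
{
	int n;

	toy_rtnl_lock();
	toy_register_vlan();
	n = toy_vlan_count;
	toy_rtnl_unlock();
	return n;
}

static int toy_ioctl_del(void)
{
	int n;

	toy_rtnl_lock();
	toy_unregister_vlan();
	n = toy_vlan_count;
	toy_rtnl_unlock();
	return n;
}
```

Because every modification goes through the same lock, two updaters
can never be inside the list-changing code at the same time.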


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-10 Thread Jarek Poplawski
On Wed, Jan 10, 2007 at 12:01:23PM -0800, Stephen Hemminger wrote:
...
> Don't rely on books too heavily, they can get out of date
> with a simple code change.

I tried to find this in the code at the beginning
and got misled by the path with PREEMPT_BKL.
I think books are necessary to get the general ideas,
and I was trying to work out why I got them so wrong.

> The path that I am talking about is the receive skb path. All data
> received goes through netif_receive_skb and that does rcu_read_lock().
> This is done so that receive protocol list can be used with RCU (lock
> free). Since receiving is a time critical path, we want to process
> without having to do any locked operations; locked operations force the
> processor to take a cache miss and are one of the main CPU overheads.
> RCU requires no locked operation, but does prevent preemption.

Again, I think we're talking about different things. Maybe
it's because of this thread, but I'm no longer talking about
Ben's original problem - this is a problem in the
Linux vlan code.

Yesterday I did what I should have done earlier - checked
it the simple way, with printk - and now I have no doubt
it's a bug: if you add or remove vlan devices with
vconfig, register_vlan_device and unregister_vlan_dev
are called from ioctl, and they use and change RCU
protected data without preemption disabled, so the vlan
RCU hash lists could become corrupted or lookup results
could be wrong.

Regards,
Jarek P.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-10 Thread Stephen Hemminger
On Wed, 10 Jan 2007 13:50:48 +0100
Jarek Poplawski <[EMAIL PROTECTED]> wrote:

> On Wed, Jan 10, 2007 at 10:04:11AM +0100, Jarek Poplawski wrote:
> ...
> > It looks like you're talking about the right thing
> > and I'm a fool again! Now I try to find why I even 
> > had to pay for this. I read again and again adequate
> > chapters from R. Love and C. Benvenuti's books, see
> > a lot about kernel preemption in 2.6, but can't see
> > anything about preemption disabled in ioctls - maybe
> > I'm blind or they are badly translated. Now I look
> > into "Linux Device Drivers", see ch. 6 about ioctls,
> > blocking I/O and RCU, but nothing about preemption
> > disabled again. Maybe this is omitted because it's
> > obvious to people who started hacking with earlier
> > kernels?
> 
> ... or maybe it's even more complicated...
> 
> For the time being, I revoke my critique of these books.
> 
> Jarek P. 

Don't rely on books too heavily, they can get out of date
with a simple code change.

The path that I am talking about is the receive skb path. All data
received goes through netif_receive_skb and that does rcu_read_lock().
This is done so that receive protocol list can be used with RCU (lock
free). Since receiving is a time critical path, we want to process
without having to do any locked operations; locked operations force the
processor to take a cache miss and are one of the main CPU overheads.
RCU requires no locked operation, but does prevent preemption.
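
A toy model of that read side, assuming (as the message says) that
the RCU read lock here amounts to disabling preemption. Everything
named toy_* is hypothetical, and the grace-period test is deliberately
simplified: it only checks that no CPU is currently inside a read-side
section, which is stricter than what real RCU requires. The sketch
shows that the reader takes no lock and touches no shared lock word;
it only bumps a per-CPU counter around the list walk.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Per-CPU nesting depth; stands in for the preempt count. */
static int toy_preempt_count[TOY_NR_CPUS];

static void toy_rcu_read_lock(int cpu)
{
	toy_preempt_count[cpu]++;	/* no atomic or locked instruction */
}

static void toy_rcu_read_unlock(int cpu)
{
	assert(toy_preempt_count[cpu] > 0);
	toy_preempt_count[cpu]--;
}

/* Simplified: updaters may free old entries only when no reader is active. */
static bool toy_grace_period_can_end(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_preempt_count[cpu] != 0)
			return false;
	return true;
}

/* Receive-path lookalike: scan a protocol table inside a read section. */
static int toy_receive(int cpu, const int *protos, int n, int proto)
{
	int hit = -1;

	toy_rcu_read_lock(cpu);
	for (int i = 0; i < n; i++) {
		if (protos[i] == proto) {
			hit = i;
			break;
		}
	}
	toy_rcu_read_unlock(cpu);
	return hit;
}
```

While any CPU sits between toy_rcu_read_lock() and
toy_rcu_read_unlock(), the model refuses to end the grace period, which
is why a reader in such a section must not block.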

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-10 Thread Jarek Poplawski
On Wed, Jan 10, 2007 at 10:04:11AM +0100, Jarek Poplawski wrote:
...
> It looks like you're talking about the right thing
> and I'm a fool again! Now I try to find why I even 
> had to pay for this. I read again and again adequate
> chapters from R. Love and C. Benvenuti's books, see
> a lot about kernel preemption in 2.6, but can't see
> anything about preemption disabled in ioctls - maybe
> I'm blind or they are badly translated. Now I look
> into "Linux Device Drivers", see ch. 6 about ioctls,
> blocking I/O and RCU, but nothing about preemption
> disabled again. Maybe this is omitted because it's
> obvious to people who started hacking with earlier
> kernels?

... or maybe it's even more complicated...

For the time being, I revoke my critique of these books.

Jarek P. 


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-10 Thread Jarek Poplawski
On Tue, Jan 09, 2007 at 09:10:45AM +0100, Jarek Poplawski wrote:
> On Mon, Jan 08, 2007 at 10:03:50AM -0800, Stephen Hemminger wrote:
...
> > > >  * Must be invoked with RCU read lock (no preempt)
> > > >  */
> > > > struct net_device *__find_vlan_dev(struct net_device *real_dev,
> > > > ...
> > > >
> > > > But later in this file no sign of disabling preemption
> > > > for these calls and for hlist_add_head_rcu and hlist_del_rcu.
> > > >
> > > > I can't imagine how this works?
> > 
> > Preempt is already disabled on the receive path.
> 
> I'm not sure you're talking about the same thing -

Hello Stephen,

It looks like you're talking about the right thing
and I'm a fool again! Now I try to find why I even 
had to pay for this. I read again and again adequate
chapters from R. Love and C. Benvenuti's books, see
a lot about kernel preemption in 2.6, but can't see
anything about preemption disabled in ioctls - maybe
I'm blind or they are badly translated. Now I look
into "Linux Device Drivers", see ch. 6 about ioctls,
blocking I/O and RCU, but nothing about preemption
disabled again. Maybe this is omitted because it's
obvious to people who started hacking with earlier
kernels?

When I added to this things like: "If the mutex is
not available right now, it will sleep until it can
get it." and "It is illegal to block while in an RCU
read-side critical section." I didn't even try to
think about mutex or malloc with GFP_KERNEL inside
RCU block.
 
I'm enormously grateful you didn't lose patience
in guiding me yet - I hope it'll save this list from
nervous breakdown.

Many thanks and regards as always,

Jarek P.

PS: probably you could profit from this some day 
and write something like "Linux Internals for
Dummies" - it would be simple cut & paste of my
discoveries and your responses!


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-09 Thread Jarek Poplawski
On Mon, Jan 08, 2007 at 10:03:50AM -0800, Stephen Hemminger wrote:
> On Mon, 08 Jan 2007 08:57:10 -0800
> Ben Greear <[EMAIL PROTECTED]> wrote:
> 
> > Jarek Poplawski wrote:
> > > On Fri, Jan 05, 2007 at 12:33:43PM -0800, Ben Greear wrote:
> > > ...
> > >   
> > >> So, I do believe this was the problem we were hitting, and it seems 
> > >> fixed.
> > >> 
> > >
> > > Congratulations!
> > >
> > > But I can see one strange thing in vlan.c:
> > >
> > > /* Must be invoked with RCU read lock (no preempt) */
> > > static struct vlan_group *__vlan_find_group(int real_dev_ifindex)
> > > ...
> > >  * Must be invoked with RCU read lock (no preempt)
> > >  */
> > > struct net_device *__find_vlan_dev(struct net_device *real_dev,
> > > ...
> > >
> > > But later in this file no sign of disabling preemption
> > > for these calls and for hlist_add_head_rcu and hlist_del_rcu.
> > >
> > > I can't imagine how this works?
> 
> Preempt is already disabled on the receive path.

I'm not sure we're talking about the same thing -
blocking is possible inside register_vlan_dev
and unregister_vlan_dev, and the grp pointer is held
while blocked - I thought that was only allowed in
sleepable RCU...

Jarek P.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-08 Thread Stephen Hemminger
On Mon, 08 Jan 2007 08:57:10 -0800
Ben Greear <[EMAIL PROTECTED]> wrote:

> Jarek Poplawski wrote:
> > On Fri, Jan 05, 2007 at 12:33:43PM -0800, Ben Greear wrote:
> > ...
> >   
> >> So, I do believe this was the problem we were hitting, and it seems fixed.
> >> 
> >
> > Congratulations!
> >
> > But I can see one strange thing in vlan.c:
> >
> > /* Must be invoked with RCU read lock (no preempt) */
> > static struct vlan_group *__vlan_find_group(int real_dev_ifindex)
> > ...
> >  * Must be invoked with RCU read lock (no preempt)
> >  */
> > struct net_device *__find_vlan_dev(struct net_device *real_dev,
> > ...
> >
> > But later in this file no sign of disabling preemption
> > for these calls and for hlist_add_head_rcu and hlist_del_rcu.
> >
> > I can't imagine how this works?

Preempt is already disabled on the receive path.

> >   
> Perhaps...I didn't RCU-ify VLANs, but I can take a look.
> 
> For the record, the soft lockup was using MAC-VLANs, not 802.1Q VLANs,
> so it wouldn't have been affected by bugs in VLANs one way or the other.
> 
> Ben
> 
> > Jarek P. 
> >   
> 
> 


-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-08 Thread Ben Greear

Jarek Poplawski wrote:
> On Fri, Jan 05, 2007 at 12:33:43PM -0800, Ben Greear wrote:
> ...
>> So, I do believe this was the problem we were hitting, and it seems fixed.
>
> Congratulations!
>
> But I can see one strange thing in vlan.c:
>
> /* Must be invoked with RCU read lock (no preempt) */
> static struct vlan_group *__vlan_find_group(int real_dev_ifindex)
> ...
>  * Must be invoked with RCU read lock (no preempt)
>  */
> struct net_device *__find_vlan_dev(struct net_device *real_dev,
> ...
>
> But later in this file no sign of disabling preemption
> for these calls and for hlist_add_head_rcu and hlist_del_rcu.
>
> I can't imagine how this works?

Perhaps...I didn't RCU-ify VLANs, but I can take a look.

For the record, the soft lockup was using MAC-VLANs, not 802.1Q VLANs,
so it wouldn't have been affected by bugs in VLANs one way or the other.

Ben

> Jarek P.



--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com





Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-07 Thread Jarek Poplawski
On Fri, Jan 05, 2007 at 12:33:43PM -0800, Ben Greear wrote:
...
> So, I do believe this was the problem we were hitting, and it seems fixed.

Congratulations!

But I can see one strange thing in vlan.c:

/* Must be invoked with RCU read lock (no preempt) */
static struct vlan_group *__vlan_find_group(int real_dev_ifindex)
...
 * Must be invoked with RCU read lock (no preempt)
 */
struct net_device *__find_vlan_dev(struct net_device *real_dev,
...

But later in this file no sign of disabling preemption
for these calls and for hlist_add_head_rcu and hlist_del_rcu.

I can't imagine how this works?

Jarek P. 


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-05 Thread Ben Greear

David Miller wrote:
> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Thu, 04 Jan 2007 17:26:27 +1100
>
>> David Stevens <[EMAIL PROTECTED]> wrote:
>>>    You're right, I don't know whether it'll fix the problem Ben saw
>>> or not, but it looks like the original code can do a receive before the
>>> in_device is fully initialized, and that, of course, is bad.
>>>    If the device for ip_rcv() is not the same one we were
>>> initializing when the receive interrupted, then the patch should have
>>> no effect either way -- I don't think it'll hide other problems.
>>>    If it's hard to reproduce (which I guess is true), then you're
>>> right, no soft lockup doesn't really tell us if it's fixed or not.
>>
>> Actually I missed your point that the multicast locks aren't even
>> initialised at that point.  So this does explain the soft lock-up
>> and therefore your patch is clearly the correct solution.
>
> I agree too, therefore I've added David's patch as below.
>
> I'll push this to the -stable branches as well.  This fix is
> correct even if it does not entirely clear up the soft lockup
> bug being discussed in this thread, but I think it will :-)

We were able to reproduce the problem twice on the un-patched 2.6.18.2 kernel
in about 2 hours of our stress test yesterday.  I applied this patch (well, the
ipv4 part..the ipv6 won't apply to 2.6.18.2), and it has run the stress
test clean for a total of about 8 hours.

So, I do believe this was the problem we were hitting, and it seems fixed.

Thanks!
Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com



Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-05 Thread David Miller
From: Ben Greear <[EMAIL PROTECTED]>
Date: Fri, 05 Jan 2007 12:33:43 -0800

> We were able to reproduce the problem twice on the un-patched 2.6.18.2 kernel
> in about 2 hours of our stress test yesterday.  I applied this patch (well, the
> ipv4 part..the ipv6 won't apply to 2.6.18.2), and it has run the stress
> test clean for a total of about 8 hours.
> 
> So, I do believe this was the problem we were hitting, and it seems fixed.

Thanks for all the testing Ben, much appreciated!


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-05 Thread Jarek Poplawski
On Thu, Jan 04, 2007 at 09:04:29AM -0800, Ben Greear wrote:
> Jarek Poplawski wrote:
> >On Thu, Jan 04, 2007 at 09:27:07PM +1100, Herbert Xu wrote:
> >  
> >>On Thu, Jan 04, 2007 at 09:50:14AM +0100, Jarek Poplawski wrote:
> >>
> >>>Could you explain? I can see some inet_rtm_newaddr
> >>>interrupted. For me it could be e.g.:
> >>>
> >>>after
> >>>vconfig add eth0 9
> >>>
> >>>ip addr add dev eth0.9 ...
> >>>  
> >>Whether eth0.9 is up or not does not affect this at all.  The spin
> >>locks are initialised (and used) when the first IPv4 address is added,
> >>not when the device comes up.
> >>
> >
> >I understand this. I consider IFF_UP as a sign all 
> >initialisations (open functions including) are
> >completed and there is permission for working (so
> >logically, if I would do eth0.9 down all traffic
> >should be stopped, what probably isn't true now).
> >  
> It is certainly valid for an interface to be IF_UP, but have no IP
> address.  My application does bring the network device up before it
> assigns the IP, for instance.

Yes, but I think in any case it isn't race-safe
now with vlans. I was thinking more about the reverse
situation, where an skb whose skb->dev is not IFF_UP
could be processed unnecessarily. But the same should
apply to the rest of the initializations that are
done when an address is assigned.

> There may be other issues with IF_UP, but that could be handled with a
> different investigation.  If you have a particular test case that fails
> with 802.1Q VLANs, then I will be happy to work on it...

Sorry, I haven't even used this yet...

Wishing you a sunny weekend,

Jarek P.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-05 Thread Herbert Xu
On Fri, Jan 05, 2007 at 07:38:44AM +0100, Jarek Poplawski wrote:
> 
> I'd only suggest changing "goto out;" to
> "return NULL;" at the end of inetdev_init, because
> RCU is now engaged unnecessarily.

I agree.  The RCU assignment should come before the out label.
Can you send a patch?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-04 Thread Jarek Poplawski
On Thu, Jan 04, 2007 at 12:33:33PM -0800, David Miller wrote:
> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Thu, 04 Jan 2007 17:26:27 +1100
> 
> > David Stevens <[EMAIL PROTECTED]> wrote:
> > >    You're right, I don't know whether it'll fix the problem Ben saw
> > > or not, but it looks like the original code can do a receive before the
> > > in_device is fully initialized, and that, of course, is bad.
> > >    If the device for ip_rcv() is not the same one we were
> > > initializing when the receive interrupted, then the patch should have
> > > no effect either way -- I don't think it'll hide other problems.
> > >    If it's hard to reproduce (which I guess is true), then you're
> > > right, no soft lockup doesn't really tell us if it's fixed or not.
> > 
> > Actually I missed your point that the multicast locks aren't even
> > initialised at that point.  So this does explain the soft lock-up
> > and therefore your patch is clearly the correct solution.
> 
> I agree too, therefore I've added David's patch as below.
> 
> I'll push this to the -stable branches as well.  This fix is
> correct even if it does not entirely clear up the soft lockup
> bug being discussed in this thread, but I think it will :-)

After rethinking this I came to a similar conclusion.  I had
thought the changes were made only to fix this particular
bug, but now I see the previous order wasn't right,
particularly considering RCU.

So, I apologize to David L Stevens for my harsh words.

I'd only suggest changing "goto out;" to
"return NULL;" at the end of inetdev_init, because
RCU is now engaged unnecessarily.

Regards,
Jarek P.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-04 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 04 Jan 2007 17:26:27 +1100

> David Stevens <[EMAIL PROTECTED]> wrote:
> >    You're right, I don't know whether it'll fix the problem Ben saw
> > or not, but it looks like the original code can do a receive before the
> > in_device is fully initialized, and that, of course, is bad.
> >    If the device for ip_rcv() is not the same one we were
> > initializing when the receive interrupted, then the patch should have
> > no effect either way -- I don't think it'll hide other problems.
> >    If it's hard to reproduce (which I guess is true), then you're
> > right, no soft lockup doesn't really tell us if it's fixed or not.
> 
> Actually I missed your point that the multicast locks aren't even
> initialised at that point.  So this does explain the soft lock-up
> and therefore your patch is clearly the correct solution.

I agree too, therefore I've added David's patch as below.

I'll push this to the -stable branches as well.  This fix is
correct even if it does not entirely clear up the soft lockup
bug being discussed in this thread, but I think it will :-)

commit 30c4cf577fb5b68c16e5750d6bdbd7072e42b279
Author: David L Stevens <[EMAIL PROTECTED]>
Date:   Thu Jan 4 12:31:14 2007 -0800

[IPV4/IPV6]: Fix inet{,6} device initialization order.

It is important that we only assign dev->ip{,6}_ptr
only after all portions of the inet{,6} are setup.

Otherwise we can receive packets before the multicast
spinlocks et al. are initialized.

Signed-off-by: David L Stevens <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 84bed40..25c8a42 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -165,9 +165,8 @@ struct in_device *inetdev_init(struct net_device *dev)
 			      NET_IPV4_NEIGH, "ipv4", NULL, NULL);
 #endif
 
-	/* Account for reference dev->ip_ptr */
+	/* Account for reference dev->ip_ptr (below) */
 	in_dev_hold(in_dev);
-	rcu_assign_pointer(dev->ip_ptr, in_dev);
 
 #ifdef CONFIG_SYSCTL
 	devinet_sysctl_register(in_dev, &in_dev->cnf);
@@ -176,6 +175,8 @@ struct in_device *inetdev_init(struct net_device *dev)
 	if (dev->flags & IFF_UP)
 		ip_mc_up(in_dev);
 out:
+	/* we can receive as soon as ip_ptr is set -- do this last */
+	rcu_assign_pointer(dev->ip_ptr, in_dev);
 	return in_dev;
 out_kfree:
 	kfree(in_dev);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 9b0a906..171e5b5 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -413,8 +413,6 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
 	if (netif_carrier_ok(dev))
 		ndev->if_flags |= IF_READY;
 
-	/* protected by rtnl_lock */
-	rcu_assign_pointer(dev->ip6_ptr, ndev);
 
 	ipv6_mc_init_dev(ndev);
 	ndev->tstamp = jiffies;
@@ -425,6 +423,8 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
 			      NULL);
 	addrconf_sysctl_register(ndev, &ndev->cnf);
 #endif
+	/* protected by rtnl_lock */
+	rcu_assign_pointer(dev->ip6_ptr, ndev);
 	return ndev;
 }
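
The ordering rule behind the patch can be sketched in userspace C11
as follows. This is an illustration under stated assumptions, not the
kernel code: the toy_* types and functions are hypothetical, a boolean
stands in for "multicast spinlocks et al. are initialized", and a
release store plays the role that rcu_assign_pointer() plays in the
patch. The idea is that the pointer a receiver can follow is written
last, after everything it leads to has been set up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct in_device and struct net_device. */
struct toy_in_device {
	int refcnt;
	bool mc_lock_ready;	/* represents the multicast lock setup */
};

struct toy_net_device {
	struct toy_in_device *_Atomic ip_ptr;	/* what the receive path reads */
};

static struct toy_net_device toy_dev;	/* zero-initialized: ip_ptr is NULL */

/* Initialize everything first, publish the pointer as the final step. */
static struct toy_in_device *toy_inetdev_init(struct toy_net_device *dev)
{
	struct toy_in_device *in_dev = calloc(1, sizeof(*in_dev));

	if (in_dev == NULL)
		return NULL;		/* nothing is published on failure */

	in_dev->refcnt = 1;		/* reference held through dev->ip_ptr */
	in_dev->mc_lock_ready = true;	/* all setup finishes before publishing */

	/* Release store: a receiver that sees the pointer sees the setup too. */
	atomic_store_explicit(&dev->ip_ptr, in_dev, memory_order_release);
	return in_dev;
}

/*
 * Receive-path lookalike.
 * Returns -1 if no in_device is published yet, 1 if the published one is
 * fully initialized, 0 if a half-built one were ever visible.
 */
static int toy_ip_rcv(struct toy_net_device *dev)
{
	struct toy_in_device *in_dev;

	assert(dev != NULL);
	in_dev = atomic_load_explicit(&dev->ip_ptr, memory_order_acquire);
	if (in_dev == NULL)
		return -1;
	return in_dev->mc_lock_ready ? 1 : 0;
}
```

With this ordering a receive that races with initialization either
finds no in_device at all or finds a complete one.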
 


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-04 Thread Ben Greear

Jarek Poplawski wrote:
> On Thu, Jan 04, 2007 at 09:27:07PM +1100, Herbert Xu wrote:
>> On Thu, Jan 04, 2007 at 09:50:14AM +0100, Jarek Poplawski wrote:
>>> Could you explain? I can see some inet_rtm_newaddr
>>> interrupted. For me it could be e.g.:
>>>
>>> after
>>> vconfig add eth0 9
>>>
>>> ip addr add dev eth0.9 ...
>>
>> Whether eth0.9 is up or not does not affect this at all.  The spin
>> locks are initialised (and used) when the first IPv4 address is added,
>> not when the device comes up.
>
> I understand this. I consider IFF_UP as a sign all
> initialisations (open functions including) are
> completed and there is permission for working (so
> logically, if I would do eth0.9 down all traffic
> should be stopped, what probably isn't true now).

It is certainly valid for an interface to be IF_UP, but have no IP
address.  My application does bring the network device up before it
assigns the IP, for instance.

There may be other issues with IF_UP, but that could be handled with a
different investigation.  If you have a particular test case that fails
with 802.1Q VLANs, then I will be happy to work on it...

Thanks,
Ben

> Jarek P.
  



--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com





Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-04 Thread Jarek Poplawski
On Thu, Jan 04, 2007 at 09:27:07PM +1100, Herbert Xu wrote:
> On Thu, Jan 04, 2007 at 09:50:14AM +0100, Jarek Poplawski wrote:
> > 
> > Could you explain? I can see some inet_rtm_newaddr
> > interrupted. For me it could be e.g.:
> > 
> > after
> > vconfig add eth0 9
> > 
> > ip addr add dev eth0.9 ...
> 
> Whether eth0.9 is up or not does not affect this at all.  The spin
> locks are initialised (and used) when the first IPv4 address is added,
> not when the device comes up.

I understand this. I consider IFF_UP as a sign all 
initialisations (open functions including) are
completed and there is permission for working (so
logically, if I would do eth0.9 down all traffic
should be stopped, what probably isn't true now).

Jarek P. 


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-04 Thread Herbert Xu
On Thu, Jan 04, 2007 at 09:50:14AM +0100, Jarek Poplawski wrote:
> 
> Could you explain? I can see some inet_rtm_newaddr
> interrupted. For me it could be e.g.:
> 
> after
> vconfig add eth0 9
> 
> ip addr add dev eth0.9 ...

Whether eth0.9 is up or not does not affect this at all.  The spin
locks are initialised (and used) when the first IPv4 address is added,
not when the device comes up.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-04 Thread Jarek Poplawski
On Thu, Jan 04, 2007 at 07:29:30PM +1100, Herbert Xu wrote:
> On Thu, Jan 04, 2007 at 09:03:51AM +0100, Jarek Poplawski wrote:
> > 
> > I doubt this is the right solution. It certainly
> > could fix this particular situation but my main
> > point was packets shouldn't get into kernel
> > receive queues with skb->dev not IFF_UP.
> 
> I think you misunderstood.  The device certainly is IFF_UP.  What
> happens is that the multicast spin locks are set up too late:

Could you explain? I can see some inet_rtm_newaddr
interrupted. For me it could be e.g.:

after
vconfig add eth0 9

ip addr add dev eth0.9 ...

before
ip link set dev eth0.9 up

Jarek P.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-04 Thread Herbert Xu
On Thu, Jan 04, 2007 at 09:03:51AM +0100, Jarek Poplawski wrote:
> 
> I doubt this is the right solution. It certainly
> could fix this particular situation but my main
> point was packets shouldn't get into kernel
> receive queues with skb->dev not IFF_UP.

I think you misunderstood.  The device certainly is IFF_UP.  What
happens is that the multicast spin locks are set up too late:

	/* Account for reference dev->ip_ptr */
	in_dev_hold(in_dev);
	rcu_assign_pointer(dev->ip_ptr, in_dev);

As soon as we set ip_ptr, incoming packets can cause the multicast
locks to be taken.

#ifdef CONFIG_SYSCTL
	devinet_sysctl_register(in_dev, &in_dev->cnf);
#endif

In fact the back trace shows that we get a packet during the
sysctl registration.

	ip_mc_init_dev(in_dev);

However, the spin locks are not set up until this point.  This is
exactly what David's patch fixes.

	if (dev->flags & IFF_UP)
		ip_mc_up(in_dev);
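
For illustration only, the ordering problem described above can be modelled in
userspace. This is a minimal sketch, assuming C11 atomics as a stand-in for
rcu_assign_pointer(); every toy_* name is hypothetical and none of this is
kernel code. It simulates a single packet arriving in the middle of
initialisation and reports what the receive path would find, first with the
pointer published early and then with it published last.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct in_device and struct net_device. */
struct toy_in_dev { bool mc_lock_ready; };
struct toy_net_dev { _Atomic(struct toy_in_dev *) ip_ptr; };

/* What a receive handler would observe through dev->ip_ptr. */
enum toy_rx { TOY_RX_NO_INDEV, TOY_RX_LOCK_UNINIT, TOY_RX_OK };

static enum toy_rx toy_rx_peek(struct toy_net_dev *dev)
{
	struct toy_in_dev *in_dev =
		atomic_load_explicit(&dev->ip_ptr, memory_order_acquire);

	if (!in_dev)
		return TOY_RX_NO_INDEV;	/* nothing published: packet is ignored */
	return in_dev->mc_lock_ready ? TOY_RX_OK : TOY_RX_LOCK_UNINIT;
}

/* Original order: publish first, finish initialisation afterwards.
 * Returns what a packet arriving between the two steps would see. */
static enum toy_rx toy_init_publish_first(struct toy_net_dev *dev,
					  struct toy_in_dev *in_dev)
{
	enum toy_rx seen;

	atomic_store_explicit(&dev->ip_ptr, in_dev, memory_order_release);
	seen = toy_rx_peek(dev);	/* simulated interrupt mid-init */
	in_dev->mc_lock_ready = true;	/* models ip_mc_init_dev() */
	return seen;
}

/* Reordered: finish initialisation, publish the pointer last. */
static enum toy_rx toy_init_publish_last(struct toy_net_dev *dev,
					 struct toy_in_dev *in_dev)
{
	enum toy_rx seen = toy_rx_peek(dev);	/* simulated interrupt mid-init */

	in_dev->mc_lock_ready = true;
	atomic_store_explicit(&dev->ip_ptr, in_dev, memory_order_release);
	return seen;
}
```

With the early publish the simulated packet finds an in_dev whose lock is not
ready; with the late publish it finds no in_dev at all, which the receive path
can handle.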

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-04 Thread Jarek Poplawski
On Thu, Jan 04, 2007 at 05:26:27PM +1100, Herbert Xu wrote:
> David Stevens <[EMAIL PROTECTED]> wrote:
> > You're right, I don't know whether it'll fix the problem Ben saw
> > or not, but it looks like the original code can do a receive before the
> > in_device is fully initialized, and that, of course, is bad.
> > If the device for ip_rcv() is not the same one we were
> > initializing when the receive interrupted, then the patch should have
> > no effect either way -- I don't think it'll hide other problems.
> > If it's hard to reproduce (which I guess is true), then you're
> > right, no soft lockup doesn't really tell us if it's fixed or not.
> 
> Actually I missed your point that the multicast locks aren't even
> initialised at that point.  So this does explain the soft lock-up
> and therefore your patch is clearly the correct solution.

I doubt this is the right solution. It certainly
could fix this particular situation but my main
point was packets shouldn't get into kernel
receive queues with skb->dev not IFF_UP.

The real devices' drivers don't do that and
virtual devices should do the same. Otherwise,
the code of netif_rx or netif_receive_skb should
check this always and drop such packets or else
this kind of checking should be done later. And
this patch simply takes into consideration
something could be wrong here. But then all the
rest of receiving and routing functions should be
checked and maybe fixed to consider the same.

I've proposed some measures to check if this bug
is really caused by this skipped init but, IMHO,
this should be fixed in one of these ways:

- the vlan driver should be reworked to behave like "real"
drivers and ensure no packet with skb->dev not
IFF_UP will be queued or processed by higher
protocols; it could possibly use bridge's master
field and skb_bond and skb_bond_should_drop (maybe
slightly changed),

- the vlan driver should itself open the real devices
only after its devices are up,

- all dev.c receive functions should be changed to
check IFF_UP - but because vlans are not so
popular - this would of course be a waste of
time.

Regards,
Jarek P.

PS: for scientific reasons we could seek this
specific place where it locks or loops now
(I've some suspicions to lockdep because it
looks like the place after it with lock init
checking isn't reached), and maybe there is
also some other bug, but it's evident this
possibility of ip_rcv and ip_route_input
before dev IFF_UP is a hole in the design and
should be fixed.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread Herbert Xu
David Stevens <[EMAIL PROTECTED]> wrote:
> You're right, I don't know whether it'll fix the problem Ben saw
> or not, but it looks like the original code can do a receive before the
> in_device is fully initialized, and that, of course, is bad.
> If the device for ip_rcv() is not the same one we were
> initializing when the receive interrupted, then the patch should have
> no effect either way -- I don't think it'll hide other problems.
> If it's hard to reproduce (which I guess is true), then you're
> right, no soft lockup doesn't really tell us if it's fixed or not.

Actually I missed your point that the multicast locks aren't even
initialised at that point.  So this does explain the soft lock-up
and therefore your patch is clearly the correct solution.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread David Stevens
Ben,
If the ip_rcv() and the inetdev_init() are on the same
interface in your stack backtrace, it's a certainty at that point
that the lock value is still zeroed, because none of the initialization
occurs until after it has returned from the function it interrupted
to do the receive.
It'd have to be out of the register code and doing
ip_mc_init_dev() (after that call) to be a tight race with
lock creation.
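
For illustration only: one way a still-zeroed lock could make a reader spin
forever. This is a minimal single-threaded sketch, assuming an x86-style
"biased" rwlock whose unlocked value is a non-zero bias, so that an all-zero
lock looks as if a writer held it. The toy_* names and the bias constant are
hypothetical, the counter is not atomic, and the thread does not establish
that this is what happened in the reported configuration.

```c
#include <stdbool.h>

/* Toy model: the counter holds TOY_RW_BIAS when unlocked; each reader
 * subtracts 1 and a writer subtracts the whole bias. */
#define TOY_RW_BIAS 0x01000000

struct toy_rwlock { int count; };

static void toy_rwlock_init(struct toy_rwlock *l)
{
	l->count = TOY_RW_BIAS;
}

/* One attempt to take the lock for reading.  A spinning read_lock()
 * would simply retry this until it succeeds. */
static bool toy_read_trylock(struct toy_rwlock *l)
{
	if (l->count <= 0)	/* writer holds it, or it was never initialised */
		return false;
	l->count--;
	return true;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	l->count++;
}

/* Returns true if a reader gets in within max_spins attempts. */
static bool toy_reader_gets_in(struct toy_rwlock *l, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		if (toy_read_trylock(l)) {
			toy_read_unlock(l);
			return true;
		}
	}
	return false;
}
```

In this model a zero-filled struct toy_rwlock never admits a reader, however
many times it retries, while one passed through toy_rwlock_init() admits it on
the first attempt.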

+-DLS


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread David Stevens
Herbert,
You're right, I don't know whether it'll fix the problem Ben saw
or not, but it looks like the original code can do a receive before the
in_device is fully initialized, and that, of course, is bad.
If the device for ip_rcv() is not the same one we were
initializing when the receive interrupted, then the patch should have
no effect either way -- I don't think it'll hide other problems.
If it's hard to reproduce (which I guess is true), then you're
right, no soft lockup doesn't really tell us if it's fixed or not.

+-DLS



Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread Herbert Xu
Ben Greear <[EMAIL PROTECTED]> wrote:
> 
> I'm not sure if it helps, but I did notice that 'ip' was using 99% of the
> CPU on the system.  Could this be because it was spinning trying to acquire
> the read-lock?  When I ran 'ifconfig -a', that process hung, and at that point
> the system was rebooted.  Before I ran ifconfig, 'top' and 'ls' and similar
> apps were responding fine, and I was logged in over ssh from the US to
> Australia, so its basic networking was functioning.

That's understandable because we were stuck in the softirq handler while
holding the rtnl lock.  The rtnl lock is what's preventing those commands
from executing.

What we need to figure out is whether the rtnl lock is a coincidence or
a cause.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread Ben Greear

Herbert Xu wrote:
> David Stevens <[EMAIL PROTECTED]> wrote:
> > Ben,
> > Here's a patch that I think will fix it, assuming the receive is
> > on the same device as the initialization. Can you try this out?
>
> Hi David:
>
> Your patch makes sense on its own but I don't see the direct connection
> to the soft lock-up.  Sure it prevents the code path in question from
> triggering.  However, if we don't understand why it's locking up in the
> first place then this may just be hiding it rather than fixing it.
>
> In particular, a soft lockup means that we're doing so much work in
> the softirq handlers that processes are not getting run.  So what is
> it exactly here that's causing us to get stuck in the softirq handlers?
> Is it because we're somehow getting stuck in a net rx loop?

I'm not sure if it helps, but I did notice that 'ip' was using 99% of the
CPU on the system.  Could this be because it was spinning trying to acquire
the read-lock?  When I ran 'ifconfig -a', that process hung, and at that point
the system was rebooted.  Before I ran ifconfig, 'top' and 'ls' and similar
apps were responding fine, and I was logged in over ssh from the US to
Australia, so its basic networking was functioning.

What if the race is that the read-lock is only half initialized, so that
it doesn't trigger the uninitialized-lock-use debug message, but still screws
up and will not ever let the reader acquire the lock?

Thanks,
Ben

> Cheers,

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread Herbert Xu
David Stevens <[EMAIL PROTECTED]> wrote:
> 
> Ben,
> Here's a patch that I think will fix it, assuming the receive is
> on the same device as the initialization. Can you try this out?

Hi David:

Your patch makes sense on its own but I don't see the direct connection
to the soft lock-up.  Sure it prevents the code path in question from
triggering.  However, if we don't understand why it's locking up in the
first place then this may just be hiding it rather than fixing it.

In particular, a soft lockup means that we're doing so much work in
the softirq handlers that processes are not getting run.  So what is
it exactly here that's causing us to get stuck in the softirq handlers?
Is it because we're somehow getting stuck in a net rx loop?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread David Stevens
OK, sounds good.

By the way, I think you can probably hit it more often if you have
something on the virtual network sending lots of multicast traffic while
you're creating the interface. That'll increase the odds that you'll
get into ip_check_mc() with a partially initialized in_dev.

You can use "ping -I intfX 224.0.0.1" (e.g.) to generate multicast
traffic, though you'd want more than one. :-)

+-DLS



Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread Ben Greear

David Stevens wrote:
> Ben,
> Here's a patch that I think will fix it, assuming the receive is
> on the same device as the initialization. Can you try this out?

We are attempting to reproduce this now...as soon as we can reproduce,
I'll apply this and see if that fixes the problem.  This race is evidently
quite difficult to hit, so I'm not sure how long this will take.

Perhaps someone like DaveM could review the patch for logical correctness
and go ahead and apply anyway if it is more correct?  I confuse myself often
enough trying to deal with the network stack locking that I should probably
not be the final arbiter of this patch :)

Thanks,
Ben

> +-DLS
> [inline for viewing, attached for applying]
>
> Signed-off-by: David L Stevens <[EMAIL PROTECTED]>
>
> diff -ruNp linux-2.6.19.1/net/ipv4/devinet.c linux-2.6.19.1T1/net/ipv4/devinet.c
> --- linux-2.6.19.1/net/ipv4/devinet.c   2006-12-11 11:32:53.0 -0800
> +++ linux-2.6.19.1T1/net/ipv4/devinet.c 2007-01-03 14:37:56.0 -0800
> @@ -165,9 +165,8 @@ struct in_device *inetdev_init(struct ne
>  			      NET_IPV4_NEIGH, "ipv4", NULL, NULL);
>  #endif
>  
> -	/* Account for reference dev->ip_ptr */
> +	/* Account for reference dev->ip_ptr (below) */
>  	in_dev_hold(in_dev);
> -	rcu_assign_pointer(dev->ip_ptr, in_dev);
>  
>  #ifdef CONFIG_SYSCTL
>  	devinet_sysctl_register(in_dev, &in_dev->cnf);
> @@ -176,6 +175,8 @@ struct in_device *inetdev_init(struct ne
>  	if (dev->flags & IFF_UP)
>  		ip_mc_up(in_dev);
>  out:
> +	/* we can receive as soon as ip_ptr is set -- do this last */
> +	rcu_assign_pointer(dev->ip_ptr, in_dev);
>  	return in_dev;
>  out_kfree:
>  	kfree(in_dev);
> diff -ruNp linux-2.6.19.1/net/ipv6/addrconf.c linux-2.6.19.1T1/net/ipv6/addrconf.c
> --- linux-2.6.19.1/net/ipv6/addrconf.c  2006-12-11 11:32:53.0 -0800
> +++ linux-2.6.19.1T1/net/ipv6/addrconf.c        2007-01-03 14:47:07.0 -0800
> @@ -413,8 +413,6 @@ static struct inet6_dev * ipv6_add_dev(s
>  	if (netif_carrier_ok(dev))
>  		ndev->if_flags |= IF_READY;
>  
> -	/* protected by rtnl_lock */
> -	rcu_assign_pointer(dev->ip6_ptr, ndev);
>  
>  	ipv6_mc_init_dev(ndev);
>  	ndev->tstamp = jiffies;
> @@ -425,6 +423,8 @@ static struct inet6_dev * ipv6_add_dev(s
>  			      NULL);
>  	addrconf_sysctl_register(ndev, &ndev->cnf);
>  #endif
> +	/* protected by rtnl_lock */
> +	rcu_assign_pointer(dev->ip6_ptr, ndev);
>  	return ndev;
>  }
>  

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread David Stevens
Ben,
Here's a patch that I think will fix it, assuming the receive is
on the same device as the initialization. Can you try this out?

+-DLS
[inline for viewing, attached for applying]

Signed-off-by: David L Stevens <[EMAIL PROTECTED]>

diff -ruNp linux-2.6.19.1/net/ipv4/devinet.c linux-2.6.19.1T1/net/ipv4/devinet.c
--- linux-2.6.19.1/net/ipv4/devinet.c   2006-12-11 11:32:53.0 -0800
+++ linux-2.6.19.1T1/net/ipv4/devinet.c 2007-01-03 14:37:56.0 -0800
@@ -165,9 +165,8 @@ struct in_device *inetdev_init(struct ne
 			      NET_IPV4_NEIGH, "ipv4", NULL, NULL);
 #endif
 
-	/* Account for reference dev->ip_ptr */
+	/* Account for reference dev->ip_ptr (below) */
 	in_dev_hold(in_dev);
-	rcu_assign_pointer(dev->ip_ptr, in_dev);
 
 #ifdef CONFIG_SYSCTL
 	devinet_sysctl_register(in_dev, &in_dev->cnf);
@@ -176,6 +175,8 @@ struct in_device *inetdev_init(struct ne
 	if (dev->flags & IFF_UP)
 		ip_mc_up(in_dev);
 out:
+	/* we can receive as soon as ip_ptr is set -- do this last */
+	rcu_assign_pointer(dev->ip_ptr, in_dev);
 	return in_dev;
 out_kfree:
 	kfree(in_dev);
diff -ruNp linux-2.6.19.1/net/ipv6/addrconf.c linux-2.6.19.1T1/net/ipv6/addrconf.c
--- linux-2.6.19.1/net/ipv6/addrconf.c  2006-12-11 11:32:53.0 -0800
+++ linux-2.6.19.1T1/net/ipv6/addrconf.c        2007-01-03 14:47:07.0 -0800
@@ -413,8 +413,6 @@ static struct inet6_dev * ipv6_add_dev(s
 	if (netif_carrier_ok(dev))
 		ndev->if_flags |= IF_READY;
 
-	/* protected by rtnl_lock */
-	rcu_assign_pointer(dev->ip6_ptr, ndev);
 
 	ipv6_mc_init_dev(ndev);
 	ndev->tstamp = jiffies;
@@ -425,6 +423,8 @@ static struct inet6_dev * ipv6_add_dev(s
 			      NULL);
 	addrconf_sysctl_register(ndev, &ndev->cnf);
 #endif
+	/* protected by rtnl_lock */
+	rcu_assign_pointer(dev->ip6_ptr, ndev);
 	return ndev;
 }
 


initfix.patch
Description: Binary data


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread David Stevens
Ben & Jarek,
Your analysis looks correct to me. It seems to me the problem is that
we don't want the in_device to be searchable until after the
initialization is done.
What about moving the initialization of dev->ip_ptr in inetdev_init() to
after the "out" label?

+-DLS



Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread Ben Greear

Jarek Poplawski wrote:
> On Wed, Jan 03, 2007 at 09:07:11AM +0100, Jarek Poplawski wrote:
> > On Tue, Jan 02, 2007 at 03:35:39PM -0800, David Stevens wrote:
> > > I've looked at this a little too -- it'd be nice to know who holds
> > > the write lock.
> >
> > If you mean mc_list_lock - probably nobody - it's
> > not initialized (so the timers) for this in_device
>
> I should say: "... probably not initialized ...".

That should print out the debugging when you access an un-initialized
lock, and I did not see that print-out in the logs.  I looked at the
code and could not explain how it could be accessed un-initialized, so
I'm not certain this is the problem.

If I can reproduce this in a controlled manner, I'll add debugging to
print out who is holding the lock (if anyone), as well as make sure it
is initialized before the blocking method initializes it.  It will
likely be a few days before we can set up something to reproduce it,
however.

If you can explain any code path that could leave the lock
uninitialized, then that would be a big help...but it looked ok to me...

Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread Jarek Poplawski
On Wed, Jan 03, 2007 at 09:07:11AM +0100, Jarek Poplawski wrote:
> On Tue, Jan 02, 2007 at 03:35:39PM -0800, David Stevens wrote:
> > I've looked at this a little too -- it'd be nice to know who holds
> > the write lock.
> 
> If you mean mc_list_lock - probably nobody - it's
> not initialized (so the timers) for this in_device

I should say: "... probably not initialized ...".

Jarek P. 


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-03 Thread Jarek Poplawski
On Tue, Jan 02, 2007 at 03:35:39PM -0800, David Stevens wrote:
> I've looked at this a little too -- it'd be nice to know who holds
> the write lock.

If you mean mc_list_lock - probably nobody - it's
not initialized (so the timers) for this in_device
and rtnl mutex is preempted by irq.

Actually I wonder if lockdep isn't masking (or even
spoiling) something, so I'd try with:
"Lock debugging: ..." options off
(CONFIG_DEBUG_SPINLOCK = y
CONFIG_DEBUG_LOCK_ALLOC = n).

Jarek P. 

PS: because of unknown changes from those patches
this is guessing only.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-02 Thread Ben Greear

David Stevens wrote:
> I've looked at this a little too -- it'd be nice to know who holds
> the write lock.
>
> I see ip_mc_destroy_dev() is bouncing through the lock for
> each multicast address, though it starts at the beginning of
> the list each time. I don't see a problem with it, but it'd be
> simpler if it acquired the write lock once, grabbed and nulled
> the list, released the lock and then called igmp_group_dropped()
> & ip_ma_put() on each address from the local list copy.
>
> Are you destroying/creating interfaces or doing a lot of multicasting at
> the time? How many group memberships do you have?

Lots and lots of interfaces were being created...at least 200 mac-vlans
(an out-of-tree patch somewhat similar to 802.1q vlans).  The avahi-daemon
process was running, and it appears to be adding a multicast to each
interface.  It was spewing failure messages in /var/log/messages,
probably because it can't handle so many interfaces.

Other than that, there is no (known) multicast traffic being generated.

This bug was reported to me by a user in Australia, and we have not yet
attempted to recreate this locally, so I am not certain exactly what it
takes to trigger this bug.

Thanks,
Ben

> +-DLS

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-02 Thread David Stevens
I've looked at this a little too -- it'd be nice to know who holds
the write lock.

I see ip_mc_destroy_dev() is bouncing through the lock for
each multicast address, though it starts at the beginning of
the list each time. I don't see a problem with it, but it'd be
simpler if it acquired the write lock once, grabbed and nulled
the list, released the lock and then called igmp_group_dropped()
& ip_ma_put() on each address from the local list copy.
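
For illustration only, the "grab and null the list" idea might look roughly
like the sketch below. It is a userspace model under stated assumptions: the
toy_* types are hypothetical, the lock is replaced by a counter so the number
of acquisitions can be observed, and setting a flag stands in for the
igmp_group_dropped() and ip_ma_put() calls.

```c
#include <stddef.h>

/* Hypothetical stand-in for one multicast list entry. */
struct toy_mc_entry {
	struct toy_mc_entry *next;
	int dropped;
};

/* Hypothetical stand-in for the per-device multicast state. */
struct toy_mc_dev {
	int write_locks_taken;		/* counts write_lock() calls */
	struct toy_mc_entry *mc_list;
};

static void toy_write_lock(struct toy_mc_dev *d)   { d->write_locks_taken++; }
static void toy_write_unlock(struct toy_mc_dev *d) { (void)d; }

/* Detach the whole list under a single lock hold, then walk the
 * private copy with the lock released.  Returns the number of
 * entries processed. */
static int toy_mc_destroy(struct toy_mc_dev *d)
{
	struct toy_mc_entry *local, *next;
	int n = 0;

	toy_write_lock(d);
	local = d->mc_list;
	d->mc_list = NULL;		/* readers now see an empty list */
	toy_write_unlock(d);

	for (; local; local = next) {
		next = local->next;	/* save before touching the entry */
		local->next = NULL;
		local->dropped = 1;	/* per-entry teardown goes here */
		n++;
	}
	return n;
}
```

The lock is taken once regardless of list length, and the per-entry work runs
without it held.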

Are you destroying/creating interfaces or doing a lot of multicasting at
the time? How many group memberships do you have?

+-DLS



Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-02 Thread Jarek Poplawski
On Tue, Jan 02, 2007 at 09:23:02AM +0100, Jarek Poplawski wrote:
> On Tue, Jan 02, 2007 at 08:39:09AM +0100, Jarek Poplawski wrote:
> ...
> > The main thing is the possibility of processing
> > skb with not entirely open source dev which isn't
> > expected (and checked) by receive functions.
> > I think the easiest way to convince yourself is
> > to add temporarily IFF_UP flag checking with
> > dropping at the beginning of netif_receive_skb and
> > __vlan_hwaccel_rx.

... and vlan_skb_recv also.

Jarek P.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-02 Thread Jarek Poplawski
On Tue, Jan 02, 2007 at 08:39:09AM +0100, Jarek Poplawski wrote:
...
> It is hard to say what kind of bug to expect
> because at the same time other net_rx_action
> with the same vlan dev could take place on
> other processor and this inetdev_init could
> do more.

Sorry! inetdev_init couldn't do more because
of rtnl lock but anyway the rest should be valid:

> The main thing is the possibility of processing
> skb with not entirely open source dev which isn't
> expected (and checked) by receive functions.
> I think the easiest way to convince yourself is
> to add temporarily IFF_UP flag checking with
> dropping at the beginning of netif_receive_skb and
> __vlan_hwaccel_rx.

Jarek P.


Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-01 Thread Jarek Poplawski
On Mon, Jan 01, 2007 at 09:00:05PM -0800, Ben Greear wrote:
> I finally had time to look through the code in this backtrace in 
> detail.  I think it *could*
> be a race between ip_rcv and inetdev_init, but I am not certain.  Other 
> than that, I'm real
> low on ideas.  I found a few more stack trace debugging options to 
> enable..perhaps that
> will give a better backtrace if we can reproduce it again.
> 
> I do have lock-debugging enabled, so it should have caught this if was 
> an un-initialized access
> problem, however.
> 
> More details below inline.
> 
> Ben Greear wrote:
> >This is from 2.6.18.2 kernel with my patch set.  The MAC-VLANs are in 
> >active use.
> >From the backtrace, I am thinking this might be a generic problem, 
> >however.
> >
> >Any ideas about what this could be?  It seems to be reproducible every 
> >day or
> >two, but no known way to make it happen quickly...
> >
> >Kernel is SMP, PREEMPT.
> >
> >
> >Dec 19 04:49:33 localhost kernel: BUG: soft lockup detected on CPU#0!
> >Dec 19 04:49:33 localhost kernel:  [<78104252>] show_trace+0x12/0x20
> >Dec 19 04:49:33 localhost kernel:  [<78104929>] dump_stack+0x19/0x20
> >Dec 19 04:49:33 localhost kernel:  [<7814c88b>] softlockup_tick+0x9b/0xd0
> >Dec 19 04:49:33 localhost kernel:  [<7812a992>] 
> >run_local_timers+0x12/0x20
> >Dec 19 04:49:33 localhost kernel:  [<7812ac08>] 
> >update_process_times+0x38/0x80
> >Dec 19 04:49:33 localhost kernel:  [<78112796>] 
> >smp_apic_timer_interrupt+0x66/0x70
> >Dec 19 04:49:33 localhost kernel:  [<78103baa>] 
> >apic_timer_interrupt+0x2a/0x30
> >Dec 19 04:49:33 localhost kernel:  [<78354e8c>] _read_lock+0x3c/0x50
> > Dec 19 04:49:33 localhost kernel:  [<78331f42>] ip_check_mc+0x22/0xb0
> This is blocked on:
> igmp.c:read_lock(&in_dev->mc_list_lock);
> 
> >Dec 19 04:49:33 localhost kernel:  [<783068bf>] 
> >ip_route_input+0x17f/0xef0
> route.c:int our = ip_check_mc(in_dev, daddr, saddr, 
> skb->nh.iph->protocol);
> >Dec 19 04:49:33 localhost kernel:  [<78309c59>] ip_rcv+0x349/0x580
> ?? Called by a macro maybe?  Can't find an obvious call to the 

Probably deliver_skb.

> ip_route_input.
> >Dec 19 04:49:33 localhost kernel:  [<782ec98d>] 
> >netif_receive_skb+0x36d/0x3b0
> >Dec 19 04:49:33 localhost kernel:  [<782ee50c>] 
> >process_backlog+0x9c/0x130
> >Dec 19 04:49:33 localhost kernel:  [<782ee795>] net_rx_action+0xc5/0x1f0
> >Dec 19 04:49:33 localhost kernel:  [<78125e58>] __do_softirq+0x88/0x110
> >Dec 19 04:49:33 localhost kernel:  [<78125f59>] do_softirq+0x79/0x80
> >Dec 19 04:49:33 localhost kernel:  [<781260ed>] irq_exit+0x5d/0x60
> >Dec 19 04:49:33 localhost kernel:  [<78105a6d>] do_IRQ+0x4d/0xa0
> >Dec 19 04:49:33 localhost kernel:  [<78103ae9>] 
> >common_interrupt+0x25/0x2c
> >Dec 19 04:49:33 localhost kernel:  [<78354c45>] _spin_lock+0x35/0x50
> >Dec 19 04:49:33 localhost kernel:  [<781aab1d>] proc_register+0x2d/0x110
> >Dec 19 04:49:33 localhost kernel:  [<781ab23d>] 
> >create_proc_entry+0x5d/0xd0
> >Dec 19 04:49:33 localhost kernel:  [<7812873b>] 
> >register_proc_table+0x6b/0x110
> >Dec 19 04:49:33 localhost kernel:  [<78128771>] 
> >register_proc_table+0xa1/0x110
> >Dec 19 04:49:33 localhost last message repeated 3 times
> >Dec 19 04:49:33 localhost kernel:  [<7812886d>] 
> >register_sysctl_table+0x8d/0xc0
> >Dec 19 04:49:33 localhost kernel:  [<7832f0c9>] 
> >devinet_sysctl_register+0x109/0x150
> 
> This devinet_sysctl_register is called right before the ip_mc_init_dev 
> call is made, and
> that call is used to initialize the multicast lock that is blocked on at 
> the top of this backtrace.
> This *could* be the race, but only if the entities in question are the 
> same thing.  I don't see
> any way to determine whether they are or not based on the backtrace.
> 
> I looked through all of the uses of the mc_list_lock, and the places 
> where it does a write_lock
> are few and appear to be correct with no possibility of deadlocking.  If 
> a lock was un-initialized, then
> that could perhaps explain why it is able to deadlock (though, that 
> should have triggered a different
> bug report since I have spin/rw-lock debugging enabled.)
> 

It is hard to say what kind of bug to expect
because at the same time other net_rx_action
with the same vlan dev could take place on
other processor and this inetdev_init could
do more.

The main thing is the possibility of processing
skb with not entirely open source dev which isn't
expected (and checked) by receive functions.
I think the easiest way to convince yourself is
to add temporarily IFF_UP flag checking with
dropping at the beginning of netif_receive_skb and
__vlan_hwaccel_rx.
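
For illustration only, the temporary check suggested above could be sketched
as follows. This is a userspace model, not a patch: the toy_* structures and
function are hypothetical, and only the flag value mirrors IFF_UP (0x1).
Whether such a check is relevant to this particular lockup is not established
here.

```c
#include <stdbool.h>

#define TOY_IFF_UP 0x1	/* same bit value as IFF_UP */

/* Hypothetical stand-ins for struct net_device and struct sk_buff. */
struct toy_dev {
	unsigned int flags;
	unsigned long rx_dropped;
};

struct toy_skb {
	struct toy_dev *dev;
};

/* Returns true if the packet may continue up the stack, false if it
 * was dropped because its device is not marked up. */
static bool toy_receive_skb(struct toy_skb *skb)
{
	if (!(skb->dev->flags & TOY_IFF_UP)) {
		skb->dev->rx_dropped++;	/* count it so the drop is visible */
		return false;
	}
	return true;
}
```

Counting the drops makes it possible to tell afterwards whether any packet
ever reached the receive path for a device that was not up.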

Jarek P.

> >Dec 19 04:49:33 localhost kernel:  [<7832f2ea>] inetdev_init+0xea/0x160
> >Dec 19 04:49:33 localhost kernel:  [<7832fa2e>] 
> >inet_rtm_newaddr+0x16e/0x190
> >Dec 19 04:49:33 localhost kernel:  [<782f58a9>] 
> >rtnetlink_rcv_msg+0x169/0x230
> >Dec 19 04:49:33 localhost kernel:  [<78300ed0>] 
> >netlink_run_queue+0x90/0x140
> >Dec 1

Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2007-01-01 Thread Ben Greear
I finally had time to look through the code in this backtrace in
detail.  I think it *could* be a race between ip_rcv and inetdev_init,
but I am not certain.  Other than that, I'm real low on ideas.  I found
a few more stack trace debugging options to enable; perhaps that will
give a better backtrace if we can reproduce it again.

I do have lock-debugging enabled, so it should have caught this if it
was an un-initialized access problem, however.
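
To illustrate the reasoning: a lock-debugging build can stamp a magic value
into a lock when it is initialized and check that value on every acquire, so
a lock that was never initialized gets reported instead of silently
spinning. The sketch below is a userspace illustration of that idea only.
All names (demo_rwlock, demo_read_trylock, DEMO_RWLOCK_MAGIC) are
hypothetical, the magic constant is an arbitrary marker, and this is not the
kernel's actual debugging code.

```c
#include <stdbool.h>

/* Arbitrary marker written only by demo_rwlock_init() (assumption). */
#define DEMO_RWLOCK_MAGIC 0xdeaf1eedu

enum demo_lock_result {
    DEMO_LOCK_OK,        /* read lock taken */
    DEMO_LOCK_BUSY,      /* a writer holds it; a real reader would spin */
    DEMO_LOCK_BAD_MAGIC  /* lock was never initialized: report a bug */
};

struct demo_rwlock {
    unsigned int magic;  /* valid only after demo_rwlock_init() */
    int readers;         /* number of readers currently holding the lock */
    bool write_locked;   /* true while a writer holds the lock */
};

static void demo_rwlock_init(struct demo_rwlock *lock)
{
    lock->magic = DEMO_RWLOCK_MAGIC;
    lock->readers = 0;
    lock->write_locked = false;
}

/* Try to take the lock for reading, checking the magic value first. */
static enum demo_lock_result demo_read_trylock(struct demo_rwlock *lock)
{
    if (lock->magic != DEMO_RWLOCK_MAGIC)
        return DEMO_LOCK_BAD_MAGIC;  /* caught before any spinning starts */
    if (lock->write_locked)
        return DEMO_LOCK_BUSY;
    lock->readers++;
    return DEMO_LOCK_OK;
}
```

With a check of this kind, a reader hitting a zero-filled, never-initialized
lock would produce a "bad magic" style report rather than a soft lockup,
which is why the backtrace here is surprising.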

More details below inline.

Ben Greear wrote:
> This is from 2.6.18.2 kernel with my patch set.  The MAC-VLANs are in
> active use.
> From the backtrace, I am thinking this might be a generic problem,
> however.
>
> Any ideas about what this could be?  It seems to be reproducible every
> day or two, but no known way to make it happen quickly...
>
> Kernel is SMP, PREEMPT.


> Dec 19 04:49:33 localhost kernel: BUG: soft lockup detected on CPU#0!
> Dec 19 04:49:33 localhost kernel:  [<78104252>] show_trace+0x12/0x20
> Dec 19 04:49:33 localhost kernel:  [<78104929>] dump_stack+0x19/0x20
> Dec 19 04:49:33 localhost kernel:  [<7814c88b>] softlockup_tick+0x9b/0xd0
> Dec 19 04:49:33 localhost kernel:  [<7812a992>] run_local_timers+0x12/0x20
> Dec 19 04:49:33 localhost kernel:  [<7812ac08>] update_process_times+0x38/0x80
> Dec 19 04:49:33 localhost kernel:  [<78112796>] smp_apic_timer_interrupt+0x66/0x70
> Dec 19 04:49:33 localhost kernel:  [<78103baa>] apic_timer_interrupt+0x2a/0x30
> Dec 19 04:49:33 localhost kernel:  [<78354e8c>] _read_lock+0x3c/0x50

> Dec 19 04:49:33 localhost kernel:  [<78331f42>] ip_check_mc+0x22/0xb0

This is blocked on:
igmp.c:read_lock(&in_dev->mc_list_lock);

> Dec 19 04:49:33 localhost kernel:  [<783068bf>] ip_route_input+0x17f/0xef0

route.c:int our = ip_check_mc(in_dev, daddr, saddr, skb->nh.iph->protocol);

> Dec 19 04:49:33 localhost kernel:  [<78309c59>] ip_rcv+0x349/0x580

?? Called by a macro maybe?  Can't find an obvious call to
ip_route_input.

> Dec 19 04:49:33 localhost kernel:  [<782ec98d>] netif_receive_skb+0x36d/0x3b0
> Dec 19 04:49:33 localhost kernel:  [<782ee50c>] process_backlog+0x9c/0x130
> Dec 19 04:49:33 localhost kernel:  [<782ee795>] net_rx_action+0xc5/0x1f0
> Dec 19 04:49:33 localhost kernel:  [<78125e58>] __do_softirq+0x88/0x110
> Dec 19 04:49:33 localhost kernel:  [<78125f59>] do_softirq+0x79/0x80
> Dec 19 04:49:33 localhost kernel:  [<781260ed>] irq_exit+0x5d/0x60
> Dec 19 04:49:33 localhost kernel:  [<78105a6d>] do_IRQ+0x4d/0xa0
> Dec 19 04:49:33 localhost kernel:  [<78103ae9>] common_interrupt+0x25/0x2c
> Dec 19 04:49:33 localhost kernel:  [<78354c45>] _spin_lock+0x35/0x50
> Dec 19 04:49:33 localhost kernel:  [<781aab1d>] proc_register+0x2d/0x110
> Dec 19 04:49:33 localhost kernel:  [<781ab23d>] create_proc_entry+0x5d/0xd0
> Dec 19 04:49:33 localhost kernel:  [<7812873b>] register_proc_table+0x6b/0x110
> Dec 19 04:49:33 localhost kernel:  [<78128771>] register_proc_table+0xa1/0x110
> Dec 19 04:49:33 localhost last message repeated 3 times
> Dec 19 04:49:33 localhost kernel:  [<7812886d>] register_sysctl_table+0x8d/0xc0
> Dec 19 04:49:33 localhost kernel:  [<7832f0c9>] devinet_sysctl_register+0x109/0x150


This devinet_sysctl_register is called right before the ip_mc_init_dev
call is made, and that call is used to initialize the multicast lock that
is blocked on at the top of this backtrace.  This *could* be the race, but
only if the entities in question are the same thing.  I don't see any way
to determine whether they are or not based on the backtrace.

I looked through all of the uses of the mc_list_lock, and the places
where it does a write_lock are few and appear to be correct with no
possibility of deadlocking.  If a lock was un-initialized, then that
could perhaps explain why it is able to deadlock (though, that should
have triggered a different bug report since I have spin/rw-lock
debugging enabled.)
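
The suspected ordering problem can be sketched in userspace as follows. This
is only a single-threaded model under stated assumptions: it assumes the
in_dev pointer is already visible to the receive path while the sysctl
registration runs, which is what the race would require and which the
backtrace alone does not prove. The demo_* types and both init functions are
hypothetical; the hook call stands in for a packet arriving in the window
where devinet_sysctl_register is running, and the mc_lock_ready flag stands
in for the lock initialization done by ip_mc_init_dev.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for in_device and net_device. */
struct demo_in_dev {
    bool mc_lock_ready;           /* set once the multicast lock is initialized */
};

struct demo_dev {
    struct demo_in_dev *ip_ptr;   /* what the receive path dereferences */
};

/* What a receive that runs "in the window" observed. */
struct rx_result {
    bool saw_in_dev;
    bool lock_was_ready;
};

typedef void (*rx_hook)(struct demo_dev *dev, void *ctx);

/* Model of the receive path: looks up in_dev and would then take its lock. */
static void demo_rx(struct demo_dev *dev, void *ctx)
{
    struct rx_result *res = ctx;
    struct demo_in_dev *in_dev = dev->ip_ptr;

    res->saw_in_dev = (in_dev != NULL);
    res->lock_was_ready = (in_dev != NULL && in_dev->mc_lock_ready);
}

/* Suspected ordering: publish, do slow registration work, then init the lock. */
static void init_publish_first(struct demo_dev *dev, struct demo_in_dev *in_dev,
                               rx_hook hook, void *ctx)
{
    in_dev->mc_lock_ready = false;
    dev->ip_ptr = in_dev;          /* in_dev becomes visible to receivers */
    hook(dev, ctx);                /* a packet arrives during registration */
    in_dev->mc_lock_ready = true;  /* lock initialized too late */
}

/* Safer ordering: finish all initialization, publish as the last step. */
static void init_publish_last(struct demo_dev *dev, struct demo_in_dev *in_dev,
                              rx_hook hook, void *ctx)
{
    in_dev->mc_lock_ready = true;  /* lock initialized first */
    hook(dev, ctx);                /* a packet arriving now sees no in_dev */
    dev->ip_ptr = in_dev;          /* publish the fully set-up in_dev */
}
```

In the first ordering the modelled receiver sees an in_dev whose lock is not
ready; in the second it sees no in_dev at all during the window, so it never
touches the lock.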


> Dec 19 04:49:33 localhost kernel:  [<7832f2ea>] inetdev_init+0xea/0x160
> Dec 19 04:49:33 localhost kernel:  [<7832fa2e>] inet_rtm_newaddr+0x16e/0x190
> Dec 19 04:49:33 localhost kernel:  [<782f58a9>] rtnetlink_rcv_msg+0x169/0x230
> Dec 19 04:49:33 localhost kernel:  [<78300ed0>] netlink_run_queue+0x90/0x140
> Dec 19 04:49:33 localhost kernel:  [<782f56dc>] rtnetlink_rcv+0x2c/0x50
> Dec 19 04:49:33 localhost kernel:  [<783014a5>] netlink_data_ready+0x15/0x60
> Dec 19 04:49:33 localhost kernel:  [<78300167>] netlink_sendskb+0x27/0x50
> Dec 19 04:49:33 localhost kernel:  [<78300bab>] netlink_unicast+0x15b/0x1f0
> Dec 19 04:49:33 localhost kernel:  [<783013ab>] netlink_sendmsg+0x20b/0x2f0
> Dec 19 04:49:33 localhost kernel:  [<782e12bc>] sock_sendmsg+0xfc/0x120
> Dec 19 04:49:33 localhost kernel:  [<782e1a5a>] sys_sendmsg+0x10a/0x220
> Dec 19 04:49:33 localhost kernel:  [<782e3311>] sys_socketcall+0x261/0x290
> Dec 19 04:49:33 localhost kernel:  [<7810307d>] sysenter_past_esp+0x56/0x8d
> Dec 19 04:52:17 localhost sshd[32311]: gethostby*.getanswer: asked for "203.60.60.10.in-addr.arpa IN PTR", got type "A"





--
Ben Greear <[EMAIL PROTECTED]> 
C

BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

2006-12-19 Thread Ben Greear

This is from 2.6.18.2 kernel with my patch set.  The MAC-VLANs are in active
use.  From the backtrace, I am thinking this might be a generic problem,
however.

Any ideas about what this could be?  It seems to be reproducible every day or
two, but no known way to make it happen quickly...

Kernel is SMP, PREEMPT.


Dec 19 04:49:33 localhost kernel: BUG: soft lockup detected on CPU#0!
Dec 19 04:49:33 localhost kernel:  [<78104252>] show_trace+0x12/0x20
Dec 19 04:49:33 localhost kernel:  [<78104929>] dump_stack+0x19/0x20
Dec 19 04:49:33 localhost kernel:  [<7814c88b>] softlockup_tick+0x9b/0xd0
Dec 19 04:49:33 localhost kernel:  [<7812a992>] run_local_timers+0x12/0x20
Dec 19 04:49:33 localhost kernel:  [<7812ac08>] update_process_times+0x38/0x80
Dec 19 04:49:33 localhost kernel:  [<78112796>] smp_apic_timer_interrupt+0x66/0x70
Dec 19 04:49:33 localhost kernel:  [<78103baa>] apic_timer_interrupt+0x2a/0x30
Dec 19 04:49:33 localhost kernel:  [<78354e8c>] _read_lock+0x3c/0x50
Dec 19 04:49:33 localhost kernel:  [<78331f42>] ip_check_mc+0x22/0xb0
Dec 19 04:49:33 localhost kernel:  [<783068bf>] ip_route_input+0x17f/0xef0
Dec 19 04:49:33 localhost kernel:  [<78309c59>] ip_rcv+0x349/0x580
Dec 19 04:49:33 localhost kernel:  [<782ec98d>] netif_receive_skb+0x36d/0x3b0
Dec 19 04:49:33 localhost kernel:  [<782ee50c>] process_backlog+0x9c/0x130
Dec 19 04:49:33 localhost kernel:  [<782ee795>] net_rx_action+0xc5/0x1f0
Dec 19 04:49:33 localhost kernel:  [<78125e58>] __do_softirq+0x88/0x110
Dec 19 04:49:33 localhost kernel:  [<78125f59>] do_softirq+0x79/0x80
Dec 19 04:49:33 localhost kernel:  [<781260ed>] irq_exit+0x5d/0x60
Dec 19 04:49:33 localhost kernel:  [<78105a6d>] do_IRQ+0x4d/0xa0
Dec 19 04:49:33 localhost kernel:  [<78103ae9>] common_interrupt+0x25/0x2c
Dec 19 04:49:33 localhost kernel:  [<78354c45>] _spin_lock+0x35/0x50
Dec 19 04:49:33 localhost kernel:  [<781aab1d>] proc_register+0x2d/0x110
Dec 19 04:49:33 localhost kernel:  [<781ab23d>] create_proc_entry+0x5d/0xd0
Dec 19 04:49:33 localhost kernel:  [<7812873b>] register_proc_table+0x6b/0x110
Dec 19 04:49:33 localhost kernel:  [<78128771>] register_proc_table+0xa1/0x110
Dec 19 04:49:33 localhost last message repeated 3 times
Dec 19 04:49:33 localhost kernel:  [<7812886d>] register_sysctl_table+0x8d/0xc0
Dec 19 04:49:33 localhost kernel:  [<7832f0c9>] devinet_sysctl_register+0x109/0x150
Dec 19 04:49:33 localhost kernel:  [<7832f2ea>] inetdev_init+0xea/0x160
Dec 19 04:49:33 localhost kernel:  [<7832fa2e>] inet_rtm_newaddr+0x16e/0x190
Dec 19 04:49:33 localhost kernel:  [<782f58a9>] rtnetlink_rcv_msg+0x169/0x230
Dec 19 04:49:33 localhost kernel:  [<78300ed0>] netlink_run_queue+0x90/0x140
Dec 19 04:49:33 localhost kernel:  [<782f56dc>] rtnetlink_rcv+0x2c/0x50
Dec 19 04:49:33 localhost kernel:  [<783014a5>] netlink_data_ready+0x15/0x60
Dec 19 04:49:33 localhost kernel:  [<78300167>] netlink_sendskb+0x27/0x50
Dec 19 04:49:33 localhost kernel:  [<78300bab>] netlink_unicast+0x15b/0x1f0
Dec 19 04:49:33 localhost kernel:  [<783013ab>] netlink_sendmsg+0x20b/0x2f0
Dec 19 04:49:33 localhost kernel:  [<782e12bc>] sock_sendmsg+0xfc/0x120
Dec 19 04:49:33 localhost kernel:  [<782e1a5a>] sys_sendmsg+0x10a/0x220
Dec 19 04:49:33 localhost kernel:  [<782e3311>] sys_socketcall+0x261/0x290
Dec 19 04:49:33 localhost kernel:  [<7810307d>] sysenter_past_esp+0x56/0x8d
Dec 19 04:52:17 localhost sshd[32311]: gethostby*.getanswer: asked for "203.60.60.10.in-addr.arpa IN PTR", got type "A"

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com
