ksoftirqd using 100% CPU

2012-02-09 Thread Ron Foster at Baldor-IS

Hello listers,

Last night I upgraded one of the Linux guests used by our developers to
kernel level 2.6.16.60-0.95.1-default (the latest SLES10 SP4 kernel).  By
about 11:30 AM this morning, I was getting complaints from our
developers that the system was hanging.  Looking around, I saw that
ksoftirqd was using close to 100% CPU.

Anyone got any ideas on what to check?  I have looked in the archives,
but they are a little old.

The only messages I can find have to do with a hipersockets time out.
Feb  9 11:21:39 bus0104 kernel: NETDEV WATCHDOG: hsi0: transmit timed out
Feb  9 11:21:39 bus0104 kernel: qeth: Recovery of device 0.0.5100 started ...
Feb  9 11:21:39 bus0104 kernel: qeth: Device 0.0.5100/0.0.5101/0.0.5102 is a HiperSockets card (level: HSEC)
Feb  9 11:21:39 bus0104 kernel: with link type HiperSockets.
Feb  9 11:21:39 bus0104 kernel: qeth: Hardware IP fragmentation not supported on hsi0
Feb  9 11:21:39 bus0104 kernel: qeth: VLAN enabled
Feb  9 11:21:39 bus0104 kernel: qeth: Multicast enabled
Feb  9 11:21:39 bus0104 kernel: qeth: IPV6 enabled
Feb  9 11:21:39 bus0104 kernel: qeth: Broadcast enabled
Feb  9 11:21:39 bus0104 kernel: qeth: Using SW checksumming on hsi0.
Feb  9 11:21:39 bus0104 kernel: qeth: Outbound TSO not supported on hsi0
Feb  9 11:21:39 bus0104 kernel: qeth: Device 0.0.5100 successfully recovered!

Thanks,
Ron

--
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
--
For more information on Linux on System z, visit
http://wiki.linuxvm.org/


Re: ksoftirqd using 100% CPU

2012-02-10 Thread Joerg Reuter
On Thu, Feb 09, 2012 at 04:43:25PM -0600, Ron Foster at Baldor-IS wrote:

> The only messages I can find have to do with a hipersockets time out.
> Feb  9 11:21:39 bus0104 kernel: NETDEV WATCHDOG: hsi0: transmit timed out
> Feb  9 11:21:39 bus0104 kernel: qeth: Recovery of device 0.0.5100 started ...

Hmm, that shouldn't happen. If you have a support agreement,
can you open a service request with Novell / SUSE, please? We'll
need the files generated by running "supportconfig" and "dbginfo.sh",
and if you already have opened a case with IBM, the PMR would be handy.

Regards,
Joerg
--
Joerg Reuter    http://yaina.de/jreuter
And I make my way to where the warm scent of soil fills the evening air.
Everything is waiting quietly out there (Anne Clark)



Re: ksoftirqd using 100% CPU

2012-02-10 Thread Shane G
On Fri, Feb 10th, 2012 at 11:00 PM, Joerg Reuter wrote:

> On Thu, Feb 09, 2012 at 04:43:25PM -0600, Ron Foster at Baldor-IS wrote:
> 
> > The only messages I can find have to do with a hipersockets time out.
> > Feb  9 11:21:39 bus0104 kernel: NETDEV WATCHDOG: hsi0: transmit timed out
> > Feb  9 11:21:39 bus0104 kernel: qeth: Recovery of device 0.0.5100 started ...
> 
> Hmm, that shouldn't happen.

Lol ...
I was wondering if the OP could see anything from z/VM, but if it really is a
softirq/tasklet problem that probably means (Linux) dodgy driver.
And on almost all occasions that probably means network ...

Keeping an eye on /proc/interrupts might be instructive.
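
One way to do that is to take two snapshots of /proc/interrupts and look
at which counters grew in between. The following is only a rough sketch,
not something from the original posts: it assumes a Python 3 interpreter
is available (a stock SLES10 guest may not have one, in which case
"watch -d cat /proc/interrupts" gives a similar view), and the names
parse_interrupts, interrupt_deltas and watch are made up for
illustration. The exact row labels differ between architectures and
kernel levels, so the parser only assumes "label: counters description".

```python
import time


def parse_interrupts(text):
    """Return {label: count summed over all CPUs} from /proc/interrupts text."""
    totals = {}
    for line in text.splitlines()[1:]:  # the first line is the CPU header
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: counters" row
        count = 0
        for token in rest.split():
            if not token.isdigit():
                break  # the per-CPU counters end where the description starts
            count += int(token)
        totals[label.strip()] = count
    return totals


def interrupt_deltas(before, after):
    """Return (label, growth) pairs for counters that grew, largest first."""
    grew = []
    for label, value in after.items():
        delta = value - before.get(label, 0)
        if delta > 0:
            grew.append((label, delta))
    return sorted(grew, key=lambda item: item[1], reverse=True)


def watch(path="/proc/interrupts", interval=5.0, rounds=3):
    """Print the counters that grew during each sampling interval."""
    with open(path) as f:
        previous = parse_interrupts(f.read())
    for _ in range(rounds):
        time.sleep(interval)
        with open(path) as f:
            current = parse_interrupts(f.read())
        for label, delta in interrupt_deltas(previous, current):
            print(f"{label:>8} +{delta}")
        print("-" * 20)
        previous = current
```

Calling watch() on the affected guest while ksoftirqd is busy would show
which interrupt source is climbing fastest. That points at a suspect
driver but does not by itself prove the cause.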

Shane ...



Re: ksoftirqd using 100% CPU

2012-02-10 Thread Ron Foster at Baldor-IS
Shane,
As best we can tell, no relevant messages came out on the z/VM
operator console.
Ron



Re: ksoftirqd using 100% CPU

2012-02-10 Thread Ron Foster at Baldor-IS
Joerg,

We have a support agreement with Novell, so I will be opening up a service 
request.

The developers were getting hostile, so I had to revert the development system 
that was consistently having the problem to a previous level.  But I have found 
another system that had the problem once yesterday that we can use to gather 
the documentation.

Ron

I did not know that we could open a PMR with IBM directly.





Re: ksoftirqd using 100% CPU

2012-02-10 Thread Alan Altmark
On Friday, 02/10/2012 at 10:21 EST, Ron Foster at Baldor-IS wrote:

> I did not know that we could open a PMR with IBM directly.

Only if your Linux support contract is with IBM.

Alan Altmark

Senior Managing z/VM and Linux Consultant
IBM System Lab Services and Training
ibm.com/systems/services/labservices
office: 607.429.3323
mobile: 607.321.7556
alan_altm...@us.ibm.com
IBM Endicott



Re: ksoftirqd using 100% CPU

2012-02-10 Thread Mark Post
>>> On 2/10/2012 at 11:33 AM, Alan Altmark wrote:
> On Friday, 02/10/2012 at 10:21 EST, Ron Foster at Baldor-IS wrote:
> 
>> I did not know that we could open a PMR with IBM directly.
> 
> Only if your Linux support contract is with IBM.

But, if you've already opened a PMR with IBM for z/VM or for hardware (for 
whatever reason related to the incident), we'd like to know about it anyway.  I 
know the same is true in the reverse, if a customer comes to us first, and then 
later needs to work with IBM on the problem because it involves IBM software or 
hardware.


Mark Post



Re: ksoftirqd using 100% CPU

2012-02-20 Thread Ron Foster at Baldor-IS
FYI,

This turned out to be a known problem with the .95 kernel
(2.6.16.60-0.95.1).  IBM has released a PTF and we are testing it.

Ron

