Re: 2.4.x very unstable on 8-way IBM 8500R

2001-03-02 Thread Matilainen Panu

On Fri, 2 Mar 2001, ext Alan Cox wrote:

> > > (from Red Hat 7) but very erratic on all 2.4-kernels I've tried it with
> > > (2.4.[012], compiled both with egcs and RH7's gcc-2.96, both share the
>
>
> > Under redhat 7 you should use kgcc to compile the kernel, since gcc2.96 is
>
> So he was using egcs, and whether he had the pre-errata gcc 2.96
> wouldn't matter

Since this (once again) came up... I've been running 2.4.[012] on my home
box compiled with 2.96-errata without a single problem so far.

And yes I know it's not supported, consider this just a datapoint :)

- Panu -


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 2.4.x very unstable on 8-way IBM 8500R

2001-03-01 Thread Mike A. Harris

On Thu, 1 Mar 2001, Dr. Kelsey Hudson wrote:

>> I've been playing around with 8-way IBM8500R (8x700MHz Xeon) with 4.5GB
>> memory & AIC7xxx SCSI-controller. It's perfectly stable with 2.2-kernel
>> (from Red Hat 7) but very erratic on all 2.4-kernels I've tried it with
>> (2.4.[012], compiled both with egcs and RH7's gcc-2.96, both share the
>
>Under redhat 7 you should use kgcc to compile the kernel, since gcc2.96 is
>inherently broken(*).

http://www.bero.org/gcc296.html

>> same symptoms). It did have a ServeRAID controller too but IBM suggested
>> we take it out since 4500R also had problems with it on 2.4 but it didn't
>> make any difference at all. Also tried to turn off highmem support but
>> didn't make difference either.
>
>(*)  Red Hat chose to ship an experimental compiler with this release of
> the distribution that has a great many bugs. To ensure proper kernel
> compilation, another proven version of gcc was included, but called
> kgcc instead. You should always use this to compile your kernels
> under Red Hat 7 until the newer version of gcc is released.

http://www.bero.org/gcc296.html



--
Mike A. Harris  -  Linux advocate  -  Free Software advocate
  This message is copyright 2001, all rights reserved.
  Views expressed are my own, not necessarily shared by my employer.
--
Red Hat Linux:  http://www.redhat.com
Download for free:  ftp://ftp.redhat.com/pub/redhat/redhat-6.2/




Re: 2.4.x very unstable on 8-way IBM 8500R

2001-03-01 Thread Tim Wright

On Thu, Mar 01, 2001 at 05:04:09PM -0800, Dr. Kelsey Hudson wrote:
> On Thu, 1 Mar 2001, Matilainen Panu (NRC/Helsinki) wrote:
> 
> > I've been playing around with 8-way IBM8500R (8x700MHz Xeon) with 4.5GB
> > memory & AIC7xxx SCSI-controller. It's perfectly stable with 2.2-kernel
> > (from Red Hat 7) but very erratic on all 2.4-kernels I've tried it with
> > (2.4.[012], compiled both with egcs and RH7's gcc-2.96, both share the
> 
> Under redhat 7 you should use kgcc to compile the kernel, since gcc2.96 is
> inherently broken(*). 
> 

For the umpteenth time, no it isn't. There are serious bugs in the shipped
version of gcc in RedHat 7.0, but they are fixed by applying the update.
The reason for supplying kgcc is to allow building a 2.2 kernel, because of
bugs in the kernel, NOT the compiler.

> > same symptoms). It did have a ServeRAID controller too but IBM suggested
> > we take it out since 4500R also had problems with it on 2.4 but it didn't
> > make any difference at all. Also tried to turn off highmem support but
> > didn't make difference either.
> 
> (*)  Red Hat chose to ship an experimental compiler with this release of
>  the distribution that has a great many bugs. To ensure proper kernel
>  compilation, another proven version of gcc was included, but called
>  kgcc instead. You should always use this to compile your kernels
>  under Red Hat 7 until the newer version of gcc is released.
> 

No. Provided you grab the update, you can build the 2.4 kernel perfectly
happily using the RedHat gcc snapshot. I'm running it successfully on a number
of machines. The issue with 2.4 on certain Netfinities is a bad interaction
between the NMI watchdog code and the systems management card. Changing
compilers makes no difference.

Tim

-- 
Tim Wright - [EMAIL PROTECTED] or [EMAIL PROTECTED] or [EMAIL PROTECTED]
IBM Linux Technology Center, Beaverton, Oregon
Interested in Linux scalability ? Look at http://lse.sourceforge.net/
"Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI



Re: 2.4.x very unstable on 8-way IBM 8500R

2001-03-01 Thread Tim Wright

Just FYI,
I am chasing this problem. There appears to be an unpleasant interaction between
the Advanced Systems Management card and the NMI watchdog code. Ripping the card
out of the machine also eradicates the problem, but is less desirable. 
I'll let people know when there's a better solution.

Tim

On Thu, Mar 01, 2001 at 03:30:56PM +0200, Matilainen Panu (NRC/Helsinki) wrote:
> On Thu, 1 Mar 2001, ext Andrew Morton wrote:
> > "Matilainen Panu (NRC/Helsinki)" wrote:
> > > On Thu, 1 Mar 2001, ext Andrew Morton wrote:
> > > >
> > > > Is it stable with `nmi_watchdog=0'?
> > >
> > > If the default value for nmi_watchdog is 0 then no - I added the
> > > nmi_watchdog=1 just to see if that makes any difference. If it's on by
> > > default then I'll need to test it that way.
> >
> > Default for nmi_watchdog is `enabled'.
> >
> > Several people have reported that turning it off with
> > the `nmi_watchdog=0' LILO option makes systems stable.
> > Nobody knows why.
> >
> > (If nmi_watchdog _does_ make the machine stable, please
> >  tell linux-kernel.).
> 
> It's too early to say for sure but that seems to have fixed it. Uptime now
> nearly an hour under loads of 20-30 which is way more than it has been
> able to stay up before. I'll let you know whether it's still up tomorrow.
> 
> Million thanks for the tip!
> 
>   - Panu -
> 




Re: 2.4.x very unstable on 8-way IBM 8500R

2001-03-01 Thread J Sloan

"Dr. Kelsey Hudson" wrote:

> Under redhat 7 you should use kgcc to compile the kernel, since gcc2.96 is
> inherently broken(*).

Or upgrade to the current Red Hat 7 gcc, which works quite well.

jjs




Re: 2.4.x very unstable on 8-way IBM 8500R

2001-03-01 Thread Alan Cox

> > (from Red Hat 7) but very erratic on all 2.4-kernels I've tried it with
> > (2.4.[012], compiled both with egcs and RH7's gcc-2.96, both share the
   

> Under redhat 7 you should use kgcc to compile the kernel, since gcc2.96 is

So he was using egcs, and whether he had the pre-errata gcc 2.96 wouldn't matter




Re: 2.4.x very unstable on 8-way IBM 8500R

2001-03-01 Thread Dr. Kelsey Hudson

On Thu, 1 Mar 2001, Matilainen Panu (NRC/Helsinki) wrote:

> I've been playing around with 8-way IBM8500R (8x700MHz Xeon) with 4.5GB
> memory & AIC7xxx SCSI-controller. It's perfectly stable with 2.2-kernel
> (from Red Hat 7) but very erratic on all 2.4-kernels I've tried it with
> (2.4.[012], compiled both with egcs and RH7's gcc-2.96, both share the

Under redhat 7 you should use kgcc to compile the kernel, since gcc2.96 is
inherently broken(*). 

> same symptoms). It did have a ServeRAID controller too but IBM suggested
> we take it out since 4500R also had problems with it on 2.4 but it didn't
> make any difference at all. Also tried to turn off highmem support but
> didn't make difference either.

(*)  Red Hat chose to ship an experimental compiler with this release of
 the distribution that has a great many bugs. To ensure proper kernel
 compilation, another proven version of gcc was included, but called
 kgcc instead. You should always use this to compile your kernels
 under Red Hat 7 until the newer version of gcc is released.

talk to you later,

 Kelsey Hudson   [EMAIL PROTECTED] 
 Software Engineer
 Compendium Technologies, Inc   (619) 725-0771
--- 

 




Re: 2.4.x very unstable on 8-way IBM 8500R

2001-03-01 Thread Matilainen Panu (NRC/Helsinki)

On Thu, 1 Mar 2001, ext Andrew Morton wrote:
> "Matilainen Panu (NRC/Helsinki)" wrote:
> > On Thu, 1 Mar 2001, ext Andrew Morton wrote:
> > >
> > > Is it stable with `nmi_watchdog=0'?
> >
> > If the default value for nmi_watchdog is 0 then no - I added the
> > nmi_watchdog=1 just to see if that makes any difference. If it's on by
> > default then I'll need to test it that way.
>
> Default for nmi_watchdog is `enabled'.
>
> Several people have reported that turning it off with
> the `nmi_watchdog=0' LILO option makes systems stable.
> Nobody knows why.
>
> (If nmi_watchdog _does_ make the machine stable, please
>  tell linux-kernel.).

It's too early to say for sure but that seems to have fixed it. Uptime now
nearly an hour under loads of 20-30 which is way more than it has been
able to stay up before. I'll let you know whether it's still up tomorrow.

Million thanks for the tip!

- Panu -
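
A minimal sketch, not something posted in this thread: a Python helper for checking whether a kernel command line (for example the contents of `/proc/cmdline`) carries the `nmi_watchdog=0` option Andrew suggested. The function names `nmi_watchdog_setting`, `watchdog_disabled` and `read_cmdline` are hypothetical. The sketch assumes, as stated above, that the watchdog is enabled when the option is absent, and it also assumes that the last occurrence wins if the option is repeated.

```python
from typing import Optional


def nmi_watchdog_setting(cmdline: str) -> Optional[str]:
    """Return the value passed as nmi_watchdog=... on a kernel command line,
    or None if the option does not appear at all."""
    value = None
    for token in cmdline.split():
        if token.startswith("nmi_watchdog="):
            # Assumption: if the option is repeated, the last occurrence wins.
            value = token.split("=", 1)[1]
    return value


def watchdog_disabled(cmdline: str) -> bool:
    """True only when the command line explicitly sets nmi_watchdog=0.
    Per this thread, an absent option means the watchdog stays enabled."""
    return nmi_watchdog_setting(cmdline) == "0"


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line from the Linux /proc interface."""
    with open(path) as f:
        return f.read()
```

On the affected box, `watchdog_disabled(read_cmdline())` should return True once the option is actually being passed at boot.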




2.4.x very unstable on 8-way IBM 8500R

2001-03-01 Thread Matilainen Panu (NRC/Helsinki)

Hi,

I've been playing around with 8-way IBM8500R (8x700MHz Xeon) with 4.5GB
memory & AIC7xxx SCSI-controller. It's perfectly stable with 2.2-kernel
(from Red Hat 7) but very erratic on all 2.4-kernels I've tried it with
(2.4.[012], compiled with both egcs and RH7's gcc-2.96; both show the
same symptoms). It also had a ServeRAID controller, but IBM suggested we
take it out since the 4500R also had problems with it on 2.4; removing it
didn't make any difference at all. I also tried turning off highmem
support, but that didn't make a difference either.

Symptoms: it sometimes boots and stays up for a while (anywhere from 10
seconds to a maximum of about half an hour), but most of the time it locks
up early in the boot while it's enabling the CPUs:

--
Booting processor 4/3 eip 2000
Setting warm reset code and vector.
1.
2.
3.
Asserting INIT.
Waiting for send to finish...
+Deasserting INIT.
Waiting for send to finish...
+#startup loops: 2.
Sending STARTUP #1.
After apic_write.
Startup point 1.
Waiting for send to finish...
+Sending STARTUP #2.
After apic_write.
Startup point 1.
Waiting for send to finish...
+After Startup.
Before Callout 4.
After Callout 4.
-- 
...this is where it *usually* locks up, but the processor number where it
hangs varies randomly. Also it has locked up in other places too a couple
of times. If it boots and crashes then there's nothing in the logs, it's
just a sudden hard lockup.

If it is booted with "nosmp noapic" it seems perfectly stable, but I'd sure
like to use those other 7 CPUs too :)

Any ideas/suggestions/patches etc would be greatly appreciated...

- Panu -

Here's a boot log of a rare successful boot (hopefully I got the copy-paste
right...)
---
Inspecting /boot/System.map-2.4.2
Loaded 14798 symbols from /boot/System.map-2.4.2.
Symbols match kernel version 2.4.2.
No module symbols loaded.
 
ESR value after enabling vector: 
Calibrating delay loop... 1399.19 BogoMIPS
Stack at about c5cd7fb8
CPU: Before vendor init, caps: 0383fbff  , vendor = 0
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 1024K
Intel machine check reporting enabled on CPU#2.
CPU: After vendor init, caps: 0383fbff   
CPU: After generic, caps: 0383fbff   
CPU: Common caps: 0383fbff   
OK.
CPU2: Intel Pentium III (Cascades) stepping 01
CPU has booted.
Booting processor 3/2 eip 2000
Setting warm reset code and vector.
1.
2.
3.
Asserting INIT.
Waiting for send to finish...
+Deasserting INIT.
Waiting for send to finish...
+#startup loops: 2.
Sending STARTUP #1.
After apic_write.
Startup point 1.
Waiting for send to finish...
+Sending STARTUP #2.
After apic_write.
Startup point 1.
Waiting for send to finish...
+After Startup.
Before Callout 3.
After Callout 3.
Initializing CPU#3
CPU#3 (phys ID: 2) waiting for CALLOUT
CALLIN, before setup_local_APIC().
masked ExtINT on CPU#3
ESR value before enabling vector: 
ESR value after enabling vector: 
Calibrating delay loop... 1399.19 BogoMIPS
Stack at about c5cd5fb8
CPU: Before vendor init, caps: 0383fbff  , vendor = 0
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 1024K
Intel machine check reporting enabled on CPU#3.
CPU: After vendor init, caps: 0383fbff   
CPU: After generic, caps: 0383fbff   
CPU: Common caps: 0383fbff   
OK.
CPU3: Intel Pentium III (Cascades) stepping 01
CPU has booted.
Booting processor 4/3 eip 2000
Setting warm reset code and vector.
1.
2.
3.
Asserting INIT.
Waiting for send to finish...
+Deasserting INIT.
Waiting for send to finish...
+#startup loops: 2.
Sending STARTUP #1.
After apic_write.
Startup point 1.
Waiting for send to finish...
+Sending STARTUP #2.
After apic_write.
Startup point 1.
Waiting for send to finish...
+After Startup.
Before Callout 4.
After Callout 4.
Initializing CPU#4
CPU#4 (phys ID: 3) waiting for CALLOUT
CALLIN, before setup_local_APIC().
masked ExtINT on CPU#4
ESR value before enabling vector: 
ESR value after enabling vector: 
Calibrating delay loop... 1399.19 BogoMIPS
Stack at about c5cd3fb8
CPU: Before vendor init, caps: 0383fbff  , vendor = 0
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 1024K
Intel machine check reporting enabled on CPU#4.
CPU: After vendor init, caps: 0383fbff   
CPU: After generic, caps: 0383fbff   
CPU: Common caps: 0383fbff   
OK.
CPU4: Intel Pentium III (Cascades) stepping 01
CPU has booted.
Booting processor 5/4 eip 2000
Setting warm reset code and vector.
1.
2.
3.
Asserting INIT.
Waiting for send to finish...
+Deasserting INIT.
Waiting for send to finish...
+#startup loops: 2.
Sending STARTUP #1.
After apic_write.
Startup point 1.
Waiting for send to finish...
+Sending STARTUP #2.
After apic_write.
Startup point 1.
Waiting for send to finish...
+After Startup.
Before Callout 5.
Af