>is it correct that this failing kernel didn't have the RAID patch applied?

No, that kernel had the RAID patch applied too (all pre15 runs did). But this specific 
test result is not of much use anyway, since there was a possibly problematic reject 
with the Unified IDE patch (but still: that kernel survived longest).

>(B) is quite unlikely if you do not have it applied and the box still
>crashes? My understanding is that others who had IDE+SMP problems could
>reproduce it without RAID as well. (RAID0 stresses the hardware harder)

The only factors worth considering besides 2.2.13pre15 itself seem to be the 
RAID patch and the hardware.

But the RAID software runs on SCSI without known problems, it hasn't had a bad SMP 
track record lately (if ever), unlike the IDE driver, and its design isn't affected by 
overly complicated and quirky hardware, as the IDE driver's is.
I agree with you that the RAID patch is a really unlikely candidate for the lockup.

Unfortunately I could only do a few tests with pre15, and only one failure looked like 
an IDE lockup. That is not a good basis for any claims about pre15 IDE instability. It 
could still be a hardware issue or, well, cosmic rays. Or a CPU heat problem, as Mike 
Black suggested. Or anything else in 2.2.13pre15 (mm, ext2, whatever).

Probably only systematic testing could resolve that. Unfortunately I could not do 
this, since I was running out of time. The IDE RAID array is now in a single-processor 
machine. It ran like clockwork for 24 hours under heavy stress, on all five drives, on 
five controllers, with udma33 enabled, on 2.2.13pre15 (kernel as outlined in the 
previous posting). The SMP machine also works flawlessly under stress for 24h with 
one IDE drive (same kernel).

Judging from the tests, it seems to me that 2.2.12 to 2.2.13pre14 had easily 
reproducible IDE+SMP stability problems, and 2.2.13pre15 possibly still has sporadic 
ones, but generally only when several IDE controllers are used at the same time.

==============

On Wed, 6 Oct 1999 [EMAIL PROTECTED] wrote:

> One more pre15 test:
> 2.2.13pre15 with Unified IDE 2.2.13pre14-19991003 (two rejects in ide.c, one ok, one 
>probably harmless):
> (5) dual P3 machine: NULL deref after 6 hours (i.e. this pre15 kernel survived 
>longest)

is it correct that this failing kernel didn't have the RAID patch applied?

> I can think of these possible reasons for the SMP problems:
> 
> (A) SMP race(s) in IDE driver in original 2.2.13pre15
> (B) SMP-deadlock in raid-2.2.11-patch

(B) is quite unlikely if you do not have it applied and the box still
crashes? My understanding is that others who had IDE+SMP problems could
reproduce it without RAID as well. (RAID0 stresses the hardware harder)

-- mingo


---Header---
Received: from chiara.csoma.elte.hu (chiara.csoma.elte.hu [157.181.71.18])
        by rivalnet.de (8.9.3/8.9.3) with ESMTP id RAA03555
        for <[EMAIL PROTECTED]>; Wed, 6 Oct 1999 17:31:34 +0200
Received: (from mingo@localhost)
        by chiara.csoma.elte.hu (8.8.8/8.8.8/c) id RAA03472;
        Wed, 6 Oct 1999 17:31:14 +0200
Date: Wed, 6 Oct 1999 17:37:20 +0200 (CEST)
From: Ingo Molnar <[EMAIL PROTECTED]>
Sender: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
        [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED],
        [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: 2.2.13pre15 SMP+IDE test summary
In-Reply-To: <[EMAIL PROTECTED]>
Message-ID: <[EMAIL PROTECTED]>


