Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2002-01-02 Thread Kristian K. Nielsen

Hey Søren,

Do you have any idea what to do with the problems I am experiencing with the
Intel-series chipset?

Regards
Kristian

- Original Message -
From: "Søren Schmidt" <[EMAIL PROTECTED]>
To: "Nils Holland" <[EMAIL PROTECTED]>
Cc: "Matthew Dillon" <[EMAIL PROTECTED]>; "Mike Silbersack"
<[EMAIL PROTECTED]>; "Brandon S. Allbery KF8NH" <[EMAIL PROTECTED]>; "ian j
hart" <[EMAIL PROTECTED]>; "Matthew Gilbert" <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, December 31, 2001 2:22 PM
Subject: Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers


> It seems Nils Holland wrote:
> > root@poison> pciconf -r -b pci0:0:0 0x0:0xff
>
> Thanks! this was a kernel without the corruption fix, and it shows
> that you need it, the MWQ bug has been fixed in your BIOS...
>
> I have a new improved patch in the works that covers more chipset
> combos, it'll go into -current shortly, and I hope to get
> permission to get it in 4.5, but so far the RE@ doesn't respond...
>
> -Søren
>
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-stable" in the body of the message
>


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message



Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-31 Thread Søren Schmidt

It seems Nils Holland wrote:
> root@poison> pciconf -r -b pci0:0:0 0x0:0xff

Thanks! this was a kernel without the corruption fix, and it shows
that you need it, the MWQ bug has been fixed in your BIOS...

I have a new improved patch in the works that covers more chipset
combos, it'll go into -current shortly, and I hope to get
permission to get it in 4.5, but so far the RE@ doesn't respond...

-Søren




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-28 Thread ian j hart

Matthew Dillon wrote:
> 
> :...
> :> >
> :> > There is also the "only supports 16MxN RAM" feature.
> :>
> :> Maybe I should toss in that I've had spontaneous reboots during heavy
> :> IDE activity both on my desktop (VIA 82C686) and my laptop (Intel
> :> 82443BX).  And before that, random disk corruption during heavy SCSI
> :> activity on my old desktop machine (seen with Tekram and Acer
> :> 83C575-based host adapters and a borrowed Adaptec 2940).
> :
> :Guys guys, we are talking about known HW issues that cause known
> :bad behavior, having a system that is flaky can have all kinds of
> :reasons, I'd risk saying that genuine HW bugs like the 686B bug
> :are among the least likely problems...
> :
> :The most likely reasons are probably bad/subspec'd RAM, lousy PSU,
> :bad/subspec'd cabling, too many "performance features" enabled,
> :generally crappy hardware (there are *tons* of that out there),
> :bad/insufficient cooling, overclocking (even the motherboard makers
> :overclock by default nowadays to gain a little over the competition),
> :
> :And do *not* forget bugs/bogons/mistakes in your favorite OS :)
> :
> :-Søren
> 
> Ok, I have more information on Nils' problem.  First of all, Soren's
> patch greatly reduced the rate of corruption.  It took 25 loops of
> Nils' 'cp' test to generate the corruption.
> 
> However, Soren's patch did not fix the corruption.  The same exact
> corruption is occurring.  In Nils' case it is always the same exact
> location in VM -- a certain bit (or byte) in the middle of the nfsnode
> hash table.  Hardware watch points indicate that the cpu is NOT modifying
> this location, so I really doubt that it is a kernel bug.
> 
> From this and from reading a number of other postings about VIA chipsets
> I believe that Soren's original patch (which I guess is the official
> VIA chipset patch) does not completely solve the VIA chipset's problems.
> I also believe, from reading some of the reference material that has
> been posted, that this corruption is not limited to the 686[A/B] but
> may also occur in earlier VIA chipsets.
> 
> What I would like to do is try forcing the DMA transfer mode down to
> UDMA66 or UDMA33 (66 or 33 MB/s), to see if that solves the problem.  Soren,
> could you supply a patch that universally turns off higher UDMA modes?
> 
> -Matt
> Matthew Dillon
> <[EMAIL PROTECTED]>
> 

I'm out of the loop on this one. I have been unable
to reproduce the error for two weeks. I'll try
backtracking to see if I can break it again.

Summary:
System VIA ATA33 (82C586) with UDMA66 drives.
4 disks RAID10 (vinum).
Softupdates ON, DMA ON.

1 Upgraded BIOS. Side effect - factory default (safe)
settings.
2 This revealed the memory addressing feature, so I
swapped the 8chip X 1side SDRAM for an 8cX2s to
match the other RAM present.
This will wrap - sorry.
http://www.azza.com.tw/ftp/specs/MTB-0109-01-00%20'Using%20more%20then%2064Mb%20memory%20with%20VIA%20MVP3%20chipset%20boards'.htm
3 The new BIOS doesn't like the ISA SCSI card and I
get a panic on boot. BIOS settings which work
put the SB128 on the same IRQ as the network card (rl).
4 Swapped the ISA SCSI card (adv) for a PCI (sym) one.
[Only a CDRW on this]
5 Full build (Dec 11).

I've thrashed the living daylights out of the drives
without so much as a twitch. I enabled the memory
performance features - still okay. If I can force
an IRQ conflict I'll try that next, followed by the
old SCSI card. I'd have to downgrade the BIOS to
try the old memory, and anyway it's in another M/C.

-- 
ian j hart




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-28 Thread Mike Silbersack


On Fri, 28 Dec 2001, Søren Schmidt wrote:

> I know that change as well, but so far I haven't been able to verify
> that it does what it intends to do, VIA's docs are very vague on this.
>
> There is a lot about this on the net, but *lots* of it is just notes
> scribbled together by nerds^H^H^H^H^Hpeople who have no idea what
> they are doing, they just change random bits that others talk about
> and hope it works...
>
> I'll try to get this verified and tested here...
>
> -Søren

True, true.  I trust your judgement, I just wasn't sure if you had seen
those patches yet.

On a related note, would it be possible to modify ata_via686b so that it
looks at what it reads from the register and only does the write and print
if the bits are set incorrectly?
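
As an illustration of that suggestion (not code from the ata driver), a minimal sketch of the read-check-write idea follows. The fake_pci_dev structure, the helper names, and whatever register offset and bit masks the caller passes in are hypothetical stand-ins; a real driver would use the kernel's PCI configuration accessors and the values from VIA's documentation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a device's 256-byte PCI configuration space. */
struct fake_pci_dev {
    uint8_t cfg[256];
    int     writes;     /* number of config writes issued, for inspection */
};

static uint8_t cfg_read(const struct fake_pci_dev *dev, uint8_t reg)
{
    return dev->cfg[reg];
}

static void cfg_write(struct fake_pci_dev *dev, uint8_t reg, uint8_t val)
{
    dev->cfg[reg] = val;
    dev->writes++;
}

/*
 * Make sure every bit in `set` is 1 and every bit in `clear` is 0 in
 * config register `reg`.  The register is only written, and a message
 * only printed, when the current value is actually wrong.
 * Returns 1 if a write was needed, 0 if the register was already correct.
 */
static int fixup_if_needed(struct fake_pci_dev *dev, uint8_t reg,
                           uint8_t set, uint8_t clear)
{
    uint8_t old, want;

    assert((set & clear) == 0);         /* masks must not overlap */

    old = cfg_read(dev, reg);
    want = (uint8_t)((old | set) & (uint8_t)~clear);
    if (want == old)
        return 0;                       /* BIOS already set it up correctly */

    cfg_write(dev, reg, want);
    printf("config reg 0x%02x: 0x%02x -> 0x%02x\n", reg, old, want);
    return 1;
}
```

With this shape, a machine whose BIOS already programs the register correctly boots without the extra write or the console message, which is the behavior asked for above.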

Mike "Silby" Silbersack





Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-28 Thread Matthew Dillon

Note that we aren't complaining about your patch or anything, you
are simply the closest thing we have to a VIA chip expert right now :-)

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-28 Thread Søren Schmidt

It seems Mike Silbersack wrote:
> Agreed, it looks like the "MWQ bug" isn't addressed by Soren's patch.  The
> description at
> http://www.networking.tzo.com/net/software/readme/faqvl019.htm
> doesn't give enough info to patch it, but this post to the linux-kernel
> mailing list seems to shed more light on what needs to be done:
> 
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.0/1421.html

I know that change as well, but so far I haven't been able to verify
that it does what it intends to do, VIA's docs are very vague on this.

There is a lot about this on the net, but *lots* of it is just notes
scribbled together by nerds^H^H^H^H^Hpeople who have no idea what
they are doing, they just change random bits that others talk about
and hope it works...

I'll try to get this verified and tested here...

-Søren




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-28 Thread Mike Silbersack


On Fri, 28 Dec 2001, Matthew Dillon wrote:

> Ok, I have more information on Nils' problem.  First of all, Soren's
> patch greatly reduced the rate of corruption.  It took 25 loops of
> Nils' 'cp' test to generate the corruption.
>
> However, Soren's patch did not fix the corruption.  The same exact
> corruption is occurring.  In Nils' case it is always the same exact
> location in VM -- a certain bit (or byte) in the middle of the nfsnode
> hash table.  Hardware watch points indicate that the cpu is NOT modifying
> this location, so I really doubt that it is a kernel bug.
>
> From this and from reading a number of other postings about VIA chipsets
> I believe that Soren's original patch (which I guess is the official
> VIA chipset patch) does not completely solve the VIA chipset's problems.
> I also believe, from reading some of the reference material that has
> been posted, that this corruption is not limited to the 686[A/B] but
> may also occur in earlier VIA chipsets.

Agreed, it looks like the "MWQ bug" isn't addressed by Soren's patch.  The
description at
http://www.networking.tzo.com/net/software/readme/faqvl019.htm
doesn't give enough info to patch it, but this post to the linux-kernel
mailing list seems to shed more light on what needs to be done:

http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.0/1421.html

Perhaps someone on a faster connection than I can snag a copy of whatever
version of linux is current and see the exact patch that went in.

Mike "Silby" Silbersack





Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-28 Thread Søren Schmidt

It seems Matthew Dillon wrote:
> Ok, I have more information on Nils' problem.  First of all, Soren's
> patch greatly reduced the rate of corruption.  It took 25 loops of
> Nils' 'cp' test to generate the corruption.

Hmm, did the second change I posted change anything ?

> However, Soren's patch did not fix the corruption.  The same exact
> corruption is occurring.  In Nils' case it is always the same exact
> location in VM -- a certain bit (or byte) in the middle of the nfsnode
> hash table.  Hardware watch points indicate that the cpu is NOT modifying
> this location, so I really doubt that it is a kernel bug.

If the BIOS has the option to disable "page mode" access to RAM try
to switch that off, it has shown problems here (as I mentioned in 
another mail)

> From this and from reading a number of other postings about VIA chipsets
> I believe that Soren's original patch (which I guess is the official
> VIA chipset patch) does not completely solve the VIA chipset's problems.

My patch is based on the info from VIA and from various other sources
plus lots of testing here in the lab.

> I also believe, from reading some of the reference material that has
> been posted, that this corruption is not limited to the 686[A/B] but
> may also occur in earlier VIA chipsets.

The 686B data corruption bug is isolated to that chip *only*, the older
686A doesn't have that problem, the even older 686 has problems with
the ATA66 clock generation but luckily only few of those exist.
There are no official problems with the older chips of this sort, but some
have various minor problems/"features", which are unrelated here...

> What I would like to do is try forcing the DMA transfer mode down to
> UDMA66 or UDMA33 (66 or 33 MB/s), to see if that solves the problem.  Soren,
> could you supply a patch that universally turns off higher UDMA modes?

Sure, this one turns off ATA100 support (note it's for -current, but
should be easily applied to -stable) in the same fashion ATA66 can
be turned off etc. However under -current you can just use atacontrol
to set the wanted transfer mode, no patch is needed there.

--- ata-dma.c   25 Dec 2001 14:44:26 -  1.80
+++ ata-dma.c   28 Dec 2001 22:14:01 -
@@ -407,6 +407,7 @@
     if (ata_find_dev(parent, 0x06861106, 0x40) ||
         ata_find_dev(parent, 0x82311106, 0) ||
         ata_find_dev(parent, 0x30741106, 0)) {  /* 82C686b */
+#if 0
         if (udmamode >= 5) {
             error = ata_command(scp, device, ATA_C_SETFEATURES, 0,
                                 ATA_UDMA5, ATA_C_F_SETXFER, ATA_WAIT_READY);
@@ -419,6 +420,7 @@
                 return;
             }
         }
+#endif
         if (udmamode >= 4) {
             error = ata_command(scp, device, ATA_C_SETFEATURES, 0,
                                 ATA_UDMA4, ATA_C_F_SETXFER, ATA_WAIT_READY);

-Søren
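
The effect of the patch above can be pictured as a cap on the UDMA mode the driver is willing to try (UDMA5 is ATA100, UDMA4 is ATA66, UDMA2 is ATA33). The following is only a rough sketch of that idea, not the driver's code: negotiate_udma, fake_drive and fake_set_mode are hypothetical names, and the real driver tries a fixed set of modes with per-chipset timing setup rather than a simple countdown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: try to program `mode`; return 0 on success. */
typedef int (*set_mode_fn)(int mode, void *ctx);

/* Hypothetical drive model used to exercise the sketch. */
struct fake_drive {
    int highest_accepted;   /* highest UDMA mode the drive will accept */
};

static int fake_set_mode(int mode, void *ctx)
{
    const struct fake_drive *d = ctx;

    return (mode <= d->highest_accepted) ? 0 : -1;
}

/*
 * Start at the lower of the advertised mode and the cap, then fall back
 * one mode at a time until the device accepts one.
 * Returns the UDMA mode in effect, or -1 if no UDMA mode was accepted
 * (a real driver would then fall back to WDMA or PIO).
 */
static int negotiate_udma(int udmamode, int cap, set_mode_fn set_mode,
                          void *ctx)
{
    int mode = (udmamode < cap) ? udmamode : cap;

    assert(set_mode != NULL);
    for (; mode >= 0; mode--) {
        if (set_mode(mode, ctx) == 0)
            return mode;
    }
    return -1;
}
```

Calling negotiate_udma(5, 4, fake_set_mode, &drive) on a UDMA100-capable drive settles on mode 4, which is what compiling out the UDMA5 branch achieves.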




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-28 Thread Matthew Dillon


:...
:> > 
:> > There is also the "only supports 16MxN RAM" feature.
:> 
:> Maybe I should toss in that I've had spontaneous reboots during heavy
:> IDE activity both on my desktop (VIA 82C686) and my laptop (Intel
:> 82443BX).  And before that, random disk corruption during heavy SCSI
:> activity on my old desktop machine (seen with Tekram and Acer
:> 83C575-based host adapters and a borrowed Adaptec 2940).
:
:Guys guys, we are talking about known HW issues that cause known
:bad behavior, having a system that is flaky can have all kinds of
:reasons, I'd risk saying that genuine HW bugs like the 686B bug
:are among the least likely problems...
:
:The most likely reasons are probably bad/subspec'd RAM, lousy PSU,
:bad/subspec'd cabling, too many "performance features" enabled,
:generally crappy hardware (there are *tons* of that out there),
:bad/insufficient cooling, overclocking (even the motherboard makers
:overclock by default nowadays to gain a little over the competition),
:
:And do *not* forget bugs/bogons/mistakes in your favorite OS :)
:
:-Søren

Ok, I have more information on Nils' problem.  First of all, Soren's
patch greatly reduced the rate of corruption.  It took 25 loops of
Nils' 'cp' test to generate the corruption.

However, Soren's patch did not fix the corruption.  The same exact
corruption is occurring.  In Nils' case it is always the same exact
location in VM -- a certain bit (or byte) in the middle of the nfsnode
hash table.  Hardware watch points indicate that the cpu is NOT modifying
this location, so I really doubt that it is a kernel bug.

From this and from reading a number of other postings about VIA chipsets
I believe that Soren's original patch (which I guess is the official
VIA chipset patch) does not completely solve the VIA chipset's problems.
I also believe, from reading some of the reference material that has
been posted, that this corruption is not limited to the 686[A/B] but
may also occur in earlier VIA chipsets.

What I would like to do is try forcing the DMA transfer mode down to
UDMA66 or UDMA33 (66 or 33 MB/s), to see if that solves the problem.  Soren,
could you supply a patch that universally turns off higher UDMA modes?

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>





Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-27 Thread Matthew Dillon

This is great news!  I'm crossing my fingers and hoping that Nils can't
reproduce the crash any more with Soren's fix.

Just to let you all know, Nils has been working his ass off helping me
track his crash down.  I've been pulling my hair out... I gave him patch
after patch to test various conditions & panic if the nfs_node's hash list
somehow got broken, and for the last week not a single one of those tests
detected the problem prior to the panic.  The nfs_node's hash list
was being corrupted seemingly out of nowhere.
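
For readers unfamiliar with this kind of debugging patch, here is a rough sketch of what a hash-list sanity check can look like. It is not the code that was sent to Nils: the node and table types, NBUCKETS, and the function names are hypothetical, and where a kernel patch would call panic() this version simply returns -1.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8

/* Hypothetical hash-chain node with a back link to the pointer that
 * references it, similar in spirit to the BSD LIST macros. */
struct node {
    struct node  *next;
    struct node **prevp;    /* address of the pointer pointing at this node */
    unsigned long key;
};

struct table {
    struct node *bucket[NBUCKETS];
};

static void table_insert(struct table *t, struct node *n)
{
    struct node **head = &t->bucket[n->key % NBUCKETS];

    n->next = *head;
    if (n->next != NULL)
        n->next->prevp = &n->next;
    n->prevp = head;
    *head = n;
}

/*
 * Walk every chain and verify the invariants: each node's back link
 * matches the pointer we arrived through, each node sits in the bucket
 * its key hashes to, and the total node count stays within max_nodes
 * (which also bounds the walk if a chain has been turned into a cycle).
 * Returns 0 if the table looks consistent, -1 otherwise.
 */
static int table_check(const struct table *t, size_t max_nodes)
{
    size_t i, seen = 0;

    assert(t != NULL);
    for (i = 0; i < NBUCKETS; i++) {
        struct node *const *linkp = &t->bucket[i];
        const struct node *n;

        for (n = *linkp; n != NULL; linkp = &n->next, n = n->next) {
            if (++seen > max_nodes)
                return -1;
            if (n->prevp != linkp)
                return -1;
            if (n->key % NBUCKETS != i)
                return -1;
        }
    }
    return 0;
}
```

A check like this only notices damage at the moment it runs, and only damage that leaves the chains walkable, which fits the observation that none of the tests fired before the panic.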

The last two days I've had Nils use hardware watchpoints in DDB> to
try to track down what was modifying the memory location, with no
success.  The watchpoint was catching the (correct) write to the list
head but then failed to catch the corrupting write prior to the system
panicking, which is what makes me believe it is some sort of chipset
issue.

Another thing to note:  One of the really weird things about Nils' crashes
is that the same memory location was getting corrupted every time, five
times in a row (which made it possible to use a hardware watch point).
The corruption changed somewhat when he added the hardware watch point.
Another similar set of crashes in the vm_page_list (that other people
report, including a number of machines at Yahoo) has a similar M.O.:
IDE drive, medium/heavy activity.  But while the corrupted address always
winds up in the (static) vm_page array, it always tends to be slightly
different.  I'm hoping that it winds up being the same or similar
issue.  I'm not ruling out the possibility that chipsets other than
the 686B have problems too.

In any case, Nils' description makes a lot of sense.  I've asked him to
continue testing his system to make sure that this particular crash cannot
be reproduced, and I am crossing my fingers.

I'm also wondering how applicable this patch might be in regards to 
forcing a 'safe' mode for other PCI chipsets, to allow us to test
it on non-686B machines that have similar problems.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>


:On Thu, Dec 27, 2001 at 10:45:01AM +0100, Søren Schmidt stood up and spoke:
:> 
:> OK, here goes the VIA 686b patch, it is hand cut out from the bulk patches
:> to go into 4.5 so beware :)
:
:Well, as Matt has said, I reported a crash that he's trying to debug. Since
:I have the 686b in my machine, I applied the patch. Ever since then I was
:not able to reproduce the crash again, although yesterday it was so easy
:that I could do it twice an hour ;-)
:
:Anyway, you (Soren) said that the right way to fix this is a BIOS update.
:Now, could it be that some mainboard manufacturers are incapable of
:handling this? I'm using the latest BIOS for my board, and according to
:http://www.chaintech.com.tw/DL/7xMB/7AJA0.HTM, this should already have
:been fixed in their BIOS release from 2001-04-23...
:
:Second interesting thing: I was using a UDMA66 drive on my 686b until a few
:weeks ago and never had any problems - the stuff Matt is looking at only
:started to appear a short while after I exchanged that drive for a UDMA100
:one. So, it seems as if probably the slower drive didn't produce a high
:enough PCI workload for anything to actually happen.
:
:This fix will probably also have some influence on a few other similar
:problems (I read Matt was working on many of them). In the end I hope that
:this fix - or a variation thereof - will actually go into 4.5.
:
:Greetings
:Nils
:
:-- 
:Nils Holland




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-26 Thread Matthew Dillon


:
::
::> atapci0:  port 0xd000-0xd00f at device 7.1 on 
::
::You have the VIA 82c686b southbridge chip which is known to cause severe
::data corruption problems if the BIOS does not set up the northbridge
::chip correctly. Please check with your board vendor if they have a
::new updated BIOS that fixes this problem. This is not an ATA specific
::problem, but a problem with the PCI subsystem in general on these
::chips that manifests itself on high PCI load (which the ATA subsystem
::is quite capable of delivering)...
::
::A fix for this is present in -current, and if I get permission from 
::the RE, it will go into 4.5 also, but a BIOS fix is by far the right
::way to fix this problem.
::
::-Søren
:
:Soren, if you post a patch for 4.x I will be happy to follow-up 
:with Brady.  I've been working with Brady for several days now trying 
:to track down corruption in the vm_page array.  I *really* want to know
:if a VIA chipset patch solves his problem, because it would
:also explain about a dozen similar bug reports over the last 6 months.
:(Which would also be good backing to get it into 4.5).

Sorry, I meant I'd follow up with Matthew Gilbert.  Also, I've 
been trying to track down a crash in nfs_node that Nils Holland
has been having - he appears to have the same chipset and could
have the same problem, and he can reproduce the panic very consistently
within a few hours.

Brady has an older chipset.  Are there any known problems with this
chipset?

pcib1:  at device 1.0 on pci0
pci1:  on pcib1
isab0:  at device 7.0 on pci0
isa0:  on isab0
atapci0:  port 0xe400-0xe40f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-26 Thread Matthew Dillon


:
:It seems Matthew Gilbert wrote:
:> > So, general question to everyone having crashes during heavy
:> > IDE workloads:  Please post your /var/run/dmesg.boot
:> > output.
:
:> atapci0:  port 0xd000-0xd00f at device 7.1 on 
:
:You have the VIA 82c686b southbridge chip which is known to cause severe
:data corruption problems if the BIOS does not set up the northbridge
:chip correctly. Please check with your board vendor if they have a
:new updated BIOS that fixes this problem. This is not an ATA specific
:problem, but a problem with the PCI subsystem in general on these
:chips that manifests itself on high PCI load (which the ATA subsystem
:is quite capable of delivering)...
:
:A fix for this is present in -current, and if I get permission from 
:the RE, it will go into 4.5 also, but a BIOS fix is by far the right
:way to fix this problem.
:
:-Søren

Soren, if you post a patch for 4.x I will be happy to follow-up 
with Brady.  I've been working with Brady for several days now trying 
to track down corruption in the vm_page array.  I *really* want to know
if a VIA chipset patch solves his problem, because it would
also explain about a dozen similar bug reports over the last 6 months.
(Which would also be good backing to get it into 4.5).

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-26 Thread Søren Schmidt

It seems Matthew Gilbert wrote:
> > So, general question to everyone having crashes during heavy
> > IDE workloads:  Please post your /var/run/dmesg.boot
> > output.

> atapci0:  port 0xd000-0xd00f at device 7.1 on 

You have the VIA 82c686b southbridge chip which is known to cause severe
data corruption problems if the BIOS does not set up the northbridge
chip correctly. Please check with your board vendor if they have a
new updated BIOS that fixes this problem. This is not an ATA specific
problem, but a problem with the PCI subsystem in general on these
chips that manifests itself on high PCI load (which the ATA subsystem
is quite capable of delivering)...

A fix for this is present in -current, and if I get permission from 
the RE, it will go into 4.5 also, but a BIOS fix is by far the right
way to fix this problem.

-Søren




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-26 Thread Matthew Gilbert

> So, general question to everyone having crashes during heavy
> IDE workloads:  Please post your /var/run/dmesg.boot
> output.

This is after a second reboot (that is why the disks come up clean). -Matt

Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.5-PRERELEASE #0: Sat Dec 22 12:13:13 MST 2001
root@string:/usr/obj/usr/src/sys/MYKERNEL
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (736.02-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
  
Features=0x387fbff
real memory  = 536870912 (524288K bytes)
config> en sio3
config> po sio3 0x2e8
config> ir sio3 9
config> f sio3 0
config> en sio2
config> po sio2 0x3e8
config> ir sio2 5
config> f sio2 0
config> q
avail memory = 518303744 (506156K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee0
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id:  2, version: 0x00178011, at 0xfec0
Preloaded elf kernel "kernel" at 0xc045c000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc045c09c.
Preloaded elf module "agp.ko" at 0xc045c0ec.
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 8 entries at 0xc00fdb80
npx0:  on motherboard
npx0: INT 16 interface
pcib0:  on motherboard
IOAPIC #0 intpin 9 -> irq 2
IOAPIC #0 intpin 11 -> irq 9
pci0:  on pcib0
agp0:  mem 0xd000-0xd3ff 
at device 0.0 on pci0
pcib1:  at device 1.0 on 
pci0
pci1:  on pcib1
pci1:  at 0.0 irq 10
isab0:  at device 7.0 on pci0
isa0:  on isab0
atapci0:  port 0xd000-0xd00f at device 7.1 on 
pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0:  port 0xd400-0xd41f irq 2 at device 7.2 on 
pci0
usb0:  on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1:  port 0xd800-0xd81f irq 2 at device 7.3 on 
pci0
usb1:  on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
chip0:  at device 7.4 on pci0
pcm0:  port 0xe000-0xe03f irq 9 at device 9.0 on pci0
pci0:  at 11.0 irq 2
bktr0:  mem 0xdc101000-0xdc101fff irq 10 at device 12.0 on 
pci0
bktr0: Pinnacle/Miro TV, Temic NTSC tuner.
fxp0:  port 0xe400-0xe41f mem 
0xdc00-0xdc0f,0xdc10-0xdc100fff irq 9 at device 13.0 on pci0
fxp0: Ethernet address 00:a0:c9:a3:a6:4f
inphy0:  on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
orm0:  at iomem 0xc-0xc7fff,0xc8000-0xcbfff on isa0
fdc0:  at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
atkbdc0:  at port 0x60,0x64 on isa0
atkbd0:  flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0:  irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
sio2: configured irq 5 not in bitmap of probed irqs 0
sio3: configured irq 9 not in bitmap of probed irqs 0
ppc0:  at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIF




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-26 Thread Brady Montz

Matthew Dillon <[EMAIL PROTECTED]> writes:

> Well, I'm close to being stuck folks.  I've looked at 5 of Brady's
> core dumps and one of Nils and it appears to be semi-random
> corruption of structures that simply cannot be otherwise corrupted
> in the way they are being corrupted.  The only common thread here is
> that Brady and Nils and most of the other people reporting
> these crashes have heavy IDE workloads.  They also both have
> VIA chipsets (different versions though).
> 
> So, general question to everyone having crashes during heavy
> IDE workloads:  Please post your /var/run/dmesg.boot
> output.

Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.5-PRERELEASE #1: Tue Dec 25 13:05:39 PST 2001
root@beaker:/usr/obj/vol/src.stable/sys/BEAKER.debug
Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 451024596 Hz
CPU: AMD-K6(tm) 3D processor (451.02-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x58c  Stepping = 12
  Features=0x8021bf
  AMD Features=0x8800
real memory  = 134152192 (131008K bytes)
config> di sn0
No such device: sn0
Invalid command or syntax.  Type `?' for help.
config> di lnc0
No such device: lnc0
Invalid command or syntax.  Type `?' for help.
config> di ie0
No such device: ie0
Invalid command or syntax.  Type `?' for help.
config> di fe0
No such device: fe0
Invalid command or syntax.  Type `?' for help.
config> di ed0
No such device: ed0
Invalid command or syntax.  Type `?' for help.
config> di cs0
No such device: cs0
Invalid command or syntax.  Type `?' for help.
config> di bt0
config> di aic0
config> di aha0
config> di adv0
config> q
avail memory = 126656512 (123688K bytes)
Preloaded elf kernel "kernel" at 0xc03df000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc03df09c.
VESA: v3.0, 4096k memory, flags:0x1, mode table:0xc036dca2 (122)
VESA: NVidia
K6-family MTRR support enabled (2 registers)
md0: Malloc disk
Using $PIR table, 7 entries at 0xc00fded0
npx0:  on motherboard
npx0: INT 16 interface
pcib0:  on motherboard
pci0:  on pcib0
pcib1:  at device 1.0 on pci0
pci1:  on pcib1
isab0:  at device 7.0 on pci0
isa0:  on isab0
atapci0:  port 0xe400-0xe40f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0:  port 0xe000-0xe01f irq 5 at device 7.2 on pci0
usb0:  on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
pcib2:  at device 7.3 on pci0
sym0: <810a> port 0xe800-0xe8ff mem 0xe500-0xe5ff irq 10 at device 8.0 on pci0
sym0: No NVRAM, ID 7, Fast-10, SE, parity checking
xl0: <3Com 3c900-COMBO Etherlink XL> port 0xec00-0xec3f irq 11 at device 9.0 on pci0
xl0: Ethernet address: 00:a0:24:d2:c4:91
xl0: selecting 10baseT transceiver, half duplex
pci0:  at 11.0 irq 5
orm0:  at iomem 0xc-0xc7fff on isa0
sbc0:  at port 0x220-0x22f irq 7 on isa0
sbc0: alloc_resource
device_probe_and_attach: sbc0 attach returned 6
fdc0:  at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0:  at port 0x60,0x64 on isa0
atkbd0:  flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0:  irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <8 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0:  at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppi0:  on ppbus0
plip0:  on ppbus0
lpt0:  on ppbus0
lpt0: Interrupt-driven port
IP packet filtering initialized, divert enabled, rule-based forwarding disabled, 
default to deny, unlimited logging
ad0: 29188MB  [59303/16/63] at ata0-master UDMA33
ad2: 2015MB  [4095/16/63] at ata1-master WDMA2
acd0: CDROM  at ata1-slave using PIO4
Mounting root from ufs:/dev/ad0s3a
WARNING: / was not properly dismounted

-- 
 Brady Montz
 [EMAIL PROTECTED]




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-20 Thread Brandon D. Valentine

On Thu, 20 Dec 2001, Brady Montz wrote:

>However, being a recent convert to BSD, I don't know how to turn off
>DMA. How do I?

Welcome.  =)

You'll probably want to read sysctl(8) and sysctl.conf(5) to get
familiar with how BSD's sysctl interfaces work.  You'll be particularly
interested in the hw.ata.* sysctls.

Brandon D. Valentine
-- 
"Iam mens praetrepidans avet vagari."
- G. Valerius Catullus, Carmina, XLVI





Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-20 Thread Brady Montz

Matthew Dillon <[EMAIL PROTECTED]> writes:

> :I induced the crash by running "make clean; make buildworld" in one
> :infinite loop and "portsdb -Uu" in another. That string occurs in a
> :bunch of makefiles in /usr/ports. Some of the occurrences in the core
> :are clearly from them, but many of them are surrounded by binary
> :data. I recursively grepped /usr/{src,obj,bin,ports} and
> :/usr/local/{bin,lib} and didn't find any binary files with that
> :string. My guess then is that it's from the memory image of a make
> :process.
> :
> :-- 
> : Brady Montz
> 
> This is so weird.  The corruption is occurring in the vm_page_t itself,
> at least in the crash you sent me.  The vm_page_t is a locked-down
> address in the kernel.  It is not affecting the vm_page_t's around the
> one that got corrupted.  The corruption does not appear to be on a page
> or device block boundary.  I am at a loss as to how it's getting there.
> 
> Could you try playing with the DMA modes on your IDE hard drive?  Try
> turning DMA off, for example, and see if the corruption still occurs.

I'd had that thought as well. Seems a reasonable way for a misbehaving
driver to corrupt memory. I'll try that tonight.

However, being a recent convert to BSD, I don't know how to turn off
DMA. How do I?

-- 
 Brady Montz
 [EMAIL PROTECTED]




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-20 Thread Matthew Dillon

:I induced the crash by running "make clean; make buildworld" in one
:infinite loop and "portsdb -Uu" in another. That string occurs in a
:bunch of makefiles in /usr/ports. Some of the occurrences in the core
:are clearly from them, but many of them are surrounded by binary
:data. I recursively grepped /usr/{src,obj,bin,ports} and
:/usr/local/{bin,lib} and didn't find any binary files with that
:string. My guess then is that it's from the memory image of a make
:process.
:
:-- 
: Brady Montz

This is so weird.  The corruption is occurring in the vm_page_t itself,
at least in the crash you sent me.  The vm_page_t is a locked-down
address in the kernel.  It is not affecting the vm_page_t's around the
one that got corrupted.  The corruption does not appear to be on a page
or device block boundary.  I am at a loss as to how it's getting there.
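
Whether a stray string sits on a page or disk-block boundary can be checked mechanically. The Python sketch below is only an illustration. It assumes the vmcore is a flat image of physical memory (so file offsets stand in for physical addresses), a 4096-byte page, a 512-byte disk block, and the marker 'pre-fetch' mentioned elsewhere in this thread. The name find_marker is made up for the example.

```python
import mmap

PAGE_SIZE = 4096   # assumed i386 page size
BLOCK_SIZE = 512   # assumed disk sector size


def find_marker(path, marker=b"pre-fetch", context=16):
    """Yield (offset, offset % PAGE_SIZE, offset % BLOCK_SIZE, nearby bytes)
    for every occurrence of marker in the file at path."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as image:
            pos = image.find(marker)
            while pos != -1:
                start = max(0, pos - context)
                end = pos + len(marker) + context
                yield pos, pos % PAGE_SIZE, pos % BLOCK_SIZE, image[start:end]
                pos = image.find(marker, pos + 1)
```

A remainder of 0 in the second or third field would mean the hit starts exactly on a page or block boundary; the nearby bytes show whether the hit is surrounded by text or by binary data.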

Could you try playing with the DMA modes on your IDE hard drive?  Try
turning DMA off, for example, and see if the corruption still occurs.

-Matt





Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-20 Thread Brady Montz

Peter Jeremy <[EMAIL PROTECTED]> writes:

> On 2001-Dec-20 13:47:16 -0800, Matthew Dillon <[EMAIL PROTECTED]> wrote:
> >The string 'pre-fetch' is sitting right smack in the middle of the
> >VM_PAGE!!!.  The entire vm_page is corrupt, though the vm_page's
> >surrounding it appear to be ok.
> >
> >When I look for 'pre-fetch' in the raw vmcore file I see it occurring
> >all over the place.
> 
> Any guess where it's coming from?  I can't find that string anywhere
> in the -stable source code or kernel - which suggests that it's being
> read from the disk.

I induced the crash by running "make clean; make buildworld" in one
infinite loop and "portsdb -Uu" in another. That string occurs in a
bunch of makefiles in /usr/ports. Some of the occurrences in the core
are clearly from them, but many of them are surrounded by binary
data. I recursively grepped /usr/{src,obj,bin,ports} and
/usr/local/{bin,lib} and didn't find any binary files with that
string. My guess then is that it's from the memory image of a make
process.
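
For anyone wanting to repeat the search for binary files containing the string, here is a rough Python sketch. It is not the command actually used above. The "contains a NUL byte" test for a binary file is a crude assumption, the function names are invented, and each file is read whole into memory for simplicity.

```python
import os


def looks_binary(data):
    """Crude heuristic (an assumption): any NUL byte means 'not plain text'."""
    return b"\0" in data


def binary_files_containing(roots, needle=b"pre-fetch"):
    """Yield paths of regular files under roots that look binary
    and contain needle."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # skip symlinks, devices, sockets, etc.
                try:
                    with open(path, "rb") as fh:
                        data = fh.read()
                except OSError:
                    continue  # unreadable file: skip rather than abort
                if needle in data and looks_binary(data):
                    yield path
```

Called with the directories named above (/usr/src, /usr/obj, /usr/bin, /usr/ports, /usr/local/bin, /usr/local/lib), an empty result would match the finding that no binary file on disk holds the string.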


-- 
 Brady Montz
 [EMAIL PROTECTED]




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-20 Thread Peter Jeremy

On 2001-Dec-20 13:47:16 -0800, Matthew Dillon <[EMAIL PROTECTED]> wrote:
>The string 'pre-fetch' is sitting right smack in the middle of the
>VM_PAGE!!!.  The entire vm_page is corrupt, though the vm_page's
>surrounding it appear to be ok.
>
>When I look for 'pre-fetch' in the raw vmcore file I see it occurring
>all over the place.

Any guess where it's coming from?  I can't find that string anywhere
in the -stable source code or kernel - which suggests that it's being
read from the disk.

Peter




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-17 Thread Chad David

On Mon, Dec 17, 2001 at 01:33:19PM -0800, David Wolfskill wrote:
> >Date: Mon, 17 Dec 2001 14:21:15 -0700
> >From: Chad David <[EMAIL PROTECTED]>
> 
> >I still agree.  My -current machines run fine, and I refuse to run -stable on
> >an SMP machine.
> 
> For whatever it may be worth, my "build machine" (which is one of the
> machines on which I track both -STABLE and -CURRENT daily) is an SMP box.
> It also has a local CVS repository on it (from which I update the CVS
> repository on my laptop, which also tracks -STABLE and -CURRENT daily).
> 
> I am not having any problems with -STABLE that I know of.

It might be worth a lot :).  It is possible that I and a few others have
bad hardware, and there is no problem with -stable.  It seems unlikely,
but it is not at all impossible I guess.

Are you running any combination of samba/nfs/ata?

Thanks

-- 
Chad David        [EMAIL PROTECTED]
ACNS Inc. Calgary, Alberta Canada




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-17 Thread David Wolfskill

>Date: Mon, 17 Dec 2001 14:21:15 -0700
>From: Chad David <[EMAIL PROTECTED]>

>I still agree.  My -current machines run fine, and I refuse to run -stable on
>an SMP machine.

For whatever it may be worth, my "build machine" (which is one of the
machines on which I track both -STABLE and -CURRENT daily) is an SMP box.
It also has a local CVS repository on it (from which I update the CVS
repository on my laptop, which also tracks -STABLE and -CURRENT daily).

I am not having any problems with -STABLE that I know of.

Cheers,
david
-- 
David H. Wolfskill  [EMAIL PROTECTED]
I believe it would be irresponsible (and thus, unethical) for me to advise,
recommend, or support the use of any product that is or depends on any
Microsoft product for any purpose other than personal amusement.




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-17 Thread Chad David

On Mon, Dec 17, 2001 at 04:14:08PM -0800, Brady Montz wrote:
> 
> Here's an update ...
> 
> I'm fairly certain there's a kernel bug at work here. Last night I rebooted 
> to Linux (which is on the same disk), and ran batch compiles all
> night long without any troubles. In comparison, I can't compile more than
> an hour at a time with BSD 4.4 before it crashes.

I still agree.  My -current machines run fine, and I refuse to run -stable on
an SMP machine.

> 
> I am running the latest 4.4-stable. The other day I went back to 4.4-release
> and that didn't help. I've tried both with and without softupdates. The
> crashes seem to happen most often when accessing stuff from all over
> the filesystem, such as during a large "make clean", or most reliably, with 
> "portsdb -Uu". 
> 
> I am tiring of this. Someone else on this thread mentioned that
> their 5.0 machine is doing fine. In what shape is that and how much effort
> is it to move a 4.4 machine to 5.0? 


Unless you feel confident that you can deal with the problems that arise on
-current, I wouldn't want to be the one to recommend that you change, but
my personal experience has been that -stable is anything but stable on SMP
machines.  On UP machines I have no problems at all.  The -current SMP machines
here are all very stable.  I don't track it daily, and I am careful to build
a test box before I rebuild a box I care about, but generally I have been
much happier with -current than with -stable (this year).

As for the effort to upgrade, it depends on what the box is doing.  I've only
upgraded a few boxes in the last year or so, and found that it was fairly 
timing dependent, but in general I haven't had any real problems (read UPDATING).

I have a little time this afternoon, so I'm going to see if I can figure
something out.  I'll throw -stable onto one of my SMP development machines
and see if I can kill it.  At least there I can debug it. 



A small plug: I've written a script that will rebuild an entire machine, from
a cvsup -> mergemaster and reboot.  It doesn't really address anything to
do with this thread, but you might find it handy :)

http://www.acns.ab.ca/projects/rebuild/rebuild.tar.gz


-- 
Chad David        [EMAIL PROTECTED]
ACNS Inc. Calgary, Alberta Canada




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-17 Thread Brady Montz

On Fri, Dec 07, 2001 at 01:33:14PM -0800, Brady Montz wrote:
> Richard Nyberg <[EMAIL PROTECTED]> writes: 
>  
> > On Fri, Dec 07, 2001 at 10:59:13AM +0100, Samuel Tardieu wrote: 
> > > I am experiencing the same crashes on my new machine (ATA100 IDE 
> > > drive): they appeared when I noticed that I had forgotten to use 
> > > soft-updates. After I have turned them on, I experienced the first 
> > > crash in 15 minutes. Then I get one every two days, when doing 
> > > heavy disk IOs. I got a crash 10 minutes ago when the machine was 
> > > unattended though (and not doing important disk IOs), and could 
> > > see a "panic" message on the console. Unfortunately, I hadn't 
> > > enough free space in /var/crash to save the kernel. 
> > >  
> > > Do you people use soft-updates? From my experience on this 
> > > problem, I assume that either soft-updates or the ATA driver may 
> > > be causing those spontaneous reboots. 
> >  
> > Yes I use soft-updates. The peculiar thing about my crash though is that 
> > there was no panic; the machine just froze and the screen went blank, so 
> > maybe I was hit by a different problem. 
> >  
> >   -Richard 
> 
> Yeah, I'm using soft updates too. My crashes are generally the same as
> Richard's - no panic, just a freeze. Except my screen doesn't go blank.

Here's an update ...

I'm fairly certain there's a kernel bug at work here. Last night I rebooted 
to Linux (which is on the same disk), and ran batch compiles all
night long without any troubles. In comparison, I can't compile more than
an hour at a time with BSD 4.4 before it crashes.

Again, I ran memtest86 and it didn't find any memory errors, and I'm
not seeing any file system corruption, just hangs and reboots.

I am running the latest 4.4-stable. The other day I went back to 4.4-release
and that didn't help. I've tried both with and without softupdates. The
crashes seem to happen most often when accessing stuff from all over
the filesystem, such as during a large "make clean", or most reliably, with 
"portsdb -Uu". 
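
One way to turn those triggers into a repeatable test is a small harness that runs the commands round after round and prints a timestamp before each step, so that after a hang the console shows what was running. The Python sketch below is only an illustration under assumptions: the function names are invented, and nothing here is specific to FreeBSD beyond the commands passed in.

```python
import subprocess
import threading
import time


def loop(name, commands, cwd=None, max_rounds=None, log=print):
    """Run each argv in commands in order, round after round.

    Stops when a command exits non-zero or after max_rounds rounds
    (None means keep going until something fails).
    Returns the number of fully completed rounds.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for argv in commands:
            stamp = time.strftime("%H:%M:%S")
            log(f"{stamp} {name} round {rounds + 1}: {' '.join(argv)}")
            if subprocess.run(argv, cwd=cwd).returncode != 0:
                return rounds
        rounds += 1
    return rounds


def run_parallel(jobs):
    """Start one thread per (name, commands, cwd) job and wait for all."""
    threads = [threading.Thread(target=loop, args=job) for job in jobs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Calling run_parallel([("world", [["make", "clean"], ["make", "buildworld"]], "/usr/src"), ("ports", [["portsdb", "-Uu"]], None)]) would run a world build and the ports database update side by side; /usr/src as the build directory is an assumption.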

I am tiring of this. Someone else on this thread mentioned that
their 5.0 machine is doing fine. In what shape is that and how much effort
is it to move a 4.4 machine to 5.0? 

-- 
  Brady Montz
  [EMAIL PROTECTED]




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-11 Thread ian j hart

ian j hart wrote:
> 
> ian j hart wrote:
> >
> > "Chad R. Larson" wrote:
> > >
> > > On Fri, Dec 07, 2001 at 01:33:15PM -0800, Brady Montz wrote:
> > > > Yeah, I'm using soft updates too.  My crashes are generally the
> > > > same as Richard's - no panic, just a freeze.  Except my screen
> > > > doesn't go blank.
> > >
> > > For what it's worth, I'm using soft updates on a web server that gets
> > > steady if not heavy use.  Built from RELENG_4_3, and no problems at
> > > all.
> > >
> > > -crl
> > > --
> > > Chad R. Larson (CRL15)   602-953-1392   Brother, can you paradigm?
> > > [EMAIL PROTECTED] [EMAIL PROTECTED]  [EMAIL PROTECTED]
> > > DCF, Inc. - 14623 North 49th Place, Scottsdale, Arizona 85254-2207
> > >
> >
> > Me 2 :(
> >
> > I have a total lockup, screen is not blank (Matrox G400).
> >
> > I turned off soft updates and did a boot -v and got some
> > console messages, so this is worth a try. Unfortunately the
> > messages don't make it to the logs, presumably because the
> > disk and/or disk subsystem is fubar'd. The one time I got
> > a spontaneous reboot I was out of the room making coffee
> > (typical).
> >
> > Anyway the messages are something like
> > ad0: READ command timeout tag=0 serv=0 - resetting
> > ata0: resetting devices .. done
> >
> > It's not always the same drive.
> >
> > There were also some of my favorite "UDMA ICRC" errors, but
> > I didn't catch those. For those with long memories this is
> > the same box I've had UDMA problems with before (numerous
> > posts with UDMA ICRC in subject) but it's been well behaved
> > since early July. Maybe I haven't pushed it hard enough.
> > I also got one instance of "unexpected soft update inconsistency"
> > while fscking. Maybe this is to be expected if the drive "just
> > dies".
> >
> > What's interesting is the behavior seems to have changed. On
> > previous occasions the driver would keep resetting and then
> > drop to pio mode. Now it seems to lock after the first reset.
> > I'll try to confirm this behavior.
> >
> > I set pio mode on all drives and I managed to complete my
> > torture test.
> >
> > One more thing. Sometimes there's a clunk from the drive(s)
> > when it dies. Parking the heads?
> >
> > FWIW -
> > VIA ATA33 controller
> > 4x UDMA 66 drives
> > vinum mirror /var
> > vinum mirrored stripes /usr
> >
> 
> Drat, spoke too soon.
> 
> soft updates on, dma off. Hang (in kde) followed by black
> screen and reboot. This time vinum died on startup and I
> had an anxious 10 minutes starting all the subdisks.
> 
> I'd better test the memory. Then I'll try booting from
> the backup root in case ad0 is toast. I guess duff hardware is
> looking more likely.
> 
> I noticed some UDMA errors when rebooting from single
> user, which failed to sync 1 block. Of course these scroll
> off screen too quick to be readable, but the "head parking"
> noise was again apparent. APM is disabled.
> 
> --
> ian j hart
> 

Update:

I couldn't prove the memory faulty but I did discover a useful
factlet which was missed out of the M/B handbook. Apparently
the VIA MVP3 chipset only supports 16MxN RAM when you have
more than 64Mb. This is not what I had. I updated the BIOS and
sure enough the board failed to detect all the RAM. I've
swapped it out. Maybe some update finally tickled the "feature"
hard enough to cause a panic.

I also did a full build, and this seems to have fixed some
weirdness with md0. Either I cvsup'd at a bad time or (more
likely) I fluffed the mergemaster.

I'll thrash the bejesus out of the drives and see what happens.

-- 
ian j hart




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-09 Thread ian j hart

ian j hart wrote:
> 
> "Chad R. Larson" wrote:
> >
> > On Fri, Dec 07, 2001 at 01:33:15PM -0800, Brady Montz wrote:
> > > Yeah, I'm using soft updates too.  My crashes are generally the
> > > same as Richard's - no panic, just a freeze.  Except my screen
> > > doesn't go blank.
> >
> > For what it's worth, I'm using soft updates on a web server that gets
> > steady if not heavy use.  Built from RELENG_4_3, and no problems at
> > all.
> >
> > -crl
> > --
> > Chad R. Larson (CRL15)   602-953-1392   Brother, can you paradigm?
> > [EMAIL PROTECTED] [EMAIL PROTECTED]  [EMAIL PROTECTED]
> > DCF, Inc. - 14623 North 49th Place, Scottsdale, Arizona 85254-2207
> >
> 
> Me 2 :(
> 
> I have a total lockup, screen is not blank (Matrox G400).
> 
> I turned off soft updates and did a boot -v and got some
> console messages, so this is worth a try. Unfortunately the
> messages don't make it to the logs, presumably because the
> disk and/or disk subsystem is fubar'd. The one time I got
> a spontaneous reboot I was out of the room making coffee
> (typical).
> 
> Anyway the messages are something like
> ad0: READ command timeout tag=0 serv=0 - resetting
> ata0: resetting devices .. done
> 
> It's not always the same drive.
> 
> There were also some of my favorite "UDMA ICRC" errors, but
> I didn't catch those. For those with long memories this is
> the same box I've had UDMA problems with before (numerous
> posts with UDMA ICRC in subject) but it's been well behaved
> since early July. Maybe I haven't pushed it hard enough.
> I also got one instance of "unexpected soft update inconsistency"
> while fscking. Maybe this is to be expected if the drive "just
> dies".
> 
> What's interesting is the behavior seems to have changed. On
> previous occasions the driver would keep resetting and then
> drop to pio mode. Now it seems to lock after the first reset.
> I'll try to confirm this behavior.
> 
> I set pio mode on all drives and I managed to complete my
> torture test.
> 
> One more thing. Sometimes there's a clunk from the drive(s)
> when it dies. Parking the heads?
> 
> FWIW -
> VIA ATA33 controller
> 4x UDMA 66 drives
> vinum mirror /var
> vinum mirrored stripes /usr
> 

Drat, spoke too soon.

soft updates on, dma off. Hang (in kde) followed by black
screen and reboot. This time vinum died on startup and I
had an anxious 10 minutes starting all the subdisks.

I'd better test the memory. Then I'll try booting from
the backup root in case ad0 is toast. I guess duff hardware is
looking more likely.

I noticed some UDMA errors when rebooting from single
user, which failed to sync 1 block. Of course these scroll
off screen too quick to be readable, but the "head parking"
noise was again apparent. APM is disabled.

-- 
ian j hart




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-09 Thread ian j hart

"Chad R. Larson" wrote:
> 
> On Fri, Dec 07, 2001 at 01:33:15PM -0800, Brady Montz wrote:
> > Yeah, I'm using soft updates too.  My crashes are generally the
> > same as Richard's - no panic, just a freeze.  Except my screen
> > doesn't go blank.
> 
> For what it's worth, I'm using soft updates on a web server that gets
> steady if not heavy use.  Built from RELENG_4_3, and no problems at
> all.
> 
> -crl
> --
> Chad R. Larson (CRL15)   602-953-1392   Brother, can you paradigm?
> [EMAIL PROTECTED] [EMAIL PROTECTED]  [EMAIL PROTECTED]
> DCF, Inc. - 14623 North 49th Place, Scottsdale, Arizona 85254-2207
> 

Me 2 :(

I have a total lockup, screen is not blank (Matrox G400).

I turned off soft updates and did a boot -v and got some
console messages, so this is worth a try. Unfortunately the
messages don't make it to the logs, presumably because the
disk and/or disk subsystem is fubar'd. The one time I got
a spontaneous reboot I was out of the room making coffee
(typical).

Anyway the messages are something like
ad0: READ command timeout tag=0 serv=0 - resetting
ata0: resetting devices .. done

It's not always the same drive.

There were also some of my favorite "UDMA ICRC" errors, but
I didn't catch those. For those with long memories this is
the same box I've had UDMA problems with before (numerous
posts with UDMA ICRC in subject) but it's been well behaved
since early July. Maybe I haven't pushed it hard enough.
I also got one instance of "unexpected soft update inconsistency"
while fscking. Maybe this is to be expected if the drive "just
dies".

What's interesting is the behavior seems to have changed. On
previous occasions the driver would keep resetting and then
drop to pio mode. Now it seems to lock after the first reset.
I'll try to confirm this behavior.

I set pio mode on all drives and I managed to complete my
torture test.

One more thing. Sometimes there's a clunk from the drive(s)
when it dies. Parking the heads?

FWIW -
VIA ATA33 controller
4x UDMA 66 drives
vinum mirror /var
vinum mirrored stripes /usr

-- 
ian j hart




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-07 Thread Chad R. Larson

On Fri, Dec 07, 2001 at 01:33:15PM -0800, Brady Montz wrote:
> Yeah, I'm using soft updates too.  My crashes are generally the
> same as Richard's - no panic, just a freeze.  Except my screen
> doesn't go blank.

For what it's worth, I'm using soft updates on a web server that gets
steady if not heavy use.  Built from RELENG_4_3, and no problems at
all.

-crl
--
Chad R. Larson (CRL15)   602-953-1392   Brother, can you paradigm?
[EMAIL PROTECTED] [EMAIL PROTECTED]  [EMAIL PROTECTED]
DCF, Inc. - 14623 North 49th Place, Scottsdale, Arizona 85254-2207




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-07 Thread Brady Montz

Richard Nyberg <[EMAIL PROTECTED]> writes: 
 
> On Fri, Dec 07, 2001 at 10:59:13AM +0100, Samuel Tardieu wrote: 
> > I am experiencing the same crashes on my new machine (ATA100 IDE 
> > drive): they appeared when I noticed that I had forgotten to use 
> > soft-updates. After I have turned them on, I experienced the first 
> > crash in 15 minutes. Then I get one every two days, when doing 
> > heavy disk IOs. I got a crash 10 minutes ago when the machine was 
> > unattended though (and not doing important disk IOs), and could 
> > see a "panic" message on the console. Unfortunately, I hadn't 
> > enough free space in /var/crash to save the kernel. 
> >  
> > Do you people use soft-updates? From my experience on this 
> > problem, I assume that either soft-updates or the ATA driver may 
> > be causing those spontaneous reboots. 
>  
> Yes I use soft-updates. The peculiar thing about my crash though is that 
> there was no panic; the machine just froze and the screen went blank, so 
> maybe I was hit by a different problem. 
>  
>   -Richard 

Yeah, I'm using soft updates too. My crashes are generally the same as
Richard's - no panic, just a freeze. Except my screen doesn't go blank.

-- 
  Brady Montz
  [EMAIL PROTECTED]
