Re: 5.4 -> 6.0 buildworld failure

2006-02-09 Thread Anthony Chavez
On Mon, 06 Feb 2006 14:13:13 +0100 Markus Buretorp <[EMAIL PROTECTED]> wrote:

> For me, the problem was caused by some stupid envvars I had
> in my shell config. I removed these and the problem was solved.
>
> export INCLUDE_PATH=/usr/include:/usr/local/include
> export C_INCLUDE_PATH=/usr/include:/usr/local/include:/usr/X11R6/include
> export CPLUS_INCLUDE_PATH=$C_INCLUDE_PATH
> export LIBRARY_PATH=/usr/lib:/usr/local/lib
> export LD_LIBRARY_PATH=.

Thanks for the info, Markus.  I didn't have anything of the sort in my
shell config, but fortunately after I upgraded to 5.4-RELEASE-p11, I
executed a successful buildworld for 6.0-RELEASE.
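
For anyone who wants to rule this out on their own machine, here is a
minimal sketch that lists which of the variables Markus named are set in
the current shell.  Only the variable names come from his message; the
function name report_build_env is made up for illustration.

```shell
#!/bin/sh
# Print NAME=value for each compiler/linker search-path variable that is
# set and non-empty.  The variable list is taken from Markus's message;
# report_build_env is a hypothetical helper name.
report_build_env() {
    for var in INCLUDE_PATH C_INCLUDE_PATH CPLUS_INCLUDE_PATH \
               LIBRARY_PATH LD_LIBRARY_PATH; do
        # eval gives portable indirect expansion in plain sh
        eval "val=\${$var-}"
        if [ -n "$val" ]; then
            printf '%s=%s\n' "$var" "$val"
        fi
    done
}
```

If report_build_env prints nothing, none of these variables are set, and
the cause is probably elsewhere (as it was in my case).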

Cheers!

-- 
Anthony Chavez http://anthonychavez.org/
mailto:[EMAIL PROTECTED] jabber:[EMAIL PROTECTED]



Re: 5.4 -> 6.0 buildworld failure

2006-02-06 Thread Anthony Chavez
Markus and freebsd-stable:

I have encountered a situation exactly the same as the one described in
this thread when attempting a source upgrade from 5.4-RELEASE-p4 to
6.0-RELEASE-p4.

I have had nothing but success on 10 other machines that were initially
running 5.4-RELEASE, -p6, and -p8.  Each of these machines
has a unique hardware configuration, and the one that fails to
buildworld is no exception.

I have tried an empty /etc/make.conf as well as specifically including
"CFLAGS=-O -pipe" therein.  I have also tried a default /etc/profile.
The build still fails.
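
One way to take both the shell configuration and /etc/make.conf out of
the picture at once is to run the build under a scrubbed environment.
The sketch below is an illustration only, not the exact commands I ran:
run_clean is a hypothetical helper, the PATH value is an assumption, and
__MAKE_CONF=/dev/null assumes the stock sys.mk, which reads make.conf
from the path in that variable.

```shell
#!/bin/sh
# Run a command with an almost empty environment so that nothing from
# /etc/profile or the user's shell config can leak into the build.
# run_clean is a hypothetical helper; adjust PATH to taste.
run_clean() {
    env -i PATH=/sbin:/bin:/usr/sbin:/usr/bin HOME="${HOME:-/root}" "$@"
}

# Intended use (not executed here):
#   cd /usr/src && run_clean make __MAKE_CONF=/dev/null buildworld
```

If the build fails the same way under run_clean, environment variables
and make.conf settings can reasonably be ruled out.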

I'm thinking that the best course of action might be to upgrade to
5.4-RELEASE-p11 (which builds successfully), but I'm very interested to
know what's causing this error in case my intended course of action
doesn't work.  The commit logs show no changes for this particular file
since well before this problem was reported, so the problem must lie
somewhere else in the source tree.

Any ideas what could be causing this?  Am I on the right track?  I have
included my dmesg.boot below.

Cheers!

On Sat, 05 Nov 2005 23:49:34 +0100 Markus Buretorp <[EMAIL PROTECTED]> wrote:

> Peter Jeremy wrote:
>
>>On Sat, 2005-Nov-05 21:17:58 +0100, Markus Buretorp wrote:
>>  
>>> I'm trying to upgrade from FreeBSD 5.4-STABLE to 6.0. I've done a
>>> cvsup to RELENG_6 and RELENG_6_0, I've run make cleanworld, make
>>> clean, rm -rf /usr/obj/*, etc; but nothing helps.
>>>
>>>...
>>>
>>>  /usr/src/lib/libkvm/kvm_proc.c:108: error: storage size of 't_cdev'
>>>  isn't known
>>
>>Where is this error occurring during the buildworld?  (What are the
>>latest lines beginning '>>>' and '===>')
>>What non-standard bits do you have in your command line, /etc/make.conf
>>or MAKEOBJDIRPREFIX?
>
>  >>> stage 4.2: building libraries
> ...
> ===> lib/libkvm (depend,all,install)
>
> make.conf:
>
> WITHOUT_X11=yes
> CPUTYPE?=athlon-xp
> CFLAGS=-O2 -pipe
> COPTFLAGS=-O -pipe
> # added by use.perl 2005-06-24 23:01:50
> PERL_VER=5.8.7
> PERL_VERSION=5.8.7
>
> Note, I've tried without the first four lines.
>
> $ cd lib/libkvm
> /usr/src [EMAIL PROTECTED]
> $ make
> ...r/src/lib/libkvm [EMAIL PROTECTED]
> cc -O -pipe  -DLIBC_SCCS -I/usr/src/lib/libkvm  -c
> /usr/src/lib/libkvm/kvm_proc.c
> /usr/src/lib/libkvm/kvm_proc.c: In function `kvm_proclist':
> /usr/src/lib/libkvm/kvm_proc.c:108: error: storage size of 't_cdev' isn't known
> /usr/src/lib/libkvm/kvm_proc.c:114: error: storage size of 'pr' isn't known
> /usr/src/lib/libkvm/kvm_proc.c:176: error: structure has no member named `ki_jid'
> /usr/src/lib/libkvm/kvm_proc.c:377: error: structure has no member named `p_rux'
> *** Error code 1
>
> Stop in /usr/src/lib/libkvm.
>
> I found this, http://www.freebsd.org/cgi/query-pr.cgi?pr=77821 , but
> it doesn't help me much. I don't know what I've done. I've used cvsup
> and buildworld several times.

-- 
Anthony Chavez http://anthonychavez.org/
mailto:[EMAIL PROTECTED] jabber:[EMAIL PROTECTED]

--8<---cut here---start->8---
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE-p4 #1: Sun Sep 11 20:13:50 MDT 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/MYBOX
WARNING: debug.mpsafenet forced to 0 as ipsec requires Giant
WARNING: MPSAFE network stack disabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (3010.67-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf41  Stepping = 1
  Features=0xbfebfbff
  Hyperthreading: 2 logical CPUs
real memory  = 1073414144 (1023 MB)
avail memory = 1040855040 (992 MB)
ACPI APIC Table: 
ioapic0  irqs 0-23 on motherboard
npx0:  on motherboard
npx0: INT 16 interface
acpi0:  on motherboard
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi0: Power Button (fixed)
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0:  on acpi0
acpi_throttle0:  on cpu0
pcib0

Re: Stress testing and TIMEOUT - WRITE_DMA

2005-09-13 Thread Anthony Chavez
On Mon, 12 Sep 2005 08:19:18 +0200 martin hudec <[EMAIL PROTECTED]> wrote:

> On Sun, Sep 11, 2005 at 10:33:47PM +0200 or thereabouts, Daniel Gerzo wrote:
>> On Fri, 26 Aug 2005 03:21:35 -0600 Anthony Chavez <[EMAIL PROTECTED]> wrote:
>> > Sep  6 11:35:27 mybox kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=8348191
>> > ...
>> > Sep  6 18:59:09 mybox kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=8348383
>> > Sep  6 19:04:58 mybox kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=61749183
>> 
>> > The READ_DMA timeouts are happening very infrequently, but it's worth
>> > mentioning that I'm seeing them now in addition.
>> 
>> > This is quite disturbing, particularly when the machine in question is
>> > *in*production.*
>> 
>> I think you should back up your data as quickly as you can. The last
>> time I saw this kind of message, my disk died 3 days after they
>> started showing up in my log files. I wasn't able to write any data
>> to the disk (the system suddenly panicked when I tried to mount it
>> rw, but I was able to mount it ro and copy most of the data). Note
>> that I wasn't able to copy about 10GB out of 30GB. So don't ignore
>> them, and good luck.
>
>   Hmmm, before trashing that disk, you could surely consider running
>   smartmontools to see what they have to say about the health of
>   your disk :). Go for sysutils/smartmontools.

Okay, I've actually got 3 identical drives (SAMSUNG SP0802N) in 3
identical systems, running identical hardware using Intel ICH4
controllers.

Only one of these machines managed to spit 81 errors at me over a period
of about 6.5 hours (so far).  This particular machine produced the
warnings approximately 8 days after installing FreeBSD.
Ironically, another one of these machines only produced 1 warning after
nearly 21 days and then another solitary warning 14 days after that
(which occurred as I was drafting this response).
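
A rough way to get tallies like these out of the logs is sketched below.
It assumes the kernel messages land in a syslog file such as
/var/log/messages, and count_dma_timeouts is just an illustrative name.
It counts the TIMEOUT warnings per device and per command.

```shell
#!/bin/sh
# Tally ATA DMA timeout warnings by device and command (READ_DMA or
# WRITE_DMA) from a syslog-style file given as the first argument.
# count_dma_timeouts is a hypothetical helper name.
count_dma_timeouts() {
    awk '/TIMEOUT - (READ|WRITE)_DMA/ {
        for (i = 1; i <= NF; i++)
            if ($i == "TIMEOUT") {
                # the field before TIMEOUT is the device ("ad0:"), and
                # the field after the "-" is the command
                key = $(i - 1) " " $(i + 2)
                n[key]++
            }
    }
    END { for (k in n) print n[k], k }' "$1" | sort -k2
}
```

For example, count_dma_timeouts /var/log/messages prints one line per
device and command, with the count first.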

smartctl reports each of these drives passes the "SMART overall-health
self-assessment test" but goes on to report exactly 6 "SET MAX ADDRESS
[OBS-6]" errors occur for each drive within 1 hour of uptime.  I do not
think that any of these errors occurred at the same time as the DMA
warnings.

>   After that can one make assumptions whether it is faulty hardware or
>   ata patches :).

Well, the drives are pretty much brand new.  I think that it's safe to
assume that the health of these drives is not a concern, and smartctl
seems to confirm this.

On Mon, 12 Sep 2005 15:53:27 +0200 MaXX <[EMAIL PROTECTED]> wrote:

> On Fri, 26 Aug 2005 03:21:35 -0600 Anthony Chavez <[EMAIL PROTECTED]> wrote:
>> My question is simply this: is the fact that I received 4 TIMEOUT
>> warnings in the space of roughly 2 weeks significant cause for concern?
> Hi,
> You may want to look at PR 85603 (FS corruption and 'uncorrectable' DMA
> errors on ATA disks after unclean shutdown) and see if it applies to you.

Thanks.  My hardware doesn't match, but I'll keep it in mind.

> Are you running a kernel built around mid June this year?

The machine that gave me 81 warnings after applying ata-mk3n:

FreeBSD 5.4-RELEASE-p6 #0: Sun Sep 11 21:57:16 MDT 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/MYBOX1

The machine that's been in commission the longest:

FreeBSD 5.4-RELEASE #0: Sun Sep 11 21:46:18 MDT 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/MYBOX2

New kid on the block:

FreeBSD 5.4-RELEASE-p6 #0: Sun Sep 11 21:58:08 MDT 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/MYBOX3

FWIW, although they have different names, the kernel configs are exactly
the same.

> Did your machine panic before the DMA problems appeared (I think a power
> failure can do the trick too)?

No panic.  However, I recall reading that these warnings are a good
indication that a panic may be imminent, hence my call for help.

> In our case this problem was fixed by newfs, even smartctl
> (sysutils/smartmontools) did report errors at the drive level. After newfs'ing
> the disk there were no more messages (but they are still in the drive's log).

That seems very strange, particularly since I newfs'ed the disks
when installing FreeBSD.

Furthermore, this solution is not sufficient.  The machines that are
giving me this error are in crucial locations and I need to know what
causes these errors and if a fix is available or if I really should
worry about a few popping up now and then.

-- 
Anthony Chavez http://anthonychavez.org/
mailto:[EMAIL PROTECTED] jabber:[EMAIL PROTECTED]



Re: Stress testing and TIMEOUT - WRITE_DMA

2005-09-11 Thread Anthony Chavez
On Sun, 11 Sep 2005 23:02:43 +0200 Matthias Buelow <[EMAIL PROTECTED]> wrote:

> Anthony Chavez wrote:
>
>> Sep  6 11:35:27 mybox kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=8348191
> [...]
>> Has anyone who has experienced this pain found solace in 5-STABLE's ATA
>> drivers?
>
> Is this with the ATA mkIII patches?

As I mentioned in my first post to -questions, the system in question is
currently tracking RELENG_5 and is currently at version 5.4-RELEASE-p6.

I have applied Soeren's mkIII revision n patchset, available at
http://people.freebsd.org/~sos/ATA/, and I'm still seeing the messages,
although *much* less frequently than before applying the patches.

The question I have is: should I revert to an unaffected
5.x-RELEASE (which version would that be?) or should I consider tracking
5-STABLE instead?

> I assume you're acquainted with the ATA DMA timeout discussions of the
> last couple months concerning 5.x.

Yes, I have read through the discussions.  Is it safe yet to assume that
the issues (at least for the ICH controllers) have been fixed in
-CURRENT?

Thanks.

-- 
Anthony Chavez http://anthonychavez.org/
mailto:[EMAIL PROTECTED] jabber:[EMAIL PROTECTED]



Re: Stress testing and TIMEOUT - WRITE_DMA

2005-09-11 Thread Anthony Chavez
I'm not seeing much in the way of responses to this post from
freebsd-questions, so I thought I'd take it to freebsd-stable, where it
is probably more relevant. ;-)

Please see my original thread on freebsd-questions for context.

On Fri, 26 Aug 2005 03:21:35 -0600 Anthony Chavez <[EMAIL PROTECTED]> wrote:

> My question is simply this: is the fact that I received 4 TIMEOUT
> warnings in the space of roughly 2 weeks significant cause for concern?

Apparently, the fact that the stress tool produced so few warnings may
have given me a false sense of security.  I'm being treated to the
following messages (81 in total) today, after 8 days uptime:

Sep  6 11:35:27 mybox kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=8348191
...
Sep  6 18:59:09 mybox kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=8348383
Sep  6 19:04:58 mybox kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=61749183

The READ_DMA timeouts are happening very infrequently, but it's worth
mentioning that I'm seeing them now in addition.

This is quite disturbing, particularly when the machine in question is
*in*production.*

Has anyone who has experienced this pain found solace in 5-STABLE's ATA
drivers?

dmesg below.

-- 
Anthony Chavez http://anthonychavez.org/
mailto:[EMAIL PROTECTED] jabber:[EMAIL PROTECTED]

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE-p6 #0: Fri Aug 26 02:23:19 MDT 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
ACPI APIC Table: 
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Celeron(R) CPU 2.40GHz (2392.25-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff
real memory  = 266813440 (254 MB)
avail memory = 251445248 (239 MB)
ioapic0: Changing APIC ID to 1
ioapic0  irqs 0-23 on motherboard
npx0:  on motherboard
npx0: INT 16 interface
acpi0:  on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0:  on acpi0
acpi_button0:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
agp0:  mem 0xfeb8-0xfebf,0xe800-0xefff irq 16 at device 2.0 on pci0
agp0: detected 892k stolen memory
agp0: aperture size is 128M
uhci0:  port 0xff80-0xff9f irq 16 at device 29.0 on pci0
usb0:  on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1:  port 0xff60-0xff7f irq 19 at device 29.1 on pci0
usb1:  on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2:  port 0xff40-0xff5f irq 18 at device 29.2 on pci0
usb2:  on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0:  at device 29.7 (no driver attached)
pcib1:  at device 30.0 on pci0
pci1:  on pcib1
pci1:  at device 5.0 (no driver attached)
xl0: <3Com 3c900-TPO Etherlink XL> port 0xddc0-0xddff irq 18 at device 6.0 on pci1
xl0: selecting 10baseT transceiver, half duplex
xl0: Ethernet address: 00:60:97:74:a8:6d
bfe0:  mem 0xfe9fe000-0xfe9f irq 17 at device 9.0 on pci1
miibus0:  on bfe0
bmtphy0:  on miibus0
bmtphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
bfe0: Ethernet address: 00:12:3f:d4:21:75
isab0:  at device 31.0 on pci0
isa0:  on isab0
atapci0:  port 0xffa0-0xffaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 irq 18 at device 31.1 on pci0
ata0:  on atapci0
ata1:  on atapci0
pci0:  at device 31.3 (no driver attached)
pci0:  at device 31.5 (no driver attached)
fdc0:  port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
atkbdc0:  port 0x64,0x60 irq 1 on acpi0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
ppc0:  port 0x778-0x77f,0x378-0x37f irq 7 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0:  on ppc0
plip0:  on ppbus0
lpt0:  on ppbus0
lpt0: Interrupt-driven port
ppi0:  on ppbus0
orm0:  at iomem 0xcd000-0xc,0xcb800-0xccfff,0xc-0xcb7ff on isa0
pmtimer0 on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
Timecounter "TSC" frequency 2392248384 Hz quality 800
Timecounters tick every 10.000 msec
ad0: 76293MB  at ata0-master UDMA100
acd0: CDROM  at ata1-master UDMA33
ATA PseudoRAID loaded
Mounting root from ufs:/dev/ad0s1a

