Re: 5.3-RELEASE crashes during make buildworld (and other problems)

2005-01-18 Thread Robert Watson

On Mon, 17 Jan 2005, Vivek Khera wrote:

 On Jan 13, 2005, at 4:46 AM, Peter Jeremy wrote:
 
  That doesn't totally rule out hardware.  Pattern-sensitive memory
  problems may not show up on different operating systems (or even
  different kernels).  That said, based on the trap information, I'd
  look at a software cause first.
 
 Indeed.  I once had a box that would run Linux 100% stable under any
 load for months on end, but with BSD/OS it would crap out (random
 processes fail) after a max of 3 weeks requiring a reboot. 
 
 Never rule out bad hardware, especially with PC crap. 

Even minor OS revisions can reveal or hide memory problems.  For example,
for quite a while one of my Pentium (1!) server boxes had a single-bit
error (a stuck-on bit) that fell into a section of memory that always held
pinned kernel pages, and in particular, ended up holding a fairly obscure
kernel code branch in a module that was loaded.  Then one day the kernel
memory layout changed a bit, and the page ended up being used for
user memory, resulting in frequent application segfaults and data
corruption.  I was sure it was the OS upgrade, since backing out to the
previous kernel/modules fixed it reliably ... until I ran a memory test
and figured out what was actually happening.  It was pretty frustrating to
debug, and it reinforces the conclusion that doing a bit of legwork on a
badly behaving system to rule out hardware faults that are easy to detect
can go a long way.  Which isn't to say that the problem in this thread is
hardware, but you don't want to spend two weeks tracking down a kernel bug
only to find out that swapping the memory for a seemingly identical DIMM
fixes it.

Checking Ethernet cabling and link negotiation, running a decent memory
test, checking SCSI termination, checking the ATA cable type, etc., is a
good start when debugging a problem with similar symptoms.  Oh, and if
it's your parents calling at 6:30am with a printer problem, the first
thing to ask is whether their printer is plugged in.
:-)

Robert N M Watson



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 5.3-RELEASE crashes during make buildworld (and other problems)

2005-01-17 Thread Vivek Khera
On Jan 13, 2005, at 4:46 AM, Peter Jeremy wrote:
That doesn't totally rule out hardware.  Pattern-sensitive memory
problems may not show up on different operating systems (or even
different kernels).  That said, based on the trap information, I'd
look at a software cause first.
Indeed.  I once had a box that would run Linux 100% stable under any 
load for months on end, but with BSD/OS it would crap out (random 
processes fail) after a max of 3 weeks requiring a reboot.

Never rule out bad hardware, especially with PC crap.


Ctrl + Alt + F1 always locks up 5.3-STABLE machine (was Re: 5.3-RELEASE crashes during make buildworld (and other problems))

2005-01-14 Thread Rick Updegrove
Rick Updegrove wrote:
When I finally got 5.3-STABLE built after several mysterious failed
attempts the machine basically runs fine until...
I press Ctrl + Alt + F1 (or any of the F keys), which now
consistently locks up the machine.
If I am quick enough with Ctrl + Alt + Backspace or
Ctrl + Alt + Del (or ssh from another machine) I can at least get it 
to reboot without a fsck on startup.  Unfortunately, if I am too slow it 
hangs indefinitely.  I have had to just power it off several times now 
and of course it then complains about / not being unmounted properly 
etc. etc.

Yes, this probably contributed to some of my earlier problems, but when I
ran make buildworld from single-user mode it still crashed, and despite
all the help I have gotten here I still have not managed to capture a
crash dump.

So, I am running 5.3-STABLE (from yesterday) with all my ports/packages
up to date, and if I forget that I can't switch to a terminal while I
experiment with fluxbox or KDE, I always lock up the machine.

Any suggestions would be greatly appreciated.
Rick


Re: Ctrl + Alt + F1 always locks up 5.3-STABLE machine (was Re: 5.3-RELEASE crashes during make buildworld (and other problems))

2005-01-14 Thread Doug White
On Fri, 14 Jan 2005, Rick Updegrove wrote:

 Rick Updegrove wrote:

 When I finally got 5.3-STABLE built after several mysterious failed
 attempts the machine basically runs fine until...

 I press Ctrl + Alt + F1 (or any of the F keys), which now
 consistently locks up the machine.

 If I am quick enough with Ctrl + Alt + Backspace or
 Ctrl + Alt + Del (or ssh from another machine) I can at least get it
 to reboot without a fsck on startup.  Unfortunately, if I am too slow it
 hangs indefinitely.  I have had to just power it off several times now
 and of course it then complains about / not being unmounted properly
 etc. etc.

 Yes, this probably contributed to some of my earlier problems, but when I
 ran make buildworld from single-user mode it still crashed, and despite
 all the help I have gotten here I still have not managed to capture a
 crash dump.

 So, I am running 5.3-STABLE (from yesterday) with all my ports/packages
 up to date, and if I forget that I can't switch to a terminal while I
 experiment with fluxbox or KDE, I always lock up the machine.

Are you starting xdm, kdm or some other display manager on boot? There's a
race you can lose that will appear to lock up the console if the display
manager and getty start up on ttyv0 simultaneously. I usually insert a
5-10s sleep in the display manager startup script to avoid this.
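A minimal sketch of that delay, assuming a POSIX sh startup script. The delayed_start helper name and the kdm path are hypothetical, not taken from any port; adjust for xdm or whatever your display manager installs:

```sh
# delayed_start SECONDS COMMAND [ARGS...]  (hypothetical helper)
# Starts COMMAND in the background after SECONDS, so the rc script itself
# returns at once and getty gets a head start on ttyv0.
delayed_start() {
    _delay=$1
    shift
    ( sleep "$_delay"; exec "$@" ) &
}

# Hypothetical use in the display manager's startup script
# (the kdm path is an assumption):
#   delayed_start 10 /usr/local/bin/kdm
```

Backgrounding the sleep keeps the rest of the boot from stalling for those 10 seconds.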

-- 
Doug White|  FreeBSD: The Power to Serve
[EMAIL PROTECTED]  |  www.FreeBSD.org


Re: 5.3-RELEASE crashes during make buildworld (and other problems)

2005-01-13 Thread Marton Kenyeres
On Wednesday 12 January 2005 22:36, Rick Updegrove wrote:
 Lowell Gilbert wrote:
[ ... ]

 So, I am still trying to obtain a dump.

 Thanks to your reply, I did re-read #KERNEL-PANIC-TROUBLESHOOTING
 more carefully and I did try the following.

 [EMAIL PROTECTED] nm -n /boot/kernel | grep c061c642
 nm: Warning: '/boot/kernel' is not an ordinary file

/boot/kernel is a directory containing the kernel and loadable modules. 
Try running nm on /boot/kernel/kernel instead.


 Any ideas on that?  The reason I did not try that first was I
 mistakenly thought I had to first capture the crash dump for some
 reason.

You can get much, much more information about what went wrong from a 
crash dump, so try to capture one if you can.
Oh, and build a debug kernel if you haven't already.  A crash dump 
can be pretty useless without one.

[ ... ]



 Rick

cheers,
m.



Re: 5.3-RELEASE crashes during make buildworld (and other problems)

2005-01-13 Thread Peter Jeremy
On Wed, 2005-Jan-12 13:36:04 -0800, Rick Updegrove wrote:
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x4d
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc061c642

That's a NULL pointer dereference: 0x4d is only a small offset from
address 0.  It's not necessarily hardware.
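To make that reasoning concrete: i386 uses 4 KiB pages and the lowest page is normally left unmapped, so a fault address below 4096 usually means a NULL pointer plus a small structure offset. A rough sketch of the check (the is_near_null name is hypothetical, and this is a heuristic, not proof):

```sh
# is_near_null ADDRESS  (hypothetical helper)
# Succeeds if ADDRESS falls inside the first 4 KiB page, where a fault
# usually means "NULL pointer + small member offset".
is_near_null() {
    _addr=$(( $1 ))             # sh arithmetic accepts 0x-prefixed hex
    [ "$_addr" -ge 0 ] && [ "$_addr" -lt 4096 ]
}

if is_near_null 0x4d; then
    echo "fault address 0x4d: likely a NULL pointer dereference"
fi
```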

[EMAIL PROTECTED] nm -n /boot/kernel | grep c061c642
nm: Warning: '/boot/kernel' is not an ordinary file

Two problems:
1) The kernel is /boot/kernel/kernel (sysctl kern.bootfile)
2) You're extremely unlikely to find a symbol at that address.
   What you need to do is
   $ nm -n `sysctl -n kern.bootfile` | less
   and search for the symbol closest to but no greater than 0xc061c642
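Since nm -n output is sorted by address, that search can be automated. A sketch, assuming nm prints fixed-width, zero-padded hex addresses (as it does for a FreeBSD/i386 kernel); the nearest_symbol helper is hypothetical:

```sh
# nearest_symbol ADDRESS  (hypothetical helper)
# Reads `nm -n` output on stdin and prints the last symbol whose address
# is <= ADDRESS, i.e. the function most likely holding that instruction.
# Usage: nm -n "$(sysctl -n kern.bootfile)" | nearest_symbol 0xc061c642
nearest_symbol() {
    awk -v target="$1" '
        BEGIN {
            sub(/^0[xX]/, "", target)   # accept 0xc061c642 or c061c642
            target = tolower(target)
        }
        NF < 3 { next }                 # undefined symbols have no address
        {
            addr = tolower($1)
            t = target
            # Left-pad the target to the width nm uses for addresses.
            while (length(t) < length(addr)) t = "0" t
            # tolower() returns strings, so this is a string comparison;
            # for equal-width hex strings it matches numeric order.
            if (addr <= t) {
                best = $0
            } else {
                exit                    # sorted input: nothing closer follows
            }
        }
        END { if (best != "") print best }
    '
}
```

The offset into that function is the fault address minus the symbol's address.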

This still isn't enough information to reveal anything useful.  At a
minimum, you need to enable DDB (options DDB and options KDB) and
get a backtrace after the panic.  If you don't already have one, a
serial console will make things much easier.  A crashdump or gdb
session would be much better.

 Hardware problems would be my first suspicion here.

Me too... if it were not for the fact 5.3-RELEASE is the only OS that 
has problems on this hardware.

That doesn't totally rule out hardware.  Pattern-sensitive memory
problems may not show up on different operating systems (or even
different kernels).  That said, based on the trap information, I'd
look at a software cause first.

-- 
Peter Jeremy


Re: 5.3-RELEASE crashes during make buildworld (and other problems)

2005-01-12 Thread Rick Updegrove
Mark Kirkwood wrote:
 I am wondering if cpu overheating could be a factor. In 4.x you are
 building with gcc 2.95, whereas 5.3 uses 3.4 - the 3.x compiler
 takes longer and works harder, which may be generating more heat (i.e.
 too much heat).
 You can test this by installing the cpuburn port and running it for
 10-20 minutes.
Thank you very much for the reply Mark.  I installed cpuburn but ran out 
of time that night to test it.
After some reading, I limited the RAM by adding the following to
/boot/loader.conf

hw.physmem=512M    # Limit physical memory. See loader(8)
Then I rebooted.  I did this because I do not want to wait for 1536M to
be written to disk after the inevitable crash.  Then I ran cpuburn
(actually burnK7) and top and monitored them.
*start paste from top process
last pid: 56346;  load averages:  1.00,  1.00,  1.13
 up 0+01:28:13  19:59:17
31 processes:  2 running, 29 sleeping
CPU states: 99.2% user,  0.0% nice,  0.4% system,  0.4% interrupt,  0.0% idle
Mem: 48M Active, 160M Inact, 69M Wired, 1848K Cache, 60M Buf, 214M Free
Swap: 3047M Total, 3047M Free
 PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
 607 root     131    0   136K    36K RUN     51:14 98.97% 98.97% burnK7
*end paste from top process
As you can see it went on for 51 minutes and my machine did not lock up
and the heat alarm did not go off.  Please note that I ran KDE and
portupgrade during this test with no problems at all.  Also note there is a
160 GIG drive in here that was undergoing a fsck -B which really slows 
down the system a lot.

Thanks again for the reply, but I do not suspect hardware per se.  The
reason I believe this is, as I mentioned, this machine runs FreeBSD
4.11-STABLE and/or Win2K (all service packs) just fine.  On windoze I
just beat half-life 2 without ever crashing it.
On FreeBSD 4.11 I run make buildworld every few days without ever
crashing it.
The only thing I can think to do differently is comment out
# hw.physmem=512M    # Limit physical memory. See loader(8)
and try cpuburn again?  Any other ideas?
Rick
P.S.
After writing all this, I did manage to finally build 5.3-STABLE after a
few more tries at make buildworld, from wherever it failed.  Aside from
one lockup in KDE (no crashdump yet sorry) last night it has been ok
today.  I will however continue to test make buildworld and try to get a 
crashdump to post to the list because I would really like to know
what is really causing this instability.  Thanks again!




Re: 5.3-RELEASE crashes during make buildworld (and other problems)

2005-01-12 Thread Rick Updegrove
Lowell Gilbert wrote:
 That should be dumpdir, not DUMPDIR.
 The default would be /var/crash instead of /usr/crash.
 Also, /dev/ad0s1b has to be bigger than your RAM size.
Thank you very much for the reply Lowell,
That DUMPDIR was silly of me, thank you for pointing it out.
I have 1.5 GB of RAM and /var is only 248 MB, which is self-explanatory.
 You can try to analyze the panic messages themselves.
 There is some guidance for this in the FAQ.
The guidance I read in the Developer's Handbook
suggests I obtain a crash dump and post it to the list, because the info
found in a panic message (example of one of mine below) is not enough.

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x4d
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc061c642
stack pointer           = 0x10:0xf00e1cc4
frame pointer           = 0x10:0xf00e1cd0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process = 1009 (kdeinit)
trap number = 12
panic: page fault
Uptime: 21m8s
So, I am still trying to obtain a dump.
Thanks to your reply, I did re-read #KERNEL-PANIC-TROUBLESHOOTING more 
carefully and I did try the following.

[EMAIL PROTECTED] nm -n /boot/kernel | grep c061c642
nm: Warning: '/boot/kernel' is not an ordinary file
Any ideas on that?  The reason I did not try that first was I mistakenly 
thought I had to first capture the crash dump for some reason.

 Hardware problems would be my first suspicion here.
Me too... if it were not for the fact 5.3-RELEASE is the only OS that 
has problems on this hardware.

 If you try it again, does it fail in the same place?
No it does not fail in the same place every time but I still do not 
suspect hardware per se.

For more details on why I believe that statement, please see:
http://lists.freebsd.org/pipermail/freebsd-stable/2005-January/011034.html
Thanks again for the reply it was helpful.
Rick


Re: 5.3-RELEASE crashes during make buildworld (and other problems)

2005-01-12 Thread Mark Kirkwood
Rick Updegrove wrote:
 PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
 607 root     131    0   136K    36K RUN     51:14 98.97% 98.97% burnK7
*end paste from top process
As you can see it went on for 51 minutes and my machine did not lock up
and the heat alarm did not go off.  Please note that I ran KDE and
portupgrade during this test with no problems at all.  Also note there is a
160 GIG drive in here that was undergoing a fsck -B which really slows 
down the system a lot.

Thanks again for the reply, but I do not suspect hardware per se.  The
reason I believe this is as I mentioned  this machine runs FreeBSD
4.11-STABLE and/or Win2K (all service packs) just fine.  On windoze I
just beat half-life 2 without ever crashing it.

98.97% for 51 minutes *should* have triggered any heat problems, so it 
looks like the HW is not the problem (worth ruling out anyway).

best wishes
Mark


Re: 5.3-RELEASE crashes during make buildworld (and other problems)

2005-01-11 Thread Lowell Gilbert
Rick Updegrove [EMAIL PROTECTED] writes:

 This machine runs 4.11-STABLE just fine.  I can make buildworld all day.
   Before that it ran Win2k for many months with no problems.  For these
 reasons, I do not suspect hardware at this point.
 
 When I install 5.3-RELEASE it runs fine until...
 
 When I attempt to cvsup to STABLE and run make buildworld (yes with and
 without the -j) it crashes.
 
 I am very bad at kernel debugging because FreeBSD 4 has (almost)
 always been perfectly stable so I have read and re-read the handbook
 and I am trying to get more information to the list.
 
 So far in rc.conf I added:
 
 dumpdev=/dev/ad0s1b
 DUMPDIR=/usr/crash

That should be dumpdir, not DUMPDIR.  
The default would be /var/crash instead of /usr/crash.
Also, /dev/ad0s1b has to be bigger than your RAM size.
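With those corrections applied, the rc.conf lines would look roughly like this. It is a sketch using the device from this thread; keeping /usr/crash instead of the /var/crash default assumes /var is too small to hold a dump (Rick reports a 248 MB /var with 1.5 GB of RAM), and the directory must already exist:

```sh
# /etc/rc.conf fragment (sketch)
dumpdev="/dev/ad0s1b"   # swap partition to dump to; must be bigger than RAM
dumpdir="/usr/crash"    # lowercase; savecore(8) writes the dump here on reboot
```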

 Then I
 chmod 700 /usr/crash
 
 Then in /boot/loader.conf I added
 
 verbose_loading=YES
 boot_verbose=YES
 
 Does this look reasonable?

Pretty much.

 What else should I do?

You can try to analyze the panic messages themselves.
There is some guidance for this in the FAQ.

 Meanwhile, I started the make buildworld again (right where it left off)
 and I am waiting for it to crash.
 
 You can find the dmesg and anything else I find at
 http://rick.updegrove.net/FreeBSD/jan-10-2005/
 
 *UPDATE*
 While I was writing this the make buildworld failed and left me some
 details, which I put into a file named gcc-error-1 at
 http://rick.updegrove.net/FreeBSD/jan-10-2005/gcc-error-1

Hardware problems would be my first suspicion here.
If you try it again, does it fail in the same place?

 Then I rebooted and went into X and soon I got
 http://rick.updegrove.net/FreeBSD/jan-10-2005/panic_kdeinit.txt
 
 There is nothing in /usr/crash/
 
 I can't find any files named vmcore anywhere.
 
 I noticed that on the new 5.3-RELEASE SYSTEM if I do not have a
 half-failed make buildworld I can install packages with pkg_add -r
 whatever all day long and the machine (and KDE) runs fine.

There shouldn't be any relationship between the two...


Re: 5.3-RELEASE crashes during make buildworld (and other problems)

2005-01-10 Thread Mark Kirkwood
I am wondering if CPU overheating could be a factor. In 4.x you are 
building with gcc 2.95, whereas 5.3 uses 3.4; the 3.x compiler takes 
longer and works harder, which may be generating more heat (i.e. too much 
heat).

You can test this by installing the cpuburn port and running it for 
10-20 minutes.
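One way to bound such a run, as a sketch: the run_for helper is hypothetical, and burnK7 is the Athlon variant from the cpuburn port that Rick later used. Watch top(1) and the temperature from another terminal meanwhile:

```sh
# run_for SECONDS COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND in the background, then sends it SIGTERM after SECONDS.
run_for() {
    _secs=$1
    shift
    "$@" &
    _pid=$!
    sleep "$_secs"
    kill "$_pid" 2>/dev/null || true    # harmless if it already exited
    wait "$_pid" 2>/dev/null || true
    return 0
}

# Hypothetical 20-minute run:
#   run_for 1200 burnK7
```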

regards
Mark
Rick Updegrove wrote:
Hi all,
This machine runs 4.11-STABLE just fine.  I can make buildworld all day.
 Before that it ran Win2k for many months with no problems.  For these
reasons, I do not suspect hardware at this point.
When I install 5.3-RELEASE it runs fine until...
When I attempt to cvsup to STABLE and run make buildworld (yes with and
without the -j) it crashes.
