Re: 5.3-RELEASE crashes during make buildworld (and other problems)
On Mon, 17 Jan 2005, Vivek Khera wrote:

On Jan 13, 2005, at 4:46 AM, Peter Jeremy wrote:

That doesn't totally rule out hardware. Pattern-sensitive memory problems may not show up on different operating systems (or even different kernels). That said, based on the trap information, I'd look at a software cause first.

Indeed. I once had a box that would run Linux 100% stable under any load for months on end, but with BSD/OS it would crap out (random processes fail) after a max of 3 weeks requiring a reboot. Never rule out bad hardware, especially with PC crap.

Even minor OS revisions can reveal or hide memory problems. For example, for quite a while one of my Pentium (1!) server boxes had a single bit error (a stuck-on bit) that fell into a section of memory that always held pinned kernel pages, and in particular, ended up holding a fairly obscure kernel code branch in a module that was loaded. Then one day kernel memory layout got changed a bit, and the page ended up being paged into user memory, resulting in frequent application segfaults and data corruption. I was sure it was the OS upgrade, since backing out to the previous kernel/modules fixed it reliably ... until I ran a memory test and figured out what was actually happening. It was pretty frustrating to try to debug, and it reinforces the conclusion that a bit of legwork on a badly behaving system, to rule out hardware faults that are easy to check, can go a long way.

Which isn't to say that the problem in this thread is hardware, but you don't want to spend two weeks tracking a kernel bug to find out that swapping out the memory with a seemingly identical DIMM fixes it. Checking ethernet cabling and link negotiation, a decent memory test run, checking SCSI termination, checking ATA cable type, etc., as first steps to debugging a problem that would have similar symptoms is a good start.

Oh, and if it's your parents calling on the phone at 6:30am with a printer problem, the first thing to ask is whether their printer is plugged in. :-) Robert N M Watson ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 5.3-RELEASE crashes during make buildworld (and other problems)
On Jan 13, 2005, at 4:46 AM, Peter Jeremy wrote: That doesn't totally rule out hardware. Pattern-sensitive memory problems may not show up on different operating systems (or even different kernels). That said, based on the trap information, I'd look at a software cause first. Indeed. I once had a box that would run Linux 100% stable under any load for months on end, but with BSD/OS it would crap out (random processes fail) after a max of 3 weeks requiring a reboot. Never rule out bad hardware, especially with PC crap.
Ctrl + Alt + F1 always locks up 5.3-STABLE machine ( Was Re:5.3-RELEASE crashes during make buildworld (and other problems))
Rick Updegrove wrote:

When I finally got 5.3-STABLE built after several mysterious failed attempts the machine basically runs fine until... I try Ctrl + Alt + F1 (or any of the F keys), which now consistently locks up the machine.

If I am quick enough with Ctrl + Alt + Backspace or Ctrl + Alt + Del (or ssh from another machine) I can at least get it to reboot without a fsck on startup. Unfortunately, if I am too slow it hangs indefinitely. I have had to just power it off several times now and of course it then complains about / not being unmounted properly etc. etc.

Yes, this probably contributed to some of my problems earlier, but when I was running make buildworld from single user mode it still crashed, and despite all the help I have gotten here I still have not managed to capture a crash dump.

So, I am running 5.3-STABLE (from yesterday) and all my ports/packages are up to date, and if I forget that I can't switch to a terminal while I experiment with fluxbox or KDE, I will always lock up the machine.

Any suggestions would be greatly appreciated.

Rick
Re: Ctrl + Alt + F1 always locks up 5.3-STABLE machine ( Was Re:5.3-RELEASE crashes during make buildworld (and other problems))
On Fri, 14 Jan 2005, Rick Updegrove wrote:

Rick Updegrove wrote:

When I finally got 5.3-STABLE built after several mysterious failed attempts the machine basically runs fine until... I try Ctrl + Alt + F1 (or any of the F keys), which now consistently locks up the machine.

If I am quick enough with Ctrl + Alt + Backspace or Ctrl + Alt + Del (or ssh from another machine) I can at least get it to reboot without a fsck on startup. Unfortunately, if I am too slow it hangs indefinitely. I have had to just power it off several times now and of course it then complains about / not being unmounted properly etc. etc.

Yes, this probably contributed to some of my problems earlier, but when I was running make buildworld from single user mode it still crashed, and despite all the help I have gotten here I still have not managed to capture a crash dump.

So, I am running 5.3-STABLE (from yesterday) and all my ports/packages are up to date, and if I forget that I can't switch to a terminal while I experiment with fluxbox or KDE, I will always lock up the machine.

Are you starting xdm, kdm or some other display manager on boot? There's a race you can lose that will appear to lock up the console if the display manager and getty start up on ttyv0 simultaneously. I usually insert a 5-10s sleep in the display manager startup script to avoid this.

--
Doug White | FreeBSD: The Power to Serve
[EMAIL PROTECTED] | www.FreeBSD.org
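Doug's workaround can be sketched as a small wrapper. This is only an illustration: the function name delayed_start, the delay value, and the example path to the display manager binary are assumptions and do not come from the thread. Where such a wrapper gets hooked in depends on how the display manager is started on the machine in question.

```shell
#!/bin/sh
# Hypothetical helper: wait a few seconds so getty can settle on ttyv0,
# then run the display manager command passed as the remaining arguments.
delayed_start() {
    delay="$1"      # seconds to wait, e.g. 5-10 as suggested above
    shift
    sleep "$delay"
    "$@"            # run the real command with its original arguments
}

# Example (the path is an assumption; adjust to the local installation):
# delayed_start 10 /usr/X11R6/bin/xdm -nodaemon
```

The delay only makes losing the race less likely; it does not remove the race itself.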
Re: 5.3-RELEASE crashes during make buildworld (and other problems)
On Wednesday 12 January 2005 22:36, Rick Updegrove wrote:

Lowell Gilbert wrote:

[ ... ]

So, I am still trying to obtain a dump. Thanks to your reply, I did re-read #KERNEL-PANIC-TROUBLESHOOTING more carefully and I did try the following.

[EMAIL PROTECTED] nm -n /boot/kernel | grep c061c642
nm: Warning: '/boot/kernel' is not an ordinary file

/boot/kernel is a directory containing the kernel and loadable modules. Try to run nm on /boot/kernel/kernel.

Any ideas on that? The reason I did not try that first was I mistakenly thought I had to first capture the crash dump for some reason.

You can get much, much more information about what went wrong from a crash dump, so try to capture one if you can. Oh, and build a debug kernel, if you didn't do it before. A crash dump can be pretty useless without one.

[ ... ]

Rick

cheers, m.
Re: 5.3-RELEASE crashes during make buildworld (and other problems)
On Wed, 2005-Jan-12 13:36:04 -0800, Rick Updegrove wrote:

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x4d
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc061c642

That's a NULL pointer dereference. It's not necessarily hardware.

[EMAIL PROTECTED] nm -n /boot/kernel | grep c061c642
nm: Warning: '/boot/kernel' is not an ordinary file

Two problems:
1) The kernel is /boot/kernel/kernel (sysctl kern.bootfile)
2) You're extremely unlikely to find a symbol at that address.

What you need to do is

$ nm -n `sysctl -n kern.bootfile` | less

and search for the symbol closest to but no greater than 0xc061c642.

This still isn't enough information to reveal anything useful. As a minimum, you need to enable DDB (options DDB and options KDB) and get a backtrace after the panic. If you don't already have one, a serial console will make things much easier. A crashdump or gdb session would be much better.

Hardware problems would be my first suspicion here.

Me too... if it were not for the fact 5.3-RELEASE is the only OS that has problems on this hardware.

That doesn't totally rule out hardware. Pattern-sensitive memory problems may not show up on different operating systems (or even different kernels). That said, based on the trap information, I'd look at a software cause first.

--
Peter Jeremy
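The manual search Peter describes (the symbol closest to but no greater than the faulting address) can also be scripted. The following is a minimal sketch, not something posted in the thread: the function name nearest_symbol is made up for illustration, and it assumes nm prints addresses as lower-case hex of equal width, so that a plain string comparison orders them correctly.

```shell
#!/bin/sh
# Hypothetical helper: read `nm -n` output (sorted by address) on stdin and
# print the last symbol line whose address is <= the given address.
# Assumption: addresses are lower-case hex of equal width, so comparing
# them as strings gives the same order as comparing them as numbers.
nearest_symbol() {
    awk -v addr="$1" '
        # Appending "" forces a string comparison even for all-digit values.
        NF >= 3 && ($1 "") <= (addr "") { best = $0 }
        END { if (best != "") print best }
    '
}

# Usage, with the instruction pointer from the panic message:
# nm -n "$(sysctl -n kern.bootfile)" | nearest_symbol c061c642
```

Lines for undefined symbols have no address column and are skipped by the field-count check. The result only names the function containing the faulting instruction; a backtrace from DDB or a crash dump is still needed to see how the kernel got there.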
Re: 5.3-RELEASE crashes during make buildworld (and other problems)
Mark Kirkwood wrote:

I am wondering if cpu overheating could be a factor. In 4.x you are building with gcc 2.95, whereas 5.3 uses 3.4 - the 3.x compiler takes longer and works harder, which may be generating more heat (i.e. too much heat). You can test this by installing the cpuburn port and running it for 10-20 minutes.

Thank you very much for the reply Mark.

I installed cpuburn but ran out of time that night to test it. After some reading, I limited the RAM by adding the following to rc.conf

hw.physmem=512M # Limit physical memory. See loader(8)

Then I rebooted. I did this because I do not want to wait for 1536M to be written to disk after the inevitable crash. Then I ran cpuburn (actually burnK7) and top and monitored them.

*start paste from top process
last pid: 56346; load averages: 1.00, 1.00, 1.13 up 0+01:28:13 19:59:17
31 processes: 2 running, 29 sleeping
CPU states: 99.2% user, 0.0% nice, 0.4% system, 0.4% interrupt, 0.0% idle
Mem: 48M Active, 160M Inact, 69M Wired, 1848K Cache, 60M Buf, 214M Free
Swap: 3047M Total, 3047M Free

PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
607 root 131 0 136K 36K RUN 51:14 98.97% 98.97% burnK7
*end paste from top process

As you can see it went on for 51 minutes and my machine did not lock up and the heat alarm did not go off. Please note that I ran KDE and portupgrade during this test with no problems at all. Also note there is a 160 GIG drive in here that was undergoing a fsck -B which really slows down the system a lot.

Thanks again for the reply, but I do not suspect hardware per se. The reason I believe this is, as I mentioned, that this machine runs FreeBSD 4.11-STABLE and/or Win2K (all service packs) just fine. On windoze I just beat half-life 2 without ever crashing it. On FreeBSD 4.11 I run make buildworld every few days without ever crashing it.

The only thing I can think to do differently is comment out

# hw.physmem=512M # Limit physical memory. See loader(8)

and try cpuburn again? Any other ideas?

Rick

P.S. After writing all this, I did manage to finally build 5.3-STABLE after a few more tries at make buildworld, from wherever it failed. Aside from one lockup in KDE (no crashdump yet, sorry) last night it has been ok today. I will however continue to test make buildworld and try to get a crashdump to post to the list because I would really like to know what is really causing this instability.

Thanks again!
Re: 5.3-RELEASE crashes during make buildworld (and other problems)
Lowell Gilbert wrote:

That should be dumpdir, not DUMPDIR. The default would be /var/crash instead of /usr/crash. Also, /dev/ad0s1b has to be bigger than your RAM size.

Thank you very much for the reply Lowell,

That DUMPDIR was silly of me, thank you for pointing it out. I have 1.5 GIGS of RAM and /var is 248 MEGS which is self-explanatory.

You can try to analyze the panic messages themselves. There is some guidance for this in the FAQ.

The guidance I read in the developer's handbook suggests I obtain a crashdump and post to the list because the info found in a panic (example of one of mine below) is not enough.

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x4d
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc061c642
stack pointer = 0x10:0xf00e1cc4
frame pointer = 0x10:0xf00e1cd0
code segment = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL - 0
current process = 1009 (kdeinit)
trap number = 12
panic: page fault
Uptime: 21m8s

So, I am still trying to obtain a dump. Thanks to your reply, I did re-read #KERNEL-PANIC-TROUBLESHOOTING more carefully and I did try the following.

[EMAIL PROTECTED] nm -n /boot/kernel | grep c061c642
nm: Warning: '/boot/kernel' is not an ordinary file

Any ideas on that? The reason I did not try that first was I mistakenly thought I had to first capture the crash dump for some reason.

Hardware problems would be my first suspicion here.

Me too... if it were not for the fact 5.3-RELEASE is the only OS that has problems on this hardware.

If you try it again, does it fail in the same place?

No, it does not fail in the same place every time, but I still do not suspect hardware per se. For more details on why I believe that statement, please see: http://lists.freebsd.org/pipermail/freebsd-stable/2005-January/011034.html

Thanks again for the reply, it was helpful.

Rick
Re: 5.3-RELEASE crashes during make buildworld (and other problems)
Rick Updegrove wrote:

PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
607 root 131 0 136K 36K RUN 51:14 98.97% 98.97% burnK7
*end paste from top process

As you can see it went on for 51 minutes and my machine did not lock up and the heat alarm did not go off. Please note that I ran KDE and portupgrade during this test with no problems at all. Also note there is a 160 GIG drive in here that was undergoing a fsck -B which really slows down the system a lot.

Thanks again for the reply, but I do not suspect hardware per se. The reason I believe this is, as I mentioned, that this machine runs FreeBSD 4.11-STABLE and/or Win2K (all service packs) just fine. On windoze I just beat half-life 2 without ever crashing it.

98.97% for 51 minutes *should* have triggered any heat problems, so it looks like the HW is not the problem (worth ruling out anyway).

best wishes

Mark
Re: 5.3-RELEASE crashes during make buildworld (and other problems)
Rick Updegrove [EMAIL PROTECTED] writes:

This machine runs 4.11-STABLE just fine. I can make buildworld all day. Before that it ran Win2k for many months with no problems. For these reasons, I do not suspect hardware at this point.

When I install 5.3-RELEASE it runs fine until... When I attempt to cvsup to STABLE and run make buildworld (yes, with and without the -j) it crashes.

I am very bad at kernel debugging because FreeBSD 4 has (almost) always been perfectly stable, so I have read and re-read the handbook and I am trying to get more information to the list.

So far in rc.conf I added:

dumpdev=/dev/ad0s1b
DUMPDIR=/usr/crash

That should be dumpdir, not DUMPDIR. The default would be /var/crash instead of /usr/crash. Also, /dev/ad0s1b has to be bigger than your RAM size.

Then I chmod 700 /usr/crash

Then in /boot/loader.conf I added

verbose_loading=YES
boot_verbose=YES

Does this look reasonable?

Pretty much.

What else should I do?

You can try to analyze the panic messages themselves. There is some guidance for this in the FAQ.

Meanwhile, I started the make buildworld again (right where it left off) and I am waiting for it to crash. You can find the dmesg and anything else I find at http://rick.updegrove.net/FreeBSD/jan-10-2005/

*UPDATE* While I was writing this the make buildworld failed and left me some details which I put into a file named gcc-error-1 at http://rick.updegrove.net/FreeBSD/jan-10-2005/gcc-error-1

Hardware problems would be my first suspicion here. If you try it again, does it fail in the same place?

Then I rebooted and went into X and soon I got http://rick.updegrove.net/FreeBSD/jan-10-2005/panic_kdeinit.txt

There is nothing in /usr/crash/. I can't find any files named vmcore anywhere.

I noticed that on the new 5.3-RELEASE system, if I do not have a half-failed make buildworld, I can install packages with pkg_add -r whatever all day long and the machine (and KDE) runs fine.

There shouldn't be any relationship between the two...
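Putting Lowell's corrections together, the rc.conf entries might look like the sketch below. The device and directory are the ones from Rick's message; the quoting is ordinary rc.conf style and is an addition here, and the dump device still has to be larger than RAM for a dump to be saved.

```shell
# /etc/rc.conf fragment (sketch; values taken from the message above)
dumpdev="/dev/ad0s1b"   # swap partition the kernel dumps to on panic
dumpdir="/usr/crash"    # where the vmcore files are saved; default is /var/crash
```

rc.conf variable names are case-sensitive, which is why the upper-case DUMPDIR had no effect.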
Re: 5.3-RELEASE crashes during make buildworld (and other problems)
I am wondering if cpu overheating could be a factor. In 4.x you are building with gcc 2.95, whereas 5.3 uses 3.4 - the 3.x compiler takes longer and works harder, which may be generating more heat (i.e. too much heat). You can test this by installing the cpuburn port and running it for 10-20 minutes.

regards

Mark

Rick Updegrove wrote:

Hi all,

This machine runs 4.11-STABLE just fine. I can make buildworld all day. Before that it ran Win2k for many months with no problems. For these reasons, I do not suspect hardware at this point.

When I install 5.3-RELEASE it runs fine until... When I attempt to cvsup to STABLE and run make buildworld (yes, with and without the -j) it crashes.