Re: More panics (different hardware)
MFS will not cause you problems. It's safe to leave it in.

I think it might be a little premature to reach that conclusion right now; I've had panics with MFS in the past, and I also took note when Andrew said that his use of fdesc post-dated the crashes. But for that, it would be my prime suspect as well (unless Andrew simply got his timeline wrong :-).

- Jordan

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-stable" in the body of the message
Re: More panics (different hardware)
Jordan, Cy and co.,

I'll admit my concept of time was taken from me during lectures on special relativity, but for all observers on the list, I added fdesc after the crashes began. I didn't see any problem with trying it.

I'll add the mfs mounts back later and poke the system by running "periodic daily", since I think that the evidence is strong enough that there is something in there which is tickling the cause of the crashes. More details later.

Thanks for your continued interest and assistance.

BTW, I'm intentionally not updating my world, in case a change masks (as opposed to fixes) the problem.

-Andrew-
--
-Andrew J. Caines- Unix Systems Engineer [EMAIL PROTECTED]
Re: More panics (different hardware)
Jordan and list,

Right on time tonight - 02:06 while running "periodic daily". The panic was exactly the same as before, so I won't repeat it. The command being run at the time was "tee" again.

Just before the panic I had shut down all X and was running a "ps -axww ; top | head -24" snapshot every ten seconds. Here it is:

-8
  PID  TT   STAT     TIME COMMAND
    0  ??   DLs   0:00.26 (swapper)
    1  ??   ILs   0:00.27 /sbin/init --
    2  ??   DL    0:15.34 (pagedaemon)
    3  ??   DL    0:03.26 (vmdaemon)
    4  ??   DL    0:04.60 (bufdaemon)
    5  ??   DL    0:31.55 (syncer)
   34  ??   ILs   0:00.38 mfs -o noatime -s 16384 /dev/ad0s1b /tmp (mount_mfs)
   36  ??   ILs   0:00.06 mfs -o noatime -s 2048 /dev/ad0s1b /var/run (mount_mfs)
  114  ??   Ss    0:00.64 /sbin/dhclient dc0
  141  ??   Ss    0:00.91 syslogd -s -vv -a localhost:*
  148  ??   Ss    0:06.90 ntpd -p /var/run/ntpd.pid
  169  ??   Ss    0:00.17 inetd -wW
  171  ??   Is    0:00.58 cron
  198  ??   Ss    0:00.57 /usr/sbin/sshd
  245  ??   Ss    0:01.87 /usr/local/libexec/postfix/master
  252  ??   S     0:00.91 qmgr -l -t fifo -u
  254  ??   Ss    0:55.62 moused -p /dev/psm0 -t auto
  301  ??   Ss    0:00.37 thttpd -C /usr/local/etc/thttpd.conf
33319  ??   S     0:00.05 pickup -l -t fifo
40862  ??   ZN    0:00.00 (junkbuster)
40863  ??   ZN    0:00.00 (junkbuster)
59967  ??   I     0:00.00 cron
59968  ??   Is    0:00.01 /bin/sh -c periodic daily
59969  ??   I     0:00.02 /bin/sh - /usr/sbin/periodic daily
59982  ??   I     0:00.08 /bin/sh - /usr/sbin/periodic daily
59983  ??   I     0:00.00 /bin/sh - /usr/sbin/periodic daily
59985  ??   I     0:00.01 mail -s hal9000.bsdonline.org daily run output root
60234  ??   I     0:00.01 /bin/sh /etc/periodic/daily/450.status-security
60239  ??   I     0:00.01 sh /etc/security
60240  ??   I     0:00.01 sendmail root
60241  ??   I     0:00.01 /usr/local/sbin/postdrop
60250  ??   S     0:00.01 sh /etc/security
60251  ??   S     0:00.01 xargs -0 -n 20 ls -liTd
60252  ??   S     0:00.01 sort +10
60285  ??   S     0:00.02 cleanup -t unix -u
60286  ??   S     0:00.01 trivial-rewrite -n rewrite -t unix -u
60287  ??   S     0:00.02 local -t unix
60288  ??   Ss    0:00.02 comsat
60313  ??   D     0:00.55 find /usr/local -xdev -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -print0
  317  v0   Ss+   0:00.13 -bash (bash)
60294  v0   S     0:00.01 -bash (bash)
60318  v0   S     0:00.00 -bash (bash)
60319  v0   R     0:00.00 ps -axww
60274  v1   Is+   0:00.01 /usr/libexec/getty Pc ttyv1
  319  v2   IWs+  0:00.00 /usr/libexec/getty Pc ttyv2
  320  v3   IWs+  0:00.00 /usr/libexec/getty Pc ttyv3
  321  v4   IWs+  0:00.00 /usr/libexec/getty Pc ttyv4
  322  v5   IWs+  0:00.00 /usr/libexec/getty Pc ttyv5
  323  v6   IWs+  0:00.00 /usr/libexec/getty Pc ttyv6
  324  v7   IWs+  0:00.00 /usr/libexec/getty Pc ttyv7
  278  con- TWN   0:00.00 dnetc -ini /home/dnet/dnetc.ini (dnetc-2.8010.463)
  290  con- IWN+  0:00.00 junkbuster /usr/local/etc/junkbuster/junkbuster.conf

last pid: 60321; load averages: 0.07, 0.09, 0.16 up 0+23:54:55 02:03:49
47 processes: 1 running, 43 sleeping, 1 stopped, 2 zombie
Mem: 28M Active, 31M Inact, 18M Wired, 3752K Cache, 19M Buf, 12M Free
Swap: 256M Total, 5096K Used, 251M Free, 1% Inuse

  PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
60313 root      -6    0   980K   544K biord    0:01  4.85%  1.76% find
  278 dnet      68   20   740K     0K STOP   968:25  0.00%  0.00% dnetc-2.8010.
  254 root       2    0   908K    84K select   0:56  0.00%  0.00% moused
  148 root       2  -12  1284K   328K select   0:07  0.00%  0.00% ntpd
  245 root       2    0   996K   236K select   0:02  0.00%  0.00% master
  290 proxy      2    5  1736K     0K accept   0:01  0.00%  0.00% junkbuster
  141 root       2    0   944K   320K select   0:01  0.00%  0.00% syslogd
  252 postfix    2    0  1072K   524K select   0:01  0.00%  0.00% qmgr
  114 root       2    0   536K   120K select   0:01  0.00%  0.00% dhclient
  171 root      10    0   984K   240K nanslp   0:01  0.00%  0.00% cron
  198 root       2    0  2144K    88K select   0:01  0.00%  0.00% sshd
   34 root      10    0  8712K    40K mfsidl   0:00  0.00%  0.00% mount_mfs
  301 www        2    0  1256K   544K poll     0:00  0.00%  0.00% thttpd
  169 root       2    0  1060K   140K select   0:00  0.00%  0.00% inetd
60320 root      30    0  1460K  1044K RUN      0:00  0.00%  0.00% top
  317 root       3    0  1052K   616K ttyin    0:00  0.00%  0.00% bash
59982 root      10    0   668K   264K wait     0:00  0.00%  0.00% sh
   36 root      10    0  1532K    68K mfsidl   0:00  0.00%  0.00% mount_mfs
-8

As you can see I still had the mfs and fdesc mounts active. Now, after the reboot, I'm all disk. We'll see what happens after 02:00 tomorrow.

Note that this is
Re: More panics (different hardware)
Jordan and list,

If you could get a kernel crash dump, especially with a kernel with debugging symbols, that would help enormously! Thanks.

For better or worse, my box just obliged with a crash only 3h41m28s after booting my "DEBUG" kernel.

I have found at least one interesting factor in the crashes. Searching my logs for timestamps associated with the crashes, I see...

hal9000:/root# awk '/The FreeBSD Project/{print $1" "$2"\t"$3}' /var/log/messages{.1,.0,}
Sep 9   23:34:24
Sep 10  16:33:17
Sep 10  16:51:09
Sep 11  02:47:41
Sep 20  20:12:51
Sep 20  20:17:06
Sep 21  02:02:07
Sep 22  02:02:16
Sep 22  19:51:15
Sep 23  02:11:15
Sep 24  02:11:53
Sep 24  02:19:24
Sep 25  02:10:45
Sep 26  02:10:56
Sep 26  18:52:44
Sep 27  02:11:00
Sep 27  23:45:23
Sep 28  02:10:37
Sep 29  02:10:33
Sep 30  02:10:32
Sep 30  22:26:08
Oct 1   02:10:50

You'll notice the remarkable number of crashes at or around 02:10. The only thing which runs regularly around then is "periodic daily", which starts at 01:59. I was sitting here while the disks rumbled away and after a while the system dived.

While I would usually think this is a hardware issue (heating from the overactive disks upsetting the memory or whatever), this system builds world at least weekly and has never crashed during that time. The build uses all three disks and, of course, hits them pretty hard. Sometimes I build a few ports at the same time and there has never been a complaint.

Here's what I got from the core.

Script started on Sun Oct 1 02:16:48 2000
hal9000:/root# cd /usr/obj/home/src/sys/DEBUG
hal9000:DEBUG# gdb -k kernel.debug /var/crash/vmcore.0
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 3149824
initial pcb at 28b860
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x6c
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc0175772
stack pointer = 0x10:0xc7676db4
frame pointer = 0x10:0xc7676dd4
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 973 (tee)
interrupt mask = none
trap number = 12
panic: page fault

syncing disks... 182 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 giving up on 5 buffers
Uptime: 3h41m28s

dumping to dev #ad/0x20001, offset 327680
dump ata0: resetting devices .. done
96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
---
#0 boot (howto=256) at /home/src/sys/kern/kern_shutdown.c:302
302 dumppcb.pcb_cr3 = rcr3();
(kgdb) symbol-file kernel.debug
Load new symbol table from "kernel.debug"? (y or n) y
Reading symbols from kernel.debug...done.
(kgdb) exec-file /var/crash/kernel.0
(kgdb) core-file /var/crash/vmcore.0
IdlePTD 3149824
initial pcb at 28b860
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x6c
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc0175772
stack pointer = 0x10:0xc7676db4
frame pointer = 0x10:0xc7676dd4
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 973 (tee)
interrupt mask = none
trap number = 12
panic: page fault

syncing disks... 182 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 giving up on 5 buffers
Uptime: 3h41m28s

dumping to dev #ad/0x20001, offset 327680
dump ata0: resetting devices .. done
96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
---
#0 boot (howto=256) at /home/src/sys/kern/kern_shutdown.c:302
302 dumppcb.pcb_cr3 = rcr3();
(kgdb) bt
#0 boot (howto=256) at /home/src/sys/kern/kern_shutdown.c:302
#1 0xc01419ec in poweroff_wait (junk=0xc02410cf, howto=-949627008) at /home/src/sys/kern/kern_shutdown.c:552
#2 0xc020aef2 in trap_fatal (frame=0xc7676d74, eva=108) at /home/src/sys/i386/i386/trap.c:951
#3 0xc020abb9 in trap_pfault (frame=0xc7676d74,
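
The clustering of the boot timestamps listed in the message above can be checked mechanically. The following is only a rough sketch in Python, not something from the original thread: it assumes the awk output is pasted in as text, and the names `BOOT_LINES` and `hour_histogram` are made up for illustration. Note that these are boot times (the kernel banner in /var/log/messages), so each one marks a restart shortly after a crash or shutdown rather than the crash itself.

```python
from collections import Counter

# Boot timestamps copied from the awk output in the message above.
BOOT_LINES = """\
Sep 9 23:34:24
Sep 10 16:33:17
Sep 10 16:51:09
Sep 11 02:47:41
Sep 20 20:12:51
Sep 20 20:17:06
Sep 21 02:02:07
Sep 22 02:02:16
Sep 22 19:51:15
Sep 23 02:11:15
Sep 24 02:11:53
Sep 24 02:19:24
Sep 25 02:10:45
Sep 26 02:10:56
Sep 26 18:52:44
Sep 27 02:11:00
Sep 27 23:45:23
Sep 28 02:10:37
Sep 29 02:10:33
Sep 30 02:10:32
Sep 30 22:26:08
Oct 1 02:10:50
""".splitlines()


def hour_histogram(lines):
    """Count how many boot timestamps fall in each hour of the day."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not "Mon DD HH:MM:SS"
        hour = int(fields[2].split(":")[0])
        counts[hour] += 1
    return counts


hist = hour_histogram(BOOT_LINES)
total = sum(hist.values())
for hour, n in sorted(hist.items()):
    print(f"{hour:02d}:00  {n:2d}  {'#' * n}")
print(f"{hist[2]} of {total} boots happened between 02:00 and 02:59")
```

On this data the 02:00 hour accounts for more than half of all boots, which is consistent with the suspicion that something started by "periodic daily" at 01:59 is involved.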
Re: More panics (different hardware)
Additional: I'm running 4.1.1-STABLE cvsup'ed on September 28th at 04:13. The box is a Gateway G6-266M with ? mobo, PII-266, 96MB, Quantum Fireball ST6.4A (ata0-master), Iomega ZIP (ata1-master), Mitsumi(?) ATA FX240S CD-ROM (ata1-slave), two Seagate/Compaq ST32171Ws off a Tekram DC-390F, STB Velocity 128 (NVidia/SGS-Thomson Riva128) AGP, Netgear XA410 TXC (dc0 - LC82C115 PNIC II 10/100BaseTX), Ensoniq ES1370. More info on request.

-Andrew-
--
-Andrew J. Caines- Unix Systems Engineer [EMAIL PROTECTED]
Re: More panics (different hardware)
For better or worse, my box just obliged with a crash only 3h41m28s after booting my "DEBUG" kernel.

Well, that's paradoxically something of a hopeful sign. :)

#3 0xc020abb9 in trap_pfault (frame=0xc7676d74, usermode=0, eva=108) at /home/src/sys/i386/i386/trap.c:844
#4 0xc020a78f in trap (frame={tf_fs = -949288944, tf_es = -949551088, tf_ds = -949551088, tf_edi = -949522828, tf_esi = -949522944, tf_ebp = -949522988, tf_isp = -949523040, tf_ebx = -950285472, tf_edx = 0, tf_ecx = 27, tf_eax = -949523008, tf_trapno = 12, tf_err = 0, tf_eip = -1072212110, tf_cs = 8, tf_eflags = 66199, tf_esp = -949523008, tf_ss = 0}) at /home/src/sys/i386/i386/trap.c:443
#5 0xc0175772 in fdesc_setattr (ap=0xc7676e00) at vnode_if.h:305
#6 0xc0173d08 in vn_open (ndp=0xc7676ed0, fmode=1026, cmode=416) at vnode_if.h:305

This, however, is quite interesting. Can you tell us a little bit about what you're running on this system and if you're using any special devices? If this panic occurs twice in a row at the same location, we're definitely starting to narrow it down.

- Jordan
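
gdb prints the trap frame fields in frame #4 as signed 32-bit integers, which makes them hard to compare with the hexadecimal addresses in the panic message. As a small illustration (a sketch added for readers, not part of the original exchange; the helper name `to_u32` is made up), the values can be converted back to unsigned form in Python:

```python
def to_u32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as its unsigned equivalent."""
    return value & 0xFFFFFFFF


# Selected fields copied from the trap frame in frame #4 of the backtrace.
frame = {
    "tf_eip": -1072212110,
    "tf_ebp": -949522988,
    "tf_esp": -949523008,
}
eva = 108  # faulting address as passed to trap_pfault/trap_fatal

for name, value in frame.items():
    print(f"{name} = {to_u32(value):#010x}")
print(f"eva    = {eva:#x}")
```

Converted this way, tf_eip comes out as 0xc0175772, the instruction pointer in the panic message and the address gdb attributes to fdesc_setattr in frame #5; tf_ebp matches the reported frame pointer 0xc7676dd4; and eva=108 is the fault virtual address 0x6c. A fault address that small is what one would typically expect from reading a structure member through a null pointer, though the trace alone does not prove that.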
Re: More panics (different hardware)
Please press the 'scroll lock' key to scroll upwards and report the real error message. Before you post it, take a few minutes to check the handbook section on kernel debugging and try giving us enough information to actually help you.

You wouldn't ring your doctor up and say "Hey doc, I hurt" and expect him to tell you what's wrong. Why do you subject us to a comparable form of abuse?

Just as a follow up to my constant panic report on 4.1-S with my Athlon system, I'd like to say that my Pentium 200 system has now joined in. This P200 system has served me with 100% rock solid stability for years. Not once has it had any weird behaviour.

Anyways, the behaviour on both systems is the same. A fault at virtual address 0x30, preceded by another fault which by that time has scrolled off the screen. The key phrase here seems to be "supervisor read, page not present".

I feel I should add here that I am a commercial unix shell provider, and so I get the worst imaginable traffic on the internet. This P200 box doesn't allow shell access though, since it's only a web server.

A system with 3 bad sticks of RAM, and a rock solid system suddenly going bad? C'mon guys. Will nothing short of ECC RAM prove to you guys the existence of a software fault? Anybody wanna lend me some? :) (the P200 RAM is 72-pin so no, not the same kind as the Athlon's)

BTW, 3.5-S ran fine on both systems...at least until it had to access the large Maxtor HD in the Athlon ... which is what prompted me to go to 4.1-S.

Finally, for some good news. The P200 system is physically accessible to me, so I will try to find a spare hard drive, and make some crash dumps for the list's benefit.

Thanks for all the responses I've gotten on this subject! They're greatly appreciated and help me maintain my sanity. :)

--Bart

--
... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt]
V I C T O R Y   N O T   V E N G E A N C E
Re: More panics (different hardware)
On Thu, 28 Sep 2000, Mike Smith wrote:

Please press the 'scroll lock' key to scroll upwards and report the real error message. Before you post it, take a few minutes to check the handbook section on kernel debugging and try giving us enough information to actually help you.

You wouldn't ring your doctor up and say "Hey doc, I hurt" and expect him to tell you what's wrong. Why do you subject us to a comparable form of abuse?

Because I didn't know about the scroll-lock key functionality, and I'm not a debugging pro. I will take the steps you've mentioned though, and provide you with the appropriate information. Thanks for the tips.

--Bart