Re: More panics (different hardware)

2000-10-03 Thread Jordan Hubbard

> MFS will not cause you problems.  It's safe to leave it in.

I think it might be a little premature to reach that conclusion right
now; I've had panics with MFS in the past and also took note of the
fact when Andrew said his usage of fdesc post-dated the crashes.  But
for that, it would be my prime suspect as well (unless Andrew simply
got his timeline wrong :-).

- Jordan


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message



Re: More panics (different hardware)

2000-10-03 Thread Andrew J Caines

Jordan, Cy and co.,

I'll admit my concept of time was taken from me during lectures on special
relativity, but for all observers on the list, I added fdesc after the
crashes began. I didn't see any problem with trying it.

I'll add the mfs mounts back later and poke the system by running
"periodic daily", since I think that the evidence is strong enough that
there is something in there which is tickling the cause of the crashes.

More details later.

Thanks for your continued interest and assistance.

BTW, I'm intentionally not updating my world, in case a change masks (as
opposed to fixes) the problem.


-Andrew-
-- 
 ___
| -Andrew J. Caines-   Unix Systems Engineer   [EMAIL PROTECTED] |





Re: More panics (different hardware)

2000-10-02 Thread Andrew J Caines

Jordan and list,

Right on time tonight - 02:06 while running "periodic daily".

The panic was exactly the same as before, so I won't repeat it. The
command being run at the time was "tee" again.

Just before the panic I had shut down all X and was running a "ps -axww ;
top | head -24" snapshot every ten seconds.

Here it is:

--8<--
  PID  TT  STAT      TIME COMMAND
    0  ??  DLs    0:00.26  (swapper)
    1  ??  ILs    0:00.27 /sbin/init --
    2  ??  DL     0:15.34  (pagedaemon)
    3  ??  DL     0:03.26  (vmdaemon)
    4  ??  DL     0:04.60  (bufdaemon)
    5  ??  DL     0:31.55  (syncer)
   34  ??  ILs    0:00.38 mfs -o noatime -s 16384 /dev/ad0s1b /tmp (mount_mfs)
   36  ??  ILs    0:00.06 mfs -o noatime -s 2048 /dev/ad0s1b /var/run (mount_mfs)
  114  ??  Ss     0:00.64 /sbin/dhclient dc0
  141  ??  Ss     0:00.91 syslogd -s -vv -a localhost:*
  148  ??  Ss     0:06.90 ntpd -p /var/run/ntpd.pid
  169  ??  Ss     0:00.17 inetd -wW
  171  ??  Is     0:00.58 cron
  198  ??  Ss     0:00.57 /usr/sbin/sshd
  245  ??  Ss     0:01.87 /usr/local/libexec/postfix/master
  252  ??  S      0:00.91 qmgr -l -t fifo -u
  254  ??  Ss     0:55.62 moused -p /dev/psm0 -t auto
  301  ??  Ss     0:00.37 thttpd -C /usr/local/etc/thttpd.conf
33319  ??  S      0:00.05 pickup -l -t fifo
40862  ??  ZN     0:00.00  (junkbuster)
40863  ??  ZN     0:00.00  (junkbuster)
59967  ??  I      0:00.00 cron
59968  ??  Is     0:00.01 /bin/sh -c periodic daily
59969  ??  I      0:00.02 /bin/sh - /usr/sbin/periodic daily
59982  ??  I      0:00.08 /bin/sh - /usr/sbin/periodic daily
59983  ??  I      0:00.00 /bin/sh - /usr/sbin/periodic daily
59985  ??  I      0:00.01 mail -s hal9000.bsdonline.org daily run output root
60234  ??  I      0:00.01 /bin/sh /etc/periodic/daily/450.status-security
60239  ??  I      0:00.01 sh /etc/security
60240  ??  I      0:00.01 sendmail root
60241  ??  I      0:00.01 /usr/local/sbin/postdrop
60250  ??  S      0:00.01 sh /etc/security
60251  ??  S      0:00.01 xargs -0 -n 20 ls -liTd
60252  ??  S      0:00.01 sort +10
60285  ??  S      0:00.02 cleanup -t unix -u
60286  ??  S      0:00.01 trivial-rewrite -n rewrite -t unix -u
60287  ??  S      0:00.02 local -t unix
60288  ??  Ss     0:00.02 comsat
60313  ??  D      0:00.55 find /usr/local -xdev -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -print0
  317  v0  Ss+    0:00.13 -bash (bash)
60294  v0  S      0:00.01 -bash (bash)
60318  v0  S      0:00.00 -bash (bash)
60319  v0  R      0:00.00 ps -axww
60274  v1  Is+    0:00.01 /usr/libexec/getty Pc ttyv1
  319  v2  IWs+   0:00.00 /usr/libexec/getty Pc ttyv2
  320  v3  IWs+   0:00.00 /usr/libexec/getty Pc ttyv3
  321  v4  IWs+   0:00.00 /usr/libexec/getty Pc ttyv4
  322  v5  IWs+   0:00.00 /usr/libexec/getty Pc ttyv5
  323  v6  IWs+   0:00.00 /usr/libexec/getty Pc ttyv6
  324  v7  IWs+   0:00.00 /usr/libexec/getty Pc ttyv7
  278 con- TWN    0:00.00 dnetc -ini /home/dnet/dnetc.ini (dnetc-2.8010.463)
  290 con- IWN+   0:00.00 junkbuster /usr/local/etc/junkbuster/junkbuster.conf


last pid: 60321;  load averages:  0.07,  0.09,  0.16  up 0+23:54:55  02:03:49
47 processes:  1 running, 43 sleeping, 1 stopped, 2 zombie

Mem: 28M Active, 31M Inact, 18M Wired, 3752K Cache, 19M Buf, 12M Free
Swap: 256M Total, 5096K Used, 251M Free, 1% Inuse


  PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
60313 root      -6    0   980K   544K biord    0:01  4.85%  1.76% find
  278 dnet      68   20   740K     0K STOP   968:25  0.00%  0.00% dnetc-2.8010.
  254 root       2    0   908K    84K select   0:56  0.00%  0.00% moused
  148 root       2  -12  1284K   328K select   0:07  0.00%  0.00% ntpd
  245 root       2    0   996K   236K select   0:02  0.00%  0.00% master
  290 proxy      2    5  1736K     0K accept   0:01  0.00%  0.00% junkbuster
  141 root       2    0   944K   320K select   0:01  0.00%  0.00% syslogd
  252 postfix    2    0  1072K   524K select   0:01  0.00%  0.00% qmgr
  114 root       2    0   536K   120K select   0:01  0.00%  0.00% dhclient
  171 root      10    0   984K   240K nanslp   0:01  0.00%  0.00% cron
  198 root       2    0  2144K    88K select   0:01  0.00%  0.00% sshd
   34 root      10    0  8712K    40K mfsidl   0:00  0.00%  0.00% mount_mfs
  301 www        2    0  1256K   544K poll     0:00  0.00%  0.00% thttpd
  169 root       2    0  1060K   140K select   0:00  0.00%  0.00% inetd
60320 root      30    0  1460K  1044K RUN      0:00  0.00%  0.00% top
  317 root       3    0  1052K   616K ttyin    0:00  0.00%  0.00% bash
59982 root      10    0   668K   264K wait     0:00  0.00%  0.00% sh
   36 root      10    0  1532K    68K mfsidl   0:00  0.00%  0.00% mount_mfs
--8<--

As you can see, I still had the mfs and fdesc mounts active. Now, after the
reboot, I'm all disk. We'll see what happens after 02:00 tomorrow.

Note that this is 

Re: More panics (different hardware)

2000-10-01 Thread Andrew J Caines

Jordan and list,

> If you could get a kernel crash dump, especially with a kernel with
> debugging symbols, that would help enormously!  Thanks.

For better or worse, my box just obliged with a crash only 3h41m28s after
booting my "DEBUG" kernel.

I have found at least one interesting factor in the crashes. Searching my
logs for timestamps associated with the crashes, I see...

hal9000:/root# awk '/The FreeBSD Project/{print $1" "$2"\t"$3}' /var/log/messages{.1,.0,}
Sep 9   23:34:24
Sep 10  16:33:17
Sep 10  16:51:09
Sep 11  02:47:41
Sep 20  20:12:51
Sep 20  20:17:06
Sep 21  02:02:07
Sep 22  02:02:16
Sep 22  19:51:15
Sep 23  02:11:15
Sep 24  02:11:53
Sep 24  02:19:24
Sep 25  02:10:45
Sep 26  02:10:56
Sep 26  18:52:44
Sep 27  02:11:00
Sep 27  23:45:23
Sep 28  02:10:37
Sep 29  02:10:33
Sep 30  02:10:32
Sep 30  22:26:08
Oct 1   02:10:50

You'll notice the remarkable number of crashes at or around 02:10. The
only thing which runs regularly around then is "periodic daily", which
starts at 01:59. I was sitting here while the disks rumbled away and after
a while the system dived.
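
The clustering is easier to see once the timestamps are grouped by hour.
What follows is a minimal sketch, not part of the original session: it takes
the boot-banner lines listed above and counts them per hour of day. The
function name boots_per_hour is made up for illustration, and a boot banner
only marks a reboot, so a few of these entries may be deliberate restarts
rather than crashes.

```python
from collections import Counter

# Boot-banner timestamps copied from the awk output above.
BOOTS = """
Sep 9   23:34:24
Sep 10  16:33:17
Sep 10  16:51:09
Sep 11  02:47:41
Sep 20  20:12:51
Sep 20  20:17:06
Sep 21  02:02:07
Sep 22  02:02:16
Sep 22  19:51:15
Sep 23  02:11:15
Sep 24  02:11:53
Sep 24  02:19:24
Sep 25  02:10:45
Sep 26  02:10:56
Sep 26  18:52:44
Sep 27  02:11:00
Sep 27  23:45:23
Sep 28  02:10:37
Sep 29  02:10:33
Sep 30  02:10:32
Sep 30  22:26:08
Oct 1   02:10:50
""".strip().splitlines()


def boots_per_hour(lines):
    """Count boot banners by hour of day (hypothetical helper)."""
    hours = Counter()
    for line in lines:
        _month, _day, clock = line.split()
        hours[int(clock.split(":")[0])] += 1
    return hours


if __name__ == "__main__":
    # One bar per hour that saw at least one boot.
    for hour, count in sorted(boots_per_hour(BOOTS).items()):
        print(f"{hour:02d}:00  {'#' * count}  ({count})")
```

On this list, 13 of the 22 boots fall in the 02:00 hour, which is the hour
following the 01:59 start of "periodic daily".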

While I would usually think this is a hardware issue (heat from the
overactive disks upsetting the memory, or whatever), this system builds
world at least weekly and has never crashed during that time. The build
uses all three disks and, of course, hits them pretty hard. Sometimes I
build a few ports at the same time and there has never been a complaint.


Here's what I got from the core.

Script started on Sun Oct  1 02:16:48 2000
hal9000:/root# cd /usr/obj/home/src/sys/DEBUG
hal9000:DEBUG# gdb -k kernel.debug /var/crash/vmcore.0
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 3149824
initial pcb at 28b860
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x6c
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0175772
stack pointer           = 0x10:0xc7676db4
frame pointer           = 0x10:0xc7676dd4
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 973 (tee)
interrupt mask          = none
trap number             = 12
panic: page fault

syncing disks... 182 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 
giving up on 5 buffers
Uptime: 3h41m28s

dumping to dev #ad/0x20001, offset 327680
dump ata0: resetting devices .. done
96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 
67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 
38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 
9 8 7 6 5 4 3 2 1 
---
#0  boot (howto=256) at /home/src/sys/kern/kern_shutdown.c:302
302 dumppcb.pcb_cr3 = rcr3();
(kgdb) symbol-file kernel.debug
Load new symbol table from "kernel.debug"? (y or n) y

Reading symbols from kernel.debug...done.
(kgdb) exec-file /var/crash/kernel.0
(kgdb) core-file /var/crash/vmcore.0
IdlePTD 3149824
initial pcb at 28b860
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x6c
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0175772
stack pointer           = 0x10:0xc7676db4
frame pointer           = 0x10:0xc7676dd4
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 973 (tee)
interrupt mask          = none
trap number             = 12
panic: page fault

syncing disks... 182 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 
giving up on 5 buffers
Uptime: 3h41m28s

dumping to dev #ad/0x20001, offset 327680
dump ata0: resetting devices .. done
96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 
67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 
38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 
9 8 7 6 5 4 3 2 1 
---
#0  boot (howto=256) at /home/src/sys/kern/kern_shutdown.c:302
302 dumppcb.pcb_cr3 = rcr3();
(kgdb) bt
#0  boot (howto=256) at /home/src/sys/kern/kern_shutdown.c:302
#1  0xc01419ec in poweroff_wait (junk=0xc02410cf, howto=-949627008)
at /home/src/sys/kern/kern_shutdown.c:552
#2  0xc020aef2 in trap_fatal (frame=0xc7676d74, eva=108)
at /home/src/sys/i386/i386/trap.c:951
#3  0xc020abb9 in trap_pfault (frame=0xc7676d74, 

Re: More panics (different hardware)

2000-10-01 Thread Andrew J Caines

Additional:

I'm running 4.1.1-STABLE cvsup'ed on September 28th at 04:13.

The box is a Gateway G6-266M with ? mobo, PII-266, 96MB, Quantum Fireball
ST6.4A (ata0-master), Iomega ZIP (ata1-master), Mitsumi(?) ATA FX240S
CD-ROM (ata1-slave), two Seagate/Compaq ST32171Ws off a Tekram DC-390F,
STB Velocity 128 (NVidia/SGS-Thomson Riva128) AGP, Netgear XA410 TXC (dc0
- LC82C115 PNIC II 10/100BaseTX), Ensoniq ES1370.

More info on request.


-Andrew-
-- 
 ___
| -Andrew J. Caines-   Unix Systems Engineer   [EMAIL PROTECTED] |





Re: More panics (different hardware)

2000-10-01 Thread Jordan Hubbard

> For better or worse, my box just obliged with a crash only 3h41m28s after
> booting my "DEBUG" kernel.

Well, that's paradoxically something of a hopeful sign. :)

> #3  0xc020abb9 in trap_pfault (frame=0xc7676d74, usermode=0, eva=108)
>     at /home/src/sys/i386/i386/trap.c:844
> #4  0xc020a78f in trap (frame={tf_fs = -949288944, tf_es = -949551088, 
>       tf_ds = -949551088, tf_edi = -949522828, tf_esi = -949522944, 
>       tf_ebp = -949522988, tf_isp = -949523040, tf_ebx = -950285472, 
>       tf_edx = 0, tf_ecx = 27, tf_eax = -949523008, tf_trapno = 12, 
>       tf_err = 0, tf_eip = -1072212110, tf_cs = 8, tf_eflags = 66199, 
>       tf_esp = -949523008, tf_ss = 0}) at /home/src/sys/i386/i386/trap.c:443
> #5  0xc0175772 in fdesc_setattr (ap=0xc7676e00) at vnode_if.h:305
> #6  0xc0173d08 in vn_open (ndp=0xc7676ed0, fmode=1026, cmode=416)
>     at vnode_if.h:305

This, however, is quite interesting.  Can you tell us a little bit
about what you're running on this system and if you're using any
special devices?  If this panic occurs twice in a row at the same
location, we're definitely starting to narrow it down.

- Jordan
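
A few of the numbers in the backtrace quoted above can be cross-checked
against the panic message. The sketch below is only arithmetic on values
that appear in this thread: gdb prints the trap frame fields as signed
32-bit integers, so tf_eip has to be reinterpreted as unsigned before it can
be compared with the reported instruction pointer. The FWRITE and O_TRUNC
values are an assumption taken from FreeBSD's <sys/fcntl.h>, not something
stated in the thread, and as_u32 is a helper name made up for the example.

```python
def as_u32(value):
    """Reinterpret a signed 32-bit integer, as gdb printed it, as unsigned."""
    return value & 0xFFFFFFFF


# Values copied from the backtrace and panic message in this thread.
TF_EIP = -1072212110   # tf_eip in frame #4
EVA = 108              # eva argument to trap_pfault / trap_fatal
FMODE = 1026           # fmode argument to vn_open
CMODE = 416            # cmode argument to vn_open

# ASSUMPTION: flag values as defined in FreeBSD's <sys/fcntl.h>.
FWRITE = 0x0002
O_TRUNC = 0x0400

if __name__ == "__main__":
    print("tf_eip as unsigned:", hex(as_u32(TF_EIP)))
    print("eva in hex:        ", hex(EVA))
    print("cmode in octal:    ", oct(CMODE))
    print("fmode has FWRITE:  ", bool(FMODE & FWRITE))
    print("fmode has O_TRUNC: ", bool(FMODE & O_TRUNC))
```

The unsigned tf_eip is 0xc0175772, the same address as the instruction
pointer and frame #5 (fdesc_setattr), and eva=108 is the 0x6c fault address.
If the assumed flag values are right, fmode=1026 is a write open with
truncation. An address as small as 0x6c would be consistent with reading a
field through a NULL pointer inside fdesc_setattr, but nothing in the thread
confirms that.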





Re: More panics (different hardware)

2000-09-28 Thread Mike Smith


Please press the 'scroll lock' key to scroll upwards and report the real 
error message.  Before you post it, take a few minutes to check the 
handbook section on kernel debugging and try giving us enough 
information to actually help you.

You wouldn't ring your doctor up and say "Hey doc, I hurt" and expect him 
to tell you what's wrong.  Why do you subject us to a comparable form of 
abuse?

>    Just as a follow up to my constant panic report on 4.1-S with my
> Athlon system, I'd like to say that my Pentium 200 system has now joined
> in.  This P200 system has served me with 100% rock solid stability for
> years.  Not once has it had any weird behaviour.  Anyways, the behaviour
> on both systems is the same.  A fault at virtual address 0x30, preceded
> by another fault which by that time has scrolled off the screen.  The key
> phrase here seems to be "supervisor read, page not present".
>
>    I feel I should add here that I am a commercial unix shell
> provider, and so I get the worst imaginable traffic on the internet.  This
> P200 box doesn't allow shell access though, since it's only a web server.
>
>    A system with 3 bad sticks of ram, and a rock solid system
> suddenly going bad?  C'mon guys.  Will nothing short of ECC RAM prove to
> you guys the existence of a software fault?  Anybody wanna lend me some?
> :)  (the P200 RAM is 72-pin, so no, not the same kind as the Athlon's)
>
>    BTW, 3.5-S ran fine on both systems...at least until it had to
> access the large Maxtor HD in the Athlon ... which is what prompted me to
> go to 4.1-S.
>
>    Finally, for some good news.  The P200 system is physically
> accessible to me, so I will try to find a spare hard drive, and make some
> crash dumps for the list's benefit.
>
>    Thanks for all the responses I've gotten on this subject!  They're
> greatly appreciated and help me maintain my sanity. :)
>
>    --Bart
 

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E







Re: More panics (different hardware)

2000-09-28 Thread BSD

On Thu, 28 Sep 2000, Mike Smith wrote:
> Please press the 'scroll lock' key to scroll upwards and report the real 
> error message.  Before you post it, take a few minutes to check the 
> handbook section on kernel debugging and try giving us enough 
> information to actually help you.
>
> You wouldn't ring your doctor up and say "Hey doc, I hurt" and expect him 
> to tell you what's wrong.  Why do you subject us to a comparable form of 
> abuse?

Because I didn't know about the scroll-lock key functionality, and
I'm not a debugging pro.  I will take the steps you've mentioned though,
and provide you with the appropriate information.  Thanks for the tips.

--Bart



