Re: 5.4-p1 crash

2005-06-28 Thread Doug White
On Sat, 25 Jun 2005, Mitch Parks wrote:

> On Sun, 19 Jun 2005, Robert Watson wrote:
>
> >> there is a PR for it : kern/74319
> >
> > This sounds very similar to a serial-console-related tty bug I was
> > experiencing on -STABLE a few months ago, which is believed to have been
> > worked around in 5.4 tweaks before release.  In particular, there are
> > reference-counting bugs in the 5.x tty code that are fixed by a partial
> > rewrite of the tty code in 6.x, but that rewrite is too large and
> > disruptive to merge to RELENG_5.  If the problem is persisting, it may be
> > worth trying to merge anyway, but it is a pretty big change and would break
> > device driver binary compatibility, etc.  What we might want to do here is
> > wait until 6.x has settled out a bit more, then consider merging it to 5.x
> > once 6.x has been burned in with similar workloads and has continued not to
> > exhibit the 5.x tty reference bugs.
>
> On Fri, 24 Jun 2005, Doug White wrote:
>
> > I've run out of time to debug this, unfortunately...
>
> I went back to reports I made in January about 5.3:
> http://lists.freebsd.org/pipermail/freebsd-stable/2005-January/010898.html
> which appears to be the same issue. I _thought_ this was resolved when I
> disabled HT, but maybe I was just lucky between the last reboot and the 5.4
> upgrade.
>
> It sounds like there's a reasonable chance this has been squashed in code
> for 6.0? Since this box is already unstable, I'd be tempted to be an early 6
> adopter to see if it is actually resolved. Especially so if that would be
> helpful to the cause.

6.x is not affected by the tty-related problems that 5.x is having.

-- 
Doug White|  FreeBSD: The Power to Serve
[EMAIL PROTECTED]  |  www.FreeBSD.org
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


5.4-p1 crash KERN/74319

2005-06-27 Thread Mitch Parks
Not to sound like a broken record, but I had another ttwakeup crash last 
night. Is this more of a problem with SMP systems? Would it be more stable 
without SMP? Or without ACPI? I'll try anything at this point.


Mitch Parks

#0  doadump () at pcpu.h:159
#1  0xc05357d7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#2  0xc0535afd in panic (fmt=0xc068b12f "%s")
    at /usr/src/sys/kern/kern_shutdown.c:566
#3  0xc06633b4 in trap_fatal (frame=0xe8e6599c, eva=1127)
    at /usr/src/sys/i386/i386/trap.c:817
#4  0xc06630f7 in trap_pfault (frame=0xe8e6599c, usermode=0, eva=1127)
    at /usr/src/sys/i386/i386/trap.c:735
#5  0xc0662d51 in trap (frame=
      {tf_fs = -1014628328, tf_es = -65520, tf_ds = -1014628336,
       tf_edi = -943569360, tf_esi = -1001365504, tf_ebp = -387556900,
       tf_isp = -387556920, tf_ebx = -950459904, tf_edx = 1,
       tf_ecx = -950459904, tf_eax = 1039, tf_trapno = 12, tf_err = 0,
       tf_eip = -1068078327, tf_cs = 8, tf_eflags = 66182,
       tf_esp = -387556888, tf_ss = -1068094158})
    at /usr/src/sys/i386/i386/trap.c:425
#6  0xc06513ea in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#7  0xc3860018 in ?? ()
#8  0x0010 in ?? ()
#9  0xc3860010 in ?? ()
#10 0xc7c24630 in ?? ()
#11 0xc4506000 in ?? ()
#12 0xe8e659dc in ?? ()
#13 0xe8e659c8 in ?? ()
#14 0xc7592200 in ?? ()
#15 0x0001 in ?? ()
#16 0xc7592200 in ?? ()
#17 0x040f in ?? ()
#18 0x000c in ?? ()
#19 0x in ?? ()
#20 0xc0566b09 in ptsstart (tp=0x0) at /usr/src/sys/kern/tty_pty.c:249
#21 0xc0562d32 in ttstart (tp=0x0) at /usr/src/sys/kern/tty.c:1567
#22 0xc0562d9d in ttymodem (tp=0xc7592200, flag=0)
    at /usr/src/sys/kern/tty.c:1601
#23 0xc0566beb in ptcopen (dev=0xc4506000, flag=3, devtype=8192, td=0x0)
    at linedisc.h:136
#24 0xc04f9f66 in spec_open (ap=0xe8e65a80)
    at /usr/src/sys/fs/specfs/spec_vnops.c:207
#25 0xc04f9cab in spec_vnoperate (ap=0x0)
    at /usr/src/sys/fs/specfs/spec_vnops.c:118
#26 0xc0594985 in vn_open_cred (ndp=0xe8e65be4, flagp=0xe8e65ce4, cmode=0,
    cred=0xc42fb480, fdidx=0) at vnode_if.h:228
#27 0xc059456a in vn_open (ndp=0x0, flagp=0xe8e65ce4, cmode=0, fdidx=10)
    at /usr/src/sys/kern/vfs_vnops.c:91
#28 0xc058e417 in kern_open (td=0xc3ff2180, path=0x0, pathseg=UIO_USERSPACE,
    flags=3, mode=0) at /usr/src/sys/kern/vfs_syscalls.c:957
#29 0xc058e328 in open (td=0xc3ff2180, uap=0x0)
    at /usr/src/sys/kern/vfs_syscalls.c:926
#30 0xc06636ef in syscall (frame=
      {tf_fs = -1078001617, tf_es = 134873135, tf_ds = -1078001617,
       tf_edi = -1, tf_esi = 672156717, tf_ebp = -1077960312,
       tf_isp = -387555980, tf_ebx = 672163936, tf_edx = 672156751,
       tf_ecx = 673073932, tf_eax = 5, tf_trapno = 12, tf_err = 2,
       tf_eip = 672581307, tf_cs = 31, tf_eflags = 662,
       tf_esp = -1077960404, tf_ss = 47})
    at /usr/src/sys/i386/i386/trap.c:1009
#31 0xc065143f in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:201

#32 0xbfbf002f in ?? ()
#33 0x080a002f in ?? ()
#34 0xbfbf002f in ?? ()
#35 0x in ?? ()
#36 0x28104c2d in ?? ()
#37 0xbfbfa188 in ?? ()
#38 0xe8e65d74 in ?? ()
#39 0x28106860 in ?? ()
#40 0x28104c4f in ?? ()
#41 0x281e4b0c in ?? ()
#42 0x0005 in ?? ()
#43 0x000c in ?? ()
#44 0x0002 in ?? ()
#45 0x2816c6bb in ?? ()
#46 0x001f in ?? ()
#47 0x0296 in ?? ()
#48 0xbfbfa12c in ?? ()
#49 0x002f in ?? ()
#50 0x in ?? ()
#51 0x in ?? ()
#52 0x in ?? ()
#53 0x in ?? ()
#54 0x73b98000 in ?? ()
#55 0xc3bff54c in ?? ()
#56 0xc3ff2180 in ?? ()
#57 0xe8e6585c in ?? ()
#58 0xe8e65844 in ?? ()
#59 0xc34dd780 in ?? ()
#60 0xc0545d9f in sched_switch (td=0x28104c2d, newtd=0x28106860,
    flags=Cannot access memory at address 0xbfbfa198
)
    at /usr/src/sys/kern/sched_4bsd.c:881
Previous frame inner to this frame (corrupt stack?)
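
The tf_* fields in frame #5 are printed by gdb as signed 32-bit decimals, which hides the fact that some of them are kernel addresses. A minimal sketch, assuming a 32-bit i386 trap frame as in the dump above, that converts them back to unsigned hex so they can be compared with the frame addresses; the helper name to_hex32 is made up for illustration:

```python
# Convert signed 32-bit register values (as gdb prints an i386 trap frame)
# back to the unsigned hex form used for the backtrace frame addresses.

def to_hex32(value: int) -> str:
    """Return the 32-bit two's-complement hex string for a signed integer."""
    return f"0x{value & 0xFFFFFFFF:08x}"

# Values copied from frame #5 of the dump above.
trapframe = {
    "tf_eip": -1068078327,
    "tf_ebp": -387556900,
    "tf_eax": 1039,
}

for name, value in trapframe.items():
    print(f"{name} = {to_hex32(value)}")
```

Here tf_eip comes out as 0xc0566b09, which is the address gdb reports for ptsstart() in frame #20, so the fault was taken inside ptsstart() even though the frames between #6 and #20 are unresolved.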



Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE-p1 #19: Sun Jun 19 17:32:16 PDT 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/kuoi
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2791.00-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  
Features=0xbfebfbff
real memory  = 2147287040 (2047 MB)
avail memory = 2095947776 (1998 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  6
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
ioapic2: WARNING: intbase 72 != expected base 48
ioapic3: Changing APIC ID to 11
ioapic3: WARNING: intbase 120 != expected base 96
ioapic4: Changing APIC ID to 12
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-47 on motherboard
ioapic2  irqs 72-95 on motherboard
ioapic3  irqs 120-143 on motherboard
ioapic4  irqs 144-167 on motherboard
npx0:  on motherboard
npx0: INT 16 interface
acpi0:  on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequen

Re: 5.4-p1 crash

2005-06-25 Thread Mitch Parks

On Sun, 19 Jun 2005, Robert Watson wrote:


there is a PR for it : kern/74319


This sounds very similar to a serial-console-related tty bug I was
experiencing on -STABLE a few months ago, which is believed to have been
worked around in 5.4 tweaks before release.  In particular, there are
reference-counting bugs in the 5.x tty code that are fixed by a partial
rewrite of the tty code in 6.x, but that rewrite is too large and disruptive
to merge to RELENG_5.  If the problem is persisting, it may be worth trying
to merge anyway, but it is a pretty big change and would break device driver
binary compatibility, etc.  What we might want to do here is wait until 6.x
has settled out a bit more, then consider merging it to 5.x once 6.x has
been burned in with similar workloads and has continued not to exhibit the
5.x tty reference bugs.


On Fri, 24 Jun 2005, Doug White wrote:


I've run out of time to debug this, unfortunately...


I went back to reports I made in January about 5.3:
http://lists.freebsd.org/pipermail/freebsd-stable/2005-January/010898.html
which appears to be the same issue. I _thought_ this was resolved when I 
disabled HT, but maybe I was just lucky between the last reboot and the 5.4 
upgrade.


It sounds like there's a reasonable chance this has been squashed in code 
for 6.0? Since this box is already unstable, I'd be tempted to be an early 6 
adopter to see if it is actually resolved. Especially so if that would be 
helpful to the cause.


Otherwise, I guess I need to look at going back to 4.X or 5.2.1, which were 
completely stable on this box. I have time to deal with this over the next 6 
weeks, but much less so after that.


Any suggestions?

Mitch Parks
[EMAIL PROTECTED]



Re: 5.4-p1 crash

2005-06-24 Thread Doug White
On Mon, 20 Jun 2005, Philippe PEGON wrote:

> Philippe PEGON wrote:
> > Mitch Parks wrote:
> >
> >> On Sun, 19 Jun 2005, Doug White wrote:
> >>
> >>> On Fri, 17 Jun 2005, Mitch Parks wrote:
> >>>
> >>>> Below are details regarding another crash on a Dell 2600 SMP (HTT and USB
> >>>> disabled). It has been 9 days since the last crash. I didn't have the serial
> >>>> console in place for this last crash, but it is now.
> >>>
> >>>
> >>>
> >>> As noted, the ttwakeup() panic is a known bug. The best thing we have for
> >>> a fix is this patch:
> >>>
> >>> http://people.freebsd.org/~mlaier/tty.t_pgrp.diff
> >>>
> >>> Please give it a try and report back if you have any more panics (or
> >>> don't :-) ).
> >>
> >>
> >>
> >> Thanks! This patch appears to be for 5.3, but I manually applied the
> >> chunk of the patch that didn't apply cleanly and the countdown is on.
> >>
> >> I'll report back in 10 days unless something bad happens before then.
> >>
> >> Below is the patch chunk #10 that I actually applied rather than the
> >> one given. If I've done something bad here by removing the PGRP_LOCK
> >> please let me know.
> >
> >
> > I'm not a kernel developer, but if you remove
> >
> > PGRP_LOCK(tp->t_pgrp);
> >
> > and the PGRP_UNLOCK(tp->t_pgrp) in the if condition (removed by the
> > original patch)
> >
> > there is maybe another "PGRP_UNLOCK(tp->t_pgrp);" to remove if the if
> > condition doesn't match, line 2528 in the original 5.4-p1 tty.c?
>
> After applying the patch (with your modification), there is no
> "sx_sunlock(&proctree_lock)" in the ttyinfo function for the case where
> none of the three conditions matches. Maybe we just have to replace
> "PGRP_UNLOCK(tp->t_pgrp);" at line 2528 with "sx_sunlock(&proctree_lock)"?
> I think we need the help of a kernel developer.

No, that would be a leaked lock, which would cause hangs.  More likely it's
some other case that got missed that needs locks extended to it, or the
aliased pgrp isn't the underlying problem.

I've run out of time to debug this, unfortunately...

>
> >
> >>
> >> 
> >> Hunk #6 succeeded at 1154 (offset -51 lines).
> >> Hunk #7 succeeded at 1215 (offset -6 lines).
> >> Hunk #8 succeeded at 1203 (offset -51 lines).
> >> Hunk #9 succeeded at 1946 (offset -5 lines).
> >> Hunk #10 failed at 2562.
> >> Hunk #11 succeeded at 2847 (offset -212 lines).
> >> 1 out of 11 hunks failed--saving rejects to tty.c.rej
> >>
> >>
> >> @@ -2495,19 +2511,21 @@
> >>  * On return following a ttyprintf(), we set tp->t_rocount to 0 so
> >>  * that pending input will be retyped on BS.
> >>  */
> >> +   sx_slock(&proctree_lock);
> >> if (tp->t_session == NULL) {
> >> +   sx_sunlock(&proctree_lock);
> >> ttyprintf(tp, "not a controlling terminal\n");
> >> tp->t_rocount = 0;
> >> return;
> >> }
> >> if (tp->t_pgrp == NULL) {
> >> +   sx_sunlock(&proctree_lock);
> >> ttyprintf(tp, "no foreground process group\n");
> >> tp->t_rocount = 0;
> >> return;
> >> }
> >> -   PGRP_LOCK(tp->t_pgrp);
> >> -   if ((p = LIST_FIRST(&tp->t_pgrp->pg_members)) == 0) {
> >> -   PGRP_UNLOCK(tp->t_pgrp);
> >> +   if ((p = LIST_FIRST(&tp->t_pgrp->pg_members)) == NULL) {
> >> +   sx_sunlock(&proctree_lock);
> >> ttyprintf(tp, "empty foreground process group\n");
> >> tp->t_rocount = 0;
> >> return;
> >>
> >> Or the complete patch:
> >> http://kuoi.asui.uidaho.edu/~mitch/crash/tty_5.4.patch
> >>
> >> Mitch Parks
> >> [EMAIL PROTECTED]
> >
> >
>
>
>

-- 
Doug White|  FreeBSD: The Power to Serve
[EMAIL PROTECTED]  |  www.FreeBSD.org


Re: 5.4-p1 crash

2005-06-20 Thread Philippe PEGON

Mitch Parks wrote:

On Sun, 19 Jun 2005, Mitch Parks wrote:


On Sun, 19 Jun 2005, Doug White wrote:



As noted, the ttwakeup() panic is a known bug. The best thing we have for
a fix is this patch:

http://people.freebsd.org/~mlaier/tty.t_pgrp.diff

Please give it a try and report back if you have any more panics (or
don't :-) ).



I'll report back in 10 days unless something bad happens before then.



*sigh* Ok, I'm back too soon. Suggestions?


same thing for me




Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address= 0x4296bad0
fault code= supervisor write, page not present
instruction pointer= 0x8:0xc055740e
stack pointer= 0x10:0xe8f6e9b8
frame pointer= 0x10:0xe8f6e9c0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process= 34338 (sshd)
trap number= 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 17h9m7s
Dumping 2047 MB
...

#0  doadump () at pcpu.h:159
159 __asm __volatile("movl %%fs:0,%0" : "=r" (td));

#0  doadump () at pcpu.h:159
#1  0xc05357d7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#2  0xc0535afd in panic (fmt=0xc068b12f "%s")
    at /usr/src/sys/kern/kern_shutdown.c:566
#3  0xc06633b4 in trap_fatal (frame=0xe8f6e978, eva=1117174480)
    at /usr/src/sys/i386/i386/trap.c:817
#4  0xc06630f7 in trap_pfault (frame=0xe8f6e978, usermode=0, eva=1117174480)
    at /usr/src/sys/i386/i386/trap.c:735
#5  0xc0662d51 in trap (frame=
      {tf_fs = -1068367848, tf_es = -386531312, tf_ds = 16777232,
       tf_edi = -996594328, tf_esi = 1117174476, tf_ebp = -386471488,
       tf_isp = -386471516, tf_ebx = -1003267468, tf_edx = 1117174476,
       tf_ecx = -1066423096, tf_eax = 0, tf_trapno = 12, tf_err = 2,
       tf_eip = -1068141554, tf_cs = 8, tf_eflags = 66054,
       tf_esp = -1003267584, tf_ss = -1003279104})
    at /usr/src/sys/i386/i386/trap.c:425
#6  0xc06513ea in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#7  0xc0520018 in fork1 (td=0xc4335a74, flags=89, pages=-386471452,
    procp=0xc056425d) at atomic.h:154
#8  0xc0557362 in selwakeuppri (sip=0xc4335a74, pri=89)
    at /usr/src/sys/kern/sys_generic.c:1056
#9  0xc056425d in ttwakeup (tp=0x10206) at /usr/src/sys/kern/tty.c:2382
#10 0xc0562ee0 in ttymodem (tp=0xc4335a00, flag=0)
    at /usr/src/sys/kern/tty.c:1639
#11 0xc0566beb in ptcopen (dev=0xc4332d00, flag=3, devtype=8192, td=0x0)
    at linedisc.h:136
#12 0xc04f9f66 in spec_open (ap=0xe8f6ea80)
    at /usr/src/sys/fs/specfs/spec_vnops.c:207
#13 0xc04f9cab in spec_vnoperate (ap=0x0)
    at /usr/src/sys/fs/specfs/spec_vnops.c:118
#14 0xc0594985 in vn_open_cred (ndp=0xe8f6ebe4, flagp=0xe8f6ece4, cmode=0,
    cred=0xc3853880, fdidx=0) at vnode_if.h:228
#15 0xc059456a in vn_open (ndp=0x0, flagp=0xe8f6ece4, cmode=0, fdidx=3)
    at /usr/src/sys/kern/vfs_vnops.c:91
#16 0xc058e417 in kern_open (td=0xc41f5d80, path=0x0, pathseg=UIO_USERSPACE,
    flags=3, mode=0) at /usr/src/sys/kern/vfs_syscalls.c:957
#17 0xc058e328 in open (td=0xc41f5d80, uap=0x0)
    at /usr/src/sys/kern/vfs_syscalls.c:926
#18 0xc06636ef in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1, tf_esi = 671951917,
       tf_ebp = -1077943096, tf_isp = -386470540, tf_ebx = 671959136,
       tf_edx = 671951944, tf_ecx = 674495244, tf_eax = 5, tf_trapno = 12,
       tf_err = 2, tf_eip = 674002619, tf_cs = 31, tf_eflags = 658,
       tf_esp = -1077943188, tf_ss = 47})
    at /usr/src/sys/i386/i386/trap.c:1009
#19 0xc065143f in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:201

#20 0x002f in ?? ()
#21 0x002f in ?? ()
#22 0x002f in ?? ()
#23 0x in ?? ()
#24 0x280d2c2d in ?? ()
#25 0xbfbfe4c8 in ?? ()
#26 0xe8f6ed74 in ?? ()
#27 0x280d4860 in ?? ()
#28 0x280d2c48 in ?? ()
#29 0x2833fb0c in ?? ()
#30 0x0005 in ?? ()
#31 0x000c in ?? ()
#32 0x0002 in ?? ()
#33 0x282c76bb in ?? ()
#34 0x001f in ?? ()
#35 0x0292 in ?? ()
#36 0xbfbfe46c in ?? ()
#37 0x002f in ?? ()
#38 0x in ?? ()
#39 0x in ?? ()
#40 0x in ?? ()
#41 0x in ?? ()
#42 0x6701c000 in ?? ()
#43 0xc41fac5c in ?? ()
#44 0xc41f5d80 in ?? ()
#45 0xe8f6eb34 in ?? ()
#46 0xe8f6eb1c in ?? ()
#47 0xc347f600 in ?? ()
#48 0xc0545d9f in sched_switch (td=0x280d2c2d, newtd=0x280d4860,
    flags=Cannot access memory at address 0xbfbfe4d8
)
    at /usr/src/sys/kern/sched_4bsd.c:881





Re: 5.4-p1 crash

2005-06-20 Thread Philippe PEGON

Philippe PEGON wrote:

Mitch Parks wrote:


On Sun, 19 Jun 2005, Doug White wrote:


On Fri, 17 Jun 2005, Mitch Parks wrote:

Below are details regarding another crash on a Dell 2600 SMP (HTT and USB
disabled). It has been 9 days since the last crash. I didn't have the serial
console in place for this last crash, but it is now.




As noted, the ttwakeup() panic is a known bug. The best thing we have for
a fix is this patch:

http://people.freebsd.org/~mlaier/tty.t_pgrp.diff

Please give it a try and report back if you have any more panics (or
don't :-) ).




Thanks! This patch appears to be for 5.3, but I manually applied the 
chunk of the patch that didn't apply cleanly and the countdown is on.


I'll report back in 10 days unless something bad happens before then.

Below is the patch chunk #10 that I actually applied rather than the 
one given. If I've done something bad here by removing the PGRP_LOCK 
please let me know.



I'm not a kernel developer, but if you remove

PGRP_LOCK(tp->t_pgrp);

and the PGRP_UNLOCK(tp->t_pgrp) in the if condition (removed by the
original patch)

there is maybe another "PGRP_UNLOCK(tp->t_pgrp);" to remove if the if
condition doesn't match, line 2528 in the original 5.4-p1 tty.c?


After applying the patch (with your modification), there is no
"sx_sunlock(&proctree_lock)" in the ttyinfo function for the case where
none of the three conditions matches. Maybe we just have to replace
"PGRP_UNLOCK(tp->t_pgrp);" at line 2528 with "sx_sunlock(&proctree_lock)"?

I think we need the help of a kernel developer.






Hunk #6 succeeded at 1154 (offset -51 lines).
Hunk #7 succeeded at 1215 (offset -6 lines).
Hunk #8 succeeded at 1203 (offset -51 lines).
Hunk #9 succeeded at 1946 (offset -5 lines).
Hunk #10 failed at 2562.
Hunk #11 succeeded at 2847 (offset -212 lines).
1 out of 11 hunks failed--saving rejects to tty.c.rej


@@ -2495,19 +2511,21 @@
 * On return following a ttyprintf(), we set tp->t_rocount to 0 so
 * that pending input will be retyped on BS.
 */
+   sx_slock(&proctree_lock);
if (tp->t_session == NULL) {
+   sx_sunlock(&proctree_lock);
ttyprintf(tp, "not a controlling terminal\n");
tp->t_rocount = 0;
return;
}
if (tp->t_pgrp == NULL) {
+   sx_sunlock(&proctree_lock);
ttyprintf(tp, "no foreground process group\n");
tp->t_rocount = 0;
return;
}
-   PGRP_LOCK(tp->t_pgrp);
-   if ((p = LIST_FIRST(&tp->t_pgrp->pg_members)) == 0) {
-   PGRP_UNLOCK(tp->t_pgrp);
+   if ((p = LIST_FIRST(&tp->t_pgrp->pg_members)) == NULL) {
+   sx_sunlock(&proctree_lock);
ttyprintf(tp, "empty foreground process group\n");
tp->t_rocount = 0;
return;

Or the complete patch:
http://kuoi.asui.uidaho.edu/~mitch/crash/tty_5.4.patch

Mitch Parks
[EMAIL PROTECTED]






--
Philippe PEGON


Re: 5.4-p1 crash

2005-06-20 Thread Mitch Parks

On Sun, 19 Jun 2005, Mitch Parks wrote:


On Sun, 19 Jun 2005, Doug White wrote:



As noted, the ttwakeup() panic is a known bug. The best thing we have for
a fix is this patch:

http://people.freebsd.org/~mlaier/tty.t_pgrp.diff

Please give it a try and report back if you have any more panics (or
don't :-) ).


I'll report back in 10 days unless something bad happens before then.


*sigh* Ok, I'm back too soon. Suggestions?


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x4296bad0
fault code  = supervisor write, page not present
instruction pointer = 0x8:0xc055740e
stack pointer   = 0x10:0xe8f6e9b8
frame pointer   = 0x10:0xe8f6e9c0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 34338 (sshd)
trap number = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 17h9m7s
Dumping 2047 MB
...

#0  doadump () at pcpu.h:159
159 __asm __volatile("movl %%fs:0,%0" : "=r" (td));

#0  doadump () at pcpu.h:159
#1  0xc05357d7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#2  0xc0535afd in panic (fmt=0xc068b12f "%s")
    at /usr/src/sys/kern/kern_shutdown.c:566
#3  0xc06633b4 in trap_fatal (frame=0xe8f6e978, eva=1117174480)
    at /usr/src/sys/i386/i386/trap.c:817
#4  0xc06630f7 in trap_pfault (frame=0xe8f6e978, usermode=0, eva=1117174480)
    at /usr/src/sys/i386/i386/trap.c:735
#5  0xc0662d51 in trap (frame=
      {tf_fs = -1068367848, tf_es = -386531312, tf_ds = 16777232,
       tf_edi = -996594328, tf_esi = 1117174476, tf_ebp = -386471488,
       tf_isp = -386471516, tf_ebx = -1003267468, tf_edx = 1117174476,
       tf_ecx = -1066423096, tf_eax = 0, tf_trapno = 12, tf_err = 2,
       tf_eip = -1068141554, tf_cs = 8, tf_eflags = 66054,
       tf_esp = -1003267584, tf_ss = -1003279104})
    at /usr/src/sys/i386/i386/trap.c:425
#6  0xc06513ea in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#7  0xc0520018 in fork1 (td=0xc4335a74, flags=89, pages=-386471452,
    procp=0xc056425d) at atomic.h:154
#8  0xc0557362 in selwakeuppri (sip=0xc4335a74, pri=89)
    at /usr/src/sys/kern/sys_generic.c:1056
#9  0xc056425d in ttwakeup (tp=0x10206) at /usr/src/sys/kern/tty.c:2382
#10 0xc0562ee0 in ttymodem (tp=0xc4335a00, flag=0)
    at /usr/src/sys/kern/tty.c:1639
#11 0xc0566beb in ptcopen (dev=0xc4332d00, flag=3, devtype=8192, td=0x0)
    at linedisc.h:136
#12 0xc04f9f66 in spec_open (ap=0xe8f6ea80)
    at /usr/src/sys/fs/specfs/spec_vnops.c:207
#13 0xc04f9cab in spec_vnoperate (ap=0x0)
    at /usr/src/sys/fs/specfs/spec_vnops.c:118
#14 0xc0594985 in vn_open_cred (ndp=0xe8f6ebe4, flagp=0xe8f6ece4, cmode=0,
    cred=0xc3853880, fdidx=0) at vnode_if.h:228
#15 0xc059456a in vn_open (ndp=0x0, flagp=0xe8f6ece4, cmode=0, fdidx=3)
    at /usr/src/sys/kern/vfs_vnops.c:91
#16 0xc058e417 in kern_open (td=0xc41f5d80, path=0x0, pathseg=UIO_USERSPACE,
    flags=3, mode=0) at /usr/src/sys/kern/vfs_syscalls.c:957
#17 0xc058e328 in open (td=0xc41f5d80, uap=0x0)
    at /usr/src/sys/kern/vfs_syscalls.c:926
#18 0xc06636ef in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1, tf_esi = 671951917,
       tf_ebp = -1077943096, tf_isp = -386470540, tf_ebx = 671959136,
       tf_edx = 671951944, tf_ecx = 674495244, tf_eax = 5, tf_trapno = 12,
       tf_err = 2, tf_eip = 674002619, tf_cs = 31, tf_eflags = 658,
       tf_esp = -1077943188, tf_ss = 47})
    at /usr/src/sys/i386/i386/trap.c:1009
#19 0xc065143f in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:201

#20 0x002f in ?? ()
#21 0x002f in ?? ()
#22 0x002f in ?? ()
#23 0x in ?? ()
#24 0x280d2c2d in ?? ()
#25 0xbfbfe4c8 in ?? ()
#26 0xe8f6ed74 in ?? ()
#27 0x280d4860 in ?? ()
#28 0x280d2c48 in ?? ()
#29 0x2833fb0c in ?? ()
#30 0x0005 in ?? ()
#31 0x000c in ?? ()
#32 0x0002 in ?? ()
#33 0x282c76bb in ?? ()
#34 0x001f in ?? ()
#35 0x0292 in ?? ()
#36 0xbfbfe46c in ?? ()
#37 0x002f in ?? ()
#38 0x in ?? ()
#39 0x in ?? ()
#40 0x in ?? ()
#41 0x in ?? ()
#42 0x6701c000 in ?? ()
#43 0xc41fac5c in ?? ()
#44 0xc41f5d80 in ?? ()
#45 0xe8f6eb34 in ?? ()
#46 0xe8f6eb1c in ?? ()
#47 0xc347f600 in ?? ()
#48 0xc0545d9f in sched_switch (td=0x280d2c2d, newtd=0x280d4860,
    flags=Cannot access memory at address 0xbfbfe4d8
)
    at /usr/src/sys/kern/sched_4bsd.c:881
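
The "fault code" line of the panic message and tf_err in frame #5 carry the same information: on i386 the low three bits of the page-fault error code say whether the page was present, whether the access was a write, and whether it came from user mode. A small sketch that decodes those bits and also relates the fault address to tf_esi in this dump; the function name decode_pf_err is made up for illustration:

```python
# Decode the low bits of an i386 page-fault error code (tf_err).
# Bit 0: 0 = page not present, 1 = protection violation
# Bit 1: 0 = read, 1 = write
# Bit 2: 0 = supervisor (kernel) mode, 1 = user mode

def decode_pf_err(err: int) -> str:
    mode = "user" if err & 0x4 else "supervisor"
    access = "write" if err & 0x2 else "read"
    cause = "protection violation" if err & 0x1 else "page not present"
    return f"{mode} {access}, {cause}"

print(decode_pf_err(2))   # supervisor write, page not present

# Values copied from frames #3 and #5 of the dump above.
eva = 1117174480          # fault virtual address
tf_esi = 1117174476
print(hex(eva), "is tf_esi +", eva - tf_esi)   # 0x4296bad0 is tf_esi + 4
```

tf_err = 2 decodes to the same "supervisor write, page not present" that the panic message printed, and eva matches the reported fault virtual address 0x4296bad0. The dump in the 2005-06-27 message has tf_err = 0, which by the same rule is a supervisor read of a non-present page.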





Re: 5.4-p1 crash

2005-06-20 Thread Philippe PEGON

Mitch Parks wrote:

On Sun, 19 Jun 2005, Doug White wrote:


On Fri, 17 Jun 2005, Mitch Parks wrote:

Below are details regarding another crash on a Dell 2600 SMP (HTT and USB
disabled). It has been 9 days since the last crash. I didn't have the serial
console in place for this last crash, but it is now.



As noted, the ttwakeup() panic is a known bug. The best thing we have for
a fix is this patch:

http://people.freebsd.org/~mlaier/tty.t_pgrp.diff

Please give it a try and report back if you have any more panics (or
don't :-) ).



Thanks! This patch appears to be for 5.3, but I manually applied the 
chunk of the patch that didn't apply cleanly and the countdown is on.


I'll report back in 10 days unless something bad happens before then.

Below is the patch chunk #10 that I actually applied rather than the one 
given. If I've done something bad here by removing the PGRP_LOCK please 
let me know.


I'm not a kernel developer, but if you remove

PGRP_LOCK(tp->t_pgrp);

and the PGRP_UNLOCK(tp->t_pgrp) in the if condition (removed by the
original patch)

there is maybe another "PGRP_UNLOCK(tp->t_pgrp);" to remove if the if
condition doesn't match, line 2528 in the original 5.4-p1 tty.c?





Hunk #6 succeeded at 1154 (offset -51 lines).
Hunk #7 succeeded at 1215 (offset -6 lines).
Hunk #8 succeeded at 1203 (offset -51 lines).
Hunk #9 succeeded at 1946 (offset -5 lines).
Hunk #10 failed at 2562.
Hunk #11 succeeded at 2847 (offset -212 lines).
1 out of 11 hunks failed--saving rejects to tty.c.rej


@@ -2495,19 +2511,21 @@
 * On return following a ttyprintf(), we set tp->t_rocount to 0 so
 * that pending input will be retyped on BS.
 */
+   sx_slock(&proctree_lock);
if (tp->t_session == NULL) {
+   sx_sunlock(&proctree_lock);
ttyprintf(tp, "not a controlling terminal\n");
tp->t_rocount = 0;
return;
}
if (tp->t_pgrp == NULL) {
+   sx_sunlock(&proctree_lock);
ttyprintf(tp, "no foreground process group\n");
tp->t_rocount = 0;
return;
}
-   PGRP_LOCK(tp->t_pgrp);
-   if ((p = LIST_FIRST(&tp->t_pgrp->pg_members)) == 0) {
-   PGRP_UNLOCK(tp->t_pgrp);
+   if ((p = LIST_FIRST(&tp->t_pgrp->pg_members)) == NULL) {
+   sx_sunlock(&proctree_lock);
ttyprintf(tp, "empty foreground process group\n");
tp->t_rocount = 0;
return;

Or the complete patch:
http://kuoi.asui.uidaho.edu/~mitch/crash/tty_5.4.patch

Mitch Parks
[EMAIL PROTECTED]


--
Philippe PEGON


Re: 5.4-p1 crash

2005-06-19 Thread Mitch Parks

On Sun, 19 Jun 2005, Doug White wrote:


On Fri, 17 Jun 2005, Mitch Parks wrote:

Below are details regarding another crash on a Dell 2600 SMP (HTT and USB
disabled). It has been 9 days since the last crash. I didn't have the serial
console in place for this last crash, but it is now.


As noted, the ttwakeup() panic is a known bug. The best thing we have for
a fix is this patch:

http://people.freebsd.org/~mlaier/tty.t_pgrp.diff

Please give it a try and report back if you have any more panics (or
don't :-) ).


Thanks! This patch appears to be for 5.3, but I manually applied the chunk 
of the patch that didn't apply cleanly and the countdown is on.


I'll report back in 10 days unless something bad happens before then.

Below is the patch chunk #10 that I actually applied rather than the one 
given. If I've done something bad here by removing the PGRP_LOCK please let 
me know.



Hunk #6 succeeded at 1154 (offset -51 lines).
Hunk #7 succeeded at 1215 (offset -6 lines).
Hunk #8 succeeded at 1203 (offset -51 lines).
Hunk #9 succeeded at 1946 (offset -5 lines).
Hunk #10 failed at 2562.
Hunk #11 succeeded at 2847 (offset -212 lines).
1 out of 11 hunks failed--saving rejects to tty.c.rej


@@ -2495,19 +2511,21 @@
          * On return following a ttyprintf(), we set tp->t_rocount to 0 so
          * that pending input will be retyped on BS.
          */
+        sx_slock(&proctree_lock);
         if (tp->t_session == NULL) {
+                sx_sunlock(&proctree_lock);
                 ttyprintf(tp, "not a controlling terminal\n");
                 tp->t_rocount = 0;
                 return;
         }
         if (tp->t_pgrp == NULL) {
+                sx_sunlock(&proctree_lock);
                 ttyprintf(tp, "no foreground process group\n");
                 tp->t_rocount = 0;
                 return;
         }
-        PGRP_LOCK(tp->t_pgrp);
-        if ((p = LIST_FIRST(&tp->t_pgrp->pg_members)) == 0) {
-                PGRP_UNLOCK(tp->t_pgrp);
+        if ((p = LIST_FIRST(&tp->t_pgrp->pg_members)) == NULL) {
+                sx_sunlock(&proctree_lock);
                 ttyprintf(tp, "empty foreground process group\n");
                 tp->t_rocount = 0;
                 return;

Or the complete patch:
http://kuoi.asui.uidaho.edu/~mitch/crash/tty_5.4.patch

Mitch Parks
[EMAIL PROTECTED]


Re: 5.4-p1 crash

2005-06-19 Thread Doug White
On Fri, 17 Jun 2005, Mitch Parks wrote:

> Below are details regarding another crash on a Dell 2600 SMP (HTT and USB
> disabled). It has been 9 days since the last crash. I didn't have the serial
> console in place for this last crash, but it is now.

As noted, the ttwakeup() panic is a known bug. The best thing we have for
a fix is this patch:

http://people.freebsd.org/~mlaier/tty.t_pgrp.diff

Please give it a try and report back if you have any more panics (or
don't :-) ).

-- 
Doug White|  FreeBSD: The Power to Serve
[EMAIL PROTECTED]  |  www.FreeBSD.org


Re: 5.4-p1 crash

2005-06-19 Thread Philippe PEGON

Robert Watson wrote:
This sounds very similar to a serial-console-related tty bug I was
experiencing on -STABLE a few months ago, which is believed to have been
worked around in 5.4 tweaks before release.  In particular, there are
reference-counting bugs in the 5.x tty code that are fixed by a partial
rewrite of the tty code in 6.x, but that rewrite is too large and disruptive
to merge to RELENG_5.  If the problem is persisting, it may be worth trying
to merge anyway, but it is a pretty big change and would break device driver
binary compatibility, etc.  What we might want to do here is wait until 6.x
has settled out a bit more, then consider merging it to 5.x once 6.x has
been burned in with similar workloads and has continued not to exhibit the
5.x tty reference bugs.


Thanks for your answer.
As I said in other posts, we have a FreeBSD 5.4-p1 server that connects
every fifteen minutes, using an expect program, to a lot of network devices
to retrieve some information. That program seems to be the culprit: the
server crashed almost every day. We reduced the frequency to once per hour,
and that mitigates the problem.
This panic is easy to reproduce with the simple expect program below by
running it 6 times simultaneously and waiting a few hours. I tested it on an
HP DL360 with 2 CPUs. If that can help, I can test this on -CURRENT next
week.



#! /usr/local/bin/expect

set timeout 60
set host [lindex $argv 0]

set pass "PASSWORD"

spawn ssh [EMAIL PROTECTED]

expect {
  "continue*(yes/no)" { send "yes\r" ; exp_continue }
  "assword:" { send "$pass\r" }
}

expect "*# " {
  send "ls\r"
}
expect "*#" {
  send "exit\r"
}

puts "Done."
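
A minimal sketch of the "six simultaneous copies" setup described above. It is not part of the original report: Python is used only for illustration, `run_parallel` is a hypothetical helper name, and the script file name `repro.exp` in the comment is an assumption (the report does not say how the copies were started or whether they were restarted after finishing).

```python
import subprocess


def run_parallel(cmd, copies=6):
    """Start `copies` instances of `cmd` at the same time.

    `cmd` is an argument list as accepted by subprocess.Popen.
    Returns the exit codes, in start order, once every instance has finished.
    """
    # Launch every copy before waiting on any of them, so they overlap.
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    return [p.wait() for p in procs]


# Hypothetical usage, assuming the expect script above is saved as
# repro.exp and "device1" is one of the target hosts:
#
#     run_parallel(["/usr/local/bin/expect", "repro.exp", "device1"])
```

Starting all processes first and only then waiting is what makes the copies run concurrently; waiting inside the launch loop would serialize them and would not match the described workload.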



> Robert N M Watson


--
Philippe PEGON


Re: 5.4-p1 crash

2005-06-19 Thread Robert Watson


On Sat, 18 Jun 2005, Philippe PEGON wrote:

> > Unfortunately this is a known bug in FreeBSD; check the archives for
> > more discussion.  Doug White tried to look at fixing it before
> > 5.4-RELEASE but I think he gave up.
>
> do you know if someone works on it ? I sent two mail in freebsd-stable
> about it without solution and this bug is really annoying :
>
> http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/015952.html
>
> and
>
> http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/015864.html
>
> there is a PR for it : kern/74319


This sounds very similar to a serial-console-related tty bug I was
experiencing on -STABLE a few months ago, which is believed to have been
worked around by tweaks made in 5.4 before release.  In particular, there
are reference-counting bugs in the 5.x tty code that are fixed by a
partial rewrite of the tty code in 6.x, but the rewrite is too large and
disruptive to merge to RELENG_5.  If the problem persists, it may be worth
trying to merge anyway, but it is a pretty big change and would break
device driver binary compatibility, etc.  What we might want to do here is
wait until 6.x has settled out a bit more, then consider merging it to 5.x
once 6.x has been burned in with similar workloads and has continued not
to exhibit the 5.x tty reference bugs.


Robert N M Watson


Re: 5.4-p1 crash

2005-06-18 Thread Philippe PEGON

Xin LI wrote:
> On Fri, Jun 17, 2005 at 07:53:52PM -0400, Kris Kennaway wrote:
> > On Fri, Jun 17, 2005 at 03:23:19PM -0700, Mitch Parks wrote:
> > > Below are details regarding another crash on a Dell 2600 SMP (HTT and USB
> > > disabled). It has been 9 days since the last crash. I didn't have the
> > > serial console in place for this last crash, but it is now.
> > >
> > > Text includes:
> > > 1. backtrace
> > > 2. dmesg
> > > 3. kernel conf
> > >
> > > Since Dell diagnostics and Memtest check out fine, I'm kind of between a
> > > rock and a hard place here. I have a similar 2600 running 4.9 that is
> > > working great. I'd welcome any advice.
> >
> > Unfortunately this is a known bug in FreeBSD; check the archives for
> > more discussion.  Doug White tried to look at fixing it before
> > 5.4-RELEASE but I think he gave up.
>
> Just curious...
>
> What's the problem?  Is there known steps that can trigger it quickly so
> we can grab the bug?


I just tested on a FreeBSD 5.4-p1 box (HP DL360 with two CPUs), and this simple expect program,
run as six simultaneous copies, crashes the box after approximately 2 hours:


#! /usr/local/bin/expect

set timeout 60
set host [lindex $argv 0]

set pass "PASSWORD"

spawn ssh [EMAIL PROTECTED]

expect {
  "continue*(yes/no)" { send "yes\r" ; exp_continue }
  "assword:" { send "$pass\r" }
}

expect "*# " {
  send "ls\r"
}
expect "*#" {
  send "exit\r"
}

puts "Done."




> Cheers,

If that can help.
--
Philippe PEGON


Re: 5.4-p1 crash

2005-06-18 Thread Mitch Parks

On Sun, 19 Jun 2005, Xin LI wrote:

> On Fri, Jun 17, 2005 at 07:53:52PM -0400, Kris Kennaway wrote:
> > On Fri, Jun 17, 2005 at 03:23:19PM -0700, Mitch Parks wrote:
> > > Below are details regarding another crash on a Dell 2600 SMP (HTT and USB
> > > disabled). It has been 9 days since the last crash. I didn't have the
> > > serial console in place for this last crash, but it is now.
> > >
> > > [snip]
> > > Since Dell diagnostics and Memtest check out fine, I'm kind of between a
> > > rock and a hard place here. I have a similar 2600 running 4.9 that is
> > > working great. I'd welcome any advice.
> >
> > Unfortunately this is a known bug in FreeBSD; check the archives for
> > more discussion.  Doug White tried to look at fixing it before
> > 5.4-RELEASE but I think he gave up.
>
> Just curious...
>
> What's the problem?  Is there known steps that can trigger it quickly so
> we can grab the bug?


In my case, I haven't found a predictable way to make it crash, though I
haven't *tried* to trigger it. Crashes have come at seemingly random times,
with no correlated activity that I could detect.

If there are particular benchmark, performance, or other tools I could use to
try to trigger it, let me know if that would help.


Mitch Parks



Re: 5.4-p1 crash

2005-06-18 Thread Philippe PEGON

Xin LI wrote:
> On Fri, Jun 17, 2005 at 07:53:52PM -0400, Kris Kennaway wrote:
> > On Fri, Jun 17, 2005 at 03:23:19PM -0700, Mitch Parks wrote:
> > > Below are details regarding another crash on a Dell 2600 SMP (HTT and USB
> > > disabled). It has been 9 days since the last crash. I didn't have the
> > > serial console in place for this last crash, but it is now.
> > >
> > > Text includes:
> > > 1. backtrace
> > > 2. dmesg
> > > 3. kernel conf
> > >
> > > Since Dell diagnostics and Memtest check out fine, I'm kind of between a
> > > rock and a hard place here. I have a similar 2600 running 4.9 that is
> > > working great. I'd welcome any advice.
> >
> > Unfortunately this is a known bug in FreeBSD; check the archives for
> > more discussion.  Doug White tried to look at fixing it before
> > 5.4-RELEASE but I think he gave up.
>
> Just curious...
>
> What's the problem?  Is there known steps that can trigger it quickly so
> we can grab the bug?


For me, the trigger seems to be an expect program that connects to many network devices with
"spawn ssh ..." to retrieve some information. For the moment, I have reduced the frequency, and the
server crashes much less often.




> Cheers,


--
Philippe PEGON


Re: 5.4-p1 crash

2005-06-18 Thread Xin LI
On Fri, Jun 17, 2005 at 07:53:52PM -0400, Kris Kennaway wrote:
> On Fri, Jun 17, 2005 at 03:23:19PM -0700, Mitch Parks wrote:
> > Below are details regarding another crash on a Dell 2600 SMP (HTT and USB 
> > disabled). It has been 9 days since the last crash. I didn't have the 
> > serial console in place for this last crash, but it is now.
> > 
> > Text includes:
> > 1. backtrace
> > 2. dmesg
> > 3. kernel conf
> > 
> > Since Dell diagnostics and Memtest check out fine, I'm kind of between a 
> > rock and a hard place here. I have a similar 2600 running 4.9 that is 
> > working great. I'd welcome any advice.
> 
> Unfortunately this is a known bug in FreeBSD; check the archives for
> more discussion.  Doug White tried to look at fixing it before
> 5.4-RELEASE but I think he gave up.

Just curious...

What's the problem?  Is there known steps that can trigger it quickly so
we can grab the bug?

Cheers,
-- 
Xin LI   http://www.delphij.net/
See complete headers for GPG key and other information.





Re: 5.4-p1 crash

2005-06-18 Thread Philippe PEGON

Kris Kennaway wrote:
> On Fri, Jun 17, 2005 at 03:23:19PM -0700, Mitch Parks wrote:
> > Below are details regarding another crash on a Dell 2600 SMP (HTT and USB
> > disabled). It has been 9 days since the last crash. I didn't have the
> > serial console in place for this last crash, but it is now.
> >
> > Text includes:
> > 1. backtrace
> > 2. dmesg
> > 3. kernel conf
> >
> > Since Dell diagnostics and Memtest check out fine, I'm kind of between a
> > rock and a hard place here. I have a similar 2600 running 4.9 that is
> > working great. I'd welcome any advice.
>
> Unfortunately this is a known bug in FreeBSD; check the archives for
> more discussion.  Doug White tried to look at fixing it before
> 5.4-RELEASE but I think he gave up.


Do you know if someone is working on it? I sent two mails to freebsd-stable about it without
getting a solution, and this bug is really annoying:

http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/015952.html

and

http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/015864.html

There is a PR for it: kern/74319



> Kris

Thanks
--
Philippe PEGON


Re: 5.4-p1 crash

2005-06-18 Thread Daniel Gerzo
Hi Mitch,

Saturday, June 18, 2005, 12:23:19 AM, you typed the following:

> Below are details regarding another crash on a Dell 2600 SMP (HTT and USB
> disabled). It has been 9 days since the last crash. I didn't have the serial
> console in place for this last crash, but it is now.

> Text includes:
> 1. backtrace
> 2. dmesg
> 3. kernel conf

> Since Dell diagnostics and Memtest check out fine, I'm kind of between a
> rock and a hard place here. I have a similar 2600 running 4.9 that is
> working great. I'd welcome any advice.

I think I'm experiencing this as well on my Dell gx280, however I
don't have any backtrace.

my dmesg:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-DanGerSEC #2: Fri May 27 23:16:31 CEST 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/daemon
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.80GHz (2793.01-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf41  Stepping = 1
  Features=0xbfebfbff
  Hyperthreading: 2 logical CPUs
real memory  = 1071144960 (1021 MB)
avail memory = 1042702336 (994 MB)
MPTable: 
ioapic0: Changing APIC ID to 8
ioapic0: Assuming intbase of 0
ioapic0  irqs 0-23 on motherboard
npx0:  on motherboard
npx0: INT 16 interface
cpu0 on motherboard
pcib0:  pcibus 0 on motherboard
pci0:  on pcib0
pcib0: unable to route slot 1 INTA
pcib0: unable to route slot 2 INTA
pcib0: unable to route slot 28 INTA
pcib0: unable to route slot 28 INTB
pcib1:  irq 11 at device 1.0 on pci0
pci1:  on pcib1
pci0:  at device 2.0 (no driver attached)
pci0:  at device 2.1 (no driver attached)
pcib2:  irq 11 at device 28.0 on pci0
pci2:  on pcib2
bge0:  mem 0xdfcf-0xdfcf irq 16 at device 0.0 on pci2
miibus0:  on bge0
brgphy0:  on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:11:43:b9:b2:ef
pcib3:  irq 10 at device 28.1 on pci0
pci3:  on pcib3
uhci0:  port 0xff80-0xff9f irq 21 at device 29.0 on pci0
usb0:  on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1:  port 0xff60-0xff7f irq 22 at device 29.1 on pci0
usb1:  on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2:  port 0xff40-0xff5f irq 18 at device 29.2 on pci0
usb2:  on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3:  port 0xff20-0xff3f irq 23 at device 29.3 on pci0
usb3:  on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
pci0:  at device 29.7 (no driver attached)
pcib4:  at device 30.0 on pci0
pci4:  on pcib4
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xdc80-0xdcff mem 0xdf9fff80-0xdf9f irq 16 at device 0.0 on pci4
miibus1:  on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus1
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:04:76:14:be:1d
isab0:  at device 31.0 on pci0
isa0:  on isab0
atapci0:  port 0xffa0-0xffaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 irq 16 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1:  port 0xfea0-0xfeaf,0xfe30-0xfe33,0xfe20-0xfe27,0xfe10-0xfe13,0xfe00-0xfe07 irq 20 at device 31.2 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
pci0:  at device 31.3 (no driver attached)
orm0:  at iomem 0xcc800-0xc,0xcb000-0xcc7ff,0xca800-0xcafff,0xc-0xca7ff on isa0
atkbdc0:  at port 0x64,0x60 on isa0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
Timecounter "TSC" frequency 2793012901 Hz quality 800
Timecounters tick every 1.000 msec
ad1: 194481MB  [395136/16/63] at ata0-slave UDMA100
ad4: 38146MB  [77504/16/63] at ata2-master SATA150
Mounting root from ufs:/dev/ad4s1a
WARNING: /storage was not properly dismounted
Accounting enabled

kernel conf:

machine i386
cpu I686_CPU
ident   daemon-DanGer

options SCHED_4BSD  # 4BSD scheduler
options INET# InterNETworking
options INET6   # IPv6 communications protocols
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updat

Re: 5.4-p1 crash

2005-06-17 Thread Kris Kennaway
On Fri, Jun 17, 2005 at 03:23:19PM -0700, Mitch Parks wrote:
> Below are details regarding another crash on a Dell 2600 SMP (HTT and USB 
> disabled). It has been 9 days since the last crash. I didn't have the 
> serial console in place for this last crash, but it is now.
> 
> Text includes:
> 1. backtrace
> 2. dmesg
> 3. kernel conf
> 
> Since Dell diagnostics and Memtest check out fine, I'm kind of between a 
> rock and a hard place here. I have a similar 2600 running 4.9 that is 
> working great. I'd welcome any advice.

Unfortunately this is a known bug in FreeBSD; check the archives for
more discussion.  Doug White tried to look at fixing it before
5.4-RELEASE but I think he gave up.

Kris




5.4-p1 crash

2005-06-17 Thread Mitch Parks
Below are details regarding another crash on a Dell 2600 SMP (HTT and USB 
disabled). It has been 9 days since the last crash. I didn't have the serial 
console in place for this last crash, but it is now.


Text includes:
1. backtrace
2. dmesg
3. kernel conf

Since Dell diagnostics and Memtest check out fine, I'm kind of between a 
rock and a hard place here. I have a similar 2600 running 4.9 that is 
working great. I'd welcome any advice.


Mitch Parks
IT Coordinator
UI Student Affairs

## 1

This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:159
159 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:159
#1  0xc05357d7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#2  0xc0535afd in panic (fmt=0xc068b04f "%s")
at /usr/src/sys/kern/kern_shutdown.c:566
#3  0xc06632d4 in trap_fatal (frame=0xe9137978, eva=1117174480)
at /usr/src/sys/i386/i386/trap.c:817
#4  0xc0663017 in trap_pfault (frame=0xe9137978, usermode=0, eva=1117174480)
at /usr/src/sys/i386/i386/trap.c:735
#5  0xc0662c71 in trap (frame=
  {tf_fs = -1068367848, tf_es = -384630768, tf_ds = 16777232,
   tf_edi = -974104776, tf_esi = 1117174476, tf_ebp = -384599616,
   tf_isp = -384599644, tf_ebx = -1007283084, tf_edx = 1117174476,
   tf_ecx = -1066420548, tf_eax = 0, tf_trapno = 12, tf_err = 2,
   tf_eip = -1068141554, tf_cs = 8, tf_eflags = 66054,
   tf_esp = -1007283200, tf_ss = -1004205824})
at /usr/src/sys/i386/i386/trap.c:425
#6  0xc065130a in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#7  0xc0520018 in fork1 (td=0xc3f61474, flags=89, pages=-384599580,
procp=0xc0564171) at atomic.h:154
#8  0xc0557362 in selwakeuppri (sip=0xc3f61474, pri=89)
at /usr/src/sys/kern/sys_generic.c:1056
#9  0xc0564171 in ttwakeup (tp=0x10206) at /usr/src/sys/kern/tty.c:2366
#10 0xc0562e18 in ttymodem (tp=0xc3f61400, flag=0)
at /usr/src/sys/kern/tty.c:1625
#11 0xc0566b03 in ptcopen (dev=0xc4250900, flag=3, devtype=8192, td=0x0)
at linedisc.h:136
#12 0xc04f9f66 in spec_open (ap=0xe9137a80)
at /usr/src/sys/fs/specfs/spec_vnops.c:207
#13 0xc04f9cab in spec_vnoperate (ap=0x0)
at /usr/src/sys/fs/specfs/spec_vnops.c:118
#14 0xc059489d in vn_open_cred (ndp=0xe9137be4, flagp=0xe9137ce4, cmode=0,
cred=0xc3891e00, fdidx=0) at vnode_if.h:228
#15 0xc0594482 in vn_open (ndp=0x0, flagp=0xe9137ce4, cmode=0, fdidx=3)
at /usr/src/sys/kern/vfs_vnops.c:91
#16 0xc058e32f in kern_open (td=0xc468e900, path=0x0, pathseg=UIO_USERSPACE,
flags=3, mode=0) at /usr/src/sys/kern/vfs_syscalls.c:957
#17 0xc058e240 in open (td=0xc468e900, uap=0x0)
at /usr/src/sys/kern/vfs_syscalls.c:926
#18 0xc066360f in syscall (frame=
  {tf_fs = 47, tf_es = 134676527, tf_ds = -1078001617, tf_edi = -1,
   tf_esi = 671951917, tf_ebp = -1077943096, tf_isp = -384598668,
   tf_ebx = 671959136, tf_edx = 671951953, tf_ecx = 674495244, tf_eax = 5,
   tf_trapno = 12, tf_err = 2, tf_eip = 674002619, tf_cs = 31,
   tf_eflags = 658, tf_esp = -1077943188, tf_ss = 47})
at /usr/src/sys/i386/i386/trap.c:1009
#19 0xc065135f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:201
#20 0x002f in ?? ()
#21 0x0807002f in ?? ()
#22 0xbfbf002f in ?? ()
#23 0x in ?? ()
#24 0x280d2c2d in ?? ()
#25 0xbfbfe4c8 in ?? ()
#26 0xe9137d74 in ?? ()
#27 0x280d4860 in ?? ()
#28 0x280d2c51 in ?? ()
#29 0x2833fb0c in ?? ()
#30 0x0005 in ?? ()
#31 0x000c in ?? ()
#32 0x0002 in ?? ()
#33 0x282c76bb in ?? ()
#34 0x001f in ?? ()
#35 0x0292 in ?? ()
#36 0xbfbfe46c in ?? ()
#37 0x002f in ?? ()
#38 0x08067000 in ?? ()
#39 0x0004 in ?? ()
#40 0x in ?? ()
#41 0x in ?? ()
#42 0x5b77b000 in ?? ()
#43 0xc42361c4 in ?? ()
#44 0xc468e900 in ?? ()
#45 0xe9137b34 in ?? ()
#46 0xe9137b1c in ?? ()
#47 0xc347f600 in ?? ()
#48 0xc0545d9f in sched_switch (td=0x280d2c2d, newtd=0x280d4860,
flags=Cannot access memory at address 0xbfbfe4d8
)
at /usr/src/sys/kern/sched_4bsd.c:881
Previous frame inner to this frame (corrupt stack?)
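
For comparing a backtrace like the one above with other reports (for example those attached to kern/74319), it can help to reduce it to the bare call chain. The following is only a sketch under stated assumptions: `FRAME_RE` and `call_chain` are hypothetical names, the pattern is written against the frame lines as they appear in this mail, and it is not a kgdb feature.

```python
import re

# Matches kgdb frame header lines such as
#   "#9  0xc0564171 in ttwakeup (tp=0x10206) at /usr/src/sys/kern/tty.c:2366"
#   "#0  doadump () at pcpu.h:159"
#   "#20 0x002f in ?? ()"
# The address part is optional, and its hex digits may be missing entirely.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]* in )?([A-Za-z_]\w*|\?\?) \(")


def call_chain(backtrace_text, skip_unknown=True):
    """Return the function names from a backtrace, innermost frame first.

    Lines that do not start a frame (wrapped arguments, "at file:line"
    continuations) are ignored. Unresolved "??" frames are dropped unless
    skip_unknown is False.
    """
    chain = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue
        name = match.group(2)
        if skip_unknown and name == "??":
            continue
        chain.append(name)
    return chain
```

Fed frames #7 through #11 of the backtrace above, this returns `fork1`, `selwakeuppri`, `ttwakeup`, `ttymodem`, `ptcopen`, which is the part of the chain that passes through the tty code.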

# 2

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE-p1 #18: Thu May 26 23:37:44 PDT 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/kuoi
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2791.01-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff
real memory  = 2147287040 (2047 MB)
avail memory = 2095947776 (1998 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  6
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
ioapic2: WARNING: intbase 72 != expected base 48
ioapic3: Changing APIC ID to 11
ioapic3: WARNING: intbase 120 != expected base 96
ioapic4: Changing APIC ID