Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Tue, Jul 19, 2005 at 01:53:14PM +0200, Marc Olzheim wrote:
> On Fri, Jul 15, 2005 at 08:05:23AM -0400, Kris Kennaway wrote:
> > > Ok, even non-SMP 7-CURRENT crashes on it, so I do not believe that I'm
> > > the only one seeing this...
> >
> > You're not..as noted, it's been widely reported.
>
> Could you give me any pointers to where this has been discussed before ?
>
> Would placing all of the ptsopen() and ptcclose() code under a giant
> lock help ? Or is the problem somewhere else ?

Ah, never mind, it already operates under GIANT, so something else is
molesting the tty's t_line array. Perhaps some kind of use-after-free
issue ?

Marc
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Fri, Jul 15, 2005 at 08:05:23AM -0400, Kris Kennaway wrote:
> > Ok, even non-SMP 7-CURRENT crashes on it, so I do not believe that I'm
> > the only one seeing this...
>
> You're not..as noted, it's been widely reported.

Could you give me any pointers to where this has been discussed before ?

Would placing all of the ptsopen() and ptcclose() code under a giant
lock help ? Or is the problem somewhere else ?

Marc
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Fri, Jul 15, 2005 at 12:05:39PM +0200, Marc Olzheim wrote:
> On Fri, Jul 15, 2005 at 11:40:27AM +0200, Marc Olzheim wrote:
> > On Thu, Jul 14, 2005 at 01:44:04PM -0400, Kris Kennaway wrote:
> > > > The -CURRENT traces are very different from these, but I don't have
> > > > comconsole on the -CURRENT machine.
> > >
> > > Thanks.
> >
> > Am I really the only one seeing this ? Or has someone been able to
> > reproduce it yet ?
>
> Ok, even non-SMP 7-CURRENT crashes on it, so I do not believe that I'm
> the only one seeing this...

You're not..as noted, it's been widely reported.

Kris
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Fri, Jul 15, 2005 at 11:40:27AM +0200, Marc Olzheim wrote:
> On Thu, Jul 14, 2005 at 01:44:04PM -0400, Kris Kennaway wrote:
> > > The -CURRENT traces are very different from these, but I don't have
> > > comconsole on the -CURRENT machine.
> >
> > Thanks.
>
> Am I really the only one seeing this ? Or has someone been able to
> reproduce it yet ?

Ok, even non-SMP 7-CURRENT crashes on it, so I do not believe that I'm
the only one seeing this...

Marc
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Thu, Jul 14, 2005 at 01:44:04PM -0400, Kris Kennaway wrote:
> > The -CURRENT traces are very different from these, but I don't have
> > comconsole on the -CURRENT machine.
>
> Thanks.

Since I get plenty of opportunities to fiddle in the debugger now,
because this happens a lot (we use screen a lot), tell me if anyone needs
any specific info from the debugger prompt.

Am I really the only one seeing this ? Or has someone been able to
reproduce it yet ?

> > Anyway, if people are having problems reproducing this, I'd like to
> > know. I don't have a single RELENG_5 or 6 machine that withstands the
> > 'screen-test'...
>
> This will hopefully be very useful in investigating and developing a
> fix for the problem, at least on 6.0 (others have speculated that the
> problem may be too difficult to fix in the 5.x branch).

That sounds too scary to go into. Besides: it has worked on older FreeBSD
5.x systems... Perhaps 'others' could 'speculate' on list, so we might
know why they're thinking that ? :-P

Should SMP machines revert to FreeBSD 4.x in the mean time ? :-(

Marc
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Thu, Jul 14, 2005 at 03:05:20PM +0200, Marc Olzheim wrote:
> > You need to obtain the debugging traceback for the panic and include
> > it in the PR.
>
> Added two crash traces, one for the open() variant, one for the close()
> variant.
>
> The -CURRENT traces are very different from these, but I don't have
> comconsole on the -CURRENT machine.

Thanks.

> Anyway, if people are having problems reproducing this, I'd like to
> know. I don't have a single RELENG_5 or 6 machine that withstands the
> 'screen-test'...

This will hopefully be very useful in investigating and developing a
fix for the problem, at least on 6.0 (others have speculated that the
problem may be too difficult to fix in the 5.x branch).

Kris
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Wed, Jul 13, 2005 at 02:41:18PM -0400, Kris Kennaway wrote:
> > > Make sure you recompile any modules when activating INVARIANTS, or
> > > you'll get panics.
> >
> > Of course... make buildkernel and make installkernel do that for me... ;)
>
> Not if you are using third party port modules.

Well, I'm not. ;-)

> > It seems that 5.4-RELEASE-p* is safe btw.
> >
> > PR is http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/83375
>
> You need to obtain the debugging traceback for the panic and include
> it in the PR.

Added two crash traces, one for the open() variant, one for the close()
variant.

The -CURRENT traces are very different from these, but I don't have
comconsole on the -CURRENT machine.

Anyway, if people are having problems reproducing this, I'd like to
know. I don't have a single RELENG_5 or 6 machine that withstands the
'screen-test'...

Marc
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Wed, Jul 13, 2005 at 02:55:22PM +0200, Marc Olzheim wrote:
> On Wed, Jul 13, 2005 at 08:00:31AM -0400, Kris Kennaway wrote:
> > > Stress testing this gives me instant fatal trap 12 on both July 11's
> > > CURRENT and RELENG_5.
> > >
> > > I'll file a PR.
> >
> > Make sure you recompile any modules when activating INVARIANTS, or
> > you'll get panics.
>
> Of course... make buildkernel and make installkernel do that for me... ;)

Not if you are using third party port modules.

> It seems that 5.4-RELEASE-p* is safe btw.
>
> PR is http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/83375

You need to obtain the debugging traceback for the panic and include
it in the PR.

Kris
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Wed, Jul 13, 2005 at 08:00:31AM -0400, Kris Kennaway wrote:
> > Stress testing this gives me instant fatal trap 12 on both July 11's
> > CURRENT and RELENG_5.
> >
> > I'll file a PR.
>
> Make sure you recompile any modules when activating INVARIANTS, or
> you'll get panics.

Of course... make buildkernel and make installkernel do that for me... ;)

It seems that 5.4-RELEASE-p* is safe btw.

PR is http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/83375

Marc
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Wed, Jul 13, 2005 at 11:29:39AM +0200, Marc Olzheim wrote:
> On Mon, Jul 11, 2005 at 04:32:16PM +0200, Marc Olzheim wrote:
> > On Sun, Jul 03, 2005 at 10:38:58AM +0200, Philippe PEGON wrote:
> > > >The panic appears to be an instance of a known bug in 5.4 (and
> > > >INVARIANTS will not fix it, but may just delay the inevitable by
> > > >changing timings). See Doug White's recent emails which point to a
> > > >patch you should test.
> > >
> > > If you think about this mail :
> > >
> > > http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016165.html
> > >
> > > and follow the thread, you will see that this patch doesn't solve the
> > > problem.
> > > The last mail which I can see from doug white about this problem is :
> > >
> > > http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016495.html
> > >
> > > for the moment, it seems that there is no solution for 5.x
> >
> > Well, the bug hasn't bitten me since I reactivated INVARIANTS, so I'll
> > keep this workaround active for now. :-/ If I don't enable INVARIANTS,
> > as soon as I start 'screen', it's a panic.
>
> Stress testing this gives me instant fatal trap 12 on both July 11's
> CURRENT and RELENG_5.
>
> I'll file a PR.

Make sure you recompile any modules when activating INVARIANTS, or
you'll get panics.

Kris
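
[Editorial sketch, not part of the original message.] For readers
following along, enabling INVARIANTS and rebuilding would typically look
something like the fragment below. The config name MYKERNEL and the i386
path are placeholders, not taken from the thread. INVARIANT_SUPPORT is
included on the assumption that, as in stock FreeBSD kernel configs,
INVARIANTS depends on it. These steps only rebuild in-tree modules; any
third-party modules from ports would need to be rebuilt separately.

```
# Kernel config file, e.g. /usr/src/sys/i386/conf/MYKERNEL (placeholder name)
options INVARIANTS
options INVARIANT_SUPPORT

# Then rebuild and install the kernel together with its in-tree modules:
#   cd /usr/src
#   make buildkernel KERNCONF=MYKERNEL
#   make installkernel KERNCONF=MYKERNEL
```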
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Mon, Jul 11, 2005 at 04:32:16PM +0200, Marc Olzheim wrote:
> On Sun, Jul 03, 2005 at 10:38:58AM +0200, Philippe PEGON wrote:
> > >The panic appears to be an instance of a known bug in 5.4 (and
> > >INVARIANTS will not fix it, but may just delay the inevitable by
> > >changing timings). See Doug White's recent emails which point to a
> > >patch you should test.
> >
> > If you think about this mail :
> >
> > http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016165.html
> >
> > and follow the thread, you will see that this patch doesn't solve the
> > problem.
> > The last mail which I can see from doug white about this problem is :
> >
> > http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016495.html
> >
> > for the moment, it seems that there is no solution for 5.x
>
> Well, the bug hasn't bitten me since I reactivated INVARIANTS, so I'll
> keep this workaround active for now. :-/ If I don't enable INVARIANTS,
> as soon as I start 'screen', it's a panic.

Stress testing this gives me instant fatal trap 12 on both July 11's
CURRENT and RELENG_5.

I'll file a PR.

Marc
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Sun, Jul 03, 2005 at 10:38:58AM +0200, Philippe PEGON wrote:
> >The panic appears to be an instance of a known bug in 5.4 (and
> >INVARIANTS will not fix it, but may just delay the inevitable by
> >changing timings). See Doug White's recent emails which point to a
> >patch you should test.
>
> If you think about this mail :
>
> http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016165.html
>
> and follow the thread, you will see that this patch doesn't solve the
> problem.
> The last mail which I can see from doug white about this problem is :
>
> http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016495.html
>
> for the moment, it seems that there is no solution for 5.x

Well, the bug hasn't bitten me since I reactivated INVARIANTS, so I'll
keep this workaround active for now. :-/ If I don't enable INVARIANTS,
as soon as I start 'screen', it's a panic.

Marc
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
Kris Kennaway wrote:
> On Fri, Jul 01, 2005 at 03:03:35PM +0200, Marc Olzheim wrote:
> > On Fri, Jul 01, 2005 at 12:41:39PM +0200, Marc Olzheim wrote:
> > > On Fri, Jul 01, 2005 at 12:14:58PM +0200, Marc Olzheim wrote:
> > > > Somehow, this sounds familiar, i.e.: the "lock cmpxchgl":
> > > >
> > > > Fatal trap 12: page fault while in kernel mode
> > > ...
> > > > Stopped at 0xc05160c3 = knote+0x27:lock cmpxchgl %ecx,0x1c(%edx)
> > >
> > > Somehow I think I solved this last time by activating 'INVARIANTS'...
> > > I'll try that now.
> >
> > Let's paraphrase:
> >
> > I think I solved this last time by activating 'INVARIANTS'...
> >
> > Anyway, tried that and yes, it didn't crash in the last few hours, so I
> > guess it works. Without INVARIANTS, it crashed within seconds.
> >
> > On the downside, my Gigabit performance dropped from 99 MB/sec to 80
> > MB/sec because of INVARIANTS.
>
> The panic appears to be an instance of a known bug in 5.4 (and
> INVARIANTS will not fix it, but may just delay the inevitable by
> changing timings). See Doug White's recent emails which point to a
> patch you should test.

If you think about this mail :

http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016165.html

and follow the thread, you will see that this patch doesn't solve the
problem.
The last mail which I can see from doug white about this problem is :

http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016495.html

for the moment, it seems that there is no solution for 5.x

> Kris

--
Philippe PEGON
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Fri, Jul 01, 2005 at 03:03:35PM +0200, Marc Olzheim wrote:
> On Fri, Jul 01, 2005 at 12:41:39PM +0200, Marc Olzheim wrote:
> > On Fri, Jul 01, 2005 at 12:14:58PM +0200, Marc Olzheim wrote:
> > > Somehow, this sounds familiar, i.e.: the "lock cmpxchgl":
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > ...
> > > Stopped at 0xc05160c3 = knote+0x27:lock cmpxchgl
> > > %ecx,0x1c(%edx)
> >
> > Somehow I think I solved this last time by activating 'INVARIANTS'...
> > I'll try that now.
>
> Let's paraphrase:
>
> I think I solved this last time by activating 'INVARIANTS'...
>
> Anyway, tried that and yes, it didn't crash in the last few hours, so I
> guess it works. Without INVARIANTS, it crashed within seconds.
>
> On the downside, my Gigabit performance dropped from 99 MB/sec to 80
> MB/sec because of INVARIANTS.

The panic appears to be an instance of a known bug in 5.4 (and
INVARIANTS will not fix it, but may just delay the inevitable by
changing timings). See Doug White's recent emails which point to a
patch you should test.

Kris
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Fri, Jul 01, 2005 at 12:41:39PM +0200, Marc Olzheim wrote:
> On Fri, Jul 01, 2005 at 12:14:58PM +0200, Marc Olzheim wrote:
> > Somehow, this sounds familiar, i.e.: the "lock cmpxchgl":
> >
> > Fatal trap 12: page fault while in kernel mode
> ...
> > Stopped at 0xc05160c3 = knote+0x27:lock cmpxchgl
> > %ecx,0x1c(%edx)
>
> Somehow I think I solved this last time by activating 'INVARIANTS'...
> I'll try that now.

Let's paraphrase:

I think I solved this last time by activating 'INVARIANTS'...

Anyway, tried that and yes, it didn't crash in the last few hours, so I
guess it works. Without INVARIANTS, it crashed within seconds.

On the downside, my Gigabit performance dropped from 99 MB/sec to 80
MB/sec because of INVARIANTS.

Marc
Re: Today's RELENG_5_4 and 'lock cmpxchgl'
On Fri, Jul 01, 2005 at 12:14:58PM +0200, Marc Olzheim wrote:
> Somehow, this sounds familiar, i.e.: the "lock cmpxchgl":
>
> Fatal trap 12: page fault while in kernel mode
...
> Stopped at 0xc05160c3 = knote+0x27:lock cmpxchgl
> %ecx,0x1c(%edx)

Somehow I think I solved this last time by activating 'INVARIANTS'...
I'll try that now.

Marc
Today's RELENG_5_4 and 'lock cmpxchgl'
Somehow, this sounds familiar, i.e.: the "lock cmpxchgl":

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x1c
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc05160c3
stack pointer           = 0x10:0xebf499ac
frame pointer           = 0x10:0xebf499b8
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1299 (screen)
[thread pid 1299 tid 100428 ]
Stopped at      0xc05160c3 = knote+0x27:        lock cmpxchgl   %ecx,0x1c(%edx)
db> tr
Tracing pid 1299 tid 100428 td 0xc670cc00
knote(c5fdde80,0,0,c5fdde10,c5fdde00) at 0xc05160c3 = knote+0x27
ttwakeup(c5fdde00,c5fdde00,c5fdde00,c5f93000,ebf49a04) at 0xc0560ad9 = ttwakeup+0x65
ttymodem(c5fdde00,1) at 0xc055f73c = ttymodem+0x170
ptcopen(c5f93000,3,2000,c670cc00,c0717d40) at 0xc0563427 = ptcopen+0x63
spec_open(ebf49a70,ebf49b2c,c05913f9,ebf49a70,180) at 0xc04f4f82 = spec_open+0x2b6
spec_vnoperate(ebf49a70) at 0xc04f4cc7 = spec_vnoperate+0x13
vn_open_cred(ebf49bd4,ebf49cd4,0,c6614900,5) at 0xc05913f9 = vn_open_cred+0x419
vn_open(ebf49bd4,ebf49cd4,0,5,58) at 0xc0590fde = vn_open+0x1e
kern_open(c670cc00,bfbfdf40,0,3,0) at 0xc058af5b = kern_open+0xeb
open(c670cc00,ebf49d04,3,0,292) at 0xc058ae6c = open+0x18
syscall(bfbf002f,2f,bfbf002f,,28104c2d) at 0xc069e5e3 = syscall+0x2b3
Xint0x80_syscall() at 0xc068d2ff = Xint0x80_syscall+0x1f
--- syscall (5, FreeBSD ELF32, open), eip = 0x2816c7bb, esp = 0xbfbfdf0c, ebp = 0xbfbfdf68 ---

What am I doing wrong ? It's an SMP dual Xeon machine. Same kernel config
as I used on my older kernels that didn't crash though...

Marc
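
[Editorial sketch, not part of the original message.] One detail worth
noting when reading the trap: the fault virtual address (0x1c) equals the
displacement in the faulting operand, 0x1c(%edx). That is what one would
see if %edx held 0, i.e. a field at offset 0x1c being accessed through a
NULL base pointer. The C fragment below only illustrates that arithmetic.
`struct fake_list`, its fields, and `effective_address()` are hypothetical
names made up for this example and do not reflect the real kernel
structure that knote() operates on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout for illustration only: seven 32-bit words of
 * padding followed by a lock word, which therefore sits at byte
 * offset 0x1c. This is NOT the real kernel structure.
 */
struct fake_list {
    uint32_t pad[7];     /* bytes 0x00 .. 0x1b */
    uint32_t lock_word;  /* byte offset 0x1c   */
};

/* Compile-time check that the made-up layout matches the description. */
_Static_assert(offsetof(struct fake_list, lock_word) == 0x1c,
               "lock_word must sit at offset 0x1c");

/*
 * Address a CPU would touch when accessing lock_word through a
 * structure located at 'base' (base + displacement, as in 0x1c(%edx)).
 */
static uintptr_t effective_address(uintptr_t base)
{
    return base + offsetof(struct fake_list, lock_word);
}
```

Calling `effective_address(0)` yields 0x1c, matching the reported fault
address under the assumption of a NULL base pointer. That would be
consistent with the stale or freed pointer suspected elsewhere in the
thread, though the trace alone does not say where the bad pointer came
from.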