Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-19 Thread Marc Olzheim
On Tue, Jul 19, 2005 at 01:53:14PM +0200, Marc Olzheim wrote:
> On Fri, Jul 15, 2005 at 08:05:23AM -0400, Kris Kennaway wrote:
> > > Ok, even non-SMP 7-CURRENT crashes on it, so I do not believe that I'm
> > > the only one seeing this...
> > 
> > You're not. As noted, it's been widely reported.
> 
> Could you give me any pointers to where this has been discussed before ?
> 
> Would placing all of the ptsopen() and ptcclose() code under a giant
> lock help ? Or is the problem somewhere else ?

Ah, never mind, it already operates under GIANT, so something else is
corrupting the tty's t_line array. Perhaps some kind of use-after-free
issue ?

Marc




Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-19 Thread Marc Olzheim
On Fri, Jul 15, 2005 at 08:05:23AM -0400, Kris Kennaway wrote:
> > Ok, even non-SMP 7-CURRENT crashes on it, so I do not believe that I'm
> > the only one seeing this...
> 
> You're not. As noted, it's been widely reported.

Could you give me any pointers to where this has been discussed before ?

Would placing all of the ptsopen() and ptcclose() code under a giant
lock help ? Or is the problem somewhere else ?

Marc




Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-15 Thread Kris Kennaway
On Fri, Jul 15, 2005 at 12:05:39PM +0200, Marc Olzheim wrote:
> On Fri, Jul 15, 2005 at 11:40:27AM +0200, Marc Olzheim wrote:
> > On Thu, Jul 14, 2005 at 01:44:04PM -0400, Kris Kennaway wrote:
> > > > The -CURRENT traces are very different from these, but I don't have
> > > > comconsole on the -CURRENT machine.
> > > 
> > > Thanks.
> > 
> > Am I really the only one seeing this ? Or has someone been able to
> > reproduce it yet ?
> 
> Ok, even non-SMP 7-CURRENT crashes on it, so I do not believe that I'm
> the only one seeing this...

You're not. As noted, it's been widely reported.

Kris







Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-15 Thread Marc Olzheim
On Fri, Jul 15, 2005 at 11:40:27AM +0200, Marc Olzheim wrote:
> On Thu, Jul 14, 2005 at 01:44:04PM -0400, Kris Kennaway wrote:
> > > The -CURRENT traces are very different from these, but I don't have
> > > comconsole on the -CURRENT machine.
> > 
> > Thanks.
> 
> Am I really the only one seeing this ? Or has someone been able to
> reproduce it yet ?

Ok, even non-SMP 7-CURRENT crashes on it, so I do not believe that I'm
the only one seeing this...

Marc




Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-15 Thread Marc Olzheim
On Thu, Jul 14, 2005 at 01:44:04PM -0400, Kris Kennaway wrote:
> > The -CURRENT traces are very different from these, but I don't have
> > comconsole on the -CURRENT machine.
> 
> Thanks.

Since I get plenty of opportunities to fiddle in the debugger now,
because this happens a lot (we use screen a lot), tell me if anyone
needs any specific info from the debugger prompt.

Am I really the only one seeing this ? Or has someone been able to
reproduce it yet ?

> > Anyway, if people are having problems reproducing this, I'd like to
> > know. I don't have a single RELENG_5 or 6 machine that withstands the
> > 'screen-test'...
> 
> This will hopefully be very useful in investigating and developing a
> fix for the problem, at least on 6.0 (others have speculated that the
> problem may be too difficult to fix in the 5.x branch).

That sounds too scary to go into. Besides: it has worked on older
FreeBSD 5.x systems...

Perhaps 'others' could 'speculate' on list, so we might know why
they're thinking that ? :-P

Should SMP machines revert to FreeBSD 4.x in the meantime ? :-(

Marc




Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-14 Thread Kris Kennaway
On Thu, Jul 14, 2005 at 03:05:20PM +0200, Marc Olzheim wrote:

> > You need to obtain the debugging traceback for the panic and include
> > it in the PR.
> 
> Added two crash traces, one for the open() variant, one for the close()
> variant.
> 
> The -CURRENT traces are very different from these, but I don't have
> comconsole on the -CURRENT machine.

Thanks.

> Anyway, if people are having problems reproducing this, I'd like to
> know. I don't have a single RELENG_5 or 6 machine that withstands the
> 'screen-test'...

This will hopefully be very useful in investigating and developing a
fix for the problem, at least on 6.0 (others have speculated that the
problem may be too difficult to fix in the 5.x branch).

Kris






Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-14 Thread Marc Olzheim
On Wed, Jul 13, 2005 at 02:41:18PM -0400, Kris Kennaway wrote:
> > > Make sure you recompile any modules when activating INVARIANTS, or
> > > you'll get panics.
> > 
> > Of course... make buildkernel and make installkernel do that for me... ;)
> 
> Not if you are using third party port modules. 

Well, I'm not. ;-)

> > It seems that 5.4-RELEASE-p* is safe btw.
> > 
> > PR is http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/83375
> 
> You need to obtain the debugging traceback for the panic and include
> it in the PR.

Added two crash traces, one for the open() variant, one for the close()
variant.

The -CURRENT traces are very different from these, but I don't have
comconsole on the -CURRENT machine.

Anyway, if people are having problems reproducing this, I'd like to
know. I don't have a single RELENG_5 or 6 machine that withstands the
'screen-test'...

Marc




Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-13 Thread Kris Kennaway
On Wed, Jul 13, 2005 at 02:55:22PM +0200, Marc Olzheim wrote:
> On Wed, Jul 13, 2005 at 08:00:31AM -0400, Kris Kennaway wrote:
> > > Stress testing this gives me an instant fatal trap 12 on both July 11's
> > > CURRENT and RELENG_5.
> > > 
> > > I'll file a PR.
> > 
> > Make sure you recompile any modules when activating INVARIANTS, or
> > you'll get panics.
> 
> Of course... make buildkernel and make installkernel do that for me... ;)

Not if you are using third party port modules. 

> It seems that 5.4-RELEASE-p* is safe btw.
> 
> PR is http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/83375

You need to obtain the debugging traceback for the panic and include
it in the PR.

Kris





Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-13 Thread Marc Olzheim
On Wed, Jul 13, 2005 at 08:00:31AM -0400, Kris Kennaway wrote:
> > Stress testing this gives me an instant fatal trap 12 on both July 11's
> > CURRENT and RELENG_5.
> > 
> > I'll file a PR.
> 
> Make sure you recompile any modules when activating INVARIANTS, or
> you'll get panics.

Of course... make buildkernel and make installkernel do that for me... ;)
It seems that 5.4-RELEASE-p* is safe btw.

PR is http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/83375

Marc




Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-13 Thread Kris Kennaway
On Wed, Jul 13, 2005 at 11:29:39AM +0200, Marc Olzheim wrote:
> On Mon, Jul 11, 2005 at 04:32:16PM +0200, Marc Olzheim wrote:
> > On Sun, Jul 03, 2005 at 10:38:58AM +0200, Philippe PEGON wrote:
> > > >The panic appears to be an instance of a known bug in 5.4 (and
> > > >INVARIANTS will not fix it, but may just delay the inevitable by
> > > >changing timings).  See Doug White's recent emails which point to a
> > > >patch you should test.
> > > 
> > > If you think about this mail :
> > > 
> > > http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016165.html
> > > 
> > > and follow the thread, you will see that this patch doesn't solve the 
> > > problem.
> > > The last mail which I can see from doug white about this problem is :
> > > 
> > > http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016495.html
> > > 
> > > for the moment, it seems that there is no solution for 5.x
> > 
> > Well, the bug hasn't bitten me since I reactivated INVARIANTS, so I'll
> > keep this workaround active for now. :-/ If I don't enable INVARIANTS,
> > as soon as I start 'screen', it's a panic.
> 
> Stress testing this gives me an instant fatal trap 12 on both July 11's
> CURRENT and RELENG_5.
> 
> I'll file a PR.

Make sure you recompile any modules when activating INVARIANTS, or
you'll get panics.

Kris





Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-13 Thread Marc Olzheim
On Mon, Jul 11, 2005 at 04:32:16PM +0200, Marc Olzheim wrote:
> On Sun, Jul 03, 2005 at 10:38:58AM +0200, Philippe PEGON wrote:
> > >The panic appears to be an instance of a known bug in 5.4 (and
> > >INVARIANTS will not fix it, but may just delay the inevitable by
> > >changing timings).  See Doug White's recent emails which point to a
> > >patch you should test.
> > 
> > If you think about this mail :
> > 
> > http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016165.html
> > 
> > and follow the thread, you will see that this patch doesn't solve the 
> > problem.
> > The last mail which I can see from doug white about this problem is :
> > 
> > http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016495.html
> > 
> > for the moment, it seems that there is no solution for 5.x
> 
> Well, the bug hasn't bitten me since I reactivated INVARIANTS, so I'll
> keep this workaround active for now. :-/ If I don't enable INVARIANTS,
> as soon as I start 'screen', it's a panic.

Stress testing this gives me an instant fatal trap 12 on both July 11's
CURRENT and RELENG_5.

I'll file a PR.

Marc




Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-11 Thread Marc Olzheim
On Sun, Jul 03, 2005 at 10:38:58AM +0200, Philippe PEGON wrote:
> >The panic appears to be an instance of a known bug in 5.4 (and
> >INVARIANTS will not fix it, but may just delay the inevitable by
> >changing timings).  See Doug White's recent emails which point to a
> >patch you should test.
> 
> If you think about this mail :
> 
> http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016165.html
> 
> and follow the thread, you will see that this patch doesn't solve the 
> problem.
> The last mail which I can see from doug white about this problem is :
> 
> http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016495.html
> 
> for the moment, it seems that there is no solution for 5.x

Well, the bug hasn't bitten me since I reactivated INVARIANTS, so I'll
keep this workaround active for now. :-/ If I don't enable INVARIANTS,
as soon as I start 'screen', it's a panic.

Marc




Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-03 Thread Philippe PEGON

Kris Kennaway wrote:
> On Fri, Jul 01, 2005 at 03:03:35PM +0200, Marc Olzheim wrote:
> > On Fri, Jul 01, 2005 at 12:41:39PM +0200, Marc Olzheim wrote:
> > > On Fri, Jul 01, 2005 at 12:14:58PM +0200, Marc Olzheim wrote:
> > > > Somehow, this sounds familiar, i.e.: the "lock cmpxchgl":
> > > >
> > > > Fatal trap 12: page fault while in kernel mode
> > > ...
> > > > Stopped at  0xc05160c3 = knote+0x27:lock cmpxchgl   %ecx,0x1c(%edx)
> > >
> > > Somehow I think I solved this last time by activating 'INVARIANTS'...
> > > I'll try that now.
> >
> > Let's paraphrase:
> >
> > I think I solved this last time by activating 'INVARIANTS'...
> >
> > Anyway, tried that and yes, it didn't crash in the last few hours, so I
> > guess it works. Without INVARIANTS, it crashed within seconds.
> >
> > On the downside, my Gigabit performance dropped from 99 MB/sec to 80
> > MB/sec because of INVARIANTS.
>
> The panic appears to be an instance of a known bug in 5.4 (and
> INVARIANTS will not fix it, but may just delay the inevitable by
> changing timings).  See Doug White's recent emails which point to a
> patch you should test.

If you mean this mail:

http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016165.html

and follow the thread, you will see that this patch doesn't solve the problem.
The last mail I can see from Doug White about this problem is:

http://lists.freebsd.org/mailman/htdig/freebsd-stable/2005-June/016495.html

For the moment, it seems that there is no solution for 5.x.

> Kris


--
Philippe PEGON
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-01 Thread Kris Kennaway
On Fri, Jul 01, 2005 at 03:03:35PM +0200, Marc Olzheim wrote:
> On Fri, Jul 01, 2005 at 12:41:39PM +0200, Marc Olzheim wrote:
> > On Fri, Jul 01, 2005 at 12:14:58PM +0200, Marc Olzheim wrote:
> > > Somehow, this sounds familiar, i.e.: the "lock cmpxchgl":
> > > 
> > > Fatal trap 12: page fault while in kernel mode
> > ...
> > > Stopped at  0xc05160c3 = knote+0x27:lock cmpxchgl   %ecx,0x1c(%edx)
> > 
> > Somehow I think I solved this last time by activating 'INVARIANTS'...
> > I'll try that now.
> 
> Let's paraphrase:
> 
> I think I solved this last time by activating 'INVARIANTS'...
> 
> Anyway, tried that and yes, it didn't crash in the last few hours, so I
> guess it works. Without INVARIANTS, it crashed within seconds.
> 
> On the downside, my Gigabit performance dropped from 99 MB/sec to 80
> MB/sec because of INVARIANTS.

The panic appears to be an instance of a known bug in 5.4 (and
INVARIANTS will not fix it, but may just delay the inevitable by
changing timings).  See Doug White's recent emails which point to a
patch you should test.

Kris





Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-01 Thread Marc Olzheim
On Fri, Jul 01, 2005 at 12:41:39PM +0200, Marc Olzheim wrote:
> On Fri, Jul 01, 2005 at 12:14:58PM +0200, Marc Olzheim wrote:
> > Somehow, this sounds familiar, i.e.: the "lock cmpxchgl":
> > 
> > Fatal trap 12: page fault while in kernel mode
> ...
> > Stopped at  0xc05160c3 = knote+0x27:lock cmpxchgl   %ecx,0x1c(%edx)
> 
> Somehow I think I solved this last time by activating 'INVARIANTS'...
> I'll try that now.

Let's paraphrase:

I think I solved this last time by activating 'INVARIANTS'...

Anyway, tried that and yes, it didn't crash in the last few hours, so I
guess it works. Without INVARIANTS, it crashed within seconds.

On the downside, my Gigabit performance dropped from 99 MB/sec to 80
MB/sec because of INVARIANTS.

Marc




Re: Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-01 Thread Marc Olzheim
On Fri, Jul 01, 2005 at 12:14:58PM +0200, Marc Olzheim wrote:
> Somehow, this sounds familiar, i.e.: the "lock cmpxchgl":
> 
> Fatal trap 12: page fault while in kernel mode
...
> Stopped at  0xc05160c3 = knote+0x27:lock cmpxchgl   %ecx,0x1c(%edx)

Somehow I think I solved this last time by activating 'INVARIANTS'...
I'll try that now.

Marc




Today's RELENG_5_4 and 'lock cmpxchgl'

2005-07-01 Thread Marc Olzheim
Somehow, this sounds familiar, i.e.: the "lock cmpxchgl":

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x1c
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc05160c3
stack pointer           = 0x10:0xebf499ac
frame pointer           = 0x10:0xebf499b8
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1299 (screen)
[thread pid 1299 tid 100428 ]
Stopped at  0xc05160c3 = knote+0x27:lock cmpxchgl   %ecx,0x1c(%edx)
db> tr
Tracing pid 1299 tid 100428 td 0xc670cc00
knote(c5fdde80,0,0,c5fdde10,c5fdde00) at 0xc05160c3 = knote+0x27
ttwakeup(c5fdde00,c5fdde00,c5fdde00,c5f93000,ebf49a04) at 0xc0560ad9 = ttwakeup+0x65
ttymodem(c5fdde00,1) at 0xc055f73c = ttymodem+0x170
ptcopen(c5f93000,3,2000,c670cc00,c0717d40) at 0xc0563427 = ptcopen+0x63
spec_open(ebf49a70,ebf49b2c,c05913f9,ebf49a70,180) at 0xc04f4f82 = spec_open+0x2b6
spec_vnoperate(ebf49a70) at 0xc04f4cc7 = spec_vnoperate+0x13
vn_open_cred(ebf49bd4,ebf49cd4,0,c6614900,5) at 0xc05913f9 = vn_open_cred+0x419
vn_open(ebf49bd4,ebf49cd4,0,5,58) at 0xc0590fde = vn_open+0x1e
kern_open(c670cc00,bfbfdf40,0,3,0) at 0xc058af5b = kern_open+0xeb
open(c670cc00,ebf49d04,3,0,292) at 0xc058ae6c = open+0x18
syscall(bfbf002f,2f,bfbf002f,,28104c2d) at 0xc069e5e3 = syscall+0x2b3
Xint0x80_syscall() at 0xc068d2ff = Xint0x80_syscall+0x1f
--- syscall (5, FreeBSD ELF32, open), eip = 0x2816c7bb, esp = 0xbfbfdf0c, ebp = 0xbfbfdf68 ---

What am I doing wrong ?

It's an SMP dual Xeon machine. Same kernel config as I used on my older
kernels that didn't crash though...

Marc

