Re: 7.2-release/amd64: panic, spin lock held too long

2009-09-27 Thread C. C. Tang

C. C. Tang wrote:

Attilio Rao wrote:

2009/9/22 C. C. Tang :



I have patched the sched_ule.c and did a make buildkernel & make
installkernel (is buildworld and installworld necessary?), rebooted and the
machine is running now.
I will post here again if there is any update.

My server has been up for 3.5 days now with HyperThreading & powerd enabled.
No panic has occurred yet.


How long did it usually take to panic?

Attilio



It is rather random, but it would usually panic within one week.
Anyway, my server will keep running and I will report if it has any problems.

Thanks,
C.C.


My server has been up for 9.5 days now. It seems to be working fine.

C.C.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 7.2-release/amd64: panic, spin lock held too long

2009-09-21 Thread C. C. Tang

Attilio Rao wrote:

2009/9/22 C. C. Tang :



I have patched the sched_ule.c and did a make buildkernel & make
installkernel (is buildworld and installworld necessary?), rebooted and the
machine is running now.
I will post here again if there is any update.

My server has been up for 3.5 days now with HyperThreading & powerd enabled.
No panic has occurred yet.


How long did it usually take to panic?

Attilio



It is rather random, but it would usually panic within one week.
Anyway, my server will keep running and I will report if it has any problems.

Thanks,
C.C.



Re: 7.2-release/amd64: panic, spin lock held too long

2009-09-21 Thread C. C. Tang




I have patched the sched_ule.c and did a make buildkernel & make
installkernel (is buildworld and installworld necessary?), rebooted and the
machine is running now.
I will post here again if there is any update.


My server has been up for 3.5 days now with HyperThreading & powerd enabled.
No panic has occurred yet.

C.C.



Re: 7.2-release/amd64: panic, spin lock held too long

2009-09-18 Thread C. C. Tang

Attilio Rao wrote:

2009/9/17 C. C. Tang :

Dan, is that machine equipped with Hyperthreading?

Attilio

Yes. It's an Intel Atom 330, which is a dual-core CPU with HT (4 cores
visible in "top" as a result)

Yes, mine is also Atom 330.

I cannot test the patch because my machine is also in production now. But I
have tested it with hyperthreading.
powerd with HyperThreading -> spin lock held too long
powerd without HyperThreading -> no problem
no powerd with/without HyperThreading -> no problem


But are these results with the last patch I posted applied?
(specifically, for 7.2:
http://www.freebsd.org/~attilio/sched_ule.diff
)

So with the patch in, and with powerd and hyperthreading on, you still get a
deadlock?


Attilio


I have patched the sched_ule.c and did a make buildkernel & make 
installkernel (is buildworld and installworld necessary?), rebooted and 
the machine is running now.

I will post here again if there is any update.

Thanks,
C.C.


Re: 7.2-release/amd64: panic, spin lock held too long

2009-09-17 Thread C. C. Tang

Attilio Rao wrote:

2009/9/17 C. C. Tang :

Dan, is that machine equipped with Hyperthreading?

Attilio

Yes. It's an Intel Atom 330, which is a dual-core CPU with HT (4 cores
visible in "top" as a result)

Yes, mine is also Atom 330.

I cannot test the patch because my machine is also in production now. But I
have tested it with hyperthreading.
powerd with HyperThreading -> spin lock held too long
powerd without HyperThreading -> no problem
no powerd with/without HyperThreading -> no problem


But are these results with the last patch I posted applied?
(specifically, for 7.2:
http://www.freebsd.org/~attilio/sched_ule.diff
)

So with the patch in, and with powerd and hyperthreading on, you still get a deadlock?

Attilio



No, the kernel is not patched.

C.C.


Re: 7.2-release/amd64: panic, spin lock held too long

2009-09-16 Thread C. C. Tang

Dan, is that machine equipped with Hyperthreading?

Attilio


Yes. It's an Intel Atom 330, which is a dual-core CPU with HT (4 cores
visible in "top" as a result)


Yes, mine is also Atom 330.

I cannot test the patch because my machine is also in production now. 
But I have tested it with hyperthreading.

powerd with HyperThreading -> spin lock held too long
powerd without HyperThreading -> no problem
no powerd with/without HyperThreading -> no problem

This blog article also describes the same situation:
http://hype-o-thetic.com/2009/07/09/freenas-d945gclf2-configuration/


By the way, I found that the coretemp readings don't change significantly
when powerd is enabled, so I suspect that powerd may not be very
useful on an Atom CPU?


Thanks very much,
C.C.


Re: 7.2-release/amd64: panic, spin lock held too long

2009-07-22 Thread C. C. Tang

Attilio Rao wrote:

2009/7/22 C. C. Tang :

Could that one (on i386) be related?
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/134584


I have no idea about it, but I can tell the difference...
My machine panics randomly rather than on shutdown, and I remember that it
failed to write a core dump. It also failed to reboot automatically.


Is your problem on -CURRENT and amd64?
At some point there was a problem with PAT support (and
tlb_shootdowns() could lead to a livelock, hanging forever and leading to
such a bug), but I expect it is fixed now.
Can you try with a fresh -CURRENT, if possible?


My problem is on the i386 version of 7.2-RELEASE-p2 on an Intel Atom 330 CPU.
My system just panics randomly with "spin lock held too long".
It didn't panic at reboot or shutdown, so I think the problem is
somewhat different from the one mentioned in Barbara's PR?


Anyway, I disabled powerd and it seems to have become stable now.

And I am sorry that my system has been put into service so it would be 
hard for me to switch to -CURRENT...  :(


Regards,
C.C.


Re: 7.2-release/amd64: panic, spin lock held too long

2009-07-21 Thread C. C. Tang

> Could that one (on i386) be related?
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/134584
>

I have no idea about it, but I can tell the difference...
My machine panics randomly rather than on shutdown, and I remember that
it failed to write a core dump. It also failed to reboot automatically.


Regards,
CC


Re: 7.2-release/amd64: panic, spin lock held too long

2009-07-16 Thread C. C. Tang

Attilio Rao wrote:

2009/7/8 Dan Naumov :

On Wed, Jul 8, 2009 at 3:57 AM, Dan Naumov wrote:

On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao wrote:

2009/7/7 Dan Naumov :

On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao wrote:

2009/7/7 Dan Naumov :

I just got a panic followed by a reboot a few seconds after running
"portsnap update", /var/log/messages shows the following:

Jul  7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
Jul  7 03:49:38 atom kernel: spin lock 0x80b3edc0 (sched lock 1) held by 0xff00017d8370 (tid 100054) too long
Jul  7 03:49:38 atom kernel: panic: spin lock held too long

That's a known bug, affecting -CURRENT as well.
The cpustop IPI is handled through an NMI, which means it can
interrupt a CPU at any moment, even while it is holding a spinlock,
violating a well-known FreeBSD rule.
That means a CPU can stop itself while its thread is holding the
sched lock spinlock, without releasing it (there is no way, short of
something highly hackish, to fix that).
In the meantime, hardclock() wants to schedule something else to run and
gets stuck on the thread lock.

The ideal fix would involve not using an NMI for serving the cpustop while
having a cheap way (not making the common path too hard) to tell
hardclock() to avoid scheduling while a cpustop is in flight.
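
To illustrate the failure mode described above, here is a toy,
single-threaded Python model. It is not FreeBSD kernel code: the names
SpinLock, SPIN_LIMIT and simulate are hypothetical, and the spin limit is
an arbitrary assumption. It only sketches why a waiter gives up when the
lock holder is stopped before it can release the lock.

```python
# Toy model of the deadlock described above; not FreeBSD code.
# SpinLock, SPIN_LIMIT and simulate() are hypothetical names.

SPIN_LIMIT = 1000  # assumed number of spins a waiter tolerates


class SpinLock:
    def __init__(self, name):
        self.name = name
        self.owner = None  # id of the CPU currently holding the lock

    def try_acquire(self, cpu):
        # Succeeds only when nobody holds the lock.
        if self.owner is None:
            self.owner = cpu
            return True
        return False

    def release(self, cpu):
        assert self.owner == cpu
        self.owner = None


def simulate(nmi_while_holding):
    """CPU 1 takes the sched lock; CPU 0 (hardclock) then spins on it.

    If nmi_while_holding is True, CPU 1 is stopped by the NMI before it
    can release the lock, so CPU 0 never acquires it.
    """
    lock = SpinLock("sched lock 1")
    assert lock.try_acquire(cpu=1)
    cpu1_stopped = nmi_while_holding  # stop request delivered as an NMI

    for spins in range(SPIN_LIMIT):
        # CPU 1 makes progress only if it was not stopped.
        if not cpu1_stopped and lock.owner == 1:
            lock.release(cpu=1)
        if lock.try_acquire(cpu=0):
            return "acquired after %d spins" % spins
    return "panic: spin lock held too long"
```

Calling simulate(False) models the normal case, where the holder releases
the lock and the waiter gets it; simulate(True) models the holder being
stopped mid-critical-section, so the waiter exhausts its spin budget.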

Thanks,
Attilio

Any idea if a fix is being worked on and how unlucky must one be to
run into this issue, should I expect it to happen again? Is it
basically completely random?

I'd like to work on that issue before BETA3 (and backport to
STABLE_7), I'm just time-constrained right now.
It is completely random.

Thanks,
Attilio

Ok, this is getting pretty bad, 23 hours later, I get the same kind of
panic, the only difference is that instead of "portsnap update", this
was triggered by "portsnap cron" which I have running between 3 and 4
am every day:

Jul  8 03:03:49 atom kernel: ssppiinn  lloocckk
00xx8800bb33eeddc400  ((sscchheedd  lloocck k1 )0 )h
ehledl db yb y 0x0xfff0f1081735339760e 0( t(itdi d
1016070)5 )t otoo ol olnogng
Jul  8 03:03:49 atom kernel: p
Jul  8 03:03:49 atom kernel: anic: spin lock held too long
Jul  8 03:03:49 atom kernel: cpuid = 0
Jul  8 03:03:49 atom kernel: Uptime: 23h2m38s

I have now tried repeating the problem by running "stress --cpu 8 --io
8 --vm 4 --vm-bytes 1024M --timeout 600s --verbose" which pushed
system load into the 15.50 ballpark and simultaneously running
"portsnap fetch" and "portsnap update" but I couldn't manually trigger
the panic, it seems that this problem is indeed random (although it
baffles me why it is specifically portsnap that triggers it). I have now
disabled powerd to check whether that makes any difference to system
stability.


But is that happening at reboot time?

Thanks,
Attilio



I think I am also having a similar problem on my Atom machine
(FreeBSD-7.2-Release-p1).

It does not happen at boot/reboot; the machine panics randomly.
I have found that it has remained stable for more than a month now since I
disabled powerd... (although I would like to have it enabled)


--
C.C.