Re: 7.2-release/amd64: panic, spin lock held too long
C. C. Tang wrote:
> Attilio Rao wrote:
>> 2009/9/22 C. C. Tang :
>>> I have patched sched_ule.c, did a make buildkernel and make installkernel (are buildworld and installworld necessary?), rebooted, and the machine is running now. I will post here again if there is any update.
>>>
>>> My server has been up for 3.5 days now with HyperThreading and powerd enabled. No panic has occurred yet.
>>
>> How long did it usually take to panic?
>>
>> Attilio
>
> It is rather random, but it will usually panic within one week. Anyway, my server will keep running and I will report if there is any problem.
>
> Thanks,
> C.C.

My server has been up for 9.5 days now. It seems to be working fine.

C.C.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 7.2-release/amd64: panic, spin lock held too long
Attilio Rao wrote:
> 2009/9/22 C. C. Tang :
>> I have patched sched_ule.c, did a make buildkernel and make installkernel (are buildworld and installworld necessary?), rebooted, and the machine is running now. I will post here again if there is any update.
>>
>> My server has been up for 3.5 days now with HyperThreading and powerd enabled. No panic has occurred yet.
>
> How long did it usually take to panic?
>
> Attilio

It is rather random, but it will usually panic within one week. Anyway, my server will keep running and I will report if there is any problem.

Thanks,
C.C.
Re: 7.2-release/amd64: panic, spin lock held too long
I have patched sched_ule.c, did a make buildkernel and make installkernel (are buildworld and installworld necessary?), rebooted, and the machine is running now. I will post here again if there is any update.

My server has been up for 3.5 days now with HyperThreading and powerd enabled. No panic has occurred yet.

C.C.
Re: 7.2-release/amd64: panic, spin lock held too long
Attilio Rao wrote:
> 2009/9/17 C. C. Tang :
>>>> Dan, is that machine equipped with Hyperthreading?
>>>>
>>>> Attilio
>>>
>>> Yes. It's an Intel Atom 330, which is a dual-core CPU with HT (4 cores visible in "top" as a result).
>>
>> Yes, mine is also an Atom 330. I cannot test the patch because my machine is also in production now. But I have tested it with HyperThreading on and off:
>>
>> powerd with HyperThreading -> spin lock held too long
>> powerd without HyperThreading -> no problem
>> no powerd, with or without HyperThreading -> no problem
>
> But are these results with the last patch I posted? (specifically, for 7.2: http://www.freebsd.org/~attilio/sched_ule.diff )
> So with the patch in, and powerd and HyperThreading on, you still get a deadlock?
>
> Attilio

I have patched sched_ule.c, did a make buildkernel and make installkernel (are buildworld and installworld necessary?), rebooted, and the machine is running now. I will post here again if there is any update.

Thanks,
C.C.
Re: 7.2-release/amd64: panic, spin lock held too long
Attilio Rao wrote:
> 2009/9/17 C. C. Tang :
>>>> Dan, is that machine equipped with Hyperthreading?
>>>>
>>>> Attilio
>>>
>>> Yes. It's an Intel Atom 330, which is a dual-core CPU with HT (4 cores visible in "top" as a result).
>>
>> Yes, mine is also an Atom 330. I cannot test the patch because my machine is also in production now. But I have tested it with HyperThreading on and off:
>>
>> powerd with HyperThreading -> spin lock held too long
>> powerd without HyperThreading -> no problem
>> no powerd, with or without HyperThreading -> no problem
>
> But are these results with the last patch I posted? (specifically, for 7.2: http://www.freebsd.org/~attilio/sched_ule.diff )
> So with the patch in, and powerd and HyperThreading on, you still get a deadlock?
>
> Attilio

No, the kernel is not patched.

C.C.
Re: 7.2-release/amd64: panic, spin lock held too long
>> Dan, is that machine equipped with Hyperthreading?
>>
>> Attilio
>
> Yes. It's an Intel Atom 330, which is a dual-core CPU with HT (4 cores visible in "top" as a result).

Yes, mine is also an Atom 330. I cannot test the patch because my machine is also in production now. But I have tested it with HyperThreading on and off:

powerd with HyperThreading -> spin lock held too long
powerd without HyperThreading -> no problem
no powerd, with or without HyperThreading -> no problem

This blog article also describes the same situation: http://hype-o-thetic.com/2009/07/09/freenas-d945gclf2-configuration/

By the way, I found that the coretemp readings don't change significantly when I enable powerd, so I doubt whether powerd is very useful for the Atom CPU.

Thanks very much,
C.C.
Re: 7.2-release/amd64: panic, spin lock held too long
Attilio Rao wrote:
> 2009/7/22 C. C. Tang :
>>> Could that one (on i386) be related?
>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/134584
>>
>> I have no idea about it, but I can tell the difference... My machine panics randomly rather than on shutdown, and I remember that it failed to write a core dump. It also failed to reboot automatically.
>
> Is your problem on -CURRENT and amd64?
> At some point there was a problem with PAT support (tlb_shootdowns() could lead to a livelock hanging forever, causing such a bug), but I expect it is fixed now.
> Can you try with a fresh -CURRENT, if possible?

My problem is on the i386 version of 7.2-RELEASE-p2 on an Intel Atom 330 CPU, and my system just panics randomly with "spin lock held too long". It didn't panic at reboot or shutdown, so I think the problem is somewhat different from the one in Barbara's PR? Anyway, I disabled powerd and it seems to be stable now. And I am sorry, but my system has been put into service, so it would be hard for me to switch to -CURRENT... :(

Regards,
C.C.
Re: 7.2-release/amd64: panic, spin lock held too long
> Could that one (on i386) be related?
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/134584

I have no idea about it, but I can tell the difference... My machine panics randomly rather than on shutdown, and I remember that it failed to write a core dump. It also failed to reboot automatically.

Regards,
CC
Re: 7.2-release/amd64: panic, spin lock held too long
Attilio Rao wrote:
> 2009/7/8 Dan Naumov :
>> On Wed, Jul 8, 2009 at 3:57 AM, Dan Naumov wrote:
>>> On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao wrote:
>>>> 2009/7/7 Dan Naumov :
>>>>> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao wrote:
>>>>>> 2009/7/7 Dan Naumov :
>>>>>>> I just got a panic followed by a reboot a few seconds after running "portsnap update". /var/log/messages shows the following:
>>>>>>>
>>>>>>> Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
>>>>>>> Jul 7 03:49:38 atom kernel: spin lock 0x80b3edc0 (sched lock 1) held by 0xff00017d8370 (tid 100054) too long
>>>>>>> Jul 7 03:49:38 atom kernel: panic: spin lock held too long
>>>>>>
>>>>>> That's a known bug, affecting -CURRENT as well.
>>>>>> The cpustop IPI is handled through an NMI, which means it can interrupt a CPU at any moment, even while it is holding a spinlock, violating one well-known FreeBSD rule.
>>>>>> That means the CPU can stop itself while the thread is holding the sched lock spinlock, so the lock is never released (there is no way to fix that, short of something highly hackish). Meanwhile, hardclock() wants to schedule something else to run and gets stuck on the thread lock.
>>>>>> The ideal fix would involve not using an NMI to serve the cpustop, while having a cheap way (one that does not make the common path too heavy) to tell hardclock() to avoid scheduling while a cpustop is in flight.
>>>>>>
>>>>>> Thanks,
>>>>>> Attilio
>>>>>
>>>>> Any idea if a fix is being worked on, and how unlucky one must be to run into this issue? Should I expect it to happen again? Is it basically completely random?
>>>>
>>>> I'd like to work on that issue before BETA3 (and backport it to STABLE_7); I'm just time-constrained right now.
>>>> It is completely random.
>>>>
>>>> Thanks,
>>>> Attilio
>>>
>>> Ok, this is getting pretty bad: 23 hours later, I get the same kind of panic. The only difference is that instead of "portsnap update", this was triggered by "portsnap cron", which I have running between 3 and 4 am every day:
>>>
>>> Jul 8 03:03:49 atom kernel: ssppiinn lloocckk 00xx8800bb33eeddc400 ((sscchheedd lloocck k1 )0 )h ehledl db yb y 0x0xfff0f1081735339760e 0( t(itdi d 1016070)5 )t otoo ol olnogng
>>> Jul 8 03:03:49 atom kernel: p
>>> Jul 8 03:03:49 atom kernel: anic: spin lock held too long
>>> Jul 8 03:03:49 atom kernel: cpuid = 0
>>> Jul 8 03:03:49 atom kernel: Uptime: 23h2m38s
>>
>> I have now tried to reproduce the problem by running "stress --cpu 8 --io 8 --vm 4 --vm-bytes 1024M --timeout 600s --verbose", which pushed the system load into the 15.50 ballpark, while simultaneously running "portsnap fetch" and "portsnap update", but I couldn't trigger the panic manually. It seems this problem is indeed random (although it baffles me why it is specifically portsnap that triggers it). I have now disabled powerd to check whether that makes any difference to system stability.
>
> But is that happening at reboot time?
>
> Thanks,
> Attilio

I think I am also having a similar problem on my Atom machine (FreeBSD 7.2-RELEASE-p1). It does not happen at boot/reboot; it panics randomly. And I found that it has remained stable for more than a month now since I disabled powerd... (although I want to have it enabled)

--
C.C.