Quoth Hiroki Sato on Thursday, 18 August 2011:
> Hi,
>
> Mike Tancsa <m...@sentex.net> wrote
> in <4e15a08c.6090...@sentex.net>:
>
> mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> mi> >>
> mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> mi> >> is the sched lock N, on a busy 8-core box recently upgraded to
> mi> >> stable/8. Unfortunately, the machine hung while dumping core, so the
> mi> >> stack trace for the owner thread was not available.
> mi> >>
> mi> >> I was unable to draw any conclusion from the data that was present.
> mi> >> If the situation is reproducible, you could try to revert r221937.
> mi> >> This is pure speculation, though.
> mi> >
> mi> > Another crash just now after 5 hrs uptime. I will try to revert
> mi> > r221937 unless there is any extra debugging you want me to add to
> mi> > the kernel instead?
>
> I am also suffering from a reproducible panic on an 8-STABLE box, an
> NFS server with heavy I/O load. I could not get a kernel dump
> because this panic locked up the machine just after it occurred, but
> according to the stack trace it was the same as the one posted.
> Switching to an 8.2R kernel prevents this panic.
>
> Any progress on the investigation?
>
> --
> spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0
> (tid 100489) too long
> panic: spin lock held too long
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
> _mtx_lock_spin() at _mtx_lock_spin+0x9e
> sched_add() at sched_add+0x117
> setrunnable() at setrunnable+0x78
> sleepq_signal() at sleepq_signal+0x7a
> cv_signal() at cv_signal+0x3b
> xprt_active() at xprt_active+0xe3
> svc_vc_soupcall() at svc_vc_soupcall+0xc
> sowakeup() at sowakeup+0x69
> tcp_do_segment() at tcp_do_segment+0x25e7
> tcp_input() at tcp_input+0xcdd
> ip_input() at ip_input+0xac
> netisr_dispatch_src() at netisr_dispatch_src+0x7e
> ether_demux() at ether_demux+0x14d
> ether_input() at ether_input+0x17d
> em_rxeof() at em_rxeof+0x1ca
> em_handle_que() at em_handle_que+0x5b
> taskqueue_run_locked() at taskqueue_run_locked+0x85
> taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --
>
> -- Hiroki
I'm also getting similar panics on 8.2-STABLE. They lock up everything and I
have to power off. Once, I happened to be looking at the console when it
happened and copied down the following:

Sleeping thread (tid 100037, pid 0) owns a non-sleepable lock
panic: sleeping thread
cpuid = 1

Another time I got:

lock order reversal:
 1st 0xffffff000593e330 snaplk (snaplk) @ /usr/src/sys/kern/vfs_vnops.c:296
 2nd 0xffffff0005e5d578 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587

I didn't copy down the traceback.

These panics seem to hit when I'm doing heavy WAN I/O. I can go for about a
day without one as long as I stay away from the web or even chat. Last night
this system copied a 35 GB backup over the local network without failing, but
as soon as I hopped onto Firefox this morning, down she went. I don't know
if that's coincidence or useful data.

I didn't get to say "Thanks" to Eitan Adler for attempting to help me with
this on Monday night. Thanks, Eitan!

--
.O. | Sterling (Chip) Camden      | http://camdensoftware.com
..O | sterl...@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91              | http://chipstips.com