Re: Big problem still remains with 7.2-STABLE locking up
> In <3bbf2fe10906091757w35ffd5cfr6f091fc718bc8...@mail.gmail.com>
> Attilio Rao wrote:

> > Dcons session was recorded with script.
> > http://www.heimat.gr.jp/localhost/dcons.log

Just fixing my typo:

http://www.heimat.gr.jp/~nakaji/localhost/dcons.log
                        ^^^

> I'm following up privately with the user, news to come hopefully.

Thanks. I'll try.
-- 
NAKAJI Hiroyuki
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Big problem still remains with 7.2-STABLE locking up
2009/6/10 NAKAJI Hiroyuki:
> Thanks Attilio,
>
> I set up a dcons target/host pair. The target is 7.2-STABLE and the host is
> 6.4-STABLE.
>
> Dcons session was recorded with script.
> http://www.heimat.gr.jp/localhost/dcons.log

I'm following up privately with the user, news to come hopefully.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
Re: Big problem still remains with 7.2-STABLE locking up
Thanks Attilio,

I set up a dcons target/host pair. The target is 7.2-STABLE and the host is
6.4-STABLE.

Dcons session was recorded with script.
http://www.heimat.gr.jp/localhost/dcons.log

> In <3bbf2fe10906060749xbbc2f2fy4c09f67711a...@mail.gmail.com>
> Attilio Rao wrote:

> 2) Once you get the deadlock, break into the DDB debugger

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
vfs_badlock() at vfs_badlock+0x95
assert_vop_elocked() at assert_vop_elocked+0x64
VOP_WRITE_APV() at VOP_WRITE_APV+0x155
vn_write() at vn_write+0x1ce
dofilewrite() at dofilewrite+0x85
kern_pwritev() at kern_pwritev+0x66
pwrite() at pwrite+0x58
syscall() at syscall+0x1ce
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (476, FreeBSD ELF64, pwrite), rip = 0x80074d17c, rsp = 0x7fffda98, rbp = 0xb4000 ---
VOP_WRITE: 0xff004a26c000 is not exclusive locked but should be
KDB: enter: lock violation
[thread pid 29756 tid 100177 ]
Stopped at      kdb_enter_why+0x3d:     movq    $0,0x626418(%rip)

> 3) Once you are in DDB, information which could be very useful is:
db> show allpcpu
db> show alllocks
db> show lockedvnods
db> ps
db> allthreads

> Note that this is a lot of printout, so you won't be able to collect
> all this information without a serial connection.

db> show allpcpu
Current CPU: 0

cpuid        = 0
curthread    = 0xff0114476ab0: pid 29756 "expireover"
curpcb       = 0xff80a3f7ed40
fpcurthread  = none
idlethread   = 0xff0001589720: pid 11 "idle: cpu0"
spin locks held:

db> show alllocks
Process 29784 (spamc) thread 0xff004afac720 (100170)
exclusive sx so_rcv_sx r = 0 (0xff0065383c40) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 29413 (perl) thread 0xff004afb0720 (100162)
exclusive sx so_rcv_sx r = 0 (0xff00656163d0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 29409 (perl) thread 0xff01144ea390 (100175)
exclusive sx so_rcv_sx r = 0 (0xff004a210970) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 29406 (perl) thread 0xff0065899000 (100196)
exclusive sx so_rcv_sx r = 0 (0xff00655cbc40) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1497 (perl5.8.9) thread 0xff0013dedab0 (100113)
exclusive sx so_rcv_sx r = 0 (0xff004a0b9970) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1496 (ninpaths) thread 0xff001333a720 (100082)
exclusive sx so_rcv_sx r = 0 (0xff004a33d100) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1494 (perl5.8.9) thread 0xff0013dcf720 (100107)
exclusive sx so_rcv_sx r = 0 (0xff004a33d6a0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1397 (python2.5) thread 0xff0013dd1ab0 (100098)
exclusive sx so_rcv_sx r = 0 (0xff00655cc3d0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148

db> show lockedvnods
Locked vnodes

0xff004a26c000: tag ufs, type VREG
    usecount 5, writecount 2, refcount 587 mountedhere 0
    flags (VI_OBJDIRTY)
    v_object 0xff004a1fad80 ref 3 pages 4604
     lock type ufs: SHARED (count 1)
        ino 3157042, on dev ad4s1f

db> ps
  pid  ppid  pgrp   uid  state  wmesg   wchan           cmd
29803  1443  1406     8  S      nanslp  0x80b58688      sleep
29797  1534  1534     0  S      connec  0xff00654815fe  sendmail
29785  1499  1499   110  S      lockf   0xff0065a19000  perl
[snip]

db> allthreads
No such command

db> panic
panic: from debugger
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
db_panic() at db_panic+0x17
db_command() at db_command+0x1ef
db_command_loop() at db_command_loop+0x50
db_trap() at db_trap+0x89
kdb_trap() at kdb_trap+0x95
trap() at trap+0x295
calltrap() at calltrap+0x8
--- trap 0x3, rip = 0x8054694d, rsp = 0xff80a3f7e850, rbp = 0xff80a3f7e870 ---
kdb_enter_why() at kdb_enter_why+0x3d
assert_vop_elocked() at assert_vop_elocked+0x64
VOP_WRITE_APV() at VOP_WRITE_APV+0x155
vn_write() at vn_write+0x1ce
dofilewrite() at dofilewrite+0x85
kern_pwritev() at kern_pwritev+0x66
pwrite() at pwrite+0x58
syscall() at syscall+0x1ce
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (476, FreeBSD ELF64, pwrite), rip = 0x80074d17c, rsp = 0x7fffda98, rbp = 0xb4000 ---
Uptime: 3h20m7s
Physical memory: 6121 MB
Dumping 1730 MB: (CTRL-C to abort) ...
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.
Rebooting...

And here is a backtrace.

(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0x80517e73 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0x805182fc in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0x801c6817 in db_panic (addr=Variable "addr" is not available.
) at /usr/src/sys/ddb/db_command.c:446
#4  0x801c710f in db_command (last_cmdp=0x80b21088, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_comman
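
The assertion message and the `show lockedvnods` output in the session above refer to the same vnode address, which can be checked mechanically when the log is long. The following is a minimal sketch in Python, assuming the log has the line layout of the excerpt embedded below; the function names (`locked_vnodes`, `violations`) and the regular expressions are hypothetical helpers written for this illustration, not part of DDB or any FreeBSD tool.

```python
import re

# Excerpt copied from the DDB session above (addresses exactly as they appear in the log).
DDB_LOG = """\
VOP_WRITE: 0xff004a26c000 is not exclusive locked but should be
db> show lockedvnods
Locked vnodes

0xff004a26c000: tag ufs, type VREG
    usecount 5, writecount 2, refcount 587 mountedhere 0
    flags (VI_OBJDIRTY)
    v_object 0xff004a1fad80 ref 3 pages 4604
     lock type ufs: SHARED (count 1)
        ino 3157042, on dev ad4s1f
"""

# Assumed line formats, inferred only from the excerpt above.
ASSERT_RE = re.compile(r"^VOP_\w+: (0x[0-9a-f]+) is not exclusive locked", re.M)
VNODE_RE = re.compile(r"^(0x[0-9a-f]+): tag (\w+), type (\w+)", re.M)
LOCK_RE = re.compile(r"lock type \w+: (\w+)")


def locked_vnodes(log):
    """Map vnode address -> lock state as printed by `show lockedvnods`."""
    vnodes = {}
    matches = list(VNODE_RE.finditer(log))
    for i, m in enumerate(matches):
        # A vnode record runs until the next vnode header (or the end of the log).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log)
        lock = LOCK_RE.search(log, m.end(), end)
        vnodes[m.group(1)] = lock.group(1) if lock else None
    return vnodes


def violations(log):
    """Pair each vnode named by a failed exclusive-lock assertion with its lock state."""
    held = locked_vnodes(log)
    return [(addr, held.get(addr)) for addr in ASSERT_RE.findall(log)]


print(violations(DDB_LOG))  # [('0xff004a26c000', 'SHARED')]
```

For this log the result pairs the vnode from the `VOP_WRITE` assertion with the `SHARED` lock reported by `show lockedvnods`, which matches the "not exclusive locked" message. The sketch only restates what the log already shows and does not diagnose the cause.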
Re: Big problem still remains with 7.2-STABLE locking up
I got a lockup at 3 a.m. JST, but because dcons is not set up yet, I
cannot show you the whole ddb session.

I have put up the 'bt' output from kgdb.
http://www.heimat.gr.jp/localhost/kgdbbtvmcore.0

Kernel config:

include GENERIC
ident HEIMAT
options MSGBUF_SIZE=81920
makeoptions DEBUG=-g
options KDB
options KDB_TRACE
options KDB_UNATTENDED
options DDB
options BREAK_TO_DEBUGGER
options QUOTA
options DEVICE_POLLING
options HZ=1000
options SW_WATCHDOG
options DEBUG_VFS_LOCKS
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS

Thanks.

P.S.
"allthreads" was not a usable command in my RELENG_7's ddb.

> In <3bbf2fe10906060749xbbc2f2fy4c09f67711a...@mail.gmail.com>
> Attilio Rao wrote:

> Anyway, the only way we have to debug this is with some help
> from the user.
> 1) Drop the options WITNESS_SKIPSPIN (as we would like to debug
> spinlocks too) and LOCK_PROFILING (in order to create higher
> contention and kill some barriers)
> 2) Once you get the deadlock, break into the DDB debugger
> 3) Once you are in DDB, information which could be very useful is:
db> show allpcpu
db> show alllocks
db> show lockedvnods
db> ps
db> allthreads

> Note that this is a lot of printout, so you won't be able to collect
> all this information without a serial connection.

> 4) Dump the content so that we can look further at the lock structure
> states once we identify something useful (ideally, keeping the machine
> up in DDB for that would be very useful, but often not viable)

> Let me know.

> Attilio
-- 
NAKAJI Hiroyuki
Re: Big problem still remains with 7.2-STABLE locking up
NAKAJI Hiroyuki wrote:
>> Note that this is a lot of printout, so you won't be able to collect
>> all this information without a serial connection.
>
> The box does not have any serial port. Is there any other way? Is it
> possible to use dcons(4) for that purpose, if I add a FireWire PCI board?

http://wiki.freebsd.org/DebugWithDcons may be of use to you.

Regards,

Jase Thew.
Re: Big problem still remains with 7.2-STABLE locking up
> In <3bbf2fe10906060749xbbc2f2fy4c09f67711a...@mail.gmail.com>
> Attilio Rao wrote:

> > The kernel configuration is:
> >
> > include GENERIC
> > ident HEIMAT
> > options MSGBUF_SIZE=81920
> > makeoptions DEBUG=-g
> > options KDB
> > options DDB
> > options BREAK_TO_DEBUGGER
> > options QUOTA

> Were you unmounting any of the QUOTA'ed filesystems?

No. The quota'ed file system is /home, which is not easily unmounted.

> Anyway, the only way we have to debug this is with some help
> from the user.
> 1) Drop the options WITNESS_SKIPSPIN (as we would like to debug
> spinlocks too) and LOCK_PROFILING (in order to create higher
> contention and kill some barriers)

Removed two lines from KERNCONF.

> 2) Once you get the deadlock, break into the DDB debugger

Hmm. That is the most difficult part: for now, the box cannot break
into the DDB debugger.

> 3) Once you are in DDB, information which could be very useful is:
db> show allpcpu
db> show alllocks
db> show lockedvnods
db> ps
db> allthreads

> Note that this is a lot of printout, so you won't be able to collect
> all this information without a serial connection.

The box does not have any serial port. Is there any other way? Is it
possible to use dcons(4) for that purpose, if I add a FireWire PCI board?

> 4) Dump the content so that we can look further at the lock structure
> states once we identify something useful (ideally, keeping the machine
> up in DDB for that would be very useful, but often not viable)

Thank you for the instructions. I'll try.
-- 
NAKAJI Hiroyuki
Re: Big problem still remains with 7.2-STABLE locking up
> Pete French wrote:

> > I followed some instructions in the list thread. But unfortunately, the
> > big problem still remains. 7.2-STABLE server locks up frequently.

> Are you using the latest STABLE ? I am rolling out the one from a few
> days ago with the bce fixes, and that works fine.

Yes, the kernel was compiled at Sat Jun 6 17:59:50 JST 2009.

I did not check the source changes, but I need to watch how it works.
Yesterday's kernel can easily lock up, and ichwd+watchdogd can restart
(reset?) the box.

> > The kernel configuration is:
> ...
> > options BREAK_TO_DEBUGGER

> When the box locks up, can you actually break to the debugger ? This is how
> we eventually tracked down my problem.

No. I have never seen the debugger; Ctrl+Alt+Esc does not break into it.
And because this box does not have a serial port, debugging seems difficult.
-- 
NAKAJI Hiroyuki
Re: Big problem still remains with 7.2-STABLE locking up
> My story is very similar to Pete's.
> http://lists.freebsd.org/pipermail/freebsd-stable/2009-January/047487.html

My problem, which you link to there, turned out to be due to ICMP redirects,
and is most definitely fixed in 7.2. So, your problem is not the same
as mine, but some of the tips given there may help you debug it.

> I followed some instructions in the list thread. But unfortunately, the
> big problem still remains. 7.2-STABLE server locks up frequently.

Are you using the latest STABLE ? I am rolling out the one from a few
days ago with the bce fixes, and that works fine.

> The kernel configuration is:
...
> options BREAK_TO_DEBUGGER

When the box locks up, can you actually break to the debugger ? This is how
we eventually tracked down my problem.

-pete.
Re: Big problem still remains with 7.2-STABLE locking up
2009/6/6 NAKAJI Hiroyuki:
> Hi,
>
> I noticed, some months ago, frequent lockups on my RELENG_6 server with
> ECS PM800-M2, Celeron 2.6GHz (UP), 2GB ram, ATA HDDs and 3Com NIC(xl0),
> and then I gave up on this old server.
>
> Last month, I replaced this 'unstable' server with a new one running
> 7.2-RELEASE, which worked very well until I set it up as 'a server'. The
> problem began just after it started 'the services'.
>
> My story is very similar to Pete's.
> http://lists.freebsd.org/pipermail/freebsd-stable/2009-January/047487.html
>
> I followed some instructions in the list thread. But unfortunately, the
> big problem still remains. 7.2-STABLE server locks up frequently.
>
> Help! :-(
>
> The server is NEC Express5800 S70/SD.
>
> o CPU: Intel(R) Celeron(R) CPU 440 @ 2.00GHz (2280.25-MHz K8-class CPU)
> o 6GB RAM
> o ACPI APIC Table:
> o 80GB and 250GB SATA HDDs
> o http://www.heimat.gr.jp/~nakaji/localhost/dmesg.boot
>
> The kernel configuration is:
>
> include GENERIC
> ident HEIMAT
> options MSGBUF_SIZE=81920
> makeoptions DEBUG=-g
> options KDB
> options DDB
> options BREAK_TO_DEBUGGER
> options QUOTA

Were you unmounting any of the QUOTA'ed filesystems?
I'm aware of a possible deadlock between the quota and unmount paths,
which is very difficult to trigger though.

Anyway, the only way we have to debug this is with some help
from the user.
1) Drop the options WITNESS_SKIPSPIN (as we would like to debug
spinlocks too) and LOCK_PROFILING (in order to create higher
contention and kill some barriers)
2) Once you get the deadlock, break into the DDB debugger
3) Once you are in DDB, information which could be very useful is:
db> show allpcpu
db> show alllocks
db> show lockedvnods
db> ps
db> allthreads

Note that this is a lot of printout, so you won't be able to collect
all this information without a serial connection.

4) Dump the content so that we can look further at the lock structure
states once we identify something useful (ideally, keeping the machine
up in DDB for that would be very useful, but often not viable)

Let me know.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
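
Step 1 above can be checked against a kernel configuration file before rebuilding. The following is a minimal sketch in Python, assuming one `options NAME` or `options NAME=value` entry per line with `#` comments; it does not follow `include GENERIC`, so options inherited from an included file are not seen. `WITNESS_SKIPSPIN` is the option name as spelled in the FreeBSD sources. The helper names (`parse_options`, `check`) and the `WANTED` set (the debugger options from the posted config, assumed here to be what step 2 relies on) are hypothetical choices for this illustration.

```python
# Options that step 1 asks to drop.
UNWANTED = {"WITNESS_SKIPSPIN", "LOCK_PROFILING"}
# Assumption: options from the posted config that step 2 (breaking into DDB) relies on.
WANTED = {"KDB", "DDB", "BREAK_TO_DEBUGGER"}

# The kernel configuration as posted in the thread.
THREAD_CONFIG = """\
include GENERIC
ident HEIMAT
options MSGBUF_SIZE=81920
makeoptions DEBUG=-g
options KDB
options DDB
options BREAK_TO_DEBUGGER
options QUOTA
"""


def parse_options(text):
    """Collect option names from `options NAME[=value]` lines, ignoring # comments."""
    names = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "options":
            names.add(fields[1].split("=", 1)[0])
    return names


def check(text):
    """Return (options that should be dropped, wanted options that are missing)."""
    opts = parse_options(text)
    return sorted(opts & UNWANTED), sorted(WANTED - opts)


print(check(THREAD_CONFIG))  # ([], [])
```

For the posted configuration both lists come back empty: neither of the two options from step 1 is set in this file itself, and the three debugger options are present.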