On Fri, 13 Mar 2009, Nick Withers wrote:

Sorry for the original double-post, by the way, not quite sure how that happened...

I can reproduce this problem relatively easily, by the way (every 3 days, on average). I meant to say this before, too, but it seems to happen a lot more often on the fxp than on rl.

I'm sorry to ask what is probably a very simple question, but is there somewhere I should look to get clues on debugging from a manually generated dump? I tried "panic" after manually envoking the kernel debugger but proved highly inept at getting from the dump the same information "ps" / "where" gave me within the debugger live.

If this is, in fact, a TCP input lock leak of some sort, then most likely some particular property of a host your system talks to, or a network it runs over, triggers this (presumably) unusual edge case -- perhaps a firewall that mucks with TCP in a funny way, etc. Of course, it might be something completely different -- the fact that everything is blocked on *tcp_sc_h and *tcp, simply means that something holding TCP locks hasn't released them, and this could happen for a number of reasons.

Once you've acquired a crashdump, you can run crashinfo(8), which will produce a summary of useful debugging information. There are some things that are a bit easier to do in the run-time debugger, such as lock analysis, as the run-time debugger is more up-close and personal with in-kernel data structures; other things are easier in kgdb, which has complete source code and C type access. I find kgdb works pretty well for everything but "show much what locks are held". Many of our system monitoring tools, including ps and portions of netstat, can actually be run on crashdumps to report the state of the system at the time it crashed -- take a look at the -M and -N command line arguments, which respectively allow you to point those tools at the crashdump and at a kernel with debugging symbols (typically kernel.debug or kernel.symbols) matching the kernel that was booted at the time of the crash.

Robert N M Watson
Computer Laboratory
University of Cambridge


Ta for your help!

Robert N M Watson
Computer Laboratory
University of Cambridge



Tracing PID 31 tid 100030 td 0xffffff00012016e0
sched_switch() at sched_switch+0xf1
mi_switch() at mi_switch+0x18f
turnstile_wait() at turnstile_wait+0x1cf
_mtx_lock_sleep() at _mtx_lock_sleep+0x76
syncache_lookup() at syncache_lookup+0x176
syncache_expand() at syncache_expand+0x38
tcp_input() at tcp_input+0xa7d
ip_input() at ip_input+0xa8
ether_demux() at ether_demux+0x1b9
ether_input() at ether_input+0x1bb
fxp_intr() at fxp_intr+0x233
ithread_loop() at ithread_loop+0x17f
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
____

A "where" on a process stuck in "*tcp", in this case "[swi4: clock]",
gave the somewhat similar:
____

sched_switch() at sched_switch+0xf1
mi_switch() at mi_switch+0x18f
turnstile_wait() at turnstile_wait+0x1cf
_rw_rlock() at _rw_rlock+0x8c
ipfw_chk() at ipfw_chk+0x3ab2
ipfw_check_out() at ipfw_check_out+0xb1
pfil_run_hooks() at pfil_run_hooks+0x9c
ip_output() at ip_output+0x367
syncache_respond() at syncache_respond+0x2fd
syncache_timer() at syncache_timer+0x15a
(...)
____

In this particular case, the fxp0 card is in a lagg with rl0, but this
problem can be triggered with either card on their own...

The scheduler is SCHED_ULE.

I'm not too sure how to give more useful information that this, I'm
afraid. It's a custom kernel, too... Do I need to supply information on
what code actually exists at the relevant addresses (I'm not at all
clued in on how to do this... Sorry!)? Should I chuck WITNESS,
INVARIANTS et al. in?

I *think* every time this has been triggered there's been a "python2.5"
process in the "*tcp" state. This machine runs net-p2p/deluge and
generally has at least 100 TCP connections on the go at any given time.

Can anyone give me a clue as to what I might do to track this down?
Appreciate any pointers.
--
Nick Withers
email: n...@nickwithers.com
Web: http://www.nickwithers.com
Mobile: +61 414 397 446

--
Nick Withers
email: n...@nickwithers.com
Web: http://www.nickwithers.com
Mobile: +61 414 397 446

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Reply via email to