On Fri, 19 Sep 2008, Oleg V. Nauman wrote:

(1) Start by deleting all but one nameserver entry in /etc/resolv.conf.
   Confirm that you can still reproduce the problem.

For various reasons, my laptop runs a local caching DNS server (named) without any forwarders configured. My /etc/resolv.conf contains only "nameserver 127.0.0.1".

This simplifies things in some ways but complicates them in others. In particular, it raises the question of whether the problem is in the DNS resolver or in the name server. A tcpdump of DNS traffic on lo0 would be quite interesting, since we could look at the timestamps and try to place the blame a bit more precisely.
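As a rough illustration of that kind of timestamp analysis, here is a minimal Python sketch that pairs DNS queries with their responses from a text capture such as "tcpdump -n -i lo0 port 53" and reports how long each query waited. It assumes tcpdump's default single-line output (timestamp, "IP", source > destination, then the DNS ID); the function names and the sample lines are hypothetical, and captures that cross midnight are not handled.

```python
import re

# Assumed default tcpdump line shape for UDP DNS traffic, e.g.
# "14:02:11.100000 IP 127.0.0.1.58123 > 127.0.0.1.53: 4242+ A? ..."
# Groups: hour, minute, seconds, src addr, src port, dst addr, dst port, DNS ID.
LINE_RE = re.compile(
    r"^(\d+):(\d+):(\d+\.\d+) IP (\S+)\.(\d+) > (\S+)\.(\d+): (\d+)"
)


def to_seconds(h, m, s):
    """Convert an HH:MM:SS.ffffff timestamp into seconds since midnight."""
    return int(h) * 3600 + int(m) * 60 + float(s)


def dns_latencies(lines):
    """Pair each query (sent to port 53) with its response (sent from port 53),
    keyed on (client address, client port, DNS ID).

    Returns a list of (dns_id, latency_seconds); queries that never got an
    answer in the capture are reported with a latency of None."""
    pending = {}
    results = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a DNS line we understand
        h, mi, s, src, sport, dst, dport, qid = m.groups()
        t = to_seconds(h, mi, s)
        if dport == "53":
            # Query from the resolver to the name server.
            pending[(src, sport, qid)] = t
        elif sport == "53":
            # Response from the name server back to the resolver.
            key = (dst, dport, qid)
            if key in pending:
                results.append((qid, t - pending.pop(key)))
    # Anything left over was never answered within the capture.
    results.extend((key[2], None) for key in pending)
    return results
```

A query reported as unanswered (None), or answered only after several seconds, would suggest that named is the slow party; prompt responses while dig or logger still hangs would point back toward the resolver side.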

Could you also use procstat -k on the dig process to generate a kernel stack trace for it?

Let's add to this list: when the problem happens, could you also run procstat -k on the name server process(es)?

And here is the procstat -kk output for the waiting logger process:

  PID    TID COMM             TDNAME           KSTACK
 1421 100095 logger           -                mi_switch+0x2c8 sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_wait_sig+0x14 _sleep+0x35f pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f syscall+0x2b3 Xint0x80_syscall+0x20

Interesting -- logger is blocked on reading from a pipe, likely standard input. So it sounds like something else is failing to complete in a timely manner -- perhaps due to DNS.
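Reading the KSTACK column from left to right goes from the innermost (most recent) call outward, so the first frame past the generic sleep machinery is the one that says what the thread is actually waiting for. A small Python sketch of that reading, assuming the "function+0xoffset" format shown in the procstat -kk output above; the SLEEP_FRAMES set and the function names are hypothetical and chosen only for illustration, not an exhaustive list of kernel sleep routines:

```python
# Frames that only describe *how* a thread is sleeping, not *why*.
# Illustrative assumption: extend as needed for other stacks.
SLEEP_FRAMES = {
    "mi_switch", "sleepq_switch", "sleepq_catch_signals",
    "sleepq_wait_sig", "sleepq_wait", "sleepq_timedwait",
    "sleepq_timedwait_sig", "_sleep", "msleep",
}


def parse_kstack(kstack):
    """Split a KSTACK field ("func+0xoff func+0xoff ...") into bare
    function names, innermost (most recent) call first."""
    return [frame.split("+", 1)[0] for frame in kstack.split()]


def wait_reason(kstack):
    """Return the first frame outside the generic sleep path, i.e. the
    subsystem the thread is blocked in, or None if every frame is generic."""
    for func in parse_kstack(kstack):
        if func not in SLEEP_FRAMES:
            return func
    return None
```

For the logger stack above this yields pipe_read, which matches the reading that logger is simply waiting for data on a pipe.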

This is approximately the date of my last UDP MFC. Could you try backing out just src/sys/netinet6/udp6_usrreq.c revision 1.81.2.7 and see if that helps? (Specifically, restore the use of sosend_generic instead of sosend_dgram.)

If you can show that the problem is definitely caused by the switch to sosend_dgram for UDPv6 socket sends, that would suggest it is related to the UDPv46 code there. In that case, I will propose backing out that portion of the change in the 7-stable branch until the problem is resolved; I don't want other people tripping over this.

Could you try compiling your kernel with WITNESS to see if we get any extended debugging information?

I have added the WITNESS option (and STACK, which procstat requires), but it is not producing any output (no LORs or anything like that).

OK. Could you try adding INVARIANT_SUPPORT and INVARIANTS if they aren't there? Be aware: this may convert the wedging you are experiencing into a kernel panic.
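For reference, a sketch of the kernel configuration lines discussed in this thread, to be added to whatever custom kernel config file you build from (no particular file is implied here). INVARIANT_SUPPORT is a prerequisite for INVARIANTS, and STACK is the option procstat -k needs, per the message above:

```
options 	STACK			# stack(9) support, used by procstat -k
options 	WITNESS			# lock-order checking (reports LORs)
options 	INVARIANT_SUPPORT	# prerequisite for INVARIANTS
options 	INVARIANTS		# extra consistency checks; may turn a wedge into a panic
```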

Is anybody else experiencing the same issue with a fresh RELENG_7? I'm not sure whether it is a local issue on my side, though.

I'm not experiencing them, but these sorts of things can be quite subtle and workload-dependent.

Well, I'm experiencing this issue even during system boot.

OK. So there must be something a bit different about your setup -- perhaps there's something specific about the way things are interacting over the loopback address for the name server. Is this the stock system BIND9 or something else? Are you able to temporarily switch to an external name server and see if that changes things?

Robert N M Watson
Computer Laboratory
University of Cambridge
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable