Quoting Robert Watson <[EMAIL PROTECTED]>:


On Fri, 19 Sep 2008, Oleg V. Nauman wrote:

(1) Start by deleting all but one nameserver entry in /etc/resolv.conf.
  Confirm that you can still reproduce the problem.

For various reasons my laptop runs a local caching DNS server (named) without any forwarders configured. My /etc/resolv.conf contains: nameserver 127.0.0.1

This is simplifying in some senses, but complicating in others.  In
particular, the question it raises is whether the problem is in the DNS
resolver or the nameserver.  Seeing a tcpdump of lo0 for DNS traffic
would be quite interesting, since we could look at timestamps and try
to place the blame a bit more precisely.

Could you also use procstat -k on the dig process to generate a kernel
stack trace for it?

Let's add to this list: when the problem happens, could you also
procstat -k the name server process(es)?

And here is the procstat -kk output for the waiting logger process:

PID    TID COMM             TDNAME           KSTACK
1421 100095 logger - mi_switch+0x2c8 sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_wait_sig+0x14 _sleep+0x35f pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f syscall+0x2b3 Xint0x80_syscall+0x20

Interesting -- logger is blocked on reading from a pipe, likely
standard input.  So it sounds like something else is failing to
complete in a timely manner -- perhaps due to DNS.
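
The reading of the trace above can be sketched in a few lines of Python. This is only an illustration: the helper name parse_procstat_kk is hypothetical, and it assumes that the COMM and TDNAME columns contain no spaces, which holds for the logger line quoted above but is not guaranteed in general.

```python
import re

# Matches one "symbol+0xoffset" token from the KSTACK column.
FRAME_RE = re.compile(r"^(?P<symbol>[^+\s]+)\+0x(?P<offset>[0-9a-fA-F]+)$")


def parse_procstat_kk(line):
    """Split one procstat -kk data line into its fields and stack frames.

    Assumption: COMM and TDNAME contain no spaces (true for the sample
    below, not guaranteed for every process).
    """
    fields = line.split()
    pid, tid, comm, tdname = fields[:4]
    frames = []
    for token in fields[4:]:
        match = FRAME_RE.match(token)
        if match:
            frames.append((match.group("symbol"), int(match.group("offset"), 16)))
    return {"pid": int(pid), "tid": int(tid), "comm": comm,
            "tdname": tdname, "frames": frames}


# The logger line from the procstat -kk output quoted in this thread.
SAMPLE = ("1421 100095 logger - mi_switch+0x2c8 sleepq_switch+0xd9 "
          "sleepq_catch_signals+0x239 sleepq_wait_sig+0x14 _sleep+0x35f "
          "pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f "
          "syscall+0x2b3 Xint0x80_syscall+0x20")

info = parse_procstat_kk(SAMPLE)
symbols = [symbol for symbol, _ in info["frames"]]
# Frames are listed innermost first, so the syscall entry point comes last.
print(info["comm"], "has pipe_read on its stack:", "pipe_read" in symbols)
```

A pipe_read frame sitting below _sleep is what points at a blocking read on a pipe rather than at the network stack.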

Nothing strange here: that was the kernel stack of the logger waiting on background fsck output (although bgfsck never started).


This is approximately the date of my last UDP MFC. Could you try backing out just src/sys/netinet6/udp6_usrreq.c revision 1.81.2.7 and see if that helps? (specifically, restore the use of sosend_generic instead of sosend_dgram)

If you can show that it's definitely a problem with the change to
sosend_dgram for UDPv6 socket send, then it might suggest that it's the
same problem and that it is related to the UDPv46 code there.  In which
case I will propose we back out that portion of the change in the
7-stable branch until it's known to be resolved -- I don't want other
people tripping over this.

Sorry for the false alarm regarding UDP issues. I have noticed that my clock stops incrementing (which also explains the zeroes in the traceroute output). That gave me an idea of what is related to this issue, so I backed out revision 1.243.2.4 of src/sys/dev/acpica/acpi.c, and that fixes my issues. It looks like that revision stops the timecounters from incrementing on my laptop. Ironically, I was the one who initiated this ACPI behavior change (I reported "ACPI HPET stops working on my RELENG_7" on July 19 to [EMAIL PROTECTED]), so jhb@ implemented a patch, and it worked for me at the time. Something changed during the following two months, so the patch now causes problems on my hardware instead of fixing them. I will play a bit with kern.timecounter.choice on Monday and then report back to jhb@.
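
A stalled clock of this kind can be checked from userland with a short Python sketch. The helper name clock_advances and the spin count are hypothetical, and the sketch assumes that time.time() reflects the kernel's active timecounter. It spins in a loop instead of sleeping, on the assumption that a sleep might not behave normally while the clock is stuck.

```python
import time


def clock_advances(clock=time.time, spins=1_000_000):
    """Return True if `clock` reports a later value during a busy loop.

    Hypothetical helper: `clock` is any zero-argument callable returning a
    timestamp, and `spins` bounds how many times it is polled.
    """
    start = clock()
    for _ in range(spins):
        if clock() > start:
            return True
    return False


# On a healthy system the wall clock moves forward long before the loop ends.
print("wall clock advancing:", clock_advances())
```

If this reports False, timestamps that come out as zero (as in the traceroute output mentioned above) are consistent with the clock itself being stuck rather than with a networking fault.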


Could you try compiling your kernel with WITNESS to see if we get any extended debugging information?

I have added the WITNESS option (and STACK, which procstat requires), but it is not producing any output (so no LORs or anything like that).

OK.  Could you try adding INVARIANT_SUPPORT and INVARIANTS if they
aren't there?  Be aware: this may convert the wedging you are
experiencing into a kernel panic.

No output is produced with INVARIANT_SUPPORT and INVARIANTS included in the kernel, and no kernel panic either :) Thank you for the excellent work.


Is anybody experiencing the same issues with a fresh RELENG_7? I am not sure whether it is a local issue, though.

I'm not experiencing them, but these sorts of things can be quite subtle and workload-dependent.

Well, I am experiencing this issue even during system boot.

OK.  So there must be something a bit different about your setup --
perhaps there's something specific about the way things are interacting
over the loopback address for the name server.  Is this the stock
system BIND9 or something else?

I have the stock system BIND running.

Are you able to temporarily switch to an external name server and see
if that changes things?

Robert N M Watson
Computer Laboratory
University of Cambridge


_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
