[Bug 94940] Re: mdns listed in nsswitch.conf causes excessive time for dns lookups
After reading through related bugs, it looks like avahi / nss is trying multicast DNS before traditional DNS. With multicast DNS, the only real option is long timeouts and retries, since only one avahi-enabled machine on the network may have a response for a given request. (That is, a successful lookup could have NXDomain responses from all but one host.) Since IP networks are assumed to be unreliable, it makes sense to retry requests, because a request may not have reached the one host that has the record.

I'm not sure that there's an easy way to fix this. Anything we did to fix this issue would weaken multicast DNS (lower the timeout, reduce the number of retries, etc.). It's unfortunate that this currently impacts servers that have no use for avahi-style multicast DNS, since avahi mdns is enabled by default on many systems (e.g. Gutsy).

--
mdns listed in nsswitch.conf causes excessive time for dns lookups
https://bugs.launchpad.net/bugs/94940
You received this bug notification because you are a member of Ubuntu Bugs, which is the bug contact for Ubuntu.

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
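To check whether a given machine consults mdns before regular DNS, something like the following might help. This is only a sketch: the helper name mdns_before_dns is made up for illustration, and it assumes the usual single "hosts:" line in /etc/nsswitch.conf.

```shell
# Hypothetical helper (not part of avahi or nss-mdns): print "yes" if an mdns
# source appears before "dns" on the hosts: line of an nsswitch.conf-style
# file, "no" otherwise. The default path is an assumption.
mdns_before_dns() {
    conf="${1:-/etc/nsswitch.conf}"
    awk '$1 == "hosts:" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^mdns/) { print "yes"; exit }   # mdns is consulted first
            if ($i == "dns")  { print "no";  exit }   # plain dns comes first
        }
        print "no"; exit                              # neither source listed
    }' "$conf"
}
```

Running mdns_before_dns with no argument inspects the system file; passing a path lets you try it on a copy first.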
[Bug 94940] Re: mdns listed in nsswitch.conf causes excessive time for dns lookups
After some stracing and tcpdumping, it looks like the changed behavior here is that when mdns gets an NXDomain response, it retries for up to 5 seconds, then reports a "timeout" to the requesting client, rather than immediately reporting that the record doesn't exist.

Is there a reason why requests that get NXDomain responses are retried? I can't think of a situation where that would be what you'd want, but maybe I'm missing something.

Trace excerpts are below.

With mdns disabled, the request is over in 13 ms, and we do not retry (stracing sshd, following forks):

[pid 7728] 02:10:20.474565 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
[pid 7728] 02:10:20.474649 connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("68.87.76.178")}, 28) = 0
[pid 7728] 02:10:20.474745 fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR)
[pid 7728] 02:10:20.474806 fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 7728] 02:10:20.474870 gettimeofday({1199614220, 474899}, NULL) = 0
[pid 7728] 02:10:20.474943 poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
[pid 7728] 02:10:20.475030 send(4, "\214\222\1\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7"..., 42, MSG_NOSIGNAL) = 42
[pid 7728] 02:10:20.475158 poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
[pid 7728] 02:10:20.488319 ioctl(4, FIONREAD, [42]) = 0
[pid 7728] 02:10:20.488425 recvfrom(4, "\214\222\201\203\0\1\0\0\0\0\0\0\0011\0010\003168\0031"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("68.87.76.178")}, [16]) = 42
[pid 7728] 02:10:20.488640 close(4) = 0

There are no further communications with the DNS server in the trace. (It's not very clear here, but the IP being looked up is 192.168.0.1.)

With mdns enabled, we retry several times (stracing avahi-daemon). I've annotated it with shell-style comments, since it's much longer.

# avahi-daemon gets the RESOLVE-ADDRESS command from sshd over its socket
02:23:43.930581 poll([{fd=7, events=POLLIN}, {fd=3, events=POLLIN, revents=POLLIN}, {fd=16, events=POLLIN}, {fd=15, events=POLLIN}, {fd=14, events=POLLIN}, {fd=13, events=POLLIN}, {fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=9, events=POLLIN}], 9, 2196610) = 1
02:23:43.930711 gettimeofday({1199615023, 930740}, NULL) = 0
02:23:43.930778 read(3, "RESOLVE-ADDRESS 192.168.0.1\n", 20480) = 28
# (snip)

# request #1
02:23:44.035615 sendmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("224.0.0.251")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7in-a"..., 42}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 42
# (snip)
02:23:44.036015 poll([{fd=7, events=POLLIN}, {fd=3, events=POLLIN}, {fd=16, events=POLLIN}, {fd=15, events=POLLIN}, {fd=14, events=POLLIN, revents=POLLIN}, {fd=13, events=POLLIN}, {fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=9, events=POLLIN}], 9, 100) = 1
02:23:44.036144 gettimeofday({1199615024, 36173}, NULL) = 0
02:23:44.036212 ioctl(14, FIONREAD, [42]) = 0
02:23:44.036307 recvmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("192.168.0.50")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7in-a"..., 42}], msg_controllen=40, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 42
# (snip)

# request #2, 1 second after first request
02:23:45.039623 sendmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("224.0.0.251")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7in-a"..., 42}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 42
02:23:45.039807 write(8, "W", 1) = 1
02:23:45.039899 read(7, "WW", 10) = 2
02:23:45.039965 gettimeofday({1199615025, 39994}, NULL) = 0
02:23:45.040032 poll([{fd=7, events=POLLIN}, {fd=3, events=POLLIN}, {fd=16, events=POLLIN}, {fd=15, events=POLLIN}, {fd=14, events=POLLIN, revents=POLLIN}, {fd=13, events=POLLIN}, {fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=9, events=POLLIN}], 9, 100) = 1
02:23:45.040162 gettimeofday({1199615025, 40190}, NULL) = 0
02:23:45.040229 ioctl(14, FIONREAD, [42]) = 0
02:23:45.040307 recvmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("192.168.0.50")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7in-a"..., 42}], msg_controllen=40, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 42
# (snip)

# request #3, 3 seconds after first request
02:23:47.043610 sendmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("224.0.0.251")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\
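
For anyone trying to read the query payloads in these traces: the bytes "\0011\0010\003168\003192\7in-a"... are the length-prefixed labels of the reverse (PTR) name for 192.168.0.1. A small sketch of how that name is formed follows; the function name reverse_ptr_name is made up for illustration.

```shell
# Build the reverse-lookup (PTR) name for a dotted-quad IPv4 address by
# reversing the four octets and appending in-addr.arpa.
reverse_ptr_name() {
    echo "$1" | awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1 }'
}

reverse_ptr_name 192.168.0.1
```

To time the same lookup through the NSS stack rather than by stracing sshd, "time getent hosts 192.168.0.1" should exercise the sources configured in nsswitch.conf, though I have not compared its timings against the traces here.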
[Bug 84899] Re: SSH with GSSAPIAuthentication option on SSH servers are very slow
This is really an nss-mdns bug, reported here: https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/94940

--
SSH with GSSAPIAuthentication option on SSH servers are very slow
https://bugs.launchpad.net/bugs/84899
[Bug 34902] Re: Ralink Wireless USB/PCMCIA/PCI hangs PC
So ndiswrapper works, but that solution only helps x86, and not other supported platforms (ppc, sparc). When possible, it's better to fix a driver than to retreat to ndiswrapper.

--
Ralink Wireless USB/PCMCIA/PCI hangs PC
https://bugs.launchpad.net/bugs/34902
[Bug 34902] Re: Ralink Wireless USB/PCMCIA/PCI hangs PC
This seems to be a bug in the rt2570 module. Since you can also use ndiswrapper to drive these cards (at least those of us on Intel boxes), there is a simple workaround: disable the rt2570 module. You can compile it out, or just add these two lines to your /etc/modprobe.d/blacklist:

# use ndiswrapper instead for rt2x00 devices: rt2570 panics kernel
blacklist rt2570

With rt2570 disabled, my device works as expected, appearing as the wlan0 interface (ndiswrapper), rather than rausb0 (rt2570).

--
Ralink Wireless USB/PCMCIA/PCI hangs PC
https://launchpad.net/bugs/34902
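
If you would rather script the blacklist change than edit the file by hand, a minimal sketch could look like this. The function name blacklist_module is hypothetical, and the file path is passed in so you can try it on a scratch file before touching /etc/modprobe.d/blacklist.

```shell
# Hypothetical helper: append "blacklist <module>" to a modprobe blacklist
# file, unless an identical line is already there.
blacklist_module() {
    module="$1"
    file="$2"
    if grep -qx "blacklist $module" "$file" 2>/dev/null; then
        return 0   # already blacklisted, nothing to do
    fi
    printf 'blacklist %s\n' "$module" >> "$file"
}
```

As root, that would be "blacklist_module rt2570 /etc/modprobe.d/blacklist". Running it twice leaves a single entry.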
[Bug 34902] Re: Ralink Wireless USB/PCMCIA/PCI hangs PC
Oh, more information, for those more interested in getting their card working than in fixing kernel bugs: I was able to get my card up by properly configuring /etc/network/interfaces manually, in the standard Ubuntu kernels, with DHCP.

I can get exactly one "ifup" command through per reboot. You can also do one "ifdown" afterwards. However, on the second "ifup" the system panics. I think the wizard does several ip-down steps, which triggers the panic.

The end of my /etc/network/interfaces file, where I added wireless configs, looks like this:

iface rausb0 inet dhcp
wireless-essid
wireless-key

*Don't* declare your interface "auto" in /etc/network/interfaces, since that tells automatic processes that they can up/down that interface, which will panic your kernel. I have a simple script in my startup scripts that checks whether I've ever brought the card up (by reading the ESSID from iwconfig), and only brings it up once:

# read the ESSID currently set on rausb0; empty if the card was never brought up
is_wlan_up=$(iwconfig 2>/dev/null | grep '^rausb0' | sed -e 's/^.*ESSID://' | cut -d\" -f2)
# bring the interface up only if no ESSID is set yet
[ "$is_wlan_up" ] || ifup rausb0

That was sufficient to get my D-Link DWL-g122 USB card (rt2500 based) working with Ubuntu. My network basically works at this point, but I'm curious to poke around a bit more to see what's wrong.

--
Ralink Wireless USB/PCMCIA/PCI hangs PC
https://launchpad.net/bugs/34902
[Bug 34902] Re: Ralink Wireless USB/PCMCIA/PCI hangs PC
Here is some additional information:

Playing around in a text console, it appears that the "freeze" is caused by a kernel panic that occurs when we hit a BUG in a bh context at kernel/timer.c, line 411 (in cascade, an inline function that appears in __run_timers, which itself is an inline function that appears in run_timer_softirq, which is run in a bh context).

Here's the trace I was working from. The code indicates that it's line 411 of some file:

Code: ... <0f> 0b 9b 01 ee 03 30 c0 ...

It panics, so I couldn't conclusively trace which file from the panic, but I think it's clearly kernel/timer.c, given the stack trace.

[] run_timer_softirq+0x132/0x1d0
[] __do_softirq+0x4f/0xb0
[] do_softirq+0x35/0x40
[] irq_exit+0x35/0x40
[] do_IRQ+0x1f/0x30
[] common_interrupt+0x1a/0x20

I'm doing some testing now to work out how to fix this. Since this is a double-fault, I'm curious whether we could just disable BUG()s in the kernel (by recompiling). I don't know if the system will recover or not: it realizes something's wrong, but we panic because the BUG is reported in BH context, not necessarily because the problem is unrecoverable.

I just did a build with big-kernel-lock pre-emption (CONFIG_PREEMPT_BKL) turned off, and I didn't see any trouble. So this may be a pre-emption issue. There is a pretty big configuration delta between the kernel that had the problem and the one that didn't, so I'm still trying to figure out what fixed it.

Still looking...

--
Ralink Wireless USB/PCMCIA/PCI hangs PC
https://launchpad.net/bugs/34902
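
One way to narrow down a configuration delta like this is to list only the options that differ between the two kernel config files. The following is a rough sketch, not what was actually run for this bug; config_delta is a made-up name and the two file arguments are whatever configs you are comparing.

```shell
# Hypothetical helper: print kernel config lines that appear in only one of
# two config files. Lines unique to the first file are printed flush left,
# lines unique to the second are indented with a tab (comm's usual layout).
config_delta() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # keep both "CONFIG_X=y" and "# CONFIG_X is not set" lines
    grep 'CONFIG_' "$1" | LC_ALL=C sort > "$old_sorted"
    grep 'CONFIG_' "$2" | LC_ALL=C sort > "$new_sorted"
    # -3 suppresses lines common to both files
    LC_ALL=C comm -3 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}
```

Called with the config of the panicking kernel and the config of the working build, this should show CONFIG_PREEMPT_BKL along with every other option that changed between the two.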