Re: [dnsdist] dnsdist using loopback address as source address for queries
On 10/29/21 15:32, Adam Bishop via dnsdist wrote: On 29 Oct 2021, at 13:38, Remi Gacogne via dnsdist wrote: Would you mind checking that you still have IPv6 addresses on that interface? I see you still have some on the incoming interface, though, since we receive a query over IPv6 on file descriptor 18 in your strace output. Yeah, it's a single interface system so the ingress/egress addresses are the same. Any events in the system logs that looks like the IP addresses might have changed at some point? The only thing in dmesg is an SElinux policy update triggered by dnf, but no AVC entries that would indicate that being the cause. NetworkManager hasn't written anything to its own log since boot, which is normal. Anything in the dnsdist logs looking like a reconnect (error while writing to the backend, ..)? We should not reconnect unless send() failed with EINVAL or ENODEV, which is not supposed to happen in your case since you don't set the source interface. There's some normal terminations when the backend is reloaded for config changes, e.g. dnsdist[1351]: Timeout while waiting for the health check response from backend [2001:630:1:170::67]:53 dnsdist[1351]: Marking downstream [2001:630:1:170::67]:53 as 'down' dnsdist[1351]: Timeout while waiting for the health check response from backend 193.63.72.67:53 dnsdist[1351]: Marking downstream 193.63.72.67:53 as 'down' dnsdist[1351]: Marking downstream [2001:630:1:170::67]:53 as 'up' dnsdist[1351]: Marking downstream 193.63.72.67:53 as 'up' But that's it - the only other messages are "Packet from:" Alright, thanks for checking! I have to say I'm running a bit out of ideas. We only bind to a specific source address if one is supplied in the configuration, and that's not the case here. The field holding that source address is not updated at runtime, and I would be surprised to see a memory corruption touching only that field. You might actually be able to check that via gdb, by attaching to the process, selecting the right thread with 'thread 7' (it needs to be one of these in responderThread() for the right backend, which is 7 or 9 in your current process based on the gdb output) then 'up' three times to reach the responderThread frame, and 'print *dss' to see the content of the structure. Even if that field is corrupted, we would need to reconnect which normally does happen only the errors I mentioned earlier, and we don't have these in your logs. It also happens on a health-check failure when 'reconnectOnFailure=true' is set on a backend, but that's not the case in your configuration. Actually, I wonder if that would make a difference, do you think you could try setting that option on one of the IPv6 backends? Best regards, Remi OpenPGP_signature Description: OpenPGP digital signature ___ dnsdist mailing list dnsdist@mailman.powerdns.com https://mailman.powerdns.com/mailman/listinfo/dnsdist
Re: [dnsdist] dnsdist using loopback address as source address for queries
On 29 Oct 2021, at 13:38, Remi Gacogne via dnsdist wrote: > Would you mind checking that you still have IPv6 addresses on that interface? > I see you still have some on the incoming interface, though, since we receive > a query over IPv6 on file descriptor 18 in your strace output. Yeah, it's a single interface system so the ingress/egress addresses are the same. > Any events in the system logs that looks like the IP addresses might have > changed at some point? The only thing in dmesg is an SElinux policy update triggered by dnf, but no AVC entries that would indicate that being the cause. NetworkManager hasn't written anything to its own log since boot, which is normal. > Anything in the dnsdist logs looking like a reconnect (error while writing to > the backend, ..)? We should not reconnect unless send() failed with EINVAL or > ENODEV, which is not supposed to happen in your case since you don't set the > source interface. There's some normal terminations when the backend is reloaded for config changes, e.g. dnsdist[1351]: Timeout while waiting for the health check response from backend [2001:630:1:170::67]:53 dnsdist[1351]: Marking downstream [2001:630:1:170::67]:53 as 'down' dnsdist[1351]: Timeout while waiting for the health check response from backend 193.63.72.67:53 dnsdist[1351]: Marking downstream 193.63.72.67:53 as 'down' dnsdist[1351]: Marking downstream [2001:630:1:170::67]:53 as 'up' dnsdist[1351]: Marking downstream 193.63.72.67:53 as 'up' But that's it - the only other messages are "Packet from:" Adam Jisc is a registered charity (number 1149740) and a company limited by guarantee which is registered in England under company number. 05747339, VAT number GB 197 0632 86. Jisc’s registered office is: 4 Portwall Lane, Bristol, BS1 6NB. T 0203 697 5800. Jisc Services Limited is a wholly owned Jisc subsidiary and a company limited by guarantee which is registered in England under company number 02881024, VAT number GB 197 0632 86. The registered office is: 4 Portwall Lane, Bristol, BS1 6NB. T 0203 697 5800. Jisc Commercial Limited is a wholly owned Jisc subsidiary and a company limited by shares which is registered in England under company number 09316933, VAT number GB 197 0632 86. The registered office is: 4 Portwall Lane, Bristol, BS1 6NB. T 0203 697 5800. For more details on how Jisc handles your data see our privacy notice here: https://www.jisc.ac.uk/website/privacy-notice ___ dnsdist mailing list dnsdist@mailman.powerdns.com https://mailman.powerdns.com/mailman/listinfo/dnsdist
Re: [dnsdist] dnsdist using loopback address as source address for queries
On 10/29/21 13:04, Adam Bishop via dnsdist wrote: On 29 Oct 2021, at 11:33, Remi Gacogne via dnsdist wrote: would you mind getting the output of "lsof -n -p " while the issue is still happening? A full backtrace with gdb might also be good to have Sure that's fine, output follows. I've uploaded the gdb output as it's quite long. We've got three other instances to take the load, so I'm happy to keep it running as is for the moment. Thanks! So we see from the strace output that dnsdist is calling send() (which becomes the sendto() syscall) to send the queries to the backend, using the connected UDP sockets: [pid 1384] sendto(11, "U\251\0 \0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\30\0\10\0\24\0\2\200\0 \1\0060\0 [...] [pid 1382] sendto(11, "V\251\0 \0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\f\0\10\0\10\0\1 \0\302S\7\305", 62, 0, NULL, 0) = 62 But we then see in the lsof output that these IPv6 sockets (the 11 is used above, but the others are the same) are connected using ::1 as source, while the IPv4 ones are not: dnsdist 1351 dnsdist 11u IPv6 28637 0t0 UDP [::1]:35903->[2001:630:1:160::195]:domain dnsdist 1351 dnsdist 13u IPv4 28640 0t0 UDP 194.83.7.197:45941->194.83.7.195:domain dnsdist 1351 dnsdist 15u IPv6 28650 0t0 UDP [::1]:51827->[2001:630:1:170::67]:domain dnsdist 1351 dnsdist 17u IPv4 28653 0t0 UDP 194.83.7.197:57454->193.63.72.67:domain That explains what you are seeing, but I don't understand how we end up with these.. Would you mind checking that you still have IPv6 addresses on that interface? I see you still have some on the incoming interface, though, since we receive a query over IPv6 on file descriptor 18 in your strace output. Any events in the system logs that looks like the IP addresses might have changed at some point? Anything in the dnsdist logs looking like a reconnect (error while writing to the backend, ..)? We should not reconnect unless send() failed with EINVAL or ENODEV, which is not supposed to happen in your case since you don't set the source interface. Best regards, Remi OpenPGP_signature Description: OpenPGP digital signature ___ dnsdist mailing list dnsdist@mailman.powerdns.com https://mailman.powerdns.com/mailman/listinfo/dnsdist
Re: [dnsdist] dnsdist using loopback address as source address for queries
On 29 Oct 2021, at 11:33, Remi Gacogne via dnsdist wrote: > would you mind getting the output of "lsof -n -p " while the > issue is still happening? A full backtrace with gdb might also be good to have Sure that's fine, output follows. I've uploaded the gdb output as it's quite long. We've got three other instances to take the load, so I'm happy to keep it running as is for the moment. Adam GDB thread dump: https://gist.githubusercontent.com/TheMysteriousX/08185778d506e973a0bbe8675dd35282/raw/fc5378be3e9b40abd5c4c6f50da6f0e0cdb5fe35/dnsdist%2520thread%2520dump lsof: COMMAND PIDUSER FD TYPE DEVICE SIZE/OFF NODE NAME dnsdist 1351 dnsdist cwd DIR 253,0 224 128 / dnsdist 1351 dnsdist rtd DIR 253,0 224 128 / dnsdist 1351 dnsdist txt REG 253,0 8461520 16827142 /usr/bin/dnsdist dnsdist 1351 dnsdist mem REG 253,0 9253600 401478 /var/lib/sss/mc/passwd dnsdist 1351 dnsdist mem REG 253,0 6940392 401482 /var/lib/sss/mc/group dnsdist 1351 dnsdist mem REG 253,046280 25375781 /usr/lib64/libnss_sss.so.2 dnsdist 1351 dnsdist mem REG 253,030256 25167667 /usr/lib64/libcap-ng.so.0.0.0 dnsdist 1351 dnsdist mem REG 253,026192 25167768 /usr/lib64/libattr.so.1.1.2448 dnsdist 1351 dnsdist mem REG 253,0 543160 25167185 /usr/lib64/libpcre2-8.so.0.7.1 dnsdist 1351 dnsdist mem REG 253,037024 25167632 /usr/lib64/libffi.so.6.0.2 dnsdist 1351 dnsdist mem REG 253,0 123336 25167669 /usr/lib64/libaudit.so.1.0.0 dnsdist 1351 dnsdist mem REG 253,0 1869272 25254232 /usr/lib64/libdb-5.3.so dnsdist 1351 dnsdist mem REG 253,0 309104 25439215 /usr/lib64/liblua-5.3.so dnsdist 1351 dnsdist mem REG 253,054816 25167770 /usr/lib64/libacl.so.1.1.2253 dnsdist 1351 dnsdist mem REG 253,054688 25167496 /usr/lib64/libpopt.so.0.0.1 dnsdist 1351 dnsdist mem REG 253,0 678440 25439213 /usr/lib64/libzstd.so.1.4.4 dnsdist 1351 dnsdist mem REG 253,0 104744 25443492 /usr/lib64/libelf-0.182.so dnsdist 1351 dnsdist mem REG 253,074744 25167530 /usr/lib64/libbz2.so.1.0.6 dnsdist 1351 dnsdist mem REG 253,0 144392 25167536 /usr/lib64/libgpg-error.so.0.24.2 dnsdist 1351 dnsdist mem REG 253,0 168536 25165966 /usr/lib64/libselinux.so.1 dnsdist 1351 dnsdist mem REG 253,033392 25439203 /usr/lib64/libuuid.so.1.3.0 dnsdist 1351 dnsdist mem REG 253,0 343544 25288469 /usr/lib64/libblkid.so.1.1.0 dnsdist 1351 dnsdist mem REG 253,0 1503456 25167662 /usr/lib64/libgmp.so.10.3.2 dnsdist 1351 dnsdist mem REG 253,0 201632 25167916 /usr/lib64/libhogweed.so.4.5 dnsdist 1351 dnsdist mem REG 253,0 239456 25202369 /usr/lib64/libnettle.so.6.5 dnsdist 1351 dnsdist mem REG 253,081408 25167992 /usr/lib64/libtasn1.so.6.5.5 dnsdist 1351 dnsdist mem REG 253,0 1760264 25167714 /usr/lib64/libunistring.so.2.1.0 dnsdist 1351 dnsdist mem REG 253,0 162224 25167717 /usr/lib64/libidn2.so.0.3.6 dnsdist 1351 dnsdist mem REG 253,0 1246520 25301369 /usr/lib64/libp11-kit.so.0.3.0 dnsdist 1351 dnsdist mem REG 253,095416 25205548 /usr/lib64/libz.so.1.2.11 dnsdist 1351 dnsdist mem REG 253,023464 25167336 /usr/lib64/libutil-2.28.so dnsdist 1351 dnsdist mem REG 253,0 142712 25167518 /usr/lib64/libcrypt.so.1.1.0 dnsdist 1351 dnsdist mem REG 253,0 123520 25341073 /usr/lib64/libresolv-2.28.so dnsdist 1351 dnsdist mem REG 253,0 2181160 25221757 /usr/lib64/libperl.so.5.26.3 dnsdist 1351 dnsdist mem REG 253,0 207472 25788589 /usr/lib64/librpmio.so.8.2.0 dnsdist 1351 dnsdist mem REG 253,0 519352 25702207 /usr/lib64/librpm.so.8.2.0 dnsdist 1351 dnsdist mem REG 253,062344 25185617 /usr/lib64/libsensors.so.4.4.0 dnsdist 1351 dnsdist mem REG 253,0 1188080 25167585 /usr/lib64/libgcrypt.so.20.2.5 dnsdist 1351 dnsdist mem REG 253,0 371312 25185868 /usr/lib64/libmount.so.1.1.0 dnsdist 1351 dnsdist mem REG 253,0 119760 25185932 /usr/lib64/liblz4.so.1.8.3 dnsdist 1351 dnsdist mem REG 253,0 192016 25167491 /usr/lib64/liblzma.so.5.2.4 dnsdist 1351 dnsdist mem REG 253,069024 25341075 /usr/lib64/librt-2.28.so dnsdist 1351 dnsdist mem REG 253,0 208616 25167221 /usr/lib64/libtinfo.so.6.1
Re: [dnsdist] dnsdist using loopback address as source address for queries
Hi Adam, On 10/29/21 12:10, Adam Bishop via dnsdist wrote: On 13 Sep 2021, at 13:47, Adam Bishop wrote: On 13 Sep 2021, at 13:31, Remi Gacogne via dnsdist wrote: That's very weird, I don't have any clue to what might be happening. Would you mind sharing the whole configuration? In particular, do you set the source interface? I would also be very interested in seeing a strace of the process while the issue is happening. Sure, I can attach strace the next time I catch it in the act. It's occurring infrequently, but reliably so might take me a few days. Source interface is not explicitly set, as the host is single homed. Took a while to recur, but I attached trace. Nothing immediately jumps out at me, there's no errors returned by the socket. Command used was: strace -v -s 1024 -Ff -p 1351 2>&1 | grep -v epoll_wait | grep -v nanosleep I've left the instance in the broken state, so if there's additional testing that would be useful, please let me know. I'll look at the trace more closely in a bit but would you mind getting the output of "lsof -n -p " while the issue is still happening? A full backtrace with gdb might also be good to have, you would first need to attach to the running process with 'gdb -p dnsdist' then issue the 'thread apply all bt full' command. You can then close gdb and let dnsdit running (or not). Thanks! OpenPGP_signature Description: OpenPGP digital signature ___ dnsdist mailing list dnsdist@mailman.powerdns.com https://mailman.powerdns.com/mailman/listinfo/dnsdist
Re: [dnsdist] dnsdist using loopback address as source address for queries
On 13 Sep 2021, at 13:47, Adam Bishop wrote: > On 13 Sep 2021, at 13:31, Remi Gacogne via dnsdist > wrote: >> That's very weird, I don't have any clue to what might be happening. Would >> you mind sharing the whole configuration? In particular, do you set the >> source interface? I would also be very interested in seeing a strace of the >> process while the issue is happening. > > Sure, I can attach strace the next time I catch it in the act. It's occurring > infrequently, but reliably so might take me a few days. Source interface is > not explicitly set, as the host is single homed. Took a while to recur, but I attached trace. Nothing immediately jumps out at me, there's no errors returned by the socket. Command used was: strace -v -s 1024 -Ff -p 1351 2>&1 | grep -v epoll_wait | grep -v nanosleep I've left the instance in the broken state, so if there's additional testing that would be useful, please let me know. Adam --- [pid 1388] epoll_create(1024) = 59 [pid 1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61 [pid 1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0 [pid 1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0 [pid 1388] close(61) = 0 [pid 1388] close(59) = 0 [pid 1388] epoll_create(1024) = 59 [pid 1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61 [pid 1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0 [pid 1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0 [pid 1388] close(61) = 0 [pid 1388] close(59) = 0 [pid 1388] epoll_create(1024) = 59 [pid 1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61 [pid 1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0 [pid 1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0 [pid 1388] close(61) = 0 [pid 1388] close(59) = 0 [pid 1384] <... recvmsg resumed>{msg_name={sa_family=AF_INET6, sin6_port=htons(40204), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2001:630:3c:5018::197", _addr), sin6_scope_id=0}, msg_namelen=28, msg_iov=[{iov_base="\327\23\1 \0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\0", iov_len=4368}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32}], msg_controllen=40, msg_flags=0}, 0) = 50 [pid 1384] write(1, "Packet from [2001:630:3c:5018::197]:40204 for .lbdn.virt.ja.net. SOA with id 5079\n", 86) = 86 [pid 1384] sendto(11, "U\251\0 \0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\30\0\10\0\24\0\2\200\0 \1\0060\0 [pid 1388] epoll_create(1024) = 59 [pid 1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61 [pid 1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0 [pid 1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0 [pid 1388] close(61) = 0 [pid 1388] close(59) = 0 [pid 1382] <... recvmsg resumed>{msg_name={sa_family=AF_INET, sin_port=htons(53936), sin_addr=inet_addr("194.83.7.197")}, msg_namelen=16, msg_iov=[{iov_base="\327\23\1 \0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\0", iov_len=4368}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("ens192"), ipi_spec_dst=inet_addr("194.83.7.197"), ipi_addr=inet_addr("194.83.7.197")}}], msg_controllen=32, msg_flags=0}, 0) = 50 [pid 1382] write(1, "Packet from 194.83.7.197:53936 for .lbdn.virt.ja.net. SOA with id 5079\n", 75) = 75 [pid 1382] sendto(11, "V\251\0 \0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\f\0\10\0\10\0\1 \0\302S\7\305", 62, 0, NULL, 0) = 62 [pid 1382] recvmsg(4, [pid 1388] epoll_create(1024) = 59 [pid 1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61 [pid 1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0 [pid 1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0 [pid 1388] close(61) = 0 [pid 1388] close(59) = 0 [pid 1389] <... restart_syscall resumed>) = 0 [pid 1388] epoll_create(1024) = 59 [pid 1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61 [pid 1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0 [pid 1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0 [pid 1388] close(61) = 0 [pid 1388] close(59) = 0 [pid 1388] epoll_create(1024) = 59 [pid 1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61 [pid 1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0 [pid 1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0 [pid 1388] close(61) = 0 [pid 1388] close(59) = 0 [pid 1388] epoll_create(1024) = 59 [pid 1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61 [pid 1388] epoll_ctl(59,
Re: [dnsdist] dnsdist uptime
On Tue, Oct 26, 2021 at 07:32:51AM +, SAMI RAHAL via dnsdist wrote a message of 125 lines which said: > I see an "Uptime:" on the Web console which is always less than one > day, even if I never restart the server. The log of your system may tell you why (a bug crashing the server, or a lack of memory forcing the system to kill dnsdist, for instance). ___ dnsdist mailing list dnsdist@mailman.powerdns.com https://mailman.powerdns.com/mailman/listinfo/dnsdist