Re: [dnsdist] dnsdist using loopback address as source address for queries

2021-10-29 Thread Remi Gacogne via dnsdist

On 10/29/21 15:32, Adam Bishop via dnsdist wrote:

On 29 Oct 2021, at 13:38, Remi Gacogne via dnsdist 
 wrote:

Would you mind checking that you still have IPv6 addresses on that interface? I 
see you still have some on the incoming interface, though, since we receive a 
query over IPv6 on file descriptor 18 in your strace output.


Yeah, it's a single interface system so the ingress/egress addresses are the 
same.


Any events in the system logs that looks like the IP addresses might have 
changed at some point?


The only thing in dmesg is an SElinux policy update triggered by dnf, but no 
AVC entries that would indicate that being the cause. NetworkManager hasn't 
written anything to its own log since boot, which is normal.


Anything in the dnsdist logs looking like a reconnect (error while writing to 
the backend, ..)? We should not reconnect unless send() failed with EINVAL or 
ENODEV, which is not supposed to happen in your case since you don't set the 
source interface.


There's some normal terminations when the backend is reloaded for config 
changes, e.g.

   dnsdist[1351]: Timeout while waiting for the health check response from 
backend [2001:630:1:170::67]:53
   dnsdist[1351]: Marking downstream [2001:630:1:170::67]:53 as 'down'
   dnsdist[1351]: Timeout while waiting for the health check response from 
backend 193.63.72.67:53
   dnsdist[1351]: Marking downstream 193.63.72.67:53 as 'down'
   dnsdist[1351]: Marking downstream [2001:630:1:170::67]:53 as 'up'
   dnsdist[1351]: Marking downstream 193.63.72.67:53 as 'up'

But that's it - the only other messages are "Packet from:"


Alright, thanks for checking! I have to say I'm running a bit out of 
ideas. We only bind to a specific source address if one is supplied in 
the configuration, and that's not the case here. The field holding that 
source address is not updated at runtime, and I would be surprised to 
see a memory corruption touching only that field. You might actually be 
able to check that via gdb, by attaching to the process, selecting the 
right thread with 'thread 7' (it needs to be one of these in 
responderThread() for the right backend, which is 7 or 9 in your current 
process based on the gdb output) then 'up' three times to reach the 
responderThread frame, and 'print *dss' to see the content of the structure.
Even if that field is corrupted, we would need to reconnect which 
normally does happen only the errors I mentioned earlier, and we don't 
have these in your logs. It also happens on a health-check failure when 
'reconnectOnFailure=true' is set on a backend, but that's not the case 
in your configuration. Actually, I wonder if that would make a 
difference, do you think you could try setting that option on one of the 
IPv6 backends?


Best regards,

Remi


OpenPGP_signature
Description: OpenPGP digital signature
___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


Re: [dnsdist] dnsdist using loopback address as source address for queries

2021-10-29 Thread Adam Bishop via dnsdist
On 29 Oct 2021, at 13:38, Remi Gacogne via dnsdist 
 wrote:
> Would you mind checking that you still have IPv6 addresses on that interface? 
> I see you still have some on the incoming interface, though, since we receive 
> a query over IPv6 on file descriptor 18 in your strace output.

Yeah, it's a single interface system so the ingress/egress addresses are the 
same.

> Any events in the system logs that looks like the IP addresses might have 
> changed at some point?

The only thing in dmesg is an SElinux policy update triggered by dnf, but no 
AVC entries that would indicate that being the cause. NetworkManager hasn't 
written anything to its own log since boot, which is normal.

> Anything in the dnsdist logs looking like a reconnect (error while writing to 
> the backend, ..)? We should not reconnect unless send() failed with EINVAL or 
> ENODEV, which is not supposed to happen in your case since you don't set the 
> source interface.

There's some normal terminations when the backend is reloaded for config 
changes, e.g.

  dnsdist[1351]: Timeout while waiting for the health check response from 
backend [2001:630:1:170::67]:53
  dnsdist[1351]: Marking downstream [2001:630:1:170::67]:53 as 'down'
  dnsdist[1351]: Timeout while waiting for the health check response from 
backend 193.63.72.67:53
  dnsdist[1351]: Marking downstream 193.63.72.67:53 as 'down'
  dnsdist[1351]: Marking downstream [2001:630:1:170::67]:53 as 'up'
  dnsdist[1351]: Marking downstream 193.63.72.67:53 as 'up'

But that's it - the only other messages are "Packet from:"

Adam

Jisc is a registered charity (number 1149740) and a company limited by 
guarantee which is registered in England under company number. 05747339, VAT 
number GB 197 0632 86. Jisc’s registered office is: 4 Portwall Lane, Bristol, 
BS1 6NB. T 0203 697 5800.


Jisc Services Limited is a wholly owned Jisc subsidiary and a company limited 
by guarantee which is registered in England under company number 02881024, VAT 
number GB 197 0632 86. The registered office is: 4 Portwall Lane, Bristol, BS1 
6NB. T 0203 697 5800.


Jisc Commercial Limited is a wholly owned Jisc subsidiary and a company limited 
by shares which is registered in England under company number 09316933, VAT 
number GB 197 0632 86. The registered office is: 4 Portwall Lane, Bristol, BS1 
6NB. T 0203 697 5800.


For more details on how Jisc handles your data see our privacy notice here: 
https://www.jisc.ac.uk/website/privacy-notice
___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


Re: [dnsdist] dnsdist using loopback address as source address for queries

2021-10-29 Thread Remi Gacogne via dnsdist

On 10/29/21 13:04, Adam Bishop via dnsdist wrote:

On 29 Oct 2021, at 11:33, Remi Gacogne via dnsdist 
 wrote:

would you mind getting the output of "lsof -n -p " while the 
issue is still happening? A full backtrace with gdb might also be good to have


Sure that's fine, output follows. I've uploaded the gdb output as it's quite 
long. We've got three other instances to take the load, so I'm happy to keep it 
running as is for the moment.


Thanks! So we see from the strace output that dnsdist is calling send() 
(which becomes the sendto() syscall) to send the queries to the backend, 
using the connected UDP sockets:


[pid  1384] sendto(11, "U\251\0 
\0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\30\0\10\0\24\0\2\200\0 
\1\0060\0
[...]
[pid  1382] sendto(11, "V\251\0 
\0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\f\0\10\0\10\0\1 
\0\302S\7\305", 62, 0, NULL, 0) = 62


But we then see in the lsof output that these IPv6 sockets (the 11 is 
used above, but the others are the same) are connected using ::1 as 
source, while the IPv4 ones are not:



dnsdist 1351 dnsdist   11u IPv6  28637  0t0  UDP 
[::1]:35903->[2001:630:1:160::195]:domain
dnsdist 1351 dnsdist   13u IPv4  28640  0t0  UDP 
194.83.7.197:45941->194.83.7.195:domain
dnsdist 1351 dnsdist   15u IPv6  28650  0t0  UDP 
[::1]:51827->[2001:630:1:170::67]:domain
dnsdist 1351 dnsdist   17u IPv4  28653  0t0  UDP 
194.83.7.197:57454->193.63.72.67:domain


That explains what you are seeing, but I don't understand how we end up 
with these..
Would you mind checking that you still have IPv6 addresses on that 
interface? I see you still have some on the incoming interface, though, 
since we receive a query over IPv6 on file descriptor 18 in your strace 
output. Any events in the system logs that looks like the IP addresses 
might have changed at some point? Anything in the dnsdist logs looking 
like a reconnect (error while writing to the backend, ..)? We should not 
reconnect unless send() failed with EINVAL or ENODEV, which is not 
supposed to happen in your case since you don't set the source interface.


Best regards,

Remi


OpenPGP_signature
Description: OpenPGP digital signature
___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


Re: [dnsdist] dnsdist using loopback address as source address for queries

2021-10-29 Thread Adam Bishop via dnsdist
On 29 Oct 2021, at 11:33, Remi Gacogne via dnsdist 
 wrote:
> would you mind getting the output of "lsof -n -p " while the 
> issue is still happening? A full backtrace with gdb might also be good to have

Sure that's fine, output follows. I've uploaded the gdb output as it's quite 
long. We've got three other instances to take the load, so I'm happy to keep it 
running as is for the moment.

Adam

GDB thread dump:
  
https://gist.githubusercontent.com/TheMysteriousX/08185778d506e973a0bbe8675dd35282/raw/fc5378be3e9b40abd5c4c6f50da6f0e0cdb5fe35/dnsdist%2520thread%2520dump

lsof:

COMMAND  PIDUSER   FD  TYPE DEVICE SIZE/OFF NODE NAME
dnsdist 1351 dnsdist  cwd   DIR  253,0  224  128 /
dnsdist 1351 dnsdist  rtd   DIR  253,0  224  128 /
dnsdist 1351 dnsdist  txt   REG  253,0  8461520 16827142 
/usr/bin/dnsdist
dnsdist 1351 dnsdist  mem   REG  253,0  9253600   401478 
/var/lib/sss/mc/passwd
dnsdist 1351 dnsdist  mem   REG  253,0  6940392   401482 
/var/lib/sss/mc/group
dnsdist 1351 dnsdist  mem   REG  253,046280 25375781 
/usr/lib64/libnss_sss.so.2
dnsdist 1351 dnsdist  mem   REG  253,030256 25167667 
/usr/lib64/libcap-ng.so.0.0.0
dnsdist 1351 dnsdist  mem   REG  253,026192 25167768 
/usr/lib64/libattr.so.1.1.2448
dnsdist 1351 dnsdist  mem   REG  253,0   543160 25167185 
/usr/lib64/libpcre2-8.so.0.7.1
dnsdist 1351 dnsdist  mem   REG  253,037024 25167632 
/usr/lib64/libffi.so.6.0.2
dnsdist 1351 dnsdist  mem   REG  253,0   123336 25167669 
/usr/lib64/libaudit.so.1.0.0
dnsdist 1351 dnsdist  mem   REG  253,0  1869272 25254232 
/usr/lib64/libdb-5.3.so
dnsdist 1351 dnsdist  mem   REG  253,0   309104 25439215 
/usr/lib64/liblua-5.3.so
dnsdist 1351 dnsdist  mem   REG  253,054816 25167770 
/usr/lib64/libacl.so.1.1.2253
dnsdist 1351 dnsdist  mem   REG  253,054688 25167496 
/usr/lib64/libpopt.so.0.0.1
dnsdist 1351 dnsdist  mem   REG  253,0   678440 25439213 
/usr/lib64/libzstd.so.1.4.4
dnsdist 1351 dnsdist  mem   REG  253,0   104744 25443492 
/usr/lib64/libelf-0.182.so
dnsdist 1351 dnsdist  mem   REG  253,074744 25167530 
/usr/lib64/libbz2.so.1.0.6
dnsdist 1351 dnsdist  mem   REG  253,0   144392 25167536 
/usr/lib64/libgpg-error.so.0.24.2
dnsdist 1351 dnsdist  mem   REG  253,0   168536 25165966 
/usr/lib64/libselinux.so.1
dnsdist 1351 dnsdist  mem   REG  253,033392 25439203 
/usr/lib64/libuuid.so.1.3.0
dnsdist 1351 dnsdist  mem   REG  253,0   343544 25288469 
/usr/lib64/libblkid.so.1.1.0
dnsdist 1351 dnsdist  mem   REG  253,0  1503456 25167662 
/usr/lib64/libgmp.so.10.3.2
dnsdist 1351 dnsdist  mem   REG  253,0   201632 25167916 
/usr/lib64/libhogweed.so.4.5
dnsdist 1351 dnsdist  mem   REG  253,0   239456 25202369 
/usr/lib64/libnettle.so.6.5
dnsdist 1351 dnsdist  mem   REG  253,081408 25167992 
/usr/lib64/libtasn1.so.6.5.5
dnsdist 1351 dnsdist  mem   REG  253,0  1760264 25167714 
/usr/lib64/libunistring.so.2.1.0
dnsdist 1351 dnsdist  mem   REG  253,0   162224 25167717 
/usr/lib64/libidn2.so.0.3.6
dnsdist 1351 dnsdist  mem   REG  253,0  1246520 25301369 
/usr/lib64/libp11-kit.so.0.3.0
dnsdist 1351 dnsdist  mem   REG  253,095416 25205548 
/usr/lib64/libz.so.1.2.11
dnsdist 1351 dnsdist  mem   REG  253,023464 25167336 
/usr/lib64/libutil-2.28.so
dnsdist 1351 dnsdist  mem   REG  253,0   142712 25167518 
/usr/lib64/libcrypt.so.1.1.0
dnsdist 1351 dnsdist  mem   REG  253,0   123520 25341073 
/usr/lib64/libresolv-2.28.so
dnsdist 1351 dnsdist  mem   REG  253,0  2181160 25221757 
/usr/lib64/libperl.so.5.26.3
dnsdist 1351 dnsdist  mem   REG  253,0   207472 25788589 
/usr/lib64/librpmio.so.8.2.0
dnsdist 1351 dnsdist  mem   REG  253,0   519352 25702207 
/usr/lib64/librpm.so.8.2.0
dnsdist 1351 dnsdist  mem   REG  253,062344 25185617 
/usr/lib64/libsensors.so.4.4.0
dnsdist 1351 dnsdist  mem   REG  253,0  1188080 25167585 
/usr/lib64/libgcrypt.so.20.2.5
dnsdist 1351 dnsdist  mem   REG  253,0   371312 25185868 
/usr/lib64/libmount.so.1.1.0
dnsdist 1351 dnsdist  mem   REG  253,0   119760 25185932 
/usr/lib64/liblz4.so.1.8.3
dnsdist 1351 dnsdist  mem   REG  253,0   192016 25167491 
/usr/lib64/liblzma.so.5.2.4
dnsdist 1351 dnsdist  mem   REG  253,069024 25341075 
/usr/lib64/librt-2.28.so
dnsdist 1351 dnsdist  mem   REG  253,0   208616 25167221 
/usr/lib64/libtinfo.so.6.1

Re: [dnsdist] dnsdist using loopback address as source address for queries

2021-10-29 Thread Remi Gacogne via dnsdist

Hi Adam,

On 10/29/21 12:10, Adam Bishop via dnsdist wrote:

On 13 Sep 2021, at 13:47, Adam Bishop  wrote:

On 13 Sep 2021, at 13:31, Remi Gacogne via dnsdist 
 wrote:

That's very weird, I don't have any clue to what might be happening. Would you 
mind sharing the whole configuration? In particular, do you set the source 
interface? I would also be very interested in seeing a strace of the process 
while the issue is happening.


Sure, I can attach strace the next time I catch it in the act. It's occurring 
infrequently, but reliably so might take me a few days. Source interface is not 
explicitly set, as the host is single homed.


Took a while to recur, but I attached trace. Nothing immediately jumps out at 
me, there's no errors returned by the socket.

Command used was: strace -v -s 1024 -Ff -p 1351 2>&1 | grep -v epoll_wait | 
grep -v nanosleep

I've left the instance in the broken state, so if there's additional testing 
that would be useful, please let me know.


I'll look at the trace more closely in a bit but would you mind getting 
the output of "lsof -n -p " while the issue is still 
happening? A full backtrace with gdb might also be good to have, you 
would first need to attach to the running process with 'gdb -p dnsdist' then issue the 'thread apply all bt full' command. You can then 
close gdb and let dnsdit running (or not).


Thanks!


OpenPGP_signature
Description: OpenPGP digital signature
___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


Re: [dnsdist] dnsdist using loopback address as source address for queries

2021-10-29 Thread Adam Bishop via dnsdist
On 13 Sep 2021, at 13:47, Adam Bishop  wrote:
> On 13 Sep 2021, at 13:31, Remi Gacogne via dnsdist 
>  wrote:
>> That's very weird, I don't have any clue to what might be happening. Would 
>> you mind sharing the whole configuration? In particular, do you set the 
>> source interface? I would also be very interested in seeing a strace of the 
>> process while the issue is happening.
>
> Sure, I can attach strace the next time I catch it in the act. It's occurring 
> infrequently, but reliably so might take me a few days. Source interface is 
> not explicitly set, as the host is single homed.

Took a while to recur, but I attached trace. Nothing immediately jumps out at 
me, there's no errors returned by the socket.

Command used was: strace -v -s 1024 -Ff -p 1351 2>&1 | grep -v epoll_wait | 
grep -v nanosleep

I've left the instance in the broken state, so if there's additional testing 
that would be useful, please let me know.

Adam

---

[pid  1388] epoll_create(1024)  = 59
[pid  1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61
[pid  1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0
[pid  1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0
[pid  1388] close(61)   = 0
[pid  1388] close(59)   = 0
[pid  1388] epoll_create(1024)  = 59
[pid  1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61
[pid  1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0
[pid  1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0
[pid  1388] close(61)   = 0
[pid  1388] close(59)   = 0
[pid  1388] epoll_create(1024)  = 59
[pid  1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61
[pid  1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0
[pid  1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0
[pid  1388] close(61)   = 0
[pid  1388] close(59)   = 0
[pid  1384] <... recvmsg resumed>{msg_name={sa_family=AF_INET6, 
sin6_port=htons(40204), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, 
"2001:630:3c:5018::197", _addr), sin6_scope_id=0}, msg_namelen=28, 
msg_iov=[{iov_base="\327\23\1 
\0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\0", 
iov_len=4368}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, 
cmsg_type=0x32}], msg_controllen=40, msg_flags=0}, 0) = 50
[pid  1384] write(1, "Packet from [2001:630:3c:5018::197]:40204 for 
.lbdn.virt.ja.net. SOA with id 5079\n", 86) = 86
[pid  1384] sendto(11, "U\251\0 
\0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\30\0\10\0\24\0\2\200\0
 \1\0060\0
[pid  1388] epoll_create(1024)  = 59
[pid  1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61
[pid  1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0
[pid  1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0
[pid  1388] close(61)   = 0
[pid  1388] close(59)   = 0
[pid  1382] <... recvmsg resumed>{msg_name={sa_family=AF_INET, 
sin_port=htons(53936), sin_addr=inet_addr("194.83.7.197")}, msg_namelen=16, 
msg_iov=[{iov_base="\327\23\1 
\0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\0", 
iov_len=4368}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, 
cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("ens192"), 
ipi_spec_dst=inet_addr("194.83.7.197"), ipi_addr=inet_addr("194.83.7.197")}}], 
msg_controllen=32, msg_flags=0}, 0) = 50
[pid  1382] write(1, "Packet from 194.83.7.197:53936 for .lbdn.virt.ja.net. 
SOA with id 5079\n", 75) = 75
[pid  1382] sendto(11, "V\251\0 
\0\1\0\0\0\0\0\1\4\4lbdn\4virt\2ja\3net\0\0\6\0\1\0\0)\20\0\0\0\0\0\0\f\0\10\0\10\0\1
 \0\302S\7\305", 62, 0, NULL, 0) = 62
[pid  1382] recvmsg(4,  
[pid  1388] epoll_create(1024)  = 59
[pid  1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61
[pid  1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0
[pid  1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0
[pid  1388] close(61)   = 0
[pid  1388] close(59)   = 0
[pid  1389] <... restart_syscall resumed>) = 0
[pid  1388] epoll_create(1024)  = 59
[pid  1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61
[pid  1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0
[pid  1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0
[pid  1388] close(61)   = 0
[pid  1388] close(59)   = 0
[pid  1388] epoll_create(1024)  = 59
[pid  1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61
[pid  1388] epoll_ctl(59, EPOLL_CTL_ADD, 61, {EPOLLIN, {u32=61, u64=61}}) = 0
[pid  1388] epoll_ctl(59, EPOLL_CTL_DEL, 61, 0x7f15c97f0124) = 0
[pid  1388] close(61)   = 0
[pid  1388] close(59)   = 0
[pid  1388] epoll_create(1024)  = 59
[pid  1388] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 61
[pid  1388] epoll_ctl(59, 

Re: [dnsdist] dnsdist uptime

2021-10-29 Thread Stephane Bortzmeyer via dnsdist
On Tue, Oct 26, 2021 at 07:32:51AM +,
 SAMI RAHAL via dnsdist  wrote 
 a message of 125 lines which said:

> I see an "Uptime:" on the Web console which is always less than one
> day, even if I never restart the server.

The log of your system may tell you why (a bug crashing the server,
or a lack of memory forcing the system to kill dnsdist, for
instance).



___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist