Re: [systemd-devel] systemd-resolved continuously switching DNS servers

2017-05-10 Thread Kai Krakow
Am Tue, 9 May 2017 20:37:16 +0200
schrieb Lennart Poettering :

> On Tue, 09.05.17 00:42, Kai Krakow (hurikha...@gmail.com) wrote:
> 
> > Am Sat, 6 May 2017 14:22:21 +0200
> > schrieb Kai Krakow :
> >   
> > > Am Fri, 5 May 2017 20:18:41 +0200
> > > schrieb Lennart Poettering :
> > >   
>  [...]  
>  [...]  
>  [...]  
> > > 
> > > It looks like this all has to do with timeouts:  
> > 
> > Fixed by restarting the router. The cable modem seems to be buggy
> > with UDP packets after a lot of uptime: it simply silently drops UDP
> > packets at regular intervals, WebUI was also very slow, probably a
> > CPU issue.
> > 
> > I'll follow up on this with the cable provider.
> > 
> > When the problem starts to show up, systemd-resolved is affected
> > more by this than direct resolving. I don't know if there's
> > something that could be optimized in systemd-resolved to handle
> > such issues better but I don't consider it a bug in
> > systemd-resolved, it was a local problem.  
> 
> Normally configured DNS servers should be equivalent, and hence
> switching them for each retry should not come at any cost, hence,
> besides the extra log output, do you experience any real issues?

Since I restarted the router, there are no longer any such logs except
maybe a few per day (less than 4).

But when I got those logs spammed to the journal, the real problem was
the DNS resolver taking 10s about once per minute to resolve a website
address - which really was a pita.

But well, what could systemd-resolved have done about it when the real
problem was some network equipment?

I just wonder why it was less visible when directly using those DNS
servers. Since DNS must have been designed with occassional packet loss
in mind (because it uses UDP), there must be a way to handle this
better. So I read a little bit in https://www.ietf.org/rfc/rfc1035.txt.

RFC1035 section 4.2.1 suggests that the retransmission interval for
queries should be 2-5 seconds, depending on statistics of previous
queries. To me, "retransmissions" means the primary DNS server should
not be switched for each query timeout it got (while still allowing to
transfer the same request to the next available server).

RFC1035 section 7 discusses the suggested implementation of the
resolver and covers retransmission and server selection algorithms:

It suggests to record average response time for each server it queries
to select the ones which respond faster first. Without query history,
the selection algorithm should pretend a response time of 5-10 seconds.

It also suggests to switch the primary server only when there was some
"bizarre" error or a server error reply. However, I don't think it
should actually remove them from the list as suggested there since we
are a client resolver, not a server resolver which can update its peer
lists from neighbor servers. However, we could reset query time
statistics to move it to the end of the list, and/or blacklist it for a
while.

Somewhere else in that document I've read that it is well permitted to
interleave multiple parallel requests to multiple DNS servers in the
list. So I guess it would be nice and allowed if systemd-resolved used
more than only one DNS server at the same time by alternating between
them each request - maybe taking the two best according to query time
statistics.

I also guess that it should maybe use shorter timeouts for queries as
you could have more than one DNS server and the initial query time
statistic should pretend 5-10 seconds, while the rotation interval
suggests 2-5 seconds.

I think it would work to have "10 seconds divided by servers count" or
2 seconds, whatever is bigger, as a timeout for query rotation. But a
late reply should still be accepted as pointed out in section 7.3, even
when the query was already rotated to the next DNS server. Using only a
single DNS server can skip all this logic as there's no rotation and
would work with timeouts of 10 seconds.

That way, systemd-resolved would "learn" to use only the fastest DNS
server and when it becomes too slow (which is 5-10 seconds based on the
RFC), it would switch to the next server. If parallel requests come in,
it would use more DNS servers from the list in parallel, auto-sorted by
query reply time. The startup order is the one given by the
administrator (or whatever provides the DNS server list).

Applied to my UDP packet loss (which seem to be single packet losses as
an immediate next request would've got a reply), it would mean that the
systemd resolver gives me an address after 2-3 seconds instead of 5 or
10 because I had 4 DNS servers on that link. This is more or less what
I've seen previously in my situation when I switched back to direct
resolving instead of through systemd-resolved.

What do you think? I could think of this being an implementation
improvement project in the Github bug tracker. I would be willing to
work on such an improvement given 

Re: [systemd-devel] systemd-resolved continuously switching DNS servers

2017-05-09 Thread Lennart Poettering
On Tue, 09.05.17 00:42, Kai Krakow (hurikha...@gmail.com) wrote:

> Am Sat, 6 May 2017 14:22:21 +0200
> schrieb Kai Krakow :
> 
> > Am Fri, 5 May 2017 20:18:41 +0200
> > schrieb Lennart Poettering :
> > 
> > > On Fri, 05.05.17 01:01, Kai Krakow (hurikha...@gmail.com) wrote:
> > >   
> > > > Hello!
> > > > 
> > > > Why is systemd-resolved switching DNS servers all day long? This
> > > > doesn't seem to be right...
> > > 
> > > If you turn on debug logging, you should see an explanation right
> > > before each switch. I figure we should choose the log levels more
> > > carefully, so that whenever we switch we also log the reason at the
> > > same level...  
> > 
> > It looks like this all has to do with timeouts:
> 
> Fixed by restarting the router. The cable modem seems to be buggy with
> UDP packets after a lot of uptime: it simply silently drops UDP
> packets at regular intervals, WebUI was also very slow, probably a CPU
> issue.
> 
> I'll follow up on this with the cable provider.
> 
> When the problem starts to show up, systemd-resolved is affected more
> by this than direct resolving. I don't know if there's something that
> could be optimized in systemd-resolved to handle such issues better but
> I don't consider it a bug in systemd-resolved, it was a local problem.

Normally configured DNS servers should be equivalent, and hence
switching them for each retry should not come at any cost, hence,
besides the extra log output, do you experience any real issues?

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-resolved continuously switching DNS servers

2017-05-08 Thread Kai Krakow
Am Sat, 6 May 2017 14:22:21 +0200
schrieb Kai Krakow :

> Am Fri, 5 May 2017 20:18:41 +0200
> schrieb Lennart Poettering :
> 
> > On Fri, 05.05.17 01:01, Kai Krakow (hurikha...@gmail.com) wrote:
> >   
> > > Hello!
> > > 
> > > Why is systemd-resolved switching DNS servers all day long? This
> > > doesn't seem to be right...
> > 
> > If you turn on debug logging, you should see an explanation right
> > before each switch. I figure we should choose the log levels more
> > carefully, so that whenever we switch we also log the reason at the
> > same level...  
> 
> It looks like this all has to do with timeouts:

Fixed by restarting the router. The cable modem seems to be buggy with
UDP packets after a lot of uptime: it simply silently drops UDP
packets at regular intervals, WebUI was also very slow, probably a CPU
issue.

I'll follow up on this with the cable provider.

When the problem starts to show up, systemd-resolved is affected more
by this than direct resolving. I don't know if there's something that
could be optimized in systemd-resolved to handle such issues better but
I don't consider it a bug in systemd-resolved, it was a local problem.

Thanks,
Kai

 
> Mai 06 14:17:09 jupiter systemd-resolved[5585]: Cache miss for
> ssl.gstatic.com IN  Mai 06 14:17:09 jupiter
> systemd-resolved[5585]: Transaction 54375 for  > scope dns on enp5s0/*. Mai 06 14:17:09 jupiter
> > systemd-resolved[5585]: Using feature level UDP for transaction
> > 54375. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using
> > DNS server fe80::b248:7aff:fee7:f438%2 for transaction 54375.
> Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sending query packet
> with id 54375. Mai 06 14:17:09 jupiter systemd-resolved[5585]:
> Timeout reached on transaction 33004. Mai 06 14:17:09 jupiter
> systemd-resolved[5585]: Retrying transaction 33004. Mai 06 14:17:09
> jupiter systemd-resolved[5585]: Switching to DNS server
> 2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Mai 06
> 14:17:09 jupiter systemd-resolved[5585]: Cache miss for
> ssl.gstatic.com IN A Mai 06 14:17:09 jupiter systemd-resolved[5585]:
> Transaction 33004 for  scope dns on enp5s0/*.
> Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using feature level
> UDP for transaction 33004. Mai 06 14:17:09 jupiter
> systemd-resolved[5585]: Using DNS server
> 2a02:8109:1ec0:6f5:5667:51ff:feea:385f for transaction 33004. Mai 06
> 14:17:09 jupiter systemd-resolved[5585]: Sending query packet with id
> 33004. Mai 06 14:17:09 jupiter systemd-resolved[5585]: Processing
> incoming packet on transaction 33004. (rcode=SUCCESS) Mai 06 14:17:09
> jupiter systemd-resolved[5585]: Not validating response for 33004,
> used server feature level does not support DNSSEC. Mai 06 14:17:09
> jupiter systemd-resolved[5585]: Added positive unauthenticated cache
> entry for ssl.gstatic.com IN A 143s on
> */INET6/2a02:8109:1ec0:6f5:5667:51ff:feea:385f Mai 06 14:17:09
> jupiter systemd-resolved[5585]: Transaction 33004 for
>  on scope dns on enp5s0/* now complete with
>  from network (unsigned). Mai 06 14:17:09 jupiter
> systemd-resolved[5585]: Sending response packet with id 42127 on
> interface 1/AF_INET. Mai 06 14:17:09 jupiter systemd-resolved[5585]:
> Sending response packet with id 22131 on interface 1/AF_INET. Mai 06
> 14:17:09 jupiter systemd-resolved[5585]: Processing incoming packet
> on transaction 54375. (rcode=SUCCESS) Mai 06 14:17:09 jupiter
> systemd-resolved[5585]: Not validating response for 54375, used
> server feature level does not support DNSSEC. Mai 06 14:17:09 jupiter
> systemd-resolved[5585]: Added positive unauthenticated cache entry
> for ssl.gstatic.com IN  203s on
> enp5s0/INET6/fe80::b248:7aff:fee7:f438 Mai 06 14:17:09 jupiter
> systemd-resolved[5585]: Transaction 54375 for  > on scope dns on enp5s0/* now complete with  from
> > network (unsigned). Mai 06 14:17:09 jupiter
> > systemd-resolved[5585]: Freeing transaction 33004. Mai 06
> > 14:17:09 jupiter systemd-resolved[5585]: Sent message
> > type=method_return sender=n/a destination=:1.352 object=n/a
> > interface=n/a member=n/a cookie=234 reply_cookie=2 error=n/a
> > Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sent message
> > type=method_call sender=n/a destination=org.freedesktop.DBus
> > object=/org/freedesktop/DBus interface=org.freedesktop.DBus
> > member=RemoveMatch cookie=235 reply_cookie=0 erro Mai 06
> > 14:17:09 jupiter systemd-resolved[5585]: Got message
> > type=method_return sender=org.freedesktop.DBus
> > destination=:1.273 object=n/a interface=n/a member=n/a
> > cookie=181 reply_cookie=235 error=n/a Mai 06 14:17:09 jupiter
> > systemd-resolved[5585]: Freeing transaction 54375.
> 
> I just don't unterstand why, because all these nameservers work
> perfectly well when used directly and not through the stub resolver.


-- 
Regards,
Kai

Replies to list-only preferred.


Re: [systemd-devel] systemd-resolved continuously switching DNS servers

2017-05-06 Thread Kai Krakow
Am Fri, 5 May 2017 20:18:41 +0200
schrieb Lennart Poettering :

> On Fri, 05.05.17 01:01, Kai Krakow (hurikha...@gmail.com) wrote:
> 
> > Hello!
> > 
> > Why is systemd-resolved switching DNS servers all day long? This
> > doesn't seem to be right...  
> 
> If you turn on debug logging, you should see an explanation right
> before each switch. I figure we should choose the log levels more
> carefully, so that whenever we switch we also log the reason at the
> same level...

It looks like this all has to do with timeouts:

Mai 06 14:17:09 jupiter systemd-resolved[5585]: Cache miss for ssl.gstatic.com 
IN 
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Transaction 54375 for 
 scope dns on enp5s0/*.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using feature level UDP for 
transaction 54375.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using DNS server 
fe80::b248:7aff:fee7:f438%2 for transaction 54375.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sending query packet with id 
54375.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Timeout reached on transaction 
33004.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Retrying transaction 33004.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Switching to DNS server 
2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Cache miss for ssl.gstatic.com 
IN A
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Transaction 33004 for 
 scope dns on enp5s0/*.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using feature level UDP for 
transaction 33004.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Using DNS server 
2a02:8109:1ec0:6f5:5667:51ff:feea:385f for transaction 33004.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sending query packet with id 
33004.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Processing incoming packet on 
transaction 33004. (rcode=SUCCESS)
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Not validating response for 
33004, used server feature level does not support DNSSEC.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Added positive unauthenticated 
cache entry for ssl.gstatic.com IN A 143s on 
*/INET6/2a02:8109:1ec0:6f5:5667:51ff:feea:385f
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Transaction 33004 for 
 on scope dns on enp5s0/* now complete with  
from network (unsigned).
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sending response packet with id 
42127 on interface 1/AF_INET.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sending response packet with id 
22131 on interface 1/AF_INET.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Processing incoming packet on 
transaction 54375. (rcode=SUCCESS)
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Not validating response for 
54375, used server feature level does not support DNSSEC.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Added positive unauthenticated 
cache entry for ssl.gstatic.com IN  203s on 
enp5s0/INET6/fe80::b248:7aff:fee7:f438
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Transaction 54375 for 
 on scope dns on enp5s0/* now complete with  
from network (unsigned).
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Freeing transaction 33004.
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sent message type=method_return 
sender=n/a destination=:1.352 object=n/a interface=n/a member=n/a cookie=234 
reply_cookie=2 error=n/a
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Sent message type=method_call 
sender=n/a destination=org.freedesktop.DBus object=/org/freedesktop/DBus 
interface=org.freedesktop.DBus member=RemoveMatch cookie=235 reply_cookie=0 erro
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Got message type=method_return 
sender=org.freedesktop.DBus destination=:1.273 object=n/a interface=n/a 
member=n/a cookie=181 reply_cookie=235 error=n/a
Mai 06 14:17:09 jupiter systemd-resolved[5585]: Freeing transaction 54375.

I just don't unterstand why, because all these nameservers work
perfectly well when used directly and not through the stub resolver.


-- 
Regards,
Kai

Replies to list-only preferred.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-resolved continuously switching DNS servers

2017-05-05 Thread Lennart Poettering
On Fri, 05.05.17 01:01, Kai Krakow (hurikha...@gmail.com) wrote:

> Hello!
> 
> Why is systemd-resolved switching DNS servers all day long? This
> doesn't seem to be right...

If you turn on debug logging, you should see an explanation right
before each switch. I figure we should choose the log levels more
carefully, so that whenever we switch we also log the reason at the
same level...


Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] systemd-resolved continuously switching DNS servers

2017-05-04 Thread Kai Krakow
Hello!

Why is systemd-resolved switching DNS servers all day long? This
doesn't seem to be right...

Mai 05 00:52:46 jupiter systemd-resolved[658]: Switching to DNS server
192.168.4.254 for interface enp5s0. Mai 05 00:52:53 jupiter
systemd-resolved[658]: Switching to DNS server
fe80::b248:7aff:fee7:f438%2 for interface enp5s0. Mai 05 00:52:53
jupiter systemd-resolved[658]: Switching to DNS server
2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Mai 05
00:52:58 jupiter systemd-resolved[658]: Switching to DNS server
192.168.4.254 for interface enp5s0. Mai 05 00:52:59 jupiter
systemd-resolved[658]: Switching to DNS server
fe80::b248:7aff:fee7:f438%2 for interface enp5s0. Mai 05 00:53:02
jupiter systemd-resolved[658]: Switching to DNS server
2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Mai 05
00:53:06 jupiter systemd-resolved[658]: Switching to DNS server
192.168.4.254 for interface enp5s0. Mai 05 00:53:07 jupiter
systemd-resolved[658]: Switching to DNS server
fe80::b248:7aff:fee7:f438%2 for interface enp5s0. Mai 05 00:53:12
jupiter systemd-resolved[658]: Switching to DNS server
2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Mai 05
00:53:33 jupiter systemd-resolved[658]: Switching to DNS server
192.168.4.254 for interface enp5s0. Mai 05 00:53:35 jupiter
systemd-resolved[658]: Switching to DNS server
fe80::b248:7aff:fee7:f438%2 for interface enp5s0. Mai 05 00:53:40
jupiter systemd-resolved[658]: Switching to DNS server
2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0. Mai 05
00:54:01 jupiter systemd-resolved[658]: Switching to DNS server
192.168.4.254 for interface enp5s0. Mai 05 00:54:02 jupiter
systemd-resolved[658]: Switching to DNS server
fe80::b248:7aff:fee7:f438%2 for interface enp5s0. Mai 05 00:54:08
jupiter systemd-resolved[658]: Switching to DNS server
2a02:8109:1ec0:6f5:5667:51ff:feea:385f for interface enp5s0.

Also, name resolving seems to be very slow. When I directly use these
name servers in resolv.conf instead of going through the stub resolver,
there are no slow-downs and these logs stop.


-- 
Regards,
Kai

Replies to list-only preferred.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel