Fwd: Re: relayd's icmp check only works for a small number of hosts
forgot to add bugs@openbsd.org

Subject: Re: relayd's icmp check only works for a small number of hosts
Date: 2016-09-02 17:50
From: Remi Locherer
To: Reyk Floeter

On 2016-09-02 16:51, Reyk Floeter wrote:
> On Fri, Aug 19, 2016 at 04:31:10PM +0200, Remi Locherer wrote:
>> >Synopsis:    relayd's icmp check only works for a small number of hosts
>> >Category:    relayd
>> >Environment:
>>     System      : OpenBSD 5.9
>>     Details     : OpenBSD 5.9 (GENERIC.MP) #10: Wed Aug 3 13:46:07 CEST 2016
>>                   r...@stable-59-amd64.mtier.org:/binpatchng/work-binpatch59-amd64/src/sys/arch/amd64/compile/GENERIC.MP
>>     Architecture: OpenBSD.amd64
>>     Machine     : amd64
>> >Description:
>>     relayd says 70 out of 104 hosts are not reachable via icmp. But ping on
>>     the same host where relayd runs can reach all hosts with a rtt below 1ms.
>>
>>     In the logs I see "210ms,icmp read timeout". But in relayd.conf a timeout
>>     of 1000 is set.
>
> All checks have to be completed before the next check interval. With
> that many tests, it can happen that relayd is not finished
> sending/receiving all individual checks before the next interval;
> missed hosts will be marked down.
>
> You could try the following:
>
> 1. Increase the global interval.

With this in relayd.conf:

interval 300
timeout 6

relayd successfully checked 36 hosts and reported icmp response times
between 4 and 6 ms. After 60s relayd reports "icmp read timeout" for the
other 68 hosts.

Sep 2 17:29:13 lb2 relayd[31358]: host 192.168.63.48, check icmp (60008ms,icmp read timeout), state unknown -> down, availability 0.00%

While it's true that a few hosts are down, the majority of hosts answer my
manual pings within 0.600 ms.

> 2. Instead of testing the same hosts multiple times, you can use the
> "parent" keyword to inherit the state from a tested host, e.g.
>
> table { 10.1.1.1 }
> table { 10.1.1.1 parent 1 }
> table { 10.1.1.1 parent 1 }

I'll try this one. It's a bit tricky since I can only reference the
parent table by index and not by name. My relayd.conf is generated and
deployed with Ansible.
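For a generated relayd.conf, the index bookkeeping behind the "parent" suggestion can be done by the generator itself. The following is only a rough sketch, not something from this thread: it assumes, as relayd.conf(5) describes, that host identifiers are assigned sequentially to the configured hosts starting with 1, and that every host entry (including one that uses "parent") consumes an identifier. The function name render_tables and the table names in the usage note are hypothetical.

```python
def render_tables(tables):
    """Render relayd.conf table stanzas, pointing repeated addresses at
    the first host entry that uses the same address.

    tables: list of (name, [addresses]) in the exact order in which they
    will appear in relayd.conf.

    Assumption: host ids are handed out sequentially, starting at 1, to
    every host entry in configuration order.
    """
    first_seen = {}  # address -> host id of its first occurrence
    next_id = 1      # id the next host entry is assumed to receive
    lines = []
    for name, addrs in tables:
        entries = []
        for addr in addrs:
            if addr in first_seen:
                # Repeat: skip the check and inherit the first entry's state.
                entries.append(f"{addr} parent {first_seen[addr]}")
            else:
                first_seen[addr] = next_id
                entries.append(addr)
            next_id += 1  # every entry consumes an id, parent or not
        lines.append(f"table <{name}> {{ {' '.join(entries)} }}")
    return "\n".join(lines)
```

Called with three hypothetical tables named a, b and c that all contain 10.1.1.1, this produces the same shape as the example quoted above: the first table is checked, the other two carry "parent 1". The ids the running daemon actually assigned should be compared against "relayctl show summary" before relying on the generated numbers.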
Re: relayd's icmp check only works for a small number of hosts
On 2016-09-01 00:27, Sebastian Benoit wrote:
> Remi Locherer(remi.loche...@relo.ch) on 2016.08.19 16:31:10 +0200:
>> >Synopsis:    relayd's icmp check only works for a small number of hosts
>> >Category:    relayd
>> >Environment:
>>     System      : OpenBSD 5.9
>>     Details     : OpenBSD 5.9 (GENERIC.MP) #10: Wed Aug 3 13:46:07 CEST 2016
>>                   r...@stable-59-amd64.mtier.org:/binpatchng/work-binpatch59-amd64/src/sys/arch/amd64/compile/GENERIC.MP
>>     Architecture: OpenBSD.amd64
>>     Machine     : amd64
>> >Description:
>>     relayd says 70 out of 104 hosts are not reachable via icmp. But ping on
>>     the same host where relayd runs can reach all hosts with a rtt below 1ms.
>>
>>     In the logs I see "210ms,icmp read timeout". But in relayd.conf a timeout
>>     of 1000 is set.
>>
>>     Could this be related to the problem mentioned in the commit message of
>>     src/usr.sbin/relayd/check_icmp.c rev 1.41?
>
> i think you mean 1.40?

yes

> try to increase
>
> usr.sbin/relayd/relayd.h:93:#define ICMP_RCVBUF_SIZE 262144
>
> and see if you can have more checks then.

I tried the values 524288 and 393216 for ICMP_RCVBUF_SIZE. For both
values relayd tells me:

relayd_icmp_patch: icmp_setup: setsockopt: No buffer space available

And then it exits.
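The failure above is the kernel rejecting the requested receive buffer size. As an illustration only (this is not relayd's code, which is C and exits on the error as reported), a caller could step down through smaller sizes instead of giving up. The helper name set_rcvbuf_with_fallback and the use of a UDP socket in the usage note are assumptions made for the sketch; a raw ICMP socket would need root.

```python
import socket


def set_rcvbuf_with_fallback(sock, candidates):
    """Try each receive buffer size in order and return the first one
    the kernel accepts, or None if every request is rejected.

    A rejected request surfaces as OSError (for example ENOBUFS,
    "No buffer space available").
    """
    for size in candidates:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
            return size
        except OSError:
            # Request refused; fall through to the next, smaller size.
            continue
    return None
```

A possible use is set_rcvbuf_with_fallback(socket.socket(socket.AF_INET, socket.SOCK_DGRAM), [524288, 393216, 262144]). Behaviour differs between systems: some kernels reject an oversized request as seen here, while others (Linux, for one) silently clamp it, so the returned value is the size that was requested successfully and not necessarily the size in effect.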
Re: relayd's icmp check only works for a small number of hosts
Remi Locherer(remi.loche...@relo.ch) on 2016.08.19 16:31:10 +0200:
> >Synopsis:    relayd's icmp check only works for a small number of hosts
> >Category:    relayd
> >Environment:
>     System      : OpenBSD 5.9
>     Details     : OpenBSD 5.9 (GENERIC.MP) #10: Wed Aug 3 13:46:07 CEST 2016
>                   r...@stable-59-amd64.mtier.org:/binpatchng/work-binpatch59-amd64/src/sys/arch/amd64/compile/GENERIC.MP
>     Architecture: OpenBSD.amd64
>     Machine     : amd64
> >Description:
>     relayd says 70 out of 104 hosts are not reachable via icmp. But ping on
>     the same host where relayd runs can reach all hosts with a rtt below 1ms.
>
>     In the logs I see "210ms,icmp read timeout". But in relayd.conf a timeout
>     of 1000 is set.
>
>     Could this be related to the problem mentioned in the commit message of
>     src/usr.sbin/relayd/check_icmp.c rev 1.41?

i think you mean 1.40?

try to increase

usr.sbin/relayd/relayd.h:93:#define ICMP_RCVBUF_SIZE 262144

and see if you can have more checks then.

>     The latest errata patches for 5.9 are applied via the mtier pkgs.
>
> relayd.conf:
> # Global Options
> interval 10
> timeout 1000
> log updates
>
> # Tables
> table { 192.168.63.32 }
> table { 192.168.63.33 }
> table { 192.168.63.32 }
> table { 192.168.63.33 }
> table { 192.168.63.35 }
> table { 192.168.63.36 }
> table { 192.168.63.35 }
> table { 192.168.63.36 }
> table { 192.168.63.38 }
> table { 192.168.63.39 }
> table { 192.168.63.38 }
> table { 192.168.63.39 }
> table { 192.168.63.41 }
> table { 192.168.63.42 }
> table { 192.168.63.41 }
> table { 192.168.63.42 }
> table { 192.168.63.44 }
> table { 192.168.63.45 }
> table { 192.168.63.44 }
> table { 192.168.63.45 }
> table { 192.168.63.47 }
> table { 192.168.63.48 }
> table { 192.168.63.47 }
> table { 192.168.63.48 }
> table { 192.168.63.50 }
> table { 192.168.63.51 }
> table { 192.168.63.50 }
> table { 192.168.63.51 }
> table { 192.168.63.84 }
> table { 192.168.63.85 }
> table { 192.168.63.84 }
> table { 192.168.63.85 }
> table { 192.168.63.84 }
> table { 192.168.63.85 }
> table { 192.168.63.84 }
> table { 192.168.63.85 }
> table { 192.168.63.104 }
> table { 192.168.63.105 }
> table { 192.168.63.104 }
> table { 192.168.63.105 }
> table { 192.168.63.124 }
> table { 192.168.63.125 }
> table { 192.168.63.124 }
> table { 192.168.63.125 }
> table { 192.168.63.124 }
> table { 192.168.63.125 }
> table { 192.168.63.124 }
> table { 192.168.63.125 }
> table { 192.168.63.114 }
> table { 192.168.63.115 }
> table { 192.168.63.114 }
> table { 192.168.63.115 }
> table { 192.168.63.114 }
> table { 192.168.63.115 }
> table { 192.168.63.114 }
> table { 192.168.63.115 }
> table { 192.168.63.132 }
> table { 192.168.63.133 }
> table { 192.168.63.132 }
> table { 192.168.63.133 }
> table { 192.168.63.135 }
> table { 192.168.63.136 }
> table { 192.168.63.135 }
> table { 192.168.63.136 }
> table { 192.168.63.138 }
> table { 192.168.63.139 }
> table { 192.168.63.138 }
> table { 192.168.63.139 }
> table { 192.168.63.141 }
> table { 192.168.63.142 }
> table { 192.168.63.141 }
> table { 192.168.63.142 }
> table { 192.168.63.184 }
> table { 192.168.63.185 }
> table { 192.168.63.184 }
> table { 192.168.63.185 }
> table { 192.168.63.184 }
> table { 192.168.63.185 }
> table {
relayd's icmp check only works for a small number of hosts
>Synopsis:    relayd's icmp check only works for a small number of hosts
>Category:    relayd
>Environment:
    System      : OpenBSD 5.9
    Details     : OpenBSD 5.9 (GENERIC.MP) #10: Wed Aug 3 13:46:07 CEST 2016
                  r...@stable-59-amd64.mtier.org:/binpatchng/work-binpatch59-amd64/src/sys/arch/amd64/compile/GENERIC.MP
    Architecture: OpenBSD.amd64
    Machine     : amd64
>Description:
    relayd says 70 out of 104 hosts are not reachable via icmp. But ping on
    the same host where relayd runs can reach all hosts with a rtt below 1ms.

    In the logs I see "210ms,icmp read timeout". But in relayd.conf a timeout
    of 1000 is set.

    Could this be related to the problem mentioned in the commit message of
    src/usr.sbin/relayd/check_icmp.c rev 1.41?

    The latest errata patches for 5.9 are applied via the mtier pkgs.

relayd.conf:
# Global Options
interval 10
timeout 1000
log updates

# Tables
table { 192.168.63.32 }
table { 192.168.63.33 }
table { 192.168.63.32 }
table { 192.168.63.33 }
table { 192.168.63.35 }
table { 192.168.63.36 }
table { 192.168.63.35 }
table { 192.168.63.36 }
table { 192.168.63.38 }
table { 192.168.63.39 }
table { 192.168.63.38 }
table { 192.168.63.39 }
table { 192.168.63.41 }
table { 192.168.63.42 }
table { 192.168.63.41 }
table { 192.168.63.42 }
table { 192.168.63.44 }
table { 192.168.63.45 }
table { 192.168.63.44 }
table { 192.168.63.45 }
table { 192.168.63.47 }
table { 192.168.63.48 }
table { 192.168.63.47 }
table { 192.168.63.48 }
table { 192.168.63.50 }
table { 192.168.63.51 }
table { 192.168.63.50 }
table { 192.168.63.51 }
table { 192.168.63.84 }
table { 192.168.63.85 }
table { 192.168.63.84 }
table { 192.168.63.85 }
table { 192.168.63.84 }
table { 192.168.63.85 }
table { 192.168.63.84 }
table { 192.168.63.85 }
table { 192.168.63.104 }
table { 192.168.63.105 }
table { 192.168.63.104 }
table { 192.168.63.105 }
table { 192.168.63.124 }
table { 192.168.63.125 }
table { 192.168.63.124 }
table { 192.168.63.125 }
table { 192.168.63.124 }
table { 192.168.63.125 }
table { 192.168.63.124 }
table { 192.168.63.125 }
table { 192.168.63.114 }
table { 192.168.63.115 }
table { 192.168.63.114 }
table { 192.168.63.115 }
table { 192.168.63.114 }
table { 192.168.63.115 }
table { 192.168.63.114 }
table { 192.168.63.115 }
table { 192.168.63.132 }
table { 192.168.63.133 }
table { 192.168.63.132 }
table { 192.168.63.133 }
table { 192.168.63.135 }
table { 192.168.63.136 }
table { 192.168.63.135 }
table { 192.168.63.136 }
table { 192.168.63.138 }
table { 192.168.63.139 }
table { 192.168.63.138 }
table { 192.168.63.139 }
table { 192.168.63.141 }
table { 192.168.63.142 }
table { 192.168.63.141 }
table { 192.168.63.142 }
table { 192.168.63.184 }
table { 192.168.63.185 }
table { 192.168.63.184 }
table { 192.168.63.185 }
table { 192.168.63.184 }
table { 192.168.63.185 }
table { 192.168.63.184 }
table { 192.168.63.185 }
table { 192.168.63.184 }
table { 192.168.63.185 }
table { 192.168.63.224 }
table { 192.168.63.225 }
table { 192.168.63.224 }
table { 192.168.63.225 }
table { 192.168.63.224 }
table { 192.168.63.225 }
table { 192.168.63.224 }
table { 192.168.63.225 }
table { 192.168.63.224 }
table { 192.168.63.225 }
table { 192.168.63.204 }
table { 192.168.63.205 }
table { 192.168.63.204 }
table { 192.168.63.205 }
table { 192.168.63.214 }
table { 192.168.63.215 }
table { 192.168.63.214 }
table { 192.168.63.215 }
table { 192.168.63.214 }
table { 192.168.63.215 }
table { 192.168.63.214 }
table { 192.168.63.215 }

# Redirects
redirect test_acc_a_443 {
    listen on 192.168.62.2 tcp port 443
    session timeout 600
    forward to check icmp
    forward to check icmp
    match pftag test_acc_a
}
redirect test_acc_a_9727 {
    listen on 192.168.62.2 tcp port 9727
    session timeout 600
    forward to check icmp
    forward to check icmp
    match pftag test_acc_a
}
redirect test_acc_b_443 {
    listen on 192.168.62.3 tcp port 443
    session timeout 600
    forward to ch