Fwd: Re: relayd's icmp check only works for a small number of hosts

2016-09-02 Thread Remi Locherer

Forgot to add bugs@openbsd.org.

Subject: Re: relayd's icmp check only works for a small number of hosts
Date: 2016-09-02 17:50
From: Remi Locherer 
To: Reyk Floeter 

On 2016-09-02 16:51, Reyk Floeter wrote:

On Fri, Aug 19, 2016 at 04:31:10PM +0200, Remi Locherer wrote:

>Synopsis:   relayd's icmp check only works for a small number of hosts
>Category:   relayd
>Environment:
System  : OpenBSD 5.9
	Details : OpenBSD 5.9 (GENERIC.MP) #10: Wed Aug  3 13:46:07 CEST 2016
		r...@stable-59-amd64.mtier.org:/binpatchng/work-binpatch59-amd64/src/sys/arch/amd64/compile/GENERIC.MP


Architecture: OpenBSD.amd64
Machine : amd64

>Description:
	relayd says 70 out of 104 hosts are not reachable via icmp. But ping on
the same host where relayd runs can reach all hosts with a rtt below 1ms.

In the logs I see "210ms,icmp read timeout". But in relayd.conf a timeout
of 1000 is set.



All checks have to be completed before the next check interval.  With
that many tests, it can happen that relayd is not finished
sending/receiving all individual checks before the next interval;
missed hosts will be marked down.

You could try the following:

1. Increase the global interval.


With this in relayd.conf:
interval 300
timeout 6

relayd successfully checked 36 hosts and reported icmp response times between
4 and 6 ms. After 60s relayd reports "icmp read timeout" for the other 68 hosts.


Sep  2 17:29:13 lb2 relayd[31358]: host 192.168.63.48, check icmp (60008ms,icmp read timeout), state unknown -> down, availability 0.00%


While it's true that a few hosts are down, the majority of hosts answer
my manual pings within 0.600 ms.



2. Instead of testing the same hosts multiple times, you can use the
"parent" keyword to inherit the state from a tested host, e.g.

table  {
10.1.1.1
}

table  {
10.1.1.1 parent 1
}

table  {
10.1.1.1 parent 1
}


I'll try this one. It's a bit tricky since I can only reference the
parent table by index and not by name. My relayd.conf is generated
and deployed with Ansible.
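
Since the parent can only be referenced by number, a config generator has to keep track of the numbering itself. Below is a minimal sketch in Python of how that bookkeeping could look. It assumes that relayd numbers host entries sequentially in the order they appear in relayd.conf, starting with 1, and that entries carrying "parent" are counted as well; that assumption should be checked against relayctl's summary output before relying on it. The function name render_tables and the table names are hypothetical (the names in the example above were lost in the archive).

```python
def render_tables(tables):
    """Render relayd table blocks, pointing repeated hosts at their first entry.

    tables: list of (table_name, [addresses]) in the order they should
    appear in relayd.conf.
    """
    first_seen = {}  # address -> id of the first host entry using it
    next_id = 1      # assumed: ids are assigned sequentially, starting at 1
    lines = []
    for name, addresses in tables:
        lines.append("table <%s> {" % name)
        for addr in addresses:
            if addr in first_seen:
                # Repeat of an already checked host: inherit its state.
                lines.append("\t%s parent %d" % (addr, first_seen[addr]))
            else:
                first_seen[addr] = next_id
                lines.append("\t%s" % addr)
            next_id += 1  # every host entry consumes an id, parent or not
        lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tables([
        ("web", ["10.1.1.1"]),
        ("mail", ["10.1.1.1"]),
        ("dns", ["10.1.1.1"]),
    ]))
```

For the three tables in the usage example this reproduces the layout suggested above: the first table lists the host plainly, the other two use "parent 1". The same logic could be moved into a template filter if the configuration is generated from a template.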



Re: relayd's icmp check only works for a small number of hosts

2016-09-02 Thread Remi Locherer

On 2016-09-01 00:27, Sebastian Benoit wrote:

Remi Locherer(remi.loche...@relo.ch) on 2016.08.19 16:31:10 +0200:

>Synopsis:   relayd's icmp check only works for a small number of hosts
>Category:   relayd
>Environment:
System  : OpenBSD 5.9
	Details : OpenBSD 5.9 (GENERIC.MP) #10: Wed Aug  3 13:46:07 CEST 2016
		r...@stable-59-amd64.mtier.org:/binpatchng/work-binpatch59-amd64/src/sys/arch/amd64/compile/GENERIC.MP


Architecture: OpenBSD.amd64
Machine : amd64

>Description:
	relayd says 70 out of 104 hosts are not reachable via icmp. But ping on
the same host where relayd runs can reach all hosts with a rtt below 1ms.

In the logs I see "210ms,icmp read timeout". But in relayd.conf a timeout
of 1000 is set.

Could this be related to the problem mentioned in the commit message of
src/usr.sbin/relayd/check_icmp.c rev 1.41?


i think you mean 1.40?


yes


try to increase

usr.sbin/relayd/relayd.h:93:#define ICMP_RCVBUF_SIZE 262144

and see if you can have more checks then.


I tried the values 524288 and 393216 for ICMP_RCVBUF_SIZE. For both values
relayd tells me:


relayd_icmp_patch: icmp_setup: setsockopt: No buffer space available

And then it exits.
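
The error above is what setsockopt reports when the requested receive buffer is larger than the kernel is willing to grant. The following is a minimal sketch, in Python rather than relayd's C, of probing which SO_RCVBUF size a system accepts before committing to one. It assumes that an ordinary UDP socket is subject to the same limit as relayd's raw ICMP socket, which is not confirmed in this thread; the function name largest_rcvbuf is hypothetical. Note that some systems (Linux, for example) silently clamp an oversized request instead of failing, so there the first candidate is always "accepted".

```python
import errno
import socket


def largest_rcvbuf(candidates):
    """Return the first size in candidates accepted for SO_RCVBUF, or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for size in candidates:
            try:
                sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
                return size
            except OSError as exc:
                # ENOBUFS is "No buffer space available"; try the next size.
                if exc.errno != errno.ENOBUFS:
                    raise
        return None
    finally:
        sock.close()


if __name__ == "__main__":
    # The two values tried above, followed by the current ICMP_RCVBUF_SIZE.
    print(largest_rcvbuf([524288, 393216, 262144]))
```

Candidates are tried in the order given, so listing them from largest to smallest yields the biggest buffer the system grants without an error.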



Re: relayd's icmp check only works for a small number of hosts

2016-08-31 Thread Sebastian Benoit
Remi Locherer(remi.loche...@relo.ch) on 2016.08.19 16:31:10 +0200:
> >Synopsis:   relayd's icmp check only works for a small number of hosts
> >Category:   relayd
> >Environment:
>   System  : OpenBSD 5.9
>   Details : OpenBSD 5.9 (GENERIC.MP) #10: Wed Aug  3 13:46:07 CEST 2016
>     r...@stable-59-amd64.mtier.org:/binpatchng/work-binpatch59-amd64/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> 
> >Description:
>   relayd says 70 out of 104 hosts are not reachable via icmp. But ping on
> the same host where relayd runs can reach all hosts with a rtt below 1ms.
> 
> In the logs I see "210ms,icmp read timeout". But in relayd.conf a timeout
> of 1000 is set.
> 
> Could this be related to the problem mentioned in the commit message of
> src/usr.sbin/relayd/check_icmp.c rev 1.41?

i think you mean 1.40?

try to increase

usr.sbin/relayd/relayd.h:93:#define ICMP_RCVBUF_SIZE 262144

and see if you can have more checks then.

> The latest errata patches for 5.9 are applied via the mtier pkgs.
> 
> 
> relayd.conf:
> # Global Options
> interval 10
> timeout 1000
> log updates
> 
> # Tables
> table  {
>   192.168.63.32
> }
> table  {
>   192.168.63.33
> }
> table  {
>   192.168.63.32
> }
> table  {
>   192.168.63.33
> }
> table  {
>   192.168.63.35
> }
> table  {
>   192.168.63.36
> }
> table  {
>   192.168.63.35
> }
> table  {
>   192.168.63.36
> }
> table  {
>   192.168.63.38
> }
> table  {
>   192.168.63.39
> }
> table  {
>   192.168.63.38
> }
> table  {
>   192.168.63.39
> }
> table  {
>   192.168.63.41
> }
> table  {
>   192.168.63.42
> }
> table  {
>   192.168.63.41
> }
> table  {
>   192.168.63.42
> }
> table  {
>   192.168.63.44
> }
> table  {
>   192.168.63.45
> }
> table  {
>   192.168.63.44
> }
> table  {
>   192.168.63.45
> }
> table  {
>   192.168.63.47
> }
> table  {
>   192.168.63.48
> }
> table  {
>   192.168.63.47
> }
> table  {
>   192.168.63.48
> }
> table  {
>   192.168.63.50
> }
> table  {
>   192.168.63.51
> }
> table  {
>   192.168.63.50
> }
> table  {
>   192.168.63.51
> }
> table  {
>   192.168.63.84
> }
> table  {
>   192.168.63.85
> }
> table  {
>   192.168.63.84
> }
> table  {
>   192.168.63.85
> }
> table  {
>   192.168.63.84
> }
> table  {
>   192.168.63.85
> }
> table  {
>   192.168.63.84
> }
> table  {
>   192.168.63.85
> }
> table  {
>   192.168.63.104
> }
> table  {
>   192.168.63.105
> }
> table  {
>   192.168.63.104
> }
> table  {
>   192.168.63.105
> }
> table  {
>   192.168.63.124
> }
> table  {
>   192.168.63.125
> }
> table  {
>   192.168.63.124
> }
> table  {
>   192.168.63.125
> }
> table  {
>   192.168.63.124
> }
> table  {
>   192.168.63.125
> }
> table  {
>   192.168.63.124
> }
> table  {
>   192.168.63.125
> }
> table  {
>   192.168.63.114
> }
> table  {
>   192.168.63.115
> }
> table  {
>   192.168.63.114
> }
> table  {
>   192.168.63.115
> }
> table  {
>   192.168.63.114
> }
> table  {
>   192.168.63.115
> }
> table  {
>   192.168.63.114
> }
> table  {
>   192.168.63.115
> }
> table  {
>   192.168.63.132
> }
> table  {
>   192.168.63.133
> }
> table  {
>   192.168.63.132
> }
> table  {
>   192.168.63.133
> }
> table  {
>   192.168.63.135
> }
> table  {
>   192.168.63.136
> }
> table  {
>   192.168.63.135
> }
> table  {
>   192.168.63.136
> }
> table  {
>   192.168.63.138
> }
> table  {
>   192.168.63.139
> }
> table  {
>   192.168.63.138
> }
> table  {
>   192.168.63.139
> }
> table  {
>   192.168.63.141
> }
> table  {
>   192.168.63.142
> }
> table  {
>   192.168.63.141
> }
> table  {
>   192.168.63.142
> }
> table  {
>   192.168.63.184
> }
> table  {
>   192.168.63.185
> }
> table  {
>   192.168.63.184
> }
> table  {
>   192.168.63.185
> }
> table  {
>   192.168.63.184
> }
> table  {
>   192.168.63.185
> }
> table  {
>   

relayd's icmp check only works for a small number of hosts

2016-08-19 Thread Remi Locherer
>Synopsis:      relayd's icmp check only works for a small number of hosts
>Category:  relayd
>Environment:
System  : OpenBSD 5.9
Details : OpenBSD 5.9 (GENERIC.MP) #10: Wed Aug  3 13:46:07 CEST 2016
 r...@stable-59-amd64.mtier.org:/binpatchng/work-binpatch59-amd64/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64

>Description:
relayd says 70 out of 104 hosts are not reachable via icmp. But ping on
the same host where relayd runs can reach all hosts with a rtt below 1ms.

In the logs I see "210ms,icmp read timeout". But in relayd.conf a timeout
of 1000 is set.

Could this be related to the problem mentioned in the commit message of
src/usr.sbin/relayd/check_icmp.c rev 1.41?

The latest errata patches for 5.9 are applied via the mtier pkgs.


relayd.conf:
# Global Options
interval 10
timeout 1000
log updates

# Tables
table  {
192.168.63.32
}
table  {
192.168.63.33
}
table  {
192.168.63.32
}
table  {
192.168.63.33
}
table  {
192.168.63.35
}
table  {
192.168.63.36
}
table  {
192.168.63.35
}
table  {
192.168.63.36
}
table  {
192.168.63.38
}
table  {
192.168.63.39
}
table  {
192.168.63.38
}
table  {
192.168.63.39
}
table  {
192.168.63.41
}
table  {
192.168.63.42
}
table  {
192.168.63.41
}
table  {
192.168.63.42
}
table  {
192.168.63.44
}
table  {
192.168.63.45
}
table  {
192.168.63.44
}
table  {
192.168.63.45
}
table  {
192.168.63.47
}
table  {
192.168.63.48
}
table  {
192.168.63.47
}
table  {
192.168.63.48
}
table  {
192.168.63.50
}
table  {
192.168.63.51
}
table  {
192.168.63.50
}
table  {
192.168.63.51
}
table  {
192.168.63.84
}
table  {
192.168.63.85
}
table  {
192.168.63.84
}
table  {
192.168.63.85
}
table  {
192.168.63.84
}
table  {
192.168.63.85
}
table  {
192.168.63.84
}
table  {
192.168.63.85
}
table  {
192.168.63.104
}
table  {
192.168.63.105
}
table  {
192.168.63.104
}
table  {
192.168.63.105
}
table  {
192.168.63.124
}
table  {
192.168.63.125
}
table  {
192.168.63.124
}
table  {
192.168.63.125
}
table  {
192.168.63.124
}
table  {
192.168.63.125
}
table  {
192.168.63.124
}
table  {
192.168.63.125
}
table  {
192.168.63.114
}
table  {
192.168.63.115
}
table  {
192.168.63.114
}
table  {
192.168.63.115
}
table  {
192.168.63.114
}
table  {
192.168.63.115
}
table  {
192.168.63.114
}
table  {
192.168.63.115
}
table  {
192.168.63.132
}
table  {
192.168.63.133
}
table  {
192.168.63.132
}
table  {
192.168.63.133
}
table  {
192.168.63.135
}
table  {
192.168.63.136
}
table  {
192.168.63.135
}
table  {
192.168.63.136
}
table  {
192.168.63.138
}
table  {
192.168.63.139
}
table  {
192.168.63.138
}
table  {
192.168.63.139
}
table  {
192.168.63.141
}
table  {
192.168.63.142
}
table  {
192.168.63.141
}
table  {
192.168.63.142
}
table  {
192.168.63.184
}
table  {
192.168.63.185
}
table  {
192.168.63.184
}
table  {
192.168.63.185
}
table  {
192.168.63.184
}
table  {
192.168.63.185
}
table  {
192.168.63.184
}
table  {
192.168.63.185
}
table  {
192.168.63.184
}
table  {
192.168.63.185
}
table  {
192.168.63.224
}
table  {
192.168.63.225
}
table  {
192.168.63.224
}
table  {
192.168.63.225
}
table  {
192.168.63.224
}
table  {
192.168.63.225
}
table  {
192.168.63.224
}
table  {
192.168.63.225
}
table  {
192.168.63.224
}
table  {
192.168.63.225
}
table  {
192.168.63.204
}
table  {
192.168.63.205
}
table  {
192.168.63.204
}
table  {
192.168.63.205
}
table  {
192.168.63.214
}
table  {
192.168.63.215
}
table  {
192.168.63.214
}
table  {
192.168.63.215
}
table  {
192.168.63.214
}
table  {
192.168.63.215
}
table  {
192.168.63.214
}
table  {
192.168.63.215
}

# Redirects
redirect test_acc_a_443 {
listen on 192.168.62.2 tcp port 443
session timeout 600
forward to  check icmp
forward to  check icmp
match pftag test_acc_a
}
redirect test_acc_a_9727 {
listen on 192.168.62.2 tcp port 9727
session timeout 600
forward to  check icmp
forward to  check icmp
match pftag test_acc_a
}
redirect test_acc_b_443 {
listen on 192.168.62.3 tcp port 443
session timeout 600
forward to  ch