Re: -current cloner interfaces broken/gone/unusable
Takahiro Kambe writes:
> Hi, this is a very late reply.
>
> In message <2532e215-e447-edaa-1f7b-5abf5205f...@netbsd.org>
> on Tue, 24 Apr 2018 07:30:04 +0200,
> Frank Kardel wrote:
>> There are also 2 other observations with an 8.99.12 userland:
>>
>> named now has trouble with interface scanning.
>>
>> 2018-04-24T05:13:34.522295+00:00 gateway named 345 - - automatic
>> interface scanning terminated: not enough free resources
> I found it on a NetBSD 8.0_RC1 machine with both the base named and
> pkgsrc/net/bind910.
>
> The change below seems to fix this problem, although 16384 is a rather
> large size chosen without any exact reason. :-)

It was long ago, but I remember a code pattern around interface lists
that noticed when the results did not fit and retried with a larger
buffer. So regardless of whether the size can be changed, it would be
good to at least log an error if there is no grow-and-retry logic.
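The grow-and-retry pattern described above can be sketched in C as follows. This is a minimal, hypothetical sketch, not BIND's or NetBSD's code: `fill_fn` and `fetch_growing` are invented names, and the producer is assumed to report the full size it needs, the way snprintf(3) does. The key part is the last branch: when the result still does not fit at the size limit, it logs an error and fails with ENOBUFS instead of silently truncating.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical producer callback: writes at most cap bytes into buf and
 * returns how many bytes the complete result needs (like snprintf).
 */
typedef size_t (*fill_fn)(void *buf, size_t cap, void *arg);

/*
 * Grow-and-retry: start with 'initial' bytes and double the buffer until
 * the result fits.  Instead of silently truncating, log an error and fail
 * with ENOBUFS once the 'max' limit has been tried and is still too small.
 */
static void *fetch_growing(fill_fn fill, void *arg, size_t initial,
    size_t max, size_t *lenp)
{
	size_t cap = initial ? initial : 1;	/* avoid a zero-size loop */

	for (;;) {
		void *buf = malloc(cap);
		if (buf == NULL)
			return NULL;	/* errno set by malloc (ENOMEM) */

		size_t need = fill(buf, cap, arg);
		if (need <= cap) {
			*lenp = need;
			return buf;
		}
		free(buf);

		if (cap >= max) {
			fprintf(stderr, "fetch_growing: result needs %zu bytes, "
			    "limit is %zu\n", need, max);
			errno = ENOBUFS;
			return NULL;
		}
		/* Double, but never step past the limit. */
		cap = (cap > max / 2) ? max : cap * 2;
	}
}
```

A caller passes a small initial size and a hard upper limit; the limit keeps a runaway producer from exhausting memory, and the log line makes the failure visible where a fixed-size buffer would fail quietly.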
Re: -current cloner interfaces broken/gone/unusable
Hi, this is a very late reply.

In message <2532e215-e447-edaa-1f7b-5abf5205f...@netbsd.org>
on Tue, 24 Apr 2018 07:30:04 +0200,
Frank Kardel wrote:
> There are also 2 other observations with an 8.99.12 userland:
>
> named now has trouble with interface scanning.
>
> 2018-04-24T05:13:34.522295+00:00 gateway named 345 - - automatic
> interface scanning terminated: not enough free resources

I found it on a NetBSD 8.0_RC1 machine with both the base named and
pkgsrc/net/bind910.

The change below seems to fix this problem, although 16384 is a rather
large size chosen without any exact reason. :-)

--
Takahiro Kambe

--- bin/named/interfacemgr.c.orig	2018-03-08 20:55:52.0 +
+++ bin/named/interfacemgr.c
@@ -84,7 +84,7 @@ struct ns_interfacemgr {
 #ifdef USE_ROUTE_SOCKET
 	isc_task_t *task;
 	isc_socket_t * route;
-	unsigned char buf[2048];
+	unsigned char buf[16384];
 #endif
 };
Re: -current cloner interfaces broken/gone/unusable
On 23/04/2018 23:34, Robert Swindells wrote:
> It looks to be the test for a valid interface name in
> sys/compat/common/uipc_syscalls_50.c that is causing this; I think it
> should only be done when the ioctl command is SIOCGIFDATA or SIOCZIFDATA.

Committed, thanks.

Roy
Re: -current cloner interfaces broken/gone/unusable
It is not only a boot time issue; I also see this during normal operation:

2018-04-24T10:30:10.466723+00:00 gateway blacklistd 611 - - bl_recv: recvmsg failed (No buffer space available)
2018-04-24T10:30:10.466821+00:00 gateway blacklistd 611 - - no message (No buffer space available)
2018-04-24T10:56:47.223562+00:00 gateway sshd 13053 - - error: maximum authentication attempts exceeded for invalid user root from 106.113.147.190 port 63303 ssh2 [preauth]
2018-04-24T11:15:09.240247+00:00 gateway blacklistd 611 - - bl_recv: recvmsg failed (No buffer space available)
2018-04-24T11:15:09.240791+00:00 gateway blacklistd 611 - - no message (No buffer space available)

I don't expect major resource usage for blacklistd, though.

Also, named does not seem too happy and stops interface scanning.

This does not yet give a warm fuzzy feeling :-) && :-(

Frank

On 04/24/18 09:56, Roy Marples wrote:
> Saying this, from what I'm hearing this only happens at boot time, so we
> could potentially shrink the buffer back down again if we need to consider
> dynamically growing it in the kernel as well.
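blacklistd's bl_recv in the log above fails with ENOBUFS from recvmsg. Following Roy's explanation in this thread (an overflowing receive buffer is now reported instead of messages being dropped silently, and dhcpcd reacts by resyncing its state with getifaddrs()), a receive path can treat ENOBUFS as "some messages were lost" rather than as a fatal socket error. This is a minimal sketch, assuming the error surfaces from the receive call as in these logs; `read_event`, `resync_interfaces` and `READ_RESYNCED` are hypothetical names, and the resync here only counts interface addresses where a real daemon would rebuild its tables.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <ifaddrs.h>
#include <stdio.h>

#define READ_RESYNCED (-2)	/* hypothetical: messages lost, state rebuilt */

/*
 * Rebuild interface state from scratch after an overflow.  A real daemon
 * would refresh its tables here; this sketch only counts the entries.
 */
static int resync_interfaces(void)
{
	struct ifaddrs *ifap, *ifa;
	int n = 0;

	if (getifaddrs(&ifap) == -1)
		return -1;
	for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next)
		n++;
	freeifaddrs(ifap);
	return n;
}

/*
 * Read one message.  ENOBUFS means the receive buffer overflowed and some
 * messages were dropped: report it and resync instead of treating the
 * socket as broken.  Returns the byte count, READ_RESYNCED, or -1.
 */
static ssize_t read_event(int fd, void *buf, size_t len)
{
	ssize_t n = recv(fd, buf, len, 0);

	if (n >= 0)
		return n;
	if (errno == ENOBUFS) {
		fprintf(stderr, "read_event: receive buffer overflowed, "
		    "resyncing\n");
		return resync_interfaces() == -1 ? -1 : READ_RESYNCED;
	}
	return -1;
}
```

The design choice is that the overflow is logged once and then repaired from an authoritative source, so the daemon keeps running with correct state instead of exiting or acting on a stale view.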
Re: -current cloner interfaces broken/gone/unusable
Hi Tom

On 24/04/2018 12:39, Tom Ivar Helbekkmo wrote:
> I keep getting those, and have been for a long, long time:
>
> Apr 24 02:44:27 barsoom openvpn[301]: write UDPv4: No buffer space available (code=55)

This is unrelated to the issue at hand. That's an upstream issue: the send
and write family of calls have been returning ENOBUFS for quite a while on
every OS I know of.

Roy
Re: -current cloner interfaces broken/gone/unusable
Thomas Klausner writes:

> On Tue, Apr 24, 2018 at 08:56:48AM +0100, Roy Marples wrote:
>> Saying this, from what I'm hearing this only happens at boot time, so we
>> could potentially shrink the buffer back down again if we need to consider
>> dynamically growing it in the kernel as well. No idea if that's even
>> possible or what performance impact it would have.
>
> I had an application report a UDP error with "no buffer space
> available". I don't remember the exact error, sorry. But it was
> definitely some time after system start.
> Thomas

I keep getting those, and have been for a long, long time:

Apr 24 02:44:27 barsoom openvpn[301]: write UDPv4: No buffer space available (code=55)
Apr 24 05:54:47 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 08:53:08 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 08:53:09 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 10:15:09 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 10:45:14 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 11:35:18 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 13:15:12 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)

-tih
-- 
Most people who graduate with CS degrees don't understand the
significance of Lisp. Lisp is the most important idea in computer
science. --Alan Kay
Re: -current cloner interfaces broken/gone/unusable
On Tue, Apr 24, 2018 at 08:56:48AM +0100, Roy Marples wrote:
> Saying this, from what I'm hearing this only happens at boot time, so we
> could potentially shrink the buffer back down again if we need to consider
> dynamically growing it in the kernel as well. No idea if that's even
> possible or what performance impact it would have.

I had an application report a UDP error with "no buffer space
available". I don't remember the exact error, sorry. But it was
definitely some time after system start.

Thomas
Re: -current cloner interfaces broken/gone/unusable
On 24/04/2018 08:26, Martin Husemann wrote:
> On Tue, Apr 24, 2018 at 07:30:04AM +0200, Frank Kardel wrote:
>> syslogd sometimes has issues with /var/run/log
>> 2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() unix
>> `/var/run/log': No buffer space available
>
> This is a separate change and unrelated to compatibility. It happens
> with up to date binaries as well. I think it was a silent bug before and
> has now been made more verbose.
>
> Still pretty annoying and happens for me on various machines on every
> boot. Roy, did you have a chance to look at it?

Not yet, no.

But yes, in all prior releases it was a silent bug on all types of socket,
and in all the BSDs as well. I know, I checked. Only OpenBSD has an
overflow check like this, and they solve it with a magic message on
route(4) only, which is just yuck as it makes the problem worse.

I only have one machine where I can reliably reproduce this, my erlite, and
that only happens because route(4) overflows (detected in dhcpcd): it's a
router, the box isn't up yet, and a load of address validation messages
flow over the socket when the link comes up. This is a good thing, because
dhcpcd can then react to the error and sync its state using getifaddrs().

I think the easiest fix is to increase the default size of the socket
buffer. Where this is done, I don't know, but I could find out if pushed.
This would fix everything if the default buffer was big enough.

Saying this, from what I'm hearing this only happens at boot time, so we
could potentially shrink the buffer back down again if we need to consider
dynamically growing it in the kernel as well. No idea if that's even
possible or what performance impact it would have.

The last option is to increase the socket buffer size in all affected
applications using ioctl (or is it setsockopt?). But to what value, I
don't know. Trial and error?

Roy
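For the last option Roy mentions, the per-socket knob is setsockopt(2) with SO_RCVBUF at the SOL_SOCKET level, not an ioctl. A minimal sketch, assuming the application owns the socket (on NetBSD, for example, the PF_ROUTE socket a daemon like dhcpcd opens); `grow_rcvbuf` is a hypothetical helper. It reads the value back with getsockopt(2) because the kernel may clamp or adjust the request; Linux, for instance, reports twice the value that was set. The right size remains the open question from the message: this only shows the mechanism.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>

/*
 * Ask for a larger receive buffer on fd and return the size the kernel
 * actually granted (it may clamp or adjust the request), or -1 on error.
 */
static int grow_rcvbuf(int fd, int want)
{
	int got = 0;
	socklen_t len = sizeof(got);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) == -1) {
		perror("setsockopt(SO_RCVBUF)");
		return -1;
	}
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) == -1) {
		perror("getsockopt(SO_RCVBUF)");
		return -1;
	}
	return got;
}
```

Calling grow_rcvbuf(fd, 65536) right after socket(2) and comparing the result with the request shows whether the kernel honoured it, which makes the "trial and error" at least observable.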
Re: -current cloner interfaces broken/gone/unusable
On Tue, Apr 24, 2018 at 07:30:04AM +0200, Frank Kardel wrote:
> syslogd sometimes has issues with /var/run/log
> 2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() unix
> `/var/run/log': No buffer space available

This is a separate change and unrelated to compatibility. It happens with
up to date binaries as well. I think it was a silent bug before and has now
been made more verbose.

Still pretty annoying and happens for me on various machines on every boot.
Roy, did you have a chance to look at it?

Martin
Re: -current cloner interfaces broken/gone/unusable
Hi Robert!

That made it work again. I share your view on relative beauty here.

There are also 2 other observations with an 8.99.12 userland:

named now has trouble with interface scanning:

2018-04-24T05:13:34.522295+00:00 gateway named 345 - - automatic interface scanning terminated: not enough free resources

syslogd sometimes has issues with /var/run/log:

2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() unix `/var/run/log': No buffer space available

Looks like there may be more issues with (compatibility) code.

Thanks for the (preliminary) fix.

Frank

On 04/24/18 00:34, Robert Swindells wrote:
> It looks to be the test for a valid interface name in
> sys/compat/common/uipc_syscalls_50.c that is causing this; I think it
> should only be done when the ioctl command is SIOCGIFDATA or SIOCZIFDATA.
Re: -current cloner interfaces broken/gone/unusable
Frank Kardel wrote:
>using -current as of 20180421 (NetBSD 8.99.14 (GENERIC) #0: Sat Apr 21
>23:01:29 UTC 2018
>mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64)
>
>no cloning interfaces are visible:
>
>gateway# ifconfig -l
>ixg0 ixg1 ixg2 ixg3 lo0 tun0 tun1
>gateway# ifconfig -C
>ifconfig: SIOCIFGCLONERS for count: Device not configured
>gateway# ifconfig vlan0 create
>ifconfig: clone_command: Device not configured
>ifconfig: exec_matches: Device not configured
>gateway#
>
>This does not seem to be a desirable state. Any clues what broke here?

It looks to be the test for a valid interface name in
sys/compat/common/uipc_syscalls_50.c that is causing this; I think it
should only be done when the ioctl command is SIOCGIFDATA or SIOCZIFDATA.

This works for me but is a bit ugly:

Index: uipc_syscalls_50.c
===
RCS file: /cvsroot/src/sys/compat/common/uipc_syscalls_50.c,v
retrieving revision 1.4
diff -u -r1.4 uipc_syscalls_50.c
--- uipc_syscalls_50.c	12 Apr 2018 18:50:13 -	1.4
+++ uipc_syscalls_50.c	23 Apr 2018 22:33:14 -
@@ -63,9 +63,17 @@
 	struct ifnet *ifp;
 	int error;
 
-	ifp = ifunit(ifdr->ifdr_name);
-	if (ifp == NULL)
-		return ENXIO;
+	switch (cmd) {
+	case SIOCGIFDATA:
+	case SIOCZIFDATA:
+		ifp = ifunit(ifdr->ifdr_name);
+		if (ifp == NULL)
+			return ENXIO;
+		break;
+	default:
+		ifp = NULL;
+		break;
+	}
 
 	switch (cmd) {
 	case SIOCGIFDATA:
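To make Robert's diagnosis concrete, here is a small user-space model of the fixed logic. Everything in it is hypothetical: the command codes stand in for SIOCGIFDATA, SIOCZIFDATA and SIOCIFGCLONERS, and `if_exists` stands in for ifunit(). As the diagnosis suggests, when the name lookup ran for every command, a request such as listing cloners, which names no interface, failed with ENXIO, which ifconfig reports as "Device not configured".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command codes standing in for the real ioctl numbers. */
enum compat_cmd { CMD_GIFDATA, CMD_ZIFDATA, CMD_IFGCLONERS };

/* Hypothetical stand-in for the kernel's interface list and ifunit(). */
static const char *const attached[] = { "ixg0", "lo0", "tun0", NULL };

static int if_exists(const char *name)
{
	for (size_t i = 0; attached[i] != NULL; i++)
		if (strcmp(attached[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Validate the interface name only for the commands that address one
 * interface; other commands (such as listing cloners) carry no name and
 * must not fail with ENXIO.
 */
static int compat_check(enum compat_cmd cmd, const char *name)
{
	switch (cmd) {
	case CMD_GIFDATA:
	case CMD_ZIFDATA:
		return if_exists(name) ? 0 : ENXIO;
	default:
		return 0;
	}
}
```

With the check scoped this way, compat_check(CMD_IFGCLONERS, "") succeeds, while a per-interface request for a name that is not attached still returns ENXIO.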
-current cloner interfaces broken/gone/unusable
Hi,

using -current as of 20180421 (NetBSD 8.99.14 (GENERIC) #0: Sat Apr 21
23:01:29 UTC 2018
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64)
no cloning interfaces are visible:

gateway# ifconfig -l
ixg0 ixg1 ixg2 ixg3 lo0 tun0 tun1
gateway# ifconfig -C
ifconfig: SIOCIFGCLONERS for count: Device not configured
gateway# ifconfig vlan0 create
ifconfig: clone_command: Device not configured
ifconfig: exec_matches: Device not configured
gateway#

This does not seem to be a desirable state. Any clues what broke here?

Frank