Re: -current cloner interfaces broken/gone/unusable

2018-06-21 Thread Greg Troxel

Takahiro Kambe  writes:

> Hi, this is a very late reply.
>
> In message <2532e215-e447-edaa-1f7b-5abf5205f...@netbsd.org>
>   on Tue, 24 Apr 2018 07:30:04 +0200,
>   Frank Kardel  wrote:
>> There are also 2 other observations with an 8.99.12 userland:
>> 
>> named has now trouble with interface scanning.
>> 
>> 2018-04-24T05:13:34.522295+00:00 gateway named 345 - - automatic
>> interface scanning terminated: not enough free resources
> I found it on a NetBSD 8.0_RC1 machine with both base named and
> pkgsrc/net/bind910.
>
> The change below seems to fix this problem, although 16384 is a
> large size chosen without an exact reason.  :-)

It was long ago, but I remember a code pattern around interface
lists that noticed when the results did not fit and retried with a
larger buffer.  So regardless of whether the size is changed, it would
be good, if possible, to at least log an error when there is no
grow-and-retry logic.
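
For illustration, the grow-and-retry pattern described above might look
like the sketch below. It is not the named code: fetch_with_retry(),
fill_fn and demo_fill() are hypothetical names, and demo_fill() merely
stands in for a real interface-list query that reports how much space it
needs. The point is only that a result that does not fit is either
retried with a larger buffer or logged, never silently dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical callback: fills buf (capacity cap) and stores the size the
 * full result needs in *needed. Returns 0 on success, or -1 with
 * errno = ENOMEM when the result did not fit.
 */
typedef int (*fill_fn)(void *buf, size_t cap, size_t *needed);

/* Stand-in for a real query: pretends the result is always 5000 bytes. */
static int
demo_fill(void *buf, size_t cap, size_t *needed)
{
	const size_t want = 5000;

	*needed = want;
	if (cap < want) {
		errno = ENOMEM;
		return -1;
	}
	memset(buf, 0, want);
	return 0;
}

/*
 * Call fill() with a growing buffer until the result fits or max_cap is
 * exceeded. Returns a malloc'ed buffer (length in *len) or NULL after
 * logging the reason.
 */
static void *
fetch_with_retry(fill_fn fill, size_t initial_cap, size_t max_cap, size_t *len)
{
	size_t cap = initial_cap;
	void *buf = NULL;

	assert(fill != NULL && len != NULL && initial_cap > 0);

	while (cap <= max_cap) {
		size_t needed = 0;
		void *nbuf = realloc(buf, cap);

		if (nbuf == NULL) {
			fprintf(stderr, "scan: out of memory at %zu bytes\n", cap);
			free(buf);
			return NULL;
		}
		buf = nbuf;
		if (fill(buf, cap, &needed) == 0) {
			*len = needed;
			return buf;
		}
		if (errno != ENOMEM) {
			fprintf(stderr, "scan: query failed: %s\n", strerror(errno));
			free(buf);
			return NULL;
		}
		/* Did not fit: grow to the reported need, or at least double. */
		cap = (needed > cap * 2) ? needed : cap * 2;
	}
	fprintf(stderr, "scan: result exceeds %zu bytes, giving up\n", max_cap);
	free(buf);
	return NULL;
}
```

With a 2048-byte starting size and a 16384-byte cap, the first attempt
fails, the buffer grows to the reported 5000 bytes, and the second
attempt succeeds.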




Re: -current cloner interfaces broken/gone/unusable

2018-06-21 Thread Takahiro Kambe
Hi, this is a very late reply.

In message <2532e215-e447-edaa-1f7b-5abf5205f...@netbsd.org>
on Tue, 24 Apr 2018 07:30:04 +0200,
Frank Kardel  wrote:
> There are also 2 other observations with an 8.99.12 userland:
> 
> named has now trouble with interface scanning.
> 
> 2018-04-24T05:13:34.522295+00:00 gateway named 345 - - automatic
> interface scanning terminated: not enough free resources
I found it on a NetBSD 8.0_RC1 machine with both base named and
pkgsrc/net/bind910.

The change below seems to fix this problem, although 16384 is a
large size chosen without an exact reason.  :-)

-- 
Takahiro Kambe


--- bin/named/interfacemgr.c.orig   2018-03-08 20:55:52.0 +
+++ bin/named/interfacemgr.c
@@ -84,7 +84,7 @@ struct ns_interfacemgr {
 #ifdef USE_ROUTE_SOCKET
 	isc_task_t *	task;
 	isc_socket_t *	route;
-	unsigned char	buf[2048];
+	unsigned char	buf[16384];
 #endif
 };
 



Re: -current cloner interfaces broken/gone/unusable

2018-04-26 Thread Roy Marples

On 23/04/2018 23:34, Robert Swindells wrote:


Frank Kardel  wrote:

using -current as of 20180421 (NetBSD 8.99.14 (GENERIC) #0: Sat Apr 21
23:01:29 UTC 2018
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64)

no cloning interfaces are visible:

gateway# ifconfig -l
ixg0 ixg1 ixg2 ixg3 lo0 tun0 tun1
gateway# ifconfig -C
ifconfig: SIOCIFGCLONERS for count: Device not configured
gateway# ifconfig vlan0 create
ifconfig: clone_command: Device not configured
ifconfig: exec_matches: Device not configured
gateway#

This does not seem to be a desirable state - any clue what broke here?


It looks to be the test for a valid interface name in
sys/compat/common/uipc_syscalls_50.c that is causing this. I think it
should only be done when the ioctl command is SIOCGIFDATA or SIOCZIFDATA.

This works for me but is a bit ugly:

Index: uipc_syscalls_50.c
===
RCS file: /cvsroot/src/sys/compat/common/uipc_syscalls_50.c,v
retrieving revision 1.4
diff -u -r1.4 uipc_syscalls_50.c
--- uipc_syscalls_50.c  12 Apr 2018 18:50:13 -  1.4
+++ uipc_syscalls_50.c  23 Apr 2018 22:33:14 -
@@ -63,9 +63,17 @@
 	struct ifnet *ifp;
 	int error;
 
-	ifp = ifunit(ifdr->ifdr_name);
-	if (ifp == NULL)
-		return ENXIO;
+	switch (cmd) {
+	case SIOCGIFDATA:
+	case SIOCZIFDATA:
+		ifp = ifunit(ifdr->ifdr_name);
+		if (ifp == NULL)
+			return ENXIO;
+		break;
+	default:
+		ifp = NULL;
+		break;
+	}
 
 	switch (cmd) {
 	case SIOCGIFDATA:



Committed, thanks

Roy


Re: -current cloner interfaces broken/gone/unusable

2018-04-24 Thread Frank Kardel

It is not only a boot time issue - I also see it during normal operation:

2018-04-24T10:30:10.466723+00:00 gateway blacklistd 611 - - bl_recv: recvmsg failed (No buffer space available)
2018-04-24T10:30:10.466821+00:00 gateway blacklistd 611 - - no message (No buffer space available)
2018-04-24T10:56:47.223562+00:00 gateway sshd 13053 - - error: maximum authentication attempts exceeded for invalid user root from 106.113.147.190 port 63303 ssh2 [preauth]
2018-04-24T11:15:09.240247+00:00 gateway blacklistd 611 - - bl_recv: recvmsg failed (No buffer space available)
2018-04-24T11:15:09.240791+00:00 gateway blacklistd 611 - - no message (No buffer space available)


I don't expect major resource usage for blacklistd though.

Also named does not seem to be too happy and ceases interface scanning. 
This does not yet give a warm fuzzy feeling :-) && :-(


Frank

On 04/24/18 09:56, Roy Marples wrote:

On 24/04/2018 08:26, Martin Husemann wrote:

On Tue, Apr 24, 2018 at 07:30:04AM +0200, Frank Kardel wrote:

syslogd sometimes has issues with /var/run/log
2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() unix `/var/run/log': No buffer space available


This is a separate change and unrelated to compatibility. It happens
with up to date binaries as well. I think it was a silent bug before
and has now been made more verbose. Still pretty annoying and happens
for me on various machines on every boot. Roy, did you have a chance to
look at it?


Not yet no. But yes, in all releases prior it was a silent bug on all 
types of socket and in all the BSDs as well. I know, I checked - only 
OpenBSD has an overflow check like this and they solve that with a 
magic message on route(4) only which is just yuck as it makes the 
problem worse.


I only have one machine where I can reliably repro this, my erlite, and 
it only happens there because route(4) overflows (detected in dhcpcd): 
it's a router, the box isn't up yet, and a load of address validation 
flows over the socket when the link comes up. This is a good thing, 
because dhcpcd can then react to the error and sync its state using 
getifaddrs().


I think the easiest fix is to increase the default size of the socket 
buffer. Where this is done, I don't know but could find out if pushed.

This would fix everything if the default buffer was big enough.

Saying this, from what I'm hearing this only happens at boot time, so 
we could potentially shrink the buffer back down again if we need to 
consider dynamically growing it in the kernel as well. No idea if 
that's even possible or what performance impact it would have.


The last option is to increase the socket buffer size in all affected 
applications using ioctl (or is it setsockopt?). But to what value I 
don't know. Trial and error?


Roy




Re: -current cloner interfaces broken/gone/unusable

2018-04-24 Thread Roy Marples

Hi Tom

On 24/04/2018 12:39, Tom Ivar Helbekkmo wrote:

Thomas Klausner  writes:


On Tue, Apr 24, 2018 at 08:56:48AM +0100, Roy Marples wrote:

Saying this, from what I'm hearing this only happens at boot time, so we
could potentially shrink the buffer back down again if we need to consider
dynamically growing it in the kernel as well. No idea if that's even
possible or what performance impact it would have.


I had an application report a UDP error with "no buffer space
available". I don't remember the exact error, sorry. But it was
definitely some time after system start.
  Thomas


I keep getting those, and have been for a long, long time:

Apr 24 02:44:27 barsoom openvpn[301]: write UDPv4: No buffer space available (code=55)
Apr 24 05:54:47 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 08:53:08 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 08:53:09 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 10:15:09 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 10:45:14 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 11:35:18 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 13:15:12 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)


This is unrelated to the issue at hand.

That's an upstream issue - the send and write family calls have been 
returning ENOBUFS for quite a while on all OSes I know of.


Roy


Re: -current cloner interfaces broken/gone/unusable

2018-04-24 Thread Tom Ivar Helbekkmo
Thomas Klausner  writes:

> On Tue, Apr 24, 2018 at 08:56:48AM +0100, Roy Marples wrote:
>> Saying this, from what I'm hearing this only happens at boot time, so we
>> could potentially shrink the buffer back down again if we need to consider
>> dynamically growing it in the kernel as well. No idea if that's even
>> possible or what performance impact it would have.
>
> I had an application report a UDP error with "no buffer space
> available". I don't remember the exact error, sorry. But it was
> definitely some time after system start.
>  Thomas

I keep getting those, and have been for a long, long time:

Apr 24 02:44:27 barsoom openvpn[301]: write UDPv4: No buffer space available (code=55)
Apr 24 05:54:47 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 08:53:08 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 08:53:09 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 10:15:09 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 10:45:14 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 11:35:18 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 13:15:12 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay




Re: -current cloner interfaces broken/gone/unusable

2018-04-24 Thread Thomas Klausner
On Tue, Apr 24, 2018 at 08:56:48AM +0100, Roy Marples wrote:
> Saying this, from what I'm hearing this only happens at boot time, so we
> could potentially shrink the buffer back down again if we need to consider
> dynamically growing it in the kernel as well. No idea if that's even
> possible or what performance impact it would have.

I had an application report a UDP error with "no buffer space
available". I don't remember the exact error, sorry. But it was
definitely some time after system start.
 Thomas


Re: -current cloner interfaces broken/gone/unusable

2018-04-24 Thread Roy Marples

On 24/04/2018 08:26, Martin Husemann wrote:

On Tue, Apr 24, 2018 at 07:30:04AM +0200, Frank Kardel wrote:

syslogd sometimes has issues with /var/run/log
2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() unix
`/var/run/log': No buffer space available


This is a separate change and unrelated to compatibility. It happens
with up to date binaries as well. I think it was a silent bug before
and has now been made more verbose. Still pretty annoying and happens
for me on various machines on every boot. Roy, did you have a chance to
look at it?


Not yet no. But yes, in all releases prior it was a silent bug on all 
types of socket and in all the BSDs as well. I know, I checked - only 
OpenBSD has an overflow check like this and they solve that with a magic 
message on route(4) only which is just yuck as it makes the problem worse.


I only have one machine where I can reliably repro this, my erlite, and 
it only happens there because route(4) overflows (detected in dhcpcd): 
it's a router, the box isn't up yet, and a load of address validation 
flows over the socket when the link comes up. This is a good thing, 
because dhcpcd can then react to the error and sync its state using 
getifaddrs().
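
As a rough sketch of that recovery path (not dhcpcd's actual code), a
reader of such a socket could treat ENOBUFS as "messages were lost" and
rebuild its view from getifaddrs(). The names read_or_resync() and
resync_addresses() are made up for this example, and it assumes the
kernel reports the overflow as ENOBUFS on the receive call, as described
above.

```c
#include <sys/types.h>
#include <sys/socket.h>

#include <assert.h>
#include <errno.h>
#include <ifaddrs.h>
#include <stdio.h>

/*
 * Re-read the full address list from the kernel. Returns the number of
 * entries seen, or -1 on failure. A real daemon would rebuild its
 * internal state inside the loop instead of just counting.
 */
static int
resync_addresses(void)
{
	struct ifaddrs *ifap, *ifa;
	int n = 0;

	if (getifaddrs(&ifap) == -1)
		return -1;
	for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next)
		n++;
	freeifaddrs(ifap);
	return n;
}

/*
 * Read one message from fd. If the receive fails with ENOBUFS, some
 * messages were dropped, so resync from scratch and set *resynced to 1.
 * Returns bytes read, 0 after a successful resync, or -1 on error.
 */
static ssize_t
read_or_resync(int fd, void *buf, size_t len, int *resynced)
{
	ssize_t n;

	assert(buf != NULL && resynced != NULL);
	*resynced = 0;

	n = recv(fd, buf, len, 0);
	if (n == -1 && errno == ENOBUFS) {
		fprintf(stderr, "socket overflowed, resyncing via getifaddrs()\n");
		if (resync_addresses() == -1)
			return -1;
		*resynced = 1;
		return 0;
	}
	return n;
}
```

The design point is that the error is useful: without it the reader
would not know its state had gone stale.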


I think the easiest fix is to increase the default size of the socket 
buffer. Where this is done, I don't know but could find out if pushed.

This would fix everything if the default buffer was big enough.

Saying this, from what I'm hearing this only happens at boot time, so we 
could potentially shrink the buffer back down again if we need to 
consider dynamically growing it in the kernel as well. No idea if that's 
even possible or what performance impact it would have.


The last option is to increase the socket buffer size in all affected 
applications using ioctl (or is it setsockopt?). But to what value I 
don't know. Trial and error?
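
For reference, the per-socket knob is setsockopt() with SO_RCVBUF rather
than an ioctl. A minimal sketch follows; grow_rcvbuf() is a hypothetical
helper name, and the size to request remains a guess, since the kernel
may clamp or round whatever is asked for.

```c
#include <sys/types.h>
#include <sys/socket.h>

#include <assert.h>
#include <stdio.h>

/*
 * Ask the kernel for a receive buffer of `want` bytes on fd, then read
 * back what was actually granted. Returns the granted size, or -1 if
 * either socket option call fails.
 */
static int
grow_rcvbuf(int fd, int want)
{
	int got = 0;
	socklen_t len = sizeof(got);

	assert(want > 0);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) == -1)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) == -1)
		return -1;
	if (got < want)
		fprintf(stderr, "SO_RCVBUF: asked for %d, granted %d\n",
		    want, got);
	return got;
}
```

A caller might try grow_rcvbuf(fd, 64 * 1024) right after creating the
socket; that figure is arbitrary and would still need the trial and
error mentioned above.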


Roy


Re: -current cloner interfaces broken/gone/unusable

2018-04-24 Thread Martin Husemann
On Tue, Apr 24, 2018 at 07:30:04AM +0200, Frank Kardel wrote:
> syslogd sometimes has issues with /var/run/log
> 2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() unix
> `/var/run/log': No buffer space available

This is a separate change and unrelated to compatibility. It happens
with up to date binaries as well. I think it was a silent bug before
and has now been made more verbose. Still pretty annoying and happens
for me on various machines on every boot. Roy, did you have a chance to
look at it?

Martin


Re: -current cloner interfaces broken/gone/unusable

2018-04-23 Thread Frank Kardel

Hi Robert !

That made it work again. I share your view on relative beauty here.

There are also 2 other observations with an 8.99.12 userland:

named has now trouble with interface scanning.

2018-04-24T05:13:34.522295+00:00 gateway named 345 - - automatic 
interface scanning terminated: not enough free resources


syslogd sometimes has issues with /var/run/log
2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() unix 
`/var/run/log': No buffer space available


Looks like there may be more issues with the (compatibility) code.

Thanks for the (preliminary) fix.

Frank

On 04/24/18 00:34, Robert Swindells wrote:

Frank Kardel  wrote:

using -current as of 20180421 (NetBSD 8.99.14 (GENERIC) #0: Sat Apr 21
23:01:29 UTC 2018
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64)

no cloning interfaces are visible:

gateway# ifconfig -l
ixg0 ixg1 ixg2 ixg3 lo0 tun0 tun1
gateway# ifconfig -C
ifconfig: SIOCIFGCLONERS for count: Device not configured
gateway# ifconfig vlan0 create
ifconfig: clone_command: Device not configured
ifconfig: exec_matches: Device not configured
gateway#

This does not seem to be a desirable state - any clue what broke here?

It looks to be the test for a valid interface name in
sys/compat/common/uipc_syscalls_50.c that is causing this. I think it
should only be done when the ioctl command is SIOCGIFDATA or SIOCZIFDATA.

This works for me but is a bit ugly:

Index: uipc_syscalls_50.c
===
RCS file: /cvsroot/src/sys/compat/common/uipc_syscalls_50.c,v
retrieving revision 1.4
diff -u -r1.4 uipc_syscalls_50.c
--- uipc_syscalls_50.c  12 Apr 2018 18:50:13 -  1.4
+++ uipc_syscalls_50.c  23 Apr 2018 22:33:14 -
@@ -63,9 +63,17 @@
 	struct ifnet *ifp;
 	int error;
 
-	ifp = ifunit(ifdr->ifdr_name);
-	if (ifp == NULL)
-		return ENXIO;
+	switch (cmd) {
+	case SIOCGIFDATA:
+	case SIOCZIFDATA:
+		ifp = ifunit(ifdr->ifdr_name);
+		if (ifp == NULL)
+			return ENXIO;
+		break;
+	default:
+		ifp = NULL;
+		break;
+	}
 
 	switch (cmd) {
 	case SIOCGIFDATA:





Re: -current cloner interfaces broken/gone/unusable

2018-04-23 Thread Robert Swindells

Frank Kardel  wrote:
>using -current as of 20180421 (NetBSD 8.99.14 (GENERIC) #0: Sat Apr 21 
>23:01:29 UTC 2018 
>mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64)
>
>no cloning interfaces are visible:
>
>gateway# ifconfig -l
>ixg0 ixg1 ixg2 ixg3 lo0 tun0 tun1
>gateway# ifconfig -C
>ifconfig: SIOCIFGCLONERS for count: Device not configured
>gateway# ifconfig vlan0 create
>ifconfig: clone_command: Device not configured
>ifconfig: exec_matches: Device not configured
>gateway#
>
>This does not seem to be a desirable state - any clue what broke here?

It looks to be the test for a valid interface name in
sys/compat/common/uipc_syscalls_50.c that is causing this. I think it
should only be done when the ioctl command is SIOCGIFDATA or SIOCZIFDATA.

This works for me but is a bit ugly:

Index: uipc_syscalls_50.c
===
RCS file: /cvsroot/src/sys/compat/common/uipc_syscalls_50.c,v
retrieving revision 1.4
diff -u -r1.4 uipc_syscalls_50.c
--- uipc_syscalls_50.c  12 Apr 2018 18:50:13 -  1.4
+++ uipc_syscalls_50.c  23 Apr 2018 22:33:14 -
@@ -63,9 +63,17 @@
 	struct ifnet *ifp;
 	int error;
 
-	ifp = ifunit(ifdr->ifdr_name);
-	if (ifp == NULL)
-		return ENXIO;
+	switch (cmd) {
+	case SIOCGIFDATA:
+	case SIOCZIFDATA:
+		ifp = ifunit(ifdr->ifdr_name);
+		if (ifp == NULL)
+			return ENXIO;
+		break;
+	default:
+		ifp = NULL;
+		break;
+	}
 
 	switch (cmd) {
 	case SIOCGIFDATA:



-current cloner interfaces broken/gone/unusable

2018-04-23 Thread Frank Kardel

Hi,

using -current as of 20180421 (NetBSD 8.99.14 (GENERIC) #0: Sat Apr 21 
23:01:29 UTC 2018 
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64)


no cloning interfaces are visible:

gateway# ifconfig -l
ixg0 ixg1 ixg2 ixg3 lo0 tun0 tun1
gateway# ifconfig -C
ifconfig: SIOCIFGCLONERS for count: Device not configured
gateway# ifconfig vlan0 create
ifconfig: clone_command: Device not configured
ifconfig: exec_matches: Device not configured
gateway#

This does not seem to be a desirable state - any clue what broke here?

Frank