Re: Build failure trying to lib includes-libepoxy
On Sun, 06 Oct 2024 00:24:22 +0100 Robert Swindells wrote ---
> Roy Marples <r...@marples.name> wrote:
> > --- includes-libepoxy ---
> > nbmake[7]: don't know how to make gl_generated.h. Stop
> > nbmake[7]: stopped making "includes" in
> > /home/roy/src/src/external/mit/xorg/lib/libepoxy
> > nbmake[6]: stopped making "includes" in
> > /home/roy/src/src/external/mit/xorg/lib
> >
> > Looking through the sources, gl_generated and friends are created by
> > meson which we don't use, so is the file just missing from the repo?
>
> The file is in xsrc/external/mit/libepoxy/src, the includes target just
> copies it to the right place in the target tree.

Actually the file wasn't there!
And that was the clue - a simple hg clone of xsrc checked out xorg and not trunk.
Is there a way to fix a default branch on clone on anonhg so this doesn't happen to others?

> Are you using the -j flag to make to build in parallel?

Yes, lots. 24 jobs make the build go a bit quicker.

Roy
Build failure trying to lib includes-libepoxy
--- includes-libepoxy ---
nbmake[7]: don't know how to make gl_generated.h. Stop
nbmake[7]: stopped making "includes" in /home/roy/src/src/external/mit/xorg/lib/libepoxy
nbmake[6]: stopped making "includes" in /home/roy/src/src/external/mit/xorg/lib

Looking through the sources, gl_generated and friends are created by meson which we don't use, so is the file just missing from the repo?

Roy
Re: Problems with dhcpcd
On Mon, 09 Oct 2023 12:41:30 +0100 Roy Marples wrote ---
> On Mon, 09 Oct 2023 11:33:16 +0100 Roy Marples wrote ---
> > On Sun, 08 Oct 2023 21:58:54 +0100 Lloyd Parkes wrote ---
> > > On 8/10/23 15:30, Lloyd Parkes wrote:
> > > I found the problem. The syslog function in /libexec/dhcpcd-run-hooks
> > > tries to echo text to stdout/stderr and the shell script gets killed
> > > with SIGPIPE when it's being run in the background.
> > >
> > > Commenting out the lines
> > >
> > > case "$lvl" in
> > > err|error) echo "$interface: $*" >&2;;
> > > *) echo "$interface: $*";;
> > > esac
> > >
> > > allows the script to run correctly.
> > >
> > > Adding the command 'trap "" PIPE' to /libexec/dhcpcd-run-hooks is
> > > another way that allows the script to run correctly.
> >
> > That's interesting. So I'm looking at two bugs here then
> > 1) Why is SIGPIPE being raised in the first place
> > 2) Why is it not being captured as an error and logged by dhcpcd.
> >
> > As best I can tell, even forcing stdout and stderr to /dev/null doesn't
> > help here.
> > What else could this be?
>
> 2) is fixed by this patch.
> Now dhcpcd correctly reports a broken pipe from running the script.

I've just landed dhcpcd-10.0.4 into -current and pkgsrc which fixes this issue.
Sorry for the delay.

Let me know if it works for you!

Roy
Re: Problems with dhcpcd
On Mon, 09 Oct 2023 11:33:16 +0100 Roy Marples wrote ---
> On Sun, 08 Oct 2023 21:58:54 +0100 Lloyd Parkes wrote ---
> > On 8/10/23 15:30, Lloyd Parkes wrote:
> > I found the problem. The syslog function in /libexec/dhcpcd-run-hooks
> > tries to echo text to stdout/stderr and the shell script gets killed
> > with SIGPIPE when it's being run in the background.
> >
> > Commenting out the lines
> >
> > case "$lvl" in
> > err|error) echo "$interface: $*" >&2;;
> > *) echo "$interface: $*";;
> > esac
> >
> > allows the script to run correctly.
> >
> > Adding the command 'trap "" PIPE' to /libexec/dhcpcd-run-hooks is
> > another way that allows the script to run correctly.
>
> That's interesting. So I'm looking at two bugs here then
> 1) Why is SIGPIPE being raised in the first place
> 2) Why is it not being captured as an error and logged by dhcpcd.
>
> As best I can tell, even forcing stdout and stderr to /dev/null doesn't help
> here.
> What else could this be?

2) is fixed by this patch.
Now dhcpcd correctly reports a broken pipe from running the script.

https://github.com/NetworkConfiguration/dhcpcd/commit/617a3ae207898a968bccd1e40a299fbfa6a4cc52

diff --git a/src/script.c b/src/script.c
index 2ef99e38..69297a46 100644
--- a/src/script.c
+++ b/src/script.c
@@ -681,6 +681,21 @@ send_interface(struct fd_list *fd, const struct interface *ifp, int af)
 	return retval;
 }
 
+static int
+script_status(const char *script, int status)
+{
+
+	if (WIFEXITED(status)) {
+		if (WEXITSTATUS(status))
+			logerrx("%s: %s: WEXITSTATUS %d",
+			    __func__, script, WEXITSTATUS(status));
+	} else if (WIFSIGNALED(status))
+		logerrx("%s: %s: %s",
+		    __func__, script, strsignal(WTERMSIG(status)));
+
+	return WEXITSTATUS(status);
+}
+
 static int
 script_run(struct dhcpcd_ctx *ctx, char **argv)
 {
@@ -699,13 +714,7 @@ script_run(struct dhcpcd_ctx *ctx, char **argv)
 				break;
 			}
 		}
-		if (WIFEXITED(status)) {
-			if (WEXITSTATUS(status))
-				logerrx("%s: %s: WEXITSTATUS %d",
-				    __func__, argv[0], WEXITSTATUS(status));
-		} else if (WIFSIGNALED(status))
-			logerrx("%s: %s: %s",
-			    __func__, argv[0], strsignal(WTERMSIG(status)));
+		status = script_status(argv[0], status);
 	}
 
 	return WEXITSTATUS(status);
@@ -763,9 +772,13 @@ script_runreason(const struct interface *ifp, const char *reason)
 
 #ifdef PRIVSEP
 	if (ctx->options & DHCPCD_PRIVSEP) {
-		if (ps_root_script(ctx,
-		    ctx->script_buf, (size_t)buflen) == -1)
+		ssize_t err;
+
+		err = ps_root_script(ctx, ctx->script_buf, (size_t)buflen);
+		if (err == -1)
 			logerr(__func__);
+		else
+			script_status(ctx->script, (int)err);
 		goto send_listeners;
 	}
 #endif
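
The status decoding that the patch factors into script_status() can be tried in isolation. The following is a minimal sketch, not dhcpcd code: wait_for_signalled_child() and decode_status() are hypothetical names, and returning a negative signal number for a signalled child is a choice made for this sketch only (the patch itself logs the signal via strsignal() and returns WEXITSTATUS(status)).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that kills itself with `sig`
 * and return the raw status collected by waitpid(2), or -1 on error.
 */
static int
wait_for_signalled_child(int sig)
{
	int status = 0;
	pid_t pid = fork();

	assert(pid != -1);
	if (pid == 0) {
		signal(sig, SIG_DFL);	/* make sure the signal is fatal */
		raise(sig);
		_exit(0);		/* only reached if the signal did not kill us */
	}
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return status;
}

/*
 * Decode a waitpid(2) status: the exit code (>= 0) for a normal exit,
 * the negated signal number for death by signal, INT_MIN otherwise.
 */
static int
decode_status(int status)
{
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	if (WIFSIGNALED(status))
		return -WTERMSIG(status);
	return INT_MIN;			/* stopped, continued or not a valid status */
}
```

Calling decode_status(wait_for_signalled_child(SIGPIPE)) should yield -SIGPIPE, which is the case the hook script was hitting: the child never exits normally, so only the WIFSIGNALED branch reports anything.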
Re: Problems with dhcpcd
On Sun, 08 Oct 2023 21:58:54 +0100 Lloyd Parkes wrote ---
> On 8/10/23 15:30, Lloyd Parkes wrote:
> I found the problem. The syslog function in /libexec/dhcpcd-run-hooks
> tries to echo text to stdout/stderr and the shell script gets killed
> with SIGPIPE when it's being run in the background.
>
> Commenting out the lines
>
> case "$lvl" in
> err|error) echo "$interface: $*" >&2;;
> *) echo "$interface: $*";;
> esac
>
> allows the script to run correctly.
>
> Adding the command 'trap "" PIPE' to /libexec/dhcpcd-run-hooks is
> another way that allows the script to run correctly.

That's interesting. So I'm looking at two bugs here then
1) Why is SIGPIPE being raised in the first place
2) Why is it not being captured as an error and logged by dhcpcd.

As best I can tell, even forcing stdout and stderr to /dev/null doesn't help here.
What else could this be?

Roy
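
For readers unfamiliar with the mechanism Lloyd describes: a write to a pipe with no reader raises SIGPIPE, which kills the writer by default; with the signal ignored (what 'trap "" PIPE' does in the shell) the write fails with EPIPE instead. The sketch below only demonstrates that mechanism. It does not explain why the hook script's stdout/stderr had no reader, and write_to_closed_pipe() is a name made up for this example.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Write to a pipe whose read end is already closed, with SIGPIPE ignored.
 * Returns the errno from write(2), 0 if the write unexpectedly succeeds,
 * or -1 if pipe(2) itself failed.
 */
static int
write_to_closed_pipe(void)
{
	int fds[2];
	int err = 0;
	void (*old)(int);

	if (pipe(fds) == -1)
		return -1;
	close(fds[0]);			/* no reader left */

	old = signal(SIGPIPE, SIG_IGN);	/* C equivalent of: trap "" PIPE */
	if (write(fds[1], "x", 1) == -1)
		err = errno;		/* save before anything can clobber it */
	signal(SIGPIPE, old);		/* restore the previous disposition */

	close(fds[1]);
	return err;
}
```

With the signal ignored the function returns EPIPE. Without the signal(SIGPIPE, SIG_IGN) line the process would normally be terminated inside write(2), which matches the symptom of the hook script dying silently.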
Re: Problems with dhcpcd
> I've installed 10.99.9 from about a day ago onto an old Raspberry Pi and I
> just can't get it to correctly set its hostname from DHCP. (I have removed
> the hostname=rpi from /etc/rc.conf).

> What I have discovered so far is that if I manually run "dhcpcd -d" then no
> hostname gets set. If I run "dhcpcd -d -B" then the hostname does get set.
> This doesn't make sense.

You're correct, this does not make sense.

> Here are the logs from the failed run (console and /var/log/messages). Dhcpcd
> doesn't seem to be running the hooks for the "CARRIER", which is something
> that does happen with "dhcpcd -d -B". Interestingly, the message "executing:
> /libexec/dhcpcd-run-hooks ..." is not logged to syslog by either invocation
> of dhcpcd.

syslog.conf doesn't log debug messages to /var/log/messages by default - you need to enable that.
An alternative is to put `logfile /var/log/dhcpcd.log` into /etc/dhcpcd.conf and look there.

> Sep 30 22:15:40 dhcpcd[331]: usmsc0: rebinding lease of 10.0.1.53
> Sep 30 22:15:46 dhcpcd[331]: usmsc0: leased 10.0.1.53 for 86400 seconds
> Sep 30 22:15:52 dhcpcd[331]: usmsc0: adding route to 10.0.1.0/24
> Sep 30 22:15:52 dhcpcd[331]: usmsc0: adding default route via 10.0.1.1

So it took 12 seconds to complete the DHCP transaction and validate the addresses are good before applying the DHCP lease.
Without -B, dhcpcd will fork to the background right away so any assignments from the DHCP lease won't apply right away.
Is this what you are seeing?

Is the hostname even there? You can examine the contents of your leases with `dhcpcd -U`.

I have only just imported dhcpcd-10.0.3 to -current. Unlikely to address this exact issue (if there is one yet), but you never know.

Roy Marples
Re: zfs howto
On 14/02/2021 09:35, J. Hannken-Illjes wrote:

The trigger is '-maproot' with group(s), first bug is mountd leaving 'cr_gid' as -2 and setting the first group list member to 10 in this case.

Second bug is ZFS setting illegal group id -2 aka 4294967294 to GID_NOBODY with id -2. Later this illegal id leads to null pointer dereference in zfs_log_create() at zfs_log.c:297 "lr->lr_gid = fuidp->z_fuid_group" where fuidp is NULL.

With the attached diff the ZFS bug gets fixed and your export works.

Fixes my export full root to ERLITE as well - thanks!
I don't have any group or mapping options, so I guess the hardcoded defaults failed.

Could we get ZFS not to actually panic in this instance?
I feel somewhat uncomfortable with ZFS hardcoding these values from user editable configs as well but don't have any good ideas for that.

Roy
Re: zfs howto
On 12/02/2021 14:44, Greg Troxel wrote:

Long ago I rototilled the zfs howto adding far more questions than answers. I just did another rototill pass.

https://wiki.netbsd.org/zfs/

While many \todos remain, the biggest questions I have are about NFS:

If I want to export a zfs filesystem over NFS, what specifically do I need to do.

Does the crash bug referenced in the NFS section still exist in current current? (It's still open)

http://gnats.netbsd.org/55042

It crashes when my ERLite tries to mount / NFS at the checking root phase.

Roy
Re: Automated report: NetBSD-current/i386 build failure
On 03/02/2021 17:55, Ryo ONODERA wrote:

Exactly. It happens in dtrace userland build.

Fixed. Sorry about that.
Maybe we should not define CTASSERT ourselves and just use __CTASSERT to avoid this in the future?

Roy
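
The clash discussed here comes from two headers each defining a macro named CTASSERT. How the two definitions interact is not shown in the thread, so the following is only a sketch of the general idea behind Roy's suggestion: use one uniquely named, one-argument compile-time assertion instead of redefining a common name. SKETCH_CTASSERT and struct sketch_in_addr are hypothetical stand-ins, and C11 _Static_assert is used here in place of NetBSD's __CTASSERT.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical one-argument compile-time assertion. The name is prefixed
 * so it cannot collide with a CTASSERT defined by another header.
 */
#define SKETCH_CTASSERT(expr) _Static_assert((expr), #expr)

/* Stand-in for struct in_addr, so the sketch needs no network headers. */
struct sketch_in_addr {
	uint32_t addr;
};

/* Fails at compile time, not at run time, if the layout ever changes. */
SKETCH_CTASSERT(sizeof(struct sketch_in_addr) == 4);
```

Because the check is evaluated by the compiler, a header that includes it costs nothing at run time, and a second header defining its own CTASSERT no longer matters.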
Re: Automated report: NetBSD-current/i386 build failure
On 03/02/2021 14:42, Ryo ONODERA wrote:

Hi,

It seems that CTASSERT in netinet/in.h conflicts with CTASSERT in external/cddl/osnet/dist/uts/common/sys/debug.h.

Ryo ONODERA writes:

Hi,

However I have gotten another failure:

--- dt_print.pico ---
In file included from /usr/src/external/cddl/osnet/sys/sys/debug.h:51,
                 from /usr/src/external/cddl/osnet/sys/sys/uio.h:64,
                 from /usr/world/9.99/amd64/dest/usr/include/sys/socket.h:99,
                 from /usr/src/external/cddl/osnet/lib/libdtrace/../../dist/lib/libdtrace/common/dt_print.c:76:
/usr/world/9.99/amd64/dest/usr/include/netinet/in.h:162:1: error: macro "__CTASSERT" passed 2 arguments, but takes just 1
  162 | CTASSERT(sizeof(struct in_addr) == 4);
      | ^~~~

I cannot replicate this?
I'm just building a stock kernel - what extra options do I need?

Roy
Re: Help with libcurses and lynx under NetBSD-9 and -current?
On 02/02/2021 09:44, Brett Lymn wrote:

Why don't you post your $TERMCAP and infocmp output here?

Umm I don't have a problem with using terminfo. I am more interested in working out why lynx is misbehaving in window. I suspect that is something I did wrong when I fixed another PR to do with the input routines not preserving the cursor location.

That was mainly for Brian in case there is something we can spot that's wrong with his $TERMCAP string.

Roy
Re: Help with libcurses and lynx under NetBSD-9 and -current?
On 01/02/2021 09:53, Brett Lymn wrote:

The TERMCAP variable has some severe limitations, the worst being it can only be 256 bytes in size which was more than adequate for a vt100 definition but a modern colour xterm definition simply won't fit in that space, terminfo does not have these limitations.

Are you sure about that?
I don't think libterminfo imposes any length on $TERMCAP other than those translating to terminfo.
Not ruling out any errors with the conversion though.

You can verify $TERMCAP using infocmp.

$ echo $TERMCAP
dw|vt52|DEC vt52: :cr=^M:do=^J:nl=^J:bl=^G: :le=^H:bs:cd=\EJ:ce=\EK:cl=\EH\EJ: :cm=\EY%+ %+ :co#80:li#24: :nd=\EC:ta=^I:pt:sr=\EI:up=\EA: :ku=\EA:kd=\EB:kr=\EC:kl=\ED:kb=^H:
$ infocmp dw
# Reconstructed from $TERMCAP
dw|vt52|DEC vt52,
	cols#80, lines#24,
	bel=^G, clear=\EH\EJ, cr=^M, cub1=^H, cud1=^J, cuf1=\EC,
	cup=\EY%p1%{32}%+%c%p2%{32}%+%c, cuu1=\EA, ed=\EJ, el=\EK,
	ht=^I, ind=^J, kbs=^H, kcub1=\ED, kcud1=\EB, kcuf1=\EC,
	kcuu1=\EA, nel=^M^J, ri=\EI,
$

Why don't you post your $TERMCAP and infocmp output here?

Roy
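
To make the co#80 / cols#80 correspondence in the output above concrete, here is a small sketch that pulls a numeric capability out of a termcap-style entry. It is not how libterminfo parses $TERMCAP; termcap_num() is a hypothetical helper, it reads the number as decimal, and it ignores escaped colons inside string capabilities.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look up a numeric capability such as "co" or "li" in a termcap entry.
 * Returns the value, or -1 if the capability is not present.
 * Simplification: an escaped colon inside a string capability is not handled.
 */
static long
termcap_num(const char *entry, const char *cap)
{
	size_t caplen = strlen(cap);
	const char *p = entry;

	while ((p = strchr(p, ':')) != NULL) {
		p++;				/* step over the ':' */
		while (*p == ' ' || *p == '\t')
			p++;			/* skip continuation indent */
		if (strncmp(p, cap, caplen) == 0 && p[caplen] == '#')
			return strtol(p + caplen + 1, NULL, 10);
	}
	return -1;
}
```

Fed the vt52 entry shown above, termcap_num(entry, "co") gives 80 and termcap_num(entry, "li") gives 24, the same values infocmp reports as cols and lines.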
Re: Routing socket issue?
Hi Frank :)

On 31/01/2021 07:58, Frank Kardel wrote:

For example I fail to see how RTM_LOSING helps that because it won't change how ntpd would configure itself.

Well if I read the comment right I am inclined to differ here:

In in_pcb.c we find:

/*
 * Check for alternatives when higher level complains
 * about service problems. For now, invalidate cached
 * routing information. If the route was created dynamically
 * (by a redirect), time to try a default gateway again.
 */
in_losing(struct inpcb *inp)

and the call is in tcp_timer.c:

/*
 * If losing, let the lower level know and try for
 * a better route. Also, if we backed off this far,
 * our srtt estimate is probably bogus. Clobber it
 * so we'll take the next rtt measurement as our srtt;
 * move the current srtt into rttvar to keep the current
 * retransmit times until then.
 */

As ntpd acts after a grace period the routing engine may have corrected this situation and routing may indeed change. ntpd's interactions with peers can take up to 1024s so it is good to attempt in a best effort way to keep the internal local address/socket state close to the current state. It is likely though that there have been routing messages like RTM_CHANGE/ADD/DELETE before that and RTM_LOSING is not providing additional information at that point.

Right, RTM_LOSING is just informational.
If any routing does change then we get RTM_CHANGE/ADD/DELETE etc.

As NTP doesn't bring interfaces up or down, RTM_IFANNOUNCE is useless as well. If the interface does vanish, any addresses on it will be reported via RTM_DELADDR. RTM_IFINFO is also questionable as commentary in the code is that it only cares about addresses.

Well I read ntp_io.c

/*
 * we are keen on new and deleted addresses and
 * if an interface goes up and down or routing
 * changes
 */

not as being interested in addresses only.

Also keep in mind that at this point routing messages are processed in a loop and the action here

timer_interfacetimeout(current_time + UPDATE_GRACE);

just sets the variable for the next interface+local address update run. This is very cheap. The grace period will batch multiple routing messages together. An explicit routing message flush is from my point of view code clutter here, as the socket is effectively drained in the loop at the cost of examining the msg_type and setting a variable. Not much gained here.

OK, we'll keep RTM_IFINFO but drop RTM_IFANNOUNCE.
The point is trying to eliminate the overflow message entirely.

I mean, if you want to argue against any of that then I would suggest why even bother filtering or looking at overflow at all? Shrink the code - any activity on the routing socket, drain it ignoring all errors, start the interface update timer.

That would be an option but we should react only on known events. There may be one or two events that could be removed from the list after examination as other messages can cover for them. Keep in mind that this is a portable code section and the code tries to be on the fail safe, robust side for the goal of address/routing tracking so adjusting it to a particular implementation may break on other OS implementations.

Well, Dragonfly (prior to my patches there) and by extension FreeBSD (not checked to see if that changed) both emit RTM_DELADDR before RTM_IFANNOUNCE (i.e. wrong order) so when they do overflow you never see RTM_IFANNOUNCE to say the interface vanished. Hence there is zero point in listening for it for ntp.

As for the message: IMHO it does not need to be logged at all (DPRINTF/maybe LOGDEBUG at most) because the overflow should and does just trigger ntpd to reevaluate the interface/routing configuration. This information is not important at all for normal operation as the effects are correctly mitigated.

I changed it to LOG_DEBUG as well as removing RTM_LOSING and RTM_IFANNOUNCE as discussed above.

Great. BTW: does the current code revert to (fail safe) periodic interface scanning if the routing socket is being disabled (happens when an unexpected error code is returned from read(2))?

No. The socket is non blocking so the only error to ignore here would be EINTR. Any other errors are due to bad programming IMO.

Could be bad programming, but I prefer ntpd being forgiving against hiccups by reverting to periodic scanning when we disable the routing socket. That is a fail safe strategy and would also warrant a log message as it is an unusual event.

EINTR is now ignored. I'll find time to restore periodic scanning later.

Roy
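
The policy this thread converges on can be summarised in a few lines of C. This is a sketch of the discussion, not the ntpd source: the enum, the function names and the SKETCH_UPDATE_GRACE value are all made up for illustration. It shows the three reactions to a failed read(2) on the routing socket (retry on EINTR, rescan on ENOBUFS, fall back to periodic scanning on anything else) and how a grace period batches several routing messages into one interface rescan.

```c
#include <errno.h>
#include <time.h>

#define SKETCH_UPDATE_GRACE 2	/* seconds; illustrative value only */

enum rt_action {
	RT_RETRY,	/* interrupted: just read again */
	RT_RESCAN,	/* overflow, messages were lost: rescan interfaces */
	RT_FALLBACK	/* unexpected: stop listening, scan periodically */
};

/* Map the errno from a failed read(2) on the routing socket to an action. */
static enum rt_action
rt_read_error_action(int err)
{
	switch (err) {
	case EINTR:
		return RT_RETRY;
	case ENOBUFS:
		return RT_RESCAN;
	default:
		return RT_FALLBACK;
	}
}

/* Each routing message (or overflow) just pushes the deadline out. */
static void
schedule_rescan(time_t *deadline, time_t now)
{
	*deadline = now + SKETCH_UPDATE_GRACE;
}

/* Returns 1 exactly once when the deadline has passed; 0 otherwise. */
static int
rescan_due(time_t *deadline, time_t now)
{
	if (*deadline == 0 || now < *deadline)
		return 0;
	*deadline = 0;		/* nothing pending until the next message */
	return 1;
}
```

Two messages arriving at t=100 and t=101 leave a single deadline of t=103, so only one rescan runs. That is the cheap batching Frank describes, and it is also why an overflow needs no special handling beyond scheduling the same rescan.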
Re: Routing socket issue?
On 30/01/2021 22:01, Frank Kardel wrote:

"why it needs to be interested i..."

Ntpd needs to know the local address being used when sending to peers (authentication, which socket to use). That is why it not just reacts to address information but also redetermines which local addresses (and sockets) are being used for reaching its peers. The interaction with the routing socket is purposely simple. ntpd just needs to know that *something* has changed. It will then rescan the interfaces after a grace period and reevaluate the interface/local address/socket setup. It does not need to be extremely snappy but it needs to happen. Dropping that might delay ntpd's detection of changed local addresses for peers.

For example I fail to see how RTM_LOSING helps that because it won't change how ntpd would configure itself.

As NTP doesn't bring interfaces up or down, RTM_IFANNOUNCE is useless as well. If the interface does vanish, any addresses on it will be reported via RTM_DELADDR.

RTM_IFINFO is also questionable as commentary in the code is that it only cares about addresses.

NOTE TO SELF: our kernel doesn't seem to report RTM_CHGADDR anymore looking at nxr.netbsd.org

I mean, if you want to argue against any of that then I would suggest why even bother filtering or looking at overflow at all? Shrink the code - any activity on the routing socket, drain it ignoring all errors, start the interface update timer.

As for the message: IMHO it does not need to be logged at all (DPRINTF/maybe LOGDEBUG at most) because the overflow should and does just trigger ntpd to reevaluate the interface/routing configuration. This information is not important at all for normal operation as the effects are correctly mitigated.

Great.

BTW: does the current code revert to (fail safe) periodic interface scanning if the routing socket is being disabled (happens when an unexpected error code is returned from read(2))?

No. The socket is non blocking so the only error to ignore here would be EINTR.
Any other errors are due to bad programming IMO.

Roy
Re: Routing socket issue?
On 30/01/2021 18:27, Paul Goyette wrote:

On Sat, 30 Jan 2021, Roy Marples wrote:

On 30/01/2021 15:12, Paul Goyette wrote:

I thought we took care of the buffer-space issue a long time ago, but today I've gotten about a dozen of these:

...
Jan 30 05:20:11 speedy ntpd[3146]: routing socket reports: No buffer space available

I recently added a patch to enable the diagnostic AND take action on it. We can change the upstream default from LOG_ERR to LOG_DEBUG or maybe their custom DPRINTF though if you think that would help reduce the noise.

Not concerned about noise, just wanted to make sure we didn't have a regression slip by. As long as the message is deliberate, I'm not too worried.

Just to be clear on this, we have the framework to filter out routing messages we don't need to stop overflow from happening and we can also detect when overflow still happens.

ntpd now does both (before, it only filtered), but I didn't change what it was interested in, and now I'm curious why it needs to be interested in actual routing changes for interface/address discovery, as I'm pretty sure we can drop that.

As we enable this in more applications we just have to make some choices - filter more out vs increasing buffer size vs just discarding the error if the prior two are not feasible.

Roy
Re: Routing socket issue?
On 30/01/2021 18:27, Paul Goyette wrote:

On Sat, 30 Jan 2021, Roy Marples wrote:

On 30/01/2021 15:12, Paul Goyette wrote:

I thought we took care of the buffer-space issue a long time ago, but today I've gotten about a dozen of these:

...
Jan 30 05:20:11 speedy ntpd[3146]: routing socket reports: No buffer space available

I recently added a patch to enable the diagnostic AND take action on it. We can change the upstream default from LOG_ERR to LOG_DEBUG or maybe their custom DPRINTF though if you think that would help reduce the noise.

Not concerned about noise, just wanted to make sure we didn't have a regression slip by. As long as the message is deliberate, I'm not too worried.

Well, currently other apps such as dhcpcd still log an error when the routing socket overflows, but with a more helpful message. I think we can just change it to:

routing socket overflowed - will update interfaces

Happy with that?

To alleviate the issue we could also stop ntpd from listening to routing changes, as that has no bearing on how it discovers interfaces and addresses as far as I can tell. Frank, OK with that?

Roy
Re: Routing socket issue?
On 30/01/2021 15:12, Paul Goyette wrote:

I thought we took care of the buffer-space issue a long time ago, but today I've gotten about a dozen of these:

...
Jan 30 05:20:11 speedy ntpd[3146]: routing socket reports: No buffer space available

I recently added a patch to enable the diagnostic AND take action on it.
We can change the upstream default from LOG_ERR to LOG_DEBUG or maybe their custom DPRINTF though if you think that would help reduce the noise.

Roy
Re: Help with libcurses and lynx under NetBSD-9 and -current?
On 27/01/2021 17:52, Christos Zoulas wrote:

In article , RVP wrote:

This might be due to the fact that window(1) relies on setting a custom TERMCAP environment variable to inform programs running under it of the term. capabilities it supports, and the curses library no longer makes use of that.

With ncurses, building it with the `--enable-termcap' option makes it use the TERMCAP variable if it is set in the environment. The ncurses(w) in pkgsrc is not built with that option, so, I compiled the latest ncurses from source with that option added and lynx -show_cursor worked just fine under window(1).

-RVP

I think we can make our libterminfo do the same by shuffling a few ifdefs around :-)

No need for that.
TERMINFO_COMPILE is defined unless built with SMALLPROG.
So $TERMCAP is respected from the environment by default after installation.

See terminfo(5) for more details, as $TERMINFO will take precedence if it is also set.

Roy
Re: Using wg(4) with a commerical VPN provider
On 11/11/2020 01:49, Brad Spencer wrote:

@@ -2352,6 +2361,7 @@
 	if (*af == AF_INET) {
 		packet_len = ntohs(ip->ip_len);
 	} else {
+#ifdef INET6
 		const struct ip6_hdr *ip6;
 
 		if (__predict_false(decrypted_len < sizeof(struct ip6_hdr)))

Might be better to roll it into a case statement.
Could wg one day work with a third address family?

Roy
Re: Automated report: NetBSD-current/i386 test failure (l2tp)
On 23/10/2020 08:25, Andreas Gustafsson wrote:

Roy Marples wrote:

This is rump crashing and I don't know why.

If the rump kernel crashes in the test, that likely means the real kernel will crash in actual use.

I can't get a backtrace to tell me where the problem is.

I managed to get one this way:

sysctl -w kern.defcorename="/tmp/%n.core"
cd /usr/tests/net/if_l2tp
./t_l2tp l2tp_basic_ipv4overipv4
gdb rump_server /tmp/rump_server.core

It looks like this:

Thanks for that, it should now be fixed.

Roy
Re: Automated report: NetBSD-current/i386 test failure (l2tp)
Hi Andreas

On 22/10/2020 09:00, Andreas Gustafsson wrote:

Hi Roy,

On Oct 16, the NetBSD Test Fixture wrote:

The newly failing test cases are:

net/if_l2tp/t_l2tp:l2tp_basic_ipv4overipv4
net/if_l2tp/t_l2tp:l2tp_basic_ipv4overipv6
net/if_l2tp/t_l2tp:l2tp_basic_ipv6overipv4
net/if_l2tp/t_l2tp:l2tp_basic_ipv6overipv6
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_ah_hmacsha512
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_ah_null
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_esp_null
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_esp_rijndaelcbc
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_ah_hmacsha512
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_ah_null
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_esp_null
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_esp_rijndaelcbc
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_ah_hmacsha512
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_ah_null
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_esp_null
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_esp_rijndaelcbc
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_ah_hmacsha512
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_ah_null
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_esp_null
net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_esp_rijndaelcbc

These are still failing as of 2020.10.21.15.12.15, and the commit that triggered the failures has now been identified:

2020.10.15.02.54.10 roy src/sys/net/if_l2tp.c 1.44

For logs, see

http://www.gson.org/netbsd/bugs/build/amd64/commits-2020.10.html#2020.10.15.02.54.10

This is rump crashing and I don't know why.
I can't get a backtrace to tell me where the problem is.

Roy
Re: Automated report: NetBSD-current/i386 test failure
On 16/10/2020 15:54, NetBSD Test Fixture wrote:

This is an automatically generated notice of new failures of the NetBSD test suite.

The newly failing test cases are:

net/if_wg/t_basic:wg_basic_ipv6_over_ipv4
net/if_wg/t_basic:wg_basic_ipv6_over_ipv6
net/if_wg/t_basic:wg_payload_sizes_ipv6_over_ipv4
net/if_wg/t_basic:wg_payload_sizes_ipv6_over_ipv6

These should now be fixed.

Roy
Re: Automated report: NetBSD-current/i386 test failure
On 14/10/2020 07:15, Andreas Gustafsson wrote:

On Oct 8, the NetBSD Test Fixture wrote:

The newly failing test cases are:

net/carp/t_basic:carp_handover_ipv4_halt_carpdevip
net/carp/t_basic:carp_handover_ipv4_halt_nocarpdevip
net/carp/t_basic:carp_handover_ipv4_ifdown_carpdevip
net/carp/t_basic:carp_handover_ipv4_ifdown_nocarpdevip
net/carp/t_basic:carp_handover_ipv6_halt_carpdevip
net/carp/t_basic:carp_handover_ipv6_ifdown_carpdevip

These were fixed on Oct 8, but then broken again on Oct 12:

http://releng.netbsd.org/b5reports/i386/commits-2020.10.html#2020.10.12.11.07.27

Fixed here:
https://mail-index.netbsd.org/source-changes/2020/10/14/msg122921.html

Note, if `ident /sbin/ifconfig | grep ifconfig.c` shows revision r1.243 - r1.247 then ifconfig will likely crash with this change on carp interfaces.
This has been resolved in r1.248.

Roy
Re: gdb - undefined reference to `std::__1::codecvt::id'
On 29/09/2020 20:26, Christos Zoulas wrote: Or use gcc instead of clang :-) Ew
Re: gdb - undefined reference to `std::__1::codecvt::id'
On 29/09/2020 17:13, Kamil Rytarowski wrote:

The basesystem libc++ is too old for C++ applications like GDB.

I find that dubious as we have the new gdb building fine on amd64 and i386 with gnu compiler according to our test runs. Unless the machine has a local override.

This is clang compiler.

A workaround is to force old GDB.

I've just disabled building GDB for the time being.

Roy
gdb - undefined reference to `std::__1::codecvt::id'
# link gdb/gdb
/usr/tools/bin/x86_64--netbsd-clang++ --sysroot=/ -Wl,--warn-shared-textrel -Wl,-z,relro -pie -o gdb gdb.o -Wl,-rpath-link,/lib -L=/lib -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libgdb/obj.amd64 -lgdb -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libopcodes/obj.amd64 -lopcodes -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libbfd/obj.amd64 -lbfd -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libdecnumber/obj.amd64 -ldecnumber -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libgdbsupport/obj.amd64 -lgdbsupport -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libctf/obj.amd64 -lctf -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libgnulib/obj.amd64 -lgnulib -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libreadline/obj.amd64 -lreadline -lterminfo -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libiberty/obj.amd64 -liberty -lexpat -llzma -lz -lcurses -lintl -lm -lkvm -lutil
/usr/tools/bin/x86_64--netbsd-ld: /home/roy/src/hg/src/external/gpl3/gdb/lib/libgdb/obj.amd64/libgdb.a(string_view-selftests.o): in function `std::__1::basic_filebuf >::basic_filebuf()':
string_view-selftests.c:(.text._ZNSt3__113basic_filebufIcNS_11char_traitsIcEEEC2Ev[_ZNSt3__113basic_filebufIcNS_11char_traitsIcEEEC2Ev]+0x94): undefined reference to `std::__1::codecvt::id'
/usr/tools/bin/x86_64--netbsd-ld: string_view-selftests.c:(.text._ZNSt3__113basic_filebufIcNS_11char_traitsIcEEEC2Ev[_ZNSt3__113basic_filebufIcNS_11char_traitsIcEEEC2Ev]+0xc4): undefined reference to `std::__1::codecvt::id'
/usr/tools/bin/x86_64--netbsd-ld: /home/roy/src/hg/src/external/gpl3/gdb/lib/libgdb/obj.amd64/libgdb.a(string_view-selftests.o): in function `std::__1::basic_filebuf >::imbue(std::__1::locale const&)':
string_view-selftests.c:(.text._ZNSt3__113basic_filebufIcNS_11char_traitsIcEEE5imbueERKNS_6localeE[_ZNSt3__113basic_filebufIcNS_11char_traitsIcEEE5imbueERKNS_6localeE]+0x13): undefined reference to `std::__1::codecvt::id'
x86_64--netbsd-clang: error: linker command failed with exit code 1 (use -v to see invocation)
*** Error code 1

What went wrong?
My very limited knowledge of C++ and Google-fu says codecvt should be part of libc++?

Roy
Re: Automated report: NetBSD-current/i386 test failure
On 23/09/2020 11:42, NetBSD Test Fixture wrote:

This is an automatically generated notice of a new failure of the NetBSD test suite.

The newly failing test case is:

net/if/t_ifconfig:ifconfig_options

The above test failed in each of the last 4 test runs, and passed in at least 26 consecutive runs before that.

The following commits were made between the last successful test and the failed test:

2020.09.23.02.09.18 roy src/sbin/ifconfig/ifconfig.8,v 1.120
2020.09.23.02.09.18 roy src/sbin/ifconfig/ifconfig.c,v 1.244
2020.09.23.02.32.04 roy src/usr.sbin/ifwatchd/ifwatchd.c,v 1.44

Logs can be found at:

http://releng.NetBSD.org/b5reports/i386/commits-2020.09.html#2020.09.23.02.32.04

Fixed.

Roy
Re: Automated report: NetBSD-current/i386 test failure
On 20/09/2020 04:40, Robert Elz wrote:

Date: Sun, 20 Sep 2020 04:02:45 +0100
From: Roy Marples
Message-ID: <51d2f8dc-d059-5eae-9899-5c91539d1...@marples.name>

| The test case just needed fixing.

That is not uncommon after changes elsewhere.

| The ping to an invalid address caused the ARP entry to enter INCOMPLETE ->
| WAITDELETE state and this hung over into the next test causing this entry
| to take too long to validly resolve.

Why? If a failed ARP (or ND) causes problems for a later request (incl of the same addr) which should work (that is, any problems at all, including delays) then I'd consider the implementation broken (not the test).

RFC 7048 explains that consistent failures back off exponentially.
Because the server is not reset the backoff may bleed into subsequent tests for the same address, which is why this test was sometimes failing.

| The solution is after a deliberate fail

And if it wasn't a deliberate fail? Perhaps being just a fraction of a second too quick, and attempting a ping (or ssh, or something) just before the destination becomes reachable (either because it was down, unconfigured, or the net link between then wasn't functional), and ATF timings on an emulated environment cannot be that precise.

See PR 43997 for more details.

| to remove the ARP entry for the address

if the user doing this isn't root, and cannot just remove ARP entries?

Maybe I'm misunderstanding the actual scenario, but it seems to me that things aren't working as well now as they were before (the timing in the qemu tests hasn't changed recently - not since the nvmm version started being used - but before the arp implementation change, it used to work reliably).

By reliably you mean that a successful ARP resolution lasts for 20 minutes, which we don't have any tests for?
If anything the tests we have are more reliable than before as I have not adjusted any timings.

Roy
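
To illustrate how an exponential backoff of the kind Roy refers to can "bleed" into a later test, here is a minimal sketch of the delay calculation. It is not the NetBSD ARP/ND code; backoff_delay_ms() is a hypothetical helper, and the base, multiplier and cap are left as parameters because their real values should be taken from RFC 7048 and the kernel source rather than from this example.

```c
#include <stdint.h>

/*
 * Delay before retransmission number n (n = 0 is the first retry):
 * base_ms * multiple^n, capped at max_ms.
 */
static uint32_t
backoff_delay_ms(uint32_t base_ms, uint32_t multiple, uint32_t max_ms,
    unsigned int n)
{
	uint64_t delay = base_ms;

	if (multiple <= 1)		/* no growth: nothing to iterate */
		return base_ms > max_ms ? max_ms : base_ms;

	while (n-- > 0 && delay < max_ms)
		delay *= multiple;	/* 64-bit, so no overflow before the cap */
	return delay > max_ms ? max_ms : (uint32_t)delay;
}
```

With an illustrative 1000 ms base, a multiplier of 3 and a 60000 ms cap, the delays run 1000, 3000, 9000, 27000 and then stay at 60000. A test that starts while an entry for the same address is already several failures in inherits one of the longer delays, which is consistent with the intermittent failures described above.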
Re: Automated report: NetBSD-current/i386 test failure
On 13/09/2020 23:10, Robert Elz wrote: Date:Sun, 13 Sep 2020 22:14:00 +0100 From:Roy Marples Message-ID: | >| > net/arp/t_arp:arp_proxy_arp_pub | >| > net/arp/t_arp:arp_proxy_arp_pubproxy | > | > Those two are still failing. | | Works fine on my box. | Can you say how they are failing? See: http://releng.netbsd.org/b5reports/i386/2020/2020.09.13.15.27.25/test.html#net_arp_t_arp_arp_proxy_arp_pub The test case just needed fixing. Basically the issue was that the test kernel was slow but the test cases were fast. The ping to an invalid address caused the ARP entry to enter INCOMPLETE -> WAITDELETE state and this hung over into the next test causing this entry to take too long to validly resolve. The solution is, after a deliberate fail, to remove the ARP entry for the address and ignore the exit code if the entry has naturally expired / been removed. This fixes all the test case fallout from the ARP -> ND merge and has now survived several test runs. The ND cache expiration test which intermittently fails is based on exact timings. A future patch will add jitter to NS, which will cause this test to fail more. Ideas on how to solve it welcome. Roy
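The cleanup described in this message might look roughly like the following in a shell-based test. cleanup_arp_entry and the ARP override are hypothetical names for this sketch; the real test cases may invoke arp differently.

```shell
#!/bin/sh
# Sketch: after a deliberately failing ping, delete the ARP entry so its
# state cannot leak into the next test. ARP is an assumed override so the
# command can be swapped out (for example for a rump client).
cleanup_arp_entry() {
	addr=$1
	# The entry may already have expired or been removed, so a
	# non-zero exit status from the delete is deliberately ignored.
	${ARP:-arp} -d "$addr" >/dev/null 2>&1 || true
}
```

Ignoring the exit status keeps the test from failing just because the kernel expired the entry before the cleanup ran.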
Re: arp: ioctl(SIOCGNBRINFO): Inappropriate ioctl for device
On 16/09/2020 10:23, Thomas Klausner wrote: On Wed, Sep 16, 2020 at 11:10:55AM +0200, Martin Husemann wrote: On Wed, Sep 16, 2020 at 11:05:49AM +0200, Thomas Klausner wrote: The one with 192.168.0.x configured is wm0. (I only have an lo0 except for that.) Strange, your kernel is newer or same age as your userland? My kernel is from September 4. Since there was no version bump I assumed that I could install a newer userland (with gcc9) without problems. Kernel bumped for you. Roy
Re: Automated report: NetBSD-current/i386 test failure
On 13/09/2020 22:07, Robert Elz wrote: Date:Sun, 13 Sep 2020 20:06:45 +0100 From:Roy Marples Message-ID: <9e977478-d209-2dbb-49d9-3fa9acd25...@marples.name> | > net/arp/t_arp:arp_cache_expiration_10s | > net/arp/t_arp:arp_cache_expiration_5s Those two are "fixed" (if you can call deleted fixed). I call them "replaced". arp_cache_expiration is the mirror of the ndp equivalent. | > net/arp/t_arp:arp_command That looks OK now. | > net/arp/t_arp:arp_proxy_arp_pub | > net/arp/t_arp:arp_proxy_arp_pubproxy Those two are still failing. Works fine on my box. Can you say how they are failing? Roy
Re: Automated report: NetBSD-current/i386 test failure
On 12/09/2020 22:57, NetBSD Test Fixture wrote: This is an automatically generated notice of new failures of the NetBSD test suite. The newly failing test cases are: net/arp/t_arp:arp_cache_expiration_10s net/arp/t_arp:arp_cache_expiration_5s net/arp/t_arp:arp_command net/arp/t_arp:arp_proxy_arp_pub > This is an automatically generated notice of a new failure of the > NetBSD test suite. > > The newly failing test case is: > > net/arp/t_arp:arp_proxy_arp_pubproxy These should now be fixed Roy
Re: Automated report: NetBSD-current/i386 build failure
On 12/09/2020 07:40, NetBSD Test Fixture wrote: This is an automatically generated notice of a NetBSD-current/i386 build failure. The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host, using sources from CVS date 2020.09.12.01.36.26. An extract from the build.sh output follows: --- dependall-include --- /tmp/bracket/build/2020.09.12.01.36.26-i386/tools/lib/gcc/i486--netbsdelf/8.4.0/../../../../i486--netbsdelf/bin/ld: /tmp/bracket/build/2020.09.12.01.36.26-i386/destdir/usr/lib/librumpnet_net.so: undefined reference to `rumpns_nd_set_timer' /tmp/bracket/build/2020.09.12.01.36.26-i386/tools/lib/gcc/i486--netbsdelf/8.4.0/../../../../i486--netbsdelf/bin/ld: /tmp/bracket/build/2020.09.12.01.36.26-i386/destdir/usr/lib/librumpnet_net.so: undefined reference to `rumpns_nd_resolve' /tmp/bracket/build/2020.09.12.01.36.26-i386/tools/lib/gcc/i486--netbsdelf/8.4.0/../../../../i486--netbsdelf/bin/ld: /tmp/bracket/build/2020.09.12.01.36.26-i386/destdir/usr/lib/librumpnet_net.so: undefined reference to `rumpns_nd_nud_hint' /tmp/bracket/build/2020.09.12.01.36.26-i386/tools/lib/gcc/i486--netbsdelf/8.4.0/../../../../i486--netbsdelf/bin/ld: /tmp/bracket/build/2020.09.12.01.36.26-i386/destdir/usr/lib/librumpnet_net.so: undefined reference to `rumpns_nd_attach_domain' collect2: error: ld returned 1 exit status *** [t_socket] Error code 1 nbmake[8]: stopped in /tmp/bracket/build/2020.09.12.01.36.26-i386/src/tests/include/sys --- dependall-sys --- --- dependall-bootxx --- This should now be fixed in sys/rump/net/lib/libnet/Makefile r1.33 Roy
Re: Automated report: NetBSD-current/i386 build failure
On 12/06/2020 16:13, NetBSD Test Fixture wrote: This is an automatically generated notice of a NetBSD-current/i386 build failure. The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host, using sources from CVS date 2020.06.12.11.21.36. An extract from the build.sh output follows: --- dependall-libagr --- --- if_agrether_hash.d --- #create libagr/if_agrether_hash.d CC=/tmp/bracket/build/2020.06.12.11.21.36-i386/tools/bin/i486--netbsdelf-gcc /tmp/bracket/build/2020.06.12.11.21.36-i386/tools/bin/nbmkdep -f if_agrether_hash.d.tmp -- -std=gnu99 --sysroot=/tmp/bracket/build/2020.06.12.11.21.36-i386/destdir -DCOMPAT_50 -DCOMPAT_60 -DCOMPAT_70 -DCOMPAT_80 -DCOMPAT_90 -nostdinc -imacros /tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../include/opt/opt_rumpkernel.h -I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr -I. -I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../../../common/include -I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../include -I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../include/opt -I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../../arch -I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../.. -DDIAGNOSTIC - DKTRACE -D_FORTIFY_SOURCE=2 /tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../../net/agr/if_agrether_hash.c && mv -f if_agrether_hash.d.tmp if_agrether_hash.d --- dependall-libnet --- /tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libnet/../../../../netinet6/in6.c:112:10: fatal error: compat/netinet6/nd6.h: No such file or directory #include ^~~ Fixed here: https://mail-index.netbsd.org/source-changes/2020/06/12/msg118239.html Roy
Re: problem w/dhcpcd vs libsupc++.a on sparc?
On 21/04/2020 09:07, Roy Marples wrote: On 21/04/2020 00:02, John D. Baker wrote: On Mon, 20 Apr 2020, r...@marples.name wrote: Anyway the patch linked below should fix this. https://roy.marples.name/cgit/dhcpcd.git/patch/?id=1dc1fce7ae7b4c106a8eb631ed92ab1ed8e86bbc I'm waiting for feedback on a few more issues, so hopefully you can clarify the patch works before I import a fixed dhcpcd. The patch appears only to be for the 'dhcpcd' in -current. The version of 'dhcpcd' in netbsd-9 is rather different in that area. Building sparc-current again and will test soon. Need equivalent patch for 'dhcpcd' in netbsd-9. Sorry, I thought it was -current. Here is patch for netbsd-9: https://roy.marples.name/cgit/dhcpcd.git/commit/?h=dhcpcd-8&id=ff78692ef3e74f8f7de2883db541de915c295e07 I've submitted a pullup for netbsd-9 with more fixes thanks to mrg@ and nick@ Hopefully Martin can address it soon! -current and pkgsrc have already been fixed. Roy
Re: problem w/dhcpcd vs libsupc++.a on sparc?
On 21/04/2020 00:02, John D. Baker wrote: On Mon, 20 Apr 2020, r...@marples.name wrote: Anyway the patch linked below should fix this. https://roy.marples.name/cgit/dhcpcd.git/patch/?id=1dc1fce7ae7b4c106a8eb631ed92ab1ed8e86bbc I'm waiting for feedback on a few more issues, so hopefully you can clarify the patch works before I import a fixed dhcpcd. The patch appears only to be for the 'dhcpcd' in -current. The version of 'dhcpcd' in netbsd-9 is rather different in that area. Building sparc-current again and will test soon. Need equivalent patch for 'dhcpcd' in netbsd-9. Sorry, I thought it was -current. Here is patch for netbsd-9: https://roy.marples.name/cgit/dhcpcd.git/commit/?h=dhcpcd-8&id=ff78692ef3e74f8f7de2883db541de915c295e07 Roy
Re: HEADS UP: dhcpcd gains privilege separation support
Hi Oskar On 04/04/2020 07:40, os...@fessel.org wrote: Am 02.04.2020 um 15:07 schrieb Roy Marples : could it be that this _dhcpcd has been added to master.passwd and group, so please update your local ones before upgrading. Once installed, you should stop dhcpcd running and then invoke postinstall so that the old dhcpcd files (duid, secret, leases, etc) are moved to the chroot directory. Then you can start dhcpcd and it will pick up where it left off. relates to this build failure: === 1 extra files in DESTDIR = Files in DESTDIR but missing from flist. File is obsolete or flist is out of date ? -- ./var/db/dhcpcd = end of 1 extra files === when running: ===> build.sh command:./build.sh -j 24 -M /hurz/obj -O /hurz/obj -X /hurz/xsrc -x release kernel=ZAPPA-pf kernel=XEN3_DOM0 kernel=XEN3_DOMU ===> build.sh started:Fri Apr 3 21:54:54 CEST 2020 ===> NetBSD version: 9.99.52 ===> MACHINE: amd64 ===> MACHINE_ARCH:x86_64 ===> Build platform: NetBSD 9.99.50 amd64 ===> HOST_SH: /bin/sh ===> MAKECONF file: /etc/mk.conf ===> TOOLDIR path:/hurz/obj/tooldir.NetBSD-9.99.50-amd64 ===> DESTDIR path:/hurz/obj/destdir.amd64 ===> RELEASEDIR path: /hurz/obj/releasedir ===> Updated makewrapper: /hurz/obj/tooldir.NetBSD-9.99.50-amd64/bin/nbmake-amd64 — with sources sup’ed just 30 minutes before the build start? The same with sources from tonight midnight CEST. Or did I miss something besides deleting everything in DESTDIR and RELESEDIR? I forgot to say that /var/db/dhcpcd has been deprecated for new builds so it needs to be removed. A check through our set lists tells me that we can't obsolete directories so I think it needs to be manual. Roy
HEADS UP: dhcpcd gains privilege separation support
_dhcpcd has been added to master.passwd and group, so please update your local ones before upgrading. Once installed, you should stop dhcpcd running and then invoke postinstall so that the old dhcpcd files (duid, secret, leases, etc) are moved to the chroot directory. Then you can start dhcpcd and it will pick up where it left off. Roy
Re: Automated report: NetBSD-current/i386 test failure
On 31/03/2020 12:22, NetBSD Test Fixture wrote: The newly failing test case is: usr.bin/infocmp/t_terminfo:basic This error in infocmp is now fixed. Roy
Re: ZFS on root - almost there
On 25/02/2020 21:40, Chavdar Ivanov wrote: On Tue, 25 Feb 2020 at 20:14, Roy Marples wrote: On 22/02/2020 19:22, Roy Marples wrote: https://wiki.netbsd.org/wiki/RootOnZFS/ Updated the wiki and the ramdisk - either the bootloader needs to load the modules via boot.cfg or the modules need to be built into the kernel. I don't get it - with my present, still 9.99.47 setup, I am able to load modules: Because we can label a GPT partition "boot". However, that won't work for MBR based systems. It's also not very friendly if you have any other OS present which might for similar reasons have a partition named boot as well. There's just no easy way to load the modules from the ramdisk without putting them inside the ramdisk and I think too many people would forget to re-build the ramdisk or put it against the wrong kernel. So is the option of loading them as per the above no longer available? No it's not. It is however available from the source history if you really want it. As a last metric, since reverting back to letting the bootloader load the modules, zpool is no longer panicking randomly. Or the randomness just hasn't struck yet! Roy
Re: ZFS on root - almost there
On 22/02/2020 19:22, Roy Marples wrote: https://wiki.netbsd.org/wiki/RootOnZFS/ Updated the wiki and the ramdisk - either the bootloader needs to load the modules via boot.cfg or the modules need to be built into the kernel. There's just no easy way to load the modules from the ramdisk without putting them inside the ramdisk and I think too many people would forget to re-build the ramdisk or put it against the wrong kernel. Also, I've updated the minimum required kernel to 9.99.48 as Taylor R Campbell has kindly fixed the problem of writing to the FFS boot device from the ZFS chroot :) Roy
Re: ZFS on root - almost there
On 23/02/2020 11:56, Chavdar Ivanov wrote: On Sun, 23 Feb 2020 at 05:17, Roy Marples wrote: On 22/02/2020 21:19, Chavdar Ivanov wrote: I just noticed - the error message from the sysctl command was that the string was too long: Sync up, build a new ramdisk and install it. Should be fixed now. It is indeed. So far, this was a VirtualBox guest, 2 CPUs, 4GB memory, EFI enabled. X works fine with the VirtualBox additions installed. The access to the boot partition (/dev/dk1) fails on umount; it usually hangs, but I had once a sudden reset. That should also be fixed if you sync up :) Roy
Re: ZFS on root - almost there
On 22/02/2020 21:19, Chavdar Ivanov wrote: I just noticed - the error message from the sysctl command was that the string was too long: Sync up, build a new ramdisk and install it. Should be fixed now. Roy
Re: ZFS on root - almost there
On 22/02/2020 19:06, Chavdar Ivanov wrote: On Sat, 22 Feb 2020 at 18:03, Chavdar Ivanov wrote: On Sat, 22 Feb 2020 at 17:03, Roy Marples wrote: On 22/02/2020 16:56, Chavdar Ivanov wrote: Surely I have missed and/or misunderstood some of the above, but I am getting: ... Starting ZFS on root boot strapper Copying needed kernel modules from NAME=boot:/stand/amd64/9.99.47/modules mount: no match for 'boot': No such process /mnt//stand/amd64/9.99.47/modules/zfs/solaris.kmod not found! /mnt//stand/amd64/9.99.47/modules/zfs/zfs.kmod not found! umount: /mnt: not currently mounted Importing rpool, mounting and pivoting internal error: failed to initialize ZFS library It seems it tries to mount the small ufs root on /mnt using 'NAME=boot' label, but the label created by the standard installer is some GUID. Wups! I missed an instruction step to ensure the label of the FFS partition is boot. gpt label -i 1 -l boot wd0 Replace 1 with the partition index and wd0 with the device. It worked; the only problem I am still having is adding swap; both for a zvol and a gpt partition I get: ... nzfs# swapctl -a /dev/zvol/dsk/rpool/SWAP swapctl: /dev/zvol/dsk/rpool/SWAP: Device not configured nzfs# swapctl -a /dev/dk5 swapctl: /dev/dk5: Device not configured Could be something I've screwed during the installation, but can't figure it out. The next problem is that one can't load any modules; is this by design or I have again made some mistake? Only the two modules prior to pivoting are seen - solaris and zfs; after that one gets, e.g.: ➜ ~ ls -l /stand/amd64/9.99.47/modules/dtrace/dtrace.kmod -r--r--r-- 1 root wheel 320120 Feb 20 10:19 /stand/amd64/9.99.47/modules/dtrace/dtrace.kmod ➜ ~ modload dtrace modload: dtrace: No such file or directory ➜ ~ uname -a NetBSD nzfs 9.99.47 NetBSD 9.99.47 (GENERIC) #10: Sat Feb 22 14:18:50 GMT 2020 sysbuild@ymir:/home/sysbuild/amd64/obj/home/sysbuild/src/sys/arch/amd64/compile/GENERIC amd64 Does this patch help? 
Roy Index: zfsroot.rc === RCS file: /cvsroot/src/distrib/common/zfsroot.rc,v retrieving revision 1.1 diff -u -p -r1.1 zfsroot.rc --- zfsroot.rc 22 Feb 2020 09:53:47 - 1.1 +++ zfsroot.rc 22 Feb 2020 19:30:43 - @@ -51,6 +51,13 @@ done /sbin/umount "$modmnt" echo +# Point the modulepath to /altroot +mpath="$(sysctl -n kern.module.path)" +case "$mpath" in +/altroot/\*) ;; +*) sysctl -w kern.module.path="/altroot/$mpath";; +esac + echo "Importing $rpool, mounting and pivoting" # If we can mount the ZFS root partition to /altroot # then chroot to it and start /etc/rc
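The idea in the patch above, prefixing the module path with /altroot unless it already lives there, can be sketched as a small pure function. altroot_module_path is a hypothetical helper that only computes the string and never calls sysctl. Unlike the patch, this sketch also strips the leading slash so the result has no double slash.

```shell
#!/bin/sh
# Sketch: prefix a module path with /altroot unless it is already
# under /altroot. Hypothetical helper, does not touch sysctl.
altroot_module_path() {
	case "$1" in
	/altroot/*)	echo "$1";;
	*)		echo "/altroot/${1#/}";;
	esac
}

altroot_module_path /stand/amd64/9.99.47/modules
```

Checking for the prefix first keeps the operation safe to repeat, which matters if the boot script runs more than once.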
Re: ZFS on root - almost there
On 22/02/2020 11:27, Roy Marples wrote: On 14/02/2020 12:58, Roy Marples wrote: So I thought I would have a go at setting up ZFS on root. I've now comitted enough to manually build a ramdisk to set this all up. Quick instruction steps which I'll document on web page later: https://wiki.netbsd.org/wiki/RootOnZFS/ Roy
Re: ZFS on root - almost there
On 22/02/2020 16:56, Chavdar Ivanov wrote: Surely I have missed and/or misunderstood some of the above, but I am getting: ... Starting ZFS on root boot strapper Copying needed kernel modules from NAME=boot:/stand/amd64/9.99.47/modules mount: no match for 'boot': No such process /mnt//stand/amd64/9.99.47/modules/zfs/solaris.kmod not found! /mnt//stand/amd64/9.99.47/modules/zfs/zfs.kmod not found! umount: /mnt: not currently mounted Importing rpool, mounting and pivoting internal error: failed to initialize ZFS library It seems it tries to mount the small ufs root on /mnt using 'NAME=boot' label, but the label created by the standard installer is some GUID. Wups! I missed an instruction step to ensure the label of the FFS partition is boot. gpt label -i 1 -l boot wd0 Replace 1 with the partition index and wd0 with the device. We do it like so to avoid the user needing to load the solaris and zfs modules in boot.cfg. Ideally we should teach sysctl to have kern.boot_device alongside kern.root_device to avoid this need. Roy
Re: ZFS on root - almost there
On 14/02/2020 12:58, Roy Marples wrote: So I thought I would have a go at setting up ZFS on root. I've now committed enough to manually build a ramdisk to set this all up. Quick instruction steps which I'll document on a web page later: Compile the ramdisk cd src/distrib/amd64/ramdisks/ramdisk-zfsroot nbmake-amd64 Ensure you are using GPT and not MBR. If you need to change, dd the disk using /dev/zero as source for about 32k and then the installer will ask you if you want MBR or GPT. Once set, it will not prompt to change it again. Use the installer to do a normal installation, extracting base, modules and rescue sets to a small FFS partition (I chose 2G). Do not allow the installer to use the rest of the disk. Drop to the prompt and copy the ramdisk you made earlier to / Edit /boot.cfg and add this menu item: menu=Boot ZFS root:fs /ramdisk-zfsroot.fs;boot Create a ZFS pool on another partition called rpool. Create the ZFS root filesystem called rpool/ROOT. zfs set mountpoint=legacy rpool/ROOT This step is important - the only downside is if you want to create any ZFS datasets in rpool/ROOT you need to either set mountpoints in /etc/fstab or specify them as they will automatically inherit legacy from ROOT. Extract the sets you want to rpool/ROOT. Create dev on rpool/ROOT, copy MAKEDEV from /dev to it, cd to it and run ./MAKEDEV all Copy your /etc/fstab to rpool/ROOT/etc, but remove the / entry. Ensure that rc.conf is setup in rpool/ROOT/etc and it has zfs=YES You should now be good to go! WARNING: There seems to be a bug that once booted into a ZFS root and mount any device and write to it the system will hang trying to unmount it. This is not a fault with the ramdisk, but rather with how ZFS works with device nodes on ZFS. So to update the kernel, boot into the FFS partition and copy from the ZFS partition rather than doing it within the ZFS root. Once that is fixed I might look into trying to automate some of this in our installer. Good luck! Roy
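The pool and dataset steps from this message can be sketched as a dry-run script. The device name dk2 and the helpers run and setup_zfs_root are assumptions for illustration. By default nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Dry-run sketch of the pool and dataset steps described above.
# Commands are only printed unless DRY_RUN=0 is set explicitly.
run() {
	if [ "${DRY_RUN:-1}" = 1 ]; then
		echo "would run: $*"
	else
		"$@"
	fi
}

setup_zfs_root() {
	pooldev=$1	# assumed device name, e.g. dk2
	run zpool create rpool "$pooldev"
	run zfs create rpool/ROOT
	# A legacy mountpoint stops ZFS from mounting the dataset itself.
	run zfs set mountpoint=legacy rpool/ROOT
}

setup_zfs_root dk2
```

Review the printed commands against your own partition layout before running anything with DRY_RUN=0.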
Re: ZFS on root - almost there
On 14/02/2020 12:58, Roy Marples wrote: So I thought I would have a go at setting up ZFS on root. I now have a ramdisk-zfsroot configured! With just the kernel and modules on the partition I can put this in boot.cfg menu=Load ZFS Root;load solaris;load zfs;fs /ramdisk-zfsroot.fs;boot Sadly though zpool cannot find my pool :( I suspect this is because the bootdevice is now the ramdisk md0 rather than wd0a. Is there any way of educating the zfs module about this? Roy
ZFS on root - almost there
So I thought I would have a go at setting up ZFS on root. Thanks to hannken@ it now boots :) However, it panics at shutdown (or halt). Screen capture of the panic here: http://www.netbsd.org/~roy/netbsd-zfs-panic.jpg Now, what I did during the initial setup was to adjust the mountpoint of tank/ROOT/usr to /usr - ie relative to the chroot. The bootstrap phase is this in /etc/rc fsck -y / zfs mount tank/ROOT mount -t null /dev /tank/ROOT/dev mount -t null / /tank/ROOT/altroot # this doesn't appear to work sysctl -w init.root=/tank/ROOT This works fine, we enter the chroot For the time being I've disabled fsck_root and adjusted zfs to load all mounts. We now get to the login with minimal errors and all appears to work. You can see the mountlist inside the chroot at the top of the screen capture. If some kind person can fix this panic then I can copy across my live home site setup (web server, email, etc) and really test it out. Roy
Re: Recent if_stat changes have broken sysutils/xosview
On 09/02/2020 01:52, Jason Thorpe wrote: On Feb 8, 2020, at 4:04 PM, Paul Goyette wrote: The package no longer builds. Fails with (among others) error: 'struct ifnet' has no member named 'if_ibytes'; did you mean 'if_index'? "struct ifnet" is private to the kernel. This application should be using the properly exported data that's available via ioctls for this purpose. We have far too many kernel only things exposed to userland. A constant beef of mine is that we #define if_type in sys/net/if.h which causes conflict building hostapd/wpa_supplicant as they have an enum if_type. If we can resolve this it would make me a lot happier! I tried to have a go solving this about a year ago, but gave up due to some userland stuff like this no longer working. If we can solve it via ioctl then awesome. Roy
Re: Converting termcap entries to terminfo entries
Hi Brian On 22/10/2019 23:14, Brian Buhrow wrote: hello. I'm in the process of building NetBSD-9.0 systems in an effort to consider upgrading from my fleet of NetBSD-5.2 systems to NetBSD-9. As a long time window(1) user, I have a termcap entry for the window terminal type that I use on systems that I ssh into from window(1) panes. It is my practice to put a termcap and a terminfo database in my home directory on such systems, so that regardless of whether a program at the far end wants termcap or terminfo, it will be able to draw on the screen in full screen mode. what I need is a way of converting the termcap entries I have into a terminfo source file that tic(1) can compile into a .cdb file which can be used on NetBSD-9 systems. I have an older version of captoinfo(1) from the ncurses pkg, but it produces binary terminfo output unsuitable for the tic(1) program. I'm fuly aware that window(1) has been deprecated in favor of tmux(1), but I haven't climbed the learning curve of tmux(1) yet and I'm not sure it does everything I get from the window(1) program. So, can someone tell me what program I should use to convert termcap files into terminfo source files suitable for the new terminfo libraries in NetBSD-8 and 9? We don't have any specific program as such, but terminfo(5) has a section "Fetching Compiled Descriptions" If the environment variable TERMCAP is available and does not begin with a slash (`/') then it will be translated into terminfo and compiled as above. If its name matches TERM then it is used. 
So you can use infocmp(8) like so: $ TERM=captest TERMCAP="captest|:al=3*\E^R:am:bl=^G:cd=16*\E^C:ce=16\E^U:cl=2*^L:cm=\Ea%+ %+ :" infocmp # Reconstructed from $TERMCAP captest, am, bel=^G, clear=\f$<2*/>, cr=^M, cud1=^J, cup=\Ea%p1%{32}%+%c%p2%{32}%+%c, ed=\E\003$<16*/>, el=\E\025$<16/>, ht=^I, il1=\E\022$<3*/>, ind=^J, kbs=^H, kcub1=^H, kcud1=^J, nel=^M^J, I don't know how accurate the conversion will be for you as it's not entirely a 1-1 mapping and I think some assumptions are made (I've not looked at the source for a while), but hopefully it's good enough. Might be time consuming with many termcap entries to convert, but it should be scriptable at least. Is this good enough for you? Roy
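As a rough sketch of scripting the conversion, the function below calls infocmp once per entry in the same way as the example above. It assumes each input line holds one complete termcap entry, with no backslash continuations. termcap_to_terminfo and the INFOCMP override are hypothetical names for this sketch.

```shell
#!/bin/sh
# Sketch: read termcap entries on stdin, one complete entry per line
# (an assumption), and print terminfo source for each via infocmp.
# INFOCMP is an assumed override for the command to run.
termcap_to_terminfo() {
	while IFS= read -r entry; do
		# Skip blank lines and comments.
		case "$entry" in
		''|'#'*) continue;;
		esac
		# The terminal name is everything before the first '|' or ':'.
		name=${entry%%[|:]*}
		TERM="$name" TERMCAP="$entry" ${INFOCMP:-infocmp} </dev/null
	done
}
```

Usage would be something like `termcap_to_terminfo < my.termcap > my.terminfo`, after which the result can be checked by hand and compiled with tic(1).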
Re: dhcpcd ignores "force_hostname=YES" on diskless clients?
On 21/09/2019 03:01, John D. Baker wrote: On Fri, 20 Sep 2019, John D. Baker wrote: (Before the recent imports of later versions of 'dhcpcd', it failed to obtain the FQDN on a sparc system and set the hostname as "localhost".) The diskless SPARC system works properly now. Will have to check other diskless clients (amd64, i386, evbmips). All diskless client "dhcpcd.conf" files have only the following changes from default: comment out "hostname" directive un-comment "ntp_servers" option add "env force_hostname=YES" If the hostname is "localhost" then dhcpcd won't send it. If the hostname is "localhost" then dhcpcd will set the hostname given via DHCP. So the only config change you should need to make by default is uncommenting ntp_servers. Roy
Re: build issue: _REENTRANT redefined
On 06/09/2019 11:34, Thomas Klausner wrote: I guess I have to turn off the gcc build as well, but for now I'd like to have both compilers... I've not been able to build both for many years now. As my need for building xen packages out-weighs my social want for LLVM, I currently only use gcc :( Roy
Re: NetBSD on a wireless router?
On 16/08/2019 04:28, Jason Thorpe wrote: On Aug 15, 2019, at 8:15 PM, John Franklin wrote: because I usually use the Ubiquiti APs for WiFi. For WiFi performance and management on a budget, they’re hard to beat. +1. I use Ubiquiti to cover the 3 levels of my house + back yard, and it works flawlessly (total of 4 APs to do the job). Another +1 for Ubiquiti. I have a UAP-AC-Pro running stock firmware plugged into my Ubiquiti EdgeRouter Lite which in turn runs NetBSD as the router itself. The range of the UAP-AC-Pro is pretty amazing compared to anything else I've seen at consumer prices. Roy
Re: CVS commit: src/usr.sbin/postinstall
On 13/06/2019 09:00, Manuel Bouyer wrote: On Thu, Jun 13, 2019 at 06:17:29AM +0300, Valery Ushakov wrote: [...] I've been using etcupdate for ages so I only ever really used postinstall to fix "obsolete" and "catpages". etcupdate -a has some kinks and maybe we should concentrate on fixing those instead? I *never* used etcupdate, so for me it's better to have a working postinstall (I have a PR about it: install/52349, which may have been fixed by the recent change) I used etc-update once and accidentally overwrote master.passwd. Never used it since, far too risky. Roy
Re: ipv6 broken
On 13/05/2019 13:34, Christos Zoulas wrote: In article <332662e7-3c78-5d1b-ce05-8c86806f7...@marples.name>, Roy Marples wrote: On 13/05/2019 03:00, Christos Zoulas wrote: dhcpcd says: May 13 01:47:01 [79]: wm0: ipv6_start: Can't assign requested address dhcpcd should say duplicated address based on the below, but that's just cosmetic really. Kernel says: [13.119958] wm0: link state DOWN (was UNKNOWN) [16.261560] wm0: link state UP (was DOWN) [17.283056] wm0: DAD duplicate address fe80:1::56bf:64ff:fe92:10c8 from 00:17:10:87:19:87:46:66 [17.283056] wm0: possible hardware address duplication detected, disable IPv6 [17.426267] wm1: link state UP (was UNKNOWN) [17.427269] Cannot enable an interface with a link-local address marked duplicate. Assuming this is -current either our nonce code is broken or there really is a duplicate address from hardware address 00:17:10:87:19:87:46:66 Regardless, we need more data. Reverting the nd6 changes and in particular the is this needed part makes the DAD message stop. But the Can't assign requested address remains. Which ND6 changes specifically?
Re: ipv6 broken
On 13/05/2019 03:00, Christos Zoulas wrote: dhcpcd says: May 13 01:47:01 [79]: wm0: ipv6_start: Can't assign requested address dhcpcd should say duplicated address based on the below, but that's just cosmetic really. Kernel says: [13.119958] wm0: link state DOWN (was UNKNOWN) [16.261560] wm0: link state UP (was DOWN) [17.283056] wm0: DAD duplicate address fe80:1::56bf:64ff:fe92:10c8 from 00:17:10:87:19:87:46:66 [17.283056] wm0: possible hardware address duplication detected, disable IPv6 [17.426267] wm1: link state UP (was UNKNOWN) [17.427269] Cannot enable an interface with a link-local address marked duplicate. Assuming this is -current either our nonce code is broken or there really is a duplicate address from hardware address 00:17:10:87:19:87:46:66 Regardless, we need more data. Roy
Re: "route_enqueue: queue full, dropped message" blast from a 8.99.32 amd64 domU
On 10/05/2019 00:40, Greg A. Woods wrote: [Thu May 9 09:24:08 2019][ 6442662.0806318] route_enqueue: queue full, dropped message There were thousands of identical lines, all separated by a few microseconds. No doubt this spew was the real cause of the apparent interrupt storm and the resulting sluggishness. https://nxr.netbsd.org/xref/src/sys/net/rtsock_shared.c#1602 I would imagine that if an interface is interrupting that much then it's constantly sending messages to route(4) to say that it's up/down and addresses are detached/tentative in a tight loop. The queueing mechanism has a fixed length and while we go out of our way to notify userland if there's an error sending these messages, we can't send this one at all so we just log it. So it's an artifact of your interrupt storm, but not the cause. Roy
Re: Automated report: NetBSD-current/i386 build failure
I think Christos has kindly fixed this for me. Roy
Re: Automated report: NetBSD-current/i386 build failure
On 22/01/2019 20:30, Andreas Gustafsson wrote: The NetBSD Test Fixture wrote: --- dhcpcd_make --- cc1: all warnings being treated as errors *** [dhcpcd.o] Error code 1 More relevant error messages from earlier in the log: --- dhcpcd_make --- /tmp/bracket/build/2019.01.22.17.41.06-i386/src/external/bsd/dhcpcd/dist/src/dhcpcd.c: In function 'dhcpcd_handlecarrier': /tmp/bracket/build/2019.01.22.17.41.06-i386/src/external/bsd/dhcpcd/dist/src/dhcpcd.c:768:6: error: implicit declaration of function 'ipv4ll_reset' [-Werror=implicit-function-declaration] ipv4ll_reset(ifp); ^~~~ Should be fixed now. Roy
Re: failed to create llentry
On 22/11/2018 00:36, Greg Troxel wrote: Roy Marples writes: On 21/11/2018 19:51, co...@sdf.org wrote: -B -M -c /etc/wpa_supplicant.conf -s seem like really good flags, thanks. (are they good enough to be a default? right now anyone using wifi has to have wpa_supplicant_flags set, so we can't break their usage) Yes and no. We would need to ship a default wpa_supplicant.conf - probably enabling the default socket so wpa_cli(8) just works and commented out entries for connecting to any open ap and a specific ap with psk. We might want to enable (but commented out maybe to start with) the ability instructions over the control socket to configure wpa_supplicant.conf as well. This would be handy for applications like dhcpcd-{gtk,qt} Then, the user just has to set wpa_supplicant=YES in rc.conf and voila, wireless network setup with X11 and a systray application becomes a lot easier for the end user to setup. I am unclear on the fine points, but in general find wpa_supplicant to be way too painful to deal with. It really seems like it should be able to be started by default, It is painful without a good setup, yes. It can be started by default if the user so chooses. So I see sysinst network config coming down to this: Auto-start wireless Y/N Auto-configure addresses Y/N If auto-start wireless is Y, or autoconfigure addresses is N, spawn dhcpcd-curses to handle both. You don't actually pick an interface by default. I don't even propose we have an advanced section - you want anything more, drop to the shell and do it. ifconfig and route are not hard, neither is editing resolv.conf. Job done. and exit if no wifi interfaces, Why? Hotplugging of wifi is a thing. Pinebooks are a really good example of having no networking at boot. I generally plug the stick and ethernet dongle/cable in after boot. 
and have some command-line wifi_choose program that prints out a list of SSIDs, takes a number, and asks for a password, and stores both the ssid and the password, and next time just connects. Sort of like how a mac works clicking on the wifi icon, but command line. And a gui version would be fine too of course. To me this is the biggest NetBSD wifi usability issue, or perhaps it's just behind USB wifi adaptors being slightly flaky. By GUI you mean X11 based? dhcpcd-{gtk,qt} satisfy this on BSD at least. dhcpcd-curses is a thing, but it's currently just a monitor. Now I have a pinebook I can concentrate on fixing some recent dhcpcd/netbsd/platform bitrot with shared IP address and then work on dhcpcd-curses once more now I have a working NetBSD environment with wireless once again. Roy
Re: failed to create llentry
On 21/11/2018 19:51, co...@sdf.org wrote: -B -M -c /etc/wpa_supplicant.conf -s seem like really good flags, thanks. (are they good enough to be a default? right now anyone using wifi has to have wpa_supplicant_flags set, so we can't break their usage) Yes and no. We would need to ship a default wpa_supplicant.conf - probably enabling the default socket so wpa_cli(8) just works and commented out entries for connecting to any open ap and a specific ap with psk. We might want to enable (but commented out maybe to start with) the ability to take instructions over the control socket to configure wpa_supplicant.conf as well. This would be handy for applications like dhcpcd-{gtk,qt}. Then, the user just has to set wpa_supplicant=YES in rc.conf and voila, wireless network setup with X11 and a systray application becomes a lot easier for the end user to set up. I can't unplug my card because it's PCI. I'll try to investigate next time it happens. Another way of restarting things is to down/up the interface. ifconfig urtwn down up Does wonders - both wpa_supplicant and dhcpcd will react to this. There should be no need to kill anything with prejudice. Roy
Re: failed to create llentry
On 21/11/2018 18:55, co...@sdf.org wrote: I don't like debugging problems with daemonized processes. wpa_supplicant for example prints nothing to syslog. the messages it gives to stdout are informative. wpa_supplicant(8) says -s Send log messages through syslog(3) instead of to the terminal. I'm quite grumpy about networking in netbsd in general. I'm actually very happy. For example my remote ssh sessions persist without dropping when the carrier goes down/up. Heck, my dhcp lease died on my pinebook half an hour ago and building pkgsrc entirely over nfs just carried on working again without the blink of an eye. It's not magic, it's NetBSD. Roy
Re: failed to create llentry
On 21/11/2018 17:18, co...@sdf.org wrote: I use wpa_supplicant and dhcpcd. When dhcpcd fails to configure the network I start doing it manually. I don't really pay attention to when the errors occur but I'll try to keep a closer track about when they start. dhcpcd will mysteriously fail while I am connected with wpa_supplicant, How does it mysteriously fail? so I'd kill it and do: pkill -9 dhcpcd That's quite harsh. route -n flush route -n flushall dhcpcd -k should do this (and remove any addresses or anything else it added) if you don't pkill -9 it. ifconfig iwm0 local-ip-i-should-have route add default gateway Usually when these problems happen one of the following occurs too: - wpa_supplicant will complain it can't assign an address every hour or so, and network traffic will stop for a bit wpa_supplicant doesn't assign any kind of address by itself. Can you post some context? - I'll accidentally restart wpa_supplicant before killling all network traffic and get a kernel panic Backtrace would be nice. I guess wpa_supplicant does more than I want to do and run into conflicts with manual setup. Often my urtwn firmware fails for some unknown reason. It's not the most stable stick on my network, but it work in my pinebook. My solution is to remove and insert the stick until the firmware loads correctly. To allow this to work, I setup wpa_supplicant in plug and play mode. wpa_supplicant_flags="-B -M -c /etc/wpa_supplicant.conf" This tells wpa_supplicant to background, match any interface and use the stated config file. dhcpcd runs with default flags and config. I've been plugging in and removing in no set order the usb wifi stick and a usb ethernet dongle and it just works * (there is an issue with IP address sharing, unsure if platform, dhcpcd or kernel issue - I'll be fixing this once I have a working desktop on the pinebook). 
* Sometimes either interface gets an IPv4LL address which means carrier is "UP" but there's another issue such as firmware failure or the ethernet over power adapter needs a reboot. In any case, no manual address setup or routing is needed. Roy
Re: Travel router part 2
On 05/09/2018 14:59, D'Arcy Cain wrote: On 2018-09-05 08:03 AM, Roy Marples wrote: and have a named configured to use the forwarders in /etc/namedb/forwarders. Whatever the ISP dhcp gives me is stuffed into the forwarders and used as last resort. This has been a robust solution for many open wireless access points. Since NetBSD-6, dhclient-script has shipped with resolvconf(8) support that will do that for you. Do you run dhclient if you have PPPOE set up? Once the interface is up I already have an IP address so what does dhclient do? Note that the router is also the DHCP server so it also has static IPs on the internal interfaces. At some point I will add a second wifi card to connect to campground wifi or tether to my phone. At that point dhclient will probably be set up to talk to that second wifi since it will replace the PPPOE connection. I don't use dhclient :) When I last looked into this, I had pppoe and dhcpcd running at the same time, always up. Both fed their dns info into resolvconf(8) which then configured the results as unbound(8) forwarders. You can tell resolvconf which interfaces take priority over others. You might need to tell dhcpcd to use the destination as the default route and prefer the pppoe interface via metrics if the default isn't to your liking. dhcpcd will support resolvconf without any changes, but our pppoe isn't so friendly and you do need to call it yourself via the up and down actions. Doing it like so, I don't have to manually do anything other than connect to a wireless point or plug an ethernet cable in once the system is booted. As the world moves forward, I would encourage using dhcpcd just because it will also handle DHCPv6 prefix delegation should you need that. You can do it with dhclient as well .. but it needs an awful lot of hand holding to work well. Roy
Re: Travel router part 2
On 04/09/2018 23:21, Brett Lymn wrote: On Sun, Sep 02, 2018 at 11:55:58AM -0400, D'Arcy Cain wrote: Any thoughts on picking up the DNS servers? It's not too bad because my DHCP server can be modified as needed so it is only one location and in any case I always include Google's public servers. I have found that DNS can be problematic when travelling, some places force you through their DNS regardless and do all sorts of lossage. what I do on my laptop is run a local named and configure forwarders to the DNS provided so I can override some of the random lossage. I have a dhclient (yeah, old habits..) enter script that does:

restore_resolv_conf() {
	# We don't want /etc/resolv.conf changed
	# So this is an empty function
	return 0
}

make_resolv_conf() {
	if [ -f /etc/namedb/forwarders ]
	then
		mv /etc/namedb/forwarders /etc/namedb/forwarders.old
	fi
	printf "forwarders { " > /etc/namedb/forwarders
	for nameserver in $new_domain_name_servers
	do
		printf "%s; " ${nameserver} >> /etc/namedb/forwarders
	done
	echo "};" >> /etc/namedb/forwarders
	echo "forward only;" >> /etc/namedb/forwarders
	pkill -HUP named
	return 0
}

and have a named configured to use the forwarders in /etc/namedb/forwarders. Whatever the ISP dhcp gives me is stuffed into the forwarders and used as last resort. This has been a robust solution for many open wireless access points. Since NetBSD-6, dhclient-script has shipped with resolvconf(8) support that will do that for you. Roy
Re: if_addrflags6: Can't assign requested address
On 18/08/2018 03:43, Roy Marples wrote: On 18/08/2018 03:29, SAITOH Masanobu wrote: This patch worked. if_addrflags6's error messages disappeared. :) Before this patch, Aug 18 01:00:58 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 01:30:59 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 02:01:01 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 02:31:03 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 03:01:04 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 03:31:05 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 04:01:06 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 04:31:08 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 05:01:09 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 05:31:11 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 06:01:11 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 06:31:12 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 07:01:14 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 07:31:15 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 08:01:16 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 08:31:16 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument This error message appeared ever 30 minutes, but it also disappeared with this patch. That's avoiding the broken IP_PKTINFO implementation in NetBSD-7 - can't use it to send. Comitted. Pullups requested to -7 and -8. Roy
Re: if_addrflags6: Can't assign requested address
On 18/08/2018 03:29, SAITOH Masanobu wrote: This patch worked. if_addrflags6's error messages disappeared. :) Before this patch, Aug 18 01:00:58 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 01:30:59 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 02:01:01 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 02:31:03 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 03:01:04 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 03:31:05 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 04:01:06 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 04:31:08 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 05:01:09 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 05:31:11 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 06:01:11 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 06:31:12 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 07:01:14 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 07:31:15 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 08:01:16 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument Aug 18 08:31:16 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument This error message appeared ever 30 minutes, but it also disappeared with this patch. That's avoiding the broken IP_PKTINFO implementation in NetBSD-7 - can't use it to send. Roy
Re: if_addrflags6: Can't assign requested address
On 17/08/2018 10:08, Roy Marples wrote: On 17/08/2018 09:04, Masanobu SAITOH wrote: wm2: carrier lost wm2: executing `/libexec/dhcpcd-run-hooks' NOCARRIER wm2: deleting address fe80::1392:4012:56d8:a7a2 wm2: if_addrflags6: Can't assign requested address wm2: if_addrflags6: Can't assign requested address wm2: if_addrflags6: Can't assign requested address wm2: if_addrflags6: Can't assign requested address wm2: carrier acquired wm2: executing `/libexec/dhcpcd-run-hooks' CARRIER This helps. I never saw this because on NetBSD-8, we have addrflags available in ifa_msghdr when sent over route(4). This does not exist on NetBSD-7 so we need to make an ioctl per address to work out the flags. Sadly, this is racy and this is what happens: Something adds an address. Kernel annnounces new address to route(4). Something deletes this address. Kernel announces the address deleted to route(4). dhcpcd reads the address added message from route(4) *after* the address has been deleted from the kernel. Because dhcpcd needs the address flags at this point, an ioctl is made to the deleted address and boom, error. Luckily dhcpcd handles it correctly and it's just noise. Please test the attached patch to silence it. If you can verify it works, let me know and I'll push a new version out. Since then I've discovered two more critical issues with dhcpcd-7 on NetBSD-7. 1) Broken IP_PKTINFO implementation 2) Invalid RTA_BRD in RTM_NEWADDR messages for new addresses Both of these have already been fixed in -8 and -current and neither looks suitable for a pullup and dhcpcd needs a workaround for both anyway. A better patch attached and I'll hopefully get this pushed out over the weekend. Roy diff --git a/src/dhcp.c b/src/dhcp.c index 7a6749d4..1e9fe186 100644 --- a/src/dhcp.c +++ b/src/dhcp.c @@ -86,6 +86,11 @@ #define IPDEFTTL 64 /* RFC1340 */ #endif +/* NetBSD-7 has an incomplete IP_PKTINFO implementation. 
*/ +#if defined(__NetBSD_Version__) && __NetBSD_Version__ < 8 +#undef IP_PKTINFO +#endif + /* Assert the correct structure size for on wire */ __CTASSERT(sizeof(struct ip) == 20); __CTASSERT(sizeof(struct udphdr) == 8); diff --git a/src/if-bsd.c b/src/if-bsd.c index c3c95ba6..cdd959a6 100644 --- a/src/if-bsd.c +++ b/src/if-bsd.c @@ -1103,9 +1103,32 @@ if_ifa(struct dhcpcd_ctx *ctx, const struct ifa_msghdr *ifam) sin = (const void *)rti_info[RTAX_NETMASK]; mask.s_addr = sin != NULL && sin->sin_family == AF_INET ? sin->sin_addr.s_addr : INADDR_ANY; + +#if defined(__NetBSD_Version__) && __NetBSD_Version__ < 8 + /* NetBSD-7 and older send an invalid broadcast address. +* So we need to query the actual address to get +* the right one. */ + { + struct in_aliasreq ifra; + + memset(&ifra, 0, sizeof(ifra)); + strlcpy(ifra.ifra_name, ifp->name, + sizeof(ifra.ifra_name)); + ifra.ifra_addr.sin_family = AF_INET; + ifra.ifra_addr.sin_len = sizeof(ifra.ifra_addr); + ifra.ifra_addr.sin_addr = addr; + if (ioctl(ctx->pf_inet_fd, SIOCGIFALIAS, &ifra) == -1) { + if (errno != EADDRNOTAVAIL) + logerr("%s: SIOCGIFALIAS", __func__); + break; + } + bcast = ifra.ifra_broadaddr.sin_addr; + } +#else sin = (const void *)rti_info[RTAX_BRD]; bcast.s_addr = sin != NULL && sin->sin_family == AF_INET ? 
sin->sin_addr.s_addr : INADDR_ANY; +#endif #if defined(__FreeBSD__) || defined(__DragonFly__) /* FreeBSD sends RTM_DELADDR for each assigned address @@ -1134,8 +1157,8 @@ if_ifa(struct dhcpcd_ctx *ctx, const struct ifa_msghdr *ifam) if (ifam->ifam_type == RTM_DELADDR) addrflags = 0 ; else if ((addrflags = if_addrflags(ifp, &addr, NULL)) == -1) { - logerr("%s: if_addrflags: %s", - ifp->name, inet_ntoa(addr)); + if (errno != EADDRNOTAVAIL) + logerr("%s: if_addrflags", __func__); break; } #endif @@ -1160,7 +1183,8 @@ if_ifa(struct dhcpcd_ctx *ctx, const struct ifa_msghdr *ifam) if (ifam->ifam_type == RTM_DELADDR) addrflags = 0; else if ((addrflags = if_addrflags6(ifp, &addr6, NULL)) == -1) { - logerr("%s: if_addrflags6", ifp->name); + if
Re: if_addrflags6: Can't assign requested address
On 17/08/2018 09:04, Masanobu SAITOH wrote: wm2: carrier lost wm2: executing `/libexec/dhcpcd-run-hooks' NOCARRIER wm2: deleting address fe80::1392:4012:56d8:a7a2 wm2: if_addrflags6: Can't assign requested address wm2: if_addrflags6: Can't assign requested address wm2: if_addrflags6: Can't assign requested address wm2: if_addrflags6: Can't assign requested address wm2: carrier acquired wm2: executing `/libexec/dhcpcd-run-hooks' CARRIER This helps. I never saw this because on NetBSD-8, we have addrflags available in ifa_msghdr when sent over route(4). This does not exist on NetBSD-7 so we need to make an ioctl per address to work out the flags. Sadly, this is racy and this is what happens: Something adds an address. Kernel annnounces new address to route(4). Something deletes this address. Kernel announces the address deleted to route(4). dhcpcd reads the address added message from route(4) *after* the address has been deleted from the kernel. Because dhcpcd needs the address flags at this point, an ioctl is made to the deleted address and boom, error. Luckily dhcpcd handles it correctly and it's just noise. Please test the attached patch to silence it. If you can verify it works, let me know and I'll push a new version out. 
Thanks Roy diff --git a/src/if-bsd.c b/src/if-bsd.c index c3c95ba6..c03e4f6d 100644 --- a/src/if-bsd.c +++ b/src/if-bsd.c @@ -1134,8 +1134,8 @@ if_ifa(struct dhcpcd_ctx *ctx, const struct ifa_msghdr *ifam) if (ifam->ifam_type == RTM_DELADDR) addrflags = 0 ; else if ((addrflags = if_addrflags(ifp, &addr, NULL)) == -1) { - logerr("%s: if_addrflags: %s", - ifp->name, inet_ntoa(addr)); + if (errno != EADDRNOTAVAIL) + logerr("%s: if_addrflags", __func__); break; } #endif @@ -1160,7 +1160,8 @@ if_ifa(struct dhcpcd_ctx *ctx, const struct ifa_msghdr *ifam) if (ifam->ifam_type == RTM_DELADDR) addrflags = 0; else if ((addrflags = if_addrflags6(ifp, &addr6, NULL)) == -1) { - logerr("%s: if_addrflags6", ifp->name); + if (errno != EADDRNOTAVAIL) + logerr("%s: if_addrflags6", __func__); break; } #endif diff --git a/src/if.c b/src/if.c index eaebefa5..c1c81eb6 100644 --- a/src/if.c +++ b/src/if.c @@ -240,7 +240,7 @@ if_learnaddrs(struct dhcpcd_ctx *ctx, struct if_head *ifs, addrflags = if_addrflags(ifp, &addr->sin_addr, ifa->ifa_name); if (addrflags == -1) { - if (errno != EEXIST) + if (errno != EEXIST && errno != EADDRNOTAVAIL) logerr("%s: if_addrflags: %s", __func__, inet_ntoa(addr->sin_addr)); @@ -266,7 +266,7 @@ if_learnaddrs(struct dhcpcd_ctx *ctx, struct if_head *ifs, addrflags = if_addrflags6(ifp, &sin6->sin6_addr, ifa->ifa_name); if (addrflags == -1) { - if (errno != EEXIST) + if (errno != EEXIST && errno != EADDRNOTAVAIL) logerr("%s: if_addrflags6", __func__); continue; }
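For readers following the patch above, the decision it encodes can be sketched in isolation. This is only an illustration, not dhcpcd code: the helper name addrflags_error_is_loggable is made up for the example, and it assumes the only two errno values worth silencing are the ones the patch tests for.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of dhcpcd): decide whether a failed
 * address-flags lookup deserves a log line.
 *
 * EADDRNOTAVAIL: the address was deleted between the kernel announcing
 * it on route(4) and our follow-up ioctl, so the failure is just noise.
 * EEXIST: already ignored by the original if_learnaddrs() check.
 */
bool
addrflags_error_is_loggable(int err)
{
	return err != EEXIST && err != EADDRNOTAVAIL;
}
```

Both conditions are joined with a logical AND: an error is logged only when it is neither of the two benign values. Joining them with OR would log almost everything, including the EADDRNOTAVAIL case the patch is meant to silence.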
Re: Running out of buffers?
On 11/08/2018 16:41, Roy Marples wrote: On 07/08/2018 17:54, Andreas Gustafsson wrote: On April 28, Roy Marples wrote: On 27/04/2018 23:58, Robert Elz wrote: We really need to turn off the error on recv() by default - and allow it to be turned on by applications that actually want to deal with this. Why should we special case reporting this error instead of others? While NetBSD might be the first BSD to report ENOBUFS for recv(), it's certainly not the first OS to do so. I suspect NetBSD may be the first and only to return ENOBUFS for recv() on ordinary UDP sockets, and that this has broken BIND, which is treating ENOBUFS on UDP recv() as an unrecoverable error; see PR misc/53421 and http://mail-index.netbsd.org/tech-kern/2018/08/07/msg023815.html . There is not enough information to say for sure. This could be a non validated address, the behaviour would be as described. If there actually existed another OS that exhibited this behavior, then surely BIND would have exposed the issue long ago, and either BIND or the OS in case would have been fixed. Please restore the old behavior, at least for UDP sockets. Try reading the bind sources: https://github.com/NetBSD/src/blob/trunk/external/bsd/bind/dist/lib/isc/unix/socket.c#L1923 I'll quote it here for good measure: ALWAYS_HARD(ENOBUFS, ISC_R_NORESOURCES); /* Should never get this one but it was seen. */ That part of the code was imported into NetBSD over 10 years ago, which massively pre-dates my recent change to recv() on NetBSD. Similar code in unbound (which I use extensively without issue) also tests for ENOBUFS, which again pre-dates my recv() change but instead works around the error instead of just calling it a day. I've not found where bind opens the socket yet, but hopefully as it hard aborts specifically for ENOBUFS on recv it will ensure a large enough buffer is allocated for recv - unbound sets the maximum possible for reference. 
So I found the code here: https://github.com/isc-projects/bind9/blob/master/lib/isc/unix/socket.c#L309 /*% * The size to raise the receive buffer to (from BIND 8). */ #ifdef TUNE_LARGE #ifdef sun #define RCVBUFSIZE (1*1024*1024) #else #define RCVBUFSIZE (16*1024*1024) #endif #else #define RCVBUFSIZE (32*1024) #endif /* TUNE_LARGE */ So maybe bind just needs to be compiled with TUNE_LARGE set? https://github.com/NetBSD/src/blob/trunk/external/bsd/bind/include/config.h#L597 Roy
Re: Running out of buffers?
On 07/08/2018 17:54, Andreas Gustafsson wrote: On April 28, Roy Marples wrote: On 27/04/2018 23:58, Robert Elz wrote: We really need to turn off the error on recv() by default - and allow it to be turned on by applications that actually want to deal with this. Why should we special case reporting this error instead of others? While NetBSD might be the first BSD to report ENOBUFS for recv(), it's certainly not the first OS to do so. I suspect NetBSD may be the first and only to return ENOBUFS for recv() on ordinary UDP sockets, and that this has broken BIND, which is treating ENOBUFS on UDP recv() as an unrecoverable error; see PR misc/53421 and http://mail-index.netbsd.org/tech-kern/2018/08/07/msg023815.html . There is not enough information to say for sure. This could be a non validated address, the behaviour would be as described. If there actually existed another OS that exhibited this behavior, then surely BIND would have exposed the issue long ago, and either BIND or the OS in case would have been fixed. Please restore the old behavior, at least for UDP sockets. Try reading the bind sources: https://github.com/NetBSD/src/blob/trunk/external/bsd/bind/dist/lib/isc/unix/socket.c#L1923 I'll quote it here for good measure: ALWAYS_HARD(ENOBUFS, ISC_R_NORESOURCES); /* Should never get this one but it was seen. */ That part of the code was imported into NetBSD over 10 years ago, which massively pre-dates my recent change to recv() on NetBSD. Similar code in unbound (which I use extensively without issue) also tests for ENOBUFS, which again pre-dates my recv() change but instead works around the error instead of just calling it a day. I've not found where bind opens the socket yet, but hopefully as it hard aborts specifically for ENOBUFS on recv it will ensure a large enough buffer is allocated for recv - unbound sets the maximum possible for reference. Roy
Re: if_addrflags6: Can't assign requested address
Hi On 08/08/2018 03:13, Masanobu SAITOH wrote: Hi. While testing netbsd-7, I've noticed dhcpcd put the following message: Configuring network interfaces: wm0wm0: if_addrflags6: Can't assign requested address wm0: if_addrflags6: Can't assign requested address wm0: if_addrflags6: Can't assign requested address wm0: if_addrflags6: Can't assign requested address Can we ignore this message, or is it a real problem? /etc/dhcpcd.conf is the default. I just got back and cannot replicate this issue with the latest netbsd-7 sources which ship with dhcpcd-7.0.7. I use a XEN DOMU for testing. Can you provide more information please? Roy
Re: if_addrflags6: Can't assign requested address
That's a real problem I'm away from any test bed until next week so I'll try and look at it then. Can you add debug to dhcpcd and maybe a logfile directive and attach the result to a reply please? Roy On 8 August 2018 04:13:30 CEST, Masanobu SAITOH wrote: > Hi. > > While testing netbsd-7, I've noticed dhcpcd put the following >message: > >> Configuring network interfaces: wm0wm0: if_addrflags6: Can't assign >requested address >> wm0: if_addrflags6: Can't assign requested address >> wm0: if_addrflags6: Can't assign requested address >> wm0: if_addrflags6: Can't assign requested address > > Can we ignore this message, or is it a real problem? > >/etc/dhcpcd.conf is the default. > >-- >--- > SAITOH Masanobu (msai...@execsw.org > msai...@netbsd.org) -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Cross-building release on MacOSX for amd64 fails...
On 04/06/2018 17:11, K. Schreiner wrote: ...like so: `progress.ro' is up to date. compile dhcpcd/dhcp.o /u/NetBSD/src/external/bsd/dhcpcd/dist/src/dhcp.c: In function 'dhcp_arp_probed': /u/NetBSD/src/external/bsd/dhcpcd/dist/src/dhcp.c:2105:2: error: implicit declaration of function 'ipv4ll_drop' [-Werror=implicit-function-declaration] ipv4ll_drop(ifp); ^~~ cc1: all warnings being treated as errors *** Failed target: dhcp.o *** Failed command: /u/NetBSD/arch/amd64/TOOLS/bin/x86_64--netbsd-gcc -Os -fno-asynchronous-unwind-tables -pipe -fstack-protector -Wstack-protector --param ssp-buffer-size=1 -std=gnu99 -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-sign-compare -Wsystem-headers -Wno-traditional -Wa,--fatal-warnings -Wreturn-type -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wextra -Wno-unused-parameter -Wno-sign-compare -Wold-style-definition -Wconversion -Wsign-compare -Wformat=2 -Wno-format-zero-length -Werror --sysroot=/u/NetBSD/arch/amd64/dest -DHAVE_CONFIG_H -D_OPENBSD_SOURCE -DSMALL -DARP -DINET -DINET6 -DDHCP6 -I/u/NetBSD/src/external/bsd/dhcpcd/include -I/u/NetBSD/src/external/bsd/dhcpcd/dist/src -I/u/NetBSD/arch/amd64/obj/distrib/amd64/ramdisks/ramdisk/dhcpcd -D_FORTIFY_SOURCE=2 -c /u/NetBSD/src/external/bsd/dhcpcd/dist/src/dhcp.c *** Error code 1 Stop. My bad! Should be fixed now. Roy
Re: Running out of buffers?
On 01/05/2018 20:21, Roy Marples wrote: Another patch. This time to handle a reported overflow listening to ND6. This one actually works Index: sys/netinet6/in6_proto.c === RCS file: /cvsroot/src/sys/netinet6/in6_proto.c,v retrieving revision 1.122 diff -u -p -r1.122 in6_proto.c --- sys/netinet6/in6_proto.c15 Mar 2018 08:15:21 - 1.122 +++ sys/netinet6/in6_proto.c1 May 2018 19:33:42 - @@ -597,7 +597,7 @@ int pmtu_expire = 60*10; * Nominal space allocated to a raw ip socket. */ #defineRIPV6SNDQ 8192 -#defineRIPV6RCVQ 8192 +#defineRIPV6RCVQ 16384 u_long rip6_sendspace = RIPV6SNDQ; u_long rip6_recvspace = RIPV6RCVQ;
Re: Running out of buffers?
On 27/04/2018 21:34, Roy Marples wrote: Hi Paul On 27/04/2018 04:09, Paul Goyette wrote: I've got lots of memory, so I don't understand what buffers are not available. Ever since upgrading to my current system (sources dated 2018-03-20 11:25:00 UTC), I've been seeing these messages at random intervals: Can you test the below patches please? The kernel part bumps the default raw socket buffer from 8k to 16k At least my ERLITE no longer complains about route socket overflow on boot. The patch to syslogd ensures that the logpath socket receive buffer is a minimum of 16k - the current default is 4k. Hopefully this fixes the issues and won't impact small memory devices too much. Another patch. This time to handle a reported overflow listening to ND6. Index: sys/netinet6/in6_proto.c === RCS file: /cvsroot/src/sys/netinet6/in6_proto.c,v retrieving revision 1.122 diff -u -p -r1.122 in6_proto.c --- sys/netinet6/in6_proto.c15 Mar 2018 08:15:21 - 1.122 +++ sys/netinet6/in6_proto.c1 May 2018 19:18:22 - @@ -597,7 +597,7 @@ int pmtu_expire = 60*10; * Nominal space allocated to a raw ip socket. */ #defineRIPV6SNDQ 8192 -#defineRIPV6RCVQ 8192 +#defineRIPV6RCV2 16384 u_long rip6_sendspace = RIPV6SNDQ; u_long rip6_recvspace = RIPV6RCVQ;
Re: Running out of buffers?
On 27/04/2018 23:58, Robert Elz wrote: Date:Fri, 27 Apr 2018 21:34:49 +0100 From:Roy Marples Message-ID: | Hopefully this fixes the issues and won't impact small memory devices | too much. While those are probably useful changes to make, they don't fix anything, merely make it less likely. Until we can dynamically size the buffer in the kernel on demand you are correct. We really need to turn off the error on recv() by default - and allow it to be turned on by applications that actually want to deal with this. Why should we special case reporting this error instead of others? While NetBSD might be the first BSD to report ENOBUFS for recv(), it's certainly not the first OS to do so. Looking at Paul's logs, ntpd is reporting this a fair bit. Looking at ntpd, it already *has* logic to deal exclusively with this error - it logs it and continues. Any other error and it closes the socket and gives up. Roy
Re: Running out of buffers?
Hi Paul On 27/04/2018 04:09, Paul Goyette wrote: I've got lots of memory, so I don't understand what buffers are not available. Ever since upgrading to my current system (sources dated 2018-03-20 11:25:00 UTC), I've been seeing these messages at random intervals: Can you test the below patches please? The kernel part bumps the default raw socket buffer from 8k to 16k At least my ERLITE no longer complains about route socket overflow on boot. The patch to syslogd ensures that the logpath socket receive buffer is a minimum of 16k - the current default is 4k. Hopefully this fixes the issues and won't impact small memory devices too much. Roy Index: sys/net/raw_cb.h === RCS file: /cvsroot/src/sys/net/raw_cb.h,v retrieving revision 1.28 diff -u -p -r1.28 raw_cb.h --- sys/net/raw_cb.h25 Sep 2017 01:56:22 - 1.28 +++ sys/net/raw_cb.h27 Apr 2018 20:30:55 - @@ -57,7 +57,7 @@ struct rawcb { * Nominal space allocated to a raw socket. */ #defineRAWSNDQ 8192 -#defineRAWRCVQ 8192 +#defineRAWRCVQ 16384 LIST_HEAD(rawcbhead, rawcb); Index: usr.sbin/syslogd/syslogd.c === RCS file: /cvsroot/src/usr.sbin/syslogd/syslogd.c,v retrieving revision 1.124 diff -u -p -r1.124 syslogd.c --- usr.sbin/syslogd/syslogd.c 10 Sep 2017 17:01:07 - 1.124 +++ usr.sbin/syslogd/syslogd.c 27 Apr 2018 20:30:56 - @@ -75,6 +75,9 @@ __RCSID("$NetBSD: syslogd.c,v 1.124 2017 #include "syslogd.h" #include "extern.h" +/* Minimum size of the logpath socket buffer */ +#defineRCVBUFLEN 16384 + #ifndef DISABLE_SIGN #include "sign.h" struct sign_global_t GlobalSign = { @@ -480,6 +483,9 @@ getgroup: die(0, 0, NULL); } for (j = 0, pp = LogPaths; *pp; pp++, j++) { + int buflen; + socklen_t socklen = sizeof(buflen); + DPRINTF(D_NET, "Making unix dgram socket `%s'\n", *pp); unlink(*pp); memset(&sunx, 0, sizeof(sunx)); @@ -493,6 +499,19 @@ getgroup: die(0, 0, NULL); } DPRINTF(D_NET, "Listening on unix dgram socket `%s'\n", *pp); + if (getsockopt(funix[j], SOL_SOCKET, SO_RCVBUF, + &buflen, &socklen) == -1) { + 
logerror("getsockopt: SO_RCVBUF: `%s'", *pp); + continue; + } + if (buflen >= RCVBUFLEN) + continue; + buflen = RCVBUFLEN; + if (setsockopt(funix[j], SOL_SOCKET, SO_RCVBUF, + &buflen, socklen) == -1) { + logerror("setsockopt: SO_RCVBUF: `%s'", *pp); + continue; + } } if ((fklog = open(_PATH_KLOG, O_RDONLY, 0)) < 0) {
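The syslogd hunk above boils down to "read SO_RCVBUF, and raise it only if it is below a floor". A minimal standalone sketch of that pattern follows; the function name ensure_rcvbuf and its return convention are assumptions made for this example and are not taken from syslogd.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not the syslogd code): make sure the receive
 * buffer of socket fd is at least minlen bytes.
 * Returns 0 on success (including when nothing had to change)
 * and -1 if getsockopt(2) or setsockopt(2) failed.
 */
int
ensure_rcvbuf(int fd, int minlen)
{
	int buflen;
	socklen_t socklen = sizeof(buflen);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &buflen, &socklen) == -1)
		return -1;
	if (buflen >= minlen)
		return 0;	/* already large enough, leave it alone */
	buflen = minlen;
	return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &buflen, sizeof(buflen));
}
```

Checking the current size first means a system whose default is already larger than the floor is never shrunk, which matters for the small memory devices mentioned in the thread.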
Re: Running out of buffers?
On 27/04/2018 09:45, Patrick Welche wrote: The very odd situation in which I saw those buffer overflows, is simply on a home machine, so flaky home broadband, running a pkg_rolling-replace. The machine has 32G of memory, but from your message that is irrelevant. The urtwmn0 was struggling (that's new BTW) and I kept having to /etc/rc.d/wpa_supplicant restart. While texlive was being downloaded, I hit ctrl-C, and then saw the reams of buffer messages. In terms of routing, there is just 1 default route. Maybe all the wpa_supplicant restarts and dhcpcd kicking in helped? (Doesn't really fit the picture...) Which application logged the error? Both wpa_supplicant and dhcpcd look at route(4). dhcpcd will note it and call getifaddrs(3) to resync the state of affairs. wpa_supplicant will just log the error and continue. Roy
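The two reactions described above (resync on overflow versus log and carry on) can be sketched roughly as follows. This is an illustrative assumption of how a route(4) listener might be structured, not the actual dhcpcd or wpa_supplicant code; the enum and both function names are invented for the example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <ifaddrs.h>
#include <stddef.h>

/* Hypothetical set of reactions to a failed read on a routing socket. */
enum rt_action { RT_FATAL, RT_RESYNC, RT_RETRY };

/*
 * ENOBUFS means the kernel dropped messages because the socket buffer
 * overflowed: the socket is still usable, but our view of the system
 * may be stale, so a full resync is the safe reaction.
 */
enum rt_action
classify_route_read_error(int err)
{
	switch (err) {
	case ENOBUFS:
		return RT_RESYNC;
	case EINTR:
	case EAGAIN:
		return RT_RETRY;
	default:
		return RT_FATAL;
	}
}

/*
 * Resync step: walk the current address list with getifaddrs(3).
 * A real daemon would rebuild its state here; this sketch only counts
 * the entries that carry an address. Returns -1 on failure.
 */
int
count_current_addresses(void)
{
	struct ifaddrs *ifaddrs, *ifa;
	int n = 0;

	if (getifaddrs(&ifaddrs) == -1)
		return -1;
	for (ifa = ifaddrs; ifa != NULL; ifa = ifa->ifa_next) {
		if (ifa->ifa_addr != NULL)
			n++;
	}
	freeifaddrs(ifaddrs);
	return n;
}
```

A caller would invoke count_current_addresses() (or its real equivalent) whenever classify_route_read_error() returns RT_RESYNC, then go back to reading the socket.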
Re: Running out of buffers?
On 27/04/2018 07:05, Robert Elz wrote: Date:Fri, 27 Apr 2018 05:18:16 +0100 From:Roy Marples Message-ID: <58598dae-238e-44a5-e74f-bbb2fdd7b...@marples.name> | No-one has yet weighed in on how this should be resolved. Go back to silently discarding the error (at least, by default). Datagram type services (which is what the routing socket, and others like it, are) generally are just "best effort" with no error reporting at all. The higher level protocol needs to cope. However, some kind of sockioctl() to enable error reporting would be OK, for applications that actually need to know (but they still need to cope with data lost for other reasons.) What might be interesting to discover however, is just why there are so many routing socket errors with the buffer space exhausted - particularly on a huge system like Paul's. I would have expected this to be rare assuming everything else is working correctly. My understanding (and I've not looked, so could be wrong) is that the routing socket gets a 2k buffer by default regardless of how big your memory is. Since NetBSD-5, I've modified the kernel to announce IPv6 address state changes, introduced IPv4 address state changes which are also announced AND added a layer of compat to the more generic RTM messages so that interface address changes can report back PID and flags. In other words, while a 2k buffer might have been fine for NetBSD-4 (and we'll never really know because overflow errors were silently dropped) it might not be fine for a router with many addresses that all become active when the internet decides to work. This is an important part because of all the NetBSD machines I have, the routing socket only overflows on my ERLITE router. The other physical servers, laptops and VMs I have do not. Roy
Re: Running out of buffers?
On 27/04/2018 04:09, Paul Goyette wrote:
I've got lots of memory, so I don't understand what buffers are not available. Ever since upgrading to my current system (sources dated 2018-03-20 11:25:00 UTC), I've been seeing these messages at random intervals:

Apr 23 05:51:33 speedy ntpd[526]: routing socket reports: No buffer space available

This may come as some surprise, but the only change is that the error is now logged. Previously it was silently discarded. No-one has yet weighed in on how this should be resolved.

I never saw them with a previous kernel (from March 3rd), so it would seem that something changed between the 3rd and 20th. Is anyone else seeing similar? Any clues on what changed?

The situation doesn't seem fatal (at least, not yet), but I'd like to mitigate the condition before it gets worse. :) Ideas welcome!

The only one-stop solution I can think of is increasing the default buffer size, but this might adversely affect small memory systems.

Thanks in advance for any suggestions.

Looking forward to hearing some!

Roy
Re: -current cloner interfaces broken/gone/unusable
On 23/04/2018 23:34, Robert Swindells wrote:
Frank Kardel wrote:
using -current as of 20180421 (NetBSD 8.99.14 (GENERIC) #0: Sat Apr 21 23:01:29 UTC 2018 mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64) no cloning interfaces are visible:

gateway# ifconfig -l
ixg0 ixg1 ixg2 ixg3 lo0 tun0 tun1
gateway# ifconfig -C
ifconfig: SIOCIFGCLONERS for count: Device not configured
gateway# ifconfig vlan0 create
ifconfig: clone_command: Device not configured
ifconfig: exec_matches: Device not configured
gateway#

This does not seem to be a desirable state - any clues what broke here ?

It looks to be the test for a valid interface name in sys/compat/common/uipc_syscalls_50.c that is causing this, I think it should only be done when the ioctl command is SIOCGIFDATA or SIOCZIFDATA.

This works for me but is a bit ugly:

Index: uipc_syscalls_50.c
===
RCS file: /cvsroot/src/sys/compat/common/uipc_syscalls_50.c,v
retrieving revision 1.4
diff -u -r1.4 uipc_syscalls_50.c
--- uipc_syscalls_50.c 12 Apr 2018 18:50:13 - 1.4
+++ uipc_syscalls_50.c 23 Apr 2018 22:33:14 -
@@ -63,9 +63,17 @@
 	struct ifnet *ifp;
 	int error;
 
-	ifp = ifunit(ifdr->ifdr_name);
-	if (ifp == NULL)
-		return ENXIO;
+	switch (cmd) {
+	case SIOCGIFDATA:
+	case SIOCZIFDATA:
+		ifp = ifunit(ifdr->ifdr_name);
+		if (ifp == NULL)
+			return ENXIO;
+		break;
+	default:
+		ifp = NULL;
+		break;
+	}
 
 	switch (cmd) {
 	case SIOCGIFDATA:

Committed, thanks

Roy
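The idea behind the patch above, validating the interface name only for the commands that actually operate on a named interface, can be sketched in user-space C as follows. Everything here is a hypothetical stand-in for illustration: the enum values replace the real ioctl numbers, and lookup_ifname() replaces ifunit() with a toy table holding only "lo0".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for SIOCGIFDATA, SIOCZIFDATA and everything else. */
enum cmd { CMD_GIFDATA, CMD_ZIFDATA, CMD_OTHER };

/* Hypothetical stand-in for ifunit(): only "lo0" exists in this sketch. */
const char *
lookup_ifname(const char *name)
{
	return strcmp(name, "lo0") == 0 ? "lo0" : NULL;
}

/*
 * Returns 0 if the request may proceed, or ENXIO if the command
 * needs an interface and the named one does not exist.
 */
int
check_request(enum cmd cmd, const char *name)
{
	switch (cmd) {
	case CMD_GIFDATA:
	case CMD_ZIFDATA:
		/* These commands act on a named interface, so it must exist. */
		if (lookup_ifname(name) == NULL)
			return ENXIO;
		return 0;
	default:
		/* Other commands may not carry an interface name at all. */
		return 0;
	}
}
```

With the unconditional check, a request such as listing cloners would fail with ENXIO ("Device not configured") before it was ever dispatched, which matches the symptom reported above.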
Re: Automated report: NetBSD-current/i386 test failure
On 24/04/2018 15:27, Martin Husemann wrote:
On Mon, Apr 23, 2018 at 08:51:52AM +, NetBSD Test Fixture wrote:
This is an automatically generated notice of new failures of the NetBSD test suite. The newly failing test cases are:

net/ndp/t_ra:ra_basic

[..]

2018.04.20.11.25.39 roy src/usr.sbin/rtadvd/rtadvd.c,v 1.63
2018.04.20.11.31.54 roy src/usr.sbin/rtadvd/rtadvd.c,v 1.64
2018.04.20.13.27.45 roy src/usr.sbin/rtadvd/timer.c,v 1.16
2018.04.20.13.27.45 roy src/usr.sbin/rtadvd/timer.h,v 1.10
2018.04.20.15.29.19 roy src/usr.sbin/rtadvd/config.c,v 1.39
2018.04.20.15.57.23 roy src/usr.sbin/rtadvd/config.c,v 1.40
2018.04.20.15.57.23 roy src/usr.sbin/rtadvd/rtadvd.c,v 1.65
2018.04.20.15.57.23 roy src/usr.sbin/rtadvd/rtadvd.h,v 1.17
2018.04.20.15.59.17 roy src/usr.sbin/rtadvd/timer.c,v 1.17
2018.04.20.16.07.48 roy src/usr.sbin/rtadvd/timer.c,v 1.18
2018.04.20.16.18.18 roy src/usr.sbin/rtadvd/rtadvd.h,v 1.18
2018.04.20.16.37.17 roy src/usr.sbin/rtadvd/rtadvd.conf.5,v 1.19
2018.04.20.16.37.17 roy src/usr.sbin/rtadvd/rtadvd.h,v 1.19

The test seems to assume that rtadvd will send at least one RA at startup, but the kernel does not count any in this case. So the test case loops in "await_RA" with a current RA count of 0, until ATF times the whole process out.

Maybe the kernel is rejecting the RA somehow?

I've checked and rtadvd is still sending 3 unsolicited RAs - one pretty much ASAP and two 15 seconds afterwards, all with some randomisation. Enable nd6_debug on the rump kernel expecting to process them and check for errors.

Roy
Re: -current cloner interfaces broken/gone/unusable
Hi Tom

On 24/04/2018 12:39, Tom Ivar Helbekkmo wrote:
Thomas Klausner writes:
On Tue, Apr 24, 2018 at 08:56:48AM +0100, Roy Marples wrote:
Saying this, from what I'm hearing this only happens at boot time, so we could potentially shrink the buffer back down again if we need to consider dynamically growing it in the kernel as well. No idea if that's even possible or what performance impact it would have.

I had an application report a UDP error with "no buffer space available". I don't remember the exact error, sorry. But it was definitely some time after system start.
Thomas

I keep getting those, and have been for a long, long time:

Apr 24 02:44:27 barsoom openvpn[301]: write UDPv4: No buffer space available (code=55)
Apr 24 05:54:47 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 08:53:08 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 08:53:09 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 10:15:09 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 10:45:14 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 11:35:18 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 13:15:12 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)

This is unrelated to the issue at hand. That's an upstream issue - the send and write family calls have been returning ENOBUFS for quite a while on all OS's I know of.

Roy
Re: -current cloner interfaces broken/gone/unusable
On 24/04/2018 08:26, Martin Husemann wrote:
On Tue, Apr 24, 2018 at 07:30:04AM +0200, Frank Kardel wrote:
syslogd has sometimes issues with /var/run/log

2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() unix `/var/run/log': No buffer space available

This is a separate change and unrelated to compatibility. It happens with up to date binaries as well. I think it was a silent bug before and has now been made more verbose. Still pretty annoying and happens for me on various machines on every boot. Roy, did you have a chance to look at it?

Not yet no. But yes, in all releases prior it was a silent bug on all types of socket and in all the BSDs as well. I know, I checked - only OpenBSD has an overflow check like this and they solve that with a magic message on route(4) only which is just yuck as it makes the problem worse.

I only have one machine where I can reliably repro this, my erlite, and that only happens because route(4) overflows (detected in dhcpcd) as it's a router and the box isn't up yet and a load of address validation flows over the socket when the link comes up. This is a good thing, because dhcpcd can then react to the error and sync its state using getifaddrs().

I think the easiest fix is to increase the default size of the socket buffer. Where this is done, I don't know but could find out if pushed. This would fix everything if the default buffer was big enough.

Saying this, from what I'm hearing this only happens at boot time, so we could potentially shrink the buffer back down again if we need to consider dynamically growing it in the kernel as well. No idea if that's even possible or what performance impact it would have.

The last option is to increase the socket buffer size in all affected applications using ioctl (or is it setsockopt?). But to what value I don't know. Trial and error?

Roy
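On the "ioctl (or is it setsockopt?)" question: the per-socket receive buffer is set with setsockopt(2) and the SO_RCVBUF option. The following is a minimal sketch of what an affected application could do. grow_rcvbuf() is a hypothetical helper name, the value to ask for is left to the caller because the thread does not settle on one, and the kernel may cap or adjust whatever is requested.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Ask for a receive buffer of at least `wanted` bytes on socket `fd`.
 * Returns the size the kernel reports afterwards, or -1 on error.
 * The reported size can differ from `wanted` if the kernel clamps
 * or rounds the request.
 */
int
grow_rcvbuf(int fd, int wanted)
{
	int cur;
	socklen_t len = sizeof(cur);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &cur, &len) == -1)
		return -1;
	/* Only ever grow the buffer, never shrink it. */
	if (cur < wanted &&
	    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &wanted, sizeof(wanted)) == -1)
		return -1;
	len = sizeof(cur);
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &cur, &len) == -1)
		return -1;
	return cur;
}
```

Reading the value back is deliberate: it gives the "trial and error" approach something concrete to log when comparing candidate sizes.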
Re: dhcpcd vs dhclient: part II (fxp0)
On 08/03/2018 14:50, John D. Baker wrote: I see this behavior with all fxp(4) interfaces under 'dhcpcd'. The carrier status from the interface bounces and causes 'dhcpcd' to repeatedly configure/unconfigure the interface. The workaround I use is to add: interface fxp0 nolink to my "/etc/dhcpcd.conf" file or pass "-K" option on command line. This should be alleviated somewhat in -current and -8 with the new address handling. The link will still flap, but dhcpcd will now persist the lease and give time for the lease to be renewed before expiring it and going into discovery once more. Roy
Re: DHCP client: dhclient vs dhcpcd ?
On 02/02/2018 12:24, Riccardo Mottola wrote: does dhcpcd share code/features with dhcpcd found on other systems beyond the name? Every feature bar one found in ISC dhclient can be found in dhcpcd. The one missing feature is the ability of the DHCP client to directly update a DNS server with its FQDN/hostname. My view is that this is best left to the DHCP server to handle the update as frequently both are on the same machine and thus no extra security is needed. dhcpcd also supports a very similar script environment - most of the variables have the same name and format. No code is shared at all between the two projects. Do you have any details on why dhcpcd failed and how dhclient worked, like say packet captures? I can probably guess though - some DHCP servers only work when the client id is in a format they know. However, this is not RFC compliant. Luckily dhcpcd can be configured to send a client id the DHCP server does like, but this is not out of the box config, but is documented in said config. Likewise dhclient doesn't work on links where a clientid is required out of the box. I too essentially always use dhclient, it works, while I had issues with dhcpcd. I have not yet had a situation where the opposite is true, but I am not doubting there are. dhcpcd not working where the others work has another unpleasant side effect: not working during the installer. I will see if I can find again a network where dhcpcd fails.. I hope it was like at home, at the office or at my parents, so it is something easy to reproduce. If it is a network on a customer's site I might not have access to it anymore.. who knows which one it was! One thing I did not try at the time is to see if it was a purely dhcp issue or also network card dependent (e.g. wireless would work, wired not). 
I seem to remember that fiddling with the media type helped on the wired network, but I might be confusing the issues and in any case dhclient "did all the magic" dhcpcd not :) I don't think dhclient does anything special with the media. I will say that we have a PR where dhcpcd fails during the installer and dhclient works, but this turned out to be a hardware failure with the carrier detection. Swapping out the faulty hardware made the problem go away. dhcpcd is very sensitive to carrier up/down events. This has improved a lot in NetBSD-8 thanks to moving IPv4 DAD from dhcpcd into the kernel which allows dhcpcd to maintain the lease on the interface if the link flaps and still be a good network citizen. Roy
Re: DHCP client: dhclient vs dhcpcd ?
Hi Thomas On 01/02/2018 07:17, Thomas Mueller wrote: On Wed, Jan 31, 2018 at 1:18 PM, KIRIHARA Masaharu wrote: NetBSD has two DHCP clients; dhclient(8) and dhcpcd(8). What's the difference? Which is better to use? I'm biased as I maintain dhcpcd, but dhcpcd is better in every way. On Wed, 31 Jan 2018 13:47:42 +0100, Benny Siegert responded: I agree that this is confusing. dhclient is the older tool, while dhcpcd has been created by a NetBSD developer, is newer and smaller. I have run into situations (on Google Compute Engine for instance) where dhclient was unable to interpret some of the more modern DHCP features. I recommend using dhcpcd :) I have read about NetBSD planning to drop dhclient in favor of dhcpcd. I have had installations where dhcpcd succeeded where dhclient failed, and (7.99.1 amd64) where dhclient succeeded where dhcpcd failed. Failure means not being able to set up the internet connection even if the command ran without error messages. Do you have any details on why dhcpcd failed and how dhclient worked, like say packet captures? I can probably guess though - some DHCP servers only work when the client id is in a format they know. However, this is not RFC compliant. Luckily dhcpcd can be configured to send a client id the DHCP server does like, but this is not out of the box config, but is documented in said config. Likewise dhclient doesn't work on links where a clientid is required out of the box. I have also had a situation where neither dhcpcd nor dhclient could establish the internet connection, but I was able to connect by using ifconfig and route directly. More details please. 
I notice NetBSD's dhclient is very big while FreeBSD's dhclient is much smaller, like $ ls -l /sbin/dhclient -r-xr-xr-x 1 root wheel 100056 Jul 31 2017 /sbin/dhclient $ ls -l /media/zip0/sbin/dh* -r-xr-xr-x 1 root wheel 5352184 Jun 20 2017 /media/zip0/sbin/dhclient -r-xr-xr-x 1 root wheel 6221 Jun 20 2017 /media/zip0/sbin/dhclient-script -r-xr-xr-x 1 root wheel 299176 Jun 20 2017 /media/zip0/sbin/dhcpcd running from FreeBSD 11.1-STABLE where /media/zip0 is mount point for NetBSD 8.99.1 installation. FreeBSD uses dhclient in base system, which does not include dhcpcd. FreeBSD dhclient is based on OpenBSD one, which is basically a very stripped down and old ISC dhclient which supports DHCPv4 only and isn't extendable. NetBSD ships a more modern and non stripped down ISC dhclient which is more bloaty and extendable but offers more features like say DHCPv6. For a fair comparison dhcpcd can be compiled for DHCPv4 only (like FreeBSD and OpenBSD) and it is currently 120k on i386. But even then, that includes the control socket code AND custom DHCP option parsing code to pass to shell scripts which cannot currently be stripped out. But frankly, with your numbers above, a client with all the features dhcpcd has and only weighing in at 299176 on disk is pretty impressive - newer versions in more recent NetBSD are smaller still. Roy
Re: Crash related to VLANs in Oct 18th -current
On 24/10/17 23:34, Roy Marples wrote: On 24/10/17 23:30, Roy Marples wrote: On 24/10/17 13:27, Tom Ivar Helbekkmo wrote: Roy Marples writes: This should only happen when dhcpcd is restarted. I just checked, and when I restart dhcpcd (from current, with your latest patch manually added), it correctly does a gratuitous arp announcement on the right VLAN -- while the UDP checksum error messages are comfortably absent. :) You should also apply this subsequent patch picked up during further testing: https://roy.marples.name/git/dhcpcd.git/commit/?id=8dc83479f50e2ed8b51c5a9383d27367bea1ecea Whups :) Minor change here also: https://roy.marples.name/git/dhcpcd.git/commit/?id=b091529ddd7d0541548b0a41e78a84bcc65364ef Must be a bad hair day. https://roy.marples.name/git/dhcpcd.git/commit/?id=9ab9c8f51d05a0cb07d1ce641eabfdab61cb107d https://roy.marples.name/git/dhcpcd.git/commit/?id=621d35c15337577c154ca549aedf4649cc524ba9 I think that should be it now. All test cases on all platforms currently passing. Roy
Re: Crash related to VLANs in Oct 18th -current
On 24/10/17 23:30, Roy Marples wrote: On 24/10/17 13:27, Tom Ivar Helbekkmo wrote: Roy Marples writes: This should only happen when dhcpcd is restarted. I just checked, and when I restart dhcpcd (from current, with your latest patch manually added), it correctly does a gratuitous arp announcement on the right VLAN -- while the UDP checksum error messages are comfortably absent. :) You should also apply this subsequent patch picked up during further testing: https://roy.marples.name/git/dhcpcd.git/commit/?id=8dc83479f50e2ed8b51c5a9383d27367bea1ecea Whups :) Minor change here also: https://roy.marples.name/git/dhcpcd.git/commit/?id=b091529ddd7d0541548b0a41e78a84bcc65364ef
Re: Crash related to VLANs in Oct 18th -current
On 24/10/17 13:27, Tom Ivar Helbekkmo wrote: Roy Marples writes: This should only happen when dhcpcd is restarted. I just checked, and when I restart dhcpcd (from current, with your latest patch manually added), it correctly does a gratuitous arp announcement on the right VLAN -- while the UDP checksum error messages are comfortably absent. :) You should also apply this subsequent patch picked up during further testing: https://roy.marples.name/git/dhcpcd.git/commit/?id=8dc83479f50e2ed8b51c5a9383d27367bea1ecea Roy
Re: Crash related to VLANs in Oct 18th -current
On 24/10/2017 11:58, Tom Ivar Helbekkmo wrote: Roy Marples writes: The caveat is that we now need to ARP announce the address during reboot to ensure dhcpcd gets the reply on an active interface. I assume it'll only send a gratuitous ARP announcement for an address whose lease is still active? :) Yes. If dhcpcd is rebooting an existing lease (which is still valid or admin asks to extend) AND the address exists on the interface AND is usable AND no other interface in the dhcpcd instance has the same address in an active lease then a gratuitous ARP is announced. This should only happen when dhcpcd is restarted. It shouldn't happen at any other time. Let me know how it works for you. Running with your latest patch now, and it's working fine for my simple configuration, at least. Great! I'll look into releasing a new build towards the weekend and putting it into -current. Roy
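The conditions listed above for sending the gratuitous ARP on reboot amount to a simple conjunction, which can be written out as a predicate. This is only a sketch of the rule as stated in the message: struct lease_state and its fields are hypothetical and do not mirror dhcpcd's real data structures.

```c
#include <stdbool.h>

/* Hypothetical summary of the state consulted; not dhcpcd's own types. */
struct lease_state {
	bool lease_valid;       /* lease still valid, or admin asked to extend */
	bool addr_on_interface; /* address exists on the interface */
	bool addr_usable;       /* address is usable */
	bool active_elsewhere;  /* another interface in this dhcpcd instance
	                           has the same address in an active lease */
};

/* True only when every condition for announcing on reboot holds. */
bool
should_announce_on_reboot(const struct lease_state *s)
{
	return s->lease_valid && s->addr_on_interface &&
	    s->addr_usable && !s->active_elsewhere;
}
```

Any single failing condition suppresses the announcement, which is consistent with it only being seen when dhcpcd is restarted with a lease it can still use.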
Re: Crash related to VLANs in Oct 18th -current
On 23/10/2017 12:18, Roy Marples wrote: On 23/10/2017 11:28, Tom Ivar Helbekkmo wrote: Has something changed that makes dhcpcd now insist on listening to all interfaces (including the 802.1q trunk)? Yes. I will try and improve the logic so it's only the relevant interfaces. The change was made to allow IP address sharing on many interfaces via DHCP without actually removing the IP address from the non active interfaces. This might have been over-zealous on my part. Can I make it not do that? Currently not, no. Hopefully I can change it so that no toggle for it is needed. Patch here to make it not do this anymore: https://roy.marples.name/git/dhcpcd.git/commit/?id=c72da9a1ce60d006136c5aa3e1c923d96761a171 The caveat is that we now need to ARP announce the address during reboot to ensure dhcpcd gets the reply on an active interface. Let me know how it works for you. Roy
Re: Crash related to VLANs in Oct 18th -current
On 23/10/2017 14:08, Thor Lancelot Simon wrote:
> I think it is safe to say that an interface which is participating
> in an interface stack such as vlan or agr should never be given an
> address unless the user has explicitly configured the system to do
> so. The sane default is to give addresses to the leaf interfaces
> only (e.g. vlan) not the root nor intermediate nodes (wm, agr, etc --
> noting of course that any of these interfaces _could_ be the leaf,
> but in fact are not).

The mere act of bringing an interface up will generally assign it an IPv6 link-local address. dhcpcd doesn't change this behaviour. Luckily this can be disabled in dhcpcd quite easily:

    # Global default is IPv6 on all interfaces
    interface wm0
    noipv6 # Disable IPv6 on wm0

Or reverse the logic

    noipv6 # Disable IPv6 globally
    interface wm0
    ipv6 # Enable IPv6 for wm0

Or just disallow the interface entirely:

    denyinterfaces wm0

Or just allow some interfaces whilst denying others:

    allowinterfaces wm0

And you can stop the kernel from doing this too if not using dhcpcd

    ndp -i wm0 -- -auto_linklocal

Roy
Re: Crash related to VLANs in Oct 18th -current
On 23/10/2017 11:28, Tom Ivar Helbekkmo wrote: > Has something changed that makes dhcpcd now insist on listening to all > interfaces (including the 802.1q trunk)? Yes. I will try and improve the logic so it's only the relevant interfaces. The change was made to allow IP address sharing on many interfaces via DHCP without actually removing the IP address from the non active interfaces. This might have been over-zealous on my part. > Can I make it not do that? Currently not, no. Hopefully I can change it so that no toggle for it is needed. > Oh, and I notice that IPv6 generates a local address on wm0, as on > everything else. That just looks weird on an 802.1q trunk. Is there a > way to make it not do that? I don't know anything about 802.1q trunks. How can I tell that it is one, and why shouldn't it have a local address? > > # cat /etc/ifconfig.wm0 > > up > media 100baseTX mediaopt full-duplex > ip4csum tcp4csum udp4csum > > # ifconfig wm0 > > wm0: flags=0x8843 mtu 1500 > capabilities=2bf80 > capabilities=2bf80 > capabilities=2bf80 > enabled=3f00 > enabled=3f00 > ec_capabilities=7 > ec_enabled=3 > address: 00:13:72:f7:00:06 > media: Ethernet 100baseTX full-duplex > status: active > inet6 fe80::213:72ff:fef7:6%wm0/64 flags 0x0 scopeid 0x1 > > Which VLAN is that IPv6 address on, anyway? :) No idea. It's the address belonging to wm0 interface. See my earlier query. Even if dhcpcd is not used, if IPv6 is enabled in the kernel and auto-link local is set for the interface (which it is by default and it looks like you've not disabled it in ifconfig.wm0) then you would get this address anyway. Roy
Re: Crash related to VLANs in Oct 18th -current
On 23/10/2017 07:42, Kengo NAKAHARA wrote: > Hi, > > On 2017/10/22 23:56, Tom Ivar Helbekkmo wrote: >> Tom Ivar Helbekkmo writes: >> >>> That did the trick! Thank you! :) > > Thank you for your testing! > >> I'm actually wondering if there may be something else strange going on. >> Everything works fine -- but I have this dhcpcd running, because one of >> my VLANs is connected to a network where this machine has to accept a >> DHCP provisioned IP address from a server. I run "dhcpcd -q vlan9", and >> also give it a configuration file that should keep it from doing >> anything I don't want: >> >> allowinterfaces vlan9 >> interface vlan9 >> background >> persistent >> hostname_short >> nogateway >> nohook resolv.conf, wpa_supplicant, hostname, ntp.conf >> script /usr/bin/true You could use script /dev/null or maybe just script by itself, then dhcpcd won't even try and call the script. Which makes it more efficient. >> >> However, after this last upgrade, I keep getting messages from dhcpcd >> about other interfaces, where this host is the DHCP server, like: >> >> Oct 22 16:48:28 barsoom dhcpcd[16236]: vlan2: invalid UDP packet from >> 172.27.201.1 >> Oct 22 16:48:28 barsoom dhcpcd[16236]: wm0: invalid UDP packet from >> 172.27.201.1 >> >> This happens every time a host on one of the other VLANs gets an address >> from the local DHCP server, and I get this pair of messages; one for the >> VLAN in question, one for wm0, which is the vlanif with the trunk on it. >> >> Running 8.99.1 from about two months ago, these messages did not occur. This normally indicates a UDP checksum failure. For future versions, I've improved the message here: https://roy.marples.name/git/dhcpcd.git/commit/?id=53bad6f740d66108c7412a492819e4c7e17bff51 > > Hmm..., sorry, I am not sure about this problem from that information. > Could you get tcpdump? Of course, if it is not a problem, please do it. > > >> roy@n.o > > I think the issue seems to be related to DHCP. 
Could you think of any > other way to solve it? Maybe try disabling hardware processing of UDP checksums on the interface? Roy
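For readers unfamiliar with what a "UDP checksum failure" involves, the core of the check is the one's-complement Internet checksum from RFC 1071. Below is a minimal C sketch of that sum only. inet_checksum() is a name chosen for illustration, and a real UDP check also covers a pseudo-header built from the IP addresses, protocol and length, which is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement Internet checksum (RFC 1071) over `len` bytes,
 * treating the data as big-endian 16-bit words.
 */
uint16_t
inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += (uint32_t)data[0] << 8 | data[1];
		data += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)data[0] << 8;	/* pad the odd byte with zero */

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A receiver validates a packet by running the same sum over the data with the checksum field included; a result of zero means the packet is intact. Since ifconfig.wm0 above enables udp4csum, part of this work is delegated to the hardware, which is why turning that off is a reasonable first test.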
Re: Any actions regarding WPA2 vulnerabilities
On 16/10/2017 20:40, m...@netbsd.org wrote: On Mon, Oct 16, 2017 at 06:26:09PM +0200, Dmitry Salychev wrote: Hi, guys. Are there patches for these WPA2 vulnerabilities? Are there affected ports? I haven't seen any message regarding the subject. Thanks. Regards, - Dmitry Hi, We rely on wpa_supplicant/hostapd for WPA2. They have released a set of patches and spz@ already patched -current, it is also pullup-8 #324, pullup-7 #1517, pullup-6 #1507. Many thanks to spz@ for the fast application of patches!
Re: -current vs MKINET6=NO
On 12/08/2017 06:09, Geoff Wing wrote: Hi, the following files need changes to build a full tree with MKINET6=NO external/apache2/mDNSResponder/dist/mDNSPosix/mDNSUNP.c external/bsd/dhcpcd/dist/src/dhcpcd.c external/bsd/dhcpcd/dist/src/if-bsd.c external/bsd/tcpdump/bin/Makefile mDNSUNP.c needs #include for some IFF_* definitions. dhcpd stuff needs quite a few changes to remove calls to ip6 stuff dhcpcd patch is quite simple. https://dev.marples.name/rDHC32ee94da8d8c9d15a28a92ddb6760baf2c87fd23 And one to build without INET (not that we have a knob for that atm) https://dev.marples.name/rDHC90fabbf1826344d53835f7054655792baf7aa0b4 I'll look into importing a new dhcpcd with these changes and many others this weekend. Roy
Re: long delay getting address from ISP w/-current dhcpcd
Hi John On 24/07/2017 21:46, John D. Baker wrote: > On Mon, 24 Jul 2017, John D. Baker wrote: > >> Now that it has generated a new DUID (and once the ISP's DHCP server >> issues a lease for it), I'll need to be sure and copy the "duid" file >> as "/etc/dhcpcd.duid" for the netbsd-7 installation on the CF card. >> Then, an update to netbsd-8 will migrate it to "/var/db/dhcpcd/duid". > > This seems to have been a case of different DUID values between the > local disk (CF) installation and the NFS-root installation and the ISP's > behavior when being presented with a DUID of which it doesn't yet have > (or no-longer has) a record. > > Copying the "/var/db/dhcpcd/duid" file from the -current NFS install to > "/etc/dhcpcd.duid" on the netbsd-7 CF install ensured that either case > would get an address quickly. The subsequent upgrade to 8.0_BETA migrated > the "/etc/dhcpcd.duid" file to "/var/db/dhcpcd/duid" and everything is > working nicely now. Glad it's working nicely for you now! Maybe we should put something about this change in some upgrade notes if we have any? > Now, to keep this behavior in mind should I put my old SS5-based router > back into service or replace it with an ERLITE or one of the supported > RouterBoard products. Now you know the root cause, whatever you put in place is entirely your choice. I myself run an ERLITE router with dhcpcd to negotiate stuff (although just from the cable modem which has its own DHCP server) - but it boots entirely off the USB stick and not NFS as some like to do. Roy