[patch] Fail execve on environment duplicates
Duplicate environment variables have been a source of bugs, so perhaps the kernel should fail an execve with an invalid envp. Diff below checks that each environment string contains a '=' and that there are no duplicates up to the equals sign. So far I haven't noticed any breakage except for a purposefully misbehaving program to test that execve will fail. EINVAL gives a somewhat confusing error message, but nothing in intro(2) looked like a better choice. Idea from Martin Brandenburg. - Matthew Martin diff --git kern_exec.c kern_exec.c index 7784d5f4165..3c3f0aa6266 100644 --- kern_exec.c +++ kern_exec.c @@ -239,10 +239,8 @@ sys_execve(struct proc *p, void *v, register_t *retval) struct vattr attr; struct ucred *cred = p->p_ucred; char *argp; - char * const *cpp, *dp, *sp; -#ifdef KTRACE + char * const *cpp, *dp, *ep, *op, *sp; char *env_start; -#endif struct process *pr = p->p_p; long argc, envc; size_t len, sgap; @@ -357,9 +355,7 @@ sys_execve(struct proc *p, void *v, register_t *retval) envc = 0; /* environment does not need to be there */ if ((cpp = SCARG(uap, envp)) != NULL ) { -#ifdef KTRACE env_start = dp; -#endif while (1) { len = argp + ARG_MAX - dp; if ((error = copyin(cpp, &sp, sizeof(sp))) != 0) @@ -371,6 +367,19 @@ sys_execve(struct proc *p, void *v, register_t *retval) error = E2BIG; goto bad; } + + /* Check for duplicate environment variables */ + if ((ep = strchr(dp, '=')) == NULL) { + error = EINVAL; + goto bad; + } + for (op = env_start; op != dp; op = strchr(op, '\0') + 1) { + if (strncmp(dp, op, (ep - dp + 1)) == 0) { + error = EINVAL; + goto bad; + } + } + dp += len; cpp++; envc++;
Re: pf: remove pfr_{sin,sin6,mask} globals
Hello Patrick, your diff looks OK to me. thanks and regards sasha On Mon, May 08, 2017 at 02:56:55PM +0200, Patrick Wildt wrote: > Hi, > > in order to reduce globals so that we can run more parts of pf in > parallel, this diff removes the pfr_sin, pfr_sin6 and pfr_mask > globals. Those are instead allocated on the stack. > > ok? >
Re: ocspcheck size_t printing
You are correct. Patch committed. Thanks! -Bob On Mon, May 08, 2017 at 08:20:57PM +0200, Jonas 'Sortie' Termansen wrote: > Hi, > > When upgrading to libressl-2.5.4 I noticed a couple -Wformat errors due > to this code assuming size_t is of type long when it was actually int on > this 32-bit system. Here's a patch against cvs that fixes the issue and > also prints the variableas unsigned type. > > Jonas > > Index: ocspcheck.c > === > RCS file: /cvs/src/usr.sbin/ocspcheck/ocspcheck.c,v > retrieving revision 1.20 > diff -u -r1.20 ocspcheck.c > --- ocspcheck.c 27 Mar 2017 23:59:08 - 1.20 > +++ ocspcheck.c 8 May 2017 18:15:51 - > @@ -564,7 +564,7 @@ > if ((request = ocsp_request_new_from_cert(certfile, nonce)) == NULL) > exit(1); > > - dspew("Built an %ld byte ocsp request\n", request->size); > + dspew("Built an %zu byte ocsp request\n", request->size); > > if ((host = url2host(request->url, &port, &path)) == NULL) > errx(1, "Invalid OCSP url %s from %s", request->url, > @@ -601,7 +601,7 @@ > dspew("Server at %s returns:\n", host); > for (i = 0; i < httphsz; i++) > dspew(" [%s]=[%s]\n", httph[i].key, httph[i].val); > - dspew(" [Body]=[%ld bytes]\n", hget->bodypartsz); > + dspew(" [Body]=[%zu bytes]\n", hget->bodypartsz); > if (hget->bodypartsz <= 0) > errx(1, "No body in reply from %s", host); > >
Re: relayd: incomplete response from a TLS-accelerated apache
Compiling relayd with -DDEBUG=3 and watching the output gave me nothing. No errors what so ever about out of buffers or something else. However, removing 'socket buffer 65536’ solved my problem. Br > 8 maj 2017 kl. 13:27 skrev Maxim Bourmistrov : > > Hey, > I investigate a problem were TLS-asselerated machine response is incomplete. > I was able to reproduce this on OpenBSD 5.9, 6.0 and 6.1. Test on 5.8 is > about to be. > > Following env I have: > > relay1: relayd machine > web1: apache 2.2.31 serving the request > client1: requester > > relay1 is configured following way (relevant lines): > > http protocol http_relay { >tcp { nodelay, sack, socket buffer 65536, backlog 1024 } >match header append "X-Forwarded-For" value "$REMOTE_ADDR" >match header set "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT" >match header set "Keep-Alive" value "$TIMEOUT" >match request header remove "Proxy" > } > > http protocol tls_accel { >tcp { nodelay, sack, socket buffer 65536, backlog 1024 } >match header append "X-Forwarded-For" value "$REMOTE_ADDR" >match header set "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT" >match header set "X-Forwarded-Proto" value "https" >match header set "X-Forwarded-Port" value "443" >match header set "Keep-Alive" value "$TIMEOUT" >match request header remove "Proxy" > >tls { tlsv1, \ >ciphers "AES:!AES256:!aNULL" \ > } > } > > table { 172.16.1.111 } > > relay int_test_tls { >listen on 172.16.1.99 port 443 tls >protocol tls_accel >forward to port 80 mode roundrobin check http "/" code 200 > } > > relay int_test_http { >listen on 172.16.1.99 port 80 >protocol http_relay >forward to port 80 mode roundrobin check http "/" code 200 > } > > web1 is a std Apache 2.2.31 with enabled deflate for the following > > AddOutputFilterByType DEFLATE application/json > AddOutputFilterByType DEFLATE text/html > AddOutputFilterByType DEFLATE text/plain > AddOutputFilterByType DEFLATE text/xml > AddOutputFilterByType DEFLATE text/css > AddOutputFilterByType DEFLATE application/x-javascript > AddOutputFilterByType DEFLATE application/javascript > > and serving a JS file. > > client1 is running PHP code from CLI to reproduce this problem. > > > Following is observed: > > 1. Client1 requests web1 directly on port 80 and gets full response > > shell$ php client3.php > Expected length: 547204 > Received length: 547204 > > [Response Headers] > HTTP/1.1 200 OK > Date: Mon, 08 May 2017 11:08:27 GMT > Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips > Last-Modified: Mon, 08 May 2017 07:22:43 GMT > ETag: "60319-85984-54efe1ae42be3" > Accept-Ranges: bytes > Content-Length: 547204 > Vary: Accept-Encoding > Connection: close > Content-Type: application/javascript > > 2. Client1 requests web1 directly on port 80 WITH GZIP enabled and gets full > response back > I see gzipped stream on the screen and then it gets decoded to a complete > file. File I get is not cut. > > Expected length: Content-Length not recieved > Received length: 165454 > > [Response Headers] > HTTP/1.1 200 OK > Date: Mon, 08 May 2017 11:10:18 GMT > Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips > Last-Modified: Mon, 08 May 2017 07:22:43 GMT > ETag: "60319-85984-54efe1ae42be3" > Accept-Ranges: bytes > Vary: Accept-Encoding > Content-Encoding: gzip > Connection: close > Content-Type: application/javascript > > 3. and 4. Clien1 requests relay1 on port 80 (with and without GZIP) and gets > complete response > > 5. Client1 requests relay1 on port 443 without GZIP - response is incomplete > > Expected length: 547204 > Received length: 396424 > > [Response Headers] > HTTP/1.1 200 OK > Accept-Ranges: bytes > Connection: close > Content-Length: 547204 > Content-Type: application/javascript > Date: Mon, 08 May 2017 11:14:59 GMT > ETag: "60319-85984-54efe1ae42be3" > Last-Modified: Mon, 08 May 2017 07:22:43 GMT > Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips > Vary: Accept-Encoding > > 6. Client1 requests relay1 on port 443 with GZIP - response is complete. > > > So non-gzipped response from behind the relay1 is incomplete while doing TLS > termination. > Files server.js and client.php can be provided upon request. > > Any ideas? > > Br > > >
OpenBSD Errata: May 8th, 2017 (libssl)
Errata patches for libssl have been released for OpenBSD 6.1 and 6.0. Incorrect DTLS cookie handling can result in a NULL pointer dereference. Binary updates for the amd64 and i386 platforms are available via the syspatch utility. Source code patches can be found on the respective errata pages: https://www.openbsd.org/errata60.html https://www.openbsd.org/errata61.html If none of your server-side applications use DTLS, this issue does not affect you.
Re: Correct capable baseband in dmesg for RTL8188E chips
In that case I'd rather just have a general else like you said. I'll commit it in a bit if you don't beat me to it :) thanks! On 19:27 Mon 08 May , Stefan Sperling wrote: > On Mon, May 08, 2017 at 05:35:26PM +0100, Ricardo Mestre wrote: > > Hi tech@ > > > > During stsp@'s effort to merge rtwn(4) and urtwn(4), more specifically since > > r1.6 of /cvs/src/sys/dev/ic/rtwn.c, my urtwn(4) device started showing in > > dmesg > > with a capable baseband of 0T0R (only noticed it today!): > > > > urtwn0: MAC/BB RTL8188EU, RF 6052 0T0R, address xx:xx:xx:xx:xx:xx > > > > Since according to urtwn(4)'s manpage both RTL8188C and RTL8188E chips have > > a > > 1T1R capable baseband please find below a patch to correct this. > > > > OK? > > 8192c is the only MIMO chip supported by this driver. All other chips > are 1T1R ones. We could make the 1T1R case a general 'else' clause > instead of an 'else if', as in: > > - } else if (sc->chip & RTWN_CHIP_88C) { > + } else { > > But either way, OK by me. Thank you for hunting this down.
IPv6 IPsec transport pf
Hi, IPv6 IPsec transport mode does not work if pf is enabled. The problem is that the decrypted packets in the input path are not checked with pf(4). So if you have stateful filtering on enc0 (the default) direction aware protocols like ping or TCP do not pass. Only the output packets are matched with the states. Adding an explicit pf_test() in ipsec_common_input_cb() fixes this. In the IPv4 case the decrypted packet is enqueued again, so it hits pf_test() in IPv4_input(). In IPv6 the ip6_local() shortcut is taken. I like to keep the shortcut as it corresponds to the idea of IPv6 header chains. ok? bluhm Index: netinet/ipsec_input.c === RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ipsec_input.c,v retrieving revision 1.148 diff -u -p -r1.148 ipsec_input.c --- netinet/ipsec_input.c 5 May 2017 11:04:18 - 1.148 +++ netinet/ipsec_input.c 8 May 2017 16:44:31 - @@ -591,6 +591,25 @@ ipsec_common_input_cb(struct mbuf *m, st return; #ifdef INET6 case AF_INET6: +#if NPF > 0 + if ((tdbp->tdb_flags & TDBF_TUNNELING) == 0) { + struct ifnet *ifp; + + /* This is the enc0 interface unless for ipcomp. */ + if ((ifp = if_get(m->m_pkthdr.ph_ifidx)) == NULL) { + m_freem(m); + return; + } + if (pf_test(AF_INET6, PF_IN, ifp, &m) != PF_PASS) { + if_put(ifp); + m_freem(m); + return; + } + if_put(ifp); + if (m == NULL) + return; + } +#endif ip6_local(m, skip, prot); return; #endif /* INET6 */
ocspcheck size_t printing
Hi, When upgrading to libressl-2.5.4 I noticed a couple -Wformat errors due to this code assuming size_t is of type long when it was actually int on this 32-bit system. Here's a patch against cvs that fixes the issue and also prints the variableas unsigned type. Jonas Index: ocspcheck.c === RCS file: /cvs/src/usr.sbin/ocspcheck/ocspcheck.c,v retrieving revision 1.20 diff -u -r1.20 ocspcheck.c --- ocspcheck.c 27 Mar 2017 23:59:08 - 1.20 +++ ocspcheck.c 8 May 2017 18:15:51 - @@ -564,7 +564,7 @@ if ((request = ocsp_request_new_from_cert(certfile, nonce)) == NULL) exit(1); - dspew("Built an %ld byte ocsp request\n", request->size); + dspew("Built an %zu byte ocsp request\n", request->size); if ((host = url2host(request->url, &port, &path)) == NULL) errx(1, "Invalid OCSP url %s from %s", request->url, @@ -601,7 +601,7 @@ dspew("Server at %s returns:\n", host); for (i = 0; i < httphsz; i++) dspew(" [%s]=[%s]\n", httph[i].key, httph[i].val); - dspew(" [Body]=[%ld bytes]\n", hget->bodypartsz); + dspew(" [Body]=[%zu bytes]\n", hget->bodypartsz); if (hget->bodypartsz <= 0) errx(1, "No body in reply from %s", host);
Re: Correct capable baseband in dmesg for RTL8188E chips
On Mon, May 08, 2017 at 05:35:26PM +0100, Ricardo Mestre wrote: > Hi tech@ > > During stsp@'s effort to merge rtwn(4) and urtwn(4), more specifically since > r1.6 of /cvs/src/sys/dev/ic/rtwn.c, my urtwn(4) device started showing in > dmesg > with a capable baseband of 0T0R (only noticed it today!): > > urtwn0: MAC/BB RTL8188EU, RF 6052 0T0R, address xx:xx:xx:xx:xx:xx > > Since according to urtwn(4)'s manpage both RTL8188C and RTL8188E chips have a > 1T1R capable baseband please find below a patch to correct this. > > OK? 8192c is the only MIMO chip supported by this driver. All other chips are 1T1R ones. We could make the 1T1R case a general 'else' clause instead of an 'else if', as in: - } else if (sc->chip & RTWN_CHIP_88C) { + } else { But either way, OK by me. Thank you for hunting this down. > Index: rtwn.c > === > RCS file: /cvs/src/sys/dev/ic/rtwn.c,v > retrieving revision 1.19 > diff -u -p -u -r1.19 rtwn.c > --- rtwn.c12 Feb 2017 01:01:39 - 1.19 > +++ rtwn.c8 May 2017 16:16:57 - > @@ -154,7 +154,7 @@ rtwn_attach(struct device *pdev, struct > if (sc->chip & RTWN_CHIP_92C) { > sc->ntxchains = (sc->chip & RTWN_CHIP_92C_1T2R) ? 1 : 2; > sc->nrxchains = 2; > - } else if (sc->chip & RTWN_CHIP_88C) { > + } else if (sc->chip & RTWN_CHIP_88C || sc->chip & RTWN_CHIP_88E) { > sc->ntxchains = 1; > sc->nrxchains = 1; > } >
Correct capable baseband in dmesg for RTL8188E chips
Hi tech@ During stsp@'s effort to merge rtwn(4) and urtwn(4), more specifically since r1.6 of /cvs/src/sys/dev/ic/rtwn.c, my urtwn(4) device started showing in dmesg with a capable baseband of 0T0R (only noticed it today!): urtwn0: MAC/BB RTL8188EU, RF 6052 0T0R, address xx:xx:xx:xx:xx:xx Since according to urtwn(4)'s manpage both RTL8188C and RTL8188E chips have a 1T1R capable baseband please find below a patch to correct this. OK? Index: rtwn.c === RCS file: /cvs/src/sys/dev/ic/rtwn.c,v retrieving revision 1.19 diff -u -p -u -r1.19 rtwn.c --- rtwn.c 12 Feb 2017 01:01:39 - 1.19 +++ rtwn.c 8 May 2017 16:16:57 - @@ -154,7 +154,7 @@ rtwn_attach(struct device *pdev, struct if (sc->chip & RTWN_CHIP_92C) { sc->ntxchains = (sc->chip & RTWN_CHIP_92C_1T2R) ? 1 : 2; sc->nrxchains = 2; - } else if (sc->chip & RTWN_CHIP_88C) { + } else if (sc->chip & RTWN_CHIP_88C || sc->chip & RTWN_CHIP_88E) { sc->ntxchains = 1; sc->nrxchains = 1; }
ipv6 mapped address output
Hi, Checking for IPv4 mapped addresses is a bit inconsistent in the output path. So I would like to: - Use the common switch(af) construct for af specific code in tcp_usrreq(PRU_CONNECT). - Add a EAFNOSUPPORT default case. - Do not check for mapped addresses, this is done in in6_pcbconnect(). - Do not check for locally bound mapped addresses in in6_pcbconnect(), this is done during bind(2) in in6_pcbaddrisavail(). - Check for mapped addesses in rip6_output() like it is done in udp6_output(). - Move the EAFNOSUPPORT error from rip6_usrreq() to rip6_output() like it is done for UDP. - Return EADDRNOTAVAIL if UDP sendto(2) is used with a mapped address. ok? bluhm Index: netinet/tcp_usrreq.c === RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_usrreq.c,v retrieving revision 1.147 diff -u -p -r1.147 tcp_usrreq.c --- netinet/tcp_usrreq.c5 Apr 2017 13:35:18 - 1.147 +++ netinet/tcp_usrreq.c8 May 2017 13:23:50 - @@ -127,7 +127,6 @@ int tcp_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam, struct mbuf *control, struct proc *p) { - struct sockaddr_in *sin; struct inpcb *inp; struct tcpcb *tp = NULL; int error = 0; @@ -221,33 +220,40 @@ tcp_usrreq(struct socket *so, int req, s * Send initial segment on connection. */ case PRU_CONNECT: - sin = mtod(nam, struct sockaddr_in *); - -#ifdef INET6 - if (sin->sin_family == AF_INET6) { - struct in6_addr *in6_addr = &mtod(nam, - struct sockaddr_in6 *)->sin6_addr; - - if (IN6_IS_ADDR_UNSPECIFIED(in6_addr) || - IN6_IS_ADDR_MULTICAST(in6_addr) || - IN6_IS_ADDR_V4MAPPED(in6_addr)) { + switch (mtod(nam, struct sockaddr *)->sa_family) { + case AF_INET: { + struct in_addr *addr = + &mtod(nam, struct sockaddr_in *)->sin_addr; + + if ((addr->s_addr == INADDR_ANY) || + (addr->s_addr == INADDR_BROADCAST) || + IN_MULTICAST(addr->s_addr) || + in_broadcast(*addr, inp->inp_rtableid)) { error = EINVAL; break; } - error = in6_pcbconnect(inp, nam); - } else if (sin->sin_family == AF_INET) -#endif /* INET6 */ - { - if ((sin->sin_addr.s_addr == INADDR_ANY) || - (sin->sin_addr.s_addr == INADDR_BROADCAST) || - IN_MULTICAST(sin->sin_addr.s_addr) || - in_broadcast(sin->sin_addr, inp->inp_rtableid)) { + error = in_pcbconnect(inp, nam); + break; + } +#ifdef INET6 + case AF_INET6: { + struct in6_addr *addr6 = + &mtod(nam, struct sockaddr_in6 *)->sin6_addr; + + if (IN6_IS_ADDR_UNSPECIFIED(addr6) || + IN6_IS_ADDR_MULTICAST(addr6)) { error = EINVAL; break; } - error = in_pcbconnect(inp, nam); + error = in6_pcbconnect(inp, nam); + break; + } +#endif /* INET6 */ + default: + error = EAFNOSUPPORT; + break; } if (error) Index: netinet6/in6_pcb.c === RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/in6_pcb.c,v retrieving revision 1.97 diff -u -p -r1.97 in6_pcb.c --- netinet6/in6_pcb.c 7 Mar 2017 16:59:40 - 1.97 +++ netinet6/in6_pcb.c 8 May 2017 13:23:50 - @@ -256,14 +256,9 @@ in6_pcbconnect(struct inpcb *inp, struct return (EAFNOSUPPORT); if (sin6->sin6_port == 0) return (EADDRNOTAVAIL); - /* reject IPv4 mapped address, we have no support for it */ if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr)) - return EADDRNOTAVAIL; - - /* sanity check for mapped address case */ - if (IN6_IS_ADDR_V4MAPPED(&inp->inp_laddr6)) - return EINVAL; + return (EADDRNOTAVAIL); /* protect *sin6 from overwrites */ tmp = *sin6; Index: netinet6/raw_ip6.c === RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/raw_ip6.c,v retrieving revision 1.113 diff -u -p -r1.113 raw_ip6.c --- netinet6/raw_ip6.c 8 May 2017 08:46:39 - 1.113 +++ netinet6/raw_ip6.c 8 May 2017 14:16:27 - @@ -343,7 +343,6 @
Re: uip_socket.c: issues when using sendmsg() with small send buffers and the new 6.1 control message (IP_SENDSRCADDR)
On Fri, Apr 21, 2017 at 08:43:11AM +, Markert, Alexander wrote: > In our opinion either EMSGSIZE should be returned instead in this case (like > e.g. FreeBSD 11.0 does) or OpenBSD should reserve some space (comparable to > MSG_OOB) in addition to the maximum size of the socket's send buffer for the > control messages in order to avoid such problems. > > What do you think about that behavior and the described alternatives? I think EMSGSIZE if send space is too small, would be the way to go. On the reveive path also some bytes of space must be reseverd by the user program, if control messages are used. This behavior should be symmetric for the send path. bluhm
Re: fix iwm command wait vs interface reset
On Mon, May 08, 2017 at 03:54:38PM +0200, Mark Kettenis wrote: > > Date: Mon, 8 May 2017 15:36:11 +0200 > > From: Stefan Sperling > > > > In iwm_send_cmd(), look at the generation counter instead of the STOPPED > > flag > > to determine whether the interface was reset while we were sleeping. The > > flag > > will be set if the interface is still down when the task wakes up, but the > > interface could already be up again in which case the flag will be cleared. > > A stale task which sends an outdated command could confuse the firmware. > > Makes sense. The comment isn't really accurate now though... Yes, the comments could be better. I'll eventually do another pass over the driver just fixing stylistic issues such as this (probably during d2k17).
Re: arm64 lock: no userland progress, several procs in wchan "vp"
On 2017/05/04 00:31, Mark Kettenis wrote: > > Date: Wed, 3 May 2017 21:05:24 +0100 > > From: Stuart Henderson > > > > On 2017/05/03 15:12, Mark Kettenis wrote: > > > > Date: Wed, 3 May 2017 13:51:22 +0100 > > > > From: Stuart Henderson > > > > > > > > On 2017/05/01 22:18, Mark Kettenis wrote: > > > > > > Date: Mon, 1 May 2017 20:58:29 +0100 > > > > > > From: Stuart Henderson > > > > > > > > > > > > Userland is non-responsive, machine is pingable, tcp connections > > > > > > open > > > > > > but no banner from ssh. No failed pool requests. This kernel is from > > > > > > today's snapshot but I saw the same with one from a couple of days > > > > > > ago. Is there anything else I can get that might be useful? > > > > > > > > > > .. > > > > > > 71034 186155 65198 0 30x11 vpperl > > > > .. > > > > > > > > > > The diff below might fix thise. Or it might actually turn this into a > > > > > hard hang... > > > > > > > > > > Nevertheless, could you try running with it? > > > > > > > > I haven't seen this happen again with your diff, and haven't seen any > > > > hangs. Probably still too early to say for sure that it fixes things, > > > > but it looks promising so far. > > > > > > Thanks. Since Dale ok'ed it and I had been running with it for a > > > while already, I committed it last night. > > > > > > > Ha. As is traditional, not long after sending that message I've hit > > a hard lock - no DDB. > > I'm sure it wouldn't have happened if I hand't committed it ;). > > Could you change the PR_NOWAIT back into P_WAITOK and see if you can > reproduce the hang and break into ddb? Meanwhile I'll think about > what information to print once you've hit it ;). > I've hit it: [halt sent] Stopped at pluart_intr+0x164: ddb> ps PID TID PPIDUID S FLAGS WAIT COMMAND 625 437040 78755 55 3 0x2 vpcc 78755 385259 7516 55 30x82 wait cc 7516 136325 5903 55 30x82 wait bash 19959 322579 19904 0 30x11 vpperl 5903 139810 38260 55 30x82 wait gmake 38260 218618 2854 55 30x82 wait bash 2854 90787 9324 55 30x82 wait gmake 9324 170231 48868 55 30x82 wait gmake 48868 117527 14396 55 30x80 wait bash 14396 515415 1643 55 30x82 wait bash 1643 416463 68987 55 30x82 wait gmake 68987 461754 3515 55 30x82 wait bash 3515 418174 60280 55 30x82 wait gmake 60280 191595 96060 55 30x10008a pause sh 96060 102351 39653 55 30x10008a pause sh 39653 519956 69874 55 30x10008a pause make 69874 105490 67943 55 30x10008a pause make 679438933 3490 55 30x10008a pause sh 3490 32658 91100 55 30x10008a pause make 91100 232620 7725 55 30x10008a pause sh 7725 345849 19904 55 30x10008a pause make 74896 137698 47027 1000 30x100083 kqreadtmux 47027 296497 97822 1000 30x10008b pause ksh 97822 245510 69858 1000 30x10 vpsshd 69858 304169 7139 0 30x92 poll sshd 52214 79445 1 0 30x13 vpgetty 65809 151855 1 77 30x100090 poll dhclient 69118 177062 1 0 30x80 poll dhclient 74448 202290 83042 1000 30x100083 ttyin ksh 19904 489750 99750 0 30x93 wait perl 99750 481138 83042 1000 30x10008b pause ksh 83042 493811 1 1000 30x100080 kqreadtmux 60522 271756 1 0 30x100010 vpcron 49702 312331 1110 30x100090 poll sndiod 81577 93 1 99 30x100090 poll sndiod 3727 393780 19238 95 30x100092 kqreadsmtpd 88472 480120 19238103 30x100092 kqreadsmtpd 87719 273623 19238 95 30x100092 kqreadsmtpd 8252 227745 19238 95 30x100092 kqreadsmtpd 30445 236541 19238 95 30x100092 kqreadsmtpd 23622 99340 19238 95 30x100092 kqreadsmtpd 19238 382063 1 0 30x100080 kqreadsmtpd 7139 54914 1 0 30x80 selectsshd 63805 211190 0 0 3 0x14280 nfsidlnfsio 56273 341955 0 0 3 0x14280 nfsidlnfsio 13129 36316 0 0 3 0x14280 nfsidlnfsio 43056 41917 0 0 3 0x14280 nfsidlnfsio 65234 94439 1 0 30x80 poll rpc.sta
Re: fix iwm command wait vs interface reset
> Date: Mon, 8 May 2017 15:36:11 +0200 > From: Stefan Sperling > > In iwm_send_cmd(), look at the generation counter instead of the STOPPED flag > to determine whether the interface was reset while we were sleeping. The flag > will be set if the interface is still down when the task wakes up, but the > interface could already be up again in which case the flag will be cleared. > A stale task which sends an outdated command could confuse the firmware. Makes sense. The comment isn't really accurate now though... > Index: if_iwm.c > === > RCS file: /cvs/src/sys/dev/pci/if_iwm.c,v > retrieving revision 1.180 > diff -u -p -r1.180 if_iwm.c > --- if_iwm.c 8 May 2017 12:41:23 - 1.180 > +++ if_iwm.c 8 May 2017 13:04:05 - > @@ -3630,6 +3630,7 @@ iwm_send_cmd(struct iwm_softc *sc, struc > int group_id; > size_t hdrlen, datasz; > uint8_t *data; > + int generation = sc->sc_generation; > > code = hcmd->id; > async = hcmd->flags & IWM_CMD_ASYNC; > @@ -3651,7 +3652,7 @@ iwm_send_cmd(struct iwm_softc *sc, struc >* Is the hardware still available? (after e.g. above wait). >*/ > s = splnet(); > - if (sc->sc_flags & IWM_FLAG_STOPPED) { > + if (generation != sc->sc_generation) { > err = ENXIO; > goto out; > } > @@ -3764,7 +3765,6 @@ iwm_send_cmd(struct iwm_softc *sc, struc > IWM_WRITE(sc, IWM_HBUS_TARG_WRPTR, ring->qid << 8 | ring->cur); > > if (!async) { > - int generation = sc->sc_generation; > err = tsleep(desc, PCATCH, "iwmcmd", hz); > if (err == 0) { > /* if hardware is no longer up, return error */ > >
Re: uip_socket.c: issues when using sendmsg() with small send buffers and the new 6.1 control message (IP_SENDSRCADDR)
On 26/04/17(Wed) 12:29, Markert, Alexander wrote: > Hi, > > actually you are right that this issue is related to control messages and not > to the send buffer length. But the length of the control message is checked > in combination with the data to be sent in uip_socket.c: > > Let's assume we are sending data with the maximum send buffer size (e.g. > 1024). > Since clen is NOT evaluated in the first if statement, EMSGSIZE is NOT > returned. > But the second if statement becomes true, since clen is considered here. > Hence either EWOULDBLOCK (snderr) is returned or sbwait() is called. > > if ((atomic && resid > so->so_snd.sb_hiwat) || > (so->so_proto->pr_domain->dom_family != AF_LOCAL && > clen > so->so_snd.sb_hiwat)) > snderr(EMSGSIZE); > if (space < clen || > (space - clen < resid && > (atomic || space < so->so_snd.sb_lowat))) { > if ((so->so_state & SS_NBIO) || (flags & MSG_DONTWAIT)) > snderr(EWOULDBLOCK); > sbunlock(&so->so_snd); > error = sbwait(&so->so_snd); > > I attached a small example, which > - sets the sockets maximum buffer length > - enables/disables non-blocking IO and > - calls sendmsg with the maximum buffer size including a control message > (IP_SENDSRCADDR). I KNF'ed & converted the test case to our framework. I didn't include it in the existing regress/sys/netinet/sendscraddr test because I don't grok it. Alexander I used our standard license template for your test, is it ok? Vincent, Jeremie could you look at this? Index: ip_sendsrcaddr/Makefile === RCS file: ip_sendsrcaddr/Makefile diff -N ip_sendsrcaddr/Makefile --- /dev/null 1 Jan 1970 00:00:00 - +++ ip_sendsrcaddr/Makefile 8 May 2017 13:41:33 - @@ -0,0 +1,5 @@ +# $OpenBSD: Makefile,v 1.1 2017/04/30 09:03:58 mpi Exp $ + +PROG= ip_sendsrcaddr + +.include Index: ip_sendsrcaddr/ip_sendsrcaddr.c === RCS file: ip_sendsrcaddr/ip_sendsrcaddr.c diff -N ip_sendsrcaddr/ip_sendsrcaddr.c --- /dev/null 1 Jan 1970 00:00:00 - +++ ip_sendsrcaddr/ip_sendsrcaddr.c 8 May 2017 13:39:54 - @@ -0,0 +1,100 @@ +/* + * Copyright (c) 2017 Alexander Markert + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + */ + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +#define CFG_SOURCE_ADDRESS "192.168.88.135" +#define CFG_DESTINATION_ADDRESS"192.168.57.44" +#define CFG_PORT 5000 +#define CFG_SO_MAX_SEND_BUFFER 1024 + +int blocking = 0; + +int main(int argc, char* argv[]) +{ + char cmsgbuf[CMSG_SPACE(sizeof(struct in_addr))];; + struct sockaddr_in to; + char data[CFG_SO_MAX_SEND_BUFFER] = {0}; + unsigned int size = CFG_SO_MAX_SEND_BUFFER; + struct in_addr *source_address; + struct msghdr msg; + struct cmsghdr *cmsg; + struct iovec iov; + int so, bytes; + + so = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP); + if (so < 0) + err(-1, "opening socket failed"); + + if (setsockopt(so, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)) < 0) + err(-1, "adjusting socket send buffer failed"); + + if (!blocking) { + unsigned long on = 1; + + if (ioctl(so, FIONBIO, &on) < 0) + err(-1, "enabling non-blocking IO failed"); + } + + /* setup remote address */ + memset(&to, 0, sizeof(to)); + to.sin_family = AF_INET; + to.sin_addr.s_addr = inet_addr(CFG_DESTINATION_ADDRESS); + to.sin_port = htons(CFG_PORT); + + /* setup buffer to be sent */ + msg.msg_name = &to; + msg.msg_namelen = sizeof(to); + iov.iov_base = data; + iov.iov_len = sizeof(data); + msg.msg_iovlen = 1; + msg.msg_iov = &iov; + + /* setup configuration for source address */ + memset(cmsgbuf, 0, sizeof(cmsgbuf)); + msg.msg_control = cmsgbuf; + msg.msg_controllen = sizeof(cmsgbuf);
fix iwm command wait vs interface reset
In iwm_send_cmd(), look at the generation counter instead of the STOPPED flag to determine whether the interface was reset while we were sleeping. The flag will be set if the interface is still down when the task wakes up, but the interface could already be up again in which case the flag will be cleared. A stale task which sends an outdated command could confuse the firmware. Index: if_iwm.c === RCS file: /cvs/src/sys/dev/pci/if_iwm.c,v retrieving revision 1.180 diff -u -p -r1.180 if_iwm.c --- if_iwm.c8 May 2017 12:41:23 - 1.180 +++ if_iwm.c8 May 2017 13:04:05 - @@ -3630,6 +3630,7 @@ iwm_send_cmd(struct iwm_softc *sc, struc int group_id; size_t hdrlen, datasz; uint8_t *data; + int generation = sc->sc_generation; code = hcmd->id; async = hcmd->flags & IWM_CMD_ASYNC; @@ -3651,7 +3652,7 @@ iwm_send_cmd(struct iwm_softc *sc, struc * Is the hardware still available? (after e.g. above wait). */ s = splnet(); - if (sc->sc_flags & IWM_FLAG_STOPPED) { + if (generation != sc->sc_generation) { err = ENXIO; goto out; } @@ -3764,7 +3765,6 @@ iwm_send_cmd(struct iwm_softc *sc, struc IWM_WRITE(sc, IWM_HBUS_TARG_WRPTR, ring->qid << 8 | ring->cur); if (!async) { - int generation = sc->sc_generation; err = tsleep(desc, PCATCH, "iwmcmd", hz); if (err == 0) { /* if hardware is no longer up, return error */
boot: put debug printfs inside of defines
Hi, most if (debug) prints are inside of an ifdef guard, but those two apparently aren't. The amd64 efiboot for instance does only define debug if EFI_DEBUG is set, which it is not per default. So it won't build. We can either guard those checks as well or remove the EFI_DEBUG ifdef guarding the debug variable define. ok? Patrick diff --git a/sys/arch/i386/stand/libsa/pxe_net.c b/sys/arch/i386/stand/libsa/pxe_net.c index f5c8807e2f9..47bb2fa9a6e 100644 --- a/sys/arch/i386/stand/libsa/pxe_net.c +++ b/sys/arch/i386/stand/libsa/pxe_net.c @@ -168,8 +168,10 @@ net_getparams(int sock) bootp(sock); if (myip.s_addr != 0) return 0; +#ifdef NETIF_DEBUG if (debug) printf("net_getparams: BOOTP failed\n"); +#endif return EIO; } diff --git a/sys/lib/libsa/nfs.c b/sys/lib/libsa/nfs.c index e621f09adec..201043ef409 100644 --- a/sys/lib/libsa/nfs.c +++ b/sys/lib/libsa/nfs.c @@ -535,8 +535,10 @@ nfs_read(struct open_file *f, void *buf, size_t size, size_t *resid) return (errno); /* XXX - from nfs_readdata */ } if (cc == 0) { +#ifdef NFS_DEBUG if (debug) printf("nfs_read: hit EOF unexpectantly"); +#endif goto ret; } fp->off += cc;
pf: remove pfr_{sin,sin6,mask} globals
Hi, in order to reduce globals so that we can run more parts of pf in parallel, this diff removes the pfr_sin, pfr_sin6 and pfr_mask globals. Those are instead allocated on the stack. ok? Patrick diff --git a/sys/net/pf_table.c b/sys/net/pf_table.c index 708bd68cbcd..2cdff265ff5 100644 --- a/sys/net/pf_table.c +++ b/sys/net/pf_table.c @@ -140,11 +140,6 @@ struct pfr_walktree { struct pool pfr_ktable_pl; struct pool pfr_kentry_pl[PFRKE_MAX]; struct pool pfr_kcounters_pl; -struct sockaddr_in pfr_sin; -#ifdef INET6 -struct sockaddr_in6 pfr_sin6; -#endif /* INET6 */ -union sockaddr_unionpfr_mask; struct pf_addr pfr_ffaddr; int pfr_gcd(int, int); @@ -235,13 +230,6 @@ pfr_initialize(void) pool_init(&pfr_kcounters_pl, sizeof(struct pfr_kcounters), 0, IPL_SOFTNET, 0, "pfrkcounters", NULL); - pfr_sin.sin_len = sizeof(pfr_sin); - pfr_sin.sin_family = AF_INET; -#ifdef INET6 - pfr_sin6.sin6_len = sizeof(pfr_sin6); - pfr_sin6.sin6_family = AF_INET6; -#endif /* INET6 */ - memset(&pfr_ffaddr, 0xff, sizeof(pfr_ffaddr)); } @@ -1168,6 +1156,7 @@ pfr_walktree(struct radix_node *rn, void *arg, u_int id) { struct pfr_kentry *ke = (struct pfr_kentry *)rn; struct pfr_walktree *w = arg; + union sockaddr_union mask; int flags = w->pfrw_flags; switch (w->pfrw_op) { @@ -1229,21 +1218,21 @@ pfr_walktree(struct radix_node *rn, void *arg, u_int id) case AF_INET: if (w->pfrw_dyn->pfid_acnt4++ > 0) break; - pfr_prepare_network(&pfr_mask, AF_INET, ke->pfrke_net); + pfr_prepare_network(&mask, AF_INET, ke->pfrke_net); w->pfrw_dyn->pfid_addr4 = *SUNION2PF( &ke->pfrke_sa, AF_INET); w->pfrw_dyn->pfid_mask4 = *SUNION2PF( - &pfr_mask, AF_INET); + &mask, AF_INET); break; #ifdef INET6 case AF_INET6: if (w->pfrw_dyn->pfid_acnt6++ > 0) break; - pfr_prepare_network(&pfr_mask, AF_INET6, ke->pfrke_net); + pfr_prepare_network(&mask, AF_INET6, ke->pfrke_net); w->pfrw_dyn->pfid_addr6 = *SUNION2PF( &ke->pfrke_sa, AF_INET6); w->pfrw_dyn->pfid_mask6 = *SUNION2PF( - &pfr_mask, AF_INET6); + &mask, AF_INET6); break; #endif /* INET6 */ default: @@ -2091,6 +2080,10 @@ int pfr_match_addr(struct pfr_ktable *kt, struct pf_addr *a, sa_family_t af) { struct pfr_kentry *ke = NULL; + struct sockaddr_in tmp4; +#ifdef INET6 + struct sockaddr_in6 tmp6; +#endif /* INET6 */ int match; if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE) && kt->pfrkt_root != NULL) @@ -2100,13 +2093,19 @@ pfr_match_addr(struct pfr_ktable *kt, struct pf_addr *a, sa_family_t af) switch (af) { case AF_INET: - pfr_sin.sin_addr.s_addr = a->addr32[0]; - ke = (struct pfr_kentry *)rn_match(&pfr_sin, kt->pfrkt_ip4); + bzero(&tmp4, sizeof(tmp4)); + tmp4.sin_len = sizeof(tmp4); + tmp4.sin_family = AF_INET; + tmp4.sin_addr.s_addr = a->addr32[0]; + ke = (struct pfr_kentry *)rn_match(&tmp4, kt->pfrkt_ip4); break; #ifdef INET6 case AF_INET6: - bcopy(a, &pfr_sin6.sin6_addr, sizeof(pfr_sin6.sin6_addr)); - ke = (struct pfr_kentry *)rn_match(&pfr_sin6, kt->pfrkt_ip6); + bzero(&tmp6, sizeof(tmp6)); + tmp6.sin6_len = sizeof(tmp6); + tmp6.sin6_family = AF_INET6; + bcopy(a, &tmp6.sin6_addr, sizeof(tmp6.sin6_addr)); + ke = (struct pfr_kentry *)rn_match(&tmp6, kt->pfrkt_ip6); break; #endif /* INET6 */ default: @@ -2125,6 +2124,10 @@ pfr_update_stats(struct pfr_ktable *kt, struct pf_addr *a, struct pf_pdesc *pd, int op, int notrule) { struct pfr_kentry *ke = NULL; + struct sockaddr_in tmp4; +#ifdef INET6 + struct sockaddr_in6 tmp6; +#endif /* INET6 */ sa_family_t af = pd->af; u_int64_tlen = pd->tot_len; int dir_idx = (pd->dir == PF_OUT); @@ -2137,13 +2140,19 @@ pfr_update_stats(struct pfr_ktable *kt, struct pf_addr *a, struct pf_pdesc *pd, switch (af) { case AF_INET: - pfr_sin.sin_addr.s_addr = a->addr32[0]; - ke = (struct pfr_kentry *)rn_match(&pfr_sin, kt->pfrkt_ip4);
Re: routing socket panic
> From: "Ted Unangst" > Date: Sun, 07 May 2017 21:29:16 -0400 > > Ted Unangst wrote: > > Mike Belopuhov wrote: > > > > So there is something in the tree that doesn't like the mbuf packet > > > > header growth and decides to color outside the lines. > > > > > > > > > > After looking into this with Mark, he has found out that the size of > > > an mbuf structure on armv7 and hppa has exceeded MSIZE (256 bytes). > > > > woah, there's no assert to check that? here's a quickie compile time diff. > > ah, there is a CTASSERT macro. thought we had one, but didn't see it. That really should CTASSERT(MSIZE == sizeof(struct mbuf)) though. > Index: uipc_mbuf.c > === > RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v > retrieving revision 1.245 > diff -u -p -r1.245 uipc_mbuf.c > --- uipc_mbuf.c 7 Feb 2017 07:00:21 - 1.245 > +++ uipc_mbuf.c 8 May 2017 01:28:00 - > @@ -161,6 +161,8 @@ mbinit(void) > int i; > unsigned int lowbits; > > + CTASSERT(MSIZE >= sizeof(struct mbuf)); > + > m_pool_allocator.pa_pagesz = pool_allocator_multi.pa_pagesz; > > nmbclust_update(); > >
Re: mbuf padding and alignment
On Mon, May 08, 2017 at 13:38 +0200, Mark Kettenis wrote: > So the reason mikeb@'s mbuf changes caused issues is that the way we > define struct mbuf is inherently fragile because it doesn't take > structure padding into account. Adding an int64_t member to struct > pkthdr changed the alignment from 4 bytes to 8 bytes on most 32-bit > architectures (but not i386). Since struct m_hdr is only 20 bytes > this means the compiler has to add padding between struct m_hdr and > union that includes struct pkthdr in struct mbuf. But this padding is > not included in the calculation of MLEN so struct mbuf gets too big. > > FreeBSD solved the issue by adding explicit padding when !__LP64__. > But that's rather fragile. The padding would have to be reconsidered > whenever the defenition of the mbuf header changes. > > NetBSD solved this by using offsetoff() instead of sizeof() in its > definition of MLEN. But that requires the definition of a dummy > structure through a macro that isn't exactly pretty. But it should be > fairly robust. > > The diff below takes a different approach. It marks struct m_hdr as > 8-byte aligned. That has the effect of padding out the struct to be a > multiple of 8 bytes (because otherways the elements of arrays of > struct m_hdr wouldn't be properly aligned). This makes us rely on > compiler extensions (C99 doesn't include the possibility to specify > alignment), but all compilers we care of provide such an extension and > it is nicely abstracted in . That somewhat matters since > exposes the struct mbuf definition to userland. This > approach should be robus (until somebody decides they absolutely need > a 128-bit type in their packet header). > The userland exposure is basically limited to libkvm. I don't see it as a problem. > Fun detail; gcc on armv7 crashes if I use __aligned(sizeof(int64_t)) > instead of __aligned(8). The armv7 kernel boots fine with this diff, > but I'm defenitely doing a full build before I even consider > committing this. > > Thoughts? > If this solves the issue for these archictures and there are no objections from the crowd, I'm OK with the diff. I don't have an issue with __aligned on structs. We can also add a CTASSERT directly into the mbuf.h. > > Index: sys/mbuf.h > === > RCS file: /cvs/src/sys/sys/mbuf.h,v > retrieving revision 1.226 > diff -u -p -r1.226 mbuf.h > --- sys/mbuf.h7 May 2017 17:53:30 - 1.226 > +++ sys/mbuf.h8 May 2017 11:11:09 - > @@ -86,7 +86,7 @@ struct m_hdr { > u_int mh_len; /* amount of data in this mbuf */ > short mh_type;/* type of data in this mbuf */ > u_short mh_flags; /* flags; see below */ > -}; > +} __aligned(8); > > /* pf stuff */ > struct pf_state_key; > @@ -122,6 +122,7 @@ struct pkthdr_pf { > struct pkthdr { > void*ph_cookie; /* additional data */ > SLIST_HEAD(, m_tag) ph_tags; /* list of packet tags */ > + int64_t ph_timestamp; /* packet timestamp */ > int len; /* total packet length */ > u_int16_tph_tagsset;/* mtags attached */ > u_int16_tph_flowid; /* pseudo unique flow id */
Re: splsoftnet() -> NET_ASSERT_LOCKED()
On Mon, May 08, 2017 at 11:04:56AM +0200, Martin Pieuchot wrote: > With my previous diff to remove pfctlinput() false positive we can now > turn the last splsoftnet() into an assert. > > pfctlinput() is only used on OpenBSD for PRC_REDIRECT_HOST. It is > always called during the input path, so with the NET_LOCK() held. > > ok? OK bluhm@ > > Index: kern/uipc_domain.c > === > RCS file: /cvs/src/sys/kern/uipc_domain.c,v > retrieving revision 1.49 > diff -u -p -r1.49 uipc_domain.c > --- kern/uipc_domain.c27 Feb 2017 19:16:56 - 1.49 > +++ kern/uipc_domain.c8 May 2017 08:57:03 - > @@ -221,15 +221,15 @@ pfctlinput(int cmd, struct sockaddr *sa) > { > struct domain *dp; > struct protosw *pr; > - int i, s; > + int i; > + > + NET_ASSERT_LOCKED(); > > - s = splsoftnet(); > for (i = 0; (dp = domains[i]) != NULL; i++) { > for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++) > if (pr->pr_ctlinput) > (*pr->pr_ctlinput)(cmd, sa, 0, NULL); > } > - splx(s); > } > > void
Re: Kill useless pfctlinput()
On Mon, May 08, 2017 at 10:55:04AM +0200, Martin Pieuchot wrote: > This diff removes a false positive from bluhm@'s lock analyser. This is > the last piece to turn the NET_LOCK() on. > > pfctlinput(PRC_IFDOWN, ...) is a noop. None of the *_ctlinput() handler > present in the kernel handle PRC_IFDOWN. And all of do nothing because > inet{,6}ctlerrmap == 0. > > The two other chunks are for cleanup purposes. > > ok? OK bluhm@ > > Index: net/if.c > === > RCS file: /cvs/src/sys/net/if.c,v > retrieving revision 1.494 > diff -u -p -r1.494 if.c > --- net/if.c 4 May 2017 15:00:24 - 1.494 > +++ net/if.c 8 May 2017 08:47:10 - > @@ -1504,15 +1504,10 @@ if_downall(void) > void > if_down(struct ifnet *ifp) > { > - struct ifaddr *ifa; > - > splsoftassert(IPL_SOFTNET); > > ifp->if_flags &= ~IFF_UP; > getmicrotime(&ifp->if_lastchange); > - TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) { > - pfctlinput(PRC_IFDOWN, ifa->ifa_addr); > - } > IFQ_PURGE(&ifp->if_snd); > > if_linkstate(ifp); > Index: netinet6/ip6_output.c > === > RCS file: /cvs/src/sys/netinet6/ip6_output.c,v > retrieving revision 1.228 > diff -u -p -r1.228 ip6_output.c > --- netinet6/ip6_output.c 3 May 2017 08:35:55 - 1.228 > +++ netinet6/ip6_output.c 8 May 2017 08:46:28 - > @@ -708,15 +708,6 @@ reroute: > if (mtu > IPV6_MAXPACKET) > mtu = IPV6_MAXPACKET; > > -#if 0 > - /* Notify a proper path MTU to applications. */ > - mtu32 = (u_int32_t)mtu; > - bzero(&ip6cp, sizeof(ip6cp)); > - ip6cp.ip6c_cmdarg = (void *)&mtu32; > - pfctlinput2(PRC_MSGSIZE, sin6tosa(&ro_pmtu->ro_dst), > - (void *)&ip6cp); > -#endif > - > /* >* Change the next header field of the last header in the >* unfragmentable part. > Index: netinet6/nd6.c > === > RCS file: /cvs/src/sys/netinet6/nd6.c,v > retrieving revision 1.207 > diff -u -p -r1.207 nd6.c > --- netinet6/nd6.c26 Mar 2017 08:49:22 - 1.207 > +++ netinet6/nd6.c8 May 2017 08:46:21 - > @@ -748,10 +748,6 @@ nd6_free(struct rtentry *rt, int gc) > > splsoftassert(IPL_SOFTNET); > > - /* > - * we used to have pfctlinput(PRC_HOSTDEAD) here. > - * even though it is not harmful, it was not really necessary. > - */ > ifp = if_get(rt->rt_ifidx); > > if (!ip6_forwarding) {
mbuf padding and alignment
So the reason mikeb@'s mbuf changes caused issues is that the way we define struct mbuf is inherently fragile because it doesn't take structure padding into account. Adding an int64_t member to struct pkthdr changed the alignment from 4 bytes to 8 bytes on most 32-bit architectures (but not i386). Since struct m_hdr is only 20 bytes this means the compiler has to add padding between struct m_hdr and union that includes struct pkthdr in struct mbuf. But this padding is not included in the calculation of MLEN so struct mbuf gets too big. FreeBSD solved the issue by adding explicit padding when !__LP64__. But that's rather fragile. The padding would have to be reconsidered whenever the defenition of the mbuf header changes. NetBSD solved this by using offsetoff() instead of sizeof() in its definition of MLEN. But that requires the definition of a dummy structure through a macro that isn't exactly pretty. But it should be fairly robust. The diff below takes a different approach. It marks struct m_hdr as 8-byte aligned. That has the effect of padding out the struct to be a multiple of 8 bytes (because otherways the elements of arrays of struct m_hdr wouldn't be properly aligned). This makes us rely on compiler extensions (C99 doesn't include the possibility to specify alignment), but all compilers we care of provide such an extension and it is nicely abstracted in . That somewhat matters since exposes the struct mbuf definition to userland. This approach should be robus (until somebody decides they absolutely need a 128-bit type in their packet header). Fun detail; gcc on armv7 crashes if I use __aligned(sizeof(int64_t)) instead of __aligned(8). The armv7 kernel boots fine with this diff, but I'm defenitely doing a full build before I even consider committing this. Thoughts? Index: sys/mbuf.h === RCS file: /cvs/src/sys/sys/mbuf.h,v retrieving revision 1.226 diff -u -p -r1.226 mbuf.h --- sys/mbuf.h 7 May 2017 17:53:30 - 1.226 +++ sys/mbuf.h 8 May 2017 11:11:09 - @@ -86,7 +86,7 @@ struct m_hdr { u_int mh_len; /* amount of data in this mbuf */ short mh_type;/* type of data in this mbuf */ u_short mh_flags; /* flags; see below */ -}; +} __aligned(8); /* pf stuff */ struct pf_state_key; @@ -122,6 +122,7 @@ struct pkthdr_pf { struct pkthdr { void*ph_cookie; /* additional data */ SLIST_HEAD(, m_tag) ph_tags; /* list of packet tags */ + int64_t ph_timestamp; /* packet timestamp */ int len; /* total packet length */ u_int16_tph_tagsset;/* mtags attached */ u_int16_tph_flowid; /* pseudo unique flow id */
relayd: incomplete response from a TLS-accelerated apache
Hey, I investigate a problem were TLS-asselerated machine response is incomplete. I was able to reproduce this on OpenBSD 5.9, 6.0 and 6.1. Test on 5.8 is about to be. Following env I have: relay1: relayd machine web1: apache 2.2.31 serving the request client1: requester relay1 is configured following way (relevant lines): http protocol http_relay { tcp { nodelay, sack, socket buffer 65536, backlog 1024 } match header append "X-Forwarded-For" value "$REMOTE_ADDR" match header set "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT" match header set "Keep-Alive" value "$TIMEOUT" match request header remove "Proxy" } http protocol tls_accel { tcp { nodelay, sack, socket buffer 65536, backlog 1024 } match header append "X-Forwarded-For" value "$REMOTE_ADDR" match header set "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT" match header set "X-Forwarded-Proto" value "https" match header set "X-Forwarded-Port" value "443" match header set "Keep-Alive" value "$TIMEOUT" match request header remove "Proxy" tls { tlsv1, \ ciphers "AES:!AES256:!aNULL" \ } } table { 172.16.1.111 } relay int_test_tls { listen on 172.16.1.99 port 443 tls protocol tls_accel forward to port 80 mode roundrobin check http "/" code 200 } relay int_test_http { listen on 172.16.1.99 port 80 protocol http_relay forward to port 80 mode roundrobin check http "/" code 200 } web1 is a std Apache 2.2.31 with enabled deflate for the following AddOutputFilterByType DEFLATE application/json AddOutputFilterByType DEFLATE text/html AddOutputFilterByType DEFLATE text/plain AddOutputFilterByType DEFLATE text/xml AddOutputFilterByType DEFLATE text/css AddOutputFilterByType DEFLATE application/x-javascript AddOutputFilterByType DEFLATE application/javascript and serving a JS file. client1 is running PHP code from CLI to reproduce this problem. Following is observed: 1. Client1 requests web1 directly on port 80 and gets full response shell$ php client3.php Expected length: 547204 Received length: 547204 [Response Headers] HTTP/1.1 200 OK Date: Mon, 08 May 2017 11:08:27 GMT Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips Last-Modified: Mon, 08 May 2017 07:22:43 GMT ETag: "60319-85984-54efe1ae42be3" Accept-Ranges: bytes Content-Length: 547204 Vary: Accept-Encoding Connection: close Content-Type: application/javascript 2. Client1 requests web1 directly on port 80 WITH GZIP enabled and gets full response back I see gzipped stream on the screen and then it gets decoded to a complete file. File I get is not cut. Expected length: Content-Length not recieved Received length: 165454 [Response Headers] HTTP/1.1 200 OK Date: Mon, 08 May 2017 11:10:18 GMT Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips Last-Modified: Mon, 08 May 2017 07:22:43 GMT ETag: "60319-85984-54efe1ae42be3" Accept-Ranges: bytes Vary: Accept-Encoding Content-Encoding: gzip Connection: close Content-Type: application/javascript 3. and 4. Clien1 requests relay1 on port 80 (with and without GZIP) and gets complete response 5. Client1 requests relay1 on port 443 without GZIP - response is incomplete Expected length: 547204 Received length: 396424 [Response Headers] HTTP/1.1 200 OK Accept-Ranges: bytes Connection: close Content-Length: 547204 Content-Type: application/javascript Date: Mon, 08 May 2017 11:14:59 GMT ETag: "60319-85984-54efe1ae42be3" Last-Modified: Mon, 08 May 2017 07:22:43 GMT Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips Vary: Accept-Encoding 6. Client1 requests relay1 on port 443 with GZIP - response is complete. So non-gzipped response from behind the relay1 is incomplete while doing TLS termination. Files server.js and client.php can be provided upon request. Any ideas? Br
Re: pf: percpu anchor stacks
On 28/03/17(Tue) 13:02, Alexandr Nedvedicky wrote: > [...] > > > > - s/test_status/action/ as it's done everywhere else? > > I've opted to test_status, because it's something different to 'action' > as we use it in current code. I agree with you for test_status. What about naming the enum and use it instead of 'int' for the variable? This implicitly documents the possible option and allow the compiler to check for missing cases in switch. > > Does this pass pfctl regress tests? > > I'm about to run those tests for OpenBSD. Did you manage to do that? > > While I haven't noticed anything criminal here, it makes me > > wonder if it'd be possible to do this change in a few steps: > > factor out rule maching from pf_test_rule and then bring in > > anchor changes? > > > > if I understand you right, you basically want me to make change > in two steps: > > the first step splits current pf_test_rule() to pf_match_rule() and > pf_test_rule() > > the second step will kill global anchor stack array by introducing > a true recursion. The patch will remove pf_step_out_of_anchor() > function. > > I think I can do it. And also as Theo pointed out there is no rush > to get that patch to tree. Now *is* the time to commit the first step, the refactoring. Once that's done we can discuss the introduction of the context. Could you come up with such diff? Cheers, Martin
splsoftnet() -> NET_ASSERT_LOCKED()
With my previous diff to remove pfctlinput() false positive we can now turn the last splsoftnet() into an assert. pfctlinput() is only used on OpenBSD for PRC_REDIRECT_HOST. It is always called during the input path, so with the NET_LOCK() held. ok? Index: kern/uipc_domain.c === RCS file: /cvs/src/sys/kern/uipc_domain.c,v retrieving revision 1.49 diff -u -p -r1.49 uipc_domain.c --- kern/uipc_domain.c 27 Feb 2017 19:16:56 - 1.49 +++ kern/uipc_domain.c 8 May 2017 08:57:03 - @@ -221,15 +221,15 @@ pfctlinput(int cmd, struct sockaddr *sa) { struct domain *dp; struct protosw *pr; - int i, s; + int i; + + NET_ASSERT_LOCKED(); - s = splsoftnet(); for (i = 0; (dp = domains[i]) != NULL; i++) { for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++) if (pr->pr_ctlinput) (*pr->pr_ctlinput)(cmd, sa, 0, NULL); } - splx(s); } void
Kill useless pfctlinput()
This diff removes a false positive from bluhm@'s lock analyser. This is the last piece to turn the NET_LOCK() on. pfctlinput(PRC_IFDOWN, ...) is a noop. None of the *_ctlinput() handler present in the kernel handle PRC_IFDOWN. And all of do nothing because inet{,6}ctlerrmap == 0. The two other chunks are for cleanup purposes. ok? Index: net/if.c === RCS file: /cvs/src/sys/net/if.c,v retrieving revision 1.494 diff -u -p -r1.494 if.c --- net/if.c4 May 2017 15:00:24 - 1.494 +++ net/if.c8 May 2017 08:47:10 - @@ -1504,15 +1504,10 @@ if_downall(void) void if_down(struct ifnet *ifp) { - struct ifaddr *ifa; - splsoftassert(IPL_SOFTNET); ifp->if_flags &= ~IFF_UP; getmicrotime(&ifp->if_lastchange); - TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) { - pfctlinput(PRC_IFDOWN, ifa->ifa_addr); - } IFQ_PURGE(&ifp->if_snd); if_linkstate(ifp); Index: netinet6/ip6_output.c === RCS file: /cvs/src/sys/netinet6/ip6_output.c,v retrieving revision 1.228 diff -u -p -r1.228 ip6_output.c --- netinet6/ip6_output.c 3 May 2017 08:35:55 - 1.228 +++ netinet6/ip6_output.c 8 May 2017 08:46:28 - @@ -708,15 +708,6 @@ reroute: if (mtu > IPV6_MAXPACKET) mtu = IPV6_MAXPACKET; -#if 0 - /* Notify a proper path MTU to applications. */ - mtu32 = (u_int32_t)mtu; - bzero(&ip6cp, sizeof(ip6cp)); - ip6cp.ip6c_cmdarg = (void *)&mtu32; - pfctlinput2(PRC_MSGSIZE, sin6tosa(&ro_pmtu->ro_dst), - (void *)&ip6cp); -#endif - /* * Change the next header field of the last header in the * unfragmentable part. Index: netinet6/nd6.c === RCS file: /cvs/src/sys/netinet6/nd6.c,v retrieving revision 1.207 diff -u -p -r1.207 nd6.c --- netinet6/nd6.c 26 Mar 2017 08:49:22 - 1.207 +++ netinet6/nd6.c 8 May 2017 08:46:21 - @@ -748,10 +748,6 @@ nd6_free(struct rtentry *rt, int gc) splsoftassert(IPL_SOFTNET); - /* -* we used to have pfctlinput(PRC_HOSTDEAD) here. -* even though it is not harmful, it was not really necessary. -*/ ifp = if_get(rt->rt_ifidx); if (!ip6_forwarding) {