[patch] Fail execve on environment duplicates

2017-05-08 Thread Matthew Martin
Duplicate environment variables have been a source of bugs, so perhaps
the kernel should fail an execve with an invalid envp. Diff below checks
that each environment string contains a '=' and that there are no
duplicates up to the equals sign.

So far I haven't noticed any breakage except for a purposefully
misbehaving program to test that execve will fail.

EINVAL gives a somewhat confusing error message, but nothing in intro(2)
looked like a better choice.

Idea from Martin Brandenburg.

- Matthew Martin



diff --git kern_exec.c kern_exec.c
index 7784d5f4165..3c3f0aa6266 100644
--- kern_exec.c
+++ kern_exec.c
@@ -239,10 +239,8 @@ sys_execve(struct proc *p, void *v, register_t *retval)
struct vattr attr;
struct ucred *cred = p->p_ucred;
char *argp;
-   char * const *cpp, *dp, *sp;
-#ifdef KTRACE
+   char * const *cpp, *dp, *ep, *op, *sp;
char *env_start;
-#endif
struct process *pr = p->p_p;
long argc, envc;
size_t len, sgap;
@@ -357,9 +355,7 @@ sys_execve(struct proc *p, void *v, register_t *retval)
envc = 0;
/* environment does not need to be there */
if ((cpp = SCARG(uap, envp)) != NULL ) {
-#ifdef KTRACE
env_start = dp;
-#endif
while (1) {
len = argp + ARG_MAX - dp;
if ((error = copyin(cpp, &sp, sizeof(sp))) != 0)
@@ -371,6 +367,19 @@ sys_execve(struct proc *p, void *v, register_t *retval)
error = E2BIG;
goto bad;
}
+
+   /* Check for duplicate environment variables */
+   if ((ep = strchr(dp, '=')) == NULL) {
+   error = EINVAL;
+   goto bad;
+   }
+   for (op = env_start; op != dp; op = strchr(op, '\0') + 
1) {
+   if (strncmp(dp, op, (ep - dp + 1)) == 0) {
+   error = EINVAL;
+   goto bad;
+   }
+   }
+
dp += len;
cpp++;
envc++;



Re: pf: remove pfr_{sin,sin6,mask} globals

2017-05-08 Thread Alexandr Nedvedicky
Hello Patrick,

your diff looks OK to me.

thanks and
regards
sasha

On Mon, May 08, 2017 at 02:56:55PM +0200, Patrick Wildt wrote:
> Hi,
> 
> in order to reduce globals so that we can run more parts of pf in
> parallel, this diff removes the pfr_sin, pfr_sin6 and pfr_mask
> globals.  Those are instead allocated on the stack.
> 
> ok?
> 



Re: ocspcheck size_t printing

2017-05-08 Thread Bob Beck
You are correct. 

Patch committed.  Thanks!

-Bob


On Mon, May 08, 2017 at 08:20:57PM +0200, Jonas 'Sortie' Termansen wrote:
> Hi,
> 
> When upgrading to libressl-2.5.4 I noticed a couple -Wformat errors due
> to this code assuming size_t is of type long when it was actually int on
> this 32-bit system. Here's a patch against cvs that fixes the issue and
> also prints the variableas unsigned type.
> 
> Jonas
> 
> Index: ocspcheck.c
> ===
> RCS file: /cvs/src/usr.sbin/ocspcheck/ocspcheck.c,v
> retrieving revision 1.20
> diff -u -r1.20 ocspcheck.c
> --- ocspcheck.c   27 Mar 2017 23:59:08 -  1.20
> +++ ocspcheck.c   8 May 2017 18:15:51 -
> @@ -564,7 +564,7 @@
>   if ((request = ocsp_request_new_from_cert(certfile, nonce)) == NULL)
>   exit(1);
>  
> - dspew("Built an %ld byte ocsp request\n", request->size);
> + dspew("Built an %zu byte ocsp request\n", request->size);
>  
>   if ((host = url2host(request->url, &port, &path)) == NULL)
>   errx(1, "Invalid OCSP url %s from %s", request->url,
> @@ -601,7 +601,7 @@
>   dspew("Server at %s returns:\n", host);
>   for (i = 0; i < httphsz; i++)
>   dspew("   [%s]=[%s]\n", httph[i].key, httph[i].val);
> - dspew("   [Body]=[%ld bytes]\n", hget->bodypartsz);
> + dspew("   [Body]=[%zu bytes]\n", hget->bodypartsz);
>   if (hget->bodypartsz <= 0)
>   errx(1, "No body in reply from %s", host);
>  
> 



Re: relayd: incomplete response from a TLS-accelerated apache

2017-05-08 Thread Maxim Bourmistrov

Compiling relayd with -DDEBUG=3 and watching the output gave me nothing.
No errors what so ever about out of buffers or something else.

However, removing 'socket buffer 65536’ solved my problem.

Br

> 8 maj 2017 kl. 13:27 skrev Maxim Bourmistrov :
> 
> Hey,
> I investigate a problem were TLS-asselerated machine response is incomplete.
> I was able to reproduce this on OpenBSD 5.9, 6.0 and 6.1. Test on 5.8 is 
> about to be.
> 
> Following env I have:
> 
> relay1: relayd machine
> web1: apache 2.2.31 serving the request
> client1: requester
> 
> relay1 is configured following way (relevant lines):
> 
> http protocol http_relay {
>tcp { nodelay, sack, socket buffer 65536, backlog 1024 }
>match header append "X-Forwarded-For" value "$REMOTE_ADDR"
>match header set "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
>match header set "Keep-Alive" value "$TIMEOUT"
>match request header remove "Proxy"
> }
> 
> http protocol tls_accel {
>tcp { nodelay, sack, socket buffer 65536, backlog 1024 }
>match header append "X-Forwarded-For" value "$REMOTE_ADDR"
>match header set "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
>match header set "X-Forwarded-Proto" value "https"
>match header set "X-Forwarded-Port" value "443"
>match header set "Keep-Alive" value "$TIMEOUT"
>match request header remove "Proxy"
> 
>tls { tlsv1, \
>ciphers "AES:!AES256:!aNULL" \
>  }
> }
> 
> table  { 172.16.1.111 }
> 
> relay int_test_tls {
>listen on 172.16.1.99 port 443 tls
>protocol tls_accel
>forward to  port 80 mode roundrobin check http "/" code 200
> }
> 
> relay int_test_http {
>listen on 172.16.1.99 port 80
>protocol http_relay
>forward to  port 80 mode roundrobin check http "/" code 200
> }
> 
> web1 is a std Apache 2.2.31 with enabled deflate for the following
> 
> AddOutputFilterByType DEFLATE application/json
> AddOutputFilterByType DEFLATE text/html
> AddOutputFilterByType DEFLATE text/plain
> AddOutputFilterByType DEFLATE text/xml
> AddOutputFilterByType DEFLATE text/css
> AddOutputFilterByType DEFLATE application/x-javascript
> AddOutputFilterByType DEFLATE application/javascript
> 
> and serving a JS file.
> 
> client1 is running PHP code from CLI to reproduce this problem.
> 
> 
> Following is observed:
> 
> 1. Client1 requests web1 directly on port 80 and gets full response
> 
> shell$ php client3.php
> Expected length: 547204
> Received length: 547204
> 
> [Response Headers]
> HTTP/1.1 200 OK
> Date: Mon, 08 May 2017 11:08:27 GMT
> Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips
> Last-Modified: Mon, 08 May 2017 07:22:43 GMT
> ETag: "60319-85984-54efe1ae42be3"
> Accept-Ranges: bytes
> Content-Length: 547204
> Vary: Accept-Encoding
> Connection: close
> Content-Type: application/javascript 
> 
> 2. Client1 requests web1 directly on port 80 WITH GZIP enabled and gets full 
> response back
> I see gzipped stream on the screen and then it gets decoded to a complete 
> file. File I get is not cut.
> 
> Expected length: Content-Length not recieved
> Received length: 165454
> 
> [Response Headers]
> HTTP/1.1 200 OK
> Date: Mon, 08 May 2017 11:10:18 GMT
> Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips
> Last-Modified: Mon, 08 May 2017 07:22:43 GMT
> ETag: "60319-85984-54efe1ae42be3"
> Accept-Ranges: bytes
> Vary: Accept-Encoding
> Content-Encoding: gzip
> Connection: close
> Content-Type: application/javascript
> 
> 3. and 4. Clien1 requests relay1 on port 80 (with and without GZIP) and gets 
> complete response
> 
> 5. Client1 requests relay1 on port 443 without GZIP - response is incomplete
> 
> Expected length: 547204
> Received length: 396424
> 
> [Response Headers]
> HTTP/1.1 200 OK
> Accept-Ranges: bytes
> Connection: close
> Content-Length: 547204
> Content-Type: application/javascript
> Date: Mon, 08 May 2017 11:14:59 GMT
> ETag: "60319-85984-54efe1ae42be3"
> Last-Modified: Mon, 08 May 2017 07:22:43 GMT
> Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips
> Vary: Accept-Encoding
> 
> 6. Client1 requests relay1 on port 443 with GZIP - response is complete.
> 
> 
> So non-gzipped response from behind the relay1 is incomplete while doing TLS 
> termination.
> Files server.js and client.php can be provided upon request.
> 
> Any ideas?
> 
> Br
> 
> 
> 



OpenBSD Errata: May 8th, 2017 (libssl)

2017-05-08 Thread T.J. Townsend
Errata patches for libssl have been released for OpenBSD 6.1 and 6.0.

Incorrect DTLS cookie handling can result in a NULL pointer dereference.

Binary updates for the amd64 and i386 platforms are available via the
syspatch utility. Source code patches can be found on the respective
errata pages:

  https://www.openbsd.org/errata60.html
  https://www.openbsd.org/errata61.html

If none of your server-side applications use DTLS, this issue does not
affect you.



Re: Correct capable baseband in dmesg for RTL8188E chips

2017-05-08 Thread Ricardo Mestre
In that case I'd rather just have a general else like you said.

I'll commit it in a bit if you don't beat me to it :) thanks!

On 19:27 Mon 08 May , Stefan Sperling wrote:
> On Mon, May 08, 2017 at 05:35:26PM +0100, Ricardo Mestre wrote:
> > Hi tech@
> > 
> > During stsp@'s effort to merge rtwn(4) and urtwn(4), more specifically since
> > r1.6 of /cvs/src/sys/dev/ic/rtwn.c, my urtwn(4) device started showing in 
> > dmesg
> > with a capable baseband of 0T0R (only noticed it today!):
> > 
> > urtwn0: MAC/BB RTL8188EU, RF 6052 0T0R, address xx:xx:xx:xx:xx:xx
> > 
> > Since according to urtwn(4)'s manpage both RTL8188C and RTL8188E chips have 
> > a
> > 1T1R capable baseband please find below a patch to correct this.
> > 
> > OK?
> 
> 8192c is the only MIMO chip supported by this driver. All other chips
> are 1T1R ones. We could make the 1T1R case a general 'else' clause
> instead of an 'else if', as in:
> 
> - } else if (sc->chip & RTWN_CHIP_88C) {
> + } else {
> 
> But either way, OK by me. Thank you for hunting this down.



IPv6 IPsec transport pf

2017-05-08 Thread Alexander Bluhm
Hi,

IPv6 IPsec transport mode does not work if pf is enabled.  The
problem is that the decrypted packets in the input path are not
checked with pf(4).  So if you have stateful filtering on enc0 (the
default) direction aware protocols like ping or TCP do not pass.
Only the output packets are matched with the states.

Adding an explicit pf_test() in ipsec_common_input_cb() fixes this.
In the IPv4 case the decrypted packet is enqueued again, so it hits
pf_test() in IPv4_input().  In IPv6 the ip6_local() shortcut is
taken.  I like to keep the shortcut as it corresponds to the idea
of IPv6 header chains.

ok?

bluhm

Index: netinet/ipsec_input.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ipsec_input.c,v
retrieving revision 1.148
diff -u -p -r1.148 ipsec_input.c
--- netinet/ipsec_input.c   5 May 2017 11:04:18 -   1.148
+++ netinet/ipsec_input.c   8 May 2017 16:44:31 -
@@ -591,6 +591,25 @@ ipsec_common_input_cb(struct mbuf *m, st
return;
 #ifdef INET6
case AF_INET6:
+#if NPF > 0
+   if ((tdbp->tdb_flags & TDBF_TUNNELING) == 0) {
+   struct ifnet *ifp;
+
+   /* This is the enc0 interface unless for ipcomp. */
+   if ((ifp = if_get(m->m_pkthdr.ph_ifidx)) == NULL) {
+   m_freem(m);
+   return;
+   }
+   if (pf_test(AF_INET6, PF_IN, ifp, &m) != PF_PASS) {
+   if_put(ifp);
+   m_freem(m);
+   return;
+   }
+   if_put(ifp);
+   if (m == NULL)
+   return;
+   }
+#endif
ip6_local(m, skip, prot);
return;
 #endif /* INET6 */



ocspcheck size_t printing

2017-05-08 Thread Jonas 'Sortie' Termansen
Hi,

When upgrading to libressl-2.5.4 I noticed a couple -Wformat errors due
to this code assuming size_t is of type long when it was actually int on
this 32-bit system. Here's a patch against cvs that fixes the issue and
also prints the variableas unsigned type.

Jonas

Index: ocspcheck.c
===
RCS file: /cvs/src/usr.sbin/ocspcheck/ocspcheck.c,v
retrieving revision 1.20
diff -u -r1.20 ocspcheck.c
--- ocspcheck.c 27 Mar 2017 23:59:08 -  1.20
+++ ocspcheck.c 8 May 2017 18:15:51 -
@@ -564,7 +564,7 @@
if ((request = ocsp_request_new_from_cert(certfile, nonce)) == NULL)
exit(1);
 
-   dspew("Built an %ld byte ocsp request\n", request->size);
+   dspew("Built an %zu byte ocsp request\n", request->size);
 
if ((host = url2host(request->url, &port, &path)) == NULL)
errx(1, "Invalid OCSP url %s from %s", request->url,
@@ -601,7 +601,7 @@
dspew("Server at %s returns:\n", host);
for (i = 0; i < httphsz; i++)
dspew("   [%s]=[%s]\n", httph[i].key, httph[i].val);
-   dspew("   [Body]=[%ld bytes]\n", hget->bodypartsz);
+   dspew("   [Body]=[%zu bytes]\n", hget->bodypartsz);
if (hget->bodypartsz <= 0)
errx(1, "No body in reply from %s", host);
 



Re: Correct capable baseband in dmesg for RTL8188E chips

2017-05-08 Thread Stefan Sperling
On Mon, May 08, 2017 at 05:35:26PM +0100, Ricardo Mestre wrote:
> Hi tech@
> 
> During stsp@'s effort to merge rtwn(4) and urtwn(4), more specifically since
> r1.6 of /cvs/src/sys/dev/ic/rtwn.c, my urtwn(4) device started showing in 
> dmesg
> with a capable baseband of 0T0R (only noticed it today!):
> 
> urtwn0: MAC/BB RTL8188EU, RF 6052 0T0R, address xx:xx:xx:xx:xx:xx
> 
> Since according to urtwn(4)'s manpage both RTL8188C and RTL8188E chips have a
> 1T1R capable baseband please find below a patch to correct this.
> 
> OK?

8192c is the only MIMO chip supported by this driver. All other chips
are 1T1R ones. We could make the 1T1R case a general 'else' clause
instead of an 'else if', as in:

-   } else if (sc->chip & RTWN_CHIP_88C) {
+   } else {

But either way, OK by me. Thank you for hunting this down.

> Index: rtwn.c
> ===
> RCS file: /cvs/src/sys/dev/ic/rtwn.c,v
> retrieving revision 1.19
> diff -u -p -u -r1.19 rtwn.c
> --- rtwn.c12 Feb 2017 01:01:39 -  1.19
> +++ rtwn.c8 May 2017 16:16:57 -
> @@ -154,7 +154,7 @@ rtwn_attach(struct device *pdev, struct 
>   if (sc->chip & RTWN_CHIP_92C) {
>   sc->ntxchains = (sc->chip & RTWN_CHIP_92C_1T2R) ? 1 : 2;
>   sc->nrxchains = 2;
> - } else if (sc->chip & RTWN_CHIP_88C) {
> + } else if (sc->chip & RTWN_CHIP_88C || sc->chip & RTWN_CHIP_88E) {
>   sc->ntxchains = 1;
>   sc->nrxchains = 1;
>   }
> 



Correct capable baseband in dmesg for RTL8188E chips

2017-05-08 Thread Ricardo Mestre
Hi tech@

During stsp@'s effort to merge rtwn(4) and urtwn(4), more specifically since
r1.6 of /cvs/src/sys/dev/ic/rtwn.c, my urtwn(4) device started showing in dmesg
with a capable baseband of 0T0R (only noticed it today!):

urtwn0: MAC/BB RTL8188EU, RF 6052 0T0R, address xx:xx:xx:xx:xx:xx

Since according to urtwn(4)'s manpage both RTL8188C and RTL8188E chips have a
1T1R capable baseband please find below a patch to correct this.

OK?

Index: rtwn.c
===
RCS file: /cvs/src/sys/dev/ic/rtwn.c,v
retrieving revision 1.19
diff -u -p -u -r1.19 rtwn.c
--- rtwn.c  12 Feb 2017 01:01:39 -  1.19
+++ rtwn.c  8 May 2017 16:16:57 -
@@ -154,7 +154,7 @@ rtwn_attach(struct device *pdev, struct 
if (sc->chip & RTWN_CHIP_92C) {
sc->ntxchains = (sc->chip & RTWN_CHIP_92C_1T2R) ? 1 : 2;
sc->nrxchains = 2;
-   } else if (sc->chip & RTWN_CHIP_88C) {
+   } else if (sc->chip & RTWN_CHIP_88C || sc->chip & RTWN_CHIP_88E) {
sc->ntxchains = 1;
sc->nrxchains = 1;
}



ipv6 mapped address output

2017-05-08 Thread Alexander Bluhm
Hi,

Checking for IPv4 mapped addresses is a bit inconsistent in the
output path.  So I would like to:

- Use the common switch(af) construct for af specific code in
  tcp_usrreq(PRU_CONNECT).
- Add a EAFNOSUPPORT default case.
- Do not check for mapped addresses, this is done in in6_pcbconnect().

- Do not check for locally bound mapped addresses in
  in6_pcbconnect(), this is done during bind(2) in in6_pcbaddrisavail().

- Check for mapped addesses in rip6_output() like it is done
  in udp6_output().
- Move the EAFNOSUPPORT error from rip6_usrreq() to rip6_output()
  like it is done for UDP.

- Return EADDRNOTAVAIL if UDP sendto(2) is used with a mapped address.

ok?

bluhm

Index: netinet/tcp_usrreq.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_usrreq.c,v
retrieving revision 1.147
diff -u -p -r1.147 tcp_usrreq.c
--- netinet/tcp_usrreq.c5 Apr 2017 13:35:18 -   1.147
+++ netinet/tcp_usrreq.c8 May 2017 13:23:50 -
@@ -127,7 +127,6 @@ int
 tcp_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
 struct mbuf *control, struct proc *p)
 {
-   struct sockaddr_in *sin;
struct inpcb *inp;
struct tcpcb *tp = NULL;
int error = 0;
@@ -221,33 +220,40 @@ tcp_usrreq(struct socket *so, int req, s
 * Send initial segment on connection.
 */
case PRU_CONNECT:
-   sin = mtod(nam, struct sockaddr_in *);
-
-#ifdef INET6
-   if (sin->sin_family == AF_INET6) {
-   struct in6_addr *in6_addr = &mtod(nam,
-   struct sockaddr_in6 *)->sin6_addr;
-
-   if (IN6_IS_ADDR_UNSPECIFIED(in6_addr) ||
-   IN6_IS_ADDR_MULTICAST(in6_addr) ||
-   IN6_IS_ADDR_V4MAPPED(in6_addr)) {
+   switch (mtod(nam, struct sockaddr *)->sa_family) {
+   case AF_INET: {
+   struct in_addr *addr =
+   &mtod(nam, struct sockaddr_in *)->sin_addr;
+
+   if ((addr->s_addr == INADDR_ANY) ||
+   (addr->s_addr == INADDR_BROADCAST) ||
+   IN_MULTICAST(addr->s_addr) ||
+   in_broadcast(*addr, inp->inp_rtableid)) {
error = EINVAL;
break;
}
 
-   error = in6_pcbconnect(inp, nam);
-   } else if (sin->sin_family == AF_INET)
-#endif /* INET6 */
-   {
-   if ((sin->sin_addr.s_addr == INADDR_ANY) ||
-   (sin->sin_addr.s_addr == INADDR_BROADCAST) ||
-   IN_MULTICAST(sin->sin_addr.s_addr) ||
-   in_broadcast(sin->sin_addr, inp->inp_rtableid)) {
+   error = in_pcbconnect(inp, nam);
+   break;
+   }
+#ifdef INET6
+   case AF_INET6: {
+   struct in6_addr *addr6 =
+   &mtod(nam, struct sockaddr_in6 *)->sin6_addr;
+
+   if (IN6_IS_ADDR_UNSPECIFIED(addr6) ||
+   IN6_IS_ADDR_MULTICAST(addr6)) {
error = EINVAL;
break;
}
 
-   error = in_pcbconnect(inp, nam);
+   error = in6_pcbconnect(inp, nam);
+   break;
+   }
+#endif /* INET6 */
+   default:
+   error = EAFNOSUPPORT;
+   break;
}
 
if (error)
Index: netinet6/in6_pcb.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/in6_pcb.c,v
retrieving revision 1.97
diff -u -p -r1.97 in6_pcb.c
--- netinet6/in6_pcb.c  7 Mar 2017 16:59:40 -   1.97
+++ netinet6/in6_pcb.c  8 May 2017 13:23:50 -
@@ -256,14 +256,9 @@ in6_pcbconnect(struct inpcb *inp, struct
return (EAFNOSUPPORT);
if (sin6->sin6_port == 0)
return (EADDRNOTAVAIL);
-
/* reject IPv4 mapped address, we have no support for it */
if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
-   return EADDRNOTAVAIL;
-
-   /* sanity check for mapped address case */
-   if (IN6_IS_ADDR_V4MAPPED(&inp->inp_laddr6))
-   return EINVAL;
+   return (EADDRNOTAVAIL);
 
/* protect *sin6 from overwrites */
tmp = *sin6;
Index: netinet6/raw_ip6.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/raw_ip6.c,v
retrieving revision 1.113
diff -u -p -r1.113 raw_ip6.c
--- netinet6/raw_ip6.c  8 May 2017 08:46:39 -   1.113
+++ netinet6/raw_ip6.c  8 May 2017 14:16:27 -
@@ -343,7 +343,6 @

Re: uip_socket.c: issues when using sendmsg() with small send buffers and the new 6.1 control message (IP_SENDSRCADDR)

2017-05-08 Thread Alexander Bluhm
On Fri, Apr 21, 2017 at 08:43:11AM +, Markert, Alexander wrote:
> In our opinion either EMSGSIZE should be returned instead in this case (like 
> e.g. FreeBSD 11.0 does) or OpenBSD should reserve some space (comparable to 
> MSG_OOB) in addition to the maximum size of the socket's send buffer for the 
> control messages in order to avoid such problems.
> 
> What do you think about that behavior and the described alternatives?

I think EMSGSIZE if send space is too small, would be the way to
go.  On the reveive path also some bytes of space must be reseverd
by the user program, if control messages are used.  This behavior
should be symmetric for the send path.

bluhm



Re: fix iwm command wait vs interface reset

2017-05-08 Thread Stefan Sperling
On Mon, May 08, 2017 at 03:54:38PM +0200, Mark Kettenis wrote:
> > Date: Mon, 8 May 2017 15:36:11 +0200
> > From: Stefan Sperling 
> > 
> > In iwm_send_cmd(), look at the generation counter instead of the STOPPED 
> > flag
> > to determine whether the interface was reset while we were sleeping. The 
> > flag
> > will be set if the interface is still down when the task wakes up, but the
> > interface could already be up again in which case the flag will be cleared.
> > A stale task which sends an outdated command could confuse the firmware.
> 
> Makes sense.  The comment isn't really accurate now though...

Yes, the comments could be better. I'll eventually do another pass over the
driver just fixing stylistic issues such as this (probably during d2k17).



Re: arm64 lock: no userland progress, several procs in wchan "vp"

2017-05-08 Thread Stuart Henderson
On 2017/05/04 00:31, Mark Kettenis wrote:
> > Date: Wed, 3 May 2017 21:05:24 +0100
> > From: Stuart Henderson 
> > 
> > On 2017/05/03 15:12, Mark Kettenis wrote:
> > > > Date: Wed, 3 May 2017 13:51:22 +0100
> > > > From: Stuart Henderson 
> > > > 
> > > > On 2017/05/01 22:18, Mark Kettenis wrote:
> > > > > > Date: Mon, 1 May 2017 20:58:29 +0100
> > > > > > From: Stuart Henderson 
> > > > > > 
> > > > > > Userland is non-responsive, machine is pingable, tcp connections 
> > > > > > open
> > > > > > but no banner from ssh. No failed pool requests. This kernel is from
> > > > > > today's snapshot but I saw the same with one from a couple of days
> > > > > > ago.  Is there anything else I can get that might be useful?
> > > > > > 
> > > > ..
> > > > > >  71034  186155  65198  0  30x11  vpperl
> > > > ..
> > > > > 
> > > > > The diff below might fix thise.  Or it might actually turn this into a
> > > > > hard hang...
> > > > > 
> > > > > Nevertheless, could you try running with it?
> > > > 
> > > > I haven't seen this happen again with your diff, and haven't seen any
> > > > hangs. Probably still too early to say for sure that it fixes things,
> > > > but it looks promising so far.
> > > 
> > > Thanks.  Since Dale ok'ed it and I had been running with it for a
> > > while already, I committed it last night.
> > > 
> > 
> > Ha. As is traditional, not long after sending that message I've hit
> > a hard lock - no DDB.
> 
> I'm sure it wouldn't have happened if I hand't committed it ;).
> 
> Could you change the PR_NOWAIT back into P_WAITOK and see if you can
> reproduce the hang and break into ddb?  Meanwhile I'll think about
> what information to print once you've hit it ;).
> 

I've hit it:

[halt sent]
Stopped at  pluart_intr+0x164:
ddb> ps   
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
   625  437040  78755 55  3 0x2  vpcc 
 78755  385259   7516 55  30x82  wait  cc
  7516  136325   5903 55  30x82  wait  bash
 19959  322579  19904  0  30x11  vpperl
  5903  139810  38260 55  30x82  wait  gmake
 38260  218618   2854 55  30x82  wait  bash 
  2854   90787   9324 55  30x82  wait  gmake
  9324  170231  48868 55  30x82  wait  gmake
 48868  117527  14396 55  30x80  wait  bash 
 14396  515415   1643 55  30x82  wait  bash
  1643  416463  68987 55  30x82  wait  gmake
 68987  461754   3515 55  30x82  wait  bash 
  3515  418174  60280 55  30x82  wait  gmake
 60280  191595  96060 55  30x10008a  pause sh   
 96060  102351  39653 55  30x10008a  pause sh
 39653  519956  69874 55  30x10008a  pause make
 69874  105490  67943 55  30x10008a  pause make
 679438933   3490 55  30x10008a  pause sh  
  3490   32658  91100 55  30x10008a  pause make
 91100  232620   7725 55  30x10008a  pause sh  
  7725  345849  19904 55  30x10008a  pause make
 74896  137698  47027   1000  30x100083  kqreadtmux
 47027  296497  97822   1000  30x10008b  pause ksh 
 97822  245510  69858   1000  30x10  vpsshd
 69858  304169   7139  0  30x92  poll  sshd
 52214   79445  1  0  30x13  vpgetty
 65809  151855  1 77  30x100090  poll  dhclient
 69118  177062  1  0  30x80  poll  dhclient
 74448  202290  83042   1000  30x100083  ttyin ksh 
 19904  489750  99750  0  30x93  wait  perl
 99750  481138  83042   1000  30x10008b  pause ksh 
 83042  493811  1   1000  30x100080  kqreadtmux
 60522  271756  1  0  30x100010  vpcron
 49702  312331  1110  30x100090  poll  sndiod
 81577  93  1 99  30x100090  poll  sndiod
  3727  393780  19238 95  30x100092  kqreadsmtpd 
 88472  480120  19238103  30x100092  kqreadsmtpd
 87719  273623  19238 95  30x100092  kqreadsmtpd
  8252  227745  19238 95  30x100092  kqreadsmtpd
 30445  236541  19238 95  30x100092  kqreadsmtpd
 23622   99340  19238 95  30x100092  kqreadsmtpd
 19238  382063  1  0  30x100080  kqreadsmtpd
  7139   54914  1  0  30x80  selectsshd 
 63805  211190  0  0  3 0x14280  nfsidlnfsio
 56273  341955  0  0  3 0x14280  nfsidlnfsio
 13129   36316  0  0  3 0x14280  nfsidlnfsio
 43056   41917  0  0  3 0x14280  nfsidlnfsio
 65234   94439  1  0  30x80  poll  rpc.sta

Re: fix iwm command wait vs interface reset

2017-05-08 Thread Mark Kettenis
> Date: Mon, 8 May 2017 15:36:11 +0200
> From: Stefan Sperling 
> 
> In iwm_send_cmd(), look at the generation counter instead of the STOPPED flag
> to determine whether the interface was reset while we were sleeping. The flag
> will be set if the interface is still down when the task wakes up, but the
> interface could already be up again in which case the flag will be cleared.
> A stale task which sends an outdated command could confuse the firmware.

Makes sense.  The comment isn't really accurate now though...

> Index: if_iwm.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_iwm.c,v
> retrieving revision 1.180
> diff -u -p -r1.180 if_iwm.c
> --- if_iwm.c  8 May 2017 12:41:23 -   1.180
> +++ if_iwm.c  8 May 2017 13:04:05 -
> @@ -3630,6 +3630,7 @@ iwm_send_cmd(struct iwm_softc *sc, struc
>   int group_id;
>   size_t hdrlen, datasz;
>   uint8_t *data;
> + int generation = sc->sc_generation;
>  
>   code = hcmd->id;
>   async = hcmd->flags & IWM_CMD_ASYNC;
> @@ -3651,7 +3652,7 @@ iwm_send_cmd(struct iwm_softc *sc, struc
>* Is the hardware still available?  (after e.g. above wait).
>*/
>   s = splnet();
> - if (sc->sc_flags & IWM_FLAG_STOPPED) {
> + if (generation != sc->sc_generation) {
>   err = ENXIO;
>   goto out;
>   }
> @@ -3764,7 +3765,6 @@ iwm_send_cmd(struct iwm_softc *sc, struc
>   IWM_WRITE(sc, IWM_HBUS_TARG_WRPTR, ring->qid << 8 | ring->cur);
>  
>   if (!async) {
> - int generation = sc->sc_generation;
>   err = tsleep(desc, PCATCH, "iwmcmd", hz);
>   if (err == 0) {
>   /* if hardware is no longer up, return error */
> 
> 



Re: uip_socket.c: issues when using sendmsg() with small send buffers and the new 6.1 control message (IP_SENDSRCADDR)

2017-05-08 Thread Martin Pieuchot
On 26/04/17(Wed) 12:29, Markert, Alexander wrote:
> Hi,
> 
> actually you are right that this issue is related to control messages and not 
> to the send buffer length. But the length of the control message is checked 
> in combination with the data to be sent in uip_socket.c:
> 
> Let's assume we are sending data with the maximum send buffer size (e.g. 
> 1024).
> Since clen is NOT evaluated in the first if statement, EMSGSIZE is NOT 
> returned.
> But the second if statement becomes true, since clen is considered here. 
> Hence either EWOULDBLOCK (snderr) is returned or sbwait() is called.
> 
>   if ((atomic && resid > so->so_snd.sb_hiwat) ||
>   (so->so_proto->pr_domain->dom_family != AF_LOCAL &&
>   clen > so->so_snd.sb_hiwat))
>   snderr(EMSGSIZE);
>   if (space < clen ||
>   (space - clen < resid &&
>   (atomic || space < so->so_snd.sb_lowat))) {
>   if ((so->so_state & SS_NBIO) || (flags & MSG_DONTWAIT))
>   snderr(EWOULDBLOCK);
>   sbunlock(&so->so_snd);
>   error = sbwait(&so->so_snd);
> 
> I attached a small example, which
> - sets the sockets maximum buffer length
> - enables/disables non-blocking IO and
> - calls sendmsg with the maximum buffer size including a control message 
> (IP_SENDSRCADDR).

I KNF'ed & converted the test case to our framework.  I didn't include
it in the existing regress/sys/netinet/sendscraddr test because I don't
grok it.

Alexander I used our standard license template for your test, is it ok?

Vincent, Jeremie could you look at this?

Index: ip_sendsrcaddr/Makefile
===
RCS file: ip_sendsrcaddr/Makefile
diff -N ip_sendsrcaddr/Makefile
--- /dev/null   1 Jan 1970 00:00:00 -
+++ ip_sendsrcaddr/Makefile 8 May 2017 13:41:33 -
@@ -0,0 +1,5 @@
+#  $OpenBSD: Makefile,v 1.1 2017/04/30 09:03:58 mpi Exp $
+
+PROG=  ip_sendsrcaddr
+
+.include 
Index: ip_sendsrcaddr/ip_sendsrcaddr.c
===
RCS file: ip_sendsrcaddr/ip_sendsrcaddr.c
diff -N ip_sendsrcaddr/ip_sendsrcaddr.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ ip_sendsrcaddr/ip_sendsrcaddr.c 8 May 2017 13:39:54 -
@@ -0,0 +1,100 @@
+/*
+ * Copyright (c) 2017 Alexander Markert 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define CFG_SOURCE_ADDRESS "192.168.88.135"
+#define CFG_DESTINATION_ADDRESS"192.168.57.44"
+#define CFG_PORT   5000
+#define CFG_SO_MAX_SEND_BUFFER 1024
+
+int blocking = 0;
+
+int main(int argc, char* argv[])
+{
+   char cmsgbuf[CMSG_SPACE(sizeof(struct in_addr))];;
+   struct sockaddr_in to;
+   char data[CFG_SO_MAX_SEND_BUFFER] = {0};
+   unsigned int size = CFG_SO_MAX_SEND_BUFFER;
+   struct in_addr *source_address;
+   struct msghdr msg;
+   struct cmsghdr *cmsg;
+   struct iovec iov;
+   int so, bytes;
+
+   so = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
+   if (so < 0)
+   err(-1, "opening socket failed");
+
+   if (setsockopt(so, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)) < 0)
+   err(-1, "adjusting socket send buffer failed");
+
+   if (!blocking) {
+   unsigned long on = 1;
+
+   if (ioctl(so, FIONBIO, &on) < 0)
+   err(-1, "enabling non-blocking IO failed");
+   }
+
+   /* setup remote address */
+   memset(&to, 0, sizeof(to));
+   to.sin_family = AF_INET;
+   to.sin_addr.s_addr = inet_addr(CFG_DESTINATION_ADDRESS);
+   to.sin_port = htons(CFG_PORT);
+
+   /* setup buffer to be sent */
+   msg.msg_name = &to;
+   msg.msg_namelen = sizeof(to);
+   iov.iov_base = data;
+   iov.iov_len = sizeof(data);
+   msg.msg_iovlen = 1;
+   msg.msg_iov = &iov;
+
+   /* setup configuration for source address */
+   memset(cmsgbuf, 0, sizeof(cmsgbuf));
+   msg.msg_control = cmsgbuf;
+   msg.msg_controllen = sizeof(cmsgbuf);

fix iwm command wait vs interface reset

2017-05-08 Thread Stefan Sperling
In iwm_send_cmd(), look at the generation counter instead of the STOPPED flag
to determine whether the interface was reset while we were sleeping. The flag
will be set if the interface is still down when the task wakes up, but the
interface could already be up again in which case the flag will be cleared.
A stale task which sends an outdated command could confuse the firmware.

Index: if_iwm.c
===
RCS file: /cvs/src/sys/dev/pci/if_iwm.c,v
retrieving revision 1.180
diff -u -p -r1.180 if_iwm.c
--- if_iwm.c8 May 2017 12:41:23 -   1.180
+++ if_iwm.c8 May 2017 13:04:05 -
@@ -3630,6 +3630,7 @@ iwm_send_cmd(struct iwm_softc *sc, struc
int group_id;
size_t hdrlen, datasz;
uint8_t *data;
+   int generation = sc->sc_generation;
 
code = hcmd->id;
async = hcmd->flags & IWM_CMD_ASYNC;
@@ -3651,7 +3652,7 @@ iwm_send_cmd(struct iwm_softc *sc, struc
 * Is the hardware still available?  (after e.g. above wait).
 */
s = splnet();
-   if (sc->sc_flags & IWM_FLAG_STOPPED) {
+   if (generation != sc->sc_generation) {
err = ENXIO;
goto out;
}
@@ -3764,7 +3765,6 @@ iwm_send_cmd(struct iwm_softc *sc, struc
IWM_WRITE(sc, IWM_HBUS_TARG_WRPTR, ring->qid << 8 | ring->cur);
 
if (!async) {
-   int generation = sc->sc_generation;
err = tsleep(desc, PCATCH, "iwmcmd", hz);
if (err == 0) {
/* if hardware is no longer up, return error */



boot: put debug printfs inside of defines

2017-05-08 Thread Patrick Wildt
Hi,

most if (debug) prints are inside of an ifdef guard, but those two
apparently aren't.  The amd64 efiboot for instance does only define
debug if EFI_DEBUG is set, which it is not per default.  So it won't
build.

We can either guard those checks as well or remove the EFI_DEBUG ifdef
guarding the debug variable define.

ok?

Patrick

diff --git a/sys/arch/i386/stand/libsa/pxe_net.c 
b/sys/arch/i386/stand/libsa/pxe_net.c
index f5c8807e2f9..47bb2fa9a6e 100644
--- a/sys/arch/i386/stand/libsa/pxe_net.c
+++ b/sys/arch/i386/stand/libsa/pxe_net.c
@@ -168,8 +168,10 @@ net_getparams(int sock)
bootp(sock);
if (myip.s_addr != 0)
return 0;
+#ifdef NETIF_DEBUG
if (debug)
printf("net_getparams: BOOTP failed\n");
+#endif
 
return EIO;
 }
diff --git a/sys/lib/libsa/nfs.c b/sys/lib/libsa/nfs.c
index e621f09adec..201043ef409 100644
--- a/sys/lib/libsa/nfs.c
+++ b/sys/lib/libsa/nfs.c
@@ -535,8 +535,10 @@ nfs_read(struct open_file *f, void *buf, size_t size, 
size_t *resid)
return (errno); /* XXX - from nfs_readdata */
}
if (cc == 0) {
+#ifdef NFS_DEBUG
if (debug)
printf("nfs_read: hit EOF unexpectantly");
+#endif
goto ret;
}
fp->off += cc;



pf: remove pfr_{sin,sin6,mask} globals

2017-05-08 Thread Patrick Wildt
Hi,

in order to reduce globals so that we can run more parts of pf in
parallel, this diff removes the pfr_sin, pfr_sin6 and pfr_mask
globals.  Those are instead allocated on the stack.

ok?

Patrick

diff --git a/sys/net/pf_table.c b/sys/net/pf_table.c
index 708bd68cbcd..2cdff265ff5 100644
--- a/sys/net/pf_table.c
+++ b/sys/net/pf_table.c
@@ -140,11 +140,6 @@ struct pfr_walktree {
 struct pool pfr_ktable_pl;
 struct pool pfr_kentry_pl[PFRKE_MAX];
 struct pool pfr_kcounters_pl;
-struct sockaddr_in  pfr_sin;
-#ifdef INET6
-struct sockaddr_in6 pfr_sin6;
-#endif /* INET6 */
-union sockaddr_unionpfr_mask;
 struct pf_addr  pfr_ffaddr;
 
 int pfr_gcd(int, int);
@@ -235,13 +230,6 @@ pfr_initialize(void)
pool_init(&pfr_kcounters_pl, sizeof(struct pfr_kcounters),
0, IPL_SOFTNET, 0, "pfrkcounters", NULL);
 
-   pfr_sin.sin_len = sizeof(pfr_sin);
-   pfr_sin.sin_family = AF_INET;
-#ifdef INET6
-   pfr_sin6.sin6_len = sizeof(pfr_sin6);
-   pfr_sin6.sin6_family = AF_INET6;
-#endif /* INET6 */
-
memset(&pfr_ffaddr, 0xff, sizeof(pfr_ffaddr));
 }
 
@@ -1168,6 +1156,7 @@ pfr_walktree(struct radix_node *rn, void *arg, u_int id)
 {
struct pfr_kentry   *ke = (struct pfr_kentry *)rn;
struct pfr_walktree *w = arg;
+   union sockaddr_union mask;
int  flags = w->pfrw_flags;
 
switch (w->pfrw_op) {
@@ -1229,21 +1218,21 @@ pfr_walktree(struct radix_node *rn, void *arg, u_int id)
case AF_INET:
if (w->pfrw_dyn->pfid_acnt4++ > 0)
break;
-   pfr_prepare_network(&pfr_mask, AF_INET, ke->pfrke_net);
+   pfr_prepare_network(&mask, AF_INET, ke->pfrke_net);
w->pfrw_dyn->pfid_addr4 = *SUNION2PF(
&ke->pfrke_sa, AF_INET);
w->pfrw_dyn->pfid_mask4 = *SUNION2PF(
-   &pfr_mask, AF_INET);
+   &mask, AF_INET);
break;
 #ifdef INET6
case AF_INET6:
if (w->pfrw_dyn->pfid_acnt6++ > 0)
break;
-   pfr_prepare_network(&pfr_mask, AF_INET6, ke->pfrke_net);
+   pfr_prepare_network(&mask, AF_INET6, ke->pfrke_net);
w->pfrw_dyn->pfid_addr6 = *SUNION2PF(
&ke->pfrke_sa, AF_INET6);
w->pfrw_dyn->pfid_mask6 = *SUNION2PF(
-   &pfr_mask, AF_INET6);
+   &mask, AF_INET6);
break;
 #endif /* INET6 */
default:
@@ -2091,6 +2080,10 @@ int
 pfr_match_addr(struct pfr_ktable *kt, struct pf_addr *a, sa_family_t af)
 {
struct pfr_kentry   *ke = NULL;
+   struct sockaddr_in   tmp4;
+#ifdef INET6
+   struct sockaddr_in6  tmp6;
+#endif /* INET6 */
int  match;
 
if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE) && kt->pfrkt_root != NULL)
@@ -2100,13 +2093,19 @@ pfr_match_addr(struct pfr_ktable *kt, struct pf_addr 
*a, sa_family_t af)
 
switch (af) {
case AF_INET:
-   pfr_sin.sin_addr.s_addr = a->addr32[0];
-   ke = (struct pfr_kentry *)rn_match(&pfr_sin, kt->pfrkt_ip4);
+   bzero(&tmp4, sizeof(tmp4));
+   tmp4.sin_len = sizeof(tmp4);
+   tmp4.sin_family = AF_INET;
+   tmp4.sin_addr.s_addr = a->addr32[0];
+   ke = (struct pfr_kentry *)rn_match(&tmp4, kt->pfrkt_ip4);
break;
 #ifdef INET6
case AF_INET6:
-   bcopy(a, &pfr_sin6.sin6_addr, sizeof(pfr_sin6.sin6_addr));
-   ke = (struct pfr_kentry *)rn_match(&pfr_sin6, kt->pfrkt_ip6);
+   bzero(&tmp6, sizeof(tmp6));
+   tmp6.sin6_len = sizeof(tmp6);
+   tmp6.sin6_family = AF_INET6;
+   bcopy(a, &tmp6.sin6_addr, sizeof(tmp6.sin6_addr));
+   ke = (struct pfr_kentry *)rn_match(&tmp6, kt->pfrkt_ip6);
break;
 #endif /* INET6 */
default:
@@ -2125,6 +2124,10 @@ pfr_update_stats(struct pfr_ktable *kt, struct pf_addr 
*a, struct pf_pdesc *pd,
 int op, int notrule)
 {
struct pfr_kentry   *ke = NULL;
+   struct sockaddr_in   tmp4;
+#ifdef INET6
+   struct sockaddr_in6  tmp6;
+#endif /* INET6 */
sa_family_t  af = pd->af;
u_int64_tlen = pd->tot_len;
int  dir_idx = (pd->dir == PF_OUT);
@@ -2137,13 +2140,19 @@ pfr_update_stats(struct pfr_ktable *kt, struct pf_addr 
*a, struct pf_pdesc *pd,
 
switch (af) {
case AF_INET:
-   pfr_sin.sin_addr.s_addr = a->addr32[0];
-   ke = (struct pfr_kentry *)rn_match(&pfr_sin, kt->pfrkt_ip4);

Re: routing socket panic

2017-05-08 Thread Mark Kettenis
> From: "Ted Unangst" 
> Date: Sun, 07 May 2017 21:29:16 -0400
> 
> Ted Unangst wrote:
> > Mike Belopuhov wrote:
> > > > So there is something in the tree that doesn't like the mbuf packet
> > > > header growth and decides to color outside the lines.
> > > > 
> > > 
> > > After looking into this with Mark, he has found out that the size of
> > > an mbuf structure on armv7 and hppa has exceeded MSIZE (256 bytes).
> > 
> > woah, there's no assert to check that? here's a quickie compile time diff.
> 
> ah, there is a CTASSERT macro. thought we had one, but didn't see it.

That really should CTASSERT(MSIZE == sizeof(struct mbuf)) though.

> Index: uipc_mbuf.c
> ===
> RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
> retrieving revision 1.245
> diff -u -p -r1.245 uipc_mbuf.c
> --- uipc_mbuf.c   7 Feb 2017 07:00:21 -   1.245
> +++ uipc_mbuf.c   8 May 2017 01:28:00 -
> @@ -161,6 +161,8 @@ mbinit(void)
>   int i;
>   unsigned int lowbits;
>  
> + CTASSERT(MSIZE >= sizeof(struct mbuf));
> +
>   m_pool_allocator.pa_pagesz = pool_allocator_multi.pa_pagesz;
>  
>   nmbclust_update();
> 
> 



Re: mbuf padding and alignment

2017-05-08 Thread Mike Belopuhov
On Mon, May 08, 2017 at 13:38 +0200, Mark Kettenis wrote:
> So the reason mikeb@'s mbuf changes caused issues is that the way we
> define struct mbuf is inherently fragile because it doesn't take
> structure padding into account.  Adding an int64_t member to struct
> pkthdr changed the alignment from 4 bytes to 8 bytes on most 32-bit
> architectures (but not i386).  Since struct m_hdr is only 20 bytes
> this means the compiler has to add padding between struct m_hdr and
> union that includes struct pkthdr in struct mbuf.  But this padding is
> not included in the calculation of MLEN so struct mbuf gets too big.
> 
> FreeBSD solved the issue by adding explicit padding when !__LP64__.
> But that's rather fragile.  The padding would have to be reconsidered
> whenever the defenition of the mbuf header changes.
> 
> NetBSD solved this by using offsetoff() instead of sizeof() in its
> definition of MLEN.  But that requires the definition of a dummy
> structure through a macro that isn't exactly pretty.  But it should be
> fairly robust.
> 
> The diff below takes a different approach.  It marks struct m_hdr as
> 8-byte aligned.  That has the effect of padding out the struct to be a
> multiple of 8 bytes (because otherways the elements of arrays of
> struct m_hdr wouldn't be properly aligned).  This makes us rely on
> compiler extensions (C99 doesn't include the possibility to specify
> alignment), but all compilers we care of provide such an extension and
> it is nicely abstracted in .  That somewhat matters since
>  exposes the struct mbuf definition to userland.  This
> approach should be robus (until somebody decides they absolutely need
> a 128-bit type in their packet header).
>

The userland exposure is basically limited to libkvm.  I don't see
it as a problem.

> Fun detail; gcc on armv7 crashes if I use __aligned(sizeof(int64_t))
> instead of __aligned(8).  The armv7 kernel boots fine with this diff,
> but I'm defenitely doing a full build before I even consider
> committing this.
> 
> Thoughts?
>

If this solves the issue for these archictures and there are no
objections from the crowd, I'm OK with the diff.  I don't have
an issue with __aligned on structs.

We can also add a CTASSERT directly into the mbuf.h.

> 
> Index: sys/mbuf.h
> ===
> RCS file: /cvs/src/sys/sys/mbuf.h,v
> retrieving revision 1.226
> diff -u -p -r1.226 mbuf.h
> --- sys/mbuf.h7 May 2017 17:53:30 -   1.226
> +++ sys/mbuf.h8 May 2017 11:11:09 -
> @@ -86,7 +86,7 @@ struct m_hdr {
>   u_int   mh_len; /* amount of data in this mbuf */
>   short   mh_type;/* type of data in this mbuf */
>   u_short mh_flags;   /* flags; see below */
> -};
> +} __aligned(8);
>  
>  /* pf stuff */
>  struct pf_state_key;
> @@ -122,6 +122,7 @@ struct pkthdr_pf {
>  struct   pkthdr {
>   void*ph_cookie; /* additional data */
>   SLIST_HEAD(, m_tag)  ph_tags;   /* list of packet tags */
> + int64_t  ph_timestamp;  /* packet timestamp */
>   int  len;   /* total packet length */
>   u_int16_tph_tagsset;/* mtags attached */
>   u_int16_tph_flowid; /* pseudo unique flow id */



Re: splsoftnet() -> NET_ASSERT_LOCKED()

2017-05-08 Thread Alexander Bluhm
On Mon, May 08, 2017 at 11:04:56AM +0200, Martin Pieuchot wrote:
> With my previous diff to remove pfctlinput() false positive we can now
> turn the last splsoftnet() into an assert.
> 
> pfctlinput() is only used on OpenBSD for PRC_REDIRECT_HOST.  It is
> always called during the input path, so with the NET_LOCK() held.
> 
> ok?

OK bluhm@

> 
> Index: kern/uipc_domain.c
> ===
> RCS file: /cvs/src/sys/kern/uipc_domain.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 uipc_domain.c
> --- kern/uipc_domain.c27 Feb 2017 19:16:56 -  1.49
> +++ kern/uipc_domain.c8 May 2017 08:57:03 -
> @@ -221,15 +221,15 @@ pfctlinput(int cmd, struct sockaddr *sa)
>  {
>   struct domain *dp;
>   struct protosw *pr;
> - int i, s;
> + int i;
> +
> + NET_ASSERT_LOCKED();
>  
> - s = splsoftnet();
>   for (i = 0; (dp = domains[i]) != NULL; i++) {
>   for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++)
>   if (pr->pr_ctlinput)
>   (*pr->pr_ctlinput)(cmd, sa, 0, NULL);
>   }
> - splx(s);
>  }
>  
>  void



Re: Kill useless pfctlinput()

2017-05-08 Thread Alexander Bluhm
On Mon, May 08, 2017 at 10:55:04AM +0200, Martin Pieuchot wrote:
> This diff removes a false positive from bluhm@'s lock analyser.  This is
> the last piece to turn the NET_LOCK() on.
> 
> pfctlinput(PRC_IFDOWN, ...) is a noop.  None of the *_ctlinput() handler
> present in the kernel handle PRC_IFDOWN.  And all of do nothing because
> inet{,6}ctlerrmap == 0.
> 
> The two other chunks are for cleanup purposes.
> 
> ok?

OK bluhm@

> 
> Index: net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.494
> diff -u -p -r1.494 if.c
> --- net/if.c  4 May 2017 15:00:24 -   1.494
> +++ net/if.c  8 May 2017 08:47:10 -
> @@ -1504,15 +1504,10 @@ if_downall(void)
>  void
>  if_down(struct ifnet *ifp)
>  {
> - struct ifaddr *ifa;
> -
>   splsoftassert(IPL_SOFTNET);
>  
>   ifp->if_flags &= ~IFF_UP;
>   getmicrotime(&ifp->if_lastchange);
> - TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) {
> - pfctlinput(PRC_IFDOWN, ifa->ifa_addr);
> - }
>   IFQ_PURGE(&ifp->if_snd);
>  
>   if_linkstate(ifp);
> Index: netinet6/ip6_output.c
> ===
> RCS file: /cvs/src/sys/netinet6/ip6_output.c,v
> retrieving revision 1.228
> diff -u -p -r1.228 ip6_output.c
> --- netinet6/ip6_output.c 3 May 2017 08:35:55 -   1.228
> +++ netinet6/ip6_output.c 8 May 2017 08:46:28 -
> @@ -708,15 +708,6 @@ reroute:
>   if (mtu > IPV6_MAXPACKET)
>   mtu = IPV6_MAXPACKET;
>  
> -#if 0
> - /* Notify a proper path MTU to applications. */
> - mtu32 = (u_int32_t)mtu;
> - bzero(&ip6cp, sizeof(ip6cp));
> - ip6cp.ip6c_cmdarg = (void *)&mtu32;
> - pfctlinput2(PRC_MSGSIZE, sin6tosa(&ro_pmtu->ro_dst),
> - (void *)&ip6cp);
> -#endif
> -
>   /*
>* Change the next header field of the last header in the
>* unfragmentable part.
> Index: netinet6/nd6.c
> ===
> RCS file: /cvs/src/sys/netinet6/nd6.c,v
> retrieving revision 1.207
> diff -u -p -r1.207 nd6.c
> --- netinet6/nd6.c26 Mar 2017 08:49:22 -  1.207
> +++ netinet6/nd6.c8 May 2017 08:46:21 -
> @@ -748,10 +748,6 @@ nd6_free(struct rtentry *rt, int gc)
>  
>   splsoftassert(IPL_SOFTNET);
>  
> - /*
> -  * we used to have pfctlinput(PRC_HOSTDEAD) here.
> -  * even though it is not harmful, it was not really necessary.
> -  */
>   ifp = if_get(rt->rt_ifidx);
>  
>   if (!ip6_forwarding) {



mbuf padding and alignment

2017-05-08 Thread Mark Kettenis
So the reason mikeb@'s mbuf changes caused issues is that the way we
define struct mbuf is inherently fragile because it doesn't take
structure padding into account.  Adding an int64_t member to struct
pkthdr changed the alignment from 4 bytes to 8 bytes on most 32-bit
architectures (but not i386).  Since struct m_hdr is only 20 bytes
this means the compiler has to add padding between struct m_hdr and
union that includes struct pkthdr in struct mbuf.  But this padding is
not included in the calculation of MLEN so struct mbuf gets too big.

FreeBSD solved the issue by adding explicit padding when !__LP64__.
But that's rather fragile.  The padding would have to be reconsidered
whenever the defenition of the mbuf header changes.

NetBSD solved this by using offsetoff() instead of sizeof() in its
definition of MLEN.  But that requires the definition of a dummy
structure through a macro that isn't exactly pretty.  But it should be
fairly robust.

The diff below takes a different approach.  It marks struct m_hdr as
8-byte aligned.  That has the effect of padding out the struct to be a
multiple of 8 bytes (because otherways the elements of arrays of
struct m_hdr wouldn't be properly aligned).  This makes us rely on
compiler extensions (C99 doesn't include the possibility to specify
alignment), but all compilers we care of provide such an extension and
it is nicely abstracted in .  That somewhat matters since
 exposes the struct mbuf definition to userland.  This
approach should be robus (until somebody decides they absolutely need
a 128-bit type in their packet header).

Fun detail; gcc on armv7 crashes if I use __aligned(sizeof(int64_t))
instead of __aligned(8).  The armv7 kernel boots fine with this diff,
but I'm defenitely doing a full build before I even consider
committing this.

Thoughts?


Index: sys/mbuf.h
===
RCS file: /cvs/src/sys/sys/mbuf.h,v
retrieving revision 1.226
diff -u -p -r1.226 mbuf.h
--- sys/mbuf.h  7 May 2017 17:53:30 -   1.226
+++ sys/mbuf.h  8 May 2017 11:11:09 -
@@ -86,7 +86,7 @@ struct m_hdr {
u_int   mh_len; /* amount of data in this mbuf */
short   mh_type;/* type of data in this mbuf */
u_short mh_flags;   /* flags; see below */
-};
+} __aligned(8);
 
 /* pf stuff */
 struct pf_state_key;
@@ -122,6 +122,7 @@ struct pkthdr_pf {
 struct pkthdr {
void*ph_cookie; /* additional data */
SLIST_HEAD(, m_tag)  ph_tags;   /* list of packet tags */
+   int64_t  ph_timestamp;  /* packet timestamp */
int  len;   /* total packet length */
u_int16_tph_tagsset;/* mtags attached */
u_int16_tph_flowid; /* pseudo unique flow id */



relayd: incomplete response from a TLS-accelerated apache

2017-05-08 Thread Maxim Bourmistrov
Hey,
I investigate a problem were TLS-asselerated machine response is incomplete.
I was able to reproduce this on OpenBSD 5.9, 6.0 and 6.1. Test on 5.8 is about 
to be.

Following env I have:

relay1: relayd machine
web1: apache 2.2.31 serving the request
client1: requester

relay1 is configured following way (relevant lines):

http protocol http_relay {
tcp { nodelay, sack, socket buffer 65536, backlog 1024 }
match header append "X-Forwarded-For" value "$REMOTE_ADDR"
match header set "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
match header set "Keep-Alive" value "$TIMEOUT"
match request header remove "Proxy"
}

http protocol tls_accel {
tcp { nodelay, sack, socket buffer 65536, backlog 1024 }
match header append "X-Forwarded-For" value "$REMOTE_ADDR"
match header set "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
match header set "X-Forwarded-Proto" value "https"
match header set "X-Forwarded-Port" value "443"
match header set "Keep-Alive" value "$TIMEOUT"
match request header remove "Proxy"

tls { tlsv1, \
ciphers "AES:!AES256:!aNULL" \
  }
}

table  { 172.16.1.111 }

relay int_test_tls {
listen on 172.16.1.99 port 443 tls
protocol tls_accel
forward to  port 80 mode roundrobin check http "/" code 200
}

relay int_test_http {
listen on 172.16.1.99 port 80
protocol http_relay
forward to  port 80 mode roundrobin check http "/" code 200
}

web1 is a std Apache 2.2.31 with enabled deflate for the following

AddOutputFilterByType DEFLATE application/json
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/x-javascript
AddOutputFilterByType DEFLATE application/javascript

and serving a JS file.

client1 is running PHP code from CLI to reproduce this problem.


Following is observed:

1. Client1 requests web1 directly on port 80 and gets full response

shell$ php client3.php
Expected length: 547204
Received length: 547204

[Response Headers]
HTTP/1.1 200 OK
Date: Mon, 08 May 2017 11:08:27 GMT
Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips
Last-Modified: Mon, 08 May 2017 07:22:43 GMT
ETag: "60319-85984-54efe1ae42be3"
Accept-Ranges: bytes
Content-Length: 547204
Vary: Accept-Encoding
Connection: close
Content-Type: application/javascript 

2. Client1 requests web1 directly on port 80 WITH GZIP enabled and gets full 
response back
I see gzipped stream on the screen and then it gets decoded to a complete file. 
File I get is not cut.

Expected length: Content-Length not recieved
Received length: 165454

[Response Headers]
HTTP/1.1 200 OK
Date: Mon, 08 May 2017 11:10:18 GMT
Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips
Last-Modified: Mon, 08 May 2017 07:22:43 GMT
ETag: "60319-85984-54efe1ae42be3"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Connection: close
Content-Type: application/javascript

3. and 4. Clien1 requests relay1 on port 80 (with and without GZIP) and gets 
complete response

5. Client1 requests relay1 on port 443 without GZIP - response is incomplete

Expected length: 547204
Received length: 396424

[Response Headers]
HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: close
Content-Length: 547204
Content-Type: application/javascript
Date: Mon, 08 May 2017 11:14:59 GMT
ETag: "60319-85984-54efe1ae42be3"
Last-Modified: Mon, 08 May 2017 07:22:43 GMT
Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips
Vary: Accept-Encoding

6. Client1 requests relay1 on port 443 with GZIP - response is complete.


So non-gzipped response from behind the relay1 is incomplete while doing TLS 
termination.
Files server.js and client.php can be provided upon request.

Any ideas?

Br





Re: pf: percpu anchor stacks

2017-05-08 Thread Martin Pieuchot
On 28/03/17(Tue) 13:02, Alexandr Nedvedicky wrote:
> [...] 
> > 
> >  - s/test_status/action/ as it's done everywhere else?
> 
> I've opted to test_status, because it's something different to 'action'
> as we use it in current code.

I agree with you for test_status.  What about naming the enum and use it
instead of 'int' for the variable?  This implicitly documents the possible
option and allow the compiler to check for missing cases in switch.

> > Does this pass pfctl regress tests?
> 
> I'm about to run those tests for OpenBSD.

Did you manage to do that?

> > While I haven't noticed anything criminal here, it makes me
> > wonder if it'd be possible to do this change in a few steps:
> > factor out rule maching from pf_test_rule and then bring in
> > anchor changes?
> > 
> 
> if I understand you right, you basically want me to make change
> in two steps:
> 
>   the first step splits current pf_test_rule() to pf_match_rule() and
>   pf_test_rule()
> 
>   the second step will kill global anchor stack array by introducing
>   a true recursion. The patch will remove pf_step_out_of_anchor()
>   function.
> 
> I think I can do it. And also as Theo pointed out there is no rush
> to get that patch to tree.

Now *is* the time to commit the first step, the refactoring.  Once
that's done we can discuss the introduction of the context.

Could you come up with such diff?

Cheers,
Martin



splsoftnet() -> NET_ASSERT_LOCKED()

2017-05-08 Thread Martin Pieuchot
With my previous diff to remove pfctlinput() false positive we can now
turn the last splsoftnet() into an assert.

pfctlinput() is only used on OpenBSD for PRC_REDIRECT_HOST.  It is
always called during the input path, so with the NET_LOCK() held.

ok?

Index: kern/uipc_domain.c
===
RCS file: /cvs/src/sys/kern/uipc_domain.c,v
retrieving revision 1.49
diff -u -p -r1.49 uipc_domain.c
--- kern/uipc_domain.c  27 Feb 2017 19:16:56 -  1.49
+++ kern/uipc_domain.c  8 May 2017 08:57:03 -
@@ -221,15 +221,15 @@ pfctlinput(int cmd, struct sockaddr *sa)
 {
struct domain *dp;
struct protosw *pr;
-   int i, s;
+   int i;
+
+   NET_ASSERT_LOCKED();
 
-   s = splsoftnet();
for (i = 0; (dp = domains[i]) != NULL; i++) {
for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++)
if (pr->pr_ctlinput)
(*pr->pr_ctlinput)(cmd, sa, 0, NULL);
}
-   splx(s);
 }
 
 void



Kill useless pfctlinput()

2017-05-08 Thread Martin Pieuchot
This diff removes a false positive from bluhm@'s lock analyser.  This is
the last piece to turn the NET_LOCK() on.

pfctlinput(PRC_IFDOWN, ...) is a noop.  None of the *_ctlinput() handler
present in the kernel handle PRC_IFDOWN.  And all of do nothing because
inet{,6}ctlerrmap == 0.

The two other chunks are for cleanup purposes.

ok?

Index: net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.494
diff -u -p -r1.494 if.c
--- net/if.c4 May 2017 15:00:24 -   1.494
+++ net/if.c8 May 2017 08:47:10 -
@@ -1504,15 +1504,10 @@ if_downall(void)
 void
 if_down(struct ifnet *ifp)
 {
-   struct ifaddr *ifa;
-
splsoftassert(IPL_SOFTNET);
 
ifp->if_flags &= ~IFF_UP;
getmicrotime(&ifp->if_lastchange);
-   TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) {
-   pfctlinput(PRC_IFDOWN, ifa->ifa_addr);
-   }
IFQ_PURGE(&ifp->if_snd);
 
if_linkstate(ifp);
Index: netinet6/ip6_output.c
===
RCS file: /cvs/src/sys/netinet6/ip6_output.c,v
retrieving revision 1.228
diff -u -p -r1.228 ip6_output.c
--- netinet6/ip6_output.c   3 May 2017 08:35:55 -   1.228
+++ netinet6/ip6_output.c   8 May 2017 08:46:28 -
@@ -708,15 +708,6 @@ reroute:
if (mtu > IPV6_MAXPACKET)
mtu = IPV6_MAXPACKET;
 
-#if 0
-   /* Notify a proper path MTU to applications. */
-   mtu32 = (u_int32_t)mtu;
-   bzero(&ip6cp, sizeof(ip6cp));
-   ip6cp.ip6c_cmdarg = (void *)&mtu32;
-   pfctlinput2(PRC_MSGSIZE, sin6tosa(&ro_pmtu->ro_dst),
-   (void *)&ip6cp);
-#endif
-
/*
 * Change the next header field of the last header in the
 * unfragmentable part.
Index: netinet6/nd6.c
===
RCS file: /cvs/src/sys/netinet6/nd6.c,v
retrieving revision 1.207
diff -u -p -r1.207 nd6.c
--- netinet6/nd6.c  26 Mar 2017 08:49:22 -  1.207
+++ netinet6/nd6.c  8 May 2017 08:46:21 -
@@ -748,10 +748,6 @@ nd6_free(struct rtentry *rt, int gc)
 
splsoftassert(IPL_SOFTNET);
 
-   /*
-* we used to have pfctlinput(PRC_HOSTDEAD) here.
-* even though it is not harmful, it was not really necessary.
-*/
ifp = if_get(rt->rt_ifidx);
 
if (!ip6_forwarding) {