Re: bridge(4)+pf(4) fix incoming interface

2019-07-17 Thread Alexander Bluhm
On Wed, Jul 17, 2019 at 04:35:22PM -0300, Martin Pieuchot wrote:
> Diff below is a rework of Eygene's submission to avoid duplicating the
> logic leading to the re-enqueue of a packet based on a matching MAC
> address.
>
> The bug first explained by Eygene [0] happens when multiple members of
> a bridge(4) share the same MAC address.  In that particular case the
> order of the members matter as the first one encounter during the loop
> will be considered as the "receiving" interface.
>
> The idea of the fix is to prefer the physical interface instead, which
> in this case is referenced by the `ifp' argument of bridge_process().
>
> The diff below does a bit of plumbing to avoid code duplication:
>
> - rename the original port member descriptor from `bif' to `bif0'
> - check for bad source MAC (loop prevention) early
> - check for wrongly crafted packet before dereferencing `eh'
>
> Ok?

OK bluhm@

>   sc = brifp->if_softc;
>   SMR_SLIST_FOREACH_LOCKED(bif, >sc_iflist, bif_next) {
> - if (bif->ifp == ifp)
> - break;
> + if (bridge_ourether(bif->ifp, eh->ether_shost))
> + goto bad;
> + if (bif->ifp == ifp) {
> + bif0 = bif;
> + }

This  { } is not KNF.



Re: mbuf cluster limit pool wakeup

2019-07-17 Thread Alexander Bluhm
On Tue, Jul 16, 2019 at 08:58:43PM -0300, Martin Pieuchot wrote:
> On 16/07/19(Tue) 21:35, Alexander Bluhm wrote:
> > Hi,
> >
> > When the kernel reaches the sysclt kern.maxclusters limit, operations
> > get stuck while holding the net lock.  Increasing the limit does
> > not help as there is no wakeup of the pools.  So run through the
> > mbuf pool request list when the limit changes.
>
> Should you call pool_wakeup() only if the new limit is greater than the
> current one?

Additional wakups do not hurt, especially from the cold sysctl path.
Note that sysctl KERN_MAXCLUSTERS has a val != nmbclust check
already.  So the value has to change.  I think it is not worth to
add more code to nmbclust_update.

> > There seem to more problems when recovering from mbuf shortage, but
> > this is the most obvious one.
>
> By more problems to you mean "How to give mbuf back to the pool"?

Sorry for being so vague.

dlg@ has some ideas about what could cause starvation in pools.
But I see problems in socket, protocol and interface layer.

The listen queue is full, inetd blocks reading from another socket.
I guess poll/select information was not correctly reported to
userland so it got stuck.

I see half closed TCP connections hanging on loopback forever:

tcp  0  0  127.0.0.1.9127.0.0.1.18490ESTABLISHED
tcp  0  0  127.0.0.1.18490127.0.0.1.9FIN_WAIT_1

And I have to do ifconfig vio0 down and ifconfig vio0 up to receive
packets on the interface again.

These things have to be investigated and fixed after this diff has
been commited.  In general we behave poorly when hitting resource
limits.  I think nobody is testing this regulary.

bluhm

> > Index: kern/subr_pool.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/subr_pool.c,v
> > retrieving revision 1.227
> > diff -u -p -r1.227 subr_pool.c
> > --- kern/subr_pool.c23 Apr 2019 13:35:12 -  1.227
> > +++ kern/subr_pool.c16 Jul 2019 18:02:08 -
> > @@ -815,6 +815,12 @@ pool_put(struct pool *pp, void *v)
> > if (freeph != NULL)
> > pool_p_free(pp, freeph);
> >
> > +   pool_wakeup(pp);
> > +}
> > +
> > +void
> > +pool_wakeup(struct pool *pp)
> > +{
> > if (!TAILQ_EMPTY(>pr_requests)) {
> > pl_enter(pp, >pr_requests_lock);
> > pool_runqueue(pp, PR_NOWAIT);
> > Index: kern/uipc_mbuf.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/uipc_mbuf.c,v
> > retrieving revision 1.270
> > diff -u -p -r1.270 uipc_mbuf.c
> > --- kern/uipc_mbuf.c16 Jul 2019 17:39:02 -  1.270
> > +++ kern/uipc_mbuf.c16 Jul 2019 18:04:33 -
> > @@ -167,8 +167,6 @@ mbinit(void)
> >
> > m_pool_allocator.pa_pagesz = pool_allocator_multi.pa_pagesz;
> >
> > -   error = nmbclust_update(nmbclust);
> > -   KASSERT(error == 0);
> > mbuf_mem_alloc = 0;
> >
> >  #if DIAGNOSTIC
> > @@ -196,6 +194,9 @@ mbinit(void)
> > m_pool_init([i], mclsizes[i], 64, mclnames[i]);
> > }
> >
> > +   error = nmbclust_update(nmbclust);
> > +   KASSERT(error == 0);
> > +
> > (void)mextfree_register(m_extfree_pool);
> > KASSERT(num_extfree_fns == 1);
> >  }
> > @@ -217,11 +218,18 @@ mbcpuinit()
> >  int
> >  nmbclust_update(long newval)
> >  {
> > +   int i;
> > +
> > if (newval < 0 || newval > LONG_MAX / MCLBYTES)
> > return ERANGE;
> > /* update the global mbuf memory limit */
> > nmbclust = newval;
> > mbuf_mem_limit = nmbclust * MCLBYTES;
> > +
> > +   pool_wakeup();
> > +   for (i = 0; i < nitems(mclsizes); i++)
> > +   pool_wakeup([i]);
> > +
> > return 0;
> >  }
> >
> > Index: sys/pool.h
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/sys/pool.h,v
> > retrieving revision 1.76
> > diff -u -p -r1.76 pool.h
> > --- sys/pool.h  10 Feb 2019 22:45:58 -  1.76
> > +++ sys/pool.h  16 Jul 2019 18:02:08 -
> > @@ -271,6 +271,7 @@ voidpool_request_init(struct pool_requ
> > void (*)(struct pool *, void *, void *), void *);
> >  void   pool_request(struct pool *, struct pool_request *);
> >  void   pool_put(struct pool *, void *);
> > +void   pool_wakeup(struct pool *);
> >  intpool_reclaim(struct pool *);
> >  void   pool_reclaim_all(void);
> >  intpool_prime(struct pool *, int);
> >



iked(8): add transport mode for childsas

2019-07-17 Thread Tobias Heider
This diff allows iked(8) to optionally negotiate Child SAs with IPsec transport
mode instead of tunnel mode.

Ok?

Index: iked.conf.5
===
RCS file: /cvs/src/sbin/iked/iked.conf.5,v
retrieving revision 1.55
diff -u -p -u -r1.55 iked.conf.5
--- iked.conf.5 11 May 2019 16:30:23 -  1.55
+++ iked.conf.5 17 Jul 2019 20:26:54 -
@@ -268,6 +268,15 @@ specifies that
 .Xr ipcomp 4 ,
 the IP Payload Compression protocol, is negotiated in addition to 
encapsulation.
 The optional compression is applied before packets are encapsulated.
+.It Op Ar tmode
+.Ar tmode
+describes the encapsulation mode to be used.
+Possible modes are
+.Ar tunnel
+and
+.Ar transport ;
+the default is
+.Ar tunnel .
 .It Op Ar encap
 .Ar encap
 specifies the encapsulation protocol to be used.
@@ -277,15 +286,6 @@ and
 .Ar ah ;
 the default is
 .Ar esp .
-.\" .It Op Ar tmode
-.\" .Ar tmode
-.\" describes the encapsulation mode to be used.
-.\" Possible modes are
-.\" .Ar tunnel
-.\" and
-.\" .Ar transport ;
-.\" the default is
-.\" .Ar tunnel .
 .It Op Ar af
 This policy only applies to endpoints of the specified address family
 which can be either
Index: iked.h
===
RCS file: /cvs/src/sbin/iked/iked.h,v
retrieving revision 1.121
diff -u -p -u -r1.121 iked.h
--- iked.h  11 May 2019 16:30:23 -  1.121
+++ iked.h  17 Jul 2019 20:26:55 -
@@ -251,6 +251,7 @@ struct iked_policy {
 #define IKED_POLICY_QUICK   0x08
 #define IKED_POLICY_SKIP0x10
 #define IKED_POLICY_IPCOMP  0x20
+#define IKED_POLICY_TRANSPORT   0x40
 
int  pol_refcnt;
 
@@ -465,6 +466,7 @@ struct iked_sa {
 
int  sa_mobike; /* MOBIKE */
int  sa_frag;   /* fragmentation */
+   int  sa_use_transport_mode; /* should be 
reset */
 
struct iked_timersa_timer;  /* SA timeouts */
 #define IKED_IKE_SA_EXCHANGE_TIMEOUT300/* 5 minutes */
Index: ikev2.c
===
RCS file: /cvs/src/sbin/iked/ikev2.c,v
retrieving revision 1.171
diff -u -p -u -r1.171 ikev2.c
--- ikev2.c 11 May 2019 16:30:23 -  1.171
+++ ikev2.c 17 Jul 2019 20:26:57 -
@@ -148,8 +148,12 @@ ssize_t ikev2_add_nat_detection(struct i
 ssize_t ikev2_add_fragmentation(struct iked *, struct ibuf *,
struct ikev2_payload **, struct iked_message *, ssize_t);
 
+ssize_t ikev2_add_notify(struct iked *, struct ibuf *,
+   struct ikev2_payload **, ssize_t, struct iked_sa *, uint16_t);
 ssize_t ikev2_add_mobike(struct iked *, struct ibuf *,
struct ikev2_payload **, ssize_t, struct iked_sa *);
+ssize_t ikev2_add_transport_mode(struct iked *, struct ibuf *,
+   struct ikev2_payload **, ssize_t, struct iked_sa *);
 int ikev2_update_sa_addresses(struct iked *, struct iked_sa *);
 int ikev2_resp_informational(struct iked *, struct iked_sa *,
struct iked_message *);
@@ -1670,8 +1674,9 @@ ikev2_add_ipcompnotify(struct iked *env,
 }
 
 ssize_t
-ikev2_add_mobike(struct iked *env, struct ibuf *e,
-struct ikev2_payload **pld, ssize_t len, struct iked_sa *sa)
+ikev2_add_notify(struct iked *env, struct ibuf *e,
+struct ikev2_payload **pld, ssize_t len, struct iked_sa *sa,
+uint16_t notify)
 {
struct ikev2_notify *n;
uint8_t *ptr;
@@ -1687,13 +1692,27 @@ ikev2_add_mobike(struct iked *env, struc
n = (struct ikev2_notify *)ptr;
n->n_protoid = 0;
n->n_spisize = 0;
-   n->n_type = htobe16(IKEV2_N_MOBIKE_SUPPORTED);
+   n->n_type = htobe16(notify);
log_debug("%s: done", __func__);
 
return (len);
 }
 
 ssize_t
+ikev2_add_mobike(struct iked *env, struct ibuf *e,
+struct ikev2_payload **pld, ssize_t len, struct iked_sa *sa)
+{
+   return  ikev2_add_notify(env, e, pld, len, sa, 
IKEV2_N_MOBIKE_SUPPORTED);
+}
+
+ssize_t
+ikev2_add_transport_mode(struct iked *env, struct ibuf *e,
+struct ikev2_payload **pld, ssize_t len, struct iked_sa *sa)
+{
+   return  ikev2_add_notify(env, e, pld, len, sa, 
IKEV2_N_USE_TRANSPORT_MODE);
+}
+
+ssize_t
 ikev2_add_fragmentation(struct iked *env, struct ibuf *buf,
 struct ikev2_payload **pld, struct iked_message *msg, ssize_t len)
 {
@@ -5209,6 +5228,7 @@ ikev2_childsa_negotiate(struct iked *env
csa->csa_spi.spi_protoid = prop->prop_protoid;
csa->csa_esn = esn;
csa->csa_acquired = acquired;
+   csa->csa_transport = sa->sa_use_transport_mode;
 
/* Set up responder's SPIs */
if (initiator) {
@@ -5295,6 +5315,7 @@ ikev2_childsa_negotiate(struct iked *env
 
ret = 0;
  

bridge(4)+pf(4) fix incoming interface

2019-07-17 Thread Martin Pieuchot
Diff below is a rework of Eygene's submission to avoid duplicating the
logic leading to the re-enqueue of a packet based on a matching MAC
address.

The bug first explained by Eygene [0] happens when multiple members of
a bridge(4) share the same MAC address.  In that particular case the
order of the members matter as the first one encounter during the loop
will be considered as the "receiving" interface.

The idea of the fix is to prefer the physical interface instead, which
in this case is referenced by the `ifp' argument of bridge_process().

The diff below does a bit of plumbing to avoid code duplication:

- rename the original port member descriptor from `bif' to `bif0'
- check for bad source MAC (loop prevention) early 
- check for wrongly crafted packet before dereferencing `eh'

Ok?

[0] https://marc.info/?l=openbsd-tech=155993043300702=2

Index: net/if_bridge.c
===
RCS file: /cvs/src/sys/net/if_bridge.c,v
retrieving revision 1.336
diff -u -p -r1.336 if_bridge.c
--- net/if_bridge.c 17 Jul 2019 16:46:17 -  1.336
+++ net/if_bridge.c 17 Jul 2019 19:34:19 -
@@ -931,10 +931,6 @@ bridgeintr_frame(struct ifnet *brifp, st
bif = bridge_getbif(src_if);
KASSERT(bif != NULL);
 
-   if (m->m_pkthdr.len < sizeof(eh)) {
-   m_freem(m);
-   return;
-   }
m_copydata(m, 0, ETHER_HDR_LEN, (caddr_t));
dst = (struct ether_addr *)_dhost[0];
src = (struct ether_addr *)_shost[0];
@@ -1115,7 +,7 @@ bridge_process(struct ifnet *ifp, struct
 {
struct ifnet *brifp;
struct bridge_softc *sc;
-   struct bridge_iflist *bif, *bif0;
+   struct bridge_iflist *bif = NULL, *bif0 = NULL;
struct ether_header *eh;
struct mbuf *mc;
 #if NBPFILTER > 0
@@ -1128,6 +1124,9 @@ bridge_process(struct ifnet *ifp, struct
if ((brifp == NULL) || !ISSET(brifp->if_flags, IFF_RUNNING))
goto reenqueue;
 
+   if (m->m_pkthdr.len < sizeof(*eh))
+   goto bad;
+
 #if NVLAN > 0
/*
 * If the underlying interface removed the VLAN header itself,
@@ -1146,17 +1145,21 @@ bridge_process(struct ifnet *ifp, struct
bpf_mtap_ether(if_bpf, m, BPF_DIRECTION_IN);
 #endif
 
+   eh = mtod(m, struct ether_header *);
+
sc = brifp->if_softc;
SMR_SLIST_FOREACH_LOCKED(bif, >sc_iflist, bif_next) {
-   if (bif->ifp == ifp)
-   break;
+   if (bridge_ourether(bif->ifp, eh->ether_shost))
+   goto bad;
+   if (bif->ifp == ifp) {
+   bif0 = bif;
+   }
}
-   if (bif == NULL)
+   if (bif0 == NULL)
goto reenqueue;
 
bridge_span(brifp, m);
 
-   eh = mtod(m, struct ether_header *);
if (ETHER_IS_MULTICAST(eh->ether_dhost)) {
/*
 * Reserved destination MAC addresses (01:80:C2:00:00:0x)
@@ -1169,7 +1172,8 @@ bridge_process(struct ifnet *ifp, struct
ETHER_ADDR_LEN - 1) == 0) {
if (eh->ether_dhost[ETHER_ADDR_LEN - 1] == 0) {
/* STP traffic */
-   m = bstp_input(sc->sc_stp, bif->bif_stp, eh, m);
+   m = bstp_input(sc->sc_stp, bif0->bif_stp, eh,
+   m);
if (m == NULL)
goto bad;
} else if (eh->ether_dhost[ETHER_ADDR_LEN - 1] <= 0xf)
@@ -1179,8 +1183,8 @@ bridge_process(struct ifnet *ifp, struct
/*
 * No need to process frames for ifs in the discarding state
 */
-   if ((bif->bif_flags & IFBIF_STP) &&
-   (bif->bif_state == BSTP_IFSTATE_DISCARDING))
+   if ((bif0->bif_flags & IFBIF_STP) &&
+   (bif0->bif_state == BSTP_IFSTATE_DISCARDING))
goto reenqueue;
 
mc = m_dup_pkt(m, ETHER_ALIGN, M_NOWAIT);
@@ -1197,27 +1201,32 @@ bridge_process(struct ifnet *ifp, struct
/*
 * Unicast, make sure it's not for us.
 */
-   bif0 = bif;
-   SMR_SLIST_FOREACH_LOCKED(bif, >sc_iflist, bif_next) {
-   if (bridge_ourether(bif->ifp, eh->ether_dhost)) {
-   if (bif0->bif_flags & IFBIF_LEARNING)
-   bridge_rtupdate(sc,
-   (struct ether_addr *)>ether_shost,
-   ifp, 0, IFBAF_DYNAMIC, m);
-   if (bridge_filterrule(>bif_brlin, eh, m) ==
-   BRL_ACTION_BLOCK) {
-   goto bad;
-   }
-
-   /* Count for the bridge */
-   brifp->if_ipackets++;
-   

pool for routing pcb

2019-07-17 Thread Alexander Bluhm
Hi,

Same thing for routing socket as for IPsec pfkey PCB.
Convert struct rtpcb malloc(9) to pool_get(9).

ok?

bluhm

Index: net/rtsock.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/rtsock.c,v
retrieving revision 1.288
diff -u -p -r1.288 rtsock.c
--- net/rtsock.c21 Jun 2019 17:11:42 -  1.288
+++ net/rtsock.c17 Jul 2019 18:55:30 -
@@ -69,6 +69,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 

@@ -158,6 +159,7 @@ struct rtptable {
unsigned intrtp_count;
 };

+struct pool rtpcb_pool;
 struct rtptable rtptable;

 /*
@@ -177,6 +179,8 @@ route_prinit(void)
srpl_rc_init(_rc, rcb_ref, rcb_unref, NULL);
rw_init(_lk, "rtsock");
SRPL_INIT(_list);
+   pool_init(_pool, sizeof(struct rtpcb), 0,
+   IPL_NONE, PR_WAITOK, "rtpcb", NULL);
 }

 void
@@ -294,7 +298,7 @@ route_attach(struct socket *so, int prot
 * code does not care about the additional fields
 * and works directly on the raw socket.
 */
-   rop = malloc(sizeof(struct rtpcb), M_PCB, M_WAITOK|M_ZERO);
+   rop = pool_get(_pool, PR_WAITOK|PR_ZERO);
so->so_pcb = rop;
/* Init the timeout structure */
timeout_set(>rop_timeout, rtm_senddesync_timer, so);
@@ -305,7 +309,7 @@ route_attach(struct socket *so, int prot
else
error = soreserve(so, ROUTESNDQ, ROUTERCVQ);
if (error) {
-   free(rop, M_PCB, sizeof(struct rtpcb));
+   pool_put(_pool, rop);
return (error);
}

@@ -350,7 +354,7 @@ route_detach(struct socket *so)

so->so_pcb = NULL;
KASSERT((so->so_state & SS_NOFDREF) == 0);
-   free(rop, M_PCB, sizeof(struct rtpcb));
+   pool_put(_pool, rop);

return (0);
 }



Re: vxlan(4) regression

2019-07-17 Thread Martin Pieuchot


Hello Michael,

On 17/07/19(Wed) 08:59, Michael Graves wrote:
> I think I have found a possible regression introduced in if_bridge.c at
> version 1.323.

So the bug is not present at revision 1.322?

> Using the following setup
> 
> C1 - R1 -+
>   (em1 - bridge0 - vxlan0 - em0 )|
> LAN
>  |
> C2 - R2 -+
>   (em1 - bridge0 - vxlan0 - em0 )
> 
> See https://marc.info/?l=openbsd-misc=156261409204575=2 for dmesg and
> hostname.* information.
> 
> C1 and C2 are connected to R1 and R2.  Both R1 and R2 and OpenBSD 6.5.
> vxlan0 on R1 and R2 is configured with a multicast destination address. When
> C1 tries to ping C2 the ping succeeds however the encapsulated packet is
> sent
> to the multicast address address and not directly from R1 to R2.
> 
> Trying to trace the problem down I think the issue is related to when the
> mbuf
> is passing thru bridgeintr_frame() it is not getting tagged as tunneled
> traffic, thus the tunnel information stored in the bridge is not updating
> the
> destination address when the mbuf is processed by vxlan_output().

I understand your analysis.  What I find strange is that
bridge_copytag() wasn't call previously in bridgeintr_frame().  So I
don't understand which issue/regression we are fixing.  Do you?

> 
> The diff below seems to correct this issue and was taken from the 6.5
> source.  I am not sure it is the correct way to solve the issue.  If there
> is
> better way to solve the issue I'm happy to assist and/or test.  If I'm on
> the
> right track, I'm happy to submit a patch against 6.5-current.

A diff against current would definitively help.

> --- if_bridge.c.origSun Mar 31 08:59:38 2019
> +++ if_bridge.c Tue Jul 16 10:14:29 2019
> @@ -905,7 +905,7 @@
>  * side of the bridge, drop it.
>  */
> if (!ETHER_IS_MULTICAST(eh.ether_dhost)) {
> -   dst_if = bridge_rtlookup(sc, dst, NULL);
> +   dst_if = bridge_rtlookup(sc, dst, m);
> if (dst_if == src_if) {
> m_freem(m);
> return;
> 



Re: pool for pfkey pcb

2019-07-17 Thread Martin Pieuchot
On 17/07/19(Wed) 17:34, Alexander Bluhm wrote:
> On Tue, Jul 16, 2019 at 09:01:24PM -0300, Martin Pieuchot wrote:
> > On 16/07/19(Tue) 22:45, Alexander Bluhm wrote:
> > > Hi,
> > >
> > > Convert struct pkpcb malloc(9) to pool_get(9).  PCB for pfkey is
> > > only used in process context, so pass PR_WAITOK to pool_init(9).
> > > The possible sleep in pool_put(9) should not hurt, as pfkeyv2_detach()
> > > is only called by soclose(9).
> >
> > In that case should we pass PR_RWLOCK as well to pool_init(9)?
> 
> I ran full regress with pool_init( PR_WAITOK|PR_RWLOCK ) on i386.
> There is no difference.  I think both ways work.  Is there an
> advantage to use rw_lock instead of mutex?  It is not in a hot path.
> If you like, I cange it to PR_RWLOCK.  I have the same diff for the
> routing socket.

The ipl of the pool is IPL_NONE.  This is logical because this code is
only used in process context.  However as soon as pool_put/get(9) are
used with *and* without KERNEL_LOCK() a deadlock is possible due to any
interrupt grabbing the KERNEL_LOCK().  There's two way to avoid this:
raise the ipl to IPL_MPFLOOR or PR_RWLOCK.

For both routing and pfkey, the socket lock is currently the
KERNEL_LOCK().  After analyse I don't see how pr_attach() can be moved
out from the lock without pr_detach().  So it should not really matter.

I'd say put it without the flag :)  ok mpi@ 

> > > Index: net/pfkeyv2.c
> > > ===
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/pfkeyv2.c,v
> > > retrieving revision 1.197
> > > diff -u -p -r1.197 pfkeyv2.c
> > > --- net/pfkeyv2.c 4 Feb 2019 21:40:52 -   1.197
> > > +++ net/pfkeyv2.c 13 Jul 2019 13:59:19 -
> > > @@ -129,6 +129,7 @@ extern struct pool ipsec_policy_pool;
> > >
> > >  extern struct radix_node_head **spd_tables;
> > >
> > > +struct pool pkpcb_pool;
> > >  #define PFKEY_MSG_MAXSZ 4096
> > >  const struct sockaddr pfkey_addr = { 2, PF_KEY, };
> > >  struct domain pfkeydomain;
> > > @@ -251,6 +252,8 @@ pfkey_init(void)
> > >   srpl_rc_init(_rc, keycb_ref, keycb_unref, NULL);
> > >   rw_init(_lk, "pfkey");
> > >   SRPL_INIT(_list);
> > > + pool_init(_pool, sizeof(struct pkpcb), 0,
> > > + IPL_NONE, PR_WAITOK, "pkpcb", NULL);
> > >  }
> > >
> > >
> > > @@ -266,13 +269,13 @@ pfkeyv2_attach(struct socket *so, int pr
> > >   if ((so->so_state & SS_PRIV) == 0)
> > >   return EACCES;
> > >
> > > - kp = malloc(sizeof(struct pkpcb), M_PCB, M_WAITOK | M_ZERO);
> > > + kp = pool_get(_pool, PR_WAITOK|PR_ZERO);
> > >   so->so_pcb = kp;
> > >   refcnt_init(>kcb_refcnt);
> > >
> > >   error = soreserve(so, PFKEYSNDQ, PFKEYRCVQ);
> > >   if (error) {
> > > - free(kp, M_PCB, sizeof(struct pkpcb));
> > > + pool_put(_pool, kp);
> > >   return (error);
> > >   }
> > >
> > > @@ -326,7 +329,7 @@ pfkeyv2_detach(struct socket *so)
> > >
> > >   so->so_pcb = NULL;
> > >   KASSERT((so->so_state & SS_NOFDREF) == 0);
> > > - free(kp, M_PCB, sizeof(struct pkpcb));
> > > + pool_put(_pool, kp);
> > >
> > >   return (0);
> > >  }
> > >
> 



Re: make msgsnd(2) more posix

2019-07-17 Thread Alexander Bluhm
On Sun, Jul 14, 2019 at 02:57:54PM +0200, Klemens Nanni wrote:
> We also fail to mention that condition in the ERRORS section.

Moritz, can you create a man page ERRORS diff?

> > -   if (msg->msg_type < 0) {
> > +   if (msg->msg_type <= 0) {
> OK kn, although I'd go with `< 1' as that matches the specification's
> wording as well as what NetBSD and FreeBSD have in sysv_msg.c.

I have commited it with < 1.  For both variants llvm creates the
same object with cmp 0.

bluhm



Re: pool for pfkey pcb

2019-07-17 Thread Alexander Bluhm
On Tue, Jul 16, 2019 at 09:01:24PM -0300, Martin Pieuchot wrote:
> On 16/07/19(Tue) 22:45, Alexander Bluhm wrote:
> > Hi,
> >
> > Convert struct pkpcb malloc(9) to pool_get(9).  PCB for pfkey is
> > only used in process context, so pass PR_WAITOK to pool_init(9).
> > The possible sleep in pool_put(9) should not hurt, as pfkeyv2_detach()
> > is only called by soclose(9).
>
> In that case should we pass PR_RWLOCK as well to pool_init(9)?

I ran full regress with pool_init( PR_WAITOK|PR_RWLOCK ) on i386.
There is no difference.  I think both ways work.  Is there an
advantage to use rw_lock instead of mutex?  It is not in a hot path.
If you like, I cange it to PR_RWLOCK.  I have the same diff for the
routing socket.

bluhm

> > Index: net/pfkeyv2.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/net/pfkeyv2.c,v
> > retrieving revision 1.197
> > diff -u -p -r1.197 pfkeyv2.c
> > --- net/pfkeyv2.c   4 Feb 2019 21:40:52 -   1.197
> > +++ net/pfkeyv2.c   13 Jul 2019 13:59:19 -
> > @@ -129,6 +129,7 @@ extern struct pool ipsec_policy_pool;
> >
> >  extern struct radix_node_head **spd_tables;
> >
> > +struct pool pkpcb_pool;
> >  #define PFKEY_MSG_MAXSZ 4096
> >  const struct sockaddr pfkey_addr = { 2, PF_KEY, };
> >  struct domain pfkeydomain;
> > @@ -251,6 +252,8 @@ pfkey_init(void)
> > srpl_rc_init(_rc, keycb_ref, keycb_unref, NULL);
> > rw_init(_lk, "pfkey");
> > SRPL_INIT(_list);
> > +   pool_init(_pool, sizeof(struct pkpcb), 0,
> > +   IPL_NONE, PR_WAITOK, "pkpcb", NULL);
> >  }
> >
> >
> > @@ -266,13 +269,13 @@ pfkeyv2_attach(struct socket *so, int pr
> > if ((so->so_state & SS_PRIV) == 0)
> > return EACCES;
> >
> > -   kp = malloc(sizeof(struct pkpcb), M_PCB, M_WAITOK | M_ZERO);
> > +   kp = pool_get(_pool, PR_WAITOK|PR_ZERO);
> > so->so_pcb = kp;
> > refcnt_init(>kcb_refcnt);
> >
> > error = soreserve(so, PFKEYSNDQ, PFKEYRCVQ);
> > if (error) {
> > -   free(kp, M_PCB, sizeof(struct pkpcb));
> > +   pool_put(_pool, kp);
> > return (error);
> > }
> >
> > @@ -326,7 +329,7 @@ pfkeyv2_detach(struct socket *so)
> >
> > so->so_pcb = NULL;
> > KASSERT((so->so_state & SS_NOFDREF) == 0);
> > -   free(kp, M_PCB, sizeof(struct pkpcb));
> > +   pool_put(_pool, kp);
> >
> > return (0);
> >  }
> >



vxlan(4) regression

2019-07-17 Thread Michael Graves

Hello

I think I have found a possible regression introduced in if_bridge.c at 
version

1.323.

Using the following setup

C1 - R1 -+
  (em1 - bridge0 - vxlan0 - em0 )|
LAN
 |
C2 - R2 -+
  (em1 - bridge0 - vxlan0 - em0 )

See https://marc.info/?l=openbsd-misc=156261409204575=2 for dmesg 
and

hostname.* information.

C1 and C2 are connected to R1 and R2.  Both R1 and R2 and OpenBSD 6.5.
vxlan0 on R1 and R2 is configured with a multicast destination address. 
When
C1 tries to ping C2 the ping succeeds however the encapsulated packet is 
sent

to the multicast address address and not directly from R1 to R2.

Trying to trace the problem down I think the issue is related to when 
the mbuf

is passing thru bridgeintr_frame() it is not getting tagged as tunneled
traffic, thus the tunnel information stored in the bridge is not 
updating the

destination address when the mbuf is processed by vxlan_output().

The diff below seems to correct this issue and was taken from the 6.5
source.  I am not sure it is the correct way to solve the issue.  If 
there is
better way to solve the issue I'm happy to assist and/or test.  If I'm 
on the

right track, I'm happy to submit a patch against 6.5-current.

Thanks for your response.
Kind regards.

--- if_bridge.c.origSun Mar 31 08:59:38 2019
+++ if_bridge.c Tue Jul 16 10:14:29 2019
@@ -905,7 +905,7 @@
 * side of the bridge, drop it.
 */
if (!ETHER_IS_MULTICAST(eh.ether_dhost)) {
-   dst_if = bridge_rtlookup(sc, dst, NULL);
+   dst_if = bridge_rtlookup(sc, dst, m);
if (dst_if == src_if) {
m_freem(m);
return;