Keep RTF_LOCAL routes UP

2015-09-24 Thread Martin Pieuchot
This is now needed to make sure local addresses can be used even if the
interface their are configured on is down.

ok?

Index: net/route.c
===
RCS file: /cvs/src/sys/net/route.c,v
retrieving revision 1.242
diff -u -p -r1.242 route.c
--- net/route.c 23 Sep 2015 08:49:46 -  1.242
+++ net/route.c 24 Sep 2015 11:32:15 -
@@ -1743,6 +1743,12 @@ rt_if_linkstate_change(struct rtentry *r
(rt->rt_ifa == NULL || rt->rt_ifa->ifa_ifp != ifp))
return (0);
 
+   /* Local routes are always usable. */
+   if (rt->rt_flags & RTF_LOCAL) {
+   rt->rt_flags |= RTF_UP;
+   return (0);
+   }
+
if (LINK_STATE_IS_UP(ifp->if_link_state) && ifp->if_flags & IFF_UP) {
if (!(rt->rt_flags & RTF_UP)) {
/* bring route up */
Index: netinet/ip_input.c
===
RCS file: /cvs/src/sys/netinet/ip_input.c,v
retrieving revision 1.255
diff -u -p -r1.255 ip_input.c
--- netinet/ip_input.c  11 Sep 2015 19:34:20 -  1.255
+++ netinet/ip_input.c  24 Sep 2015 11:31:38 -
@@ -725,7 +725,7 @@ in_ouraddr(struct mbuf *m, struct ifnet 
m->m_flags |= M_BCAST;
}
 
-   return (ISSET(ia->ia_ifp->if_flags, IFF_UP));
+   return (1);
 }
 
 /*



Re: banner(1): fix SYNOPSIS

2015-09-24 Thread Jason McIntyre
On Wed, Sep 23, 2015 at 07:42:54PM -0400, Michael Reed wrote:
> banner(1) without arguments works fine, so denote it as such
> in the manual.
> 

in the context of banner(1), define "works fine". i mean, technically
maybe it does but, practically, why try to show this to the reader? is
there a use for banner without arguments?

jmc

> 
> 
> Index: src/usr.bin/banner/banner.1
> ===
> RCS file: /cvs/src/usr.bin/banner/banner.1,v
> retrieving revision 1.8
> diff -u -p -r1.8 banner.1
> --- src/usr.bin/banner/banner.1   6 Aug 2007 19:16:06 -   1.8
> +++ src/usr.bin/banner/banner.1   23 Sep 2015 23:40:36 -
> @@ -24,7 +24,7 @@
>  .Nd print strings in large letters
>  .Sh SYNOPSIS
>  .Nm banner
> -.Ar string ...
> +.Op Ar string ...
>  .Sh DESCRIPTION
>  .Nm
>  prints up to 10 characters of each
> 



Re: Fix typo in faq13.html

2015-09-24 Thread Ingo Schwarze
chenga2 wrote on Wed, Sep 23, 2015 at 05:49:44PM -0400:

> This fixes a typo in the faq13.html page.  Specifically in section 13.7 of
> the page.  'handly' should be either 'handy', or 'handily'.

Committed, thanks.
  Ingo

> Index: faq13.html
> ===
> RCS file: /cvs/www/faq/faq13.html,v
> retrieving revision 1.155
> diff -u -p -r1.155 faq13.html
> --- faq13.html2 Jul 2015 05:49:04 -   1.155
> +++ faq13.html23 Sep 2015 21:05:02 -
> @@ -817,7 +817,7 @@ as:
> 
>  They are backed by the /dev/rmidi0 and /dev/rmidi1
>  character devices.
> -The latter are handly for testing the hardware, bypassing most software
> layers.
> +The latter are handy for testing the hardware, bypassing most software
> layers.
> 
>  To test your MIDI keyboard, you can use the
>   href="http://www.openbsd.org/cgi-bin/man.cgi?query=hexdumpsektion=1;>hexdump(1)
> 



delete reference to inetd from afterboot.8

2015-09-24 Thread Rob Pierce
Since inetd is no longer started by default and inetd.conf has been relegated
to /etc/examples, this sentence can probably be deleted from afterboot.8.

Regards,

Index: afterboot.8
===
RCS file: /cvs/src/share/man/man8/afterboot.8,v
retrieving revision 1.148
diff -u -p -r1.148 afterboot.8
--- afterboot.8 21 Sep 2015 09:23:35 -  1.148
+++ afterboot.8 24 Sep 2015 14:40:09 -
@@ -510,10 +510,6 @@ or
 You might wish to tighten up security more by editing
 .Pa /etc/fbtab
 as when installing X.
-In
-.Pa /etc/inetd.conf
-comment out any extra entries you do not need,
-and only add things that are really needed.
 .Ss Other files in /etc
 Look at the other files in
 .Pa /etc



Re: Plug a mem-leak in dhcpd(8)

2015-09-24 Thread Ted Unangst
Michael McConville wrote:
> Am I interpreting this correctly?
> 
> This is the least invasive fix, but it's unfortunate that this function
> allows the supplied buffer to be NULL. If we made it unconditionally
> allocate a new buffer, we would have to change some program logic
> because uses pass stack-allocated statically-sized buffers. So, maybe
> should require a valid buffer argument.

One or the other. This idiom of allowing the caller to pass a buffer or not
and then allocating memory or not is silly and a source of many bugs. I don't
believe we should encourage its use by demonstrating it except where
necessary.



Re: Purge route entries when an address is removed

2015-09-24 Thread Stuart Henderson
On 2015/09/13 11:15, Martin Pieuchot wrote:
> Currently we leave RTF_STATIC route entries in the table when the
> address they are attached to is removed from a system.
> 
> That's why ifas need to be refcounted and that's why we have *a lot*
> of checks in the stack to not use cached routes attached to such ifa.
> 
> I'd like to simplify all of this by simply purging all the routes
> attached to an ifa being removed.  This behavior is coherent with
> the fact that routes *need* an ifa to be inserted in the table.
> 
> This makes the kernel simpler as it no longer try to find a new ifa
> when a route with a stale address is being used.

This does bad things with pppoe(4) default routes, the usual way to
configure this is with a wildcard 0.0.0.0 in hostname.pppoe0 and with
default pointing with -ifp pppoe0.

I'm not 100% sure about this but I guess that when IPCP negotiates
an address and removes the temporary 0.0.0.0 wildcard address to
configure it on the interface, the default -ifp route is also killed.

If you want to play with this yourself and don't have pppoe available,
you can build a pppoe test rig using npppd.

On the client side:

cat >> /etc/hostname.pppoe0 << EOF
inet 0.0.0.0 255.255.255.255 0.0.0.1 pppoedev em0 \
  authproto chap authname test authkey yayaya
!route add default -ifp pppoe0 0.0.0.1
EOF

On the server side (will nat/route, assuming it has internet access itself):

ifconfig pppx0 up
sysctl net.pipex.enable=1
sysctl net.inet.ip.forwarding=1

  /etc/npppd/npppd.conf

authentication LOCAL type local {
  users-file "/etc/npppd/npppd-users"
}

tunnel PPPOE protocol pppoe {
  listen on interface em1
}

ipcp IPCP {
  pool-address 172.16.192.2-172.16.192.254
  dns-servers 8.8.8.8
}

interface pppx0 address 172.16.192.1 ipcp IPCP
bind tunnel from PPPOE authenticated by LOCAL to pppx0

  /etc/npppd/npppd-users

test:password=yayaya:

  /etc/pf.conf


pass out quick on egress inet received-on pppx nat-to egress:0


..



Re: arch(1): small cleanup

2015-09-24 Thread Ted Unangst
Michael Reed wrote:
> Hi all,
> 
> I test it out as both `arch' and `machine' and didn't observe
> any differences.  Also, I figured I might as well convert exit(3)
> to return while touching this code, as was done in [1].

this is cleaner i think.


Index: arch.c
===
RCS file: /cvs/src/usr.bin/arch/arch.c,v
retrieving revision 1.14
diff -u -p -r1.14 arch.c
--- arch.c  8 Feb 2015 23:40:34 -   1.14
+++ arch.c  24 Sep 2015 15:00:43 -
@@ -32,7 +32,7 @@
 #include 
 #include 
 
-static void usage(void);
+static void __dead usage(void);
 
 static int machine;
 
@@ -54,7 +54,7 @@ main(int argc, char *argv[])
arch = MACHINE_ARCH;
opts = "ks";
}
-   while ((c = getopt(argc, argv, opts)) != -1)
+   while ((c = getopt(argc, argv, opts)) != -1) {
switch (c) {
case 'a':
arch = MACHINE_ARCH;
@@ -67,22 +67,16 @@ main(int argc, char *argv[])
break;
default:
usage();
-   /* NOTREACHED */
}
-   if (optind != argc) {
-   usage();
-   /* NOTREACHED */
-   }
-   if (!short_form) {
-   fputs("OpenBSD", stdout);
-   fputc('.', stdout);
}
-   fputs(arch, stdout);
-   fputc('\n', stdout);
-   exit(0);
+   if (optind != argc)
+   usage();
+
+   printf("%s%s\n", short_form ? "" : "OpenBSD.", arch);
+   return (0);
 }
 
-static void
+static void __dead
 usage(void)
 {
if (machine)



Re: delete reference to inetd from afterboot.8

2015-09-24 Thread Jason McIntyre
On Thu, Sep 24, 2015 at 10:46:12AM -0400, Rob Pierce wrote:
> Since inetd is no longer started by default and inetd.conf has been relegated
> to /etc/examples, this sentence can probably be deleted from afterboot.8.
> 
> Regards,
> 

fair enough. i knocked out the following Ss to avoid being left with a
one sentence sub section.

jmc

> Index: afterboot.8
> ===
> RCS file: /cvs/src/share/man/man8/afterboot.8,v
> retrieving revision 1.148
> diff -u -p -r1.148 afterboot.8
> --- afterboot.8   21 Sep 2015 09:23:35 -  1.148
> +++ afterboot.8   24 Sep 2015 14:40:09 -
> @@ -510,10 +510,6 @@ or
>  You might wish to tighten up security more by editing
>  .Pa /etc/fbtab
>  as when installing X.
> -In
> -.Pa /etc/inetd.conf
> -comment out any extra entries you do not need,
> -and only add things that are really needed.
>  .Ss Other files in /etc
>  Look at the other files in
>  .Pa /etc
> 



mg(1) transpose-paragraph

2015-09-24 Thread Mark Lumsden
While I am 'paragraph' mode, here is the useful 'transpose-paragraph'
command. Comments/ok?

mark

Index: def.h
===
RCS file: /cvs/src/usr.bin/mg/def.h,v
retrieving revision 1.148
diff -u -p -u -p -r1.148 def.h
--- def.h   24 Sep 2015 07:07:59 -  1.148
+++ def.h   24 Sep 2015 18:15:38 -
@@ -588,6 +588,7 @@ int  killpara(int, int);
 int fillword(int, int);
 int setfillcol(int, int);
 int markpara(int, int);
+int transposepara(int, int);
 
 /* word.c X */
 int backword(int, int);
Index: funmap.c
===
RCS file: /cvs/src/usr.bin/mg/funmap.c,v
retrieving revision 1.50
diff -u -p -u -p -r1.50 funmap.c
--- funmap.c24 Sep 2015 07:07:59 -  1.50
+++ funmap.c24 Sep 2015 18:15:38 -
@@ -200,6 +200,7 @@ static struct funmap functnames[] = {
{poptobuffer, "switch-to-buffer-other-window",},
{togglereadonly, "toggle-read-only" },
{twiddle, "transpose-chars",},
+   {transposepara, "transpose-paragraph",},
{undo, "undo",},
{undo_add_boundary, "undo-boundary",},
{undo_boundary_enable, "undo-boundary-toggle",},
Index: mg.1
===
RCS file: /cvs/src/usr.bin/mg/mg.1,v
retrieving revision 1.92
diff -u -p -u -p -r1.92 mg.1
--- mg.124 Sep 2015 07:07:59 -  1.92
+++ mg.124 Sep 2015 18:15:38 -
@@ -874,6 +874,10 @@ Toggle the read-only flag on the current
 Transpose the two characters in front of and under dot,
 then move forward one character.
 Treat newline characters the same as any other.
+.It transpose-paragraph
+Transpose adjacent paragraphs.
+If multiple iterations are requested, the current paragraph will
+be moved n paragraphs forward.
 .It undo
 Undo the most recent action.
 If invoked again without an intervening command,
Index: paragraph.c
===
RCS file: /cvs/src/usr.bin/mg/paragraph.c,v
retrieving revision 1.39
diff -u -p -u -p -r1.39 paragraph.c
--- paragraph.c 24 Sep 2015 07:20:12 -  1.39
+++ paragraph.c 24 Sep 2015 18:15:38 -
@@ -314,6 +314,50 @@ markpara(int f, int n)
return (TRUE);
 }
 
+
+/*
+ * Transpose the current paragraph with the following paragraph. If invoked
+ * multiple times, transpose to the n'th paragraph. If invoked between 
+ * paragraphs, move to the previous paragraph, then continue.
+ */
+/* ARGSUSED */
+int
+transposepara(int f, int n)
+{
+   int i = 0, status;
+   charflg;
+
+   /* find a paragraph, set mark, then goto the end */
+   gotobop(FFRAND, 1);
+   curwp->w_markp = curwp->w_dotp;
+   curwp->w_marko = curwp->w_doto;
+   (void)gotoeop(FFRAND, 1);
+
+   /* take a note of buffer flags - we may need them */
+   flg = curbp->b_flag;
+
+   /* clean out kill buffer then kill region */
+   kdelete();
+   if ((status = killregion(FFRAND, 1)) != TRUE)
+   return (status);
+
+   /* 
+* Now step through n paragraphs. If we reach the end of buffer,
+* stop and paste the killed region back, then display a message.
+*/
+   if (do_gotoeop(FFRAND, n, ) == FALSE) {
+   ewprintf("Cannot transpose paragraph, end of buffer reached.");
+   (void)gotobop(FFRAND, i);
+   (void)yank(FFRAND, 1);
+   curbp->b_flag = flg;
+   return (FALSE);
+   }
+   (void)yank(FFRAND, 1);
+
+   return (TRUE);
+}
+
+
 /*
  * Go down the buffer until we find a line with non-space characters.
  */



Re: PF SMP: mutex for fragcache

2015-09-24 Thread mxb

Applied.

> On 12 sep. 2015, at 19:20, Alexandr Nedvedicky 
>  wrote:
> 
> Hello,
> 
> very small first step towards MP(i) friendly PF. Patch adds mutex around
> fragment cache.
> 
> Patch adds a lock around fragment cache. Unlike other parts of PF the fragment
> cache is self-contained subsystem. In that sense we can easily guard its entry
> points (pf_reassemble(), pf_reassemble6()) by mutex. The cache is shared
> by both protocols (AF_INET, AF_INET6), hence we have just one lock.
> 
> The locks (technically speaking mutexes) for other PF subsystems will follow 
> as
> soon as the remove operations for PF data objects will get untangled.
> What essentially needs to be done is to split remove and destroy operations 
> for
> PF objects into separate functions. This is something, what's being worked on
> currently.
> 
> As you can see the mutex, when acquired, raises  interrupt level to softnet.
> Same interrupt level is used by ioctl() and purge threads. IMO it should be
> fine, but I'd like to hear some confirmation...
> 
> 
> any OKs?
> 
> thanks and
> regards
> sasha
> 
> 8<---8<---8<--8<
> Index: pf_norm.c
> ===
> RCS file: /cvs/src/sys/net/pf_norm.c,v
> retrieving revision 1.182
> diff -u -p -r1.182 pf_norm.c
> --- pf_norm.c 10 Sep 2015 08:28:31 -  1.182
> +++ pf_norm.c 12 Sep 2015 17:18:43 -
> @@ -134,6 +134,7 @@ intpf_reassemble6(struct mbuf **, 
> st
> struct poolpf_frent_pl, pf_frag_pl;
> struct poolpf_state_scrub_pl;
> intpf_nfrents;
> +struct mutex  pf_frag_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);
> 
> void
> pf_normalize_init(void)
> @@ -771,6 +772,7 @@ pf_normalize_ip(struct pf_pdesc *pd, u_s
>   struct ip   *h = mtod(pd->m, struct ip *);
>   u_int16_tfragoff = (ntohs(h->ip_off) & IP_OFFMASK) << 3;
>   u_int16_tmff = (ntohs(h->ip_off) & IP_MF);
> + int  rv;
> 
>   if (!fragoff && !mff)
>   goto no_fragment;
> @@ -792,8 +794,11 @@ pf_normalize_ip(struct pf_pdesc *pd, u_s
>   if (!pf_status.reass)
>   return (PF_PASS);   /* no reassembly */
> 
> + PF_FRAG_LOCK();
>   /* Returns PF_DROP or m is NULL or completely reassembled mbuf */
> - if (pf_reassemble(>m, pd->dir, reason) != PF_PASS)
> + rv = pf_reassemble(>m, pd->dir, reason);
> + PF_FRAG_UNLOCK();
> + if (rv != PF_PASS)
>   return (PF_DROP);
>   if (pd->m == NULL)
>   return (PF_PASS);  /* packet has been reassembled, no error */
> @@ -813,6 +818,7 @@ int
> pf_normalize_ip6(struct pf_pdesc *pd, u_short *reason)
> {
>   struct ip6_frag  frag;
> + int  rv;
> 
>   if (pd->fragoff == 0)
>   goto no_fragment;
> @@ -824,9 +830,12 @@ pf_normalize_ip6(struct pf_pdesc *pd, u_
>   if (!pf_status.reass)
>   return (PF_PASS);   /* no reassembly */
> 
> + PF_FRAG_LOCK();
>   /* Returns PF_DROP or m is NULL or completely reassembled mbuf */
> - if (pf_reassemble6(>m, , pd->fragoff + sizeof(frag),
> - pd->extoff, pd->dir, reason) != PF_PASS)
> + rv = pf_reassemble6(>m, , pd->fragoff + sizeof(frag),
> + pd->extoff, pd->dir, reason);
> + PF_FRAG_UNLOCK();
> + if (rv != PF_PASS)
>   return (PF_DROP);
>   if (pd->m == NULL)
>   return (PF_PASS);  /* packet has been reassembled, no error */
> Index: pfvar.h
> ===
> RCS file: /cvs/src/sys/net/pfvar.h,v
> retrieving revision 1.420
> diff -u -p -r1.420 pfvar.h
> --- pfvar.h   19 Aug 2015 21:22:41 -  1.420
> +++ pfvar.h   12 Sep 2015 17:18:43 -
> @@ -1907,7 +1907,10 @@ int pf_postprocess_addr(struct 
> pf_sta
> 
> void   pf_cksum(struct pf_pdesc *, struct mbuf *);
> 
> -#endif /* _KERNEL */
> +extern struct mutex pf_frag_mtx;
> +#define  PF_FRAG_LOCK()  mtx_enter(_frag_mtx)
> +#define  PF_FRAG_UNLOCK()mtx_leave(_frag_mtx)
> 
> +#endif /* _KERNEL */
> 
> #endif /* _NET_PFVAR_H_ */
> 




Re: mpsafe vmx(4)

2015-09-24 Thread mxb
This one in the tree, so it’s live on my side.
 
> On 14 sep. 2015, at 13:09, David Gwynne  wrote:
> 
> this is an attempt to make the interrupt path in vmx mpsafe.
> 
> seems to hold up under load here, but more testing would be
> appreciated.
> 
> Index: if_vmx.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_vmx.c,v
> retrieving revision 1.30
> diff -u -p -r1.30 if_vmx.c
> --- if_vmx.c  24 Jun 2015 09:40:54 -  1.30
> +++ if_vmx.c  14 Sep 2015 11:08:09 -
> @@ -61,8 +61,9 @@ struct vmxnet3_txring {
>   struct mbuf *m[NTXDESC];
>   bus_dmamap_t dmap[NTXDESC];
>   struct vmxnet3_txdesc *txd;
> - u_int head;
> - u_int next;
> + u_int prod;
> + u_int cons;
> + u_int free;
>   u_int8_t gen;
> };
> 
> @@ -107,6 +108,7 @@ struct vmxnet3_softc {
>   bus_space_handle_t sc_ioh0;
>   bus_space_handle_t sc_ioh1;
>   bus_dma_tag_t sc_dmat;
> + void *sc_ih;
> 
>   struct vmxnet3_txqueue sc_txq[NTXQUEUE];
>   struct vmxnet3_rxqueue sc_rxq[NRXQUEUE];
> @@ -167,7 +169,8 @@ void vmxnet3_reset(struct vmxnet3_softc 
> int vmxnet3_init(struct vmxnet3_softc *);
> int vmxnet3_ioctl(struct ifnet *, u_long, caddr_t);
> void vmxnet3_start(struct ifnet *);
> -int vmxnet3_load_mbuf(struct vmxnet3_softc *, struct mbuf *);
> +int vmxnet3_load_mbuf(struct vmxnet3_softc *, struct vmxnet3_txring *,
> +struct mbuf *);
> void vmxnet3_watchdog(struct ifnet *);
> void vmxnet3_media_status(struct ifnet *, struct ifmediareq *);
> int vmxnet3_media_change(struct ifnet *);
> @@ -239,8 +242,8 @@ vmxnet3_attach(struct device *parent, st
>   printf(": failed to map interrupt\n");
>   return;
>   }
> - pci_intr_establish(pa->pa_pc, ih, IPL_NET, vmxnet3_intr, sc,
> - self->dv_xname);
> + sc->sc_ih = pci_intr_establish(pa->pa_pc, ih, IPL_NET | IPL_MPSAFE,
> + vmxnet3_intr, sc, self->dv_xname);
>   intrstr = pci_intr_string(pa->pa_pc, ih);
>   if (intrstr)
>   printf(": %s", intrstr);
> @@ -466,7 +469,8 @@ vmxnet3_txinit(struct vmxnet3_softc *sc,
>   struct vmxnet3_txring *ring = >cmd_ring;
>   struct vmxnet3_comp_ring *comp_ring = >comp_ring;
> 
> - ring->head = ring->next = 0;
> + ring->cons = ring->prod = 0;
> + ring->free = NTXDESC;
>   ring->gen = 1;
>   comp_ring->next = 0;
>   comp_ring->gen = 1;
> @@ -594,16 +598,19 @@ vmxnet3_intr(void *arg)
> 
>   if (READ_BAR1(sc, VMXNET3_BAR1_INTR) == 0)
>   return 0;
> - if (sc->sc_ds->event)
> +
> + if (sc->sc_ds->event) {
> + KERNEL_LOCK();
>   vmxnet3_evintr(sc);
> -#ifdef VMXNET3_STAT
> - vmxstat.intr++;
> -#endif
> + KERNEL_UNLOCK();
> + }
> +
>   if (ifp->if_flags & IFF_RUNNING) {
>   vmxnet3_rxintr(sc, >sc_rxq[0]);
>   vmxnet3_txintr(sc, >sc_txq[0]);
>   vmxnet3_enable_intr(sc, 0);
>   }
> +
>   return 1;
> }
> 
> @@ -649,7 +656,12 @@ vmxnet3_txintr(struct vmxnet3_softc *sc,
>   struct vmxnet3_comp_ring *comp_ring = >comp_ring;
>   struct vmxnet3_txcompdesc *txcd;
>   struct ifnet *ifp = >sc_arpcom.ac_if;
> - u_int sop;
> + bus_dmamap_t map;
> + struct mbuf *m;
> + u_int cons;
> + u_int free = 0;
> +
> + cons = ring->cons;
> 
>   for (;;) {
>   txcd = _ring->txcd[comp_ring->next];
> @@ -664,21 +676,32 @@ vmxnet3_txintr(struct vmxnet3_softc *sc,
>   comp_ring->gen ^= 1;
>   }
> 
> - sop = ring->next;
> - if (ring->m[sop] == NULL)
> - panic("%s: NULL ring->m[%u]", __func__, sop);
> - m_freem(ring->m[sop]);
> - ring->m[sop] = NULL;
> - bus_dmamap_unload(sc->sc_dmat, ring->dmap[sop]);
> - ring->next = (letoh32((txcd->txc_word0 >>
> + m = ring->m[cons];
> + ring->m[cons] = NULL;
> +
> + KASSERT(m != NULL);
> +
> + map = ring->dmap[cons];
> + free += map->dm_nsegs;
> + bus_dmamap_unload(sc->sc_dmat, map);
> + m_freem(m);
> +
> + cons = (letoh32((txcd->txc_word0 >>
>   VMXNET3_TXC_EOPIDX_S) & VMXNET3_TXC_EOPIDX_M) + 1)
>   % NTXDESC;
> -
> - ifp->if_flags &= ~IFF_OACTIVE;
>   }
> - if (ring->head == ring->next)
> +
> + ring->cons = cons;
> +
> + if (atomic_add_int_nv(>free, free) == NTXDESC)
>   ifp->if_timer = 0;
> - vmxnet3_start(ifp);
> +
> + if (ISSET(ifp->if_flags, IFF_OACTIVE)) {
> + KERNEL_LOCK();
> + CLR(ifp->if_flags, IFF_OACTIVE);
> + vmxnet3_start(ifp);
> + KERNEL_UNLOCK();
> + }
> }
> 
> void
> @@ -911,6 +934,8 @@ vmxnet3_stop(struct ifnet *ifp)
> 
>   WRITE_CMD(sc, VMXNET3_CMD_DISABLE);
> 
> + intr_barrier(sc->sc_ih);
> 

Re: mpsafe ip_carp

2015-09-24 Thread mxb

With setup like:

node1: carpnodes 1:0,10:0 carpdev vmx0 balancing arp
node2: carpnodes 1:100,10:100 carped vmx0 balancing arp
(eg, I forced one machine to be a master for a while in my configs)

node2 (-current) with your diff started to pollute dmesg with:
duplicate IP address 192.168.78.123 sent from ethernet address 00:00:5e:00:01:64

while node1 (5.8-stable kernel) is:
carp1: flags=8843 mtu 1500
lladdr 00:00:5e:00:01:01
<……>
inet 192.168.78.123 netmask 0xff80 broadcast 192.168.78.127


to get rid of this I modified hostname.carp1 to remove both dual VHID (VHID 1 
is what left) and balancing arp
and done ‘sh /etc/netstart carp1’ after on both machines.

node2 survived.

//mxb


> On 13 sep. 2015, at 10:34, David Gwynne  wrote:
> 
> i did this yesterday, but havent had a chance to beat on it properly
> yet.
> 
> if anyone would like to give it a go it would be much appreciated.
> im particularly interested in stability while carp configuration
> is made or changed. if it happens to keep handling packets, thats
> great, but not blowing up when you run ifconfig is the important
> bit atm.
> 
> Index: ip_carp.c
> ===
> RCS file: /cvs/src/sys/netinet/ip_carp.c,v
> retrieving revision 1.271
> diff -u -p -r1.271 ip_carp.c
> --- ip_carp.c 12 Sep 2015 20:51:35 -  1.271
> +++ ip_carp.c 12 Sep 2015 21:07:42 -
> @@ -48,6 +48,7 @@
> #include 
> #include 
> #include 
> +#include 
> 
> #include 
> #include 
> @@ -93,7 +94,9 @@ struct carp_mc_entry {
> enum { HMAC_ORIG=0, HMAC_NOV6LL=1, HMAC_MAX=2 };
> 
> struct carp_vhost_entry {
> - LIST_ENTRY(carp_vhost_entry)vhost_entries;
> + struct srpl_entry vhost_entries;
> + struct refcnt vhost_refcnt;
> +
>   struct carp_softc *parent_sc;
>   int vhe_leader;
>   int vhid;
> @@ -114,6 +117,12 @@ struct carp_vhost_entry {
>   struct sockaddr_dl vhe_sdl; /* for IPv6 ndp balancing */
> };
> 
> +void carp_vh_ref(void *, void *);
> +void carp_vh_unref(void *, void *);
> +
> +struct srpl_rc carp_vh_rc =
> +SRPL_RC_INITIALIZER(carp_vh_ref, carp_vh_unref, NULL);
> +
> struct carp_softc {
>   struct arpcom sc_ac;
> #define   sc_if   sc_ac.ac_if
> @@ -124,7 +133,9 @@ struct carp_softc {
> #ifdef INET6
>   struct ip6_moptions sc_im6o;
> #endif /* INET6 */
> - TAILQ_ENTRY(carp_softc) sc_list;
> +
> + struct srpl_entry sc_list;
> + struct refcnt sc_refcnt;
> 
>   int sc_suppress;
>   int sc_bow_out;
> @@ -137,7 +148,7 @@ struct carp_softc {
> 
>   char sc_curlladdr[ETHER_ADDR_LEN];
> 
> - LIST_HEAD(__carp_vhosthead, carp_vhost_entry)   carp_vhosts;
> + struct srpl carp_vhosts;
>   int sc_vhe_count;
>   u_int8_t sc_vhids[CARP_MAXNODES];
>   u_int8_t sc_advskews[CARP_MAXNODES];
> @@ -162,13 +173,19 @@ struct carp_softc {
>   struct carp_vhost_entry *cur_vhe; /* current active vhe */
> };
> 
> +void carp_sc_ref(void *, void *);
> +void carp_sc_unref(void *, void *);
> +
> +struct srpl_rc carp_sc_rc =
> +SRPL_RC_INITIALIZER(carp_sc_ref, carp_sc_unref, NULL);
> +
> int carp_opts[CARPCTL_MAXID] = { 0, 1, 0, LOG_CRIT }; /* XXX for now */
> struct carpstats carpstats;
> 
> int   carp_send_all_recur = 0;
> 
> struct carp_if {
> - TAILQ_HEAD(, carp_softc) vhif_vrs;
> + struct srpl vhif_vrs;
> };
> 
> #define   CARP_LOG(l, sc, s)  
> \
> @@ -250,7 +267,9 @@ carp_hmac_prepare(struct carp_softc *sc)
>   struct carp_vhost_entry *vhe;
>   u_int8_t i;
> 
> - LIST_FOREACH(vhe, >carp_vhosts, vhost_entries) {
> + KERNEL_ASSERT_LOCKED(); /* touching carp_vhosts */
> +
> + SRPL_FOREACH_LOCKED(vhe, >carp_vhosts, vhost_entries) {
>   for (i = 0; i < HMAC_MAX; i++) {
>   carp_hmac_prepare_ctx(vhe, i);
>   }
> @@ -579,11 +598,12 @@ carp_proto_input_c(struct ifnet *ifp, st
>   else
>   cif = (struct carp_if *)ifp->if_carp;
> 
> - TAILQ_FOREACH(sc, >vhif_vrs, sc_list) {
> + KERNEL_ASSERT_LOCKED(); /* touching vhif_vrs + carp_vhosts */
> + SRPL_FOREACH_LOCKED(sc, >vhif_vrs, sc_list) {
>   if (af == AF_INET &&
>   ismulti != IN_MULTICAST(sc->sc_peer.s_addr))
>   continue;
> - LIST_FOREACH(vhe, >carp_vhosts, vhost_entries) {
> + SRPL_FOREACH_LOCKED(vhe, >carp_vhosts, vhost_entries) {
>   if (vhe->vhid == ch->carp_vhid)
>   goto found;
>   }
> @@ -749,7 +769,9 @@ carp_clone_create(struct if_clone *ifc, 
>   if (!sc)
>   return (ENOMEM);
> 
> - LIST_INIT(>carp_vhosts);
> + refcnt_init(>sc_refcnt);
> + 
> + SRPL_INIT(>carp_vhosts);
>   sc->sc_vhe_count = 0;
>   if (carp_new_vhost(sc, 0, 0)) {
>   free(sc, M_DEVBUF, 

Re: [patch] cleaner checksum modification for pf

2015-09-24 Thread Richard Procter

On 14/09/2015, at 11:51 PM, Henning Brauer wrote:
> * Martin Pieuchot  [2015-09-11 13:54]:
>> On 11/09/15(Fri) 13:28, Henning Brauer wrote:
>>> Ryan pointed me to this diff and we briefly discussed it; we remain
>>> convinced that the in-tree approach is better than this.
>> Could you elaborate why?
> 
> [ elaboration, quoted further down ] 
> 
> And given that, the approach that has less and simpler code and makes
> better use of offloading wins.

Well, I agree that less and simpler code is better, all else being equal. 
In fact if you'd like my opinion, the in-tree approach actually places a
greater burden of complexity on the programmer. The patch needs one line 
to alter a value

rewrite += pf_change_16(pd, >hdr.icmp->icmp_id, icmpid);

; the in-tree code requires all packet alterations to occur between 
two calls, analogous to the way locking must enclose a critical section:

if (pd->csum_status == PF_CSUM_UNKNOWN)
   pf_check_proto_cksum(pd, pd->off, pd->tot_len - pd->off,
   pd->proto, pd->af); 
   [...]
   header->field = new_value; 
   [...]
pf_cksum(pd, pd->m);

This is the stuff mild headaches are made of. Especially as 1) pf_cksum()
usually occurs elsewhere and as 2) the idiom tempts programmers to rely on
existing pf_check_proto_cksum() calls. This last occurs three places in the
existing code. I presume these are correct but it is much easier to avoid the
question by replacing them with the one-liner.

The new interface also provides a place to put all the altered-value guards,
replacing, e.g.:


if (icmpid != pd->hdr.icmp->icmp_id) {
if (pd->csum_status == PF_CSUM_UNKNOWN)
pf_check_proto_cksum(pd, pd->off,
pd->tot_len - pd->off, pd->proto,
pd->af);
pd->hdr.icmp->icmp_id = icmpid;
rewrite = 1;
}

with:

rewrite += pf_change_16(pd, >hdr.icmp->icmp_id, icmpid);

This saves code. Here are some numbers. First, non-comment[0] 
lines of code for all affected files:

pf.c:1.944   pfvar.h:1.420   pf_norm.c:1.182
current 5724 16131030 
patched 5682 16141028 
 
 -88   +1  -2

(amd64) pf.o  pf_norm.o
current 123520 bytes  30816 bytes
patched 123584 bytes  30912 bytes
- -
  +64   +96

(i386)  pf.o  pf_norm.o 
current 95904 bytes   21096 bytes
patched 94956 bytes   21156 bytes
- -
 -948   -60

The patch's numbers include the new pf_change_* interface and the checksum 
modification code. So the proposed interface pays its way. Ok, let's move 
on to handling checksums:

> Well we've been thru it more than once; the argument presented here
> was that modifying the cksum instead of verify + recalc is better as
> it wouldn't hide cksum mismatches if the cksum engines on the NICs we
> offload to misbehave. After many years with the verify + recalc
> approach I think it is pretty safe to say that this is of no
> concern...

I'm sorry, that's not what I'm saying. Even supposing perfection of every
offload engine, offload engines have no monopoly on faults; regenerated
checksums also mask bus, stack and memory faults in pf-altered packets.

Regeneration always hides faults in /something/, unfortunately. 
pf_check_proto_cksum() takes care to mitigate that by preventing it from
hiding faults in the routers a packet passes through --- except for the current
one, which it can't.

That is, either router faults are a concern or they're not. pf can't
have it both ways, concerned to detect faults in every router other than 
the one on which it is running. The proposed patch fixes that.

And, as I wrote in a previous post, we know that routers corrupt data:

>> Right now my home firewall shows 30 TCP segments dropped for bad checksums.
>> As checks at least as strong are used by every sane link-layer this
>> virtually implies the dropped packets suffered router or end-point faults.

Another (home) router I administer was seeing IIRC five a day on average over
42 days, something we expect of a globe-spanning internetwork. Passing one of
these damaged segments to the user sufficies to break MACs and drop secure
connections.

Nonetheless, one might still argue that the reduced reliability is acceptable.
If there were some compelling upside, perhaps, maybe it might be, but I've
seen no evidence of performance benefits and the proposed patch addresses the
code complexity from which the more reliable method suffered.


Lastly, I don't want to sound like I'm trying to beat up on your checksum
work. I know very well how much effort is involved and it's very much
appreciated; you've cleaned up the code considerably. The